A reading list for iSING students
AI-Centric Networking
- Towards Domain-Specific Network Transport for Distributed DNN Training, NSDI 2024
- A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters, OSDI 2020
- A Deep Reinforcement Learning Perspective on Internet Congestion Control, ICML 2019
- A Generic Communication Scheduler for Distributed DNN Training Acceleration, SOSP 2019
- Priority-based Parameter Propagation for Distributed DNN Training, SysML 2019
Host Networking
DCN Transport
- PLB: Congestion Signals are Simple and Effective for Network Load Balancing, SIGCOMM 2022
- Swift: Delay is Simple and Effective for Congestion Control in the Datacenter, SIGCOMM 2020
- HPCC: High Precision Congestion Control, SIGCOMM 2019
- Homa: A Receiver-Driven Low-Latency Transport Protocol Using Network Priorities, SIGCOMM 2018
- Re-architecting Datacenter Networks and Stacks for Low Latency and High Performance, SIGCOMM 2017
- Credit-Scheduled Delay-Bounded Congestion Control for Datacenters, SIGCOMM 2017
- Scheduling Mix-flows in Commodity Datacenters with Karuna, SIGCOMM 2016
- CODA: Toward Automatically Identifying and Scheduling Coflows in the Dark, SIGCOMM 2016
- Enabling ECN in Multi-Service Multi-Queue Data Centers, NSDI 2016
- Efficient Coflow Scheduling Without Prior Knowledge, SIGCOMM 2015
- Queues Don’t Matter When You Can JUMP Them!, NSDI 2015
- Information-Agnostic Flow Scheduling for Commodity Data Centers, NSDI 2015
- Decentralized Task-aware Scheduling for Data Center Networks, SIGCOMM 2014
- Efficient Coflow Scheduling with Varys, SIGCOMM 2014
- Friends, not Foes - Synthesizing Existing Transport Strategies for Data Center Networks, SIGCOMM 2014
- pFabric: Minimal Near-Optimal Datacenter Transport, SIGCOMM 2013
- Finishing Flows Quickly with Preemptive Scheduling, SIGCOMM 2012
- Deadline-Aware Datacenter TCP (D2TCP), SIGCOMM 2012
- Improving Datacenter Performance and Robustness with Multipath TCP, SIGCOMM 2011
- Better Never than Late: Meeting Deadlines in Datacenter Networks, SIGCOMM 2011
- Design, Implementation and Evaluation of Congestion Control for Multipath TCP, NSDI 2011
- ICTCP: Incast Congestion Control for TCP in Data Center Networks, CoNEXT 2010
- Data Center TCP (DCTCP), SIGCOMM 2010
- Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication, SIGCOMM 2009
RDMA in DCN
- Revisiting Congestion Control for Lossless Ethernet, NSDI 2024
- Network Load Balancing with In-network Reordering Support for RDMA, SIGCOMM 2023
- SRNIC: A Scalable Architecture for RDMA NICs, NSDI 2023
- Empowering Azure Storage with RDMA, NSDI 2023
- Congestion Detection in Lossless Networks, SIGCOMM 2021
- Re-architecting Congestion Management in Lossless Ethernet, NSDI 2020
- Gentle Flow Control: Avoiding Deadlock in Lossless Networks, SIGCOMM 2019
- Revisiting Network Support for RDMA, SIGCOMM 2018
- RDMA over Commodity Ethernet at Scale, SIGCOMM 2016
- TIMELY: RTT-based Congestion Control for the Datacenter, SIGCOMM 2015
- Congestion Control for Large-Scale RDMA Deployments, SIGCOMM 2015
DCN Structure
- TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs, NSDI 2023
- Enabling Wide-spread Communications on Optical Fabric with MegaSwitch, NSDI 2017
- Quartz: A New Design Element for Low-Latency DCNs, SIGCOMM 2014
- Circuit Switching Under the Radar with REACToR, NSDI 2014
- Integrating Microsecond Circuit Switching into the Data Center, SIGCOMM 2013
- OSA: An Optical Switching Architecture for Data Center Networks with Unprecedented Flexibility, NSDI 2012
- Jellyfish: Networking Data Centers Randomly, NSDI 2012
- c-Through: Part-time Optics in Data Centers, SIGCOMM 2010
- Helios: A Hybrid Electrical/Optical Switch Architecture for Modular Data Centers, SIGCOMM 2010
- SPAIN: COTS Data-Center Ethernet for Multipathing over Arbitrary Topologies, NSDI 2010
- BCube: A High Performance, Server-centric Network Architecture for Modular Data Centers, SIGCOMM 2009
- VL2: A Scalable and Flexible Data Center Network, SIGCOMM 2009
- PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric, SIGCOMM 2009
- DCell: A Scalable and Fault-Tolerant Network Structure for Data Centers, SIGCOMM 2008
- A Scalable, Commodity Data Center Network Architecture, SIGCOMM 2008