Towards High-performance Datacenter Systems with Application-oriented Optimizations
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Towards High-performance Datacenter Systems with Application-oriented Optimizations"

By Mr. Chaoliang ZENG

Abstract:

In recent decades, we have witnessed extensive construction of datacenters and widespread deployment of various applications. With the rapid rise of Internet services and cloud computing alongside the slowdown of Moore's law and Dennard scaling, there is a conflict between expanding application requirements and the slow evolution of general-purpose processors. Therefore, it is critical to build high-performance datacenter systems with application-oriented optimizations.

This thesis describes my research efforts in building high-performance datacenter systems through careful exploitation of application-specific characteristics and hardware architectures. Specifically, we explore three application-oriented datacenter systems.

First, we present Herald, a runtime embedding scheduler for efficient cache-enabled recommendation model training. Herald fully exploits the predictability and occasionality of embedding cache access to reduce embedding transmissions between the caches and the parameter server (PS) during training. We believe that the scheduling philosophy of Herald can be generally extended to the training of embedding models.

Second, we study the embedding-based retrieval algorithm from first principles and derive a practically ideal architecture for optimal performance. Based on the derived architecture, we propose FAERY for high-performance embedding-based retrieval running on FPGA. FAERY leverages appropriate parallel techniques to orchestrate the key operators in embedding-based retrieval, allowing it to outperform CPU- and GPU-based approaches. Although FAERY is a domain-specific accelerator for retrieval in recommendation systems, we believe similar optimization techniques can be applied to other systems bounded by memory and computation.

Third, we design Tiara, a three-tier hardware architecture to accelerate stateful layer-4 load balancing. Tiara makes the best use of heterogeneous hardware by decoupling the load balancing function. As a result, Tiara can provide high performance with cost, energy, and space efficiency. We believe Tiara's three-tier architecture is generic and can benefit more datacenter gateway functions.

Date: Tuesday, 18 July 2023
Time: 2:00pm - 4:00pm
Venue: Room 3494, Lifts 25/26

Chairman: Prof. Shiheng WANG (ACCT)

Committee Members:
Prof. Kai CHEN (Supervisor)
Prof. Gary CHAN
Prof. Dan XU
Prof. Jun ZHANG (ECE)
Prof. Hong XU (CUHK)

**** ALL are Welcome ****