Systems Software in the New Computing World
Speaker: Dr. Diyu ZHOU
EPFL

Title: "Systems Software in the New Computing World"

Date: Thursday, 9 March 2023

Time: 3:00pm - 4:00pm HKT

Zoom link: https://hkust.zoom.us/j/465698645?pwd=aVRaNWs2RHNFcXpnWGlkR05wTTk3UT09

Meeting ID: 465 698 645
Passcode: 20222023

Abstract:

Exponential growth in users, requests, and data poses an ever-increasing demand on the performance of today's data centers. This challenge has resulted in two major trends. First, data centers scale out computation by leveraging multicore architectures and deploying more servers. Second, ultra-fast storage devices are being developed to meet the exponential growth in data. Unfortunately, traditional systems software is a poor fit for these trends, leaving applications unable to realize the potential of these developments.

In this talk, I will present my work on designing modern systems software to exploit these computing trends by supporting three critical application requirements: I/O efficiency, multicore scalability, and practical reliability. I will first present OdinFS, a high-performance and scalable file system for emerging non-volatile memory (NVM). By taking into account the unique characteristics of NVM, OdinFS scales to hundreds of cores and achieves tens to hundreds of times better performance than the prior state of the art. I will next present RRC, an application-transparent replication system for commercial off-the-shelf containers. RRC incurs latency overhead up to 75x lower than competing schemes, while also achieving significantly lower throughput overhead, thus enabling practical deployment for critical server applications.

*******************

Biography:

Diyu Zhou is a postdoctoral researcher at EPFL. He completed his Ph.D. at UCLA, advised by Yuval Tamir. His research focuses on building high-performance, scalable, and reliable computer systems. Specifically, he has developed I/O stacks to support modern storage devices, devised frameworks and algorithms that let synchronization primitives scale to massive multicore machines, found and fixed concurrency bugs, and designed practical fault tolerance mechanisms for modern systems.