More about HKUST
Faster priority ordered irregular parallelism through hardware and software
Speaker: Professor Mark C. Jeffrey University of Toronto Title: "Faster priority ordered irregular parallelism through hardware and software" Date: Monday, 30 September 2024 Time: 4:00pm - 5:00pm Venue: Lecture Theater F (Leung Yat Sing Lecture Theater) near lift 25/26, HKUST Abstract: While parallel regular algorithms thrive on ever larger manycores, GPUs, and accelerators, parallel irregular algorithms has struggled to improve. We focus on two challenges facing irregular parallelism: run-time efficiency and programmer productivity. First, many irregular algorithms schedule their work, or tasks, according to a priority order for correctness or faster convergence. Prior software and hardware systems for ordered irregular parallelism compromise on parallelism, work efficiency, or low-level overheads, leading to missed performance opportunities. On the hardware front, we present Hive, a task-based execution model and speculative manycore architecture that extracts abundant fine-grain parallelism from algorithms that demand the priority update operation, a lesser-known but vital tool beyond push and pop. On software (C++) front, we present the Multi Bucket Queue, which targets a promising unexplored point in the design space for concurrent priority scheduling. The MBQ leverages the strengths of the MultiQueue and Multi-Level Bucket Queue, while avoiding their weaknesses, yielding a priority scheduler that keeps threads busy and running useful work, yet with high-efficiency queue operations. Hive improves performance at 256 cores by up to 2.8x over the next best hardware solution while the MBQ is competitive with or outperforms prior software schedulers at 48 cores. Second, the Rust programming language is lauded for enabling fearless concurrency with zero cost: detecting concurrency errors at compile time. Given the enduring difficulty of irregular parallel programming in other languages, this implied panacea warrants analysis. In particular, the efficacy of Rust across types of parallelism remains unexplored. Is parallel programming always devoid of fear with Rust? We answer this question through a case study, porting 14 benchmarks with abundant regular and irregular parallelism from C++ to Rust and reporting our experience and observations. We find that Rust with its libraries delivers fearlessness for program phases comprising only regular parallelism. However, for applications with any irregular parallelism, the programmer must choose between unsafe code or high-overhead dynamic checks with errors that manifest at run time, leaving the arduous task of parallel programming as scary with Rust as with its predecessors. **************** Biography: Mark Jeffrey is an Assistant Professor at the University of Toronto in the Edward S. Rogers Sr. Department of Electrical and Computer Engineering and is cross-appointed to the Department of Computer Science. He earned the PhD degree from the Massachusetts Institute of Technology and the MASc and BASc degrees from the University of Toronto. Mark's research interests are in the areas of computer architecture, computer systems, and parallelism. Prior to joining the University of Toronto, he was a Research Scientist at Facebook AI Research and has also worked in software engineering at Google, Epson, and a Y-combinator startup called AeroFS. His work has been recognized with an IEEE Micro Top Picks award and honorable mention and a Best Paper Award at the International Workshop for Network on Chip Architectures.