Faster priority ordered irregular parallelism through hardware and software

Speaker: Professor Mark C. Jeffrey
         University of Toronto

Title:  "Faster priority ordered irregular parallelism through hardware
and software"

Date:    Monday, 30 September 2024

Time:    4:00pm - 5:00pm

Venue:   Lecture Theater F
         (Leung Yat Sing Lecture Theater)
         near lift 25/26, HKUST

Abstract:

While parallel regular algorithms thrive on ever larger manycores, GPUs,
and accelerators, parallel irregular algorithms has struggled to improve.
We focus on two challenges facing irregular parallelism: run-time
efficiency and programmer productivity.

First, many irregular algorithms schedule their work, or tasks, according
to a priority order for correctness or faster convergence. Prior software
and hardware systems for ordered irregular parallelism compromise on
parallelism, work efficiency, or low-level overheads, leading to missed
performance opportunities. On the hardware front, we present Hive, a
task-based execution model and speculative manycore architecture that
extracts abundant fine-grain parallelism from algorithms that demand the
priority update operation, a lesser-known but vital tool beyond push and
pop. On software (C++) front, we present the Multi Bucket Queue, which
targets a promising unexplored point in the design space for concurrent
priority scheduling. The MBQ leverages the strengths of the MultiQueue and
Multi-Level Bucket Queue, while avoiding their weaknesses, yielding a
priority scheduler that keeps threads busy and running useful work, yet
with high-efficiency queue operations. Hive improves performance at 256
cores by up to 2.8x over the next best hardware solution while the MBQ is
competitive with or outperforms prior software schedulers at 48 cores.

Second, the Rust programming language is lauded for enabling fearless
concurrency with zero cost: detecting concurrency errors at compile time.
Given the enduring difficulty of irregular parallel programming in other
languages, this implied panacea warrants analysis. In particular, the
efficacy of Rust across types of parallelism remains unexplored. Is
parallel programming always devoid of fear with Rust? We answer this
question through a case study, porting 14 benchmarks with abundant regular
and irregular parallelism from C++ to Rust and reporting our experience
and observations. We find that Rust with its libraries delivers
fearlessness for program phases comprising only regular parallelism.
However, for applications with any irregular parallelism, the programmer
must choose between unsafe code or high-overhead dynamic checks with
errors that manifest at run time, leaving the arduous task of parallel
programming as scary with Rust as with its predecessors.


****************
Biography:

Mark Jeffrey is an Assistant Professor at the University of Toronto in the
Edward S. Rogers Sr. Department of Electrical and Computer Engineering and
is cross-appointed to the Department of Computer Science. He earned the
PhD degree from the Massachusetts Institute of Technology and the MASc and
BASc degrees from the University of Toronto. Mark's research interests are
in the areas of computer architecture, computer systems, and parallelism.
Prior to joining the University of Toronto, he was a Research Scientist at
Facebook AI Research and has also worked in software engineering at
Google, Epson, and a Y-combinator startup called AeroFS. His work has been
recognized with an IEEE Micro Top Picks award and honorable mention and a
Best Paper Award at the International Workshop for Network on Chip
Architectures.