Optimizing Data Management for Scalable Graph Neural Network Training

Speaker: Prof. Ruben Mayer
Department of Computer Science
University of Bayreuth, Germany

Title: "Optimizing Data Management for Scalable Graph Neural Network 
Training"

Date:   Tuesday, 3 December 2024

Time:   10:00am - 11:00am

Venue:  Room 4472 (via lift 25/26), HKUST

Abstract:


Graph Neural Networks (GNNs) are a versatile and powerful architecture
for machine learning  on graph-structured data, enabling  tasks at the
node,  edge, and  graph  levels. GNNs  have  broad applications,  from
social networks  and web  graphs to knowledge  graphs, protein-protein
interactions, and  product recommendations. However, training  GNNs on
large-scale, real-world graphs, often containing billions of nodes and
edges, presents significant computational challenges. Efficient data
management   is  essential   to   make  GNN   training  scalable   and
cost-effective,  particularly through  strategies  that optimize  data
locality during processing.

In this  talk, I will discuss  our recent work on  three critical data
management  strategies that  enhance  GNN  training efficiency:  graph
partitioning,  ordering, and  sampling. Each  of these  areas involves
solving complex  graph-theoretic problems while addressing  the unique
demands of GNN  training pipelines. Our research  highlights how these
data management techniques can be tailored to overcome the scalability
bottlenecks associated with GNNs.


**************
Biography:

Ruben Mayer  is a Professor of  Computer Science at the  University of
Bayreuth, Germany, where he leads the Data Systems group. His research
focuses on data  management in distributed systems,  with a particular
emphasis  on supporting  machine  learning  workloads. Currently,  his
projects explore efficient data systems  for graph neural networks and
federated machine learning. Driven by a commitment to advancing
scalability and efficiency, Dr. Mayer aims to optimize distributed
systems through innovative data management strategies.