The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Efficient Multi-Objective Optimization for Deep Learning"
By
Mr. Weiyu CHEN
Abstract:
While traditional deep learning optimizes for single objectives, real-world
applications like multi-task learning and Large Language Models (LLMs)
require balancing conflicting goals. This thesis addresses this challenge
by developing efficient algorithms for Multi-Objective Optimization (MOO) in
deep learning.
First, we address the limitations of fixed reference vectors in
gradient-based MOO, which often fail to approximate the Pareto front
uniformly. We propose a novel algorithm that casts reference vector
determination as a bilevel optimization problem. By learning these vectors,
the method adapts to the Pareto front's geometry, yielding more accurate
and diverse discrete solutions than static methods.
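To see why fixed reference vectors can cover the Pareto front unevenly, here is a toy illustration (not the thesis algorithm): weighted-sum scalarization with uniformly spaced weight vectors on a two-objective problem that has a closed-form per-weight optimum. All function names are hypothetical.

```python
# Toy two-objective problem: f1(x) = x^2, f2(x) = (x - 1)^2.
# Minimizing w1*f1(x) + w2*f2(x) gives the closed-form optimum
# x* = w2 / (w1 + w2), so we can trace the Pareto front directly.

def objectives(x):
    return x ** 2, (x - 1) ** 2

def scalarized_optimum(w1, w2):
    """Minimizer of w1*f1(x) + w2*f2(x) for the toy objectives above."""
    return w2 / (w1 + w2)

# Uniformly spaced fixed reference (weight) vectors.
weights = [(i / 10, 1 - i / 10) for i in range(1, 10)]
front = [objectives(scalarized_optimum(w1, w2)) for w1, w2 in weights]
```

Although the weights are evenly spaced, the resulting points on the front are not (the `f1` values are squares, so they bunch near the extremes), which is the gap that learning the reference vectors adaptively aims to close.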
Second, we introduce a method to learn the entire continuous Pareto
manifold, generating infinitely many trade-off solutions. Our architecture
combines a shared main network with multiple low-rank matrices. This design
maximizes feature sharing while capturing task-specific differences,
significantly outperforming baselines in many-objective scenarios.
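The "shared main network plus low-rank matrices" design can be sketched, in a hedged and minimal form (all names hypothetical, scalars standing in for a real network), as a shared weight matrix perturbed by a preference-scaled rank-1 update:

```python
# Minimal sketch: W(pref) = W_shared + pref * (u v^T), a shared weight
# matrix plus a preference-scaled rank-1 (low-rank) update.

def outer(u, v):
    """Rank-1 matrix u v^T as nested lists."""
    return [[ui * vj for vj in v] for ui in u]

def add_scaled(W, D, s):
    """Elementwise W + s * D."""
    return [[w + s * d for w, d in zip(rw, rd)] for rw, rd in zip(W, D)]

W_shared = [[1.0, 0.0], [0.0, 1.0]]   # feature sharing lives here
u, v = [1.0, -1.0], [0.5, 0.5]
delta = outer(u, v)                   # task-specific low-rank direction

# Varying the preference scalar traces a continuum of specialized
# weights from one shared backbone.
W_task_a = add_scaled(W_shared, delta, 0.0)   # pure shared weights
W_task_b = add_scaled(W_shared, delta, 1.0)   # shared + full low-rank shift
```

The low-rank factors are cheap relative to the shared matrix, which is why this style of parameterization scales to many objectives.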
Third, we formulate model merging as a MOO problem. Current techniques
produce "one-size-fits-all" models that ignore diverse user preferences. We
introduce Pareto merging, a framework utilizing a preference-independent
base model and a preference-dependent personalized model via low-rank
tensors. The framework learns the full Pareto set in a single
optimization run, allowing
users to select merged models aligned with their specific preferences
without accessing training data.
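Preference-conditioned merging can be illustrated with a hedged sketch on flat parameter vectors; the checkpoints and function names below are hypothetical, and the low-rank tensor machinery of the thesis is reduced to plain task-vector arithmetic:

```python
# Sketch: merged = base + sum_i pref_i * (finetuned_i - base).
# A preference vector selects one merged model from a continuum,
# without touching any training data.

def merge(base, deltas, prefs):
    """Elementwise base + sum_i prefs[i] * deltas[i]."""
    out = list(base)
    for pref, delta in zip(prefs, deltas):
        out = [o + pref * d for o, d in zip(out, delta)]
    return out

base    = [0.0, 1.0, 2.0]   # preference-independent base model
ft_math = [1.0, 1.0, 3.0]   # hypothetical math-finetuned checkpoint
ft_code = [0.0, 2.0, 2.0]   # hypothetical code-finetuned checkpoint
deltas = [[a - b for a, b in zip(ft_math, base)],
          [a - b for a, b in zip(ft_code, base)]]

# A user preferring math capability picks a math-heavy preference.
math_heavy = merge(base, deltas, [0.9, 0.1])
```

Setting the preference to a one-hot vector recovers the corresponding finetuned checkpoint exactly, and intermediate preferences interpolate between them.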
Finally, we address LLM efficiency with multi-objective one-shot pruning.
Moving beyond single-objective pruning driven by a single reconstruction
error,
we propose a dual ADMM approach. This identifies a common core of weights
while allowing task-specific adjustments, generating a Pareto set of sparse
models. Users can thus navigate trade-offs between capabilities (e.g.,
reasoning vs. coding) without re-pruning costs.
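The "common core plus task-specific adjustments" idea can be sketched without the dual ADMM machinery itself; the sketch below (all names and importance scores hypothetical) keeps the weights important to every task and adds a small per-task keep set on top:

```python
# Sketch: a shared core of kept weights (important for all tasks) plus
# task-specific extras, yielding one sparse model per capability.

def top_k_indices(scores, k):
    """Indices of the k largest importance scores."""
    return set(sorted(range(len(scores)), key=lambda i: -scores[i])[:k])

def prune(weights, keep):
    """Zero out every weight whose index is not in the keep set."""
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

weights       = [0.9, 0.1, 0.8, 0.05, 0.7]
imp_reasoning = [0.9, 0.2, 0.8, 0.1, 0.3]   # hypothetical scores
imp_coding    = [0.9, 0.1, 0.8, 0.4, 0.1]

# Core = weights in the top-k for every task.
core = top_k_indices(imp_reasoning, 2) & top_k_indices(imp_coding, 2)

# Task-specific sparse models share the core, differ in the extras.
sparse_reasoning = prune(weights, core | {4})
sparse_coding    = prune(weights, core | {3})
```

Because the core is computed once, switching between capability trade-offs only swaps the small task-specific part, echoing the "no re-pruning cost" property.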
Collectively, these contributions provide powerful methodologies for
creating flexible, personalized, and efficient models for critical deep
learning applications.
Date: Friday, 16 January 2026
Time: 9:30am - 11:30am
Venue: Room 5501
Lifts 25-26
Chairman: Prof. Chik Patrick YUE (ECE)
Committee Members: Prof. James KWOK (Supervisor)
Prof. Raymond WONG
Dr. Dan XU
Prof. Pan HUI (EMIA)
Dr. Weijie ZHENG (HIT)