Efficient Multi-Objective Optimization for Deep Learning

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Efficient Multi-Objective Optimization for Deep Learning"

By

Mr. Weiyu CHEN


Abstract:

While traditional deep learning optimizes a single objective, real-world
settings such as multi-task learning and Large Language Models (LLMs)
require balancing multiple conflicting goals. This thesis addresses this
challenge by developing efficient algorithms for Multi-Objective
Optimization (MOO) in deep learning.

First, we address the limitations of fixed reference vectors in
gradient-based MOO, which often fail to approximate the Pareto front
uniformly. We propose a novel algorithm that treats the determination of
reference vectors as a bilevel optimization problem. By learning these
vectors rather than fixing them in advance, the method adapts to the
geometry of the Pareto front, yielding more accurate and diverse discrete
solutions than static methods.
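
To make the bilevel structure concrete, the sketch below uses a toy
two-objective problem and a simplified alternating scheme in place of the
full bilevel procedure; the objective functions, step sizes, and the
uniformity-based update of the reference vectors are illustrative
assumptions rather than the algorithm proposed in the thesis.

    # Illustrative sketch only: a toy 2-objective problem where reference
    # vectors are adjusted alongside the solutions they scalarize.
    import numpy as np

    def objectives(x):
        return np.array([x**2, (x - 2.0)**2])        # two conflicting objectives

    def grad_objectives(x):
        return np.array([2.0 * x, 2.0 * (x - 2.0)])

    K = 5
    w = np.linspace(0.1, 0.9, K)
    refs = np.stack([w, 1.0 - w], axis=1)            # initial reference vectors
    xs = np.zeros(K)                                 # one solution per vector

    for step in range(200):
        # Lower level: gradient step on each linearly scalarized problem.
        for k in range(K):
            xs[k] -= 0.05 * refs[k] @ grad_objectives(xs[k])
        # Upper level (crude stand-in): nudge reference vectors so that the
        # resulting objective values are spread more uniformly.
        F = np.array([objectives(x) for x in xs])
        gaps = np.diff(F[:, 0])
        adjust = np.concatenate([[0.0], gaps - gaps.mean()])
        refs[:, 0] = np.clip(refs[:, 0] - 0.01 * adjust, 0.05, 0.95)
        refs[:, 1] = 1.0 - refs[:, 0]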

Second, we introduce a method to learn the entire continuous Pareto
manifold, enabling the generation of infinitely many trade-off solutions.
Our architecture combines a shared main network with multiple low-rank
matrices. This design maximizes feature sharing while capturing
task-specific differences, significantly outperforming baselines in
many-objective scenarios.
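
As a rough illustration of this kind of architecture, the PyTorch sketch
below adds a preference-weighted combination of low-rank matrices on top of
a shared linear layer; the module name, shapes, and initialization are
assumptions made for illustration, not the exact design in the thesis.

    # Illustrative sketch: shared weights plus preference-weighted
    # low-rank deltas, one factor pair per objective.
    import torch
    import torch.nn as nn

    class SharedPlusLowRank(nn.Module):
        def __init__(self, d_in, d_out, n_obj, rank=4):
            super().__init__()
            self.shared = nn.Linear(d_in, d_out)     # shared main network
            self.A = nn.Parameter(0.01 * torch.randn(n_obj, d_out, rank))
            self.B = nn.Parameter(0.01 * torch.randn(n_obj, rank, d_in))

        def forward(self, x, pref):
            # pref: (n_obj,) vector on the simplex selecting a trade-off.
            delta = torch.einsum('o,oir,orj->ij', pref, self.A, self.B)
            return self.shared(x) + x @ delta.t()

    layer = SharedPlusLowRank(d_in=16, d_out=8, n_obj=3)
    out = layer(torch.randn(4, 16), torch.tensor([0.2, 0.3, 0.5]))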

Third, we formulate model merging as a MOO problem. Current techniques
produce "one-size-fits-all" merged models that ignore diverse user
preferences. We introduce Pareto merging, a framework that combines a
preference-independent base model with a preference-dependent personalized
component parameterized by low-rank tensors. The framework learns the full
Pareto set in a single optimization, allowing users to select merged models
aligned with their specific preferences without accessing the training
data.
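
The sketch below shows one way such a decomposition could look at the level
of a single weight matrix: a fixed, preference-independent base plus a
preference-dependent low-rank term assembled on demand at selection time;
the shapes and factorization are illustrative assumptions rather than the
exact Pareto merging parameterization.

    # Illustrative sketch: assembling a merged weight for a user preference,
    # with no further training required at selection time.
    import torch

    d_out, d_in, n_obj, rank = 8, 16, 2, 4
    base = torch.randn(d_out, d_in)               # preference-independent base
    U = 0.01 * torch.randn(n_obj, d_out, rank)    # preference-dependent
    V = 0.01 * torch.randn(n_obj, rank, d_in)     # low-rank factors

    def merged_weight(pref):
        # pref: (n_obj,) preference over the capabilities being merged.
        delta = torch.einsum('o,oir,orj->ij', pref, U, V)
        return base + delta

    w_a = merged_weight(torch.tensor([0.9, 0.1]))  # leans toward objective 1
    w_b = merged_weight(torch.tensor([0.3, 0.7]))  # a different trade-off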

Finally, we address LLM efficiency with multi-objective one-shot pruning.
Moving beyond single-objective pruning that minimizes a single
reconstruction error, we propose a dual ADMM approach. It identifies a
common core of weights while allowing task-specific adjustments, generating
a Pareto set of sparse models. Users can thus navigate trade-offs between
capabilities (e.g., reasoning vs. coding) without incurring the cost of
re-pruning.
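
For context, the sketch below shows a generic single-objective, layer-wise
ADMM pruner that alternates a reconstruction step with a top-k sparsity
projection; it is a simplified stand-in, with assumed variable names and
hyperparameters, for the dual-ADMM, multi-objective procedure developed in
the thesis.

    # Illustrative sketch: one-shot layer pruning with a basic ADMM loop.
    import numpy as np

    def prune_layer_admm(W, X, sparsity=0.5, rho=1.0, iters=50):
        """Approximate W (d_out x d_in) with a sparse Ws preserving X @ W.T."""
        G = X.T @ X                                   # reconstruction curvature
        Hinv = np.linalg.inv(G + rho * np.eye(W.shape[1]))

        def project_topk(M):
            k = max(1, int(round((1.0 - sparsity) * M.size)))
            thresh = np.partition(np.abs(M).ravel(), -k)[-k]
            return M * (np.abs(M) >= thresh)

        Ws = project_topk(W)                          # sparse copy of the weights
        U = np.zeros_like(W)                          # scaled dual variable
        for _ in range(iters):
            Z = (W @ G + rho * (Ws - U)) @ Hinv       # reconstruction step (dense)
            Ws = project_topk(Z + U)                  # enforce the sparsity level
            U = U + Z - Ws                            # dual update
        return Ws

    W = np.random.randn(8, 16)
    X = np.random.randn(64, 16)                       # calibration activations
    Ws = prune_layer_admm(W, X, sparsity=0.5)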

Collectively, these contributions provide a set of methodologies for
creating flexible, personalized, and efficient models for critical deep
learning applications.


Date:                   Friday, 16 January 2026

Time:                   9:30am - 11:30am

Venue:                  Room 5501
                        Lifts 25-26

Chairman:               Prof. Chik Patrick YUE (ECE)

Committee Members:      Prof. James KWOK (Supervisor)
                        Prof. Raymond WONG
                        Dr. Dan XU
                        Prof. Pan HUI (EMIA)
                        Dr. Weijie ZHENG (HIT)