The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Safe, Scalable Transfer with Dynamic Experts: Mitigating Negative
Transfer"
By
Mr. Zhili LIU
Abstract:
Large-scale pre-training has become a central paradigm in modern machine
learning, enabling models to acquire broadly reusable knowledge from massive
and heterogeneous data. However, transfer is not always beneficial: knowledge
learned during pre-training may interfere with downstream adaptation or give
rise to unsafe and undesired behaviors. This thesis studies such failures
through the lens of negative transfer and argues that, in modern pre-training,
negative transfer manifests in two closely related forms: semantic
interference in representation learning and implicit harmful knowledge
transfer in generative models.
To address these challenges, this thesis develops a sequence of methods that
progressively move from post-hoc mitigation to training-time and architectural
specialization. First, it studies implicit harmful knowledge transfer in
diffusion models and proposes Geom-Erasing, a post-hoc framework that
selectively removes implicitly acquired harmful concepts without retraining
the entire model. Second, it investigates semantic interference in
self-supervised representation learning and introduces Scalable Dynamic
Routing (SDR), a task-customized pre-training framework that reduces harmful
interference through structural specialization across semantically distinct
data subsets. Third, it proposes Mixture of Cluster-Conditional Experts
(MoCE), which extends this idea to a mixture-of-experts framework and improves
selective knowledge reuse through finer-grained expert allocation. Finally, it
extends the specialization perspective to self-alignment in large language
models through Mixture of insighTful Experts (MoTE), showing that structured
expert allocation and coordinated reasoning can further improve safety,
robustness, and controllability.
Across diffusion models, self-supervised representation learning, and LLM
self-alignment, the proposed methods consistently show that structured
specialization can mitigate harmful or ineffective transfer.
Taken together, these studies suggest that safe and scalable transfer depends
not only on learning shared representations at scale, but also on structuring
how knowledge is allocated, specialized, suppressed, and reused under
heterogeneous downstream requirements.
Date: Tuesday, 19 May 2026
Time: 10:00am - 12:00noon
Venue: Room 2128A (Lift 19)
Chairman: Dr. Ding PAN (PHYS)
Committee Members: Prof. James KWOK (Supervisor)
Dr. Long CHEN
Dr. Dan XU
Prof. Can YANG (MATH)
Prof. Sinno Jialin PAN (CUHK)