PhD Qualifying Examination
Title: "A Survey on Knowledge Transfer from Large Language Models to Small
Language Models"
by
Mr. Shuoling LIU
Abstract:
Large Language Models (LLMs) demonstrate exceptional generalization,
reasoning, and in-context learning abilities, yet their high computational
cost, limited privacy protection, and inconsistent domain-specific
performance hinder deployment in resource-constrained or privacy-sensitive
settings. Small Language Models (SLMs), by contrast, offer efficient,
privacy-preserving on-device inference and strong domain adaptability, but
lack the broad knowledge coverage and deep reasoning of LLMs. This disparity
motivates the study of LLM-to-SLM knowledge transfer, which aims to endow
SLMs with enhanced effectiveness while preserving their inherent efficiency
and privacy advantages.
This survey formulates the LLM-to-SLM transfer problem and proposes a
taxonomy based on the form of transferred knowledge: parameter-based,
representation-based, and data-based methods. We analyze each paradigm along
three dimensions, including effectiveness, efficiency, and privacy,
highlighting trade-offs and deployment constraints. Parameter-based
approaches (e.g., pruning, quantization, adapters) directly operate on model
weights; representation-based methods align intermediate features under
white-, grey-, or black-box access levels; data-based approaches synthesize
datasets or structured knowledge for fine-tuning or retrieval-augmented
inference, enabling transfer even under API-only black-box conditions. Given
the increasing prevalence of API-based deployments, where access to
parameters or representations is infeasible, we place particular emphasis on
data-based transfer. We discuss its current limitations, such as quality
control of synthetic data, integration with retrieval systems, and
incomplete privacy safeguards, and outline future directions, including
multi-teacher data synthesis, compression-aware retrieval integration,
end-to-end privacy-preserving pipelines, and bidirectional knowledge transfer
frameworks. By consolidating recent advances and open challenges, this
survey provides a structured foundation for developing SLMs that combine
LLM-level utility with practical deployment efficiency and privacy
guarantees.
Date: Friday, 10 October 2025
Time: 2:00pm - 4:00pm
Venue: Room 5501
Lifts 25/26
Committee Members: Prof. Qiang Yang (Supervisor)
Prof. Kai Chen (Co-supervisor)
Dr. Yangqiu Song (Chairperson)
Dr. Binhang Yuan