A Survey on Knowledge Transfer from Large Language Models to Small Language Models
PhD Qualifying Examination

Title: "A Survey on Knowledge Transfer from Large Language Models to Small Language Models"

by

Mr. Shuoling LIU

Abstract:

Large Language Models (LLMs) demonstrate exceptional generalization, reasoning, and in-context learning abilities, yet their high computational cost, limited privacy protection, and inconsistent domain-specific performance hinder deployment in resource-constrained or privacy-sensitive settings. Small Language Models (SLMs), by contrast, offer efficient, privacy-preserving on-device inference and strong domain adaptability, but lack the broad knowledge coverage and deep reasoning of LLMs. This disparity motivates the study of LLM-to-SLM knowledge transfer, which aims to endow SLMs with enhanced effectiveness while preserving their inherent efficiency and privacy advantages.

This survey formulates the LLM-to-SLM transfer problem and proposes a taxonomy based on the form of transferred knowledge: parameter-based, representation-based, and data-based methods. We analyze each paradigm along three dimensions: effectiveness, efficiency, and privacy, highlighting trade-offs and deployment constraints. Parameter-based approaches (e.g., pruning, quantization, adapters) operate directly on model weights; representation-based methods align intermediate features under white-, grey-, or black-box access levels; data-based approaches synthesize datasets or structured knowledge for fine-tuning or retrieval-augmented inference, enabling transfer even under API-only black-box conditions. Given the increasing prevalence of API-based deployments, where access to parameters or representations is infeasible, we place particular emphasis on data-based transfer. We discuss its current limitations, such as quality control of synthetic data, integration with retrieval systems, and incomplete privacy safeguards, and outline future directions, including multi-teacher data synthesis, compression-aware retrieval integration, end-to-end privacy-preserving pipelines, and bidirectional knowledge transfer frameworks. By consolidating recent advances and open challenges, this survey provides a structured foundation for developing SLMs that combine LLM-level utility with practical deployment efficiency and privacy guarantees.

Date: Friday, 10 October 2025
Time: 2:00pm - 4:00pm
Venue: Room 5501 (Lifts 25/26)

Committee Members:
Prof. Qiang Yang (Supervisor)
Prof. Kai Chen (Co-supervisor)
Dr. Yangqiu Song (Chairperson)
Dr. Binhang Yuan