PhD Qualifying Examination


Title: "A Survey on Knowledge Transfer from Large Language Models to Small
Language Models"

by

Mr. Shuoling LIU


Abstract:

Large Language Models (LLMs) demonstrate exceptional generalization,
reasoning, and in-context learning abilities, yet their high computational
cost, limited privacy protection, and inconsistent domain-specific
performance hinder deployment in resource-constrained or privacy-sensitive
settings. Small Language Models (SLMs), by contrast, offer efficient,
privacy-preserving on-device inference and strong domain adaptability, but
lack the broad knowledge coverage and deep reasoning of LLMs. This disparity
motivates the study of LLM-to-SLM knowledge transfer, which aims to endow
SLMs with enhanced effectiveness while preserving their inherent efficiency
and privacy advantages.

This survey formulates the LLM-to-SLM transfer problem and proposes a
taxonomy based on the form of transferred knowledge: parameter-based,
representation-based, and data-based methods. We analyze each paradigm along
three dimensions: effectiveness, efficiency, and privacy,
highlighting trade-offs and deployment constraints. Parameter-based
approaches (e.g., pruning, quantization, adapters) operate directly on model
weights; representation-based methods align intermediate features under
white-, grey-, or black-box access levels; data-based approaches synthesize
datasets or structured knowledge for fine-tuning or retrieval-augmented
inference, enabling transfer even under API-only black-box conditions. Given
the increasing prevalence of API-based deployments, where access to
parameters or representations is infeasible, we place particular emphasis on
data-based transfer. We discuss its current limitations, including the
quality control of synthetic data, integration with retrieval systems, and
incomplete privacy safeguards, and outline future directions such as
multi-teacher data synthesis, compression-aware retrieval integration,
end-to-end privacy-preserving pipelines, and bidirectional knowledge transfer
frameworks. By consolidating recent advances and open challenges, this
survey provides a structured foundation for developing SLMs that combine
LLM-level utility with practical deployment efficiency and privacy
guarantees.


Date:                   Friday, 10 October 2025

Time:                   2:00pm - 4:00pm

Venue:                  Room 5501
                        Lifts 25/26

Committee Members:      Prof. Qiang Yang (Supervisor)
                        Prof. Kai Chen (Co-supervisor)
                        Dr. Yangqiu Song (Chairperson)
                        Dr. Binhang Yuan