The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
Final Year Thesis Oral Defense
Title: "Divide and Distill: Achieving Higher Accuracy, Speed and
Data-Efficiency for Knowledge Distillation"
by
KAO Shiu-hong
Abstract:
Knowledge distillation (KD) has been recognized as an effective tool to
compress and accelerate models. However, current KD approaches, while
impressive in achieving smaller models, suffer from either an accuracy
drop or an excruciatingly long distillation process. In this paper, we
first demonstrate that inter-block optimization entanglement makes
traditional end-to-end KD approaches unstable and slow. To overcome this,
we then propose a novel Divide and Distill (DnD) framework, which divides
a pair of teacher and student networks into several blocks and distills
the block-wise knowledge separately before combining the blocks for
inference. Extensive experiments validate that our simple yet effective
framework greatly boosts model accuracy, speeds up the distillation
process, and improves data efficiency. Notably, due to its better
distillation process, DnD can train a student network whose accuracy even
surpasses that of its teacher. Compared to other KD approaches, DnD
offers an impressive distillation speedup (up to 10x) and outperforms
them with only 40% of the original training data.
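For readers unfamiliar with block-wise distillation, the minimal PyTorch
sketch below illustrates the general idea described in the abstract: split
matching teacher and student networks into blocks, train each student block
to mimic its teacher block in isolation, then chain the student blocks for
inference. The block definitions, the MSE feature loss, and the training
loop are illustrative assumptions, not the DnD implementation from the
thesis.

    import torch
    import torch.nn as nn

    # Hypothetical teacher/student networks split into matching blocks
    # (channel sizes chosen only so the shapes line up).
    teacher_blocks = nn.ModuleList([
        nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU()),
        nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU()),
    ])
    student_blocks = nn.ModuleList([
        nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 32, 1)),
        nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(32, 64, 1)),
    ])

    mse = nn.MSELoss()

    def distill_block(i, loader, epochs=1, lr=1e-3):
        """Train student block i to mimic teacher block i, independently
        of all other blocks (one assumed form of block-wise distillation)."""
        opt = torch.optim.Adam(student_blocks[i].parameters(), lr=lr)
        for _ in range(epochs):
            for x in loader:
                with torch.no_grad():
                    # Teacher features entering and leaving block i.
                    for j in range(i):
                        x = teacher_blocks[j](x)
                    target = teacher_blocks[i](x)
                loss = mse(student_blocks[i](x), target)
                opt.zero_grad()
                loss.backward()
                opt.step()

    # Toy loader: random images standing in for real training data.
    loader = [torch.randn(4, 3, 32, 32) for _ in range(8)]

    # Each block is distilled separately, then the student blocks are
    # chained together for inference.
    for i in range(len(student_blocks)):
        distill_block(i, loader)

    with torch.no_grad():
        out = torch.randn(1, 3, 32, 32)
        for blk in student_blocks:
            out = blk(out)

Because each block only needs the teacher's intermediate features as input
and target, the blocks can in principle be trained independently (even in
parallel), which is one way a framework of this kind could shorten the
distillation process.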
Date : 3 May 2023 (Wednesday)
Time : 10:30 - 11:10
Venue : Room 5501 (near lifts 25/26), HKUST
Advisor : Prof. CHAN Gary Shueng-Han
2nd Reader : Prof. WONG Raymond Chi-Wing