Divide and Distill: Achieving Higher Accuracy, Speed and Data-Efficiency for Knowledge Distillation
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Final Year Thesis Oral Defense

Title: "Divide and Distill: Achieving Higher Accuracy, Speed and Data-Efficiency for Knowledge Distillation"

by

KAO Shiu-hong

Abstract:

Knowledge distillation (KD) has been recognized as an effective tool to compress and accelerate models. However, current KD approaches, while impressive in producing smaller models, suffer from either an accuracy drop or an excruciatingly long distillation process. In this paper, we first demonstrate that inter-block optimization entanglement makes traditional end-to-end KD approaches unstable and slow. To overcome this, we propose a novel Divide and Distill (DnD) framework, which divides a pair of teacher and student networks into several blocks and distills the block-wise knowledge separately before combining the blocks for inference. Extensive experiments validate that our simple yet effective framework greatly boosts model accuracy, speeds up the distillation process, and improves data efficiency. Notably, owing to its better distillation process, DnD can train a student network whose accuracy even surpasses its teacher's. Compared with other KD approaches, DnD offers an impressive distillation speedup (up to 10x) and outperforms them with only 40% of the original training data.

Date: 3 May 2023 (Wednesday)
Time: 10:30 - 11:10
Venue: Room 5501 (near lifts 25/26), HKUST

Advisor: Prof. CHAN Gary Shueng-Han
2nd Reader: Prof. WONG Raymond Chi-Wing
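
As a rough illustration of the block-wise idea described in the abstract, the sketch below splits a toy teacher and student into aligned blocks, distills each student block separately against the corresponding teacher features, and then chains the trained blocks for inference. The architectures, feature sizes, loss, and hyperparameters here are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch of block-wise distillation (assumed details, toy data).
import torch
import torch.nn as nn

WIDTHS = [32, 256, 256, 10]   # assumed feature sizes at block boundaries

def teacher_block(i, o):
    return nn.Sequential(nn.Linear(i, o), nn.ReLU())

def student_block(i, o, hidden=48):
    # Smaller student block with the same boundary sizes as the teacher,
    # so teacher features can serve directly as inputs and targets.
    return nn.Sequential(nn.Linear(i, hidden), nn.ReLU(), nn.Linear(hidden, o))

teacher = nn.ModuleList(teacher_block(i, o) for i, o in zip(WIDTHS[:-1], WIDTHS[1:]))
student = nn.ModuleList(student_block(i, o) for i, o in zip(WIDTHS[:-1], WIDTHS[1:]))

data = torch.randn(512, WIDTHS[0])   # toy unlabeled inputs

# Record the teacher's features at every block boundary; block k uses
# feats[k] as its input and feats[k + 1] as its regression target.
with torch.no_grad():
    feats = [data]
    for blk in teacher:
        feats.append(blk(feats[-1]))

# Distill each block separately: no gradients cross block boundaries,
# so each sub-problem is small and the blocks could be trained in parallel.
for k, s_blk in enumerate(student):
    opt = torch.optim.Adam(s_blk.parameters(), lr=1e-3)
    for _ in range(200):
        loss = nn.functional.mse_loss(s_blk(feats[k]), feats[k + 1])
        opt.zero_grad()
        loss.backward()
        opt.step()

# Combine the distilled student blocks for end-to-end inference.
with torch.no_grad():
    x = data
    for s_blk in student:
        x = s_blk(x)
print("student output shape:", x.shape)
```

Because each student block only regresses the corresponding teacher features, the per-block objectives are decoupled, which is one plausible reading of how the framework avoids the inter-block optimization entanglement mentioned in the abstract.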