Divide and Distill: Achieving Higher Accuracy, Speed and Data-Efficiency for Knowledge Distillation

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Final Year Thesis Oral Defense

Title: "Divide and Distill: Achieving Higher Accuracy, Speed and 
        Data-Efficiency for Knowledge Distillation"

by

KAO Shiu-hong

Abstract:

Knowledge distillation (KD) has been recognized as an effective tool to 
compress and accelerate models. However, current KD approaches, while 
impressive in producing smaller models, suffer from either an accuracy 
drop or an excruciatingly long distillation process. In this paper, we 
first demonstrate that inter-block optimization entanglement makes 
traditional end-to-end KD approaches unstable and slow. To overcome this, 
we then propose a novel Divide and Distill (DnD) framework, which divides 
a pair of teacher and student networks into several blocks, distills the 
block-wise knowledge separately, and then combines the student blocks for 
inference. Extensive experiments validate that our simple yet effective 
framework greatly boosts model accuracy, speeds up the distillation 
process, and improves data efficiency. Notably, owing to its better 
distillation process, DnD can train a student network whose accuracy even 
surpasses that of its teacher. Compared to other KD approaches, DnD 
offers an impressive distillation speedup (up to 10x) and outperforms 
them with only 40% of the original training data.
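
For illustration only, below is a minimal PyTorch-style sketch of what 
block-wise distillation could look like: each student block is trained 
independently to mimic the output of the corresponding teacher block, 
with the teacher's intermediate features serving as block inputs. The 
per-block MSE objective, optimizer choice, and function names are 
assumptions for this sketch, not the exact procedure used in the thesis.

    import torch
    import torch.nn as nn

    def distill_blockwise(teacher_blocks, student_blocks, loader,
                          lr=1e-3, epochs=1):
        # Hypothetical sketch: one optimizer per student block, so gradients
        # never flow across block boundaries (avoiding inter-block
        # optimization entanglement).
        mse = nn.MSELoss()
        opts = [torch.optim.Adam(b.parameters(), lr=lr) for b in student_blocks]
        for b in teacher_blocks:
            b.eval()

        for _ in range(epochs):
            for x, _ in loader:
                feat = x
                for t_block, s_block, opt in zip(teacher_blocks,
                                                 student_blocks, opts):
                    with torch.no_grad():
                        target = t_block(feat)        # teacher output of this block
                    loss = mse(s_block(feat), target) # block-wise distillation loss
                    opt.zero_grad()
                    loss.backward()
                    opt.step()
                    feat = target                     # teacher features feed the next block

After this stage, the trained student blocks would simply be stacked and 
run end to end for inference.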


Date            : 3 May 2023 (Wednesday)

Time            : 10:30 - 11:10

Venue           : Room 5501 (near lifts 25/26), HKUST

Advisor         : Prof. CHAN Gary Shueng-Han

2nd Reader      : Prof. WONG Raymond Chi-Wing