ReDIET-MoE: Reuse-Guided Inter-Expert Pruning and Dynamic Intra-Expert Pruning for Efficient Mixture-of-Experts in Large Language Models
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Final Year Thesis Oral Defense

Title: "ReDIET-MoE: Reuse-Guided Inter-Expert Pruning and Dynamic Intra-Expert Pruning for Efficient Mixture-of-Experts in Large Language Models"

by

LEE Pak Nin

Abstract:

Mixture-of-Experts (MoE) architectures scale large language models through sparse activation, yet their large expert pools impose substantial storage and memory requirements that hinder real-world deployment. This thesis addresses the problem with ReDIET-MoE, a two-stage expert pruning framework that combines reuse-guided inter-expert pruning (RIEP) and dynamic intra-expert pruning (DIEP). RIEP removes under-utilized experts and recovers the lost capability by allowing routers to reuse experts from adjacent layers. On top of this, DIEP enables each expert to perform task-conditioned, neuron-level self-pruning at inference time, producing smaller, input-specific sub-experts that reduce inference computation. Extensive experiments on the Phi-Mini-MoE-Instruct model demonstrate that RIEP and DIEP are effective both individually and jointly. RIEP alone achieves up to 58% parameter reduction with less than 20% zero-shot accuracy degradation; DIEP reduces inference cost by up to 65% with a 1.13 times speedup; and the full ReDIET-MoE pipeline, which prunes 25% of experts and 25% of intra-expert neurons, delivers up to 1.12 times faster inference while keeping accuracy degradation below 10% across benchmarks.

Date : 30 April 2026 (Thursday)
Time : 15:00 - 15:40
Venue : Room 2126D (near Lift 19), HKUST
Advisor : Prof. YEUNG Dit-Yan
2nd Reader : Dr. FUNG May
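To make the two mechanisms in the abstract concrete, the sketch below illustrates the general ideas in simplified form: a router that redirects tokens from pruned experts to surviving experts in an adjacent layer (the reuse idea behind RIEP), and an input-conditioned neuron mask that keeps only the most strongly activated neurons of an expert (the self-pruning idea behind DIEP). All function names, the `reuse_map` structure, and the magnitude-based selection criterion are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def route_with_reuse(router_logits, pruned, reuse_map, top_k=2):
    """Select top-k experts for one token. If a selected expert was
    pruned, redirect the token via reuse_map to a surviving expert
    from an adjacent layer (hypothetical sketch of RIEP-style reuse)."""
    order = np.argsort(router_logits)[::-1]  # experts ranked by router score
    chosen = []
    for e in order:
        e = int(e)
        if e in pruned:
            e = reuse_map[e]  # reuse an adjacent-layer expert (assumed mapping)
        if e not in chosen:
            chosen.append(e)
        if len(chosen) == top_k:
            break
    return chosen

def dynamic_neuron_mask(hidden, keep_ratio=0.75):
    """Keep only the most strongly activated neurons for this input,
    yielding a smaller input-specific sub-expert (hypothetical sketch
    of DIEP-style dynamic intra-expert pruning)."""
    k = max(1, int(len(hidden) * keep_ratio))
    thresh = np.sort(np.abs(hidden))[::-1][k - 1]  # k-th largest magnitude
    return np.abs(hidden) >= thresh  # boolean mask over neurons
```

For example, with four experts where expert 1 is pruned and mapped to expert 2, a token whose router scores favor expert 1 is served by experts 2 and 3 instead; a hidden vector masked at `keep_ratio=0.5` retains only its two largest-magnitude activations.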