ReDIET-MoE: Reuse-Guided Inter-Expert Pruning and Dynamic Intra-Expert Pruning for Efficient Mixture-of-Experts in Large Language Models

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Final Year Thesis Oral Defense

Title: "ReDIET-MoE: Reuse-Guided Inter-Expert Pruning and Dynamic 
Intra-Expert Pruning for Efficient Mixture-of-Experts in Large Language 
Models"

by

LEE Pak Nin

Abstract:

Mixture-of-Experts (MoE) architectures scale large language models through 
sparse activation, yet their large expert pools impose substantial storage 
and memory requirements that hinder real-world deployment. This thesis 
addresses the problem with ReDIET-MoE, a two-stage expert pruning framework 
that combines reuse-guided inter-expert pruning (RIEP) and dynamic 
intra-expert pruning (DIEP). RIEP removes under-utilized experts and recovers 
the lost capability by allowing routers to reuse experts from adjacent 
layers. On top of this, DIEP enables each expert to perform task-conditioned 
neuron-level self-pruning at inference time, producing smaller, 
input-specific sub-experts that reduce inference computation. Extensive 
experiments on the Phi-Mini-MoE-Instruct model demonstrate that RIEP and 
DIEP are effective both individually and jointly. RIEP alone achieves up to 
58% parameter reduction with less than 20% zero-shot accuracy degradation; 
DIEP reduces inference cost by up to 65% with a 1.13 times speedup; and the 
combined ReDIET-MoE, which prunes 25% of experts and 25% of intra-expert 
neurons, delivers up to 1.12 times faster inference while keeping accuracy 
degradation below 10% across benchmarks.
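The reuse-guided inter-expert pruning stage can be illustrated with a minimal 
sketch. Note that the per-layer utilization scores, the fixed keep ratio, and 
the rule of borrowing the same expert slot from the nearest adjacent layer 
are simplifying assumptions made for illustration only; the thesis's actual 
pruning criterion and router-reuse mechanism may differ:

```python
import numpy as np

def riep_prune(utilization, keep_ratio):
    """Sketch of reuse-guided inter-expert pruning (RIEP).

    utilization: array of shape [num_layers, num_experts] holding how
    often each expert is routed to (e.g. measured on calibration data).
    Returns a boolean keep mask and, for each pruned expert, the index
    of an adjacent layer whose same-slot expert the router can reuse.
    """
    num_layers, num_experts = utilization.shape
    k = max(1, int(round(keep_ratio * num_experts)))
    keep = np.zeros_like(utilization, dtype=bool)
    reuse = {}  # (layer, expert) -> adjacent layer to borrow from

    # Keep the k most-utilized experts in every layer.
    for layer in range(num_layers):
        top = np.argsort(utilization[layer])[-k:]
        keep[layer, top] = True

    # For each pruned expert, let the router fall back to the same
    # expert slot in the nearest adjacent layer that kept it.
    for layer in range(num_layers):
        for expert in range(num_experts):
            if not keep[layer, expert]:
                for neighbor in (layer - 1, layer + 1):
                    if 0 <= neighbor < num_layers and keep[neighbor, expert]:
                        reuse[(layer, expert)] = neighbor
                        break
    return keep, reuse
```

With a toy 2-layer, 4-expert utilization matrix and a keep ratio of 0.5, 
each layer retains its two most-used experts, and pruned experts with a 
surviving counterpart in an adjacent layer are redirected there.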

Date            : 30 April 2026 (Thursday)

Time            : 15:00 - 15:40

Venue           : Room 2126D (near Lift 19), HKUST

Advisor         : Prof. YEUNG Dit-Yan

2nd Reader      : Dr. FUNG May