PhD Qualifying Examination


Title: "A survey of Mixture of Experts in Multi-Modal Large Language Models"

by

Mr. Zhili LIU


Abstract:

Recent advancements in Multi-Modal Large Language Models (MLLMs) have brought
us closer to general-purpose assistants capable of following complex
vision-and-language instructions. A key challenge in this development is
alignment, which ensures that MLLMs accurately interpret and act on human
intent across a range of real-world tasks. Concurrently, Mixture of Experts
(MoE) models have gained significant attention for their success in large
language models (LLMs), and many MoE strategies are now being integrated
into MLLM alignment. However, despite the growing use of MoE in MLLMs, a
systematic and comprehensive review of the literature is still lacking. In
this survey, we first introduce the MoE paradigm and the three key stages of
the MLLM pipeline: Vision Encoder Training, MLLM Alignment, and MLLM
Inference. We then provide an overview of MoE's role in each of these stages
and highlight promising directions for future research.
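
For readers unfamiliar with the MoE paradigm mentioned above, the sketch
below shows a minimal sparsely gated MoE layer in PyTorch: a router scores
each token, the token's top-k experts process it, and their outputs are
combined using the normalised router weights. The class name, layer sizes,
and the top-k routing choice are illustrative assumptions only, not a
description of any specific method covered in the survey.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoELayer(nn.Module):
        """Sparsely gated MoE layer: each token is routed to its top-k experts."""

        def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
            super().__init__()
            self.top_k = top_k
            # Each expert is a small feed-forward network.
            self.experts = nn.ModuleList(
                [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                               nn.Linear(4 * dim, dim))
                 for _ in range(num_experts)]
            )
            # The router produces one score per expert for every token.
            self.router = nn.Linear(dim, num_experts)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (num_tokens, dim)
            scores = self.router(x)                           # (tokens, experts)
            top_w, top_idx = scores.topk(self.top_k, dim=-1)  # keep the k best experts
            top_w = F.softmax(top_w, dim=-1)                  # normalise their weights
            out = torch.zeros_like(x)
            for slot in range(self.top_k):
                idx = top_idx[:, slot]
                w = top_w[:, slot].unsqueeze(-1)
                for e, expert in enumerate(self.experts):
                    mask = idx == e
                    if mask.any():
                        # Weighted contribution of expert e to the tokens routed to it.
                        out[mask] += w[mask] * expert(x[mask])
            return out

    # Usage: route 8 token embeddings of width 16 through the layer.
    layer = MoELayer(dim=16)
    tokens = torch.randn(8, 16)
    print(layer(tokens).shape)  # torch.Size([8, 16])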


Date:                   Thursday, 19 December 2024

Time:                   4:00pm - 6:00pm

Venue:                  Room 2128A
                        Lift 19

Committee Members:      Prof. James Kwok (Supervisor)
                        Dr. Dan Xu (Chairperson)
                        Dr. Long Chen
                        Dr. May Fung