A survey of Mixture of Experts in Multi-Modal Large Language Models
PhD Qualifying Examination

Title: "A survey of Mixture of Experts in Multi-Modal Large Language Models"

by

Mr. Zhili LIU

Abstract:

Recent advancements in Multi-Modal Large Language Models (MLLMs) have brought us closer to developing general-purpose assistants capable of following complex vision-and-language instructions. A key challenge in this development is alignment, which ensures that MLLMs accurately interpret and act on human intent across a range of real-world tasks. Concurrently, Mixture of Experts (MoE) models have gained significant attention for their success in large language models (LLMs), and many of these strategies are being integrated into MLLM alignment. However, despite the growing use of MoE in MLLMs, a systematic and comprehensive review of the literature is lacking. In this survey, we first introduce the MoE paradigm and the three key alignment stages in MLLM tuning: Vision Encoder Training, MLLM Alignment, and MLLM Inference. We then provide an overview of MoE's role in these stages and highlight potential directions for future research.

Date: Thursday, 19 December 2024
Time: 4:00pm - 6:00pm
Venue: Room 2128A (lift 19)

Committee Members:
Prof. James Kwok (Supervisor)
Dr. Dan Xu (Chairperson)
Dr. Long Chen
Dr. May Fung