Towards Reliable Multimodal Agents: A Survey

PhD Qualifying Examination


Title: "Towards Reliable Multimodal Agents: A Survey"

by

Mr. Kaican LI


Abstract:

Multimodal agents are increasingly being deployed in real-world settings, 
including autonomous systems, digital assistance, robotics, and healthcare, 
where they must interpret and act on information from diverse sources such as 
vision, language, audio, and sensor data. Despite their growing capabilities, 
their reliability remains difficult to ensure. Failures can arise from 
brittle perception, cross-modal inconsistencies, hallucinated reasoning, poor 
uncertainty estimation, distribution shifts, and unsafe or poorly grounded 
action execution. This survey provides a systematic overview of reliability 
in multimodal agents by organizing the field around key failure modes, 
evaluation protocols, and emerging technical approaches for improving 
robustness, calibration, grounding, interpretability, and trustworthiness. It 
further examines how reliability is measured across both benchmark-based and 
real-world interactive settings. Finally, the survey discusses the 
limitations of current methods and outlines open research directions for 
developing multimodal agents that can operate dependably under real-world 
complexity, ambiguity, and uncertainty.


Date:                   Wednesday, 15 April 2026

Time:                   11:00am - 1:00pm

Venue:                  Room 2132C
                        Lift 22

Committee Members:      Prof. Nevin Zhang (Supervisor)
                        Dr. Dan Xu (Chairperson)
                        Dr. Long Chen