Towards Reliable Multimodal Agents: A Survey
PhD Qualifying Examination
Title: "Towards Reliable Multimodal Agents: A Survey"
by
Mr. Kaican LI
Abstract:
Multimodal agents are increasingly being deployed in real-world settings,
including autonomous systems, digital assistants, robotics, and healthcare,
where they must interpret and act on information from diverse sources such as
vision, language, audio, and sensor data. Despite their growing capabilities,
ensuring their reliability remains a major challenge. Failures can arise from
brittle perception, cross-modal inconsistencies, hallucinated reasoning, poor
uncertainty estimation, distribution shifts, and unsafe or poorly grounded
action execution. This survey provides a systematic overview of reliability
in multimodal agents by organizing the field around key failure modes,
evaluation protocols, and emerging technical approaches for improving
robustness, calibration, grounding, interpretability, and trustworthiness. It
further examines how reliability is measured across both benchmark-based and
real-world interactive settings. Finally, the survey discusses the
limitations of current methods and outlines open research directions for
developing multimodal agents that can operate dependably under real-world
complexity, ambiguity, and uncertainty.
Date: Wednesday, 15 April 2026
Time: 11:00am - 1:00pm
Venue: Room 2132C (Lift 22)
Committee Members: Prof. Nevin Zhang (Supervisor)
Dr. Dan Xu (Chairperson)
Dr. Long Chen