Multimodal Commonsense Reasoning
Speaker:
Dr. Zhecan (James) Wang
UCLA
Title: Multimodal Commonsense Reasoning
Date: Monday, 24 March 2025
Time: 2:00pm - 3:00pm
Venue: Room 4475 (via lift 25/26), HKUST
Abstract:
My previous work has focused on enabling AI models to achieve human-level commonsense reasoning through two complementary avenues. The first enhances reasoning capabilities by extracting and integrating fine-grained, multimodal knowledge, emphasizing the acquisition of contextual information and its incorporation into complex reasoning processes. The second addresses model reliability from three perspectives: prediction consistency, transparent (explainable) reasoning steps, and faithful performance in biased or ambiguous scenarios. By leveraging such detailed, multimodal knowledge, AI models can improve their reasoning, robustness, and interpretability, thereby strengthening human trust and understanding in human-AI interactions. Building on these foundations, my future research will continue to advance more generalized and human-centered AI, exploring areas such as real-world learning, multimodal mathematical reasoning, security in reasoning, agent-based learning, embodied learning, interactive learning with human feedback, and AI for science, social good, and beyond.
Biography:
Zhecan (James) Wang is a Postdoctoral Research Fellow in UCLA's NLP group, working with Prof. Kai-Wei Chang and Prof. Nanyun Peng. He earned his Ph.D. in Computer Science from Columbia University under Prof. Shih-Fu Chang. His research spans Natural Language Processing, Vision-Language Understanding, Multimodal Reasoning, Neural-Symbolic Learning, and Trustworthy, Explainable, and Human-Centered AI. He has made significant contributions to DARPA's Machine Commonsense (MCS) and ECOLE programs, achieved state-of-the-art results on benchmarks such as VCR, VQA v2, and OKVQA, and won first place in the Microsoft Global MS-Celeb-1M Challenge and on the DARPA MCS Benchmark Leaderboard. His industry research experience includes Google DeepMind, Microsoft Research, the MIT Media Lab, Xpeng Motors, the NUS LV Lab, and Panasonic AI Lab. His contributions are reflected in 17 top-tier conference papers, 7 workshop papers, 8 AI-related patents, over 1,200 Google Scholar citations, and collaborations with 17 professors across 12 institutions, and his work has been featured by PaperWeekly, AI2, DARPA, 新智源, and 量子位.