PhD Thesis Proposal Defence


Title: "Interactive Visual Analytics for Multimodal Video Data Exploration and 
Model Steering"

by

Miss Jianben HE


Abstract:

Video, a versatile multimodal data format, has emerged as the predominant
channel for information transmission and communication. Efficient video
exploration and analysis are critical for numerous applications spanning
business, security, and education. Taking advantage of powerful computational
resources, numerous recent efforts have utilized machine learning or deep
learning models to automatically summarize and analyze videos. However, given
the inherent complexity of video data, building models that effectively and
comprehensively understand the spatio-temporal relationships and cross-modal
information within videos remains challenging. Moreover, these automated
methods offer limited interaction and result representations, hindering
analysts from conducting the fine-grained exploration needed to distill
high-level insights efficiently. Visual analytics, which combines the
efficiency of computational algorithms with human visual perception for
pattern discovery and domain knowledge for decision-making, has brought new
opportunities for model practitioners and data analysts to perform
comprehensive video data exploration and model steering.

In this thesis proposal, we concentrate on designing and developing
interactive visual analytics systems to 1) facilitate efficient and scalable
video data labeling for model building, 2) support comprehensive diagnosis and
effective prompt engineering to steer model reasoning performance on
multimodal video content, and 3) leverage advanced model results for digesting
multimodal video content and distilling insights in complex online education
settings. In the first work, we present VideoPro, which employs data
programming concepts to enable efficient video data labeling and
supplementation at scale during model training. To further support effective
model adaptation for domain-specific tasks, in the second work we introduce
POEM, which streamlines the prompt engineering process to incorporate
multimodal knowledge in both inductive and deductive manners. In the third,
ongoing work, Engager, we apply advanced model results to conduct a
multi-granularity analysis of complex human engagement and interaction
performance for pedagogical insight distillation in specific online education
settings.


Date:                   Tuesday, 4 June 2024

Time:                   2:00pm - 4:00pm

Venue:                  Room 5501
                        Lifts 25/26

Committee Members:      Prof. Huamin Qu (Supervisor)
                        Prof. Qian Zhang (Co-supervisor)
                        Dr. Dan Xu (Chairperson)
                        Prof. Pedro Sander