Interactive Visual Analytics for Multimodal Video Data Exploration and Model Steering
PhD Thesis Proposal Defence
Title: "Interactive Visual Analytics for Multimodal Video Data Exploration and
Model Steering"
by
Miss Jianben HE
Abstract:
Video, a versatile multimodal data format, has emerged as the predominant
channel for information transmission and communication. Efficient video
exploration and analysis are critical for numerous applications spanning
business, security, and education. Taking advantage of powerful computational
resources, numerous recent efforts have sought to use machine learning or deep
learning models to automatically summarize and analyze videos.
However, considering the inherent complexity of video data, building models
that effectively and comprehensively understand the spatial-temporal
relationships and cross-modal information within videos presents significant
challenges. Moreover, these automated methods offer limited interaction and
poor representations of their results, hindering analysts from conducting the
fine-grained exploration needed to distill high-level insights efficiently. Visual analytics,
which combines computational algorithm efficiency with human visual ability for
pattern discovery and domain knowledge for decision-making, has brought new
opportunities for model practitioners and data analysts to perform
comprehensive video data exploration and model steering.
In this thesis proposal, we concentrate on designing and developing interactive
visual analytics systems to 1) facilitate efficient and scalable video data
labeling for model building, 2) support comprehensive diagnosis and effective
prompt engineering to steer model reasoning performance on multimodal video
content, and 3) leverage advanced model results for digesting multimodal video
content and distilling insights in complex online education settings. In the
first work, we present VideoPro, which employs data programming concepts to
enable efficient video data labeling and supplementation at scale during model
training. To further support effective model adaptation for domain-specific
tasks, in the second work, we introduce POEM, which streamlines the prompt
engineering process to incorporate multimodal knowledge in both inductive and
deductive manners. In the third, ongoing work, Engager, we further apply
advanced model results to conduct a multi-granularity analysis of complex human
engagement and interaction performance for pedagogical insight distillation in
specific online education settings.
Date: Tuesday, 4 June 2024
Time: 2:00pm - 4:00pm
Venue: Room 5501
Lifts 25/26
Committee Members: Prof. Huamin Qu (Supervisor)
Prof. Qian Zhang (Co-supervisor)
Dr. Dan Xu (Chairperson)
Prof. Pedro Sander