Interactive Visual Analytics for Multimodal Video Data Exploration and Model Steering
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Interactive Visual Analytics for Multimodal Video Data Exploration
and Model Steering"
By
Miss Jianben HE
Abstract:
Video, a versatile multimodal data format, has emerged as the predominant
channel for information transmission and communication. Efficient video
exploration and analysis are critical for numerous applications spanning
business, security, and education. Taking advantage of powerful
computational resources, numerous efforts have recently sought to utilize
machine learning or deep learning models to automatically
summarize and analyze videos. However, considering the inherent complexity
of video data, building models that effectively and comprehensively
understand the spatial-temporal relationships and cross-modal information
within videos presents significant challenges. Moreover, these automated
methods offer limited interactivity and poor result representations, hindering
analysts from conducting fine-grained exploration for efficient high-level
insight distillation. Visual analytics, which combines computational
algorithm efficiency with human visual ability for pattern discovery and
domain knowledge for decision-making, has brought new opportunities for
model practitioners and data analysts to perform comprehensive video data
exploration and model steering.
In this thesis, we concentrate on designing and developing
interactive visual analytic systems to 1) facilitate efficient and scalable
video data labeling for model building, 2) support comprehensive diagnosis
and effective prompt engineering to steer model reasoning performance on
multimodal video content, and 3) leverage advanced model results for
digesting multimodal video content and distilling insights in complex online
education settings. In the first work, we present VideoPro, which employs
data programming concepts to enable efficient video data labeling and
supplementation at scale during model training. To further support effective
model adaptation for domain-specific tasks, in the second work, we introduce
POEM, which streamlines the prompt engineering process to incorporate
multimodal knowledge in both inductive and deductive manners. In the third
ongoing work, Engager, we further apply advanced model results to conduct a
multi-granularity analysis of complex human engagement and interaction
performance for pedagogical insight distillation in specific online
education settings.
Date: Monday, 2 December 2024
Time: 2:00pm - 4:00pm
Venue: Room 3494
Lifts 25/26
Chairman: Dr. Jiachuan YANG (CIVL)
Committee Members: Prof. Huamin QU (Supervisor)
Prof. Qian ZHANG (Co-supervisor)
Prof. Qiong LUO
Prof. Pedro SANDER
Prof. Shengdong ZHAO (CityU)