Interactive Visual Analytics for Multimodal Video Data Exploration and Model Steering
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Interactive Visual Analytics for Multimodal Video Data Exploration and Model Steering"

By Miss Jianben HE

Abstract:

Video, a versatile multimodal data format, has emerged as the predominant channel for information transmission and communication. Efficient video exploration and analysis are critical for numerous applications spanning business, security, and education. Taking advantage of powerful computational resources, numerous recent efforts have sought to use machine learning or deep learning models to summarize and analyze videos automatically. However, given the inherent complexity of video data, building models that effectively and comprehensively understand the spatial-temporal relationships and cross-modal information within videos presents significant challenges. Moreover, these automated methods offer limited interaction and poor result representations, hindering analysts from conducting the fine-grained exploration needed to distill high-level insights efficiently.

Visual analytics, which combines the efficiency of computational algorithms with human visual ability for pattern discovery and domain knowledge for decision-making, has brought new opportunities for model practitioners and data analysts to perform comprehensive video data exploration and model steering. In this thesis, we concentrate on designing and developing interactive visual analytics systems to 1) facilitate efficient and scalable video data labeling for model building, 2) support comprehensive diagnosis and effective prompt engineering to steer model reasoning performance on multimodal video content, and 3) leverage advanced model results for digesting multimodal video content and distilling insights in complex online education settings.
In the first work, we present VideoPro, which employs data programming concepts to enable efficient video data labeling and supplementation at scale during model training. To further support effective model adaptation for domain-specific tasks, in the second work we introduce POEM, which streamlines the prompt engineering process to incorporate multimodal knowledge in both inductive and deductive manners. In the third, ongoing work, Engager, we further apply advanced model results to conduct a multi-granularity analysis of complex human engagement and interaction performance for pedagogical insight distillation in specific online education settings.

Date: Monday, 2 December 2024
Time: 2:00pm - 4:00pm
Venue: Room 3494
       Lifts 25/26

Chairman: Dr. Jiachuan YANG (CIVL)

Committee Members:
Prof. Huamin QU (Supervisor)
Prof. Qian ZHANG (Co-supervisor)
Prof. Qiong LUO
Prof. Pedro SANDER
Prof. Shengdong ZHAO (CityU)