Multi-modal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Final Year Thesis Oral Defense

Title: "Multi-modal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding"

by XU Baixuan

Abstract:

Understanding the shopping intention behind E-commerce purchases remains critical in various E-commerce downstream tasks. For instance, a system that understands [...] However, previous methods for large-scale intention acquisition rely heavily on distilling large language models, with human annotation as verification. These methods lose information from the visual modality (i.e., product images) and are expensive to scale up. To address these issues, we present [MIND], a multimodal framework that distills purchase intentions from large vision-language models. Specifically, we distill information from both the product image and the product name. We also devise a human-centric, role-aware filter mechanism that can significantly improve the quality of the resulting knowledge base. By applying [MIND] to Amazon Review data, we construct a multimodal intention knowledge base comprising million-level intentions over million-level co-buy shopping records.

Date: 1 May 2024 (Wednesday)
Time: 10:00 - 10:40
Venue: Room 3494 (near lifts 25/26), HKUST
Advisor: Dr. SONG Yangqiu
2nd Reader: Dr. CHEN Long