Multi-modal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Final Year Thesis Oral Defense

Title: "Multi-modal Shopping Intention Distillation from Large Vision-language 
Models for E-commerce Purchase Understanding"

by

XU Baixuan

Abstract:

Understanding the shopping intention behind E-commerce purchases remains 
critical for various downstream E-commerce tasks. For instance, a system that 
understands ... However, previous methods for large-scale intention acquisition 
rely heavily on distilling large language models, with human annotation as 
verification. This approach loses information from the visual modality (i.e., 
product images) and is expensive to scale up. To address these issues, we 
present [MIND], a multimodal framework that distills purchase intentions from 
large vision-language models. Specifically, we distill information from both 
the product image and the product name. We also devise a human-centric, 
role-aware filter mechanism that can significantly improve the quality of the 
resulting knowledge base. By applying [MIND] to Amazon Review data, we 
construct a multimodal intention knowledge base comprising million-level 
intentions over million-level co-buy shopping records.


Date            : 1 May 2024 (Wednesday)

Time            : 10:00 - 10:40

Venue           : Room 3494 (near lifts 25/26), HKUST

Advisor         : Dr. SONG Yangqiu

2nd Reader      : Dr. CHEN Long