Vision-Language Joint Representation

PhD Qualifying Examination


Title: "Vision-Language Joint Representation"

by

Mr. Feng LI


Abstract:

In the past few years, deep learning has revolutionized natural language 
processing and computer vision. Inspired by the remarkable progress in 
both fields, recent trends are shifting from single-modality 
processing to multi-modality comprehension. This paper presents a 
comprehensive survey of vision-language (VL) models. To give readers a 
better overall grasp of VL models, we first briefly introduce some common 
VL tasks. As the core content, we focus on vision-language pre-training 
(VLP) and comprehensively review the key components of model 
structures and training methods. We then survey mainstream VLP 
methods that model joint image-language representations, presented 
in chronological order. After that, we show how recent work uses 
large-scale image-text data to learn language-aligned visual 
representations that generalize better on zero-shot and few-shot learning tasks. 
We believe this review will be helpful to researchers and 
practitioners in AI and ML, especially those interested in computer vision 
and natural language processing.


Date:			Monday, 11 July 2022

Time:			10:00am - 12:00noon

Zoom Meeting: 
https://hkust.zoom.us/j/97905132778?pwd=d0dFeVl6V0J4TVNnYVJlM3U0Q3g1UT09

Committee Members:	Prof. Lionel Ni (Supervisor)
			Prof. Harry Shum (Supervisor)
			Dr. Dan Xu (Chairperson)
			Dr. Qifeng Chen


**** ALL are Welcome ****