Vision Language Intelligence

PhD Qualifying Examination


Title: "Vision Language Intelligence"

by

Mr. Hao ZHANG


Abstract:

This survey presents an investigation into the development of multi-modal 
intelligence, with a focus on vision-language (VL) intelligence. It is 
motivated by the recent rapid progress of vision-language learning driven 
by large-scale datasets. We divide the development into three stages: 
task-specific methods, vision-language pre-training (VLP), and large 
models trained on large-scale weakly-labeled data. First, we take several 
common VL tasks as examples to introduce the development of task-specific 
methods. We then focus on VLP approaches, with a comprehensive review of 
the key components of model structures and training methods. Finally, the 
emphasis of this survey is on the third stage, in which large-scale 
datasets have been adopted to train large and powerful models since CLIP 
and DALL-E. These methods show the feasibility of training on 
weakly-labeled, low-quality data. In addition, we discuss future 
development trends, including modality cooperation, unified 
representation, and knowledge incorporation. To the best of our knowledge, 
this is the first survey that introduces VL learning from a temporal 
perspective. We believe this review will be helpful to researchers and 
practitioners of artificial intelligence, especially those interested in 
computer vision and natural language processing.


Date:  			Thursday, 30 June 2022

Time:                  	10:00am - 12:00noon

Zoom Meeting:		https://hkust.zoom.us/j/3689918530

Committee Members:	Prof. Lionel Ni (Supervisor)
 			Prof. Harry Shum (Supervisor)
 			Dr. Dan Xu (Chairperson)
 			Prof. Raymond Wong
 			Prof. Ke Yi


**** ALL are Welcome ****