Vision Language Intelligence
PhD Qualifying Examination

Title: "Vision Language Intelligence"

by

Mr. Hao ZHANG

Abstract:

This survey investigates the development of multi-modal intelligence, especially visual language (VL) intelligence. It is motivated by the recent rapid development of visual language learning with large-scale datasets. We divide the development into three stages: task-specific methods, vision-language pretraining (VLP), and large models trained on large-scale weakly labeled data. First, we take some common VL tasks as examples to introduce the development of task-specific methods. We then focus on the VLP approach, with a comprehensive review of the key components of the model structure and training method. Finally, we emphasize the third stage, in which, starting with CLIP and DALL-E, large-scale datasets are used to train large, powerful models. These methods show that training from weakly labeled, low-quality data is possible. Additionally, we discuss future development trends: modal collaboration, unified representation, and knowledge integration. As far as we know, this is the first survey that introduces VL learning from the perspective of time. We believe this review will be helpful to researchers and practitioners of artificial intelligence, especially those interested in computer vision and natural language processing.

Date: Thursday, 30 June 2022

Time: 10:00am - 12:00 noon

Zoom Meeting: https://hkust.zoom.us/j/3689918530

Committee Members:
Prof. Lionel Ni (Supervisor)
Prof. Harry Shum (Supervisor)
Dr. Dan Xu (Chairperson)
Prof. Raymond Wong
Prof. Ke Yi

**** ALL are Welcome ****