Vision-Language Joint Representation
PhD Qualifying Examination

Title: "Vision-Language Joint Representation"

by

Mr. Feng LI

Abstract:

In the past few years, deep learning has revolutionized natural language processing and computer vision. Inspired by the remarkable progress in both fields, recent trends are shifting from single-modality processing to multi-modality comprehension. This paper presents a comprehensive survey of vision-language (VL) models. To give readers a better overall grasp of VL models, we first briefly introduce some common VL tasks. As the core content, we focus on vision-language pre-training (VLP) and comprehensively review the key components of model structures and training methods. We then go through mainstream VLP methods that model joint image-language representations, presented in chronological order. After that, we show how recent work uses large-scale image-text data to learn language-aligned visual representations that generalize better on zero-shot and few-shot learning tasks. We believe this review will be of help to researchers and practitioners in AI and ML, especially those interested in computer vision and natural language processing.

Date: Monday, 11 July 2022
Time: 10:00am - 12:00noon
Zoom Meeting: https://hkust.zoom.us/j/97905132778?pwd=d0dFeVl6V0J4TVNnYVJlM3U0Q3g1UT09

Committee Members:
Prof. Lionel Ni (Supervisor)
Prof. Harry Shum (Supervisor)
Dr. Dan Xu (Chairperson)
Dr. Qifeng Chen

**** ALL are Welcome ****