Toward Open World Perception: A Survey
PhD Qualifying Examination

Title: "Toward Open World Perception: A Survey"

by

Mr. Lewei YAO

Abstract:

The past decade has witnessed remarkable progress in visual perception techniques. Despite this progress, traditional methods are constrained to recognizing a narrow set of annotated visual concepts, hindering their applicability in varied real-world scenarios. This limitation has sparked a paradigm shift toward open-world visual perception, which aims to emulate human-like comprehension by recognizing a wide array of visual concepts. This survey delves into this transition, elucidating how computer vision research has evolved from closed-set recognition to more universal open-set perception. We begin by introducing vision-language pretraining (VLP), the cornerstone of open-world visual perception. VLP models bridge visual concepts and textual descriptions by learning from extensive image-text pairs. We then explore the advancement into open-vocabulary visual recognition, which extends VLP models to more complex visual tasks such as detection and segmentation. Lastly, we introduce the emerging large vision-language models (LVLMs). By leveraging the power of advanced large language models, LVLMs exhibit enhanced visual understanding and reasoning capabilities. This survey presents key advancements and representative works in these areas, categorizes them, and offers a comprehensive discussion of their core design principles, implementation strategies, and the challenges that persist. We hope this survey provides readers with a panoramic view of the current trajectory of computer vision research toward universal visual perception.

Date: Tuesday, 9 April 2024
Time: 3:00pm - 5:00pm
Venue: Room 4472 (Lifts 25/26)

Committee Members:
Dr. Dan Xu (Supervisor)
Dr. Long Chen (Chairperson)
Dr. Qifeng Chen
Prof. James Kwok