Toward Open World Perception: A Survey

PhD Qualifying Examination


Title: "Toward Open World Perception: A Survey"

by

Mr. Lewei YAO


Abstract:

The past decade has witnessed remarkable progress in visual perception 
techniques. Despite this progress, traditional methods are constrained to 
recognizing a narrow set of annotated visual concepts, hindering their 
applicability in varied real-world scenarios. This limitation has sparked a 
paradigm shift towards open-world visual perception, which aims to emulate human-like 
comprehension by recognizing a wide array of visual concepts. This survey 
delves into this transition, elucidating how computer vision research has evolved 
from closed-set perception to more universal open-set perception. We begin by 
introducing vision-language pretraining (VLP), the cornerstone of open-world 
visual perception. VLP models connect visual concepts with textual descriptions 
by learning from large-scale image-text pairs. We 
further explore the advance into open-vocabulary visual recognition, which 
extends VLP models to more complex visual tasks such as detection and 
segmentation. Lastly, we introduce the emerging large vision-language models 
(LVLMs). By leveraging the power of advanced large language models, these LVLMs 
exhibit enhanced visual understanding and reasoning capabilities. This survey 
presents key advancements and representative works in these areas, categorizes 
them, and offers a comprehensive discussion on their core design principles, 
implementation strategies, and the challenges that persist. We hope this survey 
can provide readers with a panoramic view of the current trajectory of computer 
vision research towards achieving universal visual perception.


Date:                   Tuesday, 9 April 2024

Time:                   3:00pm - 5:00pm

Venue:                  Room 4472
                        Lifts 25/26

Committee Members:      Dr. Dan Xu (Supervisor)
                        Dr. Long Chen (Chairperson)
                        Dr. Qifeng Chen
                        Prof. James Kwok