Toward Open World Perception: A Survey
PhD Qualifying Examination
Title: "Toward Open World Perception: A Survey"
by
Mr. Lewei YAO
Abstract:
The past decade has witnessed remarkable progress in visual perception
techniques. Despite this progress, traditional methods are constrained to
recognizing a narrow set of annotated visual concepts, hindering their
applicability in varied real-world scenarios. This limitation sparks a paradigm
shift towards open-world visual perception, aiming to emulate human-like
comprehension by recognizing a wide array of visual concepts. This survey
delves into this transition, elucidating how computer vision research has
evolved from closed-set to more universal open-set perception. We begin by
introducing vision-language pretraining (VLP), the cornerstone of open-world
visual perception. VLP models connect visual concepts with textual
descriptions by learning from extensive image-text pairs. We
further explore the advancement into open-vocabulary visual recognition, which
extends VLP models to more complex visual tasks such as detection and
segmentation. Lastly, we introduce the emerging large vision-language models
(LVLMs). By leveraging the power of advanced large language models, these LVLMs
exhibit enhanced visual understanding and reasoning capabilities. This survey
presents key advancements and representative works in these areas, categorizes
them, and offers a comprehensive discussion on their core design principles,
implementation strategies, and the challenges that persist. We hope this survey
can provide readers with a panoramic view of the current trajectory of computer
vision research towards achieving universal visual perception.
Date: Tuesday, 9 April 2024
Time: 3:00pm - 5:00pm
Venue: Room 4472
Lifts 25/26
Committee Members: Dr. Dan Xu (Supervisor)
Dr. Long Chen (Chairperson)
Dr. Qifeng Chen
Prof. James Kwok