PhD Qualifying Examination
Title: "A Survey on Long Sequence Visual Modeling with Deep Learning"
by
Mr. Zhengrui GUO
Abstract:
In the field of deep learning, the effective handling of long visual
sequences derived from high-resolution images and extensive video footage is
a pivotal challenge, influencing diverse domains such as computational
pathology, remote sensing image analysis, and video understanding. Deep
learning models like Transformers have shown great success in modeling
long-range interactions and dependencies, making them promising for advancing
the aforementioned fields. However, inherent limitations of Transformers, such
as quadratic complexity with respect to input length and a lack of inductive
bias, pose challenges to the effective and efficient application and scaling
of these models. To address these problems, recent years have
witnessed an upward trend in designing models tailored for long sequence
visual modeling, such as Efficient Transformers and State Space Models like
Mamba. This survey provides a comprehensive review of the latest trends in
long sequence visual modeling, emphasizing Transformer-based, State Space
Model-based, and other efficient architectures. Further, we offer a detailed
taxonomy of model designs that enhance the effectiveness and efficiency of
long sequence modeling. Finally, the survey concludes with a discussion of
design trends in the presented architectures, including Efficient Transformers
and State Space Models, and summarizes the critical findings.
Date: Friday, 5 July 2024
Time: 10:00am - 12:00noon
Zoom Meeting ID: 863 623 1801
Committee Members: Dr. Hao Chen (Supervisor)
Dr. Qifeng Chen (Chairperson)
Dr. Junxian He
Dr. Terence Wong (CBE)