A Survey on Long Sequence Visual Modeling with Deep Learning

PhD Qualifying Examination



by

Mr. Zhengrui GUO


Abstract:

In the field of deep learning, effectively handling long visual sequences 
derived from high-resolution images and extensive video footage is a pivotal 
challenge, influencing diverse domains such as computational pathology, 
remote sensing image analysis, and video understanding. Deep learning models 
like Transformers have shown great success in modeling long-range 
interactions and dependencies, making them promising for advancing the 
aforementioned fields. However, inherent limitations of Transformers, such as 
quadratic complexity with respect to input length and a lack of inductive 
bias, hinder the effective, efficient, and scalable application of these 
models. To address these problems, recent years have witnessed an upward 
trend in designing models tailored for long sequence visual modeling, such as 
Efficient Transformers and State Space Models like Mamba. This survey 
provides a comprehensive review of the latest trends in long sequence visual 
modeling, emphasizing Transformer-based, State Space Model-based, and other 
efficient architectures. Further, we offer a detailed taxonomy of model 
designs that enhance the effectiveness and efficiency of long sequence 
modeling. Finally, the survey concludes with a discussion of design trends 
across the presented architectures, including Efficient Transformers and 
State Space Models, and summarizes the critical findings.


Date:                   Friday, 5 July 2024

Time:                   10:00am - 12:00noon

Zoom Meeting ID:        863 623 1801

Committee Members:      Dr. Hao Chen (Supervisor)
                        Dr. Qifeng Chen (Chairperson)
                        Dr. Junxian He
                        Dr. Terence Wong (CBE)