A Survey on Long Sequence Visual Modeling with Deep Learning
PhD Qualifying Examination

Title: "A Survey on Long Sequence Visual Modeling with Deep Learning"

by

Mr. Zhengrui GUO

Abstract:

In the field of deep learning, the effective handling of long visual sequences derived from high-resolution images and extensive video footage is a pivotal challenge, influencing diverse domains such as computational pathology, remote sensing image analysis, and video understanding. Deep learning models like Transformers have shown great success in modeling long-range interactions and dependencies, making them promising for advancing the aforementioned fields. However, inherent limitations of Transformers, such as quadratic complexity with respect to input length and a lack of inductive bias, pose challenges to the effective, efficient, and scalable application of these models. To address these problems, recent years have witnessed a growing trend toward designing models tailored for long sequence visual modeling, such as Efficient Transformers and State Space Models like Mamba. This survey provides a comprehensive review of the latest trends in long sequence visual modeling, emphasizing Transformer-based, State Space Model-based, and other efficient architectures. Further, we offer a detailed taxonomy of model designs that enhance the effectiveness and efficiency of long sequence modeling. Finally, the survey concludes with a discussion of design trends in the presented architectures, including Efficient Transformers and State Space Models, and summarizes the critical findings.

Date: Friday, 5 July 2024
Time: 10:00am - 12:00noon
Zoom Meeting ID: 863 623 1801

Committee Members:
Dr. Hao Chen (Supervisor)
Dr. Qifeng Chen (Chairperson)
Dr. Junxian He
Dr. Terence Wong (CBE)