Multi-Task Learning for Scene Understanding
PhD Thesis Proposal Defence

Title: "Multi-Task Learning for Scene Understanding"

by

Mr. Hanrong HE

Abstract:

Multi-task scene understanding is a crucial research area with extensive applications. Scene understanding involves essential tasks such as semantic segmentation, depth estimation, and 3D object detection. By learning these tasks simultaneously, multi-task models can achieve greater effectiveness and efficiency. This thesis presents a series of studies focused on multi-task learning for scene understanding.

First, previous convolution-based multi-task models are limited in capturing global context because they rely on convolutional layers. Recognizing the importance of learning interactions across the global spatial dimensions of multiple tasks, we propose the Inverted Pyramid Multi-task Transformer (InvPT). This approach enables simultaneous modeling of spatial positions and multiple tasks within a unified framework.

Second, in many multi-task models, the input feature is shared across all tasks, and the task decoders use the same decoding parameters for different input samples. The result is a static decoding process that produces less distinctive task-specific representations. To address this limitation, we introduce TaskExpert, a multi-task mixture-of-experts (MoE) model that learns multiple representative task-generic feature spaces while dynamically decoding task-specific features for different inputs and tasks.

Third, although previous approaches demonstrate promising multi-task performance, they separate the learning of three essential representations (task-generic, task-specific, and cross-task interaction) into distinct network modules, which can be suboptimal when designed manually. To overcome this, we present TaskPrompter, a spatial-channel multi-task prompting transformer framework that learns these three representations jointly within each layer. This integration achieves higher performance with reduced computational cost.

In the final part of the thesis proposal, we summarize the contributions and point out future research directions for multi-task scene understanding.

Date: Monday, 25 November 2024

Time: 10:00am - 12:00 noon

Venue: Room 5501
       Lifts 25/26

Committee Members:
Dr. Dan Xu (Supervisor)
Prof. Raymond Wong (Chairperson)
Dr. Qifeng Chen
Prof. Huamin Qu