Multi-Task Learning for Scene Understanding
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Multi-Task Learning for Scene Understanding"

By Mr. Hanrong YE

Abstract:

Multi-task scene understanding is a crucial research area with extensive applications. Scene understanding involves essential tasks such as semantic segmentation, depth estimation, and object detection. By learning these tasks simultaneously, multi-task models can achieve greater effectiveness and efficiency. This thesis presents a series of studies focused on multi-task learning for scene understanding. First, previous convolution-based multi-task models struggle to capture global context due to their reliance on convolutional layers. To address this, we propose the Inverted Pyramid multi-task Transformer (InvPT), which simultaneously models spatial positions and multiple tasks within a unified framework. Second, many multi-task models share backbone features across tasks and use static decoders, leading to less distinctive task-specific representations. To overcome this, we propose TaskExpert, a multi-task MoE model that learns diverse task-generic feature spaces and dynamically decodes task-specific features for varying inputs and tasks. Third, existing methods separately learn task-generic, task-specific, and cross-task interaction representations, often relying on handcrafted model designs. To address this, we propose TaskPrompter, a spatial-channel multi-task prompting transformer that jointly learns these representations within each layer, enhancing performance while reducing computational cost. Finally, we tackle the challenge of partially-supervised multi-task learning, where the absence of labels for some tasks during training leads to lower-quality and noisy predictions. To improve annotation efficiency, we redefine partially-supervised multi-task learning as a denoising problem and introduce a multi-task denoising diffusion framework.
We summarize the contributions and point out future research directions on multi-task scene understanding at the end of the thesis.

Date: Thursday, 9 January 2025
Time: 10:00am - 12:00noon
Venue: Room 3494 (Lifts 25/26)

Chairman: Dr. Xiaojun ZHANG (ISOM)

Committee Members:
Dr. Dan XU (Supervisor)
Dr. Long CHEN
Dr. Qifeng CHEN
Dr. Jun ZHANG (ECE)
Dr. Rui HUANG (CUHK-SZ)