Multi-Task Learning for Scene Understanding

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Multi-Task Learning for Scene Understanding"

By

Mr. Hanrong YE


Abstract:

Multi-task scene understanding is a crucial research area with extensive 
applications. Scene understanding involves essential tasks such as semantic 
segmentation, depth estimation, and object detection. By learning these 
tasks simultaneously, multi-task models can be more effective and efficient 
than separately trained single-task models. This thesis presents a series of 
studies on multi-task learning for scene understanding.

First, previous convolution-based multi-task models struggle to capture 
global context because convolutional layers have limited receptive fields. 
To address this, we propose the Inverted Pyramid multi-task Transformer 
(InvPT), which simultaneously models spatial positions and multiple tasks 
within a unified framework.

Second, many multi-task models share backbone features across tasks and use 
static decoders, which yields less distinctive task-specific representations. 
To overcome this, we propose TaskExpert, a multi-task mixture-of-experts 
(MoE) model that learns diverse task-generic feature spaces and dynamically 
decodes task-specific features for different inputs and tasks.

Third, existing methods separately learn task-generic, task-specific, and 
cross-task interaction representations, often relying on handcrafted model 
designs. To address this, we propose TaskPrompter, a spatial-channel 
multi-task prompting transformer that jointly learns these representations 
within each layer, enhancing performance while reducing computational cost.

Finally, we tackle the challenge of partially supervised multi-task learning, 
where the absence of labels for some tasks during training leads to 
lower-quality, noisy predictions. To improve annotation efficiency, we 
reformulate partially supervised multi-task learning as a denoising problem 
and introduce a multi-task denoising diffusion framework.

The thesis concludes by summarizing these contributions and outlining future 
research directions in multi-task scene understanding.


Date:                   Thursday, 9 January 2025

Time:                   10:00am - 12:00 noon

Venue:                  Room 3494
                        Lifts 25/26

Chairman:               Dr. Xiaojun ZHANG (ISOM)

Committee Members:      Dr. Dan XU (Supervisor)
                        Dr. Long CHEN
                        Dr. Qifeng CHEN
                        Dr. Jun ZHANG (ECE)
                        Dr. Rui HUANG (CUHK-SZ)