Multi-Task Learning for Scene Understanding
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Multi-Task Learning for Scene Understanding"
By
Mr. Hanrong YE
Abstract:
Multi-task scene understanding is a crucial research area with extensive
applications. Scene understanding involves essential tasks such as semantic
segmentation, depth estimation, object detection, etc. By learning these
tasks simultaneously, multi-task models can achieve greater effectiveness and
efficiency. This thesis presents a series of studies focused on multi-task
learning for scene understanding.
First, previous multi-task models struggle to capture global context because
they rely on convolutional layers. To address
this, we propose the Inverted Pyramid multi-task Transformer (InvPT), which
simultaneously models spatial positions and multiple tasks within a unified
framework.
Second, many multi-task models share backbone features across tasks and use
static decoders, leading to less distinctive task-specific representations.
To overcome this, we propose TaskExpert, a multi-task MoE model that learns
diverse task-generic feature spaces and dynamically decodes task-specific
features for varying inputs and tasks.
Third, existing methods separately learn task-generic, task-specific, and
cross-task interaction representations, often relying on handcrafted model
designs. To address this, we propose TaskPrompter, a spatial-channel
multi-task prompting transformer that jointly learns these representations
within each layer, enhancing performance while reducing computational cost.
Finally, we tackle the challenge of partially-supervised multi-task learning,
where the absence of labels for some tasks during training leads to
lower-quality, noisy predictions. To improve annotation efficiency, we
redefine partially-supervised multi-task learning as a denoising problem and
introduce a multi-task denoising diffusion framework.
The thesis concludes by summarizing its contributions and outlining future
research directions in multi-task scene understanding.
Date: Thursday, 9 January 2025
Time: 10:00 am - 12:00 noon
Venue: Room 3494 (Lifts 25/26)
Chairman: Dr. Xiaojun ZHANG (ISOM)
Committee Members: Dr. Dan XU (Supervisor)
                   Dr. Long CHEN
                   Dr. Qifeng CHEN
                   Dr. Jun ZHANG (ECE)
                   Dr. Rui HUANG (CUHK-SZ)