Multi-Task Learning for Scene Understanding
PhD Thesis Proposal Defence
Title: "Multi-Task Learning for Scene Understanding"
by
Mr. Hanrong HE
Abstract:
Multi-task scene understanding is a crucial research area with extensive
applications. Scene understanding involves essential tasks such as
semantic segmentation, depth estimation, and 3D object detection. By
learning these tasks simultaneously, multi-task models can achieve
greater effectiveness and efficiency. This thesis presents a series of
studies focused on multi-task learning for scene understanding.
First, previous convolution-based multi-task models struggle to capture
global context because convolutional layers operate on local receptive
fields. Recognizing the importance of learning interactions across the
global spatial dimensions of multiple tasks, we propose the Inverted
Pyramid Multi-task Transformer (InvPT). This approach enables simultaneous
modeling of spatial positions and multiple tasks within a unified
framework.
Second, in many multi-task models, the input feature is shared across
all tasks, and the task decoders use the same decoding parameters for
different input samples. The result is a static decoding process that
produces less distinctive task-specific representations. To address
this limitation, we introduce TaskExpert, a multi-task
mixture-of-experts (MoE) model that learns multiple representative
task-generic feature spaces while dynamically decoding task-specific
features for different inputs and tasks.
Third, although previous approaches demonstrate promising multi-task
performance, they separate the learning of three essential
representations (task-generic, task-specific, and cross-task
interaction) into distinct, manually designed network modules, which
can be suboptimal. To overcome this, we present TaskPrompter, a
spatial-channel multi-task prompting transformer framework that learns
these three representations jointly within each layer. This integration
achieves higher performance with reduced computational cost.
In the final part of the thesis proposal, we summarize the contributions
and point out future research directions on multi-task scene
understanding.
Date: Monday, 25 November 2024
Time: 10:00am - 12:00noon
Venue: Room 5501
Lifts 25/26
Committee Members: Dr. Dan Xu (Supervisor)
Prof. Raymond Wong (Chairperson)
Dr. Qifeng Chen
Prof. Huamin Qu