Multi-Task Learning for Scene Understanding

PhD Thesis Proposal Defence


Title: "Multi-Task Learning for Scene Understanding"

by

Mr. Hanrong HE


Abstract:

Multi-task scene understanding is a crucial research area with extensive 
applications. Scene understanding involves essential tasks such as 
semantic segmentation, depth estimation, and 3D object detection. By 
learning these tasks simultaneously, multi-task models can be more 
effective and efficient than separate single-task models. This thesis 
presents a series of 
studies focused on multi-task learning for scene understanding.

First, previous multi-task models built on convolutional layers struggle 
to capture global context because convolutions operate on local 
neighborhoods. 
Recognizing the importance of learning interactions across global 
spatial dimensions of multiple tasks, we propose the Inverted Pyramid 
multi-task Transformer (InvPT). This approach enables simultaneous 
modeling of spatial positions and multiple tasks within a unified 
framework.

Second, in many multi-task models, the input feature is shared across 
all tasks, and the task decoders use the same decoding parameters for 
different input samples. The resulting static decoding process produces 
less distinctive task-specific representations. To address this 
limitation, we introduce TaskExpert, a multi-task mixture-of-experts 
(MoE) model that learns multiple representative task-generic feature 
spaces and dynamically decodes task-specific features for different 
inputs and tasks.

Third, although previous approaches demonstrate promising multi-task 
performance, they separate the learning of three essential 
representations (task-generic, task-specific, and cross-task 
interaction) into distinct network modules, and such manual designs can 
be suboptimal. To overcome this, we present TaskPrompter, a 
spatial-channel multi-task prompting transformer framework that learns 
these three representations jointly within each layer. This integration 
achieves higher performance with reduced computational cost.

In the final part of the thesis proposal, we summarize the contributions 
and outline future research directions for multi-task scene 
understanding.


Date:                   Monday, 25 November 2024

Time:                   10:00am - 12:00noon

Venue:                  Room 5501
                        Lifts 25/26

Committee Members:      Dr. Dan Xu (Supervisor)
                        Prof. Raymond Wong (Chairperson)
                        Dr. Qifeng Chen
                        Prof. Huamin Qu