The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Learning Representations for Efficient Data Processing, 3D Perception, and
Planning in Autonomous Driving"
By
Mr. Zhili CHEN
Abstract:
Enabling autonomous systems to perceive, reason, and interact safely with the
3D world is fundamental to advancing physical intelligence. Data, 3D
perception, and planning are the three primary pillars of building reliable
autonomous systems.
The rapid proliferation of intelligent vehicles generates massive amounts of
sensor data that can empower more advanced models. Yet raw sensor data impose
heavy storage and transmission burdens, especially for sparse, unordered
point cloud data. While learning-based compression methods for point cloud
data show promise, they have not fully exploited the inherent redundancies in
the data. Perception largely determines the performance limits of an
autonomous system. It requires finer-grained geometric modeling within limited
computational budgets, along with more effective representations that can fuse
rich multi-sensor scene details for diverse downstream tasks. Planning further
demands an understanding of the spatial-temporal dynamics among traffic
participants, map elements, and the environment, necessitating finer-grained
interaction and game-theoretic modeling to support reliable, human-like
decision-making.
This thesis aims to tackle the challenges in data, perception, and planning
through a progressive line of work. We first propose an octree-based
compression framework for point cloud data. By leveraging the context of
sibling nodes' children at a finer-grained resolution, the framework learns a
stronger representation that enables the entropy model to encode the point
cloud data into a more compact bitstream.
Extending the idea of learning more representative features for point cloud
data, we introduce an efficient plug-and-play cross-cluster shifting operation
that improves object recognition performance by enabling information exchange
across clusters and modeling longer-range dependencies among points. We
further propose an
efficient vector representation that fuses fine-grained features across
sensors, in contrast to the Bird's-Eye-View (BEV) representation, which incurs
quadratic computational costs. Finally, to improve planning for self-driving
vehicles, we explicitly model the interactions among ego-to-agent, ego-to-map,
and ego-to-BEV query representations by interleaving planning and prediction
throughout the prediction horizon, rather than relying on a single sequential
interaction.
Together, these contributions advance representation learning for
data-efficient processing, 3D perception, and interaction-aware planning
toward safer and more reliable autonomous driving.
Date: Tuesday, 19 May 2026
Time: 2:00pm - 4:00pm
Venue: Room 2128B (Lift 19)
Chairman: Prof. King Lau CHOW (LIFS)
Committee Members: Dr. Qifeng CHEN (Supervisor)
Dr. May FUNG
Dr. Dan XU
Prof. Ping TAN (ECE)
Prof. Si LIU (Beihang University)