PhD Thesis Proposal Defence


Title: "Learning Representations for Efficient Data Processing, 3D Perception,
and Planning in Autonomous Driving"

by

Mr. Zhili CHEN


Abstract:

Enabling autonomous systems to perceive, reason about, and interact safely
with the 3D world is fundamental to advancing physical intelligence. Data, 3D
perception, and planning are the three primary pillars of building reliable
autonomous systems.

The rapid proliferation of intelligent vehicles generates massive amounts of
sensor data that can empower more advanced models. Yet raw sensor data impose
heavy storage and transmission burdens, especially for sparse, unordered
point cloud data. While learning-based compression methods for point clouds
show promise, they have not fully exploited the inherent redundancies in the
data. Perception bounds overall system performance, since downstream modules
can act only on what the system perceives. It demands finer-grained geometric
modeling within a limited computation budget, along with more efficient
representations that fuse rich detail across sensors and serve the various
task heads built on top of them. Planning further demands an understanding of
the temporal dynamics among traffic participants, map elements, and the
environment, necessitating finer-grained interaction and game-theoretic
modeling for reliable, human-like decision-making.

This thesis aims to tackle the challenges in data, perception, and planning
through a progressive line of work. We first propose an octree-based
compression framework for point cloud data. By conditioning on the occupancy
context of sibling nodes' children at a finer resolution, the framework
learns a more informative representation that enables the entropy model to
encode point clouds into more compact bitstreams.
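
As an illustration of the sibling-context idea, below is a minimal
PyTorch-style sketch of such an entropy model; the module names, tensor
shapes, and context layout are assumptions for exposition, not the thesis
implementation. It predicts a distribution over a node's 8-bit occupancy
symbol from the finer-level child occupancies of its eight siblings, and the
cross-entropy of that prediction is the expected code length an arithmetic
coder would achieve.

    import math

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SiblingContextEntropyModel(nn.Module):
        """Predicts a node's occupancy symbol from its siblings' children.

        Hypothetical sketch: the context layout is an assumption, not the
        thesis design.
        """

        def __init__(self, feat_dim=128):
            super().__init__()
            # Embed the 8-bit occupancy byte of each finer-level node.
            self.embed = nn.Embedding(256, feat_dim)
            self.fuse = nn.Sequential(
                nn.Linear(8 * feat_dim, feat_dim),
                nn.ReLU(),
                # A non-empty node has 255 possible occupancy symbols (1..255).
                nn.Linear(feat_dim, 255),
            )

        def forward(self, sibling_child_occ):
            # sibling_child_occ: (N, 8) occupancy bytes of the eight
            # siblings' children at the finer level (0 where unavailable).
            ctx = self.embed(sibling_child_occ).flatten(1)  # (N, 8*feat_dim)
            return self.fuse(ctx)                           # (N, 255) logits

    # Cross-entropy in bits equals the expected code length, so minimizing
    # this loss directly minimizes the size of the encoded bitstream.
    model = SiblingContextEntropyModel()
    context = torch.randint(0, 256, (1024, 8))   # toy sibling context
    symbols = torch.randint(1, 256, (1024,))     # toy occupancy symbols
    bits = F.cross_entropy(model(context), symbols - 1) / math.log(2)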
Extending the idea of learning more representative features for point cloud
data, we introduce an efficient, plug-and-play cross-cluster shifting
operation that improves object recognition by exchanging features across
point clusters and modeling longer-range dependencies among points. We
further propose an efficient vector representation that fuses finer features
across sensors, in contrast to the Bird's-Eye-View (BEV) representation,
whose computational cost grows quadratically with the grid resolution.
Finally, to improve planning for self-driving vehicles, we explicitly model
the interactions among ego-to-agent, ego-to-map, and ego-to-BEV query
representations by interleaving planning and prediction at every future
timestep, rather than performing a single sequential pass of interaction
modeling.
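
Similarly, the interleaved interaction scheme can be sketched as follows in
PyTorch; the attention blocks, heads, and horizon here are hypothetical
placeholders rather than the proposed architecture. At each future timestep,
the ego query attends to agent, map, and BEV contexts to plan one step, and
the agent queries are then updated conditioned on the new ego state, instead
of predicting and planning in one sequential pass.

    import torch
    import torch.nn as nn

    class AttnBlock(nn.Module):
        """Cross-attention with a residual connection (illustrative)."""

        def __init__(self, dim=256, heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, q, ctx):
            out, _ = self.attn(q, ctx, ctx)
            return self.norm(q + out)

    class InterleavedPlanner(nn.Module):
        def __init__(self, dim=256, horizon=6):
            super().__init__()
            self.horizon = horizon
            self.ego2agent = AttnBlock(dim)
            self.ego2map = AttnBlock(dim)
            self.ego2bev = AttnBlock(dim)
            self.agent2ego = AttnBlock(dim)
            self.plan_head = nn.Linear(dim, 2)  # ego (x, y) waypoint per step
            self.pred_head = nn.Linear(dim, 2)  # agent displacement per step

        def forward(self, ego_q, agent_q, map_q, bev_feat):
            # ego_q: (B, 1, D); agent_q: (B, A, D); map_q: (B, M, D);
            # bev_feat: (B, H*W, D) flattened BEV features.
            plans, preds = [], []
            for _ in range(self.horizon):
                # Plan one step conditioned on the current scene context...
                ego_q = self.ego2agent(ego_q, agent_q)
                ego_q = self.ego2map(ego_q, map_q)
                ego_q = self.ego2bev(ego_q, bev_feat)
                plans.append(self.plan_head(ego_q))
                # ...then refresh agent predictions given the new ego state,
                # interleaving prediction with planning at every timestep.
                agent_q = self.agent2ego(agent_q, ego_q)
                preds.append(self.pred_head(agent_q))
            return torch.stack(plans, 1), torch.stack(preds, 1)

    planner = InterleavedPlanner()
    plan, pred = planner(torch.randn(2, 1, 256), torch.randn(2, 16, 256),
                         torch.randn(2, 32, 256), torch.randn(2, 2500, 256))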

Together, these contributions advance data efficiency, 3D perception, and
interaction-aware planning toward safer and more reliable autonomous driving.


Date:                   Wednesday, 26 November 2025

Time:                   3:00pm - 5:00pm

Zoom Meeting:
https://hkust.zoom.us/j/99567187908?pwd=HnrWxfctvL7sCBgxCtihpJl0N5HTVc.1

Committee Members:      Dr. Qifeng Chen (Supervisor)
                        Dr. Dan Xu (Chairperson)
                        Dr. Long Chen