Learning Representations for Efficient Data Processing, 3D Perception, and Planning in Autonomous Driving
PhD Thesis Proposal Defence

Title: "Learning Representations for Efficient Data Processing, 3D Perception, and Planning in Autonomous Driving"

by

Mr. Zhili CHEN

Abstract:

Enabling autonomous systems to perceive, reason, and interact safely with the 3D world is fundamental to advancing physical intelligence. Data, 3D perception, and planning are the three primary pillars of building reliable autonomous systems. The rapid proliferation of intelligent vehicles generates massive amounts of sensor data that can empower more advanced models. Yet raw sensor data impose heavy storage and transmission burdens, especially for sparse, disordered point cloud data. While learning-based compression methods for point cloud data show promise, they have not fully exploited the inherent redundancies in the data. Perception determines both the upper and lower bounds of system performance. It demands finer-grained geometric modeling within a limited computation budget, along with more efficient representations that fuse rich details across sensors and on which various task heads operate. Planning further demands an understanding of the temporal dynamics among traffic participants, map elements, and the environment, necessitating finer-grained interaction and game-theoretic modeling for reliable, human-like decision-making.

This thesis aims to tackle the challenges in data, perception, and planning through a progressive line of work. We propose an octree-based compression framework for point cloud data. By leveraging the context of sibling nodes' children at a finer resolution, the framework learns a more expressive representation that enables the entropy model to encode the point cloud into a more compact bitstream. Extending the idea of learning more representative features for point cloud data, we introduce an efficient, plug-and-play cross-cluster shifting operation that improves object recognition by exchanging information and modeling longer-range dependencies among points. We further propose an efficient vector representation that fuses finer features across sensors, in contrast to the Bird's-Eye-View (BEV) representation, which incurs quadratic computational cost. Finally, to improve planning for self-driving vehicles, we explicitly model the interactions among ego-to-agent, ego-to-map, and ego-to-BEV query representations by interleaving planning and prediction at every future timestep, rather than modeling interactions in a single sequential pass. Together, these contributions advance data efficiency, 3D perception, and interaction-aware planning toward safer and more reliable autonomous driving.

Date: Wednesday, 26 November 2025
Time: 3:00pm - 5:00pm
Zoom Meeting: https://hkust.zoom.us/j/99567187908?pwd=HnrWxfctvL7sCBgxCtihpJl0N5HTVc.1

Committee Members:
Dr. Qifeng Chen (Supervisor)
Dr. Dan Xu (Chairperson)
Dr. Long Chen