Spatial Temporal Feature Processing in 3D Point Clouds for Autonomous Driving

PhD Thesis Proposal Defence


Title: "Spatial Temporal Feature Processing in 3D Point Clouds for Autonomous 
Driving"

by

Mr. Sukai WANG


Abstract:

In recent decades, 3D point cloud data has become one of the most important
data types across a variety of domains, such as robotics, architecture, and
especially autonomous driving systems (ADS), because of its robustness and
spatial accuracy. Compared with 2D images, point clouds carry accurate
location and geometry information, and in real driving scenarios, the short
interval between two adjacent point clouds in the data stream brings abundant
temporal redundancy. To better understand point cloud data in the big data
era, machine learning-based approaches have gradually come to dominate. Thus,
in this proposal, we focus on spatial and temporal feature extraction and
processing for point clouds with learning-based methods.

In ADS, point clouds are used widely in perception, localization, and
mapping, and point cloud collection, storage, and transmission precede all of
those downstream tasks. This thesis explores several feature extraction
networks at two different stages, each with a suitable data representation:
point cloud compression on range images, and multiple object detection and
tracking on spatio-temporal maps.
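
As a concrete illustration of the range-image representation, the sketch
below spherically projects a LiDAR point cloud onto a 2D range image, after
which image-domain compression tools apply. The sensor parameters (64 beams,
2048 columns, a +3/-25 degree vertical field of view) are illustrative
assumptions, not the exact configuration used in this thesis:

    import numpy as np

    def point_cloud_to_range_image(points, h=64, w=2048,
                                   fov_up=np.radians(3.0),
                                   fov_down=np.radians(-25.0)):
        """Spherically project an (N, 3) point cloud to an (h, w) range image.

        The beam count, width, and vertical FOV are illustrative assumptions,
        roughly matching a generic 64-beam LiDAR.
        """
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        r = np.linalg.norm(points, axis=1)             # range of each point
        yaw = np.arctan2(y, x)                         # azimuth in [-pi, pi]
        pitch = np.arcsin(z / np.maximum(r, 1e-8))     # elevation angle

        # Normalize angles to pixel coordinates.
        u = ((1.0 - (yaw / np.pi + 1.0) / 2.0) * w).astype(np.int32) % w
        v = ((fov_up - pitch) / (fov_up - fov_down) * h).astype(np.int32)
        v = np.clip(v, 0, h - 1)

        image = np.zeros((h, w), dtype=np.float32)     # 0 marks empty pixels
        image[v, u] = r                                # last-written range wins
        return image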

In end-to-end point cloud compression, we first show that our proposed range
image-based compression framework with an entropy model outperforms the
state-of-the-art octree-based methods. We then introduce a hybrid point cloud
sequence compression framework, which consists of a static and a dynamic
point cloud compression algorithm. In the static compression framework, a
geometry-aware attention layer removes spatial redundancy; in the dynamic
compression framework, a conv-LSTM with a GHU module removes temporal
redundancy.
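
To make the temporal redundancy removal concrete, below is a minimal sketch
of a convolutional LSTM cell of the kind the dynamic framework builds on; the
class name and sizes are hypothetical, and the GHU (gradient highway unit)
paired with it in the framework is omitted for brevity:

    import torch
    import torch.nn as nn

    class ConvLSTMCell(nn.Module):
        """Minimal conv-LSTM cell: an LSTM whose gates are convolutions, so
        hidden and cell states keep the spatial layout of the range image."""

        def __init__(self, in_ch, hid_ch, k=3):
            super().__init__()
            # One convolution produces all four gates at once.
            self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k,
                                   padding=k // 2)

        def forward(self, x, state):
            h, c = state                               # hidden and cell state
            gates = self.gates(torch.cat([x, h], dim=1))
            i, f, o, g = torch.chunk(gates, 4, dim=1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            return h, (h, c)

    # Hypothetical usage: run over consecutive range-image frames so the cell
    # state carries information from frame t-1 into frame t.
    cell = ConvLSTMCell(in_ch=1, hid_ch=16)
    h = c = torch.zeros(1, 16, 32, 256)
    for frame in torch.randn(5, 1, 1, 32, 256):        # 5-frame toy sequence
        out, (h, c) = cell(frame, (h, c))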
In the downstream task of 3D multiple object detection and tracking, we
propose an end-to-end point cloud-based network, DiTNet, which directly
assigns a track ID to each object across the whole sequence, without a
separate data association step. DiTNet is made location-invariant by using
relative locations and embeddings to learn each object’s spatial and temporal
features in the spatio-temporal world. Features from the detection module
help to improve tracking performance, and the tracking module, through its
final trajectories, in turn helps to refine the detection results.
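
As a toy sketch of association-free tracking, the module below embeds each
detection by its position relative to the per-frame centroid (hence
location-invariant) and scores cross-frame pairs, so that detections on the
same trajectory can share a track ID; all names and dimensions here are
hypothetical and are not DiTNet’s actual design:

    import torch
    import torch.nn as nn

    class RelativeTrackEmbedding(nn.Module):
        """Toy head: encode each detection by its offset from the per-frame
        centroid, then score cross-frame pairs; a full tracker would assign
        the same track ID wherever the score is high."""

        def __init__(self, dim=32):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(),
                                     nn.Linear(dim, dim))

        def embed(self, centers):
            # centers: (N, 3) box centers in one frame; subtracting the
            # centroid makes the embedding depend only on relative layout.
            rel = centers - centers.mean(dim=0, keepdim=True)
            return self.mlp(rel)

        def forward(self, centers_t, centers_t1):
            e_t, e_t1 = self.embed(centers_t), self.embed(centers_t1)
            # (N_t, N_t1) similarity between detections in adjacent frames.
            return e_t @ e_t1.T

    model = RelativeTrackEmbedding()
    scores = model(torch.randn(4, 3), torch.randn(5, 3))  # 4 vs. 5 detections
    same_track = scores.argmax(dim=1)                     # toy ID propagation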
Lastly, we will conclude and discuss the remaining problems for further
research: refining 3D object detection given a priori tracking results in the
spatio-temporal world, and choosing the best feature extraction module for
long-term frame sequences across all spatio-temporal frameworks.


Date:  			Friday, 19 August 2022

Time:                  	10:00am - 12:00noon

Zoom Meeting: 
https://hkust.zoom.us/j/95872323558?pwd=c003RHNWdjR3WkJJelJJd2VNN01WQT09

Committee Members:	Dr. Ming Liu (Supervisor)
 			Prof. Cunsheng Ding (Chairperson)
 			Dr. Qifeng Chen
 			Prof. Ling Shi (ECE)


**** ALL are Welcome ****