Spatial-temporal Feature Processing in 3D Point Clouds for Autonomous Driving

PhD Thesis Defence


Title: "Spatial-temporal Feature Processing in 3D Point Clouds for Autonomous 
Driving"

By

Mr. Sukai WANG


Abstract

In past decades, 3D point cloud data has been one of the most important 
data types across a variety of domains, such as robotics, architecture, and 
especially autonomous driving assistance systems (ADAS), owing to its 
robustness and spatial accuracy. To better understand point cloud data in the 
big data era, machine learning-based approaches have gradually come to 
dominate. Thus, in this thesis, I focus on spatial and temporal feature 
extraction and processing for point cloud sequences with learning-based methods.

In autonomous driving, point clouds are widely used in perception, localization, 
and mapping. Before these downstream tasks, however, the point clouds must first 
be collected, stored, and transmitted. This thesis therefore explores several 
feature extraction networks in two directions: point cloud compression, and 
multiple object detection and tracking. The perception task can be seen as a 
downstream task of data compression.

In end-to-end point cloud compression, I first propose a baseline range 
image-based method to show that a range image-based compression framework 
outperforms octree-based methods for scanning LiDARs in autonomous driving. 
Then, motivated by video compression, I introduce a hybrid point cloud sequence 
compression framework, which consists of a static and a dynamic learning-based 
point cloud compression algorithm. In the static compression framework, a 
geometry-aware attention layer helps remove spatial redundancy; in the dynamic 
compression framework, a conv-LSTM with a GHU module removes temporal redundancy.
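
To make the range image representation concrete, the sketch below shows a typical 
spherical projection that converts a single LiDAR sweep into a range image. It 
illustrates the representation only, not the thesis's actual pipeline; the function 
name, image resolution, and vertical field of view are assumed for illustration.

import numpy as np

def pointcloud_to_range_image(points, h=64, w=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR sweep onto an (h, w) range image by spherical projection."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)            # range of each point
    yaw = np.arctan2(y, x)                                # azimuth angle in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))  # elevation angle

    fov_up, fov_down = np.deg2rad(fov_up_deg), np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    # Map angles to pixel coordinates: columns from azimuth, rows from elevation.
    u = np.clip(np.floor(0.5 * (1.0 - yaw / np.pi) * w), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor((1.0 - (pitch - fov_down) / fov) * h), 0, h - 1).astype(np.int32)

    # Write points farthest-first so the nearest return wins when pixels collide.
    order = np.argsort(r)[::-1]
    image = np.zeros((h, w), dtype=np.float32)
    image[v[order], u[order]] = r[order]
    return image
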
In the downstream task of 3D multiple object detection and tracking (MOT), I 
first propose a "fake" end-to-end tracking-with-detection framework that 
predicts the objects' movement to improve data association accuracy. Then I 
introduce a "real" end-to-end MOT network, ST-TrackNet, which rearranges the 
object detections in a spatio-temporal map and then directly predicts each 
object's track ID without a separate data association step. Building on this 
research, I propose DiTNet, which integrates a detection module with the 
tracking network: features from the detection module help improve tracking 
performance, and the tracking module, with its final trajectories, in turn 
helps refine the detection results. Lastly, I summarize this thesis and 
propose future research opportunities.
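
To illustrate the motion-aided data association idea behind the "fake" end-to-end 
tracker described above, the sketch below propagates tracks with a simple 
constant-velocity model and matches them to new detections with the Hungarian 
algorithm. The function name, time step, and distance gate are assumptions for 
illustration and do not reflect the thesis's actual design.

import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_centers, track_velocities, det_centers, dt=0.1, max_dist=2.0):
    """Match existing tracks to the next frame's detections after motion prediction.

    track_centers, track_velocities: (T, 3) arrays; det_centers: (D, 3) array.
    dt and max_dist are illustrative values, not tuned parameters.
    """
    # Constant-velocity prediction of where each track should appear next.
    predicted = track_centers + dt * track_velocities

    # Pairwise Euclidean distances between predicted tracks and detections.
    cost = np.linalg.norm(predicted[:, None, :] - det_centers[None, :, :], axis=2)

    # Hungarian assignment, then drop pairs that are too far apart.
    rows, cols = linear_sum_assignment(cost)
    matches = [(t, d) for t, d in zip(rows, cols) if cost[t, d] < max_dist]
    unmatched_dets = sorted(set(range(len(det_centers))) - {d for _, d in matches})
    return matches, unmatched_dets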


Date:			Monday, 5 December 2022

Time:			10:00am - 12:00noon

Venue:			Room 4472
 			(Lifts 25/26)

Chairperson:		Prof. Maosheng XIONG (MATH)

Committee Members:	Prof. Ming LIU (Supervisor)
 			Prof. Qifeng CHEN
 			Prof. Cunsheng DING
 			Prof. Ling SHI (ECE)
 			Prof. Hesheng WANG (Shanghai Jiao Tong University)


**** ALL are Welcome ****