Urban Scene Parsing with Images and Scan Data

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Urban Scene Parsing with Images and Scan Data"

By

Mr. Honghui Zhang


Abstract

Urban scene parsing, which segments objects of interest and identifies
their categories in urban scenes, is a fundamental problem in urban scene
understanding. As a representative constrained scene parsing task, it is
closely related to many important applications that have recently
received great attention, such as 3D city modeling and autonomous vehicle
navigation. In this thesis, we investigate methods for urban scene
parsing with images and scan data, as well as parameter learning for
random field models, which are widely used to formulate various scene
parsing tasks.

Using both images and scan data, we propose a novel joint image and scan
data scene parsing system that can be applied to large-scale urban
scenes. The proposed system automatically obtains the necessary training
data from the input data, whereas previous work usually obtained it
through manual labeling. An associative Hierarchical CRF trained with
the automatically obtained training data is then adopted to jointly
segment images and scan point clouds by integrating both 3D geometry
information and 2D image appearance information. For urban image
parsing, we propose a nonparametric scene parsing method that exploits
the partial similarity between images, and a parametric scene parsing
method, the supervised label transfer method. The partial-similarity-based
nonparametric method involves no training process and reduces the
inference problem in scene parsing to a matching problem. By contrast,
the supervised label transfer method transforms the inference problem in
scene parsing into a supervised matching problem, inheriting the
advantages of nonparametric scene parsing methods. The proposed
methods are evaluated and compared with several state-of-the-art methods
on several public datasets and real Google Street View data, and achieve
encouraging performance.

Last but not least, we propose an adaptive discriminative learning
algorithm that learns, from given training data, the parameters of the
random field models widely used to formulate various scene parsing
tasks. The parameters are iteratively updated with an adaptive step size
by solving a structured prediction problem, with an update form similar
to that of the Projected Subgradient method. The proposed algorithm can
achieve performance comparable to the classical StructSVM and Projected
Subgradient methods, with significantly improved learning efficiency.


Date:			Friday, 3 August 2012

Time:			3:00pm – 5:00pm

Venue:			Room 3501
 			Lifts 25/26

Chairman:		Prof. Li Qiu (ECE)

Committee Members:	Prof. Long Quan (Supervisor)
 			Prof. Helen Shen
 			Prof. Dit-Yan Yeung
 			Prof. Kai Tang (MECH)
 			Prof. Edmond Boyer (INRIA, France)


**** ALL are Welcome ****