A UNIFIED FRAMEWORK AND APPLICATIONS OF MACHINE LEARNING METHODS FOR VISUAL TRACKING

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "A UNIFIED FRAMEWORK AND APPLICATIONS OF MACHINE LEARNING METHODS 
FOR VISUAL TRACKING"

By

Mr. Naiyan WANG


Abstract

Visual tracking is a fundamental task in computer vision. Numerous 
downstream applications, such as video semantic analysis, call for 
reliable tracking results. In this thesis, we first dissect a tracker to 
identify the components most influential for good performance. In 
particular, we decompose a tracker into five parts: motion model, feature 
extractor, observation model, model updater, and ensemble post-processor. 
We find that the feature extractor plays the most important role in a 
tracker. Although the observation model is the focus of many studies, we 
find that it often brings no significant improvement. Moreover, the motion 
model and model updater contain many details that can affect the result. 
The ensemble post-processor can also improve the result substantially when 
the constituent trackers are highly diverse.
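The five-part decomposition can be sketched as a minimal pipeline. The code below is an illustrative toy, not the thesis implementation: the sliding-window motion model, raw-pixel "feature", template-distance observation model, and linear template update are all simplifying assumptions chosen for concreteness.

```python
# Illustrative toy of the five-part tracker decomposition (not the thesis code).
import numpy as np

def motion_model(prev_box, radius=4):
    """Propose candidate boxes by local translations of the previous box."""
    x, y, w, h = prev_box
    return [(x + dx, y + dy, w, h)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)]

def feature_extractor(frame, box):
    """Toy feature: the raw pixel patch (the part the thesis finds matters most)."""
    x, y, w, h = box
    return frame[y:y + h, x:x + w].astype(float)

def observation_model(feat, template):
    """Score a candidate by negative squared distance to the current template."""
    return -np.sum((feat - template) ** 2)

def model_updater(template, best_feat, lr=0.1):
    """Blend the newly tracked appearance into the template (simple linear update)."""
    return (1 - lr) * template + lr * best_feat

def track_step(frame, prev_box, template):
    """One frame of tracking: propose, featurize, score, pick the best, update."""
    candidates = motion_model(prev_box)
    feats = [feature_extractor(frame, c) for c in candidates]
    scores = [observation_model(f, template) for f in feats]
    best = int(np.argmax(scores))
    return candidates[best], model_updater(template, feats[best])
```

An ensemble post-processor would sit after `track_step`, fusing the boxes produced by several such trackers.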

Based on these analyses, we apply advanced machine learning techniques to 
visual tracking, focusing on the feature learning and ensemble 
post-processing parts. In feature learning, we explore three approaches, 
based on online non-negative robust dictionary learning, the stacked 
denoising autoencoder (SDAE), and the convolutional neural network (CNN). 
In the first method, we present an online robust non-negative dictionary 
learning algorithm for updating the object templates, and devise a novel 
online projected gradient descent method to solve the dictionary learning 
problem. Our algorithm blends the past information and the current 
tracking result in a principled way. It can automatically detect and 
reject occlusions and cluttered background, yielding robust object 
templates. The next two methods share the spirit of transfer learning: we 
transfer generic image and object features learned offline to online 
tracking. Our first attempt trains an SDAE on one million auxiliary 
natural images, and then transfers the encoder part of the SDAE to online 
tracking as a feature extractor. An additional classifier layer is 
appended on top of the encoder during online tracking, so that both the 
feature extractor and the classifier can be fine-tuned to adapt to 
appearance changes of the moving object. Our second approach addresses the 
shortcomings of the SDAE method with the more powerful CNN model, further 
improving the vanilla CNN with objectness pretraining and structured 
output. Experiments on a recent benchmark [79] show the superiority of the 
proposed methods.
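As a rough illustration of the first method's core idea, the sketch below uses projected gradient descent for a plain non-negative least-squares coding step, followed by one projected-gradient update of the dictionary (template) matrix. It omits the robustness and online bookkeeping of the actual algorithm; the parameter names and the unit-norm column convention are assumptions made here for concreteness.

```python
# Hedged sketch: projected gradient descent for non-negative coding plus one
# dictionary update step. This is NOT the thesis's online robust algorithm,
# only the underlying projected-gradient mechanic.
import numpy as np

def nonneg_code(D, y, n_iter=300, lr=0.1):
    """Approximately solve min_{a >= 0} ||y - D a||^2 by projected gradient."""
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)
        a = np.maximum(a - lr * grad, 0.0)  # project onto the non-negative orthant
    return a

def dictionary_step(D, y, a, lr=0.1):
    """One projected-gradient step on the templates for the same squared loss."""
    grad = np.outer(D @ a - y, a)
    D = np.maximum(D - lr * grad, 0.0)      # keep templates non-negative
    norms = np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D / norms                        # renormalize columns to unit norm
```

In an online setting, a step like `dictionary_step` would be applied once per frame with the current tracking result, blending it into the stored templates.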

For the ensemble post-processor, inspired by recent studies on 
crowdsourcing, we propose a novel factorial hidden Markov model (FHMM) for 
aggregating structured output time series data. For efficient online 
inference in the FHMM, we devise a conditional particle filter algorithm 
that exploits the structure of the joint posterior distribution of the 
hidden variables. The proposed method significantly improves the results 
over those of the individual trackers.
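To give a flavor of particle-filter aggregation, here is a much-simplified bootstrap particle filter that fuses several trackers' one-dimensional outputs under a random-walk motion model. The thesis's FHMM with a conditional particle filter is considerably richer (it handles structured outputs and exploits the factorial structure of the posterior); everything below, including the Gaussian noise assumptions, is a hypothetical sketch.

```python
# Hypothetical sketch: fusing K trackers' 1-D outputs with a bootstrap
# particle filter (much simpler than the thesis's FHMM-based method).
import numpy as np

def particle_filter(observations, n_particles=500, trans_std=1.0, obs_std=2.0, seed=0):
    """observations: (T, K) array with K tracker outputs per frame.
    Returns the filtered state estimate for each frame."""
    rng = np.random.default_rng(seed)
    T, K = observations.shape
    particles = rng.normal(observations[0].mean(), obs_std, n_particles)
    estimates = []
    for t in range(T):
        # Propagate particles with a random-walk motion model.
        particles = particles + rng.normal(0.0, trans_std, n_particles)
        # Weight each particle by the likelihood of every tracker's output.
        log_w = np.zeros(n_particles)
        for k in range(K):
            log_w += -0.5 * ((observations[t, k] - particles) / obs_std) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        estimates.append(float(np.sum(w * particles)))
        # Multinomial resampling to avoid weight degeneracy.
        particles = particles[rng.choice(n_particles, n_particles, p=w)]
    return np.array(estimates)
```

Because the filter pools evidence from all trackers while smoothing over time, its estimate is typically closer to the true trajectory than any single tracker's output.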


Date:			Monday, 10 August 2015

Time:			2:00pm - 4:00pm

Venue:			Room 2132A
 			Lift 19

Chairman:		Prof. Qian Liu (IELM)

Committee Members:	Prof. Dit Yan Yeung (Supervisor)
 			Prof. James Kwok
 			Prof. Long Quan
 			Prof. Bertram Shi (ECE)
 			Prof. Jiaya Jia (Comp. Sci. & Engg., CUHK)


**** ALL are Welcome ****