A UNIFIED FRAMEWORK AND APPLICATIONS OF MACHINE LEARNING METHODS FOR VISUAL TRACKING
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "A UNIFIED FRAMEWORK AND APPLICATIONS OF MACHINE LEARNING METHODS FOR VISUAL TRACKING"

By Mr. Naiyan WANG

Abstract

Visual tracking is a fundamental task in computer vision. Numerous downstream applications, such as video semantic analysis, call for reliable tracking results. In this thesis, we first dissect a tracker to identify the components that most influence its performance. In particular, we decompose a tracker into five parts: motion model, feature extractor, observation model, model updater, and ensemble post-processor. We find that the feature extractor plays the most important role in a tracker. Although the observation model is the focus of many studies, we find that it often brings no significant improvement. Moreover, the motion model and model updater contain many details that can affect the result. The ensemble post-processor can also improve the result substantially when the constituent trackers have high diversity.

Based on these analyses, we apply advanced machine learning techniques to visual tracking, focusing on the feature learning and ensemble post-processing parts. In feature learning, we try three different approaches, based on online robust non-negative dictionary learning, the stacked denoising autoencoder (SDAE), and the convolutional neural network (CNN). In the first method, we present an online robust non-negative dictionary learning algorithm for updating the object templates, and devise a novel online projected gradient descent method to solve the dictionary learning problem. Our algorithm blends the past information and the current tracking result in a principled way. It can automatically detect and reject occlusion and cluttered background, yielding robust object templates.

For the next two methods, we share the spirit of transfer learning.
We propose to transfer generic image and object features, learned offline, to online tracking. Our first attempt is to train an SDAE using one million auxiliary natural images, and then transfer the encoder part of the SDAE to online tracking as a feature extractor. An additional classifier layer is appended to the top of the encoder in online tracking, so that both the feature extractor and the classifier can be fine-tuned to adapt to appearance changes of the moving object. Our second approach addresses the disadvantages of the first by using a more powerful deep learning model, the CNN. We further improve the vanilla CNN with objectness pretraining and structured output. Experiments on a recent benchmark [79] show the superiority of the proposed methods.

For the ensemble post-processor, inspired by some recent studies on crowdsourcing, we propose a novel factorial hidden Markov model (FHMM) for aggregating structured output time series data. For efficient online inference of the FHMM, we devise a conditional particle filter algorithm by exploiting the structure of the joint posterior distribution of the hidden variables. The proposed method can significantly improve the results compared with those of the individual trackers.

Date: Monday, 10 August 2015
Time: 2:00pm - 4:00pm
Venue: Room 2132A, Lift 19

Chairman: Prof. Qian Liu (IELM)

Committee Members:
Prof. Dit Yan Yeung (Supervisor)
Prof. James Kwok
Prof. Long Quan
Prof. Bertram Shi (ECE)
Prof. Jiaya Jia (Comp. Sci. & Engg., CUHK)

**** ALL are Welcome ****