A Deep Learning Approach for Video Object Detection

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Final Year Thesis Oral Presentation

Title: "A Deep Learning Approach for Video Object Detection"

by

Mr. Hengyuan HU


Abstract:

Object detection is one of the most fundamental problems in the field of 
computer vision. In the past several decades, the focus of object detection 
research was mainly on analyzing single images, owing to limited 
computational power and the lack of well-labeled video datasets. However, 
the landscape has changed, notably in the second half of 2015, when the 
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) introduced a 
dataset for object detection from video. With the ample data generously 
offered 
by the ILSVRC organizers, a mature deep learning framework and 
implementation (Caffe), and advanced GPU acceleration technology, it is an 
opportune time to intensify research on object detection from video, 
particularly using state-of-the-art convolutional neural networks and 
related deep learning methods. In this final year thesis, I studied this 
problem in two stages. In the first stage, I tailored an existing image 
object detection architecture, Fast R-CNN, to improve its performance on 
the video dataset. With the experience gained in the first stage, I devised 
several innovative architectures in the second stage, which are engineered 
to detect objects in video by taking into account the temporal information 
inherent in the data.


Date                 : 9 May 2016 (Monday)

Time                 : 3:30pm to 4:30pm

Venue                : Room 5510 (lift 25/26)

Advisor              : Prof. C.K. TANG

2nd Reader           : Prof. Long QUAN