Deep Contextual Modeling: Exploiting Context in Spatial and Temporal Domains

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Deep Contextual Modeling: Exploiting Context in Spatial and 
Temporal Domains"

By

Mr. Yongyi LU


Abstract

Context plays a critical role in perceptual inference, as it provides 
useful guidance for solving numerous tasks in both the spatial and 
temporal domains. In this dissertation, we study several fundamental 
computer vision problems, namely object detection, image generation and 
high-level image understanding, and exploit different forms of spatial 
and temporal context to boost their performance.

Driven by recent developments in deep neural networks, we propose deep 
contextual modeling in the spatial and temporal domains. Context here 
refers to one of the following application scenarios: (1) temporal 
coherence and consistency for object detection from video frames; (2) 
spatial constraints for conditional image synthesis, i.e., generating an 
image from a sketch; and (3) domain-specific knowledge, such as facial 
attributes, for natural face image generation.

We first study the problem of exploiting temporal context for object 
detection in video, where applying a single-frame object detector 
directly to a video sequence tends to produce high temporal variation in 
the frame-level output. Building on recent advances in sequential 
modeling, we exploit long-range visual context for temporal coherence and 
consistency by proposing a novel association LSTM framework, which solves 
the regression and association tasks in video simultaneously. Next, we 
investigate image generation guided by hand-drawn sketches in the spatial 
domain. We design a joint image representation for learning the joint 
distribution and correspondence of sketch-image pairs. A contextual GAN 
framework is proposed that poses image generation as a constrained image 
completion problem, where the sketch serves as a weak spatial constraint. 
As a result, the output images remain realistic even when they do not 
strictly follow a poorly drawn sketch. Finally, we explore 
domain-specific context, namely facial attributes, for attribute-guided 
face generation: we condition CycleGAN on such attributes and propose 
Conditional CycleGAN, which is designed to allow easy control of the 
appearance of the generated face via facial attribute or identity 
context. We demonstrate three applications of identity-guided face 
generation.
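
To make the attribute-conditioning idea concrete, the following is a 
minimal, illustrative PyTorch sketch of a generator conditioned on a 
facial attribute vector by tiling it spatially and concatenating it with 
the input image. The layer sizes, names and tiling scheme are assumptions 
chosen for illustration only, not the Conditional CycleGAN architecture 
presented in the thesis.

    # Illustrative sketch only: attribute-conditioned image generation.
    # Architecture details are assumptions, not the thesis implementation.
    import torch
    import torch.nn as nn

    class AttributeConditionedGenerator(nn.Module):
        def __init__(self, img_channels=3, n_attrs=10, base=64):
            super().__init__()
            # The attribute vector is tiled over the spatial dimensions and
            # concatenated with the image, so the first conv layer sees
            # img_channels + n_attrs input channels.
            self.net = nn.Sequential(
                nn.Conv2d(img_channels + n_attrs, base, 4, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.ConvTranspose2d(base, img_channels, 4, stride=2, padding=1),
                nn.Tanh(),  # output image in [-1, 1]
            )

        def forward(self, img, attrs):
            # Tile the attribute vector to match the image's spatial size.
            b, _, h, w = img.shape
            attr_maps = attrs.view(b, -1, 1, 1).expand(b, attrs.size(1), h, w)
            return self.net(torch.cat([img, attr_maps], dim=1))

    if __name__ == "__main__":
        gen = AttributeConditionedGenerator()
        face = torch.randn(2, 3, 128, 128)            # dummy input faces
        attrs = torch.randint(0, 2, (2, 10)).float()  # dummy binary attributes
        print(gen(face, attrs).shape)                 # torch.Size([2, 3, 128, 128])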

As future research directions, we will study deep networks that jointly 
learn spatial and temporal context and explore the possibility of solving 
all of these applications with a single model. We will also explore 
higher-level context, such as object parts, for general visual 
understanding.


Date:			Monday, 27 August 2018

Time:			2:00pm - 4:00pm

Venue:			Room 5504
 			Lifts 25/26

Chairman:		Prof. Xiaoping Wang (MATH)

Committee Members:	Prof. Chi-Keung Tang (Supervisor)
 			Prof. Huamin Qu
 			Prof. Long Quan
 			Prof. Sai-Kit Yeung (ISD)
 			Prof. Michael Brown (York Univ.)


**** ALL are Welcome ****