Deep Contextual Modeling: Exploiting Context in Spatial and Temporal Domains
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Deep Contextual Modeling: Exploiting Context in Spatial and Temporal Domains"

By

Mr. Yongyi LU

Abstract

Context plays a critical role in perceptual inference, as it provides useful guidance for solving numerous tasks in both the spatial and temporal domains. In this dissertation, we study several fundamental computer vision problems, i.e., object detection, image generation, and high-level image understanding, by exploiting different forms of spatial-temporal context to boost their performance. Driven by recent developments in deep neural networks, we propose deep contextual modeling in the spatial and temporal domains. Context here refers to one of the following application scenarios: (1) temporal coherence and consistency for object detection from video frames; (2) spatial constraints for conditional image synthesis, i.e., generating images from sketches; (3) domain-specific knowledge, such as facial attributes, for natural face image generation.

We first study the problem of exploiting temporal context for object detection in video, where applying a single-frame object detector directly to a video sequence tends to produce high temporal variation in the frame-level output. Building on recent advances in sequential modeling, we exploit long-range visual context for temporal coherence and consistency by proposing a novel association LSTM framework, which solves the regression and association tasks in video simultaneously.

Next, we investigate image generation guided by hand-drawn sketches in the spatial domain. We design a joint image representation for learning the joint distribution and correspondence of sketch-image pairs. A contextual GAN framework is proposed that poses image generation as a constrained image completion problem, in which the sketch serves as weak spatial context. As a result, the output images do not need to strictly follow a poorly drawn sketch while still remaining realistic.

Finally, we explore domain-specific context, i.e., facial attributes, for attribute-guided face generation: we condition CycleGAN and propose conditional CycleGAN, which is designed to allow easy control of the appearance of the generated face via facial attribute or identity context. We demonstrate three applications of identity-guided face generation.

As future research directions, we will study deep networks for jointly learning spatial and temporal context and explore the possibility of solving all applications with a single model. We will also explore more high-level context, such as object parts, for general visual understanding.

Date: Monday, 27 August 2018
Time: 2:00pm - 4:00pm
Venue: Room 5504 (Lifts 25/26)

Chairman: Prof. Xiaoping Wang (MATH)

Committee Members: Prof. Chi-Keung Tang (Supervisor)
                   Prof. Huamin Qu
                   Prof. Long Quan
                   Prof. Sai-Kit Yeung (ISD)
                   Prof. Michael Brown (York Univ.)

**** ALL are Welcome ****
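As a rough illustration of the association LSTM idea mentioned in the abstract, the sketch below (PyTorch-style; module names, dimensions, and the two-head design are illustrative assumptions, not the thesis implementation) runs an LSTM over per-frame detection features and emits both box refinements (regression) and embeddings used to associate the same object across frames:

```python
# Minimal sketch, assuming per-frame detection features are already extracted.
# Not the thesis code: dimensions and head design are placeholders.
import torch
import torch.nn as nn

class AssociationLSTMSketch(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, embed_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.box_head = nn.Linear(hidden_dim, 4)        # box refinement (dx, dy, dw, dh)
        self.assoc_head = nn.Linear(hidden_dim, embed_dim)  # association embedding

    def forward(self, frame_feats):
        # frame_feats: (batch, time, feat_dim) detection features per frame
        h, _ = self.lstm(frame_feats)
        boxes = self.box_head(h)                        # (batch, time, 4)
        embeds = nn.functional.normalize(self.assoc_head(h), dim=-1)
        return boxes, embeds

# Usage: temporal coherence can be encouraged by penalizing the distance
# between embeddings of the same object in consecutive frames.
model = AssociationLSTMSketch()
feats = torch.randn(2, 8, 512)   # 2 sequences, 8 frames each
boxes, embeds = model(feats)
```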
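The "generation as constrained completion" formulation can likewise be sketched: the sketch and the photo are placed side by side in one joint image, the photo half is masked out, and a completion network fills it in so that the sketch acts only as weak spatial context. The shapes and the stand-in network below are assumptions for illustration only; the actual model is a GAN.

```python
# Minimal sketch of joint-image completion; the completion network here is a
# placeholder, not the contextual GAN generator from the thesis.
import torch
import torch.nn as nn

def make_joint_image(sketch, photo):
    # sketch, photo: (batch, 3, H, W) -> joint image (batch, 3, H, 2W)
    return torch.cat([sketch, photo], dim=3)

def mask_photo_half(joint):
    masked = joint.clone()
    w = joint.shape[3] // 2
    masked[:, :, :, w:] = 0.0        # hide the photo half to be completed
    return masked

completion_net = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
)

sketch = torch.rand(1, 3, 64, 64)
photo = torch.rand(1, 3, 64, 64)
joint = make_joint_image(sketch, photo)
completed = completion_net(mask_photo_half(joint))    # (1, 3, 64, 128)
generated_photo = completed[:, :, :, 64:]             # the synthesized image half
```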
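Finally, attribute conditioning in the conditional CycleGAN setting can be sketched as tiling the attribute (or identity) vector spatially and concatenating it to the input image as extra channels, so the generated face can be steered by the attribute context. The generator body, attribute dimension, and names below are assumptions, not the thesis architecture.

```python
# Minimal sketch of attribute-conditioned generation; placeholder generator body.
import torch
import torch.nn as nn

class ConditionalGeneratorSketch(nn.Module):
    def __init__(self, attr_dim=40, img_channels=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(img_channels + attr_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, img_channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, img, attrs):
        # img: (batch, 3, H, W); attrs: (batch, attr_dim), e.g. binary attributes
        b, _, h, w = img.shape
        attr_map = attrs.view(b, -1, 1, 1).expand(b, attrs.shape[1], h, w)
        return self.body(torch.cat([img, attr_map], dim=1))

gen = ConditionalGeneratorSketch()
face = torch.rand(2, 3, 128, 128)
attrs = torch.randint(0, 2, (2, 40)).float()   # e.g., CelebA-style attribute vector
new_face = gen(face, attrs)
```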