Synthesizing Images and Videos from Large-scale Datasets
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Synthesizing Images and Videos from Large-scale Datasets"

By

Miss Mingming HE

Abstract

The availability of large-scale visual data increasingly inspires sophisticated algorithms to process, understand, and augment these resources. In particular, with the rapid advancement of data-driven techniques, researchers have demonstrated exciting progress on a wide range of visual synthesis applications, drawing closer to the day when high-quality visual creation tools are accessible to non-expert users. However, due to the lack of specific domain knowledge, the variety of target subjects, and the complexity of human perception, many visual synthesis problems remain challenging. In this dissertation, we focus on algorithms for synthesizing both image color effects and video motion behaviors, helping to create context-consistent and photorealistic visual content by leveraging large-scale visual data.

First, we propose an image algorithm to transfer photo color style from one image to another based on semantically meaningful dense correspondence. To achieve accurate color transfer results that respect the semantic relationship between image contents, our algorithm leverages features learned by a deep neural network to build the dense correspondence. Meanwhile, it optimizes local linear color models to enforce both local and global consistency. Semantic matching and color models are jointly optimized in a coarse-to-fine manner. This approach is further extended from "one-to-one" to "one-to-many" color transfer, boosting matching reliability by introducing more reference candidates.

However, for exemplar-based color synthesis applications, including color transfer and colorization, it remains challenging to handle image pairs with unrelated content, and the "one-to-many" method above is not a practical solution there. We therefore take advantage of deep neural networks to better predict consistent chrominance across the whole image, including mismatched elements, to achieve robust single-reference image colorization. Specifically, rather than using handcrafted rules as in traditional exemplar-based methods, we design an end-to-end colorization network that learns how to select, propagate, and predict colors from a large-scale dataset. This network generalizes well even when the reference image is unrelated to the input grayscale image.

Finally, besides synthesizing static images, we also explore video synthesis techniques by processing large-scale captures and manipulating their dynamism. We present an approach to create wide-angle, high-resolution looping panoramic videos. Starting with hundreds of registered videos acquired on a robotic mount, we formulate a combinatorial optimization that determines, for each output pixel, the source video and looping parameters that jointly maximize spatiotemporal consistency. Optimizing over such a large volume of video data is challenging, so we accelerate the optimization by reducing the set of source labels with a graph-coloring scheme, parallelizing the computation, and implementing it out-of-core. Combined, these techniques create gigapixel-sized looping panoramas.
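To illustrate the local linear color models mentioned in the first part of the abstract, the following is a minimal sketch, not the thesis implementation (which jointly optimizes semantic matching and color models coarse-to-fine). It assumes the reference has already been warped to the source by the dense correspondence, and fits per-pixel linear coefficients over local windows in a guided-filter-style closed form:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_linear_color_transfer(src, ref_warped, radius=4, eps=1e-4):
    """Fit per-pixel local linear models T(p) = a(p) * src(p) + b(p) so the
    transferred result matches the correspondence-warped reference within
    each local window (a guided-filter-style closed form; illustrative only).

    src        : (H, W) one channel of the source image, float in [0, 1]
    ref_warped : (H, W) same channel of the reference, warped to the source
                 by the dense correspondence
    """
    size = 2 * radius + 1
    mean_s = uniform_filter(src, size)
    mean_r = uniform_filter(ref_warped, size)
    corr_sr = uniform_filter(src * ref_warped, size)
    corr_ss = uniform_filter(src * src, size)

    cov_sr = corr_sr - mean_s * mean_r   # local covariance of src and ref
    var_s = corr_ss - mean_s * mean_s    # local variance of src

    a = cov_sr / (var_s + eps)           # eps regularizes flat regions
    b = mean_r - a * mean_s

    # Averaging the coefficients over each window enforces local smoothness
    # of the color mapping before it is applied.
    a = uniform_filter(a, size)
    b = uniform_filter(b, size)
    return a * src + b
```

Because each pixel's mapping is constrained to agree with its neighbors' windows, the model stays locally consistent, while fitting against the globally matched reference keeps the overall style consistent.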
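The select-propagate-predict idea behind the end-to-end colorization network can be sketched as below. This is a hypothetical toy architecture, not the dissertation's network; the channel layout and the `confidence` input are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ExemplarColorizationNet(nn.Module):
    """Toy exemplar-based colorization sketch (assumed architecture). The
    input is the grayscale L channel concatenated with the
    correspondence-warped reference ab channels and a per-pixel
    matching-confidence map; the output is predicted ab chrominance.
    Where the match is reliable the network can select and propagate
    reference colors; elsewhere it must predict plausible colors learned
    from the large-scale training set.
    """
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, 3, padding=1), nn.Tanh(),  # ab in [-1, 1]
        )

    def forward(self, gray_l, ref_ab_warped, confidence):
        x = torch.cat([gray_l, ref_ab_warped, confidence], dim=1)
        return self.net(x)

# Usage on a batch of 256x256 inputs.
net = ExemplarColorizationNet()
ab = net(torch.rand(1, 1, 256, 256),   # grayscale L channel
         torch.rand(1, 2, 256, 256),   # warped reference ab channels
         torch.rand(1, 1, 256, 256))   # matching confidence
```

Training such a network end-to-end, rather than hand-crafting selection rules, is what lets it degrade gracefully to dataset-learned colors when the reference is unrelated to the input.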
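The graph-coloring label reduction in the video part can be sketched as follows, under the assumption that two source videos whose spatial footprints never overlap can safely share one optimization label; the greedy coloring below is a simple illustration of that idea, not the thesis algorithm:

```python
def color_overlap_graph(overlaps, n_videos):
    """Greedily color the video overlap graph so that any two source videos
    with overlapping spatial footprints receive different colors. Videos
    sharing a color never compete for the same output pixel, so each color
    class can be collapsed into a single label, shrinking the label set of
    the combinatorial optimization.

    overlaps : set of (i, j) pairs of videos whose footprints overlap
    """
    adj = {v: set() for v in range(n_videos)}
    for i, j in overlaps:
        adj[i].add(j)
        adj[j].add(i)

    color = {}
    for v in sorted(adj, key=lambda v: -len(adj[v])):  # high degree first
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:  # smallest color unused by any colored neighbor
            c += 1
        color[v] = c
    return color  # video index -> reduced label

# Four videos where only (0,1) and (1,2) overlap collapse to two labels.
print(color_overlap_graph({(0, 1), (1, 2)}, 4))
```

Since registered panoramic captures overlap only with near neighbors, the overlap graph is sparse and the number of reduced labels stays small even with hundreds of source videos.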
Date: Thursday, 1 November 2018
Time: 3:00pm - 5:00pm
Venue: Room 3494 (Lifts 25/26)

Chairman: Prof. Bing-Yi Jing (MATH)

Committee Members: Prof. Pedro Sander (Supervisor)
                   Prof. Huamin Qu
                   Prof. Chiew-Lan Tai
                   Prof. Ajay Joneja (ISD)
                   Prof. Tien-Tsin Wong (CUHK)

**** ALL are Welcome ****