The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Photorealistic Digital Humans: From Controllable Talking Heads to
Free‑Viewpoint 4D Avatars"
By
Mr. Fating HONG
Abstract:
Human-centric video generation, encompassing facial synthesis and full-body
animation, is crucial for virtual reality and digital content creation.
However, accurately capturing complex human dynamics remains challenging. Key
bottlenecks include maintaining 3D structural consistency, handling severe
occlusions from large poses, harmonizing conflicting multi-modal driving
signals, and preserving rendering quality across extreme viewpoints and
scales. Existing methods often struggle to balance these factors, resulting
in structural artifacts, control conflicts, and degraded visual fidelity.
This thesis addresses these challenges through a series of novel deep
generative frameworks, progressing from localized talking head synthesis to
free-viewpoint 4D human animation. First, to ensure 3D structural integrity
in facial generation, it introduces DaGAN and DaGAN++, leveraging
self-supervised dense geometry learning. To handle severe occlusions, it
develops MCNet, which queries a global facial meta-memory via implicit
identity representations to robustly recover missing regions. Second, it
resolves multi-modal control conflicts with ACTalker, a video diffusion model
utilizing a Parallel-Control Mamba architecture to harmoniously synchronize
audio-driven lip movements and visually-driven global expressions. Third,
expanding to full-body dynamics under novel views, it proposes a
free-viewpoint diffusion framework using pose-correlated adaptive token
selection to efficiently aggregate appearance context from multiple
references. Finally, to overcome the scale inconsistency of discrete 3D
Gaussian Splatting, it introduces ContinueAvatar. By integrating a continuous
feature splatting mechanism, this framework formulates rendering as a
continuous spatial function to achieve arbitrary-resolution 4D avatars.
Extensive empirical evaluations demonstrate that these contributions achieve
state-of-the-art realism, controllability, and multi-scale robustness.
Date: Wednesday, 6 May 2026
Time: 10:00 am - 12:00 noon
Venue: Room 2132C (Lift 22)
Chairman: Prof. Qinglu ZENG (OCES)
Committee Members: Dr. Dan XU (Supervisor)
Dr. Long CHEN
Dr. Qifeng CHEN
Dr. Wenhan LUO (AMC)
Prof. Lu SHENG (Beihang University)