Efficient Neural Networks for Image Recognition and Generation on the Edge

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Efficient Neural Networks for Image Recognition and Generation on the 
Edge"

By

Mr. Jierun CHEN


Abstract:

Neural networks have prevailed in many fields, with image recognition and 
generation as prominent examples. However, the rapid growth in model size and 
complexity has made them increasingly reliant on cloud servers or high-end 
GPUs. This dependency incurs high operational costs, privacy concerns, and 
round-trip latency. Deploying models to edge and mobile devices is emerging 
as a promising solution but often faces challenges such as limited memory, 
long runtime, and unsatisfactory user experience. In the post-Moore's Law 
era, hardware advancements alone are insufficient to overcome these 
challenges, highlighting the growing need for designing efficient neural 
networks for resource-constrained environments.

This thesis addresses these critical efficiency bottlenecks by introducing 
novel operators and architectures that enhance inference speed and reduce 
model size across various tasks, ranging from layout-specific applications to 
general visual tasks, and from image recognition to generation. First, we 
propose Translation Variant Convolution (TVConv), a novel operator optimized 
for layout-specific applications, such as face recognition, by leveraging 
spatial feature variance for efficient region-wise processing. Second, we 
identify inefficiencies in the popular depthwise convolution, such as low 
compute intensity and frequent memory access, and present Partial Convolution 
(PConv) to overcome them. Building on this, we develop FasterNet, a 
family of neural networks that achieves considerably faster running speeds 
across various devices without sacrificing recognition accuracy. Finally, we 
introduce SnapGen, a highly compact and fast text-to-image model, capable of 
generating high-quality, high-resolution images directly and instantly on 
mobile devices. These contributions collectively advance the democratization 
of neural networks on edge devices, enabling seamless, cost-effective, and 
privacy-preserving services accessible anytime, anywhere.
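
As a rough illustration of the "low compute intensity" remark about depthwise 
convolution, the sketch below compares a regular and a depthwise convolution 
layer under a simple cost model. The cost model (multiply-accumulates divided 
by the number of elements read and written, ignoring caching) and the layer 
sizes are assumptions chosen for illustration; they are not figures or 
methods taken from the thesis.

```python
def conv_cost(h, w, c_in, c_out, k, depthwise=False):
    """Estimate the cost of one stride-1, same-padding convolution layer.

    Returns (macs, memory, intensity) where
      macs      = multiply-accumulate operations,
      memory    = elements touched (input + output + weights),
      intensity = macs / memory (a crude proxy for compute intensity).
    This is a hypothetical back-of-the-envelope model, not a profiler.
    """
    if depthwise:
        # Depthwise: each channel is filtered independently by one k x k kernel.
        assert c_in == c_out, "depthwise conv keeps the channel count"
        macs = h * w * k * k * c_in
        weights = k * k * c_in
    else:
        # Regular: every output channel mixes all input channels.
        macs = h * w * k * k * c_in * c_out
        weights = k * k * c_in * c_out
    memory = h * w * c_in + h * w * c_out + weights
    return macs, memory, macs / memory


# Illustrative layer size (an assumption): 56x56 feature map, 64 channels, 3x3 kernel.
regular = conv_cost(56, 56, 64, 64, 3)
depthwise = conv_cost(56, 56, 64, 64, 3, depthwise=True)

print(f"regular:   macs={regular[0]:,} memory={regular[1]:,} intensity={regular[2]:.2f}")
print(f"depthwise: macs={depthwise[0]:,} memory={depthwise[1]:,} intensity={depthwise[2]:.2f}")
```

Under this model the depthwise layer performs far fewer operations but reads 
and writes almost as many elements, so its operations-per-element ratio is 
much lower. That is one way to read the "frequent memory access" concern 
stated in the abstract.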


Date:                   Monday, 6 January 2025

Time:                   3:30pm - 5:30pm

Venue:                  Room 3494
                        Lifts 25/26

Chairman:               Dr. Tuan Anh NGUYEN (LIFS)

Committee Members:      Prof. Gary CHAN (Supervisor)
                        Prof. Qiong LUO
                        Dr. Dan XU
                        Dr. Jun ZHANG (ECE)
                        Dr. Lai Man PO (CityU)