Efficient Neural Networks for Image Recognition and Generation

PhD Thesis Proposal Defence


Title: "Efficient Neural Networks for Image Recognition and Generation"

by

Mr. Jierun CHEN


Abstract:

Over the past decade, neural networks have prevailed in many fields, with image 
recognition and generation as prominent examples. However, their rapid growth 
in model size and complexity has outpaced hardware gains as Moore's Law slows, 
restricting their deployment to cloud-based servers or high-performance GPUs. 
This dependency incurs significant operational costs, round-trip latency, 
reliance on internet connectivity, and privacy concerns due to data 
transmission to third parties. In resource-constrained environments, such as 
mobile and edge devices, these models also struggle with memory limitations, 
reduced processing speed, and poor user experience. Designing efficient neural 
networks is therefore essential to overcoming these challenges, democratizing 
neural networks, and unlocking their broader applications.

This thesis addresses the critical need for more efficient neural networks by 
introducing novel operators and architectures that enhance inference speed, 
reduce model size, and optimize memory usage across various tasks, from 
layout-specific to general visual applications, and from image recognition to 
generation. First, we propose Translation Variant Convolution (TVConv), a novel 
operator tailored for layout-specific tasks such as face recognition, which 
leverages spatial feature variance to enable efficient region-wise processing. 
Next, we identify inefficiencies in widely used operators, such as low compute 
intensity and frequent memory access, and present Partial Convolution (PConv) 
to overcome those inefficiencies. Building on this, we propose FasterNet, a 
family of neural networks that delivers considerably faster running speeds 
across multiple devices without sacrificing accuracy. Finally, we develop 
EfficientGen, a portable, cost-effective diffusion model for text-to-image 
generation on mobile devices that supports high resolutions and flexible 
aspect ratios and produces superior visual quality in under a second.


Date:                   Friday, 18 October 2024

Time:                   3:00pm - 5:00pm

Venue:                  Room 5501
                        Lifts 25/26

Committee Members:      Prof. Gary Chan (Supervisor)
                        Prof. Raymond Wong (Chairperson)
                        Prof. Pedro Sander
                        Dr. Dan Xu