Efficient Neural Networks for Image Recognition and Generation
PhD Thesis Proposal Defence

Title: "Efficient Neural Networks for Image Recognition and Generation"

by

Mr. Jierun CHEN

Abstract:

Over the past decade, neural networks have prevailed in many fields, with image recognition and generation as prominent examples. However, their rapid growth in model size and complexity has outpaced the hardware improvements available under a slowing Moore's Law, restricting their deployment to cloud-based servers or high-performance GPUs. This dependency incurs significant operational costs, round-trip latency, reliance on internet connectivity, and privacy concerns due to data transmission to third parties. In resource-constrained environments, such as mobile and edge devices, these models are further hampered by limited memory and slow processing, which leads to a poor user experience. Designing efficient neural networks is therefore essential to overcoming these challenges, democratizing neural networks, and unlocking their broader applications.

This thesis addresses the critical need for more efficient neural networks by introducing novel operators and architectures that enhance inference speed, reduce model size, and optimize memory usage across various tasks, from layout-specific to general visual applications, and from image recognition to generation.

First, we propose Translation Variant Convolution (TVConv), a novel operator tailored for layout-specific tasks such as face recognition, which leverages spatial feature variance to enable efficient region-wise processing. Next, we identify inefficiencies in widely used operators, such as low compute intensity and frequent memory access, and present Partial Convolution (PConv) to overcome them. Building on this, we propose FasterNet, a family of neural networks that delivers considerably faster running speeds across multiple devices without sacrificing accuracy.
Finally, we develop EfficientGen, a portable, cost-effective diffusion model for text-to-image generation on mobile devices that supports high resolutions and flexible aspect ratios and produces superior visual quality in under a second.

Date: Friday, 18 October 2024
Time: 3:00pm - 5:00pm
Venue: Room 5501, Lifts 25/26

Committee Members:
Prof. Gary Chan (Supervisor)
Prof. Raymond Wong (Chairperson)
Prof. Pedro Sander
Dr. Dan Xu