Integer-Only Training and Low-Bit Quantization of Deep Neural Networks without Graphics Processing Units
PhD Thesis Proposal Defence

Title: "Integer-Only Training and Low-Bit Quantization of Deep Neural Networks without Graphics Processing Units"

by

Mr. Jaewoo SONG

Abstract:

Deep neural networks (DNNs) have become essential in diverse fields, driven by advancements in graphics processing units (GPUs). However, the increasing complexity of DNN models requires substantial computational resources, posing challenges for training and inference without access to high-performance hardware. This research addresses these challenges by exploring integer-based approaches to DNN training and inference, aiming to reduce storage and computational demands.

For training, the study introduces a novel application of direct feedback alignment (DFA) to enable integer-only training, overcoming the limitations of backpropagation, which is prone to integer overflow. The resulting framework, PocketNN, implemented in pure C++, demonstrates successful integer-only training and has been adopted by other research projects for its compatibility with low-power edge devices.

For inference, the study targets the quantization of large language models (LLMs). The research proposes SplitQuantV2, an algorithm that splits weight-bearing layers in DNNs to increase quantization resolution. Whereas most quantization algorithms specialized for LLMs require high-end GPUs and a calibration dataset, SplitQuantV2 requires neither and can run efficiently on CPUs. Tested on the Llama 3.2 1B Instruct model, SplitQuantV2 significantly improves the accuracy of INT4 quantized models, achieving results comparable to their floating-point counterparts while requiring minimal computational resources.

Together, PocketNN and SplitQuantV2 advance the field of integer-based DNN training and quantization. By promoting the deployment of artificial intelligence (AI) in environments with limited hardware capabilities and contributing to energy conservation, this research will ultimately broaden the accessibility and applicability of AI technologies.

Date: Monday, 17 February 2025
Time: 9:45am - 11:30am
Venue: Room 4472 (Lifts 25/26)

Committee Members:
Prof. Fangzhen Lin (Supervisor)
Prof. Dit-Yan Yeung (Chairperson)
Dr. Shuai Wang
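For readers unfamiliar with direct feedback alignment, the sketch below illustrates the general idea behind integer-only DFA training mentioned in the abstract: the output error is projected to the hidden layer through a fixed random integer matrix instead of through transposed weights, so weight updates avoid the long multiplication chains of backpropagation that can overflow integer arithmetic. All names, dimensions, and shift amounts here are illustrative assumptions and are not taken from the PocketNN API.

```cpp
// A minimal sketch of integer-only direct feedback alignment (DFA), assuming
// a single hidden layer and rescaling by arithmetic right shifts. The names
// and constants are illustrative only, not PocketNN's actual implementation.
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <vector>

using Mat = std::vector<std::vector<int32_t>>;

// Fill a matrix with small random integers in [-8, 7]; in DFA the feedback
// matrix is fixed at initialization and never trained.
Mat randomMat(int rows, int cols, unsigned seed) {
    std::srand(seed);
    Mat m(rows, std::vector<int32_t>(cols));
    for (auto& row : m)
        for (auto& v : row) v = (std::rand() % 16) - 8;
    return m;
}

// Integer matrix-vector product followed by a right shift, which plays the
// role of rescaling so that activations stay in a small integer range.
std::vector<int32_t> matVecShift(const Mat& W, const std::vector<int32_t>& x, int shift) {
    std::vector<int32_t> y(W.size(), 0);
    for (size_t i = 0; i < W.size(); ++i) {
        for (size_t j = 0; j < x.size(); ++j) y[i] += W[i][j] * x[j];
        y[i] >>= shift;
    }
    return y;
}

int main() {
    // Toy dimensions: 4 inputs, 3 hidden units, 2 outputs.
    Mat W1 = randomMat(3, 4, 1), W2 = randomMat(2, 3, 2);
    Mat B1 = randomMat(3, 2, 3);              // fixed random feedback matrix

    std::vector<int32_t> x = {10, -3, 7, 2};  // integer input
    std::vector<int32_t> t = {5, -5};         // integer target

    for (int step = 0; step < 100; ++step) {
        // Forward pass, all in integers.
        std::vector<int32_t> h = matVecShift(W1, x, 3);
        std::vector<int32_t> y = matVecShift(W2, h, 3);

        // Output error e = y - t.
        std::vector<int32_t> e(y.size());
        for (size_t i = 0; i < y.size(); ++i) e[i] = y[i] - t[i];

        // DFA: the hidden-layer error signal is B1 * e (a single fixed random
        // projection), not a backpropagated chain of weight transposes, so
        // intermediate values stay small enough for integer arithmetic.
        std::vector<int32_t> dh = matVecShift(B1, e, 3);

        // Integer SGD-style updates; the right shift acts as a learning rate.
        for (size_t i = 0; i < W2.size(); ++i)
            for (size_t j = 0; j < h.size(); ++j)
                W2[i][j] -= (e[i] * h[j]) >> 6;
        for (size_t i = 0; i < W1.size(); ++i)
            for (size_t j = 0; j < x.size(); ++j)
                W1[i][j] -= (dh[i] * x[j]) >> 6;
    }
    std::cout << "trained with integer-only DFA updates\n";
    return 0;
}
```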
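The second sketch illustrates, in simplified form, why splitting a layer's weights before low-bit quantization can increase resolution: with a single INT4 scale, a few outlier weights force a coarse step size for every value, whereas quantizing each value group with its own scale reduces the reconstruction error. The grouping strategy, group count, and error metric below are illustrative assumptions and do not reproduce the actual SplitQuantV2 algorithm.

```cpp
// A minimal sketch of the general idea behind splitting a weight tensor before
// INT4 quantization: partition the weights into value groups and give each
// group its own scale. Illustrative only; not the SplitQuantV2 implementation.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

// Quantize one group of weights to signed INT4 ([-8, 7]) with its own scale,
// then dequantize; returns the mean squared reconstruction error.
double quantizeGroup(const std::vector<float>& w) {
    float maxAbs = 1e-8f;
    for (float v : w) maxAbs = std::max(maxAbs, std::fabs(v));
    float scale = maxAbs / 7.0f;
    double mse = 0.0;
    for (float v : w) {
        int8_t q = static_cast<int8_t>(std::max(-8.0f, std::min(7.0f, std::round(v / scale))));
        float back = q * scale;
        mse += static_cast<double>(v - back) * (v - back);
    }
    return mse / w.size();
}

int main() {
    // Toy layer: most weights are small, a few outliers are large, which is
    // exactly the case where a single INT4 scale wastes resolution.
    std::vector<float> weights;
    for (int i = 0; i < 1000; ++i) weights.push_back(0.01f * ((i % 200) - 100));
    weights.push_back(5.0f);
    weights.push_back(-5.0f);

    // Baseline: one scale for the whole layer.
    double singleScaleMse = quantizeGroup(weights);

    // "Split" version: partition the sorted weights into three contiguous
    // value ranges (a simple stand-in for clustering) and quantize each
    // range separately with its own scale.
    std::vector<float> sorted = weights;
    std::sort(sorted.begin(), sorted.end());
    size_t n = sorted.size(), third = n / 3;
    std::vector<std::vector<float>> groups;
    groups.emplace_back(sorted.begin(), sorted.begin() + third);
    groups.emplace_back(sorted.begin() + third, sorted.begin() + 2 * third);
    groups.emplace_back(sorted.begin() + 2 * third, sorted.end());

    double splitMse = 0.0;
    for (const auto& g : groups) splitMse += quantizeGroup(g) * g.size();
    splitMse /= n;

    std::cout << "single-scale INT4 MSE: " << singleScaleMse << "\n";
    std::cout << "split INT4 MSE:        " << splitMse << "\n";
    return 0;
}
```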