Towards Sustainable Scale-Up of LLMs
Speaker: Dr. Jie FU, Visiting Scholar, Hong Kong University of Science and Technology

Title: "Towards Sustainable Scale-Up of LLMs"

Date: Monday, 18 March 2024
Time: 4:00pm - 5:00pm
Venue: Lecture Theater F (Leung Yat Sing Lecture Theater), near Lifts 25/26, HKUST

Abstract:
Increasing the model size, dataset size, and amount of training compute has been shown to steadily improve the performance of Large Language Models (LLMs). However, unlike labs affiliated with companies such as Google, which have access to vast computational resources, academic labs must find alternative and more sustainable ways of scaling up LLMs. In this talk, I will describe our journey of pre-training a 7B-parameter model from scratch at the Hong Kong University of Science and Technology (HKUST). I will delve into the technical aspects of our approach, including the model architecture, the training dataset, and the optimization techniques employed. I will also discuss the computational resources and infrastructure used, highlighting the challenges we faced and the solutions we implemented to overcome them in an academic setting. Beyond the practical experience of training a large-scale LLM, I will share some of our ongoing investigations into modular design and continual learning as potential avenues for sustainable scale-up.

*********************

Biography:
Jie Fu is a visiting scholar at the Hong Kong University of Science and Technology (HKUST). He was a postdoctoral fellow (funded by Microsoft Research Montreal) supervised by Yoshua Bengio at the University of Montreal, Quebec AI Institute (Mila), and an IVADO postdoctoral fellow supervised by Chris Pal at Polytechnique Montreal, Quebec AI Institute (Mila). He worked as a researcher (PI) at the Beijing Academy of Artificial Intelligence, and obtained his PhD from the National University of Singapore under the supervision of Tat-Seng Chua. He received the ICLR 2021 Outstanding Paper Award. His research agenda centers on establishing an LLM within a unified, safe, and scalable world-model framework, enabling tight integration of System-1 (intuitive, fast, unconscious) and System-2 (slow, logical, conscious, algorithmic, planning) functionalities.