Towards Sustainable Scale-Up of LLMs

Speaker: Dr. Jie FU
         Visiting scholar
         Hong Kong University of Science and Technology

Title:  "Towards Sustainable Scale-Up of LLMs"

Date:   Monday, 18 March 2024

Time:   4:00pm - 5:00pm

Venue:  Lecture Theater F
        (Leung Yat Sing Lecture Theater), near lift 25/26


Increasing the model size, dataset size, and amount of compute for
training has been shown to steadily improve the performance of Large
Language Models (LLMs). However, unlike labs affiliated with companies
like Google, which have access to vast computational resources, academic
labs face the challenge of finding alternative and more sustainable ways
of scaling up LLMs. In this talk, I will describe our journey of
pre-training a 7B-parameter model from scratch at the Hong Kong
University of Science and Technology (HKUST). I will delve into the
technical aspects of our approach, including the architecture of our
model, the training dataset, and the optimization techniques employed.
Furthermore, I will discuss the computational resources and infrastructure
utilized, highlighting the challenges faced and the solutions implemented
to overcome them within an academic setting. In addition to the practical
experience of training a large-scale LLM, I will also share some of our
ongoing investigations into modular design and continual learning as
potential avenues for sustainable scale-up.


Jie Fu is a visiting scholar at the Hong Kong University of Science and
Technology (HKUST). He was a postdoctoral fellow (funded by Microsoft
Research Montreal) supervised by Yoshua Bengio at the University of
Montreal, Quebec AI Institute (Mila). He was an IVADO postdoctoral fellow
supervised by Chris Pal at Polytechnique Montreal, Quebec AI Institute
(Mila). He worked as a researcher (PI) at the Beijing Academy of
Artificial Intelligence. He obtained his PhD from the National University
of Singapore under the supervision of Tat-Seng Chua. He received an ICLR
2021 Outstanding Paper Award. His research agenda is centered around
establishing an LLM within a unified, safe, and scalable world model
framework, enabling tight integration of System-1 (e.g., intuitive, fast,
unconscious) and System-2 (e.g., slow, logical, conscious, algorithmic,
planning) functionalities.