Guest Details
Hongxia Yang
Professor at Hong Kong Polytechnic University
Prof. Hongxia Yang is Associate Dean of the Faculty of Computer and Mathematical Sciences and Professor at The Hong Kong Polytechnic University. She received her PhD from Duke University, has published over 150 papers in top conferences and journals, and holds more than 50 patents. Her honors include the Super AI Leader (SAIL) Award, the highest prize of the 2019 World Artificial Intelligence Conference; the second prize of the 2020 National Science and Technology Progress Award; the first prize for Science and Technology Progress from the Chinese Institute of Electronics in 2021; the Forbes China Top 50 Women in Science and Technology and the Ministry of Education Science and Technology Progress Award (First Class) in 2022; the AI 2000 Most Influential Scholar Award since 2023; and the Top World 50 Women in AI and Web3 by CoinDesk in 2025. She previously served as Head of LLM at ByteDance US, AI Scientist and Director at Alibaba Group, Principal Data Scientist at Yahoo! Inc., and Research Staff Member at IBM T.J. Watson Research Center, and was a joint adjunct professor at the Zhejiang University Shanghai Advanced Research Institute. She founded the foundation model teams at both Alibaba and ByteDance and is a pioneer in the field of Generative AI.
Talk
Title: Co-GenAI: A Novel Fusion-Driven Platform
Abstract: We introduce Co-GenAI, a platform designed to make AI development more accessible and efficient. At its core is Domain-Adaptive Continual Pretraining (DACP), which enhances Large Language Models by continuing to pretrain them on domain-specific unlabeled data, enabling effective specialization in enterprise and scientific domains that are often underrepresented in general web data. In these target domains, DACP consistently outperforms existing open-source models and mainstream language models such as ChatGPT, while keeping GPU costs low.

Complementing DACP is an Advanced Model Fusion Infrastructure built on a "Model over Models" (MoM) framework, which integrates top-performing domain-specific models regardless of their pretraining structures. This makes it possible to create foundation models by fusing existing models rather than training from scratch, drastically reducing computational requirements: merging four top models takes about 160 GPU hours, compared with the 1-1.6 million GPU hours typically needed to train a foundation model from scratch.

Finally, the platform's Resource-Efficient Architecture further democratizes AI development by making effective use of distributed, entry-level GPU resources. By leveraging distributed high-performance computing centers equipped with diverse computing accelerators, the platform trains foundation models through the fusion of smaller language models (SLMs), offering a viable alternative to training large foundation models from scratch. This approach reduces dependence on massive centralized computational resources, fostering greater innovation and diversity in the field.
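As background for the talk, the sketch below illustrates the general recipe of continual pretraining on unlabeled domain text using the Hugging Face Transformers Trainer. The base checkpoint, file path, and hyperparameters are illustrative placeholders; this is a minimal sketch of the generic technique, not the DACP system described in the abstract.

```python
# Minimal sketch of continual pretraining on an unlabeled domain corpus.
# Model name, data path, and hyperparameters are placeholder assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "meta-llama/Llama-2-7b-hf"      # placeholder base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Unlabeled, domain-specific text, one document per line (placeholder file).
corpus = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="dacp-checkpoint",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=8,
        learning_rate=1e-5,        # small LR to limit forgetting of general ability
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=tokenized,
    # Causal-LM collator: next-token prediction on raw domain text, no labels needed.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("dacp-checkpoint")
```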
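As background on model fusion, the next sketch shows the simplest merging baseline: averaging the parameters of checkpoints that share one architecture. It is only an illustration of the general idea of building one model from several; the MoM framework in the abstract goes further by fusing models with different pretraining structures, which plain averaging cannot do. The checkpoint names in the usage comment are hypothetical.

```python
# Baseline parameter-averaging merge for same-architecture checkpoints.
import torch
from transformers import AutoModelForCausalLM

def average_checkpoints(model_names):
    """Return a model whose floating-point parameters are the element-wise mean
    of the given same-architecture checkpoints."""
    models = [AutoModelForCausalLM.from_pretrained(name) for name in model_names]
    merged = models[0]
    merged_state = merged.state_dict()
    states = [m.state_dict() for m in models]
    for key, value in merged_state.items():
        if value.is_floating_point():
            merged_state[key] = (
                torch.stack([s[key].float() for s in states], dim=0)
                .mean(dim=0)
                .to(value.dtype)
            )
    merged.load_state_dict(merged_state)
    return merged

# Hypothetical usage with placeholder checkpoints sharing one architecture:
# merged_model = average_checkpoints(["org/domain-model-a", "org/domain-model-b"])
# merged_model.save_pretrained("merged-model")
```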
