Safe and Value Alignment Towards Large AI Models

Speaker: Dr. Yaodong Yang
Peking University

Title: "Safe and Value Alignment Towards Large AI Models"

Date: Friday, 22 November 2024

Time: 6:00pm - 7:00pm

Venue: Room 4504 (via lifts 25/26), HKUST

Abstract:

The rapid advancements in large language models (LLMs) have brought both 
unprecedented opportunities and challenges, especially in ensuring that these 
systems act in alignment with human values and intentions. At the heart of 
current alignment efforts lies Reinforcement Learning from Human Feedback 
(RLHF), a widely used paradigm for guiding AI behavior. However, fundamental 
questions remain: Can LLMs truly be aligned? Can RLHF reliably converge? And 
most importantly, does alignment inherently make LLMs safer?

In this talk, I will delve into these critical questions, providing an 
in-depth analysis of the theoretical and practical aspects of AI alignment 
and RLHF. Key topics include the limitations of RLHF in achieving robust 
alignment, the safety implications of aligned models, and whether human 
feedback alone is sufficient to address the complex, dynamic nature of 
alignment challenges. Additionally, I will explore the future of AI alignment 
through the lens of next-generation methods, including multi-objective 
optimization and multi-modal learning frameworks.

By addressing these pressing questions, the talk aims to highlight both the 
current state and the future direction of AI alignment research. It will 
provide insights into how we can move beyond RLHF to develop scalable, 
adaptive, and safer alignment methodologies, ensuring that AI systems 
contribute positively to human society. This session is designed to provoke 
thought and inspire collaboration among researchers and practitioners 
dedicated to the alignment problem.
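
For attendees less familiar with RLHF, the short sketch below illustrates 
the two ingredients the abstract refers to: fitting a reward model to 
pairwise human preferences (a Bradley-Terry loss), and optimizing the 
policy against that reward under a KL penalty toward a frozen reference 
model. This is a minimal PyTorch illustration, not the speaker's 
implementation; the names (RewardModel, preference_loss, 
kl_regularized_objective) are illustrative, and random embeddings stand in 
for LLM hidden states.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RewardModel(nn.Module):
        """Toy scalar reward head over pooled response embeddings."""
        def __init__(self, dim: int):
            super().__init__()
            self.score = nn.Linear(dim, 1)

        def forward(self, h: torch.Tensor) -> torch.Tensor:
            return self.score(h).squeeze(-1)

    def preference_loss(r_chosen, r_rejected):
        # Bradley-Terry negative log-likelihood on pairwise preferences:
        # minimize -log sigmoid(r_chosen - r_rejected).
        return -F.logsigmoid(r_chosen - r_rejected).mean()

    def kl_regularized_objective(reward, logp_policy, logp_ref, beta=0.1):
        # RLHF policy objective: maximize reward while staying close to a
        # frozen reference model; the per-sample KL estimate is
        # logp_policy - logp_ref.
        return (reward - beta * (logp_policy - logp_ref)).mean()

    # Toy usage: random embeddings stand in for pooled LLM hidden states.
    rm = RewardModel(dim=16)
    h_chosen, h_rejected = torch.randn(4, 16), torch.randn(4, 16)
    loss = preference_loss(rm(h_chosen), rm(h_rejected))
    loss.backward()  # gradients flow into the reward head

The beta-weighted KL term is the standard device for keeping the tuned 
policy close to its pretrained reference model during optimization.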


****************
Biography:

Dr. Yaodong Yang is a Boya Assistant Professor at the Peking University 
Institute for Artificial Intelligence and Deputy Director of the Centre 
for AI Safety and Governance. His research focuses on safe human-AI 
interaction and value alignment, covering areas such as reinforcement 
learning, AI alignment, multi-agent learning, and embodied AI. He has 
published more than 100 papers at top conferences and journals (Nature 
Machine Intelligence, Artificial Intelligence, JMLR, IEEE T-PAMI, National 
Science Review), with more than 6,000 Google Scholar citations. His honors 
include a place on the Best Paper Award shortlist at ICCV'23, the Best 
System Paper Award at CoRL'20, the Best Blue-Sky Paper Award at AAMAS'21, 
and Rising Star Awards from ACM SIGAI China and the 2022 World AI 
Conference. He has also led the alignment efforts for the Baichuan2, 
Pengcheng Naohai 33B, and Hong Kong HKGAI LLMs. Dr. Yang serves as an Area 
Chair for ICLR, NeurIPS, AAAI, IJCAI, and AAMAS, and as an Action Editor 
for Neural Networks and Transactions on Machine Learning Research. 
Previously, he was an Assistant Professor at King's College London, a 
Principal Researcher at Huawei UK, and a Senior Manager at American 
International Group (AIG). He earned his bachelor's degree from the 
University of Science and Technology of China, a master's degree from 
Imperial College London, and a Ph.D. from University College London, which 
nominated his dissertation for the ACM SIGAI Doctoral Dissertation Award.