Safe and Value Alignment Towards Large AI Models
Speaker: Dr. Yaodong Yang, Peking University
Title: "Safe and Value Alignment Towards Large AI Models"
Date: Friday, 22 November 2024
Time: 6:00pm - 7:00pm
Venue: Room 4504 (via lift 25/26), HKUST

Abstract:

The rapid advancements in large language models (LLMs) have brought both unprecedented opportunities and challenges, especially in ensuring that these systems act in alignment with human values and intentions. At the heart of current alignment efforts lies Reinforcement Learning from Human Feedback (RLHF), a widely used paradigm for guiding AI behavior. However, fundamental questions remain: Can LLMs truly be aligned? Can RLHF reliably converge? And, most importantly, does alignment inherently make LLMs safer?

In this talk, I will delve into these critical questions, providing an in-depth analysis of the theoretical and practical aspects of AI alignment and RLHF. Key topics include the limitations of RLHF in achieving robust alignment, the safety implications of aligned models, and whether human feedback alone is sufficient to address the complex, dynamic nature of alignment challenges. Additionally, I will explore the future of AI alignment through the lens of next-generation methods, including multi-objective optimization and multi-modal learning frameworks.

By addressing these pressing questions, the talk aims to highlight both the current state and the future direction of AI alignment research. It will provide insights into how we can move beyond RLHF to develop scalable, adaptive, and safer alignment methodologies, ensuring that AI systems contribute positively to human society. This session is designed to provoke thought and inspire collaboration among researchers and practitioners dedicated to the alignment problem.

Biography:

Dr. Yaodong Yang is a Boya Assistant Professor at the Peking University Institute for Artificial Intelligence and Deputy Director of the Centre for AI Safety and Governance. Dr. Yang's research focuses on safe human-AI interaction and value alignment, covering areas such as reinforcement learning, AI alignment, multi-agent learning, and embodied AI. He has a track record of more than 100 publications at top conferences and in leading journals (Nature Machine Intelligence, Artificial Intelligence, JMLR, IEEE T-PAMI, National Science Review), with over 6,000 Google Scholar citations. His honors include selection for the Best Paper Award initial list at ICCV'23, the Best System Paper Award at CoRL'20, the Best Blue-Sky Paper Award at AAMAS'21, and the Rising Star Award from ACM SIGAI China and the World AI Conference 2022. He has also led the alignment efforts for the Baichuan2, Pengcheng Naohai 33B, and Hong Kong HKGAI LLMs. Dr. Yang serves as an Area Chair for ICLR, NeurIPS, AAAI, IJCAI, and AAMAS, and as an Action Editor for Neural Networks and Transactions on Machine Learning Research. Previously, he was an Assistant Professor at King's College London, a Principal Researcher at Huawei UK, and a Senior Manager at American International Group (AIG). He earned his bachelor's degree from the University of Science and Technology of China, his master's degree from Imperial College London, and his Ph.D. from University College London, where his dissertation was nominated by UCL for the ACM SIGAI Doctoral Dissertation Award.