Towards Efficient Multi-objective Alignment of Large Language Models

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


MPhil Thesis Defence


Title: "Towards Efficient Multi-objective Alignment of Large Language Models"

By

Mr. Rui YANG


Abstract:

This study addresses the challenge of multi-objective alignment of foundation 
models, particularly Large Language Models (LLMs), with human preferences, a 
crucial step towards developing helpful and harmless AI systems. Fine-tuning 
large foundation models with reinforcement learning (RL) is often costly and 
unstable. In addition, the multi-dimensional, heterogeneous, and often 
conflicting nature of human preferences further complicates the alignment 
process. In this thesis, we introduce Rewards-in-Context (RiC), a novel approach 
that conditions the responses of a foundation model on multiple rewards within 
its prompt context and uses supervised fine-tuning for alignment. RiC is simple 
and adaptable: it requires only supervised fine-tuning of a single foundation 
model and allows user preferences to be adjusted dynamically at inference time. 
Inspired by the analytical solution of an abstracted convex optimization 
problem, our inference-time adjustment method approximates the Pareto-optimal 
solution for multiple objectives. Empirical results show that our method aligns 
LLMs to accommodate diverse rewards while using only about 10% of the GPU hours 
required by multi-objective RL baselines.


Date:                   Thursday, 8 August 2024

Time:                   10:00am - 12:00noon

Venue:                  Room 4475
                        Lifts 25/26

Chairman:               Dr. Ling PAN

Committee Members:      Dr. Junxian HE (Supervisor)
                        Dr. Dongdong SHE