Advancing Calibration of Language Models for Text Generation
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Advancing Calibration of Language Models for Text Generation"

By

Mr. Dongkyu LEE

Abstract:

Over the past few years, the natural language processing community has witnessed rapid development in language models. There is a high level of public interest in these models, particularly generative ones, and language models have started to appear in user-facing applications. While improving model performance has always been a key objective, the widespread use of language models has also brought attention to the importance of modeling "reliable" probabilities. In many scenarios, a prediction by a model is coupled with a corresponding probability, commonly referred to as a confidence score, and this score carries meaningful information: it indicates how confident the model is in the prediction and thus how trustworthy the prediction is. However, this interpretation rests on the assumption that the model is calibrated. When a model is miscalibrated, the confidence score is no longer a meaningful indicator.

Model calibration involves refining a model to produce accurate probability estimates. Specifically, the probability mapped by a model is expected to accurately reflect the likelihood of the corresponding prediction being correct. Calibration is arguably most important in the natural language generation domain: in other domains a probability is merely an indicator of the model's certainty, whereas for a language model that generates text autoregressively, calibration has a direct impact on the model outputs. A language model produces text with a decoding algorithm, and the decoding scheme operates on the probability distributions mapped by the model. Because of this distinct nature of language generation models, calibration directly shapes the generated output and therefore requires a thorough and extensive study.

In this thesis, we present three novel methods, specifically designed for text generation, that improve model calibration. Firstly, we propose a student-teacher framework that calibrates a language model. Given a calibrated teacher model, a student model not only benefits from the distilled knowledge, but also learns to match the calibrated scores mapped by the teacher model. The next method is a novel regularization scheme that not only improves model performance, but also reduces the model's calibration error. The regularizer is a variant of label smoothing, a popular regularization method. The proposed scheme self-regulates the extent of smoothing based on the confidence score the model assigns during training, making the language model less likely to produce overconfident predictions, a common form of miscalibration. Lastly, this thesis presents a novel decoding scheme rooted in the concept of model calibration. A long-standing problem with language models is repetition in their outputs. We view this problem as a consequence of miscalibration, and in this context our decoding scheme applies post-hoc calibration in the course of inference.

Date: Monday, 4 December 2023
Time: 2:00pm - 4:00pm
Venue: Room 3494 (Lifts 25/26)

Chairman: Prof. Man Hoi WONG (ECE)
Committee Members: Prof. Nevin ZHANG (Supervisor)
                   Prof. Junxian HE
                   Prof. Brian MAK
                   Prof. Zhiyao XIE (ECE)
                   Prof. Wai LAM (CUHK)

**** ALL are Welcome ****
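
As background for the notion of calibration discussed in the abstract, the sketch below shows one common way calibration is measured, expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between average confidence and actual accuracy is averaged across bins. This is a minimal illustration only, not code from the thesis; the function name, binning scheme, and toy data are assumptions made for the example.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative expected calibration error (ECE).

    confidences: array of the model's top predicted probabilities.
    correct:     array of 0/1 flags, 1 if the prediction was right.
    A well-calibrated model has accuracy close to confidence in every
    bin, so its ECE is close to zero.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        bin_conf = confidences[in_bin].mean()  # average confidence in this bin
        bin_acc = correct[in_bin].mean()       # fraction actually correct in this bin
        ece += in_bin.mean() * abs(bin_acc - bin_conf)
    return ece

# Toy usage: a model reports ~0.9 confidence but is right only ~60% of the time,
# i.e. it is overconfident, so the ECE is noticeably above zero.
rng = np.random.default_rng(0)
conf = rng.uniform(0.85, 0.95, size=1000)
hits = rng.random(1000) < 0.6
print(expected_calibration_error(conf, hits))
```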