PhD Thesis Proposal Defence

Title: "Advancing Calibration of Language Models for Text Generation"


Mr. Dongkyu LEE


Over the past few years, the natural language processing community has 
witnessed rapid development in language models. Public interest in these 
models, particularly generative ones, is high, and language models have begun 
to appear in user-facing applications. While improving model performance has 
always been a key objective, the widespread use of language models has also 
drawn attention to the importance of modeling “reliable” probabilities.

In many scenarios, a model's prediction is coupled with a corresponding 
probability, commonly referred to as a confidence score, and this score 
carries meaningful information: it indicates how certain the model is in 
making the prediction and thus represents the level of trustworthiness of the 
prediction. However, the underlying assumption is that the model is 
calibrated. When a model is miscalibrated, the confidence score is no longer 
a meaningful indicator.

Model calibration involves refining a model to produce accurate probability 
estimates. Specifically, the probability mapped by a model is expected to 
accurately reflect the likelihood that the corresponding prediction is 
correct. Calibration is arguably of utmost significance in the natural 
language generation domain. In other domains, a probability is merely an 
indicator of a model's certainty; with a language model that generates text 
in an autoregressive manner, however, calibration directly shapes the model's 
outputs, since the decoding algorithm that produces the text operates on the 
probability distributions mapped by the model. Due to this distinct nature of 
language generation models, calibration of language models requires a 
thorough and extensive study.
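To make "accurate probability estimates" concrete, a widely used diagnostic is the expected calibration error (ECE), which bins predictions by confidence and compares each bin's average confidence against its empirical accuracy. The sketch below uses hypothetical toy predictions (not data from the thesis):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence; ECE is the size-weighted gap
    between average confidence and empirical accuracy per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 -> last bin
        bins[idx].append((c, ok))
    ece = 0.0
    for bucket in bins:
        if bucket:
            avg_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(ok for _, ok in bucket) / len(bucket)
            ece += (len(bucket) / n) * abs(avg_conf - accuracy)
    return ece

# Well calibrated: 80% confidence, 80% accuracy -> ECE near 0.
print(expected_calibration_error([0.8] * 10, [1] * 8 + [0] * 2))
# Overconfident: 90% confidence but only 50% accuracy -> large ECE.
print(expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5))
```

A calibrated model drives this gap toward zero, which is precisely the goal the thesis pursues for generation models.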

In this thesis, we present three novel calibration methods specifically 
designed for text generation language models. First, we propose a 
student-teacher framework that calibrates a language model: given a 
calibrated teacher model, a student model not only benefits from the 
distilled knowledge but also learns to match the calibrated scores mapped by 
the teacher. The second method is a novel regularization scheme that both 
improves model performance and reduces the calibration error of the model. 
The regularizer is a variant of label smoothing, a popular regularization 
method; it self-regulates the extent of smoothing based on the confidence 
score mapped by the model during training, making the language model less 
likely to make overconfident predictions, a common form of miscalibration. 
Lastly, this thesis presents a novel decoding scheme rooted in the concept of 
model calibration. A long-standing problem of language models is repetition 
in their outputs; we view this problem as a consequence of miscalibration. In 
this context, our novel decoding scheme applies post-hoc calibration in the 
course of inference.
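For readers unfamiliar with the term, one classic example of post-hoc calibration (a standard baseline, not the decoding scheme proposed in the thesis) is temperature scaling: rescaling a model's logits before the softmax so the resulting distribution is less peaked. A minimal sketch at a single decoding step, with hypothetical logits:

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into a probability distribution. A temperature
    above 1 flattens the distribution, tempering overconfidence."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract max for stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits; in practice the temperature is
# fitted on held-out data rather than chosen by hand.
logits = [4.0, 2.0, 1.0, 0.5]
print(max(softmax(logits)))                   # sharp, overconfident peak
print(max(softmax(logits, temperature=2.0)))  # flatter after rescaling
```

Because decoding algorithms consume these distributions directly, adjusting them at inference time changes the generated text itself, which is the intuition behind applying calibration during decoding.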

Date:			Wednesday, 27 September 2023

Time:			11:00am - 1:00pm

Venue:			Room 5510
			lifts 25/26

Committee Members:	Prof. Nevin Zhang (Supervisor)
 			Prof. Fangzhen Lin (Chairperson)
 			Dr. Brian Mak
 			Dr. Yangqiu Song

**** ALL are Welcome ****