Advancing Calibration of Language Models for Text Generation

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Advancing Calibration of Language Models for Text Generation"


Mr. Dongkyu LEE


Over the past few years, the natural language processing community has
witnessed rapid development of language models. Public interest in these
models, particularly generative ones, is high, and language models have begun
to appear in user-facing applications. While improving model performance has
always been a key objective, the widespread use of language models has also
drawn attention to the importance of modeling "reliable" probabilities.

In many scenarios, a model's prediction is paired with a probability, commonly
referred to as a confidence score, and this score carries meaningful
information: it indicates how confident the model is in the prediction and,
in turn, how far the prediction can be trusted. However, this interpretation
rests on the assumption that the model is calibrated. When a model is
miscalibrated, the confidence score is no longer a meaningful indicator.

Model calibration involves refining a model so that it produces accurate
probability estimates. Specifically, the probability a model assigns to a
prediction is expected to reflect the likelihood that the prediction is
correct. Calibration is arguably most significant in the natural language
generation domain. In other domains, a probability is merely an indicator of
the model's certainty; for a language model that generates text
autoregressively, however, calibration directly affects the model's outputs.
A language model produces text with a decoding algorithm, and the decoding
algorithm operates on the probability distributions predicted by the model.
Owing to this distinct nature of language generation, the calibration of
language models warrants thorough and extensive study.
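The requirement that a model's probability "reflect the likelihood that the
prediction is correct" is commonly measured with the expected calibration
error (ECE). The sketch below is a generic illustration, not a metric taken
from the thesis: it assumes equal-width confidence bins, and the function name
and the default of 10 bins are illustrative choices. Within each bin it
compares average confidence with empirical accuracy and weights the gap by the
fraction of predictions falling in that bin.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE (illustrative sketch, not the thesis's code)."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equally long, non-empty inputs")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin; conf == 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1.0 for _, ok in members if ok) / len(members)
        # Gap between confidence and accuracy, weighted by bin size.
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# Overconfident: 90% confidence but only 50% of these predictions are correct.
print(expected_calibration_error([0.9] * 4, [True, False, True, False]))
```

A model whose 0.8-confidence predictions are correct 80% of the time scores
an ECE near zero; the overconfident example above has a gap of about 0.4.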

In this thesis, we present three novel methods, designed specifically for text
generation, for improving model calibration. First, we propose a
student-teacher framework that calibrates a language model. Given a calibrated
teacher model, a student model not only benefits from the distilled knowledge
but also learns to match the calibrated scores predicted by the teacher. The
second method is a novel regularization scheme that both improves model
performance and reduces the model's calibration error. The regularizer is a
variant of label smoothing, a popular regularization method. It
self-regulates the extent of smoothing based on the confidence scores the
model produces during training, so the language model is less likely to make
overconfident predictions, a form of miscalibration. Lastly, this thesis
presents a novel decoding scheme rooted in the concept of model calibration.
A long-standing problem of language models is repetition in their outputs. We
view this problem as a consequence of miscalibration. Accordingly, our
decoding scheme applies post-hoc calibration during inference.
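For reference, the sketch below shows standard (uniform) label smoothing for
a single next-token prediction, the baseline that the proposed regularizer
varies. It does not reproduce the thesis's self-regulating variant: here the
smoothing strength `epsilon` is a fixed, hypothetical hyperparameter, whereas
the proposed scheme adjusts the extent of smoothing according to the model's
confidence during training. The target distribution puts `1 - epsilon` on the
gold token and spreads `epsilon` uniformly over the vocabulary.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a vocabulary of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def label_smoothed_nll(
    logits: Sequence[float], target: int, epsilon: float = 0.1
) -> float:
    """Cross-entropy against a uniformly smoothed one-hot target.

    epsilon is a fixed illustrative value; the thesis's regularizer instead
    self-regulates the amount of smoothing (not implemented here).
    """
    vocab_size = len(logits)
    log_probs = log_softmax(logits)
    # Smoothed target: epsilon / V everywhere, plus (1 - epsilon) on the gold token.
    smoothed = [epsilon / vocab_size] * vocab_size
    smoothed[target] += 1.0 - epsilon
    return -sum(q * lp for q, lp in zip(smoothed, log_probs))


peaked = [10.0, 0.0, 0.0, 0.0]
print(label_smoothed_nll(peaked, target=0, epsilon=0.0))  # plain NLL
print(label_smoothed_nll(peaked, target=0, epsilon=0.1))  # penalizes overconfidence
```

With `epsilon = 0` the loss reduces to ordinary negative log-likelihood; with
`epsilon > 0` a sharply peaked distribution is penalized, which is why label
smoothing tends to discourage overconfident predictions.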

Date:                   Monday, 4 December 2023

Time:                   2:00pm - 4:00pm

Venue:                  Room 3494
                        Lifts 25/26

Chairman:               Prof. Man Hoi WONG (ECE)

Committee Members:      Prof. Nevin ZHANG (Supervisor)
                        Prof. Junxian HE
                        Prof. Brian MAK
                        Prof. Zhiyao XIE (ECE)
                        Prof. Wai LAM (CUHK)

**** ALL are Welcome ****