Better Questions for Better Answers: Generating High-quality QA Pairs

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


MPhil Thesis Defence


Title: "Better Questions for Better Answers: Generating High-quality QA Pairs"

By

Mr. Yik Lun LAU


Abstract:

Legal contracts, with their characteristic verbosity and complex jargon, pose
substantial challenges to laypersons. While the use of chatbots for FAQ
generation has gained traction, the need for high-quality, contextual
question-answer pairs specific to legal contracts remains largely unfulfilled.
Existing automated systems, while effective, fall short in their ability to
construct comprehensive, well-grounded QA pairs that mimic the interpretive
prowess of human legal expertise. This research introduces a pioneering effort
to address this need by converting legal contracts into high-quality,
generative question-answer pairs using the popular ChatGPT framework.

To further enhance the quality of the generated QA pairs, we refined
state-of-the-art models such as T5, T0 and Alpaca-LoRA specifically for the
legal domain. We propose a cost-efficient and highly effective method to train
these custom models, offering a level of performance on par with ChatGPT,
without the confidentiality issues often associated with it. We incorporated an
unprecedented level of diversity in question generation, surpassing existing
legal datasets, and thereby expanding the possibilities for legal contract
interpretation.

A novel aspect of our work is the introduction of generative question
answering, in which the model not only answers the question but also provides
a reference number for effective source citation. This contributes towards
the development of more trustworthy and reliable language models.

The effectiveness of our approach was validated using a variety of automatic
evaluation metrics, including BLEU, Google BLEU, ROUGE, METEOR, and BERTScore,
as well as qualitative assessments by domain experts of answerability,
grounding, readability, coherence, and informativeness.

This study provides a first-of-its-kind dataset in the legal domain framed
for generative or abstractive question answering, where the flexibility of
language usage supersedes mere text span extraction. Practical implications
include the creation of a robust chatbot capable of generating informative FAQs
from contracts, thereby expediting user understanding of contractual content
and minimizing the effort required to formulate queries. We believe our work
marks a substantial advancement in AI-powered contract interpretation, and
opens up exciting avenues for future research and applications.


Date:                   Friday, 18 August 2023

Time:                   4:00pm - 6:00pm

Venue:                  Room 4472
                        lifts 25/26

Committee Members:      Dr. Yangqiu Song (Supervisor)
                        Dr. Wei Wang (Chairperson)
                        Dr. Minhao Cheng


**** ALL are Welcome ****