Better Questions for Better Answers: Generating High-quality QA Pairs
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

MPhil Thesis Defence

Title: "Better Questions for Better Answers: Generating High-quality QA Pairs"

By Mr. Yik Lun LAU

Abstract:

Legal contracts, with their characteristic verbosity and complex jargon, pose substantial challenges to laypersons. While the use of chatbots for FAQ generation has gained traction, the need for high-quality, contextual question-answer pairs specific to legal contracts remains largely unmet. Existing automated systems, while effective, fall short of constructing comprehensive, well-grounded QA pairs that match the interpretive skill of human legal experts. This research presents a pioneering effort to address this need by using ChatGPT to convert legal contracts into high-quality, generative question-answer pairs. To further enhance the quality of the generated QA pairs, we fine-tuned state-of-the-art models such as T5, T0, and Alpaca-LoRA specifically for the legal domain. We propose a cost-efficient and highly effective method for training these custom models that achieves performance on par with ChatGPT without the confidentiality issues often associated with it. Our question generation also achieves greater question diversity than existing legal datasets, thereby broadening the range of legal contract interpretation it supports. A novel aspect of our work is the introduction of generative question answering, a mechanism in which the question is not only answered but also accompanied by a reference number for source citation. This contributes to the development of more trustworthy and reliable language models.
The effectiveness of our approach was validated using a variety of automatic evaluation metrics, including BLEU, GoogleBLEU, ROUGE, METEOR, and BERTScore, as well as qualitative assessments by domain experts of answerability, grounding, readability, coherence, and informativeness. This study provides the first dataset of its kind in the legal domain framed for generative (abstractive) question answering, where answers can be freely phrased rather than restricted to extracted text spans. Practical implications include a robust chatbot capable of generating informative FAQs from contracts, helping users understand contractual content more quickly and reducing the effort required to formulate queries. We believe our work marks a substantial advance in AI-powered contract interpretation and opens up exciting avenues for future research and applications.

Date: Friday, 18 August 2023
Time: 4:00pm - 6:00pm
Venue: Room 4472 (Lifts 25/26)

Committee Members:
Dr. Yangqiu Song (Supervisor)
Dr. Wei Wang (Chairperson)
Dr. Minhao Cheng

**** ALL are Welcome ****