Using Semantic Role Labels to Reorder Statistical Machine Translation Output
MPhil Thesis Defence

Title: "Using Semantic Role Labels to Reorder Statistical Machine Translation Output"

By Miss Chi-Kiu Lo

Abstract

In this thesis, we show that reordering Statistical Machine Translation (SMT) output to match its semantic roles with those of the input improves translation quality. Translation quality can be evaluated in terms of adequacy, fluency and fidelity. Current SMT systems attempt to tackle adequacy primarily by memorizing in a bi-lexicon all word (or phrase) translation pairs that co-occur frequently in a training corpus, using various statistics in the hope of improving the accuracy of lexical choice in translation. They model the word order of the translation output as a statistical dependency problem, relying heavily on monolingual n-gram language models of the output language in an attempt to compensate for weak bilingual models of word (or phrase) alignment and permutation. Since no semantic features are considered during training or translation, it is not surprising that serious semantic role confusion errors appear in the SMT output. One approach to tackling this problem is to integrate semantic information into SMT.

Firstly, we study in detail a state-of-the-art Chinese shallow semantic parser, C-ASSERT, which consists of a Chinese word segmenter and a Chinese shallow syntactic parser. A set of controlled experiments is carried out using different Chinese word segmenters and Chinese shallow syntactic parsers. It is found that the best performance is obtained when the Chinese word segmenter and the Chinese shallow syntactic parser are both maximum entropy models built by our research center at HKUST.

Then, to provide solid groundwork for our claim that using Semantic Role Labels (SRL) to reorder SMT output improves translation quality, a strong SMT baseline is set up and optimized.
An objective scoring function is then devised to quantify how well the shallow semantic roles of the Chinese source input match those of the SMT output. Finally, an algorithm is built to reorder the SMT output using semantic role labels. The experimental results show that the algorithm successfully returns a better translation with fewer semantic role confusion errors.

Date: Monday, 24 August 2009

Time: 2:00pm – 4:00pm

Venue: Room 3301A Lifts 17-18

Committee Members: Dr. Dekai Wu (Supervisor)
                   Dr. Brian Mak (Chairperson)
                   Dr. Pascale Fung (ECE)

**** ALL are Welcome ****