IMPROVING SEMANTIC STATISTICAL MACHINE TRANSLATION VIA SEMANTIC ROLE LABELING
PhD Qualifying Examination

Title: "IMPROVING SEMANTIC STATISTICAL MACHINE TRANSLATION VIA SEMANTIC ROLE LABELING"

by

Miss BELOUCIF, MERIEM

Abstract:

In this survey, we review the state of the art of word alignment in the statistical machine translation (SMT) literature and highlight how these techniques differ in learning bilingual correlations between the input and the output. Furthermore, we scrutinize recent work in semantic role labeling (SRL), with the idea that it might be possible to improve the performance of SMT systems by injecting a semantic objective function earlier into the translation pipeline. We believe that it should be possible to improve both translation adequacy and fluency by replacing the conventional alignment algorithms with more semantically motivated alignments. Our approach is further motivated by the fact that including semantic role labeling in the SMT pipeline in a different way has already been shown to significantly improve the quality of machine translation output.

Recent research defines a good translation as one in which a human can successfully understand the core semantics as captured by the basic event structure: “who did what to whom, for whom, when, where, how and why”. The MEANT family of metrics are semantic evaluation metrics that correlate better with human adequacy judgment than the most commonly used surface-based metrics. MEANT produces a score measuring the degree of similarity between the semantic frame structures of the MT output and those of the provided reference translations. Our analysis is encouraged by the fact that many previous studies have empirically shown that integrating semantic role labeling into the training pipeline by tuning against MEANT improves translation adequacy.

The quality of machine translation output relies heavily on word alignment. However, the most widespread approach to word alignment is still the outdated method of training IBM models in both directions and combining their results using various heuristics. Word alignments based on inversion transduction grammars (ITGs), on the other hand, have been shown to provide a more structured model leading to efficient and accurate bidirectional alignments. In this survey, we discuss how conventional alignment algorithms fail to learn a meaningful mapping between the input and the target language. We also discuss some possibilities for obtaining better performance by incorporating semantics into learning the word alignment, integrating SRL constraints much earlier into the training pipeline.

Date: Thursday, 10 March 2016
Time: 3:00pm - 5:00pm
Venue: Room 3598 (Lifts 27/28)

Committee Members:
Prof. Dekai Wu (Supervisor)
Prof. Dit-Yan Yeung (Chairperson)
Dr. Brian Mak
Prof. Bertram Shi (ECE)

**** ALL are Welcome ****