PhD Qualifying Examination


Title: "IMPROVING SEMANTIC STATISTICAL MACHINE TRANSLATION VIA SEMANTIC ROLE 
LABELING"

by

Miss BELOUCIF, MERIEM


Abstract:

In this survey, we review the state of the art in word alignment for statistical 
machine translation (SMT) and highlight how existing techniques differ in 
learning bilingual correlations between the input and output languages. 
Furthermore, we scrutinize recent work on semantic role labeling (SRL), with 
the idea that it might be possible to improve the performance of SMT systems 
by injecting a semantic objective function earlier in the translation 
pipeline. We believe it should be possible to improve both translation 
adequacy and fluency by replacing conventional alignment algorithms with 
more semantically motivated ones. Our approach is further motivated by the 
fact that incorporating semantic role labeling elsewhere in the SMT pipeline 
has already been shown to significantly improve the quality of machine 
translation output.

Recent research defines a good translation as one from which a human can 
successfully understand the core semantics as captured by the basic event 
structure: “who did what to whom, for whom, when, where, how and why”. The 
MEANT family of metrics are semantic evaluation metrics that correlate 
better with human adequacy judgments than the most commonly used 
surface-based metrics. MEANT scores an MT output by measuring the degree of 
similarity between its semantic frame structures and those of the provided 
reference translations. Our analysis is encouraged by the fact that many 
previous studies have empirically shown that integrating semantic role 
labeling into the training pipeline, by tuning against MEANT, improves 
translation adequacy.
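To make the idea of frame-level comparison concrete, the following is a highly simplified, illustrative sketch of a MEANT-style score, not the actual MEANT implementation: each sentence is represented as a set of shallow semantic frames (a predicate plus its role fillers), MT frames are greedily matched to reference frames, and per-role similarities are combined into an F-measure. Real MEANT uses automatic SRL output, lexical similarity models, and tuned role weights; here similarity is plain token overlap, and all function names and data structures are assumptions for illustration.

```python
def token_overlap(a, b):
    """Jaccard overlap between two role-filler strings (toy similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def frame_similarity(mt_frame, ref_frame):
    """Average similarity over roles present in either frame.
    A frame is a dict, e.g. {'PRED': 'bought', 'ARG0': 'the man', ...}."""
    roles = set(mt_frame) | set(ref_frame)
    return sum(token_overlap(mt_frame.get(r, ''), ref_frame.get(r, ''))
               for r in roles) / len(roles)

def meant_style_score(mt_frames, ref_frames):
    """Greedily align each MT frame to its most similar unused reference
    frame, then return an F-measure over the matched similarities."""
    if not mt_frames or not ref_frames:
        return 0.0
    matched, used = 0.0, set()
    for mf in mt_frames:
        best, best_j = 0.0, None
        for j, rf in enumerate(ref_frames):
            if j in used:
                continue
            s = frame_similarity(mf, rf)
            if s > best:
                best, best_j = s, j
        if best_j is not None:
            used.add(best_j)
            matched += best
    precision = matched / len(mt_frames)
    recall = matched / len(ref_frames)
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```

An identical pair of frames yields a score of 1.0, while a mismatched role filler (e.g. a wrong ARG0) lowers the score proportionally, which is the intuition behind tuning SMT systems against a frame-based objective.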

The quality of machine translation output relies heavily on word 
alignment. However, the most widespread approach to word alignment is 
still the outdated method of training IBM models in both directions and 
combining their results using various heuristics. Word alignments based on 
inversion transduction grammars (ITGs), on the other hand, have been 
shown to provide a more structured model leading to efficient and accurate 
bidirectional alignments. In this survey we discuss how conventional 
alignment algorithms fail to learn a meaningful mapping between the input 
and output languages. We also discuss possibilities for obtaining better 
performance by incorporating semantics into word alignment, integrating 
SRL constraints much earlier in the training pipeline.
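The bidirectional heuristic combination mentioned above can be sketched minimally as follows. Two directional models each propose a set of word links; the standard heuristics then intersect them (high precision) or take their union (high recall), with refinements such as grow-diag-final falling in between. The functions and toy data below are illustrative assumptions, not the actual IBM-model or Moses code.

```python
# Alignments are sets of (source_index, target_index) links.
# src2tgt comes from the e->f model; tgt2src from the f->e model,
# stored as (target_index, source_index) and flipped before combining.

def intersect(src2tgt, tgt2src):
    """Keep only links that both directional models agree on."""
    return src2tgt & {(s, t) for (t, s) in tgt2src}

def union(src2tgt, tgt2src):
    """Keep any link proposed by either directional model."""
    return src2tgt | {(s, t) for (t, s) in tgt2src}

# Toy example: three source words, three target words.
e2f = {(0, 0), (1, 2), (2, 1)}          # e->f links as (e_pos, f_pos)
f2e = {(0, 0), (2, 1), (1, 1)}          # f->e links as (f_pos, e_pos)

print(sorted(intersect(e2f, f2e)))      # [(0, 0), (1, 2)]
print(sorted(union(e2f, f2e)))          # [(0, 0), (1, 1), (1, 2), (2, 1)]
```

The gap between the intersection and the union is exactly where the various heuristics disagree, which is one reason a single structured bidirectional model such as an ITG-based alignment is attractive.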


Date:			Thursday, 10 March 2016

Time:			3:00pm - 5:00pm

Venue:			Room 3598
			Lifts 27/28

Committee Members:	Prof. Dekai Wu (Supervisor)
			Prof. Dit-Yan Yeung (Chairperson)
			Dr. Brian Mak
			Prof. Bertram Shi (ECE)


**** ALL are Welcome ****