PhD Qualifying Examination
Title: "IMPROVING SEMANTIC STATISTICAL MACHINE TRANSLATION VIA SEMANTIC ROLE
LABELING"
by
Miss BELOUCIF, MERIEM
Abstract:
In this survey, we review the state of the art of word alignment in the
statistical machine translation (SMT) literature and highlight how these
techniques differ in learning bilingual correlations between the input and
output languages. Furthermore, we scrutinize recent work on semantic role
labeling (SRL), with the idea that it might be possible to improve the
performance of SMT systems by injecting a semantic objective function
earlier into the translation pipeline. We believe that it should be
possible to improve both translation adequacy and fluency by replacing the
conventional alignment algorithms with more semantically motivated
alignments. Our approach is further motivated by the fact that
incorporating semantic role labeling into the SMT pipeline in other ways
has already been shown to significantly improve the quality of machine
translation output.
Recent research defines a good translation as one in which a human can
successfully understand the core semantics as captured by the basic event
structure: “who did what to whom, for whom, when, where, how and why”. The
MEANT family of metrics comprises semantic evaluation metrics that
correlate better with human adequacy judgments than the most commonly used
surface-based metrics. MEANT produces a score measuring the degree of
similarity between the semantic frame structures of the MT output and
those of the provided reference translations. Our analysis is encouraged
by the fact that many previous studies have empirically shown that
integrating semantic role labeling into the training pipeline by tuning
against MEANT improves translation adequacy.
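As a rough illustration of this style of scoring, the following sketch
compares the semantic frames of an MT output sentence against those of a
reference translation. It is not the published MEANT metric: the frame
representation, role labels, bag-of-words similarity, and greedy averaging
used here are illustrative assumptions only.

    # Minimal sketch of a MEANT-style semantic frame comparison.
    # Frames, role labels, and scoring choices are illustrative
    # assumptions, not the official MEANT implementation.

    from collections import Counter

    def lexical_similarity(filler_a, filler_b):
        """Bag-of-words F1 between two role fillers (token lists)."""
        overlap = sum((Counter(filler_a) & Counter(filler_b)).values())
        if overlap == 0:
            return 0.0
        prec = overlap / len(filler_b)
        rec = overlap / len(filler_a)
        return 2 * prec * rec / (prec + rec)

    def frame_similarity(mt_frame, ref_frame):
        """Average filler similarity over roles shared by both frames.

        A frame is a dict mapping a role label (e.g. 'PRED', 'ARG0',
        'ARG1') to a list of tokens filling that role.
        """
        shared = set(mt_frame) & set(ref_frame)
        if not shared:
            return 0.0
        return sum(lexical_similarity(mt_frame[r], ref_frame[r])
                   for r in shared) / len(shared)

    def meant_like_score(mt_frames, ref_frames):
        """Score the MT output's frames against the reference's frames.

        Greedily matches each MT frame to its most similar reference
        frame and averages, a crude stand-in for the weighted
        precision/recall aggregation used by the real metric.
        """
        if not mt_frames or not ref_frames:
            return 0.0
        total = sum(max(frame_similarity(mt, ref) for ref in ref_frames)
                    for mt in mt_frames)
        return total / max(len(mt_frames), len(ref_frames))

    # Toy example: "who did what to whom".
    mt_output = [{"PRED": ["bought"], "ARG0": ["the", "boy"],
                  "ARG1": ["a", "book"]}]
    reference = [{"PRED": ["purchased"], "ARG0": ["the", "boy"],
                  "ARG1": ["a", "novel"]}]
    print(meant_like_score(mt_output, reference))  # 0.5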
The quality of machine translation output relies heavily on word
alignment. However, the most widespread approach to word alignment is
still the outdated method of training IBM models in both directions and
combining their results using various heuristics (sketched below). Word
alignments based on inversion transduction grammars (ITGs), on the other
hand, have been shown to provide a more structured model leading to
efficient and accurate bidirectional alignments. In this survey we discuss
how conventional alignment algorithms fail to learn a meaningful mapping
between the source and target languages. We also discuss possibilities for
obtaining better performance by incorporating semantics into word
alignment learning, integrating SRL constraints much earlier in the
training pipeline.
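For contrast, the conventional baseline referred to above can be sketched
as follows: two directional alignments are combined with a simplified
grow-diag-style heuristic. The toy index sets are invented for
illustration; in practice the directional alignments would come from IBM
model training (e.g. with GIZA++), and real symmetrization heuristics add
further conditions on unaligned words.

    # Minimal sketch of bidirectional-alignment symmetrization
    # (a simplified grow-diag-style heuristic, for illustration only).

    def symmetrize(src_to_tgt, tgt_to_src):
        """Combine two directional alignments, given as sets of
        (source_index, target_index) pairs.

        Start from their intersection (high precision) and grow by
        adding links from the union that neighbour an existing link.
        """
        union = src_to_tgt | tgt_to_src
        alignment = src_to_tgt & tgt_to_src
        neighbours = [(-1, 0), (1, 0), (0, -1), (0, 1),
                      (-1, -1), (-1, 1), (1, -1), (1, 1)]
        added = True
        while added:
            added = False
            for (i, j) in sorted(union - alignment):
                # Add a link only if it touches an already-aligned pair.
                if any((i + di, j + dj) in alignment
                       for di, dj in neighbours):
                    alignment.add((i, j))
                    added = True
        return alignment

    # Toy example with 0-based word indices.
    forward = {(0, 0), (1, 1), (2, 3)}
    backward = {(0, 0), (1, 1), (2, 2)}
    print(sorted(symmetrize(forward, backward)))
    # [(0, 0), (1, 1), (2, 2), (2, 3)]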
Date: Thursday, 10 March 2016
Time: 3:00pm - 5:00pm
Venue: Room 3598
Lifts 27/28
Committee Members: Prof. Dekai Wu (Supervisor)
Prof. Dit-Yan Yeung (Chairperson)
Dr. Brian Mak
Prof. Bertram Shi (ECE)
**** ALL are Welcome ****