BTG-based Phrasal SMT: Reordering and Segmentation

The Hong Kong University of Science & Technology
Human Language Technology Center
Department of Computer Science and Engineering
Department of Electronic and Computer Engineering
-----------------------------------------------------------------------

Speaker:	Dr. Deyi XIONG
		Institute for Infocomm Research
		Agency for Science, Technology and Research of Singapore

Title:		"BTG-based Phrasal SMT: Reordering and Segmentation"

Date:		Tuesday, 22 June 2010

Time:		2:00pm - 3:00pm

Venue:		Room 3412 (via lifts 17 & 18), HKUST

Abstract:

In this talk, I will present effective methods to address two fundamental
issues in statistical machine translation: reordering and segmentation.
These methods are evaluated on large-scale training data using a machine
translation system which adapts Bracketing Transduction Grammars (BTG) to
phrasal translation.

I will introduce boundary word based reordering (BWR) and linguistically
annotated reordering (LAR). In LAR, hard hierarchical skeletons are built
and soft linguistic knowledge from source parse trees is injected into the
nodes of hard skeletons during translation. The experimental results show
that LAR is comparable with BWR. When combined with BWR, LAR provides
complementary information for phrase reordering, which collectively
improves the BLEU score significantly. To further understand the
contribution of linguistic knowledge in LAR to phrase reordering, I will
present a new syntax-based analysis method, which automatically detects
constituent movement in both reference and system translations and
summarizes the syntactic reordering patterns captured by reordering
models. A comparative analysis conducted with this method not only
provides insight into how linguistic knowledge affects phrase movement
but also reveals new challenges in phrase reordering.

A simple yet quite effective segmentation model is also introduced to
capture cohesive segments in both phrasal and hierarchical segmentations.
The model automatically learns beginning and ending cohesive segment
boundaries from word-aligned bilingual data without using any additional
resources. The experimental results show that the segmentation model
achieves a significant improvement over a baseline that does not monitor
any segmentations. Experiments further show the individual contributions
of modeling phrasal segmentation and hierarchical segmentation. The
proposed segmentation method is not limited to BTG-based phrasal
translation: it can also be applied to traditional phrasal translation
and to hierarchical and syntax-based translation.


**************
Biography:

Deyi Xiong is a Research Fellow at the Institute for Infocomm Research,
Agency for Science, Technology and Research of Singapore (A-STAR/I2R). He
obtained his Ph.D. from the Institute of Computing Technology, Chinese
Academy of Sciences (CAS/ICT). His research interests are mainly in
natural language processing, especially statistical parsing and machine
translation. His thesis was on syntax-based machine translation using
Bracketing Transduction Grammar and dependency grammar.