BTG-based Phrasal SMT: Reordering and Segmentation
The Hong Kong University of Science & Technology
Human Language Technology Center
Department of Computer Science and Engineering
Department of Electronic and Computer Engineering
-----------------------------------------------------------------------

Speaker: Dr. Deyi XIONG
         Institute for Infocomm Research
         Agency for Science, Technology and Research of Singapore

Title:   "BTG-based Phrasal SMT: Reordering and Segmentation"

Date:    Tuesday, 22 June 2010
Time:    2:00pm - 3:00pm
Venue:   Room 3412 (via lifts 17 & 18), HKUST

Abstract:

In this talk, I will present effective methods to address two fundamental issues in statistical machine translation: reordering and segmentation. These methods are evaluated on large-scale training data using a machine translation system that adapts Bracketing Transduction Grammars (BTG) to phrasal translation.

I will introduce boundary word based reordering (BWR) and linguistically annotated reordering (LAR). In LAR, hard hierarchical skeletons are built, and soft linguistic knowledge from source parse trees is injected into the nodes of the hard skeletons during translation. The experimental results show that LAR is comparable with BWR. When combined with BWR, LAR provides complementary information for phrase reordering, and together they improve the BLEU score significantly.

To further understand the contribution of linguistic knowledge in LAR to phrase reordering, I will present a new syntax-based analysis method, which automatically detects constituent movement in both reference and system translations and summarizes the syntactic reordering patterns that are captured by reordering models. A comparative analysis conducted with this method not only provides insight into how linguistic knowledge affects phrase movement but also reveals new challenges in phrase reordering.

A simple yet quite effective segmentation model is also introduced to capture cohesive segments in both phrasal and hierarchical segmentations.
The model automatically learns the beginning and ending boundaries of cohesive segments from word-aligned bilingual data without using any additional resources. The experimental results show that the segmentation model achieves a significant improvement over a baseline that does not monitor any segmentations. Further experiments show the individual contributions of modeling phrasal segmentation and hierarchical segmentation. The proposed segmentation method is not limited to BTG-based phrasal translation. It can also be applied to traditional phrasal translation, or to hierarchical and syntactic translation.

**************

Biography:

Deyi Xiong is a Research Fellow at the Institute for Infocomm Research, Agency for Science, Technology and Research of Singapore (A-STAR/I2R). He obtained his Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences (CAS/ICT). His research interests are mainly in natural language processing, especially statistical parsing and machine translation. His thesis was on syntax-based machine translation using Bracketing Transduction Grammar and dependency grammar.