More about HKUST
Binarizing Synchronous Grammars for Machine Translation
The Hong Kong University of Science and Technology Department of Computer Science and Engineering Department of Electronic and Computer Engineering Human Language Technology Center SEMINAR Title: "Binarizing Synchronous Grammars for Machine Translation" Speaker: Liang HUANG, University of Pennsylvania Date: 30 July 2007 Time: 16:00-17:30 Venue: Room 1505 (Lifts 25-26) Abstract: Joint work with Hao Zhang (Rochester), Dan Gildea (Rochester), and Kevin Knight (USC/ISI) Syntax-based MT systems have seen promising improvements in translation quality, but are often computationally too expensive to be practical. In this talk, I will address this problem of efficiency in systems based on synchronous grammars and tree transducers. In this model, the theoretical complexity of decoding is exponential in the size of individual grammar rules due to arbitrary re-orderings between the two languages. To alleviate this problem, we develop a theory of binarization for synchronous grammars which factors big rules into smaller, binary rules, that permit efficient decoding and are in fact equivalent to Inversion Transduction Grammars (ITG). We also present a linear-time algorithm for synchronous binarization. In our large-scale experiments, we found that the syntactic re-orderings between Chinese and English are almost always binarizable and the resulting rule set significantly improves the speed and accuracy of ISI's syntax-based MT system, which is currently the state-of-the-art. Besides efficiency, this novel technique of binarization also provides a tool to study the formal and empirical properties of synchronous grammars. For example, we are now able to address the famous question by Dekai Wu that whether there are non-ITG re- orderings between fixed-word-order languages such as English and Chinese. Our empirical results on hand-aligned English-Chinese parallel text confirmed that the pro-ITG conjecture is largely correct (99.7% cases are binarizable), while we also provide, for the first-time, "real-examples" of non-binarizable reordering verified by native speakers. If time permitting, I would also present results on freer-word-order languages. References: * Hao Zhang, Liang Huang, Daniel Gildea, and Kevin Knight (2006). Synchronous Binarization for Machine Translation. HLT-NAACL 2006, NY. * Liang Huang, Hao Zhang, Daniel Gildea, and Kevin Knight (2007). Binarization of Synchronous Context-Free Grammars. Submitted to Computational Linguistics. Biography: Liang Huang is finishing his fourth year in the PhD program at the University of Pennsylvania, under Aravind Joshi. He is mainly interested in the theoretical aspects of computational linguistics, in particular, efficient algorithms in parsing and machine translation, and formal properties of synchronous grammars. He also works on applying computational linguistics to structural biology. ***All are welcome***