Tree-based and Forest-based Translation
---------------------------------------------------------------------------
                          ***Joint Seminar***
---------------------------------------------------------------------------
The Hong Kong University of Science & Technology

Department of Computer Science and Engineering
Department of Electronic and Computer Engineering
Human Language Technology Center
---------------------------------------------------------------------------

Speaker:  Liang HUANG
          Department of Computer and Information Science
          University of Pennsylvania

Title:    "Tree-based and Forest-based Translation"

Date:     Monday, 24 November 2008

Time:     4:00pm - 5:00pm

Venue:    Lecture Theatre F (Leung Yat Sing Lecture Theatre, near lifts 25/26), HKUST

Abstract:

What can machine translation systems learn from human translators? And what is in common between translating English into Chinese and compiling C++ into machine code? In this talk I will first introduce a tree-based paradigm for machine translation, inspired by both human translators and compilers. In this paradigm, a source language sentence is first parsed into a syntax tree, which is then recursively converted into a target language sentence via tree-to-string transformation rules. Since the translation process is driven by the syntax, this approach resembles the "syntax-directed translation" method used by almost all compilers.

However, natural languages are crucially different from programming languages in that they are fundamentally ambiguous. So we don't (and will probably never) have perfect parsers, and parsing errors adversely affect translation quality. An obvious solution is to use the top-k parses, rather than a single 1-best tree, but this only helps a little bit due to the limited scope of the k-best list. We instead propose a "forest-based approach", which translates a packed forest encoding *exponentially* many parses in a compact (polynomial) space by sharing common subtrees. Large-scale experiments showed very significant improvements (over the 1-best baseline) in translation quality, outperforming the best reported systems to date, and confirmed that translating on a forest of millions of trees can be even faster than translating on top-30 individual trees thanks to dynamic programming.

This is joint work with Haitao Mi and Qun Liu.

**********************
Biography:

Liang HUANG recently defended his PhD thesis at the University of Pennsylvania, co-supervised by Aravind Joshi and Kevin Knight (USC/ISI). He is mainly interested in the theoretical aspects of computational linguistics, in particular, efficient algorithms in parsing and machine translation, generic dynamic programming, and formal properties of synchronous grammars. His thesis develops a set of "forest-based methods" that have been applied to many problems in NLP including k-best parsing, forest rescoring and reranking, and forest-based translation. He received an Outstanding Paper Award at ACL 2008, and a University Teaching Award at Penn in 2005.