Tree-based and Forest-based Translation

---------------------------------------------------------------------------
                ***Joint Seminar***
---------------------------------------------------------------------------
The Hong Kong University of Science & Technology

Department of Computer Science and Engineering
Department of Electronic and Computer Engineering
Human Language Technology Center
---------------------------------------------------------------------------

Speaker:	Liang HUANG
		Department of Computer and Information Science
		University of Pennsylvania

Title:		"Tree-based and Forest-based Translation"

Date:		Monday, 24 November 2008

Time:		4:00pm - 5:00pm

Venue:		Lecture Theatre F
		(Leung Yat Sing Lecture Theatre, near lifts 25/26), HKUST

Abstract:

What can machine translation systems learn from human translators? And
what do translating English into Chinese and compiling C++ into machine
code have in common?

In this talk I will first introduce a tree-based paradigm for machine
translation, inspired by both human translators and compilers. In this
paradigm, a source language sentence is first parsed into a syntax tree,
which is then recursively converted into a target language sentence via
tree-to-string transformation rules. Since the translation process is
driven by the syntax, this approach resembles the "syntax-directed
translation" method used by almost all compilers.
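
As a rough illustration only (not the speaker's system), the recursive
conversion described above can be sketched in Python. The tree encoding,
the RULES and LEXICON tables, and the function name translate are all
assumptions invented for this example, and each rule merely reorders the
children of a single node, which is far simpler than real tree-to-string
transformation rules.

```python
# Toy source tree: internal nodes are (label, children); leaves are words.
# RULES gives, per node label, the order in which children are emitted.
# LEXICON maps source words to target words. Both tables are hypothetical.
RULES = {
    "S": [0, 1],       # keep subject before predicate
    "VP": [2, 0, 1],   # move the prepositional phrase in front of the verb
    "PP": [0, 1],
}
LEXICON = {
    "Bush": "Bushi", "held": "juxing le", "talks": "huitan",
    "with": "yu", "Sharon": "Shalong",
}

def translate(node):
    """Recursively convert a parsed source tree into a list of target words."""
    if isinstance(node, str):                # leaf: translate the word itself
        return [LEXICON.get(node, node)]
    label, children = node
    order = RULES.get(label, range(len(children)))
    output = []
    for i in order:                          # syntax drives the output order
        output.extend(translate(children[i]))
    return output

tree = ("S", ["Bush", ("VP", ["held", "talks", ("PP", ["with", "Sharon"])])])
print(" ".join(translate(tree)))  # Bushi yu Shalong juxing le huitan
```

The translation is driven entirely by the shape of the parse tree, which
is the sense in which the method is "syntax-directed".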

However, natural languages are crucially different from programming
languages in that they are fundamentally ambiguous. So we don't (and will
probably never) have perfect parsers, and parsing errors adversely affect
translation quality. An obvious solution is to use the top-k parses
rather than a single 1-best tree, but this helps only slightly because of
the limited scope of the k-best list. We instead propose a "forest-based
approach", which translates a packed forest encoding *exponentially* many
parses in compact (polynomial) space by sharing common subtrees.
Large-scale experiments showed very significant improvements in
translation quality over the 1-best baseline, with results surpassing the
best reported systems to date. They also confirmed that translating a
forest of millions of trees can be even faster than translating the
top-30 individual trees, thanks to dynamic programming.
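
To make the packing argument concrete, here is a minimal Python sketch,
assuming a toy forest stored as a dictionary from each node to its
alternative hyperedges (tuples of child nodes). The names build_forest
and count_trees are hypothetical, and the chain-shaped forest is
contrived purely to show the effect of sharing; it is not taken from the
talk.

```python
from functools import lru_cache
from math import prod

def build_forest(n):
    """Toy packed forest: node i has two alternative analyses, and both
    reuse the same shared sub-forest rooted at node i + 1."""
    forest = {n: [()]}                    # leaf: one hyperedge, no children
    for i in range(n):
        forest[i] = [(i + 1,), (i + 1,)]  # two alternatives, shared child
    return forest

def count_trees(forest, root):
    """Count the trees packed in the forest by dynamic programming:
    each node is visited once, however many trees pass through it."""
    @lru_cache(maxsize=None)
    def count(node):
        return sum(prod(count(child) for child in edge)
                   for edge in forest[node])
    return count(root)

forest = build_forest(20)
print(len(forest))             # 21 nodes
print(count_trees(forest, 0))  # 1048576 trees, i.e. 2 ** 20
```

Here 21 nodes encode over a million trees, and the memoized recursion
touches each node only once. A k-best list, by contrast, has to spell out
each tree separately.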

This is joint work with Haitao Mi and Qun Liu.


**********************
Biography:

Liang HUANG recently defended his PhD thesis at the University of
Pennsylvania, co-supervised by Aravind Joshi and Kevin Knight (USC/ISI).
He is mainly interested in the theoretical aspects of computational
linguistics, in particular, efficient algorithms in parsing and machine
translation, generic dynamic programming, and formal properties of
synchronous grammars. His thesis develops a set of "forest-based methods"
that have been applied to many problems in NLP including k-best parsing,
forest rescoring and reranking, and forest-based translation. He received
an Outstanding Paper Award at ACL 2008, and a University Teaching Award at
Penn in 2005.