Word Sense Disambiguation for Statistical Machine Translation

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Word Sense Disambiguation for Statistical Machine Translation"

By

Miss Marine Jacinthe Carpuat


Abstract

In this thesis, we show for the first time that lexical semantics
modelling is useful in Statistical Machine Translation (SMT).

Word Sense Disambiguation (WSD), the task of resolving sense ambiguity to
identify the right translation of a word, is one of the major challenges
faced by language translation systems. If the English word "drug"
translates into French as either "drogue" (used as a narcotic) or
"médicament" (used as a medicine), then an English-French machine
translation system needs to disambiguate every use of "drug" in order to
produce the correct translation.

Considerable effort has been put into designing and evaluating dedicated
WSD models, in particular through the Senseval series of workshops. This is
partly motivated by the often unstated assumption that any full
translation system, to achieve full performance, will sooner or later have
to incorporate individual WSD components. However, in most machine
translation architectures, in particular SMT, the WSD problem is typically
not explicitly addressed. This paradoxical situation has encouraged
speculation that recent progress in SMT shows SMT models to be already
very good at WSD, and that current WSD systems have nothing to offer
state-of-the-art SMT.

Going beyond these untested assumptions and speculative claims, we conduct
the first direct extensive empirical study of the strengths and weaknesses
of WSD and SMT. Using the state-of-the-art HKUST WSD system, we show,
surprisingly, that incorporating WSD predictions in SMT does not improve
translation quality. Puzzlingly, we also report results suggesting that
typical SMT models cannot disambiguate word translations as well as
dedicated WSD systems.

These seemingly contradictory results lead us to generalize conventional
WSD models to incorporate assumptions at least as strong as in
state-of-the-art SMT. Specifically, (1) WSD targets are generalized from
words to phrases, (2) WSD sense inventories and annotation are learned
automatically in the same way as conventional SMT translation lexicons,
and (3) WSD models are fully integrated in SMT decoding.

Remarkably, the resulting generalized phrasal WSD-for-SMT models improve
translation quality across four different Chinese-to-English translation
tasks, as measured by eight common automatic evaluation metrics. Further
analysis reveals that generalization from conventional WSD to generalized
phrasal WSD-for-SMT is necessary in order to obtain consistent
improvements in translation quality.


Date:			Monday, 14 January 2008

Time:			2:00 p.m. - 4:00 p.m.

Venue:			Room 3301
			Lifts 17-18

Chairman:		Prof. Andrew Miller (BIOL)

Committee Members:	Prof. Dekai Wu (Supervisor)
			Prof. Brian Mak
			Prof. Dit-Yan Yeung
			Prof. Pascale Fung (ECE)
			Prof. Bertram Shi (ECE)
			Prof. Dragomir Radev (EECS, Univ. of Michigan)


**** ALL are Welcome ****