Learning Bilingual Semantic Frames

MPhil Thesis Defence


Title: "Learning Bilingual Semantic Frames"

By

Mr. Zhaojun WU


Abstract

In this thesis, we study the task of automatically learning bilingual
semantic frames from a Chinese-English parallel corpus. Bilingual semantic
frames, the mappings of core semantic arguments (roles) for a predicate
pair in a bi-sentence, have the potential to improve the translation
quality of Statistical Machine Translation (SMT) systems.

As a prerequisite, we first report our research on the subtask of Chinese
Semantic Role Labeling (SRL). We present our implementation of two new
state-of-the-art Chinese shallow semantic parsers, one based on the Support
Vector Machine (SVM) and the other on the Maximum Entropy classification
technique. We also present a full-scale comparison of features and of
classifier performance, and propose several important new features for
this subtask.

We also propose to learn bilingual semantic frames from a parallel corpus
of translated sentence pairs. We first present our observations on a
reference set that was manually extracted from the parallel corpus. We find
that a considerable 15.73% of semantic argument mappings are not direct
mappings but mismatches; that is, core semantic argument i in Chinese is
not aligned to argument i in English.
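
As a rough illustration of how such a mismatch rate could be computed, here
is a minimal sketch. It assumes each argument mapping is stored as a pair of
role labels (Chinese, English); the function name and the toy data are
hypothetical and are not taken from the thesis or its reference set.

```python
def mismatch_rate(mappings):
    """Return the fraction of argument mappings whose role labels differ.

    `mappings` is a list of (chinese_role, english_role) pairs,
    e.g. ("ARG1", "ARG2") means Chinese ARG1 is aligned to English ARG2.
    """
    if not mappings:
        return 0.0
    mismatched = sum(1 for zh_role, en_role in mappings if zh_role != en_role)
    return mismatched / len(mappings)


# Toy data, invented for illustration only.
toy_mappings = [
    ("ARG0", "ARG0"),  # direct mapping
    ("ARG1", "ARG1"),  # direct mapping
    ("ARG1", "ARG2"),  # mismatch: argument i is not aligned to argument i
    ("ARG0", "ARG0"),  # direct mapping
]
toy_rate = mismatch_rate(toy_mappings)
```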

We then present a conventional model, SYN_ALIGN, that acquires bilingual
semantic frames from the results of semantic role projection based on
syntactic constituent alignment. The evaluation shows that SYN_ALIGN
achieves only a very modest performance (44.80% F-measure) because of its
brittle assumption that all semantic arguments in one language map
directly to their syntactic counterparts in the other language. We
therefore propose a novel model, ARG_ALIGN, which learns bilingual semantic
frames using a phrasal similarity measure over the semantic roles that are
automatically produced by two monolingual semantic parsers. ARG_ALIGN
surpasses SYN_ALIGN by about 25 points in F-measure and has an upper bound
of 86% F-measure.
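
The general idea of aligning arguments by phrasal similarity and scoring the
result with F-measure can be sketched as follows. This is not the ARG_ALIGN
model itself: the abstract does not specify the similarity measure, so the
sketch assumes a Dice-style overlap computed through a small bilingual
lexicon, a greedy one-to-one matching, and a hypothetical threshold. All
names and data below are invented for illustration.

```python
def phrase_similarity(zh_words, en_words, lexicon):
    """Dice-style overlap between a Chinese and an English phrase.

    A Chinese word counts as matched if any of its lexicon translations
    appears in the English phrase. (Assumed measure, not the thesis's.)
    """
    if not zh_words or not en_words:
        return 0.0
    en_set = set(en_words)
    matches = sum(1 for w in zh_words if lexicon.get(w, set()) & en_set)
    return 2.0 * matches / (len(zh_words) + len(en_words))


def align_arguments(zh_args, en_args, lexicon, threshold=0.3):
    """Greedily pair role labels, most similar phrases first.

    `zh_args` and `en_args` map role labels to lists of words.
    Returns a dict from Chinese role label to English role label.
    """
    scored = sorted(
        ((phrase_similarity(zw, ew, lexicon), zr, er)
         for zr, zw in zh_args.items()
         for er, ew in en_args.items()),
        reverse=True,
    )
    frame, used_en = {}, set()
    for score, zr, er in scored:
        if score < threshold:
            break  # remaining candidates are all below the threshold
        if zr in frame or er in used_en:
            continue  # keep the alignment one-to-one
        frame[zr] = er
        used_en.add(er)
    return frame


def f_measure(predicted, gold):
    """Harmonic mean of precision and recall over (zh_role, en_role) pairs."""
    pred_pairs, gold_pairs = set(predicted.items()), set(gold.items())
    correct = len(pred_pairs & gold_pairs)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_pairs)
    recall = correct / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


# Toy example: Chinese ARG1 corresponds to English ARG2 (a mismatch).
toy_lexicon = {"他": {"he", "him"}, "书": {"book"}}
toy_zh = {"ARG0": ["他"], "ARG1": ["书"]}
toy_en = {"ARG0": ["he"], "ARG2": ["the", "book"]}
toy_gold = {"ARG0": "ARG0", "ARG1": "ARG2"}
toy_frame = align_arguments(toy_zh, toy_en, toy_lexicon)
toy_score = f_measure(toy_frame, toy_gold)
```

Because the pairing is driven by the content of the phrases rather than by
matching role labels or syntactic constituents, a sketch like this can
recover a mapping such as ARG1 to ARG2 that a direct-mapping assumption
would miss.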

Our experimental results suggest that, for integrating bilingual semantic
frames into an SMT system, ARG_ALIGN is a much better way of acquiring
such frames than SYN_ALIGN.


Date:				Tuesday, 15 January 2008

Time:				2:00 p.m. - 4:00 p.m.

Venue:				Room 3301A
				Lifts 17-18

Committee Members:		Dr. Pascale Fung (Supervisor)
				Prof. Lionel Ni (Supervisor)
				Dr. Brian Mak (Chairperson)
				Dr. Dekai Wu


**** ALL are Welcome ****