Learning Bilingual Semantic Frames
MPhil Thesis Defence

Title: "Learning Bilingual Semantic Frames"

By

Mr. Zhaojun WU

Abstract

In this thesis, we present our studies on the task of automatically learning bilingual semantic frames from a Chinese-English parallel corpus. Bilingual semantic frames, the mappings of core semantic arguments (roles) for a predicate pair in a bi-sentence, have the potential to improve the translation quality of a Statistical Machine Translation (SMT) system.

As a prerequisite, we first report our research on the subtask of Chinese Semantic Role Labeling (SRL). We present our implementation of two new state-of-the-art Chinese shallow semantic parsers, based on the Support Vector Machine (SVM) and Maximum Entropy classification techniques. We also present a full-scale feature comparison and classifier performance comparison, and propose some important new features for this subtask.

We also propose to learn bilingual semantic frames from a parallel corpus of translated sentence pairs. We first present our observations on a reference set that is manually extracted from the parallel corpus. We find that a considerable 15.73% of semantic argument mappings are not direct mappings but mismatches, which means that the core semantic argument i in Chinese is not aligned to i in English. We then present a conventional model, SYN_ALIGN, that acquires bilingual semantic frames from the results of semantic role projection based on syntactic constituent alignment. The evaluation result shows that, unfortunately, SYN_ALIGN achieves only a very modest performance (44.80% F-measure) due to its brittle assumption that all semantic arguments in one language can directly map to their syntactic counterparts in the other language.

Therefore, we propose our novel model, ARG_ALIGN, to learn bilingual semantic frames using a phrasal similarity measure of semantic roles that are automatically produced by two monolingual semantic parsers.
As a result, ARG_ALIGN surpasses SYN_ALIGN by about 25 points in F-measure and has an 86% F-measure upper bound. Our experimental results suggest that, for integrating bilingual semantic frames into an SMT system, ARG_ALIGN is a much better solution to acquire such frames.

Date: Tuesday, 15 January 2008

Time: 2:00 p.m. - 4:00 p.m.

Venue: Room 3301A, Lifts 17-18

Committee Members:
Dr. Pascale Fung (Supervisor)
Prof. Lionel Ni (Supervisor)
Dr. Brian Mak (Chairperson)
Dr. Dekai Wu

**** ALL are Welcome ****