PhD Qualifying Examination

"A Survey of Word Sense Disambiguation"

By

Mr. Weifeng Su

Abstract:

We compare, contrast, and critique recent Word Sense Disambiguation (WSD) models employing supervised, slightly supervised, and unsupervised machine learning. We first compare nine supervised learning methods and find that there is no universal best method for WSD applications; the choice of machine learning method should depend on the specific characteristics of the WSD application. We then compare slightly supervised approaches and find that bilingual bootstrapping outperforms monolingual bootstrapping in accuracy by exploiting a second-language corpus. Finally, we compare unsupervised approaches and find that a method based on Roget's thesaurus performs well on concrete nouns, whereas a method based on a second-language corpus usually performs better on verbs and adjectives. Which WSD model is best therefore depends on the type of the target word.

The field of WSD currently faces a dilemma. On one hand, although supervised WSD can achieve high disambiguation precision, it requires impractically large annotated corpora containing sense-labeled training instances. On the other hand, the precision achieved by slightly supervised and unsupervised methods is far from satisfactory. This dilemma prevents WSD from being applied in many NLP applications.

Date: Thursday, January 29, 2004
Time: 3:00 p.m. - 5:00 p.m.
Venue: Room 2302 (Lifts 17-18)

Committee Members:
Prof. Dekai Wu (Supervisor)
Prof. Fangzhen Lin (Chairperson)
Prof. Dit-Yan Yeung
Prof. Pascale Fung

**** ALL are Welcome ****