MPhil Thesis Defence

"Knowledge-based Sense Pruning using the HowNet: an Alternative to Word Sense Disambiguation"

By Mr. Chi-Yung Wang

Abstract

In this thesis, we address the problem of word sense disambiguation (WSD) in natural language processing by sense pruning, using a knowledge-based approach. Traditional WSD methods assign exactly one meaning to each word in a passage. However, we believe that textual information alone may not be sufficient to determine the exact meaning of every word; some ambiguities can only be resolved when higher-level knowledge becomes available. We therefore propose that the objective of WSD should be to reduce the number of plausible meanings of a word as much as possible through "sense pruning". After sense pruning, each word is associated with a list of plausible meanings. The goal is to keep the correct sense of each word on its list while keeping the number of possible meanings of the whole sentence as small as possible.

We applied sense pruning to Chinese WSD, making use of HowNet. HowNet is a knowledge base that describes all entities in its database by a set of unambiguous sememes. It provides information about the relationships between concepts or their attributes, in which concepts are represented by the sememes. One of our contributions is the integration of various kinds of knowledge from HowNet for sense pruning, such as relations between sememes, information structures in Chinese, relations between attributes and attribute values, and characteristics of function words. Based on HowNet, four additional databases were developed for sense pruning in this thesis.

We evaluated our sense pruning algorithm on the Sinica Corpus from Taiwan. Two criteria were used for the evaluation: the recall rate and the reduction in the number of possible meanings of a sentence. The effects of the size of the analytical window and of the analytical unit, as well as the speed of the algorithm, were studied in detail.
In summary, sense pruning achieves a recall rate of 91% while reducing the number of possible meanings of a sentence by 48% when a whole sentence is taken as the analytical unit.

Date: Monday, 14 January 2002
Time: 2:00 p.m. - 4:00 p.m.
Venue: Room 2302 (Lifts 17-18)

Committee Members:
Dr. Brian Mak (Supervisor)
Dr. Fangzhen Lin (Chairman)
Dr. James Kwok
Dr. Kok-Wee Gan

**** ALL are Welcome ****