Chinese Lexical Unit Extraction from Untagged Corpora
--------------------------------------------------------------------
                    ***Joint Seminar***

The Hong Kong University of Science & Technology
Department of Computer Science and Engineering
Department of Electronic and Computer Engineering
Human Language Technology Center
--------------------------------------------------------------------

Speaker:  Gaël Patin
          Software Engineer
          Arisem

Title:    "Chinese Lexical Unit Extraction from Untagged Corpora"

Date:     Wednesday, 29 July 2009

Time:     4:00pm-5:00pm

Venue:    Rm 2578, 2/F (via lifts 29/30), HKUST

Abstract:

Building lexical resources is a vital task in improving the efficiency of
information retrieval systems. This talk introduces a method for extracting
Chinese lexical units from untagged specialized corpora. The method is an
incremental process driven by an association score, and it takes an
unsupervised, statistically aided linguistic approach. The extraction
results, evaluated on a random sample of the working corpus, show a
precision of 52.6% and a recall of 53.7%.

Biography:

Gaël Patin is a PhD student in Natural Language Processing, currently
working on lexicon extraction in Chinese using corpora and linguistically
informed statistical models. He is currently a software engineer at Arisem,
a Thales Group company that develops text mining and text analytics
solutions. He holds a Bachelor of Science in Computer Science from
University of Paris 7 and a Master of Science in Natural Language
Processing from INALCO (National Institute of Eastern Languages and
Civilizations).