Chinese Lexical Unit Extraction from Untagged Corpora

--------------------------------------------------------------------
               ***Joint Seminar***

The Hong Kong University of Science & Technology
Department of Computer Science and Engineering
Department of Electronic and Computer Engineering
Human Language Technology Center
--------------------------------------------------------------------

Speaker:	Gaël Patin
		Software Engineer
		Arisem

Title:		"Chinese Lexical Unit Extraction from Untagged Corpora"

Date:		Wednesday, 29 July 2009

Time:		4:00pm-5:00pm

Venue:		Rm 2578, 2/F, (via lifts 29/30), HKUST

Abstract:

Building lexical resources is a vital task in improving the efficiency of
information retrieval systems. This talk introduces a Chinese lexical unit
extraction method for untagged specialized corpora. This method is based on
an incremental process driven by an association score, and it takes an
unsupervised, statistically aided linguistic approach. The extraction
results, evaluated on a random sample of the working corpus, show decent
precision and recall, at 52.6% and 53.7% respectively.

Biography:

Gaël Patin is a PhD student in Natural Language Processing working on
lexicon extraction in Chinese, exploiting corpora and linguistically
informed statistical models. He is currently a software engineer at Arisem,
a Thales Group company that develops text mining and text analytics
solutions. He holds a Bachelor of Science in Computer Science from the
University of Paris 7 and a Master of Science in Natural Language Processing
from INALCO (National Institute of Eastern Languages and Civilizations).