Bigram Latent Semantic Analysis for Unsupervised Language Model Adaptation
Speaker: Mr. Wilson TAM (PhD candidate), Carnegie Mellon University

Title: "Bigram Latent Semantic Analysis for Unsupervised Language Model Adaptation"

Date: Friday, 25 July 2008
Time: 11am - 12 noon
Venue: Room 3530 (via lifts 25/26), HKUST

Abstract:

We propose using correlated bigram LSA for unsupervised language model (LM) adaptation in automatic speech recognition. The model is trained with efficient variational EM and smoothed using the proposed fractional Kneser-Ney smoothing, which handles fractional counts. The approach scales to large training corpora by bootstrapping bigram LSA from unigram LSA. For LM adaptation, unigram and bigram LSA are integrated into the background N-gram LM via marginal adaptation and linear interpolation, respectively. Experimental results show that applying unigram and bigram LSA together yields a 6%-8% relative perplexity reduction and a 0.6% absolute character error rate (CER) reduction compared to applying unigram LSA alone on the Mandarin RT04 test set. Compared with the unadapted baseline, our approach reduces the absolute CER by 1.2%. The approach also improved performance on an Arabic speech recognition system.

******************

Biography:

Yik-Cheung (Wilson) Tam is a PhD student in the Language Technologies Institute at Carnegie Mellon University. He received an MPhil degree in computer science in 2001 and a bachelor's degree in computer engineering in 1997 from the Hong Kong University of Science and Technology. He is actively involved in the research and development of the Mandarin automatic speech recognition system for the GALE project. His research interests include automatic speech recognition, statistical machine translation and machine learning.

For enquiry, please call 2358 7008

*** All are Welcome ***
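The abstract describes combining LSA components with the background N-gram LM in two ways: unigram LSA via marginal adaptation (rescaling the background conditional probability by a ratio of unigram marginals raised to a power), and bigram LSA via linear interpolation. The following is a minimal illustrative sketch of these two combination rules, not the speaker's implementation; all probability values, the weight `lam`, and the exponent `beta` are made-up placeholders, and normalization over the vocabulary is omitted for brevity:

```python
# Illustrative sketch only -- not the authors' implementation.
# All numbers, `lam`, and `beta` are hypothetical placeholders.

def interpolate(p_background, p_adapted, lam):
    """Linear interpolation of two LM probabilities for the same event:
    lam * P_adapted + (1 - lam) * P_background."""
    return lam * p_adapted + (1.0 - lam) * p_background

def marginal_adapt(p_bg_cond, p_bg_unigram, p_lsa_unigram, beta):
    """Marginal-adaptation-style rescaling of a background conditional
    probability P_bg(w|h) by (P_lsa(w)/P_bg(w))**beta.
    The result is unnormalized; a real system renormalizes over the
    vocabulary."""
    return (p_lsa_unigram / p_bg_unigram) ** beta * p_bg_cond

# Hypothetical probabilities for one bigram event P(cat | the):
p_bg = 0.01       # background N-gram LM
p_bigram_lsa = 0.05  # bigram LSA model
lam = 0.3            # interpolation weight, tuned on held-out data

p_interp = interpolate(p_bg, p_bigram_lsa, lam)
# 0.3 * 0.05 + 0.7 * 0.01 = 0.022 (up to floating point)

# Hypothetical unigram marginals for marginal adaptation of P(w|h):
p_scaled = marginal_adapt(p_bg_cond=0.01, p_bg_unigram=0.001,
                          p_lsa_unigram=0.002, beta=0.5)
# (0.002/0.001)**0.5 * 0.01 = sqrt(2) * 0.01, before renormalization
```

In practice the interpolation weight and the marginal-adaptation exponent are tuned on held-out data, and the marginal-adaptation product must be renormalized so the conditional distribution sums to one.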