Bigram Latent Semantic Analysis for Unsupervised Language Model Adaptation

Speaker:	Mr. Wilson TAM (PhD candidate)
		Carnegie Mellon University

Title: 		"Bigram Latent Semantic Analysis for Unsupervised
		 Language Model Adaptation"

Date:		Friday, 25 July 2008

Time:		11am - 12 noon

Venue:		Room 3530 (via lifts 25/26)
		HKUST

Abstract:

We propose using correlated bigram LSA for unsupervised LM adaptation
in automatic speech recognition. The model is trained with an
efficient variational EM procedure and smoothed with our proposed
fractional Kneser-Ney smoothing, which handles fractional counts. The
approach scales to large training corpora by bootstrapping bigram LSA
from unigram LSA. For LM adaptation, unigram and bigram LSA are
integrated into the background N-gram LM via marginal adaptation and
linear interpolation, respectively. Experimental results show that
applying unigram and bigram LSA together yields a 6%-8% relative
perplexity reduction and a 0.6% absolute character error rate (CER)
reduction over applying unigram LSA alone on the Mandarin RT04 test
set. Compared with the unadapted baseline, the approach reduces the
absolute CER by 1.2%. The approach also improves performance on an
Arabic speech recognition system.
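
As background, the two integration schemes named above can be
sketched roughly as follows: marginal adaptation rescales the
background LM p(w|h) by the ratio of the LSA unigram marginal to the
background unigram marginal, raised to a tuning exponent, and the
result is then linearly interpolated with the bigram LSA probability.
The Python sketch below is only an illustration under these
assumptions; the tables, function names, and the hyperparameters beta
and lam are hypothetical, not the speaker's implementation.

    # Illustrative sketch only; all names and hyperparameters are
    # assumptions, not the speaker's actual formulation.

    def marginally_adapt(p_bg, p_bg_uni, p_lsa_uni, beta=0.5):
        """p_adapt(w|h) proportional to
        (p_lsa_uni(w) / p_bg_uni(w)) ** beta * p_bg(w|h),
        renormalized over the vocabulary for each history h."""
        adapted = {}
        for h, dist in p_bg.items():
            scaled = {w: (p_lsa_uni[w] / p_bg_uni[w]) ** beta * p
                      for w, p in dist.items()}
            z = sum(scaled.values())
            adapted[h] = {w: p / z for w, p in scaled.items()}
        return adapted

    def interpolate(p_adapt, p_lsa_bi, lam=0.8):
        """lam * p_adapt(w|h) + (1 - lam) * p_lsa_bi(w|h)."""
        return {h: {w: lam * p + (1 - lam) * p_lsa_bi[h][w]
                    for w, p in dist.items()}
                for h, dist in p_adapt.items()}

    # Toy two-word vocabulary, purely for illustration.
    p_bg_uni  = {"a": 0.6, "b": 0.4}
    p_lsa_uni = {"a": 0.3, "b": 0.7}   # topic-adapted marginal
    p_bg      = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.7, "b": 0.3}}
    p_lsa_bi  = {"a": {"a": 0.2, "b": 0.8}, "b": {"a": 0.4, "b": 0.6}}

    p_final = interpolate(marginally_adapt(p_bg, p_bg_uni, p_lsa_uni),
                          p_lsa_bi)
    print(p_final["a"])   # adapted, interpolated distribution after "a"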


******************
Biography:

Yik-Cheung (Wilson) Tam is a Ph.D. student at the Language
Technologies Institute at Carnegie Mellon University. He received an
M.Phil. degree in computer science in 2001 and a bachelor's degree in
computer engineering in 1997 from the Hong Kong University of Science
and Technology. He is actively involved in the research and
development of the Mandarin automatic speech recognition system for
the GALE project. His research interests include automatic speech
recognition, statistical machine translation, and machine learning.


		For enquiries, please call 2358 7008
			*** All are Welcome ***
----------------------------------------------------------------------