THE USE OF DISCRETE DISTRIBUTIONS WITH A VERY LARGE CODEBOOK FOR AUTOMATIC SPEECH RECOGNITION AND SPEAKER VERIFICATION

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "THE USE OF DISCRETE DISTRIBUTIONS WITH A VERY LARGE CODEBOOK FOR AUTOMATIC 
SPEECH RECOGNITION AND SPEAKER VERIFICATION"

By

Mr. Guoli Ye


Abstract

With advances in semiconductor technology and the popularity of the distributed
speech/speaker recognition paradigm (e.g., Siri on the iPhone 4S), we would
like to revisit the use of discrete models in automatic speech recognition
(ASR) and speaker verification (SV) tasks. Compared with the dominant
continuous-density models, discrete models have inherently attractive
properties: they use non-parametric output distributions, and looking up a
probability value takes only O(1) time. Furthermore, the features used in
discrete models can be encoded in fewer bits than those used in continuous
models, lowering the bandwidth requirement of a distributed speech/speaker
recognition architecture. Unfortunately, the recognition performance of
conventional discrete models is significantly worse than that of continuous
ones due to the large quantization error and the use of multiple independent
streams. In
this thesis, we propose to reduce the quantization error of a discrete system
by using a very large codebook with tens of thousands of codewords (in
conventional discrete models, the number of codewords in a codebook usually
ranges from 256 to 1024). Various issues and challenges for very large
codebook systems are addressed in the thesis, including how to robustly
estimate such a high-density model with hundreds of times more parameters,
which type of codebook should be used and how large it should be, and how to
model the stream correlations in this multiple-stream system. Experimental
evaluations on both ASR and SV tasks show the feasibility and benefits of
very large codebook discrete systems.
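
The two properties mentioned in the abstract, O(1) probability lookup and
compact feature encoding, can be illustrated with a minimal sketch. It is not
the method of the thesis: the toy 4-codeword codebook, the probability table,
and the function names are all hypothetical, and the 32768-codeword size is
only an assumed example of "tens of thousands".

```python
import math


def nearest_codeword(codebook, frame):
    """Quantize a feature frame to the index of its closest codeword.

    Uses a plain linear search with squared Euclidean distance; this is an
    illustrative choice, not the codebook type studied in the thesis.
    """
    best_index, best_dist = 0, math.inf
    for index, codeword in enumerate(codebook):
        dist = sum((c - x) ** 2 for c, x in zip(codeword, frame))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bits_per_index(codebook_size):
    """Bits needed to transmit one codeword index for a given codebook size."""
    return (codebook_size - 1).bit_length()


def discrete_log_prob(state_table, codeword_index):
    """Output log-probability of a state: a single O(1) table lookup."""
    return state_table[codeword_index]


# Hypothetical toy data: a 2-D codebook and one state's discrete distribution.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
state_log_probs = [math.log(p) for p in (0.1, 0.2, 0.3, 0.4)]

frame = (0.9, 0.2)
index = nearest_codeword(codebook, frame)        # done once per frame
log_prob = discrete_log_prob(state_log_probs, index)

# Index cost for conventional sizes (256, 1024) and an assumed large size.
index_bits = {size: bits_per_index(size) for size in (256, 1024, 32768)}
```

In this sketch only the codeword index would need to be transmitted in a
distributed setup, and its cost grows with the logarithm of the codebook
size, so even a much larger codebook adds only a few bits per index.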


Date:			Thursday, 20 December 2012

Time:			9:00am - 11:00am

Venue:			Room 3402
 			Lifts 17/18

Chairman:		Prof. Ming Sing (SOSC)

Committee Members:	Prof. Brian Mak (Supervisor)
 			Prof. James Kwok
 			Prof. Dit-Yan Yeung
 			Prof. Chi-Ying Tsui (ECE)
                        Prof. Mei-Ling Meng (Sys. Engg. & Engg. Mgmt., CUHK)


**** ALL are Welcome ****