THE USE OF DISCRETE DISTRIBUTIONS WITH A VERY LARGE CODEBOOK FOR AUTOMATIC SPEECH RECOGNITION AND SPEAKER VERIFICATION
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "THE USE OF DISCRETE DISTRIBUTIONS WITH A VERY LARGE CODEBOOK FOR AUTOMATIC SPEECH RECOGNITION AND SPEAKER VERIFICATION"

By

Mr. Guoli Ye

Abstract

With advances in semiconductor technology and the popularity of the distributed speech/speaker recognition paradigm (e.g., Siri on the iPhone 4S), we revisit the use of discrete models in automatic speech recognition (ASR) and speaker verification (SV) tasks. Compared with the dominant continuous density model, the discrete model has inherently attractive properties: it uses non-parametric output distributions, and looking up a probability value from them takes only O(1) time. Furthermore, the features used in discrete models can be encoded in fewer bits than those used in continuous models, lowering the bandwidth requirement of a distributed speech/speaker recognition architecture. Unfortunately, the recognition performance of the conventional discrete model is significantly worse than that of the continuous one, owing to its large quantization error and its use of multiple independent streams.

In this thesis, we propose to reduce the quantization error of a discrete system by using a very large codebook with tens of thousands of codewords (in a conventional discrete model, the number of codewords in a codebook usually ranges from 256 to 1024). The thesis addresses various challenges for very large codebook systems, including how to robustly estimate such a high-density model with hundreds of times more parameters, which type of codebook to use and how large it should be, and how to model the stream correlations in this multiple-stream system. Experimental evaluations on both ASR and SV tasks show the feasibility and benefits of very large codebook discrete systems.

Date: Thursday, 20 December 2012
Time: 9:00am - 11:00am
Venue: Room 3402, Lifts 17/18

Chairman: Prof. Ming Sing (SOSC)

Committee Members:
Prof. Brian Mak (Supervisor)
Prof. James Kwok
Prof. Dit-Yan Yeung
Prof. Chi-Ying Tsui (ECE)
Prof. Mei-Ling Meng (Sys. Engg. & Engg. Mgmt., CUHK)

**** ALL are Welcome ****