THE USE OF DISCRETE DISTRIBUTIONS WITH A VERY LARGE CODEBOOK FOR AUTOMATIC SPEECH RECOGNITION AND SPEAKER VERIFICATION
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "THE USE OF DISCRETE DISTRIBUTIONS WITH A VERY LARGE CODEBOOK FOR AUTOMATIC
SPEECH RECOGNITION AND SPEAKER VERIFICATION"
By
Mr. Guoli Ye
Abstract
With the advance of semiconductor technology and the popularity of the distributed
speech/speaker recognition paradigm (e.g., Siri on the iPhone 4S), we would like to
revisit the use of discrete models in automatic speech recognition (ASR) and
speaker verification (SV) tasks. Compared with the dominant continuous density
model, the discrete model has inherently attractive properties: it uses
non-parametric output distributions and takes only O(1) time to look up a
probability value. Furthermore, the features used in discrete models,
compared with those in continuous models, can be encoded in fewer bits,
lowering the bandwidth requirement in a distributed speech/speaker recognition
architecture. Unfortunately, the recognition performance of the conventional
discrete model is significantly worse than that of the continuous one due to
the large quantization error and the use of multiple independent streams. In
this thesis, we propose to reduce the quantization error of a discrete system
by using a very large codebook with tens of thousands of codewords (in a
conventional discrete model, the number of codewords in a codebook usually
ranges from 256 to 1024). Various challenges for very large codebook
systems are addressed in the thesis, including how to robustly estimate such a
high-density model with hundreds of times more parameters, which type of
codebook should be used and how large it should be, and how to model the
stream correlations in this multiple-stream system. Experimental evaluations on
both ASR and SV tasks show the feasibility and benefits of very large
codebook discrete systems.
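
As a rough illustration of the properties mentioned in the abstract, the
following minimal sketch shows how a feature frame could be mapped to a
codeword index, how a discrete output probability then becomes a single table
lookup, and how many bits one index needs. It is not the thesis's
implementation: the function names, the toy 3-codeword codebook, the
probability values, and the use of squared Euclidean distance are all
assumptions made for illustration.

```python
import math

def quantize(frame, codebook):
    """Return the index of the codeword nearest to `frame`.

    Assumes squared Euclidean distance; a real system may use another measure.
    """
    best_index, best_dist = 0, math.inf
    for index, codeword in enumerate(codebook):
        dist = sum((x - c) ** 2 for x, c in zip(frame, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def log_prob(state_table, codeword_index):
    """O(1) lookup of one state's log output probability for a codeword index."""
    return state_table[codeword_index]

def bits_per_index(codebook_size):
    """Number of bits needed to encode one codeword index."""
    return max(1, (codebook_size - 1).bit_length())

# Hypothetical toy data: 3 two-dimensional codewords and one state's table.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
state_table = [math.log(0.7), math.log(0.2), math.log(0.1)]

index = quantize((0.9, 1.2), codebook)   # nearest codeword is (1.0, 1.0)
score = log_prob(state_table, index)     # single list access, no density evaluation
```

In this sketch only the integer index would need to be transmitted in a
distributed setup; for example, bits_per_index(1024) gives 10 bits and
bits_per_index(65536) gives 16 bits per stream per frame.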
Date: Thursday, 20 December 2012
Time: 9:00am - 11:00am
Venue: Room 3402
Lifts 17/18
Chairman: Prof. Ming Sing (SOSC)
Committee Members: Prof. Brian Mak (Supervisor)
Prof. James Kwok
Prof. Dit-Yan Yeung
Prof. Chi-Ying Tsui (ECE)
Prof. Mei-Ling Meng (Sys. Engg. & Engg. Mgmt., CUHK)
**** ALL are Welcome ****