THE USE OF DISTINCT ACOUSTIC MODELING IN AUTOMATIC SPEECH RECOGNITION

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "THE USE OF DISTINCT ACOUSTIC MODELING IN AUTOMATIC SPEECH 
        RECOGNITION"

By

Mr. Yu Ting KO


Abstract

In triphone-based acoustic modeling, it is difficult to robustly model 
infrequent triphones due to their lack of training samples. Naive 
maximum-likelihood (ML) estimation of infrequent triphone models produces 
poor triphone models and eventually affects the overall performance of an 
automatic speech recognition (ASR) system. Among different techniques 
proposed to solve the infrequent triphone problem, the most widely used 
method in current ASR systems is state tying because of its effectiveness 
in reducing model size and achieving good recognition results. However, 
state tying inevitably introduces quantization errors since triphones tied 
to the same state are not distinguishable in that state. This thesis 
addresses this problem through distinct acoustic modeling, in which every 
modeling unit has a unique model and a distinct acoustic score.

The main contribution of this thesis is the formulation of the estimation 
of triphone models as an adaptation problem through our proposed distinct 
acoustic modeling framework named eigentriphone modeling. The rationale 
behind eigentriphone modeling is to derive a basis from the frequent 
triphones and then model each triphone as a point in the space 
spanned by that basis. The eigenvectors in the basis represent the most 
important context-dependent characteristics among the triphones and thus 
the infrequent triphones can be robustly modeled with few training 
samples. Furthermore, the proposed framework is very flexible and can be 
applied to other modeling units. Since grapheme-based modeling is useful 
in automatic speech recognition of under-resourced languages, we further 
apply our distinct acoustic modeling framework to estimate 
context-dependent grapheme models and we call our new method 
eigentrigrapheme modeling. Experimental evaluation of eigentriphone 
modeling was done on the Wall Street Journal word recognition task and the 
TIMIT phoneme recognition task. Experimental evaluation of 
eigentrigrapheme modeling was done on four official South African 
under-resourced languages. It is shown that distinct acoustic modeling 
consistently performs better than the most common state tying method.
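
The basis idea described in the abstract can be illustrated with a small 
sketch. It is not the estimation procedure used in the thesis: it assumes 
that each triphone is summarised by a single mean vector, derives the basis 
with plain PCA (power iteration), and places an infrequent triphone in the 
spanned space by projecting its noisy estimate and shrinking the 
coefficients according to its sample count. The function names, the toy 
vectors and the shrinkage constant tau are all hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def leading_eigenvector(cov, iters=200):
    """Power iteration for the dominant eigenvector of a symmetric matrix.

    Assumes the start vector is not orthogonal to that eigenvector.
    """
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        w = [dot(row, v) for row in cov]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # no variance left in this matrix
            break
        v = [x / norm for x in w]
    return v


def derive_basis(frequent_vectors, k):
    """Return the mean and the top-k PCA directions of the frequent units."""
    mean = mean_vector(frequent_vectors)
    centred = [[x - m for x, m in zip(vec, mean)] for vec in frequent_vectors]
    dim, n = len(mean), len(centred)
    cov = [[sum(c[i] * c[j] for c in centred) / n for j in range(dim)]
           for i in range(dim)]
    basis = []
    for _ in range(k):
        v = leading_eigenvector(cov)
        eigenvalue = dot(v, [dot(row, v) for row in cov])
        basis.append(v)
        # Deflate: remove the component just found before the next pass.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(dim)]
               for i in range(dim)]
    return mean, basis


def model_infrequent(estimate, count, mean, basis, tau=2.0):
    """Model a unit as a point in the span of the basis.

    The count / (count + tau) shrinkage is an illustrative assumption: it
    pulls units with few samples towards the mean of the frequent units.
    """
    diff = [x - m for x, m in zip(estimate, mean)]
    shrink = count / (count + tau)
    coeffs = [shrink * dot(diff, e) for e in basis]
    model = list(mean)
    for c, e in zip(coeffs, basis):
        model = [m + c * x for m, x in zip(model, e)]
    return coeffs, model


if __name__ == "__main__":
    # Toy mean vectors for four "frequent" units and one "infrequent" unit.
    frequent = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.0],
                [3.0, 3.0, 0.0], [4.0, 4.0, 0.0]]
    mean, basis = derive_basis(frequent, k=1)
    coeffs, model = model_infrequent([5.0, 3.0, 2.0], 2, mean, basis)
    print(coeffs, model)
```

Every unit keeps its own coefficients in this sketch, so no two units share 
a model, while the small number of coefficients keeps the estimate stable 
when samples are scarce.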


Date:			Friday, 11 April 2014

Time:			10:00am – 12:00noon

Venue:			Room 5505
 			Lifts 25/26

Chairman:		Prof. Yongsheng Gao (MAE)

Committee Members:	Prof. Brian Mak (Supervisor)
 			Prof. Siu-Wing Cheng
 			Prof. Raymond Wong
 			Prof. Wing-Hung Ki (ECE)
                        Prof. Tan Lee (Elec. Engg., CUHK)


**** ALL are Welcome ****