THE USE OF DISTINCT ACOUSTIC MODELING IN AUTOMATIC SPEECH RECOGNITION
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "THE USE OF DISTINCT ACOUSTIC MODELING IN AUTOMATIC SPEECH
RECOGNITION"
By
Mr. Yu Ting KO
Abstract
In triphone-based acoustic modeling, it is difficult to robustly model
infrequent triphones due to their lack of training samples. Naive
maximum-likelihood (ML) estimation of infrequent triphone models produces
poor triphone models and eventually affects the overall performance of an
automatic speech recognition (ASR) system. Among different techniques
proposed to solve the infrequent triphone problem, the most widely used
method in current ASR systems is state tying because of its effectiveness
in reducing model size and achieving good recognition results. However,
state tying inevitably introduces quantization errors since triphones tied
to the same state are not distinguishable in that state. This thesis
addresses the problem by the use of distinct acoustic modeling where every
modeling unit has a unique model and a distinct acoustic score.
The main contribution of this thesis is the formulation of the estimation
of triphone models as an adaptation problem through our proposed distinct
acoustic modeling framework named eigentriphone modeling. The rationale
behind eigentriphone modeling is that a basis is derived over the frequent
triphones and then each triphone is modeled as a point in the space
spanned by the basis. The eigenvectors in the basis represent the most
important context-dependent characteristics among the triphones and thus
the infrequent triphones can be robustly modeled with few training
samples. Furthermore, the proposed framework is very flexible and can be
applied to other modeling units. Since grapheme-based modeling is useful
in automatic speech recognition of under-resourced languages, we further
apply our distinct acoustic modeling framework to estimate
context-dependent grapheme models and we call our new method
eigentrigrapheme modeling. Experimental evaluation of eigentriphone
modeling was done on the Wall Street Journal word recognition task and the
TIMIT phoneme recognition task. Experimental evaluation of
eigentrigrapheme modeling was done on four official South African
under-resourced languages. It is shown that distinct acoustic modeling
consistently performs better than the most common state tying method.
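
The basis idea described in the abstract can be illustrated with a toy
sketch. It is not the estimation procedure used in the thesis. It assumes
each modeling unit is summarized by a small mean vector, derives a single
basis vector from the frequent units with plain PCA (computed here by power
iteration), and places an infrequent unit in that subspace by least-squares
projection of its noisy estimate. All function names and numbers are
hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def top_eigenvector(cov, iters=200):
    """Leading eigenvector of a symmetric PSD matrix by power iteration.

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    n = len(cov)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [dot(row, v) for row in cov]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def fit_basis(frequent_means):
    """Return (centre, direction) estimated from the frequent units only."""
    n = len(frequent_means)
    dim = len(frequent_means[0])
    centre = [sum(m[i] for m in frequent_means) / n for i in range(dim)]
    centred = [[m[i] - centre[i] for i in range(dim)] for m in frequent_means]
    cov = [[sum(c[i] * c[j] for c in centred) / n for j in range(dim)]
           for i in range(dim)]
    return centre, top_eigenvector(cov)


def project(noisy_mean, centre, direction):
    """Model a unit as a point on the line centre + coeff * direction."""
    diff = [x - c for x, c in zip(noisy_mean, centre)]
    coeff = dot(diff, direction)
    return [c + coeff * d for c, d in zip(centre, direction)]


# Hypothetical frequent units: their means vary along one shared direction.
base = [1.0, 0.0, -1.0]
shared = [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0]
frequent_means = [[b + a * s for b, s in zip(base, shared)]
                  for a in (-2.0, -1.0, 0.0, 1.0, 2.0)]

# Hypothetical infrequent unit: its true mean lies in the same subspace,
# but its estimate from few samples is noisy.
true_mean = [b + 1.5 * s for b, s in zip(base, shared)]
noise = [0.4, -0.2, 0.1]
noisy_mean = [t + e for t, e in zip(true_mean, noise)]

centre, direction = fit_basis(frequent_means)
estimate = project(noisy_mean, centre, direction)

print("error of noisy estimate:    ", distance(noisy_mean, true_mean))
print("error of subspace estimate: ", distance(estimate, true_mean))
```

In this toy setting the projection discards the part of the noise that lies
outside the subspace learned from the frequent units, while each unit still
keeps its own distinct coefficient rather than sharing a tied model.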
Date: Friday, 11 April 2014
Time: 10:00am – 12:00 noon
Venue: Room 5505
Lifts 25/26
Chairman: Prof. Yongsheng Gao (MAE)
Committee Members: Prof. Brian Mak (Supervisor)
Prof. Siu-Wing Cheng
Prof. Raymond Wong
Prof. Wing-Hung Ki (ECE)
Prof. Tan Lee (Elec. Engg., CUHK)
**** ALL are Welcome ****