THE USE OF DISTINCT ACOUSTIC MODELING IN AUTOMATIC SPEECH RECOGNITION
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "THE USE OF DISTINCT ACOUSTIC MODELING IN AUTOMATIC SPEECH RECOGNITION"

By

Mr. Yu Ting KO

Abstract

In triphone-based acoustic modeling, it is difficult to robustly model infrequent triphones because they lack training samples. Naive maximum-likelihood (ML) estimation of infrequent triphone models produces poor models and eventually degrades the overall performance of an automatic speech recognition (ASR) system. Among the techniques proposed to solve the infrequent triphone problem, the most widely used in current ASR systems is state tying, because it reduces model size effectively and achieves good recognition results. However, state tying inevitably introduces quantization errors, since triphones tied to the same state cannot be distinguished in that state. This thesis addresses the problem with distinct acoustic modeling, in which every modeling unit has a unique model and a distinct acoustic score. The main contribution of this thesis is to formulate the estimation of triphone models as an adaptation problem through our proposed distinct acoustic modeling framework, named eigentriphone modeling. The rationale behind eigentriphone modeling is that a basis is derived from the frequent triphones, and each triphone is then modeled as a point in the space spanned by that basis. The eigenvectors in the basis represent the most important context-dependent characteristics among the triphones. An infrequent triphone therefore only needs its coordinates in this space to be estimated, so it can be modeled robustly from few training samples. Furthermore, the proposed framework is flexible and can be applied to other modeling units.
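The geometric idea in the abstract (derive a basis from frequent triphones, then place each triphone at a point in the space that basis spans) can be illustrated with a small sketch. It rests on several assumptions that the announcement does not state. Each triphone model is assumed to be summarized as a fixed-length "supervector" (for example, its stacked Gaussian means). The basis is assumed to come from PCA over the frequent triphones, computed here with an SVD. An infrequent triphone's coordinates are fitted by ridge-regularized least squares to a noisy direct estimate. The function names `derive_eigenbasis` and `estimate_triphone` and the synthetic data are hypothetical. This sketch is not the thesis's actual estimator.

```python
# Illustrative sketch only: the supervector representation, the PCA basis,
# and the ridge-regularized fit are assumptions, not the thesis's estimator.
import numpy as np


def derive_eigenbasis(frequent_supervectors, num_eigen):
    """PCA over frequent-triphone supervectors (one row per triphone).

    Returns the mean supervector and the top `num_eigen` eigenvectors
    of the sample covariance, as orthonormal rows.
    """
    X = np.asarray(frequent_supervectors, dtype=float)
    mean = X.mean(axis=0)
    # The right singular vectors of the centered data are the eigenvectors
    # of its covariance, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:num_eigen]


def estimate_triphone(observed_supervector, mean, basis, reg=0.0):
    """Model a triphone as mean + basis.T @ w.

    w minimizes ||y - mean - basis.T @ w||^2 + reg * ||w||^2. Because the
    basis rows are orthonormal, the solution is the projection of
    (y - mean) onto the basis, shrunk by 1 / (1 + reg).
    """
    y = np.asarray(observed_supervector, dtype=float) - mean
    w = (basis @ y) / (1.0 + reg)
    return mean + basis.T @ w, w


# Hypothetical synthetic data: frequent triphones lie in a 2-D affine
# subspace of a 12-D supervector space.
rng = np.random.default_rng(0)
dim, n_frequent = 12, 50
true_mean = rng.normal(size=dim)
directions = np.linalg.qr(rng.normal(size=(dim, 2)))[0].T  # 2 orthonormal rows
frequent = true_mean + rng.normal(size=(n_frequent, 2)) @ directions

mean, basis = derive_eigenbasis(frequent, num_eigen=2)

# An "infrequent" triphone: its true supervector lies in the same subspace,
# but its direct estimate from a few samples is noisy in every dimension.
true_rare = true_mean + np.array([1.5, -0.5]) @ directions
noisy_rare = true_rare + 0.3 * rng.normal(size=dim)
estimate, weights = estimate_triphone(noisy_rare, mean, basis, reg=0.1)

print("error of direct estimate:  ", np.linalg.norm(noisy_rare - true_rare))
print("error of subspace estimate:", np.linalg.norm(estimate - true_rare))
```

For the rare triphone, only the two coordinates in `weights` are estimated, rather than all twelve supervector dimensions. That reduction is the intuition for why a handful of samples can suffice. In the demo, projecting onto the eigenspace also discards the part of the noise that falls outside it.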
Since grapheme-based modeling is useful in automatic speech recognition of under-resourced languages, we further apply our distinct acoustic modeling framework to estimate context-dependent grapheme models, and we call this new method eigentrigrapheme modeling. Eigentriphone modeling was evaluated on the Wall Street Journal word recognition task and the TIMIT phoneme recognition task. Eigentrigrapheme modeling was evaluated on four official South African under-resourced languages. The experiments show that distinct acoustic modeling consistently performs better than state tying, the most common method.

Date: Friday, 11 April 2014
Time: 10:00am – 12:00 noon
Venue: Room 5505 (Lifts 25/26)

Chairman: Prof. Yongsheng Gao (MAE)

Committee Members:
Prof. Brian Mak (Supervisor)
Prof. Siu-Wing Cheng
Prof. Raymond Wong
Prof. Wing-Hung Ki (ECE)
Prof. Tan Lee (Elec. Engg., CUHK)

**** ALL are Welcome ****