MULTI-TASK LEARNING DEEP NEURAL NETWORKS FOR AUTOMATIC SPEECH RECOGNITION
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "MULTI-TASK LEARNING DEEP NEURAL NETWORKS FOR AUTOMATIC SPEECH RECOGNITION"

By

Mr. Dongpeng CHEN

Abstract

Multi-task learning (MTL) learns multiple tasks together to improve the performance of all of them, exploiting the extra information the tasks provide to one another through a shared internal representation. Related secondary tasks act as regularizers that improve the generalization of the primary task; the effect is most prominent when the amount of training data is relatively small. Recently, deep neural networks (DNNs) have been widely adopted for acoustic modeling in automatic speech recognition (ASR), and their hidden layers are an ideal internal representation for such shared knowledge. The main contribution of this thesis is three methods of applying MTL to DNN acoustic modeling that exploit extra information from related tasks, under the guideline that the secondary tasks should not require additional language resources, which is a great benefit when language resources are limited.

In the first method, phone and grapheme acoustic models are trained together within the same deep neural network. The extra information is the phone-to-grapheme mapping, which is confirmed by analyzing and visualizing the implicit phone-to-grapheme correlation matrix computed from the model parameters. The training convergence curves also show that MTL training generalizes better to unseen data than conventional single-task learning does. Moreover, two extensions are proposed to further improve performance.

State tying relieves, to some extent, the data scarcity problem in context-dependent acoustic modeling, but it inevitably introduces quantization errors. The second MTL method in this thesis aims at robust modeling of a large set of distinct context-dependent acoustic units. More specifically, distinct triphone states are trained together with a smaller set of tied states, benefiting from a better inductive bias to reach a better optimum. In return, they embed more contextual information into the hidden layers of the MTL-DNN acoustic models.

Our last method works in a multi-lingual setting, when data from multiple languages are available. Multi-lingual acoustic modeling is improved by learning a universal phone set (UPS) modeling task together with the language-specific triphone modeling tasks, so that the phones of the various languages are implicitly mapped to one another.

The proposed MTL methods were shown to be effective on a broad range of data sets. The contributions of this thesis are the three proposed MTL methods and the heuristic guidelines we impose for finding helpful secondary tasks. With these successful explorations, we hope to stimulate more interest in applying MTL to improve ASR; our results suggest that it is promising for wider applications.

Date: Wednesday, 19 August 2015
Time: 10:30am - 12:30pm
Venue: Room 2130B, Lift 19

Chairman: Prof. Patrick Yue (ECE)
Committee Members: Prof. Brian Mak (Supervisor)
Prof. James Kwok
Prof. Raymond Wong
Prof. Chi-Ying Tsui (ECE)
Prof. Pak-Chung Ching (Elec. Engg., CUHK)

**** ALL are Welcome ****
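
For readers curious how the kind of multi-task acoustic model described in the abstract is typically structured, the short sketch below is purely illustrative and is not the thesis implementation: a feed-forward DNN whose hidden layers are shared between a primary tied-state classification head and a secondary grapheme classification head, trained with a weighted sum of cross-entropy losses. All layer sizes, target counts, and the 0.3 secondary-task weight are hypothetical placeholders.

    # Illustrative sketch only (assumed setup, not the thesis code):
    # shared hidden layers feed two task-specific softmax heads.
    import torch
    import torch.nn as nn

    class MTLAcousticDNN(nn.Module):
        def __init__(self, feat_dim=440, hidden_dim=1024, n_hidden=4,
                     n_tied_states=3000, n_graphemes=500):
            super().__init__()
            layers, dim = [], feat_dim
            for _ in range(n_hidden):
                layers += [nn.Linear(dim, hidden_dim), nn.Sigmoid()]
                dim = hidden_dim
            self.shared = nn.Sequential(*layers)            # shared internal representation
            self.state_head = nn.Linear(dim, n_tied_states) # primary task
            self.grapheme_head = nn.Linear(dim, n_graphemes)# secondary task

        def forward(self, x):
            h = self.shared(x)
            return self.state_head(h), self.grapheme_head(h)

    model = MTLAcousticDNN()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # One dummy mini-batch of spliced acoustic frames with labels for both tasks.
    feats = torch.randn(256, 440)
    state_targets = torch.randint(0, 3000, (256,))
    grapheme_targets = torch.randint(0, 500, (256,))

    state_logits, grapheme_logits = model(feats)
    loss = criterion(state_logits, state_targets) \
           + 0.3 * criterion(grapheme_logits, grapheme_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In a setup like this, only the primary head is used at decoding time; the secondary head serves as a training-time regularizer, in line with the role of secondary tasks described in the abstract.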