MPhil Thesis Proposal Defence

"Development of an Asynchronous Multi-band System for Continuous Speech Recognition"

By Mr. Yik-Cheung Tam

Abstract

Recently, multi-band automatic speech recognition (MBASR) has been proposed by Bourlard et al. and Hermansky et al. to improve robustness in noisy environments. It is motivated by the empirical findings of Harvey Fletcher of Bell Labs from a thorough study of human speech recognition (HSR), in which partial recognition is believed to take place in sub-bands, and the sub-band decisions are then recombined to arrive at a global decision. That study found that the full-band error rate is empirically equal to the product of the sub-band error rates. This implies that humans can recognize speech correctly as long as at least one sub-band is recognized correctly.

The MBASR framework proposed by Bourlard et al. and Hermansky et al. divides the full frequency band into sub-bands and builds a speech recognizer for each sub-band. During recognition, the decisions from the individual sub-band recognizers are recombined to arrive at a final decision at some phonetic or linguistic level.

In this thesis, a multi-band system is implemented with several proposed features. First, it can be used for continuous speech recognition, which is generally a basic requirement for research or real-life application deployment. Second, it allows asynchronous recombination of sub-band information at any desired unit, such as the phoneme, syllable, or word. Third, sub-band information is recombined in an optimal sense.

In this proposal, an HMM composition framework is introduced as the backbone of the multi-band system to address the sub-band asynchrony issue. In addition, continuous speech recognition can easily be realized under such a framework. Using linear recombination of sub-band log-likelihoods, the string-based minimum classification error (MCE) criterion is employed to optimize the sub-band weightings using simulated noisy speech.
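As a rough illustration of the two ideas above, the product-of-errors rule and the weighted linear recombination of sub-band log-likelihoods can be sketched in a few lines of Python. The function names, the two-band setup, and all numbers below are hypothetical and chosen only for illustration. The weights are picked by hand, not trained with MCE, and the sketch ignores asynchrony and HMM composition entirely.

```python
import math


def full_band_error(sub_band_errors):
    """Product-of-errors rule: full-band error = product of sub-band errors."""
    return math.prod(sub_band_errors)


def recombine(log_likelihoods, weights):
    """Weighted linear recombination of sub-band log-likelihoods."""
    if len(log_likelihoods) != len(weights):
        raise ValueError("need exactly one weight per sub-band")
    return sum(w * ll for w, ll in zip(weights, log_likelihoods))


def best_hypothesis(scores, weights):
    """Pick the hypothesis with the highest recombined score.

    scores maps each hypothesis to its list of sub-band log-likelihoods.
    """
    return max(scores, key=lambda hyp: recombine(scores[hyp], weights))


# Hypothetical sub-band log-likelihoods for two competing digit hypotheses.
scores = {
    "one": [-10.0, -30.0],   # band 1 favours "one"
    "nine": [-12.0, -20.0],  # band 2 favours "nine"
}

equal_choice = best_hypothesis(scores, [0.5, 0.5])
band1_choice = best_hypothesis(scores, [0.9, 0.1])
```

With equal weights the second band dominates the decision. Shifting weight towards the first band, as one would if it were the more reliable band under noise, changes the selected hypothesis.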
Our first preliminary experiment suggested that word-based MCE weight training is effective at assigning larger weights to the more reliable sub-bands, which improved performance on an isolated digit recognition task. Our second preliminary experiment suggested that the heuristic of assuming uniform transition probabilities in composite HMMs degrades performance on a connected digit recognition task when asynchrony is allowed. It is therefore crucial either to compute these transition probabilities from their sub-band counterparts or to re-train them in order to use the HMM composition framework successfully.

Date:                   Thursday, 28 December 2000
Time:                   3:00 p.m. - 5:00 p.m.
Venue:                  Room 2302
                        Lifts 17-18
Chairman:               Dr. Cunsheng Ding
Committee Members:      Dr. Brian Mak (Supervisor)
                        Dr. James Kwok
                        Dr. Man-Hung Siu

**** ALL are Welcome ****