A Speech Attribute Detection Approach to Next-Generation Speech Processing

Speaker:        Professor  Chin-Hui Lee
                School of Electrical and Computer Engineering
                Georgia Institute of Technology

Title:          "A Speech Attribute Detection Approach to Next-Generation
                 Speech Processing"

Date:           Wednesday, 7 August 2013

Time:           4:00pm - 5:00pm

Venue:          Room 2463 (via lifts 25/26), HKUST

Abstract:

The field of automatic speech recognition (ASR) has enjoyed more than 30
years of technology advancement due to the extensive utilization of the
hidden Markov model (HMM) framework and a concentrated effort by the
community to make available a vast amount of language resources. However,
the ASR problem is still far from being solved, because not all of the
information available in the speech knowledge hierarchy can be directly
and effectively integrated into state-of-the-art systems to improve ASR
performance and enhance system robustness. It is believed that some of the
current knowledge insufficiency issues can be partially addressed by
processing techniques that take advantage of the full set of acoustic and
language information in speech. On the other hand, in human speech
recognition (HSR) and spectrogram reading we often determine the
linguistic identity of a sound based on detected cues and evidence that
exist at various levels of the speech knowledge hierarchy, ranging from
acoustic phonetics to syntax and semantics. This calls for a bottom-up
knowledge integration framework that links speech processing with
information extraction by spotting speech cues with a bank of attribute
detectors, weighing and combining the acoustic evidence to form cognitive
hypotheses, and verifying these hypotheses until a consistent recognition
decision can be reached. The recently proposed ASAT (automatic speech
attribute transcription) framework is an attempt to mimic some HSR
capabilities with asynchronous speech event detection followed by
bottom-up speech knowledge integration and verification. In the last few
years it has demonstrated potential and offered insights in
detection-based speech processing and information extraction.
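
To make the detection-based pipeline above concrete, the Python sketch
below wires a small bank of per-frame attribute detectors into an evidence
merger and a simple verifier. All names, attribute sets, phone profiles
and scores in it are hypothetical placeholders chosen for illustration; it
is a minimal sketch of bottom-up detection, evidence combination and
verification, not the ASAT implementation.

import numpy as np

# Minimal, illustrative sketch (hypothetical names and toy data; not the
# ASAT implementation). Step 1: a bank of attribute detectors scores each
# frame for speech attributes such as manner and place of articulation.
# Step 2: an evidence merger combines the attribute scores into phone-level
# hypotheses. Step 3: a verifier accepts or rejects each hypothesis.

ATTRIBUTES = ["vowel", "fricative", "nasal", "stop", "voiced"]

def attribute_detectors(frames):
    """Return per-frame posterior-like scores for each attribute.

    Stand-in scores only; in practice each detector would be a classifier
    (e.g., a neural network) trained to spot one attribute.
    """
    rng = np.random.default_rng(0)
    return {a: rng.random(len(frames)) for a in ATTRIBUTES}

def merge_evidence(scores, phone_profiles):
    """Combine attribute evidence into one phone hypothesis per frame."""
    n_frames = len(next(iter(scores.values())))
    hypotheses = []
    for t in range(n_frames):
        frame_scores = np.array([scores[a][t] for a in ATTRIBUTES])
        # Score each candidate phone by how well its expected attribute
        # profile matches the detected evidence (a simple dot product).
        best_phone = max(phone_profiles,
                         key=lambda p: frame_scores @ phone_profiles[p])
        hypotheses.append(best_phone)
    return hypotheses

def verify(hypotheses, scores, threshold=0.5):
    """Keep only hypotheses whose average supporting evidence clears a bar."""
    kept = []
    for t, phone in enumerate(hypotheses):
        support = np.mean([scores[a][t] for a in ATTRIBUTES])
        kept.append(phone if support >= threshold else None)
    return kept

if __name__ == "__main__":
    # Expected attribute profile for each (toy) phone class.
    phone_profiles = {
        "aa": np.array([1, 0, 0, 0, 1]),   # voiced vowel
        "s":  np.array([0, 1, 0, 0, 0]),   # unvoiced fricative
        "m":  np.array([0, 0, 1, 0, 1]),   # voiced nasal
    }
    frames = list(range(10))               # placeholder acoustic frames
    scores = attribute_detectors(frames)
    hypotheses = merge_evidence(scores, phone_profiles)
    print(verify(hypotheses, scores))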

This presentation is intended to illustrate new possibilities for future
speech processing by linking the analysis and processing of raw speech
signals with the extraction of multiple layers of useful information. We
will also demonstrate that the same methodology used in speech attribute
detection and knowledge integration can be extended to extracting language
information from heterogeneous media signals for multimedia event
detection (MED) and multimedia event recounting (MER).


******************
Biography:

Chin-Hui Lee is a professor in the School of Electrical and Computer
Engineering at the Georgia Institute of Technology. Dr. Lee received the
B.S. degree in Electrical Engineering from National Taiwan University,
Taipei, in 1973, the M.S. degree in Engineering and Applied Science from
Yale University, New Haven, in 1977, and the Ph.D. degree in Electrical
Engineering with a minor in Statistics from the University of Washington,
Seattle, in 1981.

Dr. Lee started his professional career at Verbex Corporation, Bedford,
MA, and was involved in research on connected word recognition. In 1984,
he became affiliated with Digital Sound Corporation, Santa Barbara, where
he engaged in research and product development in speech coding, speech
synthesis, speech recognition and signal processing for the development of
the DSC-2000 Voice Server. Between 1986 and 2001, he was with Bell
Laboratories, Murray Hill, New Jersey, where he became a Distinguished
Member of Technical Staff and Director of the Dialogue Systems Research
Department. His research interests include multimedia communication,
multimedia signal and information processing, speech and speaker
recognition, speech and language modeling, spoken dialogue processing,
adaptive and discriminative learning, biometric authentication, and
information retrieval. From August 2001 to August 2002 he was a visiting
professor at the School of Computing, National University of Singapore. In
September 2002, he joined the faculty of the Georgia Institute of
Technology.

Prof. Lee has participated actively in professional societies. He is a
member of the IEEE Signal Processing Society (SPS), the IEEE Communications
Society, and the International Speech Communication Association (ISCA).
From 1991 to 1995, he was an associate editor for the IEEE Transactions on
Signal Processing and the IEEE Transactions on Speech and Audio Processing.
During the same period, he served as a member of the ARPA Spoken Language
Coordination Committee. From 1995 to 1998 he was a member of the SPS Speech
Processing Technical Committee, serving as its chairman from 1997 to 1998.
In 1996, he helped promote the SPS Multimedia Signal Processing Technical
Committee, of which he is a founding member.

Dr. Lee is a Fellow of the IEEE, has published 400 papers, and holds 30
patents. He received the SPS Senior Award in 1994 and the SPS Best Paper
Award in 1997 and 1999. In 1997, he was awarded the
prestigious Bell Labs President's Gold Award for his contributions to the
Lucent Speech Processing Solutions product. Dr. Lee often gives seminal
lectures to a wide international audience. In 2000, he was named one of
the six Distinguished Lecturers by the IEEE Signal Processing Society. He
was also named one of ISCA's two inaugural Distinguished Lecturers for
2007-2008. He won the SPS 2006 Technical Achievement Award for
"Exceptional Contributions to the Field of Automatic Speech Recognition".
In 2012 he was invited to give a plenary talk at ICASSP on the future of
automatic speech recognition. He was selected as an ISCA Fellow in 2012,
and awarded the 2012 ISCA Medal for Scientific Achievement for "pioneering
and seminal contributions to the principles and practice of automatic
speech and speaker recognition, including fundamental innovations in
adaptive learning, discriminative training and utterance verification".