From Vectors Representing Speech to Graphs Representing Corpora: Reconciling how far we've come with how far we still have to go
----------------------------------------------------------------------
                            Joint Seminar
----------------------------------------------------------------------
The Hong Kong University of Science & Technology

Department of Computer Science and Engineering
Human Language Technology Center
----------------------------------------------------------------------

Speaker:        Stephen Shum
                Electrical Engineering and Computer Science (EECS)
                MIT

Title:          "From Vectors Representing Speech to Graphs Representing
                Corpora: Reconciling how far we've come with how far we
                still have to go"

Date:           Friday, 29 November 2013

Time:           2:00pm - 3:00pm

Venue:          Room 1504 (near lifts 25 & 26), HKUST

Abstract:

In recent years, the state of the art in speaker and language
recognition has relied on converting a variable-length speech signal
into a fixed-dimensional vector-based representation, to which standard
machine learning techniques can then be applied as desired. In this
talk, we begin by highlighting the key ideas behind this approach and
demonstrating, through the use of graph-based visualizations, the effect
of channel compensation on these vector-based representations for
speaker and language recognition. We then describe how graph embedding
can also be used for large-scale community detection on speaker content
graphs and, furthermore, how these community detection algorithms can be
applied in an unsupervised fashion to compensate for a domain mismatch
in speaker recognition. Time permitting, we conclude with an overview of
how vector-based representations of speech can also be successfully
applied to extremely short speech segments in the problem of speaker
diarization.

******************
Biography:

Stephen Shum is a Ph.D. candidate in Electrical Engineering and Computer
Science (EECS) at the Massachusetts Institute of Technology (MIT). He
obtained his S.M. in June 2011 as a member of the Spoken Language
Systems (SLS) group and continues to be advised by Drs. Jim Glass and
Najim Dehak. Prior to attending MIT, Stephen obtained his B.S. in EECS
at the University of California, Berkeley, in 2009. He is fortunate to
have had the opportunity to participate in a number of speech research
workshops, including the JHU CLSP Summer Workshop in 2008 and, most
recently, the JHU HLTCOE SCALE Workshop in 2013. Over the years, his
research interests have grown to include not only speaker recognition,
clustering, and diarization, but also computational auditory scene
analysis, cover song detection, and Bayesian nonparametric approaches to
speech and speaker modeling.