From Vectors Representing Speech to Graphs Representing Corpora: Reconciling how far we've come with how far we still have to go

----------------------------------------------------------------------
                Joint Seminar
----------------------------------------------------------------------
The Hong Kong University of Science & Technology
Department of Computer Science and Engineering
Human Language Technology Center
----------------------------------------------------------------------
Speaker:        Stephen Shum
                Electrical Engineering and Computer Science (EECS)
                MIT

Title:          "From Vectors Representing Speech to Graphs Representing
                 Corpora: Reconciling how far we've come with how far we
                 still have to go"

Date:           Friday, 29 November 2013

Time:           2:00pm - 3:00pm

Venue:          Room 1504 (near lifts 25 & 26), HKUST

Abstract:

In recent years, the state of the art in speaker and language recognition
has relied on the effectiveness of converting a variable-length speech
signal into a fixed-dimensional vector-based representation, upon which
standard machine learning techniques can be effectively applied as
desired. In this talk, we begin by highlighting the key ideas behind this
approach and demonstrating, through the use of graph-based visualizations, the
effect of channel compensation on these vector-based representations for
speaker and language recognition. We then describe how graph embedding can
also be used for large-scale community detection on speaker content graphs
and, furthermore, how said community detection algorithms can be applied
in an unsupervised fashion to compensate for a domain mismatch in speaker
recognition. Time permitting, we conclude with an overview of how
vector-based representations of speech can also be successfully applied to
extremely short speech segments in the problem of speaker diarization.


******************
Biography:

Stephen Shum is a Ph.D. candidate in Electrical Engineering and Computer
Science (EECS) at the Massachusetts Institute of Technology (MIT).  He
obtained his S.M. in June 2011 as a member of the Spoken Language Systems
(SLS) group and continues to be advised by Drs. Jim Glass and Najim Dehak.
Prior to attending MIT, Stephen obtained his B.S. in EECS at the
University of California, Berkeley, in 2009. He is fortunate to have been
afforded the opportunity to participate in a number of speech research
workshops, including the JHU CLSP Summer Workshop in 2008 and, most
recently, the JHU HLTCOE SCALE Workshop in 2013. Over the years, his
research interests have grown to include not only speaker
recognition, clustering, and diarization, but also computational auditory
scene analysis, cover song detection, and Bayesian nonparametric
approaches to speech and speaker modeling.