Efficient Algorithms for Genome-Wide Computing

Speaker:	Xiang ZHANG
		Department of Computer Science
		University of North Carolina at Chapel Hill
		USA

Title:		"Efficient Algorithms for Genome-Wide Computing"

Date:		Monday, 29 November 2010

Time:		4:00pm - 5:00pm

Venue:		Lecture Theater F (near lifts 25/26), HKUST

Abstract:

Advanced experimental methods have rendered feasible cost-effective and
high-throughput data collecting in the whole genomes of human and other
model organisms. Genome-wide genotype and gene expression data are now
available for dissecting the complex biological processes. Making sense
out of the flood of data, however, is computationally and statistically
challenging.

In this talk, I will first introduce algorithms that enable exhaustive
genome-wide epistasis (gene-gene interaction) detection. Due to the
enormous search space, epistasis detection in the genome-wide scale was
previously considered intractable and commonly addressed using heuristics
without guarantees about the solution quality.  I will show that by
indexing the genotypes and utilizing upper bounds of the test statistics,
we can dramatically prune the search space and reduce redundant
computation. In addition to handling specific tests, our algorithms can be
applied to a various problem settings by utilizing convexity, a common
property of many widely used statistics in genome-wide association study.

In the second part of the talk, I will present algorithms for finding
local correlations in high dimensional data. In many applications, we are
interested in local latent patterns held by feature subsets, which are
invisible via global feature projection methods. We have formalized such
local latent patterns as strongly correlated feature subsets. Due to the
combinatorial nature of the problem and lack of monotonicity of the
correlation measurement, it is prohibitively expensive to exhaustively
explore the whole search space. Our algorithms utilize spectrum properties
and effective heuristics to prune the search space, and are capable of
identifying local correlations hidden in gene expression and other
real-life high-dimensional data.

***********************
Biography:

Xiang Zhang is a Ph.D. Candidate in the Department of Computer Science at
the University of North Carolina at Chapel Hill advised by Wei Wang. His
research focuses on bioinformatics, data mining, and machine learning. For
his research, he has won a best student paper award at ICDE 2008 and a
best research paper award at SIGKDD 2008.  He is a recipient of a
Microsoft Research Ph.D. Fellowship in 2009-2011.