More about HKUST
Efficient Algorithms for Genome-Wide Computing
Speaker: Xiang ZHANG Department of Computer Science University of North Carolina at Chapel Hill USA Title: "Efficient Algorithms for Genome-Wide Computing" Date: Monday, 29 November 2010 Time: 4:00pm - 5:00pm Venue: Lecture Theater F (near lifts 25/26), HKUST Abstract: Advanced experimental methods have rendered feasible cost-effective and high-throughput data collecting in the whole genomes of human and other model organisms. Genome-wide genotype and gene expression data are now available for dissecting the complex biological processes. Making sense out of the flood of data, however, is computationally and statistically challenging. In this talk, I will first introduce algorithms that enable exhaustive genome-wide epistasis (gene-gene interaction) detection. Due to the enormous search space, epistasis detection in the genome-wide scale was previously considered intractable and commonly addressed using heuristics without guarantees about the solution quality. I will show that by indexing the genotypes and utilizing upper bounds of the test statistics, we can dramatically prune the search space and reduce redundant computation. In addition to handling specific tests, our algorithms can be applied to a various problem settings by utilizing convexity, a common property of many widely used statistics in genome-wide association study. In the second part of the talk, I will present algorithms for finding local correlations in high dimensional data. In many applications, we are interested in local latent patterns held by feature subsets, which are invisible via global feature projection methods. We have formalized such local latent patterns as strongly correlated feature subsets. Due to the combinatorial nature of the problem and lack of monotonicity of the correlation measurement, it is prohibitively expensive to exhaustively explore the whole search space. Our algorithms utilize spectrum properties and effective heuristics to prune the search space, and are capable of identifying local correlations hidden in gene expression and other real-life high-dimensional data. *********************** Biography: Xiang Zhang is a Ph.D. Candidate in the Department of Computer Science at the University of North Carolina at Chapel Hill advised by Wei Wang. His research focuses on bioinformatics, data mining, and machine learning. For his research, he has won a best student paper award at ICDE 2008 and a best research paper award at SIGKDD 2008. He is a recipient of a Microsoft Research Ph.D. Fellowship in 2009-2011.