Parallelizing De Novo Assembly with Heterogeneous Processors

PhD Thesis Proposal Defence


Title: "Parallelizing De Novo Assembly with Heterogeneous Processors"

by

Miss Shuang QIU


Abstract:

De novo assemblers construct genome sequences from short DNA fragments 
without using a reference genome. Many of them represent the fragments 
in a de Bruijn graph and traverse the graph to generate the sequence. 
Because constructing and traversing a large de Bruijn graph is both 
time- and memory-consuming, we develop ParaGraph, a parallel software 
package that performs this process on a cluster of GPU-equipped computers. 
In particular, it utilizes all processor cores in each CPU and GPU, all 
CPUs and GPUs in a computer node, and all nodes of the cluster. 
Furthermore, we analyze the characteristics of genome data to design a 
concurrent hashing algorithm for graph construction and to reduce the 
communication overhead of graph traversal. We also improve overall 
performance by partitioning the data and storing it in a compact format, 
pipelining data transfer with computation, and overlapping computation 
with communication. Our experiments on real-world datasets show that 
ParaGraph is an order of magnitude faster than state-of-the-art 
shared-memory assemblers and more than five times faster than existing 
distributed assemblers.


Date:               Wednesday, 29 August 2018

Time:               10:00am - 12:00pm

Venue:              Room 3494
                    (lifts 25/26)

Committee Members:  Dr. Qiong Luo (Supervisor)
                    Dr. Wilfred Ng (Chairperson)
                    Dr. Ke Yi
                    Prof. Weichuan Yu (ECE)


**** All are welcome ****