GALLOP: GPU Acceleration for Genomics Applications
Graphics processors, or GPUs, have made high-performance computing inexpensive and widely accessible by packing hundreds of identical computing cores onto a single chip. With this massive parallel processing power, GPUs have made their way into genomics applications, both through academic research and through proprietary commercial solutions. Nevertheless, there has been no large-scale, open, and systematic study of accelerating state-of-the-art genomic computing algorithms on the GPU. We therefore propose Gallop, an open-source software package of new GPU-accelerated algorithms for genomics applications.
Specifically, we are interested in four major computational tasks on genome data: (1) genome assembly, in which short reads from an unknown DNA sequence are pieced together into a complete sequence; (2) sequence alignment, in which short reads are aligned to a reference sequence; (3) SNP (single-nucleotide polymorphism) detection, which identifies single-nucleotide variations between the aligned reads and the reference sequence; and (4) genome-wide association studies (GWAS), which examine the genomes of many individuals of a species to find genetic variants associated with a trait. For each task, we first select the leading computational models by effectiveness and popularity and optimize their CPU-based algorithms. We then design a new GPU-based parallel algorithm that provides the same interface and functionality as the original CPU-based algorithm. Finally, we optimize memory access and disk I/O, and schedule the CPU, the GPU, and I/O holistically for efficient co-processing.
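To make the assembly task more concrete: the de Bruijn graph approach, which several of the publications listed below parallelize on GPUs and heterogeneous processors, splits reads into k-mers and treats each distinct k-mer as an edge from its (k-1)-base prefix to its (k-1)-base suffix. A contig can then be read off by walking an unbranched path through the graph. The Python code below is a minimal serial sketch under simplifying assumptions (error-free reads, a single strand, no repeats of length k-1 or longer); it is not Gallop's GPU algorithm, and the function names build_de_bruijn and walk_unitig are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Hypothetical helper: build a de Bruijn graph from error-free reads.

    Nodes are (k-1)-mers; every distinct k-mer found in the reads becomes one
    edge from its (k-1)-base prefix to its (k-1)-base suffix.
    Reverse complements are ignored in this sketch.
    """
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])

    graph = defaultdict(set)     # node -> set of successor nodes
    indegree = defaultdict(int)  # node -> number of incoming edges
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].add(suffix)
        indegree[suffix] += 1
    return graph, indegree


def walk_unitig(graph, indegree):
    """Hypothetical helper: spell the contig along the unbranched path that
    starts at the unique node with no incoming edge."""
    starts = [node for node in graph if indegree[node] == 0]
    if len(starts) != 1:
        raise ValueError("sketch expects exactly one start node")
    node = starts[0]
    contig, visited = node, {node}
    # Extend while the path is unambiguous: one way out, one way in.
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited or indegree[nxt] != 1:
            break
        contig += nxt[-1]  # consecutive nodes overlap by k-2 bases
        visited.add(nxt)
        node = nxt
    return contig


reads = ["ATGGCGT", "GCGTGCA", "TGCAATG"]
print(walk_unitig(*build_de_bruijn(reads, k=5)))  # ATGGCGTGCAATG
```

The k-mer extraction loop is the naturally data-parallel step, since every read and every offset within it can be processed independently, whereas the path walk is inherently sequential along each path. A practical assembler must also handle both DNA strands (one of the publications below uses a bidirected de Bruijn graph), sequencing errors, and repeats, none of which this sketch attempts.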
Publications
- Zonghao Feng and Qiong Luo. Accelerating Sequence-to-Graph Alignment on Heterogeneous Processors. ICPP 2021, pages 26:1-26:10.
- Zonghao Feng, Shuang Qiu, Lipeng Wang, and Qiong Luo. Accelerating Long Read Alignment on Three Processors. ICPP 2019, pages 71:1-71:10.
- Shuang Qiu, Zonghao Feng, and Qiong Luo. Parallelizing Big De Bruijn Graph Traversal for Genome Assembly on GPU Clusters. DASFAA 2019, vol. 3, pages 466-470.
- Shuang Qiu and Qiong Luo. Parallelizing Big De Bruijn Graph Construction on Heterogeneous Processors. ICDCS 2017, Atlanta, GA, USA, June 2017.
- Mian Lu and Qiong Luo. Accelerating Large-Scale Genome-Wide Association Studies with Graphics Processors. In Big Data Management, Technologies, and Applications, Wen-Chen Hu and Naima Kaabouch (Editors), pages 349-380. IGI Global, 2013. DOI: 10.4018/978-1-4666-4699-5.ch014
- Mian Lu, Qiong Luo, Bingqiang Wang, Junkai Wu, and Jiuxin Zhao. GPU-Accelerated Bidirected De Bruijn Graph Construction for Genome Assembly. APWeb 2013, pages 51-62.
- Mian Lu, Yuwei Tan, Ge Bai, and Qiong Luo. High-Performance Short Sequence Alignment with GPU Acceleration. Distributed and Parallel Databases 30(5-6): 385-399, 2012.
- Mian Lu, Jiuxin Zhao, Qiong Luo, and Bingqiang Wang. Accelerating Minor Allele Frequency Computation with Graphics Processors. BigMine 2012, pages 85-92.
- Mian Lu, Yuwei Tan, Jiuxin Zhao, Ge Bai, and Qiong Luo. Integrating GPU-Accelerated Sequence Alignment and SNP Detection for Genome Resequencing Analysis. The 24th International Conference on Scientific and Statistical Database Management (SSDBM 2012), Chania, Crete, Greece, June 2012.
- Mian Lu, Jiuxin Zhao, Qiong Luo, Bingqiang Wang, Shaohua Fu, and Zhe Lin. GSNP: A DNA Single-Nucleotide Polymorphism Detection System with GPU Acceleration. The 40th Annual International Conference on Parallel Processing (ICPP 2011), Taiwan, September 2011.
Software
Software License
The license is a free, non-exclusive, non-transferable license to reproduce, use, modify, and display the source code version of the Software, with or without modifications, solely for non-commercial research, educational, or evaluation purposes. The license does not entitle Licensee to technical support, telephone assistance, enhancements, or updates to the Software. All rights, title to, and ownership interest in the Software, including all intellectual property rights therein, shall remain with HKUST.
Acknowledgement
We thank our collaborator BGI Shenzhen for providing us with application requirements and access to their software and data sets, and for sharing their hardware resources and genomics domain knowledge. Funding for this project is provided by grants 616012 and 617509 from the Hong Kong Research Grants Council and grant MRA11EG01 from Microsoft SQL Server China R&D.