::abstract

Parallel Assembly of Large Genomes from Paired Short Reads

Srinivas Aluru (Iowa State University)

High-throughput short read DNA sequencers are enabling inexpensive sampling of genomes at high coverage. Assembling such short reads to discover hitherto unsequenced organisms is an important challenge in computational biology. The need for memory-intensive graph-based models for accurate assembly, coupled with the much larger number of reads and higher coverage needed for short reads, prevents many assemblers from scaling beyond bacterial genomes. In this talk, I will present our parallel short read assembly framework, which can assemble large genomes from high coverage sampling of paired short reads with approximate distance constraints. We use bidirected graph models and have developed parallel algorithms to carry out the memory-intensive phases of the assembly using the large distributed memory available on parallel systems. Our framework can handle reads of multiple sizes and multiple types of distance constraints. I will demonstrate the applicability of this work in genome sequencing projects and comment on future directions.

Bio: Srinivas Aluru is the Mehl Professor of Computer Engineering at Iowa State University, and the Bajaj Group Chair Professor of Computer Science and Engineering at IIT Bombay. Earlier, he served as Chair of Iowa State's Bioinformatics and Computational Biology program. Aluru conducts research in high performance computing, bioinformatics and systems biology, combinatorial scientific computing, and applied algorithms. He is a recipient of the NSF CAREER award, the Swarnajayanti fellowship from the Government of India, an IBM faculty award, the Iowa State University Foundation award for mid-career achievement in research, two best paper awards (IPDPS 2006 and CSB 2005), and two best paper finalist recognitions (SC 2007 and SC 2002). He co-chairs an annual workshop on High Performance Computational Biology and edited a comprehensive handbook on computational molecular biology.