Parametric Complexity of Sequence Assembly: Theory and Applications to Next Generation Sequencing

TitleParametric Complexity of Sequence Assembly: Theory and Applications to Next Generation Sequencing
Publication TypeJournal Articles
Year of Publication2009
AuthorsNagarajan N, Pop M.
JournalJournal of Computational BiologyJournal of Computational Biology
Volume16
Type of Article10.1089/cmb.2009.0005
ISBN Number1066-5277, 1557-8666
Abstract

In recent years, a flurry of new DNA sequencing technologies have altered the landscape of genomics, providing a vast amount of sequence information at a fraction of the costs that were previously feasible. The task of assembling these sequences into a genome has, however, still remained an algorithmic challenge that is in practice answered by heuristic solutions. In order to design better assembly algorithms and exploit the characteristics of sequence data from new technologies, we need an improved understanding of the parametric complexity of the assembly problem. In this article, we provide a first theoretical study in this direction, exploring the connections between repeat complexity, read lengths, overlap lengths and coverage in determining the “hard” instances of the assembly problem. Our work suggests at least two ways in which existing assemblers can be extended in a rigorous fashion, in addition to delineating directions for future theoretical investigations.