Genome Assembly and Analysis with Optical Restriction Maps
Optical Mapping Data as a Guide for Genome Assembly
Genome assembly, the task of reconstructing a genome from the short DNA fragments that modern technologies can sequence, is a difficult computational problem, in large part because shotgun sequencing does not preserve the long-range structure of the genome being assembled. Optical mapping is a genomic technology, pioneered by David Schwartz, that maps the locations of restriction sites along a chromosome. It thus provides a sparse, long-range representation of a genome that complements the high-resolution but localized information provided by sequencing reads.
In our project we plan to use the complementary strengths of mapping and sequencing data to improve genome assembly. In brief, genome assemblers attempt to find, in an appropriately defined graph, the path that represents the correct genome sequence. Identifying this path is difficult because the number of paths through the graph that are consistent with the information contained in the reads can be exponential. Optical mapping data can rule out many of the incorrect options, allowing us to identify the correct path more quickly and accurately.
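To make the idea of pruning paths concrete, here is a minimal Python sketch, not the project's actual method. It assumes each candidate path has already been spelled out as a DNA string, that the enzyme cuts at the start of its recognition site, and that the optical map is an ordered list of fragment lengths covering the whole sequence with no missing or spurious cuts. The function names (`in_silico_fragments`, `consistent_with_map`, `filter_paths`) and the relative `tolerance` parameter are hypothetical, introduced only for illustration.

```python
from typing import List


def in_silico_fragments(sequence: str, site: str) -> List[int]:
    """Fragment lengths obtained by cutting `sequence` at each occurrence of `site`.

    Simplifying assumption: the cut is placed at the first base of the site.
    """
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = sequence.find(site, pos + 1)
    boundaries = [0] + cuts + [len(sequence)]
    # Skip zero-length fragments (e.g. a site at the very start of the sequence).
    return [b - a for a, b in zip(boundaries, boundaries[1:]) if b > a]


def consistent_with_map(fragments: List[int], optical_map: List[float],
                        tolerance: float = 0.1) -> bool:
    """True if every predicted fragment is within `tolerance` (relative) of the map.

    Assumption: fragments and map are compared end to end, in the same
    orientation, with the same number of fragments.
    """
    if len(fragments) != len(optical_map):
        return False
    return all(abs(f - m) <= tolerance * m for f, m in zip(fragments, optical_map))


def filter_paths(candidates: List[str], site: str, optical_map: List[float],
                 tolerance: float = 0.1) -> List[str]:
    """Keep only candidate sequences whose predicted restriction map fits the optical map."""
    return [seq for seq in candidates
            if consistent_with_map(in_silico_fragments(seq, site), optical_map, tolerance)]


if __name__ == "__main__":
    site = "GAATTC"  # EcoRI recognition sequence
    path_a = "AAAA" + site + "CCCCCCCC" + site + "TT"
    path_b = "AAAAAAAA" + site + "CCCC" + site + "TT"
    observed_map = [4, 14, 8]  # toy optical map, in bases
    survivors = filter_paths([path_a, path_b], site, observed_map)
    print(len(survivors), "of 2 candidate paths remain")
```

Both toy candidates have the same length and contain the same sites, so length alone cannot separate them; only the order and spacing of the restriction sites, which is what the map records, tells them apart. Real optical maps carry sizing error, missed cuts, and false cuts, so a practical comparison would need an alignment that tolerates those errors rather than this exact end-to-end check.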
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Principal Investigators
Students and Postdoctoral researchers:
This is an NSF project. See more here