Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies.

Title: Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies.
Publication Type: Journal Articles
Year of Publication: 2003
Authors: Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, Salzberg SL, White O
Journal: Nucleic Acids Res
Volume: 31
Issue: 19
Pagination: 5654-66
Date Published: 2003 Oct 1
ISSN: 1362-4962
Keywords: Algorithms; Alternative Splicing; Arabidopsis; DNA, Complementary; Expressed Sequence Tags; Genome, Plant; Introns; Plant Proteins; RNA, Plant; Sequence Alignment; Software; Transcription, Genetic; Untranslated Regions
Abstract

The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full-length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and >1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the approximately 27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.

Alternate Journal: Nucleic Acids Res.
PubMed ID: 14500829
PubMed Central ID: PMC206470
Grant List: R01-LM06845-04 / LM / NLM NIH HHS / United States