A Divide-and-Conquer Algorithm for Large-scale de novo Transcriptome Assembly Through Combining Small Assemblies From Existing Algorithms

NCJ Number: 253425
Journal: BMC Genomics, Volume: 18, Dated: 2018, Pages: 43-50
Date Published: 2018
Length: 8 pages
Annotation
This project developed a divide-and-conquer strategy that enables memory-intensive de novo transcriptome assembly algorithms to be used on large RNA-Seq data sets by subdividing a large data set into small libraries.
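As an illustration of the subdivision step only, the Python sketch below splits a read set into consecutive libraries of at most `max_reads` reads each. The function name `split_into_libraries`, the fixed chunk size, and the treatment of reads as plain single-end strings are assumptions made for this example; the record does not describe how the data set is actually partitioned.

```python
from typing import List, Sequence


def split_into_libraries(reads: Sequence[str], max_reads: int) -> List[List[str]]:
    """Partition reads into consecutive chunks of at most max_reads reads.

    Hypothetical helper: max_reads stands in for whatever library size a
    memory-intensive assembler can handle; it is not a value from the paper.
    """
    if max_reads <= 0:
        raise ValueError("max_reads must be positive")
    # Slice the read list into fixed-size windows; the last one may be shorter.
    return [list(reads[i:i + max_reads]) for i in range(0, len(reads), max_reads)]


reads = ["ACGT", "CGTA", "GTAC", "TACG", "ACGA"]
libraries = split_into_libraries(reads, max_reads=2)
print(libraries)  # [['ACGT', 'CGTA'], ['GTAC', 'TACG'], ['ACGA']]
```

For paired-end data, mates would have to be kept in the same library, which this sketch does not handle.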
Abstract

While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing number of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although some algorithms can assemble a large amount of RNA-Seq data, others are very memory-intensive and can only be used to construct small assemblies. The proposed strategy subdivides a large RNA-Seq data set into small libraries; each library is assembled independently by an existing algorithm, and a merging algorithm combines these assemblies by picking a subset of high-quality transcripts to form a large transcriptome. When compared with existing algorithms that return a single assembly directly, this strategy achieves accuracy comparable to or higher than that of memory-efficient algorithms that can process a large amount of RNA-Seq data, and accuracy comparable to or lower than that of memory-intensive algorithms that can only construct small assemblies. This divide-and-conquer strategy enables memory-intensive de novo transcriptome assembly algorithms to be used to construct large assemblies. (publisher abstract modified)
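The assemble-then-merge workflow described in the abstract can be sketched in Python as follows, under clearly stated assumptions. The assembler is passed in as a black-box function (`assemble`) that returns a mapping from transcript sequence to a quality score. The merge shown here (rank pooled transcripts by score, skip any candidate contained in an already-selected transcript, stop at `max_transcripts`) is a hypothetical stand-in, not the paper's selection criterion, which this record does not specify. `divide_and_conquer_assemble` and `toy_assemble` are illustrative names only.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical assembly format: transcript sequence -> quality score.
Assembly = Dict[str, float]


def divide_and_conquer_assemble(
    libraries: Sequence[Sequence[str]],
    assemble: Callable[[Sequence[str]], Assembly],
    max_transcripts: int,
) -> List[str]:
    # Step 1: assemble each small library independently.
    assemblies = [assemble(lib) for lib in libraries]

    # Step 2: pool candidates, keeping the best score seen for each sequence.
    pooled: Dict[str, float] = {}
    for asm in assemblies:
        for seq, score in asm.items():
            pooled[seq] = max(score, pooled.get(seq, float("-inf")))

    # Step 3 (assumed criterion): greedily take the highest-scoring transcripts,
    # skipping any candidate that is a substring of one already selected.
    selected: List[str] = []
    for seq in sorted(pooled, key=lambda s: (-pooled[s], -len(s), s)):
        if len(selected) >= max_transcripts:
            break
        if any(seq in kept for kept in selected):
            continue
        selected.append(seq)
    return selected


def toy_assemble(library: Sequence[str]) -> Assembly:
    # Stand-in for a real assembler: treats each read as a "transcript"
    # and scores it by its length. Purely illustrative.
    return {read: float(len(read)) for read in library}


libraries = [["ACGTACGT", "CGTA"], ["ACGTACGT", "TTTTGG"], ["GGCC"]]
print(divide_and_conquer_assemble(libraries, toy_assemble, max_transcripts=3))
# ['ACGTACGT', 'TTTTGG', 'GGCC']
```

Here "CGTA" is dropped because it is contained in the higher-scoring "ACGTACGT". Passing the assembler in as a function mirrors the record's point that each library is handled by an existing algorithm, so any assembler could in principle be plugged in.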

Date Published: January 1, 2018