Statistics, Department of

The R Journal
Date of this Version
8-2016
Document Type
Article
Citation
The R Journal (August 2016) 8(1); Editor: Michael Lawrence
Abstract
In recent years, the cost of DNA sequencing has decreased at a rate that has outpaced improvements in memory capacity. It is now common to collect or have access to many gigabytes of biological sequences. This has created an urgent need for approaches that analyze sequences in subsets without requiring all of the sequences to be loaded into memory at one time. It has also opened opportunities to improve the organization and accessibility of information acquired in sequencing projects. The DECIPHER package offers solutions to these problems by assisting in the curation of large sets of biological sequences stored in a compressed format inside a database. This approach has many practical advantages over standard bioinformatics workflows, and enables large analyses that would otherwise be prohibitively time-consuming.
Included in
Numerical Analysis and Scientific Computing Commons, Programming Languages and Compilers Commons
Comments
Copyright 2016, The R Foundation. Open access material. License: CC BY 3.0 Unported