Statistics, Department of

 

The R Journal

Date of this Version

8-2016

Document Type

Article

Citation

The R Journal (August 2016) 8(1); Editor: Michael Lawrence

Comments

Copyright 2016, The R Foundation. Open access material. License: CC BY 3.0 Unported

Abstract

In recent years, the cost of DNA sequencing has decreased at a rate that has outpaced improvements in memory capacity. It is now common to collect or have access to many gigabytes of biological sequences. This has created an urgent need for approaches that analyze sequences in subsets, without requiring all of the sequences to be loaded into memory at once. It has also opened opportunities to improve the organization and accessibility of information acquired in sequencing projects. The DECIPHER package addresses both problems by assisting in the curation of large sets of biological sequences stored in compressed format inside a database. This approach has many practical advantages over standard bioinformatics workflows, and it enables large analyses that would otherwise be prohibitively time-consuming.
