Statistics, Department of

The R Journal

Date of this Version

6-2017

Document Type

Article

Citation

The R Journal (June 2017) 9(1); Editor: Roger Bivand

Comments

Copyright 2017, The R Foundation. Open access material. License: CC BY 4.0 International

Abstract

The iotools package provides a set of tools for input- and output-intensive data processing in R. The functions chunk.apply and read.chunk iteratively load contiguous blocks of data into memory as raw vectors. These raw vectors can then be efficiently converted into matrices and data frames with the iotools functions mstrsplit and dstrsplit. These functions minimize copying of data and avoid the use of intermediate strings in order to drastically improve performance. Finally, we also provide read.csv.raw, which lets users read an entire dataset into memory with the same efficient parsing code. In this paper, we present these functions through a set of examples, with an emphasis on the flexibility provided by chunk-wise operations. We provide benchmarks comparing the speed of read.csv.raw to data loading functions provided in base R and other contributed packages.
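
As a rough illustration of the workflow the abstract describes, the following minimal sketch exercises the named functions on a tiny file. The temporary file, its contents, the column types, and the variable names are assumptions made up for this example and do not come from the paper; the sketch assumes the iotools package is installed.

```r
library(iotools)

# Illustrative data only: a small headerless CSV in a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,0.5", "2,1.5", "3,2.5", "4,3.5"), tmp)

# Chunk-wise processing: the function receives each chunk as a raw vector,
# and dstrsplit() parses it directly into a data frame.
chunk_sums <- chunk.apply(tmp, function(chunk) {
  d <- dstrsplit(chunk, col_types = c("integer", "numeric"), sep = ",")
  sum(d[[2]])  # per-chunk sum of the second column
}, CH.MERGE = c)
total <- sum(chunk_sums)

# The same step done by hand: read one raw chunk, then parse it
# into a numeric matrix with mstrsplit().
con <- file(tmp, "rb")
reader <- chunk.reader(con)
raw_chunk <- read.chunk(reader)
close(con)
m <- mstrsplit(raw_chunk, sep = ",", type = "numeric")

# Whole-file read using the same parsing code.
df <- read.csv.raw(tmp, header = FALSE,
                   colClasses = c("integer", "numeric"))
```

For a file this small, chunk.apply will most likely see a single chunk; the chunk-wise pattern is meant for inputs too large to load at once, where only the per-chunk results are merged and kept.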
