Statistics, Department of

The R Journal
Date of this Version
8-2016
Document Type
Article
Citation
The R Journal (August 2016) 8(1); Editor: Michael Lawrence
Abstract
Web access logs contain information on HTTP(S) requests and form a key part of both industry and academic explorations of human behaviour on the internet. But the preparation (reading, parsing and manipulation) of that data is just unique enough to make generalized tools unfit for the task, in terms of both programming time and processing time, costs that are compounded by the large data sets common with web access logs. In this paper we explain and demonstrate a series of packages designed to efficiently read in, parse and munge access log data, allowing researchers to handle URLs and IP addresses easily. These packages are substantially faster than existing R methods, with speedups ranging from 3-500% for file reading to 57,000% for URL parsing.
Included in
Numerical Analysis and Scientific Computing Commons, Programming Languages and Compilers Commons
Comments
Copyright 2016, The R Foundation. Open access material. License: CC BY 3.0 Unported