Statistics, Department of
The R Journal
Accessibility Remediation
If you are unable to use this item in its current form due to accessibility barriers, you may request remediation through our remediation request form.
Date of this Version
6-2014
Document Type
Article
Citation
The R Journal (June 2014) 6(1); Editor: Deepayan Sarkar
Abstract
Comparing text strings in terms of distance functions is a common and fundamental task in many statistical text-processing applications. Thus far, string distance functionality has been somewhat scattered around R and its extension packages, leaving users with inconistent interfaces and encoding handling. The stringdist package was designed to offer a low-level interface to several popular string distance algorithms which have been re-implemented in C for this purpose. The package offers distances based on counting q-grams, edit-based distances, and some lesser known heuristic distance functions. Based on this functionality, the package also offers inexact matching equivalents of R’s native exact matching functions match and %in%.
Included in
Numerical Analysis and Scientific Computing Commons, Programming Languages and Compilers Commons
Comments
Copyright 2014, The R Foundation. Open access material. License: CC BY 3.0 Unported