Statistics, Department of

 

The R Journal

Date of this Version

6-2014

Document Type

Article

Citation

The R Journal (June 2014) 6(1); Editor: Deepayan Sarkar

Comments

Copyright 2014, The R Foundation. Open access material. License: CC BY 3.0 Unported

Abstract

Comparing text strings by means of distance functions is a common and fundamental task in many statistical text-processing applications. Thus far, string distance functionality has been scattered across R and its extension packages, leaving users with inconsistent interfaces and inconsistent encoding handling. The stringdist package was designed to offer a low-level interface to several popular string distance algorithms, which have been re-implemented in C for this purpose. The package offers distances based on counting q-grams, edit-based distances, and some lesser-known heuristic distance functions. Building on this functionality, the package also offers inexact matching equivalents of R’s native exact matching functions match and %in%.
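
The two main distance families named in the abstract, and the idea of inexact matching built on top of them, can be illustrated with a minimal sketch. It is written in plain Python for illustration only and is not the package's C implementation or its R interface. The function names levenshtein, qgram_distance, approx_match and approx_in are hypothetical, and the choice of Levenshtein distance as the edit-based example and of a default maximum distance of 1 are assumptions made here.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit-based distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def qgram_distance(a: str, b: str, q: int = 2) -> int:
    """q-gram distance: sum of absolute differences between the
    counts of each length-q substring in a and in b."""
    grams_a = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    grams_b = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    return sum(abs(grams_a[g] - grams_b[g])
               for g in grams_a.keys() | grams_b.keys())


def approx_match(x: str, table: list, max_dist: int = 1):
    """Hypothetical inexact analogue of an exact lookup: index of the
    closest entry of table within max_dist of x, or None if no entry
    is close enough. Indices are 0-based here, unlike R."""
    best_index, best_dist = None, max_dist + 1
    for i, candidate in enumerate(table):
        d = levenshtein(x, candidate)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index


def approx_in(x: str, table: list, max_dist: int = 1) -> bool:
    """Hypothetical inexact analogue of a membership test."""
    return approx_match(x, table, max_dist) is not None
```

Under these assumptions, approx_match("leia", ["uhura", "leela"], max_dist=2) finds the second entry, since "leia" and "leela" are two edits apart, while the same call with max_dist=1 finds no match.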
