Statistics, Department of

 

The R Journal

Accessibility Remediation

If you are unable to use this item in its current form due to accessibility barriers, you may request remediation through our remediation request form.

Date of this Version

6-2014

Document Type

Article

Citation

The R Journal (June 2014) 6(1); Editor: Deepayan Sarkar

Comments

Copyright 2014, The R Foundation. Open access material. License: CC BY 3.0 Unported

Abstract

Comparing text strings in terms of distance functions is a common and fundamental task in many statistical text-processing applications. Thus far, string distance functionality has been somewhat scattered around R and its extension packages, leaving users with inconistent interfaces and encoding handling. The stringdist package was designed to offer a low-level interface to several popular string distance algorithms which have been re-implemented in C for this purpose. The package offers distances based on counting q-grams, edit-based distances, and some lesser known heuristic distance functions. Based on this functionality, the package also offers inexact matching equivalents of R’s native exact matching functions match and %in%.

Share

COinS