Statistics, Department of
The R Journal
Date of this Version
12-2015
Document Type
Article
Citation
The R Journal (December 2015) 7(2); Editor: Bettina Grün
Abstract
This paper describes the R package VSURF. Based on random forests, and applicable to both regression and classification problems, it returns two subsets of variables. The first is a subset of important variables, including some redundancy, which can be relevant for interpretation; the second is a smaller subset corresponding to a model that tries to avoid redundancy and focuses more closely on the prediction objective. The two-stage strategy is based on a preliminary ranking of the explanatory variables using the random forests permutation-based score of importance, and proceeds using a stepwise forward strategy for variable introduction. The two proposals can be obtained automatically using data-driven default values, which are good enough to provide interesting results, but the strategy can also be tuned by the user. The algorithm is illustrated on a simulated example, and its applications to real datasets are presented.
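The two-stage idea in the abstract (rank the variables by importance, then introduce them one at a time) can be illustrated with a minimal sketch. This is not the VSURF implementation: the importance scores, the two thresholds, and the `toy_error` function are hypothetical stand-ins for the random forests permutation importance and the forest's prediction error, and the function names are invented for this illustration.

```python
from typing import Callable, Dict, List


def rank_and_threshold(importance: Dict[str, float],
                       min_importance: float) -> List[str]:
    """Sort variables by decreasing importance and drop the weakest ones.

    `importance` stands in for a permutation-based importance score;
    `min_importance` is a hypothetical cut-off, not a VSURF default.
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    return [name for name in ranked if importance[name] >= min_importance]


def forward_introduction(ranked: List[str],
                         error_of: Callable[[List[str]], float],
                         min_gain: float) -> List[str]:
    """Walk the ranking and keep a variable only if it lowers the error.

    `error_of` stands in for the prediction error of a model fitted on
    the given variables; `min_gain` is a hypothetical threshold.
    """
    selected: List[str] = []
    best_error = error_of(selected)
    for name in ranked:
        candidate = selected + [name]
        error = error_of(candidate)
        if best_error - error > min_gain:
            selected = candidate
            best_error = error
    return selected


def toy_error(variables: List[str]) -> float:
    """Made-up error: x1 and x2 carry the same signal, x3 a separate one."""
    error = 1.0
    if "x1" in variables or "x2" in variables:
        error -= 0.5
    if "x3" in variables:
        error -= 0.3
    return error


# Made-up importance scores for four variables, one of them pure noise.
scores = {"x1": 0.9, "x2": 0.85, "x3": 0.4, "noise": 0.01}

# Larger subset: important variables, redundancy (x1 and x2) tolerated.
first_subset = rank_and_threshold(scores, min_importance=0.05)

# Smaller subset: redundant variables are skipped during introduction.
second_subset = forward_introduction(first_subset, toy_error, min_gain=0.05)

print(first_subset)
print(second_subset)
```

In this toy setting the first subset keeps both redundant variables, while the forward step drops the one that brings no further decrease in error, which mirrors the interpretation versus prediction distinction drawn in the abstract.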
Included in
Numerical Analysis and Scientific Computing Commons, Programming Languages and Compilers Commons
Comments
Copyright 2015, The R Foundation. Open access material. License: CC BY 3.0 Unported