VSURF: An R Package for Variable Selection Using Random Forests

Date of this Version

12-2015

Document Type

Article

Citation

The R Journal (December 2015) 7(2); Editor: Bettina Grün

Abstract

This paper describes the R package VSURF. Based on random forests, and for both regression and classification problems, it returns two subsets of variables. The first is a subset of important variables that includes some redundancy, which can be relevant for interpretation. The second is a smaller subset corresponding to a model that tries to avoid redundancy and focuses more closely on the prediction objective. The two-stage strategy is based on a preliminary ranking of the explanatory variables using the random forests permutation-based importance score, and then proceeds with a stepwise forward strategy for introducing variables. Both subsets can be obtained automatically using data-driven default values, which are good enough to provide interesting results, but the strategy can also be tuned by the user. The algorithm is illustrated on a simulated example, and applications to real datasets are presented.
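To make the two-stage idea concrete, here is a minimal Python sketch. It is not VSURF's code (VSURF is an R package) and it rests on several assumptions: a generic fitted predictor `predict` stands in for the random forest, permutation importance is measured as the mean increase in mean squared error after shuffling one column, and the forward step keeps a variable only if it lowers the error by more than a hypothetical fixed `threshold`, rather than using VSURF's data-driven defaults. The names `permutation_importance`, `forward_selection` and `error_of` are illustrative only.

```python
import random
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when one column of X is shuffled.

    Assumption: `predict` is any fitted model mapping a list of rows to a
    list of predictions; VSURF uses random forests for this score.
    """
    rng = random.Random(seed)
    base = mse(y, predict(X))
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between column j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increases.append(mse(y, predict(X_perm)) - base)
        scores.append(mean(increases))
    return scores


def forward_selection(ranked, error_of, threshold):
    """Introduce variables in ranked order (stepwise forward).

    A variable is kept only if adding it lowers error_of(subset) by more
    than `threshold`, a hypothetical fixed cut-off used for illustration.
    """
    selected = [ranked[0]]
    current = error_of(selected)
    for var in ranked[1:]:
        err = error_of(selected + [var])
        if current - err > threshold:
            selected.append(var)
            current = err
    return selected


# Toy data: y depends on columns 0 and 1; column 2 is pure noise.
rng = random.Random(1)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
y = [3 * r[0] + r[1] for r in X]
coef = {0: 3.0, 1: 1.0, 2: 0.0}  # stand-in for a model fitted on the data


def predict(rows):
    return [3 * r[0] + r[1] for r in rows]


def error_of(subset):
    # Hypothetical stand-in for refitting a model on `subset` only.
    return mse(y, [sum(coef[j] * r[j] for j in subset) for r in X])


scores = permutation_importance(predict, X, y)
ranking = sorted(range(3), key=lambda j: -scores[j])  # step 1: rank variables
selected = forward_selection(ranking, error_of, threshold=0.01)  # step 2
print(ranking, selected)
```

Because the toy predictor ignores column 2, shuffling it leaves the error unchanged, so it ranks last. Adding it also does not reduce the error, so the forward step never admits it.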

Included in

Numerical Analysis and Scientific Computing Commons, Programming Languages and Compilers Commons

Statistics, Department of

The R Journal