Statistics, Department of

Date of this Version

6-2020

Document Type

Article

Citation

The R Journal (June 2020) 12(1); Editor: Michael J. Kane

Abstract

In the era of “big data”, it is becoming more of a challenge not only to build state-of-the-art predictive models, but also to gain an understanding of what’s really going on in the data. For example, it is often of interest to know which, if any, of the predictors in a fitted model are relatively influential on the predicted outcome. Some modern algorithms, like random forests (RFs) and gradient boosted decision trees (GBMs), have a natural way of quantifying the importance or relative influence of each feature. Other algorithms, like naive Bayes classifiers and support vector machines, are not capable of doing so, and model-agnostic approaches are generally used to measure each predictor’s importance. Enter vip, an R package for constructing variable importance scores/plots for many types of supervised learning algorithms using model-specific and novel model-agnostic approaches. We’ll also discuss a novel way to display both feature importance and feature effects together using sparklines, very small line charts that convey the general shape or variation in some feature and can be directly embedded in text or tables.

Included in

Numerical Analysis and Scientific Computing Commons, Programming Languages and Compilers Commons

DigitalCommons@University of Nebraska - Lincoln

The R Journal

Variable Importance Plots: An Introduction to the vip Package
