Statistics, Department of

 

The R Journal

Date of this Version

6-2020

Document Type

Article

Citation

The R Journal (June 2020) 12(1); Editor: Michael J. Kane

Comments

Copyright 2020, The R Foundation. Open access material. License: CC BY 4.0 International

Abstract

In the era of “big data”, it is becoming more of a challenge to not only build state-of-the-art predictive models, but also gain an understanding of what’s really going on in the data. For example, it is often of interest to know which, if any, of the predictors in a fitted model are relatively influential on the predicted outcome. Some modern algorithms—like random forests (RFs) and gradient boosted decision trees (GBMs)—have a natural way of quantifying the importance or relative influence of each feature. Other algorithms—like naive Bayes classifiers and support vector machines—are not capable of doing so and model-agnostic approaches are generally used to measure each predictor’s importance. Enter vip, an R package for constructing variable importance scores/plots for many types of supervised learning algorithms using model-specific and novel model-agnostic approaches. We’ll also discuss a novel way to display both feature importance and feature effects together using sparklines, a very small line chart conveying the general shape or variation in some feature that can be directly embedded in text or tables.

Share

COinS