Statistics, Department of

 

The R Journal

Date of this Version

6-2019

Document Type

Article

Citation

The R Journal (June 2019) 11(1); Editor: Norm Matloff

Comments

Copyright 2019, The R Foundation. Open access material. License: CC BY 4.0 International

Abstract

Identification of informative variables in an information system is often performed using simple one-dimensional filtering procedures that discard information about interactions between variables. Such an approach may result in removing some relevant variables from consideration. Here we present an R package MDFS (MultiDimensional Feature Selection) that performs identification of informative variables taking into account synergistic interactions between multiple descriptors and the decision variable. MDFS is an implementation of an algorithm based on information theory (Mnich and Rudnicki, 2017). The computational kernel of the package is implemented in C++. A high-performance version implemented in CUDA C is also available. The application of MDFS is demonstrated using the well-known Madelon dataset, in which a decision variable is generated from synergistic interactions between descriptor variables. It is shown that the application of multidimensional analysis results in better sensitivity and ranking of importance.

Share

COinS