Statistics, Department of
The R Journal
Date of this Version
6-2019
Document Type
Article
Citation
The R Journal (June 2019) 11(1); Editor: Norm Matloff
Abstract
Identification of informative variables in an information system is often performed using simple one-dimensional filtering procedures that discard information about interactions between variables. Such an approach may result in removing some relevant variables from consideration. Here we present an R package MDFS (MultiDimensional Feature Selection) that performs identification of informative variables taking into account synergistic interactions between multiple descriptors and the decision variable. MDFS is an implementation of an algorithm based on information theory (Mnich and Rudnicki, 2017). The computational kernel of the package is implemented in C++. A high-performance version implemented in CUDA C is also available. The application of MDFS is demonstrated using the well-known Madelon dataset, in which a decision variable is generated from synergistic interactions between descriptor variables. It is shown that the application of multidimensional analysis results in better sensitivity and ranking of importance.
Included in
Numerical Analysis and Scientific Computing Commons, Programming Languages and Compilers Commons
Comments
Copyright 2019, The R Foundation. Open access material. License: CC BY 4.0 International