Statistics, Department of

 

The R Journal

Date of this Version

12-2020

Document Type

Article

Citation

The R Journal (December 2020) 12(2); Editor: Michael J. Kane

Comments

Copyright 2020, The R Foundation. Open access material. License: CC BY 4.0 International

Abstract

Ordinal data are used in many domains, especially when measurements are collected from people through observations, tests, or questionnaires. ordinalClust is an innovative R package dedicated to ordinal data that provides tools for modeling, clustering, co-clustering and classifying such data. Ordinal data are modeled using the BOS distribution, which is a model with two meaningful parameters referred to as "position" and "precision". The former indicates the mode of the distribution and the latter describes how scattered the data are around the mode: the user is able to easily interpret the distribution of their data when given these two parameters. The package is based on the co-clustering framework (when rows and columns are simultaneously clustered). The co-clustering approach uses the Latent Block Model (LBM) and the SEM-Gibbs algorithm for parameter inference. On the other hand, the clustering and the classification methods follow on from simplified versions of the SEM-Gibbs algorithm. For the classification process, two approaches are proposed. In the first one, the BOS parameters are estimated from the training dataset in the conventional way. In the second approach, parsimony is introduced by estimating the parameters and column-clusters from the training dataset. We empirically show that this approach can yield better results. For the clustering and co-clustering processes, the ICL-BIC criterion is used for model selection purposes. An overview of these methods is given, and the way to use them with the ordinalClust package is described using real datasets. The latest stable package version is available on the Comprehensive R Archive Network (CRAN).

Share

COinS