Statistics, Department of

Department of Statistics: Faculty Publications

S1: Supplementary Information for Article: A copula based approach for design of multivariate random forests for drug sensitivity prediction

Saad Haider, Texas Tech University
Raziur Rahman, Texas Tech University
Souparno Ghosh, Texas Tech University
Ranadip Pal, Texas Tech University

Document Type

Article

Date of this Version

2015

Citation

PLOS 1-7

Abstract

Changes in performance with prior feature selection

Random forest (RF) is designed to create uncorrelated trees by using a random subset of features at each node of each tree. RF by itself is an effective tool for feature selection from a high-dimensional feature set. However, we observed that prediction accuracy improves when a prior feature selection step (RELIEFF) [1] is applied. Table A shows the performance of RF, VMRF and CMRF with and without RELIEFF feature selection on two drug sets from GDSC.

Performance analysis for drug sets consisting of more than two drugs

We generated empirical copulas for the bivariate cases because they can capture all forms of dependency structure. However, generating empirical copulas is computationally expensive and requires a significant number of training samples at each node. For more than two drug responses, we therefore considered parametric copulas: instead of the integral difference between empirical copulas, we used the difference between the Gaussian copula parameters estimated from the root node samples and from the split node samples. To test our hypothesis that VMRF and CMRF will perform better than RF, we considered a drug set of 4 drugs from CCLE that share a single common target and a drug set of 3 drugs from GDSC that share a common target. The CCLE set has 482 cell lines and the GDSC set has 308 cell lines. RELIEFF was used to reduce the feature space before the random forests were applied. For simplicity, in this case we used 30% of the cell lines as training data and the remaining 70% as testing data.
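The comparison of Gaussian copula parameters between root node and split node samples can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the supplementary text: the copula correlation matrix is estimated with normal scores of the ranks, and the "difference" is summarized as the mean absolute difference of the off-diagonal correlations. The function names `gaussian_copula_corr` and `copula_param_distance` are hypothetical, and the authors' actual estimator and distance may differ.

```python
import numpy as np
from scipy.stats import norm, rankdata


def gaussian_copula_corr(Y):
    """Estimate a Gaussian copula correlation matrix from responses Y.

    Y has shape (n_samples, n_drugs). Assumption: normal-scores estimator,
    i.e. the correlation of the inverse-normal-transformed ranks.
    """
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    # Pseudo-observations strictly inside (0, 1): column-wise ranks / (n + 1).
    U = np.apply_along_axis(rankdata, 0, Y) / (n + 1)
    # Map the pseudo-observations to standard normal scores.
    Z = norm.ppf(U)
    return np.corrcoef(Z, rowvar=False)


def copula_param_distance(Y_root, Y_node):
    """Mean absolute difference between the off-diagonal copula correlations
    of the root node samples and a candidate split node's samples.

    Assumption: this particular summary of the parameter difference is
    chosen for illustration only.
    """
    R_root = gaussian_copula_corr(Y_root)
    R_node = gaussian_copula_corr(Y_node)
    upper = np.triu_indices_from(R_root, k=1)
    return float(np.mean(np.abs(R_root[upper] - R_node[upper])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic responses for 3 drugs with a shared latent component.
    latent = rng.normal(size=(200, 1))
    Y_root = latent + 0.5 * rng.normal(size=(200, 3))
    Y_node = Y_root[:80]  # samples reaching a hypothetical child node
    print(copula_param_distance(Y_root, Y_node))
```

A small distance indicates that the dependence structure among the drug responses in the child node resembles that of the root node. Because only a correlation matrix is estimated, this needs far fewer samples per node than an empirical copula over three or four responses.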


Included in

Other Statistics and Probability Commons
