Graduate Studies
First Advisor
Rafael De Ayala
Degree Name
Doctor of Philosophy (Ph.D.)
Department
Psychological Studies in Education
Date of this Version
12-11-2024
Document Type
Dissertation
Citation
A dissertation presented to the faculty of the Graduate College at the University of Nebraska in partial fulfillment of requirements for the degree of Doctor of Philosophy
Major: Psychological Studies in Education
Under the supervision of Professor Rafael De Ayala
Lincoln, Nebraska, December 2024
Abstract
Although the impact of machine learning methods in the educational sciences has been limited, recent opportunities have emerged that can benefit from these flexible methods, which are capable of modeling both linear and non-linear relationships. One emerging area of interest, multi-target regression (MTR), is concerned with the simultaneous prediction of multiple continuous target variables and is an extension of traditional supervised learning. Problem transformation methods are a subset of MTR methods that seek to exploit the dependencies among targets by expanding the Single-Target approach using stacking, chaining, ensembles, or combinations of the three. This is accomplished by using one or more base-learners, which are typically standard machine learning algorithms. The purpose of this study was to investigate the use of shallow problem transformation methods and to understand the impact of using performance estimates derived from cross-validation on model assessment, model selection, and the communication of results for multi-target regression problems, specifically those involving small sample sizes. The study examined these methods on both benchmark and simulated datasets, aiming to provide applied researchers with a consistent procedure for 1) selecting a final learning pipeline among several candidates to deploy operationally, 2) providing performance estimates without losing samples to estimation by employing k-fold cross-validation, and 3) providing confidence intervals designed to correct for overly optimistic performance estimates when applying multi-target regression methods to novel datasets. Results suggest that both ensemble methods, Ensemble of Regressor Chains and Ensemble of Stacked Regressors, performed well across nearly all benchmark datasets. Base-learner selection can be confined to Ranger, MLP, SVM-Radial, and XGBoost. While no single method or base-learner outperformed all others across all tasks, these base-learners performed well on both benchmark and simulated datasets. The use of 10-fold cross-validation reduced the bias in performance estimates relative to 2- and 5-fold cross-validation and improved the coverage rates of confidence intervals. Additional results indicate that confidence intervals constructed using the Bootstrap Bias Corrected Cross-Validation method produced superior coverage rates on simulated data compared to the naïve method of confidence interval construction.
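To make the chaining idea above concrete, the sketch below assembles a small Ensemble of Regressor Chains in Python with scikit-learn. It is an illustrative sketch only, not the procedure used in the dissertation: the synthetic data, the random-forest base-learner (standing in for Ranger), the number of chains, and all hyperparameter values are assumptions chosen for brevity.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.multioutput import RegressorChain

# Hypothetical multi-target data standing in for one of the benchmark datasets.
X, Y = make_regression(n_samples=200, n_features=10, n_targets=3,
                       noise=0.5, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25,
                                                    random_state=0)

# Each chain predicts the targets one at a time, feeding predictions for
# earlier targets to later ones; averaging chains fit over random target
# orders smooths out the effect of any single ordering.
chains = [
    RegressorChain(
        RandomForestRegressor(n_estimators=200, random_state=k),  # stand-in for Ranger
        order="random",
        cv=5,  # use cross-validated predictions of earlier targets while fitting
        random_state=k,
    ).fit(X_train, Y_train)
    for k in range(10)
]
Y_pred = np.mean([chain.predict(X_test) for chain in chains], axis=0)
print("per-target test MSE:", ((Y_pred - Y_test) ** 2).mean(axis=0))

Replacing the chains with second-stage models stacked on first-stage Single-Target predictions gives the corresponding stacking variant; the ensemble-averaging step is the same.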
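The confidence-interval comparison in the abstract can likewise be sketched. Bootstrap Bias Corrected Cross-Validation (BBC-CV) pools the out-of-sample cross-validation predictions of every candidate pipeline, repeatedly re-selects the winning pipeline on a bootstrap sample of rows, and scores that winner on the rows left out of the bootstrap; the distribution of those out-of-bag losses yields a bias-corrected estimate and a percentile interval. The function below is a single-target, squared-error sketch with hypothetical inputs; it is not the dissertation's implementation, and a multi-target version would average the loss across targets.

import numpy as np

def bbc_cv_interval(oos_preds, y, n_boot=1000, alpha=0.05, rng=None):
    # oos_preds: (n_samples, n_configs) pooled out-of-sample CV predictions,
    #            one column per candidate learning pipeline.
    # y:         (n_samples,) observed target values.
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(y)
    oob_losses = []
    for _ in range(n_boot):
        in_bag = rng.integers(0, n, size=n)            # bootstrap rows (with replacement)
        out_bag = np.setdiff1d(np.arange(n), in_bag)   # rows never drawn
        if out_bag.size == 0:
            continue
        # Re-run pipeline selection on the in-bag rows only ...
        in_bag_mse = ((oos_preds[in_bag] - y[in_bag, None]) ** 2).mean(axis=0)
        best = int(np.argmin(in_bag_mse))
        # ... then score that single winner on the untouched out-of-bag rows.
        oob_losses.append(((oos_preds[out_bag, best] - y[out_bag]) ** 2).mean())
    oob_losses = np.asarray(oob_losses)
    lower, upper = np.quantile(oob_losses, [alpha / 2, 1 - alpha / 2])
    return oob_losses.mean(), (lower, upper)

# Hypothetical usage: four candidate pipelines of varying quality.
rng = np.random.default_rng(0)
y = rng.normal(size=100)
oos = y[:, None] + rng.normal(scale=[0.5, 0.8, 1.0, 1.2], size=(100, 4))
estimate, (lower, upper) = bbc_cv_interval(oos, y, rng=rng)
print(f"bias-corrected MSE: {estimate:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")

The naïve alternative mentioned in the abstract reports and bootstraps the selected pipeline's own cross-validation loss directly, re-using the same rows for selection and assessment, which is why its intervals tend to be optimistically narrow.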
Recommended Citation
Smith, Bradley Ryan, "Examining the Use of Problem Transformation Methods in Multi-target Regression" (2024). Dissertations and Doctoral Documents from University of Nebraska-Lincoln, 2023–. 242.
https://digitalcommons.unl.edu/dissunl/242
Comments
Copyright 2024, the author. Used by permission.