Cluster-based boundary of use for selective improvement to supervised learning
Abstract
Supervised learning (SL) is an active research area used for data mining in diverse fields such as education and bioinformatics. Algorithms such as feature selection, noise correction, active learning, and boosting are designed to improve SL predictive accuracy. Although often effective, these improvement algorithms still have problems. One common problem is that using an improvement algorithm indiscriminately on all the training data can actually reduce predictive accuracy. Another is that these algorithms require additional running time. The solution explored in this work is to use improvement algorithms more selectively on the training data. When used selectively, an improvement algorithm is applied only to the areas of the training data that would most benefit from improvement and not to areas where improvement is unnecessary (or even detrimental). We propose a new cluster-based selective improvement algorithm called the Boundary of Use (BoU). The BoU starts by using a clustering algorithm to partition the training data into clusters. Then, the BoU decides whether each cluster corresponds to an area where improvement is unnecessary (inside the boundary) or an area that would benefit from improvement (outside the boundary). This decision is based on whether the SL system is doing well or is struggling on the cluster's member data. Finally, the BoU uses the improvement algorithm selectively on each cluster outside the boundary. We comprehensively investigate our BoU notion by adapting it to three different application areas, resulting in three new selective improvement algorithms: (1) cluster-based boosting (CBB) for boosting, (2) BoU-AL for active learning, and (3) BoU-DP for feature selection and noise correction in data preprocessing. Extensive empirical results on benchmark and real-world datasets demonstrate the effectiveness of these new selective improvement algorithms compared with the same improvement algorithm used on all the training data and with existing selective improvement algorithms.
Ultimately, our BoU work results in five research contributions: (1) an insightful categorization of improvement algorithms based on common problems, (2) a novel, flexible selective improvement algorithm adapted into a suite of new selective improvement algorithms in three different application areas, (3) a novel use of clustering as a means of independently evaluating SL, (4) new work establishing the effectiveness of conventional clustering for selective improvement, and (5) SL applied to educational data mining and survey informatics.
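The three BoU steps named in the abstract (cluster the training data, decide which clusters lie inside or outside the boundary, and improve only the outside clusters) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the dissertation: the tiny k-means routine with farthest-first initialization, the per-cluster accuracy test, the `threshold` value of 0.9, the toy data, and all function names.

```python
from collections import defaultdict

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Tiny k-means (illustrative assumption); returns a cluster index per point."""
    # Farthest-first initialization keeps the sketch deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

def boundary_of_use(X, y, predict, k=2, threshold=0.9):
    """Split cluster ids into (inside, outside) by the learner's per-cluster accuracy.

    The accuracy criterion and the threshold are hypothetical stand-ins for
    "doing well" versus "struggling" on a cluster's member data.
    """
    assign = kmeans(X, k)
    hits, sizes = defaultdict(int), defaultdict(int)
    for x, label, c in zip(X, y, assign):
        sizes[c] += 1
        hits[c] += int(predict(x) == label)
    inside = {c for c in sizes if hits[c] / sizes[c] >= threshold}
    outside = set(sizes) - inside
    return assign, inside, outside

def select_for_improvement(X, y, assign, outside):
    """Return only the training examples in clusters outside the boundary."""
    return [(x, label) for x, label, c in zip(X, y, assign) if c in outside]

# Toy data: the learner is perfect on the first blob and struggles on the second.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
y = [0, 0, 0, 0, 1, 0, 1, 0]
predict = lambda x: 0 if x[0] < 5 else 1  # stand-in for a trained SL model

assign, inside, outside = boundary_of_use(X, y, predict, k=2, threshold=0.9)
selected = select_for_improvement(X, y, assign, outside)
```

In this sketch, `selected` is the subset that an improvement algorithm (boosting, active learning, or preprocessing) would then be run on, while examples in the inside clusters are left untouched.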
Subject Area
Computer science
Recommended Citation
Miller, Lee Dee, "Cluster-based boundary of use for selective improvement to supervised learning" (2014). ETD collection for University of Nebraska-Lincoln. AAI3665383.
https://digitalcommons.unl.edu/dissertations/AAI3665383