Computing, School of
School of Computing: Technical Reports
Date of this Version
2012
Document Type
Article
Citation
Department of Computer Science & Engineering, University of Nebraska-Lincoln, Technical Report, TR-UNL-CSE-2012-0003
Abstract
Numerous “repair” mechanisms have been developed to improve the training data for supervised learning (SL) systems, including feature selection, noise correction, and active learning. These repair mechanisms myopically repair instances as long as the estimated system performance continues to improve. Such general repair can lead to unnecessary repairs and to overfitting from repair, both of which can lower system performance on new instances. We propose a Boundary of Use (BoU) meta-reasoning framework to decide which instances should be repaired. This framework uses a semi-supervised clustering approach to partition the training instances into regions where the SL system does well without repair, regions where it makes some mistakes (mixed regions), and regions where repair is deemed hopeless. Repair is then applied selectively, only to the mixed regions. We demonstrate that BoU-enhanced versions of repair improve SL system performance on 21 UCI datasets where general repair exhibits varying degrees of unnecessary repair and overfitting.
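The selective-repair idea in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: it assumes the clustering has already been done and that each training instance carries a flag saying whether the SL system classifies it correctly. The function names (`label_regions`, `select_for_repair`) and the accuracy thresholds (`good_min`, `hopeless_max`) are hypothetical, chosen only for illustration; the abstract does not say how regions are actually scored.

```python
from collections import defaultdict


def label_regions(cluster_ids, correct, good_min=0.9, hopeless_max=0.2):
    """Label each cluster by the fraction of its instances the learner gets right.

    The thresholds are illustrative assumptions, not values from the report.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for cid, ok in zip(cluster_ids, correct):
        totals[cid] += 1
        hits[cid] += int(ok)

    labels = {}
    for cid, n in totals.items():
        accuracy = hits[cid] / n
        if accuracy >= good_min:
            labels[cid] = "good"      # learner does well without repair
        elif accuracy <= hopeless_max:
            labels[cid] = "hopeless"  # repair is unlikely to help
        else:
            labels[cid] = "mixed"     # some mistakes: candidate for repair
    return labels


def select_for_repair(cluster_ids, correct, **thresholds):
    """Return the indices of instances that fall in mixed regions."""
    labels = label_regions(cluster_ids, correct, **thresholds)
    return [i for i, cid in enumerate(cluster_ids) if labels[cid] == "mixed"]


# Toy data: three clusters of four instances each.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
correct = [True, True, True, True,
           True, False, True, False,
           False, False, False, False]

regions = label_regions(cluster_ids, correct)
to_repair = select_for_repair(cluster_ids, correct)
```

In this toy example only the instances of the middle cluster would be passed on to a repair mechanism such as noise correction or active learning; the other two clusters are left untouched.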