School of Computing: Technical Reports

Date of this Version

2012

Document Type

Article

Citation

Department of Computer Science & Engineering, University of Nebraska-Lincoln, Technical Report, TR-UNL-CSE-2012-0003

Abstract

Numerous “repair” mechanisms have been developed to improve the training data for supervised learning (SL) systems, including feature selection, noise correction, and active learning. These repair mechanisms myopically repair instances as long as the estimated system performance continues to improve. Such general repair can lead to unnecessary repairs and to overfitting from repair, both of which can lower system performance on new instances. We propose a Boundary of Use (BoU) meta-reasoning framework to decide which instances should be repaired. This framework uses a semi-supervised clustering approach to partition the training instances into regions where the SL system does well without repair, regions where it makes some mistakes (mixed regions), and regions where repair is deemed hopeless. Repair is then applied selectively, to the mixed regions only. We demonstrate that BoU-enhanced versions of repair improve SL system performance on 21 UCI datasets where general repair exhibits varying degrees of unnecessary repair and overfitting.
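The selective-repair idea in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: the semi-supervised clustering step is not reproduced (cluster assignments are taken as given input), and the function names and the accuracy thresholds `good_min` and `hopeless_max` are hypothetical, chosen only to show the three-way split into regions.

```python
from collections import defaultdict


def label_regions(cluster_ids, correct, good_min=0.9, hopeless_max=0.3):
    """Label each cluster 'good', 'mixed', or 'hopeless' by its accuracy.

    cluster_ids: cluster assignment per training instance (assumed given).
    correct: whether the SL system classified each instance correctly.
    good_min, hopeless_max: hypothetical thresholds, not from the report.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for cid, ok in zip(cluster_ids, correct):
        totals[cid] += 1
        hits[cid] += int(ok)

    labels = {}
    for cid, n in totals.items():
        accuracy = hits[cid] / n
        if accuracy >= good_min:
            labels[cid] = "good"        # system does well without repair
        elif accuracy <= hopeless_max:
            labels[cid] = "hopeless"    # repair is deemed not worthwhile
        else:
            labels[cid] = "mixed"       # some mistakes: candidate for repair
    return labels


def select_for_repair(cluster_ids, labels):
    """Return the indices of instances that fall in mixed regions."""
    return [i for i, cid in enumerate(cluster_ids) if labels[cid] == "mixed"]


# Toy data: three clusters with accuracies 1.0, 0.5, and 0.0.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
correct = [True, True, True, True, False, True, False, False, False, False]

labels = label_regions(cluster_ids, correct)
to_repair = select_for_repair(cluster_ids, labels)
```

In this toy run only the instances in cluster 1 are passed to a repair mechanism; clusters 0 and 2 are left untouched, which is the sense in which repair is applied selectively.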
