Department of Computer Science & Engineering, University of Nebraska-Lincoln, Technical Report, TR-UNL-CSE-2012-0003
Numerous “repair” mechanisms have been developed to improve the training data for supervised learning (SL) systems, including feature selection, noise correction, and active learning. These repair mechanisms myopically repair instances as long as the estimated system performance continues to improve. Such general repair can lead to unnecessary repairs and to overfitting from repair, which can lower system performance on new instances. We propose a Boundary of Use (BoU) meta-reasoning framework to decide which instances should be repaired. This framework uses a semi-supervised clustering approach to partition the training instances into three kinds of regions: regions where the SL system does well without repair, regions where it makes some mistakes, and regions where repair is deemed hopeless. Repair is then applied selectively, only to the mixed regions. We demonstrate that BoU-enhanced versions of repair improve SL system performance on 21 UCI datasets on which general repair exhibits varying degrees of unnecessary repair and overfitting.
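The selective-repair idea can be illustrated with a minimal sketch. It assumes the training instances have already been assigned to regions by some clustering step (the semi-supervised clustering used in the report is not reproduced here), and that we know which instances the unrepaired SL system gets right. The function names `classify_regions` and `select_for_repair` and the `hopeless_threshold` cutoff are hypothetical illustrations, not the report's actual criteria.

```python
from collections import defaultdict


def classify_regions(cluster_ids, correct, hopeless_threshold=0.5):
    """Label each region by how well the unrepaired SL system does in it.

    cluster_ids[i] is the region of training instance i (assumed to come from
    a prior clustering step); correct[i] is True if the SL system, trained
    without repair, predicts instance i correctly.
    hopeless_threshold is a hypothetical accuracy cutoff, not from the report.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for cid, ok in zip(cluster_ids, correct):
        totals[cid] += 1
        hits[cid] += int(ok)

    labels = {}
    for cid, n in totals.items():
        acc = hits[cid] / n
        if acc == 1.0:
            labels[cid] = "no_repair"  # system already does well here
        elif acc < hopeless_threshold:
            labels[cid] = "hopeless"   # repair deemed not worth attempting
        else:
            labels[cid] = "mixed"      # some mistakes: candidate for repair
    return labels


def select_for_repair(cluster_ids, correct, hopeless_threshold=0.5):
    """Return indices of the instances that fall in mixed regions only."""
    labels = classify_regions(cluster_ids, correct, hopeless_threshold)
    return [i for i, cid in enumerate(cluster_ids) if labels[cid] == "mixed"]


if __name__ == "__main__":
    clusters = [0, 0, 0, 1, 1, 1, 2, 2]
    correct = [True, True, True, True, False, True, False, False]
    print(classify_regions(clusters, correct))
    # {0: 'no_repair', 1: 'mixed', 2: 'hopeless'}
    print(select_for_repair(clusters, correct))
    # [3, 4, 5]
```

Here a repair mechanism (feature selection, noise correction, or active learning) would be run only on the instances returned by `select_for_repair`, leaving the well-handled and hopeless regions untouched.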