Computer Science and Engineering, Department of
Date of this Version
5-16-2003
Abstract
In this paper, we investigate the effectiveness of Citation K-Nearest Neighbors (KNN) learning with noisy training datasets. We devise an authority measure associated with each training instance that changes based on the outcome of Citation KNN classification. The authority is increased when a citer’s classification is correct and decreased when it is incorrect. We show that by modifying only these authority measures, the classification accuracy of Citation KNN improves significantly on a variety of datasets with different noise levels. We also identify the general characteristics of a dataset that affect the size of the improvement. We conclude that the new algorithm is able to regulate the roles of good and noisy training instances using a very simple authority measure to improve classification.
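The abstract does not give the update rule, so the following is only a minimal sketch of the idea, not the report's algorithm. It assumes single-point instances, Euclidean distance, an additive authority step, and authority-weighted voting; the names `voters`, `classify`, `update_authority`, and the parameters `r`, `c`, `step`, and `floor` are all hypothetical.

```python
import math
from collections import defaultdict


def voters(query, points, r, c):
    """Indices of training points that vote on `query`.

    References: the r training points nearest to the query.
    Citers: training points that would rank the query among their own
    c nearest neighbours. A point may appear in both groups and then
    votes twice.
    """
    by_distance = sorted(range(len(points)),
                         key=lambda i: math.dist(query, points[i]))
    refs = by_distance[:r]
    citers = []
    for i, p in enumerate(points):
        d_query = math.dist(p, query)
        # Count the other training points strictly closer to p than the query.
        closer = sum(1 for j, other in enumerate(points)
                     if j != i and math.dist(p, other) < d_query)
        if closer < c:
            citers.append(i)
    return refs + citers


def classify(query, points, labels, authority, r=2, c=2):
    """Authority-weighted vote; returns (predicted label, voter indices)."""
    voted = voters(query, points, r, c)
    votes = defaultdict(float)
    for i in voted:
        votes[labels[i]] += authority[i]
    return max(votes, key=votes.get), voted


def update_authority(voted, labels, authority, true_label,
                     step=0.1, floor=0.0):
    """Assumed additive rule: reward voters whose label was right and
    penalise the rest, never dropping below `floor`."""
    for i in set(voted):
        if labels[i] == true_label:
            authority[i] += step
        else:
            authority[i] = max(floor, authority[i] - step)
```

In this sketch, a mislabeled training point that keeps voting against the true label gradually loses authority, so its influence on later classifications shrinks without the point being removed from the training set.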
Comments
University of Nebraska-Lincoln, Computer Science and Engineering
Technical Report # TR-UNL-CSE-2003-0003 05/16/2003