Machine learning in a cost-constrained environment
A sometimes unrealistic assumption in typical machine learning applications is that data is freely available. In this dissertation, I present research that relaxes this assumption in three areas: active machine learning, budgeted machine learning, and classifier reoptimization.
Active machine learning algorithms are used when large numbers of unlabeled examples are available and obtaining labels for them is costly (e.g., when each label requires consulting a human expert). We study active learning under the goal of maximizing AUC (Area Under the ROC Curve). We examine two existing algorithms from the literature and present our own active learning algorithms designed to maximize the AUC of the hypothesis. One of our algorithms was the top performer. Further, when good posterior probability estimates were available, our heuristics were by far the best.
Budgeted machine learning, which can be considered a dual form of active machine learning, assumes that a learning algorithm has free access to the labels of the training examples but must pay for each attribute value that is specified. We present new algorithms, based on algorithms for the multi-armed bandit problem, for choosing which attributes of which examples to purchase. We also evaluate a group of algorithms based on the idea of incorporating second-order statistics into decision making. All of our approaches were competitive with the current state of the art and performed better when the budget was tight. We also present new heuristics for selecting an example to purchase once the attribute has been selected, instead of selecting an example uniformly at random, as is typically done. In our experiments, these row selectors improved performance significantly on certain datasets.
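To make the budgeted setting concrete, the following is a minimal sketch of a purchase loop, not the algorithms of the dissertation. It assumes a fixed cost per attribute, selects the row uniformly at random (the baseline mentioned above), and uses a round-robin attribute policy as a stand-in for the point where a bandit-based selector would plug in. The names `purchase_attributes`, `choose_attribute`, and `round_robin` are hypothetical.

```python
import random


def purchase_attributes(hidden, costs, budget, choose_attribute, seed=0):
    """Reveal attribute values one (row, attribute) cell at a time until the
    budget is exhausted or nothing affordable is left to buy.

    hidden: rows of true attribute values, unknown to the learner (no None values)
    costs: cost of revealing one value of each attribute (assumed fixed)
    choose_attribute: policy mapping (step, affordable attribute indices)
                      to one attribute index
    Returns the table of purchased values and the unspent budget.
    """
    rng = random.Random(seed)
    n_rows, n_attrs = len(hidden), len(costs)
    known = [[None] * n_attrs for _ in range(n_rows)]
    step = 0
    while True:
        # Attributes that fit in the remaining budget and still have
        # at least one unpurchased cell.
        affordable = [a for a in range(n_attrs)
                      if costs[a] <= budget
                      and any(known[r][a] is None for r in range(n_rows))]
        if not affordable:
            break
        a = choose_attribute(step, affordable)
        # Baseline row selector: uniform over rows missing this attribute.
        candidates = [r for r in range(n_rows) if known[r][a] is None]
        r = rng.choice(candidates)
        known[r][a] = hidden[r][a]
        budget -= costs[a]
        step += 1
    return known, budget


def round_robin(step, affordable):
    """Placeholder policy: cycle through the affordable attributes."""
    return affordable[step % len(affordable)]


hidden = [[1, 0], [0, 1], [1, 1]]
known, remaining = purchase_attributes(hidden, costs=[1, 2], budget=5,
                                       choose_attribute=round_robin)
```

Keeping the attribute policy and the row selector as separate steps mirrors the two decisions described above; either one could be replaced without touching the rest of the loop.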
In the area of classifier reoptimization, the learner is asked to adjust existing classifiers to work under new class distributions or cost models when the data used for training is unavailable or the learner cannot afford it. As another contribution to machine learning in a cost-constrained environment, we study the problem of reoptimizing a multi-class classifier based on its ROC hypersurface and a cost matrix describing the cost of each type of prediction error. We prove that the decision version of this problem is NP-complete. As a complementary positive result, we give an algorithm that finds an optimal solution in polynomial time if the number of classes n is a constant. We also present several heuristics for this problem, including linear, nonlinear, and quadratic programming formulations, genetic algorithms, and a customized algorithm. Empirical results suggest that under both uniform and non-uniform cost models, simple greedy methods outperform more sophisticated methods.
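As an illustration of why a change in class distribution can call for reoptimization, the sketch below computes the expected cost of a fixed operating point from its per-class confusion rates, the class priors, and a cost matrix. It uses the standard expected-cost formula rather than any method specific to the dissertation; the function name and all numbers are made up for the example, and two classes are used only for brevity (the function accepts any number of classes).

```python
def expected_cost(priors, confusion_rates, cost_matrix):
    """Expected cost per example.

    priors[i]: probability of true class i under the target distribution
    confusion_rates[i][j]: P(predict j | true class i)
    cost_matrix[i][j]: cost of predicting j when the true class is i
    """
    n = len(priors)
    return sum(priors[i] * confusion_rates[i][j] * cost_matrix[i][j]
               for i in range(n) for j in range(n))


# Two hypothetical operating points of the same classifier.
point_a = [[0.90, 0.10], [0.20, 0.80]]
point_b = [[0.70, 0.30], [0.05, 0.95]]
costs = [[0, 1], [5, 0]]  # missing class 1 is five times as costly

old_priors = [0.5, 0.5]
new_priors = [0.9, 0.1]

cost_a_old = expected_cost(old_priors, point_a, costs)
cost_b_old = expected_cost(old_priors, point_b, costs)
cost_a_new = expected_cost(new_priors, point_a, costs)
cost_b_new = expected_cost(new_priors, point_b, costs)
```

With these illustrative numbers, point B is cheaper under the balanced priors while point A is cheaper once class 0 dominates, so the preferred operating point changes even though the classifier itself does not.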
Deng, Kun, "Machine learning in a cost-constrained environment" (2009). ETD collection for University of Nebraska - Lincoln. AAI3371935.