School of Computing: Dissertations, Theses, and Student Research
First Advisor
Stephen D. Scott
Date of this Version
5-2017
Document Type
Thesis
Citation
James Duin, Hierarchical Active Learning Application to Mitochondrial Disease Protein Dataset, MS thesis, University of Nebraska-Lincoln, May 2017.
Abstract
This study investigates an application of active machine learning to a protein dataset developed to identify the source of mutations that give rise to mitochondrial disease. Each protein is labeled according to its location of origin in the cell: either as mitochondrial or non-mitochondrial, or, at a finer granularity, by its target compartment within the mitochondrion (the outer membrane, the inner membrane, the matrix, or the ribosomes). These labels form a hierarchy. A new machine learning approach is investigated that learns the high-level classifier, i.e., whether the protein is mitochondrial, by separately learning the finer-grained target-compartment concepts and combining the results. This approach is termed active over-labeling. Experiments on the protein dataset show that active over-labeling improves the area under the precision-recall curve compared to standard passive or active learning. Because finer-grained labels are more costly to obtain, alternative strategies that spend fixed proportions of a given budget on fine versus coarse labels are compared at various label costs. Finally, we present a cost-sensitive active learner that uses a multi-armed bandit approach to dynamically choose the label granularity to purchase, and show that the bandit-based learner is robust to variations in both labeling cost and budget.
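To make the idea of a bandit choosing label granularity under a budget concrete, the following is a minimal sketch and not the thesis's implementation. It assumes an epsilon-greedy policy over two arms ("coarse" and "fine") that ranks arms by mean observed reward per unit cost; the abstract does not say which bandit algorithm or reward signal the thesis uses. The names `GranularityBandit`, `run`, and `reward_fn` are hypothetical, and in a real learner `reward_fn` might be something like the change in a validation metric after retraining on the newly purchased label.

```python
import random


class GranularityBandit:
    """Epsilon-greedy bandit over label granularities (illustrative assumption)."""

    def __init__(self, costs, epsilon=0.1, seed=0):
        self.costs = dict(costs)  # hypothetical costs, e.g. {"coarse": 1.0, "fine": 4.0}
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {arm: 0 for arm in self.costs}
        self.total_reward = {arm: 0.0 for arm in self.costs}

    def value(self, arm):
        """Mean observed reward per unit cost; untried arms score +inf so each is tried once."""
        if self.pulls[arm] == 0:
            return float("inf")
        return self.total_reward[arm] / (self.pulls[arm] * self.costs[arm])

    def select(self, remaining_budget):
        """Return an affordable arm, or None when no label can be bought."""
        affordable = [a for a in self.costs if self.costs[a] <= remaining_budget]
        if not affordable:
            return None
        if self.rng.random() < self.epsilon:
            return self.rng.choice(affordable)  # explore
        return max(affordable, key=self.value)  # exploit

    def update(self, arm, reward):
        self.pulls[arm] += 1
        self.total_reward[arm] += reward


def run(budget, costs, reward_fn, epsilon=0.1, seed=0):
    """Buy labels until the budget cannot cover any arm; return pull counts and spend."""
    bandit = GranularityBandit(costs, epsilon=epsilon, seed=seed)
    spent = 0.0
    while True:
        arm = bandit.select(budget - spent)
        if arm is None:
            break
        spent += costs[arm]
        # reward_fn is a stand-in for "how much did this purchase help the learner"
        bandit.update(arm, reward_fn(arm))
    return bandit.pulls, spent


if __name__ == "__main__":
    costs = {"coarse": 1.0, "fine": 4.0}
    # Toy reward: a fine label helps more, but not enough to justify 4x the cost.
    pulls, spent = run(100.0, costs, lambda arm: 0.02 if arm == "fine" else 0.01, epsilon=0.0)
    print(pulls, spent)
```

Ranking arms by reward per unit cost is one simple way to make the choice cost-sensitive. Under this sketch, when fine labels become cheaper or more informative, the same policy shifts its spending toward them without any change to the code.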
Comments
A thesis presented to the faculty of the Graduate College at the University of Nebraska in partial fulfillment of requirements for the degree of Master of Science
Major: Computer Science
Under the supervision of Professor Stephen D. Scott. Lincoln, Nebraska, May 2017
Copyright (c) 2017 James Duin