School of Computing: Dissertations, Theses, and Student Research

First Advisor

Stephen D. Scott

Date of this Version

5-2017

Document Type

Thesis

Citation

James Duin, Hierarchical Active Learning Application to Mitochondrial Disease Protein Dataset, MS thesis, University of Nebraska-Lincoln, May 2017.

Comments

A thesis presented to the faculty of the Graduate College at the University of Nebraska in partial fulfillment of requirements for the degree of Master of Science

Major: Computer Science

Under the supervision of Professor Stephen D. Scott. Lincoln, Nebraska, May 2017

Copyright (c) 2017 James Duin

Abstract

This study investigates an application of active machine learning to a protein dataset developed to identify the source of mutations that give rise to mitochondrial disease. Each protein in the dataset is labeled according to its location of origin in the cell: at the coarse level, whether or not it originates in the mitochondria; at the fine level, a specific target location in the mitochondria's outer or inner membrane, its matrix, or its ribosomes. These labels form a hierarchy. A new machine learning approach is investigated that learns the high-level classifier, i.e., whether the protein is mitochondrial, by separately learning the finer-grained target-compartment concepts and combining the results. This approach is termed active over-labeling. Experiments on the protein dataset show that active over-labeling improves the area under the precision-recall curve compared to standard passive or active learning. Because finer-grained labels are more costly to obtain, alternative strategies that spend fixed proportions of a given budget on fine versus coarse labels are compared at various label costs. Finally, we present a cost-sensitive active learner that uses a multi-armed bandit approach to dynamically choose the label granularity to purchase, and show that the bandit-based learner is robust to variations in both labeling cost and budget.
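
As a rough illustration of the last idea in the abstract, the sketch below treats "buy a coarse label" and "buy a fine label" as the two arms of a bandit and spends a fixed budget between them. It is not the thesis's algorithm: the epsilon-greedy rule, the reward definition (improvement per unit cost), the cost values, and the names `COSTS`, `simulated_gain`, and `run_bandit` are all assumptions made for this example, and the gain function is a random stand-in for retraining and evaluating a real classifier.

```python
import random

# Hypothetical per-label costs; the numbers here are invented for illustration.
COSTS = {"coarse": 1.0, "fine": 4.0}


def simulated_gain(arm, rng):
    """Stand-in for the classifier improvement from buying one label of this type.

    A real learner would retrain and measure the change on held-out data.
    The means below are assumptions made only so the sketch can run.
    """
    mean_gain = {"coarse": 0.010, "fine": 0.060}[arm]
    return max(0.0, rng.gauss(mean_gain, 0.005))


def run_bandit(budget, costs=COSTS, epsilon=0.1, seed=0):
    """Epsilon-greedy choice of label granularity under a fixed budget.

    Returns the list of purchased label types and the total amount spent.
    """
    rng = random.Random(seed)
    pulls = {arm: 0 for arm in costs}
    reward_sum = {arm: 0.0 for arm in costs}
    purchases, spent = [], 0.0

    while True:
        # Only consider label types the remaining budget can still pay for.
        affordable = [a for a in costs if spent + costs[a] <= budget]
        if not affordable:
            break
        untried = [a for a in affordable if pulls[a] == 0]
        if untried:
            arm = untried[0]              # try each affordable arm once
        elif rng.random() < epsilon:
            arm = rng.choice(affordable)  # explore a random granularity
        else:
            # Exploit: best average gain per unit cost observed so far.
            arm = max(affordable, key=lambda a: reward_sum[a] / pulls[a])

        # Reward is normalized by cost so cheap and expensive labels compare fairly.
        reward = simulated_gain(arm, rng) / costs[arm]
        pulls[arm] += 1
        reward_sum[arm] += reward
        purchases.append(arm)
        spent += costs[arm]

    return purchases, spent
```

Calling `run_bandit(budget=50.0)` returns the sequence of label types bought and the amount spent, which never exceeds the budget. Dividing the observed gain by the label cost is one simple way to make the learner cost-sensitive; other bandit rules, such as upper-confidence-bound selection, would fit the same loop.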
