Computer Science and Engineering, Department of

 

Computer Science, Computer Engineering, and Bioinformatics: Dissertations, Theses, and Student Research

First Advisor

Stephen D. Scott

Date of this Version

5-2017

Document Type

Thesis

Citation

James Duin, Hierarchical Active Learning Application to Mitochondrial Disease Protein Dataset, MS thesis, University of Nebraska-Lincoln, May 2017.

Comments

A thesis presented to the faculty of the Graduate College at the University of Nebraska in partial fulfillment of requirements for the degree of Master of Science

Major: Computer Science

Under the supervision of Professor Stephen D. Scott. Lincoln, Nebraska, May 2017

Copyright (c) 2017 James Duin

Abstract

This study investigates an application of active machine learning to a protein dataset developed to identify the source of mutations that give rise to mitochondrial disease. Each protein is labeled according to its location of origin in the cell: at the coarse level, as mitochondrial or not, and at the fine level, by its specific target location in the mitochondria (the outer or inner membrane, the matrix, or the ribosomes). These labels form a hierarchy. A new machine learning approach is investigated that learns the high-level classifier, i.e., whether the protein is mitochondrial, by separately learning the finer-grained target compartment concepts and combining the results. This approach is termed active over-labeling. Experiments on the protein dataset show that active over-labeling improves the area under the precision-recall curve compared to standard passive or active learning. Because finer-grained labels are more costly to obtain, alternative strategies that spend fixed proportions of a given budget on fine versus coarse labels are compared at various label costs. Finally, we present a cost-sensitive active learner that uses a multi-armed bandit approach to dynamically choose the label granularity to purchase, and show that the bandit-based learner is robust to variations in both labeling cost and budget.
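
The idea of choosing label granularity with a bandit under a fixed budget can be illustrated with a minimal sketch. The abstract does not say which bandit algorithm the thesis uses, so the epsilon-greedy rule below is a stand-in, not the author's method. Every name in it (`choose_granularity`, `costs`, `reward_fn`, `toy_gain`) and every number in the example is a hypothetical chosen for illustration.

```python
import random


def choose_granularity(costs, budget, reward_fn, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit over label granularities (illustrative sketch).

    costs:     dict mapping an arm name (e.g. "coarse", "fine") to the
               positive cost of buying one label at that granularity.
    budget:    total amount available to spend on labels.
    reward_fn: callable(arm) -> observed gain from buying one label; in a
               real learner this might be a change in validation score
               (an assumption, not something the abstract specifies).
    Returns the list of arms purchased, in order.
    """
    if any(c <= 0 for c in costs.values()):
        raise ValueError("every label cost must be positive")

    rng = random.Random(seed)
    arms = sorted(costs)
    pulls = {a: 0 for a in arms}
    gain_per_cost = {a: 0.0 for a in arms}  # running mean of gain / cost
    purchases = []
    remaining = budget

    while True:
        affordable = [a for a in arms if costs[a] <= remaining]
        if not affordable:
            break

        untried = [a for a in affordable if pulls[a] == 0]
        if untried:
            arm = untried[0]                      # try each arm once first
        elif rng.random() < epsilon:
            arm = rng.choice(affordable)          # explore
        else:
            arm = max(affordable, key=lambda a: gain_per_cost[a])  # exploit

        remaining -= costs[arm]
        observed = reward_fn(arm) / costs[arm]
        pulls[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        gain_per_cost[arm] += (observed - gain_per_cost[arm]) / pulls[arm]
        purchases.append(arm)

    return purchases


def toy_gain(arm):
    # Hypothetical fixed gains: a fine label is assumed to help more.
    return 8.0 if arm == "fine" else 1.0


example_costs = {"coarse": 1.0, "fine": 4.0}
print(choose_granularity(example_costs, budget=20.0, reward_fn=toy_gain))
```

Tracking gain per unit cost, rather than raw gain, is one simple way to make the choice cost-sensitive: an expensive fine label is preferred only while it pays for itself relative to cheaper coarse labels, and the learner falls back to coarse labels once the remaining budget no longer covers a fine one.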
