Computing, School of

First Advisor

Stephen D. Scott

Date of this Version

5-2017

Document Type

Thesis

Citation

James Duin, Hierarchical Active Learning Application to Mitochondrial Disease Protein Dataset, MS thesis, University of Nebraska-Lincoln, May 2017.

Comments

A thesis presented to the faculty of the Graduate College at the University of Nebraska in partial fulfillment of requirements for the degree of Master of Science

Major: Computer Science

Under the supervision of Professor Stephen D. Scott. Lincoln, Nebraska, May 2017

Abstract

This study investigates an application of active machine learning to a protein dataset developed to identify the source of mutations that give rise to mitochondrial disease. Each protein in the dataset is labeled according to its location of origin in the cell: either coarsely, as mitochondrial or not, or more finely, by its target location within the mitochondrion (the outer membrane, the inner membrane, the matrix, or the ribosomes). These labels form a hierarchy. A new machine learning approach is investigated that learns the high-level classifier, i.e., whether the protein is mitochondrial, by separately learning the finer-grained target-compartment concepts and combining the results. This approach is termed active over-labeling. Experiments on the protein dataset show that active over-labeling improves the area under the precision-recall curve compared to standard passive or active learning. Because finer-grained labels are more costly to obtain, alternative strategies that spend fixed proportions of a given budget on fine versus coarse labels are compared at various label costs. Finally, we present a cost-sensitive active learner that uses a multi-armed bandit approach to dynamically choose the label granularity to purchase, and show that the bandit-based learner is robust to variations in both labeling cost and budget.
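The two ideas in the abstract, combining fine-grained predictions into a coarse decision and letting a bandit choose which label granularity to buy, can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not state the combination rule or the bandit algorithm, so the max rule, the UCB1 policy, the cost-normalized reward, and all names and numbers below (`coarse_score`, `GranularityBandit`, the costs, the placeholder gains) are assumptions made purely for illustration.

```python
import math

# Fine-grained target compartments named in the abstract.
FINE_LABELS = ["outer_membrane", "inner_membrane", "matrix", "ribosome"]


def coarse_score(fine_probs):
    """Combine per-compartment scores into one 'mitochondrial' score.

    Assumption: take the maximum over the fine-grained scores. The thesis
    may combine them differently.
    """
    return max(fine_probs[label] for label in FINE_LABELS)


class GranularityBandit:
    """UCB1 over label granularities (hypothetical helper, not from the thesis).

    Each arm is a granularity with a purchase cost. The reward fed to UCB1
    is the observed gain divided by that cost (an assumed design choice).
    """

    def __init__(self, costs):
        self.costs = dict(costs)
        self.pulls = {arm: 0 for arm in self.costs}
        self.total = {arm: 0.0 for arm in self.costs}

    def choose(self, budget):
        """Return the arm to buy next, or None if nothing is affordable."""
        affordable = [a for a in self.costs if self.costs[a] <= budget]
        if not affordable:
            return None
        # Try every affordable arm once before applying the UCB1 rule.
        for arm in affordable:
            if self.pulls[arm] == 0:
                return arm
        n = sum(self.pulls.values())

        def ucb(arm):
            mean = self.total[arm] / self.pulls[arm]
            return mean + math.sqrt(2 * math.log(n) / self.pulls[arm])

        return max(affordable, key=ucb)

    def update(self, arm, gain):
        """Record the gain observed after buying one label of this arm."""
        self.pulls[arm] += 1
        self.total[arm] += gain / self.costs[arm]


# Usage: spend a budget, with made-up gains standing in for the measured
# change in classifier quality after each purchase.
bandit = GranularityBandit({"coarse": 1.0, "fine": 4.0})
budget = 20.0
while True:
    arm = bandit.choose(budget)
    if arm is None:
        break
    budget -= bandit.costs[arm]
    placeholder_gain = 0.02 if arm == "coarse" else 0.05  # not real results
    bandit.update(arm, placeholder_gain)
```

In a real learner the placeholder gain would be replaced by a measured improvement of the coarse classifier, for example the change in area under the precision-recall curve on held-out data after retraining with the purchased label.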

Included in

Artificial Intelligence and Robotics Commons

School of Computing: Dissertations, Theses, and Student Research

Hierarchical Active Learning Application to Mitochondrial Disease Protein Dataset
