Electrical and Computer Engineering, Department of

Department of Electrical and Computer Engineering: Dissertations, Theses, and Student Research

Genome Annotation Using Average Mutual Information

Garin P. Newcomb, University of Nebraska-Lincoln

First Advisor

Khalid Sayood

Date of this Version

12-2021

Document Type

Thesis

Citation

G. P. Newcomb, "Genome Annotation Using Average Mutual Information," M.S. thesis, Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, 2021.

Comments

A thesis presented to the Faculty of The Graduate College at the University of Nebraska in partial fulfillment of requirements for the degree of Master of Science. Major: Electrical Engineering. Under the supervision of Professor Khalid Sayood. Lincoln, Nebraska: December 2021.

Abstract

Advances in high-throughput DNA sequencing technologies, together with ambitious goals for their use, are producing a deluge of unannotated sequenced genomes. This makes computational tools that can aid in annotation increasingly valuable.

Here, we provide a detailed exploration of the utility as well as the limitations of average mutual information (AMI) in several steps of genome annotation. For a genomic sequence, AMI is a measure of the information a base contains about the base separated by a fixed lag. A profile is constructed by calculating AMI at multiple lags. In addition to traditional AMI, we employ two AMI variants: expanded AMI and expanded-adjusted AMI, both of which preserve some granular detail discarded by AMI.
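As an illustration of the AMI profile described above, the following is a minimal sketch, not the thesis's implementation. It assumes the standard plug-in estimate of mutual information, I(k) = Σ p_k(x, y) log2[p_k(x, y) / (p(x) p(y))], summed over base pairs (x, y) observed k positions apart. The function names `ami` and `ami_profile` are hypothetical, and the expanded and expanded-adjusted variants are not shown because this page does not define them.

```python
from collections import Counter
from math import log2


def ami(seq, lag):
    """Plug-in estimate (in bits) of the mutual information between
    bases that are `lag` positions apart in `seq`.

    Assumption: the standard mutual information formula with
    probabilities estimated from observed counts; the thesis may use
    a different estimator.
    """
    n = len(seq) - lag  # number of (base, base-at-lag) pairs
    if lag < 1 or n < 1:
        raise ValueError("lag must be >= 1 and shorter than the sequence")
    pairs = Counter(zip(seq, seq[lag:]))  # joint counts at this lag
    first = Counter(seq[:n])              # marginal counts, leading base
    second = Counter(seq[lag:])           # marginal counts, lagged base
    total = 0.0
    for (x, y), count in pairs.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), with p_x = first[x] / n and p_y = second[y] / n
        total += p_xy * log2(p_xy * n * n / (first[x] * second[y]))
    return total


def ami_profile(seq, max_lag):
    """AMI evaluated at lags 1..max_lag, returned as a list."""
    return [ami(seq, k) for k in range(1, max_lag + 1)]
```

For example, `ami_profile("ACGT" * 50, 10)` returns ten values, one per lag. At lag 4 every base exactly predicts the base four positions later, so `ami("ACGT" * 50, 4)` evaluates to 2 bits, the maximum for a four-letter alphabet, while a homopolymer such as `"A" * 100` yields 0 at every lag.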

First, we demonstrate AMI’s capacity to assess evolutionary similarity by constructing phylogenetic trees similar to those currently accepted. The remainder of this work focuses on applications involving binary classification. We use support vector machines trained using the AMI profiles to classify sequences and evaluate predictive performance. These classification problems include predicting whether sequences come from protein-coding regions, identifying essential genes, and making functional predictions about the proteins genes produce. We conclude that AMI is particularly adept at identifying coding regions, and this behavior is consistent for species across all of life’s diversity.

Adviser: Khalid Sayood

Included in

Other Electrical and Computer Engineering Commons
