Electrical & Computer Engineering, Department of


First Advisor

Khalid Sayood

Date of this Version



G. P. Newcomb, "Genome Annotation Using Average Mutual Information," M.S. thesis, Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, 2021.


A thesis presented to the Faculty of The Graduate College at the University of Nebraska in partial fulfillment of requirements for the degree of Master of Science. Major: Electrical Engineering. Under the supervision of Professor Khalid Sayood. Lincoln, Nebraska: December 2021

Copyright © 2021 Garin P. Newcomb


Advances in high-throughput DNA sequencing technologies, together with ambitious goals for their use, are producing a deluge of sequenced but unannotated genomes. This makes computational tools that can aid in annotation increasingly valuable.

Here, we provide a detailed exploration of the utility, as well as the limitations, of average mutual information (AMI) in several steps of genome annotation. For a genomic sequence, AMI measures the information that one base carries about another base a fixed lag away. An AMI profile is constructed by calculating AMI at multiple lags. In addition to traditional AMI, we employ two AMI variants, expanded AMI and expanded-adjusted AMI, both of which preserve some granular detail discarded by AMI.
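As a rough illustration of the definition above, the following Python sketch estimates AMI at a single lag and collects the values into a profile. It is not the thesis's implementation: the function names `ami` and `ami_profile` are hypothetical, the probabilities are plain frequency counts over the base pairs found at each lag, and the marginals are taken from those same pairs, which is an assumption of this sketch. The expanded and expanded-adjusted variants are not covered.

```python
from collections import Counter
from math import log2


def ami(seq, lag):
    """Estimate mutual information (in bits) between bases `lag` positions apart.

    Hypothetical helper: probabilities are simple frequency estimates, and the
    marginals are computed from the same pairs as the joint distribution.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    # Pair every base with the base `lag` positions downstream of it.
    pairs = list(zip(seq, seq[lag:]))
    if not pairs:
        raise ValueError("lag must be smaller than the sequence length")
    n = len(pairs)
    joint = Counter(pairs)                   # counts of (x, y) pairs
    first = Counter(x for x, _ in pairs)     # counts of the leading base
    second = Counter(y for _, y in pairs)    # counts of the trailing base
    total = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)), written with raw counts.
        ratio = p_xy * n * n / (first[x] * second[y])
        total += p_xy * log2(ratio)
    return total


def ami_profile(seq, max_lag):
    """Return the list of AMI values for lags 1 through `max_lag`."""
    return [ami(seq, k) for k in range(1, max_lag + 1)]


if __name__ == "__main__":
    # Toy sequence with period 4, so a base fully determines the base 4 away.
    toy = "ACGT" * 50
    print(ami_profile(toy, 8))
```

On the toy sequence, the value at lag 4 is 2 bits, because each base is always followed four positions later by the same base and the four bases are equally frequent. Real genomic sequences give much smaller values, and it is the shape of the profile across lags that serves as the feature vector.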

First, we demonstrate AMI’s capacity to assess evolutionary similarity by constructing phylogenetic trees similar to those currently accepted. The remainder of this work focuses on applications involving binary classification. We train support vector machines on the AMI profiles to classify sequences and then evaluate their predictive performance. These classification problems include predicting whether sequences come from protein-coding regions, identifying essential genes, and making functional predictions about the proteins that genes encode. We conclude that AMI is particularly adept at identifying coding regions, and that this behavior is consistent for species across all of life’s diversity.

Adviser: Khalid Sayood