Electrical & Computer Engineering, Department of

 

First Advisor

Khalid Sayood

Date of this Version

5-2024

Citation

A thesis presented to the faculty of the Graduate College at the University of Nebraska in partial fulfillment of requirements for the degree of Master of Science

Major: Electrical Engineering

Under the supervision of Professor Khalid Sayood

Lincoln, Nebraska, May 2024

Comments

Copyright 2024, Joel Mohrmann. Used by permission.

Abstract

The information-bearing nature of the DNA molecule has long been recognized. One tool for quantifying the relationships within the information contained in DNA sequences is the average mutual information (AMI) profile, a measure from information theory. This investigation used primarily the AMI profile, along with a few other metrics, to explore the structure of the information contained in DNA sequences.
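As an illustration only, the AMI profile can be sketched as the mutual information, in bits, between bases separated by a lag k, estimated from the empirical pair frequencies at that lag. The function name `ami_profile`, its parameters, and the choice to take the marginals from the observed pairs are assumptions made for this sketch and are not taken from the thesis.

```python
from collections import Counter
from math import log2


def ami_profile(seq, max_lag):
    """Return [I(1), ..., I(max_lag)] in bits for a sequence of symbols.

    I(k) = sum over (x, y) of p_k(x, y) * log2(p_k(x, y) / (p(x) * p(y))),
    where p_k is the empirical joint distribution of symbols k positions apart
    and the marginals are taken from those same pairs (an assumption here).
    """
    profile = []
    for k in range(1, max_lag + 1):
        # Joint counts of (seq[i], seq[i + k]) over all valid positions i.
        pairs = Counter(zip(seq, seq[k:]))
        n = sum(pairs.values())
        if n == 0:
            # The lag is at least as long as the sequence: nothing to estimate.
            profile.append(0.0)
            continue
        left, right = Counter(), Counter()
        for (x, y), c in pairs.items():
            left[x] += c
            right[y] += c
        ami = sum(
            (c / n) * log2(c * n / (left[x] * right[y]))
            for (x, y), c in pairs.items()
        )
        profile.append(ami)
    return profile


if __name__ == "__main__":
    # A strictly periodic toy sequence; real DNA would be read from a file.
    print(ami_profile("ACGT" * 50, 6))
```

For a strictly periodic toy sequence, each base determines the base k positions later, so the estimate stays close to the entropy of the base distribution at every lag; for real sequences, the values at different lags are what the profile is meant to compare.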

Treating DNA sequences as an information source, several computational methods were employed to model their information structure. Maximum likelihood and maximum a posteriori estimators were used to predict missing bases in DNA sequences. Other novel prediction methods based upon the AMI profile and its ability to evaluate the predictability of DNA bases were also developed and tested for accuracy. The AMI profile was also adjusted to account for the triplet-code nature of DNA sequences. Additionally, machine-learning techniques such as neural networks, support vector machines, and principal component analysis were used to classify different regions of DNA sequences using the AMI profile and to compare coding versus noncoding regions.

Finally, the analysis considered the relative frequency of groups of bases (known as k-mers) in DNA sequences. Arithmetic coding was explored as a way to compress DNA sequences using a model based on the relative frequency of k-mers. It was concluded that the biological information stored in DNA is complex, yet this investigation provided methods to elucidate some of the character of the information structure of DNA sequences.
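A minimal sketch of the k-mer relative frequencies mentioned above, assuming overlapping k-mers counted with a sliding window (the thesis may count them differently). The helper names `kmer_frequencies` and `ideal_code_length_bits` are hypothetical. The second function does not implement arithmetic coding; it only computes the sum of -log2 p over the k-mers, which is the cost that an ideal arithmetic coder driven by this static model would approach.

```python
from collections import Counter
from math import log2


def kmer_frequencies(seq, k):
    """Relative frequency of each overlapping k-mer in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def ideal_code_length_bits(seq, k):
    """Sum of -log2 p(kmer) over non-overlapping k-mers of seq.

    Probabilities come from the overlapping-window frequencies above; any
    trailing bases that do not fill a whole k-mer are ignored in this sketch.
    """
    freqs = kmer_frequencies(seq, k)
    blocks = (seq[i:i + k] for i in range(0, len(seq) - k + 1, k))
    return sum(-log2(freqs[b]) for b in blocks)


if __name__ == "__main__":
    toy = "ACGTACGTTTGACA"  # toy input, not data from the thesis
    print(kmer_frequencies(toy, 2))
    print(ideal_code_length_bits(toy, 2))
```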

Advisor: Khalid Sayood
