Computing, School of

School of Computing: Dissertations, Theses, and Student Research

Sequence-based Bioinformatics Approaches to Predict Virus–Host Relationships in Archaea and Eukaryotes

Yingshan Li, University of Nebraska-Lincoln

First Advisor

Juan Cui

Second Advisor

Etsuko Moriyama

Third Advisor

Heriberto Cerutti

Committee Members

Etsuko N. Moriyama, Heriberto Cerutti

Date of this Version

Fall 12-1-2022

Document Type

Thesis

Comments

A thesis presented to the faculty of the Graduate College at the University of Nebraska in partial fulfillment of requirements for the degree of Master of Science

Major: Computer Science

Under the supervision of Professor Juan Cui. Lincoln, NE: December 2022

Abstract

Viral metagenomics does not depend on laboratory culturing and can be used to investigate the viromes of virtually any environmental niche. Although numerous viral genome sequences have been assembled from metagenomic studies in recent years, the natural hosts of most of these viral contigs have not been determined. Several computational approaches have been developed to predict the hosts of bacteriophages. Nevertheless, little progress has been made in virus-host prediction, especially for viruses that infect eukaryotes and archaea. In this study, by analyzing all documented viruses with known eukaryotic and archaeal hosts, we assessed the predictive power of four computational approaches to viral host prediction. We explored the use of the following biological relationships among viruses and hosts:

1. Sequence similarity between virus and host genomes, where direct genetic interactions between viruses and hosts are assumed to leave traces of historical infections.
2. Co-evolution between viruses and hosts, where the dependency of viruses on their hosts for replication is assumed to result in similar genomic features, including nucleotide composition and codon usage.
3. Sequence similarity between viruses, where closely related viruses are assumed to infect the same hosts.
4. Genomic feature similarity between viruses, based on nucleotide composition and dinucleotide, codon, and bi-codon usage biases, where viruses with similar genomic features are assumed to tend to share the same hosts.

We showed that each of the four approaches produced better predictions than uninformed guessing, indicating that our current knowledge of virus-host interaction and co-evolution can be exploited to help predict natural hosts among eukaryotes and archaea for viral contigs. Overall, the third and fourth approaches (prediction based on virus-virus genomic sequence similarity and genomic feature similarity) had the highest prediction accuracy. The second approach (prediction based on virus-host co-evolution) had the least predictive power. We also discuss the biological underpinnings of the differences in predictive power among these approaches. We anticipate a significant increase in predictive capacity as more training data and knowledge of virus-host relationships accumulate.
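As an illustration only, the idea behind the fourth approach (viruses with similar genomic features tend to share hosts) can be sketched in a few lines of Python. This is not the thesis's implementation: it assumes dinucleotide frequencies as the only feature, cosine similarity as the similarity measure, and a simple nearest-neighbor rule. The function names `dinucleotide_profile`, `cosine_similarity`, and `predict_host` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

# The 16 possible DNA dinucleotides, in a fixed order.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_profile(seq):
    """Return the 16 dinucleotide frequencies of a DNA sequence.

    Overlapping dinucleotides are counted; any pair containing a
    character other than A, C, G, or T (for example N) is ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DINUCLEOTIDES)
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_host(query_seq, labelled_viruses):
    """Assign the host of the most similar labelled virus (assumed 1-nearest-neighbor rule).

    labelled_viruses is a list of (sequence, host_label) pairs.
    Returns (host_label, similarity), or (None, -1.0) if the list is empty.
    """
    query = dinucleotide_profile(query_seq)
    best_host, best_score = None, -1.0
    for seq, host in labelled_viruses:
        score = cosine_similarity(query, dinucleotide_profile(seq))
        if score > best_score:
            best_host, best_score = host, score
    return best_host, best_score
```

A call such as `predict_host(contig, reference_set)` returns the host label of the closest reference virus together with its similarity score. Codon and bi-codon usage could be added as further feature vectors in the same way, given annotated coding regions.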

Advisor: Juan Cui

Included in

Biology Commons, Computational Biology Commons, Computer Engineering Commons, Computer Sciences Commons
