Virology, Nebraska Center for


Date of this Version



Published in Bioinformatics (2012); doi: 10.1093/bioinformatics/bts181. Published online in Advance Access April 11, 2012.


Copyright © 2012 Tavis K. Anderson, William W. Laegreid, Francesco Cerutti, Fernando A. Osorio, Eric A. Nelson, Jane Christopher-Hennings, and Tony L. Goldberg. Published by Oxford University Press. Used by permission.


Motivation: The extraordinary genetic and antigenic variability of RNA viruses is arguably the greatest challenge to the development of broadly effective vaccines. No single viral variant can induce sufficiently broad immunity, and incorporating all known naturally circulating variants into one multivalent vaccine is not feasible. Further, no objective strategies currently exist to select actual viral variants that should be included or excluded in polyvalent vaccines.

Results: To address this problem, we demonstrate a method based upon graph theory that quantifies the relative importance of viral variants. We demonstrate our method through application to the envelope glycoprotein gene of a particularly diverse RNA virus of pigs: porcine reproductive and respiratory syndrome virus (PRRSV). Using distance matrices derived from sequence nucleotide difference, amino acid difference, and evolutionary distance, we constructed viral networks and used common network statistics to assign each sequence an objective ranking of relative “importance.” To validate our approach, we used an independent published algorithm to score our top-ranked wild-type variants for coverage of putative T-cell epitopes across the 9383 sequences in our dataset. Top-ranked viruses achieve significantly higher coverage than lower-ranked viruses, and their coverage is nearly equal to that of a synthetic mosaic protein constructed in silico from the same set of 9383 sequences.
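The ranking idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state how edges were defined or which network statistics were used, so the distance threshold (`max_dist`), the choice of degree centrality, the function names, and the toy sequences are all assumptions made for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Count differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def build_network(seqs, max_dist):
    """Link two variants when their pairwise distance is <= max_dist.

    The threshold rule is an assumption; other edge definitions are possible.
    """
    adj = {name: set() for name in seqs}
    for u, v in combinations(seqs, 2):
        if hamming(seqs[u], seqs[v]) <= max_dist:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def rank_variants(seqs, max_dist):
    """Order variants from most to least central (ties broken by name)."""
    scores = degree_centrality(build_network(seqs, max_dist))
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical toy alignment, not PRRSV data.
toy = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGTACTA",
    "D": "TTGTACTA",
    "E": "GGGGCCCC",
}
ranking = rank_variants(toy, max_dist=2)
print(ranking)  # ['C', 'A', 'B', 'D', 'E']
```

In this toy network, variant C is within two differences of three other variants and so ranks first, while E is isolated and ranks last. The same scaffold would accept an amino acid or evolutionary distance in place of `hamming`, and a different centrality measure in place of degree.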

Conclusion: Our approach relies on the network structure of PRRSV, but applies to any diverse RNA virus because it identifies subsets of viral variants that are the most important to overall viral diversity. We suggest that this method, through the objective quantification of variant importance, provides criteria for choosing viral variants for further characterization, diagnostics, surveillance, and ultimately polyvalent vaccine development.

3 files of supplementary information are attached (below).

prrsv_database.fasta (10440 kB)
all PRRSV sequences that were stored on prior to it closing

prrsv_from_genbank.fasta (12819 kB)
all PRRSV sequences that were stored on NCBI GenBank

final_cleaned_prrsv_seqs.fasta (5903 kB)
final dataset used in network and phylogenetic analyses
