Computer Science and Engineering, Department of

 

Document Type

Article

Date of this Version

Summer 7-15-2017

Citation

J. Ramanan, P. Z. Revesz, Testing the independence hypothesis of accepted mutations for pairs of adjacent amino acids in protein sequences, International Journal of Biology and Biomedical Engineering, 11, 170-179, 2017.

Comments

Open access journal.

Jyotsna Ramanan, MS in Computer Science, University of Nebraska-Lincoln, 2016.

Abstract

Evolutionary studies usually assume that genetic mutations are independent of each other. However, this does not imply that the observed mutations are independent of each other, because when a nucleotide mutates, it may be biologically beneficial for an adjacent nucleotide to mutate as well. Hence the independence question also arises for pairs of adjacent amino acids within proteins. With a large number of decoded genes now available in genome libraries and online databases, it is possible to carry out a large-scale, computer-based study to test whether the independence assumption holds for pairs of adjacent amino acids. The question can be tested by considering the evolution of proteins within sets of closely related proteins, called protein families. In this thesis, we test the independence hypothesis for three protein families from the PFAM library, a publicly available online database that records a growing number of protein families. For each protein family, we construct a hypothetical common ancestor, or consensus sequence. We compare the hypothetical common ancestor of a protein family with each of the descendant protein sequences in the family to determine where mutations occurred during evolution. The comparison yields the actual probability of each pair of amino acids changing into another pair of amino acids. By comparing these actual probabilities with the theoretical probabilities expected under the independence assumption, we identify anomalies indicating that the independence assumption does not hold for many pairs of amino acids.
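The comparison described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each family is already given as aligned sequences of equal length with `-` as the gap symbol, takes the majority residue of each column as the consensus sequence, and estimates the theoretical probability of a pair change ab → cd under independence as the product P(a → c) · P(b → d) of single-residue substitution frequencies from the same alignment. The function names and the fold-change `threshold` used to flag anomalies are hypothetical.

```python
from collections import Counter, defaultdict

GAP = "-"  # assumed gap symbol in the aligned family sequences


def consensus(alignment):
    """Hypothetical common ancestor: the most frequent symbol in each column."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # Counter.most_common breaks ties in favour of the symbol seen first
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))


def _normalize(counts):
    """Turn {source: Counter(target)} into conditional probabilities P(target | source)."""
    probs = {}
    for src, targets in counts.items():
        total = sum(targets.values())
        probs[src] = {dst: n / total for dst, n in targets.items()}
    return probs


def single_probs(ancestor, descendants):
    """P(a -> c): how often ancestor residue a appears as c in a descendant."""
    counts = defaultdict(Counter)
    for seq in descendants:
        for a, c in zip(ancestor, seq):
            if GAP not in (a, c):
                counts[a][c] += 1
    return _normalize(counts)


def pair_probs(ancestor, descendants):
    """P(ab -> cd): how often the adjacent ancestor pair ab appears as cd."""
    counts = defaultdict(Counter)
    for seq in descendants:
        for i in range(len(ancestor) - 1):
            ab, cd = ancestor[i:i + 2], seq[i:i + 2]
            if GAP not in ab and GAP not in cd:
                counts[ab][cd] += 1
    return _normalize(counts)


def find_anomalies(alignment, threshold=2.0):
    """Flag observed pair changes whose actual probability differs from the
    independence prediction P(a -> c) * P(b -> d) by at least `threshold`-fold."""
    ancestor = consensus(alignment)
    single = single_probs(ancestor, alignment)
    pairs = pair_probs(ancestor, alignment)
    flagged = []
    for ab, targets in pairs.items():
        for cd, actual in targets.items():
            # Both factors are > 0 because the pair change ab -> cd was observed
            expected = single[ab[0]][cd[0]] * single[ab[1]][cd[1]]
            ratio = actual / expected
            if ratio >= threshold or ratio <= 1 / threshold:
                flagged.append((ab, cd, actual, expected))
    # Most over-represented pair changes first
    return sorted(flagged, key=lambda r: r[2] / r[3], reverse=True)
```

For example, with the toy family `["AC", "AC", "AC", "GT"]` the consensus is `AC`, and `find_anomalies` flags the double change `AC → GT`: it occurs in 25% of the comparisons, while independence predicts only 0.25 × 0.25 = 6.25%. Pair changes that are never observed are not reported by this sketch; detecting under-represented pairs would need a separate pass over all expected combinations.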
