Electrical Engineering, Department of
Title
The Average Mutual Information Profile as a Genomic Signature
Document Type
Article
Date of this Version
1-25-2008
Abstract
Background: Occult organizational structures in DNA sequences may hold the key to
understanding functional and evolutionary aspects of the DNA molecule. Such structures can also
provide the means for identifying and discriminating organisms using genomic data. Species-specific
genomic signatures are useful in a variety of contexts, such as evolutionary analysis, assembly and
classification of genomic sequences from large uncultivated microbial communities, and rapid
identification systems in health hazard situations.
Results: We have analyzed genomic sequences of eukaryotic and prokaryotic chromosomes as
well as various subtypes of viruses using an information-theoretic framework. We confirm the
existence of a species-specific average mutual information (AMI) profile. We use these profiles to
define a very simple, computationally efficient, alignment-free distance measure that reflects the
evolutionary relationships between genomic sequences. We use this distance measure to classify
chromosomes according to species of origin, to separate and cluster subtypes of the HIV-1 virus,
and to classify DNA fragments to species of origin.
Conclusion: AMI profiles of DNA sequences prove to be species-specific and easy to compute.
The structure of AMI profiles is conserved, even in short subsequences of a species' genome,
rendering a pervasive signature. This signature can be used to classify relatively short DNA
fragments to species of origin.
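
Illustrative sketch (not taken from the article): the abstract does not give formulas, so the Python below only shows one plausible way an AMI profile and a profile-based distance might be computed. It uses the standard mutual information between bases separated by a lag k, with marginal probabilities estimated from the observed pairs at that lag. The function names, the default maximum lag of 16, and the use of Euclidean distance are assumptions made for illustration and may differ from the measure defined in the article.

```python
import math
from collections import Counter


def ami_profile(seq, max_lag=16):
    """Return [I(1), ..., I(max_lag)] in bits for a DNA string.

    I(k) is the mutual information between the bases at positions
    i and i + k. max_lag=16 is an arbitrary illustrative default.
    """
    seq = seq.upper()
    profile = []
    for k in range(1, max_lag + 1):
        # Count every pair of bases that are k positions apart.
        pairs = Counter(zip(seq, seq[k:]))
        total = sum(pairs.values())
        # Marginal counts are taken from the same pairs (an assumption).
        left, right = Counter(), Counter()
        for (x, y), count in pairs.items():
            left[x] += count
            right[y] += count
        info = 0.0
        for (x, y), count in pairs.items():
            p_xy = count / total
            p_x = left[x] / total
            p_y = right[y] / total
            info += p_xy * math.log2(p_xy / (p_x * p_y))
        profile.append(info)
    return profile


def profile_distance(p, q):
    """Euclidean distance between two profiles (an assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    periodic = ami_profile("ACGT" * 50, max_lag=8)
    uniform = ami_profile("A" * 200, max_lag=8)
    print(profile_distance(periodic, uniform))
```

In this sketch, a strictly periodic sequence yields a high AMI at every lag because each base fully determines the base k positions later, while a single-base sequence yields zero at every lag. Real genomic sequences fall between these extremes.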

Comments
Published in BMC Bioinformatics 2008, 9:48. Copyright © 2008 Bauer et al; licensee BioMed Central Ltd. Used by permission.