Electrical Engineering, Department of
Title
Grammar-Based Distance in Progressive Multiple Sequence Alignment
Document Type
Article
Date of this Version
2008
Abstract
Background: We propose a multiple sequence alignment (MSA) algorithm and compare its
alignment quality and execution time with those of existing algorithms.
The proposed progressive alignment algorithm uses a grammar-based distance metric to determine
the order in which biological sequences are pairwise aligned. The alignment then proceeds by
pairwise aligning each new sequence with an ensemble of the previously aligned sequences.
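The ordering step described above can be illustrated with a minimal sketch. The abstract does not give the distance formula, so the code below assumes a simple LZ78-style phrase count as the "grammar" and a normalized difference of phrase counts as the distance; the names lz_phrase_count, grammar_distance, and alignment_order are hypothetical, and the greedy nearest-sequence ordering is a stand-in for whatever ordering scheme the published algorithm actually uses.

```python
from itertools import combinations


def lz_phrase_count(seq: str) -> int:
    """Count phrases in a simple LZ78-style parse (assumed grammar model)."""
    phrases = set()
    current = ""
    count = 0
    for ch in seq:
        current += ch
        if current not in phrases:
            # A new phrase ends here: record it and start the next one.
            phrases.add(current)
            count += 1
            current = ""
    if current:
        # Trailing characters that repeat a known phrase still count once.
        count += 1
    return count


def grammar_distance(a: str, b: str) -> float:
    """Assumed distance: extra phrases needed to describe one sequence
    after the other, normalized by the larger individual phrase count."""
    ca, cb = lz_phrase_count(a), lz_phrase_count(b)
    if max(ca, cb) == 0:
        return 0.0
    cab, cba = lz_phrase_count(a + b), lz_phrase_count(b + a)
    return max(cab - ca, cba - cb) / max(ca, cb)


def alignment_order(seqs: list[str]) -> list[int]:
    """Greedy ordering (an assumption): start from the closest pair, then
    repeatedly add the sequence nearest to any already-chosen sequence."""
    n = len(seqs)
    if n < 2:
        return list(range(n))
    dist = {
        (i, j): grammar_distance(seqs[i], seqs[j])
        for i, j in combinations(range(n), 2)
    }

    def d(i: int, j: int) -> float:
        return dist[(i, j)] if i < j else dist[(j, i)]

    order = list(min(dist, key=dist.get))
    remaining = set(range(n)) - set(order)
    while remaining:
        nxt = min(remaining, key=lambda k: min(d(k, i) for i in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    sequences = ["AAAAAAAA", "CDEFGHIK", "AAAAAAAA"]
    print(alignment_order(sequences))
```

Because this distance is computed from the raw sequences without any pairwise alignment, the ordering step stays cheap as the number of sequences grows, which is consistent with the running-time emphasis of the abstract.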
Results: The performance of the proposed algorithm is validated by comparison with popular
progressive multiple alignment approaches, ClustalW and T-Coffee, and with the more recently
developed algorithms MAFFT, MUSCLE, Kalign, and PSAlign, using the BAliBASE 3.0 database of
amino acid alignment files and a set of longer sequences generated by the Rose software. The proposed
algorithm builds multiple alignments of quality comparable to those of the other programs, with significant
improvements in running time. The results are especially striking for large datasets.
Conclusion: We introduce a computationally efficient progressive alignment algorithm using a
grammar-based sequence distance that is particularly useful for aligning large datasets.

Comments
Published in BMC Bioinformatics 2008, 9:306. Copyright © 2008 Russell et al; licensee BioMed Central Ltd. Used by permission.