Electrical and Computer Engineering, Department of

Department of Electrical and Computer Engineering: Faculty Publications

Clustering of protein families into functional subtypes using Relative Complexity Measure with reduced amino acid alphabets

Aydin Albayrak, Sabanci University
Hasan H. Otu, Harvard Medical School
Ugur O. Sezerman, Sabanci University

ORCID IDs

Hasan H. Otu

Document Type

Article

Date of this Version

2010

Citation

BMC Bioinformatics 2010, 11:428

Comments

Open access

doi:10.1186/1471-2105-11-428

Abstract

Background: Phylogenetic analysis can be used to divide a protein family into subfamilies in the absence of experimental information. Most phylogenetic analysis methods utilize multiple alignment of sequences and are based on an evolutionary model. However, multiple alignment is not an automated procedure and requires human intervention to maintain alignment integrity and to produce phylogenies consistent with the functional splits in underlying sequences. To address this problem, we propose to use the alignment-free Relative Complexity Measure (RCM) combined with reduced amino acid alphabets to cluster protein families into functional subtypes purely on sequence criteria. Comparison with an alignment-based approach was also carried out to test the quality of the clustering.

Results: We demonstrate the robustness of RCM with reduced alphabets in clustering of protein sequences into families in a simulated dataset and seven well-characterized protein datasets. On protein datasets, crotonases, mandelate racemases, nucleotidyl cyclases and glycoside hydrolase family 2 were clustered into subfamilies with 100% accuracy whereas acyl transferase domains, haloacid dehalogenases, and vicinal oxygen chelates could be assigned to subfamilies with 97.2%, 96.9% and 92.2% accuracies, respectively.

Conclusions: The combination of methods presented in this paper is useful for clustering protein families into subtypes based solely on protein sequence information. The method is also flexible and computationally fast because it does not require multiple alignment of sequences.
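The approach summarized above (recode each sequence in a reduced amino acid alphabet, then compare sequences without alignment through a complexity-based distance) can be illustrated with a minimal Python sketch. Several parts are assumptions that the abstract does not specify: the six-group alphabet in REDUCED_GROUPS is purely illustrative, the complexity count is a Lempel-Ziv style exhaustive parsing, and the normalized distance in rcm_distance is one plausible form of a relative complexity measure. All function names are hypothetical and this is not the authors' implementation.

```python
from itertools import combinations

# Illustrative six-letter grouping (an assumption, not one of the paper's alphabets).
REDUCED_GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # glycine / proline
}
RESIDUE_TO_GROUP = {aa: g for g, members in REDUCED_GROUPS.items() for aa in members}


def reduce_alphabet(seq):
    """Map each residue to its group letter; unknown symbols become 'X'."""
    return "".join(RESIDUE_TO_GROUP.get(aa, "X") for aa in seq.upper())


def lz_complexity(s):
    """Count components in a Lempel-Ziv (1976 style) exhaustive parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        k = 1
        # Extend the current component while it can be copied from earlier text.
        while i + k <= n and s[i:i + k] in s[:i + k - 1]:
            k += 1
        count += 1
        i += k
    return count


def rcm_distance(s, q):
    """Normalized relative complexity distance (assumed form), symmetric in s and q."""
    cs, cq = lz_complexity(s), lz_complexity(q)
    if max(cs, cq) == 0:
        return 0.0
    extra = max(lz_complexity(s + q) - cs, lz_complexity(q + s) - cq)
    return extra / max(cs, cq)


def distance_matrix(seqs):
    """Pairwise distances between sequences after alphabet reduction."""
    reduced = [reduce_alphabet(x) for x in seqs]
    n = len(reduced)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = rcm_distance(reduced[i], reduced[j])
    return d
```

The resulting matrix could be passed to any distance-based tree-building or clustering routine to split a family into candidate subfamilies. Which routine and which reduced alphabets the study used is not stated in the abstract.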

Included in

Computer Engineering Commons, Electrical and Computer Engineering Commons
