Food Science and Technology, Department of

Department of Food Science and Technology: Faculty Publications

Alignment behaviors of short peptides provide a roadmap for functional profiling of metagenomic data

Rohita Sinha, University of Nebraska-Lincoln
Jennifer L. Clarke, University of Nebraska-Lincoln
Andrew K. Benson, University of Nebraska-Lincoln

ORCID IDs

Jennifer L. Clarke

Document Type

Article

Date of this Version

2015

Citation

Sinha et al. BMC Genomics (2015) 16:1080

DOI 10.1186/s12864-015-2272-z

Abstract

Background: Functional assignments for short-read metagenomic data pose a significant computational challenge due to the perceived unpredictability of alignment behavior and the inability to infer useful functional information from translated protein fragments (peptides). To address this problem, we have examined the predictability of short peptide alignments by systematically studying the alignment behavior of large sets of short peptides generated from well-characterized proteins as well as hypothetical proteins in the KEGG database.

Results: Using test sets of peptides modeling the length and phylogenetic distributions of short-read metagenomic data, we observed that peptides from well-characterized proteins had indistinguishable alignments to proteins from the same orthologous family and proteins from different families. Nonetheless, the patterns contained remarkable phylogenetic and structural signals, with alignments of even very short peptides naturally restricted to their orthologous family and/or proteins having similar structural folds. In stark contrast, peptides from “hypothetical proteins” had only sparse hit patterns with low frequencies and much lower identities. By weighting the structure-driven alignments and filtering peptides with behaviors similar to those derived from “hypothetical proteins”, we demonstrate that the accuracy of abundance predictions of protein families is dramatically improved.

Conclusions: Evolutionary processes have dispersed protein folds across multiple protein families, precluding accurate functional assignment to short peptides, whose alignment behavior is non-random and driven by structure. Algorithms that filter sparse peptides and weight hit patterns of peptides from “known space” dramatically improve quantification of functions from diverse mixtures of peptides and should substantially improve applications of metagenomic analyses requiring accurate quantitative measures of functional families.
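The filter-and-weight idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `family_abundance`, the input layout, and the thresholds `min_hits` and `min_identity` are all hypothetical, chosen only to show how peptides with sparse, low-identity hit patterns might be dropped and how the remaining peptides might contribute fractional counts to each protein family in proportion to their hit frequencies.

```python
from collections import defaultdict


def family_abundance(hits, min_hits=3, min_identity=60.0):
    """Estimate per-family abundance from peptide alignment hits.

    hits: dict mapping peptide id -> list of (family_id, percent_identity).
    min_hits and min_identity are illustrative thresholds, not values
    taken from the paper.
    """
    abundance = defaultdict(float)
    for peptide, alignments in hits.items():
        # Keep only alignments at or above the identity threshold.
        kept = [fam for fam, ident in alignments if ident >= min_identity]
        # Sparse hit pattern: treat the peptide like one derived from a
        # "hypothetical protein" and exclude it from quantification.
        if len(kept) < min_hits:
            continue
        # Count how often each family appears among the kept hits.
        counts = defaultdict(int)
        for fam in kept:
            counts[fam] += 1
        total = sum(counts.values())
        # Each peptide contributes one unit in total, split across
        # families in proportion to hit frequency.
        for fam, n in counts.items():
            abundance[fam] += n / total
    return dict(abundance)


example_hits = {
    "pep1": [("K1", 90.0), ("K1", 85.0), ("K2", 80.0), ("K1", 70.0)],
    "pep2": [("K3", 40.0)],  # sparse and low identity, so it is filtered out
}
result = family_abundance(example_hits)
```

In this toy input, `pep1` splits its single unit between families `K1` and `K2`, while `pep2` is discarded. A real pipeline would derive the hits from an aligner's output and would need thresholds calibrated against reference data.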

Included in

Food Science Commons
