Functional classification of divergent protein sequences and molecular evolution of multi-domain proteins
Abstract
Transmembrane proteins and multi-domain proteins together make up more than 80% of the total proteins in any eukaryotic proteome. Therefore, accurately classifying such proteins into functional classes is an important task. Furthermore, understanding the molecular evolution of multi-domain proteins is important because it shows how various domains fuse to form more complex proteins and acquire new functions, possibly affecting evolution at the organismal level. In this thesis, I first investigated the performance of several protein classifiers using one of the most divergent transmembrane protein families, the G-protein-coupled receptor (GPCR) superfamily, as an example. Alignment-free classifiers based on support vector machines using simple amino acid compositions were effective in remote-similarity detection, even from short fragmented sequences. While a support vector machine using local pairwise-alignment scores showed very well-balanced performance, profile hidden Markov models were generally highly specific and well suited for classifying well-established protein family members. We suggested that different types of protein classifiers should be applied to achieve optimal mining power. Combinations of multiple protein classification methods, including some of those above, were then applied to identify especially divergent plant GPCRs (or seven-transmembrane receptors) in the Arabidopsis thaliana genome. We identified 394 proteins as candidates and provided a prioritized list including 54 proteins for further investigation. For multi-domain protein families, the distribution of urea amidolyase, urea carboxylase, and sterol-sensing domain (SSD) proteins across kingdoms was investigated. Molecular evolutionary analysis showed that the urea amidolyase genes, currently found only in fungi among eukaryotes, are the result of a horizontal gene transfer event from proteobacteria. Urea carboxylase genes, currently found in fungi and a limited number of other organisms, were also likely derived from another ancestral gene in bacteria. Finally, we showed that the eukaryotic SSD-containing proteins may have a bacterial origin and that these ancestral sequences evolved into four different SSD-containing proteins, each acquiring specific functions. Two groups of SSD-containing proteins appear to have been formed by domain acquisition before the divergence of the fungal and metazoan lineages.
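As a rough illustration of the "simple amino acid compositions" mentioned in the abstract, the sketch below turns a protein sequence into a 20-element vector of relative residue frequencies, the kind of alignment-free feature that could be passed to a support vector machine. The function name, the residue ordering, and the handling of non-standard characters are assumptions made for this example; the abstract does not specify the encoding or the SVM configuration actually used in the dissertation.

```python
from collections import Counter

# The 20 standard amino acids, in an arbitrary but fixed order (assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the relative frequency of each standard amino acid.

    Characters outside the 20 standard residues (e.g. 'X', gaps) are
    ignored; this is an assumption of the sketch, not a documented choice.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No standard residues: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical short fragment, used only to show the call.
features = amino_acid_composition("MKTLLVAGAL")
```

Because the vector depends only on residue counts, it can be computed for short or fragmented sequences without any alignment step.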
Subject Area
Molecular biology|Bioinformatics
Recommended Citation
Strope, Pooja K, "Functional classification of divergent protein sequences and molecular evolution of multi-domain proteins" (2011). ETD collection for University of Nebraska-Lincoln. AAI3450122.
https://digitalcommons.unl.edu/dissertations/AAI3450122