Electrical and Computer Engineering, Department of

Date of this Version

Fall 12-2-2011

Document Type

Thesis

Comments

A Thesis Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfillment of Requirements For the Degree of Master of Science, Major: Computer Science, Under the Supervision of Professor Stephen Scott. Lincoln, Nebraska: December 2011

Abstract

When using the Gene Ontology (GO), nucleotide and amino acid sequences are annotated by terms in a structured and controlled vocabulary organized into relational graphs. The usage of the vocabulary (GO terms) in the annotation of these sequences may diverge from the relations defined in the ontology. We measure the consistency of the use of GO terms by comparing GO's defined structure to the terms' application. To do this, we first use synthetic data with different characteristics to understand how these characteristics influence the correlation values determined by various similarity measures. Using these results as a baseline, we found that the correlation between GO's definition and its application to real data is relatively low, suggesting that GO annotations might not be applied in a manner consistent with GO's definition. In contrast, we found a sub-ontology of GO that correlates well with its usage in UniProtKB. We also study how terms from different ontologies in GO relate to each other, since such relationships can be helpful in refining term definitions. To identify such "cross-terms", we propose a generalized semantic measure that can be used to identify related terms across GO ontologies. Results based on the Saccharomyces Genome Database show that the measure is correlated with the degree of co-occurrence for term pairs. By thresholding the level of similarity, we found a list of highly correlated cross-ontology term pairs. These term pairs show a high level of biological correlation.
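The comparison described in the abstract, a similarity derived from the ontology's structure set against a similarity derived from how terms are actually used, can be illustrated with a minimal sketch. The abstract does not name the similarity measures used in the thesis, so everything below is an assumption for illustration only: the toy ontology `PARENTS`, the toy annotation table `ANNOTATIONS`, the use of Jaccard overlap as a stand-in for both measures, and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy ontology (not real GO terms): term -> set of parent terms.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}

# Hypothetical annotations: gene product -> set of directly annotated terms.
ANNOTATIONS = {
    "g1": {"A1", "A2"},
    "g2": {"A1", "A2"},
    "g3": {"A1", "B1"},
    "g4": {"B1"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in the graph."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def structural_similarity(t1, t2):
    """Definition side (assumed measure): overlap of ancestor sets."""
    return jaccard(ancestors(t1), ancestors(t2))


def usage_similarity(t1, t2):
    """Application side (assumed measure): overlap of annotated gene sets."""
    genes1 = {g for g, terms in ANNOTATIONS.items() if t1 in terms}
    genes2 = {g for g, terms in ANNOTATIONS.items() if t2 in terms}
    return jaccard(genes1, genes2)


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def definition_usage_correlation():
    """Correlate the two similarities over all pairs of annotated terms."""
    terms = sorted({t for ts in ANNOTATIONS.values() for t in ts})
    pairs = list(combinations(terms, 2))
    structural = [structural_similarity(a, b) for a, b in pairs]
    usage = [usage_similarity(a, b) for a, b in pairs]
    return pearson(structural, usage)


if __name__ == "__main__":
    print(f"correlation: {definition_usage_correlation():.3f}")
```

In this toy setting, a correlation near 1 would mean that terms close together in the graph also tend to annotate the same gene products, while a low value would correspond to the kind of divergence between definition and application that the abstract reports for real data.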

Adviser: Stephen Scott

Included in

Bioinformatics Commons, Computational Biology Commons, Computer Engineering Commons, Databases and Information Systems Commons

Department of Computer Electronics and Engineering: Dissertations, Theses, and Student Research

A Study of Correlations between the Definition and Application of the Gene Ontology
