Electrical and Computer Engineering, Department of

Department of Electrical and Computer Engineering: Dissertations, Theses, and Student Research

Modeling Biological Structures via Abstract Grammars to Solve Common Problems in Computational Biology

David J. Russell, University of Nebraska-Lincoln

First Advisor

Khalid Sayood

Date of this Version

2010

Document Type

Dissertation

Comments

A dissertation Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfillment of Requirements For the Degree of Doctor of Philosophy, Major: Engineering (Electrical Engineering), Under the Supervision of Professor Khalid Sayood. Lincoln, Nebraska: October, 2010.
Copyright 2010 David James Russell.

Abstract

Grammars are generally understood to be the set of rules that define the relationships between elements of a language. However, grammars can also be used to elucidate structural relationships within sequences constructed from any finite alphabet. In this work, abstract grammars are used to model the primary and secondary structures present in biological data. These grammar models are inferred and applied to efficiently solve various sequence analysis problems in computational biology, including multiple sequence alignment, fragment assembly, database redundancy removal, and structural prediction.

The primary structures, or sequential ordering of symbols, of biological data are first modeled with Lempel-Ziv (LZ) grammars. The results are used to construct a grammar-based sequence distance metric, which compares biological sequences by comparing their inferred grammars. This concept is applied to solve several problems in biological sequence analysis, including multiple sequence alignment and phylogenetic clustering. The higher-level secondary structures of biological sequences are then modeled via two novel grammar inference methods. The resulting context-free grammars are used to estimate structural pieces within biological sequences, which can in turn be used as supplemental information to help guide various sequence analysis algorithms. The use of this approach to develop algorithms for various sequence analysis tasks demonstrates the viability and versatility of using abstract grammars to model biological data.
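
As a rough illustration of the idea of comparing sequences through their inferred LZ grammars, the following sketch parses a sequence into LZ78-style phrases and scores two sequences by how many new phrases one adds when appended to the other. This is not the dissertation's definition: the LZ78 variant, the normalization, and the names `lz78_phrases`, `lz_complexity`, and `grammar_distance` are all assumptions made for the example.

```python
def lz78_phrases(seq):
    """Split seq into LZ78-style phrases.

    Each phrase is the shortest run of upcoming symbols that has not
    already been emitted as a phrase. (Assumed variant, for illustration.)
    """
    seen = set()
    phrases = []
    current = ""
    for symbol in seq:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Leftover symbols that match an earlier phrase.
        phrases.append(current)
    return phrases


def lz_complexity(seq):
    """Number of phrases in the parse; a crude size of the inferred grammar."""
    return len(lz78_phrases(seq))


def grammar_distance(s, q):
    """Hypothetical normalized distance between two sequences.

    Counts how many extra phrases are needed to describe one sequence
    after the other has been parsed, in both orders, and normalizes by
    the larger individual complexity.
    """
    c_s, c_q = lz_complexity(s), lz_complexity(q)
    if max(c_s, c_q) == 0:
        return 0.0
    extra_after_s = lz_complexity(s + q) - c_s
    extra_after_q = lz_complexity(q + s) - c_q
    return max(extra_after_s, extra_after_q) / max(c_s, c_q)


if __name__ == "__main__":
    print(lz78_phrases("AACGACG"))  # ['A', 'AC', 'G', 'ACG']
    print(grammar_distance("ACGTACGT", "GGGGTTTT"))
```

The score is symmetric by construction, but it is not guaranteed to be zero for identical inputs, so it should be read as a similarity heuristic rather than a true metric.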

Included in

Computational Engineering Commons, Signal Processing Commons
