Signal based Bayesian framework for gene structural prediction

Alexander Tchourbanov, University of Nebraska - Lincoln

Abstract

The development of a "Precise, predictive model of RNA splicing/alternative splicing" has recently been listed among the ten major challenges facing bioinformatics. An analysis of the existing obstacles to precise gene structural annotation was followed by several implementations that target specific problems. To address the challenge, the mRNA/DNA homology-based gene structure prediction tool GIGOgene was developed. This application uses a new splice-enhanced global alignment algorithm with affine gap penalties that runs in linear memory and produces a high-quality annotation of splice sites (SSs). The tool also includes a novel algorithm that assembles partial gene structure predictions using interval graphs. GIGOgene exhibited a sensitivity of 99.08% and a specificity of 99.98% on the Genie learning set, and demonstrated a higher quality of gene structural prediction than Sim4, est2genome, Spidey, Galahad and BLAT, including on genes that contain micro-exons and non-canonical SSs. GIGOgene showed an acceptable loss of prediction quality when confronted with a noisy Genie learning set simulating ESTs. The GIGOgene tool was used to collect an extensive learning set of human and mouse gene structural predictions. In addition to this homology-based gene structural prediction method, several new approaches that combine a priori knowledge to improve ab initio SS detection are discussed. First, a new Bayesian SS sensor design is introduced. According to tests, the Bayesian sensor outperforms the contemporary Maximum Entropy sensor for 5' SS detection. To further enhance prediction quality, the new de novo motif detection tool MHMMotif was designed and applied to intronic ends and to exons. The numbers of putative Exonic and Intronic Splicing Enhancers (ESEs and ISEs) found are reported. The elements found are combined with sensor information using a Naive Bayesian network, as implemented in the SpliceScan tool.
SpliceScan has been shown to outperform the SpliceView, GeneSplicer, NNSplice, Genio and NetUTR tools on the test set of human and rat genes. On the test set of short 5' UTR gene fragments, SpliceScan outperformed all contemporary ab initio gene structural prediction tools.
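
For readers unfamiliar with the alignment technique named in the abstract, the following is a minimal sketch of a standard global alignment score with affine gap penalties (the textbook three-state recurrence usually attributed to Gotoh), computed with rolling rows so that memory grows linearly with the length of one sequence. It is not the GIGOgene algorithm: it has no splice-site modeling, it returns only the score rather than the alignment, and the function name and the scoring parameters (`match`, `mismatch`, `gap_open`, `gap_extend`) are illustrative assumptions that do not come from the dissertation.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best global alignment score; a gap of length k costs gap_open + k * gap_extend.

    Hypothetical parameters, chosen only for illustration.
    """
    n, m = len(a), len(b)
    # Three states per cell, kept for the previous row only (linear memory):
    #   M: alignment ends with a[i-1] paired with b[j-1]
    #   X: alignment ends with a[i-1] paired with a gap
    #   Y: alignment ends with b[j-1] paired with a gap
    prev_m = [NEG_INF] * (m + 1)
    prev_x = [NEG_INF] * (m + 1)
    prev_y = [NEG_INF] * (m + 1)
    prev_m[0] = 0
    for j in range(1, m + 1):
        prev_y[j] = gap_open + j * gap_extend  # leading gap in a

    for i in range(1, n + 1):
        cur_m = [NEG_INF] * (m + 1)
        cur_x = [NEG_INF] * (m + 1)
        cur_y = [NEG_INF] * (m + 1)
        cur_x[0] = gap_open + i * gap_extend  # leading gap in b
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur_m[j] = max(prev_m[j - 1], prev_x[j - 1], prev_y[j - 1]) + s
            # Open a new gap from M or the other gap state, or extend this one.
            cur_x[j] = max(
                prev_m[j] + gap_open + gap_extend,
                prev_y[j] + gap_open + gap_extend,
                prev_x[j] + gap_extend,
            )
            cur_y[j] = max(
                cur_m[j - 1] + gap_open + gap_extend,
                cur_x[j - 1] + gap_open + gap_extend,
                cur_y[j - 1] + gap_extend,
            )
        prev_m, prev_x, prev_y = cur_m, cur_x, cur_y

    return max(prev_m[m], prev_x[m], prev_y[m])
```

With these assumed parameters, aligning `"ACGT"` against `"AT"` pairs the two matching bases and charges one gap of length two, so a single long gap (as an intron would appear in an mRNA/DNA alignment) is cheaper than two separate short gaps. Recovering the alignment itself in linear memory requires an additional divide-and-conquer step that this sketch omits.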

Subject Area

Computer science

Recommended Citation

Tchourbanov, Alexander, "Signal based Bayesian framework for gene structural prediction" (2006). ETD collection for University of Nebraska-Lincoln. AAI3209964.
https://digitalcommons.unl.edu/dissertations/AAI3209964
