Off-campus UNL users: To download campus access dissertations, please use the following link to log into our proxy server with your NU ID and password. When you are done browsing please remember to return to this page and log out.

Non-UNL users: Please talk to your librarian about requesting this dissertation through interlibrary loan.

Convolutional Neural Network-Based Gene Prediction Using Buffalograss as a Model System

Michael Morikone, University of Nebraska - Lincoln

Abstract

The task of gene prediction has been largely stagnant in algorithmic improvements compared to when algorithms were first developed for predicting genes thirty years ago. Rather than iteratively improving the underlying algorithms in gene prediction tools by utilizing better performing models, most current approaches update existing tools through incorporating increasing amounts of extrinsic data to improve gene prediction performance. The traditional method of predicting genes is done using Hidden Markov Models (HMMs). These HMMs are constrained by having strict assumptions made about the independence of genes that do not always hold true. To address this, a Convolutional Neural Network (CNN) based gene prediction tool was developed and named GeneCNN. Due to their nonlinearity, neural networks are adept at capturing complex relationships between data points when applied to sufficiently large datasets such as whole genomes. Convolutional neural networks further improve upon neural networks through the incorporation of spatial dependence into individual datapoints. GeneCNN was trained using a sequenced buffalograss (Bouteloua dactyloides) genome. Training performance of GeneCNN resulted in a 97% accuracy in correctly identifying genic sequences in test data. GeneCNN uniquely identified a greater number of genes than currently existing gene prediction tools BRAKER3, AUGUSTUS, and Fgenesh at 1,089, 1,535, and 478 respectively, when using a 10 million nucleotide length genome sequence of buffalograss as input. Gene predictions made by combinations of the tools BRAKER3, AUGUSTUS, and Fgenesh, were compared to GeneCNN to assess the percentage of gene predictions made by GeneCNN that are supported by at least one other tool, where support ranged from 40.5% to 84.1% of all GeneCNN gene predictions for every combination of BRAKER3, AUGUSTUS, and Fgenesh. The findings in this study support the use of CNNs for gene prediction and serve as a valuable resource for the improvement of gene prediction algorithms in future research.

Subject Area

Bioinformatics|Biology|Genetics

Recommended Citation

Morikone, Michael, "Convolutional Neural Network-Based Gene Prediction Using Buffalograss as a Model System" (2023). ETD collection for University of Nebraska-Lincoln. AAI30813474.
https://digitalcommons.unl.edu/dissertations/AAI30813474

Share

COinS