Addressing Bioinformatics Bottlenecks for Scalable Microbial Population Genomics Analyses
Abstract
Population genomics analyses allow researchers to understand genetic relationships within populations and their environments, find genomic patterns, and, for pathogenic organisms, especially microorganisms, track outbreaks and develop treatments with high accuracy. The process starts with raw sequencing data and ends with genotypic mapping. However, performing genomics analyses at whole-population scale requires powerful computational platforms and efficient methods that work well with large data and generate accurate outcomes. This dissertation addresses two focal bottlenecks for efficient and accurate microbial population analyses: 1) the need for a scalable and effective computational platform that utilizes powerful computational resources; and 2) strategic algorithm selection for the various steps of population genomics analyses. These bottlenecks are explored through three main applications: i) an automated and scalable multi-step bioinformatics pipeline; ii) the accuracy of tools for read mapping; and iii) real-time sequence typing of foodborne pathogens. As part of this dissertation, we: 1) built ProkEvo, an automated and scalable platform for bacterial population genomics; 2) deployed ProkEvo on two different computational platforms; and 3) provided application case studies of ProkEvo. Next, we investigated the accuracy of mapping and alignment tools for long sequencing reads by: 4) building a consistent set of benchmarks using simulated data; 5) defining stringent assessment metrics; and 6) using a range of thresholds to reflect the tools' true accuracy. Furthermore, we focused on real-time sequence typing of foodborne pathogens and: 7) performed a systematic and comprehensive comparison between assembly-dependent and assembly-free methods for scalable bacterial MLST mapping; 8) showed that the accuracy of these methods and the optimal k-mer length are species-specific; and 9) incorporated both methods in ProkEvo.
With the experiments performed in this dissertation, we provide useful guidelines for strategic algorithm selection for the individual steps of population genomics analyses. Moreover, we conclude that ProkEvo provides a practical and viable platform for scalable, automated analyses of bacterial populations that can be applied in microbiology research, clinical diagnostics, and epidemiological surveillance.
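To make the idea of assembly-free, k-mer-based allele typing more concrete, the following is a minimal illustrative sketch and not the method used in the dissertation or in ProkEvo. It assumes a toy scoring rule (the fraction of an allele's distinct k-mers observed in the reads), ignores reverse complements and sequencing errors, and uses hypothetical allele names and sequences invented purely for illustration. The function names `kmers`, `build_index`, and `call_allele` are likewise assumptions.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(alleles, k):
    """Map each k-mer to the set of allele names that contain it."""
    index = {}
    for name, seq in alleles.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)
    return index


def call_allele(reads, alleles, k):
    """Pick the allele whose distinct k-mers are best covered by the reads.

    Toy scoring rule (an assumption for this sketch): the score of an allele
    is the fraction of its distinct k-mers seen in at least one read.
    """
    index = build_index(alleles, k)
    seen = {name: set() for name in alleles}
    for read in reads:
        for km in kmers(read, k):
            for name in index.get(km, ()):
                seen[name].add(km)
    scores = {}
    for name, seq in alleles.items():
        total = len(kmers(seq, k))
        # Alleles shorter than k have no k-mers and cannot be scored.
        scores[name] = len(seen[name]) / total if total else 0.0
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: two alleles of one locus differing by a single base.
alleles = {
    "locusA_1": "ACGTACGTTAGC",
    "locusA_2": "ACGTACCTTAGC",
}
reads = ["ACGTACGT", "TACGTTAG", "GTTAGC"]

best, scores = call_allele(reads, alleles, k=5)
print(best, scores)
```

In this sketch the choice of k controls how well closely related alleles can be told apart: shorter k-mers are shared by more alleles, while longer k-mers are more easily disrupted by read errors. This gives some intuition for why an optimal k-mer length could differ between species, although the sketch itself does not demonstrate that result.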
Subject Area
Bioinformatics
Recommended Citation
Pavlovikj, Natasha, "Addressing Bioinformatics Bottlenecks for Scalable Microbial Population Genomics Analyses" (2022). ETD collection for University of Nebraska-Lincoln. AAI29168531.
https://digitalcommons.unl.edu/dissertations/AAI29168531