Biological Sciences, School of
First Advisor
Joshua R. Herr
Date of this Version
12-2018
Document Type
Article
Abstract
Adequate recommendations for the amount and types of sequencing data necessary to optimize the recovery of single chromosomes from bacterial sequencing projects do not exist. Broad estimates of the coverage depths needed to recover complete bacterial genomes are present in the literature, but the sequencing depths required for high-quality assembly across bacterial and archaeal phylogenies are not known. Additionally, correlations between genomic complexity and expected assembly quality have not been properly defined. Furthermore, the capability of multiplexed long-read sequencing (sequencing more than one sample simultaneously on one flow cell) to recover complete bacterial chromosomes is poorly documented. We first discuss the benefits and challenges of assembling single-chromosome bacterial genomes. Then, to address the effect of genomic variability on assembly quality, we selected a clade of closely related Escherichia coli strains and assessed how strain-level genomic variation leads to differences in assembly quality. While assembly quality does vary among highly similar strains, we show that the depth beyond which additional coverage no longer improves assembly contiguity can be ascertained for highly similar bacterial strains. We also show that there are significant correlations between genomic traits (such as genome size, repeat content, and number of coding sequences) and the resulting genome assembly quality. Furthermore, we simulated long-read data based on standard multiplexed read profiles for a phylogenetically diverse array of bacteria and archaea and found that, although limitations due to genome size and repeat complexity exist, x8 multiplexed long-read data are sufficient to complete many bacterial genomes without additional short-read sequencing.
This research provides a series of criteria explaining why short-read sequencing and assembly often fail to produce complete genome assemblies, and shows how multiplexed long-read data can greatly reduce the time and financial resources required for many bacterial and archaeal sequencing projects.
Adviser: Joshua R. Herr
Comments
A THESIS Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfillment of Requirements For the Degree of Master of Science, Major: Biological Sciences, Under the Supervision of Joshua R. Herr. Lincoln, Nebraska: December, 2018
Copyright (c) 2018 Timothy J. Krause