Streptomyces aureofaciens is a Gram-positive actinomycete used for commercial antibiotic production. Although it has been the subject of many biochemical studies, no public genome resource was available prior to this project. To address this need, the genome of S. aureofaciens ATCC 10762 was sequenced using a combination of sequencing platforms (Illumina and 454 shotgun). Multiple de novo assembly methods (SGA, IDBA, Trinity, SOAPdenovo2, MIRA, Velvet and SPAdes), as well as combinations of these methods, were assessed to determine which provided the most robust assembly. Combination strategies led to a consistent overestimation of the total genome size. Empirical data from targeted PCR of predicted gap regions provided a robust validation framework for our de novo assemblies. Overall, the best assembly was generated using SPAdes. The total assembly length was 9.47 Mb and the average G+C content was 71.15%. We annotated this assembly using the NCBI Prokaryotic Genome Annotation Pipeline, revealing 8,073 genes in total, including 7,627 protein-coding sequences. Additional functional analysis using the KEGG GENES database provided functional predictions for over 1,400 of these sequences whose functions were not initially inferred by NCBI. The information provided by multiple independent assemblies allowed us to close 200 scaffold gaps present in our first hybrid assembly. Comparative genomic and phylogenetic analyses suggested that S. aureofaciens ATCC 10762 may be more closely related to the genus Kitasatospora than to neighboring Streptomyces species. Our results highlight the need for, and the value of, multiple assemblies when attempting to produce high-quality prokaryotic genome sequences.
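The summary figures quoted above (total assembly length and average G+C content) are the kind of statistics commonly used to compare candidate assemblies. The following is a minimal Python sketch of how such figures can be computed from a FASTA file. It is an illustration only, not the pipeline used in this work: the function names `read_fasta` and `assembly_stats` and the toy contigs are hypothetical, and N50 is included as a common contiguity metric even though the abstract does not report it.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def assembly_stats(records):
    """Return sequence count, total length, G+C percentage, and N50."""
    lengths = []
    gc_bases = 0
    acgt_bases = 0  # ambiguous bases such as N are excluded from G+C content
    for _, seq in records:
        lengths.append(len(seq))
        gc_bases += seq.count("G") + seq.count("C")
        acgt_bases += sum(seq.count(base) for base in "ACGT")

    total = sum(lengths)

    # N50: length of the sequence at which the cumulative length, taken from
    # longest to shortest, first reaches half of the total assembly length.
    n50 = 0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break

    gc_percent = 100.0 * gc_bases / acgt_bases if acgt_bases else 0.0
    return {
        "n_sequences": len(lengths),
        "total_length": total,
        "gc_percent": gc_percent,
        "n50": n50,
    }


# Toy input (made-up contigs, not data from this study).
toy_fasta = StringIO(">c1\nGGCCGGCCAT\n>c2\nGCGCAT\n>c3\nNNGC\n")
toy_stats = assembly_stats(read_fasta(toy_fasta))
print(toy_stats)
```

To apply the same functions to a real assembly, the `StringIO` object can be replaced with an open file handle. Running them on the output of each assembler would make the genome-size overestimation described for the combination strategies directly visible as an inflated `total_length`.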
Advisor: Etsuko Moriyama