An accurate transcriptome is essential to understanding biological systems, enabling omics analyses such as gene expression analysis, gene discovery, and gene-regulatory network construction. However, assembling an accurate transcriptome is challenging, especially for organisms without adequate reference genomes or transcriptomes. Although several transcriptome assembly methods based on different approaches exist, it remains difficult to establish which are the most accurate. This thesis explores different transcriptome assembly methods and compares their performance using simulated benchmark transcriptomes of varying complexity. We also introduce ConSemblEX, which improves on the consensus-based ensemble transcriptome assembler ConSemble in three main areas: it can use any number of assemblers, it provides a variety of consensus assembly outputs, and it reports the effect of each assembler on the final assembly. Using five assembly methods in both the de novo and genome-guided approaches, we show how ConSemblEX can be used to explore various consensus assembly strategies, such as ConSemblEX-4+, to find the optimum assembly. Compared to the original ConSemble, ConSemblEX improved de novo assembly performance, increasing precision by 14% and F1 by 5%, and significantly reducing false positives (FP) by 49%. In genome-guided assembly, ConSemblEX performed identically to the original ConSemble. We show that ConSemblEX provides tools to explore how different methods perform and behave depending on the dataset. With the ConSemblEX-select assembly, we further demonstrate that consensus-based assembly can be improved by choosing optimum overlap sets among the different methods. Such information provides a foundation for developing machine learning algorithms that further improve transcriptome assembly performance.
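The idea of a consensus assembly and the metrics quoted above can be illustrated with a minimal sketch. It is not the ConSemblEX implementation: it assumes that contigs from different assemblers are compared as exact sequence strings, and it reads a name like ConSemblEX-4+ as "keep contigs supported by at least four assemblers", which is an interpretation rather than something stated here. The assembler labels, sequence identifiers, and the functions `consensus` and `score` are hypothetical.

```python
from collections import Counter


def consensus(assemblies, min_support):
    """Return sequences reported by at least `min_support` assemblers.

    `assemblies` maps a (hypothetical) assembler name to the set of
    sequences it produced; matching is by exact string equality.
    """
    counts = Counter()
    for seqs in assemblies.values():
        counts.update(set(seqs))  # each assembler votes once per sequence
    return {seq for seq, n in counts.items() if n >= min_support}


def score(predicted, reference):
    """Compute TP, FP, FN, precision, recall, and F1 against a benchmark."""
    tp = len(predicted & reference)   # correctly assembled
    fp = len(predicted - reference)   # assembled but not in the benchmark
    fn = len(reference - predicted)   # in the benchmark but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "F1": f1}


# Toy data: five hypothetical assemblers and a four-transcript benchmark.
assemblies = {
    "A": {"s1", "s2", "s3", "x1"},
    "B": {"s1", "s2", "s3", "x2"},
    "C": {"s1", "s2", "x1"},
    "D": {"s1", "s2", "s4"},
    "E": {"s1", "s3", "x1"},
}
reference = {"s1", "s2", "s3", "s4"}

for k in (3, 4):
    print(k, score(consensus(assemblies, k), reference))
```

In this toy case, raising the support threshold from three to four assemblers removes the one false positive but also loses a true transcript, which is the kind of precision and recall trade-off that exploring different consensus strategies is meant to expose.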
Adviser: Jitender Deogun