Computer Science and Engineering, Department of


First Advisor

Jitender S. Deogun


A THESIS Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfillment of Requirements For the Degree of Master of Science, Major: Computer Science, Under the Supervision of Professor Jitender S. Deogun. Lincoln, Nebraska: December, 2021

Copyright © 2021 Ankitha Vejandla


The rapid development of next-generation sequencing (NGS) technologies for determining the sequence of DNA has revolutionized genome research in recent years. De novo assemblers are the most commonly used tools for genome assembly. Most assemblers use de Bruijn graphs, which are built by breaking the sequenced reads into shorter sub-strings of length k, called kmers. Counting kmers and analyzing the kmer frequency distribution are therefore important steps in genome assembly. The main goal of this research is to provide a detailed analysis of the performance of currently available kmer counting and estimation tools. This analysis is intended to help bioinformatics researchers choose accurate and efficient tools for genome assembly.
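To make the terms "kmer counting" and "kmer frequency distribution" concrete, the following is a minimal Python sketch of exact kmer counting with an in-memory hash table (`collections.Counter`). It is an illustration under simplifying assumptions, not the method used by any of the tools evaluated in this thesis. The function names `count_kmers`, `reverse_complement` and `frequency_distribution` are hypothetical, all counts are held in memory, and kmers containing bases other than A, C, G and T are skipped. Dedicated counting tools typically rely on more memory-efficient data structures, or on estimation, to scale to full NGS datasets.

```python
from collections import Counter

# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID_BASES = set("ACGT")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over {A, C, G, T}."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_kmers(reads, k, canonical=False):
    """Count every length-k substring (kmer) across all reads.

    A read of length L contributes L - k + 1 kmers (none if L < k).
    Kmers containing characters other than A, C, G, T (e.g. 'N') are skipped.
    If canonical is True, a kmer and its reverse complement are counted
    together under the lexicographically smaller of the two strings.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if not set(kmer) <= _VALID_BASES:
                continue  # skip kmers with ambiguous bases
            if canonical:
                kmer = min(kmer, reverse_complement(kmer))
            counts[kmer] += 1
    return counts


def frequency_distribution(counts):
    """Map each multiplicity m to the number of distinct kmers seen exactly m times."""
    return dict(sorted(Counter(counts.values()).items()))
```

For example, with `reads = ["ACGTACGT", "CGTACGTT"]` and `k = 3`, `count_kmers(reads, 3)` yields ACG: 3, CGT: 4, GTA: 2, TAC: 2, GTT: 1, and `frequency_distribution` of that result is `{1: 1, 2: 2, 3: 1, 4: 1}`: one kmer occurs once, two occur twice, and one each occurs three and four times. With `canonical=True`, ACG and its reverse complement CGT are merged into a single entry with count 7.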

Advisor: Jitender S. Deogun