Computer Science and Engineering, Department of


First Advisor

Juan Cui

Date of this Version

Fall 12-5-2018

Document Type



Department of Computer Science and Engineering, University of Nebraska-Lincoln


MicroRNAs (miRNAs) are a class of short (~22 nt) single-stranded RNA molecules found predominantly in eukaryotes. Involved in many major biological processes, miRNAs regulate gene expression by targeting mRNAs for degradation or translational inhibition. Imprecise splicing of miRNAs introduces substantial variability in the sequences of miRNA products and in their downstream regulation of gene expression. For example, to study miRNA biogenesis, biologists commonly deplete a gene in the miRNA biogenesis pathway and study the resulting change in miRNA sequences, a perturbation that can itself introduce imprecision in the miRNAs produced. Although high-throughput sequencing technologies such as small RNA-seq provide unprecedented quantitative readouts for miRNA expression analysis, existing standalone tools for miRNA-seq analysis usually do not report comprehensive, detailed information on miRNA splicing variations. To advance our understanding of miRNA variability by identifying significant imprecise miRNA splicing with statistical power, and to present a complete picture of miRNA splicing regulation, we propose GMAim, a pipeline for Genome-wide MiRNA Imprecise splicing detection that comprises read cataloging, miRNA family expression quantification, and imprecise splicing identification and categorization. The pipeline was implemented in R and is distributed as an R package. In cross-validation, the tool was more powerful and more accurate than other current miRNA annotation tools, because it avoids over-counting sequence reads during quantification, and it also ran quickly.
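The cataloging and categorization steps named in the abstract can be illustrated with a minimal sketch. This is not GMAim's implementation (the pipeline itself is an R package whose interface is not described here). It only assumes that each read has already been aligned to a precursor and can be compared against the annotated mature miRNA coordinates. The names `Annotation`, `Read`, `categorize`, and `catalog`, the category labels, and the example numbers are all hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Annotation(NamedTuple):
    """Hypothetical annotated mature miRNA span on its precursor (1-based, inclusive)."""
    name: str
    start: int
    end: int


class Read(NamedTuple):
    """Hypothetical aligned read on the same precursor, with its copy number."""
    start: int
    end: int
    count: int


def categorize(read: Read, ann: Annotation) -> str:
    """Label a read by which end(s) differ from the annotated mature miRNA."""
    shift5 = read.start - ann.start  # nonzero: 5' end differs from the annotation
    shift3 = read.end - ann.end      # nonzero: 3' end differs from the annotation
    if shift5 == 0 and shift3 == 0:
        return "precise"
    if shift5 != 0 and shift3 != 0:
        return "both_ends"
    return "5p_variant" if shift5 != 0 else "3p_variant"


def catalog(reads: Iterable[Read], ann: Annotation) -> Counter:
    """Sum read counts per category; each read contributes to exactly one category."""
    totals = Counter()
    for read in reads:
        totals[categorize(read, ann)] += read.count
    return totals


# Made-up example data, for illustration only.
ann = Annotation("example-miR", start=10, end=31)
reads = [Read(10, 31, 120), Read(11, 31, 15), Read(10, 33, 40), Read(9, 30, 5)]

summary = catalog(reads, ann)
imprecise_fraction = 1 - summary["precise"] / sum(summary.values())
print(dict(summary), round(imprecise_fraction, 3))
```

Assigning every read to exactly one category is one simple way to avoid counting a read more than once. Whether GMAim takes this approach, and how it tests the resulting counts for significance, is not stated in the abstract.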