Finding DNA motifs: A probabilistic suffix tree approach

Abhishek Majumdar, University of Nebraska - Lincoln


We address the problem of de novo motif identification: given a set of DNA sequences, we try to identify motifs in the dataset without any prior knowledge about the existence of motifs in it. We propose a method based on Probabilistic Suffix Trees (PSTs) to identify fixed-length motifs from a given set of DNA sequences. Our experiments reveal that our approach successfully discovers true motifs. We compared our method with the popular MEME algorithm and observed that it detects a larger number of correct and statistically significant motifs than MEME. Our method is also highly efficient compared to MEME at finding motifs when processing datasets of 1000 or more sequences. We applied our method to sequences of mutant strains of Exophiala dermatitidis and successfully identified motifs that revealed several transcription factor binding sites. This information is important to biologists for performing experiments to understand their role in different regulatory pathways affected by cdc42. We also show that our PST approach to de novo motif discovery can be used successfully to identify motifs in ChIP-Seq datasets. These motifs in turn identify binding sites for proteins in the sequences.
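
For readers unfamiliar with PSTs, the following is a minimal sketch of the general idea behind a probabilistic suffix tree: it stores, for each short context (a suffix of the preceding bases), the distribution of the next base, and falls back to a shorter context when a longer one was never observed. This is not the dissertation's implementation. The function names, the fixed maximum depth, the add-one smoothing, and the toy sequences are all assumptions made for illustration, and the tree is represented as a flat dictionary of contexts rather than an explicit tree.

```python
from collections import defaultdict
import math

ALPHABET = "ACGT"


def build_context_counts(sequences, max_depth):
    """Count next-base occurrences for every context of length 0..max_depth.

    Hypothetical helper: a flat dict keyed by context string stands in
    for the nodes of a probabilistic suffix tree.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, base in enumerate(seq):
            for depth in range(0, min(max_depth, i) + 1):
                context = seq[i - depth:i]  # the `depth` bases before position i
                counts[context][base] += 1
    return counts


def next_base_prob(counts, context, base, pseudocount=1.0):
    """P(base | longest observed suffix of context), with additive smoothing."""
    # Drop the oldest base until the context has been seen (or is empty).
    while context and context not in counts:
        context = context[1:]
    seen = counts.get(context, {})
    total = sum(seen.values())
    return (seen.get(base, 0) + pseudocount) / (total + pseudocount * len(ALPHABET))


def word_log_prob(counts, word, max_depth):
    """Log-probability of a fixed-length word under the context model."""
    log_p = 0.0
    for i, base in enumerate(word):
        context = word[max(0, i - max_depth):i]
        log_p += math.log(next_base_prob(counts, context, base))
    return log_p


# Toy usage (assumed data, not from the dissertation).
toy_sequences = ["ACGTACGT", "TTACGTAA"]
model = build_context_counts(toy_sequences, max_depth=2)
for candidate in ("ACGT", "GGGG"):
    print(candidate, word_log_prob(model, candidate, max_depth=2))
```

In this sketch, a word that recurs in the input receives a higher log-probability than one that does not, which is the kind of signal a PST-based method can use when ranking fixed-length candidates. How the dissertation actually scores candidates and assesses their statistical significance is not described in the abstract.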

Subject Area

Computer science

Recommended Citation

Majumdar, Abhishek, "Finding DNA motifs: A probabilistic suffix tree approach" (2017). ETD collection for University of Nebraska-Lincoln. AAI10254179.