Computer Science and Engineering, Department of


First Advisor

Qiuming Yao

Date of this Version



A thesis presented to the faculty of the Graduate College at the University of Nebraska in partial fulfillment of requirements for the degree of Master of Science

Major: Computer Science

Under the supervision of Professor Qiuming Yao

Lincoln, Nebraska, November 2023


Copyright 2023, Mengyuan Zhou


Previous genome-wide analyses of transcription factor binding sites (TFBSs) have overlooked the importance of ranking potentially significant regulatory regions, especially those with repetitive binding within a local region. Identifying these homogeneous binding sites is critical because they can amplify the binding affinity and regulatory activity of transcription factors, thereby affecting gene expression and cellular functions. To address this issue, we developed Motif-Cluster, an open-source tool that prioritizes and visualizes transcription factor regulatory regions by incorporating the idea of local motif clusters. Motif-Cluster ranks significant transcription factor regulatory regions without requiring experimental data by applying a density-based clustering approach combined with flexible binding gaps and binding affinities.

Motif-Cluster's algorithm filters out noise from weak binding sites by balancing region size against the number of binding instances, using binding site gaps and binding affinities. As a result, it can cluster local binding sites and identify crucial regulatory areas. The tool has been tested under multiple strategies on local binding sites and has successfully recovered key regulatory regions for ZNF410 that were previously identified from its binding clusters in the CHD4 promoter. It also provides an interface for analyzing densely packed binding sites and visualizing prioritized regulatory regions.
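
The general idea of grouping nearby binding sites by gap and affinity can be illustrated with a minimal Python sketch. This is not Motif-Cluster's actual implementation: it substitutes a simple single-pass gap-merging rule for the density-based clustering, and the names `Hit`, `Cluster`, `cluster_hits`, `max_gap`, `min_score`, and `min_hits`, as well as the ranking by summed score, are assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One predicted binding site (hypothetical structure, for illustration)."""
    start: int    # start coordinate on the sequence
    end: int      # end coordinate (exclusive)
    score: float  # binding-affinity score; higher means stronger binding


@dataclass
class Cluster:
    """A local group of binding sites (hypothetical structure)."""
    start: int
    end: int
    hits: List[Hit]

    @property
    def total_score(self) -> float:
        # Assumed ranking criterion: sum of the member affinities.
        return sum(h.score for h in self.hits)


def cluster_hits(hits: List[Hit], max_gap: int = 50,
                 min_score: float = 0.0, min_hits: int = 2) -> List[Cluster]:
    """Group hits separated by at most `max_gap` bases and rank the groups.

    Hits scoring below `min_score` are treated as noise and dropped first.
    Groups with fewer than `min_hits` members are discarded.
    """
    strong = sorted((h for h in hits if h.score >= min_score),
                    key=lambda h: h.start)

    clusters: List[Cluster] = []
    current: List[Hit] = []
    current_end = 0

    for h in strong:
        # A gap larger than max_gap closes the cluster being built.
        if current and h.start - current_end > max_gap:
            if len(current) >= min_hits:
                clusters.append(Cluster(current[0].start, current_end, current))
            current = []
        # Extend the running cluster boundary (or start a new one).
        current_end = max(current_end, h.end) if current else h.end
        current.append(h)

    # Flush the last open cluster.
    if len(current) >= min_hits:
        clusters.append(Cluster(current[0].start, current_end, current))

    # Strongest clusters first.
    clusters.sort(key=lambda c: c.total_score, reverse=True)
    return clusters
```

In this sketch, `max_gap` controls how far apart two sites may be while still counting as part of the same region, and `min_score` controls which weak sites are discarded as noise; together they trade region size against the number of binding instances.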

Overall, Motif-Cluster provides a more efficient and comprehensive approach than previous solutions to identifying significant transcription factor binding sites in genome-wide analyses. With improved efficiency and visualization capabilities, it helps researchers gain new insights and design novel experiments.

Advisor: Qiuming Yao