Biological sequence analyses - Theory, algorithms, and applications

Fangrui Ma, University of Nebraska - Lincoln


As more and more biological sequences become available, sequence analysis has become very important in bioinformatics and computational biology. In this dissertation, we present the results of our research on genome sequence alignment, RNA folding with simple pseudoknots, and single nucleotide polymorphism (SNP) association pattern discovery with spatial constraints.

For genome sequence alignment, we use a divide-and-conquer approach to reduce the computational complexity of multiple whole-genome sequence alignment. The approach has three major steps: finding candidate anchors, aligning multiple anchor sequences, and closing the gaps between the aligned anchors. Candidate anchors are computed with a suffix tree/array method, multiple anchor sequences are then aligned with efficient graph-theoretic algorithms, and the gaps are closed with ClustalW. Experiments showed that the algorithms can correctly find the alignment and that the longest path algorithm is more efficient than the maximum clique algorithm. A comparison with other closely related programs showed that our programs run faster.

Furthermore, because current genome sequence alignment algorithms do not consider spatial constraints, we introduced the concept of the solution space of genome sequence alignment. The solution space is modeled as a multi-bipartite digraph. We provide efficient graph decomposition and traversal algorithms that process this graph and output solutions as alignments of functionally equivalent clusters. To find the maximum alignment among these solutions, we developed an $O(qn^2)$ algorithm for finding the maximum edge q-clique in the graph. The most conserved sites among the genome sequences correspond to the minimum level cut of the graph.

For RNA folding, we developed a dynamic programming algorithm that predicts simple pseudoknots in the optimal secondary structure of a single RNA sequence using standard thermodynamic parameters.
The algorithm has time and space complexities of $O(n^4)$ and $O(n^3)$, respectively. In a comparison with other methods using PseudoBase, our method was more accurate.

In the last part of the dissertation, we present a tool that uses association mining to discover SNP association patterns with spatial constraints.
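The abstract states that candidate anchors are computed with a suffix tree/array method. As a rough illustration only (not the dissertation's implementation), the Python sketch below builds a naive suffix array over two sequences joined by the separators `#` and `$`, then reports exact matches of at least `min_len` bases between neighbouring suffixes that come from different sequences. The function names, the separators, and the `min_len` threshold are assumptions made for this example; a genome-scale tool would use a linear-time suffix array or suffix tree and would handle more than two genomes.

```python
def suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_length(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def candidate_anchors(s1, s2, min_len):
    """Hypothetical helper: exact matches of length >= min_len between s1 and s2.

    Returns sorted (start_in_s1, start_in_s2, length) triples taken from
    suffix-array neighbours that belong to different sequences.
    """
    text = s1 + "#" + s2 + "$"  # separators stop matches from spanning both inputs
    sa = suffix_array(text)
    boundary = len(s1)  # index of "#"; suffixes starting before it come from s1
    anchors = set()
    for prev, cur in zip(sa, sa[1:]):
        if (prev < boundary) == (cur < boundary):
            continue  # both suffixes from the same side: not a cross-sequence match
        length = common_prefix_length(text[prev:], text[cur:])
        if length >= min_len:
            i, j = (prev, cur) if prev < boundary else (cur, prev)
            anchors.add((i, j - boundary - 1, length))  # j rebased to s2 offsets
    return sorted(anchors)


print(candidate_anchors("GATTACA", "TTAC", 4))
# [(2, 0, 4)]
```

Here the shared 4-mer `TTAC` starts at offset 2 in the first sequence and offset 0 in the second. Comparing only suffix-array neighbours keeps the scan linear in the number of suffixes, at the cost of not enumerating every matching pair.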
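The abstract does not say exactly how the anchor graph is built. Assuming each anchor is a vertex and an edge joins two anchors that occur in the same order, without overlapping, in every genome, choosing a consistent set of anchors becomes a heaviest-path problem on a directed acyclic graph. The sketch below (hypothetical names; anchor length used as the weight) solves that problem with dynamic programming over anchors sorted by position. It illustrates the general longest-path idea and is not necessarily the algorithm used in the dissertation.

```python
from typing import List, Tuple

# Hypothetical anchor representation: (start position in each genome, length).
Anchor = Tuple[Tuple[int, ...], int]


def precedes(a: Anchor, b: Anchor) -> bool:
    """True if anchor a ends before anchor b starts in every genome."""
    (starts_a, len_a), (starts_b, _) = a, b
    return all(x + len_a <= y for x, y in zip(starts_a, starts_b))


def heaviest_chain(anchors: List[Anchor]) -> List[Anchor]:
    """Heaviest path in the anchor DAG, weighting each anchor by its length."""
    if not anchors:
        return []
    # Sorting by start positions is a topological order: with positive lengths,
    # a predecessor always starts earlier in the first genome.
    order = sorted(anchors, key=lambda a: a[0])
    best = [length for _, length in order]  # heaviest chain ending at each anchor
    back = [-1] * len(order)                # predecessor on that chain
    for j in range(len(order)):
        for i in range(j):
            if precedes(order[i], order[j]) and best[i] + order[j][1] > best[j]:
                best[j] = best[i] + order[j][1]
                back[j] = i
    # Walk back from the anchor that ends the heaviest chain.
    k = max(range(len(order)), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(order[k])
        k = back[k]
    return chain[::-1]


anchors = [((0, 0), 3), ((5, 2), 4), ((4, 10), 2), ((10, 12), 3)]
print(heaviest_chain(anchors))
# [((0, 0), 3), ((4, 10), 2), ((10, 12), 3)]
```

The anchor `((5, 2), 4)` is dropped because it conflicts with `((0, 0), 3)` in the second genome, and the chain through it (total 7) is lighter than the chosen one (total 8). This simple version is quadratic in the number of anchors.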
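The pseudoknot algorithm described above, with its $O(n^4)$ time and standard thermodynamic parameters, is not reproduced here. To show the general shape of dynamic programming for RNA secondary structure, the sketch below substitutes a much simpler technique, Nussinov-style base-pair maximization. It ignores pseudoknots and free energies and runs in $O(n^3)$ time and $O(n^2)$ space. The allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop of three unpaired bases are assumptions of this example.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (no pseudoknots, no energies).

    dp[i][j] holds the best score for seq[i..j]; a pair (i, j) needs at
    least min_loop unpaired bases between its two ends.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):  # split into two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


print(nussinov("GGGAAAUCC"))
# 3
```

For `GGGAAAUCC` the three nested pairs are G-C, G-C, and a G-U wobble closing the `AAA` hairpin loop. Handling even simple pseudoknots requires extra indices for the crossing stems, which is where the higher $O(n^4)$ cost stated in the abstract comes from.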
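The abstract does not define the spatial constraint in detail. One plausible reading, assumed here, is that only SNPs within a maximum base-pair distance of each other on the chromosome may form a pattern. The sketch below applies that constraint at the pair level of an Apriori-style search: it keeps SNPs that are individually frequent, then counts co-occurrence only for nearby pairs. The function name, `max_dist`, `min_support`, and the input layout are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def spatial_snp_pairs(
    positions: Dict[str, int],
    samples: List[Set[str]],
    max_dist: int,
    min_support: float,
) -> Dict[Tuple[str, str], float]:
    """Frequent SNP pairs whose loci lie within max_dist base pairs.

    positions maps a SNP id to its chromosomal coordinate; each sample is the
    set of SNP ids (e.g. minor alleles) it carries. Both layouts are hypothetical.
    """
    if not samples:
        return {}
    n = len(samples)

    def support(items):
        # Fraction of samples that carry every SNP in items.
        return sum(all(s in sample for s in items) for sample in samples) / n

    # Apriori pruning: a pair can only be frequent if both of its SNPs are.
    frequent = sorted(s for s in positions if support((s,)) >= min_support)
    result = {}
    for a, b in combinations(frequent, 2):
        if abs(positions[a] - positions[b]) > max_dist:
            continue  # spatial constraint: skip SNPs that are too far apart
        sup = support((a, b))
        if sup >= min_support:
            result[(a, b)] = sup
    return result


positions = {"rs1": 100, "rs2": 150, "rs3": 5000}
samples = [{"rs1", "rs2", "rs3"}, {"rs1", "rs2"}, {"rs1", "rs3"}, {"rs2"}]
print(spatial_snp_pairs(positions, samples, max_dist=1000, min_support=0.5))
# {('rs1', 'rs2'): 0.5}
```

The pair `("rs1", "rs3")` also reaches a support of 0.5, but it is rejected because the two loci are 4,900 bp apart. Checking the distance before counting also saves the support computation for every distant pair.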

Subject Area

Biology, Bioinformatics; Computer Science

Recommended Citation

Ma, Fangrui, "Biological sequence analyses - Theory, algorithms, and applications" (2009). ETD collection for University of Nebraska - Lincoln. AAI3360173.