Selecting the "Closest to Optimal" Multiple Sequence Alignment Using Multi-Layer Perceptron

Catherine Anderson, University of Nebraska - Lincoln


Many bioinformatics analyses use multiple sequence alignments (MSAs) as their input data. Therefore, the quality of an MSA is critical. When selecting an MSA, users often rely on the overall accuracy reported in published studies where various MSA programs are evaluated using only a small number of benchmark datasets. For protein sequences, such benchmark alignments are often generated based on protein 3D-structure information, limiting the numbers and types of alignments that can be tested. The main objective of this study is to develop a method that can improve the quality of MSAs. Toward this goal, we first developed SuiteMSA, a graphical MSA viewing and assessment software package. It helps users to visually and quantitatively assess MSAs produced by any automated program. A learning problem of this nature requires a large number of reference protein alignments, and currently available benchmark databases are neither sufficiently large nor sufficiently diverse. Therefore, we constructed a new simulated alignment benchmark database, SimDom. It includes a large number of protein sets with a wide range of properties and levels of divergence as well as multi-domain architectures. Using this benchmark, we evaluated the performance of five MSA programs and developed a system of measures that quantify the shift in performance between the programs. We determined which aspects of the sequence sets and resulting alignments influenced the performance shift. Based on this knowledge, we developed a multi-class classifier based on a multi-layer perceptron to select the alignment closest to the optimal. Using this “SeLecting an Alignment Program” (SLAP) classifier, we successfully increased the average quality score of the selected alignments by as much as 0.052 for the simulated datasets and by as much as 0.259 for the non-simulated datasets over the best effort of a single program.
Successful selection of the alignment closest to the optimum will allow for better results from downstream analyses and thus contribute to the improvement of various bioinformatics and molecular evolutionary analyses.
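
The reported gains compare the average quality score of the per-dataset selected alignments against the best single program. A minimal sketch of that arithmetic follows. It assumes that "best effort of a single program" means the program with the highest mean score across datasets; the program names, scores, and function names are hypothetical and are not taken from the dissertation.

```python
from statistics import mean

# Hypothetical quality scores (0..1), one per dataset, for each program.
# These names and numbers are illustrative only, not results from the study.
scores = {
    "prog_a": [0.80, 0.60, 0.90, 0.70],
    "prog_b": [0.70, 0.75, 0.85, 0.80],
    "prog_c": [0.60, 0.70, 0.95, 0.65],
}


def best_single_program(scores):
    """Return (name, mean score) of the program with the highest average."""
    name = max(scores, key=lambda prog: mean(scores[prog]))
    return name, mean(scores[name])


def selection_gain(scores, chosen):
    """Mean score of the per-dataset selections minus the best single program's mean."""
    selected = [scores[prog][i] for i, prog in enumerate(chosen)]
    _, baseline = best_single_program(scores)
    return mean(selected) - baseline


def oracle_choice(scores):
    """Upper bound for any selector: pick the top-scoring program for each dataset."""
    n_datasets = len(next(iter(scores.values())))
    return [max(scores, key=lambda prog: scores[prog][i]) for i in range(n_datasets)]


# Stand-in for a classifier's output: one chosen program per dataset.
chosen = ["prog_a", "prog_b", "prog_c", "prog_b"]
gain = selection_gain(scores, chosen)
print(f"{gain:.3f}")  # prints 0.050
```

In this toy data the selections happen to match the oracle choice, so 0.050 is the largest gain any selector could achieve over the best single program here.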

Subject Area

Bioinformatics|Computer science

Recommended Citation

Anderson, Catherine, "Selecting the 'Closest to Optimal' Multiple Sequence Alignment Using Multi-Layer Perceptron" (2017). ETD collection for University of Nebraska-Lincoln. AAI10603482.