Computer Science and Engineering, Department of





A DISSERTATION Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfillment of Requirements For the Degree of Doctor of Philosophy, Major: Computer Science, Under the Supervision of Professor Stephen E. Reichenbach. Lincoln, Nebraska: August, 2011

Copyright 2011 Xue Tian


Mass spectra contain characteristic information about the molecular structure and properties of compounds, and the mass spectra of compounds from the same chemically related group are similar. Classification is one of the fundamental methodologies for analyzing mass spectral data. Its primary goals are to group compounds automatically based on their mass spectra, to find correlations between the properties of compounds and their mass spectra, and to provide a positive identification of unknown compounds.

This dissertation presents a new algorithm for the classification of mass spectra, the most similar neighbor with a probability-based spectrum similarity measure (MSN-PSSM). Experimental results demonstrate the effectiveness and robustness of the new MSN-PSSM algorithm. In leave-one-out cross-validation, it outperforms popular techniques for classification of mass spectra, such as principal component analysis with discriminant function analysis, soft independent modeling of class analogy, and decision tree learning.
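The general idea of most-similar-neighbor classification evaluated by leave-one-out cross-validation can be illustrated with a minimal sketch. This is not the MSN-PSSM algorithm itself: the probability-based spectrum similarity measure is not defined in this abstract, so cosine similarity is used here as a stand-in, and the function names and toy intensity vectors below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Stand-in similarity measure (assumption: NOT the dissertation's PSSM)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_neighbor(query, spectra, labels, similarity=cosine_similarity):
    """Return the label of the reference spectrum most similar to the query."""
    best = max(range(len(spectra)), key=lambda i: similarity(query, spectra[i]))
    return labels[best]


def leave_one_out_accuracy(spectra, labels, similarity=cosine_similarity):
    """Classify each spectrum against all the others; return the fraction correct."""
    correct = 0
    for i in range(len(spectra)):
        # Hold out spectrum i and use the remaining spectra as the reference set.
        ref_spectra = spectra[:i] + spectra[i + 1:]
        ref_labels = labels[:i] + labels[i + 1:]
        predicted = most_similar_neighbor(spectra[i], ref_spectra, ref_labels, similarity)
        if predicted == labels[i]:
            correct += 1
    return correct / len(spectra)


# Hypothetical toy data: intensity vectors over four m/z bins, two chemical groups.
SPECTRA = [
    [100.0, 5.0, 0.0, 1.0],
    [90.0, 8.0, 1.0, 0.0],
    [0.0, 2.0, 95.0, 10.0],
    [1.0, 0.0, 100.0, 12.0],
]
LABELS = ["A", "A", "B", "B"]

if __name__ == "__main__":
    print(leave_one_out_accuracy(SPECTRA, LABELS))
```

Passing a different function as `similarity` changes the measure without touching the classifier or the cross-validation loop, which is where a probability-based measure would plug in.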

Comprehensive two-dimensional chromatography yields highly informative separation patterns because applying two different separation principles gives it great practical peak capacity and sensitivity. However, the added information comes in the form of complex data, which require comprehensive analysis to interpret and to extract what is useful for characterizing sample composition.

This dissertation presents a new non-targeted cross-sample classification method for analyzing comprehensive two-dimensional chromatograms. Experimental results validate its effectiveness. The method is successfully applied to a set of comprehensive two-dimensional chromatograms of breast cancer tumor samples. The feature vectors it generates are useful for discriminating between breast cancer tumor samples of different grades and for providing information to identify potential biomarkers for closer examination.