Computer Science and Engineering, Department of


Date of this Version



From 2009 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-09)


Identifying and separating subtly different biological samples is one of the most critical tasks in biological analysis. Time-of-flight secondary ion mass spectrometry (ToF-SIMS) is becoming a popular and important technique in the analysis of biological samples, because it can detect molecular information and characterize chemical composition. ToF-SIMS spectra of biological samples are enormously complex with large mass ranges and many peaks. As a result the classification and cluster analysis are challenging. This study presents a new classification algorithm, the most similar neighbor with a probability-based spectrum similarity measure (MSN- PSSM), which uses all the information in the entire ToF- SIMS spectra. MSN-PSSM is applied to automatically classify bacterial samples which are major causal agents of urinary tract infections. Experimental results show that MSN-PSSM is an accurate classification algorithm. It outperforms traditional techniques such as decision trees, principal component analysis (PCA) with discriminant function analysis (DFA), and soft independent modeling of class analogy (SIMCA). This study also applies a modern clustering algorithm, normalized spectral clustering, to automatically cluster the bacterial samples at the species level. Experimental results demonstrate that normalized spectral clustering is able to show accurate quantitative separations. It outperforms traditional techniques such as hierarchical clustering analysis, k- means, and PCA with k-means.