Dr. Robert Powers
Date of this Version
The development and growing awareness of machine learning and “big data” have led to increasing interest in applying these methods to bioanalytical research. Techniques such as mass spectrometry (MS) and nuclear magnetic resonance (NMR) can now obtain tens of thousands to millions of data points from a single sample, owing to fundamental instrumental advances and ever-increasing resolution. Simple pairwise comparisons on datasets of this magnitude can obscure more complex underlying trends and do a disservice to the richness of the information they contain. Multivariate approaches are therefore needed to take fuller advantage of the complexity of these datasets.
Performing these multivariate analyses takes a high degree of expertise, requiring knowledge of such disparate areas as chemistry, physics, mathematics, statistics, software development, and signal processing. This barrier to entry prevents many investigators from fully utilizing all the tools available to them; instead, they rely on a mix of commercial and free software, chained together with in-house solutions, just to perform a single analysis. While the published literature describes numerous methods for statistical analysis of these larger datasets, most remain confined to the realm of theory because they have not been implemented in publicly available software for the research community.
This dissertation outlines the development of routines for handling LC-MS data with freely available tools, including the Octave programming language. In combination with our previously developed software MVAPACK, these routines provide a unified platform for metabolomics data analysis that will encourage the wider adoption of multi-instrument investigations and multiblock statistical analyses.
Advisor: Robert Powers