Published Research - Department of Chemistry


Anal Biochem. 2010 April 1; 399(1): 58–63. doi:10.1016/j.ab.2009.12.022.


© 2009 Elsevier Inc. All rights reserved.


Large amounts of data from high-throughput metabolomic experiments are commonly visualized using a two-dimensional principal component analysis (PCA) scores plot. The question of the similarity or difference between multiple metabolic states then becomes a question of the degree of overlap between their respective data point clusters in PC scores space. A qualitative visual inspection of the clustering pattern in PCA scores plots is a common protocol. This report describes the application of tree diagrams and bootstrapping techniques for an improved quantitative analysis of metabolic PCA data clustering. Our PCAtoTree program creates a distance matrix with 100 bootstrap steps that describes the separation of all clusters in a metabolic data set. Using accepted phylogenetic software, the distance matrix resulting from the various metabolic states is organized into a phylogenetic-like tree format, where bootstrap values ≥ 50 indicate a statistically relevant branch separation. PCAtoTree analysis of two previously published data sets demonstrates the improved resolution of metabolic state differences using tree diagrams. In addition, for metabolomic studies of large numbers of different metabolic states, the tree format provides a better description of the similarities and differences among the metabolic states. The approach is also tolerant of sample size variations between different metabolic states.
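
The distance-matrix and bootstrapping steps summarized above can be illustrated with a minimal sketch. This is not the PCAtoTree implementation. It assumes that cluster separation is measured as the Euclidean distance between cluster centroids in PC scores space, and that each bootstrap step resamples the points of every cluster with replacement; neither detail is stated in the abstract. The function names and the synthetic scores are hypothetical. Tree building and the calculation of branch bootstrap values are left to phylogenetic software and are not shown.

```python
import numpy as np


def centroid_distance_matrix(scores, labels, groups):
    """Euclidean distances between group centroids in PC scores space.

    Assumption: centroid-to-centroid distance stands in for cluster
    separation; the abstract does not specify the metric.
    """
    centroids = np.array([scores[labels == g].mean(axis=0) for g in groups])
    diff = centroids[:, None, :] - centroids[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def bootstrap_distance_matrices(scores, labels, n_boot=100, seed=0):
    """Return the group names and one distance matrix per bootstrap step.

    Assumption: each step resamples every group with replacement,
    keeping that group's original sample size.
    """
    rng = np.random.default_rng(seed)
    groups = sorted(set(labels.tolist()))
    matrices = []
    for _ in range(n_boot):
        parts = []
        for g in groups:
            members = np.flatnonzero(labels == g)
            parts.append(rng.choice(members, size=members.size, replace=True))
        idx = np.concatenate(parts)
        matrices.append(centroid_distance_matrix(scores[idx], labels[idx], groups))
    return groups, np.stack(matrices)


# Synthetic 2D scores for three hypothetical metabolic states with
# unequal sample sizes (8, 10 and 6 replicates).
demo_rng = np.random.default_rng(1)
specs = [((0.0, 0.0), 8), ((5.0, 0.0), 10), ((0.5, 0.3), 6)]
scores = np.vstack(
    [demo_rng.normal(loc=center, scale=0.5, size=(n, 2)) for center, n in specs]
)
labels = np.array(["A"] * 8 + ["B"] * 10 + ["C"] * 6)

groups, mats = bootstrap_distance_matrices(scores, labels, n_boot=100)
mean_matrix = mats.mean(axis=0)  # average separation over all bootstrap steps
print(groups)
print(np.round(mean_matrix, 2))
```

Each of the 100 matrices could then be exported in whatever format the chosen tree-building program expects. Because resampling is done within each group, groups with different numbers of replicates are handled in the same way.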