Juan Cui, Ashok Samal
Genome-Scale Metabolic Models (GEMMs) are powerful reconstructions of biological systems that help metabolic engineers understand and predict, purely in silico, how an organism's cellular metabolism and growth respond to various environmental factors. Applications of metabolic engineering range from perturbation analysis and drug-target discovery to predicting growth rates, biotechnologically important metabolites, and reaction objectives across different single-cell and multicellular organisms. GEMMs use mathematical frameworks to quantitatively estimate flux distributions within metabolic networks. However, the reasons why an organism activates, suppresses, or switches between alternative pathways for growth and survival remain largely unknown. GEMMs also rely on manual intervention during curation and annotation, which can introduce substantial experimental bias. Moreover, the solution spaces containing feasible flux distributions can be sensitive to the addition, update, and deletion of metabolites, reactions, and gene-enzyme-reaction rules within the model. As a result, the search for an optimal solution can easily be lost in the high dimensionality of these networks.
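To make the flux-estimation framework more concrete, here is a minimal sketch of Flux Balance Analysis (FBA) as a linear program: find a flux vector v that maximizes an objective c^T v subject to the steady-state condition S v = 0 and bounds lb <= v <= ub. It assumes a hypothetical four-reaction toy network (not one of the thesis organisms) and uses SciPy's `linprog`. It also deletes one reaction to show how a single model edit reshapes the solution space.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (illustration only, not from the thesis):
# columns = R_up (-> A), R_a (A -> B), R_b (A -> B, alternative route), R_bio (B ->)
S = np.array([[1.0, -1.0, -1.0, 0.0],    # metabolite A
              [0.0, 1.0, 1.0, -1.0]])    # metabolite B
bounds = [(0, 10), (0, 6), (0, 1000), (0, 1000)]
c = np.array([0.0, 0.0, 0.0, 1.0])        # objective: maximize the biomass flux R_bio


def fba(S, bounds, c):
    """Flux Balance Analysis: max c^T v  s.t.  S v = 0,  lb <= v <= ub.

    linprog minimizes, so the objective is negated and the optimum flipped back.
    """
    res = linprog(-c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x, -res.fun


v_wt, growth_wt = fba(S, bounds, c)

# Deleting a reaction = forcing its flux to zero; here the alternative route R_b
ko_bounds = list(bounds)
ko_bounds[2] = (0, 0)
v_ko, growth_ko = fba(S, ko_bounds, c)
print(growth_wt, growth_ko)  # uptake-limited growth vs. growth capped by R_a's bound
```

In the unperturbed model, any split between R_a and R_b that sums to the uptake limit of 10 is optimal, so the solver returns just one of many equally optimal flux distributions. This degeneracy is one reason FBA alone does not explain which alternative pathway an organism uses. After R_b is deleted, growth drops to 6 because R_a alone now limits the flux.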
Recently, Deep Learning (DL) has played a significant role in building function approximators for highly complex input datasets with correlations across extremely high-dimensional spaces. In this thesis, to address the computational costs associated with simulating GEMMs, we use an interpretable, learning-driven approach to build surrogate models that serve as alternatives to existing Flux Balance Analysis (FBA)-based approaches for predicting intracellular reaction fluxes. We exploit the network characteristics of a well-curated input organism to build a synthetic subset of the flux cone containing thermodynamically feasible reaction fluxes. We then feed this dataset into a deep generative model capable of reconstructing the intracellular flux values of the input organism. We evaluate its efficiency in terms of time-to-construct, accuracy, and ease of use. For a fair comparative analysis, we compare our learning approach with traditional regression-based models and test our pipeline on three different input organisms subjected to network reduction techniques and different hyperparameters.
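The dataset-building and baseline-comparison steps can be sketched as follows, under heavy simplifying assumptions. The sketch uses the same hypothetical toy network, hit-and-run sampling of the steady-state flux polytope as a stand-in for however the thesis constructs its flux-cone subset, and ordinary least squares as a stand-in for the traditional regression models. No loopless or other thermodynamic constraints are imposed, and the deep generative model itself is not shown. All function names are illustrative.

```python
import numpy as np

# Hypothetical toy network (illustration only, not one of the thesis organisms):
# columns = R_up (-> A), R_a (A -> B), R_b (A -> B, alternative route), R_bio (B ->)
S = np.array([[1.0, -1.0, -1.0, 0.0],    # metabolite A
              [0.0, 1.0, 1.0, -1.0]])    # metabolite B
lb = np.zeros(4)
ub = np.array([10.0, 6.0, 1000.0, 1000.0])


def null_space(A, tol=1e-10):
    """Orthonormal basis of {x : A x = 0}, taken from the SVD of A."""
    _, s, vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return vt[rank:].T


def hit_and_run(S, lb, ub, v_start, n_samples, thin=10, seed=0):
    """Draw points from the flux polytope {v : S v = 0, lb <= v <= ub}.

    v_start must be strictly feasible. Every move stays in the null space of S,
    so the steady-state constraint is preserved; bounds are enforced by limiting
    the step length. Only every `thin`-th point is kept to reduce autocorrelation.
    """
    rng = np.random.default_rng(seed)
    N = null_space(S)
    v = np.asarray(v_start, dtype=float).copy()
    samples = []
    for step in range(n_samples * thin):
        d = N @ rng.standard_normal(N.shape[1])   # random direction in the null space
        d /= np.linalg.norm(d)
        pos, neg = d > 1e-12, d < -1e-12
        # Feasible step interval [t_min, t_max] such that lb <= v + t*d <= ub
        t_min = max(np.max((lb - v)[pos] / d[pos], initial=-np.inf),
                    np.max((ub - v)[neg] / d[neg], initial=-np.inf))
        t_max = min(np.min((ub - v)[pos] / d[pos], initial=np.inf),
                    np.min((lb - v)[neg] / d[neg], initial=np.inf))
        v = v + rng.uniform(t_min, t_max) * d
        if (step + 1) % thin == 0:
            samples.append(v.copy())
    return np.array(samples)


# 1) Build a synthetic dataset of feasible flux vectors
fluxes = hit_and_run(S, lb, ub, v_start=[5.0, 2.0, 3.0, 5.0], n_samples=500)

# 2) Regression baseline: predict the internal flux R_b from R_up and R_a
X = np.column_stack([fluxes[:, 0], fluxes[:, 1], np.ones(len(fluxes))])  # with intercept
y = fluxes[:, 2]
n_train = int(0.8 * len(fluxes))
coef, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)

# 3) Evaluate on held-out samples (mean absolute error)
mae = np.mean(np.abs(X[n_train:] @ coef - y[n_train:]))
print(coef, mae)
```

Because this toy network has only two degrees of freedom, the steady-state constraint makes R_b an exact linear function of R_up and R_a, so least squares recovers coefficients close to (1, -1, 0). In a genome-scale network, a few known fluxes generally do not determine the remaining ones, so the prediction problem is much harder than in this toy.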
Advisers: Tomáš Helikar & Massimiliano Pierobon