Computer Science and Engineering, Department of
First Advisor
Tomáš Helikar
Second Advisor
Massimiliano Pierobon
Third Advisor
Juan Cui, Ashok Samal
Date of this Version
Fall 11-30-2022
Document Type
Article
Abstract
Genome-Scale Metabolic Models (GEMMs) are powerful reconstructions of biological systems that help metabolic engineers understand and predict, purely in silico, how the cellular metabolism and growth of an organism under observation respond to various environmental factors. Applications of metabolic engineering range from perturbation analysis and drug-target discovery to predicting growth rates of biotechnologically important metabolites and reaction objectives within different single-cell and multi-cellular organism types. GEMMs use mathematical frameworks for quantitative estimations of flux distributions within metabolic networks. The reasons why an organism activates, stuns, or fluctuates between alternative pathways for growth and survival, however, remain relatively unknown. GEMMs rely on manual intervention during their curation and annotation process, which can potentially induce substantial experimental bias. Also, the solution spaces that contain the flux distributions can be sensitive to additions, updates, and deletions of metabolites, reactions, and gene-enzyme-reaction rules within the model. Therefore, the quest for optimality can often be lost due to the high dimensionality of these networks.
Recently, Deep Learning (DL) has played a significant role in building function approximators for highly complex, very high-dimensional input datasets. In this thesis, to address the computational costs associated with the simulations of GEMMs, we use an interpretable learning-driven approach to build surrogate GEMMs that act as alternatives to existing Flux Balance Analysis (FBA)-based approaches for predicting intracellular fluxes of reactions. We exploit the network characteristics of a well-curated input organism and build a synthetic subset of the flux cone containing thermodynamically feasible reaction growth rates. We then feed this dataset into a deep generative model capable of reconstructing intracellular flux values of the input organism. We evaluate its efficiency based on time-to-construct, accuracy, and ease of use. To provide a fair comparative analysis, we compare our learning approach with other traditional regression-based models and test our pipeline on three different input organisms subjected to network reduction techniques and different hyperparameters.
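For readers unfamiliar with FBA, the following is a minimal sketch of the kind of computation a surrogate model would stand in for. It is not the thesis pipeline. It assumes a hypothetical four-reaction toy network (the matrix `S`, the helper names `run_fba` and `make_dataset`, and all bounds are illustrative inventions) and uses SciPy's `linprog` to maximize one flux subject to the steady-state constraint S v = 0. Varying the uptake bound then yields a small synthetic set of feasible flux vectors, loosely analogous to the synthetic flux dataset described above.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (an assumption for illustration, not from the thesis).
# Rows: metabolites A, B. Columns: reactions R1..R4.
#   R1: -> A (uptake)      R2: A -> B
#   R3: B -> (growth proxy) R4: A -> (secretion)
S = np.array([
    [1.0, -1.0,  0.0, -1.0],
    [0.0,  1.0, -1.0,  0.0],
])


def run_fba(uptake_limit, objective_index=2):
    """Maximize one reaction flux subject to S v = 0 and box bounds."""
    n_reactions = S.shape[1]
    # linprog minimizes, so negate the coefficient of the flux to maximize.
    c = np.zeros(n_reactions)
    c[objective_index] = -1.0
    # Uptake (R1) is capped by the environment; other fluxes are irreversible
    # with a large, arbitrary upper bound.
    bounds = [(0.0, uptake_limit)] + [(0.0, 1000.0)] * (n_reactions - 1)
    result = linprog(
        c,
        A_eq=S,
        b_eq=np.zeros(S.shape[0]),
        bounds=bounds,
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x


def make_dataset(n_samples, seed=0):
    """Solve FBA under random uptake limits to collect feasible flux vectors."""
    rng = np.random.default_rng(seed)
    limits = rng.uniform(1.0, 10.0, size=n_samples)
    fluxes = np.vstack([run_fba(u) for u in limits])
    return limits, fluxes


if __name__ == "__main__":
    print(run_fba(10.0))
    limits, fluxes = make_dataset(5)
    print(fluxes.shape)
```

Each call to `run_fba` solves one linear program. A learned surrogate would aim to map conditions such as `limits` to flux vectors such as `fluxes` without re-solving the program each time. Real GEMMs have thousands of reactions and are typically handled with dedicated constraint-based modeling toolkits rather than a hand-written matrix.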
Advisers: Tomáš Helikar & Massimiliano Pierobon
Included in
Biochemistry Commons, Computer Engineering Commons, Computer Sciences Commons, Structural Biology Commons
Comments
A THESIS Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfillment of Requirements For the Degree of Master of Science, Major: Computer Science, Under the Supervision of Professors Tomáš Helikar and Massimiliano Pierobon. Lincoln, Nebraska: November, 2022
Copyright © 2022 Achilles Rasquinha