Metabolic Modeling of Human Gut Microbiota on a Genome Scale: An Overview

There is growing interest in the metabolic interplay between the gut microbiome and host metabolism. Taxonomic and functional profiling of the gut microbiome by next-generation sequencing (NGS) has unveiled substantial richness and diversity. However, the mechanisms underlying interactions between diet, gut microbiome and host metabolism are still poorly understood. Genome-scale metabolic modeling (GSMM) is an emerging approach that has been increasingly applied to infer diet–microbiome, microbe–microbe and host–microbe interactions under physiological conditions. GSMM can, for example, be applied to estimate the metabolic capabilities of microbes in the gut. Here, we discuss how meta-omics datasets, such as shotgun metagenomics, can be processed and integrated to develop large-scale, condition-specific, personalized microbiota models in healthy and disease states. Furthermore, we summarize various tools and resources available for metagenomic data processing and GSMM, highlighting the experimental approaches needed to validate the model predictions.


Introduction
The human gut microbiome consists of trillions of microorganisms such as bacteria, archaea, and unicellular eukaryotes [1,2]. Most gut microbes are facultative or obligate anaerobes spanning five phyla (Bacteroidetes, Firmicutes, Proteobacteria, Verrucomicrobia, and Actinobacteria), with over 1000 species already identified [3]. Several collaborative studies and large consortia such as MetaHIT [4,5], the Human Microbiome Project (HMP) [6,7], and American Gut [8] have taxonomically and functionally profiled the gut microbiome in healthy and various disease states. The composition of the gut microbiota is relatively simple at birth; it then undergoes a series of changes in composition and metabolic function, and eventually matures between 3 and 5 years of age [9]. For any one individual, the composition of the gut microbiome tends to be stable over time. Between individuals within a human population, however, the composition of the gut microbiome differs [10,11,12]. Several genetic and environmental factors such as diet, lifestyle, geography, mode of delivery, infection, infant feeding modality (e.g., formula versus breastfed) and medication contribute to these differences and thereby shape the gut microbiota during the early stages of life [2,9,13].
The gut microbiome acts as an auxiliary metabolic organ. Several complex carbohydrates that are not digested by the host intestinal enzymes are passed to the microbial community and metabolized in the large intestine [14,15]. The gut microbiota is involved in the metabolism of short-chain fatty acids (SCFAs), branched-chain fatty acids (BCFAs), branched-chain amino acids …
Herein, we review the role of GSMM in understanding microbial metabolism in the human gut, with a focus on how genome-scale metabolic models (GEMs) have been used to infer diet-microbiome, microbe-microbe and host-microbiome interactions under physiological conditions. We discuss metagenomics profiling, and how meta-omics datasets can be used for building condition-specific personalized community models of gut microbiota. We further summarize the available tools for metagenomic profiling and GSMM. Finally, we highlight the experimental techniques and data required to validate the GEM-based predictions.

Colonization and Shaping of the Gut Ecosystem
Early colonization of the infant gut by microbes is vital for shaping the intestinal ecosystem at a later age [2,32]. These processes are driven by multiple factors such as mode of delivery, gestational age, maternal diet, environment and host genetics. Additionally, geography, lifestyle, age, certain diseases and drug usage can all affect the gut microbial composition and function [2,33].

Gut Microbiome Profiling and Functional Annotation
Shotgun metagenomic sequencing [42] and 16S rRNA amplicon sequencing [43] have been used for profiling gut microbiota from fecal (stool) samples. An appropriately annotated shotgun metagenomics dataset can be used for accurately mapping and predicting microbiota-affected metabolic pathways. These approaches also have proven potential for novel gene discovery [44] and identification of essential functions. Annotation of metagenomics datasets is primarily carried out in two ways: (a) by assembling nucleotide sequences from NGS reads of appropriate length and subsequently predicting the protein-coding sequences (CDS) [45], and (b) by mapping the reads to genomes or non-redundant marker gene sets of the relevant organisms, guided by taxonomic profiling [46]. The resulting genes can be clustered, catalogued and aligned against reference database(s) of annotated gene/protein families (e.g., KEGG Orthology [47]), and/or they can be linked to metabolic pathways (e.g., MetaCyc [48]).
Various computational tools and pipelines have been developed for these purposes. MOCAT2, for example, provides automated annotation of non-redundant reference catalogues from 18 databases covering various functional categories [45]. HMP Unified Metabolic Analysis Network (HUMAnN2) is a pipeline for profiling the relative abundances of microbes and the activity of their metabolic pathways from metagenomics data [46,49]. MEtaGenome ANalyzer (MEGAN) is an interactive and comprehensive microbiome analysis toolbox that allows researchers to explore and analyze large-scale metagenomics datasets from both taxonomic and functional perspectives [50]. Metagenomics RAST (MG-RAST) is a RAST (Rapid Annotation using Subsystem Technology) server for automated annotation of metagenomics datasets [51]. Integrated Microbial Genomes & Microbiomes (IMG/M) is another server-based system that supports the annotation and analysis of microbiome datasets [52]. A plethora of tools for sequence assembly, gene prediction and phylogenetic classification underpin many of these processes; these tools are extensively reviewed elsewhere [53].
Functional annotation of metagenomics datasets poses several challenges in itself [53,54]. Although metagenomics data categorize microbial functions at the community level, they do not offer a mechanistic explanation for how these functions arise. To understand the intricate relationship between microbial components, such as genes, proteins and metabolites, and their influence on host metabolism via different biochemical pathways, microbe-specific metabolic models need to be developed at the genome scale.

A Constraint-Based Strategy and Tools for Genome-Scale Metabolic Modeling of Gut Microbiota
A rapid increase in the use of shotgun metagenomics, the availability of model organisms, and the growing number of meta-omics datasets in public repositories provide an opportunity to develop metabolic reconstructions of human gut microbes. These reconstructions can be converted into quantitative mathematical models that can be used to study metabolism at the genome scale [28,55-58]. Current tools and resources for gut microbiome modeling are listed in Table 1.

Table 1. Tools and resources for gut microbiome modeling.
  OptCom: A modeling framework to perform FBA of microbial communities. [68]
  SteadyCom: A toolbox that can be used to predict changes in microbial species abundance in response to dietary changes. [69]
  MetExplore: An open access web-server for integrative analysis of metabolomic datasets and genome-scale metabolic networks. [70]
  MMinte: An integrated pipeline for modeling the pairwise interactions within a microbial network. [71]
  jQMM library: An open-source, Python-based framework for modeling internal metabolic fluxes. The toolbox can be used for FBA and 13C Metabolic Flux Analysis (MFA). [72]
Model repositories and databases:
  BiGG database: An open access database for gold standard GEMs. [73]
  Virtual Metabolic Human (VMH): An open access database for human and gut microbial metabolism (GEMs). [74]
  ModelSEED: A web-based resource for metabolic modeling. [75]
  Human Metabolic Atlas (HMA): An open access web-based resource for human metabolism. [76]
Metabolic pathways and enzyme databases:
  MetaCyc/HumanCyc: MetaCyc is a curated database of experimentally validated metabolic pathways; HumanCyc is a database of curated human metabolic pathways. [48]
  KEGG: A resource comprised of databases including large-scale molecular datasets and detailed pathway information. [77,78]
  BRENDA: An information retrieval system focusing on enzymes and their ligands. [79]
  REACTOME: An open access database of biological pathways. [80]
  UniProt: An open access database of curated protein information. [81]

In a GEM, uptake or secretion of certain metabolites over time (denoted as their 'flux'), enzyme/transcript abundances and ON/OFF gene expression can be constrained using information from datasets generated by quantitative fluxomic, metabolomic, transcriptomic and proteomic experiments. By applying these constraints, GEMs can be contextualized to a particular state or condition. These condition-specific/contextualized models can provide information about the activity of metabolic pathways, metabolite flux and cellular growth, and provide estimates of the overall metabolic capacities of these gut microbes. GSMM uses flux balance analysis (FBA), a constraint-based approach (CBA), to predict organisms' phenotypes [28]. A tutorial on linear programming and FBA is available in [28].
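For orientation, the linear program behind FBA can be written as follows. This is the conventional textbook formulation and not one quoted from the cited works, so the symbols are assumptions: S is the m x n stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c a vector of objective weights (typically 1 for the biomass reaction and 0 elsewhere), and v^lb, v^ub the flux bounds through which measured uptake or secretion rates and ON/OFF gene states enter the model.

```latex
\begin{aligned}
\max_{v}\quad & Z = c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\mathrm{lb}}_j \le v_j \le v^{\mathrm{ub}}_j, \qquad j = 1,\dots,n
\end{aligned}
```

The equality S v = 0 expresses the steady-state assumption that no internal metabolite accumulates. Contextualizing a GEM amounts to tightening the bounds, for example by setting both bounds of a reaction to zero when its gene is reported as OFF.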
GSMM has been applied to study gut microbial metabolism and its interactions with the host. Recently, AGORA (Assembly of Gut Organisms through Reconstruction and Analysis) was published, which carried out semi-automatic metabolic reconstruction of 773 human gut bacteria (205 genera, 605 species) [26]. The authors modeled metabolic interactions among microbial species based on their metabolic potential and availability of nutrients. This approach has identified and defined a growth medium for Bacteroides caccae ATCC 34185. Moreover, these metabolic reconstructions have been used to infer the metabolic diversity of microbial communities. The AGORA framework can be coupled with, for example, Recon 2, a generic reconstruction of human metabolism, to study host-microbiome interactions. AGORA reconstructions are publicly available via the Virtual Metabolic Human (VMH) [74] database (https://vmh.life/). In addition, BiGG Models [73] (http://bigg.ucsd.edu/) and the Human Metabolic Atlas [76] (http://metabolicatlas.org/) are other open access knowledge bases for metabolic reconstructions.
The Microbiome Modeling Toolbox [82] extends the functionality of the COBRA (COnstraint-Based Reconstruction and Analysis) toolbox to use metagenomic data for modeling microbe-microbe/host-microbe metabolic interactions and modeling personalized microbial communities. Draft GEMs generated by reconstruction platforms are then curated for the occurrence of genes, metabolites, reactions and their associations based on evidence from the literature and expert knowledge of metabolism. Quality control checks, which are performed to eliminate false positives, also improve the predictive accuracy of GEMs [55].

Reconstruction of Condition-Specific Personalized Gut Microbiota Models
In a metabolic model, numerous genes and metabolites are associated by way of metabolic pathways deemed to be thermodynamically feasible. These models are formalized and applied over the entire microbiota community model [82]. Various efforts have already been made to integrate metagenomic data with a genome-scale framework [26,83]. However, approaches to integrate other kinds of meta-omics data are still in the early phases of development.
Shotgun metagenomics and 16S rRNA data have guided the selection of representative microbes (species or strains) in a community [24]. Integrating meta-omics datasets such as metatranscriptomics and metaproteomics, together with fecal metabolomics, into the microbiota metabolic modeling framework can constrain the model, improving the accuracy of its representation of the biological system. Moreover, meta-omics data can be applied to develop condition-specific microbiota models (Figure 1), such as metabolic reconstructions of the gut microbiota in lean vs. obese subjects. Likewise, a microbiota model can be personalized for an individual subject by combining the metagenomics information with other phenomics datasets. Metagenomics, metatranscriptomics and metaproteomics data can provide an estimate of enzymatic and pathway activities in the gut [49], which approximates the metabolic activity in the gut of an individual under specified conditions.
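The selection step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the procedure of any cited tool: the function name `select_taxa`, the 0.1% abundance cutoff and the abundance values are all hypothetical, and real pipelines also handle taxonomy mapping and strain ambiguity.

```python
from typing import Dict, Set


def select_taxa(rel_abundance: Dict[str, float],
                available_models: Set[str],
                min_abundance: float = 0.001) -> Dict[str, float]:
    """Keep taxa that pass the abundance cutoff and have a reconstruction,
    then renormalize the kept abundances so that they sum to 1."""
    kept = {taxon: ab for taxon, ab in rel_abundance.items()
            if ab >= min_abundance and taxon in available_models}
    total = sum(kept.values())
    return {taxon: ab / total for taxon, ab in kept.items()} if total > 0 else {}


# Hypothetical relative abundances for one sample (invented for illustration).
sample = {"Bacteroides caccae": 0.50, "Eubacterium rectale": 0.30,
          "Rare taxon": 0.0005, "Unmodeled taxon": 0.1995}
# Hypothetical set of taxa for which a reconstruction exists.
models = {"Bacteroides caccae", "Eubacterium rectale", "Rare taxon"}

weights = select_taxa(sample, models)
```

The renormalized weights can then scale each microbe's contribution to a personalized community model. The rare taxon is dropped by the cutoff and the unmodeled taxon by the lack of a reconstruction.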
Context-based, personalized microbiota models have already been used to study various conditions [28,55,56,61,84]. An array of analyses can be performed with these models. Flux Variability Analysis (FVA) [28,85] can estimate the minimal and maximal possible flux, and hence the flux span, for a specific metabolic exchange reaction of a specific microbial strain, a pair of strains, or the community as a whole. It determines the potential of a reaction to carry flux under the applied constraints/conditions. FVA can thus be used to compute strain-specific exchange fluxes for a particular metabolite that can be compared with the net metabolite exchanges in the community. Moreover, it can evaluate the role of an individual microbe in metabolite production. On the other hand, the shadow price (SP) of a metabolite indicates whether it is limiting for an optimal objective function (growth or biomass production) [28,61]. A negative SP suggests that flux through the objective function would increase with an increase in the availability of the metabolite. As an example, SP analysis has already identified several microbial strains that decrease ursodeoxycholate (UDCA) biosynthesis by limiting its precursors [83].
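The two analyses in the paragraph above can be written compactly. The notation is conventional rather than taken from the cited works, so the symbols are assumptions: S is the stoichiometric matrix, v the flux vector with bounds v^lb and v^ub, c the objective weights, Z* the optimal objective value of the underlying FBA problem, gamma a user-chosen fraction of that optimum, and b the right-hand side of the mass balance (zero at steady state). FVA solves two linear programs per reaction j, and the SP of metabolite i is the sensitivity of the optimum to its mass balance:

```latex
\begin{aligned}
v_j^{\min} &= \min_{v}\; v_j, \qquad v_j^{\max} = \max_{v}\; v_j \\
\text{subject to}\quad & S\,v = b, \quad v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}, \quad c^{\mathsf{T}} v \ge \gamma\, Z^{*}, \quad 0 \le \gamma \le 1 \\
\text{flux span}_j &= v_j^{\max} - v_j^{\min} \\
\pi_i &= \frac{\partial Z^{*}}{\partial b_i}
\end{aligned}
```

Under this convention a positive b_i means net accumulation of metabolite i, so supplying more of the metabolite corresponds to lowering b_i, and a negative pi_i then marks a growth-limiting metabolite. Solvers and toolboxes differ in the sign they report, so the sign should be checked against the software in use.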
Food metabolomics datasets detailing dietary constituents have been used to constrain the nutrient uptake rates of microbiota models [58]. Diet effectively acts as the growth medium for the microbiome. Several diets, such as a typical Western diet, a high fiber diet [26], an average European diet [26], breast milk [58], and Ready-to-Use Therapeutic Foods (RUTFs) [24], have been designed. The diet designer tool included as part of the aforementioned VMH database [74] can be used to calculate a range of dietary fluxes, given the metabolite concentrations. On the other hand, fecal, serum and plasma metabolomics data can be used to confirm the identity of microbial metabolites produced by the models [24,25].
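As a rough illustration of how dietary amounts become model constraints, the sketch below converts a daily intake in grams into a lower bound on the corresponding exchange flux. It is not the diet designer's algorithm: the function name, the use of mmol per day as the flux unit, and the convention that uptake is a negative exchange flux are assumptions made for this example, and the intake value is invented.

```python
from typing import Dict, Tuple


def diet_to_uptake_bounds(intake_g_per_day: Dict[str, float],
                          molar_mass_g_per_mol: Dict[str, float]
                          ) -> Dict[str, Tuple[float, float]]:
    """Return (lower, upper) exchange bounds in mmol/day for each nutrient.

    Uptake is modeled as negative flux, so the dietary amount becomes the
    lower bound; the upper bound of 0 forbids secretion into the diet.
    """
    bounds = {}
    for nutrient, grams in intake_g_per_day.items():
        mmol_per_day = grams / molar_mass_g_per_mol[nutrient] * 1000.0
        bounds[nutrient] = (-mmol_per_day, 0.0)
    return bounds


# Molar mass of glucose (C6H12O6) in g/mol; the intake below is hypothetical.
MOLAR_MASS = {"glucose": 180.16}
bounds = diet_to_uptake_bounds({"glucose": 18.016}, MOLAR_MASS)
```

Here 18.016 g of glucose corresponds to 100 mmol, so the model may take up at most 100 mmol of glucose per day.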

Modeling the Effect of Diet on Gut Microbiome
Diet is a direct regulator of microbial metabolism in the gut ecosystem; dietary patterns have a profound effect on gut colonization and the shaping of the gut microbiome during the early stages of life [9]. Western diets are associated with a Bacteroides enterotype whereas plant-based polysaccharides are associated with a Prevotella enterotype [86]. Three primary macronutrients, namely carbohydrates, proteins and fats, are known to affect the gut microbial composition [18].
GSMM has begun to improve the mechanistic understanding of gut microbial metabolism and its dietary interactions [24,25,26]. Computational tools such as COMETS [65], BacArena [64], dOptCom [68], MatNet [87], DyMMM [67], MCM [66], and CASINO [25] were designed to study diet-microbiome interactions. CASINO was able to predict the interactions along the diet-microbiota-host axis in 45 obese and overweight individuals [25]. Furthermore, this study estimated the metabolic capabilities of microbes in the lumen of obese and overweight individuals. The model predicted a significant change in amino acid and SCFA levels in response to dietary intervention. The model predictions were further validated by fecal and blood metabolomics data. In another study, GSMM was used to predict and elucidate the underlying interactions between Bacteroides thetaiotaomicron, Eubacterium rectale and Methanobrevibacter smithii, when subjected to different gut ecosystems [15,22]. Recently, GEM-based predictions were used to evaluate the effect of RUTFs on the gut microbiome of healthy and malnourished children from Bangladesh and Malawi [24]. This methodology can be further extended to study the effect of health supplements, prebiotics and probiotics on the human gut microbiota.

Multispecies Modeling and Interactions in the Gut Community
Microbial species or strains with high abundances in samples are often selected for pairwise or community modeling [24,26]. Two or more microbial GEMs are joined together along their extracellular compartments to build a community model [82]. The community model is linked to a "common compartment" mimicking the human gut, through which exchange of metabolites takes place. A community biomass, i.e., the sum of biomasses estimated for each microbe, and coupling constraints are added [82].
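The joining step can be made concrete with a toy data structure. The sketch below is an illustration under stated assumptions, not the Microbiome Modeling Toolbox implementation: models are plain dictionaries, exchange reactions are recognized by a hypothetical "EX_" prefix, extracellular metabolites carry "[e]", the shared compartment is tagged "[u]", and coupling constraints are left out.

```python
from typing import Dict, Tuple

Reaction = Dict[str, float]   # metabolite id -> stoichiometric coefficient
Model = Dict[str, Reaction]   # reaction id -> reaction


def build_community(models: Dict[str, Model],
                    biomass_id: str = "biomass") -> Tuple[Model, Dict[str, float]]:
    """Join toy single-species models through a shared compartment '[u]'."""
    community: Model = {}
    objective: Dict[str, float] = {}
    shared_mets = set()
    for species, model in models.items():
        for rxn_id, stoich in model.items():
            # Prefix metabolites so that species keep separate internal pools.
            tagged = {f"{species}_{met}": coef for met, coef in stoich.items()}
            if rxn_id.startswith("EX_") and len(stoich) == 1:
                # Turn the species exchange into a transport between the
                # species' extracellular space and the shared compartment.
                met, coef = next(iter(stoich.items()))
                shared_met = met.replace("[e]", "[u]")
                tagged[shared_met] = -coef
                shared_mets.add(shared_met)
            community[f"{species}_{rxn_id}"] = tagged
        # Community biomass objective: plain sum of the species biomass fluxes.
        objective[f"{species}_{biomass_id}"] = 1.0
    # The shared compartment exchanges metabolites with the outside (diet, feces).
    for shared_met in sorted(shared_mets):
        community[f"EX_{shared_met}"] = {shared_met: -1.0}
    return community, objective


# Two invented toy organisms: A turns glucose into acetate, B consumes acetate.
toy_a: Model = {
    "EX_glc": {"glc[e]": -1.0},
    "EX_ac": {"ac[e]": -1.0},
    "uptake": {"glc[e]": -1.0, "glc[c]": 1.0},
    "biomass": {"glc[c]": -1.0, "ac[e]": 1.0},
}
toy_b: Model = {
    "EX_ac": {"ac[e]": -1.0},
    "biomass": {"ac[e]": -1.0},
}
community, objective = build_community({"A": toy_a, "B": toy_b})
```

In the joined model, acetate secreted by A reaches B only through the shared pool `ac[u]`, which is how cross-feeding becomes possible in a community reconstruction. In practice the objective weights would usually be scaled by measured abundances.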
Pairwise analysis of microbes in the community has determined their metabolic relationships when introduced to different types of diets [24,26,83]. Such in silico analysis is valuable because in vitro screening of microbial pairs can be laborious and expensive. When subjected to Western and high fiber diets under aerobic and anaerobic conditions, pairwise modeling has predicted six types of interaction between gut microbes: competition, parasitism, amensalism, neutralism, commensalism and mutualism [26]. Furthermore, pairwise models developed from personalized gut microbiomes have been interrogated for single, cooperative, and community-wide bile acid production potential [83]. This strategy has identified several microbe pairs producing secondary bile acids (BAs). For instance, Bacteroides spp. and R. gnavus can cooperatively produce UDCA [83]. In another study of the gut communities of healthy Bangladeshi and Malawian children, butyrate production was higher for pairs of microbes than for single species [24].
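The six interaction types can be assigned by comparing each microbe's predicted growth in the pair with its growth alone. The sketch below assumes a 10% relative change as the cutoff for a meaningful effect; that threshold and the function name are assumptions made for this illustration, and the growth rates in the usage line are invented.

```python
def classify_interaction(mono_a: float, mono_b: float,
                         pair_a: float, pair_b: float,
                         tol: float = 0.10) -> str:
    """Classify a pairwise interaction from mono- and co-culture growth rates."""

    def effect(mono: float, pair: float) -> int:
        # +1: grows faster in the pair, -1: grows slower, 0: unchanged.
        if pair > mono * (1 + tol):
            return 1
        if pair < mono * (1 - tol):
            return -1
        return 0

    outcomes = {
        (1, 1): "mutualism",
        (-1, -1): "competition",
        (0, 0): "neutralism",
        (1, 0): "commensalism", (0, 1): "commensalism",
        (-1, 0): "amensalism", (0, -1): "amensalism",
        (1, -1): "parasitism", (-1, 1): "parasitism",
    }
    return outcomes[(effect(mono_a, pair_a), effect(mono_b, pair_b))]


# Both microbes grow faster together than alone.
label = classify_interaction(mono_a=1.0, mono_b=1.0, pair_a=1.5, pair_b=1.5)
```

Running such a classifier over all pairs and diets yields the kind of interaction table that pairwise modeling studies report.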
Alternatively, correlation-based co-occurrence topological networks looking at abundant metagenomic species can be developed [88,89]. Such a network can predict positive or negative associations between the microbes. Microbe-microbe co-occurrence pairs of interest can be selected and evaluated by in vitro co-culture experiments [90]. Interestingly, co-occurring species compete strongly for metabolic resources, which are required for cellular growth and maintenance. In this context, the network analysis can be extended to incorporate different metrics such as competition and complementarity indices, which can be used to further characterize/quantify the degree of metabolic interactions between the selected pairs of microbes.
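A correlation-based co-occurrence network of this kind can be sketched as follows. The choice of Spearman rank correlation, the 0.6 cutoff and all abundance values are assumptions for illustration; published pipelines typically add compositional corrections and significance testing that are omitted here.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def spearman(x: List[float], y: List[float]) -> float:
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))


def cooccurrence_edges(abundance: Dict[str, List[float]],
                       threshold: float = 0.6) -> List[Tuple[str, str, float]]:
    """Return (taxon, taxon, rho) for every pair with |rho| >= threshold."""
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if abs(rho) >= threshold:
            edges.append((a, b, round(rho, 3)))
    return edges


# Invented relative abundances across five samples.
abundance = {
    "Bacteroides": [0.30, 0.25, 0.40, 0.10, 0.35],
    "Prevotella": [0.05, 0.10, 0.02, 0.30, 0.04],
    "Roseburia": [0.10, 0.08, 0.15, 0.03, 0.12],
}
edges = cooccurrence_edges(abundance)
```

Positive edges suggest co-occurrence and negative edges suggest mutual exclusion; either kind only nominates pairs for follow-up by metabolic modeling or co-culture.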

Metabolic Modeling of Host-Microbiome Interactions
Gut microbiota can harvest nutrients and energy from the diet. During these processes, small molecules (metabolites) are produced. Some of these metabolites can be beneficial for host and microbial symbionts [16,18,84]. One such metabolite is butyrate, a bacterial fermentation product that fuels the colonic epithelium [22]. In fact, butyrate is the primary energy source for colonocytes. In mammals, the production of cresols from tyrosine has been linked to various species of Clostridium, Bifidobacterium, and Bacteroides, and altered 4-cresol levels in human urine have been associated with weight loss in IBD [17]. The primary conjugated BAs produced by the liver are deconjugated and biotransformed by gut microbes, affecting host signaling and metabolism [83]. BAs can also activate innate immune genes, which in turn alters the gut microbial composition, and they inhibit the growth of pathogens in the gut.
GEMs have been expanded to study metabolism in humans. Generic human metabolic reconstructions such as Recon 1 [91] and the Edinburgh Human Metabolic Network (EHMN) [92] were developed with a vision to integrate and analyze biological datasets. Similarly, Recon 2 [56,93], Recon 3D [94] and Human Metabolic Reaction (HMR) [95,96] were designed to capture human metabolism comprehensively. A metabolic reconstruction of human small intestinal epithelial cells (sIECs) was assembled and manually curated [97]. The sIEC reconstruction was used to study the physiological functionality of the small intestine and its overall role in human metabolism. These models incorporate transporters present in the human gut [94,97,98], although some of these are only putatively identified. Furthermore, several functional cell- or tissue-specific GEMs have been generated for the liver [96], brain [99], adipocytes [95] and myocytes [100], using semi-automated approaches [101]. In addition, a gender-specific, whole-body metabolism (WBM) reconstruction was developed to capture and characterize the metabolism of 20 human organs [102]. A WBM framework can be constrained with dietary and physiological parameters and omics datasets. Such a framework was used to study organ-level metabolic processes induced by the gut microbiota in 149 subjects.
The Microbiome Modeling Toolbox [82], deployed under the COBRA suite, includes several functions for modeling complex metabolic interactions between the host and gut microbiota. It can integrate microbe (AGORA [26], BiGG [73]) and host (Recon [56,91,94]) metabolic reconstructions. Similarly, a common compartment mimicking the human gut is added, which enables pooling and exchange of metabolites between the microbes, lumen and the host cells.
In a different context, the microbiome-induced immune response is currently well established. An imbalance in gut microbial composition has been linked to inflammatory and autoimmune diseases [103,104,105,106]. Various immune cells, including CD4+ effector T cells (particularly Th1, Th2, Th17 and iTreg), CD8+ (cytotoxic) T cells and macrophages, undergo metabolic reprogramming during proliferation and differentiation [107]. A macrophage (RAW 264.7 cell line) model was developed to study immunoactivation and immunosuppression [108]. Metabolic reconstructions of other immune cells are currently unavailable. Developing GEMs for host immune cells [57] might help us to study microbiome-mediated immunometabolic responses under various health/disease conditions.

Model Predictions and Experimental Validation
To establish the biological relevance of metabolic models, the congruence between model predictions and experimental data is of utmost importance. GEM-based predictions can be validated by existing data, knowledge and bibliographical evidence. For instance, metabolites secreted by gut microbiota can be compared with the concentrations of metabolites found in fecal and blood samples [24,25]. Furthermore, blood metabolomics data can be used for validation of metabolites predicted as being transported across the human gut. Meta-omics datasets [109] can be used to estimate the abundances of gut enzymes and microbial pathways for an individual species or strain [49]. The pathway abundances can be compared with the enrichment and usage (flux) of GEM-predicted pathway(s). GSMM can be applied to quantify dietary nutrient uptake of gut microbes and their metabolic interactions with the host. To understand the regulation of host metabolism by gut microbes, germ-free (GF) and conventionally raised (CONV-R) mice are usually used [110]. These mice can be raised on different diets and then euthanized, and their samples subjected to meta-omics analyses. The generated datasets can be used for contextualization and validation of GEMs. Furthermore, the theoretical growth rate of a microbe can be validated by culturing the species in a specific medium [25,26]. In addition, the predicted metabolic interactions between microbes, regulation of the co-occurrence network, and dietary cross-feeding can be validated by mono- and co-culture experiments [90].
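One simple way to score the comparison between predicted secretion and measured metabolites is to treat it as a set overlap. The sketch below is an assumption-laden illustration: the function name, the flux cutoff and the metabolite values are hypothetical, and it checks only presence or absence rather than concentrations.

```python
from typing import Dict, Set


def validate_secretion(predicted_flux: Dict[str, float],
                       measured: Set[str],
                       flux_cutoff: float = 1e-6) -> Dict[str, object]:
    """Compare metabolites predicted to be secreted with those detected."""
    predicted = {met for met, flux in predicted_flux.items() if flux > flux_cutoff}
    confirmed = predicted & measured
    return {
        "confirmed": sorted(confirmed),
        "unconfirmed": sorted(predicted - measured),
        "missed": sorted(measured - predicted),
        "precision": len(confirmed) / len(predicted) if predicted else 0.0,
        "recall": len(confirmed) / len(measured) if measured else 0.0,
    }


# Invented secretion fluxes from a model and metabolites detected in stool.
predicted_flux = {"acetate": 12.0, "butyrate": 3.5, "propionate": 0.0, "lactate": 0.8}
measured = {"acetate", "butyrate", "propionate"}
report = validate_secretion(predicted_flux, measured)
```

In this toy case acetate and butyrate are confirmed, lactate is predicted but not detected, and propionate is detected but not predicted, which points to where the model or its constraints may need revision.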

Concluding Remarks and Future Perspectives
Integration of meta-omics datasets and genome-wide metabolic reconstructions provides a framework for interrogating and suggesting the mechanistic workings of diet-microbe-host metabolic interactions. However, such integrative methods are still evolving and require extensive and robust experimental validation.
Profiling and culturing gut microbes at the strain level, under controlled conditions, remains challenging. Recently, an integrated approach involving targeted phenotypic culturing, whole-genome sequencing (WGS), phylogenetic analysis and computational modeling has succeeded in culturing a substantial portion of bacteria previously declared to be 'unculturable' under laboratory conditions. This approach identified 137 bacterial species, including novel species isolated from pure cultures [111]. Furthermore, culturomics techniques are currently used to fill the gap by isolating unknown or novel members of the gut community [111,112].
In studies of gut microbial communities, there is increasing interest in mechanistic approaches, in contrast to solely genome-centric approaches. Correspondingly, GSMM is widely used as a preferred computational method for studying gut microbial metabolism and its interaction with the host. Additionally, GEMs can be contextualized and personalized using longitudinal meta-omics datasets, providing a snapshot of metabolic processes over time. Personalized microbiota models may help to reduce the costs of clinical studies, predict markers and contribute to the development of potential treatments at either the individual patient level or for a defined patient group [83,113]. Many efforts are ongoing that aim to couple pharmacokinetic and constraint-based models to study drug-microbe-diet interactions [114]. However, a limitation of the GSMM approach is that GEMs are stoichiometric models and cannot, in their current form at least, incorporate metabolite concentrations or enzyme kinetics (Vmax, Km, kcat) [115,116]. Albeit more limited in scope, kinetic modeling [116] may help improve understanding of the dynamics of metabolic pathways in the human gut.
As indicated in this review, GSMM and CBA have provided computational tools and frameworks to study the metabolism of gut microbiota. These tools have guided researchers in studying and identifying the metabolic functions of individual microbes in the gut community. They have also helped to infer their spatial dynamics, environmental interactions and metabolic resource allocations under a given condition. We believe that a combination of several computational and experimental approaches may reveal the complex and diverse structure of the human gut microbiome and its underlying interactions with the host metabolic machinery. Such a combination might bridge the gaps in gut microbiome research and thereby enhance our knowledge of the human gut microbiota under health/disease conditions.

Acknowledgments: The authors thank Aidan McGlinchey for editing the manuscript.
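To make the contrast with stoichiometric models concrete, the simplest kinetic rate law that uses these parameters is the Michaelis-Menten equation. It is given here only as a standard illustration and is not drawn from the cited works; [S] denotes the substrate concentration and [E]_total the total enzyme concentration.

```latex
v = \frac{V_{\max}\,[S]}{K_m + [S]}, \qquad V_{\max} = k_{\mathrm{cat}}\,[E]_{\mathrm{total}}
```

In a kinetic model the flux v depends on the metabolite concentration, whereas in a purely stoichiometric GEM the same reaction is described only by its stoichiometry and a pair of flux bounds.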

Conflicts of Interest:
The authors declare no conflict of interest.