This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Kinetic modeling of metabolic pathways has important applications in metabolic engineering, but significant challenges remain. The difficulties range from finding best-fit parameters in a high-dimensional search space to incomplete parameter identifiability. To meet some of these challenges, an ensemble modeling method is developed for characterizing a subset of kinetic parameters that give statistically equivalent goodness-of-fit to time-series concentration data. The method is based on the incremental identification approach, in which parameter estimation is carried out step-wise. Numerical efficiency is achieved by reducing the dimensionality of the parameter space and by using efficient random parameter exploration algorithms. The shift toward model ensembles, instead of traditional “best-fit” models, is necessary to account directly for model uncertainty when such models are applied. The performance of the ensemble modeling approach has been demonstrated in the modeling of a generic branched pathway and the trehalose pathway in

Mathematical modeling is one of the cornerstones of metabolic engineering [

The majority of existing parameter estimation methods for the kinetic modeling of metabolic networks involve a single-step estimation, in which unknown parameters are estimated simultaneously by minimizing model prediction error [

These issues motivate the development and application of a different framework for constructing metabolic and biological models from data, one that explicitly accounts for model uncertainty. In this work, an ensemble modeling strategy is employed. Ensemble modeling has previously been applied to address structural uncertainty in the modeling of metabolic and other biological networks. For example, ensemble models of metabolic pathways have been created by enforcing thermodynamic feasibility constraints on the metabolic reactions and used for metabolic control analysis [

Here, we describe a step-wise model identification approach for creating an ensemble of kinetic ODE models from metabolic time profiles. Unlike the ensemble modeling work mentioned above, this approach tackles the uncertainty in the estimation of kinetic parameters. That is, models in the ensemble share the same network topology but differ in their parameter values. In essence, these models represent regions of the parameter space within which model prediction errors are (statistically) equivalent. Such an ensemble can be generated by exploring the parameter space with existing methods such as Metropolis-type random walk Markov chain [

Ordinary differential equations have been commonly used to model metabolic pathways. The model equations describe the mole balance around metabolites as they are enzymatically transformed from one to another. In this case, the system is assumed to be well-mixed (_{j}_{ji}_{i}

The aforementioned power-law model, also known as generalized mass action (GMA) model, belongs to a widely adopted framework for the modeling and analysis of biochemical processes, the Biochemical Systems Theory (BST) [

Briefly, the proposed ensemble modeling derives from the incremental identification or dynamic flux estimation method [_{M}_{M}_{M}_{M}

The metabolic pathway map in this case study is given in _{1}(_{0}) _{2}(_{0}) _{3}(_{0}) _{4}(_{0})] = [4 1 3 4] and [0.2 0.3 4.2 0.01], respectively. The ^{2} [

A generic branched pathway. (

In this example, the number of degrees of freedom is 2 (4 metabolites and 6 fluxes). Fluxes _{1}_{6}_{D}_{I}_{1}_{6}_{13}_{64}_{2}, _{3}, _{4}, _{5}} ∈ [0,100], {_{21}, _{33}, _{43}, _{44}, _{51}} ∈ [0,5]. In addition, the upper bound for allowable metabolic fluxes in this artificial network was set to 5 × 10^{5} mM/min.

Following the ensemble modeling procedure described in the Method section, the initial parameter point for the out-of-equilibrium adaptive Metropolis Monte Carlo (OEAMC) algorithm was taken from the parameter estimation minimizing the flux error function Φ_{R}

Ensemble kinetic modeling of the branched pathway model using Φ_{R}

CPU time (s)^{a} | 1664 |
Calculated volume of initial parameter space^{b} | 2.5 × 10^{5} |
Estimated volume of viable parameter space^{c} | 710.1 ± 5.1 |
Ratio of estimated viable volume to initial volume | (284.0 ± 2.0) × 10^{−3}% |
Range of slope errors^{d} | [1.370 × 10^{−1}, 5.081 × 10^{−1}] |
Range of concentration errors^{e} | [3.554 × 10^{−2}, 2.150 × 10^{−1}] |

a. The CPU time is the total time for ensemble construction, run on a workstation with two Intel quad-core 2.83 GHz processors.

b. _{ci}

c. _{ev}

d. The range of slope error was computed using Equation (14) for all models in the ensemble.

e. The range of concentration error was computed by Equation (15) for all models in the ensemble.

Two-dimensional projections of the viable parameter space onto the parameter axes of each independent flux (_{1}_{6}

Note that besides the Φ_{R}_{S}

The second case study was taken from the modeling of the glycolysis and trehalose production in the baker’s yeast _{1}_{2}_{3}_{4}_{5}_{6}_{7}_{8}_{ex}_{in}^{−2} L) and intracellular (7.17×10^{−3} L) volumes of the bioreactor and the cell population, respectively. The time-course concentration data have been obtained using _{1}_{3}_{4}_{5}_{6}^{2} [

Concentration simulations of five randomly selected models from the ensemble (solid blue, brown, green, red and purple lines)

Concentration simulations of the same five models as in

The original ODE model contains 6 metabolites and 8 fluxes, as shown in Equation (4). In this case study, the ODE for _{7}_{8}_{2}

According to Equation (4), there are 3 degrees of freedom (5 measured metabolites and 8 fluxes). Here, fluxes _{4}_{7}_{8}_{I}_{4}, _{7}, _{8}} and the kinetic orders {_{44}, _{73}, _{85}}, which were constrained within [0, 100] and [0, 5], respectively. Note that the glucose transport flux (_{1}_{1} (a constant decrease at high _{1} and an exponential-like time profile at low _{1}). The regression of the MM kinetic parameters can also be cast as a linear regression problem as follows:
_{1} ∙ _{1}] is the vector of element-wise multiplication of _{1} and _{1}. Finally, the upper bound for flux values was set as 5 × 10^{5} mM/min, according to the maximal flux value reported in a similar glycolytic pathway [
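The exact regression equation did not survive extraction. As a hedged illustration only, one standard way to make Michaelis-Menten (MM) kinetics linear in its parameters is to clear the denominator. From v = Vmax·X/(Km + X) one obtains v = (Vmax/Km)·X − (1/Km)·(v∘X). This is linear in the regressors X and the element-wise product v∘X, which resembles the element-wise product mentioned in the surviving text. The function name `fit_mm_linear` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_mm_linear(X, v):
    """Fit v = Vmax*X/(Km+X) by linear least squares (hypothetical helper).

    Rearranged form: v = a*X + b*(v*X) with a = Vmax/Km and b = -1/Km.
    """
    X = np.asarray(X, dtype=float)
    v = np.asarray(v, dtype=float)
    A = np.column_stack([X, v * X])              # regressors: X and element-wise v*X
    (a, b), *_ = np.linalg.lstsq(A, v, rcond=None)
    Km = -1.0 / b
    Vmax = a * Km
    return Vmax, Km

# Usage with noise-free synthetic data generated from Vmax = 5, Km = 2.
X = np.linspace(0.1, 20.0, 30)
v = 5.0 * X / (2.0 + X)
Vmax_hat, Km_hat = fit_mm_linear(X, v)
```

Because v appears on both sides, noise in the flux estimates also enters the regressors, so this linearized fit can be biased for noisy data. The sketch does not correct for that.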

The initial parameter point for the OEAMC algorithm was again obtained by minimizing Φ_{R}^{−2}) and the upper 95% confidence bound was found using a Monte Carlo approach (viable ^{−3}% of the original constrained parameter space. The slope errors were acceptable, but the concentration errors had a high upper bound. On closer inspection, only a small fraction of the models (3 out of 3423) had concentration errors larger than 10^{2}; after removing these, the upper bound for the concentration error drops to 35.92. This is not unexpected, as the model ensemble was created based on the flux error function and not the concentration error. In particular, there is no guarantee that parameter values with a small flux error will also give a low concentration error. However, the divergence between the flux error and concentration error functions occurred only rarely (3 of 3423 models, < 0.1%).

The trehalose pathway in

Two-dimensional projections of the viable parameter space onto the parameter axes of each independent flux (_{4}_{7}_{8}

Concentration simulations of five randomly selected models from the ensemble (solid blue, brown, green, red and purple lines)

Ensemble kinetic modeling of the trehalose pathway model using Φ_{R}

CPU time (s) | 6489 |
Calculated volume of initial parameter space | 1.25 × 10^{8} |
Estimated volume of viable parameter space | 3237 ± 125 |
Ratio of estimated viable volume to initial volume | (25.90 ± 1.00) × 10^{−4}% |
Range of slope errors | [5.825, 46.42] |
Range of concentration errors | [1.125, 3.880 × 10^{2}] |
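The volumes of the initial parameter space reported in the two tables are consistent with simple box volumes. For the trehalose pathway, the text states three rate constants in [0, 100] and three kinetic orders in [0, 5]. For the branched pathway, we assume two rate constants in [0, 100] and two kinetic orders in [0, 5]; the surviving text does not state this explicitly. The snippet below reproduces the reported volumes and the viable-volume percentages.

```python
import math

def box_volume(bounds):
    """Volume of an axis-aligned box given (lower, upper) pairs."""
    return math.prod(hi - lo for lo, hi in bounds)

# Trehalose pathway: three rate constants in [0, 100], three kinetic orders in [0, 5].
v_ci_trehalose = box_volume([(0, 100)] * 3 + [(0, 5)] * 3)   # 125000000 (1.25 × 10^8)
# Branched pathway: ASSUMED two rate constants in [0, 100], two kinetic orders in [0, 5].
v_ci_branched = box_volume([(0, 100)] * 2 + [(0, 5)] * 2)    # 250000 (2.5 × 10^5)

# Estimated viable volumes from the tables, expressed as a percentage of the box.
pct_branched = 100 * 710.1 / v_ci_branched      # ≈ 0.284 %, i.e. 284.0 × 10^-3 %
pct_trehalose = 100 * 3237 / v_ci_trehalose     # ≈ 0.00259 %, i.e. 25.90 × 10^-4 %
```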

The difficulty in simultaneously estimating kinetic parameters of metabolic models is often caused by a lack of complete parameter identifiability [

In this work, we have used the DOF in estimating dynamic fluxes from time-slopes of concentration data _{k}_{k}_{k}_{D}_{R}_{C}_{R}_{k}_{k}_{C}

The proposed ensemble modeling method has the advantages that (1) the model ensemble is compactly defined using a small number of independent parameters; (2) the dependent parameters can be efficiently computed from the independent parameters; (3) only biologically meaningful models are included in the model ensemble; and (4) data uncertainty (noise) is explicitly accounted for. The first two advantages follow from the step-wise identification approach on which the method is built. The computational cost of constructing the model ensemble is associated with parameter exploration and the evaluation of the error function. The compactness of the ensemble's parameter space is therefore particularly important for numerical efficiency and, ultimately, for practical applications. For the OEAMC and MEBS algorithms, the number of parameter samples required during exploration has been shown to increase linearly with the parameter dimension, which in this case equals the number of independent parameters [

In the proposed ensemble modeling, the model uncertainty is related to parametric uncertainty that arises from data noise, leaving out the contribution of structural uncertainty (mismatch between the assumed model equations and the true dynamics). Increasing data noise is therefore expected to increase the size of the model ensemble,

We have also made the assumption that there exists a unique solution to the computation of _{D}_{I}_{D}_{D}

Constraints on parameters and fluxes are important for restricting the size of the ensemble, in a problem-dependent manner. For example, in the first case study, the ensemble hit the lower constraints on both kinetic order parameters (set at 0) and the upper constraint for the rate constant _{1} (see

The ensemble modeling can be integrated into the iterative model building procedures for biological systems [

Finally, the ability to generate an ensemble of kinetic models also calls for new methodologies for using such an ensemble. The obvious challenge is how to analyze and/or optimize a system that is represented by a set of models, possibly with a large number of members, rather than by a single model. Here, we suggest two strategies. The first generates a (random) sample of models from the ensemble; the results of the analysis and optimization can then be represented as a histogram. The second takes advantage of the fact that ensemble model generation involves only linear (or log-linear) algebraic equations. In this case, interval or constraint propagation using interval arithmetic can be used to evaluate upper and lower bounds on the system behavior, as done previously for GMA models [
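As a small illustration of the second strategy, consider a single GMA-type power-law flux v = k ∏_i X_i^{g_i}. It is evaluated at a fixed, positive metabolite state, and each parameter is known only to lie in an interval. Because ln v = ln k + Σ_i g_i ln X_i is linear in the kinetic orders, each term g_i ln X_i attains its extremes at the interval endpoints, so the flux bounds follow from endpoint evaluation. The helper `gma_flux_bounds` is hypothetical and covers one flux at one state. It does not propagate intervals through the ODE dynamics, which requires dedicated interval methods.

```python
import numpy as np

def gma_flux_bounds(k_bounds, g_bounds, X):
    """Interval bounds of v = k * prod_i X_i**g_i (hypothetical helper).

    k_bounds: (k_lo, k_hi) with 0 <= k_lo <= k_hi
    g_bounds: sequence of (g_lo, g_hi), one pair per metabolite
    X:        fixed metabolite concentrations, all > 0
    """
    k_lo, k_hi = k_bounds
    g = np.asarray(g_bounds, dtype=float)          # shape (n, 2)
    ln_x = np.log(np.asarray(X, dtype=float))      # shape (n,)
    # g_i * ln X_i is linear in g_i, so its extremes sit at the interval endpoints.
    terms = g * ln_x[:, None]                      # shape (n, 2)
    lower = k_lo * np.exp(terms.min(axis=1).sum())
    upper = k_hi * np.exp(terms.max(axis=1).sum())
    return lower, upper

# Usage: one substrate at X = 2 with k in [1, 3] and g in [0.5, 2].
lo, hi = gma_flux_bounds((1.0, 3.0), [(0.5, 2.0)], [2.0])   # lo ≈ 1.414, hi ≈ 12.0
```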

The ensemble modeling procedure is based on the incremental identification or DFE approach for parameter estimation, where kinetic parameters are estimated in three incremental steps. Initially, given time-course concentration measurements _{M}_{k}_{M}_{k}_{M}_{k}_{k}_{k}_{M}_{k}
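The first step, smoothing the concentration data and computing time slopes, can be sketched as follows. The smoothing method used in the article is not recoverable from this text. Purely for illustration, we assume a least-squares polynomial fit per metabolite via NumPy's `Polynomial.fit`, whose analytic derivative gives the slopes. Splines or other smoothers are common alternatives, and the polynomial degree is a hypothetical tuning choice.

```python
import numpy as np
from numpy.polynomial import Polynomial

def smooth_and_slopes(t, X, deg=4):
    """Smooth each metabolite time course and return (X_smooth, dXdt).

    t: sample times, shape (T,)
    X: concentrations, shape (m, T), one row per metabolite
    """
    t = np.asarray(t, dtype=float)
    X = np.atleast_2d(np.asarray(X, dtype=float))
    X_smooth = np.empty_like(X)
    dXdt = np.empty_like(X)
    for i, row in enumerate(X):
        p = Polynomial.fit(t, row, deg)   # least-squares fit; domain is rescaled internally
        X_smooth[i] = p(t)
        dXdt[i] = p.deriv()(t)            # analytic time slope of the smoothed profile
    return X_smooth, dXdt

# Usage with noise-free data: the slope of 3 - 0.5 t + 0.1 t^2 is -0.5 + 0.2 t.
t = np.linspace(0.0, 10.0, 25)
X = np.vstack([3 - 0.5 * t + 0.1 * t**2, 1 + 0.2 * t])
X_s, slopes = smooth_and_slopes(t, X)
```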

Consider the typical scenario where the number of reactions in the metabolic pathway exceeds that of metabolites (_{k}_{M}_{k}_{k}_{DOF}_{DOF}_{k}
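Writing the mass balance as dX/dt = S v with stoichiometric matrix S, the number of degrees of freedom is the number of fluxes minus the rank of S. This equals n − m when S has full row rank, which matches the counts stated in the case studies (6 − 4 = 2 and 8 − 5 = 3). The matrix below is a hypothetical four-metabolite, six-flux branched network chosen for illustration, not the topology of the case-study figure.

```python
import numpy as np

# Hypothetical branched network (rows X1..X4, columns v1..v6):
#   v1: -> X1,  v2: X1 -> X2,  v3: X2 -> X3,  v4: X2 -> X4,  v5: X3 ->,  v6: X4 ->
S = np.array([
    [1, -1,  0,  0,  0,  0],
    [0,  1, -1, -1,  0,  0],
    [0,  0,  1,  0, -1,  0],
    [0,  0,  0,  1,  0, -1],
], dtype=float)

def degrees_of_freedom(S):
    """Number of fluxes that the mass balance alone cannot determine."""
    return S.shape[1] - np.linalg.matrix_rank(S)

dof = degrees_of_freedom(S)   # 2
```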

In the following, the flux vector is decomposed into _{k}_{I}_{k}^{T}_{D}_{k}^{T}^{T}_{I}_{D}_{I}_{D}_{I}_{k}_{D}_{k}_{DOF}_{D}_{D}_{I}_{I}_{I}_{k}_{I}_{M}_{k}_{I}_{D}_{D}_{k}_{I}_{D}_{D}_{k}_{D}_{M}_{k}_{D}
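Once the independent fluxes v_I are fixed, the dependent fluxes follow from the mass balance by a linear solve: v_D = S_D^{-1}(dX/dt − S_I v_I). Here S_I and S_D are the columns of S belonging to the independent and dependent fluxes, and S_D is assumed square and invertible. The sketch uses a hypothetical four-metabolite, six-flux branched network and takes v1 and v6 as independent for illustration. It also adds a simple feasibility check. The non-negativity test and the default upper bound `v_max` (the 5 × 10^5 mM/min flux bound used in the case studies) are our assumptions about how feasibility is tested.

```python
import numpy as np

# Hypothetical branched network (rows X1..X4, columns v1..v6), for illustration only.
S = np.array([
    [1, -1,  0,  0,  0,  0],
    [0,  1, -1, -1,  0,  0],
    [0,  0,  1,  0, -1,  0],
    [0,  0,  0,  1,  0, -1],
], dtype=float)
indep = [0, 5]          # v1 and v6 treated as independent (hypothetical choice)
dep = [1, 2, 3, 4]      # v2..v5 are dependent
S_I, S_D = S[:, indep], S[:, dep]

def dependent_fluxes(dXdt, v_I):
    """Solve S_D v_D = dX/dt - S_I v_I at every time point (one column per time point)."""
    return np.linalg.solve(S_D, dXdt - S_I @ v_I)

def is_feasible(v_D, v_max=5e5):
    """Assumed feasibility test: all dependent fluxes non-negative and below v_max."""
    return bool(np.all(v_D >= 0) and np.all(v_D <= v_max))

# Usage: build slopes from known fluxes at three time points, then recover v2..v5.
v_true = np.array([[2.0, 1.5, 1.0],
                   [1.8, 1.4, 1.0],
                   [1.0, 0.8, 0.5],
                   [0.7, 0.5, 0.4],
                   [0.9, 0.8, 0.6],
                   [0.6, 0.5, 0.4]])     # shape (6 fluxes, 3 time points)
dXdt = S @ v_true
v_D = dependent_fluxes(dXdt, v_true[indep])
```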

In the above, we have assumed that time-series data for all metabolites in the model are available. When one or more metabolites are not measured, we can modify the procedure by first rewriting the ODE model, separating the balances associated with those that are measured and those that are not:
_{I,M}_{D,M}_{M}_{M}_{I,M}_{D,M}_{k}_{I}^{T}_{D}^{T}^{T}_{DOF}_{D,M}_{U}_{I}_{I}_{I}_{D}_{k}_{I}_{k}_{U}_{I}_{M}_{k}_{U}_{k}_{I}_{U}_{U}_{M}_{U}_{M}

Here, the model ensemble embodies two types of uncertainty: mathematical and statistical. The mathematical uncertainty is related to the aforementioned DOF in the mass balance, while statistical uncertainty is associated with noise in the concentration data. Now, even when different combinations of _{I}_{D}_{M}_{k}_{k}_{M}_{k}_{I}_{M}_{k}_{k}

In this work, the parameter exploration is carried out using the HYPERSPACE toolbox, specifically the out-of-equilibrium adaptive Metropolis Monte Carlo (OEAMC) and multiple ellipsoid-based sampling (MEBS) method [_{MC}
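The core idea behind Metropolis-type exploration of a viable region can be sketched in a few lines. Suppose the target distribution is uniform over the viable set {p : Φ(p) ≤ Φ_max} inside the parameter box and the proposal is a symmetric Gaussian step. Then the Metropolis acceptance rule reduces to accepting a proposal exactly when it is in bounds and viable. This is a plain random walk for illustration, not the OEAMC algorithm of the HYPERSPACE toolbox, which adds adaptive, out-of-equilibrium features. The function name, the toy Φ and the step-size parameter are hypothetical.

```python
import numpy as np

def random_walk_viable(phi, phi_max, p0, lower, upper, n_steps=5000, step=0.1, seed=0):
    """Metropolis random walk with a uniform target on {p in box : phi(p) <= phi_max}.

    With a symmetric proposal, a move is accepted iff the proposal is in bounds and
    viable; otherwise the chain stays at the current point.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(p0, dtype=float)
    if phi(p) > phi_max:
        raise ValueError("p0 must be viable")
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    scale = step * (upper - lower)               # proposal std, relative to box width
    chain = [p.copy()]
    for _ in range(n_steps):
        q = p + scale * rng.standard_normal(p.size)
        if np.all(q >= lower) and np.all(q <= upper) and phi(q) <= phi_max:
            p = q
        chain.append(p.copy())
    return np.array(chain)

# Usage: toy error function whose viable region is the unit disk.
phi = lambda p: float(p @ p)
chain = random_walk_viable(phi, 1.0, [0.0, 0.0], [-2.0, -2.0], [2.0, 2.0], n_steps=2000)
```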

Flowchart of the out-of-equilibrium adaptive Metropolis Monte Carlo (OEAMC) algorithm. On the right, the red closed curves represent hypothetical contour plots of the viable parameter space. The viable points are marked in blue and the nonviable points are marked in red. Finally, the grey areas illustrate the minimum volume enclosing ellipsoids. This figure is adapted from the original publication [

The MEBS method is designed to produce fine-tuned hyper-ellipsoids that tightly bound viable regions in the parameter search space, based on another algorithm that has been introduced elsewhere [_{MC}_{i}_{i+1}_{i}_{MC}
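Both algorithms rely on minimum volume enclosing ellipsoids, the grey areas in the flowcharts. One standard way to compute such an ellipsoid for a cloud of viable points is Khachiyan's algorithm, sketched below with NumPy. We chose it for illustration; HYPERSPACE may implement this step differently. The returned ellipsoid is {x : (x − c)^T A (x − c) ≤ 1}, up to the convergence tolerance.

```python
import numpy as np

def mvee(P, tol=1e-5, max_iter=50000):
    """Khachiyan's algorithm for the minimum volume enclosing ellipsoid of points P (N x d).

    Returns (A, c) such that (x - c)^T A (x - c) <= 1 (approximately) for every row x of P.
    """
    P = np.asarray(P, dtype=float)
    N, d = P.shape
    Q = np.vstack([P.T, np.ones(N)])        # lift points to d+1 dimensions
    u = np.full(N, 1.0 / N)                 # weights on the points (always sum to 1)
    for _ in range(max_iter):
        X = (Q * u) @ Q.T                   # sum_i u_i q_i q_i^T
        M = np.sum(Q * np.linalg.solve(X, Q), axis=0)   # q_i^T X^{-1} q_i per point
        j = np.argmax(M)                    # point furthest outside the current ellipsoid
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        converged = np.linalg.norm(new_u - u) < tol
        u = new_u
        if converged:
            break
    c = P.T @ u                                          # ellipsoid centre
    A = np.linalg.inv((P.T * u) @ P - np.outer(c, c)) / d
    return A, c

# Usage: the corners of a square are enclosed by the circle of radius sqrt(2).
A, c = mvee([[-1, -1], [-1, 1], [1, -1], [1, 1]])
```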

Flowchart of the multiple ellipsoid-based sampling (MEBS) algorithm. In the right part of the figure, the red closed curves represent hypothetical contour plots of the viable parameter space defined by some criteria. The viable points are marked in blue and the nonviable points are marked in red. Finally, the grey areas illustrate the minimum volume enclosing ellipsoids. This figure is adapted from the original publication [

Given any values of the independent parameters _{I}_{v}

In the case studies, the upper confidence bound of the error function Φ was obtained using a Monte Carlo approach. Specifically, 100 sets of time profiles were randomly generated from a Gaussian distribution with the measured concentration data as the mean values. The variance of the data noise was estimated from the residuals of the data smoothing procedure. For each dataset, the same data smoothing and slope calculation were performed, and the corresponding parameter estimates were obtained by minimizing the error function (see below). The confidence bound was estimated directly from the resulting 100 values of Φ. For example, the 95% upper confidence bound of the error function is approximated by the
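The Monte Carlo bound can be sketched as follows. The article's full pipeline (smoothing, slope estimation and minimization of Φ) is replaced here by a hypothetical stand-in, `fit_line_sse`. It fits a straight line and returns its residual sum of squares. The noise level `sigma` is passed in directly rather than estimated from smoothing residuals. The 95% bound is taken as the empirical 0.95 quantile of the simulated Φ values; the exact order statistic used in the article is not recoverable from this text.

```python
import numpy as np

def fit_line_sse(t, y):
    """Hypothetical stand-in for 'estimate parameters, return minimal Φ'."""
    A = np.column_stack([t, np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)

def mc_error_bound(t, y_mean, sigma, fit_and_score, n_sets=100, level=0.95, seed=0):
    """Simulate noisy datasets around y_mean, refit each, return (quantile bound, all Φ)."""
    rng = np.random.default_rng(seed)
    phis = np.empty(n_sets)
    for i in range(n_sets):
        y_noisy = y_mean + sigma * rng.standard_normal(y_mean.shape)
        phis[i] = fit_and_score(t, y_noisy)
    return np.quantile(phis, level), phis

# Usage on a toy linear time course with noise standard deviation 0.5.
t = np.linspace(0.0, 10.0, 20)
y_mean = 2.0 * t + 1.0
bound, phis = mc_error_bound(t, y_mean, sigma=0.5, fit_and_score=fit_line_sse)
```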

In the examples, the error function Φ was set to be:
_{D}_{D}_{k}_{D}_{D}_{k}_{D}_{k}_{I}_{k}_{I}_{k}_{I}_{I}_{M}_{k}_{I}_{I}_{I}_{k}

The model ensemble procedure starts with finding an initial viable point for the OEAMC algorithm, as discussed above. Next, the upper bound for the error function is set either by standard statistical analysis assuming Gaussian noise or by the Monte Carlo algorithm described in the previous subsection. OEAMC is then applied to generate a coarse-grained set of viable parameters over the space of the independent parameters. Finally, this set becomes the input to the MEBS algorithm, producing a population of viable parameters _{I}

The kinetic modeling of metabolic networks is challenging, but critical in many applications of metabolic engineering. In particular, the lack of parameter identifiability, wherein not all parameters can be uniquely determined from the data, has been identified as a common root cause of this difficulty. This uncertainty in the parameters implies that there exist (infinitely) many models that give statistically equivalent goodness of fit to the data. Building on the concept of incremental identification, we have proposed an efficient ensemble modeling procedure that relies on three components: (1) data smoothing and approximation of time-series metabolic concentration data, (2) a compact parameter space defining the model ensemble, and (3) efficient parameter exploration. The applications for the ensemble modeling of a generic branched pathway and the trehalose pathway in

The authors would like to acknowledge the funding support from Singapore-MIT Alliance and ETH Zurich. We also would like to thank Dr. Adrián López García de Lomana and Prof. Andreas Wagner for their assistance in using the HYPERSPACE toolbox.

The authors declare no conflict of interest.

Supplementary File (DOCX, 242 KB)