metabolites 2013, 3(4), 946-966; doi:10.3390/metabo3040946
Published: 14 October 2013
Abstract
Thermodynamics constrains the flow of matter in a reaction network to occur through routes along which the Gibbs energy decreases, implying that viable steady-state flux patterns should be void of closed reaction cycles. Identifying and removing cycles in large reaction networks can unfortunately be a highly challenging task from a computational viewpoint. We propose here a method that accomplishes it by combining a relaxation algorithm and a Monte Carlo procedure to detect loops, with ad hoc rules (discussed in detail) to eliminate them. As test cases, we tackle (a) the problem of identifying infeasible cycles in the E. coli metabolic network and (b) the problem of correcting thermodynamic infeasibilities in the Flux-Balance-Analysis solutions for 15 human cell-type-specific metabolic networks. Results for (a) are compared with previous analyses of the same issue, while results for (b) are weighed against alternative methods to retrieve thermodynamically viable flux patterns based on minimizing specific global quantities. Our method, on the one hand, outperforms previous techniques and, on the other, corrects loopy solutions to Flux Balance Analysis. As a byproduct, it also turns out to be able to reveal possible inconsistencies in model reconstructions.
1. Introduction
Starting from the discovery by Lavoisier concerning the relation between respiration and combustion, thermodynamics stands as a key physical framework for understanding metabolism and physiology, from single cell to whole organisms. When applied to a given metabolic reaction network, at the simplest level, thermodynamics requires that, in non-equilibrium steady states, fluxes of matter proceed downhill in the underlying Gibbs (free) energy landscape. Violations of this rule (which corresponds to nothing but the second law of thermodynamics) are signaled by the existence of unphysical cycles in flux configurations [1]. In the current era of metabolic genome-scale reconstructed networks, the implementation of such a constraint in computational models of a cell's metabolism has far-reaching implications [2], ranging from the physical feasibility of flux configurations [3] to the estimation of metabolite levels [4], the assignment of directionality for reactions and pathways [5] and the characterization of the overall chemical energy balance [6]. Accounting for thermodynamics in genome-scale models, however, poses considerable practical problems both for algorithms and for CPU costs.
The reference modeling scheme that we shall consider here is given by the so-called constraint-based models [7], widely employed in the literature to describe the operation of a biochemical reaction network at steady states with time-independent metabolite levels. While building a detailed model of metabolism presupposes knowledge of the kinetic parameters and reaction mechanisms [8], and should possibly take into account stochasticity [9] and spatial diffusion [10,11], constraint-based models focus on well-mixed non-equilibrium steady states (NESSs) for the reaction fluxes, to recover what fundamental information comes from the underlying stoichiometry alone. For a given stoichiometric matrix S = {S_{mr}} that accounts for the stoichiometric coefficient of metabolite m in reaction r (with the usual sign convention to distinguish products from substrates), a flux vector v = {υ_{r}} represents a non-equilibrium steady state if it enforces the balance of metabolite levels c = {c_{m}}, i.e., if:
$\dot{\mathbf{c}}=\mathbf{S}\mathbf{v}=0$ (1)
Solutions of Equation (1) are in general not guaranteed to be thermodynamically viable. Frameworks like Flux Balance Analysis (FBA) can, however, be modified to include thermodynamic constraints directly in order to generate thermodynamically viable flux configurations, for instance, by resorting to empirical data to estimate the chemical potentials of metabolites [16] and infer reaction reversibility more precisely [17,18]. As a matter of fact, a large part of thermodynamic inconsistencies appears to be due to incorrect direction assignments. Models of this type, however, require prior biochemical information that is often scarce or unavailable [19]. To overcome these difficulties, new methods were devised that detect infeasible loops relying only on the constraint-based model, i.e., on the structure of the metabolic network alone [20–22]. Although these methods remove the need for experimental knowledge, the direct detection of infeasible loops is a computationally demanding task that limits their applicability. It is therefore important to devise algorithms that are able to identify and remove thermodynamic inconsistencies from solutions of Equation (1) or, more generally, from generic flux patterns.
Checking the thermodynamic feasibility of a flux pattern can be made straightforward. Denoting by v′ a flux vector, from which we exclude uptakes and every reaction that cannot be associated directly with a thermodynamic constraint (like biomass production, “effective” reactions with non-integer stoichiometry or null fluxes), let us define the matrix Ω = {Ω_{mr}} with elements ${\mathrm{\Omega}}_{mr}=-\text{sign}({\upsilon}_{r}^{\prime}){S}_{mr}$. Note that the minus sign in the definition of Ω is needed to connect the directions of the reactions to the corresponding Gibbs energy differences. The thermodynamic feasibility of v′ is easily seen to be guaranteed (see the Supporting Text for a toy example) if a non-zero vector μ = {μ_{m}} (of chemical potentials) exists, such that [23]:
$\boldsymbol{\mu}\boldsymbol{\mathrm{\Omega}}>0$, i.e., $\sum_{m}{\mu}_{m}{\mathrm{\Omega}}_{mr}>0$ for every reaction r (2)
By a theorem of alternatives, the system (2) admits a solution if and only if the dual system Ωk = 0 with k ≥ 0 (3) has no non-zero solution; the non-zero solutions k of Equation (3) correspond to closed reaction cycles.
Finding all cycles in a directed (bipartite) network is, at heart, an integer programming problem belonging to the NP-hard class (NP: non-deterministic polynomial time) [26], which suggests that using deterministic algorithms to find loops in large enough networks may be unwise. However, for networks in which reliable prior thermodynamic information is available, the complexity of loop counting can be significantly reduced, and indeed, in some cases, the problem has already been tackled (although, in our view, not fully solved) in genome-scale networks with some degree of success [24,27]. By contrast, in large networks lacking detailed thermodynamic information, like the human metabolic networks, the implementation of thermodynamic constraints requires the development of algorithms that are able to handle more difficult instances of the loop counting problem. Luckily, in many hard computational problems where the use of exact algorithms is prevented by CPU costs, stochastic methods have proven to be effective. A biologically relevant case is represented by the problem of sampling solutions of Equation (1), for which Monte Carlo [28,29] and message-passing techniques [30] are being employed instead of deterministic methods (as the latter presuppose the enumeration of a possibly exponential number of vertices of the polytope). It is natural to guess that a similar strategy might be employed (as we shall see, with some care) for the analysis of the solutions of Equation (3), i.e., to identify reaction cycles.
The strategy we present here combines a relaxation algorithm and a Monte Carlo method to allow for the thorough analysis of thermodynamic infeasibilities on genome-scale metabolic networks of unprecedented size. More precisely, loops will be found by applying Monte Carlo to Equation (3) with a reduced search space obtained by analyzing how relaxation behaves when applied to Equation (2). Once a loop is found, it can be removed in several ways, provided they do not violate any of the constraints other than thermodynamic (e.g., mass balance). We shall discuss and compare different approaches: more precisely, a “local” rule that exploits, in essence, the fact that fluxes in cycles are defined up to a constant and a “global” rule, based on the minimization of an overall function of the fluxes. The method will be used to analyze different types of networks of a large size. Specifically, we shall first identify all loops in the metabolic network of E. coli [31], then focus on amending the FBA solutions of 15 different human metabolic network models derived from the genome-scale Reactome Recon-2 [32], all bearing a specified objective function. Such solutions turn out to be rich with infeasible cycles, which we are able to find and correct.
The structure and rationale of the method we propose are discussed in detail in Section 2, together with a brief summary of the network reconstructions we shall employ. Section 3 exposes our results, while our conclusions are reported in Section 4.
2. Materials and Methods
2.1. Materials: Metabolic Network Reconstructions
The human Reactome Recon-2 [32] has been reconstructed by a community that merged and integrated existing global human metabolic networks and transcriptional information on specific human cell types. The authors verified the quality of Recon-2 by determining how many tasks the network was able to perform. A task can be as simple as the transformation of a metabolite by a single enzyme or by a complex pathway (like fermentation or oxidative phosphorylation) or as complex as the production of the building blocks, energy, cofactors, etc. required for cell duplication, i.e., biomass. For E. coli and, in general, for unicellular organisms, biomass yield is a valuable objective function for the FBA framework [33], since its maximization essentially equals growth maximization at fixed nutrient intake. Although it is unlikely that, in normal circumstances, cells in a multicellular organism maximize the biomass yield, we stick to it as the FBA objective function, as, for our purposes, the objective function can be seen merely as a tool to obtain motivated flux patterns for thermodynamic analysis.
In addition to the global reconstruction, [32] provides a collection of 65 drafts of cell-specific networks, which are derived from Recon-2 by means of an automatic procedure that utilizes proteomic data [33,34]. We focused on 15 networks with the ability to produce biomass, which we list in the first column of Table 1 together with the number of reactions (N) and metabolites (M) included in each case. We have used, in particular, the network reconstructions in SBML (Systems Biology Markup Language) format from [32] and resorted to the COBRA (COnstraint-Based Reconstruction and Analysis) Toolbox [33] for the FBA analysis and to produce the stoichiometric matrix and the list of metabolites and reactions to be used in our analysis.
We have also analyzed the reconstructed metabolic network of the bacterium E. coli derived in [31], consisting of 2,382 reactions (including 305 uptakes) among 1,668 metabolites. In this model, 548 reactions are putatively reversible.
In each case, the key information we employed is encoded in the stoichiometric matrix S.
2.2. Methods
2.2.1. Algorithm for Thermodynamic Analysis: Structure
Before describing the algorithm in detail, we briefly recall the idea behind the procedure. We do not directly assess whether the flux configuration is loop free, but we try to compute the chemical potentials that satisfy Equation (2), which is a computationally easier problem to solve. If such a solution exists, we are guaranteed that the flux configuration does not contain infeasible loops. It can be demonstrated [24] that the relaxation method described below converges in polynomial time to a solution whenever one exists; a lack of convergence therefore signals the presence of infeasible loops. Even when the relaxation method does not converge, it still provides us with a list of reactions that are likely to take part in infeasible loops. To this limited subset of reactions, we can directly apply algorithms to find loop-free solutions.
The overall structure of the algorithm is reported in Figure 1.
In a few words, and referring to points A, B, C.1, C.2 and D shown explicitly in the flow chart:
- (A) Input: the input information includes a stoichiometric matrix, S, a flux vector, v (e.g., a solution of FBA) and a prior vector, μ, of chemical potentials. Initialize an integer variable, t, at t = 0 (relaxation steps) and an empty list.
- (B) Compute the matrix, Ω, and evaluate the thermodynamic constraints (2), i.e., compute μΩ. If they are satisfied, i.e., if μΩ > 0, go to (D); otherwise, register the least unsatisfied constraint (l.u.c.), i.e., the value of the index, r, for which the corresponding component of the vector, μΩ, is smallest (most negative). Insert it into the list, and increase the t variable by 1; if t < T, with T a pre-defined large parameter, go to (C.1); otherwise, go to (C.2).
- (C.1) Update the vector, μ, by performing a single step of the relaxation algorithm described in Section 2.2.2; update the list by inserting the new l.u.c. and go back to (B).
- (C.2) Perform a Monte Carlo computation, as described in Section 2.2.3, in order to find a solution of Equation (3), namely Ωk = 0, including only the reactions appearing in the list. Once a solution is found, correct the associated cycle as described in Sections 2.2.4 and 2.2.5; re-initialize t, empty the list and go back to (B).
- (D) Output: a thermodynamically feasible flux vector.
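To make step (B) concrete, the following is a minimal Python sketch (it is not the C++ code of the Supporting Material). It builds Ω from S and v using the definition Ω_{mr} = −sign(υ_{r})S_{mr} and returns the index of the least unsatisfied constraint. The function names and the three-metabolite chain used as input are illustrative assumptions.

```python
def omega_matrix(S, v):
    """Omega[m][r] = -sign(v[r]) * S[m][r], for the retained (non-zero) fluxes."""
    sign = lambda x: (x > 0) - (x < 0)
    return [[-sign(v[r]) * S[m][r] for r in range(len(v))]
            for m in range(len(S))]

def constraint_values(mu, Omega):
    """Components of the row vector mu*Omega, one per reaction."""
    n_react = len(Omega[0])
    return [sum(mu[m] * Omega[m][r] for m in range(len(mu)))
            for r in range(n_react)]

def least_unsatisfied(mu, Omega):
    """Index and value of the smallest component of mu*Omega (the l.u.c.)."""
    x = constraint_values(mu, Omega)
    r0 = min(range(len(x)), key=x.__getitem__)
    return r0, x[r0]

# Hypothetical chain A -> B -> C (rows: A, B, C; columns: the two reactions).
S = [[-1, 0], [1, -1], [0, 1]]
v = [1.0, 1.0]
Omega = omega_matrix(S, v)
print(constraint_values([2, 1, 0], Omega))   # all positive: constraints (2) hold
print(least_unsatisfied([0, 1, 0], Omega))   # first constraint is violated
```

In this toy case, a potential vector that decreases along the chain satisfies all constraints, whereas one that places B above A violates the constraint of the first reaction.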
In the following sections, we shall describe the sub-procedures (relaxation method, Monte Carlo and cycle removal) of the algorithm in detail. C++ code that performs each of the above steps is provided as Supporting Material. It is worth pointing out that the present study is not concerned with the calculation of realistic chemical potentials. Rather, we simply require their existence in order for the flux configuration to be feasible. In this case, the prior vector of chemical potentials can be arbitrary, e.g., constant or composed of independent and identically distributed random variables. If complemented with specific experimentally determined or computationally estimated priors for the chemical potentials, however, the relaxation method included in the above algorithm generates, as a by-product, a free energy vector compatible with the final flux configuration and can, thus, be employed to refine experimental data on the free energy of the formation of metabolites and/or on their levels [24]. Better priors ultimately allow one to obtain more precise estimates for the real chemical potentials, but are essentially irrelevant for the convergence of the relaxation method. In what follows, we shall neglect this aspect, which is discussed in depth in [24], and focus exclusively on the retrieval of cycles.
2.2.2. Checking Thermodynamic Viability by Relaxation
This routine, corresponding to point C.1 of the flow chart, allows one to retrieve a solution of Equation (2) starting from a vector of chemical potentials that is not a solution thereof. For simplicity, we construct an initial vector made of uniformly distributed random numbers. At any step, t, of the procedure, given a chemical potential vector μ(t), the relaxation algorithm corrects the l.u.c. of Equation (2) through the dynamics defined by:
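The relaxation loop can be sketched in Python as follows. The update rule is not reproduced in the text above, so the sketch assumes a perceptron-like (MinOver-type) step, μ ← μ + λΩ_{·r0}, where r0 is the l.u.c. and λ is a hypothetical step size; the function name, the cap T and the toy matrices are likewise assumptions. The sketch returns the list of visited l.u.c. indices, which is the information passed on to the Monte Carlo step when convergence fails.

```python
def relax(Omega, mu, T=10_000, step=1.0):
    """Try to reach mu*Omega > 0 by repeatedly correcting the l.u.c.

    Assumed update (not taken from the paper): mu += step * Omega[:, r0].
    Returns (mu, converged, visited), where visited lists the l.u.c. indices.
    """
    M, N = len(Omega), len(Omega[0])
    mu = list(mu)
    visited = []
    for _ in range(T):
        x = [sum(mu[m] * Omega[m][r] for m in range(M)) for r in range(N)]
        r0 = min(range(N), key=x.__getitem__)
        if x[r0] > 0:                 # every constraint is satisfied
            return mu, True, visited
        visited.append(r0)
        for m in range(M):            # raise the value of constraint r0
            mu[m] += step * Omega[m][r0]
    return mu, False, visited

# Feasible toy chain A -> B -> C: the relaxation converges.
print(relax([[1, 0], [-1, 1], [0, -1]], [0, 0, 0]))

# Two-reaction cycle A -> B -> A: no potentials exist, so the cap T is hit
# and both reactions keep reappearing in the visited list.
mu, converged, visited = relax([[1, -1], [-1, 1]], [0, 0], T=50)
print(converged, sorted(set(visited)))
```

The priors are fixed to zero here only to keep the example reproducible; random priors, as used in the text, work in the same way.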
2.2.3. Identifying Loops by Monte Carlo
As said above, cycles generically correspond to solutions of Equation (3) with k ≥ 0. As the stoichiometric coefficients are typically integers, one can focus on searching solutions with k_{r} non-negative integers for each r. To this aim, the following method (borrowed from the standard statistical physics toolbox) can be employed. Starting from Equation (3), note that the function:
It is worth noting that, based on the above discussion of the relaxation method, the number of reactions to be included in the above procedure equals the number of distinct reactions appearing in the list, which is usually much smaller than N. To give an idea, in the study of E. coli, whose results are reported below, our lists ended up containing at most 50 reactions, to be compared with the over 2,000 that form the genome-scale reconstruction. Hence, the computational costs of the Monte Carlo step of our algorithms are overall modest.
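A minimal Python sketch of such a search is given below. The cost function used in the paper is not reproduced in the text above, so the sketch assumes the quadratic cost E(k) = Σ_{m}(Σ_{r}Ω_{mr}k_{r})^2, which vanishes exactly on the solutions of Ωk = 0, and a plain Metropolis dynamics over bounded non-negative integers. The function name, the bound k_max, the temperature and the toy network are all illustrative assumptions.

```python
import math
import random

def find_loop(Omega, reactions, k_max=3, temperature=0.5, steps=20_000, seed=0):
    """Metropolis search for a non-zero integer k >= 0 with Omega k = 0,
    restricted to the given reaction indices (the list built by relaxation)."""
    rng = random.Random(seed)
    M = len(Omega)

    def energy(k):
        return sum(sum(Omega[m][r] * k[i] for i, r in enumerate(reactions)) ** 2
                   for m in range(M))

    k = [1] * len(reactions)
    E = energy(k)
    for _ in range(steps):
        if E == 0:
            break
        i = rng.randrange(len(k))
        new = k[:]
        new[i] += rng.choice((-1, 1))
        if not 0 <= new[i] <= k_max or not any(new):
            continue                  # stay in bounds and away from k = 0
        E_new = energy(new)
        if E_new <= E or rng.random() < math.exp((E - E_new) / temperature):
            k, E = new, E_new
    return dict(zip(reactions, k)) if E == 0 else None

# Hypothetical network: A -> B -> C -> A plus a branch A -> D, all fluxes > 0.
Omega = [[1, 0, -1, 1],
         [-1, 1, 0, 0],
         [0, -1, 1, 0],
         [0, 0, 0, -1]]
print(find_loop(Omega, reactions=[0, 1, 2, 3]))
```

In the toy network, any solution assigns equal positive weights to the three reactions of the cycle and zero weight to the branch.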
2.2.4. Correcting the Flux Configuration: Local Strategy
Once a flux cycle has been identified, there are multiple ways to remove it and re-organize the flux pattern, while still preserving all constraints and, eventually, the values of objective functions.
To clarify the situation, consider the following simple example with four reactions, pictured in Figure 2.
υ_{1} and υ_{2} are “internal” fluxes, while υ_{3} and υ_{4} are an intake and an outtake flux, respectively. The stoichiometric matrix S and the flux vector v read:
After the elimination of the uptakes, the internal stoichiometric matrix and flux vector (which, for clarity, in this section, we denote as S^{int} and v^{int}) are given by:
Given a solution v of Equation (10), any linear combination of the form:
Equation (11) can be used to correct the infeasible loops, while still satisfying all mass balance constraints. The simplest possible correction scheme is based on the idea that by properly fixing the value of the coefficients, L^{a}, one can lift the degeneracy and rid the flux configuration of loops. There is, however, a major caveat. To make it explicit, we note that (a) the signs of the fluxes ${\upsilon}_{r}^{\prime}$ depend on the choice of the coefficients L, so that the matrix Ω will also depend on it; and (b) it is not guaranteed that the new fluxes v′ will vary within the same bounds as v. This means that Equation (3) must be solved together with all other constraints which are not related to the stoichiometry, such as sign constraints. We will write these constraints in a general fashion as Av′ ≥ b. In our example, this matrix inequality reduces to υ_{2} ≥ ϵ.
With this notation, the space of vectors L yielding thermodynamically feasible solutions is given by _{1} ∩ _{2}, where:
Suppose now that, in our example, we are given the flux vector v^{*} = (3, 1) as the solution to Equation (10). We see that the vector n = (1, 1) is in the null space of S^{int} and that the new vector v′ = v^{*} + Ln still satisfies (10). Furthermore, since both ${\upsilon}_{1}^{\mathbf{*}}$ and ${\upsilon}_{2}^{\mathbf{*}}$ are positive, n itself identifies a loop, i.e., it is a solution of (3) (for, in this case, Ω = −S^{int}). It can be easily checked that, in this example, as long as the constraint υ_{2} ≥ ϵ is not taken into account, we can pick L in the interval [−3, −1] to get rid of the cycle. The choice is arbitrary and produces a fully directional flux pattern, such that reactions υ_{1} and υ_{2}, if both active, operate in the same direction. The constraint υ_{2} ≥ ϵ, however, implies L ≥ ϵ − 1. The sets, _{1} and _{2}, are then given by:
From this last observation, we can deduce the following general loop removal strategy (which we refer to as the “local” correction strategy): for a given loop k^{a}, choose the value of L^{a} that sets to zero the flux of the reaction whose absolute value is the smallest, i.e.,:
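The local rule can be illustrated with a short Python sketch. It assumes that a loop is given as a non-negative integer vector k solving Ωk = 0, so that n_{r} = sign(υ_{r})k_{r} lies in the null space of S^{int}, and it picks the coefficient L that cancels the flux with the smallest magnitude relative to its weight in the loop. The function name is hypothetical, and additional bounds of the form Av′ ≥ b are not checked here and would have to be verified separately.

```python
def remove_loop_locally(v, k):
    """Shift v along the loop until its weakest flux vanishes.

    v: flux vector; k: non-negative integer loop vector (k[r] > 0 on the loop).
    Returns the corrected fluxes and the coefficient L that was applied.
    """
    sign = lambda x: (x > 0) - (x < 0)
    n = [sign(v[r]) * k[r] for r in range(len(v))]       # null-space direction
    L = -min(abs(v[r]) / k[r] for r in range(len(v)) if k[r] > 0)
    return [v[r] + L * n[r] for r in range(len(v))], L

# Example from the text: v* = (3, 1) with loop n = (1, 1).
v_new, L = remove_loop_locally([3, 1], [1, 1])
print(v_new, L)   # the second flux is switched off, the net flow is unchanged
```

For the example in the text, this yields L = −1 and v′ = (2, 0), which lies at the upper end of the interval [−3, −1] and respects the bound υ_{2} ≥ ϵ only if ϵ ≤ 0.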
2.2.5. Correcting the Flux Configuration: Global Strategy
Other possible loop-removing procedures are based on the minimization of some norm of the fluxes, as suggested in [37]. Let us elaborate this idea further and consider the function Q_{p}(v) = Σ_{r} |υ_{r}|^{p} with p ≥ 1, representing, for different p's, different norms of the flux vector v (Q_{1} is the so-called “Taxicab” norm, Q_{2} is the square of the Euclidean norm, etc.). Suppose we have an FBA solution v^{*} that minimizes Q_{p}. If {n^{a}} is the set of all null space vectors of the stoichiometric matrix, we can construct a new solution v′ = v^{*} + Σ_{a} L^{a}n^{a} and compute the partial derivative of Q_{p} with respect to a coefficient, L^{a}. The derivatives vanish when evaluated at L = 0:
If at least one of the fluxes ${\upsilon}_{r}^{\mathbf{*}}$ is zero, this reaction cannot be involved in any cycle. In particular, n^{a} is not associated with a loop.
If all fluxes are non-zero, the vector k^{a} cannot have a definite sign (positive or negative), since the sum of its entries, namely Equation (17), weighted with some positive coefficients, is zero.
Therefore, the vector v^{*} that minimizes Q_{p} does not contain cycles.
The argument can be easily extended to include irreversibility constraints. Let v^{*} denote the flux configuration that minimizes Q_{p}(v) with irreversibility constraints υ_{r} ≥ 0 for the reactions, r, belonging to the set $\mathcal{I}=\{{r}_{1},{r}_{2},\dots \}$. We shall instead denote by ${\mathcal{I}}_{0}=\{{r}_{1}^{\prime},{r}_{2}^{\prime},\dots \}\subseteq \mathcal{I}$ the set of irreversible reactions for which υ_{r} = 0 in v^{*}. Clearly, v^{*} also minimizes Q_{p} subject to the stronger constraints υ_{r} = 0 for $r\in {\mathcal{I}}_{0}$ and υ_{r} > 0 for $r\in \mathcal{I}\setminus {\mathcal{I}}_{0}$. Given this, one can now proceed along the same lines as before, because, for any vector n^{a} in the null space of S^{int}:
If some reaction, r, for which ${n}_{r}^{a}\ne 0$ is forced to have zero flux, since $r\in {\mathcal{I}}_{0}$, then n^{a} is not associated with a cycle;
Otherwise, we can demonstrate that n^{a} does not correspond to a cycle by taking the partial derivative of Q_{p}(v^{*} + L^{a}n^{a}) as done above.
Problems may arise, as before, when boundary conditions like υ_{r} ≥ ϵ with ϵ > 0 have to be considered. In particular, if the flux of a variable thus bounded is fixed to take the value ϵ, there is the possibility that the cycle cannot be removed. In the example discussed above (see Figure 2):
If ϵ < − 1, then υ_{2} > ϵ and the Q_{p} minimization yields v^{*} = (1,−1);
If − 1 ≤ ϵ ≤ 0, then υ_{2} = ϵ, but the flux configuration v^{*} = (2 + ϵ, ϵ) is still feasible (in particular, the configuration is feasible for ϵ = 0);
If ϵ > 0, then υ_{2} = ϵ, and the optimal flux configuration is not feasible.
In summary, the global minimization of the norm Q_{p} produces thermodynamically feasible flux patterns, provided they are allowed by the constraints. If not, the minimization can get rid of all loops not involving reactions constrained to keep the same sign. We shall term the cycle-removal strategy based on minimizing a norm as the “global” strategy.
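For the two-reaction example, the global strategy reduces to a one-dimensional convex minimization, which can be sketched in Python as follows. The sketch assumes the parametrization v′ = v^{*} + Ln with v^{*} = (3, 1) and n = (1, 1) used in Section 2.2.4, encodes the bound υ_{2} ≥ ϵ as L ≥ ϵ − 1, and uses a simple ternary search; the function name, the upper limit on L and the choice p = 2 are illustrative assumptions.

```python
def minimize_Qp_on_line(v_star, n, L_min, L_max, p=2, iters=200):
    """Minimize Q_p(v* + L n) = sum_r |v*_r + L n_r|^p over L in [L_min, L_max].

    Q_p is convex in L for p >= 1, so a ternary search is sufficient.
    """
    def Q(L):
        return sum(abs(x + L * d) ** p for x, d in zip(v_star, n))

    lo, hi = L_min, L_max
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if Q(a) < Q(b):
            hi = b
        else:
            lo = a
    L = (lo + hi) / 2
    return [x + L * d for x, d in zip(v_star, n)]

for eps in (-2.0, -0.5, 0.5):
    v_opt = minimize_Qp_on_line([3.0, 1.0], [1.0, 1.0], L_min=eps - 1.0, L_max=10.0)
    loop_left = v_opt[0] > 1e-9 and v_opt[1] > 1e-9   # both fluxes still positive
    print(eps, [round(x, 6) for x in v_opt], "loop remains" if loop_left else "loop-free")
```

The three values of ϵ reproduce the three regimes listed above: an unconstrained optimum, an optimum on the bound that is still loop-free, and an optimum on the bound in which the cycle survives.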
3. Results
3.1. A Test: Identifying Infeasible Loops in the E. coli Network iAF1260
As a proof of principle, we have applied our method to search for and enumerate all independent infeasible loops of a large metabolic network reconstruction for the bacterium E. coli, the iAF1260 [31]. Other authors have attempted to solve the same enumeration problem before (see, e.g., [27]). We note, however, that our method is radically different in that we make use of the theorem of alternatives and do not directly search for loops on the graph, which is the more standard route [38], or rely on subsequent optimizations that reduce the search space [27,39]. In the present case, we characterize cycles in an ensemble of net-flux patterns generated randomly by assigning an operating direction to each reaction as follows: if the reaction is irreversible, we pick the allowed direction; if the reaction is reversible, we select the forward or the reverse direction randomly with a probability of 1/2 (note that direction assignments suffice to pose the problem of thermodynamic feasibility). In this way, all reactions are active, a worst-case scenario with respect to a growth-yield optimizing state that normally only requires the operation of around 30% of the reactions, implying (in our case) a much larger number of loops and, in principle, higher computational costs for loop counting. For each configuration, we look for and eliminate cycles until the material flow is thermodynamically consistent, recording the cycles that we have detected. Finally, we keep only independent loops by applying Gaussian elimination (i.e., we exclude from our list loops that can be decomposed as the sum of, say, two simpler loops).
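The two bookkeeping steps of this test (drawing random directions and discarding dependent loops) can be sketched in Python as follows. The encoding is an assumption made for illustration: +1 denotes the allowed direction of an irreversible reaction, loops are stored as integer vectors over the reactions, and exact rational arithmetic is used for the elimination; the function names are hypothetical.

```python
import random
from fractions import Fraction

def random_directions(reversible, rng):
    """+1 for irreversible reactions, +1 or -1 with probability 1/2 otherwise."""
    return [rng.choice((-1, 1)) if rev else 1 for rev in reversible]

def independent_loops(loops):
    """Keep each loop that is linearly independent of the loops kept so far."""
    basis, kept = [], []              # basis holds (pivot column, reduced row)
    for loop in loops:
        row = [Fraction(x) for x in loop]
        for pivot, b in basis:        # Gaussian elimination against the basis
            if row[pivot] != 0:
                f = row[pivot] / b[pivot]
                row = [x - f * y for x, y in zip(row, b)]
        pivot = next((i for i, x in enumerate(row) if x != 0), None)
        if pivot is not None:         # a non-zero remainder means a new loop
            basis.append((pivot, row))
            kept.append(loop)
    return kept

print(random_directions([False, True, True, False], random.Random(0)))
# The third loop is the sum of the first two and is therefore discarded.
print(independent_loops([[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]]))
```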
In Figure 3, we display the number of independent loops that we identify as a function of the number of random configurations tested.
We identify 196 loops (189 of which turn out to be of a size of three or more) after having generated about 80,000 random configurations, and no new loops appear upon enlarging the test ensemble. The loops thus found are listed in Supporting File 1, and a histogram of the cycle lengths (in terms of the number of reactions involved) is displayed in Figure 4.
We note that, in [27], 591 cycles were identified, 564 of which are, however, formed by two reactions, mostly originating from the fact that reversible reactions were, in that study, split in two separate processes (forward and reverse). Therefore, only 27 of those cycles were formed by three reactions or more. Because we do not split reversible reactions, we find only seven cycles of a length of two and 189 loops of a length at least equal to three. We note that these 189 cycles span 396 reactions altogether. This suggests that, for the technique employed in [27], some loops were undetectable once the exhaustive search had been restricted to 50 reactions. We stress, however, that the procedure discussed in [27] is, in principle, exact, and once the restriction is removed, it might be able to identify more cycles involving at least three reactions.
3.2. Inconsistencies in the FBA Solution for the Overall Human Reactome Recon-2
We now move on to the identification of thermodynamic infeasibilities in the human Reactome Recon-2 [32]. Specifically, we have analyzed the feasibility of flux patterns defined by solving FBA on the entire Reactome, using the “biomass” reaction that comes with the reconstruction as the objective function. This section provides a concrete example of an inconsistency that is unrecoverable without correcting basic structural information concerning the network. It should be kept in mind, however, that the physiologically relevant metabolic networks that can be obtained from Recon-2 are the cell-type specific ones, which will be discussed in the following section.
Like almost all metabolic objective functions, the biomass reaction of Recon-2 contains ATP hydrolysis, representing the energetic requirements associated with cell duplication, which are not explicitly accounted for by the flux organization. Such requirements are typically large: the stoichiometry of ATP in the biomass reaction is often two orders of magnitude larger than that of the other chemical species. Hence, ATP tends to be the limiting factor for biomass production, and FBA solutions will often organize metabolic fluxes so as to produce as much ATP as possible. This, however, turns out to lead, in Recon-2, to a violation of thermodynamics. In particular, in the FBA solution for Recon-2, we detect a huge number of cycles involving the active and passive transport of a metabolite through a membrane, as, e.g., for the transport of stearoyl-CoA (stcoa) from cytosol (c) to peroxisomes (x), namely (see Figure 5).
ATP-coupled reactions are a common, though not the only, source of thermodynamic inconsistencies that can be spotted in Recon-2 (see Supporting File 2 for the complete list of cycles we identified in the Recon-2-derived cell-type specific networks). It is, however, important to stress that they are spurious and may be identified easily by complementing Recon-2 with a maintenance reaction that mimics the energy expenditure associated with basal processes (similar to those that are present in bacterial metabolic networks) and even cured automatically by fixing the directionality of active transports directly in the reconstruction (when possible).
3.3. Correcting Infeasible Loops in FBA Solutions for Cell-Type Specific Human Metabolic Networks
In this section, we focus on finding and correcting infeasible loops in FBA solutions of the cell-type specific human metabolic networks obtained from Recon-2. We have restricted our attention to 15 networks carrying an objective function, representing, respectively, cerebral cortex neuronal cell, liver bile duct cell, cervix uterine squamous epithelial cell, kidney tubule cell, gall bladder cell, lung macrophage, small intestine glandular cell, rectum glandular cell, smooth muscle cell, urinary bladder urothelial cell, pre- and post-menopause uterus glandular cell, pancreatic exocrine glandular cell, tonsil germinal cell and squamous epithelial cell. We first computed, via the COBRA Toolbox [33], the FBA solutions for each of the networks that are optimal with respect to the biomass objective function. The choice is mainly motivated by the fact that maximizing the biomass yield represents a network-wide goal with respect to the more specific tasks described by other objective functions included in the reconstructions. We stress, however, that for our present purposes, the objective function merely provides a means of obtaining flux patterns; hence, the particular choice we made is immaterial for the problem we consider. Subsequently, we identified infeasible loops using the method described in Section 2.2.3 and, finally, corrected thermodynamic inconsistencies using both the local and global strategies described in Sections 2.2.4 and 2.2.5 (in the latter case, minimizing the Q_{1} norm, while fixing sinks, uptakes and objective function to the values of the FBA solution). In particular, with the local strategy, we eliminate one infeasible loop at a time, making sure that no constraint is violated by the corrected solutions, including the value of the objective function.
We note, however, that the local strategy does not return a unique thermodynamically consistent network, since the final flux pattern may depend on the order with which loops are removed. We shall see that, quite generically, this strategy produces flux patterns that are more similar to the original (infeasible) solutions than those generated by the global correction strategy.
Results are shown in Table 1, where we list different topological quantities (specifically, the overall number of reactions and metabolites and the number of reactions carrying a non-zero flux and that of metabolites that are produced and consumed by at least one reaction) for the original (infeasible) FBA solutions and for the corrected flux patterns, both for the local and global strategies, as well as the number of loops in the FBA solutions that need to be corrected by the local strategy. One sees that the local strategy typically needs to resolve several hundreds of inconsistencies in order to obtain viable solutions and that correction strategies enforce a reduction in the number of active processes, which in certain cases, can be rather dramatic. Supporting File 2 lists the cycles we identified and corrected in each of the 15 metabolic networks we have analyzed. To quantify more precisely the similarity between the solutions thus obtained, we have measured the ‘overlap’ parameter defined as follows: given two flux configurations, ${\mathbf{v}}^{a}=\left\{{\upsilon}_{r}^{a}\right\}$ and ${\mathbf{v}}^{b}=\left\{{\upsilon}_{r}^{b}\right\}$, we let:
Clearly, q_{ab} = 1 if v^{a} = v^{b}, while the more the fluxes differ in the two solutions, the smaller q_{ab} gets, until q_{ab} = −1 if v^{a} = −v^{b}. Larger values of q_{ab}, therefore, generically point to the fact that the two flux vectors are more similar also in terms of their directions. (Note that in computing (20), one should account for the fact that a flux that is null in both solutions contributes one to the above sum. In this study, for numerical reasons, a flux, υ_{r}, is considered to be null whenever |υ_{r}| < υ_{0}, where υ_{0} is a (small) threshold. Results have been obtained with υ_{0} = 10^{−6}, but they are robust to changes in this value.) Values of the overlaps between the three solutions we consider (original FBA, FBA corrected by the local strategy, FBA corrected by the global strategy) are also displayed in Table 1, clarifying that the local strategy applied to our sample always generates flux patterns that are closer to the original (infeasible) solution than those obtained by the global strategy. Nevertheless, the overlap between the locally- and globally-corrected solutions can also be rather large in some cases, suggesting that a common physical, possibly variational, requirement may underlie, to some extent, the two criteria.
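An overlap with the properties just described can be sketched in Python as follows. Since Equation (20) is not reproduced in the text above, the per-reaction term used here, 2υ^{a}_{r}υ^{b}_{r}/((υ^{a}_{r})^{2} + (υ^{b}_{r})^{2}) averaged over reactions, is an assumption chosen only because it equals 1 for identical fluxes and −1 for opposite ones; it may differ from the definition used in the paper. The function name is hypothetical, while the threshold and the convention for fluxes that are null in both solutions follow the text.

```python
def overlap(va, vb, v0=1e-6):
    """Average per-reaction similarity of two flux vectors (assumed form)."""
    total = 0.0
    for a, b in zip(va, vb):
        a = 0.0 if abs(a) < v0 else a     # fluxes below the threshold are null
        b = 0.0 if abs(b) < v0 else b
        if a == 0.0 and b == 0.0:
            total += 1.0                  # null in both solutions counts as one
        else:
            total += 2 * a * b / (a * a + b * b)
    return total / len(va)

print(overlap([1.0, 2.0, 0.0], [1.0, 2.0, 1e-9]))   # identical up to threshold
print(overlap([1.0, -2.0], [-1.0, 2.0]))            # opposite directions
```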
The final column of Table 1 shows the sign of the Gibbs energy change of ATP hydrolysis that is obtained in the solution corrected by the global strategy. This provides an interesting check of physiological consistency, as solutions should be compatible with a spontaneous ATP hydrolysis in vivo (i.e., with a negative Gibbs energy difference). We find that only for five models does Q_{1} minimization provide (thermodynamically feasible) flux configurations carrying a negative Gibbs energy difference for ATP hydrolysis. A possible and easily implemented improvement of the method we present would be to take physiological aspects into account when correcting a flux configuration. We stress once more, however, that these types of infeasibilities are due to inconsistent constraints or wrong reversibility assignments that prevent the existence of feasible, energetically realistic flux patterns and can be eliminated already at the stage of network reconstruction. Our main goal here was to show that our method is capable of identifying and correcting loops. This type of example shows that it can, furthermore, point to possible limitations of the current models.
4. Discussions
Accounting for thermodynamic constraints in stoichiometry-based flux models, though potentially highly rewarding (in terms of the possibility to predict metabolite levels, chemical potentials, reaction free energies and reversibility), is a generically hard task. Methods that integrate directly with the constraints defining the space of viable fluxes are often computationally intensive and either presuppose prior biochemical knowledge or lead to a considerable increase in the number of parameters (or both). The technique presented here makes use of stoichiometry alone (hence, it is essentially a topological method) and allows us to accomplish two goals: on the one hand, counting and listing the infeasible reaction cycles that affect flux configurations derived from thermodynamics-free models; on the other, correcting such infeasibilities in a physically motivated manner. First, we have analyzed the genome-scale metabolic network reconstruction iAF1260 of the bacterium E. coli. By simply recording the cycles found in randomly generated flux patterns, we are able to uncover a much larger set of (much more complex) loops than previously obtained, also involving a much larger overall number of processes, comparing, in particular, with [27] (in this sense, outperforming previously employed methods). In passing, we note that our method comes with a certificate of completeness for the set of cycles, which was previously unavailable. Secondly, after showing that cycles plague FBA solutions for the metabolic networks of several different types of human cells (all retrieved from the human Recon-2 Reactome), we have applied our loop-removal strategies in order to obtain thermodynamically viable flux patterns that preserve both the basic constraints of FBA and the value of the objective function. In doing so, some inconsistencies in the reconstructions have been identified, which can easily be eliminated at the level of network building.
Quite importantly in our view, we have also discussed the possibility of employing global variational criteria to generate thermodynamically feasible flux configurations. In particular, generalizing a previous observation, we have proven that flux patterns that minimize the p-norms of the fluxes are thermodynamically viable, provided they are allowed by the constraints. Otherwise, this idea can be used (with some care) to remove only those cycles in which every reaction can be inverted or silenced.
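The link between p-norm minimization and the absence of cycles can be seen on a toy example. The sketch below is only an illustration under stated assumptions: the four-reaction network, the fixed uptake and output fluxes and all function names are hypothetical, and a simple scan over candidate loop fluxes replaces a proper optimization.

```python
# Hypothetical toy network (not taken from the models analyzed in the text):
#   v1: -> A      v2: A -> B      v3: B -> A      v4: B ->
# Reactions v2 and v3 form a closed cycle. With v1 = v4 = 1 fixed, every
# steady state has the form v2 = 1 + c, v3 = c, where c >= 0 is the flux
# circulating around the loop.

def fluxes_with_loop(c):
    """Steady-state flux vector [v1, v2, v3, v4] carrying loop flux c."""
    return [1.0, 1.0 + c, c, 1.0]


def is_steady_state(v, tol=1e-9):
    """Check mass balance for metabolites A and B."""
    v1, v2, v3, v4 = v
    balance_a = v1 - v2 + v3
    balance_b = v2 - v3 - v4
    return abs(balance_a) < tol and abs(balance_b) < tol


def p_norm_cost(v, p):
    """Sum of |v_r|^p, the quantity minimized by the p-norm criterion."""
    return sum(abs(x) ** p for x in v)


def best_loop_flux(p, candidates):
    """Loop flux c that minimizes the p-norm cost among the candidates."""
    return min(candidates, key=lambda c: p_norm_cost(fluxes_with_loop(c), p))


CANDIDATES = [i * 0.1 for i in range(21)]  # c = 0.0, 0.1, ..., 2.0

for p in (1, 2, 3):
    print(p, best_loop_flux(p, CANDIDATES))  # the minimizer is c = 0.0 for each p
```

In this example the cost grows monotonically with the loop flux, so the minimum is attained by the cycle-free configuration, which is allowed here because nothing forces v3 to be non-zero.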
The work presented here extends and improves on previous studies and takes several steps towards controlled and motivated methods to deal with thermodynamic inconsistencies in large networks of biochemical reactions. Further improvements along the lines discussed above (requiring, e.g., more precise physiological constraints) are clearly possible. More generally, we believe that work directed at enhancing the integration of thermodynamic constraints into flux analysis would be extremely important in light of the current efforts aimed at increasing the scope, reach and predictive power of computational models of cellular metabolism. In the absence of sufficiently detailed biochemical information about metabolite levels in vivo or chemical potentials, general stoichiometry-based techniques must be expected to play a key role in this endeavor.
Table 1. Overview of the results obtained for the human tissue-specific metabolic networks (with the biomass objective function). Columns are as follows. Cell type: abbreviations for the tissue-specific metabolic networks examined; for the full names, please refer to the text (head of Section 3.2). N and M: overall number of reactions and metabolites appearing in the network. N_{FBA} and M_{FBA}: number of active reactions and produced/consumed metabolites in the Flux Balance Analysis (FBA) solution. # cycles: number of cycles that the local strategy needs to correct. N_{local} and M_{local}: number of active reactions and produced/consumed metabolites in the FBA solution corrected by the local strategy. N_{global} and M_{global}: number of active reactions and produced/consumed metabolites in the FBA solution corrected by the global strategy. q_{FBA,local}: overlap between the FBA solution and the solution corrected by the local strategy. q_{FBA,global}: overlap between the FBA solution and the solution corrected by the global strategy. q_{local,global}: overlap between the FBA solutions corrected by the local and by the global strategy. ΔG sign: sign of the free energy difference obtained for ATP hydrolysis in the solution obtained via the global correction strategy.
Cell type | N | M | N_{FBA} | M_{FBA} | # cycles | N_{local} | M_{local} | N_{global} | M_{global} | q_{FBA,local} | q_{FBA,global} | q_{local,global} | ΔG sign |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bile duct | 2,076 | 1,445 | 1,009 | 743 | 215 | 516 | 554 | 367 | 476 | 0.706 | 0.559 | 0.781 | + |
Cer. cortex | 2,169 | 1,494 | 1,231 | 898 | 358 | 818 | 767 | 257 | 320 | 0.750 | 0.448 | 0.629 | + |
Cerv. ut. | 1,774 | 1,171 | 1,046 | 780 | 194 | 562 | 620 | 339 | 380 | 0.666 | 0.480 | 0.735 | - |
Gall blad. | 3,073 | 2,159 | 1,666 | 1,284 | 385 | 1,514 | 1,227 | 254 | 356 | 0.751 | 0.471 | 0.521 | + |
Kidney | 3,176 | 2,212 | 1,695 | 1,285 | 414 | 1,423 | 1,196 | 142 | 449 | 0.759 | 0.469 | 0.551 | + |
Lung macroph. | 2,810 | 1,991 | 1,313 | 960 | 223 | 817 | 779 | 606 | 587 | 0.765 | 0.681 | 0.849 | - |
Pancreas | 2,821 | 1,951 | 1,319 | 948 | 409 | 814 | 797 | 225 | 534 | 0.756 | 0.534 | 0.701 | + |
Rectum | 2,976 | 2,041 | 1,328 | 1,135 | 406 | 989 | 1,017 | 259 | 399 | 0.765 | 0.560 | 0.670 | - |
Small int. | 3,179 | 2,213 | 1,385 | 1,192 | 405 | 836 | 1,023 | 185 | 206 | 0.776 | 0.578 | 0.745 | + |
Smooth muscle | 1,806 | 1,222 | 1,042 | 796 | 184 | 579 | 607 | 314 | 320 | 0.677 | 0.501 | 0.747 | + |
Tonsil ger. | 2,126 | 1,421 | 1,178 | 884 | 405 | 881 | 764 | 357 | 412 | 0.667 | 0.503 | 0.644 | - |
Tonsil squam. | 2,573 | 1,718 | 1,719 | 1,250 | 423 | 1,455 | 1,188 | 301 | 403 | 0.718 | 0.334 | 0.430 | + |
Urot. blad. | 2,874 | 1,965 | 1,597 | 1,308 | 219 | 1,111 | 1,158 | 148 | 686 | 0.760 | 0.450 | 0.613 | + |
Uterus post-m. | 2,773 | 1,973 | 1,266 | 1,095 | 305 | 736 | 927 | 303 | 389 | 0.763 | 0.578 | 0.757 | + |
Uterus pre-m. | 2,793 | 1,982 | 1,376 | 1,157 | 208 | 924 | 1,022 | 259 | 582 | 0.785 | 0.507 | 0.658 | + |
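
The statement that the locally corrected solutions are always closer to the original FBA solution than the globally corrected ones can be checked directly against the overlap columns of Table 1. The snippet below is a convenience sketch only: the values are copied from the q_{FBA,local} and q_{FBA,global} columns, while the variable and function names are illustrative.

```python
# (cell type, q_FBA_local, q_FBA_global), copied from Table 1.
OVERLAPS = [
    ("Bile duct", 0.706, 0.559),
    ("Cer. cortex", 0.750, 0.448),
    ("Cerv. ut.", 0.666, 0.480),
    ("Gall blad.", 0.751, 0.471),
    ("Kidney", 0.759, 0.469),
    ("Lung macroph.", 0.765, 0.681),
    ("Pancreas", 0.756, 0.534),
    ("Rectum", 0.765, 0.560),
    ("Small int.", 0.776, 0.578),
    ("Smooth muscle", 0.677, 0.501),
    ("Tonsil ger.", 0.667, 0.503),
    ("Tonsil squam.", 0.718, 0.334),
    ("Urot. blad.", 0.760, 0.450),
    ("Uterus post-m.", 0.763, 0.578),
    ("Uterus pre-m.", 0.785, 0.507),
]


def local_closer_everywhere(rows):
    """True if q_FBA_local exceeds q_FBA_global for every cell type."""
    return all(q_local > q_global for _, q_local, q_global in rows)


def smallest_gap(rows):
    """Cell type with the smallest difference q_FBA_local - q_FBA_global."""
    return min(rows, key=lambda row: row[1] - row[2])[0]


print(local_closer_everywhere(OVERLAPS))  # True
print(smallest_gap(OVERLAPS))             # Lung macroph.
```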
Acknowledgments
This work is supported by the DREAM Seed Project of the Italian Institute of Technology (IIT). The IIT Platform Computation is gratefully acknowledged.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Price, N.; Famili, I.; Beard, D.; Palsson, B. Extreme pathways and Kirchhoff's second law. Biophys. J. 2002, 83, 2879–2882, doi:10.1016/S0006-3495(02)75297-1.
- Soh, K.; Hatzimanikatis, V. Network thermodynamics in the post-genomic era. Curr. Opin. Microbiol. 2010, 13, 350–357, doi:10.1016/j.mib.2010.03.001.
- Beard, D.; Babson, E.; Curtis, E.; Qian, H. Thermodynamic constraints for biochemical networks. J. Theor. Biol. 2004, 228, 327–333, doi:10.1016/j.jtbi.2004.01.008.
- Hoppe, A.; Hoffmann, S.; Holzhütter, H. Including metabolite concentrations into flux balance analysis: Thermodynamic realizability as a constraint on flux distributions in metabolic networks. BMC Syst. Biol. 2007, 1, 23:1–23:12.
- Qian, H.; Beard, D. Thermodynamics of stoichiometric biochemical networks in living systems far from equilibrium. Biophys. Chem. 2005, 114, 213–220, doi:10.1016/j.bpc.2004.12.001.
- Beard, D.; Liang, S.; Qian, H. Energy balance for analysis of complex metabolic networks. Biophys. J. 2002, 83, 79–86, doi:10.1016/S0006-3495(02)75150-3.
- Palsson, B.O. Systems Biology: Properties of Reconstructed Networks; Cambridge University Press: Cambridge, NY, USA, 2006.
- Cornish-Bowden, A. Fundamentals of Enzyme Kinetics; Wiley-Blackwell: Weinheim, Germany, 2013.
- Ge, H.; Qian, M.; Qian, H. Stochastic theory of nonequilibrium steady states. Part II: Applications in chemical biophysics. Phys. Rep. 2012, 510, 87–118, doi:10.1016/j.physrep.2011.09.001.
- Frey, E.; Kroy, K. Brownian motion: A paradigm of soft matter and biological physics. Ann. Phys. 2005, 14, 20–50, doi:10.1002/andp.200410132.
- Beg, Q.; Vazquez, A.; Ernst, J.; de Menezes, M.; Bar-Joseph, Z.; Barabási, A.L.; Oltvai, Z.N. Intracellular crowding defines the mode and sequence of substrate uptake by Escherichia coli and constrains its metabolic activity. Proc. Natl. Acad. Sci. USA 2007, 104, 12663–12668, doi:10.1073/pnas.0609845104.
- De Martino, A.; Marinari, E. The solution space of metabolic networks: Producibility, robustness and fluctuations. J. Phys. Conf. Ser. 2010, 233, 012019, doi:10.1088/1742-6596/233/1/012019.
- Schrijver, A. Theory of Linear and Integer Programming; Wiley: New York, NY, USA, 1986.
- Orth, J.; Thiele, I.; Palsson, B.O. What is flux balance analysis? Nat. Biotechnol. 2010, 28, 245–248, doi:10.1038/nbt.1614.
- Segrè, D.; Vitkup, D.; Church, G. Analysis of optimality in natural and perturbed metabolic networks. Proc. Natl. Acad. Sci. USA 2002, 99, 15112–15117, doi:10.1073/pnas.232349399.
- Jankowski, M.; Henry, C.; Broadbelt, L.; Hatzimanikatis, V. Group contribution method for thermodynamic analysis of complex metabolic networks. Biophys. J. 2008, 95, 1487–1499, doi:10.1529/biophysj.107.124784.
- Fleming, R.; Thiele, I.; Nasheuer, H. Quantitative assignment of reaction directionality in constraint-based models of metabolism: Application to Escherichia coli. Biophys. Chem. 2009, 145, 47–56, doi:10.1016/j.bpc.2009.08.007.
- Kümmel, A.; Panke, S.; Heinemann, M. Systematic assignment of thermodynamic constraints in metabolic network models. BMC Bioinforma. 2006, 7, 512:1–512:12.
- Alberty, R.A. Thermodynamics of Biochemical Reactions; Wiley: Hoboken, NJ, USA, 2003.
- Schellenberger, J.; Lewis, N.; Palsson, B.O. Elimination of thermodynamically infeasible loops in steady-state metabolic models. Biophys. J. 2011, 100, 544–553, doi:10.1016/j.bpj.2010.12.3707.
- Henry, C.; Broadbelt, L.; Hatzimanikatis, V. Thermodynamics-based metabolic flux analysis. Biophys. J. 2007, 92, 1792–1805, doi:10.1529/biophysj.106.093138.
- Müller, A.; Bockmayr, A. Fast thermodynamically constrained flux variability analysis. Bioinformatics 2013, 29, 903–909, doi:10.1093/bioinformatics/btt059.
- Beard, D.A.; Qian, H. Chemical Biophysics; Cambridge University Press: Cambridge, NY, USA, 2008.
- De Martino, D.; Figliuzzi, M.; de Martino, A.; Marinari, E. A scalable algorithm to explore the Gibbs energy landscape of genome-scale metabolic networks. PLoS Comp. Biol. 2012, 8, e1002562, doi:10.1371/journal.pcbi.1002562.
- De Martino, D. Thermodynamics of biochemical networks and duality theorems. Phys. Rev. E 2013, 87, 053108, doi:10.1103/PhysRevE.87.053108.
- Johnson, D.B. Finding all the elementary circuits of a directed graph. SIAM J. Comput. 1975, 4, 77–84, doi:10.1137/0204007.
- Wright, J.; Wagner, A. Exhaustive identification of steady state cycles in large stoichiometric networks. BMC Syst. Biol. 2008, 2, 61:1–61:9.
- Wiback, S.; Famili, I.; Greenberg, H.; Palsson, B.O. Monte Carlo sampling can be used to determine the size and shape of the steady-state flux space. J. Theor. Biol. 2004, 228, 437–447, doi:10.1016/j.jtbi.2004.02.006.
- Price, N.; Schellenberger, J.; Palsson, B.O. Uniform sampling of steady-state flux spaces: Means to design experiments and to interpret enzymopathies. Biophys. J. 2004, 87, 2172–2186, doi:10.1529/biophysj.104.043000.
- Mezard, M.; Montanari, A. Information, Physics, and Computation; Oxford University Press: Oxford, NY, USA, 2009.
- Feist, A.; Henry, C.; Reed, J.; Krummenacker, M.; Joyce, A.; Karp, P.; Broadbelt, L.; Hatzimanikatis, V.; Palsson, B.O. A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol. Syst. Biol. 2007, 3, 121:1–121:18.
- Thiele, I.; Swainston, N.; Fleming, R.M.T.; Hoppe, A.; Sahoo, S.; Aurich, M.K.; Haraldsdottir, H.; Mo, M.L.; Rolfsson, O.; Stobbe, M.D.; et al. A community-driven global reconstruction of human metabolism. Nat. Biotechnol. 2013, 31, 419–425, doi:10.1038/nbt.2488.
- Schellenberger, J.; Que, R.; Fleming, R.M.T.; Thiele, I.; Orth, J.D.; Feist, A.M.; Zielinski, D.C.; Bordbar, A.; Lewis, N.E.; Rahmanian, S.; et al. Quantitative prediction of cellular metabolism with constraint-based models: The COBRA Toolbox v2.0. Nat. Protoc. 2011, 6, 1290–1307, doi:10.1038/nprot.2011.308.
- Shlomi, T.; Cabili, M.N.; Herrgård, M.J.; Palsson, B.O.; Ruppin, E. Network-based prediction of human tissue-specific metabolism. Nat. Biotechnol. 2008, 26, 1003–1010, doi:10.1038/nbt.1487.
- Krauth, W.; Mezard, M. Learning algorithms with optimal stability in neural networks. J. Phys. A 1987, 20, L745–L752, doi:10.1088/0305-4470/20/11/013.
- Binder, K.; Heermann, D.W. Monte Carlo Simulation in Statistical Physics; Springer: Heidelberg, Germany, 2002.
- De Martino, A.; de Martino, D.; Mulet, R.; Uguzzoni, G. Reaction networks as systems for resource allocation: A variational principle for their non-equilibrium steady states. PLoS One 2012, 7, e39849, doi:10.1371/journal.pone.0039849.
- Schilling, C.H.; Letscher, D.; Palsson, B.O. Theory for the systemic definition of metabolic pathways and their use in interpreting metabolic function from a pathway-oriented perspective. J. Theor. Biol. 2000, 203, 229–248, doi:10.1006/jtbi.2000.1073.
- Mahadevan, R.; Schilling, C. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab. Eng. 2003, 5, 264–276, doi:10.1016/j.ymben.2003.09.002.
© 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).