Counting and correcting thermodynamically infeasible flux cycles in genome-scale metabolic networks

Thermodynamics constrains the flow of matter in a reaction network to occur through routes along which the Gibbs energy decreases, implying that viable steady-state flux patterns should be void of closed reaction cycles. Identifying and removing cycles in large reaction networks can unfortunately be a highly challenging task from a computational viewpoint. We propose here a method that accomplishes it by combining a relaxation algorithm and a Monte Carlo procedure to detect loops, with ad hoc rules (discussed in detail) to eliminate them. As test cases, we tackle (a) the problem of identifying infeasible cycles in the E. coli metabolic network and (b) the problem of correcting thermodynamic infeasibilities in the Flux-Balance-Analysis solutions for 15 human cell-type specific metabolic networks. Results for (a) are compared with previous analyses of the same issue, while results for (b) are weighed against alternative methods to retrieve thermodynamically viable flux patterns based on minimizing specific global quantities. Our method on one hand outperforms previous techniques and, on the other, corrects loopy solutions to Flux Balance Analysis. As a byproduct, it also turns out to be able to reveal possible inconsistencies in model reconstructions.

Starting from the discovery by Lavoisier concerning the relation between respiration and combustion, thermodynamics stands as a key physical framework for understanding metabolism and physiology, from single cell to whole organisms.When applied to a given metabolic reaction network, at the simplest level, thermodynamics requires that, in nonequilibrium steady states, fluxes of matter proceed downhill in the underlying Gibbs (free) energy landscape.Violations of this rule (which corresponds to nothing but the second law of thermodynamics) are signaled by the existence of unphysical cycles in flux configurations [1].In the current era of metabolic genome-scale reconstructed networks, the implementation of such a constraint in computational models of a cell's metabolism has far-reaching implications [2], ranging from the physical feasibility of flux configurations [3] to the estimation of metabolite levels [4], the assignment of directionality for reactions and pathways [5], and the characterization of the overall chemical energy balance [6].Accounting for thermodynamics in genome-scale models, however, poses considerable practical problems both for algorithms and for CPU costs.
The reference modeling scheme that we shall consider here is given by the so-called constraint-based models [7], widely employed in the literature to descibe the operation of a biochemical reaction network at steady states with timeindependent metabolite levels.While building a detailed model of metabolism presupposes knowledge of the kinetic parameters and reaction mechanisms [8], and should possibly take into account stochasticity [9] and spatial diffusion [10,11], constraint-based models focus on well-mixed nonequilibrium steady states (NESSs) for the reaction fluxes, to recover which the fundamental information comes from the underlying stochiometry alone.For a given stoichiometric matrix S = {S mr } that accounts for the stoichiometric coefficient of metabolite m in reaction r (with the usual sign convention to distinguish products from substrates), a flux vector v = {v r } represents a non-equilibrium steady state if it enforces the balance of metabolite levels c = {c m }, i.e. if ċ ≡ Sv = 0 . ( In usual applications, physiological aspects constrain fluxes to vary with certain ranges, so that bounds of the type v r ∈ [v min r , v max r ] are normally prescribed for every reaction r.Such bounds may reflect, for instance, the fact that certain processes are known to be physiologically irreversible (e.g.v r ≥ 0) or are required to occur at precise rates (as can be the case for maintenance reactions).From a geometric point of view, under (1) and the bounds on fluxes, the space of possible NESSs is represented by a convex polytope.If all flux configurations inside this volume could be considered as physically realizable solutions, one might assess the 'typical' productive capabilites of the network by sampling them using a controlled algorithm [12].Unluckily this route often turns out to be computationally too expensive for large enough systems.Alternatively, one may search for the state(s) that maximize the value of certain biologically motivated objective functions, which can usually be cast in the form of a linear combination of fluxes that represents the selective production of a given set of metabolites.The flux configurations that maximize such a linear functional can be retrieved with the methods of linear programming [13], the textbook case being growth yield maximization for bacterial cells in culture.Such a framework, known as Flux Balance Analysis (FBA) [14], has been shown to be predictive in many instances, even under genetic and/or environmental perturbations [15] (possibly with small modifications).
Solutions of (1) are in general not guaranteed to be thermodynamically viable.Frameworks like FBA can however be modified to include thermodynamic constraints directly in order to generate thermodynamically viable flux configurations, for instance by resorting to empirical data to estimate the chemical potentials of metabolites [16] and infer reaction reversibility more precisely [17,18].As a matter of fact, a large part of thermodynamic inconsistencies appear to be due to fallacious direction assignments.Models of this type, however, require prior biochemical information that is often scarce or unavailable [19].To overcome these difficulties, new methods were devised that detect infeasible loops leveraging only on the constraint based model, i.e. on the structure on the metabolic network alone [20][21][22].Although these methods release the need for experimental knowledge, the direct detection of infeasible loops is a computationally demanding task that limits their applicability.It is therefore important to devise algorithms that are able to identify and remove thermodynamic inconsistencies from solutions of (1) or, more generally, from generic flux patterns.
Checking thermodynamic feasibility of a flux pattern can be made straightforward.Denoting by v a flux vector, from which we exclude uptakes and every reaction that cannot be associated directly to a thermodynamic constraint (like biomass production, "effective" reactions with non-integer stoichiometry, or null fluxes), let us define the matrix Ω = {Ω mr } with elements Ω mr = −sign(v r )S mr .Note that the minus sign in the definition of Ω is needed to connect the directions of the reactions to the corresponding Gibbs energy differences.Thermodynamic feasibility of v is easily seen to be guaranteed (see the Supporting Text for a toy example) if a non-zero vector µ = {µ m } (of chemical potentials) exists such that [23] µΩ > 0 . ( (Note that, for physiologicaly realism, one would like the individual µ m 's to lie in specific ranges.As we shall not be concerned here with reconstructing the cellular Gibbs energy landscape [24], this aspect will be neglected in what follows.)Solving ( 2) can be done very efficiently, e.g., via relaxation algorithms [13].By Gordan's theorem of the alternatives (see e.g.[25]), if (2) has no solution, then necessarily its dual sys-tem with k = {k r } possesses at least one non-zero solution with k r ≥ 0 for each r.It is easy to understand that such vectors k represent closed cycles of reactions that could in principle be able to perform work without using free energy, contradicting the laws of thermodynamics (note however that the topology of such cycles may turn out to be remarkably complex, see e.g.[24]).The problem posed by thermodynamics can then be seen as that of identifying and removing such loops.Finding all cycles in a directed (bipartite) network is, at heart, an integer programming problem in the NP-hard class [26], which suggests that using deterministic algorithms to find loops in large enough networks may be unwise.However, for networks in which reliable prior thermodynamic information is available the complexity of loop counting can be significantly reduced, and indeed in some cases the problem has already been tackled (altough, in our view, not fully solved) in genome-scale networks with some degree of success [24,27].By contrast, in large networks lacking detailed thermodynamic information, like the human metabolic networks, the implementation of thermodynamic constraints requires the development of algorithms that are able to handle more difficult instances of the loop counting problem.Luckily, in many hard computational problems where the use of exact algorithms is prevented by CPU costs, stochastic methods have proved to be effective.A biologically relevant case is represented by the problem of sampling solutions of (1), for which Monte Carlo [28,29] and message-passing techniques [30] are being employed instead of deterministic methods (as the latter presuppose the enumeration of a possibly exponential number of vertices of the polytope).It is simple to guess that a similar strategy might be employed (as we shall see, with some care) for the analysis of the solutions of (3), i.e. to identify reaction cycles.
The strategy we present here combines a relaxation algorithm and a Monte Carlo method to allow for the thorough analysis of thermodynamic infeasibilities on genome-scale metabolic networks of unprecedented size.More precisely, loops will be found by applying Monte Carlo to (3) with a reduced search space obtained by analysing how relaxation behaves when applied to (2).Once a loop is found, it can be removed in several ways, provided they don't violate any of the constraints other than thermodynamic (e.g.mass balance).We shall discuss and compare different approaches: more precisely, a 'local' rule that exploits, in essence, the fact that fluxes in cycles are defined up to a constant, and a 'global' rule, based on the minimization of an overall function of the fluxes.The method will be used to analyze different types of networks of large size.Specifically, we shall first identify all loops in the metabolic network of E. coli [31], then focus on amending the FBA solutions of 15 different human metabolic network models derived from the genome-scale reactome Recon-2 [32], all bearing a specified objective function.Such solutions turn out to be rich with infeasible cycles, which we are able to find and correct.
The structure and rationale of the method we propose are discussed in detail in Section 2, together with a brief summary of the network reconstructions we shall employ.Section 3 exposes our results, while our conclusions are reported in Section 4.

A. Materials: metabolic network reconstructions
The human reactome Recon-2 [32] has been reconstructed by a community that merged and integrated existing global human metabolic networks and transcriptional information on specific human cell types.Authors verified the quality of Recon-2 by determining how many tasks the network was able to perform.A task can be as simple as the transformation of a metabolite by a single enzyme or by a complex pathway -like fermentation or oxydative phosphorylation-or as complex as the production of the building blocks, energy, cofactors, etc. required for cell duplication, i.e. biomass.For E. Coli, and in general for unicellular organisms, biomass yield is a valuable objective function for the FBA framework [33], since its maximization essentialy equals growth maximization at fixed nutrient intake.Although it is unlikely that, in normal circumstances, cells in a multicellular organism maximize the biomass yield, we stick to it as the FBA objective function as, for our purposes, the objective function can be seen merely as a tool to obtain motivated flux patterns for thermodynamic analysis.
In addition to the global reconstruction, [32] provides a collection of 65 drafts of cell-specific networks, which are derived from Recon-2 by means of an automatic procedure that utilizes proteomic data [33,34].We focused on 15 networks with the ability to produce biomass, which we list in the first column of Table I together with the number of reactions (N ) and metabolites (M ) included in each case.We have used in particular the network reconstructions in SBML format from [32] and resorted to the COBRA Toolbox [33] for the FBA analysis and to produce the Stoichiometric matrix and the list of metabolites and reactions to be used in our analysis.
We have also analyzed the reconstructed metabolic network of the bacterium E. coli derived in [31], consisting of 2382 reactions (including 305 uptakes) among 1668 metabolites.In this model, 548 reactions are putatively reversible.
In each case, the key information we employed is encoded in the stoichiometric matrix S.

Algorithm for thermodynamic analysis: structure
Before describing the algorithm in detail, we briefly recall the idea behind the procedure.We do not directly assess whether the flux configuration is loop free, but we try to compute the chemical potentials that satisfy Eq. ( 2), which is a computationally easier problem to solve.If such a a solution exists, we are guaranteed that the flux configuration does not contain infeasible loops.It can be demonstrated [24] that the relaxation method described below always converges polynomially to a solution, lack of convergence signals the presence of infeasible loops.Even when the relaxation method does not converge, it still provides us with a list of reactions that are likely to contain infeasible loops.To this limited subset of reactions, we can directly apply algorithms to find loop-free solutions.
The overall structure of the algorithm is reported in Fig. 1. (A) Input: the input information includes a stoichiometric matrix S, a flux vector v (e.g. a solution of FBA) and a prior vector µ of chemical potentials.Initialize an integer variable t at t = 0 (relaxation steps) and an empty list.

Input
(B) Compute the matrix Ω and evaluate the thermodynamic constraints (2), i.e. compute µΩ.If they are satisfied, i.e. if µΩ > 0, go to (D); else register the least unsatisfied constraint (l.u.c.), i.e. the value of the index r for which the corresponding components of the vector µΩ is smallest (more negative), insert it into the list, and increase the t variable by 1 ; if t ≤ T , with T a predefined large parameter, go to (C.1); else go to (C.2).
(C.1) Update the vector µ by performing a single step of the relaxation algorithm described in Sec.II B 2; update the list by inserting the new l.u.c. and go back to (B).
(C.2) Perform a Monte Carlo computation, as described in Sec.II B 3, in order to find a solution of system (3), namely Ωk = 0, including only the reactions appearing in the list.Once a solution is found, correct the associated cycle as described in Sec.II B 4 and II B 5; re-initialize t, empty the list, and go back to (B).
In the following sections, we shall describe the subprocedures (relaxation method, Monte Carlo and cycle removal) of the algorithm in detail.A C++ code which performs each of the above steps is provided as Supporting Material.It is worth to point out that the present study is not concerned with the calculation of realistic chemical potentials.Rather, we simply require their existence in order for the flux configuration to be feasible.In this case, the prior vector of chemical potentials can be arbitrary, e.g.constant or composed by i.i.d.random variables.If complemented with specific experimentally determined or computationally estimated priors for the chemical potentials, however, the relaxation method included in the above algorithm generates, as a by-product, a free energy vector compatible with the final flux configuration and can thus be employed to refine experimental data on the free energy of formation of metabolites and/or on their levels [24].Better priors ultimately allow to obtain more precise estimates for the real chemical potentials, but are essentially irrelevant for the convergence of the relaxation method.In what follows we shall neglect this aspect, which is discussed in depth in [24], and focus exclusively on the retrieval of cycles.

Checking thermodynamic viability by relaxation
This routine, corresponding to point C.1 of the flow chart, allows to retrieve a solution of (2) starting from a vector of chemical potentials that is not a solution thereof.For simplicity, we construct an initial vector made of uniformly distributed random numbers.At any step t of the procedure, given a chemical potential vector µ(t), the relaxation algorithm corrects the l.u.c. of (2) through the dynamics defined by where r t is the index of the l.u.c.added to the list at the t-th step and α > 0 is a constant.As explained in [24], the above step simply shifts chemical potentials in a direction that will improve the l.u.c.The parameter α can be chosen in different ways, from a suitably small constant (as in the so-called Minover scheme [35]) to a quantity proportional to the amount by which the constraint is violated (as in the Motzkin scheme [13]).The Minover scheme returns a solution that is tipically closer to the prior, but here we utilized the faster Motzkin scheme.The above algorithm is known to converge to a solution of (2), upon iteration, in polynomial time if and only if a solution exists.If convergence fails, instead, by the Gordan theorem the reaction pattern contains infeasible cycles.Convergence of relaxation therefore guarantees feasibility of a flux vector.In presence of loops, an iteration of the above dynamics tipically cycles among inconsistent constraints.Therefore, by keeping track of the l.u.c.over the iterations, i.e. by recording the the series {r t } for t ≥ 0 (corresponding to the content of the list described in the flow chart), one can build a list of reactions that are candidates for being responsible for the failed convergence.If relaxation doesn't converge to a solution in a resonable time (denoted as T above), we look for infeasible cycles, i.e. for solutions of (3), within such a restricted list.To this aim, we employ a Monte Carlo method.

Identifying loops by Monte Carlo
As said above, cycles generically correspond to solutions of (3) with k ≥ 0. As the stoichiometric coefficients are typically integers, one can focus on searching solutions with k r non-negative integers for each r.To this aim, the following method (borrowed from the standard statistical physics toolbox) can be employed.Starting from (3), note that the func-tion vanishes when k defines a flux cycle.Because E ≥ 0, then, infeasible loops correspond to the minima of E, and loop finding amounts to locating the minima of E in the search space k r ∈ {0, 1, 2, . ..} for each r.Monte Carlo methods are ideally suited to tackle this type of problems [36].In brief, such methods (the most famous of which is possibly the Metropolis scheme) generically generate vectors k distributed according to where β > 0 is an externally fixed parameter.When β → ∞, the above measure concentrates around the minima of E. One possibility to make sure that large enough values of β are reached is to initialize the Monte Carlo simulation at some small value of β and then increase β in a controlled way, initializing each time from the configuration retrieved at the previous value of β ('simulated annealing').This is precisely the approach we have employed here: in order to identify the infeasible loops, we have performed iterated Metropolis-based annealings to minimize the fictitious 'energy' (6).(The increase in performance warranted by the annealing procedure compared to the simple Metropolis scheme at a fixed temperature is discussed in the Supporting Text.)It is worth noting that, based on the above discussion of the relaxation method, the number of reactions to be included in the above procedure equals the number of distinct reactions appearing in the list, which is usually much smaller than N .To give an idea, in the study of E. coli whose results are reported below, our lists ended up containing at most 50 reactions, to be compared with the over 2000 that form the genome-scale reconstruction.Hence the computational costs of the Monte Carlo step of our algorithms are overall modest.

Correcting the flux configuration: local strategy
Once a flux cycle has been identified, there are mutiple ways to remove it and re-organize the flux pattern while still preserving all constraints and, eventually, the values of objective functions.
To clarify the situation, consider the following simple example with four reactions, pictured in Figure 2. v 1 and v 2 are "internal" fluxes, while v 3 and v 4 are an intake and an outtake flux, respectively.The stoichiometric matrix S and the flux vector v reads: With this formalism, the mass balance constraints (1) are given by the homogeneous equation Sv = 0. We also fix the value of the uptakes to, for example, v 3 = v 4 = 2, and add a lower bound on the first flux, v 2 ≥ (with a constant).After the elimination of the uptakes, the internal stoichiometric matrix and flux vector (which for clarity in this section we denote as S int and v int ) are given by Since we fixed the values of v 3 and v 4 , we can move them to the r.h.s. of the mass balance constraints, obtaining Equation ( 10) must be solved together with the constraint v 2 ≥ , which we will treat separately.Given a solution v of (10), any linear combination of the form still satisfies the mass balance equations (10), provided n a is in the right null space of the internal stoichiometric matrix, i.e. is a solution of S int n a = 0, and L = {L a } is a family of real numbers.If we now recall that loops k are non negative solutions of Eq. (3), i.e. of Ωk = 0, where the matrix Ω is built from S int as Ω mr = −sign(v r )S int mr , we see that, by construction, the relation n a r = sign(v r )k a r connects infeasible loops k a (a = 1, 2, . ..) and the solutions of S int n a = 0.In other terms, the presence of cycles causes a degeneracy in flux patterns, as there are many ways to assign fluxes to reactions in a cycle.
Eq. ( 11) can be used to correct the infeasible loops, while still satisfying all mass balance constraints.The simplest possible correction scheme is based on the idea that by properly fixing the value of the coefficients L a one can lift the degeneracy and rid the flux configuration of loops.There is however a major caveat.To make it explicit, we note that (a) the signs of the fluxes v r depend on the choice of the coefficients L, so that the matrix Ω will also depend on it, and (b) it is not guaranteed that the new fluxes v will vary within the same bounds as v.This means that equation (3) must be solved together with all other constraints which are not related to the stoichiometry, such as sign constraints.We will write these constraints in a general fashion as Av ≥ b.In our example, this matrix inequality reduces to v 2 ≥ .
With this notation, the space of vectors L yielding thermodynamically feasible solutions is given by C 1 ∩ C 2 , where The first set contains all constraints which are not related to stoichiometry, while the second one contains the thermodynamic ones.Now, unluckily, these two sets may or may not have points in common, depending on the properties of the network and on the additional constraints (in other words, it may not be possible to choose L properly).Suppose now that, in our example, we are given the flux vector v = (3, 1) as solution to Eq. (10).We see that the vector n = (1, 1) is in the null space of S int , and that the new vector v = v + Ln still satisfies (10).Also, since both v 1 and v 2 are positive, n itself identifies a loop, i.e. it is a solution of (3) (for, in this case, Ω = −S int ).It can be easily checked that, in this example, as long the constraint v 2 ≥ is not taken into account we can pick L in the interval [−3, −1] to get rid of the cycle.The choice is arbitrary and produces a fully directional flux pattern such that reactions v 1 and v 2 , if both active, operate in the same direction.The constraint v 2 ≥ , however, implies L ≥ − 1.The sets C 1 and C 2 are then given by So we see that if < 0 there are infinitely many values of L which remove the cycle, whereas the cycle cannot be removed if > 0. In other words, each time a flux is constrained to keep a pre-defined sign there is no guarantee that the loops involving this reaction can be corrected by simply lifting the flux degeneracy associated to them.The value = 0 plays here a particular role, since v 2 ≥ 0 is an irreversibility constraint.In this case, the intersection of the two sets above imposes L = −1.This is indeed the most common kind of constraint on the fluxes, and allows for the simplest unambiguous loop removal strategy: setting the smallest flux to zero without changing signs to any of the other fluxes involved (as occurs in the present case upon choosing L = −1).
From this last observation we can deduce the following general loop removal strategy (to which we refer as the 'local' correction strategy): for a given loop k a , choose the value of L a that sets to zero the flux of the reaction whose absolute value is the smallest, i.e.
With this choice, every constraint of the form v r ≥ 0 will still be satisfied, and at least one loop will be removed.This strategy tends, in a sense, to minimize the distance between the original flux pattern and the corrected one.(We will quantify this aspect more precisely below.)On the other hand, if bounds like v r ≥ with > 0 are present, a more careful analysis is required.We finish by noting that the most important instance of a constraint of the latter type in models of metabolism is represented by the ATP maintenance flux.

Correcting the flux configuration: global strategy
Other possible loop-removing procedures are based on the minimization of some norm of the fluxes, as suggested in [37].Let us elaborate this idea further, and consider the function r |v r | p with p ≥ 1, representing, for different p's, different norms of the flux vector v (Q 1 is the so-called 'Taxicab' norm, Q 2 is the square of the Euclidean norm, etc.).Suppose we have an FBA solution v which minimizes Q p .If {n a } is the set of all null space vectors of the stoichiometric matrix, we can construct a new solution v = v + a L a n a and compute the partial derivative of Q p with respect to a coefficient L a .The derivatives vanish when evaluated at L = 0: We see immediatly that the quantities k a r = sign(v r )n a r can not have a definite sign.Restricting the sum to all terms with n a r = 0, we have two cases: 1.If at least one of the fluxes v r is zero, this reaction can not be involved in any cycle.In particular, n a is not associated to a loop.
2. If all fluxes are non-zero, the vector k a cannot have a definite sign (positive or negative) since the sum of its entries, namely (17), weighted with some positive coefficients, is zero.
Therefore, the vector v which minimizes Q p does not contain cycles.The argument can be easily extended to include irreversibility constraints.Let v denote the flux configuration which minimizes Q p (v) with irreversibility constraints v r ≥ 0 for the reactions r beloging to the set I = {r 1 , r 2 , . . .}.We shall instead denote by I 0 = {r 1 , r 2 , . . .} ⊆ I the set of irreversible reactions for which v r = 0 in v .Clearly, v also minimizes Q p subject to the stronger constraints v r = 0 for r ∈ I 0 and v r > 0 for r ∈ I \ I 0 .Given this, one can now proceed along the same lines as before, because, for any vector n a in the null space of S int , • If some reaction r for which n a r = 0 is forced to have zero flux since r ∈ I 0 , then n a is not associated to a cycle; • Otherwise, we can demonstrate that n a does not correspond to a cycle by taking the partial derivative of Q p (v * + L a n a ) as done above.
Problems may arise, as before, when boundary conditions like v r ≥ with > 0 have to be considered.In particular, if the flux of a variable thus bounded is fixed to take the value there is the possibility that the cycle cannot be removed.In the example discussed above (see Figure 2), is still feasible (in particular, the configuration is feasible for = 0); • If > 0, then v 2 = and the optimal flux configuration is not feasible.
In summary, the global minimization of the norm Q p produces thermodynamically feasible flux patterns, provided they are allowed by the constraints.If not, the minimization can get rid of all loops not involving reactions constrained to keep the same sign.We shall term the cycle-removal strategy based on minimizing a norm as the 'global' strategy.

III. RESULTS
A. A test: identifying infeasible loops in the E. Coli network iAF1260 As a proof of principle, we have applied our method to search and enumerate all independent infeasible loops of a large metabolic network reconstruction for the bacterium E. Coli, the iAF1260 [31].Other authors have attempted to solve the same enumeration problem before (see e.g.[27]).We note however that our method is radically different in that we make use of the theorem of alternatives and do not directly search for loops on the graph, which is the more standard route [38], or rely on subsequent optimizations that reduce the search space [27,39].In the present case, we characterize cycles in an ensemble of net-flux patterns generated randomly by assigning a specific operating direction to each reaction according to its reversibility.(More precisely, random flux patterns are generated by simply assigning an operation direction for each reaction as follows: if the reaction is irreversible, we pick the allowed direction; if the reaction is reversible, we select the forward or the reverse direction randomly with probability 1/2 (note that direction assignments suffice to pose the problem of thermodynamic feasibility).In this way, all reactions are active, a worst-case scenario with respect to a growth-yield optimizing state that normally only requires the operation of around 30% of the reactions, implying (in our case) a much larger number of loops and, in principle, higher computational costs for loop counting.For each configuration, we look for and eliminate cycles until the material flow is thermodynamically consistent, recording the cycles that we have detected.Finally, we keep only independent loops by applying Gaussian elimination (i.e., we exclude from our list loops that can be decomposed as the sum of, say, two simpler loops).
In Fig. 3, we display the number of independent loops that we identify as a function of the number of random configurations tested.We identify 196 loops (189 of which turn out to be of size three or more) after having generated about 80,000 random configurations, and no new loops appear upon enlarging the test ensemble.The loops thus found are listed in the Supporting File 1, and a histogram of the cycle lengths (in terms of the number of reactions involved) is displayed in Fig- ure 4. We note that, in [27], 591 cycles were identified, 564 of which are however formed by 2 reactions, mostly originating from the fact that reversible reactions were, in that study, split in two separate processes (forward and reverse).Therefore, only 27 of those cycles were formed by 3 reactions or more.Because we do not split reversible reactions, we find only 7 cycles of length 2 and 189 loops of length at least equal to 3. We note that these 189 cycles span 396 reactions altogether.This suggests that, for the technique employed in [27], some loops were undetectable once the exhaustive search had been restricted to 50 reactions.We stress however that the proce- dure discussed in [27] is in principle exact and once the restriction is removed it might be able to identify more cycles involving at least three reactions.We now move on to the identification of thermodynamic infeasibilities in the human reactome Recon-2 [32].In specific, we have analyzed the feasibility of flux patterns defined by solving FBA on the entire reactome, using the 'biomass' reaction that comes with the reconstruction as the objective function.This section provides a concrete example of an incon- sistency that is unrecoverable without correcting basic structural information concerning the network.It should be kept in mind, however, that the physiologically relevant metabolic networks that can be obtained from Recon-2 are the cell-type specific ones, which will be discussed in the following section.
As almost all metabolic objective functions, the biomass reaction of Recon-2 contains ATP hydrolysis, representing the energetic requirements associated to cell duplication that are not explicitly accounted for by the flux organization.As such requirements are typically large, the stoichiometry of ATP in the biomass reaction is often two orders of magnitude larger than that of the other chemical species.Hence, ATP tends to be the limiting factor for biomass production and FBA solutions will often organize metabolic fluxes so as to produce as much ATP as possible.This however turns out to lead, in Recon-2, to a violation of thermodynamics.In particular, in the FBA solution for Recon-2 we detect a huge number of cycles involving the active and passive transport of a metabolite through a membrane, as e.g. for the transport of Stearoyl-CoA (stcoa) from cytosol (c) to peroxisomes (x), namely (see Figure 5) Note that both reactions are listed as reversible.When the chemical potential difference drives stcoa from peroxisome to cytoplasm, the cell can actively transport Stearoyl-CoA to peroxisomes by consuming ATP.By reverting both reactions, however, the cell could produce ATP at no expense.This is precisely the type of solution that we obtain when we maximize the biomass yield in Recon-2.ATP-coupled reactions are a common, though not the only, source of thermodynamic inconsistencies that can be spotted in Recon-2 (see Supporting File 2 for the complete list of cycles we identified in the Recon-2-derived cell-type specific networks).It is however important to stress that they are spu-rious and may be identified easily by complementing Recon-2 with a maintenance reaction that mimics the energy expenditure associated with basal processes (similar to those that are present in bacterial metabolic networks), and even cured automatically (or with an automated procedure) by fixing the directionality of active transports directly in the reconstruction (when possible).
C. Correcting infeasible loops in FBA solutions for cell-type specific human metabolic networks In this section, we focus on finding and correcting infeasible loops in FBA solutions of the cell-type specific human metabolic networks obtained by Recon-2.We have restricted our attention to 15 networks carrying an objective function, representing respectively cerebral cortex neuronal cell, liver bile duct cell, cervix uterine squamous epithelial cell, kidney tubule cell, gall bladder cell, lung macrophage, small intestine glandular cell, rectum glandular cell, smooth muscle cell, urinary bladder urothelial cell, pre-and post-menopause uterus glandular cell, pancreatic exocrine glandular cell, tonsil germinal cell and squamous epithelial cell.We first computed the FBA solutions for each of the networks via the CO-BRA Toolbox [33][40].Subsequently, we identified infeasible loops using the method described in Section II B 3 and, finally, corrected thermodynamic inconsistencies using both the local and global strategies described in Sections II B 4 and II B 5 (in the latter case, minimizing the Q 1 norm, while fixing sinks, uptakes, and objective function to the values of the FBA solution.)In particular, with the local strategy we eliminate one infeasible loop at a time making sure that no constraint is violated by the corrected solutions, including the value of the objective function.We note however that the local strategy does not return a unique thermodynamically consistent network, since the final flux pattern may depend on the order with which loops are removed.We shall see that, quite generically, this strategy produces flux patterns that are more similar to the original (infeasible) solutions than those generated by the global correction strategy.
Results are shown in Table I, where we list different topological quantities (specifically, the overall number of reactions and metabolites, and the number of reactions carrying a nonzero flux and that of metabolites that are produced and consumed by at least one reaction) for the original (infeasible) FBA solutions and for the corrected flux patterns, both for the local and global strategies, as well as the number of loops in the FBA solutions that need to be corrected by the local strategy.One sees that the local strategy typically needs to resolve several hundreds of inconsistencies in order to obtain viable solutions, and that correction strategies enforce a reduction in the number of active processes, that in certain cases can be rather dramatic.Supporting File 2 lists the cycles we identified and corrected in each of the 15 metabolic networks we have analyzed.To quantify more precisely the similarity between the solutions thus obtained, we have measured the 'overlap' parameter defined as follows: given two flux configurations v a = {v a r } and v b = {v b r }, we let Clearly, q ab = 1 if v a = v b , while the more different fluxes are in the two solutions the smaller q ab gets, until q ab = −1 if v a = −v b .Larger values of q ab therefore generically point to the fact that the two flux vectors are more similar also in terms of their directions.(Note that in computing (19) one should account for the fact that a flux that is null in both solutions contributes 1 to the above sum [41].)Values of the overlaps between the three solutions we consider (original FBA, FBA corrected by the local strategy, FBA corrected by the global strategy) are also displayed in Table I, clarifying that the local strategy applied to our sample always generates flux patterns that are closer to the original (infeasible) solution than those obtained by the global strategy.Nevertheless, the overlap between the locally-and globally-corrected solutions can also be rather large in some cases, suggesting that a common physical, possibly variational, requirement may underlie, to some extent, the two criteria.
The final column of Table I shows the sign of the Gibbs energy change of ATP hydrolysis that is obtained in the solution corrected by the global strategy.This provides an interesting check of physiologic consistency, as solutions should be compatible with a spontaneous ATP hydrolysis in vivo (i.e. with a negative Gibbs energy difference).We find that only for five models does Q 1 minimization provide (thermodynamically feasible) flux configurations carrying a negative Gibbs energy difference for ATP hydrolysis.A possible, simple to obtain improvement of the method we present indeed includes taking into account physiological aspects when correcting a flux configuration.We stress once more, however, that these types of infeasibilities are due to inconsistent constraints or wrong reversibility assignments that prevent the existence of feasible, energetically realistic flux patterns, and can be eliminated already at the stage of network reconstruction.Our main goal here was to show that our method is capable of identifying and correcting loops.By this type of examples we prove that it can furthermore point to possible limitations of the current models.

Accounting
for thermodynamic constraints in stoichiometry-based flux models, though potentially highly rewarding (in terms of the possibility to predict metabolite levels, chemical potentials, reaction free energies and reversibility), is a generically hard task.Methods that integrate directly with the constraints defining the space of viable fluxes are often computationally intensive and either presuppose prior biochemical knowledge or lead to a considerable increase in the number of parameters (or both).The technique presented here makes use of stoichiometry alone (hence, it is essentially a topological method) and allows to accomplish two goals: on one hand, counting and listing the infeasible reaction cycles that spur flux configurations derived from thermodynamics-free models; on the other, correcting such infeasibilities in a physically motivated manner.Indeed, we have first analyzed the genome scale metabolic network reconstruction iAf1260 of the bacterium E. coli.By simply recording the cycles found in randomly generated flux patterns we are able to uncover a much larger set of (much more complex) loops than previously obtained, also involving a much larger overall number of processes, comparing in particular with [27] (in this sense outperforming previously employed methods).In passing, we note that our method comes with a certificate of completeness for the set of cycles, which was previously unavailable.Secondly, after showing that cycles plague FBA solutions for the metabolic networks of several different types of human cells (all retrieved from the human Recon-2 reactome), we have applied our loop-removal strategies in order to obtain thermodynamically viable flux patterns that both preserve the basic constraints of FBA as well as the value of the objective function.In doing so, some inconsistencies in the reconstructions have been identified, that can easily be eliminated at the level of network building.Quite importantly in our view, we have also discussed the possibility to employ global variational criteria to generate thermodynamically feasible flux configurations.In particular, generalizing a previous observation, we have proved that flux patterns that minimize the p-norms of the fluxes are thermodynamically viable, provided they are allowed by the constraints.Otherwise, this idea can be used (with some care) to remove cycles that do not involve reactions that cannot be inverted or silenced.
The work presented here extends and improves over previous studies, and takes several steps to suggest controlled and motivated methods to deal with thermodynamic inconsistencies in large networks of biochemical reactions.Further improvements along the lines discussed above (requiring e.g. more precise physiological constraints) are clearly possible.Most promisingly, however, we believe that work directed at enhancing the integration of thermodynamic constraints into flux analysis would be extremely important in light of the current efforts aimed at increasing the scope, reach and predictive power of computational models of cellular metabolism.In absence of sufficiently detailed biochemical information about metabolite levels in vivo or chemical potentials, general stoichiometry-based techniques must be expected to play a key role in this endeavour.
the flux configuration: local strategy 4 5. Correcting the flux configuration: global strategy 5 III.Results 6 A. A test: identifying infeasible loops in the E. Coli network iAF1260 6 B. Inconsistencies in the FBA solution for the overall human reactome Recon-2 7 C. Correcting infeasible loops in FBA solutions for cell-type specific human metabolic networks 8 I. INTRODUCTION

FIG. 1 :
FIG.1: Flowchart of the algorithm for counting and removing cycles employed in this study.See text for details.

FIG. 2 :
FIG.2: Example of a toy reaction network.The black dots are two metabolites, to each of which corresponds a mass balance constraint.Each line is labeled with the name of the flux carried by a reaction, and the arrow indicates the conventional forward direction of the fluxes.Evidently, a thermodynamically infeasible cycle is present if v1 and v2 have the same sign.

FIG. 3 :FIG. 4 :
FIG. 3: Number of independent loops identified in the metabolic network of E. Coli iAF1260 as a function of the number of random configurations tested.Results are obtained by jackknifing over 10 configurations.The black line represents the average number of loops, while the blue lines represent the extremes of the error bars at each point.

B.
Inconsistencies in the FBA solution for the overall human reactome Recon-2

FIG. 5 :
FIG.5: Typical structure of an infeasible loop created by two reversible transports across a membrane-enclosed compartment: a passive one (by diffusion) and an active one (requiring the expenditure of energy).If the active process is allowed to reverse and the cargo re-enters the cell or compartment via diffusion, an infeasible loop is generated that builds ATP from ADP without energetic costs.