Solving Stochastic Reaction Networks with Maximum Entropy Lagrange Multipliers

The time evolution of stochastic reaction networks can be modeled with the chemical master equation of the probability distribution. Alternatively, the numerical problem can be reformulated in terms of probability moment equations. Herein we present a new alternative method for numerically solving the time evolution of stochastic reaction networks. Based on the assumption that the entropy of the reaction network is maximum, Lagrange multipliers are introduced. The proposed method derives equations that model the time derivatives of these Lagrange multipliers. We present detailed steps to transform moment equations to Lagrange multiplier equations. In order to demonstrate the method, we present examples of non-linear stochastic reaction networks of varying degrees of complexity, including multistable and oscillatory systems. We find that the new approach is as accurate and significantly more efficient than Gillespie’s original exact algorithm for systems with small number of interacting species. This work is a step towards solving stochastic reaction networks accurately and efficiently.


Introduction
The traditional approach to describe chemically reacting systems involves the use of deterministic rate laws. This macroscopic, continuous-deterministic modeling formalism is appropriate at the thermodynamic limit, when the volume of the system and the numbers of molecules of reactants and products all tend to very large values [1]. However, this approach fails to describe molecular populations that are small, finite and countable [2]. Such populations must be described by variables that can only change stochastically and by discrete amounts [3][4][5][6][7][8].
Markov chain models can be used to describe chemical reactions away from the thermodynamic limit [9]. When the system evolves stochastically, the all-encompassing chemical master equation (CME) can model the probability distribution of the system being at a particular state at time t [3,4].
The historical difficulties in solving the CME are well documented in the literature [5,[9][10][11]. The CME is not a single equation but an infinite set of coupled equations. As a result the CME is analytically unsolvable for the majority of systems, with only few exceptions [12,13]. An established way to sample the master probability distribution is to use Gillespie's Stochastic Simulation Algorithm (SSA) [6]. Despite the accuracy of SSA, the method is a kinetic Monte Carlo algorithm and hence computationally expensive, especially for systems with large numbers of molecules, or with reaction kinetics that span multiple time scales [14,15].
A proposed alternative to solve the CME is by calculating the probability moments [3]. Moments are expected values, e.g., the mean and the variance of the number of molecules [5,[16][17][18].

Connect Moments to Lagrange Multipliers
The general form of the moment equations for any arbitrary chemical reaction network is: where µ is the lower-order moment vector (not including the zeroth-order moment, which is always equal to 1), µ is the higher-order moment vector, µ c is a vector of constants and t represents time.
A and A are constant matrices that represent the linear and non-linear components of the network, respectively. In most cases, the vectors µ and µ have different dimensions [20]. For non-linear reaction networks, the µ vector is nonempty. For a chemical reaction network with N reactants and products, the moment µ i is connected to the probability function P through the following expression: (2) where f µ i is the functional form of the ith moment µ i . For example, for a one component system the functional form of the 3rd polynomial moment is f µ 3 = Y 3 , where Y is the system's component. The number of molecules of each component is contained in the matrix X = (X 1 . . . X N ) and Ω corresponds to the N-dimensional state space for all the possible values of (X 1 . . . X N ). The differences between X and Ω becomes clear if one compares Equation (2) and the definition of entropy (S = − ∑ X P(X)logP(X)).
One can connect the probability distribution to Lagrange multipliers λ by maximizing the entropy [22]: where λ j is the jth Lagrange multiplier and M is the number of lower-order moments and also the size of vector µ. Thus, moments can be related to Lagrange multipliers by combing Equations (2) and (3):

Time Derivatives
In Equation (3), only the Lagrange multipliers depend on time; the state space and the functional form of the moments are time-independent. Hence, the time derivative of the probability distribution is: This equation and the definition of moments (Equation (2)) connect the time derivative of the moments to the time derivative of the Lagrange multipliers: Thus, the moments' time derivative is transformed to: where µ i,j is the combined moment of f µ i and f µ j , given by The sum of the probabilities across the whole state space is always 1 by definition. As a result, the zeroth-order moment is 1 (µ 0 = 1) and its time derivative is zero ( ∂µ 0 ∂t = 1). This results in the zeroth Lagrange multiplier λ 0 to be dependent on the rest of Lagrange multipliers Hence, the time derivative of the zeroth Lagrange multiplier depends on the rest of the time derivatives as follows: Based on this relation, Equation (7) can be modified as: We have previously proved [39] that: Hence, the time derivative of one moment is given by (Equations (11) and (12)): (13) or in matrix form: (14) J represents the Jacobian matrix of the system. Equation (14) also represents the chain rule of differentiation. There is now a way to connect the moment equations to equations that involve the Lagrange multipliers. The Lagrange multiplier equations can be derived by Equation (14) and the moment equations (Equation (1)): where J −1 is the inverse of the Jacobian matrix. In the Appendix A, we discuss a computationally efficient way to calculate it. Even though the above equation also includes the moments of the probability distribution, it only depends on the Lagrange multipliers since the moments are directly correlated with Lagrange multipliers (Equation (4)).
All the Lagrange multipliers can be calculated through Equation (15) except the zeroth-order one λ 0 . As discussed earlier, λ 0 depends on the rest of the Lagrange multipliers through the relation [39]: We refer to Equation (15) as the "Lagrange multiplier equations (LMEs)". To our knowledge, this is the first time that LMEs are being discussed and derived in the literature. Through this step, the problem of calculating the probability for each point of the state space of stochastic networks has been transformed into a problem of calculating a finite set of Lagrange multipliers. The system is closed, it has the same number of equations and unknowns, and it is an initial value problem. An iterative numerical method can be used to solve this system. Appendix B outlines a proposed numerical approach to solve LMEs based on the Dormand-Prince Runge-Kutta RK5(4)7M method [32].

Initial Value from Moments to Lagrange Multipliers
The LMEs problem is an initial value problem; the values of the Lagrange multipliers for the initial state are required. However, Lagrange multipliers are numerical variables with no physical meaning and as a result an initial value is unknown. What is known is either the initial probability distribution or the values of its moments. Hence, a way to calculate Lagrange multipliers from moments is required. It should be noted that there is not a universally acceptable way to back calculate probability distributions from moments and the problem is not trivial [40].
Let's assume that the moments, µ (t = 0) of the initial distribution, P (t = 0) are known. If only the initial distribution is known, its moments can be calculated from Equation (2). The moments are related to the Lagrange multipliers by Equation (4). Hence: G denotes the functional form of the matrix with dependences of the Lagrange multipliers. To calculate the Lagrange multipliers at the initial time from the moments, one has to solve a problem of the form: G (λ) − µ = 0. This is an infinite set and in order to be numerically solvable the user should decide the necessary number of lower-order moments (i.e., to specify the closure order [22]). When closure order is specified, the system has the same number of equations and unknowns (the Lagrange multipliers) and a root-finding method such as Newton-Rapshon can be used. The residual of the method is R = G (λ) − µ and the Jacobian matrix [39]. With this approach all the Lagrange multipliers can be calculated other than the zeroth one. The zeroth one can be calculated from Equation (16).
The knowledge of moments is used here to calculate the Lagrange multipliers, which then can be used to calculate the probability distribution based on Equation (3). This approach allows to calculate probability distributions from its moments. The novelty of the method is the connection of the probability distribution with Lagrange multipliers by maximizing the entropy of the system.

Results
To demonstrate the proposed Lagrange multiplier equations (LMEs) approach, we employ four different example networks: the bistable Schlögl model [33], the multistable Wilhelm's system [34], the oscillatory Brusselator [35,36] and the viral infection [37,38]. The reaction networks and kinetic constants for each model are reported in Table 1. To assess the accuracy and computational efficiency of this method, the results are compared to SSA results using the same initial condition and kinetic constants. Table 1. The table shows the reaction networks with their kinetic constants for the bistable Schlögl, multistable Wilhelm, oscillatory Brusselator and viral infection networks. The kinetic constants values and initial condition for each system are also presented. For the third order reactions the kinetic constants are in (molecules 2 · s) −1 units, for the second order in (molecules · s) −1 , for the first order in s −1 and for the zeroth order in molecules · s −1 . For the Schlögl model the initial distribution is bimodal. All the other systems have a unimodal initial distribution. The first and second-order factorial moments of the initial distribution is reported for each network.

Models
Reactions Kinetic Constants First-Order Second-Order For all the systems the initial condition is a distribution generated with SSA, away from the steady state. The first and second-order factorial moments [17] of each initial distribution can be found in Table 1. Using the algorithm in Section 2.2, the initial distribution was transformed into an initial set of Lagrange multipliers. Based on the initial Lagrange multipliers and the Prince-Dormant algorithm outlined in Appendix B, the Lagrange multipliers for every time point were calculated until steady state was reached. Equations (3) and (4) were then used to calculate the probability distribution and factorial moments for each time point based on the calculated Lagrange multipliers.
All the LMEs results obtained with RK5(4)M are compared with SSA results on Table 2, in order to draw conclusions about the accuracy and the time efficiency of the method. The solution for each time point was compared to the one obtained from SSA simulations with 500,000 trajectories. The average difference among all time points between the two methods can be found in Table 2. Results for both the probability distribution and the first two orders of factorial moments are presented. For the system with more than one components (i.e., Wilhelm's, Brusselator and viral infection models), the moments of the same order are averaged. The difference between LMEs results and the SSA probability distribution is calculated with the Kullbalck-Leiber divergence [41], whereas the second norm is used to calculate the errors in the moments.  Table 1. Each SSA simulation used 500,000 trajectories. This number of trajectories is sufficient for the errors for the mean and variance to be at most O(1) molecules and O(1) molecules 2 , respectively. The first three columns show the average error between the two methods, which is calculated for each time point until steady state is reached and then averaged across all of them. For the probability distribution (P(X)), the Kullback-Leibler divergence [41] is used. For the first and second-order moments the second norm is used. The last three columns present the time that is required for each method. Each system has different initial and final time as well as different step size. To take into account these differences, we report the solving time per each method's time step in CPU seconds. The last column presents the ratio of SSA time required over the RK5 (4)

Multistable Systems: The Schlögl and Wilhelm Models
The Schlögl model is one of the most commonly used theoretical bistable systems and is the simplest single-component system that can exhibit bistability [9,34]. Biological systems exhibiting bistability, both natural and synthetic, have gained an increasing interest recently [42][43][44], and the Schlögl model is a simple prototypical bistable model.
For different kinetic constants, the model has either a unimodal or a bimodal distribution. The kinetic constants selected in Table 1 result in a bimodal network. Results for the bistable model are reported on Figure 1. The probability distribution of the model is a bimodal distribution that changes form with time.
Another example of a theoretical system that can exhibit bistability is Wilhelm's network. The network is a non-linear two-component model and it was created by Wilhelm as the smallest known bistable chemical reaction system [34]. Depending on the kinetic constants, both components can exhibit bistability simultaneously which results in a multistable system overall. This is the case for the kinetic constants reported in Table 1. The probability distribution for different time points for the two component of the network can be found in Figure 2. The system starts with a unimodal distribution for both components. As the time progresses, both components have a different bimodal distribution resulting in a mutlistable network. Close to the steady state, both components still have bimodal distributions but different than before.
For these two example systems, the LMEs solutions are as accurate as the SSA results ( Table 2). All the probability and moments errors are lower than 1%. The accuracy of the LMEs approach does not seem to be affected by the number of stability points of the system. There is also no significant difference between the first and second-order moment errors.  [39,45]. For SSA, 500,000 trajectories were used.

Oscillatory System: The Brusselator Model
Oscillatory behaviors are particularly challenging to capture with stochastic models [46]. We have reported earlier how methods that can capture multistable behaviors fail to appropriately capture oscillatory ones [39].
We used the LMEs to investigated the Brusselator as a simple example of a theoretical oscillatory network [36]. We study the network in its oscillatory region as shown in Figure 3. Both network's first-order moments oscillate with respect to time.
As observed, the LMEs can capture the damped oscillatory behavior of the Brusselator. However, the method exhibits limited accuracy for solving this model ( Table 2). The solution by LMEs of the Brusselator network has an increased error in all categories compared to the other networks. The error in both probability and moments reaches up to 10%. In this case, unlike in the other networks, there is a significant increase in the error of the second-order moments compared to the first-order ones. The difference in the accuracy can be associated with the order of closure that is used to generate the LMEs (in this case 4 or M = 14, Figure 3). The accuracy of the Lagrange multipliers approach depends on the number of lower-order moments, M [29]. The higher the value of M, the more accurate results the method can produce. After, a certain value of M, the improvement in accuracy is insignificant. For more information on how the lower-order moments affects probability distributions the reader is directed to the literature [29]. It has been reported that the Brusselator requires more than fourth-order moments for accurate results [46].
A low order is used for this system due to the numerical difficulties of the algorithm that generates the initial condition (Section 2.2). In the LMEs approach, the closure order is specified through the initial condition. This is the number of Lagrange multipliers with known values at the initial time. For the rest of the time points, the number of Lagrange multipliers is the same as the initial time. For this system, the initial probability distribution was created with SSA and then, based on the algorithm described in Section 2.2, it was translated into the initial Lagrange multipliers. However, the algorithm in Section 2.2 faced significant numerical issues for moment orders higher than 4 and we were not able to generate initial conditions for higher lower-order moments. Thus, we did not provide the LMEs approach with the necessary number of lower-order moments that produce accurate results. The high error was generated by the algorithm that calculates the initial condition and not the Runge-Kutta algorithm of the LMEs. The Brusselator results are not ideal; however, given the difficulties of current methods to solve oscillatory systems, this is a step in the right direction.

Multicomponent System: The Viral Infection Model
The LMEs approach is only limited to theoretical systems with complex behavior but can also be applied to natural and synthetic biological networks. One example is the viral infection model. The model was first created by Haseltine [37] as a general model of a cell infection by a virus. The model was then revised and simplified by Goutsias [38]. The network in Table 1 reflects the revised version of Goutsias. Results for the mixed first-order moments (moments than depend on more than one component) are reported in Figure 4. The LMEs approach accurately captures the different dynamics of all components and their combinations throughout the entire simulated time (

Computational Cost of LMEs
Aside from the accuracy of the approach, Table 2 also includes the computational time that is required for RK5(4)M to solve the LMEs. In all the systems, the LMEs approach is significantly faster than SSA. Each model has different computational time steps that are used; in order to take this difference into account, the table shows the time required for one time point per system. For each system, at least 10,000 time steps were used. As observed, the total amount of time that is required for LMEs approach is significantly lower than for SSA. Among all systems, the LMEs approach requires less computational time to solve the two multistable systems (the Schlögl and the Wilhelm's models). The LMEs approach is significantly faster than SSA (at least 4000 times faster, Table 2) and advantageous especially for multistable systems. As for oscillatory dynamics, they do not seem to affect the computational time of the LMEs approach.
The number of components, on the other hand, affects significantly the computational time of the LMEs approach. The Wilhelm's and Brusselator models (both two-component systems) are slower than the Schlögl model (one-component system) and faster that the viral infection model (three-component system).

Dependence on State Space Size
The proposed method is a numerical method to solve stochastic reaction networks. The LMEs include summations across all the possible values of the state space. The state space needs to be specified and be finite in order for the algorithm to perform numerical calculations. This is a known case with numerical methods [22,47] and it can raise concerns about the accuracy of a method, especially in the case of systems with unknown state space. Some ways to calculate the appropriate state space of stochastic networks have been discussed in the literature [11,47].
To evaluate the stability of the Lagrange multipliers approach with the state space size, we use the Schlögl model as an example. The model has one component and thus the connection between state space size and solution accuracy can be drawn easily; yet it has a complex enough bistable behavior. Figure 5 shows results of the LMEs approach for different state space sizes. The solutions are also compared with the SSA results. The accuracy of the method is not affected by the state space size in this example. All different state space lengths provide the same value for the moments of the probability.

Discussion
Stochasticity governs reaction networks away from the thermodynamic limit. A common approach to solve stochastic chemically reacting systems is through moment equations. However, moment equation systems are not closed for non-linear chemical reactions, and require important assumptions about the form of the probability function.
Here, we present an alternative method to solve stochastic reaction networks by using Lagrange multipliers. An approach to transform moment equations into a closed system of Lagrange multiplier equations (LMEs) is described. Since, LMEs is a closed system, Runge-Kutta methods (e.g., Durmand-Prince RK5(4)7M) can be employed to solve the transient problem and calculate the Lagrange multipliers for different time points. With the knowledge of the Lagrange multipliers, one then can calculate the probability distribution and its moments at any given time.
We show that this is a novel approach to bypass numerical challenges with chemical master equations and moment equations. The only assumption used is that the entropy of the system is maximum at all times. The LMEs solutions are as accurate as alternative methods, such as SSA, with significant computational advantages. Four different non-linear reaction networks are employed to demonstrate the advantages of the new LMEs approach. The networks vary in numbers of components (from one to three) and complexity in behavior (including multistability and oscillations). The number of stable solutions of the network does not affect the accuracy of the LMEs approach. On the other hand, the number of the network's components can affect the method's computational requirements. Future studies may be required to assess how the method scales with the size of the reaction network.
For all of the examples studied, the LMEs approach is faster than SSA. Notably, the method is computationally efficient for multistable systems, which are of interest to a large community [42][43][44].
The method also appears unaffected by the size of the system's state space; the method is stable and accurate for multiple state space sizes. On the other hand, oscillatory behaviors can reduce the method's accuracy.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Inverse of Jacobian
The Jacobian matrix J of the moment equations for a stochastic reaction network is calculated as: By the definition (Equation (8)), it is true that: Thus, µ i,j = µ j,i . It is then apparent that the Jacobian matrix is symmetric since: As a symmetric matrix, J can be calculated as [48]: where Λ is a diagonal matrix with the diagonal elements being the eigenvalues of J and Q the associated eigenvector matrix. Q is also real orthogonal [48], i.e., Q −1 = Q T . The inverse of the Jacobian matrix, J −1 can be calculated as: Since Λ is diagonal, (Λ) −1 is also diagonal with elements the reciprocal elements of ). Additionally, each element of Λ, Λ i,j , represents each eigenvalue of the Jacobian J.
Based on Equation (A5), in order to calculate the inverse of the Jacobian matrix, one only needs to solve the eigenproblem of the Jacobian.

Appendix B. Runge-Kutta
As mentioned, the LMEs problem is an initial value problem. A proposed way to solve the system is the Dormand-Prince RKF5(4)7M Runge-Kutta method [32]. RKF5(4)7M is an explicit numerical method with adaptive step size. Dormand and Prince formulated the method in order to create a stable approach for local extrapolations; this is the reason it is chosen in this work as it will help to create more stable solution for time calculations by extrapolating the knowledge of past time solutions. A table that includes all the constants of the RKF5(4)7M method (often called "Butcher tableau" [49] or simply "tableau") can be found in the Appendix B.1.
As derived in Equation (15), the LMEs are: where F denotes the functional form of the LMEs.
The calculation of Lagrange multipliers, λ, is desired for multiple time points. To calculate the multipliers at the time point n + 1, the knowledge of the multipliers at time n is required. The Lagrange multipliers at time n + 1, λ n+1 , are calculated as: where λ n+1 are the Lagrange multipliers at time n,b i are constants of RKF5(4)7M and k i are the Runge-Kutta coefficients: h is the adaptive step size and a ij are RKF5(4)7M constants. In RKF5(4)7M, a lower-order solution, λ * , is used to evaluate the error of the solution λ and change accordingly the adaptive step size h. The lower-order solution at time n + 1, λ * n+1 , are calculated as: where b i 's are another set of RKF5(4)7M constants. All the constants (a ij ,b i & b i ) can be found in the Appendix B.1. More details about the algorithm of RKF5(4)7M used in this work can be found in the Appendix B.2. 5. Calculate the k i 's coefficients from equation k i = hF λ n + ∑ i−1 j=1 a ij k j and the tableau (Appendix B.1). Each coefficient k i depends on the functional form F of the Lagrange multiplier equations (F(λ) = J(λ) −1 [Aµ(λ) + A µ (λ) + µ c ]) and on different set of Lagrange multipliers (λ n + ∑ i−1 j=1 a ij k j ). To account for the change of the Lagrange multipliers, there is a need for a subroutine to calculate k i . 6. This is a proposed subroutine for k i (a) Calculate the augmented Lagrange multipliers for the specific k i , λ n,k i = λ n + ∑ i−1 j=1 a ij k j . The values of a ij can be found in the tableau in Appendix B.1. (b) Calculate the lower (µ(λ n,k i )) and higher-order moments (µ (λ n,k i )) for the augmented Lagrange multipliers based on equation µ i = ∑ Ω f µ i (X) exp −1 − ∑ j λ j f µ j (X) . (c) Calculate the inverse of the Jacobian, J(λ n,k i ) −1 for the augmented multipliers as described in Appendix A. (d) Calculate the functional form F(λ n,k i ) of the Lagrange multiplier equations (F(λ) = J(λ) −1 [Aµ(λ) + A µ (λ) + µ c ]). (e) Calculate k i based on equations k i = hF(λ n,k i ).
7. Repeat step 6 for all seven k i coefficients, i = 1, . . . , 7. Its coefficient k i depends on all the previous coefficients, thus the calculation of each one needs to be in sequence and include the subroutine. For example, the calculations of step 6 should first performed for i = 1, then based on k 1 to be performed for i = 2, then based on k 2 to be performed for i = 3 etc. 8. Calculate the Lagrange multipliers for the current iteration n + 1, λ n+1 , based on equation i=1b i k i and the tableau (Appendix B.1). 9. Calculate the lower-order solution for the current iteration n + 1, λ * n+1 , based on equation λ * n+1 = λ n + ∑ 7 i=1 b i k i and the tableau (Appendix B.1). 10. Check the accuracy of the solution by calculating the error : 11. If the error is smaller than the wanted tolerance accept the solution and proceed to the next step. Otherwise, set a new time step size h = h · s and go back to step 5 and re-perform the calculation with the new step size. s is used to adjust the step size based on the accuracy of the solution and is given by: 12. Accept the solution of Lagrange multipliers for the current iteration (n + 1) and current time point (t n+1 ) as calculated at step 8. 13. Set the new step size to h = h · s and proceed to step 3 to calculate the solution for the rest of the time points until the final is reached. 14. For any given time t, the probability distribution and its moments can be calculated based on the Lagrange multipliers (λ(t)) and equations P(X) = exp −1 − ∑ j λ j f µ j (X) & µ i = ∑ Ω f µ i (X) exp −1 − ∑ j λ j f µ j (X) .