Estimating Equations for Density Dependent Markov Jump Processes

Abstract: Reaction networks are important tools for modeling a variety of biological phenomena across a wide range of scales, for example as models of gene regulation within a cell or infectious disease outbreaks in a population. Hence, calibrating these models to observed data is useful for predicting future system behavior. However, the statistical estimation of the parameters of reaction networks is often challenging due to intractable likelihoods. Here we explore estimating equations to estimate the reaction rate parameters of density dependent Markov jump processes (DDMJP). The variance–covariance weights we propose to use in the estimating equations are obtained from an approximating process, derived from the Fokker–Planck approximation of the chemical master equation for stochastic reaction networks. We investigate the performance of the proposed methodology in a simulation study of the Lotka–Volterra predator–prey model and by fitting a susceptible, infectious, removed (SIR) model to real data from the historical plague outbreak in Eyam, England.


Introduction
Reaction networks (RNs) are a set of mathematical tools often used to describe and analyze a wide variety of biological systems at many different scales, with examples ranging from infectious disease spread throughout a population to cellular regulation of gene expression. In its simplest description, an RN is a model of how the individual components of a system, sometimes called species, interact with each other to update the state of the system, which evolves over time. An important example from cell biology is the process of gene expression. Cells in living organisms are made up of proteins, among other chemical species, which carry out basic functions like building and repairing tissues and making the enzymes and other chemicals essential for the survival of the organism. Many apparently random processes take place in the creation of a protein from DNA. For example, in prokaryotes, enzymes called RNA polymerases diffuse randomly throughout the cell and attach to an initial site on a gene. The RNA polymerase then slides down the gene to produce an mRNA transcript. Ribosomes, which also diffuse throughout the cell, then interact with this new RNA molecule to translate it into the functional gene product, a protein. Some of these proteins, the so-called transcription factors, then go on to bind to the promoter regions of other genes and thereby modulate their expression by tuning the binding affinity of RNA polymerase to this region. All of these processes, subject to random intrinsic and extrinsic noise, must occur for a single protein molecule to be produced. Hence, for overall gene expression through the creation of many thousands of proteins, the regulatory machinery of the cell remarkably coordinates the activity of a large number of the different types of species that perform the critical tasks required to sustain life.
Changes in this regulatory machinery then lead to the huge observed variability in biology, i.e., different cell types like skin and liver cells, healthy tissues vs. cancerous ones and different organisms. Reliable and accurate models of such systems can then provide insight into a wide range of different biological phenomena of interest, from uncovering the mechanisms that lead to biological diversity to developing better treatment strategies in diseases like cancer.
Our ability to interrogate cellular networks has improved significantly in the last two decades with advancements in technologies such as fluorescent microscopy, cellular cytometry and high-throughput sequencing (HTS), to name a few. Data from experiments utilizing these technologies have led to increasing interest in mathematical models as a way to describe the processes involved in transcription, translation, gene regulation, the cell cycle, cellular signaling and many others using observed data. The reconstruction of synthetic network models from these observed data, sometimes called reverse-engineering or model calibration, is an area of intense research activity among biologists, mathematicians, statisticians, computer scientists and others. Significant challenges arise in fitting these networks to observed data for several reasons: uncertainty in the underlying network structure, unknown parameter values of the system, identifiability and intractable likelihoods, to name a few [1][2][3]. In this paper we propose a likelihood-free method for fitting a stochastic reaction network to observed data. The method is based on weighted least squares estimation, which gives rise to the so-called generalized estimating equations. The proposed methodology circumvents the difficult challenge of specifying a likelihood function by augmenting the standard least squares estimates with observation-specific weights based on the precision matrix of an appropriate approximating process. In what follows, we describe reaction networks in Section 2 and develop the proposed methodology with results in Section 3. In Section 4, we compare the performance of the proposed method in a simulation study on the stochastic version of the classical Lotka-Volterra (LV) predator-prey system.
The LV system is another natural candidate whose dynamics are amenable to modeling via reaction networks, like the gene expression and infectious disease examples mentioned earlier, which further illustrates the wide applicability of these models across biological systems. Finally, in Section 5, we analyze data from the plague outbreak of 1665-1666 in Eyam, England, using the susceptible, infectious, removed (SIR) model of disease outbreak dynamics.

Reaction Networks
There are two primary means of modeling systems using reaction networks: the deterministic and stochastic approaches. The deterministic approach treats the system's species, i.e., the state space, as a continuous process governed by a set of coupled ordinary differential equations (the 'reaction-rate equations'). In the stochastic approach, the system state is a vector of counts, where each entry corresponds to the (discrete and non-negative) number of copies of each species type. The state evolves over continuous time, but system updates occur at discrete random times, and each update changes the state by an integer amount. This behavior is clearly a closer approximation to real world systems, which usually involve a finite number of species, or particles, interacting with each other at random times, in contrast to a deterministic ordinary differential equation (ODE) with a continuous state space. The stochastic approach also appears more realistic in terms of modeling randomness in the interactions of the species. For example, in gene expression, random movement associated with Brownian motion affects particle interactions; in the SIR and LV systems, interactions between susceptible and infectious individuals and between foxes (predators) and rabbits (prey), respectively, are subject to noise [4]. Analysis using the stochastic model also resolves parameter identifiability issues inherent in fitting some of the deterministic analogs [4]. Since we are now in a stochastic setting, the object that describes the system evolution is a probability distribution, which turns out to be the solution to a single differential equation (the "master equation") [5].
The master equation solution gives the probability that the system is in a particular state at a particular time and was derived for chemical reaction networks by [6].

Deterministic Systems
An RN consists of a set of reactant species that interact via a set of reactions to produce products. Specifically, an RN consists of three sets (X, C, R), defined as follows:
1. Species X = {X_1, ..., X_d}: the quantities of the d different species types in the system.
2. Complexes C: linear combinations of the species. These can be represented as vectors, where η_r gives the number of each species consumed by the rth reaction and η'_r gives the number of each species produced by the rth reaction.
3. Reactions R = {η_r → η'_r}, with the rth reaction's stoichiometry S_r = η'_r − η_r ∈ Z^d.
The classical chemical notation for the rth reaction is then

η_1r X_1 + · · · + η_dr X_d → η'_1r X_1 + · · · + η'_dr X_d,

where η_ir and η'_ir are non-negative stoichiometric coefficients denoting the numbers of X_i consumed and produced, respectively, in the rth reaction. The parameter κ_r is the rate constant of the rth reaction. We say that the rth reaction is of order m if ∑_{i=1}^d η_ir = m, i.e., if it involves m reactant molecules.
Guldberg and Waage modeled the dynamics of chemical reaction systems in a macroscopic setting using the Law of Mass Action in the 1860s [7]. The macroscopic setting implies the system contains a sufficiently large number of molecules, so that the state space may be viewed as a concentration by dividing the species' numbers by an appropriate volume term Ω, and considering both to be large. Let x = (x_1, ..., x_d)' = X/Ω, where x_i denotes the concentration of species X_i and Ω is the system volume, so that x_i is a continuous quantity. In a well-mixed system, the reaction rate for reaction r under the Law of Mass Action is given by

g_r(x) = κ_r ∏_{i=1}^d x_i^{η_ir}.

We call g_r the macroscopic rate function of the rth reaction, and its form implies that the reaction rates are proportional to the concentrations of the reactant molecules in each reaction. The stoichiometric matrix S is defined as S = [S_1 · · · S_R] ∈ R^{d×R}. The column S_r ∈ R^d describes the net change in the state space when reaction r occurs, and S_ir describes the net change in the number of molecules of X_i when reaction r occurs. The deterministic ODE φ_i for species i gives a deterministic approximation of species i's concentration over time, which may be derived by summing the product of the macroscopic rate functions and the net change in species i over all reactions. This gives the system of reaction rate equations as [8]

dφ/dt = ∑_{r=1}^R S_r g_r(φ), (4)

where φ = (φ_1, ..., φ_d). Clearly the solution to Equation (4) depends on the parameter values, initial condition and time, and we denote this whenever necessary as φ_κ(t). We suppress this dependence in some places for convenience, but φ should be understood implicitly to depend on these. The deterministic approximation is valid in well-mixed macroscopic systems, i.e., with large numbers of molecules, such that X_i/Ω = x_i → φ_i when Ω grows large.
To give a simple concrete example of the notation, consider the reaction

2H_2 + O_2 → 2H_2O.

Here there are three species (X_1, X_2, X_3) = (H_2, O_2, H_2O), R = 1 since there is only one reaction, and the complexes are η_1 = (2, 1, 0)' and η'_1 = (0, 0, 2)', with stoichiometry S_1 = η'_1 − η_1 = (−2, −1, 2)' and macroscopic reaction rate g_1(x) = κ_1 x_1^2 x_2; that is, two molecules of dihydrogen react with one molecule of dioxygen to give two molecules of water. The reaction rate equation describing the concentrations over time in a macroscopic system is then

dφ/dt = S_1 g_1(φ) = (−2κ_1 φ_1^2 φ_2, −κ_1 φ_1^2 φ_2, 2κ_1 φ_1^2 φ_2)'.
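The reaction rate equation for this example is straightforward to integrate numerically. The following sketch (with an assumed rate constant κ_1 = 0.5 and unit initial concentrations, chosen purely for illustration) uses a forward Euler scheme; note that the stoichiometry implies the linear conservation relations 1 − φ_1 = 2(1 − φ_2) and φ_3 = 1 − φ_1, which the scheme preserves.

```python
import numpy as np

# Euler integration of the rate equation for 2 H2 + O2 -> 2 H2O, with an
# assumed rate constant and initial concentrations (illustrative only).
S = np.array([[-2.0], [-1.0], [2.0]])   # stoichiometry S_1 = (-2, -1, 2)'
kappa1 = 0.5                            # assumed rate constant

def g(phi):
    return np.array([kappa1 * phi[0] ** 2 * phi[1]])   # g_1(x) = k1 x1^2 x2

phi = np.array([1.0, 1.0, 0.0])         # initial (H2, O2, H2O) concentrations
dt = 1e-3
for _ in range(5000):                   # integrate to t = 5
    phi = phi + dt * (S @ g(phi))
```

Because every coordinate of dφ/dt is a fixed multiple of the single rate g_1, any vector orthogonal to S_1 yields a quantity that is conserved exactly, even under Euler stepping.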

Stochastic Systems
Often, systems will contain species occurring with relatively low abundance, for example in a predator-prey system when the prey is on the verge of extinction, and hence such systems will not be completely amenable to a macroscopic analysis. In these systems, stochasticity, for instance due to Brownian motion in the chemical example, becomes an important factor affecting system evolution. While the deterministic ODE provides a sort of approximate 'mean' trajectory (this is exact only for linear systems), a stochastic approach requires the probability distribution of the different states over time, which turns out to be the solution to the so-called master equation (ME).
Consider a system of d species X = (X_1, ..., X_d)' and R reactions {η_r → η'_r}, r = 1, ..., R, under a fixed volume Ω and temperature. Further, let f_r(X, t)dt be the probability that reaction r occurs in the interval (t, t + dt), given system state X at time t. Then P(X − S_r, t) f_r(X − S_r, t)dt is the probability that the system is in state X − S_r at time t and reaction r occurs in (t, t + dt), thus moving the system to state X. A mathematical representation of the master equation may be arrived at by answering the following question: how did we get to state X at time t + dt? There are two possibilities:
• we were in state X at time t and no reaction took place, or
• we arrived in state X from X − S_r as a result of one single reaction.
Writing these probabilities out, dividing by dt and taking the limit as dt → 0 gives the master equation

dP(X, t)/dt = ∑_{r=1}^R [ f_r(X − S_r, t) P(X − S_r, t) − f_r(X, t) P(X, t) ]. (5)

The expression in Equation (5) is typically referred to as the chemical master equation, though note that the expression above is merely the forward Chapman-Kolmogorov equation for a general Markov jump process, not necessarily one of a chemical nature. The solution to Equation (5) gives the probability that the system is in state X at time t, and there is an implicit dependence on the initial state X(0) which is suppressed for convenience. Hence, given a set of observed data from a reaction network, one would ideally perform parameter estimation and statistical inference using the solution to Equation (5) in order to compute the likelihood function. Likelihood-based inference has strong axiomatic justification via the likelihood principle, and its computation is required for frequentist maximum likelihood estimation and in Bayesian analysis to compute the posterior distribution. Unfortunately, it is well known that solving Equation (5) is intractable in all but a few simple systems, for example linear ones.
This is because no analytic solutions to the ME exist in general, and even in the cases where a solution can be computed, the computational complexity involved makes doing so impractical. For instance, when the size of the state space is finite, a solution to Equation (5) can be computed in principle via matrix exponentiation. Even when the state space is finite, however, it is usually very large, large enough that exact solutions to the ME using matrix exponentiation are infeasible. For example, in the SIR model we fit to the Eyam plague data below, the size of the state space is 62,128. An exact solution to the ME via matrix exponentiation requires the computation of the power series of a 62,128 × 62,128 matrix, and this calculation would be required each time the likelihood needs to be evaluated at a particular parameter value.
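For a system small enough to be tractable, the matrix exponentiation approach can be made concrete. The sketch below (our own toy pure-death example, not a model from the paper) solves the ME exactly by exponentiating the generator:

```python
import numpy as np

def expm(A, order=20, squarings=10):
    """Matrix exponential via scaling-and-squaring with a truncated Taylor series."""
    B = A / 2.0 ** squarings
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, order + 1):
        term = term @ B / k      # accumulates B^k / k!
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

# ME generator for an assumed pure-death system X -> 0 at rate kappa * X;
# the state space {0, ..., N} is finite, so no truncation is needed.
kappa, N, t = 1.0, 5, 0.3
Q = np.zeros((N + 1, N + 1))
for x in range(1, N + 1):
    Q[x, x - 1] = kappa * x      # jump x -> x - 1
    Q[x, x] = -kappa * x         # rows of a generator sum to zero
P = expm(Q * t)                  # P[m, k] = P(X(t) = k | X(0) = m)
dist = P[N, :]                   # distribution of X(t) started from X(0) = N
```

Starting from X(0) = N, each molecule survives to time t independently with probability e^{−κt}, so the exact solution is Binomial(N, e^{−κt}), which checks the computation. Even this six-state example makes the scaling problem clear: a d-species network's generator grows with the product of the per-species state counts.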
To completely specify the stochastic systems under consideration, we must determine the form of the stochastic reaction rates f_r. Under the assumptions that the system is well-stirred and at constant temperature, these rates mirror the deterministic mass action rates and follow the so-called stochastic law of mass action, of the form

f_r(X) = κ_r Ω ∏_{i=1}^d Ω^{−η_ir} X_i! / (X_i − η_ir)!. (6)

These are adjustments of the deterministic rates that account for finite molecule numbers and the combinatorial number of ways to choose the reactant molecules. A rigorous derivation of these rates was given by [6], and McQuarrie was one of the first to review the stochastic nature of chemically reacting systems [9]. Under the assumptions that the systems are well-mixed and thermally equilibrated, the reaction rates have the form in Equation (6), and the master equation with these rates describes a density dependent Markov jump process (DDMJP). DDMJPs evolve over continuous time and make jumps at discrete time points, which update the system's state. Further, their transition rates depend on the state only through the density X/Ω, hence the name. We give two important results on the limiting properties of DDMJPs, developed by Tom Kurtz [10][11][12][13], in Appendix A, which we make use of in Section 3 to prove results on the properties of the proposed estimation method. In the context of chemical systems, the master equation is often referred to as the chemical master equation (CME).
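Gillespie's SSA, referenced throughout, simulates exact trajectories of this jump process by repeatedly sampling an exponential waiting time with rate equal to the total propensity and then selecting which reaction fires with probability proportional to its propensity. A generic sketch follows (the function names and the pure-death example are illustrative assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)

def gillespie(x0, S, propensities, t_max):
    """Gillespie's SSA: exact trajectories of the Markov jump process."""
    t, x = 0.0, np.array(x0, dtype=float)
    times, states = [t], [x.copy()]
    while t < t_max:
        a = propensities(x)
        a0 = a.sum()
        if a0 <= 0.0:                      # no reaction can fire (absorbed)
            break
        t += rng.exponential(1.0 / a0)     # exponential holding time
        r = rng.choice(len(a), p=a / a0)   # index of the reaction that fires
        x = x + S[:, r]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

# Illustrative pure-death network X -> 0 with stochastic rate f(X) = kappa * X.
S_death = np.array([[-1.0]])
t_obs, x_obs = gillespie([20], S_death, lambda x: np.array([1.0 * x[0]]), 50.0)
```

In the pure-death example every jump removes exactly one molecule, so the trajectory decreases by one at each recorded event until absorption at zero.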
While exact solutions to the ME/CME are usually not available, several useful approximations have been developed that are more amenable to analysis. These often involve Taylor expanding the ME and truncating higher order terms. Consider writing the ME in terms of the step operator as follows:

dP(X, t)/dt = ∑_{r=1}^R ( ∏_{i=1}^d E_i^{−S_ir} − 1 ) f_r(X, t) P(X, t). (7)

Here, X_i is the number of molecules of species i, and E_i^a is the step operator, defined by its action E_i^a h(X_1, ..., X_i, ..., X_d) = h(X_1, ..., X_i + a, ..., X_d). Taylor expansion of the step operator and truncation of the terms of third order and higher leads to a partial differential equation that approximates the ME. This partial differential equation is sometimes referred to as a Fokker-Planck equation, whose solution describes the probability distribution of a chemical Langevin equation (CLE), or diffusion process. In general, a Fokker-Planck approximation will still not have an analytic solution, but simulating process trajectories from the CLE is typically much more efficient than simulating from the exact process via Gillespie's stochastic simulation algorithm (SSA). If we carry this out one step further and Taylor expand both the step operator and the rate functions f_r about the ODE solution of Equation (4), we arrive at yet another partial differential equation. The solution to this equation describes the probability distribution of the so-called linear noise approximation (LNA); see [14][15][16] for details on these expansions. The LNA has several nice properties that make it useful for computation, one of them being that the process variances and covariances can be computed from a matrix system of ODEs. The LNA process variance at time t, which we denote Σ_t, satisfies

dΣ_t/dt = A_t Σ_t + Σ_t A_t' + S F(φ) S', (8)

subject to the initial condition Σ_0, where A_t = S [∂f_r(φ(t))/∂x_i] is the Jacobian of the drift of Equation (4), F(φ) = diag(f_1(φ), ..., f_R(φ)), and the explicit dependence on time t has been suppressed. Our goal is to use the ideas introduced in this section to propose a method for estimating the parameters κ_r of a stochastic reaction network. We do this in the following section and prove results on properties of the proposed estimators.
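To illustrate Equation (8), the sketch below integrates the LNA variance ODE with forward Euler for a toy immigration-death network (an assumed example, not a system analyzed here); for this linear network the stationary LNA variance equals the stationary mean, the Poisson value, which provides a convenient numerical check.

```python
import numpy as np

# Assumed immigration-death network, in concentration units:
# 0 -> X at rate k1, and X -> 0 at rate k2 * X.
k1, k2 = 2.0, 1.0
S = np.array([[1.0, -1.0]])              # stoichiometric matrix (1 species, 2 reactions)

def rates(phi):
    return np.array([k1, k2 * phi[0]])   # macroscopic rates f_r(phi)

def jacobian(phi):
    dfdx = np.array([[0.0], [k2]])       # [d f_r / d x_i], an R x d matrix
    return S @ dfdx                      # drift Jacobian A_t = S [d f / d x]

phi = np.array([0.0])
Sigma = np.zeros((1, 1))
dt = 1e-3
for _ in range(20000):                   # Euler integration out to t = 20
    A = jacobian(phi)
    F = np.diag(rates(phi))
    dSigma = A @ Sigma + Sigma @ A.T + S @ F @ S.T   # Equation (8)
    phi = phi + dt * (S @ rates(phi))
    Sigma = Sigma + dt * dSigma
```

At stationarity φ* = k1/k2 = 2, and the variance ODE reduces to Σ' = −2k2 Σ + k1 + k2 φ*, whose fixed point is also k1/k2 = 2.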

Generalized Estimating Equations
Ideally, one would base the statistical estimation of parameters on the likelihood function. The likelihood principle has axiomatic foundations in statistical decision theory, and acceptance of the principle justifies using the likelihood function, as it then contains all the available information provided by the data [17][18][19]. As mentioned previously, the likelihood function is unfortunately intractable in the setting of stochastic reaction networks because of the intractability of the ME solution. A powerful tool to circumvent this intractability is the so-called particle filtering approach. This methodology relies on simulating trajectories, or particles, from the process to obtain unbiased estimates of the likelihood function, and hence allows for exact Bayesian inference via particle marginal Metropolis-Hastings (PMMH) [20]. Even in the absence of some of the known issues associated with PMMH, such as particle degeneracy, the computational cost alone of implementing PMMH may be significant, owing to the required sampling of particles and the general difficulty of tuning Markov chain Monte Carlo (MCMC) proposal distributions in the absence of an analytic form for the likelihood function. We compare the proposed methodology to PMMH in our simulation study.
Likelihood-free methods have also been proposed to avoid the outright need to compute the likelihood function. Approximate Bayesian computation (ABC) is a popular class of likelihood-free methods that require only the ability to simulate from the process [21]. While ABC methods are straightforward to implement once appropriate summary statistics and distance measures have been selected, good choices for these are often not obvious and affect both the quality of the approximate posterior and the mixing of the chain. We do not consider ABC methods further: both PMMH and ABC require repeated sampling of the process, but PMMH yields exact posterior distributions rather than approximate ones. Further, PMMH does not require the selection of statistics that are essentially guaranteed not to be sufficient by the Fisher-Pitman-Koopman-Darmois theorem.
Another likelihood-free method based on least squares estimation (LSE) for DDMJPs has been proposed in [22] and investigated further in [23]. Assume we have observed the process at time points t_0, t_1, ..., t_T, leading to the data X(t_0), X(t_1), ..., X(t_T), where each X(t_j) ∈ R^d. The least squares estimator is

κ̂_LSE = argmin_κ ∑_{j=1}^T || x(t_j) − φ_κ(t_j) ||²,

where x(t_j) = X(t_j)/Ω are the observed species concentrations and φ_κ(t_j) is the ODE solution at time t_j. The asymptotic properties of κ̂_LSE have been investigated in [22]. Here, we propose a weighted estimator that accounts for the different variability in the process at different time points. Consider the solution to the following:

κ̂ = argmin_κ ∑_{j=1}^T (x(t_j) − φ_κ(t_j))' V_{t_j}^{−1} (x(t_j) − φ_κ(t_j)),

where V_{t_j} is a symmetric positive definite matrix. Differentiating with respect to κ, setting the result to 0 and solving for κ gives the proposed estimator κ̂, which is the solution to the estimating equation [24] G_Ω(κ) = 0, i.e.,

G_Ω(κ) = ∑_{j=1}^T ∂φ_κ(t_j) V_{t_j}^{−1} (x(t_j) − φ_κ(t_j)) = 0. (10)

Generalized estimating equations (GEEs) have been used in the context of diffusion processes [25], and here we investigate them in the context of DDMJPs. For the proposed method, we estimate the process variance matrix V_{t_j} by numerically solving the LNA variance formula in Equation (8). We prove two results on the limiting properties of the GEE estimator κ̂: the first on consistency, and the second a central limit theorem (CLT), using the theorems provided in Appendix A on the strong law of large numbers (SLLN) and the CLT for DDMJPs.
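The mechanics of solving the estimating equation can be sketched in a one-parameter toy model. Everything below (a pure-death model with solution φ_κ(t) = x_0 e^{−κt}, the crude variance proxy used for the weights, and the bisection solver) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy one-parameter model: a pure-death network X -> 0 whose rate equation
# has the solution phi_k(t) = x0 * exp(-k t).
x0, k_true = 10.0, 0.8
tgrid = np.linspace(0.1, 2.0, 10)
phi = lambda k, t: x0 * np.exp(-k * t)
data = phi(k_true, tgrid) + rng.normal(0.0, 0.2, size=tgrid.size)

def G(k):
    """Estimating equation G(k) = sum_j dphi_k(t_j) V_j^{-1} (x_j - phi_k(t_j))."""
    resid = data - phi(k, tgrid)
    dphi = -tgrid * phi(k, tgrid)          # derivative of phi_k(t_j) w.r.t. k
    V = np.maximum(phi(k, tgrid), 1e-8)    # crude (Poisson-like) variance proxy
    return np.sum(dphi * resid / V)

# Solve G(k) = 0 by bisection on a bracketing interval.
lo, hi = 0.1, 3.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if G(lo) * G(mid) <= 0.0:
        hi = mid
    else:
        lo = mid
k_hat = 0.5 * (lo + hi)
```

For a multi-parameter network the same root-finding problem is solved with a quasi-Newton or Fisher-scoring iteration rather than bisection, but the structure of G is unchanged.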

Consistency
Theorem 1 (Consistency). Let ∂_r φ_κ denote the derivative of φ_κ ∈ R^d_{≥0} with respect to the rth coordinate of κ ∈ R^R_+, and set ∂φ_κ = [∂_r φ_{κ,j}] ∈ R^{R×d}. Assume that ||∂φ_κ|| < ∞, that the SLLN for the DDMJP holds, and that the regularity conditions in Equations (11) and (12) hold for some ε > 0. Then the GEE estimator is strongly consistent, that is, κ̂ → κ_0 almost surely.

Proof.
Assume that x(t_j) are the observed concentrations from a DDMJP and that κ_0 is the true parameter. Adding and subtracting φ_{κ_0}(t_j) inside the parentheses of Equation (10), we obtain Equation (13). The SLLN for DDMJPs given in Appendix A implies that the left hand side (LHS) of Equation (13) converges to zero almost surely as Ω → ∞. Thus, in the large volume limit the estimating equation holds almost surely at κ̂_∞, where κ̂_∞ is the limiting solution to Equation (10). Under the assumptions in Equations (11) and (12), we then have that κ̂_∞ = κ_0 almost surely.

Central Limit Theorem
Theorem 2 (Central limit theorem). Assume that all of the assumptions of Theorem 1 hold, that V_{t_j} is invertible, and that the hypotheses of the CLT for DDMJPs in Appendix A hold. Then

√Ω (κ̂ − κ_0) ⇒ N_R(0, Σ_{κ_0}),

where N_R denotes the R-variate normal distribution, ⇒ denotes convergence in distribution, and Σ_{κ_0} is given by

Σ_{κ_0} = ∑_{i=1}^T ∑_{j=1}^T A_i Cov(Z_i, Z_j) A_j'.

In the above, A_j = B_{κ_0}^{−1} ∂φ_{κ_0}(t_j) V_{t_j}^{−1}, and Z_j denotes the Gaussian limit of √Ω (x(t_j) − φ_{κ_0}(t_j)) given by the CLT in Appendix A. Proof. Consider the RHS of Equation (13); expanding it as a function of κ̂ about the point κ_0, we obtain Equation (16). The order bound O(|κ̂ − κ_0|) comes directly from the Taylor expansion, and the convergence in probability to 0 of this term, i.e., that it is o_p(1), follows from the previous result on the consistency of κ̂. Thus, the second term on the RHS of Equation (16) converges to 0 in probability and may be neglected. Defining B_κ̂ = ∑_{j=1}^T ∂φ_κ̂ V_{t_j}^{−1} [∂φ_{κ_0}]' and considering the remaining terms, the stated limit follows.
Note that in the above theorems we have assumed that the data come from a single trajectory of the DDMJP; however, the results may be extended to data collected from independent trajectories of the jump process. This will be the case when the data collection process is destructive, for example when it requires the crushing of cells. If we have data from multiple independent trajectories, then Cov(Z_j, Z_i) = 0 for i ≠ j, and under the remaining conditions the results still hold, but with Σ_{κ_0} = ∑_j A_j Var(Z_j) A_j'.

Simulation
For our simulation study we consider the classical Lotka-Volterra model with two species, rabbits (Ra) and foxes (F), and the corresponding reaction network

Ra → 2Ra (rate constant κ_1),
Ra + F → 2F (rate constant κ_2),
F → ∅ (rate constant κ_3),

with the macroscopic mass action rates g_1(x) = κ_1 x_1, g_2(x) = κ_2 x_1 x_2 and g_3(x) = κ_3 x_2, where x_1 and x_2 are the concentrations of rabbits and foxes, respectively. The macroscopic rate equations for the LV system have the form

dφ_1/dt = κ_1 φ_1 − κ_2 φ_1 φ_2,
dφ_2/dt = κ_2 φ_1 φ_2 − κ_3 φ_2.

For the simulation study, we generate trajectories from the stochastic LV process using Gillespie's SSA, and we collect the observed data at 10 evenly spaced time points in the interval (0.1, 1) under three different volume sizes, Ω = 1, 10 and 100. We set the parameter values to κ_1 = 1, κ_2 = 0.2, κ_3 = 0.5, and the initial rabbit and fox copy numbers are set so that the initial concentrations are x_1 = 10 and x_2 = 10 under each of the different volume scenarios. For the particle filtering method, labeled PMMH in the tables, we used 100 particles generated from Gillespie's SSA and a standard deviation of 10 for the measurement error to construct the likelihood. We implement two numerical procedures for the proposed GEE method. The first uses a single iteration of the GEE solution, with the process variance V_{t_j} for this single iteration computed from the LNA variance formula evaluated at the LSE solution; we denote this GEE_1. The second updates the process variance V_{t_j} at each iteration using the previous iteration's GEE estimate, until convergence; we denote this GEE. Each numerical procedure is initialized at the LSE. We report the average of the parameter estimates over the 1000 simulation replicates for each method. For the proposed method we also report the coverage probabilities of 95% confidence intervals constructed using the CLT result proven in the previous section, shown in parentheses in the tables below.
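For reference, the macroscopic LV equations with these parameter values can be integrated as follows; the sketch also tracks the classical first integral of the LV system, which is conserved along the closed orbits and so provides a check on the integrator:

```python
import numpy as np

# Macroscopic LV rate equations with the simulation-study parameters.
k1, k2, k3 = 1.0, 0.2, 0.5

def f(x):
    x1, x2 = x
    return np.array([k1 * x1 - k2 * x1 * x2,     # prey: birth minus predation
                     k2 * x1 * x2 - k3 * x2])    # predator: predation minus death

def rk4_step(x, dt):
    a = f(x)
    b = f(x + 0.5 * dt * a)
    c = f(x + 0.5 * dt * b)
    d = f(x + dt * c)
    return x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)

def invariant(x):
    # First integral of the LV equations: constant along trajectories.
    return k2 * (x[0] + x[1]) - k3 * np.log(x[0]) - k1 * np.log(x[1])

x = np.array([10.0, 10.0])               # initial concentrations x1 = x2 = 10
H0 = invariant(x)
for _ in range(10000):                   # integrate to t = 10
    x = rk4_step(x, 1e-3)
H1 = invariant(x)
```

Differentiating the invariant along the flow shows dH/dt = 0, so any drift in H measures the integrator's error.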
The results in Tables 1 and 2 indicate that the proposed methodology works reasonably well for a range of volumes in the LV system. For instance, already at a volume of Ω = 1, the GEE estimates are close to the true values and the empirical coverage probabilities are close to the nominal level. This implies that, in this example, the asymptotic normality of the estimates is generally a good approximation even at this small volume. Further, the proposed GEE method appears to be less biased than the LSE and PMMH at this volume size, and as the volume increases, the estimates for the GEE and LSE methods tend toward the truth, while at Ω = 100 the PMMH estimates for κ_2 still retain some bias. In Tables 3-5, we compare the variance and squared bias for each parameter using the different methods on the LV model. At the small volume Ω = 1, the PMMH has the smallest variance and bias, followed by the GEE and then the LSE. Although the PMMH method performs well for the LV model in the smallest volume Ω = 1 case, this comes at the cost of a significant increase in computational time: the GEE and LSE are approximately 20 and 60 times faster, respectively. At larger volumes this time difference becomes even more pronounced, because of the particles that must be simulated via the Gillespie SSA for the PMMH method. Further, at least in the LV example, as the volume increases above 1, both the estimates and the run time are better for the GEE method than for PMMH. Hence, one may consider using the GEE at moderate to high volumes (Ω ≥ 10), and PMMH at very low volumes, provided the computation time is feasible.

Real Data Application: Eyam Plague of 1665-1666
We now apply our methodology to real data from a historical plague outbreak at Eyam, England. The Eyam plague began in 1665, when a flea-infested bundle of cloth arrived from London for a local tailor. The tailor's assistant, George Vicars, was the first fatality, dying within a week of the bundle's arrival. As more people died through disease transmission in the village, Reverend William Mompesson and Minister Thomas Stanley introduced precautions to slow the spread of the illness from May 1666. These included families burying their own dead and quarantining the entire village. The worst month of the outbreak was August 1666, during which there were five to six deaths per day. The plague lasted for over 14 months, nearly wiping out the entire village; Abraham Morten, the last victim, died on 1 November 1666. We used the proposed methodology to estimate the epidemiological parameters of a stochastic SIR compartmental model for this outbreak, where S is susceptible, I is infectious and R is removed. The initial number of people who lived in the village at the time of the outbreak, and how many people died, is subject to debate and has been revised several times [26], but we use numbers based on the account of the village's demography by the historian William Wood [27], writing about a century after the event. According to Wood, there were about 350 people in the village, and the plague killed 75% of the villagers. In this example of a fixed closed population, a natural quantity for the system volume is Ω = 350, the total population.
The SIR diagram in Figure 1 depicts how individuals can transition between the compartments. For instance, a susceptible may transition to the infectious compartment by interacting with an infectious contact. The rate describing movement from the S compartment to the I compartment is proportional to the rate constant κ_1 times the number of possible pair-wise interactions between infected and susceptible individuals, SI. Additionally, an infected individual transitions to the R compartment at rate κ_2. The macroscopic rate equations for the SIR model are

ds/dt = −κ_1 s i,
di/dt = κ_1 s i − κ_2 i,
dr/dt = κ_2 i,

where s, i and r denote the susceptible, infectious and removed concentrations. While transmission via flea bites is almost certainly how the outbreak initially began, it became clear during the latter months that human-to-human transmission was a driving force. It was reported that the infection could eventually be spread by the cough of an infectious person. This route of infectious spread is sometimes referred to as plague pneumonia, and the SIR system offers a simple model of the human-to-human transmission. We also consider a model that includes a vector, in this case representing rats, as a possible mode of transmission through flea bites. We assume a constant infectious pressure from the vector compartment V, and we denote this model as SIRV. A graphical depiction of this model is displayed in Figure 2 below.
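A quick numerical sketch of the SIR rate equations (with illustrative parameters κ_1 = 2, κ_2 = 1, i.e., R_0 = 2, not the fitted Eyam values) displays the familiar epidemic curve; the final susceptible fraction can be checked against the classical final-size relation ln(s_∞/s_0) = −R_0(1 − s_∞):

```python
import numpy as np

# SIR rate equations in concentration units (s + i + r = 1), with
# illustrative parameters (not the fitted Eyam estimates): R0 = k1 / k2 = 2.
k1, k2 = 2.0, 1.0

def f(y):
    s, i, r = y
    return np.array([-k1 * s * i,            # susceptibles become infected
                     k1 * s * i - k2 * i,    # infecteds accumulate, then recover
                     k2 * i])                # recoveries enter the removed class

y = np.array([0.995, 0.005, 0.0])            # initial (s, i, r)
dt = 1e-3
for _ in range(50000):                       # integrate to t = 50, well past the peak
    y = y + dt * f(y)                        # forward Euler suffices at this step size
s_inf = y[0]                                 # final susceptible fraction
```

Since the three derivatives sum to zero, s + i + r is conserved exactly by the Euler scheme, and with R_0 = 2 the final-size relation gives s_∞ ≈ 0.20, an attack rate of roughly 80%.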
The corresponding rate equations, with the constant vector pressure entering through κ_3, are

ds/dt = −κ_1 s i − κ_3 s,
di/dt = κ_1 s i + κ_3 s − κ_2 i,
dr/dt = κ_2 i.

The observed outbreak data are taken from [27]. We fit both the SIR and SIRV models to these data using the GEE and LSE methods and report the results in Table 7.
For the SIRV model, the estimated vector transmission parameter κ_3 is zero for all of the methods, which suggests that this mode of transmission may not have played as significant a role in the disease transmission as originally suggested, at least after the infection had been initiated; the SIRV model thus reduces to the standard SIR model in this case. The results from these analyses therefore indicate that most people at the Eyam outbreak were likely infected through some form of human-to-human transmission, either through plague pneumonia or ectoparasites such as lice. The 95% confidence intervals for the SIR model parameters, shown in parentheses, indicate that the GEE method has increased precision compared to the LSE, as the interval widths are more than twice as large for the LSE. The GEE_1 method also has an interval width for κ_1 roughly half that of the LSE, with similar widths for κ_2. These estimated parameters give basic reproduction number estimates of R_0^GEE = 1.71, R_0^{GEE_1} = 1.70 and R_0^LSE = 1.62. We plot the corresponding macroscopic rate equation solutions for the SIR model with the estimated parameter values from the two GEE methods.
The ODE curves in Figure 3 give an estimated final outbreak size of approximately 70% for both methods, which is similar to the observed final outbreak size.

Discussion
In this paper we have proposed a new way to estimate the parameters of stochastic reaction networks. We used ideas related to generalized estimating equations to estimate these parameters, with the covariance weights computed from the LNA's approximating variance. We have discussed why stochastic models can be a more faithful way to model reaction networks than the classical deterministic models. While the CME gives an exact representation of a stochastic reaction network, the cost of solving it quickly becomes prohibitive, and approximate methods help tackle this issue. The ability to reverse engineer reaction networks from experimental data is critical for understanding and predicting the behavior of a variety of biological systems. We have therefore proposed methodology for estimating parameters from observed stochastic reaction network data, and we have developed the corresponding limiting results so that researchers may perform better inferences about their systems.

Funding:
The second author would like to acknowledge that part of his effort on this project was funded by the Augusta University early career award.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: