Identifiability and Reconstruction of Biochemical Reaction Networks from Population Snapshot Data

Inference of biochemical network models from experimental data is a crucial problem in systems and synthetic biology that includes parameter calibration but also identification of unknown interactions. Stochastic modelling from single-cell data is known to improve identifiability of reaction network parameters for specific systems. However, general results are lacking, and the advantage over deterministic, population-average approaches has not been explored for network reconstruction. In this work, we study identifiability and propose new reconstruction methods for biochemical interaction networks. Focusing on population-snapshot data and networks with reaction rates affine in the state, for parameter estimation, we derive general methods to test structural identifiability and demonstrate them in connection with practical identifiability for a reporter gene in silico case study. In the same framework, we next develop a two-step approach to the reconstruction of unknown networks of interactions. We apply it to compare the achievable network reconstruction performance in a deterministic and a stochastic setting, showing the advantage of the latter, and demonstrate it on population-snapshot data from a simulated example.


Introduction
A central problem in systems and synthetic biology is the calibration of the unknown parameters of a biochemical reaction network model on the basis of real data [1].Sometimes, the network of interactions in the system of interest is not completely known and needs to be reconstructed from data as well [2].In synthetic biology, this is of special interest since the engineering of circuits into cells may result in unexpected interactions [3].For parameter inference, a central question is what parameters of a given model can be identified unambiguously by a given experimental setup.This is the problem of identifiability, which is dedicated much attention in the systems' biology literature since crucial for the correct interpretation of inference results [4][5][6].Posed purely in terms of model properties, the question goes under the name of structural identifiability.When instead the quality of the data is also taken into account, attention shifts to estimation accuracy and leads to the concept of practical identifiability.With an appropriate reformulation, similar questions can be addressed for the reconstruction of interaction networks.
Model experimental technologies in cellular biology allow one to measure reaction dynamics with single-cell resolution [7].Correspondingly, stochastic reaction network models are utilized to explain variability of reaction dynamics, notably gene expression, across cells.It has been realized that the combination of stochastic modelling and single-cell measurements allows one to discriminate parameters of reaction models that would not be identifiable based on deterministic modelling counterparts and population-averaged data [8][9][10].While demonstrated on specific case studies, this point has not been addressed in much generality.At the same time, one expects that the advantage of stochastic modelling and single-cell measurements for parameter identifiability carries over to the problem of network inference.Whereas network inference has received attention in a deterministic context [11,12], the problem has barely been addressed in a stochastic setting, and a comparative study of network identifiability in the two settings does not exist.
In this paper, we discuss identifiability and reconstruction of unknown parameters and interactions in biochemical reaction networks.We focus on the fundamental class of models with reaction rates that are affine functions of the network state, and look at inference problems from mean and covariance time-profiles of the network state, as obtained from population-snapshot measurements [13].State-affine rates reflect networks composed of reactions of zeroth or first order.Such networks occur naturally in gene expression modelling [14] and constitute the starting point to investigate more complex networks [15].Population snapshot data, on the other hand, is easily obtained e.g., by inexpensive cytometry measurements, but is also the most immediate outcome of videomicroscopy [7].Thus, we work in a framework that is at the same time sufficiently flexible and amenable of in-depth analysis via, in particular, the moment equations [16,17].This allows us to establish the following tools and results.
For the parameter estimation problem, we develop general results and inexpensive methods to test structural identifiability, and discuss connections with practical identifiability.In the case of a random telegraph model of gene expression [14], we apply our methods to extend existing identifiability results [8,10] and discuss practical identifiability on the basis of numerical simulations.For the reconstruction of interaction networks, we develop a two-step method where a first step devoted to the identification of the network moment dynamics is followed by a second step that determines the network of interactions by the algorithmic solution of a factorization problem.We notably compare the deterministic (population-average) and stochastic (single-cell) scenarios, demonstrating the superiority of the latter from a theoretical viewpoint and also numerically on a toy example.
This work is an extension and consolidation of earlier results by the same author and published as part of conference proceedings in [18] (parameter identifiability) and [19] (network reconstruction).Relevant work on parameter identifiability in the context of biochemical networks has been developed by several authors (see [4,6,8,20] and references therein).Relative to the state of the art, our results on structural identifiability are entirely novel, while the discussion of practical identifiability is in the same spirit as e.g., [1].Reconstruction of biochemical networks is reviewed in [11,12,21,22] (see also [23]) and is even dedicated a yearly challenge boosting continuous investigation (see [2]).However, from the viewpoint of stochastic dynamics and single-cell data, existing contributions are essentially limited to model discrimination and selection among small pools of specific network alternatives (see, e.g., [24]).Moreover, to our knowledge, a comparative analysis of network identifiability in the deterministic and stochastic setups is not present in literature.Our work contributes to fill this gap with an original analysis and methods for the reconstruction of stochastic biochemical interaction networks with no a priori restriction on the network structure for a given set of species.
The paper is organized as follows.In Section 2, we review stochastic modelling of biochemical reaction networks.In particular, in Section 2.1, we derive an especially convenient form for the moment equations that will be the working tool for the rest of the work.Section 3 is devoted to parameter identifiability.In particular, our novel methods for structural identifiability are derived in Section 3.1.Together with the practical identification tools reviewed in Section 3.2, they are applied to the case of the random telegraph model for a reporter gene system in Section 3.3.Section 4 addresses network reconstruction.The theoretical analysis and development of the two steps (Sections 4.1 and 4.2) of the reconstruction problem are turned into a novel practical network identification procedure in Section 4.3.Analysis and reconstruction methods are then demonstrated on a toy network in Section 4.4.Final discussion is presented in Section 5.For mathematical formulas, we stick to common notation, adding explanations where needed.

Modelling of Biochemical Reaction Networks
A reaction network is a family of n ∈ N chemical species S 1 , . . ., S n and m ∈ N reactions R 1 , . . ., R m that may occur among them in a given reaction volume.The stoichiometry matrix of the network, S = [S 1 , . . ., S m ] ∈ Z n×m , is defined such that the ith row of S j is the net change in the number of molecules of S i when reaction R j occurs, with i = 1, . . ., n and j = 1, . . ., m.Let X i (t) be the number of molecules of S i at time t, and The evolution of X(t) depends on the random occurrence of reactions R j , thus X(t) is, in general, a random vector.Under suitable assumptions on the reaction volume and the physics of the reactions [25], one has that X(t) follows the laws of a Continuous-Time Markov Chain (CTMC), with the columns of S as possible state transitions and transition probabilities Reaction rates a(x, t) = [a 1 (x, t), . . ., a m (x, t)] T depend on the current abundance x of molecules of the different species, and possibly on time t e.g., in the presence of environmental perturbations.They entirely specify the time dynamics of P[X(t)] as expressed by the Chemical Master Equation [25].
We focus on reaction rates that are affine in the state, that is, where W ∈ R m×n and w 0 (t) : R → R m is piecewise continuous (W may as well depend on t, but we will not address this scenario here).This form is typical of networks comprising zeroth-and first-order reactions only, as dictated by the mass-action laws [26].In cellular biology, affine rates arise naturally in the modelling of gene expression (see later Section 3.3), and constitute the starting point for the (possibly approximate) modelling and analysis of regulatory networks [15].The choice of state-affine rates and possible generalizations are further discussed in Section 5. Note that, together, the patterns of nonzero entries of S and W define the network of regulatory interactions among the different species (see also [27]).Indeed, for every reaction j, the jth row of W tells what species regulate the reaction rate, whereas the jth column of S tells what species are affected by that reaction.Thus, if S i,j = 0 and W j,k = 0 for some j, X k directly regulates the dynamics of X i , whereas, if S i,j = 0 or W j,k = 0 for all j, X k does not exert a direct regulation of the dynamics of X i . Denote It can be shown that the time evolution of the mean vector µ and of the covariance matrix Σ obeys the so-called Moment Equations [16], with initial conditions µ(t 0 ) = µ 0 and Σ(t 0 ) = Σ 0 determined by the initial probability distribution of X at a time t 0 (we will conventionally fix this time to t 0 = 0).This is a set of linear equations in the entries of µ and Σ. Similar, though more intricate, equations could be written for higher-order moments [17], but we will not use them.Equation (2), describing the evolution of the process mean, also provides by definition the mean-field approximation of the system and, for networks with negligible random fluctuations, its deterministic dynamics.Equation (3) instead quantifies the strength of the stochastic fluctuations around the system mean.If rates are not in the form (1), the description of the moments and the mean-field approximation of the system dynamics are involved (see, e.g., [28,29]).

Vector Representation of Moment Equations
Thanks to linearity, Equations ( 2) and ( 3) can be rewritten in the form where z(t) is a vector formed by the entries of µ(t) and Σ(t), whereas A and B are matrices of appropriate dimension fixed by S and W.This representation is non-unique as it depends in particular on the ordering of the entries of z.One such representation is the following.Let "vec(•)" be the operation of stacking the columns of a matrix into a single column vector, and let "⊗" denote the Kronecker product. 2).( 5) Proof.We start by rewriting (3) in vector form.Let us drop index t from notation for brevity.By the properties of Kronecker product, one gets that In order to write the rightmost term in a more convenient form, observe that where W j denotes the j-th column of W. Therefore, vec(Sdiag(Wµ + w 0 )S T ) = vec(Sdiag(W 1 )S T ) , . . ., vec(Sdiag(W n )S T ) • µ + vec(Sdiag(w 0 )S T ).
Next note that, for any vector w of appropriate size, vec(Sdiag(w)S T ) = S (2) w.Then, we may write Together with Equation (2), this yields the result.
In the sequel, we let (5) be the definition of z, A and B. Note that this representation is redundant, since the upper-and lower-triangular part of the symmetric matrix Σ are both included in z.In addition, depending on the definition of process X(t), relationships among the entries µ and Σ may further reduce the effective dimensionality of the system [30].

Input-Output Model
For identification purposes, well-defined time-varying stimuli are usually applied to a biochemical network to explore its dynamics.These perturbation inputs typically enter the system in a way that is known only in part.Here, we focus on the scenario where the inputs act on rates w 0 (t).A natural manner to express the dependency of w 0 (t) on known inputs u(t) is to write where G is an m × n u matrix of (possibly null) constants, while u(t) is an n u -dimensional vector of (possibly constant) functions.In general, we let u(•) belong to the space of piecewise-continuous (vector) functions U .In the sequel, we will consider u(•) to be known and treat G as possibly unknown parameters, reflecting partially unknown effects of inputs of reaction rates.
A chief biological application of stochastic reaction networks is to describe variability of the individual cell dynamics in an isogenic population.In this context, the different random outcomes of X(t) correspond to different dynamics of individual cells in the same environment.Accordingly, Equations ( 2) and (3) describe how the statistics of X(t) over the cell population evolve over time.Depending on the biochemical system under consideration, several experimental techniques are available to monitor the evolution of these statistics, ranging from population-average to individual-cell measurements.From the mathematical viewpoint, they differ in the order of the moments and in the state variables that are monitored.As far as moments up to second order are concerned, we may describe the observed moments as where C is a matrix of size n y × (n + n 2 ) and z obeys Equation (4).Typically C is a (0, 1)-matrix that selects the observed entries of z, so that n y is simply the number of observed moments.For a sequence of N y measurement times t , with = 1, . . ., N y , we may then define the experimental measurements of y as ỹ = y(t ) + e , = 1, . . ., N y , where e accounts for measurement error.For population-average measurements, C only selects the entries of y from those of µ.Note that the selected entries of µ, corresponding to different species S i , can be observed in separate experiments, as long as the moment equations (that is, the network state statistics) are the same across all experiments.For individual-cell measurements obtained e.g., by flow-cytometry, C most often selects the entries of µ and diag(Σ) in accordance with the monitored species S i .Again, several species monitored in different experiments are accounted for at once by Equation (8) as long as z obeys the same moment equations across all experiments.A definition of C such that non-diagonal entries Σ i,i of Σ enter vector y instead subsumes experiments where species S i and S i are quantified simultaneously in individual cells (covariance of X i and X i cannot be determined from independent experiments separately targeting i and i ).In general, for the later mathematical developments, C can be any real matrix of appropriate size.Finally, the statistical model for the error terms e depends on the experimental technique.We will not discuss different possible error models here.Where needed in the sequel, following a common practice, we will assume the e to be mutually independent Gaussian random vectors, with the R 2 known (or estimated from the data).In connection with the discussion above, this model is notably advocated in [9] for flow-cytometry measurements.
In view of the above definitions, in vector form, the moment equation system relating inputs u to the observed outputs y is with z, A and B as in Equation ( 5), and time-sampled, noisy measurements of y as in Equations ( 8) and (9).

Identification of Parameters
Assuming S and C known, in this section, we consider the problem of estimating the reaction rate parameters W and G that define the model of Equations ( 10) and ( 11) via (5) from experimental data (8) and (9).Since µ 0 and Σ 0 are also unknown in most practical scenarios, we let z 0 = z(0) be an additional parameter vector to be estimated.We assume that u(t) is a known input vector of the reaction network, such as controlled environmental stimuli.The choice of appropriate inputs and observables, i.e., function u(•) and matrix C, in an experiment design phase is an intriguing and important subject, but its full treatment is beyond the scope of this paper.We will limit ourselves to commenting on this point when relevant.
In general, some entries of W, G and z 0 , such as null entries for species that do not participate in certain reactions or that are initially absent, may be known in advance.At the same time, as in the example of Section 3.3 later on, some of the unknown entries of W, G and z 0 may be identical by construction.Taking this into account, let θ = [θ 1 , . . ., θ N ] T be the vector of unknown, distinguished entries of W, G, and z 0 , in some convenient order.We assume that θ ∈ Θ, where Θ ⊆ R N is given.To avoid technical complicacies, we let Θ be nonempty and open, and elaborate on this hypothesis when appropriate.
The first question to be addressed is whether the values of θ can be uniquely resolved based on the system observables, regardless of the quality and frequency of the measurements.This is the problem of structural identifiability, which concerns properties of the system model and its observables.In our context, given model ( 10)- (11), it addresses the question whether the relationship between θ and y(•) is one-to-one.Structural identifiability is treated in Section 3.1.
The second question is the accuracy by which the model parameters can be estimated from actual data.This is the problem of practical identifiability.In addition to the model properties, it also concerns the quality and frequency of the data, as defined by Equation (8), and the properties of the measurement error.Intuitively, this question is only well-posed for systems that are structurally identifiable.However a formal definition of practical identifiability is not obvious and still open (see [1,4,6,20], among others).Rather than a theoretical discussion of practical identifiability, we will briefly review the Maximum Likelihood (ML) approach to identification from noisy data and related performance evaluation tools in Section 3.2, and discuss practical identifiability results in relation with estimation performance and structural identifiability on a numerical case study in Section 3.3.

Structural Identifiability
In this section, we review the formal notion of structural identifiability and develop novel methods to test structural identifiability for networks with affine rates.For any given θ ∈ Θ, let ŷθ be the corresponding solution y(t)| t≥0 of Equations ( 10) and (11), and let Structural identifiability is then a property of the model family (12), as expressed via the following classical definitions [31].(Notice that, for this definition, the hypothesis that set Θ is open is irrelevant, since a property holding almost everywhere in an open set also holds almost everywhere in its closure.)Structural identifiability can thus be understood in a global or local sense.In both cases, an equivalent formulation of structural identifiability can be given in terms of the Laplace transform of ŷθ , which we denote by Y(s, θ), with s ∈ C. Indeed, the model family in ( 12) is structurally identifiable whenever, for a.e.

Definition 1 (Identifiability at a point). The model family
holds for θ within some B θ * (local identifiability) or for all θ ∈ Θ (global identifiability) [31].Global identifiability is generally difficult to assess (we will discuss why by the case study of Section 3.3).
Local identifiability can instead be tested on the basis of the following approach.Let Proposition 2. Suppose that, for some L ∈ N, a set of points S L = {s 1 , . . ., s L } ⊆ R exists such that the matrix has full column rank.Then (12) is locally identifiable at θ * .
Proof.In the hypothesis of full column rank, This result, which is a variant of a family of rank conditions [32], provides a test for local identifiability at a given θ * in the sense of Definition 1(a), and applies to any Y(•, θ) differentiable with respect to θ.For the moment equations of our interest, it is easy to see that where U(s) is the Laplace transform of u(t)| t≥0 , and the writing A(W) indicates dependency of A on W. It can be verified by inspection that, for any fixed s, Equation ( 15) is a ratio of multivariate polynomials in the entries of W, G and z 0 , that is, the entries of θ.This allows us to strengthen Proposition 2 into a method to test local structural identifiability as per Definition 2.
Corollary 1.Given a set of points S L , if ∆(S L , θ * ) has full column rank for at least one value of θ * , then (12) is structurally locally identifiable.
Proof.As a function of θ, rank loss of ∆(S L , θ) relative to the rank of ∆(S L , θ * ) may only occur at the zeroes of the minors of ∆(S L , θ) that are nonzero for θ = θ * .In view of ( 15), these minors are multivariate polynomials in the entries of θ.N-variate polynomials are either identically null, or their zeroes form a variety of zero measure relative to the N-dimensional Lebesgue measure of Θ, that is, they are nonzero almost everywhere.The nonzero minors of ∆(S L , θ * ) are obviously not identically null, hence they are nonzero almost everywhere in Θ and guarantee that the rank of ∆(S L , θ) is no smaller than the rank of ∆(S L , θ * ) almost everywhere.Since by hypothesis ∆(S L , θ * ) is full rank, ∆(S L , θ) is full rank almost everywhere in Θ.By Proposition 2, one concludes that ( 12) is locally identifiable almost everywhere, i.e., it is structurally locally identifiable.
Thus, the method suggested by Corollary 1 amounts to evaluate the rank of matrix ∆(S L , θ * ) at suitable (even randomly generated) candidate points S L and θ * .One choice of S L and θ * yielding full rank suffices to conclude for structural identifiability.
A time-domain version of Proposition 2 can also be derived, resulting in the evaluation of the rank of the sensitivity matrix of the system outputs at sufficiently many time points (see e.g., [1] and references therein, and Equation ( 17) later on).Unfortunately, computation of time-domain sensitivities is entirely numerical (see [33]).We found that this makes the subsequent rank computation unreliable.Instead, for the moment equations of our concern, the calculation of the Laplace-domain sensitivity matrix ∆(S L , θ * ) can be performed explicitly or by standard symbolic calculation software, since the matrix is composed of rational functions.This allows one to confine numerical rank evaluation at a very last step, with great gain in numerical stability.Therefore, not only does Corollary 1 establish a theoretical result showing that, for the model class (12), structural identifiability at a point implies structural stability a.e., but it also provides a numerically robust method to test identifiability.We will further discuss implementation aspects in Section 3.3, where the method is demonstrated on a case study.
Note that, for fixed S L and θ * , the rank of ∆(S L , θ * ) increases by the addition of rows, that is, by the observation of more system statistics (in (15), C has more rows).In particular, this suggests that observation of stochastic variability of a system (e.g., of Σ(•)) favors identifiability of networks that are not identifiable from the sole observation of deterministic population-average dynamics (e.g., of µ(•)), in line with previous findings on specific case studies [8].Another instance of this fact is the case study of Section 3.3.
To conclude the section, suppose that input u(•) can be selected from within the class of functions U .The above identifiability definitions and results can be rephrased in terms of the existence of an element u ∈ U that makes (12) identifiable.In particular, the test of identifiability of Corollary 1 requires finding an element u(•) in U such that the corresponding ∆(S L , θ * ) is full rank (a suitable input is often easy to determine by a qualitative inspection of the problem).We will come back on this point in the discussion of Section 5.

Parameter Identification in Practice
For a parametric model that is structurally identifiable, the question turns to how accurately the unknown parameters can be estimated in practice from real (noisy, sampled) data.In this section, we consider existing approaches to assess practical identifiability, which will be used in Section 3.3 to elaborate on the results of Section 3.1 on a case study, and later on in Section 4.3 to devise a network reconstruction algorithm.As argued in [20], by the very name, practical identifiability should be a property independent of the specific estimation strategy.However, from this perspective, the question is hard to tackle except for special cases, and a posteriori strategies have been popularized trying to assess estimation uncertainty around the estimated paramater values [6].In this section, we focus on a widespread approach to parameter estimation, likelihood maximization, and discuss an equally common linearization approach to define estimation accuracy around the obtained parameter estimates [1,34].
Consider data (8).Under the statistical error model ( 9), the ML estimator of θ is defined as a vector θ such that (neglecting terms independent of θ, the weighted sum of squares E(θ) equals the negative log-likelihood of θ given the data [35]).Solution θ is unique (almost surely) as long as the finite sample set {y(t ) : = 1, . . ., N y } is sufficiently rich.Formally, this can be expressed by the condition that the sensitivity matrix has full rank (almost everywhere in Θ).In particular, this requires N y • h ≥ N, where h is the size of y.
As discussed in the previous section after Proposition 2, this coincides with the condition for structural identifiability expressed in the time domain.ML estimators enjoy well-known, strong theoretical properties that are mostly asymptotic in N y [35].For a finite dataset, it is of interest to quantify the (approximate) statistics of the estimation error θ − θ * , where θ * denotes the true value of θ that generated the data.Using the approximation ŷθ (t) where S θ * is matrix (17) computed at θ = θ * , one has that E[ θ] = θ * , and the estimation error covariance matrix V(θ * ) = Var( θ − θ * ) is equal to I(θ * ) −1 , where Matrix I(θ * ) is known as the Fisher Information Matrix (FIM, [1,4,6,34]).Provided θ is reasonably close to θ * , evaluating V at θ allows one to establish a confidence ellipsoid E α ( θ) that θ * belongs to with 1 − α probability, α ∈ (0, 1).For R known, with = 1, . . ., N y , this is given by the set where χ α is the (1 − α)-quantile of the χ 2 distribution with N degrees of freedom.In practice, computation of V requires computation of (17).This is easily done by the sensitivity equations [33], a system of ODEs that follows from the definition of ( 10) to be solved numerically alongside the latter.
To conclude, following up from the final comments to the previous section, note that the choice of the perturbation input u (but also the observation matrix C) may play an important role in the achievable estimation performance.In particular, u should make the system structurally identifiable, and should be chosen such that V is as small as possible in an appropriate sense.This is the subject of optimal experiment design [31], which we will not treat here.

Example: Reporter Gene Expression Dynamics
By the tools developed in the previous sections, in this section, we study identifiability of reporter gene systems.These are synthetic gene constructs that are commonly engineered into cells to monitor gene expression over time in terms of the synthesis of a fluorescent (or luminescent) protein [36].The illustration of a reporter gene is given in Figure 1.When the gene is expressed, the transcription of the genetic sequence that codes for the fluorescent protein leads to synthesis of corresponding mRNA molecules.These molecules are then transcribed in a second step, leading to the synthesis of fluorescent proteins.When the gene is switched off, synthesis of mRNA molecules is no longer possible until the gene is switched on again.The existing mRNA and protein molecules are subject to degradation throughout.Experimental measurement of fluorescence intensity allows for the quantification of the abundance of fluorescent reporter molecules over time, and thus gives an indirect dynamical readout of the gene expression state.In some cases, an additional maturation step of the reporter protein needs to be taken into account before the molecule becomes fluorescent.While this can be modelled in our mathematical framework, we will not address it here.Let M and P denote mRNA and protein species, in the same order.Let F be the active promoter species, such that gene expression is enabled only when F is present.The synthesis and degradation of mRNA and protein molecules described above are typically expressed by the reaction model for some nonnegative rate parameters k M , k P , d M and d P [14,37].In turn, for some nonnegative parameters λ + and λ − , the switching dynamics of F are expressed by two additional reactions for activation and deactivation, where the activation rate parameter r(λ + ) is equal to λ + in the inactive state, and to 0 otherwise (activation is possible only from the inactive state).At the level of single cells, Equations ( 19)-( 21) are known as the random telegraph model of gene expression [14].Let X 1 ∈ N and X 2 ∈ N be the number of molecules of species M and P, respectively.Let X 3 ∈ {0, 1} be the state of the gene promoter, that is, X 3 = 0 in the inactive state and X 3 = 1 in the active state.In accordance with the modelling of Section 2, one may describe X = [X 1 X 2 X 3 ] T as a Markov process [14].The rates of the m = 6 reactions ( 19)-( 21) (ordered from left to right, top to bottom) are These rates are in the affine form (1), with time-independent w 0 (t) = G.Rate parameters and stoichiometry matrix of the network are We consider that all rate parameters of reactions ( 19)- (21), that is all parameters defining W and G, are unknown, and wish to study their identifiability from the experimentally observed species P. The observation model (7), that is, matrix C, is thus such that y(t) = E[X 2 (t)], var X 2 (t) T .
We first investigate structural identifiability by the approach of Section 3.1.This requires in the first place to compute the Laplace transform Y(s, θ) and its sensitivity function ∇Y(s, θ).We implemented these computations in MATLAB (Release 2017b, The MathWorks, Inc., Natick, MA, USA) by the aid of the Symbolic calculation toolbox.Because Y(s, θ) is a rational (vector) function, this allows us to obtain explicit expressions for ∇Y(s, θ) in the symbolic variables s and θ in a straightforward manner.
Then, for a given set of test points S L and a given vector θ * , we evaluate ∇Y(s, θ) numerically at (s, θ) = (s , θ * ), for all values s ∈ S L .This allows us to build the sensitivity matrix ∆(S L , θ * ) and finally compute its rank.In the light of Corollary 1, the system is found identifiable if this rank is full.
For our case study, given the number of test points L, matrix ∆(S L , θ) has 2L rows.Therefore, the full column rank required by Corollary 1 to conclude for identifiability can only be obtained if 2L ≥ N, where N, the number of columns of ∆(S L , θ), is equal to the size of the parameter vector θ.This is made of six rate parameters plus the unknown initial conditions z 0 , for a total of N = 15 entries.Then, identifiability should be tested with L ≥ 8.We arbitrarily set L = 10 and choose points S L = {s l = l : l = 1, . . ., L}.Then, we sampled values θ * at random and computed the corresponding rank of ∆(S L , θ * ).We always found this rank to be full, that is, we concluded in each case that the reporter gene system is structurally locally identifiable.Whereas one such random evaluation suffices, the consistency of the result shows the effectiveness of the method, which does not depend on a critical choice of θ * or S L .
To verify how variance observations contribute to identifiability, by the same approach, we further investigated structural local identifiability from the observation of the reporter abundance (fluorescence) mean only.The method holds unchanged, provided a suitable redefinition of matrix C. In this case, ∆(S L , θ) has only L rows, and its rank should be tested for L ≥ N.For increasing values of L and parameter values θ * sampled at random as above, the maximal rank of this reduced version of ∆(S L , θ * ) was found to be equal to 7. That is, full column rank was never found.Since the rank condition of Corollary 1 is not necessary but only sufficient, we cannot state from this test that the reporter system lacks structural identifiability from the mean, yet the results are a strong hint toward non-identifiability.In fact, consider for simplicity z 0 = 0.The entry of Y(s, θ) associated with the mean is found to be Inspection of this formula reveals lack of identifiability, since parameters k M and k P only appear through the product k M k P .Thus, in this case, it is easy to see that changes in one parameter can be perfectly compensated by reciprocal changes in another parameter, such that unique values for these parameters cannot be fixed from the available observations.
To summarize, by the identifiability test of Section 3.1, we showed that the observation of reporter protein mean and variance profiles guarantees structural identifiability of the gene expression model, whereas the same model is not structurally identifiable from the mean only.This is similar to an analogous result in [8], where, however, global structural identifiability of a simpler gene expression model not accounting for switching promoter dynamics is investigated.Note that, for our case study, assessing global identifiability is rather challenging.For instance, by the existing approaches based on (13) [1,31], one would be confronted with an algebraic analysis of Y(s, θ), whose second entry is a ratio of polynomials of degree 10.Our local identifiability result was instead obtained by a symbolic computation approach that is fast, robust and fully general, as it can equally deal with any network in the class we consider.
For the same system, we now move on to a practical analysis of identifiability by the tools reported in Section 3.2.For simplicity, we fixed z 0 = 0 and assumed it known.We set the value of the rate parameters to θ * = (k M , d M , k P , d P , λ + , λ − ) = (0.5, 0.1, 0.2, 0.01, 0.05, 0.05) (in minutes −1 ).These values are consistent with the literature [36,37] and correspond to a case of slow switching of the gene expression state and a stable reporter protein.(At this θ * , the system is locally identifiable.)For convenience, let us denote by µ P and σ 2 P respectively the mean and variance of X 2 (the abundance of the reporter protein P), and use similar notation for the other species F and M. Simulated data were generated for this system for t ∈ [0, 95] (minutes).Random simulations were performed in STOCHKIT [38].We fixed measurement times to t = ( − 1) • 5 min, with = 1, . . ., N y and N y = 20.At these times, we computed empirical mean and variance statistics of X(t) from 10 4 simulated trajectories (different simulations were used at different measurement times to mimic statistically independent cell samples across ).This allowed us to to generate measurements of µ P and σ 2 P of the type ( 8) and (9).The obtained data is exemplified in Figure 2b.Based on 10 such datasets, we estimated θ * 10 times by numerical solution of (16).Estimation was performed in MATLAB (fmincon, with initial guess 5 • θ * ).Results are reported in Figure 2a together with 95% confidence regions determined from (18).First of all, it can be seen that the confidence regions deduced from the approximate estimation error covariance matrix V are in excellent agreement wth simulated estimates.Therefore, they provide an effective tool to study practical identifiability.In our case, one sees from the figure that the confidence regions for the joint estimation of k M and k P are nearly degenerate ellipses.Relating this with the structural identifiability analysis carried out before, one concludes that, whereas variance observations guarantee identifiability of the system, discerning k M and k P unambiguously from their product k M k P remains difficult.Further analysis of Figure 2a shows that discerning λ + from k M and k P is also difficult.This is not surprising since λ + itself multiplies k M and k P in the entry of Y(s, θ) displayed above (however λ + also appears at the denominator separate from the other parameters).Finally, it is interesting to look at the time profiles of the statistics of M and F corresponding to the 10 estimates of θ * .These are shown in Figure 2b.The estimates of µ F (•) and σ 2 F (•) are all close to the true profiles, while those of µ M (•) and σ 2 M (•) are significantly more dispersed.Apparently, the limited accuracy in the estimation of k M , k P and λ + is reflected in the estimates of the mRNA statistics, but not in those of gene activation statistics.This shows that, depending on the purpose of parameter identification, inaccurate parameter estimates may or not be detrimental for the study of the system.
In summary, the study of practical identifiability of the reporter gene system showed that structural identifiability translates into the ability to discern all parameters unambiguously, but with an accuracy that still depends on the interplay of the parameters.Moreover, certain system dynamics may still be predicted with accuracy in spite of poor estimates of relevant system parameters.This relates with the concept of "parameter sloppiness" discussed in the literature for biological systems [5].

Identification of Networks
As discussed in Section 2, the structure of regulatory interactions among species of a given network is captured by the pattern of nonzero entries of S and W. In Section 3, we have discussed the problem of identifying unknown parameters of a reaction network with a known structure of interactions.By the case study of Section 3.3, we have shown that usage of stochastic information, such as the dynamics of the network state variance, may guarantee structural parameter identifiability in cases where the sole usage of deterministic information, such as the mean system response, does not.In this section, we wish to investigate whether the addition of stochastic information also helps identification of networks with unknown interactions.In full generality, the problem thus becomes the identification of the matrices S ∈ Z n×m and W ∈ R m×n , defining models (10) and ( 11) via (5), from measurements of type (8).Matrix C, describing what statistical moments are observed for what species, is fixed by the measurement experiment design and is thus known.
In a first, naive attempt, one could approach identification of S and W by directly fitting measurements (8) with the predictions of ( 10) and (11).However, this is computationally intensive, since it requires solving the dynamics (10) repeatedly over the joint search space of S and W, and does not give any insight into the identifiability of S and W. On the other hand, Equations ( 10) and ( 11) are a linear dynamical system.Disregarding the specific structure of A and B, linear dynamical systems can be reconstructed by standard identification techniques [31,39].This suggests separating identification of S and W into two steps.The first step is the reconstruction of a dynamical model in the form ξ(t) = Âξ(t) + Ku(t), from measurements (8).For noiseless measurements, one such model perfectly matching the data is of course given by Equations ( 10) and (11), with z = ξ, Â = A, K = BG and Ĉ = C. Assuming for a moment that the reconstructed Â equals A and K = BG, the second step is the inference of S and W from Â and K in light of Equation (5).A similar approach is used for parameter estimation of a known network in [30].
A priori, none of the two steps has a unique solution.In the first step, several models ( Â, K, Ĉ) may equally explain the same data, that is, Â and K may not be uniquely defined (even their dimension is generally not uniquely determined).In addition, noise on measurement samples propagates into the computation of Â, K.In the second step, leaving G aside for the moment, the crucial question is whether the matrices S and W that define the network structure can be uniquely factored out of Â.As we will see, the answer depends in the first place on the relationship between Â and A.
In analogy with the discussion of Section 3.1, we will first look at the problem in terms of identifiability from y(•) and u(•).In Section 4.1, we will look at the reconstruction problem of a suitable model in the form (22).In Section 4.2, we will look at the subsequent problem of extracting S and W from the reconstructed Â.In order to understand how stochasticity contributes to the reconstruction of unknown networks, in these two sections, we will focus on a first case, where y(•) = µ(•), and a second case, where observations additionally include the entries of Σ(•), that is, y(•) = z(•).Then, in Section 4.3, we will look at these two steps from a practical perspective, providing a method to infer S and W from real data.The contribution of stochasticity to the identifiability of S and W and the full network identification procedure will be demonstrated on an example in Section 4.4.We will assume throughout this section that u(t) is assigned, leaving experiment design considerations to later studies (see Section 5).

Step 1: Identifiability of a Linear Model for the Moment Dynamics
Here, we are concerned with the reconstruction of a model in the form (22). Let us recall some facts [40].In the context of linear state-space models, a realization of a (strictly causal) linear map L : U → Y from an input space U to an output space Y is any triplet ( Â, K, Ĉ) such that the solution of ( 22) reproduces this mapping.Its order is the size of matrix Â.It is well known that, if ( Â, K, Ĉ) is one realization, all triplets (T ÂT −1 , T K, ĈT −1 ), with T a nonsingular square matrix, are equally viable realizations.A realization is minimal if it has the smallest size of Â among all realizations.If ( Â, K, Ĉ) is minimal, then all minimal realizations are of the form (T ÂT −1 , T K, ĈT −1 ) for some nonsingular T, i.e., minimal realizations form an equivalence class.
Recall that n y is the number of rows of C, that is, the size of vector y.We will make the following assumption.
Assumption 1.The minimal realization of the linear map expressed by (10) and ( 11) is of order greater than or equal to n y .In addition, for the given u, the class of minimal realizations is uniquely determined by u and the corresponding y.
The first part of this assumption concerns structural properties of the system, essentially ruling out singular cases where the moment equations provide a redundant description of the dynamics of y.As a counterexample, a system that would not fulfill this requirement is one where two observed species participate in the same reaction in exactly the same way, such that the evolution of their moments is identical.On the basis of the first part, the second part of the assumption concerns the informativity of the given input-output pair, requiring in practice that all observed moments are excited by the given input.In view of the relation between minimal realizations and transfer functions, this corresponds to saying that the system transfer function can be identified from the given input-output pair.This is a reasonable requirement provided a sufficiently rich excitation input [39,40].
In summary, under Assumption 1, identification is well-posed up to an unknown matrix T.However, if ( Â, K, Ĉ) is a minimal realization determined from u and y, T is constrained by the knowledge of C, since it must hold that ĈT −1 = C.What this constraint implies on the reconstructed model depends on how a minimal realization of the system itself looks like.We focus on the following cases: Case (i): Observation of mean only (y = µ).In this case, n y = n and C = C 0 n×n 2 , with C ∈ R n×n nonsingular (typically the identity).In view of the structure of A in (5), for this definition of C, one realization of ( 10) and ( 11) is This realization is of order n y = n and is minimal for non-degenerate definitions of S, W and G. Assumption 1 is thus satisfied provided the input and/or the initial conditions excite all system dynamics.Then, any reconstructed model ( Â, K, Ĉ) must satisfy ( Â, K, Ĉ) = (TSWT −1 , TSG, C T −1 ) for some invertible T. Since C is known and invertible, T = Ĉ−1 C is uniquely determined, and so are SW = T −1 ÂT and SG = T −1 K. Case (ii): Observation of mean and covariance matrix (y = z).Since Σ = Σ T , this case is captured by a model where C has n y = n + n(n + 1)/2 rows and n + n 2 columns.The definition of C is such that y = Cz = C ξ, where ξ is an n y -dimensional vector containing all and only the distinct entries of z, and C is invertible (in particular, C and C can be (0, 1)-matrices).One realization of ( 10) and ( 11) is then where A and K are formed from all and only the distinct entries of A and BG, in the same order, in accordance with the definition of ξ.This realization is of order n y , and is minimal except for peculiar definitions of S and W. Similar to Case (i), provided Assumption 1 is satisfied, any reconstructed model ( Â, K, Ĉ) must satisfy ( Â, K, Ĉ) = (TA T −1 , TK , C T −1 ), with T invertible.Because C is known, by the same arguments used in Case (i), matrices A and K are uniquely determined.Since they necessarily contain all elements of A and BG, the latter are also uniquely determined.
In summary, under Assumption 1, Case (i) guarantees unambiguous reconstruction of the products SW and SG, while Case (ii) also guarantees unambiguous reconstruction of the other blocks of A, namely S (2) W and I n ⊗ (SW) + (SW) ⊗ I n , and of the second block of BG, namely S (2) G.Other cases of interest exist, among which, for instance, the observation of the statistics of part of X, or the observation of the diagonal entries of Σ only [8,9,18].The investigation of these scenarios is beyond the scope of this paper, but is rediscussed in Section 5.

Step 2: Identifiability of the Network Stoichiometry and Rate Parameter Matrices
We have shown in the previous section that different information about the matrices of model ( 10) and ( 11) can be obtained depending on the definition of matrix C.This can be summarized in terms of the (n Under Assumption 1, Case (i) yields perfect reconstruction of the first block-row (first n rows) of Ξ(S, W, G), whereas Case (ii) also yields the remaining n 2 rows.Note that, for the latter case, the further availability of I n ⊗ (SW) + (SW) ⊗ I n does not provide additional information since the product SW is already part of the first n rows of Ξ(S, W, G).For these two cases, the question addressed in this section is what can be said about the individual contributions of S, W and G.
Denote with Ξ h (S, W, G) the first h rows of (23).For h = n and h = n + n 2 , in the order, denote with Ξh the estimate of Ξ h (S, W, G) obtained in Step 1 for Case (i) and Case (ii), in the same order.Under the current hypothesis of perfect reconstruction, it holds that The question we are posing is about the solutions (S, W, G) of ( 24) given Ξh .One solution to ( 24) is of course provided by the true network matrices, which we denote S * , W * , G * .Equivalent solutions are all triplets (S, W, G) obtained by permutation of the columns of S * and corresponding permutation of the rows of W * and G * , since these are trivially different enumerations of the network reactions R 1 , . . ., R m .However, other solutions may exist.Simple algebraic manipulations of ( 23) and ( 24) yield 2) .( Thus, for a given Ξh , the viable triplets (S, W, G) are all solutions to the mixed factorization problem (25) with discrete-valued factor L h (S) and continuous-valued factor [W G].Clearly, the set of solutions is generally smaller the larger the h, since more equations need to be satisfied.That is, Case (ii) has better potential than Case (i) for precise network reconstruction.Starting from (25), a characterization of the solutions in terms of algebraic or topological properties of the true underlying network would be desirable but is still unavailable to us.Our contribution here is an operational definition of all possible solutions, that is, an algorithmic procedure to seek all solutions corresponding to a given Ξh .
Let S ⊆ Z n×m , W ⊆ R m×n and G ⊆ R m×n u , with W and G convex.For any matrix norm ||| • |||, if S * ∈ S, W * ∈ W and G * ∈ G, the solutions of (25) coincide with the set of optimizers of and (S * , W * , G * ) attains the minimum Q * = 0. Problem (26) can be rewritten as Q(S) min For any fixed S, Equation ( 28) is a least-squares problem with convex constraints that is easy to solve by standard search algorithms [41].For finite S, the complete solution space of (27) can then be determined by exploration of S, seeking all S such that Q(S) = 0.In general, the solution of ( 28) is not unique.If S is such that Q(S) = 0 and (W, G) solves the corresponding problem (28), then all couples are in ker L h (S) ("ker(•)" denotes the kernel of a matrix) equally solve (28).The multiplicity of the solutions for a given S thus depends on the interplay between ker L h (S) and W × G.We will come back on this in the example of Section 4.4.
For all practical matters (robustness to numerical errors and modelling inaccuracies, and applications below where Ξh is computed from the noisy data (8), one should require that the equality in (25) holds approximately within a suitable tolerance > 0. This leads to the procedure for the computation of the set of solutions Ω detailed in Algorithm 1.
One possible choice of S is to consider all matrices S ∈ Z n×m whose elements S i,j are such that, for some S max ∈ N, |S i,j | ≤ S max , with i = 1, . . ., n and j = 1, . . ., m.The size of S in this case is of order O(S nm max ).Despite the exponentially growing complexity, exploring S by enumeration remains viable for networks of small size.The complexity of this search could be dramatically reduced by recalling that permutations of the reactions list amount to equivalent models.Other ameliorations are possible to improve the scalability of the method (see also the relevant discussion in Section 5).

Algorithm 1: Identification of stoichiometry and rate parameters from a model of the moment dynamics
Given Ξh and an > 0: Set Ω = ∅; For every S ∈ S: Solve problem (28) to get Q(S) and the solution set To conclude, suppose that the true number of reactions, say m * , is unknown.One way to generalize the procedure above to this scenario is to execute it for a value of m large enough.By this approach, let m ≤ m be the minimum number of nonzero columns of S among the elements of Ω.Since the null columns of S do not contribute to the network dynamics, m is the minimum number of reactions needed to explain the data and is thus a viable estimate of m.In practice, though, this approach is computationally inefficient, since it explores an unnecessarily large space S. A better solution is to proceed incrementally and execute the algorithm for increasing values of m, stopping the exploration as soon as Ω = ∅.

Network Identification in Practice
So far in this section, we have treated reconstruction of moment dynamics (Step 1) and then of biochemical network matrices (Step 2) in the absence of noise.We now devise a procedure for the two-step estimation of S, W and G from noisy moment measurements of the types ( 8) and (9).To do this, we will repeatedly exploit the methods of Section 3.2, with a specific definition of parameters θ.For ease of exposition, we focus on Case (ii) and assume that the initial system moments z 0 are known.The generalization to possible unknown entries of z 0 is straightforward.The necessary adaptations for Case (i) are commented on at the end of the section.
Consider Step 1 first.As mentioned in Section 4.1, general approaches to the estimation of linear state-space models can be borrowed from literature.Here, however, we account explicitly for the structure of model ( 5) and develop a dedicated approach.Let θ 1,1 , θ 1,2 , θ 2,1 and θ 2,2 denote the unknown matrix products SW, SG, S (2) W and S (2) G that compose (23), in the same order, and let θ be the vector collecting all entries of θ 1,1 , θ 1,2 , θ 2,1 and θ 2,2 .Define ŷθ (•) as the solution of Note that, for true parameter values, these equations coincide with the true moment dynamics ( 10) and (11).With these definitions of θ and ŷθ , for a suitable search space Θ, we compute the ML estimate θ by the solution of ( 16), which provides us with estimates of SW, SG, S (2) W and S (2) G.In practice, this solution shall be computed by numerical optimization.Now, consider Step 2. The estimates θ of SW, SG, S (2) W and S (2) G from above allow us to build (a noisy version of) the matrix Ξh of Section 4.2 (with h = n + n 2 for Case (ii)).This matrix can be used to run Algorithm 1 and determine a set of solution triplets (S, W, G).However, for a candidate solution (S, W, G), the acceptance criterion Q(S, W, G) < (with Q(S, W, G) = ||| Ξh − L h (S) • [W, G]|||) needs to be adapted to the statistics of the noise that corrupts Ξh .To do this, we first compute the estimation error covariance matrix for Ξh , say V h , by the sensitivity method explained after (16).Armed with V h , for a given confidence level 1 − α, the idea is to define the norm ||| • ||| and such that a candidate solution (S, W, G) is accepted by Algorithm 1 if the discrepancy between L h (S) • [W G] and Ξh falls within the (1 − α)-confidence ellipsoid around Ξh .In view of (18), this is obtained by setting = χ α and In this way, our acceptance criterion is expressed as a χ 2 -statistical test that the candidate solution corresponds to the true parameters underlying our estimate Ξh .
For Case (i), the method described above remains the same, except that θ shall only contain the entries of θ 1,1 and θ 1,2 , model ( 29) and ( 30) is restricted to the mean dynamics, and Ξh , L h are defined for h = n.From a computational viewpoint, our two-step approach requires fitting moment equations to measurements (8) only once (Step 1).On the contrary, the naive approach discussed at the beginning of Section 4 would require solving a similar optimization problem iteratively for all candidate stoichiometries S. Our approach postpones this search to the second step based on the iterative solution of a much simpler, convex optimization problem, with tremendous computational saving.Observe that the error statistics in the estimation of matrix products S * W * and (S * ) (2) W * from Step 1 are quantitatively accounted for in the construction of the test to select the compatible network structures in Step 2. Therefore, provided the ML estimator in Step 1 is close to optimality, the splitting of network identification into two steps is not expected to deteriorate performance relative to a one-step procedure testing candidate network structures directly on the moment measurements.
In summary, we have devised an algorithm to perform reconstruction of biochemical networks from real-world population-snapshot data.The method can be applied to mean data only, but, contrary to existing approaches, it equally applies to joint mean and variance measurements.We argued by theoretical arguments that leveraging the additional variance measurements is expected to improve reconstruction accuracy.In the next section, we will show that this is indeed the case by the analysis of a case study, which reconfirms the practical interest of our method.

Example: A Toy Network
We now apply the methods developed above to a toy example.Our first aim is to study the achievable network reconstruction performance in Case (i) (observations of mean only) and (ii) (observations of mean and covariance matrix).To do this, we initially focus on the network reconstruction step described in Section 4.2, assuming that the matrix products S * W * and, for Case (ii), (S * ) (2) W * are known exactly.For simplicity, we ignore G.This constitutes no loss of generality since G enters the factorization problem (25) in a way similar to W. Our second aim is to test the full identification procedure of Section 4.3, where the products S * W * and (S * ) (2) W * are not known and need to be estimated from noisy measurements of the network moments.This is done at the end of the section on the basis of simulated data.
Consider a reaction network with where superscript " * " denotes true system quantities.The actual network size is then n * = 2 and m * = 3.For this system, the matrix products defining the moment dynamics are First assume that, under Assumption 1, these products are perfectly reconstructed from the observation of the system moments.More precisely, in agreement with Section 4.1, S * W * is available for Case (i) and (ii), whereas (S * ) (2) W * is additionally available for Case (ii) only.In order to infer S * and W * from this data by the procedure of Section 4.2, we assumed that S is the set of all integer matrices with entries |S i,j | ≤ S max , with S max = 2, and W = R m×n + .We first tested the approach with the number of reaction channels known and equal to the true value m = m * = 3.The size of S in this case is (2S max + 1) n•m = 5 6 = 15, 625 possible stoichiometry matrices.For every candidate stoichiometry S, the computation of Q(S) and Ω was performed via the Matlab function lsqnonneg, implementing quadratic optimization under nonnegativity constraints.To cope with numerical errors in the solution of this optimization, here we take = 10 −6 .Results are summarized in Table 1 (column with m = 3).In Case (i), we found 2604 stoichiometry matrices S such that Q(S) < , i.e., about 16.7% of the matrices in S can explain the data, provided an adapted choice of rate parameters W. For the given definition of W (nonnegative real vectors), if (S, W) is a solution and the kernel of S contains a nonnegative vector W ∈ W, then all couples (S, W + c W) with c ≥ 0 are solutions, since c • W ∈ W and S • (W + c • W) = SW (note that the kernel of S is nontrivial for m > 2).Thus, in general, infinitely many solutions (S, W) correspond to a viable S.
In Case (ii), only six matrices S were found such that Q(S) < , i.e., only about 0.038% of all possible stoichiometries S is in agreement with the data.This is of course due to the different definition of Q, which also penalizes poor fit with (S * ) (2) W * .In this case, W is uniquely determined by S, since W ∈ W should be sought in ker(S) ∩ ker(S (2) ), and all S that were selected are such that ker(S) ∩ ker(S (2) ) = {0}.Compared with Case (i) above, the gain in using observations of the second-order moments is thus striking.
In both cases, as anticipated in Section 4.2, the solutions found are redundant, since they correspond to the same reactions listed in a different order.In particular, we verified by inspection that the six solutions (S, W) in Case (ii) are given by all possible column permutations of S * and corresponding permutations of the rows of W * .Thus, in this case, the solution found is essentially unique.In Case (i), these 6 solutions are instead contained in a much larger pool of 2604 putative solutions.
We then run the algorithm for values of m different from m * in order to test the feasilibility of its estimation.In Case (i), a nonempty solution set Ω was returned for m ≥ 2. This is not surprising since the columns of S * W * belong to a two-dimensional linear space.Therefore, in general, S must have at least two columns in order to span this space via the product SW.At the same time, this shows that m * will be underestimated based on sole mean measurements.In Case (ii), instead, solutions with m < m * were ruled out, showing the potential of joint mean and variance measurements for the estimation of m * .In both Case (i) and (ii), for m = 4, many putative solutions are found from within the pool of 5 8 = 390, 625 stoichiometries tested.Putative solutions include correct solutions where S has one null column and the remaining columns given by a permutation of the columns of S * .As expected, alternative pairs (S, W) that do not relate with (S * , W * ) as also returned in Ω, as a result of the over-parametrization of the model in this case.However, Table 1 confirms that the covariance measurements collected in Case (ii) guarantee a tighter selection of the solution set.
The computational time to run the algorithm increases rapidly with m, as expected from the exponential complexity of the search, and it is similar for Case (i) and (ii).In scenarios where m * is unknown, this shows the importance of testing candidate values m incrementally.In the given example, stopping the search with the solutions found for m = 3 (overall execution time less than four seconds) allows one to spare much computational time associated with the exploration of solutions with m = 4 (between one and two minutes).
Finally, for Case (ii), we implemented and run the full identification procedure of Section 4.3 from simulated data.Here, the products S * W * and (S * ) (2) W * are not known and are estimated from noisy mean and variance measurements.These measurements were obtained by simulating the moment Equations ( 10) and ( 11) with the true S * and W * from initial conditions known and fixed to µ(0) = [100, 0] T and Σ(0) null.We generated measurements at times t = ( − 1) • T, with = 1, . . ., N y , where T = 5 and N y = 20, adding random noise with covariance matrix R = diag(0.3 2 , 0.3 2 ) for all , corresponding to roughly 1% error on the observed mean and variance profiles.An example simulated dataset is shown in Figure 3a.The numerical optimization in the first step of the procedure, yielding noisy estimates of the matrix products S * W * and (S * ) (2) W * , was implemented by MATLAB's lsqnonlin.Results are illustrated in Figure 3b.Estimates are found to be well-centered and little dispersed relative to the true values of the entries of S * W * and (S * ) (2) W * , showing the effectiveness of the reconstruction of the moment dynamics.Starting from the noisy estimates of S * W * and (S * ) (2) W * , the second step was implemented as described in Section 4.3, for a significance level of α = 0.05%.With this noise and significance levels, for m ∈ {1, 2, 3, 4}, the same solutions as in Table 1 were returned over several runs, showing feasibility and effectiveness of the approach.For the case of m = m * , in particular, we quantified the rate of rejection of correct candidate solutions.Over a few hundred runs, we found this rate to be around 1%, that is, smaller than the prescribed rate α.This can be ascribed to the linear approximations made to establish the χ 2 statistic in Section 3.2.On the other hand, incorrect solutions were never accepted.
To sum up, we showed that stochastic information about the network dynamics allows one to identify the structure of a biochemical reaction network in cases where the sole mean data does not.In addition, in the scenario where the observable moments provide full information for network reconstruction, we showed that our two-step procedure is capable of correctly identifying the example network from data with realistic measurement error levels.

Discussion
In this paper, we have investigated the problems of parameter identification and reconstruction of interactions in biochemical reaction networks.In the context of population snapshot data (firstand second-order statistics), and state-affine reaction rates, we have provided practical methods to study identifiability of unknown parameters and algorithms to address network reconstruction.In both cases, we have shown superiority of stochastic, single-cell approaches over deterministic, population-average data.In so doing, we have also extended the existing results on identifiability of gene expression models.
Our parameter identifiability analysis is developed with reference to a given observation model (what statistics of what species are detected) and a given perturbation input.A desirable theoretical advance of our work is the investigation of identifiability as a property independent of a specific input choice.From a practical standpoint, instead, evaluating structural and practical identifiability for different inputs and observation models allows one to design most informative experiments and, for synthetic biology, the engineering of most informative intracellular reporters.Developments of our work in the context of optimal experiment design indeed constitute a first important direction of future investigation.
Our results were developed for a class of problems of immediate relevance to applications.Population-snapshot data are nowaday easily collected by simple experiments such as flow-cytometry.More complex experimental setup, such as microfuidics in combination with video-microscopy, also provide this type of data.In addition, these allow for time-lapse monitoring of individual cells, providing time correlations that we did not account for here.However, this requires nontrivial processing of raw images entailing single-cell tracking procedures, which are not always performed in practice.Instead, for network reconstruction, more complex measurement scenarios, such as the monitoring of a subset of the network species and lack of covariance measurements across different species, shall be addressed in detail to widen the applicability of our methods.
Of course, the choice of networks with affine rates formally restricts one to reactions of zerothor first-order only.However, as the random-telegraph reporter gene model exemplifies, real systems of interest exist in this form.At the same time, state-affine rates have been used to model and investigate interactions of arbitrarily complex networks in an approximate manner [15].Precise account of nonlinear stochastic dynamics instead requires generalization of our methods, for instance, with moment-closure approaches [29,42,43].This is another direction of research.
Finally, the proposed network reconstruction methods were shown to be viable for a small toy example.Whereas automated reconstruction of small networks is of practical interest, network reconstruction methods are of greatest help for the investigation of larger networks of interactions.Our current algorithm performs fast on small networks but scales poorly with the network size (number of species and putative reaction channels).As anticipated in Section 4.2, great improvements can be easily obtained by an optimized implementation that avoids the exploration of redundant network candidates.In particular, considering that not only permutations of the reactions list but also identical columns of a stoichiometry matrix are redundant, it can be seen that the number of effectively distinct stoichiometry matrices to be tested is in the order of ( S n

Figure 1 .
Figure 1.Reporter gene system.The coding sequence of a fluorescent reporter protein is engineered into a gene of interest.The gene promoter can switch between an inactive (off) and an active (on) state.When active, transcription of reporter mRNA molecules is enabled.Existing mRNA molecules are further translated into visible (quantifiable) reporter protein molecules.Both mRNA and protein molecules are subject to degradation.

Figure 2 .
Figure 2. Parameter estimation results.(a) scatter plots of the estimates of parameters (k M , d M , k P , d P , λ + , λ − ) from 10 simulated datasets (blue dots) and theoretically computed 95% confidence regions (red lines).Results for all different pairs of these parameters are reported in the different boxes, as per labeling on top and left of the figure.Estimated values and pairwise confidence ellipsoids (one-dimensional confidence intervals for boxes on the diagonal) are normalized by the true parameter values.Dashed lines show the reference coordinates (1, 1) corresponding to true values; (b) estimated dynamics of the system means (left) and variances (right), corresponding to the 10 different estimates of θ * in (a).Solid blue lines show estimates, dashed black lines show true system statistics.In the bottom plots, black circles show measurements used for one of these estimates.

Figure 3 .
Figure 3. Simulated measurements and estimates of S * W * and (S * ) (2) W * for the network reconstruction example.(a) true trajectories of the entries of µ and Σ (solid black line) and one simulated dataset (blue markers); (b) estimates of the entries of S * W * and (S * ) (2) W * (blue markers) obtained from 10 different datasets (dashed black lines indicate true values).Notation (•) r,c is used in labels to denote the row-r, column-c entry of a matrix.

Table 1 .
Network reconstruction results for Case (i) and Case (ii), for different hypotheses on the number of reactions m.Number of solutions refers to the number of different stoichiometry matrices in Ω. Acceptance ratio is the number of solutions divided by the number of stoichiometry matrices tested, given by |S| = 5 2m .Computational times are in seconds, evaluated on a 4-core 3GHz Intel Xeon processor (Santa Clara, CA, USA).Results for the true number of reactions (m = m * ) are reported in bold.