# Identifiability and Reconstruction of Biochemical Reaction Networks from Population Snapshot Data

## Abstract


## 1. Introduction

## 2. Modelling of Biochemical Reaction Networks

#### 2.1. Vector Representation of Moment Equations

**Proposition 1.**

**Proof.**

#### 2.2. Input–Output Model

## 3. Identification of Parameters

#### 3.1. Structural Identifiability

**Definition 1** (Identifiability at a point)**.**

- (a) locally identifiable at ${\theta}^{\ast}\in \Theta$ if, for some neighborhood ${B}_{{\theta}^{\ast}}\subseteq \Theta$ of ${\theta}^{\ast}$, $${\widehat{y}}_{\theta}={\widehat{y}}_{{\theta}^{\ast}}\Rightarrow \theta ={\theta}^{\ast},\quad \forall \theta \in {B}_{{\theta}^{\ast}};$$
- (b) globally identifiable at ${\theta}^{\ast}$ if the implication above holds for ${B}_{{\theta}^{\ast}}=\Theta$.
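
The difference between the two notions can be illustrated numerically. The sketch below uses a hypothetical scalar output map $y_\theta(t)=e^{-\theta^{2}t}$ with $\Theta=\mathbb{R}$; this map, the sampling times and the names `output` and `same_output` are assumptions made for illustration and are not taken from the model of Section 2. Because $\theta$ and $-\theta$ produce the same output, the toy model is not globally identifiable at $\theta^{\ast}\neq 0$, while no other parameter value sampled in a small neighborhood of $\theta^{\ast}$ reproduces the output. The sampled check is only suggestive of local identifiability and is not a proof.

```python
import math

def output(theta, times):
    """Hypothetical scalar output map y_theta(t) = exp(-theta**2 * t)."""
    return [math.exp(-theta**2 * t) for t in times]

def same_output(theta_a, theta_b, times, tol=1e-12):
    """True if the two parameter values give the same sampled output."""
    ya, yb = output(theta_a, times), output(theta_b, times)
    return all(abs(a - b) <= tol for a, b in zip(ya, yb))

times = [0.0, 0.5, 1.0, 2.0]   # assumed sampling times
theta_star = 1.5               # assumed reference parameter value

# Global identifiability on Theta = R fails: -theta_star gives the same output.
global_counterexample = same_output(theta_star, -theta_star, times)

# On a grid covering the neighborhood [1.0, 2.0] of theta_star (theta_star
# itself excluded), no sampled parameter value reproduces the output.
neighborhood = [theta_star + 0.01 * k for k in range(-50, 51) if k != 0]
locally_distinct = not any(same_output(th, theta_star, times) for th in neighborhood)

print(global_counterexample, locally_distinct)
```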

**Definition 2** (Structural identifiability)**.**

**Proposition 2.**

**Proof.**

**Corollary 1.**

**Proof.**

#### 3.2. Parameter Identification in Practice

#### 3.3. Example: Reporter Gene Expression Dynamics

`fmincon`, with initial guess $5\cdot{\theta}^{\ast}$). Results are reported in Figure 2a together with $95\%$ confidence regions determined from (18).

## 4. Identification of Networks

#### 4.1. Step 1: Identifiability of a Linear Model for the Moment Dynamics

**Assumption 1.**

- Case (i): Observation of mean only ($y=\mu$). In this case, ${n}_{y}=n$ and $C=\left[\begin{array}{cc}{C}^{\prime}& {0}_{n\times {n}^{2}}\end{array}\right]$, with ${C}^{\prime}\in {\mathbb{R}}^{n\times n}$ nonsingular (typically the identity). In view of the structure of $A$ in (5), for this definition of $C$, one realization of (10) and (11) is $$\begin{aligned}\dot{\mu}(t)& =SW\mu(t)+SGu(t),\\ y(t)& ={C}^{\prime}\mu(t).\end{aligned}$$ This realization is of order ${n}_{y}=n$ and is minimal for non-degenerate definitions of $S$, $W$ and $G$. Assumption 1 is thus satisfied provided the input and/or the initial conditions excite all system dynamics. Then, any reconstructed model $(\widehat{A},\widehat{K},\widehat{C})$ must satisfy $(\widehat{A},\widehat{K},\widehat{C})=(TSW{T}^{-1},TSG,{C}^{\prime}{T}^{-1})$ for some invertible $T$. Since ${C}^{\prime}$ is known and invertible, $T={\widehat{C}}^{-1}{C}^{\prime}$ is uniquely determined, and so are $SW={T}^{-1}\widehat{A}T$ and $SG={T}^{-1}\widehat{K}$.
- Case (ii): Observation of mean and covariance matrix ($y=z$). Since $\Sigma ={\Sigma}^{T}$, this case is captured by a model where $C$ has ${n}_{y}=n+n(n+1)/2$ rows and $n+{n}^{2}$ columns. The definition of $C$ is such that $y=Cz={C}^{\prime\prime}\xi$, where $\xi$ is an ${n}_{y}$-dimensional vector containing all and only the distinct entries of $z$, and ${C}^{\prime\prime}$ is invertible (in particular, $C$ and ${C}^{\prime\prime}$ can be $(0,1)$-matrices). One realization of (10) and (11) is then $$\begin{aligned}\dot{\xi}(t)& ={A}^{\prime\prime}\xi(t)+{K}^{\prime\prime}u(t),\\ y(t)& ={C}^{\prime\prime}\xi(t),\end{aligned}$$
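
For Case (i), the recovery of $SW$ and $SG$ from an identified triple $(\widehat{A},\widehat{K},\widehat{C})$ can be checked on a small numerical example. The following is a minimal sketch for $n=2$; the numerical values of `SW`, `SG`, `C_prime` and `T_true` are hypothetical and chosen only for illustration, and the helpers `matmul` and `inv2` are plain-Python stand-ins for a linear algebra library. The identified model is simulated by applying an arbitrary change of basis, and the formulas $T={\widehat{C}}^{-1}{C}^{\prime}$, $SW={T}^{-1}\widehat{A}T$ and $SG={T}^{-1}\widehat{K}$ are then applied.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical "true" quantities for n = 2 species and one input.
SW = [[-1.0, 0.0], [2.0, -3.0]]
SG = [[1.0], [0.0]]
C_prime = [[1.0, 0.0], [0.0, 1.0]]   # known, invertible output matrix

# Arbitrary invertible T standing in for the unknown basis of the identified model.
T_true = [[2.0, 1.0], [1.0, 1.0]]
T_true_inv = inv2(T_true)
A_hat = matmul(matmul(T_true, SW), T_true_inv)   # T SW T^{-1}
K_hat = matmul(T_true, SG)                       # T SG
C_hat = matmul(C_prime, T_true_inv)              # C' T^{-1}

# Recovery step: T = C_hat^{-1} C', then SW = T^{-1} A_hat T and SG = T^{-1} K_hat.
T = matmul(inv2(C_hat), C_prime)
T_inv = inv2(T)
SW_rec = matmul(matmul(T_inv, A_hat), T)
SG_rec = matmul(T_inv, K_hat)

print(SW_rec, SG_rec)
```

With noisy estimates the recovered matrices would only approximate $SW$ and $SG$; the sketch covers the exact, noise-free case.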

#### 4.2. Step 2: Identifiability of the Network Stoichiometry and Rate Parameter Matrices

**Algorithm 1:** Identification of stoichiometry and rate parameters from a model of the moment dynamics

Given ${\widehat{\Xi}}_{h}$ and an $\epsilon>0$:

- Set $\Omega =\varnothing$;
- For every $S\in \mathbb{S}$:
  - Solve problem (28) to get $\widehat{Q}\left(S\right)$ and the solution set $\widehat{\Omega}\left(S\right)=\{(W,G):\ Q(S,W,G)=\widehat{Q}\left(S\right)\}$;
  - If $\widehat{Q}\left(S\right)<\epsilon$, include $\left\{S\right\}\times \widehat{\Omega}\left(S\right)$ in $\Omega$;
- Return $\Omega$.
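
The enumeration loop of Algorithm 1 can be sketched in a few lines for a deliberately reduced setting. Everything specific in the sketch is an assumption: problem (28) is not reproduced here, so the cost $Q$ is taken to be the squared mismatch between $SW$ and an estimate of it (mean-only observation, no input term $G$); the network has two species and a single reaction ($m=1$), in which case the nonnegative least-squares subproblem has a closed-form solution; and the entries of $S$ range over $\{-2,\dots,2\}$. The function names `best_rates` and `identify` are hypothetical.

```python
from itertools import product

def best_rates(s, xi):
    """Closed-form nonnegative least squares for a single reaction (m = 1).

    Minimizes sum_ij (s[i] * w[j] - xi[i][j])**2 over w >= 0, where s is the
    single column of S and w the single row of W. The problem separates over
    j, and each scalar problem is solved by clipping the unconstrained
    minimizer at zero.
    """
    n_rows, n_cols = len(xi), len(xi[0])
    norm2 = sum(si * si for si in s)
    if norm2 == 0:
        w = [0.0] * n_cols
    else:
        w = [max(0.0, sum(s[i] * xi[i][j] for i in range(n_rows)) / norm2)
             for j in range(n_cols)]
    q = sum((s[i] * w[j] - xi[i][j]) ** 2
            for i in range(n_rows) for j in range(n_cols))
    return q, w

def identify(xi_hat, entries=(-2, -1, 0, 1, 2), eps=1e-6):
    """Enumerate candidate stoichiometry columns and keep those with cost < eps."""
    omega = []
    for s in product(entries, repeat=len(xi_hat)):
        q_hat, w_hat = best_rates(s, xi_hat)
        if q_hat < eps:
            omega.append((s, w_hat))
    return omega

# Assumed noise-free estimate of SW for degradation of species 1 at rate d.
d = 0.5
xi_hat = [[-d, 0.0], [0.0, 0.0]]
solutions = identify(xi_hat)
for s, w in solutions:
    print(s, w)
```

In this toy case the loop keeps two candidates, $S=(-2,0)^{T}$ with rate $d/2$ and $S=(-1,0)^{T}$ with rate $d$, which give the same product $SW$. For $m>1$ the closed form no longer applies and a general nonnegative least-squares solver is needed.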

#### 4.3. Network Identification in Practice

#### 4.4. Example: A Toy Network

`lsqnonneg`, implementing quadratic optimization under nonnegativity constraints. To cope with numerical errors in the solution of this optimization, here we take $\epsilon={10}^{-6}$. Results are summarized in Table 1 (column with $m=3$).

`lsqnonlin`. Results are illustrated in Figure 3b. Estimates are well centered on, and show little dispersion around, the true values of the entries of ${S}^{\ast}{W}^{\ast}$ and ${\left({S}^{\ast}\right)}^{\left(2\right)}{W}^{\ast}$, showing the effectiveness of the reconstruction of the moment dynamics. Starting from the noisy estimates of ${S}^{\ast}{W}^{\ast}$ and ${\left({S}^{\ast}\right)}^{\left(2\right)}{W}^{\ast}$, the second step was implemented as described in Section 4.3, for a significance level of $\alpha =0.05$. With these noise and significance levels, for $m\in \{1,2,3,4\}$, the same solutions as in Table 1 were returned over several runs, showing feasibility and effectiveness of the approach. For the case of $m={m}^{\ast}$, in particular, we quantified the rate of rejection of correct candidate solutions. Over a few hundred runs, we found this rate to be around $1\%$, that is, smaller than the prescribed rate $\alpha$. This can be ascribed to the linear approximations made to establish the ${\chi}^{2}$ statistic in Section 3.2. On the other hand, incorrect solutions were never accepted.

## 5. Discussion

## Funding

## Conflicts of Interest


**Figure 1.** Reporter gene system. The coding sequence of a fluorescent reporter protein is engineered into a gene of interest. The gene promoter can switch between an inactive (off) and an active (on) state. When active, transcription of reporter mRNA molecules is enabled. Existing mRNA molecules are further translated into visible (quantifiable) reporter protein molecules. Both mRNA and protein molecules are subject to degradation.

**Figure 2.** Parameter estimation results. (**a**) Scatter plots of the estimates of parameters $({k}_{M},{d}_{M},{k}_{P},{d}_{P},{\lambda}_{+},{\lambda}_{-})$ from 10 simulated datasets (blue dots) and theoretically computed $95\%$ confidence regions (red lines). Results for all different pairs of these parameters are reported in the different boxes, as per labeling on top and left of the figure. Estimated values and pairwise confidence ellipsoids (one-dimensional confidence intervals for boxes on the diagonal) are normalized by the true parameter values. Dashed lines show the reference coordinates $(1,1)$ corresponding to true values; (**b**) estimated dynamics of the system means (left) and variances (right), corresponding to the 10 different estimates of ${\theta}^{\ast}$ in (a). Solid blue lines show estimates, dashed black lines show true system statistics. In the bottom plots, black circles show measurements used for one of these estimates.

**Figure 3.** Simulated measurements and estimates of ${S}^{\ast}{W}^{\ast}$ and ${\left({S}^{\ast}\right)}^{\left(2\right)}{W}^{\ast}$ for the network reconstruction example. (**a**) True trajectories of the entries of $\mu$ and $\underline{\Sigma}$ (solid black line) and one simulated dataset (blue markers); (**b**) estimates of the entries of ${S}^{\ast}{W}^{\ast}$ and ${\left({S}^{\ast}\right)}^{\left(2\right)}{W}^{\ast}$ (blue markers) obtained from 10 different datasets (dashed black lines indicate true values). Notation ${(\cdot)}_{r,c}$ is used in labels to denote the row-$r$, column-$c$ entry of a matrix.

**Table 1.** Network reconstruction results for Case (i) and Case (ii), for different hypotheses on the number of reactions $m$. Number of solutions refers to the number of different stoichiometry matrices in $\Omega$. Acceptance ratio is the number of solutions divided by the number of stoichiometry matrices tested, given by $\left|\mathbb{S}\right|={5}^{2m}$. Computational times are in seconds, evaluated on a 4-core 3 GHz Intel Xeon processor (Santa Clara, CA, USA). Results for the true number of reactions ($m={m}^{\ast}$) are reported in bold.

| | $m$ | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Case (i) | Number of solutions | 0 | 4 | $\mathbf{2604}$ | $150{,}172$ |
| | Acceptance ratio | $0\%$ | $0.64\%$ | $\mathbf{16.7}\%$ | $38.4\%$ |
| | Computational time (s) | <$0.01$ | $0.11$ | $\mathbf{2.86}$ | $75.13$ |
| Case (ii) | Number of solutions | 0 | 0 | $\mathbf{6}$ | 564 |
| | Acceptance ratio | $0\%$ | $0\%$ | $\mathbf{0.038}\%$ | $0.14\%$ |
| | Computational time (s) | $0.01$ | $0.12$ | $\mathbf{3.11}$ | $80.57$ |
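
The acceptance ratios in Table 1 follow directly from the number of solutions and the size of the search space, $\left|\mathbb{S}\right|={5}^{2m}$. The short check below recomputes them from the tabulated counts; the dictionary layout and the function name `acceptance_ratio` are illustrative choices.

```python
# Number of solutions reported in Table 1, indexed by the hypothesized m.
solutions = {
    "Case (i)": {1: 0, 2: 4, 3: 2604, 4: 150172},
    "Case (ii)": {1: 0, 2: 0, 3: 6, 4: 564},
}

def acceptance_ratio(n_solutions, m):
    """Percentage of the 5**(2*m) tested stoichiometry matrices that are accepted."""
    return 100.0 * n_solutions / 5 ** (2 * m)

ratios = {
    case: {m: acceptance_ratio(count, m) for m, count in counts.items()}
    for case, counts in solutions.items()
}

for case, by_m in ratios.items():
    for m, ratio in by_m.items():
        print(f"{case}, m = {m}: {ratio:.3g}% of {5 ** (2 * m)} candidates")
```

The search space grows from 25 candidates for $m=1$ to 390,625 for $m=4$, which is consistent with the growth in computational time reported in the table.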

© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Cinquemani, E.
Identifiability and Reconstruction of Biochemical Reaction Networks from Population Snapshot Data. *Processes* **2018**, *6*, 136.
https://doi.org/10.3390/pr6090136
