# A Language for Modeling and Optimizing Experimental Biological Protocols


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Chemical Reaction Network (CRN)

**Definition 1.**

**Definition 2.**

**Definition 3.**

**Definition 4.**

**Example 1.**

#### 2.2. A Language for Experimental Biological Protocols

**Definition 5.**

#### 2.3. Gaussian Semantics for Protocols

**Definition 6.**

**Example 2.**

**Example 3.**

**Example 4.**

**Example 5.**

**Definition 7.**

#### 2.4. Optimization of Protocols through Gaussian Process Regression

**Lemma 1.**

**Example 6.**

## 3. Results

#### 3.1. Gibson Assembly Protocol

#### 3.2. Split and Mix Protocol

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## Appendix A. Simulation Script

The simulation in Figure 5 (left, center) is given by the part of the script below that starts at ‘`species {c}`’ and ends at ‘`equilibrate E`’. For the sensitivity analysis of Figure 5 (right), a function `f` abstracts the `equilibrate` time parameters `e1`, `e2`, `e3` and the `split` proportion parameter `s1`, and yields the concentrations of `a`, `b`, `c` at the end of the protocol. A multivariate random variable `X`, over a uniform multidimensional sample space `w`, is constructed from `f` to vary the parameters. Then `X` is sampled and plotted.
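Spelled out, the random variable `X` in the script below scales each nominal parameter (`e1` = `e2` = 100, `e3` = 1000, `s1` = 0.5) by an independent factor in [0.95, 1.05], which is the ±5% variation reported for Figure 5. This assumes each coordinate $w_i$ is uniform on [0, 1]; the symbol $\phi$ is our own shorthand, not notation from the script:

$$
X(w) = f\bigl(100\,\phi(w_0),\; 100\,\phi(w_1),\; 1000\,\phi(w_2),\; 0.5\,\phi(w_3)\bigr),
\qquad
\phi(w_i) = 1 + \frac{w_i - 0.5}{10} \in [0.95,\, 1.05],
\qquad w_i \sim \mathcal{U}[0,1].
$$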

```
function f(number e1 e2 e3 s1) {
   define
      species {c}
      sample A {1µL, 20C}
      species a @ 10mM in A
      amount c @ 1mM in A
      a + c -> a + a {1}
      equilibrate A1 = A for e1
      sample B {1µL, 20C}
      species b @ 10mM in B
      amount c @ 1mM in B
      b + c -> c + c {1}
      equilibrate B1 = B for e2
      split C,D = A1 by s1
      dispose C
      mix E = D, B1
      a + b -> b + b {1}
      equilibrate E for e3
   yield [observe(a,E), observe(b,E), observe(c,E)]
}

random X(omega w) {
   f(100*(1+(w(0)-0.5)/10), 100*(1+(w(1)-0.5)/10), 1000*(1+(w(2)-0.5)/10),
     0.5*(1+(w(3)-0.5)/10))
}

draw 3000 from X
```

For the standard deviations (Figure 5, right, bottom), the `yield` line of `f` is replaced with:

```
yield [observe(sqrt(var(a)),E), observe(sqrt(var(b)),E),
       observe(sqrt(var(c)),E)]
```
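For readers who want to experiment without the protocol language, the following Python sketch re-implements only the mean (deterministic) behaviour of `f` and the sampling of `X`. It is not the language's semantics: it ignores the variances that the script also propagates, integrates with fixed-step RK4, and rests on assumptions of ours: that each `equilibrate` step applies every reaction declared before it, that the rate constant `{1}` means 1 M^-1 s^-1 (1e-3 mM^-1 s^-1), that `split C,D = A1 by s1` gives `C` the fraction `s1`, and that `mix` averages concentrations weighted by volume. All Python names are hypothetical.

```python
import random

# Assumption: rate {1} means 1 M^-1 s^-1 = 1e-3 mM^-1 s^-1 (concentrations in mM,
# time in s). The sample temperature (20C) is ignored.
K = 1e-3

# Bimolecular reactions (reactant, reactant, product, product), all with rate K.
R_AC = ("a", "c", "a", "a")  # a + c -> a + a
R_BC = ("b", "c", "c", "c")  # b + c -> c + c
R_AB = ("a", "b", "b", "b")  # a + b -> b + b


def deriv(x, reactions):
    """Mass-action rates of change for a dict of concentrations."""
    dx = {s: 0.0 for s in x}
    for r1, r2, p1, p2 in reactions:
        flux = K * x[r1] * x[r2]
        dx[r1] -= flux
        dx[r2] -= flux
        dx[p1] += flux
        dx[p2] += flux
    return dx


def equilibrate(x, reactions, duration, dt=1.0):
    """Mean-field 'equilibrate': integrate the ODEs with fixed-step RK4."""
    steps = max(1, round(duration / dt))
    h = duration / steps
    for _ in range(steps):
        k1 = deriv(x, reactions)
        k2 = deriv({s: x[s] + h / 2 * k1[s] for s in x}, reactions)
        k3 = deriv({s: x[s] + h / 2 * k2[s] for s in x}, reactions)
        k4 = deriv({s: x[s] + h * k3[s] for s in x}, reactions)
        x = {s: x[s] + h / 6 * (k1[s] + 2 * k2[s] + 2 * k3[s] + k4[s]) for s in x}
    return x


def mix(x1, v1, x2, v2):
    """Mix two samples: concentrations are averaged, weighted by volume."""
    return {s: (x1[s] * v1 + x2[s] * v2) / (v1 + v2) for s in x1}


def f(e1, e2, e3, s1):
    """Mean concentrations (mM) of a, b, c at the end of the split-and-mix protocol."""
    # Sample A (1 uL): only a + c -> a + a is declared at this point.
    A1 = equilibrate({"a": 10.0, "b": 0.0, "c": 1.0}, [R_AC], e1)
    # Sample B (1 uL): the first two reactions are declared (a is absent here).
    B1 = equilibrate({"a": 0.0, "b": 10.0, "c": 1.0}, [R_AC, R_BC], e2)
    # split C,D = A1 by s1 (assumed: C gets s1, D gets 1 - s1); dispose C.
    vol_D = 1.0 * (1.0 - s1)
    E = mix(A1, vol_D, B1, 1.0)  # mix E = D, B1
    E = equilibrate(E, [R_AC, R_BC, R_AB], e3)
    return E["a"], E["b"], E["c"]


def X(rng):
    """One draw of X: each parameter scaled by 1 + (w - 0.5)/10 with w ~ U[0, 1]."""
    w = [rng.random() for _ in range(4)]
    phi = [1 + (wi - 0.5) / 10 for wi in w]
    return f(100 * phi[0], 100 * phi[1], 1000 * phi[2], 0.5 * phi[3])


if __name__ == "__main__":
    print("nominal (a, b, c):", [round(v, 3) for v in f(100, 100, 1000, 0.5)])
    rng = random.Random(0)
    draws = [X(rng) for _ in range(20)]  # the script draws 3000; 20 keeps this quick
    for name, col in zip("abc", zip(*draws)):
        m = sum(col) / len(col)
        sd = (sum((v - m) ** 2 for v in col) / (len(col) - 1)) ** 0.5
        print(f"{name}: mean {m:.3f} mM, sd over draws {sd:.3f} mM")
```

Under these assumptions every reaction turns one molecule into another, so a + b + c stays at 11 mM in each sample, and in Sample E the three reactions together form a cyclic (rock-paper-scissors) system.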

## Appendix B. Data for Gibson Assembly


**Figure 1.** Processing a unified description for an experimental protocol. A program integrates a biophysical model of the underlying molecular system with the steps of the protocol. In this case, the protocol comprises a single instruction, which lets a sample equilibrate for t seconds, where t is a parameter. The initial concentration of the sample is 1 for the first species and 0 for all the others. The value of t is selected as the one that maximizes a cost function, here the difference between $y_5$ and $y_1$ after the execution of the protocol. The optimization is performed on a Gaussian process given by the semantics of the program (biophysical model and protocol) integrated with experimental data.
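To make the loop in Figure 1 concrete, here is a toy Python sketch. The caption does not specify the CRN, so we assume a hypothetical linear chain y1 → y2 → y3 → y4 → y5 → ∅ with all rates equal to 1, starting from y1 = 1 and all other species at 0 as in the caption, and pick the equilibrate time t that maximizes y5 - y1. Unlike the approach in the caption, which optimizes a Gaussian process combining the model with experimental data, this sketch simply grid-searches the deterministic model. All function names are hypothetical.

```python
import math


def chain_rhs(y):
    """Hypothetical CRN: y1 -> y2 -> y3 -> y4 -> y5 -> (nothing), all rates 1."""
    dy = [-v for v in y]            # every species is consumed at rate 1
    for i in range(len(y) - 1):
        dy[i + 1] += y[i]           # ... and feeds the next species in the chain
    return dy


def evolve(y0, t, dt=0.02):
    """The single protocol step: let the sample equilibrate for t seconds (RK4)."""
    steps = max(1, round(t / dt))
    h = t / steps
    y = list(y0)
    for _ in range(steps):
        k1 = chain_rhs(y)
        k2 = chain_rhs([a + h / 2 * b for a, b in zip(y, k1)])
        k3 = chain_rhs([a + h / 2 * b for a, b in zip(y, k2)])
        k4 = chain_rhs([a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y


def cost(t):
    """Objective from the caption: y5 - y1 after the protocol."""
    y = evolve([1.0, 0.0, 0.0, 0.0, 0.0], t)
    return y[4] - y[0]


grid = [0.1 * k for k in range(1, 101)]    # candidate t values in (0, 10] s
t_best = max(grid, key=cost)
exact = math.exp(-t_best) * (t_best ** 4 / 24 - 1)   # closed form for this chain
print(f"t* = {t_best:.1f} s, y5 - y1 = {cost(t_best):.4f} (closed form {exact:.4f})")
```

For this particular chain, $y_1 = e^{-t}$ and $y_5 = t^4 e^{-t}/24$, so the objective peaks at the positive root of $t^4 - 4t^3 - 24 = 0$, roughly $t \approx 4.3$ s, which is where the grid search lands.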

**Figure 2.** Evolution of $\mu_a$ (red), $\mu_b$ (green), $\mu_c$ (blue) for the CRN reported in Example 1. Inset: the respective variances.

**Figure 3.** Mean and variance of species b for the CRN reported in Example 1 for $r=[0.1, r_b, 0.001, 1, 20]$ and $u=[T]$. (**Left**) Evolution of $\mu$ (prior mean of species b, given by Equation (1)), $\boldsymbol{\mu}_p$ (posterior mean of species b, given by Equation (3)), and the true dynamics of species b, assumed for this example to be a deterministic function, for $r_b=0.001$ (the initialization variable of species b). With just a few data points, the posterior mean correctly reflects the true dynamics. (**Right**) Standard deviation of b after training (the square root of the solution of Equation (4)) as a function of $r_b$ and T. The variance is higher for combinations of T and $r_b$ where no training data are available.
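Equations (1) to (4) are not reproduced in this section, so the Python sketch below only illustrates the behaviour described in Figure 3 using standard Gaussian process regression, not the paper's own formulas. A prior mean function plays the role of the model prediction, observations carry small Gaussian noise, and the kernel is squared-exponential. The kernel, its hyperparameters, the prior mean and the data are all assumptions. The posterior mean follows the data where observations exist, and the posterior variance is small near them and returns to the prior variance far away.

```python
import math


def rbf(x1, x2, ell=1.0, sf2=1.0):
    """Squared-exponential kernel (assumed length scale ell, signal variance sf2)."""
    return sf2 * math.exp(-0.5 * ((x1 - x2) / ell) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs, ys, x_star, prior_mean, noise=1e-4):
    """Posterior mean and variance at x_star given noisy observations ys at xs."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(x_star, a) for a in xs]
    resid = [y - prior_mean(x) for x, y in zip(xs, ys)]
    alpha = solve(K, resid)      # (K + noise*I)^-1 (y - m(X))
    v = solve(K, k_star)         # (K + noise*I)^-1 k_*
    mean = prior_mean(x_star) + sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var


def prior(t):
    return 0.9 * math.sin(t)     # hypothetical, slightly wrong model prediction


xs = [0.0, 1.0, 2.0, 3.0]        # hypothetical measurement times
ys = [math.sin(x) for x in xs]   # hypothetical "true" dynamics, noise-free
for t in (1.0, 1.5, 6.0):
    m, var = gp_posterior(xs, ys, t, prior)
    print(f"t = {t}: mean {m:.3f} (true {math.sin(t):.3f}), variance {var:.4f}")
```

At t = 6, far from every observation, the posterior falls back to the prior mean with nearly the full prior variance, which is the effect shown in the right panel for combinations of T and $r_b$ without training data.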

**Figure 4.** Optimal values for $x_{BA}$ and T, and variance of the predictive Gaussian process. (**Left**) Optimal values of $x_{BA}$ and $T/1000$ for different values of $\lambda$. (**Right**) Variance of $O(T)$ after training (Equation (2)) as a function of $x_{BA}$ and T. The variance is smallest for $x_{BA}\approx 1$ because all training data have $x_{BA}=1$.

**Figure 5.** (**Left, Center**) Evolution of a (red), b (green), c (blue) for protocol $P_{split\&mix}$, showing mean (thick lines) and standard deviation (thin lines) for Samples A and B, whose trajectories partly overlap, and separately for Sample E. Horizontal axis is time (s); vertical axis is concentration (mM). Sample A is simulated first, then Sample B, and finally Sample E, whose standard deviations start above zero because the final states of the earlier simulations are propagated into it. (**Right**) Density plots for global sensitivity analysis over 3000 runs, showing the sensitivity at the end of the protocol of the mean (**top**) and standard deviation (**bottom**) of a, b, c. Sensitivity is with respect to the three *Equilibrate* duration parameters and the *Split* proportion parameter, which are drawn simultaneously from uniform distributions, each varying by up to ±5%. Horizontal axis is concentration (mM); vertical axis is the kernel density estimate (m denotes $\times 10^{-3}$), computed with a standard normal kernel and a bandwidth of 1/100 of the data range. Thick vertical lines mark the mean; thin vertical lines mark ±1 standard deviation.


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Cardelli, L.; Kwiatkowska, M.; Laurenti, L.
A Language for Modeling and Optimizing Experimental Biological Protocols. *Computation* **2021**, *9*, 107.
https://doi.org/10.3390/computation9100107
