Mechanistic Models of Inducible Synthetic Circuits for Joint Description of DNA Copy Number , Regulatory Protein Level , and Cell Load

Accurate predictive mathematical models are urgently needed in synthetic biology to support the bottom-up design of complex biological systems, minimizing trial-and-error approaches. The majority of models used so far adopt empirical Hill functions to describe activation and repression in exogenously-controlled inducible promoter systems. However, such equations may be poorly predictive in practical situations that are typical in bottom-up design, including changes in promoter copy number, regulatory protein level, and cell load. In this work, we derived novel mechanistic steady-state models of the lux inducible system, used as case study, relying on different assumptions on regulatory protein (LuxR) and cognate promoter (Plux) concentrations, inducer-protein complex formation, and resource usage limitation. We demonstrated that a change in the considered model assumptions can significantly affect circuit output, and preliminary experimental data are in accordance with the simulated activation curves. We finally showed that the models are identifiable a priori (in the analytically tractable cases) and a posteriori, and we determined the specific experiments needed to parametrize them. Although a larger-scale experimental validation is required, in the future the reported models may support synthetic circuits output prediction in practical situations with unprecedented details.


Introduction
Recent advances in the construction and characterization of DNA encoded synthetic circuits have enabled to boost the design-build-test engineering cycle applied to biological systems, in terms of time and economic resources.This process has led to the engineering of complex engineering-inspired information processing and control systems, as well as solutions to numerous problems in industrial biotechnology and medicine [1][2][3].
Given a genetic circuit, its promoters, ribosome binding sites (RBSs) and the DNA copy number of its elements are some of the crucial degrees of freedom that biological engineers have to tune to implement an effectively working function [4,5].However, the combinatorial search space to properly tune complex regulatory networks makes a trial-and-error approach unpractical with no guarantee to reach an optimally working system [6,7].
High-impact industrially-attractive control systems for different applications have been already delivered by synthetic biology, for example to adaptively regulate biofuel production as a function of pathway intermediates availability [8] or to switch cellular metabolism towards growth or product formation as a function of cell stress [9].Although the mentioned circuits are characterized by an elegant design and have real-world impact, the final correctly working systems were obtained only after random selection steps-e.g., to tune the expression levels of key proteins-or the affinity of repressors exerting feedback control [10].These and a number of other examples suggest that a bottom-up design process is desirable to increase the success rate in the construction of working biological systems that behave as intended.To this aim, accurate predictive models are needed to support this design step.
Among the many issues currently limiting model-based approaches in synthetic biology, several unpredictability sources affect the re-use of biological parts in different contexts (i.e., strains, growth media, and even circuits).Cell-to-cell variability, flanking regions-dependent behavior, and cell load are among the major features causing such variability [11][12][13][14][15][16].Many of such effects have often been neglected in mathematical models, thereby limiting their predictive power.In addition, widely used modeling approaches describe recombinant protein regulation via empirical Michaelis-Menten or Hill equations [17].
Empirical models are popular tools and have many advantages (e.g., low number of parameters, overall good descriptive capability and no need to know the biomolecular interactions underlying the described process) [18,19].However, they may have poor predictive power on unseen data when one or more circuit elements are changed [20], like copy number of DNA or protein regulators that have been reported to be essential for the biological systems behavior [21].In specific, transfer functions of inducible devices can be characterized in vivo through dose-response experiments, in which a constitutively expressed regulator (activator or repressor) is activated or inhibited by an exogenously added molecule and the complex eventually affects the transcription of a cognate regulated promoter [22].While empirical models can be easily identified from such experimental data, it is not trivial to generalize them in situations in which molecule copy number changes, also for the empirical nature of model parameters [20].This weak aspect of empirical models might be crucial in practical bottom-up design situations, in which circuit parts are interconnected and the behavior of the system is predicted from the functioning of individual parts, by using previously estimated parameters [23].
Mechanistic models are able to overcome some of the issues mentioned above: parameters usually have biological meaning (e.g., dissociation constants, copy numbers, etc.) and predictive power is expected to be higher than empirical models, since circuit changes can be translated into the variation of specific parameters [20].However, mechanistic models usually have a larger number of parameters to be estimated, thereby raising issues of model identifiability [18,24].Since such models require a deep knowledge of occurring biomolecular interactions in the biological system under study, they are less popular than empirical models.Different mechanistic modeling efforts have been undertaken, describing gene regulation using thermodynamics or law of mass action [18,[25][26][27][28].However, the application of these models in bottom-up design approaches remains low due to the absence of studies on their identification and lack of broad-range advantages demonstrations over the empirical ones.
In this work, we presented different mechanistic models of the widely used lux inducible system [29], derived under different key assumptions on molecule abundances and resource limitation, thereby obtaining mathematical tools able to describe copy number changes and the burden effects caused by individual parts.
Our objective was to investigate the impact of the underlying assumptions on the transfer function of the lux inducible circuit via numerical simulations.The inclusion of the considered features in a mechanistic model represents a novel contribution of this work.We extended the work by studying the usability of such models in terms of structural and practical identifiability, given experimental measurements commonly available in synthetic biology in vivo studies.
We showed that different assumptions can lead to relevant differences among activation curves features (i.e., all the Hill function parameters showed variations).We also showed that simulations were consistent with preliminary in vivo data, measured in recombinant Escherichia coli in previous and novel studies, although a larger-scale validation of the model variants will be required.Finally, model identifiability analysis outlined the experimental designs required to enable accurate parameter estimation.
This study is expected to have high relevance for both computational biologists, since new models have been proposed and the impact of their key assumptions evaluated in silico, and for experimental biologists, since recommendations on the measurement procedures have been provided, supporting the future in vivo assessment of the models.Taken together, the results found in this work may support bottom-up design of synthetic biological systems by providing new models that, once validated, will represent accurate tools for synthetic circuits behavior prediction.

Inducible System Description
The inducible system studied in this work is described in Figure 1a.It includes a constitutively expressed LuxR protein, which acts as transcriptional activator of the cognate P lux promoter in presence of N-(3-oxohexanoyl)-L-homoserine lactone (HSL).The LuxR-HSL active complex binds to the lux box of the P lux promoter, triggering its activity in an HSL concentration-dependent fashion, thereby regulating the transcription of the downstream gene, which is then translated into protein.We considered a set of reactions affecting the production of the (reporter) protein-encoded gene regulated by P lux , similar to the ones described by Carbonell-Ballestero et al. [20] (Figure 1b).Specifically, a LuxR protein dimer (R 2 ) and HSL (L) form a complex (Q), which binds the free promoter (P) to form a transcriptionally active bound promoter (S).

Model Definition
In this section, different models describing the lux system are illustrated.In particular, the essential elements of gene expression modeling are reported in Section 2.2.1, together with the description of a basic empirical model for transcriptional activation.The description of all the models used in this work is provided in Sections 2.2.2-2.2.5.Due to the complexity of model structure under some assumptions, the theoretical analysis of the activation function parameters is only provided for the models described in Sections 2.2.1 and 2.2.2.As an alternative reaction scheme, we also considered a similar set of biomolecular interactions, with complex formation occurring in two steps (Figure 1c): binding of R 2 and L to form Q, and subsequent binding of L to Q to form the hetero-tetramer Q 2 .This alternative reaction set was defined to refine the mechanistic model above, in accordance with previous investigations, in which a hetero-tetrameric structure was suggested for the activated complex [30].

Model Definition
In this section, different models describing the lux system are illustrated.In particular, the essential elements of gene expression modeling are reported in Section 2.2.1, together with the description of a basic empirical model for transcriptional activation.The description of all the models used in this work is provided in Sections 2.2.2-2.2.5.Due to the complexity of model structure under some assumptions, the theoretical analysis of the activation function parameters is only provided for the models described in Sections 2.2.1 and 2.2.2.

Assumptions
Assuming that no post-transcriptional or post-translational regulations are involved in the circuit, the intracellular dynamics of mRNA (M), immature protein (X) concentration regulated by P lux and the mature form of the protein (Y) can be described via Equations (1)-(3): Processes 2019, 7, 119 5 of 24 In Equation ( 1), γ M is the mRNA degradation rate and H(L) is an HSL-dependent activation function describing transcription rate, expressed in (RNA time −1 ) units.In Equation (2), γ X (time −1 ) includes the protein degradation and dilution rates, and ρ (time −1 ) is the translation rate.If the expressed protein is very stable, γ X is equal to the rate of cell division.When relevant in terms of dynamics, protein maturation or folding is also included: Y represents mature protein, σ (time −1 ) represents protein maturation rate and protein degradation rate is assumed to be the same for immature and mature forms.Assuming the steady-state of all the intracellular processes, the output commonly considered for such system, i.e., the mature protein synthesis rate per cell (y), which is equal to the synthesis term of Equation (3), is proportional to H(L) (Equation ( 4)) Another output commonly found in literature is the per-cell mature protein (Y), which is proportional to y, thereby enabling to generalize all the modeling work presented in this study.
In many works considering a constant LuxR production, not changing throughout the experiments [22,23,[31][32][33][34], H(L) is modeled via a Hill equation and the circuit output, y, can be written as in Equation ( 5) Assuming no cooperativity, Equation ( 5) with η = 1 is equivalent to a Michaelis-Menten equation with three parameters: δ is the basic protein synthesis rate when P lux is in its off state; α is the activity range in the on state; κ is the HSL concentration corresponding to half of the maximum activation [20].The δ and α parameters are expressed as intracellular protein concentration per time.The experimental measurements routinely performed in laboratory to characterize synthetic circuits usually exploit fluorescent reporter proteins, which are quantified via in vivo assays by means of plate reader or flow-cytometry.In these cases, per-cell arbitrary units of fluorescence (AU) can be adopted to express intracellular protein concentration, assuming their proportionality.

Derivation
When LuxR level is also needed to be described (situation of interest in the present work), another commonly found modeling approach includes the following equations (Equations ( 6) and ( 7)) [30,33,35], describing the LuxR-HSL complex formation and the subsequent activation of protein synthesis by this complex In Equation ( 6), C is the intracellular LuxR-HSL complex concentration, U is the total LuxR concentration, κ R is the concentration of HSL required for half-activation of LuxR, and β is the Hill coefficient.In Equation (7), symbols have the same meaning as in Equation (5), with the cap denoting that this Hill function has C as input.For this reason, κ has the same units as C and U (protein concentration or AU, as explained above).Assuming no cooperativity, as before, the Hill coefficients (β and η) can be fixed to 1.The expressions in Equations ( 6) and ( 7) can be lumped into a single equation describing y as a function of L (Equation ( 8)) which is equivalent to the Michaelis-Menten function in Equation ( 5) with the following parameters (Equations ( 9)-( 11))

Final Expression
Although the parameters of Equation ( 8) have empirical nature, the copy numbers of P lux (n 1 ) and LuxR (n 2 ) can be included in the model.Specifically, n 1 and n 2 can act as known scale factors of δ, α and U (Equations ( 12)-( 14)), thereby enabling the description of copy number changes among different situations.Following a bottom-up approach, model parameters ( δ or d, α or v, κ and κ R ) can be estimated from experimental data and the parametrized model can be adopted to predict unseen situations with different n 1 and As before, the system output (y) can be expressed in per-time intracellular protein concentration or AU time -1 , like δ and α.Analogously, U and κ are intracellular protein concentrations that can also be expressed in AU, while κ R has the same units as HSL concentration.For this reason, the relative copy number changes, instead of the absolute ones, are sufficient to express n 1 and n 2 .Specifically, n 1 can be set according to plasmid copy number and n 2 is proportional to the strength of the promoter expressing LuxR.Finally, the u value (Equation ( 14)) can be set to 1 without any loss of generality, since U is always present in ratio with κ (Equation ( 8)).The final empirical model, called M0, describing the output of the lux inducible system as a function of HSL and the per-cell copy numbers of P lux and LuxR, is reported in Equation ( 15)

Mechanistic Model with LuxR Abundance Assumption (M1)
Assumptions A mechanistic model of the biomolecular reactions shown in Figure 1b has been previously reported [20] and is herein briefly illustrated.The main underlying assumptions are that HSL molecules bound to LuxR are negligible compared to the total HSL amount, and that P << R. A difference from the work by Carbonell-Ballestero et al. is that no binding is assumed between LuxR dimer and P lux , and the low-entity promoter activation by this unspecific complex is neglected.Previous experimental work showed that this activity increase is undetectable in the commonly tested conditions [35].

Derivation
Considering the same equations for transcription, translation and maturation (Equations ( 1)-(3)) as in Section 2.2.1, H(L) can be expressed in a more mechanistic fashion (Equation ( 16)) where P and S represent the intracellular concentrations of free and complex-bound promoter, while k m0 and k mL are the transcription rate constants in off and on state, respectively.Based on the law of mass action, the biomolecular interactions in Figure 1b depend on the following equilibrium constants and conservation expressions (Equations ( 17)-( 20)) Symbols in Equations ( 17)-( 20) are described in Figure 1b; briefly, R 2T and R 2 are the total and free LuxR dimer concentrations, respectively, where R 2T ≈ R T /2 (and R T is the total concentration of LuxR monomer).Under the assumption that P << R, the bound promoter S in Equation ( 20) can be neglected: 1)-(3) and Equations ( 16)-( 20), the following expressions for LuxR-HSL complex and circuit output can be written (Equations ( 21) and ( 22)) [20] where km0 and kmL are k m0 and k mL , respectively, scaled by σ × ρ/((γ X + σ) × γ M ) (see Equation ( 4)).These equations show that, like in the empirical model in Equations ( 6) and (7), in this mechanistic model the intracellular complex concentration and circuit output can be both described by Michaelis-Menten functions.Analogously, circuit output as a function of HSL is a Michaelis-Menten function (Equation ( 23)) with coefficients described in Equations ( 24)-( 26) In summary, M0 (Equation ( 15)) and M1 (Equation ( 23)) have the same y(L) function and analogous LuxR-dependent Michaelis-Menten parameters expressions (Equations ( 9)- (11) and Equations ( 24)-( 26)).However, the mechanistic nature of M1 gives the advantages that molecule concentrations are explicitly described and the link between model parameters and biological mechanism is not lost.As a result, M1 offers the opportunity to parametrize the model with biologically meaningful parameters, like intracellular concentrations, equilibrium constants, and copy numbers.Nonetheless, hard-to-measure quantities can still be expressed in AUs instead of concentrations, as in M0.For instance, while promoter concentration and DNA copy number are easy to retrieve (e.g., from assumptions about cell volume and plasmid datasheets or quantitative PCR), protein concentration requires more resource consuming experiments (e.g., Western blot) and depends not only on DNA copy number, but also on transcription, translation and degradation rates [20].Nevertheless, the relative protein level can still be approximated via the strength of the upstream promoter.
For the reasons above, in this work M0 will not be further analyzed, and only serves as a reference for M1 parameters expressions.

Final Expression
The final expression of M1, explicitly including n 1 and n 2 , is reported in Equation ( 27) where P U and r T are the intracellular concentrations of one DNA copy of promoter and LuxR protein, respectively.The n 1 and n 2 parameters are, respectively, the P lux copy number, and the scale factor between one protein monomer and the actual dimer concentration.

Mechanistic Model without LuxR Abundance Assumption (M2) Assumptions and Final Expression
Following the same biomolecular reactions and calculation steps as in M1, it is possible to calculate circuit output without the P << R assumption.In this situation, S cannot be neglected in Equation (20).Circuit output can be computed from Equation (28), in which R 2 and P are the roots of a second order two-equation system (Equations ( 29)-( 30)).In this equation system, only one root has a biologically acceptable meaning for any value of L (not shown).Analytical formulas expressing R 2 and P are not reported due to their complexity and, as a result, Equations ( 28)-( 30) represent the final reported expression for M2 2.2.4.Modeling LuxR-HSL Hetero-Tetramerization (M1T, M2T)

Assumptions
The biomolecular interactions illustrated in Figure 1c can be used to derive mathematical models in which, differently from Figure 1b, the activated complex is a hetero-tetramer, formed by two LuxR and two HSL molecules.In particular, under the P << R assumption, we also assumed that: i) the LuxR dimer has two binding sites for HSL; ii) the probability of HSL binding to a free site of R 2 and Q is the same (i.e., there is no cooperative behavior); iii) the probability of HSL unbinding from an occupied site of R 2 and Q is the same.

Derivation
The output can be expressed as in Equation ( 16), the K 1 equilibrium constant as in Equation (17) and the conservation of total, free and bound promoter as in Equation (19).The following expressions describe equilibrium constants K 4 and K 5 , as well as the LuxR dimer (free, bound with L, bound with 2L and bound with 2L and the promoter) conservation (Equations ( 31)-( 33)) Thanks to the P << R assumption, as in M1 the S concentration becomes negligible in Equation (33), thereby having: The forward and reverse rate constants in Equation (34) (k + and k − ) and Equation ( 35) (k + and k − ), describing the two HSL binding steps, have the following relations: For the described reasons, a relation between K 1 and K 4 can be written (Equation ( 36)) Final Expression Based on the relations above and following the same mathematical steps as in M1, the final output for this model (M1T) is reported in Equation (37); the M1T expression in Equation (37) does not resemble a Michaelis-Menten function (differently from M1).
Without the P << R assumption, the expressions of circuit output become more complex.By following the same steps previously done in Section 2.2.3 for M2, the final model (M2T) is described by the following expressions (Equations ( 38)-( 40)).

Assumptions
Unless differently indicated, in the two-gene circuit considered in this work (Figure 1a) we assume that only the protein with expression regulated by P lux is characterized by a relevant resource usage, while LuxR does not cause relevant burden.Since LuxR is constitutively expressed, thereby giving constant load, this assumption will not affect any HSL-dependent function, as previously discussed for genes with constant resource usage acting as background [23].

Derivation
A mechanistic model has been recently proposed to describe the effects of cell load caused by the expression of proteins with high resource demand [12,36].In such context of transcriptional/translational resource limitation, the synthesis rates of all the proteins of a synthetic circuits are globally scaled by a factor D (Equation ( 41)).
where c indicates the number of expressed proteins in the synthetic circuit, s i represents the synthesis rate of the i-th protein and J i its resource usage parameter.Model derivation is extensively discussed in the original publications [12,36], while in this study only the expression with lumped parameters is reported and used (Equation ( 41)).

Final Expression
The final expression of model output in presence of cell load is reported in Equations ( 42)-(44).
where y b and n 2,b indicate the model output and the LuxR scale factor affected by cell load, y is the output of one of the models described in Sections 2.2.1-2.2.4,n 1 and n 2 have the same meaning as before, and J is the resource usage parameter (in min/AU) associated to the output protein.Finally, the contribution of an external constant load (E) was studied analogously, with the exception that the value of E was added to Equation (42).

Model Parametrization
Models were parametrized using plausible values, according to available biological knowledge (e.g., DNA/protein concentrations) and previously published experimental data (e.g., P lux activation curve, resource usage [23,31,37]).A summary of parameters is provided in Table 1 for each model used in this work.When indicated, structural parameters were fixed to the reported values in simulations and they were assumed to be unknown during parameter estimation tasks (e.g., when studying a posteriori identifiability).E. coli cell volume was assumed to be 1 µm 3 , corresponding to 10 −15 L. Under this assumption, the concentration of one molecule of promoter DNA or protein corresponds to 1.66 nM [38].A variation of the average E. coli cell volume has been previously reported in the range 0.5-2 µm 3 for different growth rates and conditions [38].This variation is expected to affect the absolute values but not the trends of the numerical simulations shown of this work, thereby not affecting the drawn conclusions.Nonetheless, the variation of cell volume will be highly relevant in the analysis of real data, in which model parameters have to be estimated based on the knowledge of this value.As anticipated in Section 2.2.2, to enable the application of the models in popular situations occurring in synthetic biology (i.e., model identification with the data routinely measured in fluorescent reporter protein-based assays) promoter copy number (n 1 ) was assumed to be available; on the other hand, the actual LuxR protein concentration (dependent from luxR DNA copy number, transcription rate, translation rate, dimerization, and mRNA/protein degradation) is harder to measure.While in simulation the R 2T quantity was spanned to explore the effects of having wide ranges of LuxR values, its value was assumed to be unavailable during model identification.Nonetheless, the relative strength of the promoter expressing the luxR gene is commonly known, thereby enabling to approximate the relative level of LuxR.For model amenability reasons, in estimation steps we parametrized n 2 as the (known) relative strength of this promoter (in AU) and r T as the (unknown) scale factor between protein concentration (in nM) and AU, in which all the biologically occurring processes, described above, are lumped without any loss of generality and maintaining the mechanistic nature of the model.In M1, the r T quantity always appears multiplied by K 3 (Equation ( 27)) and, for this reason, only their product is identifiable.By defining K3 = K 3 × r T (in AU −1 ), the M1 model can be re-parametrized as Equation (45) Analogously, the K 5 × r T product in M1T (Equation ( 37)) can be re-parametrized as K5 = K 5 × r T (in AU −1 ), thereby yielding the model in Equation ( 46)

A Priori Identifiability
Given a circuit output expression with known form (e.g., a Michaelis-Menten or a rational function) with specific coefficients that could be estimated from data, an expression for each parameter of the model was found as a function of the theoretically known coefficients.If the system can be solved uniquely, the model was considered as structurally (or a priori) identifiable.

A Posteriori Identifiability
Given a model, simulated data were generated and parameters were estimated.Different experiments were simulated varying n 2 as required, while the other structural parameters were kept constant to the values in Table 1, unless differently indicated.Proportional Gaussian random noise was added to the simulated data with a coefficient of variation (CV) of 5% as default value; other values were also evaluated (0, 1, 2.5, and 10%).A limited number of data points ( 12) was assumed to be available, resembling a realistic dose-response experimental setup.Parameter estimation was performed via least squares method with the MATLAB R2017b (MathWorks, Natick, MA, USA) lsqnonlin routine.If the estimated parameters are consistent with the ones used to generate the data, the model is considered as practically (or a posteriori) identifiable.For each proportional error entity, 200 simulation and estimation steps were carried out, thereby identifying the model using different synthetic data with random noise.Relative estimation error (REE = 100•|p est − p true |/p true , where p est and p true are the estimated and true parameter values, respectively) was used to express parameter consistency.Uncertainty of parameter estimates, in terms of CV, was also computed as reported previously [39].For each run, the maximum REE and CV among all the estimated parameters was considered.

Simulations
The MATLAB roots function was used to find P and R 2 as polynomial roots to solve the M2, M2T, and M2L model equations.Implicit equations, commonly occurring in models including cell load terms, were solved using the fixed point method, as previously described [23].

Analysis of Activation Curves
Hill function parameters, described in Equation ( 5), were calculated for each activation curve: δ and α were computed as the y values at lowest and highest L value, respectively; κ was computed as the value of L corresponding to half-maximum activation; η was computed according to Equation (47).
where L90 and L10 are the L values corresponding to 90% and 10% of the maximum value of y.
On the other hand, in vivo measured activation curves were fitted with Equation (5) to estimate its parameters, as described in Section 2.2.3.

In Vivo Experiments
Circuit output, i.e., RFP synthesis rate per cell (S cell ) at steady-state, was measured for recombinant MG1655-Z1 strain [31] bearing the low-copy plasmid pSB4C5 with X 3 r as insert [23].In this construct, previously described in [23] and with sequence available as BBa_J107032 code in the Registry of Standard Biological Parts (http://parts.igem.org),luxR is under the control of an anhydrotetracycline (ATc) inducible promoter, which works as gene expression knob in Escherichia coli strains overexpressing TetR, like the one used in this study.
The detailed experimental protocol for S cell measurement was previously described [23].Briefly, 0.5 ml of M9 medium (11.28 g/L M9 salts, 1 mM thiamine hydrochloride, 2 mM MgSO 4 , 0.1 mM CaCl 2 , 0.2% casamino acids and 0.4% glycerol) were inoculated with a colony from a freshly streaked LB agar plate in a 2-ml tube and the culture was incubated at 37 • C, 220 rpm for at least 16 h.The grown culture was 100-fold diluted in 200 µL of M9 in a 96-well microplate.ATc and HSL (2 µl) were added to the microplate wells to reach the desired concentrations.The microplate was incubated at 37 • C in the Infinite F200 (Tecan) microplate reader and an automatic measurement procedure was programmed via the i-control software v.2.0.10 (Tecan, Switzerland): shaking (15 s, 3-mm amplitude), wait (5 s), optical density (600 nm) acquisition, red fluorescence (excitation: 535 nm, emission: 620 nm, gain = 50) acquisition, sampling time = 5 min.Raw data time series were background-subtracted using sterile medium (absorbance) and a non-fluorescent culture (fluorescence), incubated in the same experiment.The resulting data were used to compute S cell as the numeric time derivative of fluorescence, divided by absorbance over time.S cell was averaged in the exponential growth phase, typically occurring at absorbance values between 0.05 and 0.18 [32].Finally, the average S cell values of the X 3 r strains at the desired inductions were normalized by the S cell value of a reference culture (MG1655-Z1 bearing the BBa_J107029 constitutive RFP expression cassette in the pSB4C5 plasmid) as internal control to obtain S cell in standardized relative units.
The resulting dose-response curves were fitted using the MATLAB lsqnonlin routine to estimate Hill function parameters.For each HSL and ATc condition, at least three independent experiments were carried out.

Simulated Output for Different Model Assumptions
The effect of different model assumptions on the transfer functions shape was studied via numerical simulations.The considered assumptions were on LuxR abundance (Section 3.1.1),cell load (Section 3.1.2)and LuxR-HSL binding mechanism (Section 3.1.3).We evaluated if such assumptions exert a relevant contribution to output variation and, in some cases, if their inclusion contributes to the superior descriptive power of preliminary experimental data.

Effect of LuxR Abundance Assumption: M1 vs. M2
We compared the simulated outputs of M1 and M2, for different DNA/protein copy number situations, to evaluate the effect of the assumption of LuxR concentration abundance over P lux (Figure 2).
As expected from Equations ( 25) and ( 26) and previous works [20], for different values of R 2T (equal to n 2 × r T ) the output curves generated by M1 showed diverse maximum (α) and switch point (κ) values, increasing and decreasing respectively as a function of R 2T (Figure 2a,c,e).On the other hand, for increasing values of plasmid copy number (n 1 ), the values of α showed a linear increase and κ remained constant (Figure 2b,d,f) as expected from Equation (24).In both cases, the Hill coefficient was always equal to 1 as expected (Figure 2g,h).While the M1 model was able to capture the effects of LuxR level variation in some experimentally tested model systems, as previously demonstrated [20], the effect of changes in plasmid copy number are intuitively non-realistic since M1 assumes that the concentration of LuxR is much higher than the one of the promoter, thereby resulting in an unlimited increase of protein synthesis rate for high DNA copy numbers, with unchanged shape of the curve in terms of κ and η.Since LuxR may be expressed over a wide range of values to tune the sensitivity of the inducible device [17], the removal of its abundance assumption can be of interest and led us to develop M2.Assuming that a constant load () also affects protein expression, α showed a further decrease, and κ showed an increase compared with M1, that is, activation curves showed a systematically higher switch point than in a situation without load.
In all the described cases, the Hill coefficient showed values slightly lower than 1 upon plasmid copy number increase and almost unchanged values (close to 1) in case of  variation.(a,c,e,g) the promoter copy number value was set to 5, while in panels (b,d,f,h) the LuxR concentration was set to 500 nM.
The M2 model was investigated in the same situations considered for M1 (Figure 2).Upon changes of LuxR values, the relationship between R 2T and α or κ were qualitatively analogous to M1.However, when R 2T decreases below the promoter concentration value (about 8 nM) the output of M2 showed both lower α and κ levels compared to M1, with the decrease of κ showing the highest-entity effect (>2-fold with the used parameters, Figure 2e).When plasmid copy number is varied in a range below a constant LuxR concentration value (500 nM), α still showed a linear increase as observed in M1, but, differently from M1, κ showed a low-entity increase (less than 2-fold).For concentrations of promoter higher than LuxR, the α value showed saturation, intuitively because all the P lux promoters in the cell cannot be occupied by the limiting amount of LuxR, and as a result RFP synthesis cannot increase anymore for higher concentrations of promoter.Although in this latter condition RFP maximum expression is constant, an increase in plasmid copy number results in a decrease of κ (about 2-fold).Also, in the M2 model, the Hill coefficient was equal to 1 upon R 2T variation.However, interestingly, it decreased to values slightly lower than 1 (up to 0.8) upon variations of plasmid copy number (Figure 2g,h).
The illustrated results demonstrate that the removal of the LuxR abundance assumption can affect all the parameters of a Hill function, even when R 2T and P T are tuned over ranges of values not violating this assumption.

Effects of Cell Load
The M1L model was simulated to investigate the effects of cell load, which was assumed to derive from RFP expression alone, or from both RFP and a constant load outside the inducible circuit, caused by the expression of another heterologous protein (Figure 3).We then analyzed the in vivo data from a previously published experiment, in which the activation curve of a medium copy plasmid-borne lux inducible device was characterized in absence If the load was caused by RFP, its expression affected both LuxR and RFP itself when induction levels became high upon HSL addition.As expected, the maximum level of RFP expression reached by M1L was lower than the respective levels reached by M1 (Figure 3c,d) upon changes of both R 2T and plasmid copy number.In particular, the increase of DNA copy number resulted in a saturated maximum RFP expression, which was much lower than in the no-burden model (Figure 3d).The burden effect on κ resulted in a slight decrease compared with M1 upon R 2T variation (Figure 3e), while the decreasing effect on κ was more relevant upon plasmid copy number increase (Figure 3f).
Assuming that a constant load (E) also affects protein expression, α showed a further decrease, and κ showed an increase compared with M1, that is, activation curves showed a systematically higher switch point than in a situation without load.
In all the described cases, the Hill coefficient showed values slightly lower than 1 upon plasmid copy number increase and almost unchanged values (close to 1) in case of R 2T variation.
We then analyzed the in vivo data from a previously published experiment, in which the activation curve of a medium copy plasmid-borne lux inducible device was characterized in absence and presence of a second co-transformed plasmid, considered as an additional constant load for the host [31].Transfer functions were measured in four conditions (with/without load, low/high expression of IPTG-driven LuxR) and a summary of the resulting Hill function parameters is reported in Table 2.As the M1L model predicts, the additional load affects the transfer function parameters upon both low and high LuxR expression level conditions: α showed a decrease in presence of E, while κ increased.It is worth noting that a simpler model including cell load only on RFP expression (and not on LuxR), as previously adopted to analyze in vivo data [23], fails to describe the joint increase of α and decrease of κ upon LuxR overexpression, captured by the M1L model and observed in experimental data resembling the modeled situation (data not shown).
Analogous results were obtained from the simulations via M2L (data not shown).
The reported simulations showed that cell load can quantitatively affect all the Hill parameters of the activation curves, thereby demonstrating the relevance of model assumptions in the analysis of dose-response curves of inducible systems.a Percent expression, relative to the maximum expression value obtained in the same study; b BioBrick™ construct BBa_J107063 in the pSB3K3 medium-copy vector, IPTG-inducible LuxR expression cassette; c Additional load provided by the OL1 low-copy plasmid, described in the same paper; d BioBrick™ construct BBa_J107032 in the pSB4C5 low-copy vector, ATc-inducible LuxR expression cassette.

Evaluation of LuxR-HSL Complex Formation Assumptions
By assuming that the LuxR dimer has two binding sites for HSL molecule, we defined models including a hetero-tetramer formation step (M1T and M2T, depending on the LuxR abundance assumption as above).We also assumed a non-cooperative behavior for the HSL ligand binding, i.e., the two successive HSL binding events occur with the same probability and K 4 = K 1 /4, as described in Equations ( 34)- (36).Considering the Adair equation (Equation (48)), describing the fraction (F) of HSL-bound sites of LuxR over the total number of sites [40], the resulting Hill coefficient (computed as in Equation ( 47)) is always 1 under the non-cooperativity assumption, for any parameter and R 2T value Differently from Equation (48), the expressions of Q 2 (which can be calculated from Equations ( 17), ( 31) and ( 33)) and y show LuxR level-dependent Hill coefficients.For both Q 2 and y, such features can be observed in the closed-form expressions which can be obtained from M1T (Section 2.2.4).In particular, model simulations of M1T showed that the Hill coefficient increases as a function of R 2T , which represented the main difference from the respective model without hetero-tetramerization assumption (M1), while the other parameters showed analogous trends as above (Figure 4).The simulation of M2T also showed this feature on the Hill coefficient, together with the same trends described in Section 3.1.1due to the removal of LuxR abundance assumption.Although a number of previous experimental works on the lux system showed a Hill coefficient around 1 or slightly lower in the tested conditions [20,33,37], in some works a higher number was reported [31,41].While Hill coefficient values lower than 1 could be due to burden effects and/or violation of the LuxR abundance assumption, as described by the models illustrated in Sections 3.1.1and 3.1.2,higher values could not be described by those models in any tested case.Using preliminary second stage, in which the M2 model was used to fit the data with n2 = 0.005 (corresponding to  = 0.5 nM) and, as before, 0.5 and 5 AU, to estimate  and  , exploiting at least one condition in which LuxR is not abundant compared with the Plux promoter.This procedure led to a structurally identifiable condition for the M2 model, since it could estimate its parameters with reasonably low estimation error and CV, even if it had higher REE compared with M1 but much lower CV (Figure 5).

M1T
The M1T model is a priori identifiable from only one experiment.In particular, assuming that the coefficients of the rational function in Equation (37) are known (Equation (55)), the Equations (56)-( 60 The a posteriori identifiability was investigated by using only one activation curve with  = 50 AU (with  = 1).Despite the identifiability was successfully confirmed, the REE and CV were both higher than in M1 (Figure 5).We also investigated the availability of a second activation curve with

M2
The a priori identifiability of the M2 model could not be studied due to its complex expression.Only a posteriori identifiability could be addressed.Since M2 has the same parameters as M1 with the addition of r T (which cannot be estimated separately from K 3 in M1), its proper estimation in M2 is intuitively possible only when LuxR is not abundant compared with promoter concentration, otherwise the M2 expression would become identical to M1. Accordingly, the synthetic experiments were simulated with n 2 = 0.005, 0.5, and 5 AU (with r T = 100 nM/AU).The simultaneous estimation of its five parameters did not lead to a structurally identifiable model due to the high REE and CV, even by considering more activation curves with different LuxR levels (e.g., n 2 = 0.05 which was added to the ones above-data not shown).For this reason, we investigated a two-stage procedure in which synthetic data obtained by setting n 2 = 0.5 and 5 AU (corresponding to R 2T = 50 and 500 nM) were fitted with M1 (first stage).This fitting is expected to provide reliable estimates (as shown in Section 3.2.1)because LuxR is highly abundant and the assumptions of M1 are not violated.Since M1 and M2 share the same km0 , kmL and K 1 parameters, their estimated values were fixed in the second stage, in which the M2 model was used to fit the data with n 2 = 0.005 (corresponding to R 2T = 0.5 nM) and, as before, 0.5 and 5 AU, to estimate K 3 .andr T , exploiting at least one condition in which LuxR is not abundant compared with the P lux promoter.This procedure led to a structurally identifiable condition for the M2 model, since it could estimate its parameters with reasonably low estimation error and CV, even if it had higher REE compared with M1 but much lower CV (Figure 5).
When the assumption on the LuxR abundance was removed (M2), the model gave different outputs compared with M1 when the assumption was violated.In particular, the increase of P lux copy number was predicted to result into a linearly increasing expression by M1, while M2 showed a saturating trend, thereby demonstrating that the removal of this simplifying assumption can lead to an intuitively more realistic behavior of the inducible system upon DNA copy number changes.In this situation and with the structural parameter values used in our study, the output showed maximum activity variations up to 2.3-fold between M1 and M2 for biologically plausible values of DNA and protein concentrations.
When a limited resource framework was assumed (M1L), the expression of LuxR and output protein were globally affected by cell load due to the expressed output protein (in an HSL-dependent fashion) and/or to a load outside the inducible circuitry.As previously demonstrated in vivo and in silico, the output of inducible systems is affected by the resource usage of the expressed proteins [23].In the M1L model, we showed that an external load causes a decrease in maximum output and an increase of the half-maximum HSL concentration.This result could not be predicted by some of the traditionally used models in which the transcriptional regulator is assumed to be constant and is not modeled [23,34,42].In this situation and with the structural parameter values used in our study, the output showed parameter variations up to 2.5-fold between M1 and M1L for biologically plausible values of resource usage parameters, and DNA and protein concentrations.The predicted effects were consistent with previously published experimental data of our group, in which the same inducible system with and without cell load (due to the presence of an additional co-transformed plasmid) showed the same effect on maximum activity and activation curve sensitivity [31].
In both M2 and M1L models, we also showed that the Hill coefficient of the output curve could decrease (compared to 1, i.e., the one of M1) up to 0.8 with the parameters used in our study.
When a different ligand binding reaction was considered (M1T), i.e., two HSL molecule binding to the LuxR dimer non-cooperatively, the output expression introduced a power over the HSL concentration term.As a result, output curves had a Hill coefficient greater than one, even if the described binding mechanism was assumed to be non-cooperative.To our knowledge, this is the first study explicitly highlighting that a Hill coefficient greater than one could occur in absence of cooperative binding in the transcriptional activator.This effect was consistent with previously published experimental data from our group [31] and others [41], as well as novel preliminary experimental data explicitly measured in this work: the Hill coefficient of the output curve increased as a function of LuxR level.In our M1T model, an increase up to 2-fold was observed for the Hill coefficient compared to M1, which relied on a different assumption on HSL binding.
The behavior of the remaining models, including combinations of the described assumptions (M2L, M2T, M1TL, M2TL), showed more complex features, but similar conclusions on the effects of the investigated assumptions could be drawn.
The M1, M2, M1T, and M2T models were also studied in terms of usability, by investigating their identifiability, to eventually understand if their parameters could be estimated from experimental data and which experimental design is recommended.In fact, in addition to simulation, parameter estimation is a crucial step in model usability that enables the re-use of well-characterized regulatory components in synthetic biology.All the models enabled parameter estimation with reasonably low error and a few constraints (described in Section 3).In general, as expected, by increasing the random noise affecting experimental data REE and CV increase.As for the individual models, M1 required two experiments with different LuxR levels, while M1T required only one experiment with a single LuxR expression value, despite its parameters could be estimated much more reliably by adding a second experiment, like in the M1 case.The M2 and M2T models required an additional experiment, compared to M1 and M1T, respectively, to be properly identified, in which a curve obtained with a LuxR level that is not much higher than P lux concentration had to be measured.Considering a proportional error model for the generated data (5% CV), all the models enabled the estimation of structural parameters within 2-fold compared with the true value (considering interquartile ranges of REE distribution).The M2 model showed the highest estimation error (>100%, median among 200 runs with different datasets simulated with a 10% proportional error), making it the less robust model among the tested ones in estimation tasks.The M1T model with a single LuxR level showed similar drawbacks, which could be overcome by adding activation curve data with more LuxR levels, while M2 could not be improved following the same procedure.
If used to fit the data in real experimental works, a major advantage of the M2 and M2T models is that they potentially enable the estimation of the actual LuxR intracellular concentration, which could not be estimated with the LuxR abundance assumption (M1 and M1T).However, the simultaneous estimation of all the parameters of M2 and M2T failed, thereby leading to the definition of an alternative procedure for the identification of such models: first, M1 (or M1T) was identified by fitting two activation curves data that conformed to the LuxR abundance assumption to estimate all model parameters; then, M2 (or M2T) was identified by fixing three parameters, previously estimated via M1 (or M1T), and estimating the two remaining ones by fitting the two activation curves data used in the first stage, together with an additional curve obtained for a low LuxR level.Given the same number of experiments, M1T and M2T could be identified with lower estimation error and parameter uncertainty than M1 and M2, respectively.The identifiability of models including cell load was not herein addressed, but their usability with real data was investigated previously [23]; they are not expected to include additional identifiability issues, since resource usage parameters could be estimated separately if required [23,43] and their identification can rely on a second output of the system, i.e., a cell burden monitor which acts as a proxy of cell load.
A number of limitations may affect the usability and predictive performance of the studied models against in vivo data.In fact, although mechanistic details have been herein added to traditional models to improve their generalization performance, other assumptions may still be inaccurate in capturing the real behavior of a synthetic circuit.Among the potential crucial aspects, it is worth noting that cell systems are inherently stochastic and when the reacting molecules are present at small intracellular copy numbers stochasticity can result in large fluctuations in the behavior of single cells in a population.In addition, the non-cooperativity of the inducer-regulator binding, herein assumed, should be further investigated.More in general, despite preliminary data have been used to confirm some of the transfer function variations found in this work, a larger-scale experimental validation should be required to investigate and confirm the described effects.Such effort should experimentally validate the impact of the individual assumptions.The different LuxR-HSL binding assumptions will need such validation to select the one best describing experimental measures.However, the validation of the other assumptions (LuxR abundance and cell load) should not lead to the selection of a best-performing model since it is expected to be application-specific, e.g., a no-burden model may have high predictive power if all the genes of a circuit have low resource usage.
In summary, we have proposed different usable mechanistic models that had a significant impact on the predicted output of an inducible system, and they were also consistent with a set of preliminary experimental data.In the future, the reported models may support synthetic circuits output prediction in practical situations with unprecedented details, also facilitating the bottom-up design of complex circuits due to their generalization power.In this framework, highly relevant applications of such models in synthetic biology are, e.g., the prediction of circuit output as a function of unseen DNA, protein and inducer concentrations; estimation of protein regulator abundance; investigation of different binding mechanisms for subsequent model selection.
The proposed model definition and analytic procedures may be used to study other systems different from the LuxR/P lux module, considered in this work.The underlying binding reactions could be known, or they might be investigated by testing different model assumptions against experimental data for model parametrization and mechanistic understanding.

Figure 1 .
Figure 1.Description of the lux inducible system.(a) The gene encoding the LuxR protein transcriptional regulator is expressed by a constitutive promoter (Pcon); the LuxR regulator becomes activated upon HSL molecule binding to form a complex which can bind the Plux promoter in its single lux box sequence, thereby activating the expression of the gene of interest (GOI), placed downstream of Plux.Curved arrows represent promoters; ovals represent ribosome binding sites (RBSs); straight arrows represent coding sequences; hexagons represent transcriptional terminators; circles represent HSL molecules.(b) and (c) Biomolecular reactions modeled in this work.The LuxR dimer (R2T, blue overlapping circles) is assumed to be activated upon binding of one (b) or two (c) HSL molecules (L, yellow circles) to form an activated complex (Q or Q2, respectively), which binds DNA (double helix icon) to enable transcription.In panel (c), Q is the LuxR dimer form bound to one HSL molecule and it is assumed to be unable to bind DNA.The equilibrium constants are reported for each reaction occurring in panels (b) and (c).

Figure 1 .
Figure 1.Description of the lux inducible system.(a) The gene encoding the LuxR protein transcriptional regulator is expressed by a constitutive promoter (P con ); the LuxR regulator becomes activated upon HSL molecule binding to form a complex which can bind the P lux promoter in its single lux box sequence, thereby activating the expression of the gene of interest (GOI), placed downstream of P lux .Curved arrows represent promoters; ovals represent ribosome binding sites (RBSs); straight arrows represent coding sequences; hexagons represent transcriptional terminators; circles represent HSL molecules.(b,c) Biomolecular reactions modeled in this work.The LuxR dimer (R 2T , blue overlapping circles) is assumed to be activated upon binding of one (b) or two (c) HSL molecules (L, yellow circles) to form an activated complex (Q or Q 2 , respectively), which binds DNA (double helix icon) to enable transcription.In panel (c), Q is the LuxR dimer form bound to one HSL molecule and it is assumed to be unable to bind DNA.The equilibrium constants are reported for each reaction occurring in panels (b,c).

Figure 2 .
Figure 2. Comparison between M1 and M2.Panels (a) and (b) report the output activation curves of M1 (solid lines) and M2 (dashed lines) for different values of R2T (panel (a), as indicated in the legend, expressed as nM) and PT (panel (b), as indicated in the legend, expressed as per cell copy number).The (c)-(d), (e)-(f), and (g)-(h) panel pairs report the R2T-and PT-dependent trend of α, κ and η, respectively, with solid and dashed lines representing M1 and M2, respectively.In panels (a),(c),(e),(g) the promoter copy number value was set to 5, while in panels (b),(d),(f),(h) the LuxR concentration was set to 500 nM.

Figure 2 .
Figure 2. Comparison between M1 and M2.Panels (a,b) report the output activation curves of M1 (solid lines) and M2 (dashed lines) for different values of R 2T (panel (a), as indicated in the legend, expressed as nM) and P T (panel (b), as indicated in the legend, expressed as per cell copy number).The (c)-(d), (e)-(f), and (g)-(h) panel pairs report the R 2T -and P T -dependent trend of α, κ and η, respectively, with solid and dashed lines representing M1 and M2, respectively.In panels (a,c,e,g) the promoter copy number value was set to 5, while in panels (b,d,f,h) the LuxR concentration was set to 500 nM.

Figure 3 .
Figure 3.Comparison between M1 and M1L.Panels (a,b) report the output activation curves of M1 (solid lines) and M1L (dashed lines) and M1L with external load, indicated as M1L+E (dotted lines), for different values of R 2T (panel (a), as indicated in the legend, expressed as nM) and P T (panel (b), as indicated in the legend, expressed as per cell copy number).The (c)-(d), (e)-(f), and (g)-(h) panel pairs report the R 2T -and P T -dependent trend of α, κ and η, respectively, with solid, dashed, and dotted lines representing M1, M1L, and M1L+E, respectively.In panels (a,c,e,g) the promoter copy number value was set to 5, while in panels (b,d,f,h) the LuxR concentration was set to 500 nM.

Figure 4 .
Figure 4. Comparison between M1T and M2T.Panels (a) and (b) report the output activation curves of M1T (solid lines) and M2T (dashed lines) for different values of R2T (panel (a), as indicated in the legend, expressed as nM) and PT (panel (b), as indicated in the legend, expressed as per cell copy number).The (c)-(d), (e)-(f), and (g)-(h) panel pairs report the R2T-and PT-dependent trend of α, κ and η, respectively, with solid and dashed lines representing M1T and M2T, respectively.In panels (a),(c),(e),(g) the promoter copy number value was set to 5, while in panels (b),(d),(f),(h) the LuxR concentration was set to 500 nM.

Figure 4 .
Figure 4. Comparison between M1T and M2T.Panels (a,b) report the output activation curves of M1T (solid lines) and M2T (dashed lines) for different values of R 2T (panel (a), as indicated in the legend, expressed as nM) and P T (panel (b), as indicated in the legend, expressed as per cell copy number).The (c)-(d), (e)-(f), and (g)-(h) panel pairs report the R 2T -and P T -dependent trend of α, κ and η, respectively, with solid and dashed lines representing M1T and M2T, respectively.In panels (a,c,e,g) the promoter copy number value was set to 5, while in panels (b,d,f,h) the LuxR concentration was set to 500 nM.

Figure 5 .
Figure 5. Relative estimation error (REE) and uncertainty of parameter estimates (CV) in a posteriori identifiability.Panels (a) and (b) report the distribution of REE (a) and CV (b) among 200 runs starting from different simulated data, with a 5% proportional error.The model and the number of LuxR levels included in the fitted data are specified for each boxplot.The red line represents the median of the distribution.Panels (c) and (d) report the median of the REE (c) and CV (d) distribution as a function of the proportional error entity, from 0 to 10%.The models and specific LuxR levels are described in the legend.In all the panels, the dashed horizontal line indicates the 100% value to facilitate the interpretation of the graphs.

Figure 5 .
Figure 5. Relative estimation error (REE) and uncertainty of parameter estimates (CV) in a posteriori identifiability.Panels (a,b) report the distribution of REE (a) and CV (b) among 200 runs starting from different simulated data, with a 5% proportional error.The model and the number of LuxR levels included in the fitted data are specified for each boxplot.The red line represents the median of the distribution.Panels (c,d) report the median of the REE (c) and CV (d) distribution as a function of the proportional error entity, from 0 to 10%.The models and specific LuxR levels are described in the legend.In all the panels, the dashed horizontal line indicates the 100% value to facilitate the interpretation of the graphs.

Table 1 .
Model parameters.The indicated values were used for simulation and, unless differently indicated, for the study of a posteriori identifiability.
a During the study of a posteriori identifiability, n 2 was expressed as AU with known value, while r T was expressed as nM/AU with unknown value; b When indicated, a range of copy number values was spanned; c Typical value of E. coli growth rate, which is used as the rate of intracellular protein dilution; d A value of 2 was used to simulate the presence of external load where indicated.

Table 2 .
Parameters estimated from in vivo experiments.Measurements were carried out using MG1655-Z1 as host strain.