Computational Modelling of Metabolic Burden and Substrate Toxicity in Escherichia coli Carrying a Synthetic Metabolic Pathway

In our previous work, we designed and implemented a synthetic metabolic pathway for 1,2,3-trichloropropane (TCP) biodegradation in Escherichia coli. Significant effects of metabolic burden and toxicity exacerbation were observed on single cell and population levels. Deeper understanding of mechanisms underlying these effects is extremely important for metabolic engineering of efficient microbial cell factories for biotechnological processes. In this paper, we present a novel mathematical model of the pathway. The model addresses for the first time the combined effects of toxicity exacerbation and metabolic burden in the context of bacterial population growth. The model is calibrated with respect to the real data obtained with our original synthetically modified E. coli strain. Using the model, we explore the dynamics of the population growth along with the outcome of the TCP biodegradation pathway considering the toxicity exacerbation and metabolic burden. On the methodological side, we introduce a unique computational workflow utilising algorithmic methods of computer science for the particular modelling problem.


Introduction
Escherichia coli strain BL21(DE3) is frequently used in synthetic biology with embedded protein expression (pET) vectors enabling heterologous protein expression [1][2][3][4][5][6]. Especially, it has recently found use as a cell factory for heterologous expression of entire biochemical pathways [7][8][9][10][11]. In spite of its common usage in metabolic engineering, the expression system in E. coli BL21(DE3) suffers from certain problems primarily caused by strong overexpression of recombinant proteins triggered by exposing the cognate LacI Q /P lacUV5 -T7 expression system to the synthetic inducer isopropyl-beta-D-thiogalactopyranoside (IPTG) [12]. Such negative effects that result from prioritising high levels of protein production to normal metabolic capacities in host cells are known as the metabolic burden [13,14]. One of the known factors causing the burden is the energetically expensive maintenance and replication of plasmid vectors carrying heterologous genes [15][16][17]. Other factors are associated with the activities of the foreign proteins which may interact with the metabolic network of the cell and burdens linked to the components of the expression system itself, such as IPTG. The individual factors may not always be additive, but can sometimes potentiate each other, leading to significantly larger effects on the living cells-the so-called exacerbation effect.
In our previous work, we designed and implemented a synthetic metabolic pathway for TCP biodegradation in E. coli [18]. The pathway was assembled with enzymes introduced to the cell using the pETDuet plasmids with LacI Q /P lacUV5 -T7 expression system inducible by IPTG. In relation to that work, we observed the effect of metabolic burden and toxicity exacerbation on E. coli population in [19]. However, the mechanisms behind these effects and their influence on cell growth have not been targeted. Deeper understanding of underlying mechanisms is extremely important for metabolic engineering of efficient microbial cell factories for biotechnological processes. To the best of our knowledge, the metabolic burden effect has not yet been fully-handled regarding the dynamics of affected metabolites and the influence on cell population growth. In [20], the authors reviewed existing fundamental frameworks for the kinetic modelling of microbial growth based on basic hypotheses about the underlying reaction systems. An integrated model presented in [21] couples the growth rate with the gene expression and the growth rate with the growing population of cells. To the best of our knowledge, existing population-level growth models neither consider metabolic interactions together with toxic effects caused by some metabolites nor describe a burden linked with using of engineered metabolic pathways. It is apparent that such effects cause a significant increase of the complexity of the underlying non-linear dynamics. To that end, a precise explanation of the resulting emergent behaviour remains a challenge.
The approach of computational systems biology gives a powerful tool allowing to study the dynamics of biological mechanisms holistically in terms of mathematical models. An important aspect of modern systems biology is the requirement to ground the models within the relevant genomic context of the explored organism [22]. It is obvious that utilising mathematical modelling and computational analysis based on the models makes an important starting point for successful experiment design in synthetic biology. To that end, the synthetic pathway we designed for conversion of the highly toxic TCP to glycerol (GLY) in E. coli [23] was optimised by using our original mathematical model [18] that allowed us to simulate the behaviour of the synthetic pathway in silico . The synthetic pathway is depicted in Figure 1. It is composed of a few toxic intermediates with harmless glycerol as a final product and it utilises enzymes from other bacterial species. These enzymes represent the engineered form of haloalkane dehalogenase (DhaA31, originally from Rhodococcus rhodochrous NCIMB 13064) and haloalcohol dehalogenase (HheC), and epoxide hydrolase (EchA) from Agrobacterium radiobacter AD1. They have a major role in this pathway; however, since they are heterologous proteins in E. coli, they have to be produced at the expense of other substances (causing a particular instance of the metabolic burden). In this paper, we propose a significant extension of the existing mathematical model of the pathway which addresses for the first time the combined effects of toxicity exacerbation and metabolic burden in the context of bacterial population growth. To calibrate and fine-tune the model with respect to our original synthetically modified E. coli strain, we conducted several dedicated wet-lab experiments. Based on the new model calibrated with respect to the real data, we explore the dynamics of the population growth along with the outcome of the TCP biodegradation pathway considering all the phenomena mentioned above. On the methodological side, we introduce a unique computational workflow utilising algorithmic methods of computer science. It allows us to fully exploit the expressive power of the model under parameter uncertainty.

Bacterial Strains and Plasmids
Escherichia coli strain BL21(DE3) was used for this study with two modifications. Here, they are called deg31 (carrying synthetic metabolic pathway containing recombinant plasmids pCDF::dhaA31 and pETDuet::echA-hheC) and host (carrying empty plasmids pCDF and pETDuet).

Preparation of Pre-Induced Cells
The lysogeny broth medium (i.e., LB medium) with volume of 10 mL containing respective antibiotic combination (75 µg/mL ampicillin, 25 µg/mL streptomycin) was inoculated with transformed cells from glycerol stock. The culture was incubated overnight at 37°C with shaking (180 rpm) in incubator Innova 44 (New Brunswick Scientific, Edison, NJ, USA-this machine was used for all incubations). Then, 25 mL of fresh LB medium with respective antibiotics were inoculated with 250 µL of night culture and incubated at 37°C with shaking (180 rpm) until OD 600 reached 1. The cultures were induced with 0, 0.01, 0.05, 0.2 and 1 mM IPTG and incubated overnight at 20°C. Cells were then collected at late exponential phase by centrifugation (4000× g) at 4°C in centrifuge Sigma 6-16K (Merck, Darmstadt, Germany-this machine was used for all centrifugation) and washed with 50 mM sodium phosphate buffer (NaP buffer, pH 7). After washing and centrifugation, cells were resuspended in NaP buffer to final value of 7 OD 600 .

Short Term Toxicity Test
Prior to the toxicity test, 4 mM TCP was diluted in 5 mL of NaP buffer in 25 mL sterile Reacti Flasks closed by screw caps and incubated for 1 h at 37°C. The reaction was initiated by mixing preincubated buffer with TCP with 5 mL of cells in NaP buffer, resulting in final OD 600 3.5 and 2 mM concentration of TCP. Reaction suspension was incubated in water bath at 37°C with constant shaking for 5 h.
Cell samples were taken from the reaction mixture at the beginning of the toxicity test and after 4 h of incubation. Cell suspension (100 µL) was aseptically withdrawn from the flask and serially diluted in 900 µL of PBS buffer (pH 7.4) up to the final dilution of 10 −6 to 10 −7 . Diluted cell suspensions (100 µL) was spread on Plate Count Agar (PCA) plates and incubated 24 h at 37°C. After incubation, colonies on agar plates were manually counted and expressed as colony forming units per volume (CFU/mL).

Growth Test
Cells of E. coli BL21(DE3) strain carrying empty plasmids and degrading strain deg31 were pre-grown in LB medium containing respective antibiotic combination (75 µg/mL ampicillin and 25 µg/mL streptomycin) at 37°C until the culture reached stationary phase. The cells were then centrifuged at 6000× g and resuspended in Synthetic Mineral Medium (SMM) [18]. SMM medium (15 mL) with 10 mM glycerol in 25-mL Reacti Flask closed with screw cap was preincubated at 37°C, growth test was initiated by addition of cells to 0.1 OD 600 . Growth medium was supplemented with IPTG to the final concentrations 0, 0.01, 0.05, 0.2 and 1 mM. Cells were incubated at 37°C with shaking (200 rpm). Optical density (OD 600 ) of the culture was measured periodically and cell dry weight (CDW) in g/L at given time intervals was determined by multiplying the OD 600 values by 0.39 g/L [24]. Acute toxicity measurement was performed in the same conditions, and the medium for toxicity test contained respective concentrations of TCP pathway metabolites.

Glycerol Analysis
Samples withdrawn from incubated cell cultures were heated for 10 min at 95°C, centrifuged and stored in the fridge prior to analysis. Free Glycerol Colorimetric/Fluorometric Assay kit (BioVision, USA) was used for analysis of glycerol content in the cells withdrawn from toxicity or growth tests following manufacturer's instructions.

Parameter Constraints
We consider a set of m unknown parameters p 1 , ..., p m and the set P ⊆ R m ≥0 denoting the so-called parameter space consisting of m-dimensional vectors of parameter values. For the purpose of this paper, we assume P to be bounded. The constraints on parameters could generally be of various types. In this paper, we consider linear constraints of the form: where p j ∈ R ≥0 : j ∈ {1, . . . , m} is a parameter; a j , b ∈ R stand for coefficients; and ∼ ∈ {<, >, ≤, ≥ , =, =} represents (in)equalities.

Biochemical Model
A biochemical model, in this context, is a dynamical system given as a set of ordinary differential equations (ODEs) of the formẋ where x is a vector of n variables, p is a vector of m model parameters, and f i is a kinetic function constructed as a sum of reaction rates where every sum member represents an affine or bi-linear function of x i ; a ramp or a Heaviside step function of x i (Figures 2 and 3, respectively); or a function from a limited set of non-linear functions of x i referred as sigmoidal functions; all of them possibly containing own set of internal parameters. In general, our framework covers mass action kinetics ( [25], Section 1.1) with stoichiometric coefficients not greater than one and all biologically relevant non-linear functions such as Michaelis-Menten [26] or Hill kinetics [27], and microbial growth kinetics such as Monod [28]. An additional requirement restricts the role of parameters in kinetic functions; in particular, we require f i is affine in p.

Inverse Problem
An inverse problem in mathematics is a process of finding a good model (with parameters) reflecting given data; mathematically speaking: where π is a vector of parameters of a model M and d represents the given data. Intuitively, the problem is the following: "How to find the proper set of parameter values π given a model M." In this paper, we distinguish two classes of the inverse problem-parameter estimation and parameter synthesis.
Parameter estimation is based on optimal fitting of the model parameters to observed experimental data. In the basic case, we assume the model to be a parameterised curve.
Parameter synthesis is a method that allows identification of satisfiable parameter values with respect to a given set of hypotheses restricting systems dynamics (described in a particular formalism) and a priori known parameter constraints (e.g., correlations of parameter values, constraints on production/degradation ratio, etc.). In this case, the parameters are assumed to be the model parameters (typically, the rate coefficients appearing in kinetic functions) [29][30][31][32][33][34][35][36][37][38][39][40].

Parameter Estimation and Regression
Parameter estimation is a well-known process for the identification of the model parameters from experimental data representing the observations of the system. The particular problem of finding a parameterised curve (i.e., function) that approximately fits a set of data is referred to as regression. A function is classified as either linear or non-linear concerning affinity of the fitted parameters. Consequently, the problem is classified as the linear regression or the non-linear regression according to the function class. In general, the linear regression is easier but of limited effectiveness and it is preferred if we know the function. In this paper, almost all concerned functions (i.e., models) belong to the non-linear class. Hence, we assume only non-linear regression further referred to as regression or fitting (for simplicity) [41].

Parameter Synthesis and Robustness Monitoring
Both approaches rely upon a temporal logic for specification of the behaviour properties of dynamical systems (Section 2.3). Although several different temporal logics have been developed, they can all be determined in the context of paths (or runs) of the system, i.e., the infinite sequences of states describing the system dynamics in consecutive time events. Such sequences are encoded in temporal logic formulae [42].
Parameter synthesis performs comprehensive exploration of the continuous parameter space including the space of initial conditions. It provides a qualitative answer to the question "which settings of a model satisfy a given set of temporal properties". It is not as widely used as parameter estimation and therefore many of the developed tools are still in a prototype phase [29][30][31][32][33][34][35][36][37][38][39][40].
Robustness monitoring enables quantitative evaluation of the robustness of a model with respect to a temporal property under some perturbation of parameters (typically, reaction rate constants or initial conditions) [43][44][45][46][47][48].

Analysis Workflow
Mathematical models in synthetic biology are often represented in terms of the ODEs system. These models are quantitative and their functionality relies on particular parameter values. Parameter estimation is often the only solution to obtain proper parameter values that fit with experimental observations. In this section, we present our methodology step by step. The presented workflow is an extension of the workflow we introduced in [37].

General Assumptions
There are several preliminary assumptions upon which the workflow is formulated. First, the model must be a metabolic pathway (i.e., a linked series of chemical reactions catalysed by enzymes [49]). Next, we consider several assumptions limiting the experimental settings of the studied system: 1. We assume the total inducer concentration to be constant in the time frame of our interest.
An inducer is supposed to have a function of an input parameter, and it would be an inadequate parameter should it be adjusted spontaneously over time. Otherwise, the inducer degradation rate would be needed either found in literature or extracted from experimental data. 2. The workflow is limited to protease-deficient bacterial strains (e.g., E. coli BL21). In particular, we assume the total concentration of every enzyme affecting the studied pathway is constant in the time frame of our interest. Moreover, no influx of the enzymes is permitted as a consequence of time-limited induction phase where the proteosynthesis takes place [50,51]. Additional synthetic processes are considered negligible in a microbial population stressed enough by the massive expression of (heterologous) genes during induction. 3. There occurs a metabolic burden effect caused by the heterologous genes expression during the induction process and possibly by the presence of an inducer itself which in a combined way affects the bacterial growth rate. 4. Finally, we assume the bacterial population is in the stationary phase after the induction process is finished.
These assumptions limit the use of the proposed workflow to studies of the synthetic metabolic pathway models describing the metabolite dynamics in a time frame of a few hours. Moreover, the workflow targets the protease-deficient bacterial strains assuming some of the enzymes are products of expression of heterologous genes initiated by a non-metabolisable inducer during the induction process. It is worth noting that these significant assumptions help to keep the mathematical model computationally feasible in terms of the number of potentially unknown parameters (thus, minimising the risk of over-parameterisation). In particular, the degradation rates of enzymes can be neglected in such settings.

Workflow Description
In the previous section, we define a metabolic pathway as a chain of chemical reactions with metabolites as the inputs and outputs of the reactions catalysed by enzymes. This form of the system is assumed to be the input to the workflow. The mathematical vector of concentrations of all model enzymes can be understood as the specific point in the enzymatic space. In Step 1 of the workflow, we focus on the reduction of the enzymatic space. Assuming that some of the enzymes are products of the induction process, their concentration can be expressed as the set of functions with an inducer as the input. There is one such function for each enzyme. The internal parameters of these functions need to be extracted out of experimental data-measured in various initial concentrations of the inducer-possibly with the help of parameter estimation methods. Typically, there are more enzymes than inducers in biochemical reactions. This step eliminates the extent of dependency of the model on enzymes and it provides the reduction of the number of potential model parameters.
Up to this point, the model was considered without any effect on the bacterial population. We propose to monitor and possibly predict an impact of the pathway on the population-such as the metabolic burden if using plasmids carrying heterologous genes-instead of modelling entire bacterial system. Following this idea, in Step 2 of the workflow, we extend the current model with new variables describing: (1) the bacterial population; and (2) a substrate used to feed the population. In general, there can be more sources of nutrition. The dynamics of the extension is defined by the growth rate and death rate functions. Each rate function must depend at least on the particular substrate and might depend on the inducer and any of the metabolites especially if they have a harmful effect on the population. To extract a proper rate function, fitting of the internal parameters to experimental data is needed. The traditionally used growth functions include Monod, Moser, Tessier, etc. [52]. They can also be used for diauxic growth ( [53], Section 12.1.3). At the cost of expanding the model with a few new variables, our framework allows predicting the population development depending on settings of the model parameters.
To fine-tune the model, we can use the formal computational methods of parameter synthesis mentioned in Section 2.4.2. This approach is more efficient than iterative analysis of demanding laboratory experiments used for fitting. Thus, in Step 3 of the workflow, we need to specify a set of relevant parameters-ideally, the coefficients that can be altered in an experiment design allowing validation of the model-based results. Besides that, we need to formulate the temporal properties tailored to the particular case study. In general, this requires having some a priori knowledge about the investigated system. Nevertheless, there exist several typical (template) properties which can be used universally (i.e., oscillation, bistability, etc.). By using parameter synthesis together with robustness monitoring, we can: (i) specify various hypothetical scenarios where experimental data are missing or even impossible to achieve in laboratory conditions; (ii) monitor the behaviour of the investigated model; and (iii) optimise a set of tunable model parameters.

Software Tools
For the fitting procedure, we use several tools. GraphPad Prism (https://www.graphpad.com/) (version 8.2.1 (441)) is used in Section 3.3 to fit the functions describing the stable concentration level of enzymes induced by IPTG. The tool GUI is easy to use for traditional functions such as Michaelis-Menten. However, to investigate and compare the usability of a comprehensive set of functions, we use a more advanced tool-the FME package (https://cran.r-project.org/web/packages/ FME/index.html) (version 1.3.5) [54] of the R language [55]. It offers well-documented procedures: parameter identifiability; parameter sensitivity; parameter fitting; and more.
For the parameter synthesis procedure, we utilise our tool Pithya [40], providing a parameter synthesis algorithm working with an expressive temporal logic HUCTL P [39,56]. Its main advantage is it enables a fully automatised investigation of the system dynamics including exploration of equilibria. The tool has GUI and CLI, both available on github (https://github.com/sybila/pithya-gui) and also online (http://pithya.ics.muni.cz).
For the robustness monitoring, we use our tool Parasim that implements the algorithm working with a signal temporal logic STL * [46][47][48] expressing properties of continuous signals. STL * operates with forward consecutive time events over real-valued signals (i.e., evaluated variables). Its strength is to identify various types of repetitive behaviour (i.e., oscillations). The tool is available on github (https://github.com/sybila/parasim) and makes part of the online toolset BioDivine (http://pithya. ics.muni.cz/galaxy).

Results and Discussion
In this section, we apply the sequence of steps defined in Section 2.5 to the model of TCP metabolic pathway ( Figure 1).
The outcome of the workflow is a novel model, the scheme of which is shown in Figure 4. It exhibits a modular structure, which was necessary to relieve the inverse modelling process (Section 2.4). We decided on this concept because the fitting of all new parts simultaneously would be practically impossible. Instead, only two to four parameters needed to be taken into account simultaneously. Due to this fact, the dedicated experimental data-capturing only partial behaviour-had to be obtained (Data S1-S3).

Extended Assumptions
Our model satisfies the general theoretical assumptions defined in Section 2.5.1: (1) the standard non-metabolisable IPTG inducer has been used [61]; (2) E. coli BL21 (DE3) is a protease deficient B strain, and all experiments were conducted in a closed environment with a limited amount of nutrients; (3) the core of the model is an expression of heterologous genes causing experimentally-provable metabolic burden [19]; and (4) all experiments were preceded by the induction phase after which the bacterial population was in stationary phase. Next, we assume the time frame of 5 h in which all metabolic pathway experiments and simulations were performed.
Next, we specify an extended set of assumptions characterising this case study: 1. We define a new variable called Bact standing for CDW (g/L) of E. coli population taken as 0.39 g/L = 1 OD 600 [24]. 2. We assume IPTG to be the only inducer for the synthetic pathway. Concentration of IPTG is considered to be constant in the given time frame. 3. Reversible reactions in the TCP-degradation pathway are considered negligible. 4. The initial concentration of substances (e.g., TCP 0 , GLY 0 , IPTG 0 ) and the population (i.e., Bact 0 ) determines the input for the system. 5. Dynamics of individual enzymes is approximated as a constant function of time in the given time frame. Moreover, enzymes dynamics is considered to be independent on the size of the bacterial population in the given time frame. 6. Total conversion of TCP into GLY is assumed to occur in a sufficiently long time reflecting the known behaviour of the pathway.
7. Viability of the bacterial population is given as the function of the pathway compounds toxicity, metabolic burden and the presence of nutrients (i.e., GLY). 8. Toxic effects of the pathway compounds are considered to be mutually independent. 9. Glycerol is the only assumed nutrient. 10. We assume natural degradation (death rate) of the bacterial population.  We are aware of the fact that Assumption (5) makes a significant approximation. The assumption reflects our former studies, in which we worked with pre-cultured pre-induced resting cells. In particular, estimated enzyme concentrations have been set as the endpoint values determined at a certain time interval after the overnight induction (Section 2.1.2).

Workflow Input: Synthetic TCP Degradation Pathway
Dvorak and co-workers [18] tested three different forms of the enzyme DhaA (two of them mutant) in order to improve the efficiency of the pathway. Nevertheless, we consider only the most effective form denoted DhaA31 (referred to here as DhaA for simplicity). This particular enzyme produces two different enantiomers of the DCP compound with a similar rate (k cat,TCP,(R)-DCP = 0.58 (s −1 ) as the 55% of 1.05 (s −1 ) for (R)-DCP and k cat,TCP,(S)-DCP = 0.47 (s −1 ) as the 45% of 1.05 (s −1 ) in favor of (S)-DCP). However, the enzyme HheC is enantioselective and prefers (R)-DCP (k cat,(R)-DCP = 1.81 (s −1 ) vs. k cat,(S)-DCP = 0.08 (s −1 ), where the greater means the better). As a consequence, (S)-DCP accumulates during the biodegradation process until (R)-DCP is consumed. Note that both enantiomers are supposed to be equally toxic to the host.
In general, reactions catalysed by HheC are reversible. However, in this particular case, the catalytic efficiency (i.e., k cat K M ) of EchA in turning ECH into CPD is much higher than the catalytic efficiency of HheC towards the reaction ECH → (R,S)-DCP. Due to this fact, the reverse reaction was removed from the mathematical model ( [23], Chapter 2). HheC also catalyses the reaction CPD ↔ GDL. The reverse reaction is compensated by a special kind of competitive version of Michaelis-Menten equation in the reaction GDL → GLY of the mathematical model. Thorough research can be found in [23]. However, we did a comparative test with several simulations of the model with competitive version of Michaelis-Menten from ( [23], Chapter 2) and the simplified model containing only common Michaelis-Menten equations and, according to the results in Figure S1, we decided to use the simplified model instead ( Figure S2).

Step 1: Enzymatic Space Settings and Reduction
Based on the simplifying assumptions mentioned above, the concentration levels of DhaA, HheC and EchA are represented as constants in the input mathematical model ( Figure S2). In such settings, these constants can be understood as parameters. To extend the model with respect to IPTG concentration and in agreement with Step 1 in Section 2.5.2, it is necessary to define functions describing the concentrations of enzymes depending on the concentration of IPTG. We extracted these functions by parameter fitting to experimental data obtained from densitometric analysis of enzyme cell-free extracts prepared from cells induced with different IPTG concentrations (more details in Appendix A).
To that end, we employed Michaelis-Menten kinetics because the data showed a good agreement with its shape ( Figure 5), although the MM is typically used for the description of a rate and not a concentration. The quantitative result of the fitting with statistical evaluation is shown in Table S1. The initial growth of the functions is in perfect agreement with the switch-like influence when using a non-metabolisable inducer such as IPTG on LacI Q /P lacUV5 -T7 expression system [62,63] which is employed in heterologous plasmids containing genes of enzymes DhaA, HheC and EchA. The resulting functions of stable concentrations for particular enzymes are shown in Equations (6)-(8), respectively:

Step 2: Integration with Population Growth
The original model does not take into account the bacterial population. Therefore, in agreement with Step 2 in Section 2.5.2, we introduce a mechanism explaining the dependency of the substrate (GLY) and the inducer (IPTG) on the population. The Monod equation is commonly used to explain bacterial population growth rate µ, and, indeed, the results are in good agreement with experimental data (Figure 6). For the sake of completeness, we have exemplified several different alternatives (Tessier ([64], in French), Moser [65], Aiba-Edwards [66], Andrews [67], etc.). For the full list of considered functions, see the comparative table (Table S2). Nevertheless, according to Figure S3, the best results were observed by Monod growth model defined in Equations (9) and (10), where µ is the specific growth rate, µ max is the maximum specific growth rate, K is the half-velocity constant, γ GLY is the rate of substrate utilisation and Y is the yield coefficient.
Traditional growth functions seem not to be sufficient to explain the effect of the metabolic burden caused by IPTG and heterologous genes expression, as displayed in Figure S4. Thus, considering the experimental data in Figure 7, we decided to use a specific function to model the gap between results for values t 1 = 0.01 and t 2 = 0.05 of mM IPTG 0 . It seems that the Heaviside step function ( Figure 3) is a straightforward option for the apparent step in-between resulting values. However, as we do not know the exact value of the breaking point, it is safer to assume the linear behaviour. Hence, we decided to employ the ramp function defined in Figure 2 in the way that the maximum specific growth rate constant µ max -estimated above using Monod growth function ( Figure 6)-is replaced with the ramp function defining an actual growth rate from the interval of the maximum (µ max ) and the minimum specific growth rate (µ min ) parameterised with IPTG 0 . The best result of the fitting is shown in Figure 8 and represents evidence of the metabolic burden caused by IPTG with heterologous genes expression.
For the completeness, we also assume a death rate coefficient (γ Bact ), the value of which fits approximately in the range of [3.876 · 10 −3 , 5.382 · 10 −4 ] (h −1 ). The uncertainty of these values comes from the fact that the death rate is usually not the main interest of microbiologists, and also some methods of measurement make the death phase hard to discern ( [68], Section 3.1.4). The origin of the range values is explained in Appendix B and has the base in [69]. Although the range is an approximation, it is a good starting point for further investigation by parameter synthesis and robustness monitoring. The decision of taking microbial population into account would not be complete without an understanding of the toxic effect of the pathway components. Here, the selection of the optimal function is not straightforward because no function has been made a standard for this purpose. Therefore, we investigated several functions (Appendix C). The best results are shown in Figures S5-S8 for TCP, ECH, CPD and GDL, respectively. Both DCP enantiomers were observed to have minimal effect on the population even for high doses (i.e., 4 mM).
With respect to the nature of the experimental data, some of the results were far from the best fit. Notably, the case of time-series data for TCP concentration of 2 mM was the worst ( Figure S5). However, in the long time horizon, the population stabilises more or less on the same CDW ( Figure S9). This fact indicates some delay in the cellular response to the presence of the toxic pollutant.
As we are interested in an explanation, or at least a mathematical description of the exacerbation effect, we introduced the Heaviside step function in the expression describing the TCP toxic effect on the bacterial population regarding of IPTG concentration. In other words, the purpose of the step function is to improve the TCP toxicity function according to an observation from the experimental data such that it simulates the exacerbation when TCP and IPTG effects are combined.  In one case, a simple evidence of the exacerbation phenomenon has been given in [19] where Dvorak and co-workers compared the results of similar experiments of pre-induced (IPTG +) or non-induced (IPTG −) cells incubated in buffer with TCP (TCP +) or without (TCP −) (Figure 9 displays all four combinations: −−, −+, +−, and ++). The figure clearly shows some unexplained disruption amplifying the population extinction, although the data are not sufficiently convincing.
In another, more extensive case, two datasets for the same type of experiment with E. coli population degrading over time are compared for various initial concentrations of IPTG-IPTG 0 ( Figure 10). The first dataset represents the individual effect of IPTG and the second one reflects the combined activity of IPTG and TCP. Remarkable is the fact that the exacerbation is more or less conserved for various values of IPTG 0 except for the case when IPTG 0 equals zero (displaying the toxic effect of TCP only). By acquiring a proportion of the median of the differences for various positive concentrations of IPTG 0 to the difference of the individual TCP effect (where IPTG 0 equals zero) we have obtained a value of 1.82. Due to the fact it is dimensionless, it is suitable for a wide variety of models regardless of the units used for determination of the bacterial population. Therefore, we incorporate this value as a potential exacerbation parameter (ex on = 1.82, whereas ex off = 1) in the next step of our workflow.  Note that the population in the first column of the first dataset is pre-induced with 0 mM IPTG and incubated in absence of TCP, which makes it a control group. The second column in the same dataset shows the sole effect of 2 mM TCP on the population. It is remarkable that the third columns in all other datasets seem to be in perfect match and approximately 1.82 times bigger than the same column in the first dataset. We explain this fact by the existence of the exacerbation effect. Different concentrations of IPTG were used for the induction of the TCP pathway expression from pCDF and pETDuet plasmids. Error bars represent standard deviations calculated from at least three independent experiments except for the rigthmost column in each dataset where error bars represent standard error of values in these columns.

Step 3: Model Dynamics Exploration
As the outcome of the previous part of the workflow, we obtained the extended ODE model ( Figure 11). It makes an input into Step 3 of the workflow (Section 2.5.2). For the purpose of the employed analysis techniques (parameter synthesis and robustness monitoring), we converted the model into two different formats compatible with the used software (the BIO (https://github.com/ sybila/ode-generator/blob/master/README.md#model-syntax) format and the SBML (http://sbml. org/Documents/Specifications) format, respectively). . An ODE model of the extended TCP metabolic pathway. The model represents a chain reaction for biodegradation of TCP into GLY reflecting the E. coli population. Note that the rate constants of the original ODE model were rescaled from seconds (s −1 ) into hours (h −1 ) because the data used for fitting were sampled every hour. It concerns the original rate constants (k 1 R , k 1 S , k 2 R , k 2 S , k 3 , k 4 , k 5 ). Units: k * , V * , µ * , γ * (h −1 ); t * , K * (mM); ex * , Y, n * (unitless).
Next, we specify the model parameters that make the objectives of further analysis. While the previous parts of the workflow targeted the internal parameters adapting the model for the particular cell population, in this part, we target the model parameters that can be perturbed and fine-tuned. In particular, we consider the following set of model parameters for tuning: 1. Concentration of IPTG in mM: IPTG is an obvious candidate for tuning because many aspects of the model depend on it and it can be controlled easily in the experimental environment (since its concentration is considered constant during the experiment, it can be referred by its initial concentration, denoted IPTG 0 ).

Size of bacterial population (Bact) in g/L:
The initial population size, denoted Bact 0 , makes the crucial input of the model and it affects the model output-the final population size (reached in a given time). In general, the initial population size can be controlled in experiments.

Initial concentration of TCP (TCP 0 ) in mM:
The key input of the model that must be set in order to make the modelled metabolic pathway work; it can be easily set to any arbitrary value during the experiments. 4. Death rate of the population (γ Bact ) in h −1 : The death rate is considered as a parameter because we are interested in the dynamics of the microbial culture and the effects affecting the growth.

Property 1.
The complete degradation of TCP as fast as possible with the least accumulated toxicity.
In [37], we declared the desired property of the model dynamics verbally (stated as Property 1). The model used in that study did not concern the bacterial population. Therefore, the notion of toxicity was interpreted as an artificial accumulation of the inhibitory effect of the particular pathway's (intermediate) products. The inhibitory effect was experimentally measured and is traceable in [18,23]. In the extended model, we are able to investigate directly the effect of the particular pathway's products on the bacterial population. To that end, the desired property is lifted to the population level resulting in Property 2. Due to the very abstract character, this property serves as a theoretical concept and several adjustments have to be performed to use it with computational analysis methods.
First, the part "the complete degradation of TCP" is translated into "the TCP concentration is close to zero or below a minimal threshold (e.g., 0.01 of mM)". This is due to the possible errors of numerical solvers and the nature of the employed model approximation/abstraction algorithms.
Second, the terms such as "as fast as possible" and "the most survived bacteria" cannot be interpreted numerically, which is necessary for the computing. For that reason, we use various numerical thresholds to instantiate such abstract terms in the specified property.
As we addressed in Section 2.4.2, both considered approaches (parameter synthesis and robustness monitoring) employ a particular temporal logic for the specification of properties. We employ various properties compatible with both approaches and we utilise their specific advantages.
The results of parameter synthesis contain comprehensive information about parameter values for all possible initial settings of variables in considered ranges. In particular, not all positive results (parameter values satisfying the particular property) are automatically valid in all initial settings of model variables. The parameter synthesis method computes only positive results for which there exist some valid initial values of model variables. It is also worth noting that, due to the model overapproximation performed inside the parameter synthesis procedure, the complement to the set of positive results does not necessarily imply a valid negative result [37]. In other words, the so-called false-positives might exist. Property 3. The population will not drop below 0.08 g/L until TCP will degrade fully (drop below 0.01 mM).
Addressing Property 2, we designed a suitable reformulation (Property 3), which is compatible with the parameter synthesis procedure. The results of parameter synthesis for this property are shown in Figure 12. The displayed parameters have been synthesised in given ranges. Every blue region depicts a set of values satisfying the stated property in at least one initial value of model variables. In this particular case, the blue regions make joint projections across all dimensions (i.e., parameters and variables) of the modelled system. Consequently, the property is confirmed to be satisfied by every combination of parameters (respectively, initial conditions) in the considered ranges.

Property 4.
Eventually, there will happen a situation where the population never exceeds a low value (stays below 0.08 g/L forever) and TCP concentration stays above 0.01 mM (never fully degrades).
To support the above results, we decided to consider a property with the opposite meaning (Property 4). In particular, the analysis resulted with no positive results which shows us a certain agreement with the previous analysis. However, as we have already addressed, we cannot directly interpret an "empty result" due to potential existence of false-positives. Therefore, we decided to investigate Property 4 more deeply with the help of the robustness monitoring analysis (see the analysis of Property 7).

Property 5.
Eventually, there will happen a situation in which TCP concentration is currently above 0.01 mM and the population never exceeds a low value (stays below 0.08 g/L forever).
An interesting fact appears in the analysis based on comparing Property 4 with its weaker form represented by Property 5. A difference between the two properties is only in the predicate about the TCP concentration. However, for the parameter synthesis method, the difference is significant; as in Property 5, we do not require the concentration of TCP to be above 0.01 mM forever. In Figure 13, it is shown that for the weaker property there exist positive results. However, they appear only for the population death rate (parameter γ Bact ) higher than 0.07 s −1 , which is an overrated value.
The parameter synthesis method works with abstractions of systems dynamics that do not consider time explicitly. However, it allows exploring patterns of model dynamics globally (regardless of the concrete settings of initial conditions). On the contrary, the robustness monitoring method considers time but works locally (i.e., dynamics is simulated in time for a given set of initial conditions). Therefore, there is a chance of missing an interesting behaviour occurring for an initial condition which is not included in the considered set. Property 6. The population will never drop below half of its initial value in the 5 h scope and TCP will degrade (drop below 0.01 mM) in the 2.5 h scope at the same time.
Property 6 gives another reformulation of the desired property that now includes timing aspects. It requires the entire TCP degradation to be realised in a certain time limit. The particular results obtained with robustness monitoring are shown in Figure S10. For brevity, we present a simple diagram ( Figure 14) showing that the change of the population death rate (γ Bact ) as well as the initial size of the population (Bact 0 ) have practically no influence on this property. In Figure S11, the range of the death rate coefficient is increased to the number of 0.1 (h −1 ), which is a considerably overrated value. However, the influence is still considered negligible. More interesting appears to be the effect of the TCP initial concentration (TCP 0 ) on the satisfiability of Property 6. The dependency between TCP 0 and IPTG 0 is non-linear ( Figure 15). . All the constants can be found in Figure 11. The shades of green colour imply a feasibility of the particular property in the particular initial setting while the shades of red imply a violation of the property-the darker the tone, the stronger the feasibility/violation. At the bottom of the plot, there is the feasibility scale mapped to real values. The plot represents a single layer of the entire parameter space.
To justify the results of parameter synthesis obtained for Property 4, we formulate a similar property that is compatible with the robustness monitoring method (Property 7). The results are shown in Figure S12-they contain only negative values (the property is definitely not satisfied in any sampled point). The summary of these results is available in a compact table (Table 1). For the reasonable range of the death rate coefficient, the results do not change significantly. However, there is some noticeable influence for the increased range of the death rate (up to 0.1 h −1 ) shown in Figure S13. Nevertheless, the obtained negative results support the previous outcome of parameter synthesis and even improve it by adding the quantitative information.  Robustness monitoring proves to be useful for this type of models. On the other hand, parameter synthesis approach is designed for models with a more complex relationship between variables where some interesting bifurcations can take place.

Conclusions
The effects of metabolites on the fitness of bacterial strains carrying artificial pathways are of great interest for the community of metabolic engineers. The adverse effects of toxic metabolites and metabolic burden on the host cells need to be considered and ideally even quantitatively characterised by the computational modelling. Here, we present development of such a model for the E. coli BL21(DE3) strain overexpressing the artificial pathway for degradation of highly toxic environmental pollutant 1,2,3-trichloropropane (TCP).
To assess the complex behaviour of the system, we created a unique mathematical model which combines population modeling with prediction of effects of metabolite toxicity and burden caused by application of the standard synthetic inducer IPTG triggering expression of heterologous biodegradation pathway. We were interested in conditions of the system in which the biodegradation is efficient while population survives and we investigated the system under these conditions using state-of-the-art computational methods.
First, we extended the original model of the metabolic pathway with a new model variable describing the bacterial population growth over time. The new model reflects the initial concentration of the inducer (IPTG 0 ). Second, we formalised the following features of the investigated system: (1) the metabolic burden effect caused by increased concentration of the inducer; (2) a highly toxic effect of TCP and other metabolites of the pathway; and (3) an exciting phenomenon of toxicity exacerbation occurred by the joint effect of the inducer (IPTG) and the pathway's substrate (TCP). Finally, using computational biology methods, we investigated the continuous parameter space of initial conditions and uncertain coefficients targeting the interesting properties of the model. We believe that the obtained results give a solid basis for further optimisation of the synthetic pathway in the considered strain.
Further refinement of the model is planned for future work. In particular, we aim at producing the data describing increasing enzyme concentrations in time. That will allow us to generalise the model by representing enzyme concentrations as model variables. Although the model itself is fine-tuned with respect to the considered E. coli BL21(DE3) strain, it can provide a modelling basis in different scenarios where a non-metabolisable inducer is used to control a closed cell population environment in stationary phase. On the computational side, the employed computational framework combining the carefully selected set of formal methods and simulation-based tools can be reused in any modelling case fitting the class of non-linear ODEs including the enzyme kinetics.
Supplementary Materials: The following are available online at http://www.mdpi.com/2076-2607/7/11/553/s1, Data S1: IPTG growth curves; Data S2: Growth on 10 mM glycerol; Data S3: Toxicity on 10 mM glycerol; Figure S1: The comparative test of two mathematical models for biodegradation of TCP; Figure S2: The model of the metabolic pathway for biodegradation of TCP without reverse reactions; Figure S3: The comparative simulation of two bacterial growth functions; Figure S4: Evidence of need for proper function describing population growth reflecting metabolic burden caused by IPTG; Figure S5: The results of fitting to population growth data reflecting toxicity caused by TCP; Figure S6: The results of fitting to population growth data reflecting toxicity caused by ECH; Figure S7: The results of fitting to population growth data reflecting toxicity caused by CPD; Figure S8: The results of fitting to population growth data reflecting toxicity caused by GDL; Figure S9: The results of fitting to population growth data reflecting toxicity caused by TCP-prolonged simulation; Figure S10: The results of robustness monitoring for Property 6; Figure S11: The results of robustness monitoring for Property 6 with extended range of the death rate coefficient; Figure S12: The results of robustness monitoring for Property 7; Figure S13: The results of robustness monitoring for Property 7 with extended range of the death rate coefficient; Table S1: The quantitative results for fitting of the functions explaining the enzymes concentration reflecting IPTG; and Table S2: The results of fitting several growth functions to the same set of experimental data.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: In Tables A1 and A2, the results of the experiments published in [19] are presented. These results contain data from two experiments, one of which is shown in Figure A1. The tables show relative portions of enzymes in total soluble protein content of cell-free extract for different starting concentrations of IPTG-the inducer used for induction of host cells (E. coli deg31) under conditions described in Laboratory Methods.
The total masses of enzymes mg 10 mL were calculated from value 4.02 (measured for 0.2 mM IPTG and coloured in blue) appropriately for the rest of IPTG concentrations reflecting different portions of total enzymes (Column 2 in Tables A1 and A2). The total mass values were then used for calculation of the enzymes mass in cell-free extract (Tables A3 and A4) with respect to the particular enzyme relative portion (Tables A1 and A2, respectively). Then, using atomic mass values in Table A5 and enzyme masses in cell-free extract from two different experiments, we calculated molar concentrations (mM) for all enzymes. Finally, the median and standard deviation were calculated from these values (Table A6) and used for fitting, as described in Section 3.3.

Appendix B. The Origin of the Death Rate Coefficient
We used the mean value of death rate in percentage per generation (% p.g.) from article Robust growth of Escherichia coli [68] for E. coli strain B/r, which is close to the BL21 we used in experiments. The particular value ranges from 0.5% p.g. to 1% p.g.. However, we need a death rate per hour (h −1 ) in our model. To achieve that, we divided p.g. values with the generation (doubling) time of the population in our model. The growth rate (g) in our model is in the range [0.07, 0.26] (h −1 ) according to fitting of the model to the experimental data. Thus, the particular values of generation time (t g ) are between 2.58 h and 9.29 h for the doubling of the population. Both growth rate and doubling time were obtained as a median from at least three experiments. The final death-rate-per-hour values-limiting the range-are the maximum and minimum (emphasised numbers) from Table A7. Table A7. The death rate (γ) coefficients evaluated from different growth rates (g) using the generation time (t g ) and percentage death rate per generation (% p.g.).

Appendix C. The List of Considered Inhibition Models
In this step of modelling, we found a good model describing the bacterial population growth on a substrate (e.g., glycerol). In this case study, the model involves traditional Monod equation [28] as the growth rate function: where µ is the specific growth rate, µ max is the maximum specific growth rate for the culture, [S] is the substrate concentration, and K S is the half-saturation constant (or the affinity constant). Both µ max and K S reflect intrinsic physiological properties of a particular type of microorganism, the substrate utilisation and the temperature of growth ( [67], Chapter 3). The Monod equation can express the rate of change in the number of cells as well as the fluxion of the cell mass (both can be defined by [B]). Thus, the whole model of bacterial growth is: In general, we could use a different model as the growth rate function [52]. We only need to have some proper model and think of it as an abstract module describing population growth. In this way, we can use a different module to explain the inhibition of the population growth by a poisonous substance (e.g., a toxic water pollutant such as 1,2,3-trichloropropane, TCP). However, there is no universal inhibition model. Therefore, we conducted several simulations with various combinations of models to fit the experimental data properly. In the following list, [B], [S] and [X] stand for the bacterial population, the substrate concentration and the toxic substance concentration, respectively. The majority of the following equations consist of a growth module (e.g., Monod) and some inhibition module (separated by minus character). Equations (11)-(16) adopt a different form (in general called competitive inhibition [25]) using one module (e.g., Monod) where K i and K X are inhibition constants explaining inhibitory effects of poisonous substance at higher concentrations; k is an inhibitory rate; and n is a coefficient explaining cooperativity (as defined in Hill equation) or another biochemical property, depending on the context.