Dynamic Sequence Specific Constraint-Based Modeling of Cell-Free Protein Synthesis

Cell-free protein expression has emerged as an important approach in systems and synthetic biology, and a promising technology for personalized point of care medicine. Cell-free systems derived from crude whole cell extracts have shown remarkable utility as a protein synthesis technology. However, if cell-free platforms for on-demand biomanufacturing are to become a reality, the performance limits of these systems must be defined and optimized. Toward this goal, we modeled E. coli cell-free protein expression using a sequence specific dynamic constraint-based approach in which metabolite measurements were directly incorporated into the flux estimation problem. A cell-free metabolic network was constructed by removing growth associated reactions from the iAF1260 reconstruction of K-12 MG1655 E. coli. Sequence specific descriptions of transcription and translation processes were then added to this metabolic network to describe protein production. A linear programming problem was then solved over short time intervals to estimate metabolic fluxes through the augmented cell-free network, subject to material balances, time rate of change and metabolite measurement constraints. The approach captured the biphasic cell-free production of a model protein, chloramphenicol acetyltransferase. Flux variability analysis suggested that cell-free metabolism was potentially robust; for example, the rate of protein production could be met by flux through the glycolytic, pentose phosphate, or the Entner-Doudoroff pathways. Variation of the metabolite constraints revealed central carbon metabolites, specifically upper glycolysis, tricarboxylic acid (TCA) cycle, and pentose phosphate, to be the most effective at training a predictive model, while energy and amino acid measurements were less effective. Irrespective of the measurement set, the metabolic fluxes (for the most part) remained unidentifiable. These findings suggested dynamic constraint-based modeling could aid in the design of cell-free protein expression experiments for metabolite prediction, but the flux estimation problem remains challenging. Furthermore, while we modeled the cell-free production of only a single protein in this study, the sequence specific dynamic constraint-based modeling approach presented here could be extended to multi-protein synthetic circuits, RNA circuits or even small molecule production.


Introduction
Cell-free protein expression has become a widely used research tool in systems and synthetic biology, and a promising technology for personalized point of use biotechnology [1].Cell-free systems offer many advantages for the study, manipulation and modeling of metabolism compared to in vivo processes.Central amongst these is direct access to metabolites and the biosynthetic machinery, without the interference of a cell wall or the complications associated with cell growth.This allows us to interrogate (and potentially manipulate) the chemical microenvironment while the biosynthetic machinery is operating, potentially at a fine time resolution.Cell-free protein synthesis (CFPS) systems are arguably the most prominent examples of cell-free systems used today [2].However, CFPS in crude E. coli extracts has been used previously to explore fundamental biological questions.For example, Matthaei and Nirenberg used E. coli cell-free extracts to decipher the genetic code [3,4].Later, Spirin and coworkers continuously exchanged reactants and products in a CFPS reaction, which improved protein production.However, while these extracts could run for up to tens of hours, they could only synthesize a single product and were likely energy limited [5].More recently, energy and cofactor regeneration in CFPS has been significantly improved; for example, ATP can be regenerated using substrate level phosphorylation [6] or even oxidative phosphorylation [2].Today, cell-free systems are used in a variety of applications ranging from therapeutic protein production [7] to synthetic biology [1,8].There are also several CFPS technology platforms, such as the PANOx-SP (PEP, Amino Acids, NAD, Oxalic Acid, Spermidine, and Putrescine) and Cytomin platforms developed by Swartz and coworkers [2,9], and the transcription and translation (TX-TL) platform of Noireaux [10].Taken together, CFPS is a promising technology for protein production.However, if CFPS is to become a mainstream technology for applications such as point of care biomanufacturing, we must first understand the performance limits of these systems, and eventually optimize their yield and productivity.Toward this unmet need, we have developed a dynamic constraint-based modeling that can be used to interrogate cell-free systems.
Genome scale stoichiometric reconstructions of microbial metabolism popularized by static, constraint-based modeling techniques such as flux balance analysis (FBA) have become standard tools [11].Since the first genome scale stoichiometric model of E. coli, developed by Edwards and Palsson [12], well over 100 organisms, including industrially important prokaryotes such as E. coli [13] or B. subtilis [14], are now available [15].Stoichiometric models rely on a pseudo-steady-state assumption to reduce unidentifiable genome-scale kinetic models to an underdetermined linear algebraic system, which can be solved efficiently even for large systems using linear programming.Traditionally, stoichiometric models have also neglected explicit descriptions of metabolic regulation and control mechanisms, instead opting to describe the choice of pathways by prescribing an objective function on metabolism.Interestingly, similar to early cybernetic models, the most common metabolic objective function has been the optimization of biomass formation [16], although other metabolic objectives have also been estimated [17].Recent advances in constraint-based modeling have overcome the early shortcomings of the platform, including describing metabolic regulation and control [18] and incorporating genome sequence into the model [19,20].Dynamic constraint-based methods have also been developed in which the metabolic flux is computed over short-time intervals subject to time-varying constraints [21].These methods are common, have been used in varied applications, [22][23][24][25], and there are open source packages to support this class of calculation [26][27][28].Thus, constraint-based approaches, and their dynamic extensions, have proven useful in the discovery of metabolic engineering strategies and represent the state of the art in metabolic modeling [29,30].However, while constraint-based tools have been used extensively to analyze whole cells systems, they have not yet been widely applied to study cell-free reactions.
In this study, we constructed a dynamic constraint-based model of cell-free protein expression.This approach avoids the pseudo-steady-state assumption found in traditional constraint-based approaches, which allowed for the direct integration of dynamic metabolite measurements into the flux estimation problem, along with the accumulation or depletion of network metabolites.We adapted the sequence specific constraint-based model of Vilkhovoy and coworkers [31] into a dynamic constraint-based model of cell-free E. coli metabolism and protein production, and leveraged the kinetic model of Horvath and coworkers [32] to provide synthetic data to inform metabolite constraints.CFPS synthesis is often (but not always) conducted in small scale batch reactors.Thus, the concentration of the components of the reaction mixture, and the associated rates of the metabolic processes in the reaction are not always constant.The Vilkhovoy et al. study considered only the first hour of the CFPS reaction producing the model protein, chloramphenicol acetyltransferase (CAT).During this initial phase, the metabolic rates were approximately constant and a classical sequence specific flux balance analysis approach was sufficient to describe protein synthesis [31].However, after this initial phase, there was a significant shift in productivity following the exhaustion of glucose (which occurred at approximately 1.5 h).Horvath and coworkers developed a fully kinetic model that described the complete three hour reaction time course, including the shift in productivity following glucose exhaustion [32].While this model described the CFPS dynamics, and the decrease in productivity following the exhaustion of glucose, the identification of the model from 37 measured metabolite trajectories was difficult.Thus, there was an unmet need for a tool that could describe the dynamics of a CFPS reaction, without the burden of identifying a full kinetic model.Toward this need, we developed a dynamic constraint-based modeling approach for CFPS reactions which directly incorporated metabolite measurements (as constraints) into the flux calculation.The dynamic constraint-based model satisfied time-dependent metabolite constraints, while predicting the concentration of the CAT protein and unconstrained metabolite concentrations.Model interrogation suggested the most important metabolite constraint was glucose, as excluding glucose yielded the greatest metabolite prediction error, and the greatest uncertainty in the estimated metabolic flux.Furthermore, we evaluated metabolite constraint sets with one more and one fewer metabolites than the base case (the 37 measured metabolites) to explore the impact of measurement selection on model performance.The single addition of metabolites yielded no significant improvement in the predictive power, while the single exclusion suggested glucose to be the most important measured metabolite in the base case.Next, we selected measurement species based on the results of singular value decomposition on the stoichiometric matrix.The top 36 species from the singular value decomposition (SVD) analysis with the addition of glucose improved the predictive power and reduced flux uncertainty compared with the base case.Finally, we developed a heuristic optimization approach to estimate the optimal list of metabolite measurements.This approach significantly improved metabolite prediction compared to the base case.However, both the base measurement set and the heuristically optimized experimental design poorly characterized flux uncertainty.Taken together, this suggested that dynamic constraint-based modeling can aid in experimental design and measurement selection for metabolite prediction, but the flux estimation problem remains challenging.Furthermore, while we modeled the cell-free production of only a single protein in this study, the sequence specific dynamic constraint-based modeling approach presented here could be extended to multi-protein synthetic circuits, RNA circuits [33] or even small molecule production.

Cell-Free E. coli Metabolic Network
We constructed the cell-free stoichiometric network by removing growth associated reactions from the iAF1260 reconstruction of K-12 MG1655 E. coli [13], and removing reactions not present in the cell-free system (see Materials and Methods).We then added the transcription and translation template reactions of Allen and Palsson for the CAT protein [19].Thus, our stoichiometric network described the material and energetic demands for transcription and translation at sequence specific level.The metabolic network consisted of 264 reactions and 146 species; a schematic of the central carbon metabolism is shown in Figure 1.The network described the major carbon and energy pathways and amino acid biosynthesis and degradation pathways.Lastly, we removed genes from the network that were knocked out in the E. coli host strain used to make the cell-free extract (A19 ∆tonA ∆tnaA ∆speA ∆endA ∆sdaA ∆sdaB ∆gshA); see Jewett et al. for further details of the host strain and the cell-free extract preparation [34].Using this network, we simulated time-dependent cell-free production of the model protein CAT.We used dynamic modified flux balance analysis, a stoichiometric modeling technique that does not make the pseudo steady state assumption and allows the accumulation and depletion of metabolite species.Horvath and coworkers predicted time-dependent cell-free production of CAT using a fully kinetic model trained against an experimental dataset of 37 metabolites, including the substrate glucose, the protein product CAT, organic acids, amino acids, and energy species [32].This model was used to generate the metabolite constraints used in this study.Transcription and translation rates were subject to resource constraints encoded by the metabolic network, and transcription and translation model parameters were largely derived from literature (Table 1).In this study, we did not explicitly consider protein folding.However, the addition of chaperone or other protein maturation steps could easily be accommodated within the approach by updating the template reactions (see Palsson and coworkers [20]).The cell-free metabolic network and all model code and parameters can be downloaded under an MIT software license from the Varnerlab website [35].

Dynamic Constrained Simulation of cell-free Protein Synthesis
Cell-free synthesis of the CAT protein showed two production phases, an initial fast production phase before glucose exhaustion (at approximately 1.5 h) and a slow production phase following glucose exhaustion.The metabolite profile varied significantly between these phases; for example, pyruvate and lactate were produced during the first phase but consumed during the second.Thus, a static pseudo steady state flux balance approach was not possible for this system.However, a central advantage of cell-free systems is direct access to metabolite measurements, and the biosynthetic machinery during production.If we could directly integrate dynamic metabolite and protein concentration measurements into the flux estimation problem, we could potentially get a better estimate of the flux distribution.Toward this question, we developed a dynamic modeling approach in which metabolic fluxes were estimated so that all metabolites were non-negative and the simulated metabolites were constrained to lie within a bounded range of the measured value.Using this technique, we simulated the cell-free production of CAT subject to dynamic metabolite measurements.
We explored the influence of uncertainty in the transcription (TX) and translation parameters (TL) by sampling different values for the abundance and elongation rates of RNA polymerases and ribosomes, the polysome amplification constant, the mRNA degradation rate and other kinetic parameters appearing in the transcription and translation bounds.The base values for the TX/TL parameters are given in Table 1, and the uniform sampling procedure is described in the Materials and Methods.Central carbon metabolites (Figure 2), amino acids (Figure 3), and energy species (Figure 4) in the synthetic measurement set were captured, within experimental error, by an ensemble of dynamic constraint-based simulations.The synthetic metabolite constraints (blue regions) shown in each of the simulation figures was derived from the kinetic model of Horvath et al. [32], which was trained on experimental measurement of the 37 metabolites shown in Figures 2-4.Thus, the metabolite constraints used in this study were calculated based upon the kinetic model, which shows high fidelity with the experimental measurements.The flux estimation problem converged in greater than 99% of the simulation time intervals, given these metabolite constraints.This suggested there were not gross measurement errors in the measurement constraints, as the stoichiometric constraints were satisfied.Moreover, it suggested the error introduced by the time discretization scheme did not lead to inconsistent metabolite estimates.The ensemble of models captured the time evolution of protein biosynthesis, and the consumption and production of organic acid, amino acid and energy species.Arginine and glutamate were excluded from the constraint set, but were still largely captured by the ensemble of dynamic constraint-based models, although with wide variance than the synthetic measurement set.During the first hour, glucose was consumed as the primary carbon source for ATP, amino acids, and protein synthesis.After glucose was depleted, lactate and pyruvate were consumed as alternate substrates for energy production and CAT synthesis.Taken together, we captured the 37 metabolite measurements in the base synthetic data set, and captured the biphasic behavior of CAT production, although we significantly over-predicted the translation rate for some elements of the ensemble.This suggested that there was excess capacity in the metabolic network, which could be used to enhance protein production.Central carbon metabolism, including glucose (substrate), chloramphenicol acetyltransferase (CAT) (product), and intermediates, as well as total concentration of energy species (energy total).The energy total denotes the summation of all energy species in the model (all bases and all phosphate states).The 95% confidence interval for the simulation conducted over the ensemble of transcription/translation parameter sets is shown in the orange shaded region, while the 95% confidence interval for the synthetic constraint data is shown in the blue shaded region.The synthetic data constraints were generated from the kinetic model of Horvath et al., which was trained using experimental measurements of the system simulated in this study [32].
We quantified the uncertainty in the estimated metabolic flux distribution, given constrained CAT production using flux variability analysis (FVA) for the base synthetic data across the three hours of measurement (Tables 2 and 3).The analysis was divided into two phases: phase 1 where glucose was consumed as the carbon source, and phase 2 when glucose was depleted and lactate and pyruvate were utilized.The reactions associated with protein synthesis (translation initiation, translation, tRNA charging, mRNA degradation) were unsurprisingly the most constrained, as CAT production was forced to remain the same.Transcription was not varied in this analysis.On the other hand, glycolytic, pentose phosphate, and Entner-Doudoroff reactions were not highly constrained, indicating the robustness of substrate utilization.However, one exception to this was the net reaction through zwf reaction, which was tightly constrained, suggesting that glycolysis alone cannot support protein production.Interestingly, although the two phases consumed different carbon sources, the flux variability remained similar.Taken together, these results suggested that there was significant flexibility in the ability of the metabolic network to meet the carbon and energy demands of protein synthesis.Next, we explored alternative measurement sets to constrain the simulation of cell-free protein synthesis.The 95% confidence interval for the simulation conducted over the ensemble of transcription/translation parameter sets is shown in the orange shaded region, while the synthetic constraint data is shown in the blue shaded region.Arginine and glutamate were excluded from the constraint set.The synthetic data constraints were generated from the kinetic model of Horvath et al., which was trained using experimental measurements of the system simulated in this study [32].Table 2. Flux uncertainty calculated using flux variability analysis for the base synthetic dataset during the first production phase (0 h to 1.5 h), normalized to the glucose consumption rate.3. Flux uncertainty calculated using flux variability analysis for the base synthetic dataset during the second production phase (1 h to 3 h), normalized to the glucose consumption rate.

Enzyme Reaction Uncertainty
Step    The 95% confidence interval for the simulation conducted over the ensemble of transcription/translation parameter sets is shown in the orange shaded region, while the synthetic constraint data is shown in the blue shaded region.The synthetic data constraints were generated from the kinetic model of Horvath et al., which was trained using experimental measurements of the system simulated in this study [32].

Alternative Measurement Sets
The base synthetic data set, consisting of 36 metabolite time series and the protein product CAT, was the measurement set used to train the kinetic model of Horvath et al. [32].Thus, the confidence intervals on the synthetic data used in this study as constraints on the flux estimation problem were informed by experimental measurements of glucose, organic and amino acids, energy species and the protein product CAT.However, we have no a priori reason to suppose that this experimental design was optimal.Toward this question, we performed simulations and flux variability analysis (FVA) for alternative synthetic data sets to understand the importance of measurement selection when characterizing CFPS (Figures 5 and 6).In all cases, we assumed the same sampling frequency as the base synthetic dataset, but we varied which species were measured.First, we removed each of the 37 metabolites from the base set, one at a time, to create 37 measurement exclusion sets, consisting of 36 metabolites each (Figure 5, light gray dots).For each set, the state the dynamic model was used to calculate a value of error against the synthetic data, and FVA was used to calculate a value of flux uncertainty.Most of the exclusion sets clustered around the base case, with error values between 75% and 110%, and flux uncertainties between 93% and 103%, of the base case.The exception to this was the glucose exclusion set, which showed 89% higher error and 7% greater flux uncertainty.Within the primary cluster, a slight pattern emerged: the sets in which an organic acid were removed tended to result in increased error, while the removal of an amino acid tended to reduce error; however, this was not true across all metabolites.We also performed the analysis on several inclusion sets to determine which additional metabolites could improve predictive power (Figure 5, black dots).In particular, we added unmeasured central carbon metabolites to the base case, which resulted in 23 inclusion sets, consisting of 38 metabolites each.As with the exclusion sets, most of the inclusion sets clustered around the base case, with error values between 72% and 103%, and flux uncertainties between 94% and 102%, of the base case.Considering all exclusion and inclusion sets, there was generally no correlation between the metabolite prediction error and flux uncertainty.Taken together, these suggested central carbon metabolites, especially glucose, were important to characterize the network, but performing single additional measurements was not enough to significantly increase predictive power.Next, we explored whether measurement selection could be based upon the structural features of a metabolic network.Toward this question, we used singular value decomposition (SVD) of the stoichiometric matrix to suggest which metabolites should be measured.
Singular value decomposition (SVD) measurement selection outperformed the base case, with a prediction error improvement of 11% and similar flux variability (Figure 5, open square).SVD was used to decompose the stoichiometric matrix into 105 modes.The top 36 metabolites that had the greatest weighted sum across the modes that accounted for 95% of the network were estimated.Since our exclusion analysis identified glucose as the single most important metabolite, we added it to the top metabolites as determined by SVD to obtain a 37-metabolite constraint set, consisting of: GTP, GDP, GMP, ATP, ADP, AMP, UTP, UMP, CTP, CMP, GLN, GLU, ASP, LYS, LEU, HIS, THR, PHE, ALA, VAL, TYR, GLY, SER, H, ASN, ILE, MET, AKG, PYR, ARG, CYS, NH3, FUM, SUCC, TRP, ACCOA, and glucose.The SVD measurement set suggested that energy, and amino acid species carried the most information compared to central carbon species, which made up a relatively smaller fraction of the list.Surprisingly, the measurement set selected by SVD was approximately 80% similar to the original synthetic data generated by hand.However, the 20% difference was enough to improve the prediction error by approximately 11%.Taken together, measurements selected by SVD decomposition of the stoichiometric matrix improved the prediction of metabolite abundance, but SVD-based measurement selection did not improve flux variability.Next, we used heuristic optimization to systematically investigate the effect of changing the dimension and identify the measurement constraints (Figure 6).In particular, we minimized the error and flux variability of model predictions by varying the metabolites that appeared in the synthetic constraint set.We used a binary simulated annealing algorithm to switch metabolite membership in the constraint set on or off, and thus generated an ensemble of >200 measurement constraint sets (Figure 6A).While there was no strict error threshold, the simulated annealing algorithm was less likely to accept high-error sets into the ensemble; thus, the error of most sets in the ensemble was less than that of the base case.Specifically, the error varied from just over double to less than one ten-thousandth of the base case.Flux uncertainty was also a component of the objective function, but was only improved by 7%, suggesting this performance metric was tightly constrained; network flux values were not well characterized, even with comprehensive training datasets.As expected, there was an inverse relationship between the number of metabolite constraints and the prediction error (Figure 6B).However, the slope of that trend was striking; error was improved by three to four orders of magnitude, simply by increasing the number of constraints by 11 or fewer.Furthermore, the base synthetic measurement set was outperformed by the majority of the ensemble; often, the simulated annealing approach achieved the same error with fewer constraints, or much lower error with the same, or even fewer, constraints.This suggests that, while comprehensive, the original synthetic dataset was not optimal in terms of predictive power per measurement.However, the base case was one of the best in terms of reducing flux uncertainty.

Normalized Error Normalized Flux Uncertainty
Lastly, we investigated which metabolites were most effective at improving predictive power by considering how often it appeared in the ensemble (Table 4).Glucose unsurprisingly appeared most often (tied with G6P), but interestingly was not in every constraint set.Those that did not contain glucose had some of the highest errors, but also some of the smallest constraint set sizes (Figure 6B, black dots).The most frequent metabolites from the heuristic method were largely from glycolysis, pentose phosphate, and the TCA cycle (compared to the SVD analysis which gave greater consideration to energetic and amino acid species).To further understand the species selection, we calculated the frequency of appearance in the 57 best sets, those with error of least three orders of magnitude lower than the base case (Table 5).Nineteen metabolites appeared in all of these sets, and all but Alanine were central carbon metabolites (defined here as glycolysis, pentose phosphate, TCA).Taken together, measurement selection made a significant difference in capturing dynamic metabolite abundance in cell-free protein synthesis.Although the error decreased with increasing measurement number overall, the specific combination of metabolites was arguably even more important.Metabolic fluxes, however, remained unknown despite the large number of measurements taken.

Normalized Error Number of Metabolite Constraints
Normalized Error Normalized Flux Uncertainty A B Figure 6.Flux uncertainty and metabolite prediction error for the simulated annealing experimental design approach.(A) normalized flux uncertainty versus normalized metabolite prediction error; (B) number of metabolite constraints versus normalized metabolite prediction error.Error was computed for the synthetic experimental designs normalized to the base synthetic dataset (white star).Sets that include glucose are show as gray circles, while those that do not are represented with black circles.

Discussion
In this study, we presented a dynamic constraint-based model of cell-free protein expression.This approach avoids the pseudo-steady-state assumption found in traditional constraint-based approaches, which allowed for the direct integration of metabolite measurements into the flux estimation problem, and the the accumulation or depletion of network metabolites.The approach used the E. coli cell-free protein synthesis metabolic network from Vilkhovoy and coworkers [31], and the simulated metabolite trajectories from the kinetic model of Horvath et al. [32] as constraints on the CFPS flux calculation.The dynamic constraint-based model satisfied time-dependent metabolite measurement constraints, predicted unconstrained metabolite concentrations as well as the concentration of a model protein, chloramphenicol acetyltransferase (CAT).Model interrogation suggested the most important metabolite measurement within the dataset to be glucose, as excluding the glucose yielded the greatest metabolite prediction error, and the greatest uncertainty in the estimated metabolic flux.Furthermore, we evaluated metabolite constraint sets with one more and one fewer metabolites than the base case (37 metabolites) to explore the impact of measurement selection on model performance.The single addition of metabolites yielded no significant improvement in the predictive power, while the single exclusion suggested glucose to be the most important measured metabolite in the base case.Next, we selected measurement species based on the results of singular value decomposition on the stoichiometric matrix.The top 36 species from the SVD analysis with the addition of glucose improved the predictive power and reduced flux uncertainty compared with the base case.Finally, we described a heuristic optimization approach to estimate the optimal list of metabolite measurements.Measurement sets determined by heuristic optimization vastly outperformed the accuracy of the base synthetic dataset; model precision, meanwhile, was virtually unchanged despite comprehensive measurement sets.Taken together, model interrogation showed that, even with a comprehensive dataset, there still exists a great amount of uncertainty associated with metabolic fluxes.This highlights the need for fluxomic data to fully understand biological networks.
Despite synthetic datasets consisting of greater than 30 metabolite time series, estimates of metabolic flux were largely uncertain.Flux variability analysis suggested that the metabolite constraints could be met with a wide range of different flux distributions.For instance, an open question in cell-free systems is the balance between glycolytic versus pentose phosphate pathway flux.In previous studies of E. coli cell-free protein synthesis, the kinetic model of Horvath and coworkers suggested that glucose was consumed primarily by glycolytic reactions, with minimal flux into the pentose phosphate pathway.However, Vilkhovoy et al. estimated, using sequence specific flux balance analysis with the same experimental dataset, that the CAT production was unaffected by the choice of pentose phosphate pathway versus glycolysis; deletion of either pathway did not change protein productivity.To answer this discrepancy, model analysis showed, during the first phase when glucose was being consumed, glycolytic and pentose phosphate fluxes (pgi and zwf, respectively) exhibited large uncertainty, as either could be utilized to satisfy CAT production.The measurement selection analysis was conducted by excluding or including a metabolite from the constraint set.The exclusion sets were dominated by the removal of glucose, and to a lesser extent the organic acids, suggesting that measurements of central carbon metabolism intermediates were more important than energetic and amino acid measurements.However, the inclusion sets showed no significant effect on error and flux uncertainty.There was generally no correlation between the error and flux uncertainty of a model constrained to a particular metabolite set, except with respect to the outlier glucose.Model calculations showed that, even with a comprehensive data set of 37 metabolite measurements, there was significant flux uncertainty.This suggested that there were many flux combinations that could give rise to the same set of time course measurements.This phenomenon was further supported by analyzing the ensemble of constraint sets determined by heuristic optimization.Although the optimization algorithm reduced the objective function by four degrees of magnitude, the flux variability remained stagnant in comparison.An ensemble of measurement sets ranging from 22 to 48 metabolite constraints was only able to reduce flux uncertainty by 7% from the base synthetic data set.The dynamic constraint-based model showed high flux variability in important branch points, including the glucose-6-phosphate split between glycolysis and pentose phosphate, the 6PGC split into pentose phosphate and Entner-Doudoroff, and the pyruvate split into TCA cycle versus lactate production.This may be why the high overall flux variability was robust to the varying of metabolite constraints.Using three different sampling approaches (single additions/exclusions, singular value decomposition, simulated annealing) coupled with the dynamic constraint-based model, we estimated key metabolites that could be prioritized in measurement selection, such as glucose.Although measuring central carbon metabolites and amino acids is the intuition of most researchers, model interrogation was able to provide the importance of certain species over others; for instance, measuring G6P, G3P, and F6P would be more fruitful than measuring PEP.Interestingly, many of the most valuable measurements were involved in upper glycolysis and pentose phosphate, such as glucose, G6P, and 6PGL.This may be because upstream metabolites have an effect on more of the network; any error or uncertainty in these metabolites will cascade down the rest of the network and magnify throughout.Taken together, the dynamic constraint-based model quantitatively affirmed the robustness of metabolism, and illustrated the complexity of inferring flux information from metabolite concentrations.Ultimately, to determine the metabolic flux distribution occurring in a cell-free system, we need to add additional constraints to the flux estimation calculation.This study suggested that metabolite measurements alone were not sufficient.However, these are not the only experimentally realizable types of constraints.For example, thermodynamic feasibility constraints may result in a better depiction of the flux distribution [37,38], and 13 C labeling constraints in cell-free systems could provide significant insight.However, while 13 C labeling techniques are well established for in vivo processes [39], application of these techniques to cell-free systems remains an active area of research.
The use of cell-free systems as a personalized point of care biomanufacturing tools, as platforms for vaccine development, or as the basis for portable pathogen detection are promising research directions [1,[40][41][42][43]. cell-free systems have significant advantages in these application areas compared to in vivo systems.For example, as there is no longer a cell wall, we can experimentally observe the system and intervene if need be.Moreover, cell-free synthetic circuitry is highly portable.For example, it can be dried onto paper, easily transported, and potentially stored indefinitely [44].Thus, after development and testing of the circuitry, and manufacturing of the devices, there is no need for large, bulky and expensive equipment usually associated with in vivo bioprocesses.However, to move beyond proof of concept and into industrial or medical practice will require extensive optimization.Mathematical modeling is an important tool in this regard.In this study, we explored the relationship between measurement selection and the ability to estimate metabolic flux in a cell-free system.One of the central advantages of cell-free systems is the ability to measure the concentration of metabolic intermediates.However, ultimately, we need to transform these measurements into testable hypothesis about the performance of a cell-free system.The connection between flux estimation and optimal measurement selection has been well studied for in vivo systems; for example, see the classic work of Savinell and Palsson [45,46].Moreover, the robust quantification of metabolic flux in in vivo systems is also well developed, with both mature experimental and computational tools available (see [39,47,48]).However, quantification of metabolic flux in cell-free systems from metabolite measurements remains an open area of research.Certainly, techniques developed for the identification and quantification of in vivo systems could be applied to cell-free.For example, Lucks and coworkers used the D-optimal experimental design approach of Doyle and coworkers [49] to parameterize a kinetic model of RNA-based cell-free circuits [33].However, Lucks and coworkers did not consider the resources required for the RNA circuits to function.Quantification of metabolic flux, and the associated resource production and consumption in cell-free applications, is not common.Instead, the synthetic biology community has focused on designing circuits and circuit components with specific behaviors e.g., [50][51][52], assuming the resources required to express these components will be available.At proof of concept scales, this is a reasonable assumption.However, as we move toward industrial practice, careful attention must be paid to resource generation and consumption-for example, optimizing the expressed proteome for cell-free extract production, or balancing cofactor utilization during the cell-free reaction [53,54].Resource management has a direct impact on the performance and industrial viability of cell-free applications-for example, potentially limiting the rate of production and yield of circuit components, the size and complexity of possible synthetic circuits, the operational lifetime of synthetic devices, or the titer and rate of production of protein and small molecule products.Thus, as we move beyond technology development to realistic industrial or medical applications, the performance of synthetic devices will become increasingly important.Toward this need, we expect mathematical modeling will play an important role.

Formulation and Solution of the Model Equations
We modeled the time evolution of the ith metabolite concentration (x i ), the scaled activity of network enzymes ( i ), transcription processes generating the mRNA m and translation processes generating the protein P in an E. coli cell-free metabolic network as a system of ordinary differential equations: The quantity R denotes the number of metabolic reactions, M denotes the number of metabolites and N denotes the number of metabolic enzymes in the model.The quantity r j (x, , k) denotes the rate of reaction j.Typically, reaction j is a nonlinear function of metabolite and enzyme abundance, as well as unknown kinetic parameters k (K × 1).The quantity σ ij denotes the stoichiometric coefficient for species i in reaction j.If σ ij > 0, metabolite i is produced by reaction j.Conversely, if σ ij < 0, metabolite i is consumed by reaction j, while σ ij = 0 indicates metabolite i is not connected with reaction j.Lastly, λ i denotes the scaled enzyme activity decay constant.The system material balances were subject to the initial conditions x (t o ) = x o and (t o ) = 1 (initially, we have 100% cell-free enzyme activity).
The cell-free model equations were solved using a dynamic constraint-based approach in which the rates of the metabolic fluxes, transcription and translation processes were estimated by solving an optimization subproblem from t to t + ∆t.In particular, the biochemical fluxes r 1 , r 2 , . . ., r R which appear in the balance equations were calculated from t to t + ∆t by solving a constrained optimization subproblem with (potentially nonlinear) objective O (x 1 , x 2 , . . . ,x M ): subject to species constraints and flux bounds: 0 ≤ r j ≤ U j (x 1 , x 2 , . . . ,x M , κ) j = 1, 2, . . ., R.
In this study, we maximized the rate of translation r X unless otherwise specified.We discretized the derivative term for each species using a constant width h forward different approximation (however, this was done for convenience and more sophisticated techniques could have been used).The reaction bounds U j (x 1 , x 2 , . . . ,x M , κ) are potentially non-linear functions of the system state, and can be updated during each time step.Here, we modeled the upper bound for flux j as Vmax j (k), where Vmax denotes a characteristic maximum reaction velocity, and j (k) denotes the scaled enzyme activity catalyzing reaction j at time step k.The characteristic maximum reaction velocity was set to 600 mM/h (which corresponds to an average k cat 1000 s −1 and and enzyme concentration of approximately 0.2 µM) unless otherwise specified.Additional species constraints can be added to directly incorporate metabolomic, proteomic or transcriptomic measurements into the flux calculation.In this study, we incorporated metabolite measurement constraints of the form: where χ L m,k+1 and χ U m,k+1 denote the lower and upper measurement bound for metabolite m at time step k + 1, where Ξ metabolites were measured over the time course of the cell-free reaction.Lastly, we imposed a user-configurable bound B i on the maximum rate of change for metabolite i: and non-negativity constraints x i ≥ 0 for all metabolites and all time steps.The bounds on the transcription rate (L T = r T = U T ) were modeled as: where G P denotes the concentration of the gene encoding the protein of interest, and K T denotes a transcription saturation coefficient.The maximum transcription rate V max T was formulated as: where R T denotes the RNA polymerase concentration (nM), vT denotes the RNA polymerase elongation rate (nt/h), l G denotes the gene length (nt).The term u (κ) (dimensionless, 0 ≤ u (κ) ≤ 1) is an effective model of promoter activity, where κ denotes promoter specific parameters.The general form for the promoter models was taken from Moon et al. [55], which was based on earlier studies from Bintu and coworkers [56], and similar to the genetically structured modeling approach of Lee and Bailey [57].In this study, we considered only the T7 promoter model: where K T7 denotes a T7 RNA polymerase binding constant.The values for all promoter parameters are given in Table 1.
The translation rate (r X ) was bounded by: where m denotes the mRNA abundance and K X denotes a translation saturation constant.The maximum translation rate V max X was formulated as: The term K P denotes the polysome amplification constant, vX denotes the ribosome elongation rate (amino acids per hour), and l P denotes the number of amino acids in the protein of interest.The mRNA abundance m was estimated as: where λ denotes the mRNA degradation rate constant (h −1 ).All translation parameters are given in Table 1.Metabolic fluxes were estimated at each time step using the GNU Linear Programming Kit (GLPK) v4.55 [58].The objective of the optimization subproblem was to maximize the translation rate, subject to the stoichiometric and metabolite constraints.The model code, parameters and initial conditions used in this study are available under an MIT software license at the Varnerlab website [35].The model code is written in the Julia programming language [59].Default transcription and translation parameters are stored in TXTLDictionary.jl,while specific simulations are described in the Solve_*.jlfiles.Lastly, the figures for this study were produced using the Plot_*.jlscripts.

Sampling of Transcription and Translation Parameters
The influence of the uncertainty in the transcription (TX) and translation (TL) parameters was estimated by sampling the expected physiological ranges for these parameters as determined from literature.We generated uniform random samples between an upper (u) and lower (l) parameter bound of the form: where U (0, 1) denotes a uniform random number between 0 and 1.The T7 RNA polymerase concentration was sampled between 800 and 1200 nM, ribosome levels between 1.5 and 3.0 µM, polysome amplification between 5 and 15, the RNA polymerase elongation rate between 20 and 30 nt/s, and the ribosome elongation rate between 1.0 and 3.0 aa/s [10,60]; see TXTL_ensemble.jl for a complete list of parameter ranges.

Generation and Evaluation of Alternative Measurement Sets
The measurement sets consisted of the base (one set of 37 metabolites), inclusion sets (23 sets of 38 metabolites each), exclusion sets (37 sets of 36 metabolites each), SVD-guided (one set of 37 metabolites), and simulated annealing samples (238 sets of varying length).In all cases, we assumed the same sampling frequency as the base synthetic dataset, but we varied which species were measured.The exclusion or inclusion measurement sets were constructed by removing or adding a metabolite to the base set, while the SVD-guided measurement set was constructed from high importance metabolites; the top 36 metabolites (plus glucose) that had the greatest singular value weighted sum across the SVD-modes, accounting for 95% of the network structure, were designated the SVD measurement set.Lastly, we used simulated annealing to generate potentially optimal measurements sets, where the objective was to minimize the product of the prediction error, and flux uncertainty.The prediction error, E , was computed by comparing the simulated versus the measured value of a metabolite, for a M core set of metabolites.On the other hand, the flux variability was computed using flux variability analysis (FVA) [61], subject to constraints on the CAT production rate, and the selected metabolite trajectories.In particular, the metabolite prediction error was calculated from the time-dependent state array: where x i (t) denotes the simulated value of metabolite i at time t, y U i (t) denotes the upper bound of the 95% confidence interval on the synthetic data for metabolite i at time t, y L i (t) denotes the lower bound of the 95% confidence interval on the synthetic data for metabolite i at time t, and M core denotes the subset of metabolites in the core metabolism.For this calculation, the entire time course was considered (t i = 0 h, T = 3 h).The flux uncertainty was calculated from the maximal and minimal flux arrays: where r max j (t) denotes the maximum value of flux j, while r min j (t) denotes the value of flux j at time t, calculated using flux variability analysis.The quantity R core denotes the subset of reactions that constitute the core metabolism.For the flux uncertainty calculations, either the entire reaction time course was considered (t i = 0 h, T = 3 h), or the uncertainty was calculated separately for each phase (phase 1: t i = 0 h, T = 1 h; phase 2: t i = 1 h, T = 3 h).
The simulated annealing algorithm began by evaluating the error and flux uncertainty of the base case and multiplying these to obtain a cost function: Then, each metabolite that was considered measurable was added to or removed from the constraint set with a certain probability p switch : 0, 1) < p switch , θ i , U (0, 1) > p switch , i = 1, 2, . . ., M measurable , (18) where θ i ∈ {0, 1} denotes a binary parameter encoding whether or not metabolite i is in the constraint set, U (0, 1) denotes a uniform random number taken from a distribution between 0 and 1, and M measurable denotes the set of metabolites deemed to be measurable.For each newly generated constraint set, we re-solved the dFBA and FVA problems, and re-calculated the cost function.All sets with a lower cost were accepted into the ensemble.Sets with a higher cost were also accepted into the ensemble, if they satisfied the acceptance constraint: R uniform 0,1 where R uniform 0,1 denotes a random number taken from a uniform distribution between 0 and 1, cost denotes the cost of the current parameter set, cost new denotes the cost of the new parameter set, and α denotes an adjustable parameter to control the tolerance to high-error sets.A total of 238 samples were accepted into the ensemble, of which there were 219 unique sets.Both M core and R core and user-configurable are defined in the model code repository available from the Varnerlab website [35].

Conclusions
In summary, we used a dynamic constraint-based modeling approach to simulate cell-free metabolism, and to study how measurement selection impacts model performance.We extended sequence specific flux balance analysis, by removing the pseudo steady state assumption, and adding synthetic metabolite measurement constraints to the flux calculation.Using this method, we simulated the cell-free synthesis of a model protein, chloramphenicol acetyltransferase, we identified the most important measured species in the cell-free system, and additional species that yielded the lowest metabolite prediction error and flux uncertainty.Only synthetic metabolite measurements were used in this study; however, this work built a foundation to rationally design experimental measurement protocols that could be implemented with a variety of analytical techniques.Taken together, these findings represent a novel tool for dynamic cell-free simulations, measurement selection and pathway analysis, not only for E. coli, but potentially for align variety of metabolic networks, whether in vivo or cell-free.However, while this first study was promising, there were several issues to consider in future work.First, while we described transcription and translation at a sequence specific level, we have not considered the complexities of protein folding, or post-translational modifications such as protein glycosylation.A more detailed description of transcription and translation reactions, including the role of chaperones in protein folding, has been used in in-vivo genome scale ME models; e.g., see O'Brien et al. [20].These template reactions could easily be adapted to a cell-free system, thereby providing a potentially higher fidelity description protein synthesis and folding.Next, the inclusion of post-translational modifications such as protein glycosylation in the next generation of models will be important to describe the cell-free synthesis of therapeutic proteins.DeLisa and coworkers recently showed that glycoproteins can be synthesized in a cell-free system, using extract generated from modified E. coli cells capable of asparagine-linked protein glycosylation [62].Simulation of the generation and attachment of glycans to protein targets could be an important step to optimizing cell-free glycoprotein production.Lastly, while we modeled the cell-free production of a only single protein in this study, sequence specific dynamic constraint models could be developed for multi-protein synthetic circuits, RNA circuits or even small molecule production.Thus, this approach offers a unique tool to model and potentially optimize a wide variety of application areas in synthetic biology.

Figure 1 .
Figure 1.Schematic of the core portion of the cell-free E. coli metabolic network.The network consisted of 264 reactions and 146 metabolites.Metabolites of glycolysis, pentose phosphate pathway, Entner-Doudoroff pathway, and the tricarboxylic acid (TCA) cycle are shown.Metabolites of oxidative phosphorylation, amino acid biosynthesis and degradation, transcription/translation, chorismate metabolism, and energy metabolism are not shown.

Figure 2 .
Figure2.Simulated metabolite concentration versus synthetic data as a function of time.Central carbon metabolism, including glucose (substrate), chloramphenicol acetyltransferase (CAT) (product), and intermediates, as well as total concentration of energy species (energy total).The energy total denotes the summation of all energy species in the model (all bases and all phosphate states).The 95% confidence interval for the simulation conducted over the ensemble of transcription/translation parameter sets is shown in the orange shaded region, while the 95% confidence interval for the synthetic constraint data is shown in the blue shaded region.The synthetic data constraints were generated from the kinetic model of Horvath et al., which was trained using experimental measurements of the system simulated in this study[32].

Figure 3 .
Figure3.Simulation of amino acid concentration versus synthetic data as a function of time.The 95% confidence interval for the simulation conducted over the ensemble of transcription/translation parameter sets is shown in the orange shaded region, while the synthetic constraint data is shown in the blue shaded region.Arginine and glutamate were excluded from the constraint set.The synthetic data constraints were generated from the kinetic model of Horvath et al., which was trained using experimental measurements of the system simulated in this study[32].

Figure 4 .
Figure 4. Simulation of energy species and energy totals by base versus synthetic data as a function of time.The 95% confidence interval for the simulation conducted over the ensemble of transcription/translation parameter sets is shown in the orange shaded region, while the synthetic constraint data is shown in the blue shaded region.The synthetic data constraints were generated from the kinetic model of Horvath et al., which was trained using experimental measurements of the system simulated in this study[32].

Figure 5 .
Figure 5. Flux uncertainty versus metabolite prediction error against synthetic data, normalized to the base case (white star), for exclusion (gray) and inclusion (black) metabolite constraint sets.The performance of the singular value decomposition (SVD)-determined metabolite constraint set is shown by the white square.

Table 1 .
Reference values for transcription, translation, and mRNA degradation from literature.Transcription rate calculated from elongation rate, mRNA length, and promoter activity level.Translation rate calculated from elongation rate, protein length, and polysome amplification constant.The mRNA degradation rate calculated from a characteristic mRNA half-life.CAT: chloramphenicol acetyltransferase.

Table 4 .
Metabolites by frequency of appearance in the simulated annealing constraint sets.

Table 5 .
Metabolites by frequency of appearance in the 57 best simulated annealing constraint sets, those with error at least three orders of magnitude lower than the base synthetic dataset.