Towards Autonomous Process Control—Digital Twin for CHO Cell-Based Antibody Manufacturing Using a Dynamic Metabolic Model

: The development of new biologics is becoming more challenging due to global competition and increased requirements for process understanding and assured quality in regulatory approval. As a result, there is a need for predictive, mechanistic process models. These reduce the resources and time required in process development, generating understanding, expanding the possible operating space, and providing the basis for a digital twin for automated process control. Monoclonal antibodies are an important representative of industrially produced biologics that can be used for a wide range of applications. In this work, the validation of a mechanistic process model with respect to sensitivity, as well as accuracy and precision, is presented. For the investigated process conditions, the concentration of glycine, phenylalanine, tyrosine, and glutamine have been identiﬁed as signiﬁcant inﬂuencing factors for product formation via statistical evaluation. Cell growth is, under the investigated process conditions, signiﬁcantly dependent on the concentration of glucose within the investigated design space. Other signiﬁcant amino acids were identiﬁed. A Monte Carlo simulation was used to simulate the cultivation run with an optimized medium resulting from the sensitivity analysis. The precision of the model was shown to have a 95% conﬁdence interval. The model shown here includes the implementation of cell death in addition to models described in the literature.


Introduction
In biopharmaceutical production, the time-to-market for new, innovative products is growing shorter and shorter [1,2]. In this context, process development in the upstream, which typically involves optimization of the medium, feeding strategy, and various process parameters such as pH, power input, etc., is very costly, as a large number of lengthy cultivation experiments, often based on statistical experimental design, have to be performed [3]. Although the application of statistical experimental designs reduces the necessary number of experiments, in contrast to classical one-factor-at-a-time experiments, it is still very resource and time intensive due to the high number of process parameters and possible media compositions [4,5]. In addition to high-throughput screening using miniaturized and parallelized cultivation [6][7][8], a mechanistic, predictive process model is available as an alternative process optimization method [9][10][11]. Compared with the experimental approach, this offers the advantage that many parameter combinations can be screened within a short time [12,13]. For later automation, the process model also serves as the basis of the digital twin [14]. In the course of validating the process model, mechanistic understanding of the process can also be generated, which reduces hurdles in approval and expands the possible operating range [15]. A typical example of biopharmaceuticals for the regulated market is monoclonal antibodies (mAb). These are usually produced using Processes 2022, 10, 316 2 of 16 animal cells, mostly Chinese hamster ovary (CHO) cells [16,17]. This system is thus very suitable for the development and validation of a process model.
Simple Monod models with static yield coefficients are often used [10,18]. Although these are easy to determine from an already performed cultivation, there is no causality between the metabolism of the cell and biomass, as well as product formation. However, these processes are usually the actual subject of media optimization. The trend towards automated processes requires a validated, mechanistic process model that represents the processes in the cell in sufficient detail and causally [19,20]. In this work, the application of a dynamic metabolic model to mAb-producing CHO-DG44 cells is investigated and a procedure for model validation is presented. Models of this kind are published in the literature; recently, a metabolic model has been presented by Robitaille for CHO cells [21]. The model used in this work additionally includes the implementation of cell death and is, to our knowledge, the first adaption to CHO-DG44 for mAb production in fed-batch cultivation.

QbD-Based Process Development
The quality-by-design (QbD) concept has become an established pillar in modern process development of biologics [22][23][24]. In contrast to classical quality-by-testing, this approach, based on the use of process understanding and quantitatively defined normal operating range, enables the process to be readjusted for optimization, even after approval [25].
The basic principles of QbD-based process development are laid down in the ICH guidelines Q8-Q12 [26][27][28][29][30]. Figure 1 graphically illustrates the most important steps and development phases. Once the most important product properties have been determined and the quality target product profile (QTPP) thus defined, it is possible to derive related critical product properties [31,32]. If the focus is on process development, traditional process parameters such as productivity are often chosen as critical quality attributes (CQA), in addition to toxicity, bioavailability, etc. [33]. This enables an initial risk assessment to be carried out [34]. If predictive, mechanistic models are to be developed alongside or instead of resource-intensive experiments and subsequently used for optimization and control, the risk assessment of the model must be carried out according to the same principles as those used in an experimental process development [35]. Part of this procedure is the collection and quantitative evaluation of risk severity and risk probability, which together result in a risk rank that forms the decision-making basis for the design of multi-and univariate investigations [36]. be carried out [34]. If predictive, mechanistic models are to be developed alongside or instead of resource-intensive experiments and subsequently used for optimization and control, the risk assessment of the model must be carried out according to the same principles as those used in an experimental process development [35]. Part of this procedure is the collection and quantitative evaluation of risk severity and risk probability, which together result in a risk rank that forms the decision-making basis for the design of multiand univariate investigations [36].   [37]. In a first step, the QTPPs are defined. Subsequently, the CQAs are defined and a risk assessment of the influence of various process parameters on the CQAs is carried out. The risk assessment results in a design space for the process parameters to be investigated, which can be examined either via experiments or by means of a rigorous process model. Based on the results, a control strategy is defined, which can be continuously compared online via PAT with the actual state of the system. Strict implementation of this strategy allows for continuous process optimization.
In analogy to the experimental development, design-of-experiments (DoE) can also be used in the model validation. This allows the evaluation of the criticality of the investigated parameters, and additionally the definition of a design space [38].
The following steps in QbD-based process development deal with the feasibility of a control strategy [39]. The control strategy lists the critical process parameters and the CQAs depending on them, which have to be measured continuously in order to achieve QTTP assurance. Key enabling technologies for continuous monitoring are grouped under the umbrella term process-analytical-technology (PAT). Real time release testing (RTRT) could be realized based on full QbD-based process development and validated PAT, eliminating bottlenecks in the production of critical biopharmaceuticals [40]. The continuous monitoring of process variables by PAT as well as the achieved and documented process understanding allow the process to be continuously improved based on new process data [41].

Model Validation
The feasibility of the continuous monitoring, control, and optimization of the process described above requires a digital twin of the process. This should be based on the model used in the process development. The distinction between a predictive process model and a digital twin is made in the literature on the basis of the model depth and the degree of information exchange with the physical process. Figure 2 shows the intermediate stages from a simple steady-state model to a fully fledged digital twin for predictive, modelbased control. The feasibility of the continuous monitoring, control, and optimization of the process described above requires a digital twin of the process. This should be based on the model used in the process development. The distinction between a predictive process model and a digital twin is made in the literature on the basis of the model depth and the degree of information exchange with the physical process. Figure 2 shows the intermediate stages from a simple steady-state model to a fully fledged digital twin for predictive, modelbased control.   A prerequisite for the use of digital twins in regulated industries in a QbD-based process is a quantitative and unambiguous validation of the process model [43], as shown in Figure 3. The procedure for this is described several times in the literature for different upstream and downstream processes. Here, the specifics of a dynamic metabolic model for cell cultivation are addressed. First, after defining the model task and application, the model must be verified. In this case, it must be verified whether the model can reasonably represent the fundamental processes, such as cell growth, substrate consumption and product formation. Due to the large number of Monod-based formation and consumption rates, particular attention must be paid to the correct implementation of stoichiometry. If the model is plausible according to the assessment of an experienced process engineer, the sensitivity of the model should be quantified in the next step. For this purpose, DoE can be used to compare sensitivities from the model prediction with those from the process development. If the sensitivity is known, a rough design space can be defined, for example in the form of contour plots, which can also be used for further process optimization. For use as a digital twin, the model must be accurate and precise. For different states, the model predictions must match the target variables measured in the process (accuracy). For robust control, sufficient precision in the prediction is also necessary. The final validation milestone tests whether the model in the design space under investigation is at least as precise and accurate as the measurement in the physical process.
Processes 2022, 10, x FOR PEER REVIEW

Modeling of the Intracellular Metabolism of CHO Cells
The mathematical description of intracellular metabolism was adapted from lished dynamic metabolic model [21]. A detailed overview of model equations

Modeling of the Intracellular Metabolism of CHO Cells
The mathematical description of intracellular metabolism was adapted from a published dynamic metabolic model [21]. A detailed overview of model equations can be found in the original publication [21]. The reaction equations are based on modified Michaelis-Menten-type reaction equations. Multiplicative Michaelis-Menten equations are used when multiple substrates are involved. Feedback inhibition and activation were considered by using formulations 1 and 2, respectively.
where v max is the maximal reaction rate, [S], [A], and [I] are the concentrations of substrate, activator, and inhibitor, respectively, and K S , K A , and K I , are the Michaelis-Menten constants for substrate, activator, and inhibitor, respectively. The cell-specific growth rate, as well as the mAb formation rate, were also formulated as multiplicative Monod kinetics, with all amino acids, ATP, and, in the case of the growth rate, additionally glucose-6-phosphate and ribulose-5-phosphate, each being considered with a separate term. A separate Monod constant was also defined for each of the substrates for both the growth and the mAb formation.
The reaction network, shown in Figure 4, covers the major metabolic pathways of central metabolism, namely glycolysis, TCA cycle, pentose phosphate pathway, and oxidative phosphorylation as well as energy-consuming pathways in the form of ATPases, and anabolic reactions for cell division and mAb synthesis. Additionally, the model includes the most relevant metabolic pathways for amino acid metabolism, especially glutaminolysis as a central contributor to the TCA cycle. In addition, aspartate and alanine transaminase, the conversion of serine to pyruvate and formation of alpha ketoglutarate and succinate from two different reactions is covered in the model.
The composition of cells and the result he literature [44]. An average molecular weight of 107.5 g mol −1 was assumed for the proteins composing the biomass. The IgG1 sequence was assumed to be the average sequence as proposed in the literature [45], and the amino acid consumption for mAb synthesis was set accordingly. Lipid metabolism was not considered separately in the model; hence, the assumption was made that the entire lipid content of the cells was derived from citrate in the citric acid cycle. Similarly, the synthesis of nucleic acids was modeled as a lumped reaction and the synthesis was assumed to be derived from ribulose-5-phosphate and glucose-6-phosphate. The ATP requirements for the synthesis of biomass and mAb were adopted from the literature [19]. From the literature, a conversion factor of 3.15 × 10 −4 gDW 10 −6 cells was adopted [21].  In addition to the description of substrate consumption and the associate formation and cell growth, described by the metabolic model, fluid dynamics a balance are necessary for a complete process model in order to be scale-able d For dynamic modeling the following assumptions were made: -Ideally mixed stirred tank reactor, i.e., no spatial differences in pH, temperature, concentration of chemical species. Constant pH, constant temperature, no oxygen limitation. - The model is an unsegregated, structured model, meaning the entire cell population was assumed to be an "average cell" and cell cycle differences were not considered. -Limited number of metabolites: primarily metabolites were used that represent a branch in a metabolic pathway, or that are taken up directly from the medium into the cell. In this approach, subsequent reactions are often grouped together, allowing the number of metabolites, and thus model complexity, to be reduced without sacrificing predictive power. -Constant enzyme amounts: the maximum reaction rate of an enzyme-catalyzed reaction depends on the enzyme amount. The enzyme amount depends on the transcription and translation rates, which may depend on substrate concentration and other influencing variables. In order to represent the dependence of transcription and translation rates, "omics" data are required, which were not available in the context of this work. Therefore, constant enzyme amounts were assumed in this model. -New cells and monoclonal antibodies were assumed to be directly formed from precursors present in the cell (amino acids, citrate (representing lipids), and R5P (representing nucleotides). - Analytically undetermined media components such as vitamins, trace elements, phospholipid precursors, growth factors, etc., were assumed to be non-limiting. Thus, it implicitly follows that the growth rate depends only on the number of quantified substrates. - The cell volume was assumed to be constant. The concentration of intracellular substrates depends on the volume of the cell, whereas the substrate mass remains constant. Changes in the concentration of intracellular substrates due to changes in cell volume were not considered. - The composition of the cells was assumed to be constant.
In addition to the description of substrate consumption and the associated product formation and cell growth, described by the metabolic model, fluid dynamics and energy balance are necessary for a complete process model in order to be scale-able due to fluid dynamics and energy management non-idealities due to scale. In the context of this work, work was carried out at the 1 L scale standardized defined laboratory equipment. At this small scale, fluid dynamics non-idealities do not play a significant role in terms of mixing time and residence time behavior. Details on the implementation of such an approach, i.e., residence time and energy balancing non-idealities for stirred reactors at different scales, can be found in [46], for example.
Here, the balance space for the energy balance includes the accumulation as the difference of the heat energy removed and added (see Equation (3)). The power input of the stirrer can be described by Equation (4). Here, the Ne number describes the ratio of flow resistance to inertial force. To keep the bioreactor at 37 • C, the reactor must be tempered. The heat supplied or dissipated via the double jacket depends on the heat transfer coefficient and the exchange surface (see Equation (5)). . .

Model Parameter Determination
As usual, at first the equipment setup is characterized fluid dynamically and due to its energy management [46]. The metabolic flux model parameters were initially taken from the original publication. Since the model predictions did not apply to the cell line used, the model parameters were newly determined for the different CHO DG 44 cell line. A total of 12 key parameters had to be determined specifically in order to sufficiently describe the cultivation of those CHO DG 44 cells. Table 1 shows the modified parameters and the only slightly updated parameter values. With the exception of the first parameter, K growth,NH4 , which is the Michaelis-Menten constant of growth inhibition by ammonium, all of the parameters are maximal reaction rates. Maximal reaction rates are dependent on the amount of enzyme present. Since a different cell line was used, expression of different enzymes may differ, leading to a different metabolic phenotype.

Sensitivity Analysis
Part of the model validation process is the execution of the model verification. As described in the introduction, it is examined here whether the plausibility is given. For this purpose, the syntax and the stoichiometry are checked for errors. Likewise, the plausibility of the model has been tested with regard to the correct implementation of substrate consumption, cell growth, and product formation (see Section 4.1.1). In the following, the sensitivity of the model parameters, which is the second decision criterion, is investigated. The results are discussed in Section 4.1.2. The quantification of accuracy and precision of the model predictions is presented using Monte Carlo simulations in Section 4.2. Figure 5 shows the model prediction for cell growth, product formation as well as the different substrate courses. The model prediction qualitatively agrees with the experimental courses. The characteristic decrease in VCD after 200 h is correctly reproduced. The sigmoidal course of the product concentration as well as the turnover of the substrates is reproduced sufficiently accurately by the model within the experimental accuracy.

Plausibility
From the progression of, e.g., Gln (d) and ASN (f), it can be seen that these amino acids are consumed faster in cultivation than they are supplied by feeding. Furthermore, from the progression of GLY concentration (h), a slight overfeeding of this component can be predicted by the model. Figure 5 shows the model prediction for cell growth, product formation as well as the different substrate courses. The model prediction qualitatively agrees with the experimental courses. The characteristic decrease in VCD after 200 h is correctly reproduced. The sigmoidal course of the product concentration as well as the turnover of the substrates is reproduced sufficiently accurately by the model within the experimental accuracy.

Sensitivity
The sensitivity of the model parameters in terms of strength and direction was determined using a partial factorial experimental design. This shows the main factors and their interactions with each other. Both product formation (see Figure 6a) and cell growth (Figure 6b) can be represented sufficiently reliably with a p-value of less than 0.0001 using the regression model created.
From the progression of, e.g., Gln (d) and ASN (f), it can be seen that these amino acids are consumed faster in cultivation than they are supplied by feeding. Furthermore, from the progression of GLY concentration (h), a slight overfeeding of this component can be predicted by the model.

Sensitivity
The sensitivity of the model parameters in terms of strength and direction was determined using a partial factorial experimental design. This shows the main factors and their interactions with each other. Both product formation (see Error! Reference source not found.a) and cell growth (Error! Reference source not found.b) can be represented sufficiently reliably with a p-value of less than 0.0001 using the regression model created.
(a) (b) In the experimental design, the concentration in the reference medium was varied ± 50%. The significance of the parameters must therefore not be interpreted as generally valid, but only for the medium used. For both target parameters, the GLY concentration is the most significant parameter (see Figure 7). An important finding from the evaluations discussed above is not the fundamental dependence of cell growth and antibody productivity on amino acid concentration, but the identifiability of those components that have a particularly sensitive effect in the process under investigation. Thus, the applicability of the model lies not only in the prediction of cultivation processes, but in the predictive optimization of media.  In the experimental design, the concentration in the reference medium was varied ±50%. The significance of the parameters must therefore not be interpreted as generally valid, but only for the medium used. For both target parameters, the GLY concentration is the most significant parameter (see Figure 7). An important finding from the evaluations discussed above is not the fundamental dependence of cell growth and antibody productivity on amino acid concentration, but the identifiability of those components that have a particularly sensitive effect in the process under investigation. Thus, the applicability of the model lies not only in the prediction of cultivation processes, but in the predictive optimization of media.
From the progression of, e.g., Gln (d) and ASN (f), it can be seen that these amino acids are consumed faster in cultivation than they are supplied by feeding. Furthermore, from the progression of GLY concentration (h), a slight overfeeding of this component can be predicted by the model.

Sensitivity
The sensitivity of the model parameters in terms of strength and direction was determined using a partial factorial experimental design. This shows the main factors and their interactions with each other. Both product formation (see Error! Reference source not found.a) and cell growth (Error! Reference source not found.b) can be represented sufficiently reliably with a p-value of less than 0.0001 using the regression model created.
(a) (b) In the experimental design, the concentration in the reference medium was varied ± 50%. The significance of the parameters must therefore not be interpreted as generally valid, but only for the medium used. For both target parameters, the GLY concentration is the most significant parameter (see Figure 7). An important finding from the evaluations discussed above is not the fundamental dependence of cell growth and antibody productivity on amino acid concentration, but the identifiability of those components that have a particularly sensitive effect in the process under investigation. Thus, the applicability of the model lies not only in the prediction of cultivation processes, but in the predictive optimization of media. Although for cell growth GLC and GLY show an equivalent significance, the dominant influence of GLY concentration for product formation can be seen from the small effect of TYR concentration at low GLY concentrations (see Figure 8). All media components show a positive effect direction in the investigated range. With the obtained knowledge about the effect of the media components on cell growth and product formation, an optimized media composition can be predicted.
Although for cell growth GLC and GLY show an equivalent significance, the dominant influence of GLY concentration for product formation can be seen from the small effect of TYR concentration at low GLY concentrations (see Figure 8). All media components show a positive effect direction in the investigated range. With the obtained knowledge about the effect of the media components on cell growth and product formation, an optimized media composition can be predicted. In order to achieve the next validation criterion, the cultivation process must now be reproduced sufficiently accurately and precisely by the model for the optimization.

Accuracy and Precision
The model precision was determined for the media concentrations optimized from the MFAT study by means of a Monte Carlo simulation, shown in Figure 9. Here, 30 simulations were carried out. The values for the concentrations used in the model were combined within the random, normally distributed deviation of 5%. This allows us to determine how robust the model prediction is. For these simulation results, a t-test confidence interval of 94.97% is obtained with a certainty of 99%. The simulation results deviating by less than 2% can be assumed as sufficiently accurate model predictions. The third criterion is thus fulfilled with regard to precision. In order to achieve the next validation criterion, the cultivation process must now be reproduced sufficiently accurately and precisely by the model for the optimization.

Accuracy and Precision
The model precision was determined for the media concentrations optimized from the MFAT study by means of a Monte Carlo simulation, shown in Figure 9. Here, 30 simulations were carried out. The values for the concentrations used in the model were combined within the random, normally distributed deviation of 5%. This allows us to determine how robust the model prediction is. For these simulation results, a t-test confidence interval of 94.97% is obtained with a certainty of 99%. The simulation results deviating by less than 2% can be assumed as sufficiently accurate model predictions. The third criterion is thus fulfilled with regard to precision.  In order to achieve the next validation criterion, the cultivation process must now be reproduced sufficiently accurately and precisely by the model for the optimization.

Accuracy and Precision
The model precision was determined for the media concentrations optimized from the MFAT study by means of a Monte Carlo simulation, shown in Figure 9. Here, 30 simulations were carried out. The values for the concentrations used in the model were combined within the random, normally distributed deviation of 5%. This allows us to determine how robust the model prediction is. For these simulation results, a t-test confidence interval of 94.97% is obtained with a certainty of 99%. The simulation results deviating by less than 2% can be assumed as sufficiently accurate model predictions. The third criterion is thus fulfilled with regard to precision.

Materials and Methods
Chinese hamster ovary cells (CHO DG44) were used to produce an immunoglobulin (IgG1). The culture conditions were 36.8 • C, pH 7.1, 60% pO2, and 433 rpm (three-blade segment impeller with a diameter of 54 mm and blades at an angle of 30 • , bbi-biotech GmbH, Berlin, Germany). The cultivations were carried out in serum-free, commercial medium (CellcaCHO Expression Platform, Sartorius Stedim Biotech GmbH, Göttingen, Germany) in 2 L glass bioreactors (Biostat ® B, Sartorius Stedim Biotech GmbH, Göttingen, Germany) controlled via a digital control unit (DCU, Biostat ® B, Sartorius Stedim Biotech GmbH, Göttingen, Germany). Pre-cultures were grown in shake flasks in serum-free medium. In terms of fed-batch bioreactor cultivations, feed medium (based on CellcaCHO Expression Platform) was provided every 24 h starting at 72 h. Cell concentration was repeatedly quantified using a hemocytometer (Neubauer improved, BRAND GmbH + CO KG, Wertheim, Germany) and trypan blue solution (0.4%, Sigma-Aldrich, St. Louis, MO, USA) as dye for the detection of dead cells. An in situ turbidity probe (transmission, 880 nm, HiTec Zang GmbH, Herzogenrath, Germany) was used for quantifying the cell concentration during bioreactor cultivations. mAb concentration was determined by Protein A chromatography (PA ID Sensor Cartridge, Applied Biosystems, Bedford, MA, USA). Dulbecco's PBS buffer was used as a loading buffer at pH 7.4 and as an elution buffer at pH 2.6. The absorbance was monitored at 280 nm. Glucose and lactate concentrations were quantified using a LaboTrace compact (TRACE Analytics GmbH, Braunschweig, Germany).
An RP chromatography column (InfinityLab Poroshell HPH-C18; 3.0 × 100 mm; 2.7 µm; Agilent Technologies, Santa Clara, USA) was used to determine amino acid concentrations in the cultivation samples. Sample preparation consisted of filtration through a 0.2 µm cellulose acetate syringe filter (VWR International GmbH, Radnor, USA). Prior to sample injection, the amino acids were derivatized using orthophthalic aldehyde (OPA). To allow better separation of amino acids, the column oven was set to a temperature of 40 • C.
The kinetic model was implemented in Aspen Custom Modeler V8.4 (Aspen Technology, Inc., Bedford, MA, USA) in order to allow total process simulations and optimizations [15]. The model framework was adapted from the literature [21]. However, a kinetic for cell death was added to the model to represent decreasing viable cell density toward the end of the cultivation as well as scalable fluid dynamics and energy balance non-idealities. As bioreactor cell cultures were performed in fed-batch mode with daily bolus feed additions, the model equations were extended by feeding terms. Consequently, volumetric changes were considered as well.

Discussion
The present study shows the distinct and quantitative validation of a dynamic metabolic model that is used for simulating a fed-batch cultivation of an industrially relevant mAbproducing CHO DG 44 cell line4.
Single-and multi-parameter-at-a-time studies reveal the significance of model parameters and enables the identification of combined parameter effects with support from statistical evaluation (Pareto chart and partial least squares loading plot). The comparison between the simulation and the experiment suggests sufficient precision and accuracy for the applied model approach to be applicable in a process development scenario. It is shown in the Pareto analysis that significant parameters regarding the concentration of the mAb are the tyrosine, glutamine, phenylalanine, and glycine concentration. Additionally, interactions of these parameters are significant. It is shown that the highest concentration of mAb is positively correlated to the amino acid concentrations. The results and procedure presented support the implementation of dynamic modelling of intracellular metabolisms in upcoming processes. Subsequent research will focus on transferring the model to the Fed-Batch cultivation of HEK293 cells to produce human immunodeficiency-virus-like particles, as well as applicability of the developed model on the example of other CHO cell lines.