Macroscopic Dynamic Modeling of Sequential Batch Cultures of Hybridoma Cells: An Experimental Validation

Hybridoma cells are commonly grown for the production of monoclonal antibodies (MAb). For monitoring and control purposes of the bioreactors, dynamic models of the cultures are required. However these models are difficult to infer from the usually limited amount of available experimental data and do not focus on target protein production optimization. This paper explores an experimental case study where hybridoma cells are grown in a sequential batch reactor. The simplest macroscopic reaction scheme translating the data is first derived using a maximum likelihood principal component analysis. Subsequently, nonlinear least-squares estimation is used to determine the kinetic laws. The resulting dynamic model reproduces quite satisfactorily the experimental data, as evidenced in direct and cross-validation tests. Furthermore, model predictions can also be used to predict optimal medium renewal time and composition.


Introduction
Therapeutic products (vaccines, antibodies, etc.) are subject to exponential demands and cost-lowering process improvements, leading to the intensification of growth conditions in the bio-pharmaceutical industry and the sharp increase of the related market economy. For instance, monoclonal antibody (MAb) market amounts to several billion dollars and is still increasing.
To improve bioprocess yield and repeatability, monitoring and control tools are required. The latter implies the availability of dynamic models, which can predict the process trajectory and support the design of software sensors or control strategies. Previous optimization studies of hybridoma cell cultures for MAb production were usually conducted using simple mathematical models based on macroscopic reaction schemes such as in [1,2].
More recently, a macroscopic model with kinetics accounting for overflow metabolism, where glucose and glutamine are the main substrates, was proposed in [3]. Indeed, cell respiratory capacity is limited [4]. Therefore, depending on the substrate concentrations, cell metabolism follows two possible pathways: the respirative regime if the respiratory capacity is sufficient to oxidize the whole substrate amount or the respiro-fermentative regime if this substrate amount is in excess with respect to the available oxidative capacity, thus leading to the formation of growth-inhibiting • A simple dynamic model of cultures of hybridoma cells in SBR is developed and validated with experimental data. Confidence intervals for the parameters and the estimated trajectories are provided. • A systematic model identification procedure, based on rigorous yet simple to use tools-MLPCA to determine the stoichiometry, nonlinear least squares to identify the parameters of the kinetic laws, sensitivity analysis and Monte Carlo analysis to infer the confidence intervals-is assessed in a real case study, showing good performance and promise for future applications.

•
The simple dynamic model is further exploited to optimize the medium renewal strategy in the sequential batches.
This paper is organized as follows. Section 2 reviews the basic concepts of verflow metabolism and mathematical modeling using principal component analysis. Section 3 presents the experimental case study and process operating conditions. The dynamic model of hybridoma sequential batch cultures is derived in Section 4 and parameters are identified from experimental data in Section 5. Subsequently, Section 6 develops a parametric sensitivity analysis and proposes further model simplifications. The simplified model is identified and cross-validated with two different data sets in Section 7. Finally, the dynamic model is used to optimize the culture medium renewal time and composition in Section 8 while conclusions are drawn in Section 9.

Dynamic Modeling of Hybridoma Cultures
This section first reviews the basic concept of overflow metabolism and then presents a systematic procedure to infer candidate macroscopic models from principal component analysis of the data at hand.

Overflow Metabolism
The main physiological feature of hybridoma resides in their primary metabolism or, more precisely, in their catabolism, presenting the following main pathways:

•
The glycolysis which is a series of degradation reactions of glucose (the main substrate) taking place in the cytoplasm and leading to a final product, i.e., pyruvate.

•
The Krebs cycle, also called the tricarboxylic acids (TCA) cycle or citric acids cycle, which takes place inside the mitochondrions and uses pyruvate to product the cells energy units (Adenosine triphosphate or ATP) and reduced cofactors (typically NADH and FADH).

•
The electron transport, still taking place in mitochondrions and producing ATP from the reduced cofactors.

•
The fermentative pathway which, in oxygen limitation, produces typical products like lactate from pyruvate in the cytoplasm.
Cell catabolism is characterized by a limited energy production (i.e., the Krebs cycle has a limited capacity) principally used for cell growth and division. This limitation comes from the capacity to oxidize the main nutrients: glucose (main carbon source) and glutamine (main nitrogen source). The excess amounts of these nutrients are assumed to follow other metabolic pathways more commonly known as "fermentation", producing a side byproduct.
This "Overflow Metabolism" or "short-term Crabtree effect" [4,[7][8][9][10][11], is typically observed with yeast, bacteria and animal cell cultures. Depending on the case, it leads to the production of ethanol, acetate and lactate/ammonium as side byproducts. Several descriptions of this switching mechanism have been proposed in the literature (for instance in [10]) but this phenomenon remains not well understood.
The byproduct formation usually inhibits the oxidative capacity of the cells, slowing down growth for increasing concentrations. In turn, it depends on the oxidative capacity of the cells and on the medium composition.
A generic mechanistic model that would, in principle, allow the representation of the culture of different strains presenting overflow metabolism, can be described through the following main reactions: jth substrate consumption: jth substrate overflow: jth byproduct consumption: where X, S j and P j are the concentrations of cell biomass, jth substrate and jth side byproduct, respectively. The k ni coefficients represent the yield (or pseudo-stoichiometric) coefficients of component n in reaction i. Overflow metabolism assumption involves that, for each concerned substrate, these reactions take place in pairs (1a and 1b) or triplets (1a-1c) if the considered byproduct can be reconsumed by the biomass as a substitute substrate source when the oxidative capacity is not fully exploited. Indeed, Sonnleitner and Käppeli [12] assume that the cell oxidative capacity rules the general metabolism, following a bottleneck effect. During a culture, the cells are likely to change their metabolism depending on the exploitation of the respiratory capacity. At low substrate uptake rate (substrate concentrations are below critical levels S < S crit and φ S < φ Smax ), substrate is consumed with biomass growth and a relatively low metabolite byproduct production (1a) without overflow, which is defined as respiratory metabolism and the consequent remaining respiratory capacity can be used to oxidize byproduct as substitute carbon source as in (1c).
At high substrate uptake rate (substrate concentration is above critical level S > S crit and φ S > φ Smax ), the respiratory capacity is saturated, resulting in overflow metabolism towards excess metabolite production (reactions (1a) and (1b)). The state at which overflow metabolism is initiated (S = S crit and φ S = φ Smax ) is referred to as critical metabolism. For instance, yeast metabolism is described by the bottleneck assumption of Sonnleitner and Käppeli [12], as illustrated in Figure 1. This model was exploited in [13][14][15] for robust control purposes. Based on a similar model, references [16,17] suggested practical ways to estimate state variables such as biomass, glucose or acetate in bacteria cultures using software sensors. Recently, reference [3] proposed a dynamic model of

Systematic Modeling Procedure
In contrast with the previous modeling approach which is based on past experience and a priori knowledge of the metabolic network, it is now suggested to derive a model based mostly on the information content of available data sets. This can be particularly relevant when the model structure is uncertain and experimental data sets are available that can be analyzed to extract information on the reaction stoichiometry and kinetics.
First, we recall that bioprocesses can be represented by macroscopic reaction schemes involving M reactions between N components under the following generic form [18]: where j  (respectively, j  ) denotes the set of reactants (or products) in the "jth" reaction. The parameters kij are pseudo-stoichiometric coefficients while j  is the corresponding reaction rate.
Applying mass balances to (2), the following ordinary differential equation system is obtained: where K is the pseudo-stoichiometric matrix and  represents the transport term taking dilutions, input feeds and gaseous outflows into account.  is the vector containing all the kinetic parameters.
The number of components N is generally larger than the number of reactions M so that the rank of the stoichiometric matrix K is assumed to be M. For instance, in [3], M = 5 and N = 6.
Defining the transport-free state evolution f  and integrating (3) between two consecutive measurement times lead to the following expression: where  i f  is the differential transport-free state vector. As discussed in [6], Equation (4) expresses that  i f  is contained in a M-dimensional linear subspace, and MLPCA allows to determine subspaces of increasing dimensions p explaining a noisy data set (and therefore reaction schemes of increasing detail explaining the experimental data). A systematic procedure can therefore be developed, which selects the smallest value of p that allows a thorough interpretation of the data up to a given confidence level, minimizing a log-likelihood cost:

Systematic Modeling Procedure
In contrast with the previous modeling approach which is based on past experience and a priori knowledge of the metabolic network, it is now suggested to derive a model based mostly on the information content of available data sets. This can be particularly relevant when the model structure is uncertain and experimental data sets are available that can be analyzed to extract information on the reaction stoichiometry and kinetics.
First, we recall that bioprocesses can be represented by macroscopic reaction schemes involving M reactions between N components under the following generic form [18]: where j (respectively, ℘ j ) denotes the set of reactants (or products) in the "jth" reaction. The parameters k ij are pseudo-stoichiometric coefficients while ϕ j is the corresponding reaction rate. Applying mass balances to (2), the following ordinary differential equation system is obtained: where K is the pseudo-stoichiometric matrix and υ represents the transport term taking dilutions, input feeds and gaseous outflows into account. ϑ is the vector containing all the kinetic parameters. The number of components N is generally larger than the number of reactions M so that the rank of the stoichiometric matrix K is assumed to be M. For instance, in [3], M = 5 and N = 6.
Defining the transport-free state evolution ξ f and integrating (3) between two consecutive measurement times lead to the following expression: where ξ ∆ f i is the differential transport-free state vector. As discussed in [6], Equation (4) expresses that ξ ∆ f i is contained in a M-dimensional linear subspace, and MLPCA allows to determine subspaces of increasing dimensions p explaining a noisy data set (and therefore reaction schemes of increasing detail explaining the experimental data). A systematic procedure can therefore be developed, which selects the smallest value of p that allows a thorough interpretation of the data up to a given confidence level, minimizing a log-likelihood cost: where n s is the number of measured vector samples and ξ ∆ f ,m i is the noisy measurement of ξ ∆ f , with an error covariance matrix Q ∆ i andξ ∆,p f is its maximum-likelihood (ML) estimate by the reduced p-dimensional linear model [6]. Jp is a decreasing function of p which is always smaller or equal to the log-likelihood cost J* of the true nonlinear model. Since J* is known to have a chi-square distribution with n S xN degrees of freedom [19]. The number of reaction is just chosen as the smallest p such that the log-likelihood cost Jp is smaller or equal to the range of a χ 2 n S xN -distributed random variable. Once the number of reactions is determined, the resulting N by p affine subspace basisρ can be used to estimate a stoichiometric matrixK, which is a linear combination of the basis vectors, i.e., K =ρG (6) with G a p by p regular matrix. For a complete estimation of the stoichiometry, p biological constraints have to be imposed in each column ofK (for instance the fact that a specific reactant or product is involved in only one reaction).

Operating Conditions
In the framework of this study, six sequential suspended hybridoma batch cultures of 2 hybridoma strains (called, for the sake of confidentiality, HB1 and HB2) were performed in two series of three 200 mL T-flasks. In this protocol, at the initial time of each batch, biomass is kept in the reactor, while the metabolites (lactate, ammonia and monoclonal antibodies) are withdrawn and the substrate concentrations (glucose and glutamine) are set to prescribed values (respectively ranging between 6 and 7 g/L, and 0.3 and 0.4 g/L). The end-of-batch viable and dead biomass concentrations are considered as the initial conditions of the next batch (the initial biomass concentration of the first batch is 0.1 × 10 6 cells/mL). The culture time is approximately 15 days and one medium renewal is performed approximately after one week. Measurements are taken once every day.
The culture medium is based on 10% FBS (ThermoFischer, Waltham, MA, USA) added to DMEM (Lonza, Belgium) with 6 g/L of glucose and 4 mM of L-glutamine, and is replaced at a specific time (approximately after one week) when one of the substrates (glucose and glutamine) is exhausted, in order to avoid starving. Most of the times, due to the selected medium composition, glutamine is the limiting substrate. As glucose measurement can be performed relatively quickly with respect to the other analytical methods, medium refreshments are achieved based on glucose concentration evolution. Indeed, when glutamine vanishes, glycolysis stops and glucose is not oxidized anymore. Once this phenomenon is observed, the medium is replaced within the day.
Concerning the culture basic parameters, pH medium is set between 7.2-7.6 at the beginning of the batch and decreases to a minimum of pH between 6.7 and 7.0. The temperature is regulated at 37 • C in a 5% CO 2 incubator.

Measurements and Data Sets
Measurements are collected off-line following different methods with respect to the component/analyte:

•
Biomass: Living and dead biomasses are measured by cell-counting using Trypan blue and a Neubauer chamber.
• Glucose concentration is measured by using a Roche glycemic analytical device called Accu-Chek allowing a fast calculus of the glucose concentration within a few seconds. • Lactate concentration is also measured using a Roche device called Accutrend delivering fast concentration measurements using dipsticks. • A "Mega-Calc" enzymatic kit from Megazyme is used to obtain the glutamine and ammonium concentrations. This method is based on absorbance measurements. • Antibody concentration is obtained using an ELISA dosage of murine IgG designed by the CER group from Aye (Belgium) based on reactants from Bethyl Laboratories (ref A90-131A for coatage antibodies and A90-131P for revelation).
The resulting data are shown in Figures 2 and 3. As apparent, cell viability decreases significantly after four days but is maintained around 30% thanks to the medium replacement. The level of ammonium concentration is very low and ranges below the sensitivity level of the measurement method. Therefore, ammonium is not considered in the modeling study since concentrations are far below the growth-inhibiting level. Only the glucose overflow, producing lactate, will be taken into account.

Measurements and Data Sets
Measurements are collected off-line following different methods with respect to the component/analyte:


Biomass: Living and dead biomasses are measured by cell-counting using Trypan blue and a Neubauer chamber.  Glucose concentration is measured by using a Roche glycemic analytical device called Accu-Chek allowing a fast calculus of the glucose concentration within a few seconds.  Lactate concentration is also measured using a Roche device called Accutrend delivering fast concentration measurements using dipsticks.  A "Mega-Calc" enzymatic kit from Megazyme is used to obtain the glutamine and ammonium concentrations. This method is based on absorbance measurements.  Antibody concentration is obtained using an ELISA dosage of murine IgG designed by the CER group from Aye (Belgium) based on reactants from Bethyl Laboratories (ref A90-131A for coatage antibodies and A90-131P for revelation).
The resulting data are shown in Figures 2 and 3. As apparent, cell viability decreases significantly after four days but is maintained around 30% thanks to the medium replacement. The level of ammonium concentration is very low and ranges below the sensitivity level of the measurement method. Therefore, ammonium is not considered in the modeling study since concentrations are far below the growth-inhibiting level. Only the glucose overflow, producing lactate, will be taken into account.

Data Processing
Before applying MLPCA to the data sets, elimination of data outliers should be achieved in order to reject measurement inconsistencies. For instance, the last increasing glutamine concentration measurements of HB1 third experiment should not be used in identification (and direct validation)

Data Processing
Before applying MLPCA to the data sets, elimination of data outliers should be achieved in order to reject measurement inconsistencies. For instance, the last increasing glutamine concentration measurements of HB1 third experiment should not be used in identification (and direct validation) since glutamine production is not possible.
Even if part of the data is discarded for identification, all the measurements can be considered in cross-validation. In the next sections, the first two sets of HB1 data are selected for identification, and the rest of data for cross-validation.

MLPCA-Based Systematic Procedure
The methodology presented in Section 2.2 is now applied to the first two data sets of HB1. As shown in Figure 4, a 3-dimensional subspace (i.e., p = 3 reactions) is sufficient to interpret the data.

Data Processing
Before applying MLPCA to the data sets, elimination of data outliers should be achieved in order to reject measurement inconsistencies. For instance, the last increasing glutamine concentration measurements of HB1 third experiment should not be used in identification (and direct validation) since glutamine production is not possible.
Even if part of the data is discarded for identification, all the measurements can be considered in cross-validation. In the next sections, the first two sets of HB1 data are selected for identification, and the rest of data for cross-validation.

MLPCA-Based Systematic Procedure
The methodology presented in Subsection 2.2 is now applied to the first two data sets of HB1. As shown in Figures 4, a 3-dimensional subspace (i.e., p = 3 reactions) is sufficient to interpret the data.  The following matrix ρ represents the maximum likelihood principal components defining the subspace basis related to Figure 4: To obtain a biologically-consistent stoichiometric matrix, reaction constraints have to be expressed so as define a matrix G as introduced in Equation (6): (a) The existence of a glycolysis pathway where biomass grows on substrates, producing no lactate and without mortality (k 11 = 1,k 21 = 0,k 51 = 0); (b) A sole glucose overflow pathway, according to the absence of ammonium (i.e., of glutamine overflow), where no dead biomass nor antibody is produced (k 12 = 1,k 22 = 0,k 62 = 0); (c) A biomass death pathway (k 13 = −1,k 23 = 1) theoretically with no substrate or metabolite concentration variations. The latter would represent too many constraints with respect to the available degree of freedom and arbitrarily, only the lactate coefficient is set to zero (k 53 = 0). Indeed, due to the size of G, which is a 3 by 3 matrix, only 3 constraints can be expressed per reaction.
The general constrained problem can be summarized as: In contrast with [5,6], this case study offers the possibility to explore the scenario where biomass is produced out of several macro-reactions.
A specificK matrix related to the constrained problem (8) is provided by (9): Apparently, the glucose and glutamine stoichiometric coefficients in the third reaction (i.e.,k 33 andk 43 ) are small compared to the sum of their respective values in reactions 1 and 2. A possible scenario is therefore to consider thatk 33 andk 43 could be set to zero (the coefficient deviations is partly explained by the lack of information in the data and the measurement noise).
The corresponding reaction scheme becomes: Substrate oxidation: Substrate overflow: where ϕ 1 , ϕ 2 and ϕ 3 are the reaction rates introduced in Section 5.1. The corresponding mass balance equations are: Compared to published models such as [3], the number of reactions is reduced. This can be explained by the absence of ammonium and the related overflow mechanism. As our procedure is data-driven, it leads to the identification of the sole phenomena visible in the collected experimental data.
Moreover, our strategy allows to decouple the identification of the stoichiometry from the kinetics or, at least, to get a first estimate of the stoichiometric parameters, independently of the kinetics. This can be an important asset when identifying bioprocess complex models with numerous parameters.

Reaction Rates
Since the double bottleneck glucose-glutamine is reduced to a simple bottleneck depending on both substrates, a reaction rate combining Monod factors is selected while the death rate is given by

Initial Conditions and Identification Criterion
Starting with the previously obtained values of the stoichiometric matrixK in (10a-10c) as stoichiometric parameter initial conditions, the whole parameter set (i.e., stoichiometric and kinetic parameters) can be identified minimizing a least-squares criterion measuring the distance between model simulated data ξ m and experimental measurements ξ exp as in: where θ = [µ max1 µ max2 K G K Gn K Gd K Gnd µ dmax k 31 k 41 k 61 k 32 k 42 k 52 k 63 ξ 0 ] is the parameter vector initialized with θ 0 = µ max1,0 µ max2,0 K G,0 K Gn,0 K Gd,0 K Gnd,0 µ dmax,0k31k41k61k32k42k52k63 ξ 0,0 . The initial state ξ 0 is a vector of length N.n exp with n exp being the number of experiments used in identification. ξ 0,0 is set using the experimental measurements at time t = 0. Q is the measurement error covariance matrix. As measurement error standard deviations are a priori unknown, it is common choice to set Q to a diagonal matrix with the squares of the maximum respective concentration levels. This allows to normalize the distances calculated in (16) and give equal importance to states with different orders of magnitude.
Parameter identification is performed with the MATLAB library-optimizer "fmincon". This algoritm allows to set box constraints on the parameters so as to limit the search space, and is typically used here in three successive calls. The first call starts from the initial guess (the MLPCA estimates of the stoichiometry, and an "inspired guess" for the kinetics), and the next are initialized with the parameter values resulting from of the previous minimization. Clever initialization is essential in reducing the computational cost and in increasing the chance of capturing the global minimum.

Minimization and Multi-Start Strategy
A multi-start strategy is applied in order to check if convergence is achieved when starting from different locations in the 7-dimensional kinetic parameter space polytope bounded by vertices defined in Table 1, and to identify the best parameter set corresponding to the cost Function (16) global minimum. Table 1. Vertices of the multi-start parameter polytope.

Kinetic Parameters/Min-Max Values
Minimum Initial Value Maximum Initial Value 25 runs were achieved, leading to the results shown in Table A1 in Appendix A. From a quantitative point of view, the minimization process is achieved efficiently in most of the cases since the cost function residuals are comprised in the interval J res = 1.1431 1.5686 in 22 out of the 25 runs (the initial order of magnitude of J is typically between 20 and 100). Runs 11, 16 and 21 lead respectively to J res = 6.7252, J res = 4.1909 and J res = 2.8027, which are highlighted in Table A1 in Appendix A (large deviations in the value of the growth rate are observed). We can conclude that the neighborhood of the optimum should be reached in almost 90% of the runs based on random initialization inside the polytope.
Interestingly, all the 22 runs lead to similar direct validation results shown in Figure 5. For the sake of space in this article, both experiments are graphed on the same figure and the second experiment starts after 15 days, i.e. when the first one is over. Overall, the model predicts well the experimental measurements. However, the prediction of antibody concentration is less accurate after the medium renewal at day 8, probably due to inaccurate biomass concentration measurements.
Bioengineering 2017, 4, 17 11 of 21 A1 in Appendix A (large deviations in the value of the growth rate are observed). We can conclude that the neighborhood of the optimum should be reached in almost 90% of the runs based on random initialization inside the polytope. Interestingly, all the 22 runs lead to similar direct validation results shown in Figure 5. For the sake of space in this article, both experiments are graphed on the same figure and the second experiment starts after 15 days, i.e. when the first one is over. Overall, the model predicts well the experimental measurements. However, the prediction of antibody concentration is less accurate after the medium renewal at day 8, probably due to inaccurate biomass concentration measurements.

Parametric Sensitivity Analysis
The evaluation of parametric sensitivities, i.e., the relative influence of the parameters on the model outputs, is useful to assess potential identifiability problems and confidence intervals. Identifiability depends on the model structure and parametrization as well as on the information

Parametric Sensitivity Analysis
The evaluation of parametric sensitivities, i.e., the relative influence of the parameters on the model outputs, is useful to assess potential identifiability problems and confidence intervals. Identifiability depends on the model structure and parametrization as well as on the information content of the data. In unfavorable situations a lack of sensitivity could appear or correlation among parameters. When the model is identifiable with the data at hand, sensitivity information can be used to evaluate the Fisher Information Matrix (FIM) and in turn confidence intervals for the several parameters [20].

Parameter Error Covariance
The sensitivity of the ith state ξ i with respect to kth parameter θ k at time t is theoretically defined by: Parametric sensitivities can be computed by integration of the following ordinary differential equations: with .
ξ i = f i the model state equation. Parameter identifiability can be assessed using the Fisher Information Matrix (FIM), which can be computed as follows where t k is the sampling time and n meas is the number of samples. An optimistic estimate of the parameter estimation error covariance matrix can be estimated based on the inverse of the FIM:P > σ 2 FI M −1 (20) with σ 2 being the posterior estimate of the measurement error variance obtained from the residual cost function at the optimum: where N meas is the total number of measurements (N meas = n meas N) and n θ is the number of estimated parameters.

Application to the Case Study
The relative standard deviations (the square root of the diagonal of (20)) are shown in Table A2 in Appendix A for the 22 optimization runs under consideration. It is apparent that the error on K G is very large, which is a sign that model (11) is over-parameterized. Indeed, glucose concentration levels are low so that G G+K G ≈ 1.

Model Reduction
Expression (13) is simplified to: Since glutamine is the main nitrogen source dedicated to cell viability, it is not surprising that glutamine becomes responsible of cell growth, i.e. glycolysis and overflow.

Re-Identification
With the exception of a few local minima, multi-start identification again leads the minima in the range J res = 1.1087 1.6534 and direct validation is shown in Figure 6.

Reduced Model Cross-Validation
The third data set of HB1 is now used to cross-validate the identified model. During this crossvalidation, initial states are re-estimated since initial measurement noise can be a critical source of result degradation. Results shown in Figure 7 are quite satisfactory even though the antibody concentration still suffers from discrepancies after medium renewal. It is worth noticing that the last 3 measurements of glutamine concentration are probably outliers following wrong analytical manipulations (glutamine is only consumed and cannot be produced). Relative standard deviations are much improved as shown in Table 2, only for the best run (i.e., presenting the best cost function and relative error standard deviations).

Reduced Model Cross-Validation
The third data set of HB1 is now used to cross-validate the identified model. During this cross-validation, initial states are re-estimated since initial measurement noise can be a critical source of result degradation. Results shown in Figure 7 are quite satisfactory even though the antibody concentration still suffers from discrepancies after medium renewal. It is worth noticing that the last 3 measurements of glutamine concentration are probably outliers following wrong analytical manipulations (glutamine is only consumed and cannot be produced).   The residual deviation between the model and the experimental data is given by J res = 1.3573. Interestingly, the model is also able of a relatively good prediction of the experimental data collected with the second hybridoma strain shown in Figure 8. The main discrepancy is in the prediction of the biomass (and consequently of the antibody concentration). However, residuals are still relatively low (J res = 1.4774), confirming the satisfactory results. These observations allow the perspective that macroscopic models could be adapted from one application to another at relatively little extra costs, just recalibrating the model based on some new available data, starting from the parameter estimates obtained in earlier applications. The residual deviation between the model and the experimental data is given by Interestingly, the model is also able of a relatively good prediction of the experimental data collected with the second hybridoma strain shown in Figure 8. The main discrepancy is in the prediction of the biomass (and consequently of the antibody concentration). However, residuals are still relatively low ( 4774 . 1  res J ), confirming the satisfactory results. These observations allow the perspective that macroscopic models could be adapted from one application to another at relatively little extra costs, just recalibrating the model based on some new available data, starting from the parameter estimates obtained in earlier applications.

Robustness to Parameter Uncertainty
Since the identified parameters of Section 7.2 show some uncertainties represented by their estimation error standard deviations (see Table 2), a Monte-Carlo analysis is developed, where each parameter is subject to normally distributed variations.
100 runs of the HB1 model are performed for the HB1 cross-validation data sets and are shown in Figure 9. The trajectory envelope is most of the time contained within the measurement confidence intervals with the exception of the MAb measurements following the medium renewal.
Results of the Monte-Carlo analysis are presented in Table 3. Table 3. Results of the Monte-Carlo analysis: number of runs, minimum, maximum and mean values of the residual cost function J and standard deviation.
Parameter variations can have a slight positive effect on cross validation (since the residual cost function was initially 1.3573 and the best Monte-Carlo case is 1.3412) but usually a negative effect, the worst case corresponding to a residual cost of 1.6225. Since all the runs provide satisfactory results, the identified model is quite acceptable for prediction and control purposes.

Robustness to Parameter Uncertainty
Since the identified parameters of Subsection 7.2 show some uncertainties represented by their estimation error standard deviations (see Table 2), a Monte-Carlo analysis is developed, where each parameter is subject to normally distributed variations.
100 runs of the HB1 model are performed for the HB1 cross-validation data sets and are shown in Figure 9. The trajectory envelope is most of the time contained within the measurement confidence intervals with the exception of the MAb measurements following the medium renewal.
Results of the Monte-Carlo analysis are presented in Table 3. Parameter variations can have a slight positive effect on cross validation (since the residual cost function was initially 1.3573 and the best Monte-Carlo case is 1.3412) but usually a negative effect, the worst case corresponding to a residual cost of 1.6225. Since all the runs provide satisfactory results, the identified model is quite acceptable for prediction and control purposes.  Table 3. Circles represent the experimental measurements with a confidence interval of 99%.

Optimization of the Monoclonal Antibody Production
This section intends to provide the best medium renewal time and composition to maximize the monoclonal antibody production and substrate savings. Using the validated model of Section 7, it is possible to express these targets in a mathematical objective function of the form: where α represents a weighting coefficient penalizing substrate savings with respect to MAb production, i.e. defining the degree of predominance of one target over the other. Minimization of (25) is achieved using the optimizer fmincon from the Matlab platform, in order to find the best values of θ = [trenewal Grenewal Gnrefresh], i.e., is the medium renewal time trenewal and the glucose and glutamine concentrations in the medium Grenewal and Gnrenewal. Fmincon also allows to specify box constraints for trenewal Є [3 14] days. These values are selected in accordance with the previous experimental results: medium renewal should on the one hand not occur too soon and on the other hand before the end of  Table 3. Circles represent the experimental measurements with a confidence interval of 99%.

Optimization of the Monoclonal Antibody Production
This section intends to provide the best medium renewal time and composition to maximize the monoclonal antibody production and substrate savings. Using the validated model of Section 7, it is possible to express these targets in a mathematical objective function of the form: where α represents a weighting coefficient penalizing substrate savings with respect to MAb production, i.e. defining the degree of predominance of one target over the other. Minimization of (25) is achieved using the optimizer fmincon from the Matlab platform, in order to find the best values of θ = [t renewal G renewal Gn refresh ], i.e., is the medium renewal time t renewal and the glucose and glutamine concentrations in the medium G renewal and Gn renewal . Fmincon also allows to specify box constraints for t renewal Є [3 14] days. These values are selected in accordance with the previous experimental results: medium renewal should on the one hand not occur too soon and on the other hand before the end of the experiment set at day 14. G renewal Є [1 15] g/L and Gn renewal Є [0.1 1] g/L so as to avoid cell starvation or growth inhibition through the accumulation of byproduct. Figure 10 shows the optimization results when α is set to zero and G renewal and Gn renewal are respectively set to 6 and 0.4 g/L (which is similar to the concentrations used in the experiments dedicated to model identification described in Section 7). The best time at which medium renewal should be achieved is found at t renewal = 4.54 days (in the previous experiments, renewal had been achieved after approximately 7 days). Moreover, the MAb production, defined as the sum of the final batch concentrations, amounts to 60.92 µg/mL which represents a production gain of 30% with respect to the experiments of Figure 1 (where the production can be estimated to 40 to 45 µg/mL). Figure 11 shows new results when a strong emphasis is placed on substrate savings with α = 10. The optimizer converges to θ = [6.95 4.55 1], which leads to the following observations: • Even when considering substrate savings, the upper bound of Gn refresh is reached. Indeed, when G is depleted, Gn still limits biomass decay and therefore maintains an efficient MAb production rate. However, since ammonium production (byproduct formed by glutamine overflow) is not considered in the model obtained in Section 7, higher values of Gn refresh are not recommended.

•
Interestingly, approximately the same renewal time as in Figure 2 is obtained, which means that these experiments could be "economically" optimized only by revising the medium composition. 1 1] g/L so as to avoid cell starvation or growth inhibition through the accumulation of byproduct. Figure 10 shows the optimization results when α is set to zero and Grenewal and Gnrenewal are respectively set to 6 and 0.4 g/L (which is similar to the concentrations used in the experiments dedicated to model identification described in Section 7). The best time at which medium renewal should be achieved is found at trenewal = 4.54 days (in the previous experiments, renewal had been achieved after approximately 7 days). Moreover, the MAb production, defined as the sum of the final batch concentrations, amounts to 60.92 µ g/mL which represents a production gain of 30% with respect to the experiments of Figure 1 (where the production can be estimated to 40 to 45 µ g/mL). Figure 11 shows new results when a strong emphasis is placed on substrate savings with α = 10. The optimizer converges to θ = [6.95 4.55 1], which leads to the following observations:


Even when considering substrate savings, the upper bound of Gnrefresh is reached. Indeed, when G is depleted, Gn still limits biomass decay and therefore maintains an efficient MAb production rate. However, since ammonium production (byproduct formed by glutamine overflow) is not considered in the model obtained in Section 7, higher values of Gnrefresh are not recommended.  Interestingly, approximately the same renewal time as in Figure 2 is obtained, which means that these experiments could be "economically" optimized only by revising the medium composition.   1 1] g/L so as to avoid cell starvation or growth inhibition through the accumulation of byproduct. Figure 10 shows the optimization results when α is set to zero and Grenewal and Gnrenewal are respectively set to 6 and 0.4 g/L (which is similar to the concentrations used in the experiments dedicated to model identification described in Section 7). The best time at which medium renewal should be achieved is found at trenewal = 4.54 days (in the previous experiments, renewal had been achieved after approximately 7 days). Moreover, the MAb production, defined as the sum of the final batch concentrations, amounts to 60.92 µ g/mL which represents a production gain of 30% with respect to the experiments of Figure 1 (where the production can be estimated to 40 to 45 µ g/mL). Figure 11 shows new results when a strong emphasis is placed on substrate savings with α = 10. The optimizer converges to θ = [6.95 4.55 1], which leads to the following observations:


Even when considering substrate savings, the upper bound of Gnrefresh is reached. Indeed, when G is depleted, Gn still limits biomass decay and therefore maintains an efficient MAb production rate. However, since ammonium production (byproduct formed by glutamine overflow) is not considered in the model obtained in Section 7, higher values of Gnrefresh are not recommended.  Interestingly, approximately the same renewal time as in Figure 2 is obtained, which means that these experiments could be "economically" optimized only by revising the medium composition.   Since MAb production clearly appears as a function of substrate saving penalization, new optimizations considering α in the range 0 to 500 with incremental steps of 50 are achieved in order to assess the impact of α on MAb production and select a good compromise. Results displayed in Figure 12 show that specific operating conditions can be chosen to reach a target MAb production. For instance, approximately 3 g/L of glucose are sufficient, with a renewal after 4 days, to harvest 75 µg/mL of MAb within 14 days. Moreover, operating conditions of Figure 2 seem to be a good economic compromise as 100 µg/mL can be harvested starting with a glucose concentration of 6 g/L and a renewal after 7 days. Concerning the glutamine concentration, the observations from Figure 10 are confirmed: since no glutamine overflow is considered, very high glutamine concentrations are unrealistically tolerated.
Since MAb production clearly appears as a function of substrate saving penalization, new optimizations considering α in the range 0 to 500 with incremental steps of 50 are achieved in order to assess the impact of α on MAb production and select a good compromise. Results displayed in Figure 12 show that specific operating conditions can be chosen to reach a target MAb production. For instance, approximately 3 g/L of glucose are sufficient, with a renewal after 4 days, to harvest 75 µ g/mL of MAb within 14 days. Moreover, operating conditions of Figure 2 seem to be a good economic compromise as 100 µ g/mL can be harvested starting with a glucose concentration of 6 g/L and a renewal after 7 days. Concerning the glutamine concentration, the observations from Figure 10 are confirmed: since no glutamine overflow is considered, very high glutamine concentrations are unrealistically tolerated.

Conclusions
In this work, a simple dynamic model of hybridoma sequential batch cultures is developed, which can be used to optimize the production of monoclonal antibodies.
Maximum likelihood principal component analysis allows assessing the information content of the experimental data, providing the minimum number of reactions and the corresponding stoichiometry by solving an optimization problem under a few a priori biological constraints. An original formulation of the method is presented, allowing biomass to occur in several reactions.
Advantages of the method are (a) to limit the number of reactions, i.e. to avoid a useless complication of the model with respect to the experimental field and the involved biological phenomena (activation, saturation, inhibition, etc.); (b) to offer the possibility to proceed to a quick first estimation of the stoichiometry independently of the kinetics, in turn reducing the number of unknown parameters (for the current model, stoichiometry represents half the parameter set); (c) to provide a "divide and conquer" approach where the stoichiometry and kinetics can be estimated separately or simultaneously, in an iterative way, starting from estimates obtained at the previous step.
The procedure can be supplemented by parametric sensitivity analysis, which allows further model simplification, whenever needed, by isolating parameters with low sensitivities.
A Monte-Carlo study, where parameter variations are considered in accordance with the resulting estimation error variances, shows that model trajectories are globally kept inside a corridor defined by measurement confidence intervals (i.e., parameter discrepancies do not cause critical model misevaluations).
As a practical illustrative outcome of the present study, the obtained dynamic model is used for a two-sequential batch process optimization (determination of the best sequence and composition of medium renewals). The results show that the importance of substrate savings drives the location of the optimum. A renewal time scheduling can therefore be established based on the user will to save medium components such as substrates. Further experimental validations of the optimization

Conclusions
In this work, a simple dynamic model of hybridoma sequential batch cultures is developed, which can be used to optimize the production of monoclonal antibodies.
Maximum likelihood principal component analysis allows assessing the information content of the experimental data, providing the minimum number of reactions and the corresponding stoichiometry by solving an optimization problem under a few a priori biological constraints. An original formulation of the method is presented, allowing biomass to occur in several reactions.
Advantages of the method are (a) to limit the number of reactions, i.e. to avoid a useless complication of the model with respect to the experimental field and the involved biological phenomena (activation, saturation, inhibition, etc.); (b) to offer the possibility to proceed to a quick first estimation of the stoichiometry independently of the kinetics, in turn reducing the number of unknown parameters (for the current model, stoichiometry represents half the parameter set); (c) to provide a "divide and conquer" approach where the stoichiometry and kinetics can be estimated separately or simultaneously, in an iterative way, starting from estimates obtained at the previous step.
The procedure can be supplemented by parametric sensitivity analysis, which allows further model simplification, whenever needed, by isolating parameters with low sensitivities.
A Monte-Carlo study, where parameter variations are considered in accordance with the resulting estimation error variances, shows that model trajectories are globally kept inside a corridor defined by measurement confidence intervals (i.e., parameter discrepancies do not cause critical model misevaluations).
As a practical illustrative outcome of the present study, the obtained dynamic model is used for a two-sequential batch process optimization (determination of the best sequence and composition of medium renewals). The results show that the importance of substrate savings drives the location of the optimum. A renewal time scheduling can therefore be established based on the user will to save medium components such as substrates. Further experimental validations of the optimization method are important perspectives of this still on-going work as well as estimation and on-line control issues.