Glycosylation flux analysis of immunoglobulin G in Chinese hamster ovary perfusion cell culture

SI Glycosylation flux analysis of immunoglobulin G in Chinese hamster ovary perfusion cell culture Sandro Hutter1,2,a, Moritz Wolf1,a, Nan Papili Gao1,2, Dario Lepori1, Thea Schweigler1, Massimo Morbidelli1 and Rudiyanto Gunawan1,2,3,* 1 Institute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich 8093, Switzerland 2 Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland 3 Department of Chemical and Biological Engineering, University at Buffalo, The State University of New York, Amherst, NY 14260, USA a Contributed equally * Correspondence: rudiyant@buffalo.edu


Introduction
Therapeutic recombinant monoclonal antibodies constitute the most important class of drugs in the biopharmaceutical industry, making up approximately half of the total revenue of biopharmaceutical products in 2013 [1].The production of mAb drugs typically employs batch or fed-batch cultivations of mammalian cells.The state-of-the-art batch/fed-batch cell cultures are able to meet the large production volume requirement of mAbs, with reactor sizes of up to 25,000 L [2].Nevertheless, the increasing number of mAb products entering different stages of clinical trials and the burgeoning market of biosimilars driven by impending patent expirations of blockbuster mAb drugs, give a strong motivation for the development of new production technology that is more robust and cost effective [3].The US Food and Drug Administration (FDA) initiative on quality by design and process analytical technology put further emphasis on implementing quantitative approaches for process improvements in biopharmaceutical manufacturing [4,5].
Continuous manufacturing technology offers an effective and flexible way for the large scale and robust production of drug compounds [6].In the biopharmaceutical industry, the application of continuous cell culture technology has thus far been limited to the production of unstable products that require constant recovery [7].Nevertheless, continuous perfusion cell cultures have previously been demonstrated to be capable of producing antibodies at a volumetric rate that matches or exceeds that of fed-batch cultures [8].In addition to the high productivity, the stable steady state operation mode in conjunction with the short residence time of perfusion cultures translates to a tight maintenance of product quality.Whether or not product qualities and their key controlling parameters can be directly translated from the traditional batch/fed-batch cell cultures to the perfusion cell cultivations is still unresolved.Past work on converting mAb production from batch/fed-batch to perfusion cell cultures gave conflicting reports on product qualities, where a few studies demonstrated consistent product qualities [9,10], and many others showed differences between the two production modes [11][12][13][14][15].
Among the key critical quality attributes (CQAs) of therapeutic mAbs is the glycan structures of the Fc domain [16].The N-linked glycosylation is a common post-translational modification of proteins, a process that occurs in the endoplasmic reticulum (ER) and Golgi apparatuses.The Fc glycan structure has been shown to impact protein folding [17], clearance [18], bioactivity [19] and efficacy [20], as well as immunogenicity [21].Moreover, there exists naturally-occurring heterogeneity in the glycosylation of mAbs.Thus, the FDA approval of mAb drugs is given for a particular composition of mAb glycoforms [5].Because of the importance of the N-linked glycosylation, the majority of recombinant mAb drugs are produced using mammalian host cells in order to achieve human-like glycan structures [22].In particular, Chinese Hamster Ovary (CHO) cells have become the major expression host for the biopharmaceutical production of therapeutic mAbs [23].
The N-linked glycosylation of mAbs has been shown to depend on the host genetic background [24], expression vector [25], media composition [16], media supplements [26], and bioprocess parameters [27].However, the mechanism of the above dependence remains to be established [28].Toward closing this gap, mathematical models of the glycosylation network have previously been developed [29][30][31][32][33][34][35][36].Many of these models have a large number of unknown and system-specific kinetic parameters that need to be fitted to experimental data [29][30][31][32][33].A number of parameter-free models have also been proposed to study the N-linked glycosylation process based on the constraint-based modeling approach (i.e., stoichiometric models) [34][35][36].Recently, we proposed a flux analysis method, called glycosylation flux analysis, which provides predictions of the rate or flux of intracellular glycosylation reactions using the stoichiometric model of the glycosylation network and the cell secretion rates of mAb glycoforms [36].Similar to the well-known metabolic flux analysis (MFA) [37], the GFA is based on the molar balance equations of glycoforms involved in the glycosylation reaction network under the pseudo steady state assumption.The GFA further makes use of the relatively small number of enzymes involved in the glycosylation process to reduce the degrees of freedom in the flux estimation.More specifically, we assumed that glycosylation fluxes vary with time according to a (global) cell-specific factor and a (local) enzyme-specific factor.When applied to study the IgG production of CHO cells in fed-batch cultivations, the GFA was able to give insights on how changes in the media affect the intracellular IgG glycosylation reactions [36].
The formulation of the flux estimation in the GFA involves a nonlinear least-square regression, which requires computationally intensive global optimizations.One contribution of this work is a reformulation of the flux estimation in the GFA into an iterative procedure involving two linear regression problems, which provides higher computational efficiency.In this study, we applied the improved GFA to analyze the N-linked glycosylation of IgG in perfusion CHO cell cultures undergoing setpoint changes in the viable cell densities and perfusion rates.In the post-analysis of the GFA results, we employed a random forest regression analysis to elucidate the key process parameters that affect intracellular IgG glycosylation fluxes.Finally, we compared the characteristics of IgG glycosylation between the traditional fed-batch and continuous perfusion cell cultures and identified similarities and differences.

Materials and Methods
Figure 1 illustrates the analysis of IgG glycosylation in our study.The perfusion cell culture experiments are summarized in Section 2.1.In the data preprocessing step, we computed the cell-specific secretion fluxes of IgG glycoforms, as well as other molecules such as glucose, lactate and ammonia, from cell culture measurements.The data preprocessing is described in Section 2.2.We applied the GFA to the secretion fluxes of IgG glycoforms calculated above using a constraint-based model of the glycosylation network.The GFA generates estimates of enzyme-specific glycosylation activities at different measurement time points.An improved iterative procedure for the GFA is detailed in Section 2.3.Finally, we performed a random forest regression analysis, as outlined in Section 2.4, to identify the process parameters that are able to explain the dynamic behavior of intracellular glycosylation activities. of IgG glycosylation between the traditional fed-batch and continuous perfusion cell cultures and identified similarities and differences.

Materials and Methods
Figure 1 illustrates the analysis of IgG glycosylation in our study.The perfusion cell culture experiments are summarized in Section 2.1.In the data preprocessing step, we computed the cellspecific secretion fluxes of IgG glycoforms, as well as other molecules such as glucose, lactate and ammonia, from cell culture measurements.The data preprocessing is described in Section 2.2.We applied the GFA to the secretion fluxes of IgG glycoforms calculated above using a constraint-based model of the glycosylation network.The GFA generates estimates of enzyme-specific glycosylation activities at different measurement time points.An improved iterative procedure for the GFA is detailed in Section 2.3.Finally, we performed a random forest regression analysis, as outlined in Section 2.4, to identify the process parameters that are able to explain the dynamic behavior of intracellular glycosylation activities.2), the cell-specific secretion fluxes of IgG glycoforms, as well as the cell-specific secretion/uptake rates of IgG, glucose, lactate and ammonia, were computed.(c) The glycosylation flux analysis (GFA) was applied to the secretion fluxes of IgG and its glycoforms (see Section 2.3).The GFA is based on a constraint-based model of the glycosylation network where the glycosylation fluxes vary with time according to two multiplicative factors: the enzyme-specific factor   () and the cell-specific factor ().The factor   () captures the dynamic changes in the enzyme-specific glycosylation activities, while the factor () describes the dynamic changes in the cell-specific IgG productivity.(d) A random forest regression analysis was carried out to rank the process parameters (p) based on their ability to predict the dynamic changes of   ().

Continuous Perfusion Cell Cultures
The detailed procedure of the perfusion cell culture is available elsewhere [38].Here, we provide a brief summary of the experiments.A proprietary CHO cell line expressing IgG1 was cultured using a previously developed 1.5 L perfusion cell culture setup [39], as depicted in Figure 2. In total, four perfusion cell culture experiments were performed with different viable cell density (VCD) and perfusion rate (PR) set-point profiles, as shown in Figure 3.Each of the experiments was inoculated following the same procedure.CHO cells were held back in the reactor by a cell retention device so that only cell free reaction mixture left the reactor through the harvest stream.The feed flowrate (F) ).Measurements of glucose/lactate/ammonia concentrations, viable cell density, IgG titer, glycoform fractions, bleed rates and harvest rates were taken from each cell culture run.(b) In the data preprocessing step (see Section 2.2), the cell-specific secretion fluxes of IgG glycoforms, as well as the cell-specific secretion/uptake rates of IgG, glucose, lactate and ammonia, were computed.(c) The glycosylation flux analysis (GFA) was applied to the secretion fluxes of IgG and its glycoforms (see Section 2.3).The GFA is based on a constraint-based model of the glycosylation network where the glycosylation fluxes vary with time according to two multiplicative factors: the enzyme-specific factor α J (t) and the cell-specific factor β(t).The factor α J (t) captures the dynamic changes in the enzyme-specific glycosylation activities, while the factor β(t) describes the dynamic changes in the cell-specific IgG productivity.(d) A random forest regression analysis was carried out to rank the process parameters (p) based on their ability to predict the dynamic changes of α J (t).

Continuous Perfusion Cell Cultures
The detailed procedure of the perfusion cell culture is available elsewhere [38].Here, we provide a brief summary of the experiments.A proprietary CHO cell line expressing IgG1 was cultured using a previously developed 1.5 L perfusion cell culture setup [39], as depicted in Figure 2. In total, four perfusion cell culture experiments were performed with different viable cell density (VCD) and perfusion rate (PR) set-point profiles, as shown in Figure 3.Each of the experiments was inoculated following the same procedure.CHO cells were held back in the reactor by a cell retention device so that only cell free reaction mixture left the reactor through the harvest stream.The feed flowrate (F) of fresh media into the reactor and the bleed (B) and harvest (H) flowrates out of the reactor were balanced to keep the reactor volume constant.The perfusion rate (PR) is given by the flowrate through the reactor, as follows: The perfusion rate represents the rate of fresh media supplied to the cells.We define the cell-specific perfusion rate (CSPR) as the rate of fresh media per cell: During the experiments, the following measurements were collected on a daily basis: cell counts and cell viability, concentrations of glucose, lactate and ammonia, IgG titer, and protein glycan distribution.
of fresh media into the reactor and the bleed (B) and harvest (H) flowrates out of the reactor were balanced to keep the reactor volume constant.The perfusion rate (PR) is given by the flowrate through the reactor, as follows: The perfusion rate represents the rate of fresh media supplied to the cells.We define the cellspecific perfusion rate (CSPR) as the rate of fresh media per cell: During the experiments, the following measurements were collected on a daily basis: cell counts and cell viability, concentrations of glucose, lactate and ammonia, IgG titer, and protein glycan distribution.[39] with the permission from the authors).CHO cells were cultivated in suspension in a continuous stirred tank reactor with continuous feeding of fresh nutrients.Cell-free spent media was constantly collected in the harvest stream, while cells remained in the stirred tank reactor.A bleed stream removed a small fraction of the reactor mixture, including biomass, which was used to regulate viable cell density.[39] with the permission from the authors).CHO cells were cultivated in suspension in a continuous stirred tank reactor with continuous feeding of fresh nutrients.Cell-free spent media was constantly collected in the harvest stream, while cells remained in the stirred tank reactor.A bleed stream removed a small fraction of the reactor mixture, including biomass, which was used to regulate viable cell density.
of fresh media into the reactor and the bleed (B) and harvest (H) flowrates out of the reactor were balanced to keep the reactor volume constant.The perfusion rate (PR) is given by the flowrate through the reactor, as follows: The perfusion rate represents the rate of fresh media supplied to the cells.We define the cellspecific perfusion rate (CSPR) as the rate of fresh media per cell: During the experiments, the following measurements were collected on a daily basis: cell counts and cell viability, concentrations of glucose, lactate and ammonia, IgG titer, and protein glycan distribution.[39] with the permission from the authors).CHO cells were cultivated in suspension in a continuous stirred tank reactor with continuous feeding of fresh nutrients.Cell-free spent media was constantly collected in the harvest stream, while cells remained in the stirred tank reactor.A bleed stream removed a small fraction of the reactor mixture, including biomass, which was used to regulate viable cell density.

Estimation of Secretion and Uptake Fluxes
As inputs to the GFA, the cell-specific secretion flux of each IgG glycoform (v E,i ) was determined based on the molar balance in the reactor, as follows where c E,i denotes the concentration of the i-th IgG glycoform and V denotes the reactor volume.Since the reactor volume is kept (approximately) constant, Equation ( 3) can be rearranged to give: The variables h(t) = H(t) V and b(t) = B(t) V represent the specific harvest and bleed rate, respectively.The concentration of the IgG glycoform was calculated as the product of the measured glycoform fraction (f i ) and the IgG titer (T) according to: where t k denotes the k-th measurement time point.
For the estimation of the secretion fluxes in Equation ( 4), the computed concentration of IgG glycoforms c E,i (t k ) was first smoothened (as a function of time) using spline fitting.The time derivative dc E,i (t) dt was then evaluated by taking the first order derivative of the spline curve.Equation ( 4) can be used to compute the cell-specific uptake/secretion fluxes of IgG, glucose, lactate and ammonia by replacing c E,i (t) with the concentration measurements of IgG, glucose, lactate and ammonia, respectively.

Glycosylation Flux Analysis
The glycosylation flux analysis is based on a constraint-based model of the protein glycosylation reaction network.Like other flux analysis methods, the GFA is developed for evaluating the intracellular (glycosylation) fluxes using the cellular (mAb glycoforms) secretion rates.In the GFA, a pseudo steady state assumption is taken to derive the stoichiometric model of the glycosylation network, as follows: where c I denotes the vector of m intracellular IgG glycoform concentrations, v I denotes the vector of n intracellular IgG glycosylation reaction fluxes (rates), v E denotes the vector of secretion fluxes of IgG glycoforms estimated above, and S denotes the m × n stoichiometric matrix.The (i,j)-th element of S gives the number of the i-th glycoform molecule produced (if positive) or consumed (if negative) by the j-th glycosylation reaction.Since the number of reaction fluxes (i.e., the number of unknowns) typically exceeds that of glycoforms (i.e., the number of equations), the estimation of v I from v E in Equation ( 6) is underdetermined.In other words, there exist many v I for the same experimentally determined v E .In the method Flux Balance Analysis, the most plausible v I is set to the vector that maximizes a cellular objective, such as the biomass production [37].However, the appropriate cellular objective to use for glycosylation networks is not immediately obvious.Instead of assuming a cellular objective, the GFA enforces constraints on how groups of glycosylation fluxes vary together.More specifically, in the GFA each glycosylation flux v I,j (t) is computed as the product of a reference flux value v I,j ref , an enzyme-specific factor α J (t) and a cell-specific factor β(t), as follows: v I,j (t) = α J (t)β(t)v I,j ref (7) The variables α J (t) and β(t) represent the fold-change amplification or attenuation and can therefore be normalized to 1 at a chosen reference time point t ref .
The cell-specific factor β(t) captures the (global) influence of the cell metabolism on the glycosylation process, more specifically the total amount of mAb entering/leaving the glycosylation network.For this reason, β(t) is represented by the ratio of the cell-specific productivity (q mAb ) between time t and the reference time t ref , as follows: On the other hand, the factor α J (t) describes the (local) influence of enzymatic processing capacity, which captures the dependence of glycosylation on factors such as enzyme expression and activity, as well as co-factor and nucleotide sugar availability.Note that the number of enzymes involved in the glycosylation network is typically much smaller than the number of reactions, as an enzyme catalyzes multiple reactions.Thus, the estimation of v I (t) can be reformulated to fitting α J (t) and v I,j ref to the secretion fluxes, as follows: min For a more detailed derivation of the GFA, we refer interested readers to the original publication [36].
The formulation in Equation ( 9) is a nonlinear programming problem that requires a global optimization algorithm to solve.In the following, we describe an alternative and more computationally efficient procedure for solving the regression problem in the GFA.The procedure is based on decomposing the nonlinear least square regression above into two linear regression problems.First, for a given reference flux vector v I ref , one can formulate the following linear regression problem to obtain the least square estimate of α J : where Ψ I is an n × e matrix with e being the number of enzymes involved in the glycosylation network, and n J l is the number of fluxes catalyzed by the enzyme J l .Ψ I is constructed by grouping the reference fluxes (v ref I,j ) according the enzyme that catalyzes the reactions and stacking each group (block-)diagonally.In this manner, each v ref I,j is multiplied by the corresponding enzyme-specific factor (α J ).In the second linear regression, given α J (t k ), one obtains v I,j ref by solving for the least square estimates of the following problem: where Ω is an n × n diagonal matrix with the enzyme-specific factor (α J ) as its diagonal elements.
Given the two linear regressions in Equations ( 10) and ( 11), we estimated α J and v I ref following an iterative procedure as follows: Processes 2018, 6, 176 7 of 15 (1) generate a uniformly distributed random vector of v I ref within a biologically feasible range 25]; unit : pg/cell/day), (2) given v I ref from step (1), solve for or update α J using Equation ( 10), (3) given α J from step (2), solve for or update v I ref using Equation (11), and (4) repeat steps ( 2) and (3) until the change of Φ as described in Equation ( 9) becomes smaller than a threshold (default 10 −10 ).
Equations (10) and (11) were solved following the iterative procedure above using the MATLAB subroutine lsqlin constraining α J to values between zero and 20 and v I ref to be positive.In order to improve the chance of obtaining the global minimum of Φ, we adopted a multi-start strategy and ran the aforementioned iterative estimation for multiple random initial vectors of v I ref .
Among the results of the multi-start runs, we took the best α J and v I ref values that correspond to the lowest Φ value.Note that the multi-start strategy is embarrassingly parallel and can be implemented on a multiprocessor computing platform.

Random Forest for Regression
For identifying explanatory variables of the dynamical changes in the galatosyltransferase activity α J,GalT , we formulated a regression problem using the change in the GalT enzyme-specific factors over time dα J,GalT /dt as the response variable, and the experimental parameters of perfusion cell cultures as predictor variables.More specifically, we are interested the following regression problem: which describes the dynamic change in α J,GalT using the function g t, α J,GalT , p that depends on time, α J,GalT , and process parameters p.Note that the function g t, α J,GalT , p is likely nonlinear in nature.
Here, we employed Random Forest (RF) [40] to build the above regression model using data from all four perfusion cell culture experiments.RF regression involves building an ensemble of unpruned regression trees, in which each regression tree is created using a bootstrap sample of the original dataset.At each node of a tree, a subset of predictors is selected randomly to determine the best decision split of the samples.The final prediction of the regression trees in RF is obtained by averaging the predictions of the entire ensemble.Notably, RF regression is able to capture nonlinear dependencies of the response variable on the predictors.
In our work, we applied RF using normalized data of each variable, in which the data were centered and divided by the standard deviation.We created a RF regression model using an ensemble of 100 trees and employed one third of the total predictors for the decision split at each node.Predictor variables (features) were subsequently ranked based on their average impurity gain over all splits and all trees.Predictor variables with higher impurity gains contribute more to the variability in the prediction of the response variable, and hence are considered more important.

Perfusion Cell Culture Experiments
We performed four perfusion cell cultures (Experiment A, B, C and D) using different VCD and PR set-point profiles, as illustrated in Figure 3.In Experiments A and B, we kept the VCD set-point constant for the entire duration of the cell cultures but shifted the PR set-point between day nine and ten (down in Experiment A and up in Experiment B).In Experiment C, we changed the VCD and PR set-points in a manner that maintained the CSPR at the same value.Finally, in Experiment D, we varied the VCD set-point while leaving the PR set-point constant.As shown in Figure 3, the VCD and PR followed the set-points very well with only minor deviations (also see Table S1).As the control of VCD was done only by bleeding (i.e., removal of cells), the VCD unsurprisingly tracked a decrease in the set-points better than an increase.Supplementary Figures S1-S4 give a summary of the cell culture parameters, including VCD, PR, CSPR, the concentrations of glucose (Glc), lactose (Lac), ammonia (Amm) and IgG, and the cell-specific growth rate (µ), for all experiments.As shown in Figure 4, the cell-specific productivities of IgG, i.e., the secretion rate of IgG divided by the VCD, in all experiments follows a decreasing trend over the course of the cell cultivation.Correspondingly, the experimental secretion rates of the IgG glycoforms computed in the data pre-processing decrease with time (see Figure 5 and Supplementary Figure S5). the experimental secretion rates of the IgG glycoforms computed in the data pre-processing decrease with time (see Figure 5 and Supplementary Figure S5).Processes 2018, 6, x FOR PEER REVIEW 8 of 15 the experimental secretion rates of the IgG glycoforms computed in the data pre-processing decrease with time (see Figure 5 and Supplementary Figure S5).The lines show the secretion fluxes from the fitting of α J in the GFA, as outlined in Section 2.3.

Glycosylation Flux Analysis
For the GFA, we employed the IgG glycosylation reaction network depicted in Figure 6, which consists of 19 IgG glycoforms and 25 IgG glycosylation reactions.The glycosylation reaction network was based on a previously published network [41], where we omitted glycosylation reactions and molecules corresponding to IgG glycoforms that are not detected in our experiments and do not participate as intermediate species.Since each perfusion cell culture was started in the same manner, we used the same reference IgG glycosylation flux vector v I ref in the GFA of all experiments, more specifically using day one of experiment A as the reference time sample point (i.e., v I ref of day one in Experimental A is set to the vector of 1 s). Figure 5 shows the fitting of IgG glycoform secretion fluxes in the GFA for the major glycoforms (see Supplementary Figure S5 for the rest of IgG glycoforms).

Glycosylation Flux Analysis
For the GFA, we employed the IgG glycosylation reaction network depicted in Figure 6, which consists of 19 IgG glycoforms and 25 IgG glycosylation reactions.The glycosylation reaction network was based on a previously published network [41], where we omitted glycosylation reactions and molecules corresponding to IgG glycoforms that are not detected in our experiments and do not participate as intermediate species.Since each perfusion cell culture was started in the same manner, we used the same reference IgG glycosylation flux vector  I ref in the GFA of all experiments, more specifically using day one of experiment A as the reference time sample point (i.e.,  I ref of day one in Experimental A is set to the vector of 1 s). Figure 5 shows the fitting of IgG glycoform secretion fluxes in the GFA for the major glycoforms (see Supplementary Figure S5 for the rest of IgG glycoforms).The glycan labels are provided in Supplementary Table S2.
We compared the computational speed of the iterative GFA in analyzing the perfusion cell culture datasets with that of the original implementation.On a test computer (3.33 GHz Intel Xeon W3680, 18 GB RAM), the iterative GFA converged within 5 min using 250 random starting points.Of note, among the 250 random starts, 40% converged to the same minimum Φ value.Meanwhile, the original implementation of GFA using the global optimization toolbox MEIGO [42] required at least 42 min to converge.However, among ten repeated independent runs of MEIGO, only one converged to the same minimum Φ value as the iterative GFA, while the other runs converged to higher Φ values.
Figure 7 gives the time profiles of the enzyme-specific factors   () for each of the enzymes in the IgG glycosylation network.Most of the enzyme-specific factors, particularly those of α-Mannosidase I and II (Man I/II), N-Acetylglucosaminyltransferase I and II (GnT I/II) and Fucosyltransferase (FucT), maintain a constant activity level throughout the cell cultivation (i.e.,   () ≈ 1 ).Meanwhile, the galactosyltransferase (GalT)-specific factor decreases during the beginning of the four experiments and varies with changes in the VCD and PR set-points.Since the fluxes catalyzed by sialyltransferase (SiaT) are close to zero, the estimate of the corresponding   () becomes unreliable due to high sensitivity to experimental noise, and thus is omitted from further analysis.S2.
We compared the computational speed of the iterative GFA in analyzing the perfusion cell culture datasets with that of the original implementation.On a test computer (3.33 GHz Intel Xeon W3680, 18 GB RAM), the iterative GFA converged within 5 min using 250 random starting points.Of note, among the 250 random starts, 40% converged to the same minimum Φ value.Meanwhile, the original implementation of GFA using the global optimization toolbox MEIGO [42] required at least 42 min to converge.However, among ten repeated independent runs of MEIGO, only one converged to the same minimum Φ value as the iterative GFA, while the other runs converged to higher Φ values.
Figure 7 gives the time profiles of the enzyme-specific factors α J (t) for each of the enzymes in the IgG glycosylation network.Most of the enzyme-specific factors, particularly those of α-Mannosidase I and II (Man I/II), N-Acetylglucosaminyltransferase I and II (GnT I/II) and Fucosyltransferase (FucT), maintain a constant activity level throughout the cell cultivation (i.e., α J (t) ≈ 1).Meanwhile, the galactosyltransferase (GalT)-specific factor decreases during the beginning of the four experiments and varies with changes in the VCD and PR set-points.Since the fluxes catalyzed by sialyltransferase (SiaT) are close to zero, the estimate of the corresponding α J (t) becomes unreliable due to high sensitivity to experimental noise, and thus is omitted from further analysis.

Effects of Process Parameters on Glycosylation
As mentioned above, the enzyme-specific factor of IgG galactosylation fluxes displays the most dynamical change during the perfusion cell culture.However, the relationship between the changes in the enzyme-specific factor   () and the other process parameters is difficult to discern by a simple observation of the experimental data.For this reason, we employed a random forest regression analysis using the change of galactosyltransferase-specific factor over time d  () d as the response variable and using fourteen process parameters as the predictor variables (see Section 2.4).A RF regression model should be able to capture any nonlinear dependencies between the response and predictor variables.Here, we considered the following predictors:   (), VCD, PR, B, CSPR, time and the concentrations of IgG, Glc, Lac and Amm, and the specific productivities of IgG, Glc, Lac and Amm (i.e., qIgG, qGlc, qLac and qAmm, respectively).Furthermore, we excluded data from the startup period of the cell culture (i.e., days one to three of each experiment), as we were more interested in the regulation of IgG glycosylation during the steady state operations and setpoint changes of the perfusion cell culture.Finally, we ranked the predictor variables in decreasing magnitudes of the impurity gains.A higher impurity gain points to a predictor variable with higher importance in explaining the response variable.
Figure 8 gives the ranking of the predictor variables in decreasing impurity gains.The specific productivity of IgG (qIgG) and the concentration of ammonia (Amm) are the two most important predictors of the dynamical changes in the galactosyltransferase specific activity.Indeed, when we repeated the RF regression using only qIgG and Amm as the predictor variables, we observed a similar quality of data fitting to the response variables (see Supplementary Figure S6).Kolmogorov-Smirnov (KS) test and Wilcoxon rank sum test further confirmed that the residuals of the RF regression models using all 14 predictors and those using only qIgG and Amm, are not statistically different (KS test pvalue = 0.857; Wilcoxon rank sum test p-value = 0.824).Following qIgG and Amm in the ranking are the glucose uptake rate (qGLC) and bleed rate (B), indicating that the specific growth rate has a moderate contribution to the changes in the IgG galactosylation activity.The influence of growth rate

Effects of Process Parameters on Glycosylation
As mentioned above, the enzyme-specific factor of IgG galactosylation fluxes displays the most dynamical change during the perfusion cell culture.However, the relationship between the changes in the enzyme-specific factor α GalT (t) and the other process parameters is difficult to discern by a simple observation of the experimental data.For this reason, we employed a random forest regression analysis using the change of galactosyltransferase-specific factor over time dα GalT (t) dt as the response variable and using fourteen process parameters as the predictor variables (see Section 2.4).A RF regression model should be able to capture any nonlinear dependencies between the response and predictor variables.Here, we considered the following predictors: α GalT (t), VCD, PR, B, CSPR, time and the concentrations of IgG, Glc, Lac and Amm, and the specific productivities of IgG, Glc, Lac and Amm (i.e., q IgG , q Glc , q Lac and q Amm , respectively).Furthermore, we excluded data from the startup period of the cell culture (i.e., days one to three of each experiment), as we were more interested in the regulation of IgG glycosylation during the steady state operations and setpoint changes of the perfusion cell culture.Finally, we ranked the predictor variables in decreasing magnitudes of the impurity gains.A higher impurity gain points to a predictor variable with higher importance in explaining the response variable.
Figure 8 gives the ranking of the predictor variables in decreasing impurity gains.The specific productivity of IgG (q IgG ) and the concentration of ammonia (Amm) are the two most important predictors of the dynamical changes in the galactosyltransferase specific activity.Indeed, when we repeated the RF regression using only q IgG and Amm as the predictor variables, we observed a similar quality of data fitting to the response variables (see Supplementary Figure S6).Kolmogorov-Smirnov (KS) test and Wilcoxon rank sum test further confirmed that the residuals of the RF regression models using all 14 predictors and those using only q IgG and Amm, are not statistically different (KS test p-value = 0.857; Wilcoxon rank sum test p-value = 0.824).Following q IgG and Amm in the ranking are the glucose uptake rate (q GLC ) and bleed rate (B), indicating that the specific growth rate has a moderate contribution to the changes in the IgG galactosylation activity.The influence of growth rate on IgG galactosylation is somewhat expected, as the nucleotide sugar UDP-Galactose is used for different forms of cellular glycosylation, not solely for IgG [43].
Processes 2018, 6, x FOR PEER REVIEW 11 of 15 on IgG galactosylation is somewhat expected, as the nucleotide sugar UDP-Galactose is used for different forms of cellular glycosylation, not solely for IgG [43].

Discussion
In this work, we improved the computational efficiency of the GFA and applied the GFA to analyze the IgG glycosylation of four perfusion CHO cell culture experiments.The GFA relies on a stoichiometric model of the IgG glycosylation network to estimate IgG intracellular glycosylation reaction fluxes from the secretion fluxes of IgG glycoforms [36].The GFA assumes that the intracellular IgG glycosylation fluxes vary with time according to two multiplicative factors:   and   .The enzyme-specific factor   represents the relative change of the fraction of IgG that can processed by a specific enzyme, while the cell-specific factor   describes the relative change of the specific productivity of IgG.In particular,   captures dynamical alterations in the intracellular IgG glycosylation fluxes that cannot be explained by increases or decreases in the amount of IgG entering the glycosylation network.
We caution against making an inference on the intracellular IgG glycosylation fluxes directly from the cell culture data-i.e., without using the appropriate mass and flux balances.For example, a larger fraction of an IgG glycoform could result from a higher secretion rate of the specific IgG glycoform or from lower secretion rates of the other glycoforms.Furthermore, as noted in Section 2.3, the flux balance around the cells in Equation ( 6) is underdetermined, implying that multiple solutions there exist for the intracellular IgG glycosylation fluxes for the same observed secretion fluxes.The GFA gets around such an issue by assuming that reactions catalyzed by the same enzyme vary with time commensurately.As demonstrated in the study, by quantifying dynamic intracellular glycosylation activities in an enzyme-specific manner, the GFA enables the identification of key process parameters that are indicative of and potentially contribute to the changes in the intracellular IgG glycosylation process.
In contrast to other approaches that rely on kinetic modeling of the glycosylation network, the GFA does not require specifying a large number of kinetic parameters, whose values are uncertain and likely system specific.However, the GFA requires time-series data for computing the secretion fluxes of different glycoforms, which include viable cell density, IgG titer and glycoform fractions.Such data may not always be available during the production of IgG.The GFA is also unable to provide mechanistic insights for the dynamic changes in the enzyme-specific factors   , at least not directly.In this study, we performed a post-analysis using a random forest regression to identify the key process parameters that are able to explain the variability in   ().

Discussion
In this work, we improved the computational efficiency of the GFA and applied the GFA to analyze the IgG glycosylation of four perfusion CHO cell culture experiments.The GFA relies on a stoichiometric model of the IgG glycosylation network to estimate IgG intracellular glycosylation reaction fluxes from the secretion fluxes of IgG glycoforms [36].The GFA assumes that the intracellular IgG glycosylation fluxes vary with time according to two multiplicative factors: α J and β J .The enzyme-specific factor α J represents the relative change of the fraction of IgG that can processed by a specific enzyme, while the cell-specific factor β J describes the relative change of the specific productivity of IgG.In particular, α J captures dynamical alterations in the intracellular IgG glycosylation fluxes that cannot be explained by increases or decreases in the amount of IgG entering the glycosylation network.
We caution against making an inference on the intracellular IgG glycosylation fluxes directly from the cell culture data-i.e., without using the appropriate mass and flux balances.For example, a larger fraction of an IgG glycoform could result from a higher secretion rate of the specific IgG glycoform or from lower secretion rates of the other glycoforms.Furthermore, as noted in Section 2.3, the flux balance around the cells in Equation ( 6) is underdetermined, implying that multiple solutions there exist for the intracellular IgG glycosylation fluxes for the same observed secretion fluxes.The GFA gets around such an issue by assuming that reactions catalyzed by the same enzyme vary with time commensurately.As demonstrated in the study, by quantifying dynamic intracellular glycosylation activities in an enzyme-specific manner, the GFA enables the identification of key process parameters that are indicative of and potentially contribute to the changes in the intracellular IgG glycosylation process.
In contrast to other approaches that rely on kinetic modeling of the glycosylation network, the GFA does not require specifying a large number of kinetic parameters, whose values are uncertain and likely system specific.However, the GFA requires time-series data for computing the secretion fluxes of different glycoforms, which include viable cell density, IgG titer and glycoform fractions.Such data may not always be available during the production of IgG.The GFA is also unable to provide mechanistic insights for the dynamic changes in the enzyme-specific factors α J , at least not directly.In this study, we performed a post-analysis using a random forest regression to identify the key process parameters that are able to explain the variability in α J (t).
The four perfusion cell cultures in this study differed among each other in the VCD and PR set-point profiles, which were designed with the goal of understanding how IgG productivity and glycosylation vary with the set-points and process variables during steady-state operations.The cell culture conditions (i.e., media composition, seeding density, pH, temperature) of the four experiments were kept being as similar as possible, so that the effect of changes in operating set-points VCD and PR on the glycosylation process could be examined.
In our previous study using the same CHO cells, we reported that the cell-specific productivity of IgG in fed-batch cultivations increases with time, and that such an increase is associated with lower α J s for the upstream enzymes ManI/II, GnTI/II and FucT [36].We attributed the decreasing α J s to the inability of these enzymes to process the increasing amount of IgG entering the intracellular glycosylation network, beyond the processing capacity of each enzyme.In contrast to fed-batch cultivations, the cell-specific productivity of IgG generally decreases over the course of the perfusion cell culture, as shown in Figure 4.The GFA shows that the enzyme-specific factors of the aforementioned upstream enzymes stay approximately constant at roughly 1 throughout the experiments.With α J staying near 1, the IgG intracellular glycosylation fluxes associated with these enzymes thus vary proportionally with the cell-specific productivity β(t).This trend is not surprising, considering that the amount of IgG that needs to be processed decreases over time.
The enzyme-specific factor with the most dynamical changes in the perfusion cell cultures corresponds to galactosyltransferase (GalT), which parallels that in the fed-batch cultivations.However, unlike the reduction in the GalT-specific factor α GalT (t) over time during the fed-batch cultivations [36], α GalT (t) dynamics in perfusion cell cultures showed both up and downward trends during the different steady-state operation periods, as shown in Figure 7.Because of the rich dynamics of α GalT (t) in the perfusion cell culture, the relationship among the GalT-specific factor α GalT (t) and process variables was not immediately obvious.By applying a random forest regression analysis, we identified the specific productivity of IgG and the concentration of ammonia as the two variables with the highest importance in explaining the dynamic changes of IgG galactosylation (see Figure 8).The accumulation of ammonia and its inhibitory activity on GalT have previously been reported as one of the main reasons for decreasing IgG galactosylation in fed-batch cultivations [36,44].Unlike ammonia, the connection between the IgG specific productivity and galactosylation is not immediately clear.The partial correlation between dα GalT (t) dt and q IgG is small and negative (partial correlation = −0.037),suggesting that the relationship between the two variables as revealed by RF analysis is likely to be nonlinear.The negative partial correlation further implies that keeping all other process parameters the same, a higher q IgG is concomitant with decreasing α GalT with time.Such a trend is in general agreement with how α J of the upstream enzymes vary with the cell-specific productivity of IgG as explained earlier.
Taken together, the key process variables that are associated with IgG galactosylation in our perfusion cell cultures resemble those in the fed-batch cultivations of the same CHO cells.Note that the aforementioned similarity between fed-batch and perfusion cell cultures cannot be concluded by a direct observation of the process data and is only apparent through the combination of the GFA and random forest regression analysis.Such similarity suggests the possibility of using extensive data and knowledge from traditional batch/fed-batch production to inform or predict the IgG glycosylation in continuous perfusion cell cultures, which is a topic of interest in our research groups.In a separate publication, we noted comparable IgG glycoform patterns between our perfusion cell culture process at steady state and the fed-batch cultivation beyond the stationary growth phase [15].In the literature, past studies produced contradictory conclusions on the IgG glycosylation when comparing the two cell culture modes, where some studies demonstrated similarities [9,10] and others showed discrepancies in the IgG glycan structures [13,14].A recent report of a head-to-head comparison between perfusion and fed-batch cell cultures noted a strong similarity in the IgG glycosylation dynamics [45], an observation that aligns well with the result of our analysis.

Figure 1 .
Figure 1.Analysis procedure of IgG glycosylation in CHO perfusion cell cultures.(a) Four perfusion cell cultures were performed (see Section 2.1).Measurements of glucose/lactate/ammonia concentrations, viable cell density, IgG titer, glycoform fractions, bleed rates and harvest rates were taken from each cell culture run.(b) In the data preprocessing step (see Section 2.2), the cell-specific secretion fluxes of IgG glycoforms, as well as the cell-specific secretion/uptake rates of IgG, glucose, lactate and ammonia, were computed.(c) The glycosylation flux analysis (GFA) was applied to the secretion fluxes of IgG and its glycoforms (see Section 2.3).The GFA is based on a constraint-based model of the glycosylation network where the glycosylation fluxes vary with time according to two multiplicative factors: the enzyme-specific factor   () and the cell-specific factor ().The factor   () captures the dynamic changes in the enzyme-specific glycosylation activities, while the factor () describes the dynamic changes in the cell-specific IgG productivity.(d) A random forest regression analysis was carried out to rank the process parameters (p) based on their ability to predict the dynamic changes of   ().

Figure 1 .
Figure 1.Analysis procedure of IgG glycosylation in CHO perfusion cell cultures.(a) Four perfusion cell cultures were performed (see Section 2.1).Measurements of glucose/lactate/ammonia concentrations, viable cell density, IgG titer, glycoform fractions, bleed rates and harvest rates were taken from each cell culture run.(b) In the data preprocessing step (see Section 2.2), the cell-specific secretion fluxes of IgG glycoforms, as well as the cell-specific secretion/uptake rates of IgG, glucose, lactate and ammonia, were computed.(c) The glycosylation flux analysis (GFA) was applied to the secretion fluxes of IgG and its glycoforms (see Section 2.3).The GFA is based on a constraint-based model of the glycosylation network where the glycosylation fluxes vary with time according to two multiplicative factors: the enzyme-specific factor α J (t) and the cell-specific factor β(t).The factor α J (t) captures the dynamic changes in the enzyme-specific glycosylation activities, while the factor β(t) describes the dynamic changes in the cell-specific IgG productivity.(d) A random forest regression analysis was carried out to rank the process parameters (p) based on their ability to predict the dynamic changes of α J (t).

Figure 2 .
Figure 2. A schematic of perfusion cell culture reactor (Figure adapted from[39] with the permission from the authors).CHO cells were cultivated in suspension in a continuous stirred tank reactor with continuous feeding of fresh nutrients.Cell-free spent media was constantly collected in the harvest stream, while cells remained in the stirred tank reactor.A bleed stream removed a small fraction of the reactor mixture, including biomass, which was used to regulate viable cell density.

Figure 3 .
Figure 3. Perfusion cell culture experiments.Four perfusion cell culture experiments, labeled (A), (B), (C) and (D), were conducted with varying VCD and PR given set-points (solid lines).The experimental VCD and PR are shown as blue filled squares and red empty triangles, respectively.

Figure 2 .
Figure 2. A schematic of perfusion cell culture reactor (Figure adapted from[39] with the permission from the authors).CHO cells were cultivated in suspension in a continuous stirred tank reactor with continuous feeding of fresh nutrients.Cell-free spent media was constantly collected in the harvest stream, while cells remained in the stirred tank reactor.A bleed stream removed a small fraction of the reactor mixture, including biomass, which was used to regulate viable cell density.

Figure 2 .
Figure 2. A schematic of perfusion cell culture reactor (Figure adapted from[39] with the permission from the authors).CHO cells were cultivated in suspension in a continuous stirred tank reactor with continuous feeding of fresh nutrients.Cell-free spent media was constantly collected in the harvest stream, while cells remained in the stirred tank reactor.A bleed stream removed a small fraction of the reactor mixture, including biomass, which was used to regulate viable cell density.

Figure 3 .
Figure 3. Perfusion cell culture experiments.Four perfusion cell culture experiments, labeled (A), (B), (C) and (D), were conducted with varying VCD and PR given set-points (solid lines).The experimental VCD and PR are shown as blue filled squares and red empty triangles, respectively.

Figure 3 .
Figure 3. Perfusion cell culture experiments.Four perfusion cell culture experiments, labeled (A), (B), (C) and (D), were conducted with varying VCD and PR given set-points (solid lines).The experimental VCD and PR are shown as blue filled squares and red empty triangles, respectively.

Figure 4 .
Figure 4. Cell-specific productivity.The cell-specific productivity generally decreases over the course of the four perfusion cell culture experiments (A), (B), (C) and (D).

Figure 5 .
Figure 5. Secretion fluxes of the main IgG glycoforms.The solid symbols show the experimental secretion fluxes computed in the data preprocessing step, as outlined in Section 2.2 (Experiment A: black squares, Experiment B: blue circles, Experiment C: green triangle, Experiment D: red diamonds).The lines show the secretion fluxes from the fitting of   in the GFA, as outlined in Section 2.3.

Figure 4 .
Figure 4. Cell-specific productivity.The cell-specific productivity generally decreases over the course of the four perfusion cell culture experiments (A), (B), (C) and (D).

Figure 4 .
Figure 4. Cell-specific productivity.The cell-specific productivity generally decreases over the course of the four perfusion cell culture experiments (A), (B), (C) and (D).

Figure 5 .
Figure 5. Secretion fluxes of the main IgG glycoforms.The solid symbols show the experimental secretion fluxes computed in the data preprocessing step, as outlined in Section 2.2 (Experiment A: black squares, Experiment B: blue circles, Experiment C: green triangle, Experiment D: red diamonds).The lines show the secretion fluxes from the fitting of   in the GFA, as outlined in Section 2.3.

Figure 5 .
Figure 5. Secretion fluxes of the main IgG glycoforms.The solid symbols show the experimental secretion fluxes computed in the data preprocessing step, as outlined in Section 2.2 (Experiment A: black squares, Experiment B: blue circles, Experiment C: green triangle, Experiment D: red diamonds).The lines show the secretion fluxes from the fitting of α J in the GFA, as outlined in Section 2.3.

Figure 6 .
Figure 6.Glycosylation network for the GFA of immunoglobulin G in CHO-S.The enzyme names are abbreviated as follows: α-Mannosidase I and II (Man I/II), N-Acetylglucosaminyltransferase I and II (GnT I/II) and Fucosyltransferase (FucT), Galactosyltransferase (GalT) and Sialyltransferase (SiaT).The glycan labels are provided in Supplementary TableS2.

Figure 6 .
Figure 6.Glycosylation network for the GFA of immunoglobulin G in CHO-S.The enzyme names are abbreviated as follows: α-Mannosidase I and II (Man I/II), N-Acetylglucosaminyltransferase I and II (GnT I/II) and Fucosyltransferase (FucT), Galactosyltransferase (GalT) and Sialyltransferase (SiaT).The glycan labels are provided in Supplementary TableS2.

Figure 7 .
Figure 7. Predicted enzyme-specific factors.The activity of fluxes catalyzed by Man I (black), Man II (grey), GnT I (red), GnT II (orange) and FucT (green) remain relatively constant in all experiments.However, the fluxes catalyzed by GalT (purple) shows significant variation over the course of the four perfusion cell culture experiments (A), (B), (C) and (D).

Figure 7 .
Figure 7. Predicted enzyme-specific factors.The activity of fluxes catalyzed by Man I (black), Man II (grey), GnT I (red), GnT II (orange) and FucT (green) remain relatively constant in all experiments.However, the fluxes catalyzed by GalT (purple) shows significant variation over the course of the four perfusion cell culture experiments (A), (B), (C) and (D).

Figure 8 .
Figure 8. Ranking of predictors based on importance.The predictors are sorted in decreasing impurity gains as measured by the improvement in the split criterion.

Figure 8 .
Figure 8. Ranking of predictors based on importance.The predictors are sorted in decreasing impurity gains as measured by the improvement in the split criterion.