Hybrid Modeling for Simultaneous Prediction of Flux, Rejection Factor and Concentration in Two-Component Crossﬂow Ultraﬁltration

: Ultraﬁltration is a powerful method used in virtually every pharmaceutical bioprocess. Depending on the process stage, the product-to-impurity ratio di ﬀ ers. The impact of impurities on the process depends on various factors. Solely mechanistic models are currently not su ﬃ cient to entirely describe these complex interactions. We have established two hybrid models for predicting the ﬂux evolution, the protein rejection factor and two components’ concentration during crossﬂow ultraﬁltration. The hybrid models were compared to the standard mechanistic modeling approach based on the stagnant ﬁlm theory. The hybrid models accurately predicted the ﬂux and concentration over a wide range of process parameters and product-to-impurity ratios based on a minimum set of training experiments. Incorporating both components into the modeling approach was essential to yielding precise results. The stagnant ﬁlm model exhibited larger errors and no predictions regarding the impurity could be made, since it is based on the main product only. Further, the developed hybrid models exhibit excellent interpolation properties and enable both multi-step ahead ﬂux predictions as well as time-resolved impurity forecasts, which is considered to be a critical quality attribute in many bioprocesses. Therefore, the developed hybrid models present the basis for next generation bioprocessing when implemented as soft sensors for real-time monitoring of processes.


Introduction
Membrane separation is a unit operation used in virtually all bioprocesses. One prominent type, crossflow ultrafiltration, is widely used from cell harvest and virus clearance approaches to product concentration steps. In downstream processing of biopharmaceuticals, ultrafiltration (UF) is commonly applied for concentration and buffer exchange after the capture step. It is also applied after virus filtration in single-pass mode to concentrate the sample before it is loaded onto the polishing chromatography, or after polishing to reach the final conditions for product formulation [1]. These process steps entail varying ratios of process and impurities to product concentration.
Modeling of process steps is of increasing importance for bioprocesses. Such process models increase understanding of processes, facilitate the discovery of optimal process conditions and are indispensable for model predictive control. The latter is a cornerstone of Quality by Design and Process Analytical Technology, which is recommended by authorities for biopharmaceutical production. The right balance of model complexity and usability is crucial to employ such models effectively for different unit operations.
Product and impurity were mimicked with different ratios of bovine serum albumin (BSA) to lysozyme concentrations in the starting solution. BSA and lysozyme exhibit different physicochemical properties to facilitate separation and quantification. While BSA was fully retained by the membrane, lysozyme was only partially retained, rendering the predictions of the permeate flux over time even more complex. In a first assessment, we compared the abilities of the well-established mechanistic stagnant film model (SFM) and the recently established one-component hybrid to predict the filtration progress of a two-component solution. Further, we presented two hybrid model structures to predict the evolution of permeate flux and protein concentration of product and impurity by multi-step ahead predictions. One hybrid model included a static lysozyme rejection factor (R Lys ), while the other updated R Lys dynamically in an iterative way. These model outputs were influenced by the transmembrane pressure (TMP), crossflow velocity (CF), the initial BSA concentration c B,BSA and lysozyme bulk concentration c B,Lys . Finally, these novel hybrid model structures were compared to the SFM regarding flux and concentration prediction.

Equipment and Chemicals
All UF experiments were performed on an ÄKTA Crossflow system (Cytiva, Marlborough, MA, USA) controlled by UNICORN 5.31 software. The reservoir tank held up to 1100 mL of bulk solution. The system featured an inline pH probe and UV monitor on the permeate side and a pressure-based reservoir level sensor. The experiments were performed with a Sartocon Slice Hydrosart Cassette hydrophilic, stabilized cellulose-based membrane (Sartorius AG, Göttingen, Germany) with a membrane area of 200 cm 2 . The model proteins were BSA and lysozyme (A2153 and L6876, both purchased from Sigma-Aldrich, St. Louis, MO, USA). The molecular weight cutoff (MWCO) of the membranes was 30 kDa, chosen so that BSA (66 kDa) was fully retained and lysozyme (14 kDa) was partially retained. BSA and lysozyme were chosen to mimic the protein of interest and process-related impurities, respectively. A filtration buffer of 50 mM phosphate-buffered saline (PBS), pH 8, was used.

Training and Test Data Generation
For the training experiments, the bulk reservoir was filled with 1000 mL of the lowest bulk BSA and/or lysozyme concentration c B,BSA and c B,Lys (see Table A1). The following two steps were then alternated. First, the TMP and CF were increased stepwise, while the permeate was redirected to the feed reservoir to keep the protein concentration c B constant. For each combination of TMP and CF, the permeate flux was recorded. Second, the sample was concentrated until the next desired c B was reached. These two steps were repeated at all concentrations given in Table A1. A total of 90 equilibrium fluxes were recorded for different concentrations and combinations of TMP and CF. Our previous work with a one-component system [22] showed that this training set size was sufficient to develop a well-trained hybrid model with accurate flux predictions. A detailed summary of all scouted TMPs, CFs, c B s and recorded fluxes is given in Table A1. Samples were taken after each concentration step for offline measurement. A more detailed description of the methodology for the training experiment is given in an earlier publication [22].
During the test experiments, samples were taken from the retentate and permeate. The measured retentate and permeate concentrations were used to calculate the rejection factor R of the model proteins. A summary of the performed test sets is provided in Table A2.

Concentration Polarization Correction
When concentrating the sample throughout the training experiments, we observed that the measured c B,BSA was lower than the expected concentration calculated from permeate volume (V P ) and mass balance. The difference between observed and calculated concentration increased with concentration (see Figure A3B). This was because the concentration polarization (CP) layer-the protein gradient that forms on the surface of the membrane-increased with c B,BSA . This deviation was considered for the test experiments by employing a quadratic polynomial function (Equation A1) and used to correct the calculated c B,BSA .

Protein Quantification
BSA and lysozyme concentrations were determined with an analytical high-performance size-exclusion chromatography (SEC-HPLC) using a TSKgel G3000SWXL column (5 µm, 7.8 × 300 mm; TOSOH, Shiba, Tokyo, Japan). The separation was performed under isocratic conditions with 50 mM sodium phosphate, 200 mM NaCl, pH 6.5 as running buffer at a flow rate of 0.4 mL/min. Samples were diluted to a final concentration of 0.1 to 1.0 g/L using 50 mM PBS, pH 8 and filtered through a 0.22 µm Millex-GV Filter (Merck Millipore, Billerica, MA, USA) prior to analysis. The injection volume was 10 µL per sample. Due to the difference in the size of BSA and lysozyme, the peaks were fully separated and could be quantified independently, using standard calibrations from BSA and lysozyme stock solutions.

Black Box Model
The black box inside the first hybrid model (HM 1) aimed to predict the flux based on the combination of inputs parameters: TMP, CF and the bulk protein concentrations of BSA and lysozyme, c B,BSA and c B,Lys , respectively. In the second hybrid model (HM 2), an additional black box was employed to predict the rejection factor of lysozyme R Lys ( Figure 1B). An ANN was utilized for this purpose and optimized by varying the number of hidden nodes from 1 to 7. The ANN was set up with the feedforwardnet function and trained with the trainbr function, using MATLAB 2018b. A detailed description is the ANN structure and optimizer function is given in the Appendix A.

White Box Model
The white box model is the mechanistic part of the hybrid model and consisted of a mass balance. The incrementally decreasing bulk volume (dV B in Equation (1)) was derived from the permeate flux (J), which is the output of the black box, and the membrane area (A). The rejection factor R for component i was calculated with Equation (2), considering the concentration of i in both the retentate (c R ; in crossflow filtration c R is equal to c B ) and the permeate (c P ). Equation (1) and Equation (2) were used to predict c B of each component and Equation (3) to calculate the V B after dt.

Training and Test Data
R Lys was calculated from the training set with the UV absorbance at 280 nm on the permeate side. A separate lysozyme training run was performed to correlate the UV signal at 280 nm with the permeate concentration determined by SEC-HPLC. The correlation curve ( Figure A3A) with an R 2 of 0.97 was used to calculate c P,Lys , and subsequently R Lys for all observations of the training set was used to train the black box. The observed flux and R Lys were compared to the predictions of the hybrid models using the normalized root-mean-square error (NRMSE) where n is the number of overserved fluxes y i and the corresponding predicted fluxesŷ i . The normalization y max −y min allows a fair comparison of various fluxes due to different concentrations and process parameters.

Multistep-Ahead Hybrid Model
The structures of the investigated hybrid model are given in Figure 1. The first and simplest HM 1 ( Figure 1A) assumed a constant R Lys of 0.77 for all test sets based on the weighted average of all permeate and retentate concentrations samples taken throughout the training experiment. The weighted average considered the variation in c B,Lys and sample intervals using trapezoid rule integration. For the second HM 2 structure ( Figure 1B), the flux and R Lys were predicted separately, using two different black box models. The flux and R Lys were fed into the same white box model, which yielded the predicted c B,BSA and c B,Lys after a defined time increment. The developed hybrid model is capable of predicting multiple steps ahead, as depicted in Figure 1D. The multi-step ahead structure uses HM 1, HM 2 or the SFM to predict c B,BSA and c B,Lys for a time increment (dt). The concentrations of the first iteration were used to calculate future fluxes and c B s of the second iteration, and so on. Multiple iterations were performed until the desired stop criterion was reached. In our case, the stop criterion was the final retentate volume.
The presented hybrid models were used to predict the evolution of flux and R Lys throughout the UF process. Furthermore, the models yielded a prediction for the final c B,BSA and c B,Lys . The final c B,BSA and c B,Lys predictions were compared to the final c B,BSA and c B,Lys measured by SEC-HPLC. The model errors were compared using the NRMSE. Figure 2 shows a flowchart of the hybrid model methodology applied for crossflow filtration. Training experiments were performed by variations in the parameters that are expected to influence the flux. Following this, the model was trained on this training set with a defined experimental design space. The established models were applied to a validation data set that was not used for training. The model structure was optimized by varying the tuning parameters, e.g., number of nodes in an ANN and adding or removing training parameters. The model with the tuning parameters that led to the lowest error in the validation set was then applied to independent test runs with static process conditions.

Stagnant Film Theory
The presented hybrid models were compared to the established SFM. The SFM derives predictions from the mass transfer model described by convective transport toward the membrane and back-diffusion caused by the concentration gradient [23]. According to the SFM, the flux J is related to the bulk concentration c B of a single component by where c G is the gel layer concentration at the membrane surface and k is the mass transfer coefficient that depends on the diffusion coefficient and the thickness of the gel layer [23]. The SFM is valid in the pressure-independent region of the filtration. Since k and c G cannot be adjusted directly during the filtration, a correlation between the adjustable parameters TMP and CF, and k and c G was required. When plotting the flux versus log(c B ) for a constant TMP and CF, k and c G are estimated by the slope of linear regression and c G was estimated by extrapolating the regression line to the intersection with the abscissa ( Figure A6). It has been shown that this way of calculating k yields more accurate results than the Sherwood correlation [24][25][26] and more solid predictions compared to the osmotic pressure model [27] for similar settings. To compare the SFM to the hybrid models, the black box was replaced by the SFM in Equation (5) using the parameters k and c G instead of TMP and CF ( Figure 1C). In test runs, where the TMP and CF conditions were not covered in the training set, k and c G were estimated using linear interpolation.

Training Data Description
The data sets for training the hybrid models were generated from filtering BSA and lysozyme with a 30 kDa MWCO cellulose-based membrane (Hydrosart). A total of three training sets were generated covering three CFs (100, 200 and 300 mL/min) and five TMPs (0.8, 1. combined training set, the protein concentration of BSA c B, BSA ranged from 3.77 g/L to 77.93 g/L and of lysozyme c B, Lys from 0.28 g/L to 3.81 g/L. The concentration ranges for all training sets are summarized in Table A1. For a better comparison of Figure 3A-D, the x-axis of Figure 3C,D are reduced. The entire graphs are given in Figure A2. Generally, increasing bulk concentrations c B led to lower fluxes, while increasing TMP and CF led to higher fluxes in all training sets. This is in accordance with the underlying mechanisms: higher bulk concentrations lead to higher concentrations in the boundary layer and a more prominent effect of the back diffusion along the concentration gradient. An increased TMP leads to higher convective flow towards the membrane, but also to a faster accumulation of the protein at the boundary layer. High CF decreases the thickness of the concentration polarization layer by rectangular displacement. The training set obtained from experiments using only BSA exhibited higher fluxes then the two-component training set. Additionally, the flux decreased faster during filtration of the two-component solution ( Figure 3B) compared to the filtration of lysozyme only ( Figure 3D). This indicated an increased membrane resistance through the fouling effect on the Hydrosart membrane caused by lysozyme. Being smaller than the pores, lysozyme adsorbed at the inner pore channels [28,29] and reduced its diameter and subsequently the flux through the membrane and the membrane's selectivity.
The two-component training set ( Figure 3A,B) was used to train the black box of the hybrid models and to obtain the mechanistic model parameters k and c G . The data set with lysozyme solely ( Figure 3D) was used for two reasons: first, to investigate the effect of TMP and CF on the permeability of lysozyme and whether R Lys had to be recalculated for varying input parameters ( Figure A5); second, to correlate the permeate lysozyme concentration with the UV signal on the permeate side. This correlation was used to calculate R Lys (Equation (2)) for each observation of the combined training set ( Figure 3A,B), using solely the permeate UV signal. Another training experiment was performed with BSA solely ( Figure 3C). The observed fluxes and estimated SFM parameters k and c G were used to investigate model behavior and error when lysozyme was present in the test set but absent in the training set.

Comparison of the Hybrid Models to the Stagnant Film Theory
The optimal ANN structure in the hybrid models was determined by varying the number of hidden nodes from one to seven and recording the average error of 20 repetitions on the training set. The ANN with four hidden nodes yielded the lowest NRMSE for both HM 1 and HM 2, with an average of 3.4% NRMSE. Higher numbers of hidden nodes led to an error increase due to training set overfitting ( Figure A1).
With the SFM, the flux can only be modeled for a one-component system; no adaptations for a twoor multi-component system have been published in the literature so far. In the following, BSA was assumed to be the only component since its concentration was four to 46 times higher than lysozyme in the test runs (Table A2). The k and c G values of BSA, however, change in the presence of lysozyme. To allow a fair comparison between the hybrid models (which can incorporate multiple components as inputs) and the SFM, both sets of k and c G were evaluated. Both experiments were carried out with BSA alone. The combination of BSA with lysozyme was used for flux prediction and the results were compared to the prediction of the hybrid models.
The hybrid model trained solely on BSA ( Figure 4, red dotted line) and the SFM using k and c G based solely on BSA ( Figure 4, dark grey dot-dashed line) were able to predict a UF process with only BSA present ( Figure 4A, black line), but failed to predict the UF flux of BSA and lysozyme ( Figure 4B, black line). The latter failed due to membrane fouling by lysozyme and therefore the reduced flux and prolonged process times could not be described by any of these models. drastically changed the initial flux and flux evolution of the UF process and that incorporating both components in the model was essential for accurate predictions. On the contrary, SFM based on the training run containing BSA with lysozyme was also able to predict the two-component test run well ( Figure 4B, light grey dot-dot-dashed line), but showed a drastic offset when predicting a test run with only BSA ( Figure 4A, light grey dot-dot-dashed line). The k values from the two-component training set (Table A5) were generally lower than those calculated from solely BSA, since membrane fouling due to lysozyme was assumed. In the absence of lysozyme, however, no membrane fouling occurred and the flux for the same c B,BSA was higher.
In summary, the HM could predict both scenarios, since the varying concentration of lysozyme and its influence on the membrane fouling was integrated into the black box. However, the SFM only predicted one scenario well, depending on which k and c G were used. For the following two-component predictions, the SFM parameters were based on the two-component training set.

Comparison of Hybrid Model Performance
To further investigate both the interpolation and extrapolation capability of both HMs and the SFM model, a series of test runs were conducted under conditions that were partially not covered by the training sets. To test the hybrid models based on the two-component training set, additional test runs on BSA solutions spiked with lysozyme were performed. The two established hybrid model structures were compared for their R Lys , flux and final c B predictions individually. R Lys effects the in-process c B,Lys prediction and subsequently the flux and final c B,Lys . Additionally, the two hybrid models were compared to the SFM in terms of flux and c B,BSA prediction. c B,Lys, and R Lys could not be compared, since SFM can be applied to one-component only.
The test data consisted of nine UF runs performed at different TMP, CF, initial c B,BSA and c B,Lys conditions. Test runs 1−4 were performed within the training space. This meant that TMP and CF were within the training parameters ( Figure 5A, blue area) and the initial c B,BSA, and c B,Lys was higher than the initial training concentrations ( Figure 5B, blue area). The test runs 1, 2 and 9 were performed in the center of the TMP and CF training space ( Figure 5A), with test run 9 containing no lysozyme. Test run 3 was performed at the outer limit of the TMP and CF training space, to investigate how the predictions of the hybrid models changed at the border. Test run 4 was performed under TMP and CF conditions not covered by the training set but within the training space, to investigate the interpolation capabilities of the model. Test runs 5, 6, 7 and 8 were performed under conditions that were partially outside the training space, such as initial c B,Lys (8), initial c B,Lys (5, 6) and CF (7), to test the extrapolation capabilities. The test run parameters are summarized in Figure 5 and Table A2.

Flux Prediction
Regarding the prediction of the flux evolution, the two hybrid models performed similarly ( Figures 6A,C,E, 7A,C,E and A4A,C,E). Most test run predictions exhibited a small initial offset. At the beginning of the test experiments, the membrane was clean, while during the training set the membrane exhibited some lysozyme fouling and equilibrium of the concentration polarization layer due to the long training process time. This led to an initially underestimated flux. The offset became more pronounced when initial c B,Lys was higher than 0.3 g/L (test runs 2, 3, 4, 5 and 8; Figures 6C,E, 7A and A4A,C), indicating a stronger membrane fouling at this concentration. Even though all hybrid models were trained with c B,Lys higher than 0.3 g/L, the timely increasing membrane resistance due to fouling reached an equilibrium only after several minutes. After this point, the flux was predicted correctly. The highest initial offset was given in test run 8 ( Figure A4A), which exhibited the highest initial c B,Lys and therefore more fouling. Test run 7 ( Figure 7E) was performed at CF 350 mL/min, which was outside the training space. Both hybrid models predicted the flux of test run 7 ( Figure 7E) well, indicating that the models were not necessarily limited by the training space and showed good extrapolation capabilities of the input parameter CF. Test runs 4, 5 and 6 ( Figures 6C,E and 7C) exhibited TMPs and CFs within the training space parameters and all predicted well. The good flux predictions of these test runs showed the excellent interpolation capabilities of the ANN-aided hybrid models. The SFM predicted the initial flux and flux evolution inside the training space well (test runs 1, 3 and 4; Figures 6A,C, and A4C). However, for the test runs outside the training space, higher errors were exhibited (test runs 5, 6 and 8; Figures 6E, 7C and A4A). Outside the training space, k and c G were extrapolated from the training data, which potentially increased flux prediction uncertainty. Furthermore, high lysozyme concentrations also led to higher errors due to stronger fouling over time and not being able to incorporate the second component in the SFM. Here, the SFM underestimated the initial flux drastically (test runs 2 and 8; Figures 7A and A4A). For test run 9 ( Figure A4E)-only BSA, no lysozyme-the SFM with k and c G were exceptionally based on BSA training data ( Figure 3C) to allow fair comparison. In this case, the SFM yielded good initial flux predictions, but deviations at the end of the process, while HM 1 and 2 both showed excellent flux prediction over the entire process. On average, the flux prediction error of SFM was 6% NRMSE, while the error of the two hybrid models was 4.1% and 3.9% NRMSE ( Figure 8A).

Rejection Factor Prediction for Lysozyme
The rejection factor for lysozyme R Lys increased throughout the UF run, from around 0.6 to almost 1.0, as shown in Figures 6, 7 and A4. The pores became increasingly blocked throughout the UF process, most probably because lysozyme was absorbed in their inner wall, increasing the rejection factor. Results showed that there was no consistent correlation between the TMP and R Lys , or CF and R Lys ( Figure A5). Therefore, the influence of TMP and CF on c P,Lys was neglected when creating the calibration between UV absorbance and lysozyme permeate concentration. The rejection factor of BSA was 1 for all experiments. The model errors are given in Figure 8B. prediction over the entire process. On average, the flux prediction error of SFM was 6% NRMSE, while the error of the two hybrid models was 4.1% and 3.9% NRMSE ( Figure 8A).

Hybrid Model 1: Constant Lysozyme Rejection Factor
In HM 1 the rejection factor for lysozyme R Lys was assumed to be constant over time for all test runs, where lysozyme was present and therefore exhibited the largest R Lys error (38% NRMSE) compared to HM 2 (see Figure 8A). All test runs (Figures 6, 7 and A4) show that HM 1 overestimated R Lys at the beginning of all UF runs and underestimated it at the end. The average R Lys based on training data fitted all independently generated test data very well but lacked the ability to adjust to the increasing R Lys .

Hybrid Model 2: Dynamic Lysozyme Rejection Factor
In contrast to keeping the rejection factor constant, as in HM 1, a second black box was introduced in HM 2 to predict R Lys dynamically. This prediction was independent of the flux prediction but was based on the same four input parameters, namely TMP, CF, initial c B,BSA and initial c B,Lys . The NRMSE of the newly introduced black box was evaluated by comparing the observed R Lys values to the predicted R Lys . Since the correlation of R Lys and V P is quite simple, an ANN with one hidden node was used for R Lys prediction ( Figure A1C). For comparison, a multiple linear regression (MLR) model was also tested as an alternative black box, resulting in a less complex hybrid model that required less computation time and facilitated easier interpretability. However, the ANN with one node was chosen instead of the MLR, because of the lower prediction error regarding R Lys and final c B,Lys (see Table A4). HM 2, with an average R Lys NRMSE of 14%, performed better than HM 1. The improvement was achieved as HM 2 considered the increasing R Lys over the process, which subsequently strongly influenced the final c B,Lys prediction (Section 3.3.3). In test runs 5 and 6 ( Figures 6F and 7D) the prediction from HM 2 overestimated R Lys . These test runs exhibited a low TMP and high c B,BSA . The hybrid model assumed that the CP layer of BSA and fouling due to lysozyme were at an equilibrium, at which the lysozyme transmission was lower than in the test runs, where the CP layer was still building up. Low TMP additionally prolonged the time to reach flux steady state. R Lys of the other test runs 1, 2, 4, 7 and 8 ( Figures 6B,D, 7B,F and A4B) were predicted accurately with HM 2.
Even though HM 2 performed better than HM 1 in R Lys prediction, the flux predictions were almost identical (NRMSE 3.9 and 4.1 %). This indicated that they were not affected by small variations or changes in c B,Lys .

Endpoint Bulk Concentration
Since R BSA was 1, all models-HM and SFM-predicted the same c B,BSA at the final retentate volume, with an average error of 4.2% (Table A3). BSA did not show membrane fouling and was quantitatively recovered at the end of the process. The predictions of the final c B,Lys varied because of the different R Lys predictions. The discussion for c B,Lys prediction was divided into the test runs performed strictly inside and outside the training space, since the hybrid models performed differently.
Within the training space-test runs 1−4-HM 1 and HM 2 performed in accordance with the R Lys predictions ( Figure 8C). HM 1 exhibited the highest error of 9% since R Lys was not adjusted over the processing time. HM 2 recalculated R Lys with every iteration; its c B,Lys predictions were in good accordance with the measured concentrations, with an NRMSE of 4% and superior to HM 1. Similarly to the R Lys , the accuracy of the final c B,Lys prediction benefited from two separately trained black boxes.
In cases where at least one input parameter was outside of the training space-test runs 5−8-HM 1 performed best with an average NRMSE of 4% ( Figure 8D). In comparison, HM 2 yielded worse final c B,Lys predictions, exhibiting a three-fold increase in NRMSE (12%). Even though R Lys was updated in HM 2, it was overestimated throughout most of the test runs, leading to higher c B,Lys prediction and a cumulated NRMSE that increased with the duration of the process. In contrast, using HM 1 the initial R Lys over-prediction and under-prediction balanced out and yielded acceptable final c B,Lys predictions.
In summary, the more complex HM 2 showed superior performance within the trained space, which is the case for most modeling applications. It can predict the flux, R Lys and therefore the concentration, of both components at any time point of the process. For predictions outside the trained space, the simpler and more robust HM 1 performed better, giving accurate predictions on flux and the fully retained main component BSA. It can offer valuable insights when exploring parameter ranges if the desired optimal process conditions are not met in the trained space, before it is expanded and used to retrain new hybrid models.

Conclusions
UF modeling increases process understanding which is key for predicting process performance. The interactions of various components means that mechanistic modeling approaches for multi-component solutions might become very complex and require many experiments.
We developed and compared hybrid models to predict flux, rejection behavior and concentrations for UF of two-component solutions. The models were trained on training experiments that were generated in less than eight hours and tested on independently performed UF runs with varying product and impurity concentrations, TMPs and CFs. We showed that the hybrid model HM 2, with a dynamic impurity rejection factor containing two black boxes, exhibited the best predictions for impurity rejection behavior and final concentration within the trained parameter space and had excellent interpolation properties. The simpler HM 1 yielded stable predictions beyond the trained space, rendering it a valuable tool for extrapolation. Both hybrid models performed similarly well in predicting flux and mimicked product concentration. The SFM with mechanistic parameters exhibited higher flux prediction errors than both hybrid models and could not predict the lysozyme rejection factor and final concentration, since it can only assume a one-component system. Our results show that it is crucial to quantify and incorporate all components, including the impurities, to gain accurate and reliable process models. These variations can be included more easily in the hybrid model approach than in mechanistic models such as SFM, with low experimental effort and no mechanistic parameter adaption required.
A limitation of the presented models is the time-dependent fouling of the mimicked impurity at high initial concentrations. However, at the expected concentration ranges, e.g., after the chromatography capture step, the effect can be neglected. The proposed hybrid model structure can be used not only for the reliable prediction of final product concentrations, but also of the concentration of various quantifiable classes of impurities. Since impurities are a critical quality attribute (CQA) in many manufacturing bioprocesses, time-resolved concentration predictions help to better understand the process's outcome upfront. Furthermore, by taking adequate measures a potential batch rejection due to high impurity concentration can be avoided. The product and impurities can be measured with online sensors or correlated with offline analytics using soft sensors. In combination, with closed-loop process controllers, these hybrid models are a valuable tool for increased process understanding and advanced process control.

Conflicts of Interest:
The authors declare no conflict of interest.

. Neural Network Model Optimization
To choose the best-suited ANN structure, varying numbers of hidden nodes were tested. Each ANN was trained on the combined training set and validated. Figure A1A gives an overview of the optimal ANN structure including the inputs TMP, CF, c B,BSA , c B,Lys , and the output permeate flux (in HM 2 a second ANN with R Lys as output was added) with four hidden neurons. The input and output parameters were scaled between 0 and 1 before optimizing the ANN. This step is necessary to have the parameters on the same scale rendering them comparable. Each node in the hidden and output layer in Figure A1A forms a linear equation. As an example, the first hidden node x 21 is the sum of each multiplication of an input (TMP, CF, c B,BSA , c B,Lys ) and the corresponding weight (w 1 11 , w 1 21 , w 1 31 , w 1 41 ) multiplied with the bias (b 1 ) of the entire hidden layer.
x 21 = b 1 w 1 11 TMP scaled + w 1 21 CF scaled + w 1 31 c B,BSA,scaled + w 1 41 c B, Lys,scaled To determine the values for the weights and biases that result in the desired prediction the model is optimized in multiple epochs. As a first step, the weights and biases are randomly chosen and the first prediction with inputs from a given training set is performed. Since the weights and biases are not optimized the flux prediction will be of poor quality and the prediction error will be high. Using the desired output from the training set, the ANN is calculated backward which results in inputs parameters that fit the prediction. The error between the real and the backward calculated inputs is estimated and used to update the according to weights and biases. This optimization can be performed with different algorithms-in this publication we chose MATLAB's trainbr function which employs Bayesian regularization, which is an adaptation of the Levenberg-Marquardt optimizer and minimizes the squared errors and weights. Once the weights and biases are optimized, the ANN structure is defined and applied to the test sets and will predict the same results for a given set of inputs.
The presented ANN structure ( Figure A1A) was determined after screening a wide number of nodes in the hidden layer from one to seven, with the corresponding NRMSE being recorded and averaged. We chose an ANN with four hidden nodes for all hybrid model structures ( Figure A1B) because it exhibited the lowest average error and standard deviation. Structures with less than four nodes resulted in under-fitted models. More hidden nodes led to an over-fitting of the training data and higher prediction error and standard deviation. In HM 2 the R Lys black box ANN was optimized in the same way with 1 hidden node yielding the lowest error. The tested ANNs consisted of one hidden layer with a sigmoid activation function and linear activation functions in both the input and output layer. The inputs were normalized between 0 and 1.  Table A1 gives a summary of the measured c B,BSA, and c B,Lys for the three training sets containing BSA with lysozyme, only BSA and only lysozyme. Training set 2 and 3 consisted of two parts with different starting c B s to cover a wide c B space. For each c B the TMP and CF were increased stepwise in three minutes intervals. Afterward, c B was increased by concentrating the sample and removing the generated permeate from the reservoir.    Figure A2 gives the entire training data sets at CF 200 mL/min.  Figure A3 gives the correlation curve to calculate R Lys from the UV absorbance at 280 nm using Equation (2). The curve was calculated using the lysozyme training data (Table A1, Training set 2) and exhibits a correlation coefficient of 0.97.   Figure A5 showed that there was no consistent correlation between the TMP and R Lys , and CF and R Lys . In Table A4 two HM 2 models employing different black box types for R Lys prediction were compared. An ANN with one hidden node in one hidden layer performed better than MLR with linear and interaction terms when assessed for to final c B,Lys and R Lys predictions. Regarding flux prediction, both models perform equally.  Table A5 summarized the mass transfer coefficient k and gel concentration c G for flux prediction using the SFM. The SFM can be set up for a one-component solution only and since the BSA concentration was 4 to 46 times higher than the lysozyme concentration, BSA was chosen as the modeled component. k and c G for BSA were calculated for a BSA one-component solution (Table A5 left) and two-component solution (Table A5 right) containing BSA with lysozyme. k from two-component solution was generally lower than for one-component due to the fouling properties of lysozyme on the cellulose-based membrane, which reduced the transmembrane mass transfer.  Figure A6 shows an exemplary plot to graphically calculate k (negative slope) and c G (intercept with the abscissa): J = −k·c B + k·c G with −k (negative mass transfer rate) being the slope and k · c G being the intercept with the ordinate. The latter is divided by k, resulting in the gel layer concentration c G . At the lowest two TMPs (0.8 and 1.3 bar) the flux-ln(c B,BSA )-curve is not linear, since the is in the pressure-dependent region. In this case, only the points in the linear range were taken for calculating k and c G . For low TMPs and high CFs the flux is pressure dependent. In these cases, the SFM flux prediction will always overestimate the flux. The test runs, however, were performed at a TMP of 1.4 bar or higher, and therefore in the pressure independent flux region.