Article

Hybrid Modeling for Simultaneous Prediction of Flux, Rejection Factor and Concentration in Two-Component Crossflow Ultrafiltration

1
Department of Biotechnology, Institute of Bioprocess Science and Engineering, University of Natural Resources and Life Sciences, 1190 Vienna, Austria
2
Novasign GmbH, 1190 Vienna, Austria
*
Author to whom correspondence should be addressed.
Processes 2020, 8(12), 1625; https://doi.org/10.3390/pr8121625
Received: 23 November 2020 / Revised: 4 December 2020 / Accepted: 7 December 2020 / Published: 9 December 2020
(This article belongs to the Section Advanced Digital and Other Processes)

Abstract

Ultrafiltration is a powerful method used in virtually every pharmaceutical bioprocess. Depending on the process stage, the product-to-impurity ratio differs. The impact of impurities on the process depends on various factors. Solely mechanistic models are currently not sufficient to entirely describe these complex interactions. We have established two hybrid models for predicting the flux evolution, the protein rejection factor and two components’ concentration during crossflow ultrafiltration. The hybrid models were compared to the standard mechanistic modeling approach based on the stagnant film theory. The hybrid models accurately predicted the flux and concentration over a wide range of process parameters and product-to-impurity ratios based on a minimum set of training experiments. Incorporating both components into the modeling approach was essential to yielding precise results. The stagnant film model exhibited larger errors and no predictions regarding the impurity could be made, since it is based on the main product only. Further, the developed hybrid models exhibit excellent interpolation properties and enable both multi-step ahead flux predictions as well as time-resolved impurity forecasts, which is considered to be a critical quality attribute in many bioprocesses. Therefore, the developed hybrid models present the basis for next generation bioprocessing when implemented as soft sensors for real-time monitoring of processes.
Keywords: semi-parametric model; neural network; tangential flow filtration; downstream processing; advanced process monitoring

1. Introduction

Membrane separation is a unit operation used in virtually all bioprocesses. One prominent type, crossflow ultrafiltration, is widely used from cell harvest and virus clearance approaches to product concentration steps. In downstream processing of biopharmaceuticals, ultrafiltration (UF) is commonly applied for concentration and buffer exchange after the capture step. It is also applied after virus filtration in single-pass mode to concentrate the sample before it is loaded onto the polishing chromatography, or after polishing to reach the final conditions for product formulation [1]. These process steps entail varying ratios of impurity to product concentration.
Modeling of process steps is of increasing importance for bioprocesses. Such process models increase understanding of processes, facilitate the discovery of optimal process conditions and are indispensable for model predictive control. The latter is a cornerstone of Quality by Design and Process Analytical Technology, which is recommended by authorities for biopharmaceutical production. The right balance of model complexity and usability is crucial to employ such models effectively for different unit operations.
To simplify the modeling of downstream processes, a common assumption is to reduce the overall sample composition down to a single target molecule. Coefficients and parameters used in mechanistic models, such as mass transfer models, are often approximated, taking only the target molecule into account. Such models may be limited if the sample contains high levels of impurity.
For some process steps, such as polishing chromatography [2] or ultra/diafiltration [3,4] before formulation, this assumption of one-component solutions is realistic, since the product is already of high purity at this process stage. For earlier process steps, however, this simplification deviates substantially from reality and can lead to erroneous models, e.g., for filtration steps after the capture step. Here, the neglected presence of host cell proteins [5], DNA [6], or protein aggregates [7] can strongly distort the prediction of the model, since effects like membrane fouling and interactions between the product and impurities are not considered. In more complex mechanistic models, if the impurities are well characterized, such effects can be considered. For example, for crossflow filtration, a hard sphere-based mixture model, including multiphase computational fluid dynamics and concentration polarization, was applied to a whey protein solution, leading to a permeate flux prediction error within 20% [8]. Other work has shown that mechanistic models of pore blockage and cake filtration can also predict filter fouling during virus filtration, as a function of the protein of interest, virus and membrane [9]. The initial and late stage of the filtration, however, was dominated by different mechanisms, rendering it difficult to build a valid model for the entire process. The influence of two components on (crossflow) UF was found to affect the process in different ways, from strong [10] to weak [11] to varying [5,12,13,14] protein-protein (or protein-membrane) interactions. To account for the highly different effects of all components on the process, the experimental part of data generation to estimate the parameters for mechanistic models might become very labor-intensive and the calculations rather complex. 
Further, if the overall behavior of the process changes because of varying concentrations of impurities, the assumptions of mechanistic models might not hold, to the detriment of the prediction.
One advantage of machine-learning-supported modeling approaches is that the effects of the impurity on flux and membrane fouling do not need to be fully quantified by the operator [15]. The quantification of these phenomena is performed by machine learning tools, such as an artificial neural network (ANN) [16]. Hybrid models combine the advantages of data-driven black box models (such as ANNs), which correlate input with output variables (such as the concentration of impurity with the decrease in flux), with those of knowledge-based mechanistic models (white box models) derived from conservation or kinetic laws [17]. Hybrid models have been applied to bioprocesses for upstream [18] and downstream applications [19,20].
To compare the predictive power of a model concerning the training space, two terms are often used: interpolation and extrapolation. Interpolation allows the model to make predictions for parameters that lie within the range of training experiments. A model with good interpolation capabilities can make predictions with fewer training observations, since it is able to make reliable estimates of the spaces between the observations. A model with poor interpolation capabilities requires more granular coverage of the training space to make accurate predictions of test experiments. Extrapolation (also called range extrapolation) describes the extent to which a model can make predictions if the tested input parameters are outside the training space. A model with good extrapolation capabilities can make accurate predictions for parameters beyond the training space. A detailed explanation of interpolation and extrapolation in hybrid modeling is given in [21].
Recently, we have shown the benefits of hybrid modeling for the prediction of UF flux evolution. However, this previous model was only established for a one-component system [22]. In the present study, we extended the hybrid model to describe the impact of a model protein impurity on the decrease of the permeate flux over time in crossflow UF, including the rejection behavior of the product and the impurity. This enables the operator to gain a more detailed understanding of the process. In addition, the impurity concentration is a critical quality attribute (CQA) in almost all manufacturing bioprocesses and, if it is too high, the produced batch must be discarded. The presented hybrid models can predict the impurity concentration up front and potentially minimize the risk of batch rejection.
Product and impurity were mimicked with different ratios of bovine serum albumin (BSA) to lysozyme concentrations in the starting solution. BSA and lysozyme exhibit different physicochemical properties to facilitate separation and quantification. While BSA was fully retained by the membrane, lysozyme was only partially retained, rendering the predictions of the permeate flux over time even more complex. In a first assessment, we compared the abilities of the well-established mechanistic stagnant film model (SFM) and the recently established one-component hybrid to predict the filtration progress of a two-component solution. Further, we presented two hybrid model structures to predict the evolution of permeate flux and protein concentration of product and impurity by multi-step ahead predictions. One hybrid model included a static lysozyme rejection factor (RLys), while the other updated RLys dynamically in an iterative way. These model outputs were influenced by the transmembrane pressure (TMP), crossflow velocity (CF), the initial BSA concentration cB,BSA and lysozyme bulk concentration cB,Lys. Finally, these novel hybrid model structures were compared to the SFM regarding flux and concentration prediction.

2. Materials and Methods

2.1. Equipment and Chemicals

All UF experiments were performed on an ÄKTA Crossflow system (Cytiva, Marlborough, MA, USA) controlled by UNICORN 5.31 software. The reservoir tank held up to 1100 mL of bulk solution. The system featured an inline pH probe and UV monitor on the permeate side and a pressure-based reservoir level sensor. The experiments were performed with a Sartocon Slice Hydrosart Cassette hydrophilic, stabilized cellulose-based membrane (Sartorius AG, Göttingen, Germany) with a membrane area of 200 cm2. The model proteins were BSA and lysozyme (A2153 and L6876, both purchased from Sigma-Aldrich, St. Louis, MO, USA). The molecular weight cutoff (MWCO) of the membranes was 30 kDa, chosen so that BSA (66 kDa) was fully retained and lysozyme (14 kDa) was partially retained. BSA and lysozyme were chosen to mimic the protein of interest and process-related impurities, respectively. A filtration buffer of 50 mM phosphate-buffered saline (PBS), pH 8, was used.

2.2. Training and Test Data Generation

For the training experiments, the bulk reservoir was filled with 1000 mL of the lowest bulk BSA and/or lysozyme concentration cB,BSA and cB,Lys (see Table A1). The following two steps were then alternated. First, the TMP and CF were increased stepwise, while the permeate was redirected to the feed reservoir to keep the protein concentration cB constant. For each combination of TMP and CF, the permeate flux was recorded. Second, the sample was concentrated until the next desired cB was reached. These two steps were repeated at all concentrations given in Table A1. A total of 90 equilibrium fluxes were recorded for different concentrations and combinations of TMP and CF. Our previous work with a one-component system [22] showed that this training set size was sufficient to develop a well-trained hybrid model with accurate flux predictions. A detailed summary of all scouted TMPs, CFs, cBs and recorded fluxes is given in Table A1. Samples were taken after each concentration step for offline measurement. A more detailed description of the methodology for the training experiment is given in an earlier publication [22].
During the test experiments, samples were taken from the retentate and permeate. The measured retentate and permeate concentrations were used to calculate the rejection factor R of the model proteins. A summary of the performed test sets is provided in Table A2.

2.3. Concentration Polarization Correction

When concentrating the sample throughout the training experiments, we observed that the measured cB,BSA was lower than the expected concentration calculated from permeate volume (VP) and mass balance. The difference between observed and calculated concentration increased with concentration (see Figure A3B). This was because the concentration polarization (CP) layer, i.e., the protein gradient that forms on the surface of the membrane, increased with cB,BSA. For the test experiments, this deviation was accounted for with a quadratic polynomial function (Equation (A1)), which was used to correct the calculated cB,BSA.

2.4. Protein Quantification

BSA and lysozyme concentrations were determined by analytical high-performance size-exclusion chromatography (SEC-HPLC) using a TSKgel G3000SWXL column (5 µm, 7.8 × 300 mm; TOSOH, Shiba, Tokyo, Japan). The separation was performed under isocratic conditions with 50 mM sodium phosphate, 200 mM NaCl, pH 6.5 as running buffer at a flow rate of 0.4 mL/min. Samples were diluted to a final concentration of 0.1 to 1.0 g/L using 50 mM PBS, pH 8 and filtered through a 0.22 μm Millex-GV Filter (Merck Millipore, Billerica, MA, USA) prior to analysis. The injection volume was 10 μL per sample. Due to the difference in the size of BSA and lysozyme, the peaks were fully separated and could be quantified independently, using standard calibrations from BSA and lysozyme stock solutions.

2.5. Hybrid Modeling

2.5.1. Black Box Model

The black box inside the first hybrid model (HM 1) aimed to predict the flux based on the combination of input parameters: TMP, CF and the bulk protein concentrations of BSA and lysozyme, cB,BSA and cB,Lys, respectively. In the second hybrid model (HM 2), an additional black box was employed to predict the rejection factor of lysozyme RLys (Figure 1B). An ANN was utilized for this purpose and optimized by varying the number of hidden nodes from 1 to 7. The ANN was set up with the feedforwardnet function and trained with the trainbr function, using MATLAB 2018b. A detailed description of the ANN structure and optimizer function is given in Appendix A.

2.5.2. White Box Model

The white box model is the mechanistic part of the hybrid model and consisted of a mass balance. The incrementally decreasing bulk volume (dVB in Equation (1)) was derived from the permeate flux (J), which is the output of the black box, and the membrane area (A). The rejection factor R for component i was calculated with Equation (2), considering the concentration of i in both the retentate (cR; in crossflow filtration cR is equal to cB) and the permeate (cP). Equation (1) was used to calculate VB after dt, and Equations (2) and (3) were used to predict cB of each component.
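$$\frac{dV_B}{dt} = -A \cdot J \quad (1)$$
$$R_i = 1 - \frac{c_{P,i}}{c_{R,i}} \quad (2)$$
$$\frac{d\left(c_{B,i} \cdot V_B\right)}{dt} = -A \cdot J \cdot c_{B,i} \cdot \left(1 - R_i\right) \quad (3)$$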
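The mass balance above can be illustrated with a single explicit Euler step. This is a minimal sketch and not the authors' implementation: the function names, the dictionary layout and the chosen units (L, g/L, L/(m² h), m², h) are assumptions made for illustration, and the permeate is assumed to carry each component at cP,i = cB,i · (1 − Ri), as follows from Equation (2) with cR = cB.

```python
def rejection_factor(c_permeate, c_retentate):
    """Equation (2): R_i = 1 - c_P,i / c_R,i."""
    return 1.0 - c_permeate / c_retentate


def white_box_step(v_bulk, c_bulk, rejection, flux, area, dt):
    """Advance bulk volume and concentrations by one explicit Euler step.

    v_bulk    : bulk volume in L
    c_bulk    : dict, component name -> bulk concentration in g/L
    rejection : dict, component name -> rejection factor R_i
    flux      : permeate flux J in L/(m^2 h), e.g. from a black box
    area      : membrane area A in m^2
    dt        : time increment in h
    """
    dv = area * flux * dt              # permeate volume removed during dt
    v_new = v_bulk - dv                # volume balance, Equation (1)
    c_new = {}
    for name, c in c_bulk.items():
        mass = c * v_bulk                           # mass in the reservoir
        mass -= dv * c * (1.0 - rejection[name])    # mass lost via permeate
        c_new[name] = mass / v_new
    return v_new, c_new


# Illustrative values only (not taken from the study, except A = 200 cm^2
# and the constant R_Lys of 0.77 used in HM 1).
v1, c1 = white_box_step(
    v_bulk=1.0,
    c_bulk={"BSA": 10.0, "Lys": 1.0},
    rejection={"BSA": 1.0, "Lys": 0.77},
    flux=50.0,
    area=0.02,
    dt=0.1,
)
```

With a rejection factor of 1, the mass of a component stays constant and its concentration rises only because the volume shrinks; with a rejection factor below 1, part of the mass leaves with the permeate.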

2.5.3. Training and Test Data

RLys was calculated for the training set from the UV absorbance at 280 nm on the permeate side. A separate lysozyme training run was performed to correlate the UV signal at 280 nm with the permeate concentration determined by SEC-HPLC. The correlation curve (Figure A3A), with an R2 of 0.97, was used to calculate cP,Lys and subsequently RLys for all observations of the training set; these RLys values were used to train the black box.
The observed flux and RLys were compared to the predictions of the hybrid models using the normalized root-mean-square error (NRMSE)
$$\mathrm{NRMSE} = 100 \cdot \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}}{y_{max} - y_{min}} \quad (4)$$
where n is the number of observed fluxes yi and ŷi are the corresponding predicted fluxes. The normalization by ymax−ymin allows a fair comparison of various fluxes due to different concentrations and process parameters.
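As a brief illustration, the NRMSE can be computed as follows. This is a sketch that assumes ymax and ymin are the largest and smallest observed values; the function name `nrmse` and the example numbers are hypothetical.

```python
import math


def nrmse(observed, predicted):
    """Normalized root-mean-square error in percent.

    The RMSE is divided by the range of the observed values
    (assumed here to be y_max - y_min) and multiplied by 100.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    span = max(observed) - min(observed)
    if span == 0:
        raise ValueError("observed values must not all be identical")
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / len(observed)
    return 100.0 * math.sqrt(mse) / span


# Hypothetical fluxes: every prediction is off by 1 over a range of 20.
example_error = nrmse([10.0, 20.0, 30.0], [11.0, 19.0, 31.0])
```

In this example the RMSE is 1 and the observed range is 20, so the NRMSE is 5%.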

2.5.4. Multistep-Ahead Hybrid Model

The structures of the investigated hybrid models are given in Figure 1. The first and simplest model, HM 1 (Figure 1A), assumed a constant RLys of 0.77 for all test sets based on the weighted average of all permeate and retentate concentration samples taken throughout the training experiment. The weighted average considered the variation in cB,Lys and sample intervals using trapezoid rule integration. For the second structure, HM 2 (Figure 1B), the flux and RLys were predicted separately, using two different black box models. The flux and RLys were fed into the same white box model, which yielded the predicted cB,BSA and cB,Lys after a defined time increment. The developed hybrid models are capable of predicting multiple steps ahead, as depicted in Figure 1D. The multi-step ahead structure uses HM 1, HM 2 or the SFM to predict cB,BSA and cB,Lys for a time increment (dt). The concentrations of the first iteration were used to calculate future fluxes and cBs of the second iteration, and so on. Multiple iterations were performed until the desired stop criterion was reached. In our case, the stop criterion was the final retentate volume.
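The multi-step ahead scheme can be sketched as a simple loop that alternates between a flux prediction and the mass balance until the final retentate volume is reached. The sketch below is illustrative only: `placeholder_flux` is a hypothetical stand-in for the trained black box (it is not the ANN of this study and ignores TMP, CF and cB,Lys), its parameter values are invented, and the constant RLys of 0.77 corresponds to the HM 1 structure.

```python
import math


def placeholder_flux(tmp, cf, c_bsa, c_lys):
    """Hypothetical stand-in for the trained black box, in L/(m^2 h)."""
    k, c_gel = 20.0, 300.0   # invented values, for illustration only
    return max(k * math.log(c_gel / c_bsa), 0.0)


def predict_run(v0, c_bsa0, c_lys0, tmp, cf, v_stop,
                area=0.02, dt=0.01, r_lys=0.77, flux_model=placeholder_flux):
    """Multi-step ahead prediction until the retentate volume v_stop (L).

    Returns a list of (time in h, volume in L, c_BSA, c_Lys in g/L).
    BSA is assumed to be fully retained (R_BSA = 1).
    """
    t, v, c_bsa, c_lys = 0.0, v0, c_bsa0, c_lys0
    history = [(t, v, c_bsa, c_lys)]
    while v > v_stop:
        flux = flux_model(tmp, cf, c_bsa, c_lys)   # black box prediction
        if flux <= 0.0:
            break                                  # no permeate flow left
        dv = area * flux * dt
        if v - dv <= v_stop:                       # shorten the last step
            dv = v - v_stop
            v_new = v_stop
        else:
            v_new = v - dv
        # White box: concentrations of this step feed the next iteration.
        c_bsa = c_bsa * v / v_new
        c_lys = (c_lys * v - dv * c_lys * (1.0 - r_lys)) / v_new
        t += dv / (area * flux)
        v = v_new
        history.append((t, v, c_bsa, c_lys))
    return history


# Hypothetical run: concentrate 1.0 L down to 0.5 L.
run = predict_run(v0=1.0, c_bsa0=10.0, c_lys0=1.0, tmp=1.8, cf=200.0, v_stop=0.5)
final_time, final_volume, final_c_bsa, final_c_lys = run[-1]
```

Replacing `r_lys` with a value predicted at every iteration would correspond to the HM 2 structure, and replacing `flux_model` with the stagnant film equation would correspond to the SFM variant.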
The presented hybrid models were used to predict the evolution of flux and RLys throughout the UF process. Furthermore, the models yielded a prediction for the final cB,BSA and cB,Lys. The final cB,BSA and cB,Lys predictions were compared to the final cB,BSA and cB,Lys measured by SEC-HPLC. The model errors were compared using the NRMSE.
Figure 2 shows a flowchart of the hybrid model methodology applied for crossflow filtration. Training experiments were performed by variations in the parameters that are expected to influence the flux. Following this, the model was trained on this training set with a defined experimental design space. The established models were applied to a validation data set that was not used for training. The model structure was optimized by varying the tuning parameters, e.g., number of nodes in an ANN and adding or removing training parameters. The model with the tuning parameters that led to the lowest error in the validation set was then applied to independent test runs with static process conditions.

2.5.5. Stagnant Film Theory

The presented hybrid models were compared to the established SFM. The SFM derives predictions from the mass transfer model described by convective transport toward the membrane and back-diffusion caused by the concentration gradient [23]. According to the SFM, the flux J is related to the bulk concentration cB of a single component by
$$J = k \cdot \ln\left(\frac{c_G}{c_B}\right) \quad (5)$$
where cG is the gel layer concentration at the membrane surface and k is the mass transfer coefficient that depends on the diffusion coefficient and the thickness of the gel layer [23]. The SFM is valid in the pressure-independent region of the filtration. Since k and cG cannot be adjusted directly during the filtration, a correlation between the adjustable parameters TMP and CF, and k and cG was required. When plotting the flux versus log(cB) for a constant TMP and CF, k was estimated from the slope of the linear regression and cG by extrapolating the regression line to its intersection with the abscissa (Figure A6). It has been shown that this way of calculating k yields more accurate results than the Sherwood correlation [24,25,26] and more solid predictions compared to the osmotic pressure model [27] for similar settings. To compare the SFM to the hybrid models, the black box was replaced by the SFM in Equation (5) using the parameters k and cG instead of TMP and CF (Figure 1C). In test runs, where the TMP and CF conditions were not covered in the training set, k and cG were estimated using linear interpolation.
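The regression-based estimation of k and cG can be sketched as follows. The sketch assumes the natural logarithm, so that the stagnant film equation becomes J = k · ln(cG) − k · ln(cB): the slope of J versus ln(cB) is −k, and the intersection with the abscissa (J = 0) lies at ln(cG). The function name and the synthetic data are hypothetical and do not reproduce values from this study.

```python
import math


def fit_stagnant_film(c_bulk, flux):
    """Estimate k and c_G from fluxes recorded at constant TMP and CF.

    Ordinary least squares of J on ln(c_B):
    slope = -k, intercept = k * ln(c_G).
    """
    x = [math.log(c) for c in c_bulk]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(flux) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, flux))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    k = -slope
    c_gel = math.exp(intercept / k)    # J = 0 where ln(c_B) = ln(c_G)
    return k, c_gel


# Synthetic, noise-free data generated with k = 20 and c_G = 300.
concentrations = [5.0, 10.0, 20.0, 40.0, 80.0]
fluxes = [20.0 * math.log(300.0 / c) for c in concentrations]
k_est, c_gel_est = fit_stagnant_film(concentrations, fluxes)
```

Because the synthetic data follow the stagnant film equation exactly, the fit recovers the generating parameters; with measured fluxes, the quality of the linear fit indicates how well the pressure-independent assumption holds.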

3. Results and Discussion

3.1. Training Data Description

The data sets for training the hybrid models were generated from filtering BSA and lysozyme with a 30 kDa MWCO cellulose-based membrane (Hydrosart). A total of three training sets were generated covering three CFs (100, 200 and 300 mL/min) and five TMPs (0.8, 1.3, 1.8, 2.3 and 2.8 bar). The three training sets containing BSA, lysozyme and a combination of both are shown in Figure 3. In the combined training set, the protein concentration of BSA cB,BSA ranged from 3.77 g/L to 77.93 g/L and that of lysozyme cB,Lys from 0.28 g/L to 3.81 g/L. The concentration ranges for all training sets are summarized in Table A1. For a better comparison of Figure 3A–D, the x-axes of Figure 3C,D are reduced. The entire graphs are given in Figure A2.
Generally, increasing bulk concentrations cB led to lower fluxes, while increasing TMP and CF led to higher fluxes in all training sets. This is in accordance with the underlying mechanisms: higher bulk concentrations lead to higher concentrations in the boundary layer and a more prominent effect of the back diffusion along the concentration gradient. An increased TMP leads to higher convective flow towards the membrane, but also to a faster accumulation of the protein at the boundary layer. High CF decreases the thickness of the concentration polarization layer by rectangular displacement. The training set obtained from experiments using only BSA exhibited higher fluxes than the two-component training set. Additionally, the flux decreased faster during filtration of the two-component solution (Figure 3B) compared to the filtration of lysozyme only (Figure 3D). This indicated an increased membrane resistance through the fouling effect on the Hydrosart membrane caused by lysozyme. Being smaller than the pores, lysozyme adsorbed at the inner pore channels [28,29] and reduced their diameter and subsequently the flux through the membrane and the membrane’s selectivity.
The two-component training set (Figure 3A,B) was used to train the black box of the hybrid models and to obtain the mechanistic model parameters k and cG. The data set with lysozyme solely (Figure 3D) was used for two reasons: first, to investigate the effect of TMP and CF on the permeability of lysozyme and whether RLys had to be recalculated for varying input parameters (Figure A5); second, to correlate the permeate lysozyme concentration with the UV signal on the permeate side. This correlation was used to calculate RLys (Equation (2)) for each observation of the combined training set (Figure 3A,B), using solely the permeate UV signal. Another training experiment was performed with BSA solely (Figure 3C). The observed fluxes and estimated SFM parameters k and cG were used to investigate model behavior and error when lysozyme was present in the test set but absent in the training set.

3.2. Comparison of the Hybrid Models to the Stagnant Film Theory

The optimal ANN structure in the hybrid models was determined by varying the number of hidden nodes from one to seven and recording the average error of 20 repetitions on the training set. The ANN with four hidden nodes yielded the lowest NRMSE for both HM 1 and HM 2, with an average of 3.4% NRMSE. Higher numbers of hidden nodes led to an error increase due to training set overfitting (Figure A1).
With the SFM, the flux can only be modeled for a one-component system; no adaptations for a two- or multi-component system have been published in the literature so far. In the following, BSA was assumed to be the only component since its concentration was four to 46 times higher than that of lysozyme in the test runs (Table A2). The k and cG values of BSA, however, change in the presence of lysozyme. To allow a fair comparison between the hybrid models (which can incorporate multiple components as inputs) and the SFM, two sets of k and cG were evaluated: one derived from the experiments with BSA alone and one from the experiments with the combination of BSA and lysozyme. Each set was used for flux prediction and the results were compared to the predictions of the hybrid models.
The hybrid model trained solely on BSA (Figure 4, red dotted line) and the SFM using k and cG based solely on BSA (Figure 4, dark grey dot-dashed line) were able to predict a UF process with only BSA present (Figure 4A, black line), but failed to predict the UF flux of BSA and lysozyme (Figure 4B, black line). The latter failed due to membrane fouling by lysozyme and therefore the reduced flux and prolonged process times could not be described by any of these models.
In contrast, the hybrid model trained with both the BSA (Figure 3C) and the BSA with lysozyme (Figure 3A,B) training runs (Figure 4, blue dashed line) was able to predict both UF processes: BSA solely and BSA with lysozyme (Figure 4, black lines). These results showed that even low amounts of lysozyme drastically changed the initial flux and flux evolution of the UF process and that incorporating both components in the model was essential for accurate predictions. The SFM based on the training run containing BSA with lysozyme was also able to predict the two-component test run well (Figure 4B, light grey dot-dot-dashed line), but showed a drastic offset when predicting a test run with only BSA (Figure 4A, light grey dot-dot-dashed line). The k values from the two-component training set (Table A5) were generally lower than those calculated from solely BSA, since membrane fouling due to lysozyme was assumed. In the absence of lysozyme, however, no membrane fouling occurred and the flux for the same cB,BSA was higher.
In summary, the HM could predict both scenarios, since the varying concentration of lysozyme and its influence on the membrane fouling was integrated into the black box. However, the SFM only predicted one scenario well, depending on which k and cG were used. For the following two-component predictions, the SFM parameters were based on the two-component training set.

3.3. Comparison of Hybrid Model Performance

To further investigate the interpolation and extrapolation capabilities of both HMs and the SFM, a series of test runs was conducted under conditions that were partially not covered by the training sets. To test the hybrid models based on the two-component training set, additional test runs on BSA solutions spiked with lysozyme were performed. The two established hybrid model structures were compared for their RLys, flux and final cB predictions individually. RLys affects the in-process cB,Lys prediction and subsequently the flux and final cB,Lys. Additionally, the two hybrid models were compared to the SFM in terms of flux and cB,BSA prediction. cB,Lys and RLys could not be compared, since the SFM can be applied to one component only.
The test data consisted of nine UF runs performed at different TMP, CF, initial cB,BSA and cB,Lys conditions. Test runs 1−4 were performed within the training space. This meant that TMP and CF were within the training parameters (Figure 5A, blue area) and the initial cB,BSA and cB,Lys were higher than the initial training concentrations (Figure 5B, blue area). Test runs 1, 2 and 9 were performed in the center of the TMP and CF training space (Figure 5A), with test run 9 containing no lysozyme. Test run 3 was performed at the outer limit of the TMP and CF training space, to investigate how the predictions of the hybrid models changed at the border. Test run 4 was performed under TMP and CF conditions not covered by the training set but within the training space, to investigate the interpolation capabilities of the model. Test runs 5, 6, 7 and 8 were performed under conditions that were partially outside the training space, such as initial cB,Lys (8), initial cB,BSA (5, 6) and CF (7), to test the extrapolation capabilities. The test run parameters are summarized in Figure 5 and Table A2.
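Whether a test condition requires interpolation or extrapolation can be checked against the ranges covered in training. The following sketch uses the TMP and CF training ranges stated above (0.8 to 2.8 bar and 100 to 300 mL/min). The function name, the dictionary keys and the TMP value of the example point are assumptions for illustration; only the CF of 350 mL/min corresponds to a condition mentioned for test run 7.

```python
# Ranges of the process parameters covered by the training experiments.
TRAINING_RANGES = {
    "TMP_bar": (0.8, 2.8),
    "CF_mL_min": (100.0, 300.0),
}


def classify_point(point, ranges=TRAINING_RANGES):
    """Return ('interpolation', []) if all inputs lie inside the training
    ranges, otherwise ('extrapolation', [names of inputs outside])."""
    outside = [
        name for name, (low, high) in ranges.items()
        if not low <= point[name] <= high
    ]
    if outside:
        return "extrapolation", outside
    return "interpolation", []


# CF of 350 mL/min lies outside the training range; the TMP is hypothetical.
outside_case = classify_point({"TMP_bar": 1.8, "CF_mL_min": 350.0})
inside_case = classify_point({"TMP_bar": 1.8, "CF_mL_min": 200.0})
```

A simple box check like this ignores the concentration inputs and any gaps within the box, so it is only a first indication of how far a prediction relies on extrapolation.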

3.3.1. Flux Prediction

Regarding the prediction of the flux evolution, the two hybrid models performed similarly (Figure 6A,C,E, Figure 7A,C,E and Figure A4A,C,E). Most test run predictions exhibited a small initial offset. At the beginning of the test experiments, the membrane was clean, while during the training set the membrane exhibited some lysozyme fouling and equilibrium of the concentration polarization layer due to the long training process time. This led to an initially underestimated flux. The offset became more pronounced when initial cB,Lys was higher than 0.3 g/L (test runs 2, 3, 4, 5 and 8; Figure 6C,E, Figure 7A and Figure A4A,C), indicating a stronger membrane fouling at this concentration. Even though all hybrid models were trained with cB,Lys higher than 0.3 g/L, the membrane resistance, which increased over time due to fouling, reached an equilibrium only after several minutes. After this point, the flux was predicted correctly. The highest initial offset was observed in test run 8 (Figure A4A), which exhibited the highest initial cB,Lys and therefore more fouling. Test run 7 (Figure 7E) was performed at a CF of 350 mL/min, which was outside the training space. Both hybrid models predicted the flux of test run 7 well, indicating that the models were not necessarily limited by the training space and showed good extrapolation capabilities for the input parameter CF. Test runs 4, 5 and 6 (Figure 6C,E and Figure 7C) exhibited TMPs and CFs within the training space parameters and were all predicted well. The good flux predictions of these test runs showed the excellent interpolation capabilities of the ANN-aided hybrid models.
The SFM predicted the initial flux and flux evolution inside the training space well (test runs 1, 3 and 4; Figure 6A,C, and Figure A4C). However, for the test runs outside the training space, higher errors were exhibited (test runs 5, 6 and 8; Figure 6E, Figure 7C and Figure A4A). Outside the training space, k and cG were extrapolated from the training data, which potentially increased flux prediction uncertainty. Furthermore, high lysozyme concentrations also led to higher errors, because fouling became stronger over time and the second component could not be incorporated in the SFM. Here, the SFM underestimated the initial flux drastically (test runs 2 and 8; Figure 7A and Figure A4A). For test run 9 (Figure A4E), which contained only BSA and no lysozyme, the SFM parameters k and cG were, as an exception, based on the BSA training data (Figure 3C) to allow a fair comparison. In this case, the SFM yielded good initial flux predictions but deviated at the end of the process, while HM 1 and 2 both showed excellent flux prediction over the entire process. On average, the flux prediction error of the SFM was 6% NRMSE, while the error of the two hybrid models was 4.1% and 3.9% NRMSE (Figure 8A).

3.3.2. Rejection Factor Prediction for Lysozyme

The rejection factor for lysozyme RLys increased throughout the UF run, from around 0.6 to almost 1.0, as shown in Figure 6, Figure 7 and Figure A4. The pores became increasingly blocked throughout the UF process, most probably because lysozyme was adsorbed on their inner walls, increasing the rejection factor. Results showed that there was no consistent correlation between the TMP and RLys, or CF and RLys (Figure A5). Therefore, the influence of TMP and CF on cP,Lys was neglected when creating the calibration between UV absorbance and lysozyme permeate concentration. The rejection factor of BSA was 1 for all experiments. The model errors are given in Figure 8B.

Hybrid Model 1: Constant Lysozyme Rejection Factor

In HM 1, the rejection factor for lysozyme RLys was assumed to be constant over time for all test runs in which lysozyme was present. HM 1 therefore exhibited a larger RLys error (38% NRMSE) than HM 2 (see Figure 8A). All test runs (Figure 6, Figure 7 and Figure A4) show that HM 1 overestimated RLys at the beginning of all UF runs and underestimated it at the end. The average RLys based on training data fitted all independently generated test data very well but lacked the ability to adjust to the increasing RLys.

Hybrid Model 2: Dynamic Lysozyme Rejection Factor

In contrast to keeping the rejection factor constant, as in HM 1, a second black box was introduced in HM 2 to predict RLys dynamically. This prediction was independent of the flux prediction but was based on the same four input parameters, namely TMP, CF, initial cB,BSA and initial cB,Lys. The NRMSE of the newly introduced black box was evaluated by comparing the observed RLys values to the predicted RLys. Since the correlation of RLys and VP is quite simple, an ANN with one hidden node was used for RLys prediction (Figure A1C). For comparison, a multiple linear regression (MLR) model was also tested as an alternative black box, resulting in a less complex hybrid model that required less computation time and facilitated easier interpretability. However, the ANN with one node was chosen instead of the MLR, because of the lower prediction error regarding RLys and final cB,Lys (see Table A4).
HM 2, with an average RLys NRMSE of 14%, performed better than HM 1. The improvement was achieved as HM 2 considered the increasing RLys over the process, which subsequently strongly influenced the final cB,Lys prediction (Section 3.3.3). In test runs 5 and 6 (Figure 6F and Figure 7D) the prediction from HM 2 overestimated RLys. These test runs exhibited a low TMP and high cB,BSA. The hybrid model assumed that the CP layer of BSA and fouling due to lysozyme were at an equilibrium, at which the lysozyme transmission was lower than in the test runs, where the CP layer was still building up. Low TMP additionally prolonged the time to reach flux steady state. RLys of the other test runs 1, 2, 4, 7 and 8 (Figure 6B,D, Figure 7B,F and Figure A4B) were predicted accurately with HM 2.
Even though HM 2 performed better than HM 1 in RLys prediction, the flux predictions of the two models were almost identical (NRMSE 3.9% and 4.1%). This indicated that the flux predictions were not affected by small changes in cB,Lys.
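The error metric used for these comparisons can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes the RMSE is normalized by the range of the observed values (this section does not state which normalization was used), and the function name and the example numbers are hypothetical.

```python
import math


def nrmse_percent(observed, predicted):
    """RMSE divided by the observed range, in percent.

    Normalizing by the range is an assumption made for this sketch.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    value_range = max(observed) - min(observed)
    if value_range == 0:
        raise ValueError("observed values have zero range")
    return 100.0 * math.sqrt(mse) / value_range


# Illustrative values only (not measured data): a rejection factor that rises
# during the run, compared with a constant prediction as used in HM 1.
r_observed = [0.60, 0.70, 0.80, 0.90, 1.00]
r_constant = [0.80] * len(r_observed)
print(round(nrmse_percent(r_observed, r_constant), 1))
```

A constant prediction overestimates the early values and underestimates the late ones, so its error stays large even when the mean is matched.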

3.3.3. Endpoint Bulk Concentration

Since RBSA was 1, all models (HM 1, HM 2 and SFM) predicted the same cB,BSA at the final retentate volume, with an average error of 4.2% (Table A3). BSA did not show membrane fouling and was quantitatively recovered at the end of the process. The predictions of the final cB,Lys varied because of the different RLys predictions. The discussion of the cB,Lys predictions is divided into the test runs performed strictly inside the training space and those performed at least partly outside it, since the hybrid models performed differently in the two cases.
Within the training space (test runs 1−4), HM 1 and HM 2 performed in accordance with the RLys predictions (Figure 8C). HM 1 exhibited the higher error of 9% since RLys was not adjusted over the processing time. HM 2 recalculated RLys with every iteration; its cB,Lys predictions were in good accordance with the measured concentrations, with an NRMSE of 4%, and were superior to those of HM 1. As with RLys, the accuracy of the final cB,Lys prediction benefited from two separately trained black boxes.
In cases where at least one input parameter was outside the training space (test runs 5−8), HM 1 performed best with an average NRMSE of 4% (Figure 8D). In comparison, HM 2 yielded worse final cB,Lys predictions, exhibiting a three-fold increase in NRMSE (12%). Even though RLys was updated in HM 2, it was overestimated throughout most of these test runs, leading to higher cB,Lys predictions and a cumulative error that increased with the duration of the process. In contrast, with HM 1 the initial over-prediction and later under-prediction of RLys balanced out and yielded acceptable final cB,Lys predictions.
In summary, the more complex HM 2 showed superior performance within the trained space, which is the relevant case for most modeling applications. It can predict the flux, RLys and, therefore, the concentrations of both components at any time point of the process. For predictions outside the trained space, the simpler and more robust HM 1 performed better, giving accurate predictions of the flux and of the fully retained main component BSA. It can offer valuable insights when exploring parameter ranges if the desired optimal process conditions are not met in the trained space, before the trained space is expanded and used to retrain new hybrid models.
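How the rejection factor feeds into the final bulk concentration can be sketched with a simple solute mass balance. This is a minimal illustration, not the published model: it steps in permeate volume (the counterpart of J · A · dt in a time-stepped model), uses the standard definition R = 1 − cP/cB, and all function names and rejection profiles are hypothetical.

```python
def concentrate(c0, v0, v_end, rejection, dv=1e-3):
    """Return the bulk concentration [g/L] after reducing the reservoir from v0 to v_end [L].

    rejection: callable mapping the cumulative permeate volume [L] to R (0..1).
    """
    if not 0 < v_end <= v0:
        raise ValueError("need 0 < v_end <= v0")
    c, v, vp = c0, v0, 0.0
    while v - v_end > 1e-12:
        step = min(dv, v - v_end)
        c_perm = (1.0 - rejection(vp)) * c   # from R = 1 - cP/cB
        mass = c * v - c_perm * step         # solute remaining in the reservoir
        v -= step
        vp += step
        c = mass / v
    return c


# Fully retained component (R = 1, as observed for BSA): pure volume reduction.
c_bsa = concentrate(4.0, 1.0, 0.1, lambda vp: 1.0)

# Partly transmitted component with a constant R (HM 1 style) and with an R
# that rises with permeate volume (HM 2 style); both profiles are made up.
c_const = concentrate(0.28, 1.0, 0.1, lambda vp: 0.8)
c_rising = concentrate(0.28, 1.0, 0.1, lambda vp: min(1.0, 0.6 + 0.4 * vp / 0.9))
print(c_bsa, c_const, c_rising)
```

Because every step multiplies the current concentration by a factor that depends on R, an R that is biased in one direction for the whole run accumulates in the final concentration, whereas early and late errors of opposite sign can partly cancel.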

4. Conclusions

UF modeling increases process understanding, which is key for predicting process performance. The interactions of various components mean that mechanistic modeling approaches for multi-component solutions might become very complex and require many experiments.
We developed and compared hybrid models to predict flux, rejection behavior and concentrations for UF of two-component solutions. The models were trained on experiments that were generated in less than eight hours and tested on independently performed UF runs with varying product and impurity concentrations, TMPs and CFs. We showed that the hybrid model HM 2, with a dynamic impurity rejection factor and two black boxes, gave the best predictions of impurity rejection behavior and final concentration within the trained parameter space and had excellent interpolation properties. The simpler HM 1 yielded stable predictions beyond the trained space, rendering it a valuable tool for extrapolation. Both hybrid models performed similarly well in predicting the flux and the concentration of the mimicked product. The SFM with mechanistic parameters exhibited higher flux prediction errors than both hybrid models and could not predict the lysozyme rejection factor and final concentration, since it can only assume a one-component system. Our results show that it is crucial to quantify and incorporate all components, including the impurities, to obtain accurate and reliable process models. These variations can be included more easily in the hybrid model approach than in mechanistic models such as the SFM, with low experimental effort and no adaptation of mechanistic parameters required.
A limitation of the presented models is the time-dependent fouling of the mimicked impurity at high initial concentrations. However, at the expected concentration ranges, e.g., after the chromatography capture step, the effect can be neglected.
The proposed hybrid model structure can be used not only for the reliable prediction of final product concentrations, but also for the concentrations of various quantifiable classes of impurities. Since impurities are a critical quality attribute (CQA) in many manufacturing bioprocesses, time-resolved concentration predictions help to better understand the process outcome upfront. Furthermore, by taking adequate measures, a potential batch rejection due to high impurity concentration can be avoided. The product and impurities can be measured with online sensors or correlated with offline analytics using soft sensors. In combination with closed-loop process controllers, these hybrid models are a valuable tool for increased process understanding and advanced process control.

Author Contributions

Conceptualization, M.D.; methodology, M.K.; software, M.K. and I.B.-M.; validation, M.K. and I.B.-M.; formal analysis, M.K.; investigation, M.K.; resources, A.D.; data curation, M.K. and I.B.-M.; writing—original draft preparation, M.K.; writing—review and editing, M.K., M.D., A.D.; visualization, M.K.; supervision, A.D. and M.D.; project administration, M.D.; funding acquisition, M.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Austrian Research Promotion Agency (FFG), grant number 859219.

Conflicts of Interest

The authors declare no conflict of interest.

Symbols and Abbreviations

ANN: artificial neural network
BSA: bovine serum albumin
CF: cross-flow velocity
CP: concentration polarization
HM: hybrid model
MWCO: molecular weight cutoff
NRMSE: normalized root-mean-squared error
SEC: size exclusion chromatography
SFM: stagnant film model
TMP: transmembrane pressure
UF: ultrafiltration
A: membrane area [m²]
cB: bulk concentration [g/L]
cG: gel layer concentration [g/L]
cP: permeate concentration [g/L]
cR: retentate concentration [g/L]
dt: time increment [s]
J: permeate flux [LMH] or [m/s]
k: mass transfer coefficient [LMH]
RLys: lysozyme retention coefficient [-]
VB: bulk/reservoir volume [mL]
VP: permeate volume [mL]

Appendix A

Appendix A.1. Neural Network Model Optimization

To choose the best-suited ANN structure, varying numbers of hidden nodes were tested. Each ANN was trained on the combined training set and validated. Figure A1A gives an overview of the optimal ANN structure with four hidden neurons, including the inputs TMP, CF, cB,BSA and cB,Lys and the output permeate flux (in HM 2 a second ANN with RLys as output was added). The input and output parameters were scaled between 0 and 1 before optimizing the ANN. This step is necessary to bring the parameters onto the same scale, rendering them comparable. Each node in the hidden and output layer in Figure A1A forms a linear equation. As an example, the first hidden node $x_{21}$ is the sum of each multiplication of an input (TMP, CF, cB,BSA, cB,Lys) and the corresponding weight ($w_{11}^{1}$, $w_{21}^{1}$, $w_{31}^{1}$, $w_{41}^{1}$) multiplied with the bias ($b_{1}$) of the entire hidden layer.
$$x_{21} = b_{1}\, w_{11}^{1}\, \mathrm{TMP}_{\mathrm{scaled}} + w_{21}^{1}\, \mathrm{CF}_{\mathrm{scaled}} + w_{31}^{1}\, c_{B,\mathrm{BSA},\mathrm{scaled}} + w_{41}^{1}\, c_{B,\mathrm{Lys},\mathrm{scaled}}$$
To determine the values for the weights and biases that result in the desired prediction, the model is optimized over multiple epochs. As a first step, the weights and biases are chosen randomly and a first prediction is made with inputs from a given training set. Since the weights and biases are not yet optimized, the flux prediction will be of poor quality and the prediction error will be high. Using the desired output from the training set, the ANN is calculated backward, which results in input parameters that fit the prediction. The error between the real and the backward-calculated inputs is estimated and used to update the corresponding weights and biases. This optimization can be performed with different algorithms; in this publication we chose MATLAB’s trainbr function, which employs Bayesian regularization, an adaptation of the Levenberg-Marquardt optimizer that minimizes a combination of squared errors and weights. Once the weights and biases are optimized, the ANN structure is defined and applied to the test sets, where it will predict the same results for a given set of inputs.
The presented ANN structure (Figure A1A) was determined after screening hidden layers with one to seven nodes, with the corresponding NRMSE being recorded and averaged. We chose an ANN with four hidden nodes for all hybrid model structures (Figure A1B) because it exhibited the lowest average error and standard deviation. Structures with fewer than four nodes resulted in under-fitted models. More hidden nodes led to over-fitting of the training data and a higher prediction error and standard deviation. In HM 2 the RLys black box ANN was optimized in the same way, with one hidden node yielding the lowest error (Figure A1C). The tested ANNs consisted of one hidden layer with a sigmoid activation function and linear activation functions in both the input and output layer. The inputs were normalized between 0 and 1.
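The scaling of inputs to the interval from 0 to 1 can be sketched as follows. This is a generic min-max scaling example in Python, not the MATLAB code used in the study; the function names are hypothetical, and only TMP and CF are shown, using the corner values of the training grid (TMP 0.8 to 2.8 bar, CF 100 to 300 mL/min).

```python
def fit_min_max(rows):
    """Return one (min, max) pair per input column of the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(row, bounds):
    """Scale one row with the training bounds; values outside the bounds fall outside 0..1."""
    return [(x - lo) / (hi - lo) for x, (lo, hi) in zip(row, bounds)]


# (TMP [bar], CF [mL/min]) points spanning the training grid
training_inputs = [(0.8, 100.0), (1.8, 200.0), (2.8, 300.0)]
bounds = fit_min_max(training_inputs)

inside = scale((1.8, 200.0), bounds)    # center of the training grid
outside = scale((2.0, 350.0), bounds)   # TMP and CF of test run 7
print(inside, outside)
```

Fitting the bounds on the training data only has a useful side effect: a scaled value above 1 or below 0, such as the CF of test run 7, directly flags an input that lies outside the trained space.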
Figure A1. (A) Structure of the ANN including input, output and hidden layer, number of hidden nodes and activation functions. (B,C) Optimization of the ANN structure. (B) The NRMSE is plotted over the number of neurons in the hidden layer of the ANN of the permeate flux predicting black box. (C) The NRMSE of the RLys black box HM 2.

Appendix A.2. Experimental Data Summary

Table A1 gives a summary of the measured cB,BSA and cB,Lys for the three training sets containing BSA with lysozyme, only lysozyme and only BSA. Training sets 2 and 3 each consisted of two parts with different starting cBs to cover a wide cB space. For each cB the TMP and CF were increased stepwise in three-minute intervals. Afterward, cB was increased by concentrating the sample and removing the generated permeate from the reservoir.
Table A1. Summary of all training data including measured cB,BSA, cB,Lys, and observed flux.
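| Training Set | Observation | cB,Lys [g/L] | cB,BSA [g/L] | Flux at TMP 0.8 bar [LMH] | Flux at TMP 1.3 bar [LMH] | Flux at TMP 1.8 bar [LMH] | Flux at TMP 2.3 bar [LMH] | Flux at TMP 2.8 bar [LMH] |
|---|---|---|---|---|---|---|---|---|
| 1 | 100 | 0.28 | 3.77 | 87.6 | 113.3 | 121.0 | 121.5 | 119.1 |
| 1 | 100 | 0.5 | 8.48 | 78.0 | 93.3 | 96.3 | 95.6 | 93.4 |
| 1 | 100 | 0.76 | 14.38 | 70.1 | 80.5 | 81.6 | 80.0 | 77.7 |
| 1 | 100 | 1.51 | 24.68 | 60.7 | 67.5 | 67.5 | 65.9 | 63.9 |
| 1 | 100 | 2.3 | 48.72 | 46.8 | 50.7 | 50.1 | 48.6 | 47.1 |
| 1 | 100 | 3.81 | 77.93 | 34.0 | 36.6 | 36.2 | 35.1 | 34.0 |
| 1 | 200 | 0.28 | 3.77 | 92.9 | 136.3 | 157.6 | 164.5 | 164.2 |
| 1 | 200 | 0.5 | 8.48 | 89.3 | 121.2 | 131.9 | 132.7 | 130.3 |
| 1 | 200 | 0.76 | 14.38 | 82.8 | 106.5 | 112.3 | 111 | 108.3 |
| 1 | 200 | 1.51 | 24.68 | 74.2 | 91.1 | 93.5 | 91.6 | 88.8 |
| 1 | 200 | 2.3 | 48.72 | 59.2 | 68.6 | 69.2 | 67.1 | 64.7 |
| 1 | 200 | 3.81 | 77.93 | 43.3 | 49.3 | 49.4 | 47.8 | 46.2 |
| 1 | 300 | 0.28 | 3.77 | 92.0 | 143.4 | 178.3 | 194.8 | 199.8 |
| 1 | 300 | 0.5 | 8.48 | 90.2 | 133.8 | 155.6 | 161.5 | 161.1 |
| 1 | 300 | 0.76 | 14.38 | 85.6 | 121.5 | 135.2 | 136.9 | 134.7 |
| 1 | 300 | 1.51 | 24.68 | 78.8 | 106.2 | 114.0 | 113.4 | 110.7 |
| 1 | 300 | 2.3 | 48.72 | 65.1 | 82.2 | 85.1 | 83.2 | 80.6 |
| 1 | 300 | 3.81 | 77.93 | 48.4 | 59.2 | 60.4 | 58.9 | 56.9 |
| 2a | 100 | 3.19 | - | 97.6 | 150.1 | 195.1 | 232.6 | 260.6 |
| 2a | 100 | 4.73 | - | 97.9 | 146.5 | 186.1 | 218.2 | 244.8 |
| 2a | 100 | 8.11 | - | 97.6 | 144.7 | 181.1 | 209.5 | 235.8 |
| 2a | 100 | 11.99 | - | 96.8 | 142.2 | 176.8 | 205.2 | 231.8 |
| 2a | 100 | 23.79 | - | 95.0 | 137.5 | 171.0 | 199.1 | 221.8 |
| 2a | 100 | 32.95 | - | 85.7 | 124.9 | 152.3 | 172.4 | 187.2 |
| 2a | 200 | 0.39 | - | 98.3 | 151.9 | 196.6 | 233.6 | 264.2 |
| 2a | 200 | 0.44 | - | 99 | 150.8 | 192.2 | 224.3 | 252.4 |
| 2a | 200 | 0.53 | - | 98.3 | 148.7 | 187.9 | 218.9 | 244.8 |
| 2a | 200 | 0.72 | - | 98.3 | 146.9 | 184.7 | 215.3 | 241.6 |
| 2a | 200 | 0.9 | - | 96.8 | 143.3 | 178.9 | 208.4 | 232.9 |
| 2a | 200 | 1.48 | - | 89.3 | 128.9 | 160.9 | 186.5 | 204.1 |
| 2a | 300 | 3.19 | - | 97.2 | 152.28 | 198.36 | 236.52 | 267.84 |
| 2a | 300 | 4.73 | - | 97.56 | 150.84 | 195.48 | 229.68 | 258.12 |
| 2a | 300 | 8.11 | - | 96.84 | 149.4 | 192.24 | 225 | 252.36 |
| 2a | 300 | 11.99 | - | 96.84 | 148.32 | 189.36 | 222.12 | 249.84 |
| 2a | 300 | 23.79 | - | 96.12 | 145.08 | 183.6 | 215.28 | 240.84 |
| 2a | 300 | 32.95 | - | 86.76 | 129.96 | 165.6 | 192.6 | 212.76 |
| 2b | 100 | 3.19 | - | 86.0 | 113.8 | 121.0 | 115.6 | 105.5 |
| 2b | 100 | 4.73 | - | 67.3 | 81.0 | 84.2 | 82.4 | 79.9 |
| 2b | 100 | 8.11 | - | 59.8 | 69.8 | 71.6 | 70.6 | 68.8 |
| 2b | 100 | 11.99 | - | 55.1 | 62.6 | 63.4 | 61.9 | 60.5 |
| 2b | 100 | 23.79 | - | 45.7 | 50.0 | 49.7 | 47.9 | 46.8 |
| 2b | 100 | 32.95 | - | 36.0 | 37.8 | 36.4 | 35.3 | 34.2 |
| 2b | 200 | 3.19 | - | 88.2 | 121.7 | 135.7 | 137.9 | 134.3 |
| 2b | 200 | 4.73 | - | 74.5 | 96.1 | 105.8 | 108.4 | 108.0 |
| 2b | 200 | 8.11 | - | 67.7 | 85.7 | 92.9 | 94.3 | 93.6 |
| 2b | 200 | 11.99 | - | 63.0 | 78.5 | 83.9 | 84.2 | 82.8 |
| 2b | 200 | 23.79 | - | 54.0 | 64.8 | 67.0 | 65.5 | 63.4 |
| 2b | 200 | 32.95 | - | 43.9 | 50.0 | 49.7 | 47.9 | 46.8 |
| 2b | 300 | 3.19 | - | 85.0 | 121.0 | 140.0 | 148.7 | 151.2 |
| 2b | 300 | 4.73 | - | 74.9 | 102.6 | 117.7 | 125.6 | 128.5 |
| 2b | 300 | 8.11 | - | 69.5 | 93.2 | 105.8 | 111.6 | 113.0 |
| 2b | 300 | 11.99 | - | 65.5 | 87.5 | 97.6 | 100.8 | 100.8 |
| 2b | 300 | 23.79 | - | 57.2 | 73.8 | 79.2 | 79.6 | 78.1 |
| 2b | 300 | 32.95 | - | 50.0 | 62.6 | 64.8 | - | - |
| 3a | 100 | - | 3.65 | 90.8 | 119.6 | 131.9 | 136.5 | 138.3 |
| 3a | 100 | - | 7.45 | 84.9 | 106.8 | 114.8 | 117.4 | 117.2 |
| 3a | 100 | - | 14.65 | 76.6 | 92.4 | 97.0 | 97.3 | 96.4 |
| 3a | 100 | - | 20.06 | 68.2 | 82.1 | 85.5 | 85.1 | 83.3 |
| 3a | 100 | - | 24.4 | 67.2 | 78.7 | 82.0 | 82.0 | 80.8 |
| 3a | 100 | - | 44.78 | 53.2 | 61.4 | 63.2 | 62.9 | 61.6 |
| 3a | 200 | - | 3.65 | 97.9 | 144 | 171.7 | 185.4 | 191.5 |
| 3a | 200 | - | 7.45 | 96.1 | 135 | 154.4 | 162 | 164.5 |
| 3a | 200 | - | 14.65 | 90.4 | 119.9 | 132.1 | 135.7 | 135 |
| 3a | 200 | - | 20.06 | 81.4 | 108.4 | 118.4 | 119.9 | 118.1 |
| 3a | 200 | - | 24.4 | 82.1 | 104.8 | 113 | 114.5 | 113 |
| 3a | 200 | - | 44.78 | 67.3 | 82.4 | 86.8 | 86.8 | 85 |
| 3a | 300 | - | 3.65 | 98.4 | 151.4 | 190.8 | 214.9 | 228.5 |
| 3a | 300 | - | 7.45 | 98.6 | 147.2 | 178.4 | 194.4 | 201.0 |
| 3a | 300 | - | 14.65 | 93.8 | 134.4 | 155.9 | 164.9 | 167.2 |
| 3a | 300 | - | 20.06 | 87.5 | 124.9 | 143.4 | 149.2 | 149.0 |
| 3a | 300 | - | 24.4 | 87.6 | 120.8 | 136.0 | 141.1 | 141.3 |
| 3a | 300 | - | 44.78 | 74.6 | 97.9 | 106.1 | 107.9 | 106.8 |
| 3b | 100 | - | 70.12 | 40.7 | 46.4 | 47.3 | 46.6 | 45.5 |
| 3b | 100 | - | 80.15 | 39.4 | 45.5 | 46.9 | 46.6 | 45.8 |
| 3b | 100 | - | 131.69 | 24.0 | 28.8 | 29.9 | 29.6 | 29.0 |
| 3b | 100 | - | 179.95 | 13.6 | 18.1 | 19.6 | 19.7 | 19.7 |
| 3b | 100 | - | 231.67 | 6.3 | 10.9 | 13.1 | 13.8 | 13.7 |
| 3b | 100 | - | 277.25 | 0.0 | 5.5 | 7.7 | 8.8 | 9.4 |
| 3b | 200 | - | 70.12 | 53.6 | 64.1 | 66.2 | 65.5 | 63.4 |
| 3b | 200 | - | 80.15 | 50.8 | 61.2 | 63.7 | 63.4 | 62.6 |
| 3b | 200 | - | 131.69 | 31.8 | 38.9 | 40.3 | 40 | 39.2 |
| 3b | 200 | - | 179.95 | 17.3 | 23.5 | 25.7 | 26 | 25.7 |
| 3b | 200 | - | 231.67 | 7.2 | 13.8 | 16.3 | 17.4 | 17.5 |
| 3b | 200 | - | 277.25 | 5.9 | 9.4 | 10.8 | - | - |
| 3b | 300 | - | 70.12 | 62.6 | 79.0 | 83.3 | 82.6 | 63.5 |
| 3b | 300 | - | 80.15 | 58.2 | 73.3 | 78.0 | 78.3 | 62.5 |
| 3b | 300 | - | 131.69 | 37.1 | 46.8 | 49.5 | 49.3 | 39.2 |
| 3b | 300 | - | 179.95 | 19.5 | 27.5 | 30.3 | 30.9 | 25.7 |
| 3b | 300 | - | 231.67 | - | - | - | - | 17.5 |
| 3b | 300 | - | 277.25 | - | - | - | - | - |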
Table A2 summarizes the parameters chosen for the test experiments: TMP, CF, initial cB, and measured final cB of BSA and lysozyme.
Table A2. Summary of the test sets with varying initial cBs, CF, and TMP.
| Test Set Number | TMP [bar] | CF [mL/min] | Initial cB,BSA [g/L] | Final cB,BSA [g/L] | Initial cB,Lys [g/L] | Final cB,Lys [g/L] |
|---|---|---|---|---|---|---|
| 1 | 1.8 | 200 | 4.00 | 78.11 | 0.28 | 4.36 |
| 2 | 1.8 | 200 | 3.79 | 62.48 | 0.50 | 6.16 |
| 3 | 2.8 | 300 | 3.82 | 54.95 | 0.32 | 4.35 |
| 4 | 2.5 | 280 | 4.56 | 97.59 | 0.28 | 3.52 |
| 5 | 1.6 | 230 | 5.97 | 132.81 | 0.15 | 1.96 |
| 6 | 1.4 | 270 | 8.80 | 162.45 | 0.19 | 2.79 |
| 7 | 2.0 | 350 | 3.62 | 73.62 | 0.34 | 4.65 |
| 8 | 1.8 | 260 | 2.38 | 45.42 | 0.57 | 6.82 |
| 9 | 1.8 | 200 | 6.68 | 132.70 | 0.00 | 0.00 |
Figure A2 gives the entire training data sets at CF 200 mL/min.
Figure A2. Training data sets including different protein concentrations, TMP at CF 200 mL/min. (A,B) Multi-component training set containing BSA and lysozyme (blue) in the same solution and one component solution of (C) BSA (red) and (D) lysozyme (green).
Figure A3 gives the correlation curve to calculate RLys from the UV absorbance at 280 nm using Equation (2). The curve was calculated using the lysozyme training data (Table A1, Training set 2) and exhibits a correlation coefficient of 0.97.
Figure A3. (A) Calibration of UV absorbance at 280 nm on the permeate side of the membrane versus cP,Lys of the lysozyme training set measured with HP-SEC (R2 = 0.97). (B) Deviations between calculated and measured cB,BSA with an R2 of 0.9997, including the identity line (dotted line). The concentration steps were performed at TMP 1.8 bar and CF 200 mL/min.
The applied polynomial function to correct for the CP layer was:
$$c_{\mathrm{meas},B,\mathrm{BSA}} = -0.0012 \cdot c_{\mathrm{calc},B,\mathrm{BSA}}^{2} + 1.0063 \cdot c_{\mathrm{calc},B,\mathrm{BSA}}$$
This function is specific to the protein and the membrane and must be adapted for new protein-membrane combinations.
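As a worked example, the correction can be applied in a few lines of Python. The coefficients are those of the polynomial above; the function name is hypothetical, and the input value of 100 g/L is chosen only for illustration.

```python
def correct_bsa_concentration(c_calc):
    """Map the calculated bulk BSA concentration [g/L] to the expected measured value [g/L]."""
    return -0.0012 * c_calc ** 2 + 1.0063 * c_calc


c_corrected = correct_bsa_concentration(100.0)
print(round(c_corrected, 2))  # -12 + 100.63 = 88.63
```

The quadratic term grows with concentration, so the correction is negligible at a few g/L and exceeds 10% at 100 g/L.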

Appendix A.3. Further Modeling Results

Figure A4 gives the flux predictions of all three model structures for test runs 8, 3, and 9.
Figure A4. Comparison between observed and predicted flux and RLys. (A) The flux of test run 8 over time, (B) RLys over permeate volume of test run 8, (C) the flux over time of test run 3, (D) RLys over permeate volume of test run 3, (E) the flux over time of test run 9.
Table A3 contains the prediction errors of HM 1, HM 2 and SFM regarding flux, RLys, final cB,BSA and final cB,Lys. The cB,BSA predictions are identical for all models because RBSA is 1. Test run 9 contained no lysozyme; therefore, no RLys and cB,Lys errors were calculated. The HM 2 NRMSE for final cB,Lys shows clear differences between test runs 1−4 (input parameters inside the trained space) and test runs 5−8 (at least one input parameter outside the trained space).
Table A3. Prediction errors (NRMSE) of HM 1, HM 2 and SFM for flux, RLys, final cB,Lys and final cB,BSA of all test sets.
| Test Set Number | NRMSE Flux [%], HM 1 | NRMSE Flux [%], HM 2 | NRMSE Flux [%], SFM | NRMSE RLys [%], HM 1 | NRMSE RLys [%], HM 2 | NRMSE Final cB,Lys [%], HM 1 | NRMSE Final cB,Lys [%], HM 2 | NRMSE Final cB,BSA [%], HM 1 | NRMSE Final cB,BSA [%], HM 2 | NRMSE Final cB,BSA [%], SFM |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2.3 ± 0.3 | 2.1 ± 0.3 | 5.3 | 32.5 ± 0.0 | 6.2 ± 0.2 | 7.4 ± 0.0 | 4.1 ± 0.0 | 4.6 | 4.6 | 4.6 |
| 2 | 4.3 ± 0.2 | 4.1 ± 0.1 | 7.2 | 39.7 ± 0.0 | 14.7 ± 2.2 | 7.9 ± 0.0 | 4.9 ± 0.3 | 3.8 | 3.8 | 3.8 |
| 3 | 3.6 ± 0.2 | 3.2 ± 0.2 | 3.0 | 40.3 ± 0.0 | 20.4 ± 0.5 | 13.0 ± 0.0 | 4.3 ± 0.0 | 2.2 | 2.2 | 2.2 |
| 4 | 3.9 ± 0.3 | 3.7 ± 0.2 | 5.0 | 34.9 ± 0.0 | 6.9 ± 0.6 | 6.7 ± 0.0 | 3.4 ± 0.2 | 0.7 | 0.7 | 0.7 |
| 5 | 4.8 ± 0.3 | 4.9 ± 0.4 | 8.5 | 32.6 ± 0.0 | 25.7 ± 1.3 | 3.2 ± 0.0 | 11.9 ± 0.5 | 8.7 | 8.7 | 8.7 |
| 6 | 3.4 ± 0.3 | 3.3 ± 0.5 | 9.3 | 45.3 ± 0.0 | 24.5 ± 1.3 | 7.6 ± 0.0 | 12.1 ± 0.4 | 0.1 | 0.1 | 0.1 |
| 7 | 4.7 ± 0.3 | 4.5 ± 0.4 | 4.4 | 40.6 ± 0.0 | 6.2 ± 0.7 | 3.9 ± 0.0 | 9.4 ± 0.3 | 6.6 | 6.6 | 6.6 |
| 8 | 8.2 ± 0.2 | 8.0 ± 0.3 | 6.3 | 35.6 ± 0.0 | 10.0 ± 2.0 | 1.9 ± 0.0 | 13.0 ± 1.6 | 7.2 | 7.2 | 7.2 |
| 9 | 1.7 ± 0.5 | 1.7 ± 0.4 | 4.9 | 0.0 | 0.0 | 0.0 | 0.0 | 4.6 | 4.6 | 4.6 |
Figure A5 shows that there was no consistent correlation between TMP and RLys or between CF and RLys.
Figure A5. RLys and CF 100, 200, and 300 mL/min versus TMP for increasing bulk concentrations (A) 3.19 g/L, (B) 4.73 g/L, (C) 8.11 g/L, (D) 11.99 g/L, and (E) 23.79 g/L of the lysozyme training set.
In Table A4, two HM 2 models employing different black box types for RLys prediction are compared. An ANN with one hidden node in one hidden layer performed better than MLR with linear and interaction terms when assessed for final cB,Lys and RLys predictions. Regarding flux prediction, both models performed equally.
Table A4. HM 2 black box for RLys prediction. Comparison between an ANN with one hidden node and MLR regarding, final cB,Lys, RLys and flux prediction error.
| Test Run Number | NRMSE Final cB,Lys [%], 1-node ANN | NRMSE Final cB,Lys [%], MLR | NRMSE RLys [%], 1-node ANN | NRMSE RLys [%], MLR | NRMSE Flux [%], 1-node ANN | NRMSE Flux [%], MLR |
|---|---|---|---|---|---|---|
| 1 | 4.1 | 5.7 | 6.2 | 9.9 | 2.1 | 2.3 |
| 2 | 4.9 | 6.4 | 14.7 | 26.9 | 4.1 | 3.9 |
| 3 | 4.3 | 4.3 | 20.4 | 20.9 | 3.2 | 3.3 |
| 4 | 3.4 | 4.7 | 6.9 | 21.1 | 3.7 | 3.8 |
| 5 | 11.9 | 6.4 | 25.7 | 34.5 | 4.9 | 5.1 |
| 6 | 12.1 | 11.6 | 24.5 | 69.4 | 3.3 | 3.5 |
| 7 | 9.4 | 16.1 | 6.2 | 20.0 | 4.5 | 4.4 |
| 8 | 13.0 | 38.4 | 10.0 | 55.5 | 8.0 | 7.5 |
| Average | 7.9 | 11.7 | 14.3 | 32.3 | 3.9 | 4.0 |
Table A5 summarizes the mass transfer coefficient k and gel concentration cG used for flux prediction with the SFM. The SFM can be set up for a one-component solution only, and since the BSA concentration was 4 to 46 times higher than the lysozyme concentration, BSA was chosen as the modeled component. k and cG for BSA were calculated for a BSA one-component solution (Table A5 left) and for a two-component solution containing BSA with lysozyme (Table A5 right). k from the two-component solution was generally lower than k from the one-component solution due to the fouling properties of lysozyme on the cellulose-based membrane, which reduced the transmembrane mass transfer.
Table A5. Mass transfer coefficient k and gel concentration cG for SFM based on BSA (left) and BSA with lysozyme (right).
k, by feed flow [mL/min]:

| TMP [bar] | k based on BSA, 100 | k based on BSA, 200 | k based on BSA, 300 | k based on BSA with lysozyme, 100 | k based on BSA with lysozyme, 200 | k based on BSA with lysozyme, 300 |
|---|---|---|---|---|---|---|
| 0.8 | 47.36 | 36.23 | 31.63 | 17.61 | 33.84 | 14.03 |
| 1.3 | 54.67 | 41.97 | 32.33 | 25.03 | 36.12 | 27.92 |
| 1.8 | 54.51 | 42.14 | 30.13 | 27.59 | 36.75 | 39.06 |
| 2.3 | 53.40 | 41.63 | 28.99 | 28.11 | 38.22 | 44.73 |
| 2.8 | 53.75 | 42.97 | 27.95 | 27.70 | 38.58 | 46.84 |

cG, by feed flow [mL/min]:

| TMP [bar] | cG based on BSA, 100 | cG based on BSA, 200 | cG based on BSA, 300 | cG based on BSA with lysozyme, 100 | cG based on BSA with lysozyme, 200 | cG based on BSA with lysozyme, 300 |
|---|---|---|---|---|---|---|
| 0.8 | 277.83 | 303.41 | 279.25 | 665.40 | 280.12 | 4421.42 |
| 1.3 | 302.79 | 330.45 | 323.30 | 355.06 | 312.56 | 887.24 |
| 1.8 | 322.39 | 345.88 | 355.30 | 288.49 | 277.52 | 419.22 |
| 2.3 | 332.29 | 353.74 | 369.46 | 264.69 | 273.21 | 304.56 |
| 2.8 | 323.99 | 327.45 | 378.28 | 256.72 | 252.80 | 263.97 |
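To show how one pair of parameters from Table A5 is used, the following Python sketch evaluates the stagnant film relation J = k · ln(cG/cB). It is an illustration only: the function name is hypothetical, k is taken in LMH and cG in g/L as in the list of symbols, and the chosen values are the BSA-only parameters at TMP 1.8 bar and a feed flow of 200 mL/min.

```python
import math


def sfm_flux(c_bulk, k, c_gel):
    """Stagnant film model flux J = k * ln(cG / cB) for a single component."""
    if c_bulk <= 0 or c_gel <= 0:
        raise ValueError("concentrations must be positive")
    return k * math.log(c_gel / c_bulk)


K_BSA = 42.14     # k from Table A5 (BSA only, TMP 1.8 bar, feed flow 200 mL/min)
CG_BSA = 345.88   # cG from Table A5 for the same conditions [g/L]

# Initial cB,BSA of test run 9 (6.68 g/L, Table A2)
flux_start = sfm_flux(6.68, K_BSA, CG_BSA)
print(round(flux_start, 1))
```

With these values the predicted initial flux is roughly 166 LMH, and the flux falls to zero when the bulk concentration reaches cG.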
Figure A6 shows an exemplary plot used to determine k (negative slope) and cG (from the intercept with the abscissa) graphically:
$$J = -k \cdot \ln(c_{B}) + k \cdot \ln(c_{G})$$
with −k (the negative mass transfer coefficient) being the slope and k · ln(cG) being the intercept with the ordinate. The latter is divided by k, resulting in ln(cG), from which the gel layer concentration cG is obtained.
At the two lowest TMPs (0.8 and 1.3 bar) the flux-ln(cB,BSA) curve is not linear, since the flux is in the pressure-dependent region. In this case, only the points in the linear range were used for calculating k and cG. For low TMPs and high CFs the flux is pressure dependent. In these cases, the SFM will always overestimate the flux. The test runs, however, were performed at a TMP of 1.4 bar or higher, and therefore in the pressure-independent flux region.
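The graphical estimation can also be done numerically with an ordinary least-squares line through flux versus ln(cB), using the stagnant film relation J = −k · ln(cB) + k · ln(cG). The Python sketch below is an assumption about how such a fit could be set up, not the authors' procedure; the function name is hypothetical and the data points are synthetic, generated from k = 40 LMH and cG = 300 g/L.

```python
import math


def fit_sfm(c_bulk, flux):
    """Fit J = -k*ln(cB) + k*ln(cG) by least squares and return (k, cG)."""
    x = [math.log(c) for c in c_bulk]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(flux) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, flux))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean   # equals k * ln(cG)
    k = -slope
    c_gel = math.exp(intercept / k)
    return k, c_gel


# Synthetic points (not measured data) lying exactly on the model line
concentrations = [5.0, 10.0, 20.0, 40.0, 80.0]
fluxes = [40.0 * math.log(300.0 / c) for c in concentrations]

k_fit, cg_fit = fit_sfm(concentrations, fluxes)
print(round(k_fit, 2), round(cg_fit, 1))
```

With measured data, only points from the linear, pressure-independent range should be passed to the fit, in line with the restriction described above.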
Figure A6. Permeate flux vs. logarithmic cB,BSA to estimate the mass transfer coefficient k and gel concentration cG for the stagnant film theory (SFM). Data recorded at CF 200 mL/min.

References

  1. Kelley, B. Downstream Processing of monoclonal Antibodies: Current practice and future opportunities. In Process Scale Purification of Antibodies; John Wiley & Sons: Hoboken, NJ, USA, 2017; pp. 1–21. [Google Scholar]
  2. Benner, S.W.; Welsh, J.P.; Rauscher, M.A.; Pollard, J.M. Prediction of lab and manufacturing scale chromatography performance using mini-columns and mechanistic modeling. J. Chromatogr. A 2019, 1593, 54–62. [Google Scholar] [CrossRef] [PubMed]
  3. Hebbi, V.; Roy, S.; Rathore, A.S.; Shukla, A. Modeling and prediction of excipient and pH drifts during ultrafiltration/diafiltration of monoclonal antibody biotherapeutic for high concentration formulations. Sep. Purif. Technol. 2020, 238, 116392. [Google Scholar] [CrossRef]
  4. Baek, Y.; Singh, N.; Arunkumar, A.; Zydney, A. Ultrafiltration behavior of an Fc-fusion protein: Filtrate flux data and modeling. J. Membr. Sci. 2017, 528, 171–177. [Google Scholar] [CrossRef]
  5. Berg, G.V.D.; Smolders, C. Concentration polarization phenomena during dead-end ultrafiltration of protein mixtures. The influence of solute-solute interactions. J. Membr. Sci. 1989, 47, 1–24. [Google Scholar] [CrossRef]
  6. Borujeni, E.E.; Zydney, A.L. Membrane fouling during ultrafiltration of plasmid DNA through semipermeable membranes. J. Membr. Sci. 2014, 450, 189–196. [Google Scholar] [CrossRef]
  7. Jim, K.; Fane, A.; Fell, C.; Joy, D. Fouling mechanisms of membranes during protein ultrafiltration. J. Membr. Sci. 1992, 68, 79–91. [Google Scholar] [CrossRef]
  8. Haribabu, M.; Dunstan, D.E.; Martin, G.J.; Davidson, M.R.; Harvie, D.J.E. Simulating the ultrafiltration of whey proteins isolate using a mixture model. J. Membr. Sci. 2020, 613, 118388. [Google Scholar] [CrossRef]
  9. Namila, F.; Zhang, D.; Traylor, S.; Nguyen, T.; Singh, N.; Wickramasinghe, R.; Qian, X.; Wickramasinghe, S.R. The effects of buffer condition on the fouling behavior of MVM virus filtration of an Fc-fusion protein. Biotechnol. Bioeng. 2019, 116, 2621–2631. [Google Scholar] [CrossRef]
  10. Saksena, S.; Zydney, A.L. Influence of protein-protein interactions on bulk mass transport during ultrafiltration. J. Membr. Sci. 1997, 125, 93–108. [Google Scholar] [CrossRef]
  11. Matsuyama, H.; Shimomura, T.; Teramoto, M. Formation and characteristics of dynamic membrane for ultrafiltration of protein in binary protein system. J. Membr. Sci. 1994, 92, 107–115. [Google Scholar] [CrossRef]
  12. Iritani, E.; Mukai, Y.; Murase, T. Separation of binary protein mixtures by ultrafiltration. Filtr. Sep. 1997, 34, 967–973. [Google Scholar] [CrossRef]
  13. Teng, M.-Y.; Lin, S.-H.; Wu, C.-Y.; Juang, R.-S. Factors affecting selective rejection of proteins within a binary mixture during cross-flow ultrafiltration. J. Membr. Sci. 2006, 281, 103–110. [Google Scholar] [CrossRef]
  14. Müller, C.; Agarwal, G.; Melin, T.; Wintgens, T. Study of ultrafiltration of a single and binary protein solution in a thin spiral channel module. J. Membr. Sci. 2003, 227, 51–69. [Google Scholar] [CrossRef]
  15. Chen, H.; Kim, A.S. Prediction of permeate flux decline in crossflow membrane filtration of colloidal suspension: A radial basis function neural network approach. Desalination 2006, 192, 415–428. [Google Scholar] [CrossRef]
  16. Melcher, M.; Scharl, T.; Spangl, B.; Luchner, M.; Cserjan, M.; Bayer, K.; Leisch, F.; Striedner, G. The potential of random forest and neural networks for biomass and recombinant protein modeling in Escherichia coli fed-batch fermentations. Biotechnol. J. 2015, 10, 1770–1782, 2015. [Google Scholar] [CrossRef] [PubMed]
  17. Glassey, J.; Von Stosch, M. Hybrid Modeling in Process Industries, 1st ed.; CRC Press: Boca Raton, FL, USA, 2018. [Google Scholar]
  18. Bayer, B.; von Stosch, M.; Striedner, G.; Duerkop, M. Comparison of Modeling Methods for DoE-Based Holistic Upstream Process Characterization. Biotechnol. J. 2020, 15, 1900551. [Google Scholar] [CrossRef] [PubMed]
  19. Díaz, V.H.G.; Prado-Rubio, O.A.; Willis, M.J.; Von Stosch, M. Dynamic Hybrid Model for Ultrafiltration Membrane Processes; Elsevier: Amsterdam, The Netherlands, 2017; Volume 40, pp. 193–198. [Google Scholar]
  20. Chew, C.M.; Aroua, M.K.T.; Hussain, M.A. A practical hybrid modelling approach for the prediction of potential fouling parameters in ultrafiltration membrane water treatment plant. J. Ind. Eng. Chem. 2017, 45, 145–155. [Google Scholar] [CrossRef]
  21. Van Can, H.J.L.; Braake, H.A.B.T.; Dubbelman, S.; Hellinga, C.; Luyben, K.C.A.M.; Heijnen, J.J. Understanding and applying the extrapolation properties of serial gray-box models. AIChE J. 1998, 44, 1071–1089. [Google Scholar] [CrossRef]
  22. Krippl, M.; Dürauer, A.; Duerkop, M. Hybrid modeling of cross-flow filtration: Predicting the flux evolution and duration of ultrafiltration processes. Sep. Purif. Technol. 2020, 248, 117064. [Google Scholar] [CrossRef]
  23. Zydney, A.L. Stagnant film model for concentration polarization in membrane systems. J. Membr. Sci. 1997, 130, 275–281. [Google Scholar] [CrossRef]
  24. Van den Berg, G.B.; Smolders, C.A. Flux decline in ultrafiltration processes. Desalination 1990, 77, 101–133. [Google Scholar] [CrossRef]
  25. Berg, G.V.D.; Smolders, C. The boundary-layer resistance model for unstirred ultrafiltration. A new approach. J. Membr. Sci. 1989, 40, 149–172. [Google Scholar] [CrossRef]
  26. Berg, G.V.D.; Racz, I.; Smolders, C. Mass transfer coefficients in cross-flow ultrafiltration. J. Membr. Sci. 1989, 47, 25–51. [Google Scholar] [CrossRef]
  27. Thiess, H.; Leuthold, M.; Grummert, U.; Strube, J. Module design for ultrafiltration in biotechnology: Hydraulic analysis and statistical modeling. J. Membr. Sci. 2017, 540, 440–453. [Google Scholar] [CrossRef]
  28. Iritani, E. A Review on Modeling of Pore-Blocking Behaviors of Membranes During Pressurized Membrane Filtration. Dry. Technol. 2013, 31, 146–162. [Google Scholar] [CrossRef]
  29. Kujundzic, E.; Greenberg, A.R.; Fong, R.; Hernandez, M. Monitoring Protein Fouling on Polymeric Membranes Using Ultrasonic Frequency-Domain Reflectometry. Membranes 2011, 1, 195–216. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Schematic representation of the two hybrid model and mechanistic model structures, with implementation in the multi-step ahead model. (A) Hybrid model 1 (HM 1) using static, average RLys from the training set, (B) hybrid model 2 (HM 2) with two separate black boxes for flux and dynamic RLys prediction, (C) stagnant film model (SFM). (D) Multi-step ahead hybrid model structure.
Figure 2. Flowchart of the hybrid model methodology for application in crossflow filtration.
Figure 3. Training data sets including different protein concentrations and different TMPs at CF 200 mL/min: two-component training set containing (A) BSA and (B) lysozyme in the same solution (blue); one-component solution of (C) BSA (red) and (D) lysozyme (green).
Figure 4. Comparing flux prediction of the test set containing (A) BSA (TMP 1.8 bar, CF 200 mL/min, initial cB,BSA 6.68 g/L) and (B) BSA with lysozyme (TMP 2.1 bar, CF 250 mL/min, initial cB,BSA 3.71 g/L, initial cB,Lys 0.38 g/L), with: hybrid model HM 1 trained on BSA solely (red dotted line) and the BSA and two-component training set (blue dashed line); SFM based on BSA solely (dark grey dot-dashed line) and two-component training set (light grey dot-dot-dashed line).
Figure 5. Schematic depiction of the training space (blue area) for: (A) TMP and CF of training runs (white dots) and test runs (grey dots); (B) initial to final cB,BSA and cB,Lys of the test runs (grey dots with grey solid lines) and the covered concentration range of the three training runs (white dots with black solid lines).
Figure 6. Comparison of observed and predicted flux and RLys. (A) The flux over time of test run 1, (B) RLys over permeate volume of test run 1, (C) the flux over time of test run 4, (D) RLys over permeate volume of test run 4, (E) the flux over time of test run 5, (F) RLys over permeate volume of test run 5.
Figure 7. Comparison of observed and predicted flux and RLys. (A) The flux over time of test run 2, (B) RLys over permeate volume of test run 2, (C) the flux over time of test run 6, (D) RLys over permeate volume of test run 6, (E) the flux over time of test run 7, (F) RLys over permeate volume of test run 7.
Figure 8. Summary of the prediction errors of HM 1, HM 2 and SFM in terms of (A) flux, (B) RLys, (C) final cB,Lys for test runs 1 to 4 with all parameters—TMP, CF, initial cB,BSA and cB,Lys—inside the training space and (D) final cB,Lys for test runs 5 to 8 performed partly outside the training space.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.