Improved Time Resolved KPI and Strain Characterization of Multiple Hosts in Shake Flasks Using Advanced Online Analytics and Data Science

Shake flasks remain one of the most widely used cultivation systems in biotechnology, especially for process development (cell line and parameter screening). This is due to their ease of use and their low investment and running costs. A disadvantage, however, is that shake flask cultivations are black box processes with limited possibilities for recording online data, resulting in a lack of control and in time-consuming, manual data analysis. Although different measurement methods have been developed for shake flasks, they lack comparability, especially when the production organism changes. In this study, the use of online backscattered light, dissolved oxygen, and pH data for the characterization of animal, plant, and microbial cell culture processes in shake flasks is evaluated and compared. The application of these different online measurement techniques allows key performance indicators (KPIs) to be determined from online data. This paper evaluates a novel data science workflow that automatically determines KPIs from online data in early development stages without human bias. This enables standardized and cost-effective process-oriented cell line characterization of shake flask cultivations in accordance with the process analytical technology (PAT) initiative. The comparison showed very good agreement between KPIs determined from offline data, by manual evaluation, and by automatic calculation based on multiple signals, although the quality of agreement depended on the selected measurement signal.


Introduction
The creation of a biopharmaceutical product from the very first development steps to market maturity costs several millions, if not billions, of euros [1,2]. To minimize the risk to patients and subsequent product failure, it is essential to thoroughly understand the product as well as the production process and its impact on product quality. For this reason, regulatory bodies such as the U.S. Food and Drug Administration (FDA) or the European Medicines Agency (EMA) require proof that the product and the process have been sufficiently investigated by the manufacturers. In the last two decades, the concept of quality by design (QbD), which has also been included in the International Conference on Harmonization (ICH) guidelines, has proven to be helpful and has become
1. Use a recipe database as a basis for knowledge management;
2. Automate and standardize detection of the exponential growth phase within shake flask experiments with enhanced data science;
3. Automate determination of KPIs based on the detected exponential growth phase and data from the recipe database;
4. Store KPIs in the database to simplify and enable comparison with other recipes.
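Goals 1 and 4 imply a small relational store for recipes and KPIs, but the paper does not describe its schema. The following Python sketch assumes one possible layout using the standard-library sqlite3 module: a recipe table holding the Table 1 attributes and a kpi table with one row per flask, KPI name and evaluation source (for example offline, manual or automatic). All table, column and function names are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per recipe (Table 1 attributes) and one row per
# KPI value, tagged with the flask and the evaluation source.
SCHEMA = """
CREATE TABLE IF NOT EXISTS recipe (
    recipe_id    INTEGER PRIMARY KEY,
    organism     TEXT NOT NULL,
    o2_threshold REAL NOT NULL,  -- DO in % below which exponential growth is assumed impossible
    growth_speed TEXT NOT NULL CHECK (growth_speed IN ('fast', 'medium', 'slow'))
);
CREATE TABLE IF NOT EXISTS kpi (
    kpi_id    INTEGER PRIMARY KEY,
    recipe_id INTEGER NOT NULL REFERENCES recipe (recipe_id),
    flask     TEXT NOT NULL,
    name      TEXT NOT NULL,     -- e.g. 'mu_max', 'q_O2', 'Y_XS', 'C_X_max'
    source    TEXT NOT NULL,     -- e.g. 'OUR,auto', 'OUR,man', 'CDW,offline'
    value     REAL NOT NULL
);
"""


def open_store(path=":memory:"):
    """Open (or create) the store and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_recipe(conn, organism, o2_threshold, growth_speed):
    cur = conn.execute(
        "INSERT INTO recipe (organism, o2_threshold, growth_speed) VALUES (?, ?, ?)",
        (organism, o2_threshold, growth_speed),
    )
    return cur.lastrowid


def add_kpi(conn, recipe_id, flask, name, source, value):
    conn.execute(
        "INSERT INTO kpi (recipe_id, flask, name, source, value) VALUES (?, ?, ?, ?, ?)",
        (recipe_id, flask, name, source, value),
    )


def compare_recipes(conn, name, source):
    """Mean and flask count of one KPI per recipe, e.g. ('mu_max', 'OUR,auto')."""
    return conn.execute(
        """SELECT r.organism, AVG(k.value), COUNT(*)
           FROM kpi AS k JOIN recipe AS r ON r.recipe_id = k.recipe_id
           WHERE k.name = ? AND k.source = ?
           GROUP BY r.recipe_id
           ORDER BY r.organism""",
        (name, source),
    ).fetchall()
```

Averaging per recipe and source, as compare_recipes does, is one simple way to place automatically determined KPIs next to manual or offline values for the same organism.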
The results were verified in two stages. First, the KPIs obtained from online data were determined manually and, where possible, compared with the corresponding offline or literature data. Subsequently, the manually evaluated data were compared with those of the workflow. The range of applications, including different filling volumes, the use of baffles, and variations in medium, cell size, growth rate, oxygen demand and shaking speed, was illustrated using representatives of microbial (E. coli and S. cerevisiae), plant (Vitis vinifera), mammalian (CHO, HEK) and insect cell cultures (High Five).

Workflow
The goal of this workflow, as shown in Figure 1, was to automatically identify the best fit for the exponential growth phase in the OUR or BSL signal. For the exponential fit to match the observed exponential phase of the selected growth signal (BSL or OUR) as closely as possible, the start (phase_start) and end (phase_end) of exponential growth had to be set as accurately as possible. Unfortunately, signals are often subject to interference caused by high sensor noise, user errors or poorly chosen settings. The condition of the shake flasks used, such as baffles or scratches on the flask surface, also considerably affects the signal-to-noise ratio, so that the start and end of the phase are not easy to identify. Therefore, the algorithm has to detect, and be robust against, such disturbances. To enhance its robustness, particularly in the initial phase fitting, a recipe was used that advised the algorithm of the approximate time interval in which the exponential growth phase was expected. The recipe contained rudimentary meta information, as shown in Table 1. Apart from the two input values in the recipe, the workflow operated automatically without any further user interaction.

Table 1. Information stored in the recipe. The values of the individual attributes may differ from organism to organism, especially if the cultivation time varies considerably.

Attribute | Information | Value
O2,threshold | Oxygen limit below which exponential growth can be assumed to be impossible | 10-20%
Growth speed | Growth speed of the organism | Fast, medium, slow

Figure 1. Illustration of the three-part workflow for automatically determining the exponential growth phase in a shake flask cultivation. The first part, "Initial Phase Fitting & Noise Reduction" (yellow), covers noise detection, signal smoothing if necessary, and the initial setting of the start (phase_start) and end (phase_end) of the exponential growth phase, as described in Section 2.1.1. The second part, "Optimization of phase_start" (green, Section 2.1.2), optimizes phase_start to improve the exponential fit, i.e., the predicted values of the exponential growth curve (ŷ). The third part, "Optimization of phase_end" (blue, Section 2.1.3), optimizes phase_end to further improve the exponential fit and produces the final output (ŷ). The individual steps of the algorithm are marked with white rectangles; outputs are highlighted as red rectangles.

The recipes for the tested organisms differed only in whether the culture had a long or a short cultivation period. Not only the growth signal itself was used to detect the exponential growth phase; the oxygen signal was also used, as its characteristics help to interpret growth behavior. The workflow consists of three parts: initial phase fitting and noise reduction, optimization of phase_start, and optimization of phase_end, which are explained in the following subsections.

Initial Phase Fitting and Noise Reduction
First, the raw growth signal (OUR or biomass) and the oxygen signal are read in. In the second step, the minimum oxygen value is determined, since this characteristic can be the first indication of the end of the exponential growth curve, phase_end. During cultivation, an oxygen signal usually has several minima; however, only the minimum between the beginning of the experiment and the time when the O2 signal reaches a certain threshold (O2,threshold), at which exponential growth is practically no longer possible, is of interest here. Because O2,threshold can differ from organism to organism, it is stored in the recipe. The identified minimum of the oxygen signal between the start of the experiment and O2,threshold is then stored as the initial phase_end.
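The rule above can be sketched in Python with NumPy as follows. This is an illustration under our reading of the text, not the authors' code: the DO signal is taken in % air saturation, the search window runs from the first sample up to and including the first sample at or below O2,threshold, and the function name is hypothetical. If the threshold is never reached, the global DO minimum is returned.

```python
import numpy as np


def initial_phase_end(do, o2_threshold):
    """Index used as the initial phase_end (sketch).

    Searches the dissolved-oxygen signal `do` (in % air saturation) from the
    start of the run up to and including the first sample at or below
    `o2_threshold`. If the threshold is reached, the minimum of this window is
    that first crossing point; otherwise it is the global DO minimum.
    """
    do = np.asarray(do, dtype=float)
    below = np.flatnonzero(do <= o2_threshold)
    # End of the search window: first threshold crossing, or the whole record.
    stop = below[0] + 1 if below.size else do.size
    return int(np.argmin(do[:stop]))
```

For example, with an O2,threshold of 15% from the recipe, a DO trace that never drops below 15% yields the index of its lowest point, whereas a trace that does cross the threshold yields the first crossing.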
The maximum value of the oxygen signal can be an indication of the start of the exponential growth curve (phase_start) and is, therefore, identified in the next step. Particularly at the beginning of the cultivation, the oxygen signal may increase sharply because of the experimental set-up (e.g., the time needed for temperature and oxygen equilibration after inoculation while the flask stands unshaken at room temperature), which can lead to misidentification of the maximum oxygen value. To make the workflow more robust, the search for the maximum value is restricted to a time window that depends on the assumed growth speed. A distinction was drawn between slow-growing (Vitis vinifera), medium-growing (HEK, CHO) and fast-growing organisms (E. coli, S. cerevisiae), as listed in Table 1.
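A minimal sketch of the windowed maximum search, assuming NumPy arrays of time in hours and DO in %. The window limits per growth-speed class in SEARCH_WINDOW_H are placeholders chosen only for illustration, since the paper does not report them; the search is additionally limited to samples before the initial phase_end.

```python
import numpy as np

# Hypothetical search windows in hours after inoculation for each growth-speed
# class of the recipe; the actual window lengths are not stated in the text.
SEARCH_WINDOW_H = {
    "fast": (0.5, 4.0),
    "medium": (6.0, 48.0),
    "slow": (24.0, 120.0),
}


def initial_phase_start(t_h, do, growth_speed, phase_end_idx):
    """Index of the DO maximum inside the growth-speed window (sketch).

    Samples outside the window (e.g. an equilibration spike right after
    inoculation) and samples at or after the initial phase_end are ignored.
    Falls back to index 0 if no sample qualifies.
    """
    t_h = np.asarray(t_h, dtype=float)
    do = np.asarray(do, dtype=float)
    lo, hi = SEARCH_WINDOW_H[growth_speed]
    in_window = (t_h >= lo) & (t_h <= hi) & (np.arange(t_h.size) < phase_end_idx)
    idx = np.flatnonzero(in_window)
    if idx.size == 0:
        return 0
    return int(idx[np.argmax(do[idx])])
```

With a fast-growing organism, a spike at 0.25 h would be skipped and the maximum would be taken from the 0.5 to 4 h window instead.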
The next step is to check the growth signal for peaks or plateaus between phase_start and O2,min, as these can be another good indicator of a metabolic change and, therefore, of the end of exponential growth (phase_end). However, this step is particularly error-prone if the signal is too noisy. Therefore, a check is first performed to establish whether the signal has a low signal-to-noise ratio. If this is the case, the growth signal is smoothed with a Savitzky-Golay filter provided by the SciPy Python package [48,49].
If a peak or a plateau is found in the defined time frame, it is set as phase_end. If neither feature can be identified, O2,min is stored as phase_end.
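The noise check, optional smoothing and peak/plateau search could look roughly like the following sketch. scipy.signal.savgol_filter and scipy.signal.find_peaks are existing SciPy functions; the noise metric (standard deviation of first differences relative to the signal range), the plateau definition and all default parameters are assumptions made for illustration. Note that savgol_filter needs at least window_length samples.

```python
import numpy as np
from scipy.signal import find_peaks, savgol_filter


def smooth_if_noisy(y, noise_limit=0.05, window_length=11, polyorder=2):
    """Savitzky-Golay smoothing applied only when the signal looks noisy.

    Hypothetical noise metric: std of the first differences divided by the
    signal range. Returns (signal, was_smoothed).
    """
    y = np.asarray(y, dtype=float)
    span = np.ptp(y)
    if span == 0 or np.std(np.diff(y)) / span <= noise_limit:
        return y, False
    return savgol_filter(y, window_length, polyorder), True


def peak_or_plateau_end(y, phase_start, o2_min_idx, flat_tol=0.01, flat_len=5):
    """phase_end candidate between phase_start and o2_min_idx (inclusive).

    Returns the first local peak, else the first point of a plateau
    (flat_len consecutive steps smaller than flat_tol times the segment
    range), else falls back to o2_min_idx. flat_tol/flat_len are hypothetical.
    """
    seg = np.asarray(y, dtype=float)[phase_start:o2_min_idx + 1]
    peaks, _ = find_peaks(seg)
    if peaks.size:
        return phase_start + int(peaks[0])
    flat = np.abs(np.diff(seg)) < flat_tol * max(np.ptp(seg), 1e-12)
    run = 0
    for i, is_flat in enumerate(flat):
        run = run + 1 if is_flat else 0
        if run == flat_len:
            # Step i ends the run; the plateau starts flat_len - 1 steps earlier.
            return phase_start + i - flat_len + 1
    return o2_min_idx
```

Smoothing before the peak search matters because find_peaks reports every local maximum, so uncorrected sensor noise would otherwise produce spurious early phase_end candidates.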

Optimization of Phase Start
At this point, the interval between phase_start and phase_end is assumed to be the maximum possible exponential growth phase. However, an exponential fit over this initial interval may be of poor quality, as indicated by a low coefficient of determination (R²) or a high root mean square error (RMSE). R² typically ranges between 0 and 1 and describes the proportion of the variance in the observed data that is explained by the model; values close to 1 are favorable. To increase the quality, and thus the accuracy, of the fit, phase_start and phase_end are optimized in the next parts of the workflow.
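For reference, the exponential fit and the two quality measures can be sketched with NumPy and scipy.optimize.curve_fit. The model C(t) = c0·exp(µ·t) is written with time relative to the start of the fitted interval, and a log-linear regression is used only to provide starting values; these choices, and the function names, are assumptions rather than details from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit


def exp_model(t, c0, mu):
    """Exponential growth C(t) = c0 * exp(mu * t), t relative to the fit start."""
    return c0 * np.exp(mu * t)


def r_squared(y, y_hat):
    """Coefficient of determination; can become negative for very poor fits."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y, y_hat):
    """Root mean square error in the units of the growth signal."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def fit_exponential(t, y):
    """Nonlinear least-squares fit of the exponential model.

    A log-linear regression supplies the starting values (assumes y > 0).
    Returns (c0, mu, y_hat, R^2, RMSE).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    t_rel = t - t[0]
    mu0, ln_c0 = np.polyfit(t_rel, np.log(y), 1)
    (c0, mu), _ = curve_fit(exp_model, t_rel, y, p0=(np.exp(ln_c0), mu0))
    y_hat = exp_model(t_rel, c0, mu)
    return c0, mu, y_hat, r_squared(y, y_hat), rmse(y, y_hat)
```

The fitted mu corresponds to µmax in h−1 when t is given in hours.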
First, an initial fit between phase_start and phase_end is established, yielding ŷ. Using a sliding window, a loop then runs from the midpoint of ŷ back to phase_start. In each iteration, the R² value between ŷ and the observed growth data within the sliding window is calculated. If R² worsens substantially, the loop is aborted and phase_start is moved forward to the time point before the difference between ŷ and the signal became too large.
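The sliding-window check can be sketched as below. One deliberate deviation from the text is labeled in the code: ŷ is fitted by log-linear regression on the second half of the interval only and then extrapolated backwards, because a fit over the whole interval is itself biased by a lag phase, which can make windowed R² values drop right away. The window size and the permitted R² drop are hypothetical parameters.

```python
import numpy as np


def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    if ss_tot == 0:
        return 1.0 if ss_res == 0 else 0.0
    return float(1.0 - ss_res / ss_tot)


def optimize_phase_start(t, y, start, end, window=4, max_drop=0.05):
    """Move phase_start forward past a lag phase (sketch).

    ASSUMPTION (differs from a fit over the whole interval): y_hat is fitted
    on the second half of [start, end], taken to lie inside the exponential
    phase, and extrapolated backwards. The window slides from the midpoint
    towards `start`; once R^2 in the window falls more than `max_drop` below
    the best value seen so far, the index after that window start is returned.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    mid = start + (end - start) // 2
    mu, ln_c0 = np.polyfit(t[mid:end + 1], np.log(y[mid:end + 1]), 1)
    y_hat = np.exp(ln_c0 + mu * t)  # model evaluated on the whole record
    best = -np.inf
    for i in range(mid, start - 1, -1):
        score = r_squared(y[i:i + window], y_hat[i:i + window])
        if score < best - max_drop:
            return i + 1  # last window start that still matched the model
        best = max(best, score)
    return start
```

For a synthetic signal with a constant lag phase followed by exponential growth, the returned index is the first sample of the exponential part; for a purely exponential signal, phase_start is left unchanged.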

Optimization of Phase End
Once phase_start has been optimized, the setting of phase_end can be improved. The first step is to create a temporary exponential fit between the updated phase_start and the midpoint of ŷ, resulting in ŷ_temp. In a loop, further measuring points are added to ŷ_temp one at a time, and the fit is repeated. Each new fit returns the RMSE, which indicates how far, on average, the predicted values are from the observed data; a low RMSE is favorable. If the RMSE increases x times in a row, the loop is aborted, since the exponential growth phase is assumed to be over. Requiring several consecutive increases ensures that a local worsening of the fit caused by noise does not abort the optimization too early. The time point before the RMSE began to deteriorate is set as the new phase_end. If the RMSE does not worsen, phase_end remains unchanged. Finally, an exponential fit is created once again between the updated phase_start and phase_end, which yields the final ŷ.
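A sketch of this loop, assuming NumPy arrays, a log-linear fit in place of a nonlinear one, and hypothetical defaults for x (the number of consecutive RMSE increases) and for a small tolerance below which RMSE changes are ignored:

```python
import numpy as np


def fit_rmse(t, y):
    """RMSE of a log-linear exponential fit (simplification of a nonlinear fit)."""
    mu, ln_c0 = np.polyfit(t, np.log(y), 1)
    y_hat = np.exp(ln_c0 + mu * t)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def optimize_phase_end(t, y, start, end, x=3, tol=1e-6):
    """Shorten phase_end once the RMSE has risen x times in a row (sketch).

    Starts from a temporary fit on [start, midpoint] and adds one point per
    iteration up to `end`. An increase counts only if it exceeds `tol`, so
    floating-point jitter on near-perfect fits is ignored. Returns the last
    index before the run of increases, or `end` if no such run occurs.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    mid = start + (end - start) // 2
    prev = fit_rmse(t[start:mid + 1], y[start:mid + 1])
    rises = 0
    for j in range(mid + 1, end + 1):
        cur = fit_rmse(t[start:j + 1], y[start:j + 1])
        rises = rises + 1 if cur > prev + tol else 0
        if rises == x:
            return j - x  # point before the first of the x increases
        prev = cur
    return end
```

Resetting the counter after any non-increasing step is what makes isolated, noise-induced RMSE increases harmless.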

Cultivation Results
To perform process-oriented strain characterization, as many KPIs as possible should be obtained from online data with minimal manual sampling effort. To allow a uniform evaluation despite the strong differences between plant and animal cell cultures and microbial fermentations, we selected the following four parameters for strain characterization:
• maximum specific growth rate µmax;
• cell-specific oxygen consumption rate qO2;
• biomass and product yields YX/S and YP/S;
• maximum achieved biomass concentration CX,max.
However, to determine these parameters, some quantities must be known, for example, the initial biomass (CX,0) and substrate (CS,0) concentrations, the maximum oxygen saturation of the medium (c*L,O2), and the oxygen transfer rate (OTR) in the shake flask. The conditions at the start of the experiment are ideally measured or calculated directly during inoculation. The maximum oxygen saturation of the medium depends on its composition and on the incubator atmosphere. It can be determined by a blank measurement and included in the calculation, or accounted for by setting the dissolved oxygen reading to 100% once saturation with air has been reached. To calculate c*L,O2, the atmospheric pressure p, the molar fraction of oxygen xO2 (which is lowered in CO2 incubators) and the temperature-dependent Henry constant HO2(T), expressed as partial pressure per dissolved concentration, must be known:

$$c^{*}_{L,O_2} = \frac{p \, x_{O_2}}{H_{O_2}(T)} \tag{1}$$

Several approaches exist for determining OTRs in shake flasks: mathematical approximations [37], measurements [50,51] or computational fluid dynamics (CFD) [52,53]. If the volumetric mass transfer coefficient kLa is known, the OTR can be calculated as follows, with cL,O2 denoting the dissolved oxygen concentration:

$$OTR = k_L a \left( c^{*}_{L,O_2} - c_{L,O_2} \right) \tag{2}$$

Provided that the system is in a (quasi-)steady state, the OTR equals the oxygen uptake rate (OUR), which is the product of the biomass concentration CX and qO2:

$$OTR = OUR = q_{O_2} \, C_X \tag{3}$$

Thus, qO2 can be calculated if the OTR and CX are known. As qO2 is assumed to be constant during the exponential growth phase, CX is proportional to the OTR and can hence be calculated if CX,0 is known. The biomass concentration at time t, CX(t), as well as µmax can, therefore, be estimated by fitting Equation (4):

$$C_X(t) = C_{X,0} \, e^{\mu_{max} t} \tag{4}$$

This is possible using offline biomass measurements, OUR data or biomass concentrations from backscattered light measurements.
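The oxygen balance translates directly into a few helper functions. This Python sketch assumes consistent units (p in atm, H in L atm mol^-1, kLa in h^-1, concentrations in mol L^-1, CX in g L^-1) and a DO reading in % of saturation; the function names are hypothetical. An illustrative Henry constant of about 769 L atm mol^-1 corresponds roughly to O2 in pure water at 25 °C and would differ for culture media and other temperatures. Because qO2 is constant during exponential growth, OUR(t) = qO2·CX,0·exp(µmax·t), so µmax can be read from the slope of ln(OUR) against time.

```python
import numpy as np


def c_star_o2(p, x_o2, henry_h):
    """Oxygen saturation concentration c*_L,O2 = p * x_O2 / H (Henry's law).

    H is taken as partial pressure per dissolved concentration, so with p in
    atm and H in L atm mol^-1 the result is in mol L^-1.
    """
    return p * x_o2 / henry_h


def otr(kla, c_star, do_percent):
    """OTR = kLa * (c* - c_L), with c_L derived from the DO reading in %."""
    c_l = c_star * do_percent / 100.0
    return kla * (c_star - c_l)


def q_o2(our, c_x):
    """q_O2 = OUR / C_X, assuming OTR = OUR (quasi-steady state)."""
    return our / c_x


def mu_from_our(t, our):
    """Slope of ln(OUR) versus time; equals mu_max while q_O2 is constant."""
    mu, _ = np.polyfit(np.asarray(t, float), np.log(np.asarray(our, float)), 1)
    return float(mu)
```

For example, c_star_o2(1.0, 0.2095, 769.0) gives about 2.7 × 10^-4 mol L^-1.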
Commonly used yield coefficients relate the consumed substrate (CS) to biomass (YX/S) or product (YP/S) formation, but they require metabolite measurements or well-trained models.
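Where metabolite measurements are available, the yield coefficients follow from a simple ratio. The sketch below assumes the usual definition (formed biomass or product divided by consumed substrate between two time points); the paper does not spell out its exact formula, and the function name is hypothetical.

```python
def yield_coefficient(formed_end, formed_start, substrate_start, substrate_end):
    """Y = (formed_end - formed_start) / (substrate_start - substrate_end).

    Pass biomass concentrations for Y_X/S or product concentrations for
    Y_P/S, all in consistent units (e.g. g L^-1).
    """
    consumed = substrate_start - substrate_end
    if consumed <= 0:
        raise ValueError("no net substrate consumption")
    return (formed_end - formed_start) / consumed
```

This is also why no yield can be given when no defined carbon source is added, as in the LB cultivations below: the denominator is not measurable.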
The specific growth rates determined online were compared and validated against those calculated offline. Since q O2 could only be determined using online data, literature data were used as a reference.

Bacteria-E. coli
In the first experiments with E. coli W3310 in complex media, it was found that opening the shaking incubator as well as sampling directly from the shake flask significantly influenced the measurement result, so that the KPIs could no longer be determined. Therefore, in all experiments, the incubator was opened, and a sample taken if necessary, only at the beginning and end of shaking. Various settings were tested for the different cultivation conditions, and two example results are shown in Figure 2. Cultivations in TB medium showed that the growth rate determined automatically from the OUR gave good results (Figure 2A), with µOUR,auto. = 1.4048 h−1 compared with the values determined manually from online data (µOUR,man. = 1.4201 h−1) and from offline data (µCDW,offline = 1.3237 h−1). The exponential growth phase ended after approximately 5 h due to oxygen limitation and passed into a phase of slow, oxygen-limited growth that lasted for 16 h. This phase was characterized by a plateau in the OUR. The end of the experiment was visible as a sharp increase in oxygen and a drop in the OUR signal, indicating the depletion of nutrients, in particular of glycerol and the consumable components of the complex media.
Using the BSL signal was, in general, less successful. Although a qualitative growth curve could be determined, the estimated growth rates were too low. An example is depicted in Figure 2B, where the growth rate estimated from the BSL signal (µBSL,auto. = 0.7561 h−1) is significantly lower than the OUR-based (µOUR,auto. = 1.2075 h−1) or offline (µCDW,offline = 1.3237 h−1) values. In other experiments, too, the OUR outperformed the BSL signal irrespective of whether baffles were used.
The calculation of growth rates from the OUR was successful for both complex media. For the LB experiments (six flasks), a growth rate of µCDW,offline = 1.2982 ± 0.1153 h−1 was determined offline, which agreed well with the values calculated manually (µman.,online = 1.2752 ± 0.2049 h−1) and automatically (µauto.,online = 1.2075 ± 0.1511 h−1) from online data. Similar values were obtained in the experiments in TB medium (six flasks): offline data gave a growth rate of µCDW,offline = 1.2889 ± 0.0453 h−1, compared with µman.,online = 1.3421 ± 0.0419 h−1 from manual and µauto.,online = 1.3778 ± 0.0191 h−1 from automated evaluation of the online data. The TB experiments had a biomass yield of YX/S = 1.679 ± 0.032 gCDW gGly−1 and reached a CDWmax of 9.11 ± 0.21 g L−1. For the experiments in LB medium, no yield was estimated because no carbon source was added; the maximum biomass concentration of 1.33 ± 0.10 g L−1 was reached from the complex media ingredients alone.
To investigate the influence of the media components, experiments were also carried out in a chemically defined medium according to Biener, in addition to the complex media described so far. The most striking differences were the coloration (clear instead of brown) and the lower variability of the media components. Interestingly, the clear color did not prove advantageous for BSL measurements in the low OD600 range. On the contrary, the growth rates calculated from the BSL signal were unrealistic, which could possibly be explained by low absorption and the associated increased reflection of light from the liquid surface, which acts as a mirror. The defined media components, on the other hand, ensured higher reproducibility, as evident from the lower standard deviations. All determined growth rates were significantly lower than in the complex media. From offline measurements, a growth rate µCDW,offline of 0.620 ± 0.019 h−1 was determined. The manually determined growth rate based on online data was slightly higher (µman.,online = 0.6432 ± 0.004 h−1), and the algorithm-based rate was slightly lower (µauto.,online = 0.582 ± 0.020 h−1). The biomass yield (YX/S = 0.434 ± 0.065 gCDW gGlc−1) and the maximum biomass concentration (CDWmax = 3.51 ± 1.27 g L−1) were also significantly lower. This can be explained by the absence of the additional energy sources supplied by complex media components and by the lack of pH regulation, which led to acidification as a result of acetate production. The specific oxygen consumption rate qO2 in the Biener medium was 2.132 ± 0.413 × 10^−2 mol g−1 h−1, which agrees with data reported for E. coli in chemically defined media. Andersen and von Meyenburg reported a qO2 of 1.98 ± 0.17 × 10^−2 mol g−1 h−1 in minimal medium with 2 g L−1 glucose during the exponential growth phase [54]. Lin et al. estimated a qO2 of 2 ± 0.2 × 10^−2 mol g−1 h−1 with the same strain in chemically defined Teich medium during the batch phase of a process [55].
Further data on experiments with E. coli can be found in Appendix B.

Yeast-S. cerevisiae
The basic design of the S. cerevisiae experiments corresponded to that of the E. coli experiments, but with higher working volumes. Of particular interest in these yeast cultivations were metabolic changes that were visible online and associated with different cultivation phases (Figure 3). After a brief adaptation period, the yeast grew exponentially, consuming glucose and producing ethanol (between the start and 5 h). The highest growth rate was expected during this phase. Thus, the algorithm cut off the adaptation phase and used the brief increase in oxygen to set the start and end points of the phase. The shift from glucose consumption and ethanol formation to ethanol consumption was also visible in the pH, which started to increase instead of decreasing. During ethanol consumption, the suspension was oxygen-limited (in this example, between 8 and 21 h), and thus the informative value of the oxygen and OUR data decreased. After 18 h, a change in pH was visible, which may indicate a switch from ethanol to acetate or complex media components as energy sources. Finally, after 21 h, oxygen saturation and pH increased sharply while the OUR dropped, indicating complete depletion of the energy sources and the end of the cultivation. The automatically calculated online growth rate of µOUR,auto. = 0.445 h−1 agreed very well with the value determined manually from online data (µOUR,man. = 0.454 h−1) and with the offline value (µCDW,offline = 0.453 h−1). Furthermore, the online measurements gave a qO2 of 3.12 mmol g−1 h−1, which is comparable to literature results for S. cerevisiae [56].
The growth rate estimation using the online backscattered light signal was less successful and associated with greater deviations (Figure 4). For the above-described experiment, a µ BSL, auto. of 0.546 h −1 was calculated. Even the manual determination using the backscattered light signal resulted in a comparatively large deviation (µ BSL, man. = 0.511 h −1 ), indicating that this method is not suitable for accurately and reliably determining KPIs.
The KPIs for S. cerevisiae were determined in 10 shake flask experiments under different cultivation conditions (always at 30 °C and in YPD medium), resulting in growth rates of µCDW,offline = 0.496 ± 0.034 h−1, µOUR,man. = 0.489 ± 0.045 h−1 and µOUR,auto = 0.474 ± 0.042 h−1, a biomass yield YX/S of 0.812 ± 0.168 gCDW gGlc−1 and qO2 = 2.588 ± 0.392 × 10^−3 mol g−1 h−1. The relatively large standard deviation of the biomass yield can be explained by metabolic differences between baffled and unbaffled flasks. Cultures in unbaffled flasks reach oxygen limitation earlier, which increases the total cultivation time and the energy required for maintenance metabolism. This was also visible in the maximum achieved biomass: CDWmax was 9.4 g L−1 in baffled flasks and 7.7 g L−1 in unbaffled flasks. The type of flask had no influence on the maximum growth rate in these batch experiments, since exponential growth took place before oxygen limitation set in.
However, using the BSL signal in the 500 mL flask (B), even after smoothing the signal, was not sufficient to obtain a realistic calculation of the growth rate (neither manually nor algorithm-based).

Plant Cells-Vitis vinifera
The cultivation of plant cells differed in several major respects from the microbial fermentations considered so far. First, the cells grew much more slowly, with doubling times of three days or more. This was accompanied by correspondingly long cultivation times, usually two weeks for V. vinifera batch cultivations. At the same time, very high cell densities could be achieved, so that up to 80% of the suspension could consist of cells. This was associated with a significant increase in viscosity. A metabolic peculiarity is the use of sucrose, which is first cleaved enzymatically and extracellularly into glucose and fructose. Both sugars are subsequently taken up simultaneously, with glucose being consumed preferentially. These metabolic changes were also very clearly visible in the online signals (Figure 5). After the sucrose had been completely cleaved and the cells had completed their adaptation phase (72 h), there was a brief increase in the O2 signal. A brief change in the O2 signal was also seen when the glucose had been completely consumed (168 h). Particularly striking was the strong decrease in the dissolved oxygen concentration after 264 h, the time at which the last carbon source, fructose, was consumed. This illustrates both the advantages and the disadvantages of measuring oxygen in plant cell shake flask cultivations. On the one hand, the oxygen signal, combined with knowledge of plant physiology, allowed substrate consumption to be followed online without having to wait for time-consuming substrate measurements such as HPLC. On the other hand, these "jumps" in the oxygen signal caused deviations in the OUR calculated from it, making the OUR unsuitable for calculating the growth rate.
In this case, the size of the plant cells and the high cell density within the suspension proved to be an advantage. The BSL signal was clear and low in noise throughout the cultivation period and correlated very well with the cell dry weight concentration, even in the death phase (Figure 5). The algorithm-based growth rate calculated from it (µBSL,auto. = 0.0090 h−1) was also very similar to the rates calculated manually (µBSL,man. = 0.0093 h−1) and from offline data (µCDW,offline = 0.0090 h−1). The pH signal could be used to follow the uptake of inorganic nutrients: the preferential uptake of ammonium and phosphate in the adaptation phase was accompanied by a strong decrease in pH, and during the subsequent uptake of nitrate, the pH increased continuously until the end of cultivation (Figure 5A).
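The stated doubling times can be checked against the fitted growth rates with t_d = ln 2 / µ. The short Python example below does this for µBSL,auto. = 0.0090 h−1 (V. vinifera) and, for comparison, for µOUR,auto. = 0.0370 h−1 reported for ExpiCHO-S in the animal cell section; it gives about 77 h (3.2 d) and about 18.7 h, consistent with "three days or more" and with the 18 to 24 h range quoted for the CHO cells. The function name is illustrative.

```python
import math


def doubling_time_h(mu_per_h):
    """Doubling time t_d = ln(2) / mu for exponential growth (mu in h^-1)."""
    return math.log(2) / mu_per_h


for label, mu in [("V. vinifera, mu_BSL,auto", 0.0090),
                  ("ExpiCHO-S, mu_OUR,auto", 0.0370)]:
    td = doubling_time_h(mu)
    print(f"{label}: t_d = {td:.1f} h = {td / 24:.1f} d")
```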
The KPIs determined from all experiments (12 flasks) can be summarized as follows. The growth rates µBSL,auto. = 0.0080 ± 0.001 h−1, µBSL,man. = 0.0088 ± 0.001 h−1 and µCDW,offline = 0.0084 ± 0.001 h−1 matched well, with a slight underestimation by the algorithm-based and a slight overestimation by the manual evaluation of the BSL signal. The biomass yield YX/S was 0.4744 ± 0.0366 gCDW gGlc−1, resulting in maximum cell dry weight concentrations of CDWmax = 15.298 ± 0.881 g L−1. The specific oxygen consumption rate qO2 was 2.293 ± 0.400 × 10^−4 mol g−1 h−1, compared with qO2 values from 1.1 to 7.1 × 10^−4 mol g−1 h−1 reported by Pépin et al. for V. vinifera L. cv Gamay suspension cultures in Gamborg B5 medium [57].

Animal Cells
The study of animal cell cultures was performed in several steps. First, the two CHO cell lines were characterized (Figure 6). As in the microbial cultivations, the OUR signal provided a very good estimate of the specific growth rate. Owing to the relatively high filling volume in combination with the lower shaking rates and doubling times in the range of 18 to 24 h, the signal was low in noise and easy to interpret (Figure 6). For the ExpiCHO-S 6H8 cell line, µOUR,auto. = 0.0370 h−1 was similar to µOUR,man. = 0.0365 h−1, and both were slightly lower than µVCD,offline = 0.0389 h−1 (Figure 6A). For CHO DP12 #1934, all results were similar, with µOUR,auto. = 0.0340 h−1, µOUR,man. = 0.0347 h−1, and µVCD,offline = 0.0348 h−1 (Figure 6B). The pH signal was also informative, as it indicated whether the metabolic shift from lactate formation to lactate consumption was successful. The increase in pH after 120 h for the ExpiCHO-S cells (Figure 6A) coincided with the uptake of lactate and thus with complete consumption of the initial glucose, resulting in higher yields than in the depicted CHO-DP12 cultivation, in which the lactate shift was unsuccessful and the pH did not change at the end (Figure 6B). The BSL signal showed a surprising trend for animal cell cultures. For the first 70 h, it did not change despite cell growth. However, as viability decreased, the BSL signal increased, so that for animal cell cultures the BSL signal correlated with dead cell density. This could be explained by a change in the cell surface during cell death, which was also visible as a milky turbidity of the cell suspension during the death phase.
The OUR-based growth rates for ExpiCHO-S (6 flasks) again agreed well with the offline values: µVCD,offline = 0.0390 ± 0.007 h⁻¹, µOUR,man = 0.0359 ± 0.005 h⁻¹, and µOUR,auto = 0.0362 ± 0.007 h⁻¹. Comparable results were obtained with the CHO DP-12 cells (8 flasks), with growth rates of µVCD,offline = 0.0349 ± 0.017 h⁻¹, µOUR,man = 0.0335 ± 0.018 h⁻¹, and µOUR,auto = 0.0327 ± 0.011 h⁻¹. For both CHO cell lines, the OUR-based growth rates were slightly lower than the values based on offline data. The standard deviations were small, indicating that high reproducibility is achievable with standardized inoculum preparation and chemically defined media. The two cell lines differed more markedly in yield than in growth rate, with YX/S = 3.126 ± 0.065 × 10⁶ cells gGlc⁻¹ for ExpiCHO-S and YX/S = 1.990 ± 0.140 × 10⁶ cells gGlc⁻¹ for CHO DP-12. The lower cell-specific oxygen consumption rate of ExpiCHO-S (qO2 = 2.283 ± 0.238 × 10⁻¹³ mol cell⁻¹ h⁻¹) compared with CHO DP-12 (qO2 = 2.884 ± 0.623 × 10⁻¹³ mol cell⁻¹ h⁻¹) suggests that ExpiCHO-S cells have a more efficient metabolism. This was also reflected in the product yield, which was roughly twice as high for ExpiCHO-S (YP/S = 0.0575 ± 0.0055 gIgG gGlc⁻¹) as for CHO DP-12 (YP/S = 0.0267 ± 0.0038 gIgG gGlc⁻¹). The qO2 values were in good agreement with literature values (3.0 to 3.2 × 10⁻¹³ mol cell⁻¹ h⁻¹ for CHO DP-12 [58] and 2.3 × 10⁻¹³ mol cell⁻¹ h⁻¹ for CHO-S cells [59]). Additional data on CHO DP-12 as well as the characterization of HEK and High Five cultivations are provided in Appendix B.
Figure 6 (caption excerpt): For low cell densities, the BSL signal was noisy, and the noise decreased as cell density increased (C). The BSL signal increased as viability decreased, indicating that it measured the dead cell density (C). The production and consumption of lactate was visible in the pH signal (purple line), with the lactate shift at 120 h clearly visible (D).
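The yields and cell-specific rates above follow standard definitions: a yield is the amount of biomass or product formed per amount of substrate consumed, and qO2 is the OUR divided by the cell density at the same time point. The following sketch implements these definitions with hypothetical inputs chosen only to illustrate the units, and checks the "roughly twice as high" product-yield statement using the reported means. The function names are illustrative, not taken from the study's software.

```python
def yield_coefficient(formed_start, formed_end, substrate_start, substrate_end):
    """Yield = (amount formed) / (substrate consumed), e.g. Y_X/S or Y_P/S."""
    consumed = substrate_start - substrate_end
    if consumed <= 0:
        raise ValueError("no net substrate consumption")
    return (formed_end - formed_start) / consumed


def specific_oxygen_uptake(our_mol_per_l_h, cells_per_l):
    """Cell-specific oxygen uptake rate q_O2 = OUR / X (mol cell^-1 h^-1)."""
    return our_mol_per_l_h / cells_per_l


# Hypothetical inputs, chosen only to illustrate the calculation and its units
y_ps = yield_coefficient(0.0, 0.2, 6.0, 1.0)   # g product L^-1, g glucose L^-1 -> 0.04 g g^-1
q_o2 = specific_oxygen_uptake(2.0e-3, 8.0e9)   # mol L^-1 h^-1, cells L^-1 -> mol cell^-1 h^-1

# Reported mean product yields: ExpiCHO-S vs. CHO DP-12
ratio = 0.0575 / 0.0267
print(f"Y_P/S = {y_ps:.3f}, q_O2 = {q_o2:.2e}, yield ratio = {ratio:.2f}")
```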

Evaluation of Measurement Techniques
No single measurement technique was suitable for all of the organism-flask combinations, but at least one was appropriate in each case examined (Table 2). In general, the OUR was the most versatile signal and may be the signal of choice for all culture types except plant cells. However, it has some drawbacks. First, the kLa value must be known, and the medium must initially be equilibrated to oxygen saturation, as even slight deviations (>1% in oxygen saturation) have significant effects on the calculated growth rate. Second, once the suspension becomes oxygen-limited, the OUR provides no further information. Finally, signal disturbances, whether caused by metabolic changes or by disturbing the shaking platform, may render the automated signal interpretation useless. It is therefore recommended not to open the shaker and, in particular, not to sample directly from the measurement flasks, as changes in the shaker atmosphere and temperature may also significantly influence the signal. The BSL signal was not affected by oxygen limitation, and interruptions caused by a stopped shaker or a removed flask were less severe. The signal is thus relatively robust as long as the culture is sufficiently dense, which is the case for plant cell cultures and for microbial cultivations at higher cell densities. However, baffled flasks and the bubbles they introduce into the suspension lower the signal-to-noise ratio, especially at low filling volumes and high shaking rates. Interestingly, changes in cell size and surface also affect the BSL signal, which results in the detection of mainly dead cells in animal cell cultivations.
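The sensitivity of the OUR to the oxygen saturation reference follows from the liquid-phase oxygen balance dcL/dt = kLa·(c* − cL) − OUR, the standard way of deriving the OUR from a dissolved-oxygen trace. The exact equation implemented in the measurement software is not given in this section, so using it here is an assumption. While the dissolved oxygen is still close to saturation, the driving force c* − cL is small, so an offset of a single percentage point in the saturation reading produces a large relative error in the OUR. The kLa and c* values below are hypothetical.

```python
import numpy as np


def our_from_do(t_h, do_percent, kla_per_h, c_sat_mol_per_l):
    """OUR (mol L^-1 h^-1) from a DO trace (% air saturation) via the balance
    dcL/dt = kLa * (c* - cL) - OUR, with cL = DO / 100 * c*."""
    t_h = np.asarray(t_h, dtype=float)
    c_l = np.asarray(do_percent, dtype=float) / 100.0 * c_sat_mol_per_l
    dc_dt = np.gradient(c_l, t_h)  # finite-difference estimate of dcL/dt
    return kla_per_h * (c_sat_mol_per_l - c_l) - dc_dt


# Hypothetical values: kLa = 30 h^-1, c* = 0.2 mmol L^-1, constant DO early in the culture
t = np.array([0.0, 0.5, 1.0, 1.5])
kla, c_sat = 30.0, 2.0e-4

our_ref = our_from_do(t, np.full(t.size, 95.0), kla, c_sat)     # true DO = 95 %
our_offset = our_from_do(t, np.full(t.size, 96.0), kla, c_sat)  # reading 1 point too high
rel_error = (our_ref[0] - our_offset[0]) / our_ref[0]
print(f"OUR = {our_ref[0]:.2e} vs {our_offset[0]:.2e} mol/(L h), relative error = {rel_error:.0%}")
```

In this example, a one-point error at 95% saturation shifts the OUR by 20%, which illustrates why complete initial equilibration matters.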
The pH was an excellent supporting parameter, indicating metabolic activities such as the switch from glucose consumption and acetate production to acetate consumption, or the metabolic shift from lactate formation to lactate consumption in CHO cells. Furthermore, these metabolic changes remained visible even when the oxygen signal could not be used because of oxygen-limiting conditions.
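Because the pH falls while acids such as acetate or lactate are formed and rises once they are consumed, such a metabolic shift can be located as the minimum of a smoothed pH trace that is followed by a sufficient rise. The sketch below is a hypothetical heuristic with illustrative parameters (`window`, `min_rise`), not the detection logic used in the study, and it is applied to a synthetic CHO-like pH trace.

```python
import numpy as np


def detect_metabolic_shift(t_h, ph, window=5, min_rise=0.05):
    """Return the time of the smoothed pH minimum if the pH rises by at least
    `min_rise` afterwards (acid formation -> acid consumption), otherwise None."""
    t_h = np.asarray(t_h, dtype=float)
    ph = np.asarray(ph, dtype=float)
    if window % 2 == 0 or window > ph.size:
        raise ValueError("window must be odd and not longer than the series")
    smooth = np.convolve(ph, np.ones(window) / window, mode="valid")  # centered moving average
    t_smooth = t_h[window // 2 : window // 2 + smooth.size]           # times of window centers
    i_min = int(np.argmin(smooth))
    rise = smooth[i_min:].max() - smooth[i_min]
    return float(t_smooth[i_min]) if rise >= min_rise else None


# Synthetic pH trace: acidification until 120 h, then re-alkalization
t = np.arange(0.0, 169.0, 1.0)
ph = np.where(t <= 120.0, 7.1 - 0.3 * t / 120.0, 6.8 + 0.2 * (t - 120.0) / 48.0)
shift = detect_metabolic_shift(t, ph)
print(shift)
```

The moving average shifts the detected minimum by about one sampling interval, which is acceptable at hourly resolution; a flat or continuously falling pH returns `None`.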

Evaluation of KPIs
To evaluate the KPIs, the algorithm-based online data were compared with manually calculated online and offline data. All manual calculations were performed by the same user to reduce any human-related bias which may occur if several users manually set the exponential growth phases.
As shown in the previous subsections, the OUR can be used for most organisms and is essential for calculating qO2. The BSL signal does not correlate linearly with OD600, which is probably the reason for the lower accuracy of BSL-based growth rate estimates. Calibration allows the BSL signal to be converted directly into OD600 or CDW, increasing accuracy; however, it requires additional manual work, which runs counter to the automated approach presented here. Table 3 compares the manually and automatically determined growth rates based on the OUR and BSL data. The growth rates determined offline and by the algorithm agree very well, with an average deviation of 6.5%. The highest deviation, 12.8%, was found for the HEK cells; for all other cells, the deviation was less than 7%.
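The exact definition of the average deviation used for Table 3 is not stated in this section. A natural choice, assumed here, is the mean absolute relative deviation between the automatic and offline growth rates. Applied to the rounded HEK means from Appendix B, it gives about 12.7%, close to the 12.8% reported; the small difference is likely due to rounding.

```python
def mean_abs_relative_deviation(pairs):
    """Mean of |mu_auto - mu_offline| / mu_offline over (mu_auto, mu_offline) pairs, in percent."""
    if not pairs:
        raise ValueError("need at least one pair")
    return 100.0 * sum(abs(a - o) / o for a, o in pairs) / len(pairs)


# HEK means from Appendix B: mu_OUR,auto = 0.0355 h^-1, mu_VCD,offline = 0.0315 h^-1
hek = mean_abs_relative_deviation([(0.0355, 0.0315)])
print(f"HEK deviation: {hek:.1f} %")
```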
The online data obtained were confirmed by the offline measurements. Additional verification, especially for the specific oxygen uptake rates obtained from online data only, was provided by a comparison with published values (Table 4). This comparison demonstrates both the accuracy and the advantages of the proposed approach: where comparable experiments (i.e., similar strain, medium, and process conditions) are available, the values determined with the presented approach agree very well with them. However, as soon as the selected parameters deviated more strongly from the reference conditions, the range of literature values also became considerably wider, as can be seen, for example, for the growth rate of High Five cells or the specific oxygen uptake rate of S. cerevisiae. Another problem arose with complex media components, which made it very difficult to compare the maximum cell number or yield for media such as LB, TB, and YPD. It was equally difficult to find reference biomass yields for High Five and HEK293 cells in a batch process without infection, as most publications focus on the much more demanding fed-batch production processes with virus infection. Therefore, when characterizing a new cell-medium combination or changing the process conditions, the KPIs should be determined experimentally, which can be done easily, automatically, and precisely with the presented approach.
Table 3. Overview of the KPIs for all investigated organisms under the standard conditions described in Section 2. For every investigated organism-medium combination, the number of replicates is given in brackets. µoffline was calculated from measurements of cell dry weight for E. coli, S. cerevisiae, and V. vinifera and of viable cell density for CHO, HEK, and High Five cells. For µBSL and µOUR, manual (man) and automatic (auto) evaluation were distinguished. The BSL signal appeared to be useful only for S. cerevisiae and V. vinifera, whereas the OUR appeared unsuitable only for V. vinifera. The calculated cell-specific oxygen uptake rates and biomass yields are based on the biomass measurements typically used for each organism, i.e., cell dry weight for E. coli, S. cerevisiae, and V. vinifera, and viable cell density for CHO, HEK, and High Five cells.
Table 4. Comparison of automatically determined growth rates, specific oxygen uptake rates, and biomass yields with published reference values. As far as possible, similar or identical strains, media, and process conditions were considered when selecting literature data [77,78,79].

Conclusions and Outlook
In contrast to state-of-the-art approaches, in which media and strains are laboriously characterized or optimized using manually and invasively determined KPIs, the novel workflow presented here determines KPIs automatically and in a standardized way in early-stage bioprocess development, as described in Section 2.1. By combining data science methods, adapted to the small amount of data usually available in early-stage development, with measurements from non-invasive online sensors, we have shown that KPIs comparable to literature values and to manual evaluation can be efficiently extracted from shake flask experiments. We have also shown that the workflow provides solid and robust results under realistic conditions.
The experiments used to evaluate the performance of the algorithm were conducted by different operators, over different time periods, and with different flasks, media, and organisms, providing a good representation of the conditions expected in bioprocess development. In Section 2.2, we have shown that the developed workflow identifies the exponential growth phase in the performed experiments just as well as an experienced user does by manual evaluation (Table 3). The accuracy of the algorithm clearly depends on the quality of the online signal used, but the phase optimization performed by the algorithm and the recipe database contribute substantially to the robustness of the workflow. We therefore conclude that applying this algorithm in research and industry can save time and resources in strain and medium characterization and optimization.
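The phase optimization and recipe database of the actual workflow are described in Section 2.1 and are not reproduced here. As a generic stand-in, automatic identification of an exponential phase can be sketched as an exhaustive sliding-window log-linear regression that returns the window with the largest slope among all windows whose fit reaches a minimum R². This is a hypothetical illustration; `min_points` and `r2_min` are assumed parameters, not values from the study.

```python
import numpy as np


def find_exponential_phase(t_h, signal, min_points=10, r2_min=0.98):
    """Return (t_start, t_end, mu) of the window whose ln(signal) fit has the largest
    slope among all windows with at least `min_points` points and R^2 >= r2_min."""
    t_h = np.asarray(t_h, dtype=float)
    y = np.log(np.asarray(signal, dtype=float))
    n = t_h.size
    best = None
    for i in range(n - min_points + 1):
        for j in range(i + min_points, n + 1):
            tt, yy = t_h[i:j], y[i:j]
            slope, intercept = np.polyfit(tt, yy, 1)
            ss_res = np.sum((yy - (slope * tt + intercept)) ** 2)
            ss_tot = np.sum((yy - yy.mean()) ** 2)
            r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0  # flat windows are rejected
            if r2 >= r2_min and (best is None or slope > best[2]):
                best = (float(tt[0]), float(tt[-1]), float(slope))
    return best


# Synthetic signal: 10 h lag, exponential growth with mu = 0.1 h^-1 until 40 h, then stationary
t = np.arange(0.0, 61.0, 1.0)
signal = np.exp(0.1 * np.clip(t - 10.0, 0.0, 30.0))
t_start, t_end, mu = find_exponential_phase(t, signal)
print(t_start, t_end, round(mu, 4))
```

Windows that include lag or stationary data have a smaller fitted slope, so the search settles inside the exponential phase. On noisy data, a minimum window length and an R² threshold keep short, noise-driven windows from winning.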
The robustness of phase detection could be improved by using further online signals, such as dissolved CO2, in addition to pH, dissolved oxygen, and OUR or BSL, as these parameters provide further insight into the growth behavior of the observed organism. Further experiments should verify the approach under challenging cultivation conditions. These include phototrophic organisms such as algae, as well as fungal and bacterial cultures with mycelial growth (e.g., those used for antibiotic production), which preferentially grow on surfaces, can also grow on the measuring surface of the sensor spots, and are challenging to cultivate in shake flasks [80].

General Equipment and Online Measurement Systems
The experiments were performed in 250 or 500 mL disposable Erlenmeyer flasks (Corning Inc., New York, NY, US) with and without baffles in Multitron Pro, Multitron Cell (Infors HT, Bottmingen, CH), and LT-X (Adolf Kühner AG, Birsfelden, CH) shaking incubators with 25 or 50 mm shaking amplitudes. Online measurements of backscattered light, dissolved oxygen concentration, and pH were carried out using up to four SFR vario devices (PreSens, Regensburg, DE) and corresponding disposable shake flasks equipped with sensor spots (Figure 7). Both sensor spots contain reversible fluorescent dyes that respond to changes in O2 and pH. Unlike electrodes, these sensors are supplied pre-calibrated and require no calibration by the user; the calibration constants are valid for a given production lot of flasks, identified by a batch number. The device is placed inside the shaker and has integrated rechargeable batteries. The controlling PC establishes a wireless Bluetooth connection to trigger readouts at flexible time intervals. To avoid disturbing the online measurements by stopping the shaker and removing the shake flasks, all comparative offline measurements (except for the determination of initial cell density and substrate concentration, which was performed in each shake flask) were conducted in reference flasks.
The PreSens system was chosen to establish the workflow because the simultaneous measurement of dissolved oxygen, the biomass signal via backscattered light, and the calculated oxygen uptake rate allowed cross-validation of the data obtained. However, the approach can also be applied with other systems described in Section 1, such as the RAMOS, the Kuhner TOM, or the CGQ. Combining different measurement systems for mutual verification, as demonstrated by Anderlei et al. [81], is also of interest.

Bacteria Cultivation-E. coli
The experiments were performed using the E. coli strain W3110 (thyA3662supOλ−, DSMZ ordering number: 5911) and three media: the two complex media LB and TB and a chemically defined medium. LB and TB contain the same complex components, but TB is richer, contains an additional carbon source (glycerol), and is buffered. LB was mixed completely and autoclaved, while the TB medium and its buffer were autoclaved separately and mixed after cooling. The chemically defined medium, based on Biener et al. [60] but with a lower glucose concentration than the original recipe, consisted of different salt and trace element stock solutions as well as glucose and MgSO4 solutions (Appendix A). All stock solutions were autoclaved separately and mixed after cooling. In this way, the influence of different media (complex vs. defined), temperatures, and shaking conditions was tested. For biomass quantification, the optical density at 600 nm (OD600) was measured using a SmartSpec Plus photometer (Bio-Rad, Hercules, CA, US), and the cell dry weight (CDW) was determined gravimetrically in 1.5 mL tubes. Metabolites were analyzed using a Cedex Bio (Roche Diagnostics, Mannheim, Germany) with the corresponding kits for glucose, ammonia, and phosphate.
Temperature was set to 37 °C, shaking frequency to 180 rpm, and shaking amplitude to 50 mm. Biomass was determined as in the E. coli experiments. For metabolite analysis, ethanol was measured instead of ammonia.
The animal cell lines used in this study are listed in Table 5. Both CHO cell lines produce the antibody immunoglobulin G (IgG) and behave like industrial mammalian production cell lines. HEK cells are also widely used but tend to form larger aggregates [86]. High Five insect cells have morphological characteristics similar to those of CHO cells but are cultured without CO2-based pH control and at lower temperatures. The applied cultivation conditions are summarized in Table 6. The use of different clones, media, and cultivation conditions was intended to test the robustness of the methodology. Cell density and viability were measured using a Cedex HiRes (Roche Custom Biotech) for the CHO and High Five cells and a NucleoCounter NC200 (Chemometec A/S, Allerod, Denmark) for the HEK293 cells. Metabolites (glucose, glutamine, ammonia) and the product IgG were determined using a Cedex Bio (Roche Custom Biotech) with the corresponding analytical kits.

Software
The SFR vario devices were controlled with the associated PreSens Flask Studio (PFS) software. At each time point, the software connected to the device to initiate the measurement and retrieve the data, which were stored in the integrated SQL database. The data were then visualized online as diagrams and could be compared with historical runs in a cumulative graph.
For more advanced analytics, as described in Section 2.1, the data were transferred from the SQL database via a REST API to the commercially available PAS-X Savvy 2022.03 software (Körber Pharma Software GmbH, Vienna, Austria); this additional interface was implemented in PFS. The algorithms and analyses for this work were developed in this software using Python 3.7.9 (Python Software Foundation, available online: https://www.python.org/, accessed on 22 March 2022).

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Recipe for 1 L E. coli Medium
This recipe is a modified version of the medium of Biener et al. [60] with a lower glucose concentration. All stock solutions were mixed after autoclaving.
Complementing the online data shown in Figure 2, an offline growth curve is depicted in Figure A1. For the experiment in LB medium, a maximum specific growth rate of µoffline = 1.323 h⁻¹ was observed. Comparing the offline measurements with the backscattered light curve reveals the problem of the non-linear correlation, which can lead to deviations in the calculated growth rate (µBSL,auto = 0.7561 h⁻¹).
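One simple way to see how a non-linear BSL response distorts the growth rate: if, hypothetically, BSL scaled with biomass as a power law, BSL = a·X^b, then ln BSL = ln a + b·µ·t during exponential growth, so the apparent growth rate would be b·µ. Under this assumption, the two LB values reported here would correspond to b ≈ 0.7561/1.323 ≈ 0.57. The actual BSL-biomass relationship was not determined in this section; the power law only illustrates why calibration restores the true rate.

```python
import numpy as np

MU_TRUE = 1.323   # offline growth rate reported for LB medium (h^-1)
B_EXP = 0.57      # hypothetical exponent of the assumed power-law BSL response

t = np.linspace(0.0, 3.0, 31)        # exponential phase only (h), hypothetical
x = 0.05 * np.exp(MU_TRUE * t)       # biomass (arbitrary units)
bsl = 10.0 * x ** B_EXP              # assumed response: BSL = a * X**b

mu_apparent = np.polyfit(t, np.log(bsl), 1)[0]   # slope of ln(BSL) equals b * mu
mu_calibrated = mu_apparent / B_EXP              # a calibration that knows b recovers mu
print(f"apparent: {mu_apparent:.4f} 1/h, calibrated: {mu_calibrated:.4f} 1/h")
```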
The calculation based on the OUR proved to be much more robust. Neither the size and type of the flasks nor the process conditions had an impact on the OUR-based growth rate or its estimation. To further test the approach, a temperature variation was carried out using TB medium (Figure A2). Within the selected range (20 to 37 °C), which covers typical cultivation conditions for E. coli in production processes, a linear relationship between temperature and growth rate was found.
Figure A1. Offline growth data of E. coli W3110 in LB medium. In the absence of calibration, the different increase of the offline and online measured biomass concentrations leads to significant differences in the calculated growth rate (µoffline = 1.323 h⁻¹ compared with µBSL,auto = 0.7561 h⁻¹).
Figure A2. Dependence of growth rate on temperature in an E. coli cultivation (20 to 37 °C, 200 rpm, 25 mm shaking amplitude, 40 mL TB medium in 500 mL baffled shake flasks). The growth rates were calculated from online OUR data.
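A linear temperature dependence of this kind can be quantified with an ordinary least-squares fit µ = a + b·T and its coefficient of determination. The sketch below uses placeholder values, not the measured data of Figure A2. Such a fit describes only the tested range; above the optimum temperature, the growth rate of E. coli declines again.

```python
import numpy as np


def fit_linear_temperature_dependence(temps_c, mu):
    """Least-squares fit mu = a + b * T; returns (a, b, r2)."""
    temps_c = np.asarray(temps_c, dtype=float)
    mu = np.asarray(mu, dtype=float)
    b, a = np.polyfit(temps_c, mu, 1)   # polyfit returns the highest power first
    pred = a + b * temps_c
    r2 = 1.0 - np.sum((mu - pred) ** 2) / np.sum((mu - mu.mean()) ** 2)
    return a, b, r2


# Placeholder values for illustration only (not the data shown in Figure A2)
temps = [20.0, 25.0, 30.0, 37.0]
rates = [0.26, 0.49, 0.76, 1.09]
a, b, r2 = fit_linear_temperature_dependence(temps, rates)
print(f"mu = {a:.3f} + {b:.4f} * T  (R^2 = {r2:.4f})")
```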

Appendix B.1.2. Animal Cell Cultures
As a supplement to the CHO experiments, cultivations with HEK and High Five cells were also performed. The results of these experiments were comparable to those of the CHO cell experiments described above (Figure A3). Neither the aggregation tendency of HEK cells nor the changed shaker atmosphere for High Five cells affected the evaluation. For HEK cells, the algorithm-based growth rate µOUR,auto = 0.035 h⁻¹ was in very good agreement with the manually determined µOUR,man = 0.0359 h⁻¹ and with µVCD,offline = 0.0350 h⁻¹ (Figure A3A). The metabolic shift from lactate formation to consumption at 100 h was clearly visible in the pH signal and coincided with the maximum OUR and the lowest O2 concentration. For the High Five cells, µOUR,auto = 0.0421 h⁻¹, µOUR,man = 0.0436 h⁻¹, and µVCD,offline = 0.0431 h⁻¹ were also nearly identical.
Figure A3. Evaluation of HEK cells in 250 mL shake flasks with 80 mL working volume (A,C) and High Five cells in 500 mL shake flasks with 100 mL working volume (B,D). Both cultivations were performed at a shaking rate of 130 rpm and a shaking diameter of 50 mm. The OUR signal (yellow line) was used for growth rate calculation, resulting in µOUR,auto = 0.00337 h⁻¹ for HEK and µOUR,auto = 0.0417 h⁻¹ for High Five cells. The metabolic shift from lactate formation to consumption for HEK cells was clearly visible in the pH signal (purple line, A).
The growth rates determined for HEK cells (6 flasks) were µVCD,offline = 0.0315 ± 0.0026 h⁻¹, µOUR,man = 0.0365 ± 0.0010 h⁻¹, and µOUR,auto = 0.0355 ± 0.0005 h⁻¹. At 9.863 ± 3.76 × 10⁻¹⁴ mol cell⁻¹ h⁻¹, the qO2 of HEK cells was the lowest of all investigated animal cells (slightly lower than the values of 1.3 to 1.58 × 10⁻¹³ mol cell⁻¹ h⁻¹ reported by Martinez-Monge [84]). The YX/S of 1.3921 ± 0.1511 × 10⁶ cells gGlc⁻¹ was also lower than the CHO cultivation yields. The High Five cells showed the highest growth rates of all the animal cells, with µVCD,offline = 0.0445 ± 0.0014 h⁻¹, µOUR,man = 0.0453 ± 0.0017 h⁻¹, and µOUR,auto = 0.0430 ± 0.0010 h⁻¹. This was combined with the highest qO2, 4.185 ± 0.024 × 10⁻¹³ mol cell⁻¹ h⁻¹, and the lowest YX/S, 1.2737 ± 0.0566 × 10⁶ cells gGlc⁻¹. The qO2 lies in between the reported value