Fluorescence Spectroscopy and Chemometric Modeling for Bioprocess Monitoring

On-line sensors for the detection of crucial process parameters are desirable for the monitoring, control and automation of processes in the biotechnology, food and pharma industries. Fluorescence spectroscopy is a highly developed and non-invasive technique that enables the on-line measurement of substrate and product concentrations as well as the identification of characteristic process states. During a cultivation process significant changes occur in the fluorescence spectra. By means of chemometric modeling, prediction models can be calculated and applied for process supervision and control to increase the quality and productivity of bioprocesses. A range of applications for different microorganisms and analytes has been proposed in recent years. This contribution provides an overview of different analysis methods for the measured fluorescence spectra and of the model-building chemometric methods used for various microbial cultivations. Most of these processes are observed using the BioView® sensor, owing to its robustness and insensitivity to adverse process conditions. The PLS method is the most frequently used chemometric method for the calculation of process models and the prediction of process variables.


Introduction
On-line measurements of substrate, products, intermediate products and other physicochemical process variables during bioreactor cultivation are becoming increasingly important. For the process analytical technology (PAT) initiative of the US Food and Drug Administration (FDA) and for quality by design (QbD) approaches, software sensors support the ambition to establish on-line monitoring methods to ensure high quality of manufacture of pharmaceutical products and batch-to-batch reproducibility [1][2][3][4].
For the on-line monitoring of critical bioprocess variables in bioreactors, software sensors based on off-gas analyzers, dissolved oxygen (DO) or pH electrodes are used. Because of the complexity of the biological matrix, access to important process variables is limited, or the sensors are not robust enough for the conditions required in bioreactors [1]. Beyond that, on-line measurements, for example on-line HPLC, allow the measurement of substrate and product concentrations during cultivation [1,5]. One disadvantage of on-line HPLC is the time delay between sampling and determination of the concentration of the observed process variable. As an alternative to on-line HPLC, the control of bioprocesses using flow injection analysis (FIA) measurements is described in [6,7], but here too there is a time delay in the detection of the concentration of certain process variables. A frequently reported improvement in the detection of crucial process states can be achieved by using sensitive on-line software sensors in combination with mechanistic models for the estimation of process variables [8][9][10]. The combination of soft sensors with multivariate data analysis thus enables process supervision and control [3,11]. Recently, NIR spectroscopy has been used for bioprocess monitoring [12][13][14], as well as Raman spectroscopy [15,16]. Both techniques are based on vibrational effects. In small-molecule applications chemical compounds can be identified better using Raman than NIR spectroscopy, but NIR spectroscopy prevails for bioprocess fingerprinting [17]. However, neither method is as sensitive as fluorescence spectroscopy.
For more than 30 years fluorescence sensors have been applied to the monitoring of various biological processes. In biotechnology, pharma and food process engineering they are used for biomass and product prediction and for process or media characterization. As a highly developed and non-invasive technique, fluorescence spectroscopy enables the on-line monitoring and supervision of these processes, which supports the maintenance of optimal process parameters. As early as 1970, Harrison and Chance reported the use of the fluorescence technique for monitoring continuous cultures of microorganisms by recording the intensity of light emitted by reduced nicotinamide adenine dinucleotide (NADH), measured at a single combination of one excitation and one emission wavelength [18]. The linear correlation between culture fluorescence and biomass concentration has also been known for decades [19,20]. The fluorescence method is improved further by using more than a single excitation and emission wavelength pair. The fluorescence of a culture broth can be measured with a range of excitation wavelengths and a single emission wavelength, with a single excitation wavelength and an emission spectrum, or as an excitation-emission matrix (EEM) that consists of different combinations of excitation and emission wavelengths. Nowadays, EEM fluorescence spectroscopy, also called 2-dimensional (2D-) fluorescence spectroscopy, is widely established. Here, a combination of multiple excitation and emission wavelengths is used to observe different biological processes [21]. During microbial cultivation, significant changes can be observed in the 2D-fluorescence spectra, caused by variations in the concentration of biogenic fluorophores such as aromatic amino acids, vitamins and co-enzymes [22].
A 2D-fluorescence spectrum consists of a high number of intensity values because of the different excitation and emission wavelength combinations. This leads to large data sets that require methods for data reduction and evaluation. The important process variables are only accessible by analysis methods that are able to detect the information hidden in the fluorescence spectra. Common approaches to extract this information from the 2D-fluorescence spectra are chemometric models, such as multiple linear regression (MLR), principal component regression (PCR) and partial least squares regression (PLS) [23,24]. In addition to these linear methods, there are a number of applications of non-linear techniques, including, for example, artificial neural networks (ANN) and further machine learning methods [9,25,26].
In addition, there are many attempts to reduce the high number of variables by selecting, out of the whole 2D-spectra, those wavelength combinations that are important for the description and estimation of certain process variables. For this task, artificial intelligence algorithms such as self-organizing maps, genetic algorithms and ant colony algorithms have also been described [9,27,28,29,30].
The reduction of the large data sets by using an ant colony optimization (ACO)-based methodology is reported for the analysis of near infrared (NIR) spectra [31]. This method is also applicable for the evaluation and wavelength selection of 2D-fluorescence spectra.
There are many different approaches to the use of fluorescence spectroscopy. In the following, fluorescence spectroscopy is described for different types of microbial cultivations and feeding procedures in biotechnology, pharma and food process engineering. In this contribution, recent applications and their evaluation methods are discussed.

Principles and Fluorophores
Sensors based on fluorescence spectroscopy are widely used for different applications in bioprocess monitoring (Table 1). However, just three of these cultivation processes [26,43,54] are carried out on an industrial scale. All other applications were only executed on a laboratory scale.
Fluorescence spectra can be measured in situ, on-line and in real time. The measurement principle is the interaction of light and matter. This makes fluorescence a fast and non-invasive measurement technique, which is a prerequisite for the on-line and real-time supervision and control of bioprocesses [61].
The fluorescence of the analyte originates from certain molecules that emit light after absorption. These so-called fluorophores contain an aromatic system. Because of an energy loss, the emission wavelength is longer than the excitation wavelength. The process of fluorescence is illustrated by the Jablonski diagram shown in Figure 1. A light quantum of energy hνA supplied by an external source is absorbed by the fluorophore, creating an excited electronic singlet state (S1 or S2). At these energy levels the fluorophore can exist in different vibrational energy levels (v), in accordance with the Franck-Condon principle. From the higher vibrational levels of S1 or S2 the fluorophore rapidly relaxes to the lowest vibrational level of S1 by internal conversion. A photon with energy hνF is emitted when the fluorophore returns to its electronic ground state S0. The energy of this emitted photon is lower than that of the excitation photon, and its wavelength is therefore longer. This behavior can be seen in Figure 2, where no fluorescence signal appears in the upper left triangle of the spectrum. This so-called Stokes shift enables the high sensitivity of the fluorescence technique because it allows the detection of emission photons against a low background, isolated from excitation photons [62]. Characteristic features of fluorophores are the quantum yield and the lifetime. The quantum yield is the ratio of the number of emitted photons to the number of absorbed photons, and the lifetime is the average time the molecule spends in the excited state before it returns to its ground state [63]. However, the fluorescence yield can be influenced by different effects that involve energy transfer and absorption. For example, the so-called inner filter effects reduce the measured fluorescence intensity when non-fluorescent components of the medium absorb excitation or emission radiation, thereby reducing the apparent fluorescence yield of an observed fluorophore [64].
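As a rough worked example of the energy loss behind the Stokes shift, the photon energies of excitation and emission can be compared using E = hc/λ. The wavelengths used here (excitation near 340 nm, emission near 460 nm, values commonly quoted for NADH) are illustrative assumptions and are not taken from the measurements discussed in this contribution.

```latex
E = \frac{hc}{\lambda}, \qquad hc \approx 1239.8\ \mathrm{eV\,nm}

E_A = \frac{1239.8\ \mathrm{eV\,nm}}{340\ \mathrm{nm}} \approx 3.65\ \mathrm{eV}, \qquad
E_F = \frac{1239.8\ \mathrm{eV\,nm}}{460\ \mathrm{nm}} \approx 2.70\ \mathrm{eV}

\Delta E = E_A - E_F \approx 0.95\ \mathrm{eV}
```

Under these assumed wavelengths roughly a quarter of the absorbed photon energy is lost before emission, which is why the emission signal is well separated from the excitation light.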
The excited-state fluorescence lifetime also changes with changes in the environment. Furthermore, the culture fluorescence depends on bioprocess variables such as the optical density (OD), viscosity, pH and the aeration of the bioreactor, as well as on many further variables that affect the fluorescence measurements [22]. As listed in Table 2, molecules that show biogenic fluorescence include the amino acids and vitamins as well as the co-enzymes flavin adenine dinucleotide (FAD), NADH and reduced nicotinamide adenine dinucleotide phosphate (NAD(P)H) [63]. The typical excitation ranges from the ultraviolet to the visible range of the electromagnetic spectrum, and the red-shifted emission light spans the ultraviolet and visible spectral range. By applying the 2D-fluorescence technique to cultivations of microorganisms, one can detect changes in the spectra that occur during the cultivation and are caused by biogenic fluorophores (Figure 2).

Table 2. Excitation and emission wavelengths for some fluorophores used in biotechnology.

Fluorescence Spectrometer
Further development of the early fluorescence sensors, which were based on only a single wavelength combination for monitoring NADH or tryptophan emission [18,61], established 2D-fluorescence sensors as a commonly used method. Table 3 presents different fluorescence sensors and their specified measuring ranges. A small number of research groups still use fluorescence sensors to monitor only the NADH signal. The BioView® sensor has turned out to be the most frequently employed fluorescence spectrometer.
The wavelength selection is done by filter systems, or by monochromators when grating technology is used. The BioView® sensor selects the different wavelength combinations by using two filter wheels (Figure 3). Each filter wheel has 16 slots that hold filters for specific excitation or emission wavelengths, or no filter for the measurement of scattered light. The filter wheels can be controlled individually. The default step size is 20 nm. The fluorescence spectrometer is coupled to the bioprocess via optical light guides. Scanning a complete 2D-fluorescence spectrum with all filters takes roughly 1 min, which allows almost continuous measurements and results in large data sets for a whole fermentation. However, not all of the data contain relevant information about the process. The analysis and evaluation of bioprocesses based on continuous fluorescence measurements are carried out with different chemometric methods, such as principal component analysis (PCA), partial least squares regression (PLS) or neural networks (NN). These methods filter the significant information out of the data sets.

Extracting Information Out of Fluorescence Spectra
The fluorescence spectra contain a lot of useful information about the observed biological processes. The extraction of this information is done by different approaches. In general, the following processing steps have to be done: the first step during evaluation is a preprocessing step that in some way normalizes, centers or filters the raw data to avoid effects such as noise or differences caused by different intensity maxima of the fluorescence values. The preprocessing is followed by a data reduction, decomposition or wavelength selection step, where large data sets are reduced by transforming the data or by using just a selection of all variables of the data set. Afterwards a chemometric model is calculated according to the measured process variables and the preprocessed fluorescence spectra. The quality of the models is assessed with different evaluation methods. Chemometric models of a process can then be used for the monitoring and supervision of these processes.

Preprocessing
As is typical of other measurement techniques too, measurement noise can mask the information hidden in a spectrum. Furthermore, the information is not necessarily contained only in the high intensity values of the spectra but may also lie in low-intensity shoulders. Preprocessing is therefore recommended to reduce these effects. The batch-to-batch variability can be reduced by calculating difference spectra: the first spectrum after inoculation, or an average of three to five spectra from the beginning of the cultivation, is subtracted from all following spectra [8][9][10]. In this way only the changes in the fluorescence intensity values during a fermentation are considered. Additionally, to reduce the influence of noise, the fluorescence signals can be smoothed by averaging over a few spectra [42,69] or by using the Savitzky-Golay filter [70]. A subsequent normalization can be performed by dividing each value of a spectrum by the spectrum average. Alternatively, each data point in a spectrum can be divided by the corresponding data point in the first spectrum, which yields a kind of normalized spectrum [56]. As the last preprocessing step, the data can be centered and scaled to unit variance. Another normalization used for fluorescence data subtracts the spectrum mean from each value and then divides by the standard deviation of the spectrum, which is called the standard normal variate (SNV) transformation [71]. All these preprocessing steps enhance the quality of the continuous evaluation and analysis of fluorescence data. However, there is no general rule indicating which preprocessing procedure is best.
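The preprocessing steps described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the procedure of any cited work: the spectra are assumed to be unfolded into a matrix with one row per measured spectrum and one column per wavelength combination, and the function names, the trailing averaging window and the synthetic data are hypothetical.

```python
import numpy as np

def difference_spectra(spectra, n_ref=3):
    """Subtract the mean of the first n_ref spectra from every spectrum."""
    reference = spectra[:n_ref].mean(axis=0)
    return spectra - reference

def smooth_over_time(spectra, window=3):
    """Average each spectrum with its predecessors (trailing window)."""
    smoothed = np.empty_like(spectra, dtype=float)
    for i in range(len(spectra)):
        start = max(0, i - window + 1)
        smoothed[i] = spectra[start:i + 1].mean(axis=0)
    return smoothed

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row)."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

# Synthetic stand-in for 20 spectra with 12 wavelength combinations each.
rng = np.random.default_rng(1)
raw = 100.0 + np.cumsum(rng.random((20, 12)), axis=0)

diff = difference_spectra(raw, n_ref=3)
smoothed = smooth_over_time(diff, window=3)
scaled = snv(smoothed)
print(scaled.shape)
```

Which of these steps to combine, and in which order, remains a per-application choice, in line with the remark that no general rule exists.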

Data Reduction, Decomposition and Wavelength Selection
Various types of methods for data reduction, selection of relevant wavelength combination and data decomposition are recommended for the evaluation of fluorescence spectra from biotechnological processes (Table 4).
Using 2D-fluorescence spectra leads to large data sets caused by the simultaneous measurements of different excitation and emission wavelengths. One spectrum consists of a high number of fluorescence intensities, but not all of them contain important information according to the requested process variable. Different strategies are known to handle this problem. Commonly used are methods that transform the data to variables with high variance and variables with low variance, containing just background noise of the measurement, such as the principal component analysis (PCA) [72]. The high dimensional data sets of the multivariate data are reduced using PCA.
The original data matrix X is decomposed into the product of two smaller matrices, the score matrix T and the loading matrix P, plus a residual matrix E that ideally contains only noise. The score matrix T provides information about the actual state of the process, while the loading matrix P describes how the intensity values of a spectrum contribute to each principal component. The original data matrix is transformed into a new matrix of principal components (PC), sorted by their variance. Most of the variance is contained in the first PC; the second contains less variance than the first but more than the third, and so on. By keeping only the first principal components, which hold almost the whole variance of the data, the dimension of the original data matrix X is reduced dramatically. Because often one, two or three PCs contain approximately all the variance of the data, with the others representing only noise, the data can be visualized in a so-called score plot, which allows further interpretation, for example the identification of different process states.
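The decomposition X = T Pᵀ + E can be illustrated with a short NumPy sketch based on the singular value decomposition of the mean-centered data. The function name `pca`, the choice of mean-centering and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def pca(X, n_components):
    """Decompose mean-centered X into scores T, loadings P and residuals E."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores, one row per spectrum
    P = Vt[:n_components].T                      # loadings, one column per PC
    E = Xc - T @ P.T                             # residual matrix
    explained = s ** 2 / np.sum(s ** 2)          # variance fraction per PC
    return T, P, E, explained

# Synthetic data: 60 unfolded spectra driven by two underlying time courses.
rng = np.random.default_rng(0)
time = np.linspace(0.0, 1.0, 60)
latent = np.column_stack([time, np.exp(-3.0 * time)])
patterns = rng.random((2, 150))
X = latent @ patterns + 0.01 * rng.standard_normal((60, 150))

T, P, E, explained = pca(X, n_components=2)
print("variance fraction of the first three PCs:", np.round(explained[:3], 4))
```

Plotting the two columns of `T` against each other would give the score plot mentioned above.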
Parallel factor analysis (PARAFAC) decomposes the data into a few important factors while keeping the excitation-emission matrices in their original three-way array structure [73][74][75]. As described for PCA, PARAFAC transforms the data array into sets of loading matrices and a residual array, thereby substantially reducing the dimension of the data. PARAFAC models are a straightforward extension of two-way PCA to multi-way data. The fluorescence data are arranged in a three-way array (measurement time × emission wavelength × excitation wavelength). The PARAFAC model can be described with the following equation, where F is the number of PARAFAC components considered:

$$x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk}$$

Here xijk is the intensity of the ith spectrum at the jth emission wavelength and the kth excitation wavelength. The contribution of each component to the spectrum is represented by the parameters aif, bjf and ckf, and the residuals eijk contain the noise. The values of aif, bjf and ckf are calculated by minimizing the sum of squared residuals eijk. The spectra are decomposed into F PARAFAC components, which represent the concentrations of hypothetical substances. For fluorescence excitation-emission data the loadings aif, also referred to as scores, may be interpreted as the relative concentration of component f in sample i, the loading vector with elements bjf as the estimated emission spectrum, and the loading vector with elements ckf as the estimated excitation spectrum of this component. As presented in Figure 4, the components of a valid PARAFAC model have a direct chemical interpretation, e.g., the concentration of certain process variables. In the case of fluorescence spectra the single components with their emission and excitation loadings correspond to certain fluorophores. The scores of the model are estimates of the relative concentrations of the fluorophores identified by the loadings [76].
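A trilinear model of this kind is commonly fitted by alternating least squares (ALS), in which each loading matrix is updated in turn while the other two are held fixed. The following NumPy sketch is one possible minimal implementation under stated assumptions: the array is ordered as (time × emission × excitation), the starting values come from singular value decompositions of the unfolded array, and the function names and the synthetic two-fluorophore data are hypothetical. It does not include the constraints (for example non-negativity) that dedicated chemometric toolboxes offer.

```python
import numpy as np

def unfold(X, mode):
    """Matricize a three-way array so that `mode` indexes the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def parafac_als(X, n_components, n_iter=500, tol=1e-10):
    """Fit x_ijk ~ sum_f a_if * b_jf * c_kf by alternating least squares."""
    F = n_components
    # SVD-based starting values for the emission (B) and excitation (C) loadings.
    B = np.linalg.svd(unfold(X, 1), full_matrices=False)[0][:, :F]
    C = np.linalg.svd(unfold(X, 2), full_matrices=False)[0][:, :F]
    prev = np.inf
    for _ in range(n_iter):
        # Each update is the least-squares solution for one loading matrix.
        A = np.einsum("ijk,jf,kf->if", X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum("ijk,if,kf->jf", X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,if,jf->kf", X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        residual = X - np.einsum("if,jf,kf->ijk", A, B, C)
        sse = float(np.sum(residual ** 2))
        if prev - sse < tol * prev:   # stop when the fit no longer improves
            break
        prev = sse
    return A, B, C, sse

def gaussian(axis, center, width):
    return np.exp(-0.5 * ((axis - center) / width) ** 2)

# Synthetic example: two hypothetical fluorophores with different time courses.
rng = np.random.default_rng(3)
time = np.linspace(0.0, 1.0, 25)
em = np.linspace(300.0, 600.0, 16)
ex = np.linspace(270.0, 550.0, 15)
A_true = np.column_stack([time, 1.0 - 0.8 * time])
B_true = np.column_stack([gaussian(em, 350.0, 30.0), gaussian(em, 460.0, 30.0)])
C_true = np.column_stack([gaussian(ex, 290.0, 25.0), gaussian(ex, 340.0, 25.0)])
X = np.einsum("if,jf,kf->ijk", A_true, B_true, C_true)
X = X + 0.001 * rng.standard_normal(X.shape)

A, B, C, sse = parafac_als(X, n_components=2)
rel_err = np.sqrt(sse) / np.linalg.norm(X)
print("relative residual:", rel_err)
```

The recovered columns are only determined up to scaling and ordering, so they should be normalized before being compared with reference spectra.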
The calculated components describe different fluorophores and can be used as a reduced data set for further analysis. The PARAFAC analysis thus yields a model with only a few components that describes almost the whole variance of the data set. One obvious advantage of the PARAFAC method is the uniqueness of the solution: an estimated PARAFAC model cannot be rotated without a loss of fit. Both PARAFAC and PCA are linear decomposition methods. PARAFAC decomposition is more robust because it uses fewer parameters than PCA decomposition. For some data sets, however, a PCA model may exist where no PARAFAC model does.
In contrast, the self-organizing map (SOM) is an unsupervised neural network approach for data decomposition [77]. With a SOM, fluorescence data can be classified without any external supervision. This non-linear algorithm maps the three-way fluorescence spectra and identifies those combinations of excitation and emission wavelengths that carry useful information about the process variables. The SOM projects high-dimensional data onto a lower-dimensional, structured set of neurons while retaining the data topology [77]. The neural network consists of only two layers, an input layer and an output (SOM) layer. For this purpose, the 2D-fluorescence spectrum is normalized and transformed from the matrix format into one-dimensional vectors. Every element of the input vector is connected to every neuron in the output layer. The weight vectors wj of the feature map have the same dimension as the input vector xi. The neurons of the feature map compete for the input with their internal parameters, and the neuron with the best-matching parameters wins. In every iteration of the SOM training, a distance d(xi, wj) is calculated and used as the measure of similarity; different distance metrics can be used, for example the Euclidean distance.
The neuron closest to the input vector is chosen together with its corresponding weight vector. The result of the SOM is classified spectral data presented as a feature map in which the topological relationships hidden in the large input data sets are retained (Figure 5) [27,37]. Afterwards, wavelengths can be selected, and the whole data set thereby reduced, by choosing representative data from each of these groups. For example, representative wavelength combinations are found by calculating Euclidean distances within each class.
The optimal number of classes in a given feature map can be determined by estimating the degree of scattering of the fluorescence intensities of all spectral components in the corresponding class by computing the time-dependent variance of fluorescence intensities of all of the spectral components in the class [27]. The large data set containing all combinations of excitation and emission wavelengths can be reduced by selecting just a few wavelengths containing the process information by using SOM.
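The competitive training and the class-wise selection of representatives can be sketched as follows. This is a minimal illustration, not the implementation used in the cited studies. It assumes that each input vector is the time course of one wavelength combination, that a small rectangular map with a Gaussian neighborhood is used, and that the representative of a class is the member closest to the class weight vector; all names, parameters and data are hypothetical.

```python
import numpy as np

def train_som(data, grid=(2, 2), n_iter=2000, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM; returns one weight vector per neuron."""
    rng = np.random.default_rng(seed)
    coords = np.array([(r, c) for r in range(grid[0]) for c in range(grid[1])],
                      dtype=float)
    n_neurons = len(coords)
    # Initialize the weights with randomly chosen input vectors.
    weights = data[rng.choice(len(data), n_neurons, replace=False)].astype(float)
    for step in range(n_iter):
        frac = step / n_iter
        lr = lr0 * (1.0 - frac)                     # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.1)     # shrinking neighborhood
        x = data[rng.integers(len(data))]
        # Winner: neuron with the smallest Euclidean distance d(x, w_j).
        winner = np.argmin(np.linalg.norm(weights - x, axis=1))
        grid_dist2 = np.sum((coords - coords[winner]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights

def assign_classes(data, weights):
    """Label each input vector with the index of its closest neuron."""
    dist = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return dist.argmin(axis=1)

def representatives(data, weights, labels):
    """Per class, pick the member closest to the class weight vector."""
    reps = {}
    for j in np.unique(labels):
        members = np.flatnonzero(labels == j)
        dist = np.linalg.norm(data[members] - weights[j], axis=1)
        reps[int(j)] = int(members[dist.argmin()])
    return reps

# Synthetic time courses of 30 wavelength combinations in three groups.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 40)
shapes = np.array([t, 1.0 - t, np.sin(np.pi * t)])
profiles = np.vstack([s + 0.05 * rng.standard_normal((10, t.size)) for s in shapes])

weights = train_som(profiles)
labels = assign_classes(profiles, weights)
reps = representatives(profiles, weights, labels)
print("selected wavelength-combination indices:", sorted(reps.values()))
```

The indices in `reps` would then define the reduced set of wavelength combinations passed on to the modeling step.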
Other methods for selecting single wavelength combinations are machine learning methods inspired by Nature, such as evolutionary algorithms like the genetic algorithm (GA) [78,79] or swarm algorithms, including the ant colony algorithm (ACO) [80]. Both algorithms mimic principles of Nature to find a subset of wavelength combinations that is able to describe the observed biological process. Further data decomposition and wavelength selection methods are described elsewhere (see Table 4). The decomposition of the large data set, or its reduction by wavelength selection, can improve the models while decreasing the computational effort.

Modeling for Variable Prediction
With the help of fluorescence spectroscopy measurements, meaningful information about the physiological state of a microorganism or about process states can be obtained. To convert the information contained in the fluorescence spectra into process variables, modeling approaches must be applied. The reduced data are used as input to the modeling procedure (such as the principal components of PCA, the PARAFAC components or the wavelength combinations selected by SOM). The procedure of calibrating a model and applying it for prediction is performed in two steps. In step one, a process model is calculated from the on-line measurements of 2D-fluorescence spectra and the corresponding off-line values of the process variables of interest from one or more initial cultivations. In step two, the values of the process variables of a new cultivation are predicted from its 2D-fluorescence spectra using the calculated model.
Many different approaches are described in the literature, including linear and non-linear methods as well as supervised and unsupervised learning procedures (see Table 5). Besides the different methods for the calculation of the chemometric models, there are many different techniques for evaluating the prediction quality. The models are calculated by using the whole 2D-fluorescence spectra or by taking reduced data as described above (Section 3.2). Most of the models for monitoring microbial cultivations are based on linear modeling, especially the partial least squares (PLS) regression method [23,75,81]. PLS identifies factors that not only capture a large amount of variance but also provide a linear correlation between the fluorescence spectra and the process variables via the covariance of the two data sets. Besides PLS, some other linear techniques are mentioned, including principal component regression (PCR) [36,69], linear weighted regression (LWR) [20,28,41,49,59] and multi-linear methods such as n-way partial least squares regression (NPLS) [10,28,47,58]. All these methods build models by identifying at least bi-linear correlations between the process variables and the fluorescence spectra [82,83].
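The two-step procedure of calibration and prediction can be illustrated with a compact PLS regression for a single process variable (PLS1), here written with the NIPALS-style deflation scheme in NumPy. This is a sketch under stated assumptions rather than the implementation of any toolbox named in the text: the function names, the number of components and the synthetic "cultivations" are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Step one: calibrate a PLS1 model; returns coefficients and intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                     # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xr @ w                        # scores
        tt = t @ t
        p = Xr.T @ t / tt                 # X loadings
        c = yr @ t / tt                   # y loading
        Xr = Xr - np.outer(t, p)          # deflate X
        yr = yr - c * t                   # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ beta
    return beta, intercept

def pls1_predict(X, beta, intercept):
    """Step two: predict the process variable from new spectra."""
    return X @ beta + intercept

# Synthetic calibration and test "cultivations" sharing the same spectral patterns.
rng = np.random.default_rng(4)
patterns = rng.random((2, 80))

def simulate(n_spectra):
    latent = rng.random((n_spectra, 2))
    spectra = latent @ patterns + 0.01 * rng.standard_normal((n_spectra, 80))
    biomass = 5.0 * latent[:, 0] + 1.0 * latent[:, 1] + 0.5
    return spectra, biomass

X_cal, y_cal = simulate(50)
X_new, y_new = simulate(20)

beta, intercept = pls1_fit(X_cal, y_cal, n_components=2)
y_pred = pls1_predict(X_new, beta, intercept)
print("RMSEP on the new cultivation:", np.sqrt(np.mean((y_new - y_pred) ** 2)))
```

In practice the number of components would be chosen by cross-validation on the calibration cultivations rather than fixed in advance.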
Because of the inherently nonlinear nature of biological processes, some research groups describe the application of different kinds of artificial neural networks (ANN) with different architectures [35,69,84]. Different algorithms are implemented to handle cultivation data with neural networks [85,86], for example feed-forward neural networks (FFNN) [21] and back-propagation neural networks (BPNN) [31]. By applying these models, important bioprocess variables such as biomass, substrates and products can be predicted on-line and in real time, as can be seen in Table 5. There is just one example in which a closed loop for substrate feeding is described [10]. The authors implemented a control algorithm that predicts the metabolic state of the yeast cells from fluorescence spectra. With this control procedure the biomass yield was much higher because the cells grew purely oxidatively. Different software is used for the implementation of the models. As can be seen in Table 5, Matlab® is used in most applications. Special toolboxes for building chemometric models are available in Matlab®, for example the PLS toolbox or the neural network toolbox. Altogether, there is a broad range of applications for fluorescence spectroscopy, chemometric models, implementation software and validation techniques (details in Table 5). When a process model is used for the prediction of process variables, the quality of that prediction must be judged. The predicted values can be evaluated by using, for example, the root mean square error of prediction (RMSEP), and the goodness of fit of the model can be assessed by using the coefficient of determination (R²).
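Both quality measures are straightforward to compute. The short sketch below uses the standard definitions; the function names and the four example values are made up for illustration. It also shows that a relative RMSEP depends on whether it is referred to the maximum or to the mean of the measured values, so the reference should always be stated.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction, in the units of y."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def relative_rmsep(y_true, y_pred, reference="max"):
    """RMSEP in percent of the maximum or the mean of the measured values."""
    ref = np.max(y_true) if reference == "max" else np.mean(y_true)
    return 100.0 * rmsep(y_true, y_pred) / float(ref)

def r_squared(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up off-line values and model predictions (e.g., biomass in g/L).
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

print(round(rmsep(y_true, y_pred), 4))                    # 0.1581
print(round(relative_rmsep(y_true, y_pred, "max"), 2))    # 3.95 (% of maximum)
print(round(relative_rmsep(y_true, y_pred, "mean"), 2))   # 6.32 (% of mean)
print(round(r_squared(y_true, y_pred), 2))                # 0.98
```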

Conclusions and Future Trends
The application of fluorescence spectroscopy, especially 2D-fluorescence, is becoming more and more interesting for the analysis of processes in the biotechnology, food and pharma industries. This development is also driven by PAT and QbD efforts. Batch-to-batch reproducibility, control and automation are improved by using chemometric models based on fluorescence spectroscopy. The fast analysis enables the timely identification of critical process states and increases the efficiency of biotechnological processes. The biomass concentration is the process variable whose monitoring and prediction is described most often. The applicability of fluorescence sensors to microbial cultivations, especially E. coli and S. cerevisiae cultivations, is reported by many research groups (Table 1). In addition, there are further applications to other microorganisms and fungi as well as to plant and mammalian cell lines. The comparison of results published by different research groups is hindered by the different quality criteria in use. The RMSEP is sometimes given as a relative concentration or in percent, but without stating whether the reference is the maximum or the average of the observed process variable, which can make a big difference. In other cases the R² is provided, but without any related estimation errors for the evaluation of the prediction quality. Standardized quality criteria are desirable for further investigations so that results can be compared.
Nevertheless, all these approaches demonstrate that process monitoring and automation can be greatly improved by using fluorescence spectroscopy in combination with chemometric modeling. PLS methods dominate as the evaluation technique. However, other approaches including ANN, GA and ACO also show great potential to increase the analysis quality of fluorescence-based process control. When analyzing fluorescence-based data, it should be taken into account that medium fluorescence or other effects, such as the inner filter or cascade effects, can limit the applicability of fluorescence sensors. The quality and performance of processes in the biotechnology, food and pharma industries can be assured with quality control methods such as the "golden batch", where an optimal batch becomes the reference for all following batches. An ideal process control strategy can be implemented with the help of the process knowledge gained from fluorescence sensors and process models. Altogether, the establishment of process sensors based on fluorescence spectroscopy should continue in order to further improve biotechnical processes in the future.
Of all published applications of fluorescence spectroscopy for bioprocess monitoring, just three came out of an industrial environment, indicating that there is a big gap between academic (laboratory scale) and industrial application. One reason for this might be that new measurement methods only have a chance of being installed in newly implemented processes. Therefore, there will always be a kind of lag phase between academic and industrial applications. This lag phase can be long because, as Max Planck mentioned [87]: "A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it." Another reason for the gap might be the overall costs, which include the investment costs, the (re-)calibration costs and the maintenance costs. Although no reagents are necessary, the measurement method is mostly indirect. Furthermore, only multipurpose fluorescence spectrometers are available today. A fluorescence sensor that uses just the specific wavelength combinations needed for a specific application would have low hardware costs. With such sensors, fluorescence applications might therefore increase in the future, even in an industrial environment.

Conflicts of Interest
The authors declare no conflict of interest.