To estimate quality variables in real time, one can use mathematical models, computational methods, and prior knowledge to derive new process information. Today, this soft sensor concept is well established in engineering science and in parts of the process industry. The soft sensor principle is based on hardware sensors that monitor a bioprocess in real time and generate online data, which the soft sensor model uses to estimate process variables. The estimated process variables can then be used for monitoring purposes or be implemented in feedback control of the bioprocess. This section describes the typical steps of data-driven soft sensor model development. Various development procedures for soft sensor models have been described in the literature [6, 8]; in this review paper, soft sensor development is divided into the following main steps: (I) data collection and pre-processing, (II) selection of variables, (III) soft-sensing model selection, training and validation, and (IV) performance evaluation of the soft-sensing model. The presented soft sensor development procedure is quite general and can thus be applied to fermentation processes.
3.1. Data Collection and Pre-Processing
During the initial stage of soft sensor model development, historical data are inspected, which leads to better process understanding, diagnostics, and improvement. The first goal of this phase is to recognize any obvious problems and to obtain an overview of the overall structure of the data, so that such problems can be handled at the initial stage of development. This paves the way for the subsequent use of the collected plant data for model optimization, identification, or other related black-box (e.g., data-driven) methods. The second goal of this phase is to ascertain the required model complexity. An experienced soft sensor developer can decide at this stage whether a simple linear model, a more complex model, or a non-linear NN model should be used as the soft sensor prediction model. Because this initial decision may not be correct, the performance of the chosen model should be compared with that of alternative soft sensor models at a later stage of development [6].
Apart from soft sensor model evaluation, an important question must be addressed before we dive deeper into model development: how should the input data and the output data (targets) be prepared before they are fed into the actual soft sensor model? The goal of this step is to transform the data so that the model can process them more efficiently. This includes handling missing values, outlier detection and replacement, data normalization, and feature extraction [8, 11].
Understanding the problem of missing values is essential for managing data effectively. The problem arises when no value is stored for one or more variables in an observation. There are several approaches to dealing with missing values. The first approach is to remove the samples that contain null values. This approach is advised only when the number of missing values is small; otherwise, it will not give the expected results when predicting the output [12]. The second approach is to fill in the missing values, for example, by calculating the mean, median, or mode of the feature and substituting it for the missing values. These two approaches are the most common ways of dealing with missing values. Other useful approaches reported in the literature include hot-deck imputation [13], maximum likelihood (ML), and expectation-maximization (EM) [14, 15].
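The two common approaches described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names `drop_incomplete` and `impute`, the use of `None` as the missing-value marker, and the pH/temperature sample values are all assumptions made for the example and do not come from the cited works.

```python
from statistics import mean, median

def drop_incomplete(samples):
    """First approach: remove every sample that contains a missing value."""
    return [row for row in samples if None not in row]

def impute(samples, statistic=mean):
    """Second approach: replace missing values with a per-variable statistic."""
    columns = list(zip(*samples))
    # One fill value per variable, computed from the observed values only.
    fills = [statistic([v for v in col if v is not None]) for col in columns]
    return [[fills[j] if v is None else v for j, v in enumerate(row)]
            for row in samples]

# Hypothetical samples: [pH, temperature]; None marks a missing value.
samples = [[7.0, 25.0], [None, 25.5], [6.8, None], [7.2, 24.5]]

complete = drop_incomplete(samples)      # keeps only the fully observed rows
filled_mean = impute(samples, mean)      # mean imputation
filled_median = impute(samples, median)  # median imputation
```

Passing `statistics.mode` as the `statistic` argument would give mode imputation in the same way. Deletion discards two of the four samples here, which illustrates why it is advised only when few values are missing.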
Sensor observations that deviate from the typical or meaningful range of values are called outliers. Outliers can be caused by, for example, hardware failures, communication errors, incorrect readings, or process working conditions, and they can degrade the performance of the soft sensor model [6]. To alleviate the effects of outliers, it is necessary first to identify them and then to treat them. Different outlier detection methods are reviewed in the literature, such as the 3σ outlier detection method [16] and the Hampel identifier [16, 17, 18], which is a robust version of the 3σ method. These methods perform univariate outlier detection. In [8, 19], the authors presented multivariate outlier detection methods. Other multivariate methods employed in the soft sensor context are based on data projection, such as the Jolliffe parameters with PCA and PLS [8, 20]. A comparison of different outlier detection approaches is provided in [21].
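To make the difference between the two univariate detectors concrete, the following sketch applies both to a short series. It is a simplified, global (non-windowed) version under stated assumptions: the function names, the threshold of 3, the median replacement rule, and the sample readings are illustrative choices, not taken from the cited references. The factor 1.4826 scales the median absolute deviation (MAD) so that it is comparable to a standard deviation for Gaussian data.

```python
from statistics import mean, median, pstdev

def three_sigma_outliers(values):
    """Flag values further than 3 standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [abs(v - mu) > 3 * sigma for v in values]

def hampel_outliers(values, threshold=3.0):
    """Flag values further than threshold * scaled MAD from the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = 1.4826 * mad
    if scale == 0:
        return [False] * len(values)  # no spread estimate, flag nothing
    return [abs(v - med) > threshold * scale for v in values]

def replace_outliers(values, flags):
    """Replace flagged values with the median of the unflagged ones."""
    good = [v for v, bad in zip(values, flags) if not bad]
    fill = median(good)
    return [fill if bad else v for v, bad in zip(values, flags)]

# Hypothetical pH readings with one faulty value at the end.
readings = [7.0, 7.1, 6.9, 7.0, 7.2, 6.8, 7.1, 12.0]

flags_3sigma = three_sigma_outliers(readings)
flags_hampel = hampel_outliers(readings)
cleaned = replace_outliers(readings, flags_hampel)
```

In this small sample the faulty reading inflates the standard deviation so much that the 3σ rule does not flag it, whereas the Hampel identifier, which relies on the median and MAD, flags only that reading.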
In NN and other data-driven soft sensor models, the inputs need to be normalized; otherwise, the model may be ill-conditioned. Different normalization methods are reported in the literature, such as min-max normalization, z-score normalization [8], and zero-mean normalization [22]. Min-max normalization, which rescales each variable to the range between 0 and 1, is the simplest method of normalization [23, 24].
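As an illustration of the normalization methods named above, the sketch below implements min-max and z-score scaling for a single variable. The function names and the sample values are assumptions made for this example. The scaling parameters are computed once and then reused, so that new online data can be scaled with the parameters obtained from the training data.

```python
from statistics import mean, pstdev

def min_max_params(values):
    """Return the (min, max) pair used for min-max normalization."""
    return min(values), max(values)

def min_max_scale(values, lo, hi):
    """Rescale values so that [lo, hi] maps to [0, 1]."""
    if hi == lo:
        return [0.0 for _ in values]  # constant variable carries no information
    return [(v - lo) / (hi - lo) for v in values]

def z_score_params(values):
    """Return the (mean, standard deviation) pair used for z-score scaling."""
    return mean(values), pstdev(values)

def z_score_scale(values, mu, sigma):
    """Center values at zero mean and scale them to unit standard deviation."""
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical training values of one input variable.
train = [2.0, 4.0, 6.0]

lo, hi = min_max_params(train)
train_minmax = min_max_scale(train, lo, hi)   # values in [0, 1]
new_minmax = min_max_scale([8.0], lo, hi)     # new data may fall outside [0, 1]

mu, sigma = z_score_params(train)
train_z = z_score_scale(train, mu, sigma)     # zero mean, unit variance
```

A new sample outside the training range maps outside [0, 1], which is one reason why outliers are usually treated before normalization.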
Pre-processing is important because of the characteristics of the data. It is also the step that requires a large amount of manual work and expert knowledge about the underlying process. A more detailed discussion of pre-processing approaches, with real-world examples and their applications in the soft sensor context, is provided in [25]. A general overview of data pre-processing approaches is presented in [26].
3.2. Selection of Variables
The next question a soft sensor model developer faces concerns the selection of the input variables that influence the model output. Generally, a soft sensor model should not have too many input variables; otherwise, the model structure becomes far more complex, the probability of overfitting increases, and the model’s training speed and output are affected. For NN modeling, a reduction in the number of input variables leads to a simplified NN architecture and reduced training time [27]. A reduced number of variables has many advantages, such as lower costs, shorter model development time, and improved feasibility of an application. Many researchers have reported both supervised and unsupervised variable selection methods in the literature, such as principal component analysis (PCA) [28, 29], filter methods (e.g., the correlation coefficient), wrapper methods, embedded methods [30], mean impact value (MIV) [31], and uniform incidence degree algorithms [24], to give just a few examples. In most cases, however, the most relevant variables for a soft sensor application are selected by system experts. Nonetheless, variable selection is an effective procedure to increase model performance, prevent overfitting, and avoid the curse of dimensionality. Special attention is given to the choice of the output variables, such as product concentration, cell concentration, and substrate concentration. The selected data are used for the training and evaluation of the soft sensor model.
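Of the methods listed above, the correlation-based filter is the simplest to illustrate. The sketch below ranks candidate input variables by the absolute value of their Pearson correlation coefficient with the target. It is one possible filter, not the incidence degree method used later in this section, and the variable names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant variable is treated as uncorrelated
    return cov / (sx * sy)

def rank_inputs(candidates, target):
    """Sort candidate inputs by |correlation| with the target, strongest first."""
    scores = {name: pearson(vals, target) for name, vals in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical candidate inputs and target (e.g., product concentration).
target = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "stirring_speed": [5.0, 1.0, 4.0, 2.0, 3.0],
    "dissolved_oxygen": [2.0, 4.0, 6.0, 8.0, 10.0],
}

ranking = rank_inputs(candidates, target)
selected = [name for name, r in ranking if abs(r) >= 0.5]  # assumed cut-off
```

The cut-off of 0.5 is an arbitrary assumption for the example. A linear correlation filter can also miss inputs with a strongly non-linear influence, which is one reason why expert judgement remains important in variable selection.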
3.3. Soft Sensor Model Selection, Training and Validation
A soft sensor ‘model’ is a mathematical representation of a real-world process. Generally, soft sensor models fall into two distinct categories: mechanistic (first-principle or white-box) models and data-driven (empirical or black-box) models. Mechanistic models form the first category and are based on the first-principle approach. Usually, mechanistic soft sensor models focus on the description of the optimal steady states. This type of soft sensor is mainly based on deriving equations that describe the physical and chemical background of the process. For example, the Kalman filter and extended Kalman filter techniques belong to this category. The major disadvantages of mechanistic models are that they are considered rather time-consuming to develop, as most processes are very complex, and that they are unable to express the actual process conditions [9]. As a result, data-driven soft-sensing modeling algorithms are becoming progressively more popular in the process industry [6], because data-driven soft sensor models are based on empirical observations and thus describe the real conditions of the process. Moreover, they require little knowledge about the system to be modeled, describe the input-output relation more accurately, and produce reliable real-time estimates of unmeasurable process variables. For example, SVM, NN, and FL methods belong to the second category of soft sensor models.
For soft-sensing modeling, the selection of a model always deserves close attention. Model selection is the critical process of choosing one final model from a collection of candidate models for the training dataset. As the model is the heart of a soft sensor, the selection of the optimal model type is important for the soft sensor’s performance. The choice of model is also often subject to the personal preference and past experience of the developers, which can be a drawback when developing an optimal final soft sensor. This can be seen in published soft sensor applications, where several researchers concentrate strongly on a model type from their own field of expertise. To give a few examples: if the process variables are non-Gaussian distributed or have a non-linear relationship with each other, a non-linear probabilistic latent variable modeling method needs to be utilized; if the process variables are Gaussian distributed or have a linear correlation with each other, a linear Gaussian probabilistic/PCA model should be employed [32]. Deep neural network models demonstrate great performance in complicated, highly non-linear processes, given the richer information comprised in the deep layers of the network and large training datasets [33]. SVM and LS-SVM offer high efficiency and robustness when only a small amount of data is available, and have thus been commonly used [34]. Another possible approach is to start with a simple model, such as linear regression, and gradually increase the complexity of the model until a satisfactory model is obtained [6, 35].
A large number of samples and the complex nature of soft-sensing modeling methods can lead to very long training times. It is therefore typical to use a simple separation of the sample data into training and validation samples, or into training and test samples. The selected soft-sensing model is fitted to the training samples and then evaluated on the validation/test data. It is essential that the soft sensor model performance is evaluated on independent data. There are different approaches [36, 37] to estimating the performance of a soft sensor model on unseen data, such as automatically splitting a training dataset into training and validation datasets, splitting it manually and explicitly, and evaluating the performance by k-fold cross-validation.
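A minimal sketch of k-fold cross-validation is given below. It assumes contiguous, unshuffled folds and uses a placeholder model that simply predicts the training mean; the function names and the data are hypothetical, and any soft-sensing model with a fit/predict pair could be substituted for the placeholder.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread the remainder
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

def cross_validate(fit, predict, X, y, k=5):
    """Return the RMSE averaged over the k validation folds."""
    scores = []
    for train, val in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        sq_err = [(predict(model, X[i]) - y[i]) ** 2 for i in val]
        scores.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return sum(scores) / len(scores)

# Placeholder model (assumption): always predicts the training mean.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, x):
    return model

# Hypothetical data: ten samples with a constant target.
X = [[float(i)] for i in range(10)]
y = [3.0] * 10

folds = list(k_fold_indices(len(X), 3))
cv_rmse = cross_validate(fit_mean, predict_mean, X, y, k=5)
```

For batch fermentation data, folds are often formed from whole batches rather than individual samples, so that samples from the same batch do not appear in both the training and the validation set.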
3.4. Performance Evaluation of Soft Sensor Model
After the optimal soft sensor model structure has been selected and the model has been trained, the trained soft sensor model has to be evaluated once again on an independent test dataset. The test data provide the gold standard used to evaluate the soft sensor model. However, how to evaluate the performance of a data-based model is still an open question, because performance evaluation is closely related to the selected soft sensor type. Different criteria are available for the performance evaluation of each monitoring model. For numerical performance evaluation, the mean square error (MSE) or root mean square error (RMSE) loss function is commonly used for this type of soft sensor model. Recently, the authors of [38] proposed a novel fuzzy decision fusion system based on an analytic hierarchy process (AHP) for online process monitoring. Seven different tools were suggested for the evaluation of model performance and tested on six different data-based process modeling methods.
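For reference, the numerical criteria mentioned above can be computed as in the following sketch. The mean absolute error (MAE), which is used in the use case in Section 3.5, is included alongside MSE and RMSE. The function names and the sample values are illustrative assumptions.

```python
import math

def mse(y_true, y_pred):
    """Mean square error: average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error: square root of the MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical offline measurements and soft sensor estimates.
measured = [1.0, 2.0, 3.0]
estimated = [1.0, 2.0, 5.0]

test_mse = mse(measured, estimated)
test_rmse = rmse(measured, estimated)
test_mae = mae(measured, estimated)
```

Because the errors are squared before averaging, MSE and RMSE penalize a few large deviations more heavily than MAE does, so reporting both gives a more complete picture of the estimation error.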
The methodology presented in this review paper is the one most commonly used, but it is not the only possible way of developing a soft sensor model. For example, the soft sensor methodology presented in [8] and [6] is more detailed but consistent with the methodology discussed here. Likewise, the general five-step soft sensor development methodology in [39], consisting of (i) the collection of historical data and pre-processing, (ii) variable selection, (iii) model selection and training, (iv) validation, and (v) model maintenance, is more detailed but consistent with the methodology discussed here. A significant difference between the discussed methodology and other presented works is that we explain all the development procedures with the help of a real-world example, a microbial fermentation process.
3.5. Use Case Implementation of Soft Sensor
In this review paper, a penicillin fermentation process is taken as a research example to give a better understanding of the development procedure of a soft sensor model. An experimental study was conducted on the penicillin fermentation process in the biological fermentation tank of the “National Key Discipline” laboratory of Jiangsu University. The penicillin fermentation process is a time-varying, complex, and nonlinear biochemical process. Many input variables (auxiliary variables) are available in the fermentation process, such as dissolved oxygen, CO2 concentration, temperature, reactor volume, pH value, stirring speed, substrate feed rate, and so on. If all of them were used as input variables, the model would become more complicated and the training speed would be affected. To determine the impact of the input variables on the soft-sensing model output, the concept of incidence degree is employed to assess the relationship between the input and output variables.
The field test data are sampled every minute: glucose flow rate fg (u1), corn pulp flow rate fcs (u2), flow rate of potassium dihydrogen phosphate fp (u3), calcium carbonate flow rate fc (u4), flow rate of gluten powder fr (u5), dissolved oxygen concentration CL (x4), carbon dioxide concentration Cco2 (x5), hydrogen ion concentration [H+] (x6), and fermentation broth volume V (x7). Every four hours, the offline biological variables are obtained by sampling and testing: cell concentration X (x1), substrate concentration S (x2), and product concentration P (x3). The cell concentration, substrate concentration, and product concentration are selected as the target variables. The growth of biomass cells depends on several environmental factors involving all types of input control variables. Substrate concentration was measured with a glucose analyzer, and a UV altimeter was used to measure product concentration. A total of 10 batches of fermentation process data were collected over a 200-hour span (between each batch). To improve the measurement accuracy, the sample data should be normalized to the range [0, 1].
In recent years, many data-based techniques have been introduced for real-time estimation, process monitoring, and fault detection, where every technique performs well under its own assumptions. In other words, a technique that performs well under certain process conditions may not provide reasonable performance under other process conditions, because of different data features [38]. This review paper presents a soft-sensing model based on the least squares support vector machine (LS-SVM). The model is applied to the estimation of cell concentration, substrate concentration, and product concentration in a penicillin fermentation process. LS-SVM is a machine learning method based on statistical learning theory. It has excellent learning and prediction ability with small sample data, is easy to train, and has been widely used to predict quality variables in the fermentation, steel, chemical, and other industries [40, 41]. Practice demonstrates that the values of the kernel parameters and penalty parameters of LS-SVM play a significant role in the generalization ability and accuracy of the model, and improper selection may make the LS-SVM prediction model prone to over-fitting and poor generalization. To solve this problem, researchers have developed several optimization algorithms for LS-SVM model parameter selection. In this work, the parameters of the LS-SVM model (e.g., the regularization parameter C and gamma) are optimized by using an evolutionary algorithm, namely particle swarm optimization (PSO). The basic idea of the PSO algorithm is to discover the global best solution through information sharing among the individuals in a group. A total of 10 batches (2000 samples) of fermentation data were used in this example, of which the first six batches (1200 samples) were used to train the model for minimum error. The 7th and 8th batches (400 samples) were used for k-fold cross-validation, and the data of the last two batches were used to test the final soft sensor model. The optimization algorithm was iterated many times to approach a global optimum. The actual values and the predicted results of the soft-sensing models based on PSO-LS-SVM and LS-SVM are shown in Figure 2; the PSO-LS-SVM results are compared with those of the LS-SVM model to verify the effectiveness of the soft-sensing model. Figure 3 presents the error curve between the PSO-LS-SVM soft-sensing value and the LS-SVM value. The simulation results demonstrate that the prediction results of the PSO-LS-SVM soft-sensing model are closer to the real values.
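The idea of tuning LS-SVM parameters with PSO can be illustrated with the following self-contained sketch. It is not the implementation used for the results in this section. It assumes an RBF kernel in which gamma acts as the kernel parameter, searches C and gamma on a logarithmic scale, uses a single hold-out validation set instead of k-fold cross-validation, and replaces the fermentation data with a synthetic sine curve. All function names, bounds, and PSO settings (inertia weight w, acceleration coefficients c1 and c2) are assumptions chosen for the example.

```python
import math
import random

def rbf(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lssvm_fit(X, y, C, gamma):
    """Solve the LS-SVM system [[0, 1^T], [1, K + I/C]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], gamma) + (1.0 / C if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]  # bias b and support values alpha

def lssvm_predict(model, X_train, gamma, x):
    b, alpha = model
    return b + sum(a * rbf(x, xi, gamma) for a, xi in zip(alpha, X_train))

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pso(objective, bounds, n_particles=10, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d] + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # stay in box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Synthetic stand-in data (NOT the fermentation data): y = sin(x).
X_train = [[0.3 * i] for i in range(20)]
y_train = [math.sin(x[0]) for x in X_train]
X_val = [[0.3 * i + 0.15] for i in range(19)]
y_val = [math.sin(x[0]) for x in X_val]

def validation_rmse(p):
    """PSO objective: validation RMSE for (log10 C, log10 gamma)."""
    C, gamma = 10.0 ** p[0], 10.0 ** p[1]
    model = lssvm_fit(X_train, y_train, C, gamma)
    return rmse(y_val, [lssvm_predict(model, X_train, gamma, x) for x in X_val])

best, best_rmse = pso(validation_rmse, bounds=[(-1.0, 3.0), (-2.0, 1.0)])
best_C, best_gamma = 10.0 ** best[0], 10.0 ** best[1]
```

Each particle encodes one candidate parameter pair, and the swarm shares the best pair found so far through `gbest`, which is the information-sharing mechanism described above. Searching on a logarithmic scale is a common design choice because useful values of C and gamma typically span several orders of magnitude.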
Once the soft sensor model is finalized, it is tested one final time on the test dataset. In this example, the proposed model is fitted to the training data, and its fitting degree and prediction accuracy are verified with the test dataset; the mean absolute error (MAE) and root mean square error (RMSE) are selected as the evaluation criteria for model performance. Table 1 displays the prediction MAE and RMSE results of the PSO-LS-SVM soft-sensing model and the LS-SVM soft-sensing model on the test dataset. It can be seen that the errors of the PSO-LS-SVM model are smaller than those of the LS-SVM model. This is attributed to the optimization algorithm used, as the PSO-LS-SVM modeling method has excellent learning and prediction ability with small sample data and is suitable for the penicillin fermentation process. A more detailed discussion of optimization techniques is provided in Section 5.