Deep Convolutional Feature-Based Probabilistic SVDD Method for Monitoring Incipient Faults of Batch Process

Abstract: Support vector data description (SVDD) has been widely applied to batch process fault detection. However, it often performs poorly, especially when incipient faults occur, because it only considers shallow data features and omits the probabilistic information of the features. To provide better monitoring performance on incipient faults in batch processes, an improved SVDD method, called deep probabilistic SVDD (DPSVDD), is proposed in this work by integrating the convolutional autoencoder and probability-related monitoring indices. To mine the hidden data features effectively, a deep convolutional feature extraction network is designed using a convolutional autoencoder, where the encoder outputs and the reconstruction errors are used as the monitoring features. Furthermore, the probability distribution changes of these features are evaluated by the Kullback-Leibler (KL) divergence so that probability-related monitoring indices are developed for indicating the process status. Applications to the benchmark penicillin fermentation process demonstrate that the proposed method has better monitoring performance on incipient faults than the traditional SVDD methods.


Introduction
Due to the huge market demand for small-batch, high-added-value products, the batch process has become an important means of production in modern industrial systems. Typical batch processes are found in the pharmaceutical industry, fine chemical engineering, food production and semiconductor manufacturing. Compared with the traditional continuous process, the batch process is more complicated because of its significant nonlinearity and non-stationarity. To ensure good product quality and maximize factory profits, real-time fault detection and process monitoring technology has been extensively studied in recent years [1,2].
Present fault detection methods can be categorized into model-based and data-based methods. Because it is very difficult to build mechanism models, model-based methods are rarely applied in batch process monitoring. In contrast, as advanced computer control systems provide a large amount of process data, data-driven fault detection methods have become a topic of major interest [3,4]. These data-driven methods apply machine learning and multivariate statistical analysis tools to extract data features, and then build statistical monitoring models to describe the process behaviors. Classical data-driven methods include principal component analysis (PCA), slow feature analysis (SFA), canonical variate analysis (CVA) and support vector data description (SVDD) [5-7]. As the process data of multiple batches constitute a multiway matrix involving the batch, variable and time dimensions, these methods are often called multiway methods when used in batch process monitoring, e.g., multiway PCA (MPCA), multiway SFA (MSFA), multiway CVA (MCVA) and multiway SVDD (MSVDD). For convenience, the word multiway is omitted in this paper since all the discussed methods are multiway-related.
[...] the intrinsic data features sufficiently. In recent years, deep learning methods have shown great success in many big data processing fields, including image recognition, natural language processing and genetic data analysis [25-27]. Deep neural networks utilize multiple stacked hidden layers to mine the intrinsic data features, which helps to improve the pattern recognition performance.
Motivated by the above analysis, the aim of this paper is to design a deep SVDD model for better incipient fault monitoring. In order to achieve this goal, a deep convolutional feature-based probabilistic SVDD method is proposed for monitoring incipient faults in batch processes. Different to the traditional shallow feature extraction, this method applies a convolutional autoencoder network to extract the deep data features. The extracted features include two parts: the encoder output and the reconstruction error. In order to make use of the probability information of these features, the Kullback-Leibler divergence is applied to measure their probability distribution changes so that the original deep features are transformed into probability-related features. Further, SVDD modeling is performed on these probability-related features to build the monitoring indices for sensitive incipient fault detection. Finally, we use a penicillin fermentation process to validate the proposed method.

Batch Process Data Preprocessing
Batch processes produce the same products in a batch-by-batch mode. As one batch operation yields a two-dimensional matrix over the variable and sample dimensions, the training data from multiple batches constitute a three-dimensional matrix X(I × J × K), where I, J and K represent the numbers of batches, variables and samples, respectively. This kind of multiway dataset cannot be analyzed directly because most present statistical modeling methods can only deal with a two-dimensional data matrix. Therefore, we need to unfold the three-dimensional data into a two-dimensional matrix and perform the corresponding normalization in the preprocessing stage.
The typical preprocessing method is batch-variable unfolding [13]. Its details are demonstrated in Figure 1. Firstly, the original three-dimensional data matrix X(I × J × K) is unfolded along the batch direction into X(I × KJ). After the batch unfolding, the data are scaled to zero mean and unit variance in this direction, that is, each column (one variable at one sampling instant) is normalized across the I batches. Secondly, the data are rearranged along the variable direction into X(IK × J) to highlight the variation of the variables. After these two steps, a two-dimensional matrix is available for the subsequent statistical modeling.
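The two unfolding steps can be illustrated with a minimal NumPy sketch. The function name `batch_variable_unfold`, the array layout (batches, variables, samples) and the toy dimensions are assumptions made for illustration; they are not taken from the original implementation.

```python
import numpy as np

def batch_variable_unfold(X):
    """Unfold X of shape (I, J, K) = (batches, variables, samples) into (I*K, J)."""
    I, J, K = X.shape
    # Step 1: batch-wise unfolding into (I, K*J); each column holds one
    # variable at one sampling instant, observed over the I batches.
    X_batch = X.transpose(0, 2, 1).reshape(I, K * J)
    mean = X_batch.mean(axis=0)
    std = X_batch.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant columns
    X_scaled = (X_batch - mean) / std
    # Step 2: variable-wise rearrangement into (I*K, J).
    X_var = X_scaled.reshape(I * K, J)
    return X_var, mean, std

# Toy data only (hypothetical sizes, not the simulated process data).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(30, 17, 40))
X_var, mean, std = batch_variable_unfold(X)
```

The returned `mean` and `std` would be reused to scale an online batch, so that new data are normalized with the statistics of the training batches.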

Overview of SVDD Principle
SVDD is a well-known one-class classification algorithm and has been widely applied to different kinds of anomaly detection tasks [7,28]. Its main idea is to find a minimal hypersphere to enclose as many training samples as possible.
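Given the training dataset $\{x_i, i = 1, 2, \cdots, n\}$, the SVDD optimization objective can be described as [28]
$$\min_{R, c, \xi} \; R^2 + \gamma \sum_{i=1}^{n} \xi_i \tag{1}$$
$$\text{s.t.} \quad \|x_i - c\|^2 \le R^2 + \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \cdots, n \tag{2}$$
where $c$ is the center of the hypersphere, $R$ is the radius of the hypersphere, $\xi_i$ is the slack variable, and $\gamma$ is the trade-off parameter. The above optimization is under the linear assumption. However, real process data often have nonlinear relationships. Therefore, a nonlinear mapping function $\Phi(\cdot)$ is introduced to transform the original data space into a high-dimensional feature space, where the data are linearly related. In this case, the SVDD optimization constraint in Equation (2) should be rewritten as
$$\|\Phi(x_i) - c\|^2 \le R^2 + \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \cdots, n \tag{3}$$
By introducing the Lagrange multipliers, the dual optimization problem is given by [28]
$$\min_{\alpha} \; \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \langle \Phi(x_i), \Phi(x_j) \rangle - \sum_{i=1}^{n} \alpha_i \langle \Phi(x_i), \Phi(x_i) \rangle \tag{4}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \alpha_i = 1, \quad 0 \le \alpha_i \le \gamma \tag{5}$$
where $\alpha_i$, $\alpha_j$ are the Lagrange multipliers. As the real nonlinear mapping is often unknown, the kernel trick is utilized to deal with the inner product of two nonlinearly mapped vectors. This means that
$$\langle \Phi(x_i), \Phi(x_j) \rangle = K(x_i, x_j) \tag{6}$$
where $K(\cdot)$ represents the kernel function operation. In this paper, the common Gaussian kernel function is applied.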
By solving the quadratic programming problem in Equations (4) and (5), the hypersphere center is obtained as
$$c = \sum_{i=1}^{n} \alpha_i \Phi(x_i) \tag{7}$$
and the hypersphere radius is
$$R = \sqrt{K(x_i^*, x_i^*) - 2 \sum_{j=1}^{n} \alpha_j K(x_i^*, x_j) + \sum_{j=1}^{n} \sum_{l=1}^{n} \alpha_j \alpha_l K(x_j, x_l)} \tag{8}$$
where $x_i^*$ is any sample with $0 < \alpha_i < \gamma$, which is the so-called support vector. Given a new testing vector $x_t$, its squared distance to the hypersphere center is used to construct the monitoring index $D$ as [21]
$$D = \|\Phi(x_t) - c\|^2 = K(x_t, x_t) - 2 \sum_{i=1}^{n} \alpha_i K(x_t, x_i) + \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K(x_i, x_j) \tag{9}$$
In the anomaly detection scenario, the monitoring index of normal samples should be smaller than $R^2$. Samples with $D > R^2$ are usually regarded as anomalous points. However, the hypersphere radius $R$ lacks clear statistical significance. Following the common practice in data-driven fault diagnosis, this paper applies the kernel density estimation technique to determine the 99% confidence limit as the detection threshold $D_{lim}$ [21]. This means that normal samples satisfy $D < D_{lim}$ with a 99% confidence level.
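As an illustration of the SVDD principle above, the following sketch solves the dual problem of Equations (4) and (5) numerically, evaluates the index D, and estimates a 99% threshold by kernel density estimation. It is a minimal sketch under stated assumptions: the generic SLSQP solver from SciPy stands in for a dedicated quadratic programming solver, the kernel is parameterised as exp(-||a - b||^2 / width), and all function names and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gaussian_kde

def gaussian_kernel(A, B, width):
    # K(a, b) = exp(-||a - b||^2 / width); this parameterisation is an assumption.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / width)

def fit_svdd(X, gamma, width):
    """Solve the dual: min a'Ka - a'diag(K), s.t. sum(a) = 1, 0 <= a_i <= gamma."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, width)
    diag_K = np.diag(K)
    res = minimize(
        fun=lambda a: a @ K @ a - a @ diag_K,
        x0=np.full(n, 1.0 / n),
        jac=lambda a: 2.0 * K @ a - diag_K,
        bounds=[(0.0, gamma)] * n,
        constraints=[{"type": "eq", "fun": lambda a: a.sum() - 1.0}],
        method="SLSQP",
        options={"maxiter": 500},
    )
    return res.x, K

def svdd_index(X_new, X, alpha, K, width):
    """Squared kernel distance D of each row of X_new to the hypersphere center."""
    K_new = gaussian_kernel(X_new, X, width)
    # K(x, x) = 1 for the Gaussian kernel.
    return 1.0 - 2.0 * K_new @ alpha + alpha @ K @ alpha

def kde_threshold(D_train, level=0.99):
    """Smallest grid value whose KDE cumulative probability reaches `level`."""
    kde = gaussian_kde(D_train)
    spread = D_train.std()
    grid = np.linspace(D_train.min() - 3 * spread, D_train.max() + 3 * spread, 2000)
    low = grid[0] - 10 * spread  # effectively minus infinity for this density
    cdf = np.array([kde.integrate_box_1d(low, g) for g in grid])
    idx = min(int(np.searchsorted(cdf, level)), len(grid) - 1)
    return grid[idx]

# Toy example (synthetic data, not the process data of the case study).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 2))
alpha, K = fit_svdd(X_train, gamma=0.05, width=4.0)
D_train = svdd_index(X_train, X_train, alpha, K, width=4.0)
D_lim = kde_threshold(D_train)
D_far = svdd_index(np.array([[10.0, 10.0]]), X_train, alpha, K, width=4.0)
```

Note that the box constraint requires the trade-off parameter to be at least 1/n, otherwise the multipliers cannot sum to one.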

PCA-SVDD Method
Considering that the original process data are correlated and the intrinsic fault features may be covered by noise, PCA is often used to capture the uncorrelated data features [7]. Given the training data matrix $X = [x_1 \; x_2 \; \cdots \; x_n]^T \in R^{n \times m}$, it can be decomposed into two parts, namely the principal component matrix $T$ and the residual matrix $E$. The PCA decomposition procedure is formulated as
$$T = XP, \qquad E = X - TP^T$$
where $P \in R^{m \times k}$ is the loading matrix, describing the projection directions of the first $k$ principal components. PCA is integrated with SVDD to form an enhanced SVDD model, called PCA-SVDD. In this model, PCA is first used to analyze the original training data and then SVDD modeling is carried out on the matrices $T$ and $E$, respectively. For the testing vector $x_t$, its principal component vector and residual vector are denoted by $t_t$ and $e_t$, respectively, which are computed by
$$t_t = P^T x_t, \qquad e_t = (I - PP^T) x_t$$
The corresponding two SVDD monitoring indices $D^{(t)}$ and $D^{(e)}$ are built as [22]
$$D^{(t)} = \|\Phi(t_t) - c^{(t)}\|^2, \qquad D^{(e)} = \|\Phi(e_t) - c^{(e)}\|^2$$
where $c^{(t)}$ and $c^{(e)}$ are the hypersphere centers of the SVDD models built on $T$ and $E$, respectively. Unlike the basic SVDD method, which has only one monitoring index on the original data, PCA-SVDD has two monitoring indices for the principal components and the residual components, respectively. Their detection thresholds $D_{lim}$ are determined by the kernel density estimation technique. As two uncorrelated subspaces are monitored, PCA-SVDD can investigate the process changes in more detail.
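The PCA feature extraction step can be sketched as follows. This is a minimal NumPy illustration, assuming the data are already normalized; the function names are hypothetical, and the default of 85% cumulative percentage of variance mirrors the rule mentioned in the case study.

```python
import numpy as np

def fit_pca(X, cpv=0.85):
    """Return the loading matrix P (m x k) retaining `cpv` cumulative variance."""
    n = X.shape[0]
    cov = X.T @ X / (n - 1)
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1]          # sort eigenvalues in descending order
    eigval, eigvec = eigval[order], eigvec[:, order]
    ratio = np.cumsum(eigval) / eigval.sum()
    k = int(np.searchsorted(ratio, cpv)) + 1  # smallest k with ratio >= cpv
    return eigvec[:, :k]

def pca_features(x, P):
    """Principal component part t and residual part e of a vector or matrix x."""
    t = x @ P
    e = x - t @ P.T
    return t, e

# Toy data with three latent directions (synthetic, for illustration only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(200, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
P = fit_pca(X)
T, E = pca_features(X, P)
t_t, e_t = pca_features(X[0], P)
```

Two separate SVDD models would then be trained on `T` and `E`, giving the two monitoring indices.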

Method Framework
When the traditional SVDD and PCA-SVDD methods are applied to monitor complicated industrial processes, they often fail to detect incipient faults early because they can only extract shallow features. To deal with this issue, this paper proposes a deep probabilistic SVDD (DPSVDD) method that incorporates deep learning technology. On the one hand, a typical deep neural network, the convolutional autoencoder, is introduced to mine the deep data features so that incipient fault information is reflected more sensitively. On the other hand, considering that incipient faults do not cause a significant fault amplitude but may cause local probability changes in the data features, this paper applies the Kullback-Leibler (KL) divergence to transform the deep features into probability-related features.
The improved SVDD method has the following framework, shown in Figure 2. According to this figure, the DPSVDD method involves four steps: data preprocessing, deep features extraction, probability analysis and SVDD modeling. After preprocessing, the batch process training data are input into the convolutional autoencoder for the mining of deep features, including the encoder output and reconstruction error. Then, the Kullback-Leibler divergence is applied to extract probability features. Lastly, the SVDD modeling is performed to build the monitoring index.

Deep Convolutional Feature Extraction
This paper applies the convolutional autoencoder (CAE) to extract the hidden data representations. The autoencoder is introduced first. An autoencoder is a specific unsupervised neural network where the expected outputs are set to be the same as the inputs [29,30]. As shown in Figure 3, an autoencoder network usually includes five kinds of layers: input layer, encoder layer, bottleneck layer, decoder layer and output layer. The first three parts construct the encoder, which compresses the input data to obtain the main data variations. In some sense, the encoder can be viewed as a nonlinear PCA. The last three parts constitute the decoder, which tries to recover the original input from the extracted features at the bottleneck layer. The optimization objective of the autoencoder is to minimize the reconstruction error between the inputs and the outputs [30]. In the training process, the network parameters are adjusted based on the back propagation of the reconstruction errors.
A convolutional autoencoder inherits the feature extraction idea of the basic autoencoder, but it uses the convolutional operation to replace the fully connected operation in the encoder and decoder layers [31]. Due to the use of multiple convolutional layers, the network node number is enlarged. Therefore, pooling layers are often placed after the convolutional layers. Similarly, upsampling layers are inserted after the deconvolutional layers. The common convolutional autoencoders include the 1D type and the 2D type [31,32]. The 2D convolutional autoencoder preserves the temporal and spatial locality effectively. A typical 2D convolutional autoencoder is shown in Figure 4. The two main operations in a convolutional autoencoder are convolution and pooling.
The convolutional operation applies the convolutional kernel to extract the local features, which can be expressed by
$$T^{(l)} = f^{(l)}\left(X \otimes W^{(l)} + b^{(l)}\right)$$
where $X$ and $T^{(l)}$ stand for the input matrix and the convolved output matrix, respectively, $\otimes$ denotes the convolution operation, $W^{(l)}$ stands for the $l$-th convolution kernel, $b^{(l)}$ is the bias corresponding to $W^{(l)}$, and $f^{(l)}(\cdot)$ is the activation function at the $l$-th layer. In this paper, the commonly used Rectified Linear Unit (ReLU) function is adopted.
The pooling operation compresses the data dimension for efficient computation. The pooling layer, also called the subsampling layer, is usually used after the convolutional layer. There are two common types of pooling: max pooling and average pooling. In this paper, average pooling is applied.
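The convolution, activation and pooling operations can be made concrete with a small NumPy sketch. It is written for a single 2D input and a single kernel, uses "valid" borders and no kernel flip (the cross-correlation convention of most deep learning libraries), and all function names, kernel sizes and pooling sizes are illustrative assumptions rather than the network settings of this paper.

```python
import numpy as np

def conv2d_valid(X, W, b):
    """Slide kernel W over X without padding and add the bias b."""
    kh, kw = W.shape
    out = np.empty((X.shape[0] - kh + 1, X.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(X[i:i + kh, j:j + kw] * W) + b
    return out

def relu(Z):
    """Rectified Linear Unit activation."""
    return np.maximum(Z, 0.0)

def avg_pool(T, size=2):
    """Non-overlapping average pooling; incomplete border rows/columns are dropped."""
    h, w = T.shape[0] // size, T.shape[1] // size
    T = T[:h * size, :w * size]
    return T.reshape(h, size, w, size).mean(axis=(1, 3))

# Toy 17 x 17 input image and a random 3 x 3 kernel (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(17, 17))
W = rng.normal(size=(3, 3))
T = relu(conv2d_valid(X, W, b=0.1))   # 15 x 15 feature map
T_pooled = avg_pool(T, size=2)        # 7 x 7 after average pooling
```

In a trained network, many such kernels are applied in parallel and their weights are learned by back propagation rather than drawn at random.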
Based on a series of convolution and pooling operations, the encoder output, i.e., the bottleneck layer $T^{(L_E)}$, is obtained, where $L_E$ is the number of convolutional operations in the encoder. Similarly, the decoder output $\hat{X}$, i.e., the reconstructed input, is obtained through the deconvolution and up-sampling operations, where $L_D$ is the number of deconvolutional operations in the decoder. The mean square error between the original input data and the reconstructed data can be used as the cost function
$$J = \frac{1}{N} \sum_{k=1}^{N} \left\| X(k) - \hat{X}(k) \right\|_F^2$$
where $X(k)$ and $\hat{X}(k)$ represent the $k$-th input image and the corresponding reconstructed image, and $N$ is the number of training images. During training, the reconstruction error is minimized by optimizing the network weights, which yields the optimal weights $W^{*(1)}, W^{*(2)}, \cdots$ of the convolutional and deconvolutional layers.
Based on the CAE, two kinds of features can be obtained for process monitoring. One is the bottleneck layer, i.e., the encoder output, which represents the compressed data variations, while the other is the reconstruction error, which represents the remaining residual information.
For the test image data $X_t$, the encoder output vector is denoted as $t_t$ and the reconstruction error vector is denoted as $e_t$. Both are computed with the trained CAE: $t_t$ is taken from the bottleneck layer output, and $e_t$ is formed from the difference between $X_t$ and its reconstruction $\hat{X}_t$, where $d$ is the width of the test image and $\hat{X}_t$ is the reconstructed input produced by the decoder. By applying SVDD modeling, we can obtain the corresponding CAE-SVDD monitoring indices $D^{(t)}$ and $D^{(e)}$. Their detection thresholds are also determined by kernel density estimation on the training data.
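Because a 2D CAE operates on images, the two-dimensional process data must first be arranged into image samples; the case study uses a 17 × 17 moving window for this purpose. A minimal sketch of such a windowing step is given below, assuming a stride of one sample and rows ordered by time; the function name and these choices are illustrative assumptions.

```python
import numpy as np

def moving_window_images(X, width=17):
    """Stack overlapping windows of `width` consecutive samples.

    X has shape (n_samples, n_variables); the result has shape
    (n_samples - width + 1, width, n_variables).
    """
    n = X.shape[0]
    if n < width:
        raise ValueError("not enough samples for one window")
    return np.stack([X[k:k + width] for k in range(n - width + 1)])

# One toy batch with 800 samples and 17 variables (synthetic values).
rng = np.random.default_rng(0)
X_batch = rng.normal(size=(800, 17))
images = moving_window_images(X_batch, width=17)
```

With this arrangement, the last row of each image is the most recent sample, so every new online sample produces one new image once the first window is filled.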

Probabilistic Monitoring Index Construction
As a deep learning technique, CAE provides more effective features for SVDD modeling. This effectively promotes the traditional shallow SVDD model to a deep SVDD model. However, the deep SVDD model based on CAE feature extraction still omits the probability information of the monitored features, which may be helpful for incipient fault detection. In some incipient fault cases, the probability distribution of the features is affected although their amplitude does not change significantly.
To evaluate the changes in the probability distribution of features, the Kullback-Leibler divergence (KLD) is often used. KLD, also called relative entropy, measures the difference between two given probability distributions. For a random variable $x$, two continuous probability density functions are denoted as $P(x)$ and $Q(x)$. The KLD of $P(x)$ over $Q(x)$ is expressed by
$$\mathrm{KLD}(P \| Q) = \int P(x) \ln \frac{P(x)}{Q(x)} \, dx$$
It should be noted that KLD is not a symmetric metric. In other words, the KLD from $P(x)$ to $Q(x)$ is not the same as the KLD from $Q(x)$ to $P(x)$. In practice, a modified symmetric KLD version is defined as
$$\mathrm{KLD}(P, Q) = \mathrm{KLD}(P \| Q) + \mathrm{KLD}(Q \| P)$$
Under the assumption of Gaussian distribution, the expressions of $P(x)$ and $Q(x)$ can be given as
$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \qquad Q(x) = \frac{1}{\sqrt{2\pi}\,\hat{\sigma}} \exp\left(-\frac{(x - \hat{\mu})^2}{2\hat{\sigma}^2}\right)$$
where $\mu$, $\sigma$ are the mean and the standard deviation of $P(x)$, and $\hat{\mu}$, $\hat{\sigma}$ are the mean and the standard deviation of $Q(x)$. Furthermore, the KLD can be computed as
$$\mathrm{KLD}(P, Q) = \frac{1}{2} \left[ \frac{\sigma^2}{\hat{\sigma}^2} + \frac{\hat{\sigma}^2}{\sigma^2} + (\mu - \hat{\mu})^2 \left( \frac{1}{\sigma^2} + \frac{1}{\hat{\sigma}^2} \right) - 2 \right]$$
In the fault detection scenario, we can utilize $Q(x)$ as the reference probability distribution and $P(x)$ as the tested probability distribution. If these two probability distributions are the same, i.e., $P(x) = Q(x)$, then $\mathrm{KLD}(P \| Q) = 0$. Otherwise, $\mathrm{KLD}(P \| Q) > 0$. Therefore, KLD can be used to investigate the deviations of monitored variables. In this paper, we apply the KLD to measure the probability changes of the deep features $t_t = [t_{t,1}, t_{t,2}, \cdots]$ and $e_t = [e_{t,1}, e_{t,2}, \cdots]$, so that the probability-related features are expressed by
$$t_i^{KLD} = \mathrm{KLD}(P(t_{t,i}), Q(t_{t,i})), \qquad e_i^{KLD} = \mathrm{KLD}(P(e_{t,i}), Q(e_{t,i}))$$
where $t_i^{KLD}$, $e_i^{KLD}$ are the probability-related features corresponding to the encoder output $t_i$ and the reconstruction error $e_i$. Further, building the SVDD models based on these probability-related features yields the deep probabilistic SVDD models.
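The Gaussian form of the symmetric KLD lends itself to a compact implementation. The sketch below is a minimal illustration under two assumptions that are not stated in the text: the symmetric KLD is taken as the sum of the two directed divergences, and the tested distribution of each feature is estimated from a moving window of recent samples (the window length of 50 is arbitrary). Function names and data are hypothetical.

```python
import numpy as np

def symmetric_gaussian_kld(mu_p, var_p, mu_q, var_q):
    """Symmetric KLD between N(mu_p, var_p) and N(mu_q, var_q), elementwise."""
    return 0.5 * (var_p / var_q + var_q / var_p
                  + (mu_p - mu_q) ** 2 * (1.0 / var_p + 1.0 / var_q) - 2.0)

def kld_features(F, mu_ref, var_ref, window=50):
    """Turn a feature series F (n x p) into KLD features ((n - window + 1) x p).

    Each row compares the mean/variance of the latest `window` samples
    with the reference statistics estimated from normal training data.
    """
    n, p = F.shape
    out = np.empty((n - window + 1, p))
    for k in range(n - window + 1):
        seg = F[k:k + window]
        out[k] = symmetric_gaussian_kld(seg.mean(axis=0), seg.var(axis=0, ddof=1),
                                        mu_ref, var_ref)
    return out

# Synthetic features: reference data, then a test series with a mean shift.
rng = np.random.default_rng(0)
F_ref = rng.normal(size=(2000, 3))
mu_ref, var_ref = F_ref.mean(axis=0), F_ref.var(axis=0, ddof=1)
F_test = rng.normal(size=(400, 3))
F_test[200:] += 1.0
kld = kld_features(F_test, mu_ref, var_ref, window=50)
```

A longer window gives smoother distribution estimates but delays the response to a fault, so the window length acts as a tuning parameter in this sketch.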

Batch Process Monitoring Procedure
The monitoring procedure based on DPSVDD includes two phases: offline modeling and online monitoring. In the first phase, process training data of multiple batches are collected to build the DPSVDD model, while the online new data are projected onto the developed model in the second phase. The monitoring indices are compared with their threshold to indicate whether a fault occurs. The whole procedure is detailed below.
Offline modeling phase:
1. Collect the offline training data from the batch process and perform the preprocessing using the batch-variable mode.
2. Apply the deep convolutional autoencoder to extract the encoder output and the reconstruction errors as the process deep features.
3. Compute the probability-related features by the KL divergence corresponding to the training data.
4. Build SVDD models based on the probability-related features of training data.
5. Compute the detection threshold of monitoring indices.
Online detection phase:
1. Collect the online new data vector and preprocess it based on the training data.
2. Project the preprocessed data onto CAE and obtain the corresponding features.
3. Compute the probability-related features by the KL divergence for the new data.
4. Calculate the monitoring indices of the new data vector.
5. Compare the monitoring indices with the corresponding detection threshold and judge the process status.
It should be noted that the above procedure does not consider the model updating. In real applications, when the engineers judge that the current model cannot reflect the normal operations, so that high rates of missing detection or false alarms occur, the monitoring model should be updated.

Case Study
A case study on the penicillin fermentation process is given in this section. The penicillin fermentation process is a complex biochemical reaction system with batch operations and has been widely used as a benchmark for testing different batch process monitoring methods [11,33,34]. A diagram of a penicillin fermentation reactor is shown in Figure 5. This process consists of two main operating stages: the pre-culture stage and the batch feeding stage. During the initial pre-culture phase, a large number of the nutrients necessary for cells are produced and penicillin cells appear during the period of exponential growth. To maintain a high yield of penicillin in the batch feeding stage, a continuous supply of glucose to the fermentation process is needed to keep the biomass growth rate constant. To provide the best conditions for the production of penicillin, the temperature and pH of the fermenter are controlled in two closed control loops. The process simulation is performed with the software Pensim V2.0 [35]. In the simulation procedure, we collect 17 measurements as monitoring variables. Gaussian noise is added in the variable collection procedure. Thirty batches of normal operation data are collected to form the modeling data set, where each batch lasts 400 h with a sampling time of 0.5 h. This means that each batch dataset has 800 samples. In this paper, the simulated normal operation data belong to the same operation mode. It is assumed that the 30 batches are enough to describe the normal data changes sufficiently. Besides the normal operation case, we also simulate six batches of faulty operations. The detailed fault cases are listed in Table 1, which includes the step change and slope change in the aeration rate, agitator power and substrate feeding rate.
The proposed method is compared with four other methods: SVDD, PCA-SVDD, AE-SVDD and CAE-SVDD. Among these methods, SVDD is the basic method. In PCA-SVDD, AE-SVDD and CAE-SVDD, different feature extraction layers are designed. The proposed method not only considers deep feature extraction, but also utilizes the probability information to enhance incipient fault detection. For the SVDD method, the Gaussian kernel is used and the trade-off parameter γ is set to 0.05. When PCA is applied, the number of retained principal components is determined by the rule of 85% cumulative percentage of variance. In AE-SVDD, the node number of the bottleneck layer is set to 5. The CAE-SVDD is slightly more complicated. As a 2D-CAE is used, a 17 × 17 moving window is used to construct the 2D data image. The detailed CAE network structure parameters are listed in Table 2.
The fault F2 is illustrated first, which is a ramp change in the aeration rate with a slope of +(0.5 L/h)/100 h. The monitoring results of the different methods are shown in Figures 6-10. When SVDD is applied to monitor this fault, its monitoring chart is shown in Figure 6, where the monitoring index D does not alarm until the 400th sample, with a fault detection rate of 61%. Due to the strong noise and the slow development of the incipient fault, SVDD cannot detect this fault sensitively. By applying PCA for feature extraction, the PCA-SVDD monitoring chart in Figure 7 gives fault detection rates of 62% and 12% for the monitoring indices D (t) and D (e) , respectively. This result is slightly superior to that of SVDD. Further, using the autoencoder for nonlinear principal component extraction, AE-SVDD achieves fault detection rates of 58.5% and 61.67% for the two monitoring indices, respectively. Compared to PCA-SVDD, AE-SVDD improves the monitoring results of D (e) significantly. Although these three methods differ slightly in performance, none of them can detect this fault effectively. With the strong aid of the convolutional autoencoder in mining deep features, CAE-SVDD shows a clear improvement, as shown in Figure 9. In particular, the index D (e) of CAE-SVDD detects the fault at the 270th sample. Its fault detection rate reaches 87.17%. However, its D (t) index gives a poor fault detection rate of 48.5%, which is even lower than that of the basic SVDD. By further considering the probability information of deep features, as shown in Figure 10, the D (t) index of DPSVDD raises the detection rate to 71.83%, while the D (e) index retains a high fault detection rate of 89.17%. The testing results on the fault F2 demonstrate that CAE-SVDD can improve the monitoring performance of incipient faults effectively, and the proposed DPSVDD leads to a further enhancement.
Another illustrated case is fault F5, which involves a 6% step change in the substrate feed rate. When this fault occurs, the SVDD D index in Figure 11 fluctuates around the detection threshold and cannot indicate the fault clearly. Its fault detection rate is only 37.5%. When using PCA-SVDD in Figure 12, the detection results are still unsatisfactory because its fault detection rates are 37% and 3.67%, respectively. Therefore, the linear features cannot reflect this incipient fault effectively. By utilizing the autoencoder to capture nonlinear features, the AE-SVDD monitoring charts in Figure 13 obtain a slight performance improvement, with fault detection rates of 40.5% and 28.33% for D (t) and D (e) , respectively. With further mining of the deep convolutional features, CAE-SVDD can detect this fault clearly, as shown in Figure 14, where the detection rate of D (t) is 79.5%, while the detection rate of D (e) is 88.5%. Compared with AE-SVDD, CAE-SVDD raises the two fault detection rates by around 40 and 60 percentage points, respectively, which shows the advantage of convolutional features. When DPSVDD is applied in Figure 15, its D (e) index displays a similar performance to that of CAE-SVDD. However, the D (t) index of DPSVDD achieves an 88% detection rate, which is 8.5 percentage points higher than that of the CAE-SVDD D (t) index. In general, the proposed DPSVDD method has the best monitoring performance on fault F5.
The monitoring results on all six faults are summarized in Tables 3 and 4. In Table 3, it can be observed that the mean fault detection rates of SVDD D and PCA-SVDD D (t) are both around 40%. The PCA-SVDD D (e) index can provide a supplement for incipient fault monitoring, although its monitoring performance is not satisfactory, with a low mean fault detection rate of 9.19%. When AE-SVDD is used, the D (e) index is strengthened so that the mean fault detection rate reaches 38.28%. For CAE-SVDD, the mean fault detection rates of the two monitoring indices are 63.17% and 87.08%, respectively, which are clearly higher than those of the other methods. With the consideration of probability information, DPSVDD further improves the mean detection rate of D (t) so that both indices provide mean fault detection rates of over 80%. For each testing dataset, the first 200 samples represent the normal operation status. In the ideal case, there should be no alarms when monitoring the normal samples. The false alarm rates (FARs) of the different methods are listed in Table 4. From this table, we can see that all the FARs are around 1%. This is consistent with the use of the 99% confidence limit. However, it should be noted that the false alarm rate of DPSVDD D (e) is slightly higher than those of the other methods. Further, since the D (e) index shows no significant improvement in the fault detection rate, in practice the probability analysis may be performed on the D (t) index only.
Table 3. Fault detection rates (FDRs) (%) of the six tested faults by the five compared methods.
In the above discussion, the proposed method is only applied to detect whether a fault occurs. It cannot identify the type of fault that occurs. In fact, a complete fault diagnosis system contains two functions, namely fault detection and fault pattern recognition. This paper only focuses on the former. The latter is also a valuable task that deserves future study because it is important in carrying out fault recovery actions.
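The fault detection rates and false alarm rates reported in Tables 3 and 4 can be obtained from a monitoring index sequence as sketched below. The sketch assumes the usual definitions (percentage of alarmed samples after and before the fault onset, respectively) and the test layout described above, where the first 200 samples are normal; the function name and the synthetic index values are hypothetical.

```python
import numpy as np

def detection_rates(D, D_lim, fault_start):
    """Return (FDR, FAR) in percent for one test batch.

    D           : monitoring index sequence of the test batch
    D_lim       : detection threshold
    fault_start : index of the first faulty sample
    """
    alarms = np.asarray(D) > D_lim
    far = 100.0 * alarms[:fault_start].mean()   # alarms raised on normal samples
    fdr = 100.0 * alarms[fault_start:].mean()   # alarms raised on faulty samples
    return fdr, far

# Synthetic index: 200 normal samples (2 false alarms), then 600 faulty samples.
D = np.concatenate([np.full(198, 0.5), np.full(2, 2.0),
                    np.full(300, 0.5), np.full(300, 2.0)])
fdr, far = detection_rates(D, D_lim=1.0, fault_start=200)
```

For methods with two monitoring indices, the function would be applied to each index with its own threshold.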

Conclusions
Timely fault detection is important to the safe running of batch processes. In order to detect incipient faults sensitively, this paper developed an improved SVDD monitoring method by integrating deep convolutional feature extraction and probability analysis. The contributions include two aspects. On the one hand, a deep SVDD model is built based on the convolutional autoencoder for mining the temporal-spatial data features sufficiently. On the other hand, based on the extracted deep features, probability distribution differences between the real-time data and the modeling data are measured by KL divergence so that the probability-related deep features are obtained for the following SVDD monitoring indices' construction. The validation on the penicillin fermentation process shows that the proposed method can provide more sensitive detection of incipient faults.
Although a significant performance improvement is achieved by the proposed method, it still encounters some challenges. The first is the issue of how to deal with the multimode and multistage properties of batch processes. In this paper, a single global SVDD model is built, which may not be sufficient for some complicated batch processes. In the future, it is necessary to study sub-model techniques for effectively handling the multimode and multistage characteristics. Another problem is the deep network parameter optimization. In this paper, the deep network structure and parameters are determined by user experience. The question of how to optimize them deserves further discussion.