Multimode Process Monitoring Based on Modified Density Peak Clustering and Parallel Variational Autoencoder

Abstract: Clustering algorithms and deep learning methods have been widely applied in multimode process monitoring. However, for process data with unknown modes, traditional clustering methods can hardly identify the number of modes automatically. Further, deep learning methods can learn effective features from nonlinear process data, but the extracted features are not guaranteed to follow the Gaussian distribution, which may lead to an incorrect control limit for fault detection. In this paper, a comprehensive monitoring method based on modified density peak clustering and parallel variational autoencoder (MDPC-PVAE) is proposed for multimode processes. Firstly, a novel clustering algorithm, named MDPC, is presented for mode identification and division. MDPC can identify the number of modes without prior knowledge of the mode information and divide the whole process data into multiple modes. Then, the PVAE is established on the separated multimode data to generate deep nonlinear features, in which the generated features in each VAE follow the Gaussian distribution. Finally, the Gaussian feature representations obtained by the PVAE are used to construct the statistics $H^2$, and the control limits are determined by the kernel density estimation (KDE) method. The effectiveness of the proposed method is evaluated on the Tennessee Eastman process and the semiconductor etching process.


Introduction
With the increasing demands for production efficiency, stable system, and safe operation in modern industry, fault detection and diagnosis (FDD) has received more and more attention. In recent years, data-driven methods, especially the multivariate statistical process monitoring (MSPM) methods, have become very popular. Principal component analysis (PCA) and partial least squares (PLS) are two major MSPM methods [1][2][3][4]. To solve the dynamic problem, a novel dynamic weight principal component analysis (DWPCA) algorithm and a hierarchical monitoring strategy were proposed [5]. In order to handle missing or corrupted data, a variational Bayesian PCA (VBPCA)-based methodology was presented and applied in wastewater treatment plants (WWTPs) [6]. For the large-scale process data, Zhang et al. proposed a decentralized fault diagnosis approach based on multiblock kernel partial least squares (MBKPLS) [7].
Many MSPM methods have also been proposed for processes with multiple operating conditions. To handle multiple working modes and plant-wide characteristics, Chang et al. proposed an on-line operating performance evaluation approach based on a multiple three-level multi-block hybrid model [8]. Peng et al. proposed a multiple PLS model and applied it to quality-related prediction and fault detection [9]. However, traditional MSPM methods assume that the data obey a single-peak distribution and do not take local process information into account, which may lead to erroneous and costly monitoring results [10].
In recent years, many methods have been developed to improve the performance of multimode process monitoring. One intuitive idea is the global model. For example, local least squares support vector regression (LSSVR) and two-step independent component analysis-principal component analysis (ICA-PCA) were introduced for the multimode on-line monitoring [11]. Song and Shi presented a global model based on temporal-spatial global locality projections for multimode process [12]. However, most global modeling methods are hardly able to represent each operating mode precisely due to the statistical averaging. The local modeling methods are also developed to monitor the status of a multimode process. A local neighborhood standardization strategy integrating with PCA (LNS-PCA) was developed for multimode fault detection [13]. A fault detection method based on adaptive Mahalanobis distance and k-nearest neighbor (MD-kNN) was proposed and applied in semiconductor manufacturing [14]. Deng et al. proposed a local neighborhood similarity analysis method for monitoring processes [15]. However, these methods require the nearest neighbor searching on each historical sample, which is a high computational load. Meanwhile, the prior knowledge to select the number of neighbors is difficult to obtain. Another common multimode process monitoring method is based on mixture model, which is suitable to represent the data sources driven by different operating modes. Yu and Qin proposed a probabilistic approach based on finite Gaussian mixture model (FGMM) and Bayesian inference for fault detection under different modes [16]. Taking the dynamic characteristics of industrial process into consideration, an adaptive Gaussian mixture model (GMM) using some prior knowledge for adaptive updating was proposed [17]. A novel monitoring strategy, which combines the advantages of multiple modeling strategies and GMM, was proposed for multimode processes [18]. 
In addition, the clustering method is adopted to separate the different process modes. Khediri et al. presented a procedure based on kernel k-means clustering and support vector domain description (SVDD) to identify the nonlinear process modes and detect faults, respectively [19]. The fuzzy c-means (FCM) clustering method was employed to partition the multimode process data into multiple clusters [20]. A novel monitoring strategy based on locality preserving projection (LPP) and FCM was proposed for extracting the multimode feature [21]. Luo et al. proposed a mode partition method based on the warped k-means clustering algorithm [22]. However, similar to the GMM-based methods, aforementioned clustering methods must manually set the cluster numbers in advance [23].
Density peaks clustering (DPC) is a novel clustering algorithm proposed by Rodriguez and Laio in 2014 [24]. It provides a simple way to find the cluster centers and an efficient manner to group non-center data. DPC has been widely studied and applied in the field of the multimode process monitoring. For example, a hierarchical mode division based on hierarchical density peaks clustering and hybrid geodesic distance was presented to extract more available information from the multimode process data, which can improve the adaptability of industrial processes with uncertainty [25]. A kNN-based modified DPC method was proposed and applied to multimode process monitoring [26]. The DPC can find the cluster centers for each operation mode according to the distribution characteristics of the process data. Nevertheless, there are still some unresolved problems existing in the original DPC to hinder it from becoming a reliable clustering algorithm: (1) The magnitudes of local density in different modes are inconsistent, which may lead to the wrong selection of cluster centers, and (2) there is not a valid criterion to determine the optimal number of cluster centers automatically.
In the past decade, deep learning has been widely studied due to its powerful feature extraction and representation learning ability. Among the various deep learning methods, the autoencoder (AE) plays a central role. An anomaly detection method based on sequence gated recurrent units (SGRU) and AE was proposed for industrial multimode processes [27].
A monitoring scheme based on GMM and stacked denoising autoencoder (GMM-SDAE) was constructed to identify the mode and extract feature from monitoring data [28]. Since AE is a self-reconstruction network, most AE-related methods cannot ensure the distribution characteristics of extracted features. This may cause improper control limit, which leads to inferior monitoring performance. Recently, variational autoencoder (VAE) has attracted increasing attention in the process monitoring domain. A significant advantage of VAE is that the learned hidden features follow the Gaussian distribution. Associated with the nonlinear reflection of neural network, VAE is suitable for the feature learning in industrial process monitoring. A nonlinear process monitoring method based on VAE was proposed to tackle the Gaussian assumption problem [29]. For the multivariate fault isolation problem, a process monitoring framework was proposed using VAE and branch-bound algorithm [30]. However, the VAE-based method is limited by the Gaussian distribution assumption of the hidden feature, which is not suitable for data with multiple mixture distributions [31,32].
Owing to the diversity of the data distribution across modes, the local density in DPC is an inappropriate measure for selecting cluster centers. In addition, the number of cluster centers in DPC is usually selected manually. For a multimode process dataset without mode information, an improper number of modes may degrade the modeling and monitoring performance. Meanwhile, VAE performs well in single-mode process monitoring, but it cannot model multimode data well.
To address the problems above, this paper proposes a multimode process monitoring method based on modified density peak clustering and parallel variational autoencoder (MDPC-PVAE). MDPC-PVAE extracts informative nonlinear features from the multimode process data: the MDPC identifies and divides the process data of the different modes, and the features learned by the PVAE follow the Gaussian distribution. The proposed method contains two phases: mode identification and feature generation. In the mode identification phase, the MDPC is proposed to identify the mode information and determine the process data in each mode. As a modified decision graph measure, the local density ratio is presented to unify the local density peaks and reduce the local density diversity across clusters. Moreover, total entropy estimation is employed as a criterion to determine the optimal number of cluster centers. In the feature generation phase, the PVAE is presented to learn the multimode data and generate representative features for process monitoring. It consists of multiple VAEs, each constructed from the process data of one mode. The learned features in each VAE follow the Gaussian distribution. Finally, the monitoring statistics $H^2$ are constructed from the Gaussian features generated by the PVAE, and the corresponding control limits are determined by the kernel density estimation (KDE) method. Unlike most multi-model methods, in on-line monitoring the new sample is fed directly into the MDPC-PVAE without first determining which mode it belongs to.
The remainder of this paper is organized as follows: in Section 2, DPC and VAE are briefly introduced. In Section 3, the proposed MDPC-PVAE and its multimode monitoring procedure are described. In Section 4, the performance of proposed method is compared with some related multimode process monitoring methods on the Tennessee Eastman (TE) process and semiconductor etching (SE) process. Finally, conclusions are drawn in Section 5.

Preliminaries
In this section, we briefly review the basic concepts related to DPC and VAE.

Density Peak Clustering
DPC is a simple density-based clustering algorithm. Unlike traditional center-based clustering algorithms such as k-means and FCM, DPC can detect non-spherical clusters, and the number of clusters can be recognized by visual inspection of a decision graph. It rests on two basic assumptions about cluster centers. First, a cluster center has a higher local density than its neighboring points. Second, the distances between cluster centers are relatively large. For a dataset $X \in \mathbb{R}^{n \times m}$ with n samples and m variables, the Euclidean distance $d_{ij}$ between two samples $x_i$ and $x_j$ is calculated as follows:

$$d_{ij} = \lVert x_i - x_j \rVert_2 \tag{1}$$

For a sample $x_i$, two important measures, the local density $\rho_i$ and the minimum distance $\delta_i$, are defined to select the cluster centers. The local density $\rho_i$ is defined as:

$$\rho_i = \sum_{j \neq i} \chi(d_{ij} - d_c), \qquad \chi(a) = \begin{cases} 1, & a < 0 \\ 0, & a \geq 0 \end{cases} \tag{2}$$

where $d_c$ is the cutoff distance, so that $\rho_i$ is the number of data points whose distance to $x_i$ is less than $d_c$. The local density can also be calculated using the Gaussian function:

$$\rho_i = \sum_{j \neq i} \exp\left(-\frac{d_{ij}^2}{d_c^2}\right) \tag{3}$$

The minimum distance $\delta_i$ of data point $x_i$ is the smallest distance between $x_i$ and any data point with higher local density:

$$\delta_i = \min_{j:\, \rho_j > \rho_i} d_{ij} \tag{4}$$

The measures $\rho_i$ and $\delta_i$ are then used to generate a two-dimensional decision graph. Generally, the cluster centers are selected manually from their position in this graph; they lie in the upper-right corner. In some studies, the cluster centers are instead determined by the composite indicator $\varepsilon_i$:

$$\varepsilon_i = \rho_i \delta_i \tag{5}$$

A data point with a larger $\varepsilon_i$ is more likely to be a cluster center. After the cluster centers are determined, each remaining data point is assigned to the cluster to which its nearest neighbor with higher local density belongs.
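The two DPC measures and the composite indicator described above can be illustrated with a minimal sketch. It assumes the Gaussian-kernel density, and for the densest point (which has no denser neighbor) it uses the largest distance to any other point, the convention of the original DPC formulation. The function names `dpc_measures` and `top_centers` and the toy data are hypothetical, chosen only for illustration.

```python
import math


def dpc_measures(points, d_c):
    """Return (rho, delta): Gaussian-kernel local density and minimum distance."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # The densest point has no denser neighbour: use its largest distance.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def top_centers(points, d_c, k):
    """Indices of the k points with the largest composite indicator rho * delta."""
    rho, delta = dpc_measures(points, d_c)
    eps = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(points)), key=lambda i: eps[i], reverse=True)[:k]


# Toy data (made up): two small blobs, each with one central point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05)]
print(sorted(top_centers(points, d_c=0.2, k=2)))
```

Note that `k` has to be supplied by the user here, which is exactly the manual step that the proposed MDPC aims to remove.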

Variational Autoencoder
The VAE is a stochastic generative model that can solve the inference problem. It replaces the latent representation of the given data with stochastic variables and forces the latent variables to obey an expected Gaussian distribution. The basic structure of the VAE is shown in Figure 1. Given the dataset X, the goal of the VAE is to generate the data $\hat{x}$ from the unobserved latent variable z by optimizing the network parameters θ. To make the generated data $\hat{x}$ similar to the original data x with high probability, we maximize the likelihood $p_\theta(x)$:

$$p_\theta(x) = \int p_\theta(x \mid z)\, p_\theta(z)\, dz \tag{6}$$

The log-likelihood $\log p_\theta(x_i)$ can be expressed as:

$$\log p_\theta(x_i) = D_{KL}\big(q_\phi(z \mid x_i) \,\|\, p_\theta(z \mid x_i)\big) + \mathbb{E}_{q_\phi(z \mid x_i)}\big[\log p_\theta(x_i \mid z)\big] - D_{KL}\big(q_\phi(z \mid x_i) \,\|\, p_\theta(z)\big) \tag{7}$$

where $p_\theta(x_i \mid z)$ is the decoder, $q_\phi(z \mid x_i)$ is the encoder, θ and φ are the network parameters, and $D_{KL}$ is the Kullback-Leibler (KL) divergence. In Equation (7), the true posterior $p_\theta(z \mid x_i)$ is intractable, and the KL divergence is non-negative. The VAE therefore considers an approximation of the marginal likelihood, the evidence lower bound (ELBO), which is a lower bound of the log-likelihood:

$$\mathcal{L}(\theta, \phi; x_i) = \mathbb{E}_{q_\phi(z \mid x_i)}\big[\log p_\theta(x_i \mid z)\big] - D_{KL}\big(q_\phi(z \mid x_i) \,\|\, p_\theta(z)\big) \leq \log p_\theta(x_i) \tag{8}$$

The ELBO consists of two terms. The first term is the reconstruction error, which is the same as the training objective of an autoencoder. The KL divergence term is a distance measure between the distribution of the latent variables produced by the encoder and the expected Gaussian distribution. The network parameters θ and φ are optimized by maximizing the ELBO. The stochastic gradient descent algorithm is used for network training.

Multimode Process Monitoring Based on MDPC-PVAE
This section introduces the detail of the proposed MDPC-PVAE and its multimode process monitoring procedure.

MDPC-PVAE
In the original DPC, the cluster centers are selected from the local density peaks. In a multimode problem, however, the data distribution characteristics differ from mode to mode. DPC looks only at the absolute value of the local density and ignores whether a point is a density peak relative to its own neighborhood. As a result, a data point A in a high-density, wide-coverage area is more likely to become a cluster center than a data point B in a low-density, narrow-coverage area, even if B has the highest local density in its area. To account for the density differences between areas, a new decision graph measure, the local density ratio $\gamma_i$, is proposed:

$$\gamma_i = \frac{\rho_i}{\frac{1}{M}\sum_{j \in S_i} \rho_j} \tag{9}$$

where $S_i$ is the set containing the M data points whose distances to $x_i$ are less than $d_c$. If data point $x_i$ is a local density peak, all its M neighbor points have lower local density than $\rho_i$, i.e., $\rho_i > \rho_j$ for all $j \in S_i$, and hence $\gamma_i > 1$. The local density ratio reduces the influence of large density differences across clusters. After $\gamma_i$ has been calculated for each data point, it is combined with the previously mentioned measure $\delta_i$, and the candidate cluster centers $C_{ca}$ are obtained from the following conditions:

$$C_{ca} = \{\, x_i \mid \gamma_i > T_\gamma,\ \delta_i > T_\delta \,\} \tag{10}$$

where $T_\gamma$ and $T_\delta$ are the thresholds for $\gamma_i$ and $\delta_i$, respectively. $T_\gamma$ removes the non-center data with a low local density ratio. $T_\delta$ eliminates the redundant data with a high local density ratio. In this study, $T_\gamma$ is set to 1, and the average value of $\delta_i$ is taken as the threshold $T_\delta$. After that, the composite indicator $\varepsilon_i$ on $C_{ca}$ is defined as:

$$\varepsilon_i = \gamma_i \delta_i, \qquad x_i \in C_{ca} \tag{11}$$

The selection order of the cluster centers is determined by the value of the composite indicator; that is:

$$\varepsilon_{s_1} \geq \varepsilon_{s_2} \geq \cdots \geq \varepsilon_{s_P} \tag{12}$$

where P is the number of candidate cluster centers. The data points corresponding to $\{s_i\}_{i=1}^{P}$ then form the selection sequence of cluster centers $C_i$:

$$C_i = x_{s_i}, \qquad i = 1, 2, \ldots, P \tag{13}$$

After the selection sequence of cluster centers is obtained, entropy estimation is employed to determine the number of cluster centers.
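The candidate selection step can be sketched as follows. The sketch assumes that the local density ratio is the density of a point divided by the mean density of its neighbors within $d_c$, one reading consistent with the description above, and that the candidates are ranked by the product of $\gamma_i$ and $\delta_i$. The function names, the convention of assigning ratio 0 to isolated points, and the toy data are all hypothetical.

```python
def local_density_ratio(dist, rho, d_c):
    """gamma_i = rho_i / mean(rho_j over neighbours within d_c) (assumed form)."""
    n = len(rho)
    gamma = []
    for i in range(n):
        neigh = [rho[j] for j in range(n) if j != i and dist[i][j] < d_c]
        total = sum(neigh)
        # Isolated points get ratio 0 here; that convention is a choice of this sketch.
        gamma.append(rho[i] * len(neigh) / total if total > 0 else 0.0)
    return gamma


def candidate_centers(gamma, delta, t_gamma=1.0):
    """Keep points with gamma > T_gamma and delta > mean(delta), ranked by gamma * delta."""
    t_delta = sum(delta) / len(delta)
    keep = [i for i in range(len(gamma)) if gamma[i] > t_gamma and delta[i] > t_delta]
    return sorted(keep, key=lambda i: gamma[i] * delta[i], reverse=True)


# Toy 1-D data (made up): a dense group near 1 and a sparse group near 11.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 10.0, 11.0, 12.0]
d_c = 1.2
n = len(xs)
dist = [[abs(a - b) for b in xs] for a in xs]
# Cut-off density: number of neighbours closer than d_c.
rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
# Distance to the nearest denser point; the densest point uses its largest distance.
delta = [min([dist[i][j] for j in range(n) if rho[j] > rho[i]] or [max(dist[i])])
         for i in range(n)]
gamma = local_density_ratio(dist, rho, d_c)
centers = candidate_centers(gamma, delta)
print(rho, gamma, centers)
```

In this toy case the peak of the sparse group has a lower density than every point of the dense group's core, yet its density ratio is the larger of the two, which is the effect the ratio is meant to achieve.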
Renyi entropy is a nonparametric estimation method that reflects the similarity or dissimilarity metric between data in the same space. For a stochastic variable x with a probability density function f(x), its Renyi entropy is: where α is the information order. If α = 2, the Renyi quadratic entropy is given as: Equation (15) can be directly estimated by the Parzen window density estimation with a multi-dimensional Gaussian window function. Probability density function estimation of a cluster with the center C k can be represented as follows: where N k is the number of data points belonging to the cluster with center C k , and G is the Gaussian window function with covariance matrix σ 2 I. G is represented as follows: where M is the dimension number of x. The scale parameter governs the width of the Parzen window. By substituting (17) into (16), the entropy formula of cluster with center C k can be obtained as follows: where V(C k ) can be expressed as follows: Assume that there are R clusters in the dataset; the first R elements in the selection sequence of cluster center, i.e., {C 1 , C 2 , . . . , C R }, are chosen as the cluster centers. Each remaining point is assigned to the cluster to which its nearest neighbor with higher density belongs. The total entropy can be calculated as follows: Note that, in calculating the E(C n ), the kernel size σ in the Gaussian window function is unified and consistent with that of E(1). Obviously, if all the data points are assigned to the proper clusters, the data in each cluster may be more similar, and then, the total entropy becomes lower. The number of cluster centers N c can be obtained through finding the minimum total entropy in different combination of cluster centers: The overall procedures of MDPC are described in Algorithm 1. 
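The per-cluster entropy can be sketched with the standard Parzen-window estimator of the Renyi quadratic entropy: the integral of the squared density estimate reduces to an average of Gaussian kernels with variance $2\sigma^2$ over all pairs of points in the cluster. The names below are hypothetical, and how the per-cluster values are combined into the total entropy follows the paper's own definition, which this sketch does not attempt to reproduce.

```python
import math


def gaussian_kernel(diff, var):
    """Isotropic Gaussian density N(0, var * I) evaluated at the vector diff."""
    m = len(diff)
    sq = sum(d * d for d in diff)
    return math.exp(-sq / (2.0 * var)) / ((2.0 * math.pi * var) ** (m / 2.0))


def renyi_quadratic_entropy(cluster, sigma):
    """-log V, where V averages G(x_i - x_j, 2 * sigma^2 * I) over all pairs."""
    n = len(cluster)
    v = sum(gaussian_kernel([a - b for a, b in zip(x, y)], 2.0 * sigma ** 2)
            for x in cluster for y in cluster) / n ** 2
    return -math.log(v)


# Made-up clusters: a compact one and a scattered one, same kernel size.
tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
spread = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
print(renyi_quadratic_entropy(tight, 0.5), renyi_quadratic_entropy(spread, 0.5))
```

With a fixed kernel size, the compact cluster yields the lower entropy, in line with the argument that well-separated, homogeneous clusters reduce the total entropy.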
Considering that multimode data usually show a multi-cluster distribution, three synthetic datasets (Aggregation, R15, and D31) are adopted to verify the clustering effect of MDPC [33]. The γ − δ decision graph, the total entropy for different numbers of clusters, and the visual clustering result on the three synthetic datasets are shown in Figures 2-4. The results show that MDPC clusters all three datasets well: by introducing the local density ratio, the cluster centers are found exactly, and entropy estimation determines the optimal number of clusters. The clustering results of DPC, DBSCAN, and MDPC on the Aggregation dataset are shown in Figure 5. It can be seen that MDPC finds the accurate cluster centers, whereas DPC and DBSCAN can only determine cluster centers with high local density and ignore the local density peaks.

Based on the MDPC, the whole multimode process is separated into multiple single modes. The process data X are divided into $N_c$ data subsets $\{X_1, X_2, \ldots, X_{N_c}\}$. Since the modes differ considerably in input profiles, conditions, process characteristics, and control strategy, a traditional VAE cannot fully describe these multimode process data. Hence, the PVAE is constructed for the multimode process data, in which the samples of each mode are used to build a corresponding VAE. Owing to the large differences in numerical range and magnitude between modes, the data subsets are first normalized separately. In the VAE, the prior distribution $p_\theta(h)$ is specified as a standard normal distribution; that is, $p_\theta(h) \sim \mathcal{N}(0, I)$. The term $D_{KL}\big(q_\phi(h \mid x_i) \,\|\, p_\theta(h)\big)$ in Equation (8) can be computed as follows:

$$D_{KL}\big(q_\phi(h \mid x_i) \,\|\, p_\theta(h)\big) = \frac{1}{2}\Big[\mathrm{tr}(\Sigma) + \mu^{\mathrm{T}}\mu - k - \log\det(\Sigma)\Big] \tag{22}$$

where μ and Σ are the mean and covariance of $q_\phi(h \mid x_i)$, k is the dimension of the expected Gaussian distribution, tr(·) is the trace of the matrix, and det(·) is the determinant of the matrix.
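For a Gaussian posterior with diagonal covariance, the KL divergence to the standard normal prior reduces to a sum over the latent dimensions, because the trace becomes a sum of variances and the log-determinant a sum of log-variances. The sketch below assumes this diagonal parameterization, which is common in VAE implementations; the source does not state which form the authors' networks use, and the function name is hypothetical.

```python
import math


def kl_to_standard_normal(mu, var):
    """KL( N(mu, diag(var)) || N(0, I) ) for per-dimension means and variances."""
    return 0.5 * sum(v + m * m - 1.0 - math.log(v) for m, v in zip(mu, var))


# A posterior equal to the prior has zero divergence; shifting the mean adds mu^2 / 2.
print(kl_to_standard_normal([0.0, 0.0], [1.0, 1.0]))
print(kl_to_standard_normal([1.0], [1.0]))
```

Minimizing this term during training is what pulls the hidden features of each VAE toward the standard normal distribution that the monitoring statistic later relies on.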
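The loss function of PVAE for the data subset $X_i$ can be written as:

$$\mathcal{L}_{PVAE}(X_i) = \lVert X_i - \hat{X}_i \rVert^2 + D_{KL}\big(q_\phi(h \mid X_i) \,\|\, p_\theta(h)\big) \tag{23}$$

where $\hat{X}_i$ is the reconstructed sample subset. Stochastic gradient descent is adopted to train the PVAE.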

MDPC-PVAE for Multimode Process Monitoring
Once the PVAE is well trained, the hidden feature h in each VAE follows the Gaussian distribution, so the $H^2$ statistic can be constructed directly in the encoder feature subspace, in the same way as Hotelling's T-squared statistic. The $H^2$ statistic in the ith mode of PVAE is defined as follows:

$$H^2_{(i)} = h_{(i)}^{\mathrm{T}} \Sigma_{(i)}^{-1} h_{(i)} \tag{24}$$

where $h_{(i)}$ is the hidden feature produced by the ith VAE and $\Sigma_{(i)}$ is the covariance matrix of the hidden features. The control limit of the statistic $H^2$ is calculated by the kernel density estimation (KDE) method [34]. The probability density function of $H^2_{(i)}$ is fitted using the kernel function. Given the confidence level ζ, the control limit $H^2_{(i,\lim)}$ is the value below which the fitted distribution holds probability ζ. When a new monitoring sample $x_{new}$ arrives, it is first normalized according to each data subset, and the hidden representations are obtained from the PVAE. Then, the statistics $H^2_{(i)}$ of $x_{new}$ are calculated based on Equation (24).
If $H^2_{(i)} > H^2_{(i,\lim)}$ for every $i = 1, 2, \ldots, N_c$, $x_{new}$ is abnormal. Otherwise, $x_{new}$ is normal, and the current mode is the kth mode, for which $H^2_{(k)} < H^2_{(k,\lim)}$. The procedure of MDPC-PVAE-based multimode process monitoring is presented in Figure 6. It includes two phases: off-line modeling and on-line monitoring.
Off-line modeling:

1. Collect the multimode normal process data X and normalize the samples.
2. Apply MDPC to identify the number of modes $N_c$ and divide X into the data subsets $X_1, X_2, \ldots, X_{N_c}$.
3. Normalize the samples in each data subset and save the normalization parameters for on-line monitoring.
4. Design the architecture of PVAE and train the PVAE with the data subsets $X_1, X_2, \ldots, X_{N_c}$.
5. Compute the hidden features and construct the monitoring statistic $H^2_{(i)}$.
6. Calculate the control limit with a confidence level of 0.99 for each mode by KDE.

On-line monitoring:

1. Obtain the on-line sample $x_{new}$ and normalize it by the normalization parameters saved in off-line modeling as $x^{(1)}_{new}, \ldots, x^{(N_c)}_{new}$.
2. Feed each $x^{(i)}_{new}$ to the PVAE and obtain the hidden features.
3. Calculate the statistics $H^2_{(i)}$; if every statistic is greater than its corresponding control limit $H^2_{(i,\lim)}$, $x_{new}$ is faulty. Otherwise, $x_{new}$ is normal, and the current mode type is recorded.
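The last off-line step and the on-line decision rule can be sketched as follows. The control limit is taken as the point where the cumulative distribution of a Gaussian-kernel density estimate reaches the confidence level, found by bisection. The bandwidth rule of thumb, the function names, and the synthetic statistics are assumptions of this sketch rather than the authors' settings.

```python
import math


def kde_control_limit(stats, confidence=0.99, bandwidth=None):
    """Value at which the Gaussian-KDE cumulative distribution reaches `confidence`."""
    n = len(stats)
    if bandwidth is None:
        mean = sum(stats) / n
        std = math.sqrt(sum((s - mean) ** 2 for s in stats) / (n - 1))
        bandwidth = 1.06 * std * n ** (-0.2)  # rule of thumb, a choice of this sketch
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")

    def cdf(t):
        scale = bandwidth * math.sqrt(2.0)
        return sum(0.5 * (1.0 + math.erf((t - s) / scale)) for s in stats) / n

    lo, hi = min(stats) - 10.0 * bandwidth, max(stats) + 10.0 * bandwidth
    for _ in range(100):  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if cdf(mid) < confidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def monitor(h2_values, limits):
    """('normal', k) for the first mode k within its limit, else ('fault', None)."""
    for k, (h2, lim) in enumerate(zip(h2_values, limits)):
        if h2 < lim:
            return "normal", k
    return "fault", None


# Synthetic training statistics (made up) for one mode.
limit = kde_control_limit([float(i) for i in range(1, 101)])
print(limit)
print(monitor([5.0, 0.5, 7.0], [1.0, 1.0, 1.0]))
print(monitor([5.0, 6.0], [1.0, 1.0]))
```

Because a sample is declared faulty only when all statistics exceed their limits, no separate mode identification step is needed on-line; the mode falls out of the comparison for normal samples.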

Case Study
In this section, two benchmark cases, the TE process and the SE process, are used to test the monitoring performance of MDPC-PVAE. Depending on environmental requirements and the products being made, both processes run under various operating conditions. They are typical multimode processes that have been extensively used to evaluate multimode fault detection methods. The simulations were implemented on a computer with the following configuration: operating system: 64-bit Microsoft Windows 10; CPU: Intel i7-8700 (3.20 GHz); RAM: 8 GB; software: MATLAB 2020.

Tennessee Eastman Process
The TE process is a realistic simulation of a large-scale industrial chemical plant, which has become a benchmark platform for evaluating FDD methods [35]. The process includes five units: the reactor, vapor-liquid separator, product condenser, recycle compressor, and product stripper. There are a total of 22 continuous measurement variables, 19 composition measurement variables, and 12 manipulated variables. The flow sheet of the TE process is shown in Figure 7. In this study, we used the revised TE simulation proposed in [36]. There are six different process operation modes. In each mode, the simulation data include 1 normal dataset and 28 faulty datasets. Table 1 lists the description of these 28 faults. Each dataset is simulated for a duration of 100 h at a sampling interval of 3 min, resulting in 2000 observed samples. In the fault datasets, the abnormal conditions are introduced from the 601st to the 2000th sample. In this study, operating Modes 1, 2, and 3 were chosen for the multimode process simulation. Thirty-three measured process variables were selected for fault detection modeling; a detailed description can be found in [37]. The faults were selected to cover different types, including step faults (1, 2, 4-7), random variation faults (8, 10-13, 17, 18, 20, 24-28), a sticking fault (14), and an unknown fault (19). The three normal-condition datasets are used for clustering analysis and network training. The γ − δ decision graph and the total entropy for different numbers of clusters are shown in Figure 8. The number of clusters was determined to be three. The network parameters and hyperparameters were determined by grid search. The network structures of the PVAE were set to 33-100-70, 33-95-70, and 33-100-65.
In this case study, three methods, LNS-PCA, MD-kNN, and GMM-SDAE, were implemented for comparison with MDPC-PVAE to verify the effectiveness of the proposed method. The number of principal components for LNS-PCA is 17 [38]. The number of neighbors in LNS-PCA and MD-kNN is set to five. The number of modes in GMM-SDAE is three. The network structures of GMM-SDAE are the same as those of PVAE. The confidence level of the control limits is 0.99.

Table 1. Description of faults in the TE process.

Fault | Description | Type
1 | A/C feed ratio, B composition constant (stream 4) | Step
2 | B composition, A/C ratio constant (stream 4) | Step
3 | D feed temperature (stream 2) | Step
4 | Water inlet temperature for reactor cooling | Step
5 | Water inlet temperature for condenser cooling | Step
6 | A feed loss (stream 1) | Step
7 | C header pressure loss (stream 4) | Step
8 | A/B/C composition of stream 4 | Random variation
9 | D feed (stream 2) temperature | Random variation
10 | C feed (stream 4) temperature | Random variation
11 | Cooling water inlet temperature of reactor | Random variation
12 | Cooling water inlet temperature of separator | Random variation
13 | Reaction kinetics | Random variation
14 | Cooling water outlet valve of reactor | Sticking
15 | Cooling water outlet valve of separator |
 | Reactor cooling water flow | Random variation
28 | Condenser cooling water flow | Random variation

In the LNS-PCA and GMM-SDAE, two statistics, Hotelling's T-squared ($T^2$) and the squared prediction error (SPE), were calculated to detect process faults. The $T^2$ and SPE statistics measure the variation of sample $x_i$ projected in the feature space (h) and in the residual space ($x - \hat{x}$), respectively [13,25]. The $T^2$ and SPE statistics are defined as follows:

$$T^2 = h^{\mathrm{T}} \phi^{-1} h$$

$$SPE = \lVert x - \hat{x} \rVert^2$$

where h is a vector representing the features extracted by LNS-PCA and GMM-SDAE, and φ is the covariance matrix. The monitoring statistic $D^2$ of MD-kNN is defined as [39]:

$$D_i^2 = \sum_{j=1}^{k} M_{i,j}^2$$

where $M_{i,j}^2$ denotes the squared Mahalanobis distance from sample i to its jth nearest neighbor, and k is the number of neighbors. Two important indicators, the fault detection rate (FDR) and the false-alarm rate (FAR), were taken for the performance evaluation of MDPC-PVAE. They are defined as follows:

$$FDR = \frac{\text{number of faulty samples detected as faulty}}{\text{total number of faulty samples}} \times 100\%$$

$$FAR = \frac{\text{number of normal samples detected as faulty}}{\text{total number of normal samples}} \times 100\%$$

Figure 9 shows the monitoring charts of MDPC-PVAE in the TE process. In Figure 9a, the testing dataset is Fault 4 in Mode 1. The first 600 samples of $H^2_{(1)}$ are all below its control limit. When the fault occurs at the 601st sample, $H^2_{(1)}$ begins to increase and exceeds the control limit. $H^2_{(2)}$ and $H^2_{(3)}$ are always above their control limits.
Similarly, in Figure 9b, $H^2_{(3)}$ recognizes the normal samples, and the other two statistics are above their control limits. Because the hidden features generated by the PVAE follow the Gaussian distribution, each sub-VAE in the PVAE recognizes only the normal condition of its own mode. For a faulty condition within the mode, or any condition in another mode, the corresponding statistic exceeds its control limit. In this case study, the testing dataset for one fault type is composed of the corresponding datasets in the three modes. The FAR/FDR comparison results are given in Table 2. MDPC-PVAE has a relatively high accuracy for most faults, especially for Faults 12, 24, and 28. Moreover, MDPC-PVAE has the highest average FDR, 86.05%, among all the compared methods. The average FDRs of LNS-PCA (SPE) and GMM-SDAE ($T^2$) are also relatively high, at 84.68% and 85.49%, respectively, but their average FARs are much higher, at over 3%. MDPC-PVAE obtains the lowest FAR, 0.9%. The monitoring results of Fault 25 in Mode 1 with the four methods are shown in Figure 10. LNS-PCA ($T^2$) achieves an FDR of only 82.14%. LNS-PCA (SPE) and GMM-SDAE ($T^2$ and SPE) achieve FDRs of about 92%. Furthermore, LNS-PCA ($T^2$) and GMM-SDAE ($T^2$ and SPE) have higher FARs, about 2%. The proposed MDPC-PVAE performs slightly better than the other methods, with an FDR of 94% and an FAR of 0.17%. Figure 11 shows the monitoring results of Fault 26 in Mode 2. LNS-PCA ($T^2$) and GMM-SDAE (SPE) have lower FDRs, 68.71% and 69.71%, respectively. The FDRs of LNS-PCA (SPE) and GMM-SDAE ($T^2$) are relatively high, at 86.79% and 84.64%, but their FARs are correspondingly high, at about 1.6%. MD-kNN achieves an FAR of 0%, but its FDR is also low, at 82.71%. The FDR and FAR of MDPC-PVAE are 84.14% and 0.33%, so MDPC-PVAE combines a comparatively high FDR with a low FAR.
This is because the features generated by MDPC-PVAE follow the Gaussian distribution, which is more beneficial for statistic construction and fault detection. Compared with the other methods, the proposed MDPC-PVAE captures features from the local process data more effectively, which improves its multimode fault detection performance.
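The two indicators reported in the comparison can be computed from a per-sample alarm sequence as in the sketch below. It assumes a single test run in which all samples before a known index are normal and all samples from that index on are faulty, as in the TE datasets where the fault starts at the 601st sample. The function name and the alarm vector are made up for illustration and are not the paper's data.

```python
def fdr_far(alarms, fault_start):
    """Return (FDR, FAR) in percent; faults begin at 0-based index fault_start."""
    normal, faulty = alarms[:fault_start], alarms[fault_start:]
    far = 100.0 * sum(normal) / len(normal)  # alarms raised on normal samples
    fdr = 100.0 * sum(faulty) / len(faulty)  # alarms raised on faulty samples
    return fdr, far


# Made-up alarm vector: 2000 samples, fault introduced at the 601st sample.
alarms = [False] * 599 + [True] + [True] * 1316 + [False] * 84
fdr, far = fdr_far(alarms, fault_start=600)
print(fdr, far)
```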
The computational cost of a model is an important evaluation indicator in practical engineering applications. The computational times of the four methods are presented in Table 3. The testing time is measured on a batch of 2000 samples. MDPC-PVAE takes 40.51 ms. LNS-PCA and MD-kNN take far longer, 67,982.97 ms and 72,103.54 ms, respectively. This is not surprising, because both need to search the whole dataset for the nearest neighbors, which greatly increases the computational time of the monitoring process. In contrast to the parallel network structure of MDPC-PVAE, GMM-SDAE uses only one feature extraction network for on-line monitoring. However, the mode of the monitoring sample must first be determined by the GMM. Moreover, to obtain the reconstruction error, the decoder network is also needed to compute the reconstructed sample. The number of hidden layers evaluated is therefore twice that of MDPC-PVAE, which brings more parameters and a greater computational burden. Both factors increase the testing time.

Semiconductor Etching Process
The semiconductor dataset was collected from an Al stack etch process performed on a commercial-scale LAM 9600 plasma etch tool at Texas Instruments, Dallas, USA [39,40]. The data consist of 108 normal wafers taken during three experiments and 21 wafers with intentionally induced faults taken during the same experiments. Each wafer record contains 21 variables. Excluding the time and step number variables, we selected 19 variables for fault detection modeling. Because the original data in the semiconductor etching process are three-dimensional, the statistics pattern analysis (SPA) method was adopted to replace the batch data with statistical characteristics of the variables. The mean and variance of the variables were chosen to constitute the statistics pattern vector of each batch.
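The SPA step can be sketched as follows: each batch, a sequence of time samples for one wafer, is collapsed into a single vector holding the per-variable means followed by the per-variable variances. The function name is hypothetical, and the use of the population variance is an assumption, since the text does not say which variance estimator was used.

```python
from statistics import fmean, pvariance


def statistics_pattern(batch):
    """Map a batch (list of time samples, each a list of variable values)
    to [mean of each variable..., variance of each variable...]."""
    columns = list(zip(*batch))  # one tuple of readings per variable
    return [fmean(c) for c in columns] + [pvariance(c) for c in columns]


# Made-up batch with two variables and two time samples.
batch = [[1.0, 10.0], [3.0, 10.0]]
print(statistics_pattern(batch))
```

With 19 selected variables, such a vector has 38 entries per wafer, so batches of different durations map to samples of equal length.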
The γ − δ decision graph and total entropy with different number of clusters are shown in Figure 12

Conclusions
In this study, a multimode process monitoring method based on modified density peak clustering and parallel variational autoencoder is proposed. The MDPC can identify the number of modes and divide the process data without prior knowledge of the mode information. The PVAE is built on the divided multimode process data. The Gaussian distribution of the features generated by the PVAE helps improve fault detection and reduce false alarms. The MDPC-PVAE addresses both inaccurate mode identification and the uncertain distribution of the generated features. The effectiveness of the proposed method is verified on the TE process and the SE process. Compared with related multimode process monitoring methods, such as LNS-PCA, MD-kNN, and GMM-SDAE, the simulation results indicate that the MDPC-PVAE has an excellent monitoring performance, with FDRs of 86.05% and 100%, respectively. Furthermore, the testing times show that the MDPC-PVAE is computationally efficient. It is suitable for real-time monitoring and fault detection in practical engineering applications.
In on-line monitoring, MDPC-PVAE feeds the monitoring data directly into the generation network to detect faults, without a mode identification phase; as a consequence, the mode of a faulty sample cannot be obtained. In addition, this work focuses on the steady modes of a multimode process. The transition modes are not explicitly considered. Further investigation is needed to extend this method to the monitoring of transition processes.