Article

Memory-Enhanced and Prediction-Assisted Conditional Variational Autoencoder for Unsupervised Fault Detection in Industrial Processes

Jiangsu Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Nanjing Forestry University, Nanjing 210037, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(12), 5941; https://doi.org/10.3390/app16125941
Submission received: 19 May 2026 / Revised: 5 June 2026 / Accepted: 11 June 2026 / Published: 12 June 2026
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Autoencoders (AEs) have been widely used for industrial process fault detection owing to their ability to learn nonlinear representations from normal operating data. However, conventional AE methods rely heavily on reconstruction errors and may miss weak faults due to overgeneralization. In addition, insufficient modeling of temporal evolution and operating condition variations may reduce their sensitivity to dynamic faults. To address these issues, this study proposes a memory-enhanced and prediction-assisted conditional variational autoencoder named MI-CVAE for unsupervised fault detection. In the proposed framework, statistical features extracted from sliding windows are used as condition information to describe variable operating states. A memory module stores representative normal prototypes to constrain reconstruction and reduce overgeneralization to faulty samples. Meanwhile, an Informer branch captures temporal dependencies and provides complementary prediction residuals. Reconstruction and prediction residuals are fused to construct squared prediction error and squared Mahalanobis distance statistics, with control limits determined by kernel density estimation. The proposed method is validated on the Benchmark Simulation Model No. 1 wastewater treatment benchmark and a real papermaking process dataset. The results show that MI-CVAE outperforms the evaluated comparison methods, particularly in detecting weak and dynamic faults, while maintaining a low false alarm rate.

1. Introduction

Modern process industries are increasingly required to sustain continuous production under stringent and variable conditions. As system structures become more integrated and working environments more complex, faults are more likely to arise during production. If such events remain undetected, they may compromise production stability, degrade product quality, and increase safety risks. Therefore, accurate fault detection is of great significance for identifying abnormal states at an early stage and ensuring the safe and reliable operation of industrial systems [1].
Over the past decades, various fault detection methods have been developed for industrial process monitoring. Traditional model-based methods usually rely on accurate mathematical descriptions or mechanistic knowledge of the monitored system [2]. Although these methods have clear physical meanings and strong interpretability, their application to complex industrial processes is often limited, since it is difficult to establish accurate mechanistic models for systems with strong nonlinearity, time-varying behavior, and multivariable coupling. In contrast, data-driven fault detection methods do not require explicit mechanistic models and can extract useful information directly from historical process data [3]. Classical data-driven methods mainly include principal component analysis (PCA) [4], partial least squares [5], independent component analysis [6], canonical variate analysis [7], support vector machines, Gaussian mixture models, and k-nearest neighbor methods [8,9]. These methods have been widely applied in process monitoring and have achieved satisfactory performance in many industrial scenarios. Nevertheless, most of them rely on shallow feature representations or statistical assumptions, which limits their ability to describe complex nonlinear and dynamic characteristics. When the monitored process exhibits strong temporal dependence and hidden nonlinear relationships, traditional methods may fail to extract discriminative fault features effectively.
Deep learning has provided an effective modeling framework for fault detection in industrial processes [10]. Unlike conventional models, deep learning approaches can automatically extract hierarchical representations from process data, thereby improving the characterization of complex nonlinear relationships and dynamic behaviors. For example, a cascaded monitoring network named MoniNet was proposed to simultaneously capture temporal dynamic correlations and local spatial correlations, enabling effective anomaly detection in real industrial processes [11]. Recurrent and convolutional architectures were systematically evaluated for early fault detection in the Tennessee Eastman process, showing that deep learning models can improve detection performance while reducing the dependence on manual feature engineering [12]. Bayesian recurrent neural networks were used for chemical process fault detection, enabling nonlinear dynamic modeling while providing uncertainty information for monitoring decisions [13]. Deep recurrent neural networks have also been incorporated into residual control charts for autocorrelated process monitoring and verified using papermaking process data [14]. In addition, recurrent neural networks have been used for sensor fault detection and isolation in nonlinear systems [15]. Compared with conventional recurrent neural networks, long short-term memory (LSTM) networks can better preserve long-term dependencies through gating mechanisms, making them suitable for fault detection in dynamic processes. By combining an LSTM-based attention model with the sequential probability ratio test, early fault warning can be achieved by evaluating the statistical deviation of prediction residuals [16].
In practical industrial scenarios, normal operating data are typically abundant, whereas fault samples remain scarce [17]. Moreover, fault conditions are often diverse and difficult to collect exhaustively, and accurate data labeling requires substantial labor and time. Consequently, modeling approaches that rely heavily on labeled fault data often fail to meet the practical requirements of industrial process monitoring. In contrast, unsupervised fault detection methods characterize regular process behavior using normal operating data and detect anomalies by measuring the deviation of new observations from the learned reference. They are therefore better suited to industrial applications with limited labeled fault samples. In this context, the autoencoder (AE) has been widely applied to industrial process fault detection because of its clear structure, relatively stable training process, and ability to learn nonlinear feature representations from normal operating data [18,19].
A typical autoencoder consists of an encoder and a decoder. The encoder maps input samples into a low-dimensional latent space, while the decoder reconstructs the original inputs from the latent representations. When trained only on normal operating data, an AE can capture the main features and distribution characteristics of regular process behavior and accurately reconstruct samples within this range. Faulty samples that deviate from this reference usually produce reconstructed outputs that differ significantly from the original inputs, and the resulting reconstruction errors are used to construct anomaly scores for fault detection [20]. Various AE-based models have been developed to improve fault detection performance in industrial processes. Deep AE-based feature learning has been shown to extract representative process features for process pattern recognition [21]. To capture coexisting linear and nonlinear characteristics, PCA was combined with a stacked autoencoder to enhance fault detection in complex industrial processes [22]. In addition, a sparse autoencoder combined with adaptive slow feature analysis has been applied to fault detection in time-varying processes [23]. For wastewater treatment applications, a stacked denoising autoencoder was employed for sensor validation in real plants and achieved fault detection rates of up to 98% [24]. Moreover, a multistage variational autoencoder (VAE) was designed for wastewater treatment process monitoring by combining stage division with probabilistic latent modeling [25].
However, AE-based fault detection methods generally assume that faulty samples cannot be well reconstructed by a model trained only with normal operating data. This assumption does not always hold in practice. Previous studies have reported that deep autoencoders may generalize well to samples outside the normal distribution and reconstruct some faulty samples with small errors, especially when the fault magnitude is small or the faulty pattern is close to the normal operating distribution [10,26]. As a result, faulty samples may be incorrectly identified as normal, leading to missed detections. This phenomenon is commonly associated with the overgeneralization problem of AE models [27].
Memory-augmented autoencoders provide a promising strategy for alleviating this problem. Instead of directly using latent features for reconstruction, memory-augmented AE models store representative normal prototypes in an external memory bank. During reconstruction, the latent representation of the current sample is used as a query to retrieve the most relevant memory items, and the retrieved normal prototypes are then used to guide the decoding process. In this way, the model tends to reconstruct samples according to stored normal patterns, which limits its ability to recover faulty samples and increases the reconstruction discrepancy between normal and faulty conditions [27,28].
Beyond reconstruction constraints, effective fault detection in industrial processes requires explicit modeling of temporal evolution. Since industrial process data usually exhibit temporal dependence and dynamic correlations, insufficient dynamic modeling may reduce the sensitivity to weak or slowly evolving faults [29,30]. Another practical challenge is that industrial processes often operate under variable conditions caused by load fluctuations, set point changes, and process adjustments [31,32]. Such variations may shift the statistical characteristics of normal samples and make the boundary of normal operating patterns more difficult to describe accurately [33].
Motivated by these considerations, this paper proposes MI-CVAE, a memory-enhanced and prediction-assisted conditional variational autoencoder for unsupervised fault detection in industrial processes. In the proposed framework, local statistical information is incorporated into the VAE as an auxiliary condition input to better characterize normal operating behavior under varying process conditions. A memory module is embedded in the latent space to store representative normal prototypes and constrain the reconstruction process, thereby alleviating the overgeneralization problem of AE models. To further capture process dynamics, an Informer prediction branch is introduced to learn the temporal evolution of process variables. Reconstruction and prediction errors are then jointly used to construct monitoring statistics for fault detection. The main contributions of this paper are summarized as follows.
(1) A memory-enhanced conditional VAE framework is proposed for unsupervised industrial fault detection. Local statistical information is used to characterize variations in normal operating states, while memory prototypes constrain reconstruction and suppress the excessive recovery of abnormal samples.
(2) An Informer prediction branch is introduced into the reconstruction model to jointly use reconstruction and prediction errors for fault detection. The reconstruction branch measures deviations from normal patterns, while the prediction branch captures abnormal dynamic evolution, thereby improving the detection of weak and dynamic faults.
(3) The effectiveness of the proposed MI-CVAE method is validated on the Benchmark Simulation Model No. 1 (BSM1) wastewater treatment benchmark and a real papermaking process dataset. Experimental results demonstrate that MI-CVAE outperforms the comparison methods while maintaining a low false alarm rate.

2. Dataset Description

2.1. Case 1: BSM1

BSM1 is a standardized simulation platform widely used in the field of wastewater treatment [34], and its system configuration is shown in Figure 1. This platform simulates a typical activated sludge treatment process, consisting of five biological reactors and one secondary clarifier. Internal and external recirculation streams are also incorporated to enable the effective removal of nitrogen and carbon pollutants. To construct process monitoring data, BSM1 was simulated under dry-weather conditions. The influent profile covered 14 consecutive days with a sampling interval of 15 min, yielding a total of 1345 data points. Considering their relevance to effluent quality and operational regulation, 15 key process variables were selected for analysis, including influent flow rate, dissolved oxygen concentration, suspended solids, and various nitrogen-containing component concentrations. Detailed information on these variables is provided in Table 1.
Eight typical fault conditions were constructed on the BSM1 simulation platform. Faults 1–4 are process faults, which were introduced by changing biochemical reaction parameters, settling performance parameters, or actuator output signals, causing the system dynamics to deviate from normal operation. Faults 5–8 are sensor faults, mainly involving abnormal variations in control setpoints or measurement signals, such as bias, drift, and complete failure. These faults are used to assess the model’s detection performance for both process disturbances and measurement abnormalities.
For the process faults, Faults 1 and 2 simulate reduced microbial activity by decreasing the maximum specific growth rates of autotrophic and heterotrophic microorganisms, respectively. Fault 3 represents deterioration of settling performance by reducing the settling velocity in the secondary clarifier. Fault 4 is introduced by increasing the nitrate actuator output signal, resulting in abnormal changes in internal recirculation and nitrogen-related variables. These faults reflect typical abnormalities in biochemical reactions, settling separation, and operational regulation.
Sensor faults are used to simulate measurement abnormalities in monitoring and control loops. Fault 5 corresponds to a shift in the dissolved oxygen controller setpoint, Faults 6 and 7 represent fixed bias and linear drift of the dissolved oxygen sensor, respectively, and Fault 8 denotes complete sensor failure. Since the BSM1 system involves feedback control, sensor faults may not only affect state observation but also propagate to related process variables through the control loop.
The detailed settings and parameter descriptions of the eight faults are summarized in Table 2. To illustrate the dynamic influence of process disturbances, Figure 2 shows the temporal responses of all variables under Fault 1 and compares them with those under normal operating conditions.

2.2. Case 2: Papermaking Process Monitoring Dataset

To validate the applicability and robustness of the proposed fault detection model in practical industrial processes, production data collected from a papermaking enterprise from January to December 2024 were used in this study. The dataset covers four key sections of the papermaking process: the approach flow, wire, press, and drying sections. These sections are sequentially connected and exhibit strong coupling and dynamic transmission among process variables, making them representative for process monitoring. The raw field data were first screened to exclude invalid records associated with production shutdown, operating condition switching, and abnormal missing values. Consequently, 442 valid samples were retained for each process section. The approach flow, wire, press, and drying sections include 21, 10, 27, and 13 process variables, respectively.
In practical industrial processes, severe faults occur infrequently, and field monitoring data often lack sufficient and accurate fault annotations. Therefore, industrial process monitoring studies commonly construct representative abnormal patterns of process variables from normal operating data to evaluate how well fault detection methods identify faults under different dynamic disturbance conditions. Based on abnormal evolution patterns commonly observed in industrial process monitoring, four types of faults were constructed in this study: drift, cycle, scale-up, and scale-down faults. Taking into account the actual operating characteristics of the papermaking process, disturbances were introduced into selected key variables at specified time points to simulate the variation trends that may occur under abnormal operating conditions.
The detailed fault settings, including the fault number, corresponding process section, fault type, and affected variables, are summarized in Table 3. All faults were introduced at the 293rd sample. The model was trained using data collected under normal operating conditions, while the fault data were used for testing and performance evaluation. Figure 3 compares the variable trajectories under normal and fault-injection conditions for Fault 1, showing the characteristic changes in process variables after fault occurrence.

3. Materials and Methods

3.1. Data Preprocessing

The original process data are first standardized, and time-series samples are then constructed using a sliding window, with the next observation used as the prediction target.
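As a minimal sketch of this preprocessing, the NumPy code below builds the windows and their one-step-ahead targets. It assumes a stride of one sample between consecutive windows and standardization with the mean and standard deviation of the normal training data; neither detail is stated here, and the function name `make_windows`, the window length, and the random data are hypothetical.

```python
import numpy as np

def make_windows(data: np.ndarray, window_len: int):
    """Build (window, next-step target) pairs from a standardized (T, D) series.

    Assumes stride 1: window i covers rows i..i+L-1 and its target is row i+L.
    """
    T, _ = data.shape
    n = T - window_len  # every window needs a following observation as its target
    if n <= 0:
        raise ValueError("series is too short for the requested window length")
    windows = np.stack([data[i:i + window_len] for i in range(n)])  # (n, L, D)
    targets = data[window_len:window_len + n]                        # (n, D)
    return windows, targets

# Standardize with statistics of the normal training data (assumed), then window.
rng = np.random.default_rng(0)
train_raw = rng.normal(size=(100, 15))            # placeholder data, 15 variables
mean, std = train_raw.mean(axis=0), train_raw.std(axis=0)
X, y = make_windows((train_raw - mean) / std, window_len=10)
print(X.shape, y.shape)  # (90, 10, 15) (90, 15)
```

The same training mean and standard deviation would then be reused to standardize test data, so that faulty samples are measured against the normal reference.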
For the $i$-th input window, the condition vector is constructed by concatenating the mean and standard deviation vectors of all variables within the window:
$$ c_i = [\mu_i^c, \sigma_i^c] \tag{1} $$
where $\mu_i^c \in \mathbb{R}^D$ and $\sigma_i^c \in \mathbb{R}^D$ represent the mean and standard deviation vectors of the variables in the window, respectively. Hence, $c_i \in \mathbb{R}^{2D}$.
For the $p$-th variable, they are calculated as:
$$ \mu_{i,p}^c = \frac{1}{L}\sum_{t=1}^{L} x_{t,p}^{(i)} \tag{2} $$
$$ \sigma_{i,p}^c = \sqrt{\frac{1}{L}\sum_{t=1}^{L}\left(x_{t,p}^{(i)} - \mu_{i,p}^c\right)^2} \tag{3} $$
where $L$ denotes the window length, and $x_{t,p}^{(i)}$ denotes the value of the $p$-th variable at the $t$-th step within the $i$-th window. In this way, the condition vector retains the local level and fluctuation information of each window, thereby providing auxiliary constraints for latent distribution learning.
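The condition vector defined above can be computed for a whole batch of windows at once. In this NumPy sketch (the function name is hypothetical), the population standard deviation is used, which matches the $1/L$ normalization in the definition.

```python
import numpy as np

def condition_vectors(windows: np.ndarray) -> np.ndarray:
    """Map windows of shape (N, L, D) to condition vectors c_i of shape (N, 2D)."""
    mu_c = windows.mean(axis=1)     # per-variable window mean, (N, D)
    sigma_c = windows.std(axis=1)   # per-variable population std (ddof=0, i.e. 1/L), (N, D)
    return np.concatenate([mu_c, sigma_c], axis=1)

# One window with L = 2 steps and D = 2 variables.
w = np.array([[[1.0, 10.0],
               [3.0, 10.0]]])
c = condition_vectors(w)  # means [2, 10] followed by standard deviations [1, 0]
```

A constant variable yields a zero standard deviation entry, so the condition vector distinguishes quiet windows from fluctuating ones even when their means coincide.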

3.2. Conditional Variational Autoencoder

In this work, conditional information is introduced into the VAE framework [35] to construct a CVAE, whose structure is shown in Figure 4. Compared with the conventional VAE, the CVAE incorporates a condition vector derived from window-based statistical features into both the encoder and decoder. This enables the latent distribution to be learned under local process constraints, thereby improving the representation of normal operating modes.
For an input sample $x_i$ and its corresponding condition vector $c_i$, the encoder maps them into the latent distribution:
$$ q_\phi(z_i \mid x_i, c_i) = \mathcal{N}(z_i; \mu_i, \sigma_i^2) \tag{4} $$
In Equation (4), $z_i$ is the latent variable, and $\mu_i$ and $\sigma_i^2$ represent the mean and variance of the latent variable distribution; $\phi$ denotes the encoder parameters. The reparameterization trick is then used to sample the latent variable:
$$ z_i = \mu_i + \sigma_i \odot \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, I) \tag{5} $$
During decoding, $z_i$ is concatenated with $c_i$ and fed into the decoder to reconstruct the input sample:
$$ p_\theta(x_i \mid z_i, c_i) \tag{6} $$
where $\theta$ denotes the decoder parameters. By introducing $c_i$ into both the encoder and decoder, the CVAE can learn latent representations related to the current local process state.
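The sampling and conditioning steps can be illustrated with a short NumPy sketch. It assumes, as is common in VAE implementations but not stated here, that the encoder outputs the log-variance $\log\sigma_i^2$ rather than $\sigma_i$ directly; the function names and sizes are hypothetical. In an automatic-differentiation framework the same expression lets gradients flow through $\mu_i$ and $\sigma_i$ while the randomness stays in $\varepsilon$.

```python
import numpy as np

rng = np.random.default_rng(42)

def reparameterize(mu: np.ndarray, log_var: np.ndarray) -> np.ndarray:
    """Sample z = mu + sigma * eps with eps ~ N(0, I); log_var = log(sigma^2) is assumed."""
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

def decoder_input(z: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Concatenate the latent sample with the condition vector before decoding."""
    return np.concatenate([z, c], axis=-1)

mu = np.zeros((8, 4))        # batch of 8 samples, d_z = 4 (hypothetical)
log_var = np.zeros((8, 4))   # sigma = 1
c = np.ones((8, 30))         # condition vectors, 2D = 30 for D = 15
print(decoder_input(reparameterize(mu, log_var), c).shape)  # (8, 34)
```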
The CVAE training objective consists of a reconstruction loss and a Kullback–Leibler (KL) divergence term:
$$ \mathcal{L}_{\mathrm{CVAE}} = \mathcal{L}_{\mathrm{rec}} + \beta\,\mathcal{L}_{\mathrm{KL}} \tag{7} $$
where $\beta$ is the weight factor of the KL divergence term. The reconstruction loss $\mathcal{L}_{\mathrm{rec}}$ is defined as:
$$ \mathcal{L}_{\mathrm{rec}} = \frac{1}{NLD}\sum_{i=1}^{N}\left\lVert x_i - \hat{x}_i\right\rVert^2 \tag{8} $$
Here, $\hat{x}_i$ is the reconstruction of $x_i$, $L$ is the window length, $D$ is the number of variables, and $N$ is the number of samples constructed from sliding windows. The KL divergence term is given by:
$$ \mathcal{L}_{\mathrm{KL}} = -\frac{1}{2N}\sum_{i=1}^{N}\sum_{j=1}^{d_z}\left(1 + \log\sigma_{i,j}^2 - \mu_{i,j}^2 - \sigma_{i,j}^2\right) \tag{9} $$
where $d_z$ denotes the dimensionality of the latent variable. By jointly optimizing these two terms, the model learns a smooth latent representation while retaining the ability to reconstruct normal samples under local process constraints.
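For concreteness, the two terms can be evaluated on a batch as follows. This is a NumPy sketch that assumes the encoder outputs log-variances; the function name is hypothetical, and it computes loss values only, whereas training would use an automatic-differentiation framework.

```python
import numpy as np

def cvae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Return (L_CVAE, L_rec, L_KL) for one batch.

    x, x_hat     : (N, L, D) input windows and their reconstructions
    mu, log_var  : (N, d_z) latent means and log-variances (log-variance output assumed)
    """
    N, L, D = x.shape
    l_rec = np.sum((x - x_hat) ** 2) / (N * L * D)
    # Closed-form KL(N(mu, sigma^2) || N(0, I)), averaged over the batch.
    l_kl = -0.5 / N * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))
    return l_rec + beta * l_kl, l_rec, l_kl
```

When $\mu_{i,j} = 0$ and $\sigma_{i,j} = 1$ the KL term vanishes, and shifting the latent mean by one unit in a single dimension adds $1/2$ to the summand, which is how the term pulls latent codes toward the standard normal prior.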

3.3. Memory Module

The memory module stores representative normal patterns in the latent space and retrieves the most relevant memory information for the current sample [27]. The memory matrix is defined as:
$$ M = [m_1, m_2, \ldots, m_K] \in \mathbb{R}^{K \times d_z} \tag{10} $$
where $K$ denotes the number of memory units, and $m_k$ represents the latent prototype vector of the $k$-th memory unit. For the $i$-th input sample, the encoder first outputs the parameters of the latent variable distribution, $\mu_i$ and $\sigma_i$. Here, $\mu_i$ serves as the query vector for the memory module and is used to compute the similarity between the input and each memory unit. To eliminate the influence of differences in vector norms on the matching results, both the query vector and the memory vectors are $\ell_2$-normalized, and the cosine similarity between them is computed as:
$$ \operatorname{sim}(\mu_i, m_k) = \frac{\mu_i^{\top} m_k}{\lVert\mu_i\rVert_2\,\lVert m_k\rVert_2} \tag{11} $$
This similarity measures the directional consistency between $\mu_i$ and each memory prototype. A temperature coefficient $\tau$ is then introduced to scale the similarity, and a softmax function is applied to obtain the attention weight for each memory unit:
$$ a_{i,k} = \frac{\exp\left(\operatorname{sim}(\mu_i, m_k)/\tau\right)}{\sum_{j=1}^{K}\exp\left(\operatorname{sim}(\mu_i, m_j)/\tau\right)} \tag{12} $$
The parameter $\tau$ adjusts the sharpness of the attention distribution. When $\tau$ is small, the model focuses on a few memory units with high similarity; when $\tau$ is large, the attention distribution becomes smoother. After the attention weights are obtained, a weighted sum over the memory units gives the memory read vector:
$$ z_i^m = \sum_{k=1}^{K} a_{i,k}\, m_k \tag{13} $$
This vector represents the normal operating patterns most relevant to the current input. It is fused with the latent variable $z_i$ obtained from the reparameterization trick using a fusion coefficient $\alpha$ to form the final enhanced latent representation:
$$ z_i^f = (1 - \alpha)\, z_i + \alpha\, z_i^m \tag{14} $$
The fused latent vector preserves both the characteristics of the current input sample and the memory-enhanced information, and it serves as the input for subsequent decoder reconstruction.
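The addressing, read, and fusion steps above can be sketched in NumPy as follows. The default values of `tau` and `alpha` are placeholders rather than the settings used in this study, and the small constant added to the norms is an implementation assumption to avoid division by zero. Note that the read vector is a weighted sum of the unnormalized prototypes $m_k$; normalization is used only for the similarity.

```python
import numpy as np

def memory_read(mu, z, memory, tau=0.1, alpha=0.5, eps=1e-12):
    """Cosine-similarity memory addressing followed by latent fusion.

    mu     : (N, d_z) latent means, used as queries
    z      : (N, d_z) reparameterized latent samples
    memory : (K, d_z) memory prototypes m_k
    Returns the fused latent z_f, the read vector z_m, and attention weights a.
    """
    q = mu / (np.linalg.norm(mu, axis=1, keepdims=True) + eps)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + eps)
    logits = (q @ m.T) / tau                       # cosine similarity scaled by temperature
    logits -= logits.max(axis=1, keepdims=True)    # stabilize the softmax
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)              # attention weights a_{i,k}, rows sum to 1
    z_m = a @ memory                               # weighted sum of prototypes
    z_f = (1.0 - alpha) * z + alpha * z_m          # fused latent representation
    return z_f, z_m, a

mem = np.array([[1.0, 0.0], [0.0, 1.0]])
_, _, a_sharp = memory_read(np.array([[2.0, 0.5]]), np.zeros((1, 2)), mem, tau=0.01)
_, _, a_flat = memory_read(np.array([[2.0, 0.5]]), np.zeros((1, 2)), mem, tau=100.0)
```

With the small temperature nearly all weight goes to the closest prototype, whereas the large temperature spreads it almost evenly, which mirrors the role of $\tau$ described above.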

3.4. Informer-Based Prediction Module

The Informer-based prediction module is used to model the temporal evolution immediately following the input window. By introducing a prediction branch, the model complements the reconstruction branch in capturing dynamic dependencies and improves its sensitivity to abnormal temporal variations.
Let the input window sequence be:
$$ X_i = [x_i^1, x_i^2, \ldots, x_i^L] \in \mathbb{R}^{L \times D} \tag{15} $$
where $L$ is the window length and $D$ is the number of variables. The Informer module takes the window sequence $X_i$ as input and outputs the prediction $\hat{y}_i \in \mathbb{R}^D$ at the next time step. First, a linear mapping projects the original input into a high-dimensional feature space, and positional encoding is added to retain temporal order information, yielding the initial embedding representation:
$$ H_i^0 = X_i W_e + P \tag{16} $$
where $W_e$ and $P$ denote the input projection matrix and positional encoding matrix, respectively. $H_i^0 \in \mathbb{R}^{L \times d_{\mathrm{model}}}$ represents the initial feature embedding of the input sequence, and $d_{\mathrm{model}}$ is the embedding dimension.
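The form of the positional encoding matrix $P$ is not specified here. A common choice, assumed in this NumPy sketch, is the fixed sinusoidal encoding of the original Transformer; the projection matrix `W_e` below is random and only illustrates shapes, and the sizes are hypothetical.

```python
import numpy as np

def sinusoidal_pe(L: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal positional encoding of shape (L, d_model); d_model must be even."""
    if d_model % 2:
        raise ValueError("d_model must be even")
    pos = np.arange(L)[:, None]                       # time index within the window
    two_i = np.arange(0, d_model, 2)[None, :]         # even feature indices
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((L, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def embed(X: np.ndarray, W_e: np.ndarray) -> np.ndarray:
    """H^0 = X W_e + P for one window X of shape (L, D)."""
    return X @ W_e + sinusoidal_pe(X.shape[0], W_e.shape[1])

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 15))        # one window: L = 10, D = 15 (hypothetical)
W_e = rng.normal(size=(15, 64))      # d_model = 64 (hypothetical)
H0 = embed(X, W_e)                   # shape (10, 64)
```

Because $P$ depends only on the position within the window, identical observations at different time steps receive different embeddings, which is what allows the attention layers to use temporal order.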
During the encoding phase, Informer employs probabilistic sparse (ProbSparse) self-attention to model the long-range temporal dependencies in the input sequence [36]. Figure 5 shows the structure of ProbSparse self-attention. Instead of computing attention for all queries, ProbSparse self-attention selects the top-u queries with the highest sparsity scores for attention calculation. This strategy reduces computational complexity while preserving the dominant dependency relationships in the sequence.
For the $l$-th encoder layer, the input features $H_i^{l-1}$ are linearly projected to obtain the query, key, and value matrices:
$$ Q = H_i^{l-1} W_Q, \quad K = H_i^{l-1} W_K, \quad V = H_i^{l-1} W_V \tag{17} $$
The matrices $W_Q$, $W_K$, and $W_V$ correspond to the query, key, and value projections, respectively. For each query vector $q_t$, Informer calculates a sparsity measure from its dot products with all key vectors to evaluate its contribution to the overall attention:
$$ M(q_t, K) = \max_j \frac{q_t k_j^{\top}}{\sqrt{d_k}} - \frac{1}{L_K}\sum_{j=1}^{L_K}\frac{q_t k_j^{\top}}{\sqrt{d_k}} \tag{18} $$
Here, $d_k$ is the key dimension and $L_K$ is the sequence length (the number of keys). This metric reflects the sparsity of attention for query $q_t$: queries that are strongly correlated with only a few keys receive higher sparsity scores and are retained for subsequent attention calculation.
Based on the sparsity measure, the $u$ most important queries are selected to form the sparse query set $\bar{Q}$, and the ProbSparse attention is then computed accordingly:
$$ \bar{Q} = \operatorname{Top}_u\left(Q, M(Q, K)\right) \tag{19} $$
$$ \operatorname{ProbSparse}(Q, K, V) = \operatorname{Softmax}\left(\frac{\bar{Q} K^{\top}}{\sqrt{d_k}}\right) V \tag{20} $$
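A compact NumPy sketch of the query selection and attention above is given below. Two simplifications are assumed: the sparsity measure is computed exactly over all keys, whereas Informer estimates it from randomly sampled keys to reach $O(L \ln L)$ cost, and $u$ is passed in directly (Informer sets $u = c \ln L_Q$ with a sampling factor $c$). Following the Informer encoder, output rows belonging to queries that are not selected are filled with the mean of $V$, since the attention formula yields outputs only for the $u$ selected queries.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max subtraction for numerical stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sparsity_measure(Q: np.ndarray, K: np.ndarray) -> np.ndarray:
    """M(q_t, K): max of scaled scores minus their mean, one value per query."""
    scores = Q @ K.T / np.sqrt(K.shape[1])
    return scores.max(axis=1) - scores.mean(axis=1)

def probsparse_attention(Q, K, V, u):
    """Attend with the top-u queries only; the remaining rows receive mean(V)."""
    if not 1 <= u <= Q.shape[0]:
        raise ValueError("u must lie between 1 and the number of queries")
    d_k = K.shape[1]
    top = np.argsort(sparsity_measure(Q, K))[-u:]     # indices of the u highest scores
    out = np.tile(V.mean(axis=0), (Q.shape[0], 1))     # "lazy" queries -> mean of values
    out[top] = softmax(Q[top] @ K.T / np.sqrt(d_k)) @ V
    return out

rng = np.random.default_rng(1)
Q, K, V = rng.normal(size=(3, 6, 4))                   # L = 6 steps, d_k = 4
out = probsparse_attention(Q, K, V, u=2)
```

Setting `u` equal to the number of queries recovers ordinary scaled dot-product attention, which is a convenient sanity check for the selection logic.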
In the multi-head mechanism, the features are projected into multiple subspaces, and the outputs of all heads are concatenated and linearly transformed:
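$$ \operatorname{MultiHead}(Q, K, V) = \operatorname{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h)\, W_O \tag{21} $$
where the output of the $r$-th attention head is:
$$ \mathrm{head}_r = \operatorname{ProbSparse}(Q_r, K_r, V_r) \tag{22} $$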
After ProbSparse attention, the features are fed into the feedforward network, with residual connections and layer normalization applied for stable training. Thus, the output of the $l$-th encoder layer can be expressed as:
$$ \tilde{H}_i^l = \operatorname{LayerNorm}\left(H_i^{l-1} + \operatorname{MultiHead}(Q, K, V)\right) \tag{23} $$
$$ H_i^l = \operatorname{LayerNorm}\left(\tilde{H}_i^l + \operatorname{FFN}(\tilde{H}_i^l)\right) \tag{24} $$
Here, $\operatorname{FFN}(\cdot)$ denotes the position-wise feed-forward network.
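The residual and normalization structure of one encoder layer can be sketched in NumPy as follows. The attention sublayer is passed in as a function so that any attention variant can be plugged in; layer normalization is shown without learnable gain and bias, and the ReLU activation inside the FFN is an assumption, since the activation is not specified here. Sizes and weights are hypothetical.

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each time step over the feature axis (no learnable gain/bias here)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def ffn(x, W1, b1, W2, b2):
    """Position-wise FFN: the same two-layer MLP applied to every time step (ReLU assumed)."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

def encoder_layer(H_prev, attention, ffn_params):
    """Attention sublayer and FFN sublayer, each followed by residual add + LayerNorm."""
    H_tilde = layer_norm(H_prev + attention(H_prev))
    return layer_norm(H_tilde + ffn(H_tilde, *ffn_params))

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16                                   # hypothetical sizes
params = (rng.normal(size=(d_model, d_ff)) * 0.1, np.zeros(d_ff),
          rng.normal(size=(d_ff, d_model)) * 0.1, np.zeros(d_model))
H1 = encoder_layer(rng.normal(size=(10, d_model)), lambda h: 0.5 * h, params)
```

The residual paths keep the input features available to later layers, and the normalization after each addition keeps every time step at zero mean and unit variance across features.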
To improve sequence modeling efficiency, Informer incorporates a distilling mechanism between encoder layers. Specifically, the output of an encoder layer is passed through a 1D convolution, an ELU activation, and max pooling, which compress the temporal dimension, reducing redundant information while highlighting dominant dynamic features:
$$ H_i^{l,\mathrm{dist}} = \operatorname{MaxPool}\left(\operatorname{ELU}\left(\operatorname{Conv1D}(H_i^l)\right)\right) \tag{25} $$
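The distilling step roughly halves the temporal length of the feature map. The NumPy sketch below assumes a convolution kernel size of 3 with zero padding and max pooling with window 3, stride 2, and padding 1; these sizes are assumptions for illustration, and the batch normalization that Informer places before the activation is omitted.

```python
import numpy as np

def elu(x: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """ELU activation; exp is applied only to non-positive inputs to avoid overflow."""
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def conv1d_same(H: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Temporal convolution with odd kernel size k and zero padding (length preserved).

    H: (L, d) features, W: (k, d, d) kernel, b: (d,) bias.
    """
    k = W.shape[0]
    pad = k // 2
    Hp = np.pad(H, ((pad, pad), (0, 0)))
    return np.stack([sum(Hp[t + j] @ W[j] for j in range(k)) + b
                     for t in range(H.shape[0])])

def maxpool_time(H: np.ndarray, size: int = 3, stride: int = 2) -> np.ndarray:
    """Max pooling over time with padding size // 2 (padded positions never win)."""
    pad = size // 2
    Hp = np.pad(H, ((pad, pad), (0, 0)), constant_values=-np.inf)
    n_out = (H.shape[0] + 2 * pad - size) // stride + 1
    return np.stack([Hp[s * stride:s * stride + size].max(axis=0) for s in range(n_out)])

def distill(H, W, b):
    """MaxPool(ELU(Conv1D(H))): roughly halves the number of time steps."""
    return maxpool_time(elu(conv1d_same(H, W, b)))

rng = np.random.default_rng(0)
H = rng.normal(size=(16, 4))                       # L = 16 steps, d_model = 4
H_dist = distill(H, rng.normal(size=(3, 4, 4)) * 0.1, np.zeros(4))
print(H_dist.shape)  # (8, 4)
```

Stacking such layers shortens the sequence seen by each subsequent attention layer, which is where the efficiency gain comes from.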
After multi-layer ProbSparse attention and distilling, the encoder outputs high-level temporal features, which are projected to the prediction layer to obtain the multivariate forecast for the next time step:
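$$ h_i = \operatorname{Pool}(H_i^{\mathrm{enc}}) \tag{26} $$
$$ \hat{y}_i = f_{\mathrm{pred}}(h_i) \tag{27} $$
where $h_i$ represents the aggregated window-level temporal features from the encoder output $H_i^{\mathrm{enc}}$, $f_{\mathrm{pred}}(\cdot)$ is the prediction projection function, and $\hat{y}_i \in \mathbb{R}^D$ is the predicted multivariate value at the next time step.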

3.5. Joint Loss Function

To jointly optimize reconstruction, prediction, latent distribution regularization, and memory representation learning, a joint loss function is adopted. The overall loss function is defined as:
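$$ \mathcal{L} = \lambda_{\mathrm{rec}}\mathcal{L}_{\mathrm{rec}} + \lambda_{\mathrm{pred}}\mathcal{L}_{\mathrm{pred}} + \beta\,\mathcal{L}_{\mathrm{KL}} + \lambda_{\mathrm{pull}}\mathcal{L}_{\mathrm{pull}} + \lambda_{\mathrm{ent}}\mathcal{L}_{\mathrm{ent}} + \lambda_{\mathrm{decay}}\mathcal{L}_{\mathrm{decay}} \tag{28} $$
In Equation (28), $\mathcal{L}_{\mathrm{rec}}$, $\mathcal{L}_{\mathrm{pred}}$, and $\mathcal{L}_{\mathrm{KL}}$ denote the reconstruction loss, prediction loss, and KL divergence loss, respectively. The terms $\mathcal{L}_{\mathrm{pull}}$, $\mathcal{L}_{\mathrm{ent}}$, and $\mathcal{L}_{\mathrm{decay}}$ are introduced to constrain memory representation learning. The coefficient $\beta$ is the same KL divergence weight as defined in Equation (7). The parameters $\lambda_{\mathrm{rec}}$, $\lambda_{\mathrm{pred}}$, $\lambda_{\mathrm{pull}}$, $\lambda_{\mathrm{ent}}$, and $\lambda_{\mathrm{decay}}$ are the weights assigned to the reconstruction loss, prediction loss, pull loss, entropy regularization term, and memory weight decay term, respectively.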
The reconstruction loss and KL divergence loss have been introduced in Section 3.2 and are not elaborated here. The prediction loss measures the deviation between the predicted and actual next-step states:
$$ \mathcal{L}_{\mathrm{pred}} = \frac{1}{NHD}\sum_{i=1}^{N}\left\lVert y_i - \hat{y}_i\right\rVert^2 \tag{29} $$
where $D$ is the number of variables and $H$ is the prediction horizon ($H = 1$ in this model, since single-step prediction is used). In addition, the memory module introduces three constraint terms. The pull loss reduces the discrepancy between the latent mean and the memory read vector:
$$ \mathcal{L}_{\mathrm{pull}} = \frac{1}{N}\sum_{i=1}^{N}\left\lVert \mu_i - z_i^m\right\rVert^2 \tag{30} $$
The entropy regularization term is used to control the sharpness of memory attention allocation, formulated as:
$$ \mathcal{L}_{\mathrm{ent}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} a_{i,k}\log a_{i,k} \tag{31} $$
The memory weight decay term is expressed as:
$$ \mathcal{L}_{\mathrm{decay}} = \frac{1}{K}\sum_{k=1}^{K}\lVert m_k\rVert^2 \tag{32} $$
By optimizing these loss terms together, the model learns normal reconstruction patterns, temporal prediction relationships, and memory-enhanced latent representations, providing a basis for subsequent monitoring statistic construction and fault detection.
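The prediction loss, the three memory constraints, and their weighted combination can be evaluated as in the NumPy sketch below. The loss values and weights in the example are placeholders, not the settings of this study, and a small constant inside the logarithm is assumed to keep the entropy finite for zero attention weights. For uniform attention over $K$ units the entropy term equals $\ln K$, and for one-hot attention it is zero.

```python
import numpy as np

def prediction_loss(y, y_hat, horizon=1):
    """L_pred for targets of shape (N, D); horizon H = 1 for single-step prediction."""
    N, D = y.shape
    return np.sum((y - y_hat) ** 2) / (N * horizon * D)

def memory_losses(mu, z_m, a, memory, eps=1e-12):
    """Return (L_pull, L_ent, L_decay).

    mu, z_m : (N, d_z) latent means and memory read vectors
    a       : (N, K) attention weights; memory: (K, d_z) prototypes
    """
    l_pull = np.mean(np.sum((mu - z_m) ** 2, axis=1))
    l_ent = -np.mean(np.sum(a * np.log(a + eps), axis=1))   # mean attention entropy
    l_decay = np.mean(np.sum(memory ** 2, axis=1))           # mean squared prototype norm
    return l_pull, l_ent, l_decay

def joint_loss(terms: dict, weights: dict) -> float:
    """Weighted sum of named loss terms (beta is the weight of the 'kl' term)."""
    return sum(weights[name] * value for name, value in terms.items())

K = 4
l_pull, l_ent, l_decay = memory_losses(np.zeros((2, 3)), np.zeros((2, 3)),
                                       np.full((2, K), 1.0 / K), np.ones((K, 3)))
terms = {"rec": 0.2, "pred": 0.1, "kl": 0.05,               # placeholder loss values
         "pull": l_pull, "ent": l_ent, "decay": l_decay}
weights = {"rec": 1.0, "pred": 1.0, "kl": 0.01,
           "pull": 0.1, "ent": 0.001, "decay": 1e-4}        # placeholder weights
total = joint_loss(terms, weights)
```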

4. Process Monitoring Framework Based on MI-CVAE

4.1. The Proposed MI-CVAE Model

The MI-CVAE model consists of a conditional variational autoencoder, a memory module, and an Informer-based prediction branch, forming a dual-branch monitoring framework that combines reconstruction and prediction. As shown in Figure 6, the model takes the time-series window sample x i and its condition vector c i as inputs. The condition vector is constructed from the mean and standard deviation of each variable within the window to provide local operating-condition information.
In the reconstruction branch, the encoder maps $x_i$ and $c_i$ into a latent distribution, and the latent representation is obtained through the reparameterization trick. The memory module retrieves representative normal patterns from the memory bank using the latent mean as the query, and then fuses the retrieved memory information with the original latent variable for reconstruction. This memory-enhanced mechanism constrains the reconstruction process with normal operating patterns, thereby reducing the overgeneralization of the autoencoder to abnormal samples and improving the sensitivity of reconstruction errors to faults. In the prediction branch, the Informer module learns temporal dependencies from the input window and predicts the process state at the next time step. This branch complements the reconstruction branch by capturing the normal temporal evolution of process variables. When faults occur, the deviation from the learned evolution pattern leads to increased prediction errors.
Overall, MI-CVAE integrates normal-pattern reconstruction and temporal prediction within a unified framework. The reconstruction branch captures latent distribution characteristics, while the prediction branch models dynamic evolution patterns, providing a more comprehensive feature basis for fault detection.
Figure 7 illustrates the implementation procedure of the MI-CVAE-based fault detection model. The detailed steps are as follows.
Step 1: The raw process data are standardized, and the input samples and one-step-ahead prediction targets are constructed using a sliding-window strategy. Meanwhile, the mean and standard deviation of each variable within the window are extracted to form the condition vector.
Step 2: The MI-CVAE model is trained using samples collected under normal operating conditions. The model is jointly optimized by the reconstruction loss, prediction loss, KL divergence loss, and constraint terms associated with the memory module.
Step 3: The reconstruction and prediction errors are calculated using the normal samples in the training set. The fused squared prediction error (SPE) and squared Mahalanobis distance (MD²) statistics are then constructed, and the corresponding control limits are determined by kernel density estimation.
Step 4: The test samples are fed into the trained MI-CVAE model to calculate the corresponding SPE and MD² statistics. These statistics are compared with the control limits to discriminate between normal and abnormal operating states.
Step 5: Based on the detection results in the normal and faulty periods, the fault detection rate and false alarm rate are calculated to evaluate the fault detection performance of the model.
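Step 1 can be illustrated with a minimal NumPy sketch. It assumes z-score standardization with statistics from the normal training data, a stride of one, and a condition vector formed by concatenating the per-variable window means and (population) standard deviations; the function names `standardize` and `make_windows` are hypothetical.

```python
import numpy as np

def standardize(train, test):
    """Z-score both sets with the mean/std of the normal training data."""
    mean, std = train.mean(axis=0), train.std(axis=0)
    std = np.where(std > 0, std, 1.0)                # guard constant variables
    return (train - mean) / std, (test - mean) / std

def make_windows(data, L):
    """Build input windows, one-step-ahead targets, and condition vectors.

    data: (T, m) standardized process data; L: sliding-window length.
    Returns X (N, L, m), Y (N, m), C (N, 2m) with N = T - L.
    """
    n = data.shape[0] - L
    X = np.stack([data[i:i + L] for i in range(n)])  # windows x_i, stride 1
    Y = data[L:L + n]                                # sample right after each window
    # condition vector c_i: per-variable mean and std inside the window
    C = np.concatenate([X.mean(axis=1), X.std(axis=1)], axis=1)
    return X, Y, C

rng = np.random.default_rng(0)
raw_train = rng.normal(loc=5.0, scale=2.0, size=(300, 15))  # stand-in data, 15 variables
raw_test = rng.normal(loc=5.0, scale=2.0, size=(100, 15))
train, test = standardize(raw_train, raw_test)
X, Y, C = make_windows(train, L=48)
```

With L = 48 (Table 4) and a stride of one, a record of T samples yields T − 48 windows, each paired with the sample that immediately follows it.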

4.2. Monitoring Statistics

Monitoring statistics are used to quantify the deviation of process samples from normal operating conditions. In this work, SPE and MD2 monitoring statistics are constructed from both the reconstruction and prediction branches.
For an arbitrary input window sample, the reconstruction error and prediction error are defined as:
$$e_{\mathrm{rec}} = x - \hat{x}$$
$$e_{\mathrm{pred}} = y - \hat{y}$$
where $x$ denotes the input window sample, and $\hat{x}$ is the reconstructed output of the model. $y$ denotes the true prediction target, and $\hat{y}$ is the output of the prediction branch. $e_{\mathrm{rec}}$ and $e_{\mathrm{pred}}$ represent the reconstruction error and prediction error, respectively.
According to the definition of SPE, the SPE statistics corresponding to the reconstruction and prediction branches are expressed as:
$$\mathrm{SPE}_{\mathrm{rec}} = e_{\mathrm{rec}}^{T} e_{\mathrm{rec}}$$
$$\mathrm{SPE}_{\mathrm{pred}} = e_{\mathrm{pred}}^{T} e_{\mathrm{pred}}$$
To avoid scale differences between the two branches, the SPE statistics are standardized using the mean and standard deviation calculated from the training set:
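$$\mathrm{SPE}_{z}^{\mathrm{rec}} = \frac{\mathrm{SPE}_{\mathrm{rec}} - \mu_{\mathrm{SPE}}^{\mathrm{rec}}}{\sigma_{\mathrm{SPE}}^{\mathrm{rec}}}$$
$$\mathrm{SPE}_{z}^{\mathrm{pred}} = \frac{\mathrm{SPE}_{\mathrm{pred}} - \mu_{\mathrm{SPE}}^{\mathrm{pred}}}{\sigma_{\mathrm{SPE}}^{\mathrm{pred}}}$$
where $\mu_{\mathrm{SPE}}^{\mathrm{rec}}$ and $\sigma_{\mathrm{SPE}}^{\mathrm{rec}}$ denote the mean and standard deviation of the SPE statistic of the reconstruction branch in the training set, respectively. Similarly, $\mu_{\mathrm{SPE}}^{\mathrm{pred}}$ and $\sigma_{\mathrm{SPE}}^{\mathrm{pred}}$ denote the mean and standard deviation of the SPE statistic of the prediction branch. $\mathrm{SPE}_{z}^{\mathrm{rec}}$ and $\mathrm{SPE}_{z}^{\mathrm{pred}}$ are the standardized SPE statistics of the reconstruction and prediction branches, respectively.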
The standardized SPE statistics from the two branches are then fused in a weighted manner to obtain the final SPE monitoring statistic:
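$$\mathrm{SPE} = \omega_{\mathrm{rec}}\,\mathrm{SPE}_{z}^{\mathrm{rec}} + \omega_{\mathrm{pred}}\,\mathrm{SPE}_{z}^{\mathrm{pred}}$$
The coefficients $\omega_{\mathrm{rec}}$ and $\omega_{\mathrm{pred}}$ are the fusion weights assigned to the reconstruction and prediction branches, respectively, and satisfy:
$$\omega_{\mathrm{rec}} + \omega_{\mathrm{pred}} = 1$$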
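The SPE construction above (per-branch SPE, standardization with training statistics, weighted fusion) can be sketched as follows. Residual windows are assumed to be flattened into vectors so that $e^{T}e$ is a scalar, the weights default to $\omega_{\mathrm{rec}} = 0.6$ and $\omega_{\mathrm{pred}} = 0.4$ from Table 4, and the class name `FusedSPE` is hypothetical.

```python
import numpy as np

def spe(residuals):
    """SPE = e^T e for each row of an (N, p) residual matrix."""
    return np.sum(residuals ** 2, axis=1)

class FusedSPE:
    """Hypothetical helper: z-score each branch's SPE, then fuse with fixed weights."""

    def __init__(self, w_rec=0.6, w_pred=0.4):
        if not np.isclose(w_rec + w_pred, 1.0):
            raise ValueError("fusion weights must sum to 1")
        self.w_rec, self.w_pred = w_rec, w_pred

    def fit(self, e_rec, e_pred):
        # training-set mean and standard deviation of each branch's SPE
        s_rec, s_pred = spe(e_rec), spe(e_pred)
        self.mu_rec, self.sd_rec = s_rec.mean(), s_rec.std()
        self.mu_pred, self.sd_pred = s_pred.mean(), s_pred.std()
        return self

    def transform(self, e_rec, e_pred):
        z_rec = (spe(e_rec) - self.mu_rec) / self.sd_rec
        z_pred = (spe(e_pred) - self.mu_pred) / self.sd_pred
        return self.w_rec * z_rec + self.w_pred * z_pred

rng = np.random.default_rng(1)
e_rec_train = rng.normal(size=(200, 48, 15)).reshape(200, -1)  # flattened windows
e_pred_train = rng.normal(size=(200, 15))                      # one-step residuals
fused_spe = FusedSPE().fit(e_rec_train, e_pred_train)
spe_train = fused_spe.transform(e_rec_train, e_pred_train)
```

Standardizing before fusion keeps the reconstruction residual, which has L·m entries per window, from dominating the one-step prediction residual, which has only m entries.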
To further consider the covariance structure among residual variables, MD2 is introduced as a complementary monitoring statistic. For the reconstruction and prediction branches, the instantaneous MD2 statistics are defined as:
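$$\mathrm{MD}_{t}^{2,\mathrm{rec}} = \left(e_{t}^{\mathrm{rec}} - \mu_{t}^{\mathrm{rec}}\right)^{T} \left(\Sigma_{t}^{\mathrm{rec}}\right)^{-1} \left(e_{t}^{\mathrm{rec}} - \mu_{t}^{\mathrm{rec}}\right)$$
$$\mathrm{MD}_{t}^{2,\mathrm{pred}} = \left(e_{t}^{\mathrm{pred}} - \mu_{t}^{\mathrm{pred}}\right)^{T} \left(\Sigma_{t}^{\mathrm{pred}}\right)^{-1} \left(e_{t}^{\mathrm{pred}} - \mu_{t}^{\mathrm{pred}}\right)$$
where $e_{t}^{\mathrm{rec}}$ denotes the reconstruction residual at the $t$-th time step within the input window, and $e_{t}^{\mathrm{pred}}$ denotes the prediction residual at the $t$-th prediction step. $\mu_{t}^{\mathrm{rec}}$ and $\mu_{t}^{\mathrm{pred}}$ are the mean vectors of the corresponding residuals in the training set, while $\Sigma_{t}^{\mathrm{rec}}$ and $\Sigma_{t}^{\mathrm{pred}}$ are the corresponding residual covariance matrices.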
Since local anomalies may be diluted by averaging over the window, the maximum instantaneous MD2 is used as the window-level statistic:
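$$\mathrm{MD}^{2,\mathrm{rec}} = \max\left\{\mathrm{MD}_{1}^{2,\mathrm{rec}}, \mathrm{MD}_{2}^{2,\mathrm{rec}}, \ldots, \mathrm{MD}_{L}^{2,\mathrm{rec}}\right\}$$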
The MD2 statistic of the prediction branch is defined as:
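$$\mathrm{MD}^{2,\mathrm{pred}} = \max\left\{\mathrm{MD}_{1}^{2,\mathrm{pred}}, \mathrm{MD}_{2}^{2,\mathrm{pred}}, \ldots, \mathrm{MD}_{H}^{2,\mathrm{pred}}\right\}$$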
where L denotes the length of the input window, and H denotes the prediction horizon. When the prediction horizon is equal to 1, the MD2 statistic of the prediction branch is calculated from the residual of a single prediction step.
Similar to SPE, the branch-specific MD2 statistics are standardized using the corresponding training statistics:
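$$\mathrm{MD}_{z}^{2,\mathrm{rec}} = \frac{\mathrm{MD}^{2,\mathrm{rec}} - \mu_{\mathrm{MD}}^{\mathrm{rec}}}{\sigma_{\mathrm{MD}}^{\mathrm{rec}}}$$
$$\mathrm{MD}_{z}^{2,\mathrm{pred}} = \frac{\mathrm{MD}^{2,\mathrm{pred}} - \mu_{\mathrm{MD}}^{\mathrm{pred}}}{\sigma_{\mathrm{MD}}^{\mathrm{pred}}}$$
Here, $\mu_{\mathrm{MD}}^{\mathrm{rec}}$ and $\sigma_{\mathrm{MD}}^{\mathrm{rec}}$ denote the mean and standard deviation of the MD2 statistic of the reconstruction branch in the training set, respectively. $\mu_{\mathrm{MD}}^{\mathrm{pred}}$ and $\sigma_{\mathrm{MD}}^{\mathrm{pred}}$ denote the mean and standard deviation of the MD2 statistic of the prediction branch, respectively. $\mathrm{MD}_{z}^{2,\mathrm{rec}}$ and $\mathrm{MD}_{z}^{2,\mathrm{pred}}$ are the standardized MD2 statistics of the two branches.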
The final MD2 monitoring statistic is obtained as:
$$\mathrm{MD}^{2} = \omega_{\mathrm{rec}}\,\mathrm{MD}_{z}^{2,\mathrm{rec}} + \omega_{\mathrm{pred}}\,\mathrm{MD}_{z}^{2,\mathrm{pred}}$$
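The step-wise MD2 statistic, its window-level maximum, and the subsequent standardization and fusion can be sketched with NumPy as below. Each time step gets its own mean vector and covariance matrix, as in the definition above. The small ridge added to each covariance matrix before inversion is an assumption for numerical safety and is not part of that definition; the names `StepwiseMD2` and `fuse_md2` are hypothetical.

```python
import numpy as np

class StepwiseMD2:
    """Per-step Mahalanobis distance with a window-level maximum (hypothetical helper)."""

    def __init__(self, ridge=1e-6):
        self.ridge = ridge  # added to each covariance before inversion (assumption)

    def fit(self, E):
        # E: (N, T, m) training residuals, T = L (reconstruction) or H (prediction)
        self.mu = E.mean(axis=0)                              # mu_t, shape (T, m)
        T, m = self.mu.shape
        self.prec = np.empty((T, m, m))                       # inverse of Sigma_t
        for t in range(T):
            cov = np.cov(E[:, t, :], rowvar=False) + self.ridge * np.eye(m)
            self.prec[t] = np.linalg.inv(cov)
        return self

    def score(self, E):
        D = E - self.mu                                       # centered residuals
        md_t = np.einsum("ntm,tmk,ntk->nt", D, self.prec, D)  # instantaneous MD2
        return md_t.max(axis=1)                               # maximum over the window

def fuse_md2(md_rec, md_pred, train_stats, w_rec=0.6, w_pred=0.4):
    """Standardize each branch with training mean/std, then apply the fusion weights."""
    (mu_r, sd_r), (mu_p, sd_p) = train_stats
    return w_rec * (md_rec - mu_r) / sd_r + w_pred * (md_pred - mu_p) / sd_p

rng = np.random.default_rng(2)
E_rec = rng.normal(size=(500, 48, 5))   # stand-in reconstruction residuals, L = 48
E_pred = rng.normal(size=(500, 1, 5))   # stand-in prediction residuals, H = 1
rec_model = StepwiseMD2().fit(E_rec)
pred_model = StepwiseMD2().fit(E_pred)
md_rec, md_pred = rec_model.score(E_rec), pred_model.score(E_pred)
stats = ((md_rec.mean(), md_rec.std()), (md_pred.mean(), md_pred.std()))
md2_train = fuse_md2(md_rec, md_pred, stats)
```

Taking the maximum rather than the mean over the window means a single strongly abnormal step is enough to raise the window statistic, which matches the motivation given for this choice.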
After obtaining the monitoring statistics from the training set, kernel density estimation (KDE) is employed to model the probability distribution of the training statistics in a nonparametric manner, thereby avoiding prior assumptions about their distribution forms. Let $\{J_{k}\}_{k=1}^{n}$ denote a set of monitoring statistic values calculated from the training samples. The corresponding probability density function can be estimated as:
$$\hat{f}(J) = \frac{1}{nb} \sum_{k=1}^{n} K\left(\frac{J - J_{k}}{b}\right)$$
In Equation (48), $n$ is the number of training samples, $b$ is the bandwidth, and $K(\cdot)$ represents the kernel function. A Gaussian kernel is used in this work. Given a confidence level $\alpha$, the control limit $\delta$ is determined by:
$$P(J \le \delta) = \int_{-\infty}^{\delta} \hat{f}(u)\,\mathrm{d}u = \alpha$$
Here, J denotes the monitoring statistic to be modeled, which can be either the fused SPE or MD2, and δ represents the corresponding control limit under the confidence level α. The confidence level of the KDE-based control limit was set to α = 0.99, following the common practice in process monitoring studies where a 99% confidence limit is used to determine monitoring thresholds [37,38]. This setting corresponds to an approximate nominal false alarm probability of 1% under normal operating conditions, thereby helping to suppress unnecessary false alarms while preserving sufficient sensitivity to faults.
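A dependency-free sketch of the KDE control limit follows. It integrates the Gaussian-kernel estimate $\hat{f}$ in closed form through the error function and finds $\delta$ with $P(J \le \delta) = \alpha$ by bisection. The bandwidth rule $b = 1.06\,s\,n^{-1/5}$ (the normal-reference rule of thumb) is an assumption, because this section does not state how $b$ was chosen; `scipy.stats.gaussian_kde` offers a comparable estimator. The function name `kde_control_limit` is hypothetical.

```python
import math
import numpy as np

def kde_control_limit(J_train, alpha=0.99, bandwidth=None, tol=1e-8, max_iter=200):
    """Control limit delta with P(J <= delta) = alpha under a Gaussian-kernel KDE."""
    J = np.asarray(J_train, dtype=float).ravel()
    n = J.size
    # normal-reference rule of thumb for b (assumption; the text does not specify b)
    b = bandwidth if bandwidth is not None else 1.06 * J.std(ddof=1) * n ** (-0.2)

    def cdf(delta):
        # closed-form integral of (1/(n b)) * sum_k K((u - J_k)/b) from -inf to delta
        return sum(0.5 * (1.0 + math.erf((delta - jk) / (b * math.sqrt(2.0)))) for jk in J) / n

    lo, hi = J.min() - 10.0 * b, J.max() + 10.0 * b   # cdf(lo) ~ 0, cdf(hi) ~ 1
    for _ in range(max_iter):                          # bisection on the monotone CDF
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return hi

rng = np.random.default_rng(0)
J_train = rng.standard_normal(2000)   # stand-in for fused SPE or MD2 training values
delta = kde_control_limit(J_train)    # near the 99th percentile, slightly widened by smoothing
```

Because the Gaussian kernel adds variance $b^{2}$ to the estimated distribution, the KDE limit sits a little beyond the empirical 99th percentile, so slightly fewer than 1% of the training values exceed it.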
During testing, each sample is fed into the trained MI-CVAE model, and its fused SPE and MD2 statistics are compared with the control limits. A sample is identified as faulty if either $\mathrm{SPE} > \delta_{\mathrm{SPE}}$ or $\mathrm{MD}^{2} > \delta_{\mathrm{MD}}$; otherwise, it is regarded as normal.

4.3. Evaluation Metrics

The fault detection rate (FDR) and false alarm rate (FAR) were adopted as evaluation metrics. Based on the confusion matrix, TP denotes the number of fault samples correctly identified as abnormal, FN denotes the number of fault samples incorrectly classified as normal, FP denotes the number of normal samples incorrectly identified as abnormal, and TN denotes the number of normal samples correctly classified as normal. The two metrics are defined as follows:
$$\mathrm{FDR} = \frac{TP}{TP + FN} \times 100\%$$
$$\mathrm{FAR} = \frac{FP}{TN + FP} \times 100\%$$
A higher FDR indicates stronger capability in identifying fault samples, whereas a lower FAR indicates fewer false alarms under normal operating conditions. Therefore, a desirable fault detection method should achieve a high FDR while maintaining a low FAR.
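The decision rule and the two metrics can be sketched as follows. `detect` implements the either-statistic rule stated in Section 4.2, while `fdr_far` counts TP, FN, FP, and TN against known labels; applying `fdr_far` to a single thresholded statistic presumably gives per-statistic rates of the kind reported separately for SPE and MD2 in the result tables. Both function names are hypothetical.

```python
import numpy as np

def detect(spe, md2, delta_spe, delta_md):
    """Flag a sample as faulty if either fused statistic exceeds its control limit."""
    return (np.asarray(spe) > delta_spe) | (np.asarray(md2) > delta_md)

def fdr_far(alarms, is_fault):
    """Return (FDR %, FAR %) from boolean alarms and ground-truth fault labels."""
    alarms = np.asarray(alarms, dtype=bool)
    is_fault = np.asarray(is_fault, dtype=bool)
    tp = np.sum(alarms & is_fault)     # fault samples flagged as abnormal
    fn = np.sum(~alarms & is_fault)    # fault samples missed
    fp = np.sum(alarms & ~is_fault)    # normal samples flagged
    tn = np.sum(~alarms & ~is_fault)   # normal samples accepted
    return 100.0 * tp / (tp + fn), 100.0 * fp / (tn + fp)

# toy labels: 4 normal samples followed by 6 faulty samples
is_fault = np.array([False] * 4 + [True] * 6)
alarms = np.array([False, True, False, False, True, True, True, False, True, True])
fdr, far = fdr_far(alarms, is_fault)  # TP=5, FN=1, FP=1, TN=3 -> FDR ~83.3 %, FAR 25.0 %
```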

5. Case Studies

5.1. Case 1: BSM1

Before model training, grid search was performed for the sliding window length, latent dimension, memory module parameters, prediction branch parameters, and loss weights. In particular, the number of memory units K was selected by considering the trade-off between representation capacity and model complexity. To further analyze the influence of the number of memory units, a sensitivity analysis was conducted by testing K in {5, 10, 20, 30, 40} on the BSM1 dataset. As shown in Figure 8, the average FDR increased as K increased from 5 to 20, indicating that a larger memory bank can better represent the diversity of normal operating patterns. However, when K was further increased to 30 and 40, the improvement in FDR became marginal, while model complexity increased. Therefore, K = 20 was selected as a balanced setting considering both detection performance and false alarm control.
The optimal parameters were selected based on the cross-validation results. The final parameter settings are listed in Table 4.
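The loss weights in Table 4 enter the joint training objective of Step 2 in the detection procedure. A minimal NumPy sketch of that objective is given below, assuming mean-squared reconstruction and prediction losses and the standard closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior. The three memory-related terms (pull constraint, attention entropy, weight decay) are not defined in this section, so they are passed in as precomputed scalars; `total_loss` and its argument names are hypothetical.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims, batch-averaged."""
    return float(np.mean(-0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)))

def total_loss(x, x_hat, y, y_hat, mu, logvar, l_pull, l_ent, l_decay,
               lam_rec=1.0, lam_pred=0.8, beta=0.005,
               lam_pull=0.10, lam_ent=0.001, lam_decay=0.0001):
    """Weighted sum of the loss terms; default weights taken from Table 4."""
    rec = float(np.mean((x - x_hat) ** 2))   # reconstruction loss (MSE assumed)
    pred = float(np.mean((y - y_hat) ** 2))  # one-step prediction loss (MSE assumed)
    kl = gaussian_kl(mu, logvar)
    return (lam_rec * rec + lam_pred * pred + beta * kl
            + lam_pull * l_pull + lam_ent * l_ent + lam_decay * l_decay)

rng = np.random.default_rng(3)
x = rng.normal(size=(32, 48, 15))
x_hat = x + 0.1 * rng.normal(size=x.shape)
y = rng.normal(size=(32, 15))
y_hat = y + 0.1 * rng.normal(size=y.shape)
mu = 0.1 * rng.normal(size=(32, 32))
logvar = np.zeros((32, 32))
loss = total_loss(x, x_hat, y, y_hat, mu, logvar, l_pull=0.5, l_ent=1.0, l_decay=2.0)
```

The small KL weight (β = 0.005) keeps the latent regularization from overwhelming the reconstruction and prediction terms, which carry weights of 1.0 and 0.8.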
Table 5 and Table 6 present the fault detection results of different methods on the BSM1 dataset, and the corresponding monitoring curves of MI-CVAE are shown in Figure 9 and Figure 10. In addition to the ablated variants, AE-Transformer and LSTM-GAN are included as representative deep learning baselines. The proposed MI-CVAE achieves the best average performance under both monitoring statistics. Specifically, the average FDRs based on SPE and MD2 are 97.1% and 98.9%, respectively, while the corresponding average FARs are 3.4% and 3.8%. Relative to AE-Transformer and LSTM-GAN, MI-CVAE increases the average SPE-based FDR by 10.7 and 18.0 percentage points, respectively. For the MD2 statistic, the corresponding gains are 3.8 and 7.0 percentage points. These results suggest that the proposed framework learns more discriminative fault representations than models relying only on Transformer-based temporal modeling or GAN-based distribution learning.
For Faults 1, 5, and 8, most methods achieve high detection rates, suggesting that these faults cause obvious deviations from normal operating conditions. MI-CVAE reaches 100.0% FDR for these three faults under both statistics. Faults 5 and 8 are directly related to dissolved oxygen control or sensor failure, which can induce pronounced changes in the monitored variables and are therefore relatively easy to detect. As shown in Figure 9 and Figure 10, the SPE and MD2 statistics increase rapidly after fault occurrence and remain above the control limits for most faulty samples. The advantages of MI-CVAE are more evident for Faults 4, 6, and 7. Fault 4 is caused by an abnormal increase in the nitrate actuator output signal, which may affect the system state through the control loop and internal recirculation. Such abnormal variations are not always sufficiently reflected by reconstruction errors alone. Under SPE, the FDR of CVAE for Fault 4 is only 36.9%, whereas MI-CVAE increases it to 89.9%. Under MD2, the FDR is further improved from 45.6% to 94.1%. This improvement indicates that the proposed model is more sensitive to actuator-related dynamic disturbances. The periodic peaks in the monitoring curves also show that MI-CVAE can capture repeated abnormal deviations caused by this fault. Faults 6 and 7, corresponding to dissolved oxygen sensor bias and drift, are also difficult to detect because their effects may be partially compensated by the feedback control system, resulting in weak or gradual abnormal changes. For Fault 6, the SPE-based FDR of CVAE is only 8.2%, while MI-CVAE increases it to 96.3%; under MD2, MI-CVAE further reaches 100.0%. For Fault 7, MI-CVAE achieves FDRs of 94.4% and 100.0% under SPE and MD2, respectively. These results indicate that the prediction branch effectively captures abnormal temporal evolution, while the memory module strengthens the distinction between normal fluctuations and fault-induced deviations. 
Thus, MI-CVAE shows clear advantages in detecting weak sensor abnormalities and slow dynamic deviations.
The FAR results in Table 6 further demonstrate the robustness of the proposed method. MI-CVAE obtains the lowest average FARs under SPE and MD2, with values of 3.4% and 3.8%, respectively. Although MD2 is generally more sensitive to subtle distributional shifts, it does not cause excessive false alarms in MI-CVAE. For Faults 2 and 3, the MD2-based FARs are both 0.0%, while the corresponding FDRs remain above 97%, indicating a good balance between detection sensitivity and false alarm control. These results confirm that MI-CVAE has clear advantages in detecting weak disturbances, sensor bias and drift faults, and actuator-related dynamic faults, while maintaining stable performance for faults with more distinct abnormal patterns.

5.2. Case 2: Papermaking Process Monitoring Dataset

As described in Section 2.2, the papermaking process monitoring dataset contains 442 samples. In this experiment, the dataset was divided into training and test sets at a ratio of 6:4. The training set contains 265 samples collected under normal operating conditions, while the test set contains 177 samples, the last 150 of which correspond to faulty operation. Although this dataset is relatively small owing to the practical constraints of industrial data acquisition, it serves as a real industrial case for examining the applicability of the proposed method. The evaluation is further supported by the standard BSM1 benchmark dataset, and the conclusions are drawn from the combined results of both datasets.
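The split arithmetic can be checked in a few lines of Python. Assuming that FDR and FAR for each papermaking fault scenario are computed over these 150 faulty and 27 normal test samples, each false alarm changes FAR by about 3.7 percentage points and each missed fault sample changes FDR by about 0.67 percentage points. This is consistent with the granularity of the per-fault values in Table 7 and Table 8; for example, 136 detected fault samples out of 150 gives 90.7%.

```python
n_total, n_fault_test = 442, 150
n_train = n_total * 6 // 10            # 6:4 split -> 265 normal training samples
n_test = n_total - n_train             # 177 test samples
n_normal_test = n_test - n_fault_test  # 27 normal samples precede the faults
far_step = 100 / n_normal_test         # FAR change per false alarm, about 3.70 points
fdr_step = 100 / n_fault_test          # FDR change per missed fault sample, about 0.67 points

print(n_train, n_test, n_normal_test)                # 265 177 27
print(round(far_step, 1), round(136 * fdr_step, 1))  # 3.7 90.7
```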
Table 7 and Table 8 present the detection results of different methods on the papermaking dataset, and the corresponding monitoring curves of MI-CVAE are shown in Figure 11 and Figure 12. MI-CVAE achieves the highest average detection performance among all evaluated methods. Under the SPE statistic, its average FDR reaches 94.8%, exceeding those of CVAE, CVAE-Memory, CVAE-Informer, AE-Transformer, LSTM-GAN, and MI-VAE by 21.0, 14.0, 6.2, 3.9, 10.0, and 3.4 percentage points, respectively. Under the MD2 statistic, the average FDR reaches 96.1%, exceeding the same methods by 18.0, 9.3, 5.3, 3.0, 7.2, and 3.1 percentage points, respectively. Meanwhile, the average FARs are only 0.0% and 1.1%, indicating that the improvement in detection sensitivity is not achieved at the expense of excessive false alarms.
The performance differences can be understood from the fault characteristics and model structures. For Faults 1 and 2, the abnormal information is mainly reflected in relatively evident changes in variables related to material flow and consistency, which can be partially captured by reconstruction-based methods. Beyond this, MI-CVAE uses memory-enhanced normal prototypes to strengthen the distinction between normal fluctuations and fault-induced deviations. For Fault 4, the cyclic pressure disturbance may be mixed with normal pressure variations, so its abnormality is not always prominent in reconstruction errors. In this case, CVAE and CVAE-Memory are limited because they do not explicitly model whether the process state evolves according to normal temporal patterns. By introducing the prediction branch, MI-CVAE can better capture such abnormal dynamic evolution. For Faults 5–7, the faults involve coupled changes in variables related to vacuum, drying temperature, and steam pressure. The integration of reconstruction modeling, memory representation, and temporal prediction enables MI-CVAE to describe both variable deviations and dynamic correlation changes, leading to more stable detection performance on the papermaking dataset.

6. Conclusions and Limitations

6.1. Conclusions

This study proposed an unsupervised fault detection framework named MI-CVAE for nonlinear and dynamic industrial processes. By incorporating condition information, memory-constrained reconstruction, and Informer-based temporal prediction into a unified framework, the proposed method improves the representation of normal operating patterns and enhances the detection sensitivity to weak and dynamic faults.
The effectiveness of MI-CVAE was validated on the BSM1 wastewater treatment benchmark and a real papermaking process dataset. On the BSM1 dataset, MI-CVAE achieved average FDR values of 97.1% and 98.9% under the SPE and MD2 statistics, respectively, while maintaining average FAR values of 3.4% and 3.8%. On the papermaking dataset, MI-CVAE obtained average FDR values of 94.8% and 96.1% under the SPE and MD2 statistics, respectively, with average FAR values of 0.0% and 1.1%. These results demonstrate that MI-CVAE achieves more stable detection performance than the comparison methods, especially for weak and dynamic faults, without causing excessive false alarms.

6.2. Limitations and Future Work

Despite these encouraging results, several limitations should be acknowledged. Due to practical constraints in industrial data acquisition, the real papermaking dataset used in this study is relatively limited in size. Model adaptability to evolving operating conditions also requires further consideration. When intentional process improvements or operational adjustments change the statistical distribution of normal data, new normal patterns outside the training data may be temporarily identified as faults. The current MI-CVAE framework supports offline retraining with newly collected normal operating data, whereas online adaptive updating has not yet been fully implemented. Another limitation lies in its data-driven nature, as physical mechanisms of the monitored process are not explicitly incorporated, leaving room for further improvement in interpretability. Since FDR and FAR are mainly reported as point estimates in this study, systematic uncertainty analysis should also be further strengthened.
Future work will be carried out from the following aspects. More papermaking process data under broader operating conditions will be collected to extend the validation scope of MI-CVAE. To improve its adaptability to evolving industrial processes, retraining and adaptive updating strategies will be further investigated. The integration of MI-CVAE with physical models and signal processing features will also be explored to enhance model interpretability and dynamic feature representation. In addition, statistical uncertainty analysis based on repeated experiments with different random seeds, confidence interval estimation, and bootstrap-based evaluation will be introduced to quantify the uncertainty of FDR and FAR more comprehensively.

Author Contributions

Methodology, L.W.; data collection, X.W.; supervision, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Shandong Provincial Natural Science Foundation, China (ZR2021MF135) and Natural Science Foundation of Jiangsu Provincial Universities, China (22KJA530003).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from a third party and are available from the authors with the permission of the third party.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Venkatasubramanian, V.; Rengaswamy, R.; Yin, K.; Kavuri, S.N. A review of process fault detection and diagnosis: Part I: Quantitative model-based methods. Comput. Chem. Eng. 2003, 27, 293–311.
2. Isermann, R. Model-based fault-detection and diagnosis—Status and applications. Annu. Rev. Control 2005, 29, 71–85.
3. Qin, S.J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 2012, 36, 220–234.
4. Teppola, P.; Mujunen, S.-P.; Minkkinen, P.; Puijola, T.; Pursiheimo, P. Principal component analysis, contribution plots and feature weights in the monitoring of sequential process data from a paper machine’s wet end. Chemom. Intell. Lab. Syst. 1998, 44, 307–317.
5. Godoy, J.L.; Vega, J.R.; Marchetti, J.L. A fault detection and diagnosis technique for multivariate processes using a PLS-decomposition of the measurement space. Chemom. Intell. Lab. Syst. 2013, 128, 25–36.
6. Lee, J.-M.; Yoo, C.; Lee, I.-B. Statistical process monitoring with independent component analysis. J. Process Control 2004, 14, 467–485.
7. Russell, E.L.; Chiang, L.H.; Braatz, R.D. Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis. Chemom. Intell. Lab. Syst. 2000, 51, 81–93.
8. MacGregor, J.F.; Kourti, T. Statistical process control of multivariate processes. Control Eng. Pract. 1995, 3, 403–414.
9. He, Q.P.; Wang, J. Fault Detection Using the k-Nearest Neighbor Rule for Semiconductor Manufacturing Processes. IEEE Trans. Semicond. Manuf. 2007, 20, 345–354.
10. Pang, G.; Shen, C.; Cao, L.; Hengel, A.V.D. Deep Learning for Anomaly Detection: A Review. ACM Comput. Surv. 2021, 54, 1–38.
11. Yu, W.; Zhao, C.; Huang, B. MoniNet with Concurrent Analytics of Temporal and Spatial Information for Fault Detection in Industrial Processes. IEEE Trans. Cybern. 2022, 52, 8340–8351.
12. Lomov, I.; Lyubimov, M.; Makarov, I.; Zhukov, L.E. Fault detection in Tennessee Eastman process with temporal deep learning models. J. Ind. Inf. Integr. 2021, 23, 100216.
13. Sun, W.; Paiva, A.R.C.; Xu, P.; Sundaram, A.; Braatz, R.D. Fault detection and identification using Bayesian recurrent neural networks. Comput. Chem. Eng. 2020, 141, 106991.
14. Chen, S.; Yu, J. Deep recurrent neural network-based residual control chart for autocorrelated processes. Qual. Reliab. Eng. Int. 2019, 35, 2687–2708.
15. Kumar, S.R.; Devakumar, J. Recurrent neural network based sensor fault detection and isolation for nonlinear systems: Application in PWR. Prog. Nucl. Energy 2023, 163, 104836.
16. Pang, C.; Duan, D.; Zhou, Z.; Han, S.; Yao, L.; Zheng, C.; Yang, J.; Gao, X. An integrated LSTM-AM and SPRT method for fault early detection of forced-oxidation system in wet flue gas desulfurization. Process Saf. Environ. Prot. 2022, 160, 242–254.
17. Brito, L.C.; Susto, G.A.; Brito, J.N.; Duarte, M.A.V. An explainable artificial intelligence approach for unsupervised fault detection and diagnosis in rotating machinery. Mech. Syst. Signal Process. 2022, 163, 108105.
18. Zeng, L.; Jin, Q.; Lin, Z.; Zheng, C.; Wu, Y.; Wu, X.; Gao, X. Dual-attention LSTM autoencoder for fault detection in industrial complex dynamic processes. Process Saf. Environ. Prot. 2024, 185, 1145–1159.
19. El Mokhtari, K.; McArthur, J.J. Autoencoder-Based fault detection using building automation system data. Adv. Eng. Inf. 2024, 62, 102810.
20. Qian, J.; Song, Z.; Yao, Y.; Zhu, Z.; Zhang, X. A review on autoencoder based representation learning for fault detection and diagnosis in industrial processes. Chemom. Intell. Lab. Syst. 2022, 231, 104711.
21. Yu, J.; Zheng, X.; Wang, S. A deep autoencoder feature learning method for process pattern recognition. J. Process Control 2019, 79, 1–15.
22. Li, J.; Yan, X. Process monitoring using principal component analysis and stacked autoencoder for linear and nonlinear coexisting industrial processes. J. Taiwan Inst. Chem. Eng. 2020, 112, 322–329.
23. Tan, S.; Zhou, X.; Shi, H.; Song, B. Adaptive slow feature analysis—Sparse autoencoder based fault detection for time-varying processes. J. Taiwan Inst. Chem. Eng. 2023, 142, 104599.
24. Ba-Alawi, A.H.; Vilela, P.; Loy-Benitez, J.; Heo, S.; Yoo, C. Intelligent sensor validation for sustainable influent quality monitoring in wastewater treatment plants using stacked denoising autoencoders. J. Water Process Eng. 2021, 43, 102206.
25. Peng, C.; Kai, W.; Kun, Z.; Fanchao, M. Monitoring of wastewater treatment process based on multi-stage variational autoencoder. Expert Syst. Appl. 2022, 207, 117919.
26. Spigler, G. Denoising Autoencoders for Overgeneralization in Neural Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 998–1004.
27. Ma, Z.L.; Li, X.J.; Nian, F.Q. An Interpretable Fault Detection Approach for Industrial Processes Based on Improved Autoencoder. IEEE Trans. Instrum. Meas. 2025, 74, 3518813.
28. Gong, D.; Liu, L.; Le, V.; Saha, B.; Mansour, M.R.; Venkatesh, S.; van den Hengel, A. Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1705–1714.
29. Li, G.; Ge, M.; Wan, J.; Han, D.; Li, M.; Zhou, M. MemMambaAD: Memory-augmented state space model for multivariate time series anomaly detection. Eng. Appl. Artif. Intell. 2025, 158, 111308.
30. Dong, Y.; Qin, S.J. A novel dynamic PCA algorithm for dynamic data modeling and process monitoring. J. Process Control 2018, 67, 1–11.
31. Choi, S.W.; Martin, E.B.; Morris, A.J.; Lee, I.-B. Adaptive Multivariate Statistical Process Control for Monitoring Time-Varying Processes. Ind. Eng. Chem. Res. 2006, 45, 3108–3118.
32. Zhao, C.; Sun, H. Dynamic Distributed Monitoring Strategy for Large-Scale Nonstationary Processes Subject to Frequently Varying Conditions Under Closed-Loop Control. IEEE Trans. Ind. Electron. 2019, 66, 4749–4758.
33. Zhao, C. Perspectives on nonstationary process monitoring in the era of industrial artificial intelligence. J. Process Control 2022, 116, 255–272.
34. Alex, J.; Benedetti, L.; Copp, J.; Gernaey, K.; Jeppsson, U.; Nopens, I.; Pons, M.-N.; Rieger, L.; Rosen, C.; Steyer, J. Benchmark Simulation Model No. 1 (BSM1); Report by the IWA Taskgroup on benchmarking of control strategies for WWTPs; Lund University: Lund, Sweden, 2008; Volume 1.
35. Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114.
36. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; pp. 11106–11115.
37. Cai, L.; Tian, X.; Chen, S. Monitoring Nonlinear and Non-Gaussian Processes Using Gaussian Mixture Model-Based Weighted Kernel Independent Component Analysis. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 122–135.
38. Deng, X.; Cai, P.; Cao, Y.; Wang, P. Two-Step Localized Kernel Principal Component Analysis Based Incipient Fault Diagnosis for Nonlinear Industrial Processes. Ind. Eng. Chem. Res. 2020, 59, 5956–5968.
Figure 1. Sewage treatment process BSM1 benchmark platform.
Figure 2. Sample data of fault 1 in BSM1.
Figure 3. Sample data of fault 1 in papermaking dataset.
Figure 4. Structure of the conditional variational autoencoder.
Figure 5. Structure of probabilistic sparse self-attention.
Figure 6. Structure of the MI-CVAE network.
Figure 7. Flowchart of the MI-CVAE fault detection framework.
Figure 8. Sensitivity analysis of the number of memory units K on the BSM1 dataset: (a) Average FDR under different K values; (b) Average FAR under different K values.
Figure 9. SPE monitoring results of MI-CVAE on the BSM1 dataset.
Figure 10. MD2 monitoring results of MI-CVAE on the BSM1 dataset.
Figure 11. Fault detection results of MI-CVAE using SPE on the papermaking dataset.
Figure 12. Fault detection results of MI-CVAE using MD2 on the papermaking dataset.
Table 1. Variables for modeling and monitoring in BSM1.

| No. | Variable | Description | Sampling Location | Unit |
|---|---|---|---|---|
| 1 | $Q_{in}$ | Influent flow rate | Influent of wastewater | m³/day |
| 2 | $S_{NH,in}$ | Influent ammonia nitrogen concentration | Influent of wastewater | mg N/m³ |
| 3 | $S_{NO,2}$ | Nitrite concentration | Reactor 2 | mg N/m³ |
| 4 | $S_{O,3}$ | Dissolved oxygen concentration | Reactor 3 | mg COD/m³ |
| 5 | $S_{O,4}$ | Dissolved oxygen concentration | Reactor 4 | mg COD/m³ |
| 6 | $S_{O,5}$ | Dissolved oxygen concentration | Reactor 5 | mg COD/m³ |
| 7 | $TSS_{4}$ | Total suspended solids | Reactor 4 | mg SS/m³ |
| 8 | $TSS_{5}$ | Total suspended solids | Reactor 5 | mg SS/m³ |
| 9 | $TSS_{e}$ | Total suspended solids | External recirculation | mg SS/m³ |
| 10 | $TSS_{w}$ | Total suspended solids | Effluent of wastewater | mg SS/m³ |
| 11 | $TSS_{r}$ | Total suspended solids | Internal recirculation | mg SS/m³ |
| 12 | $K_{L}a_{5}$ | Oxygen transfer coefficient | Reactor 5 | 1/day |
| 13 | $Q_{intr}$ | Internal recirculation flow rate | Effluent of reactor 5 | m³/day |
| 14 | $S_{NH,eff}$ | Effluent ammonia nitrogen concentration | Effluent of wastewater | mg N/m³ |
| 15 | $S_{NO,eff}$ | Effluent nitrite concentration | Effluent of wastewater | mg N/m³ |
Table 2. Types of faults of the BSM1 model.

| Fault No. | Fault Description | Simulation Mode |
|---|---|---|
| 1 | Decrease the maximum specific growth rate of autotrophic bacteria | Step plus ramp compound disturbance |
| 2 | Decrease the maximum specific growth rate of heterotrophic bacteria | Step plus ramp compound disturbance |
| 3 | Decrease the settling velocity of the secondary clarifier | Step plus ramp compound disturbance |
| 4 | Increase the output signal of the nitrate actuator | Step plus ramp compound disturbance |
| 5 | Change the setpoint of the DO controller | Step |
| 6 | Bias fault of the DO sensor | Bias |
| 7 | Drift fault of the DO sensor | Drift |
| 8 | Complete failure of the DO sensor | Constant value |
Table 3. Fault scenarios and affected variables in the papermaking dataset.

| Fault No. | Section | Fault Name | Fault Type | Affected Variables |
|---|---|---|---|---|
| 1 | Approach flow section | High-consistency pulp blockage | Drift | High-consistency pulp flow rate, headbox pressure, headbox stock consistency |
| 2 | Approach flow section | Dilution-water screen failure | Scale up | Dilution-water screen inlet pressure, white-water consistency |
| 3 | Wire section | Vacuum system leakage | Drift | Bottom-/top-wire vacuum-box level |
| 4 | Press section | First-press shoe overpressure | Cycle | First-press shoe loading-zone oil pressure, shoe internal pressure, edge-pressure ratio |
| 5 | Press section | Insufficient vacuum dewatering | Drift | Suction-box vacuum pressure, transfer-felt suction-roll vacuum level |
| 6 | Drying section | Exhaust system failure | Scale down | Exhaust-air temperature, supply-air temperature |
| 7 | Drying section | Steam pressure fluctuation | Drift | Inlet steam flow to the dryer drum, drum steam pressure |
Table 4. Parameter configuration for the MI-CVAE model.

| Parameter Name | Symbol or Variable | Parameter Setting |
|---|---|---|
| Sliding window length | $L$ | 48 |
| CVAE latent dimension | $d$ | 32 |
| Number of memory units | $K$ | 20 |
| Memory module temperature coefficient | $\tau$ | 0.10 |
| Memory fusion coefficient | $\alpha$ | 0.25 |
| Reconstruction loss weight | $\lambda_{\mathrm{rec}}$ | 1.0 |
| Prediction loss weight | $\lambda_{\mathrm{pred}}$ | 0.8 |
| KL divergence weight | $\beta$ | 0.005 |
| Memory pull constraint weight | $\lambda_{\mathrm{pull}}$ | 0.10 |
| Attention entropy constraint weight | $\lambda_{\mathrm{ent}}$ | 0.001 |
| Memory weight decay coefficient | $\lambda_{\mathrm{decay}}$ | 0.0001 |
| Optimizer | / | Adam |
| Learning rate | Lr | 0.001 |
| Batch size | BATCH_SIZE | 32 |
| Number of training epochs | EPOCHS | 120 |
| Reconstruction branch fusion weight | $\omega_{\mathrm{rec}}$ | 0.6 |
| Prediction branch fusion weight | $\omega_{\mathrm{pred}}$ | 0.4 |
Table 5. Comparison of FDR (%) for different methods on the BSM1 dataset.
Table 5. Comparison of FDR (%) for different methods on the BSM1 dataset.
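Method | Statistic | Fault 1 | Fault 2 | Fault 3 | Fault 4 | Fault 5 | Fault 6 | Fault 7 | Fault 8 | Average FDR
CVAE | SPE | 95.7 | 81.1 | 54.5 | 36.9 | 100.0 | 8.2 | 45.9 | 100.0 | 65.3
CVAE | MD² | 100.0 | 94.8 | 97.1 | 45.6 | 100.0 | 64.2 | 92.2 | 100.0 | 86.7
CVAE-Memory | SPE | 98.9 | 87.5 | 78.1 | 39.3 | 100.0 | 14.2 | 58.4 | 100.0 | 72.1
CVAE-Memory | MD² | 100.0 | 94.2 | 97.9 | 46.2 | 100.0 | 63.4 | 93.1 | 100.0 | 86.9
CVAE-Informer | SPE | 98.6 | 91.4 | 87.1 | 71.8 | 100.0 | 43.1 | 74.6 | 100.0 | 83.3
CVAE-Informer | MD² | 100.0 | 94.7 | 97.6 | 80.4 | 100.0 | 80.4 | 96.7 | 100.0 | 93.7
AE-Transformer | SPE | 99.0 | 92.5 | 89.3 | 76.8 | 100.0 | 54.6 | 79.0 | 100.0 | 86.4
AE-Transformer | MD² | 100.0 | 96.0 | 98.5 | 83.6 | 100.0 | 85.4 | 97.2 | 100.0 | 95.1
LSTM-GAN | SPE | 97.5 | 88.0 | 82.6 | 61.5 | 99.5 | 38.5 | 64.8 | 100.0 | 79.1
LSTM-GAN | MD² | 100.0 | 94.0 | 96.0 | 74.5 | 100.0 | 74.0 | 96.5 | 100.0 | 91.9
MI-VAE | SPE | 99.5 | 95.2 | 94.3 | 81.3 | 100.0 | 68.9 | 85.2 | 100.0 | 90.6
MI-VAE | MD² | 100.0 | 97.1 | 99.0 | 88.0 | 100.0 | 91.9 | 99.5 | 100.0 | 96.9
MI-CVAE | SPE | 100.0 | 97.0 | 99.5 | 89.9 | 100.0 | 96.3 | 94.4 | 100.0 | 97.1
MI-CVAE | MD² | 100.0 | 97.4 | 99.8 | 94.1 | 100.0 | 100.0 | 100.0 | 100.0 | 98.9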
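Tables 5 to 8 report every method twice, once for the SPE statistic and once for MD². The abstract states that both statistics are built from the fused reconstruction and prediction residuals and that their control limits come from kernel density estimation. A minimal, standard-library-only sketch of that pipeline follows. The helper names, the Gaussian kernel with Silverman's rule-of-thumb bandwidth, and the 99% confidence level are assumptions, not settings stated in this section. In practice one would typically use NumPy and `scipy.stats.gaussian_kde` instead.

```python
import math
import statistics


def spe(r):
    """Squared prediction error: squared Euclidean norm of a fused residual vector."""
    return sum(x * x for x in r)


def mean_cov(rows):
    """Sample mean vector and covariance matrix (n - 1 denominator) of residual rows."""
    n, k = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(k)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(k)] for i in range(k)]
    return mean, cov


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(n):
            if i != c:
                f = m[i][c] / m[c][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def md2(r, mean, cov):
    """Squared Mahalanobis distance (r - mean)^T cov^-1 (r - mean)."""
    d = [x - mu for x, mu in zip(r, mean)]
    return sum(x * y for x, y in zip(d, solve(cov, d)))


def kde_limit(values, confidence=0.99):
    """Value at which a Gaussian KDE (Silverman bandwidth) fitted to normal-data
    statistics reaches the requested cumulative probability."""
    n = len(values)
    h = 1.06 * statistics.stdev(values) * n ** (-1 / 5)

    def cdf(x):
        return sum(0.5 * (1 + math.erf((x - v) / (h * math.sqrt(2)))) for v in values) / n

    lo, hi = min(values) - 10 * h, max(values) + 10 * h
    for _ in range(100):  # bisection works because the KDE CDF is monotone increasing
        mid = (lo + hi) / 2
        if cdf(mid) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Toy usage: hypothetical normal-operation fused residuals for two variables.
train = [[0.1, -0.2], [0.0, 0.1], [-0.1, 0.0], [0.2, 0.1], [-0.2, -0.1], [0.05, 0.15]]
mean, cov = mean_cov(train)
spe_limit = kde_limit([spe(r) for r in train])
md2_limit = kde_limit([md2(r, mean, cov) for r in train])
sample = [1.5, -1.2]
print(spe(sample) > spe_limit, md2(sample, mean, cov) > md2_limit)
```

Unlike a chi-squared or F-distribution limit, the KDE limit makes no Gaussian assumption about the residuals. This is a common reason to prefer it for statistics computed from nonlinear models.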
Table 6. Comparison of FAR (%) for different methods on the BSM1 dataset.
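Method | Statistic | Fault 1 | Fault 2 | Fault 3 | Fault 4 | Fault 5 | Fault 6 | Fault 7 | Fault 8 | Average FAR
CVAE | SPE | 3.9 | 3.9 | 3.9 | 3.9 | 7.0 | 8.1 | 2.5 | 7.7 | 5.1
CVAE | MD² | 4.6 | 2.1 | 2.5 | 5.6 | 5.6 | 6.3 | 12.7 | 6.3 | 5.7
CVAE-Memory | SPE | 6.7 | 4.6 | 4.6 | 7.7 | 8.1 | 8.8 | 2.5 | 8.8 | 6.5
CVAE-Memory | MD² | 4.9 | 2.5 | 2.5 | 6.0 | 6.0 | 6.7 | 15.5 | 6.7 | 6.4
CVAE-Informer | SPE | 4.6 | 2.8 | 2.5 | 5.6 | 6.0 | 6.3 | 1.1 | 6.0 | 4.4
CVAE-Informer | MD² | 4.2 | 1.4 | 1.4 | 5.3 | 5.3 | 5.6 | 3.2 | 5.3 | 4.0
AE-Transformer | SPE | 4.2 | 2.1 | 1.8 | 5.0 | 5.5 | 6.0 | 1.4 | 5.7 | 4.0
AE-Transformer | MD² | 4.0 | 1.2 | 1.0 | 5.4 | 5.2 | 4.6 | 4.0 | 5.4 | 3.9
LSTM-GAN | SPE | 5.0 | 3.2 | 3.0 | 6.2 | 6.5 | 7.0 | 2.0 | 6.8 | 5.0
LSTM-GAN | MD² | 4.8 | 2.8 | 2.8 | 6.0 | 6.2 | 6.5 | 5.8 | 6.2 | 5.1
MI-VAE | SPE | 3.9 | 1.1 | 1.1 | 4.9 | 5.3 | 5.6 | 0.7 | 5.3 | 3.5
MI-VAE | MD² | 3.9 | 5.3 | 1.1 | 1.1 | 4.9 | 5.3 | 5.3 | 6.0 | 4.1
MI-CVAE | SPE | 4.4 | 0.4 | 0.4 | 4.8 | 5.2 | 5.6 | 0.4 | 5.6 | 3.4
MI-CVAE | MD² | 4.0 | 0.0 | 0.0 | 5.2 | 4.8 | 5.2 | 6.0 | 5.2 | 3.8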
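This part of the article does not restate the FDR and FAR definitions. Assuming the standard ones, FDR is the percentage of faulty test samples whose statistic exceeds its control limit, and FAR is the percentage of normal test samples that do. The sketch below computes both for one fault scenario; the function name and toy data are hypothetical. The second part checks that the Average column is the arithmetic mean of the per-fault values. For the MI-CVAE SPE rows of Tables 5 and 6, the exact means are 97.1375 and 3.35, reported as 97.1 and 3.4.

```python
import statistics


def detection_rates(stats, limit, is_fault):
    """Return (FDR, FAR) in percent for one fault scenario.

    FDR: share of faulty samples whose statistic exceeds the control limit.
    FAR: share of normal samples whose statistic exceeds the control limit.
    """
    alarms = [s > limit for s in stats]
    on_faulty = [a for a, f in zip(alarms, is_fault) if f]
    on_normal = [a for a, f in zip(alarms, is_fault) if not f]
    fdr = 100.0 * sum(on_faulty) / len(on_faulty)
    far = 100.0 * sum(on_normal) / len(on_normal)
    return fdr, far


# Toy scenario: two normal samples followed by three faulty ones.
fdr, far = detection_rates([0.5, 1.2, 2.0, 0.8, 1.5], limit=1.0,
                           is_fault=[False, False, True, True, True])
print(round(fdr, 1), far)

# The "Average" columns are arithmetic means over faults, e.g. MI-CVAE (SPE):
mi_cvae_spe_fdr = [100.0, 97.0, 99.5, 89.9, 100.0, 96.3, 94.4, 100.0]  # Table 5
mi_cvae_spe_far = [4.4, 0.4, 0.4, 4.8, 5.2, 5.6, 0.4, 5.6]              # Table 6
print(statistics.fmean(mi_cvae_spe_fdr), statistics.fmean(mi_cvae_spe_far))
```

The same averaging reproduces every Average entry in Tables 5 to 8 to within rounding.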
Table 7. Comparison of FDR (%) for different methods on the papermaking dataset.
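Method | Statistic | Fault 1 | Fault 2 | Fault 3 | Fault 4 | Fault 5 | Fault 6 | Fault 7 | Average FDR
CVAE | SPE | 71.3 | 82.7 | 68.7 | 74.0 | 79.3 | 63.3 | 77.3 | 73.8
CVAE | MD² | 76.0 | 88.7 | 73.3 | 81.3 | 85.3 | 64.0 | 78.0 | 78.1
CVAE-Memory | SPE | 75.3 | 82.0 | 73.3 | 78.7 | 79.3 | 87.3 | 90.0 | 80.8
CVAE-Memory | MD² | 85.3 | 76.0 | 82.0 | 89.3 | 90.7 | 91.3 | 93.3 | 86.8
CVAE-Informer | SPE | 87.3 | 90.7 | 85.3 | 92.7 | 84.7 | 85.3 | 94.0 | 88.6
CVAE-Informer | MD² | 89.3 | 92.0 | 88.7 | 95.3 | 90.0 | 89.3 | 91.3 | 90.8
AE-Transformer | SPE | 89.3 | 93.3 | 88.0 | 93.3 | 88.7 | 90.7 | 92.7 | 90.9
AE-Transformer | MD² | 91.3 | 96.0 | 90.7 | 96.7 | 92.0 | 91.3 | 94.0 | 93.1
LSTM-GAN | SPE | 84.0 | 89.3 | 82.7 | 88.0 | 82.7 | 82.0 | 84.7 | 84.8
LSTM-GAN | MD² | 86.7 | 92.0 | 87.3 | 92.7 | 88.0 | 86.7 | 88.7 | 88.9
MI-VAE | SPE | 90.0 | 96.0 | 87.3 | 92.0 | 92.7 | 91.3 | 90.7 | 91.4
MI-VAE | MD² | 91.3 | 98.0 | 92.0 | 94.0 | 94.7 | 88.0 | 93.3 | 93.0
MI-CVAE | SPE | 90.7 | 99.3 | 90.0 | 98.7 | 94.7 | 94.0 | 96.0 | 94.8
MI-CVAE | MD² | 92.7 | 100.0 | 94.7 | 100.0 | 94.0 | 95.3 | 96.0 | 96.1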
Table 8. Comparison of FAR (%) for different methods on the papermaking dataset.
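Method | Statistic | Fault 1 | Fault 2 | Fault 3 | Fault 4 | Fault 5 | Fault 6 | Fault 7 | Average FAR
CVAE | SPE | 11.1 | 3.7 | 7.4 | 14.8 | 0.0 | 18.5 | 3.7 | 8.5
CVAE | MD² | 14.8 | 7.4 | 11.1 | 18.5 | 3.7 | 22.2 | 7.4 | 12.2
CVAE-Memory | SPE | 7.4 | 3.7 | 7.4 | 11.1 | 0.0 | 14.8 | 3.7 | 6.9
CVAE-Memory | MD² | 11.1 | 3.7 | 7.4 | 14.8 | 3.7 | 18.5 | 7.4 | 9.5
CVAE-Informer | SPE | 3.7 | 0.0 | 3.7 | 7.4 | 0.0 | 11.1 | 0.0 | 3.7
CVAE-Informer | MD² | 7.4 | 0.0 | 3.7 | 11.1 | 0.0 | 14.8 | 0.0 | 5.3
AE-Transformer | SPE | 3.7 | 0.0 | 3.7 | 7.4 | 0.0 | 7.4 | 0.0 | 3.2
AE-Transformer | MD² | 3.7 | 0.0 | 0.0 | 7.4 | 0.0 | 11.1 | 0.0 | 3.2
LSTM-GAN | SPE | 7.4 | 3.7 | 7.4 | 11.1 | 0.0 | 11.1 | 3.7 | 6.3
LSTM-GAN | MD² | 0.0 | 3.7 | 0.0 | 11.1 | 3.7 | 14.8 | 7.4 | 5.8
MI-VAE | SPE | 3.7 | 0.0 | 0.0 | 3.7 | 0.0 | 7.4 | 0.0 | 2.1
MI-VAE | MD² | 3.7 | 0.0 | 0.0 | 7.4 | 0.0 | 0.0 | 3.7 | 2.1
MI-CVAE | SPE | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
MI-CVAE | MD² | 0.0 | 3.7 | 0.0 | 3.7 | 0.0 | 0.0 | 0.0 | 1.1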
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wei, L.; Wang, X.; Liu, H. Memory-Enhanced and Prediction-Assisted Conditional Variational Autoencoder for Unsupervised Fault Detection in Industrial Processes. Appl. Sci. 2026, 16, 5941. https://doi.org/10.3390/app16125941

AMA Style

Wei L, Wang X, Liu H. Memory-Enhanced and Prediction-Assisted Conditional Variational Autoencoder for Unsupervised Fault Detection in Industrial Processes. Applied Sciences. 2026; 16(12):5941. https://doi.org/10.3390/app16125941

Chicago/Turabian Style

Wei, Lingli, Xinyuan Wang, and Hongbin Liu. 2026. "Memory-Enhanced and Prediction-Assisted Conditional Variational Autoencoder for Unsupervised Fault Detection in Industrial Processes" Applied Sciences 16, no. 12: 5941. https://doi.org/10.3390/app16125941

APA Style

Wei, L., Wang, X., & Liu, H. (2026). Memory-Enhanced and Prediction-Assisted Conditional Variational Autoencoder for Unsupervised Fault Detection in Industrial Processes. Applied Sciences, 16(12), 5941. https://doi.org/10.3390/app16125941

