Article

Industrial Process Monitoring Based on DiPCA and ARMA-Transformer

1 Department of Automation, College of Artificial Intelligence, China University of Petroleum (Beijing), Beijing 102249, China
2 Department of Safety Engineering, College of Safety and Ocean Engineering, China University of Petroleum (Beijing), Beijing 102249, China
* Author to whom correspondence should be addressed.
Processes 2026, 14(10), 1504; https://doi.org/10.3390/pr14101504
Submission received: 11 April 2026 / Revised: 28 April 2026 / Accepted: 30 April 2026 / Published: 7 May 2026
(This article belongs to the Section Process Control, Modeling and Optimization)

Abstract

This study addresses the problem of incipient fault detection in industrial processes by developing an enhanced dynamic inner principal component analysis (DiPCA) framework that integrates a Transformer attention mechanism with a masked linear autoregressive moving average (ARMA) structure. In contrast to the original DiPCA, this approach can characterize complex dynamic processes while preserving the theoretical integrity of the original framework. Sliding-window singular value decomposition is used to construct the dynamic feature matrix, and the ARMA-Transformer network captures multi-step prediction features. A DiPCA-based dual monitoring system then establishes dynamic and static residual statistics. Simulations of the Tennessee Eastman Process (TEP) show that the proposed method yields fault detection rates (FDRs) of 77.80% and 69.50% for incipient faults 3 and 9, respectively, which exceed those of traditional DiPCA, ARMA, PCA, and DPCA by more than 40% on average. Validation on the Case Western Reserve University (CWRU) bearing dataset further shows an average FDR of 94.74% across 13 fault conditions. The method thus offers an effective approach for early anomaly identification and safety monitoring in complex industrial processes.

1. Introduction

Fault detection plays a fundamental role in ensuring the safety and efficiency of industrial processes. As modern industrial systems become increasingly complex, the early and accurate detection of incipient faults becomes essential. Such faults usually have subtle amplitudes and evolve slowly; if they remain undetected, they may progressively accumulate, leading to severe production disruptions and economic losses. Consequently, reliable, real-time fault detection technologies are required that can also handle noise and complex process dynamics effectively.
Over the past decades, numerous data-driven methods have been proposed for process monitoring. Principal component analysis (PCA) [1], Partial Least Squares (PLS) [2], and Independent Component Analysis (ICA) [3] represent foundational approaches. However, they rely on linear and static assumptions. These methods ignore temporal dependencies and cannot capture the gradual evolution of incipient faults. To address dynamics, extensions such as Dynamic PCA (DPCA) [4] and Dynamic PLS (DPLS) [5] were developed. They incorporate time-lagged variables but remain fundamentally linear, and the augmented dimensionality grows rapidly with lag order. This curse of dimensionality is compounded by the fact that these methods offer no remedy for nonlinear process behaviors. Nonlinear variants, including Kernel PCA (KPCA) [6] and Kernel PLS (KPLS) [7], map data into high-dimensional feature spaces to capture nonlinearities. Nevertheless, they still struggle with incipient faults whose weak amplitudes are easily masked by noise and normal variations. Moreover, quality-driven methods like PLS and its variants introduce additional detection delays due to their reliance on the availability and timeliness of quality measurements, which is particularly unsuitable for early fault scenarios. Collectively, despite these methodological advances from linear static to dynamic and nonlinear frameworks, existing approaches still fall short of providing reliable detection for incipient faults.
Deep learning methods have also been applied to process monitoring. Convolutional Neural Networks (CNNs) [8] effectively extract local features from raw data. Stacked Autoencoders (SAEs) [9] and their variants, such as Deep Ensemble SAE (DE-SAE) [10], excel at nonlinear dimensionality reduction. However, their reconstruction objectives are mainly static. This limits their capacity to model dynamic evolution patterns. More recently, the Transformer architecture [11] has gained attention. Its self-attention mechanism can capture long-range dependencies and overcomes the vanishing gradient limitations of Recurrent Neural Networks (RNNs) [12]. Subsequent studies have further integrated attention mechanisms with Long Short-Term Memory (LSTM) [13] to enhance the identification of critical time steps. Nevertheless, applying standard Transformer architectures to industrial process monitoring remains non-trivial: industrial data exhibit strong autocorrelation and high noise levels, which differ fundamentally from the linguistic data for which Transformers were originally designed. Given these limitations, deep learning methods alone are often insufficient for detecting subtle incipient faults.
Recent research has explored hybrid methods that integrate statistical and deep learning paradigms. These methods combine the interpretability of traditional multivariate statistical process monitoring (MSPM) techniques with the representational power of deep neural networks. These efforts span three primary directions.
The first focuses on ensemble learning frameworks that fuse multiple detectors or features. This line of work progressed from a PCA ensemble detector (PCAED) [14] for detecting specific incipient faults, to a time-domain Feature Ensemble Net (FENet) [15], and further to a dense FENet [16] that integrates multiple basic detectors. A time/frequency feature-driven ensemble method [17] additionally combines the time-domain FENet with a PSD-based frequency-domain network via Bayesian inference.
The second direction introduces statistical testing on network outputs to enhance detection reliability. Representative works include a hybrid framework combining ANN-based classification with sequential hypothesis testing for parameter change detection in power converters [18], a system integrating deep anomaly detection with sequential probability ratio testing for temporal evidence aggregation [19], and a method that couples significance testing with conformal prediction to construct prediction sets with formal risk guarantees for fault detection [20].
A third direction combines DiPCA [21] with recurrent networks such as LSTM for dynamic process monitoring [22]. While effective, this approach treats the neural network as an external predictor applied after DiPCA. In the proposed method, by contrast, the ARMA-Transformer, which builds on ARMA modeling [23], replaces the linear AR component inside DiPCA, preserving the autoregressive form, the unsupervised learning setting, and the statistical interpretability of the DiPCA framework.
Despite these advances, a closer examination reveals a fundamental distinction: in existing hybrids, the neural network and the statistical component are coupled externally, operating either as parallel detection modules in ensemble frameworks or as post hoc decision layers in reliability-oriented approaches. This external coupling limits how organically the neural network can be integrated with the statistical decomposition and residual analysis that underpin traditional MSPM frameworks.
In this work, we explore a distinct architectural paradigm: embedding an ARMA-Transformer directly as the dynamic engine inside DiPCA, replacing its original linear autoregressive component while preserving the full DiPCA statistical backbone. Unlike external coupling, this makes the neural network an integral part of the MSPM framework. This deep integration allows the network’s nonlinear dynamic modeling capability to directly enhance the latent variable extraction that is central to DiPCA, while the established statistical mechanisms of DiPCA remain intact to ensure reliable fault detection decisions. The main contributions of the proposed method are as follows:
(1)
A new architecture is proposed that integrates deep learning with traditional statistical process monitoring, offering an effective solution for industrial process safety monitoring. Traditional statistical methods like DiPCA are interpretable and provide rigorous control limits, but their linear dynamics cannot capture complex nonlinear patterns in modern industrial processes. Deep learning methods like Transformers excel at nonlinear modeling, but they lack statistical inferential power and are difficult to interpret. The proposed architecture combines the strengths of both: the ARMA-Transformer handles nonlinear dynamic prediction, while DiPCA provides the statistical framework for residual analysis and fault decision.
(2)
The novelty lies in embedding a neural network as the dynamic core inside DiPCA, rather than adding it as an external module. The original DiPCA relies on a linear autoregressive model to capture temporal dependencies. In this work, an ARMA-Transformer takes over this role at the same position, while the projection matrices, residual deflation, and control limit calculation of DiPCA remain unchanged.
The remainder of this paper is organized as follows. Section 2 gives the problem formulation. Section 3 introduces the details of the proposed ARMA-Transformer DiPCA framework. Section 4 presents experimental results and analysis on the Tennessee Eastman Process (TEP) [24] and the Case Western Reserve University (CWRU) [25] bearing dataset. Finally, Section 5 and Section 6 summarize findings and discuss future work.

2. Problem Formulation

Despite extensive research on fault detection in industrial processes, the reliable detection of incipient faults under unsupervised conditions remains a significant challenge. These faults are typically characterized by their small amplitude and susceptibility to process noise and operating fluctuations. In addition, their fault features often manifest only in a subset of process variables. Unlike abrupt faults that produce sustained large deviations, incipient faults evolve slowly with amplitudes comparable to normal process variations, making them statistically indistinguishable from normal behavior when examined on a per-sample basis.
Existing methods such as DiPCA have demonstrated effectiveness in dynamic process monitoring. However, the linear autoregressive component at the core of DiPCA inherently limits its sensitivity to the subtle, nonlinear evolution patterns of incipient faults in complex industrial environments. This limitation becomes particularly pronounced under unsupervised conditions without labeled fault data.
To address this limitation, an enhanced DiPCA framework is proposed in this paper. Compared with traditional DiPCA, the proposed method introduces two key innovations: sliding-window singular value decomposition for dynamic feature extraction, and a masked linear ARMA attention model that replaces the linear dynamic component to generate more informative dynamic residuals. By combining residual analysis with multi-index thresholds and fusion rules, and by determining control limits via kernel density estimation (KDE) [26], the framework achieves improved detection performance and numerical robustness, particularly for faults 3 and 9 in the TEP, as well as on the CWRU bearing dataset.
It should be noted that simply appending deep learning models as external modules to an MSPM framework does not resolve the fundamental limitation: the linear dynamic component of DiPCA remains unchanged, and the statistical decomposition and residual analysis that define the monitoring backbone are not directly influenced by the deep learning component. A more effective approach is to replace the linear dynamic core of DiPCA with a neural network that serves as its internal dynamic engine, allowing its nonlinear modeling capacity to participate directly in the latent variable extraction and residual generation that are central to fault detection.

3. ARMA-Transformer Based on DiPCA

In this section, we describe how the proposed ARMA-Transformer based on DiPCA was developed. First, sliding-window singular-value sequences were used to construct the features. Then, dynamic latent variables were extracted from these features through DiPCA. Finally, an autoregressive Transformer with masked linear ARMA components was employed for multi-step forecasting of the latent variables. The overall framework of the proposed ARMA-Transformer based on DiPCA is illustrated in Figure 1.

3.1. Sliding-Window Singular Values

For the real-time data stream of an industrial process, choose an appropriate window size $w$ and combine the current sample with the $w-1$ samples immediately preceding it to form a sliding-window matrix. Let the current sampling instant be $q$ and the number of sensors be $m$. The sliding-window matrix is then
$$
X_q = \begin{bmatrix} x_{q-w+1,1} & \cdots & x_{q-w+1,m} \\ \vdots & \ddots & \vdots \\ x_{q,1} & \cdots & x_{q,m} \end{bmatrix} \in \mathbb{R}^{w \times m} \tag{1}
$$
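Standardizing $X_q$ column-wise yields the matrix $\bar{X}_q$:
$$
\bar{X}_q = \left( X_q - \mathbf{1}_{w\times 1}\,\mu_q \right) \oslash \left( \mathbf{1}_{w\times 1}\,\sigma_q \right) \tag{2}
$$
where $\mathbf{1}_{w\times 1}$ denotes a column vector of ones, $\oslash$ denotes element-wise division, and $\mu_q \in \mathbb{R}^{1\times m}$ and $\sigma_q \in \mathbb{R}^{1\times m}$ are the row vectors of column-wise means and standard deviations:
$$
\mu_q = \frac{1}{w} \sum_{i=1}^{w} X_{q,i} \tag{3}
$$
$$
\sigma_q = \sqrt{\frac{1}{w} \sum_{i=1}^{w} \left( X_{q,i} - \mu_q \right)^{2}} \tag{4}
$$
where $X_{q,i}$ is the $i$-th row of $X_q$, and the square and square root are taken element-wise. After standardization, each column of $\bar{X}_q$ has zero mean and unit variance.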
Performing singular value decomposition on the standardized sliding-window matrix $\bar{X}_q \in \mathbb{R}^{w\times m}$ yields
$$
\bar{X}_q = U_q M_q V_q^\top \tag{5}
$$
where $U_q \in \mathbb{R}^{w\times w}$ and $V_q \in \mathbb{R}^{m\times m}$ are orthogonal, and the diagonal entries of $M_q$ are the $m$ non-negative singular values $\sigma_{1,q} \ge \sigma_{2,q} \ge \cdots \ge \sigma_{m,q}$ at time index $q$ (assuming $w \ge m$). By sliding the window along the time series, we collect, for each $j = 1, 2, \ldots, m$, the sequence
$$
s_j = \left[ \sigma_{j,w}, \sigma_{j,w+1}, \ldots, \sigma_{j,T} \right]^\top \tag{6}
$$
forming $m$ singular-value time series, where $T$ is the length of the data record. Selecting the top $n_{sv}$ of these yields a multivariate sequence $S \in \mathbb{R}^{N\times n_{sv}}$, with $N = T - w + 1$ windows, which serves as the input to the DiPCA stage. In this work, $n_{sv} = 5$ is used. These singular-value sequences compactly represent the dominant energy structure within each window, and DiPCA is then applied to this representation to extract latent variables that exhibit strong temporal autocorrelation.
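The windowing, standardization, and SVD steps in Equations (1)–(6) can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `sliding_window_singular_values` is hypothetical, a zero standard deviation is replaced by 1 to avoid division by zero (an assumption the text does not address), and output rows are indexed from 0.

```python
import numpy as np

def sliding_window_singular_values(X, w, n_sv):
    """Top-n_sv singular values of each standardized window (Eqs. (1)-(6)).

    X: (T, m) data record; w: window size; n_sv: number of singular values kept.
    Returns S of shape (T - w + 1, n_sv); row k belongs to the window ending at
    sample k + w - 1 (0-based).
    """
    X = np.asarray(X, dtype=float)
    T, _ = X.shape
    rows = []
    for q in range(w - 1, T):
        Xq = X[q - w + 1 : q + 1]                  # current sample plus its w-1 predecessors
        mu = Xq.mean(axis=0)                       # column-wise means, Eq. (3)
        sd = Xq.std(axis=0)                        # population std (1/w), Eq. (4)
        sd = np.where(sd > 0, sd, 1.0)             # guard for constant columns (assumption)
        Xbar = (Xq - mu) / sd                      # Eq. (2)
        sv = np.linalg.svd(Xbar, compute_uv=False) # singular values, descending order
        rows.append(sv[:n_sv])
    return np.vstack(rows)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                      # synthetic stand-in for process data
S = sliding_window_singular_values(X, w=50, n_sv=5)
print(S.shape)                                     # (151, 5)
```

A record of length $T$ produces $T - w + 1$ rows, one per window. Because each standardized column has unit variance, the squared singular values of a window sum to $w \cdot m$ when all $m$ are retained, which is a convenient sanity check.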

3.2. DiPCA

DiPCA integrates dynamic data modeling with PCA to extract latent variables with autocorrelation structures from high-dimensional process data.
In the dynamic latent variable extraction stage, DiPCA iteratively extracts $l$ dynamic latent variables $t_k = S w_k$ $(k = 1, 2, \ldots, l)$, where $t_k \in \mathbb{R}^{N}$ contains the scores of the $k$-th latent variable and $w_k \in \mathbb{R}^{n_{sv}}$ is the $k$-th projection vector. The projection vector $w_k$ is obtained by solving the optimization problem
$$
\max_{w_k,\,\beta} \; J = \sum_{i=s+1}^{N} \left( \sum_{j=1}^{s} \beta_j\, t_{k,i-j} \right) t_{k,i} \tag{7}
$$
subject to the constraints $\|w_k\| = 1$ and $\|\beta\| = 1$, where $s$ is the lag order, $\beta \in \mathbb{R}^{s}$ is the autoregressive coefficient vector, and $t_{k,i} = S_i w_k$ is the $i$-th score of the $k$-th component, with $S_i$ denoting the $i$-th row of the singular-value feature matrix $S$.
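The optimization is solved iteratively through alternating updates. With $w_k$ fixed, $\beta$ is obtained by solving the least squares problem
$$
\beta = \left( T_{\text{lag}}^\top T_{\text{lag}} \right)^{-1} T_{\text{lag}}^\top\, t_{k,s+1:N} \tag{8}
$$
where the $j$-th column of $T_{\text{lag}} \in \mathbb{R}^{(N-s)\times s}$ is the lagged score vector $t_{k,s+1-j:N-j}$, and $\beta$ is then normalized as $\beta \leftarrow \beta / \|\beta\|$. Next, with $\beta$ fixed, $w_k$ is updated via gradient ascent followed by normalization, where the gradient is
$$
\nabla_{w_k} J = S_{s+1:N}^\top\, T_{\text{lag}}\, \beta + \sum_{j=1}^{s} \beta_j\, S_{s+1-j:N-j}^\top\, t_{k,s+1:N} \tag{9}
$$
and $S_{a:b}$ denotes rows $a$ through $b$ of $S$. These updates are repeated until convergence, which is assessed by monitoring the relative changes in $w_k$ and $\beta$. Upon convergence, the loading vector is computed as
$$
p_k = \frac{S^\top t_k}{t_k^\top t_k} \tag{10}
$$
and $S$ is deflated by removing the extracted component:
$$
S \leftarrow S - t_k p_k^\top \tag{11}
$$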
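After all $l$ components have been extracted, the projection and loading matrices are
$$
W = \left[ w_1, w_2, \ldots, w_l \right] \in \mathbb{R}^{n_{sv}\times l} \tag{12}
$$
$$
P = \left[ p_1, p_2, \ldots, p_l \right] \in \mathbb{R}^{n_{sv}\times l} \tag{13}
$$
The latent variables and the data reconstruction are given by
$$
T = S R \tag{14}
$$
$$
\hat{S} = T P^\top \tag{15}
$$
where $S$ here denotes the undeflated feature matrix and $R = W \left( P^\top W \right)^{-1} \in \mathbb{R}^{n_{sv}\times l}$ is the projection matrix.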
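The alternating optimization in Equations (7)–(11) and the projection matrix $R$ can be sketched with NumPy as below. It is a minimal sketch, not the authors' code: the random unit-norm initialization, the fixed step size `lr`, the stopping tolerance, and the function name `dipca` are illustrative assumptions, and the code uses 0-based slicing where the equations use 1-based indices.

```python
import numpy as np

def dipca(S, l, s, lr=0.1, max_iter=500, tol=1e-8, seed=0):
    """Extract l dynamic latent variables from S (N x n_sv) with lag order s.

    Returns W, P, R (each n_sv x l) and the scores T = S R (N x l).
    lr, max_iter, tol, and seed are illustrative settings, not values from the paper.
    """
    S = np.asarray(S, dtype=float)
    N, n = S.shape
    S_res = S.copy()                       # deflated copy; S itself is kept for T = S R
    rng = np.random.default_rng(seed)
    W = np.zeros((n, l))
    P = np.zeros((n, l))
    for k in range(l):
        w = rng.normal(size=n)
        w /= np.linalg.norm(w)
        beta = np.ones(s) / np.sqrt(s)
        for _ in range(max_iter):
            t = S_res @ w
            target = t[s:]                 # t_{s+1:N} in 1-based notation
            # Column j holds the scores lagged by j + 1 steps.
            T_lag = np.column_stack([t[s - j - 1 : N - j - 1] for j in range(s)])
            beta_new, *_ = np.linalg.lstsq(T_lag, target, rcond=None)  # Eq. (8)
            beta_new /= np.linalg.norm(beta_new)
            grad = S_res[s:].T @ (T_lag @ beta_new)                     # Eq. (9), first term
            for j in range(s):
                grad += beta_new[j] * (S_res[s - j - 1 : N - j - 1].T @ target)
            w_new = w + lr * grad          # gradient ascent step
            w_new /= np.linalg.norm(w_new)
            # Both vectors have unit norm, so absolute and relative changes coincide.
            done = (np.linalg.norm(w_new - w) < tol
                    and np.linalg.norm(beta_new - beta) < tol)
            w, beta = w_new, beta_new
            if done:
                break
        t = S_res @ w
        p = S_res.T @ t / (t @ t)          # Eq. (10)
        S_res = S_res - np.outer(t, p)     # Eq. (11)
        W[:, k], P[:, k] = w, p
    R = W @ np.linalg.inv(P.T @ W)
    return W, P, R, S @ R

rng = np.random.default_rng(1)
S_demo = np.cumsum(rng.normal(size=(300, 5)), axis=0) * 0.1 + rng.normal(size=(300, 5))
W, P, R, T = dipca(S_demo, l=3, s=2)
```

A useful consistency check on the deflation is that $P^\top W$ comes out upper triangular with a unit diagonal: $p_k^\top w_k = 1$ by construction, and each deflation annihilates the earlier projection directions, which is what allows $R$ to map the undeflated $S$ directly to the scores.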

3.3. Dynamic Prediction with ARMA Attention Formulation

The latent variables $T$ extracted by DiPCA capture the dominant autocorrelated dynamics of the process, but the linear AR model within DiPCA cannot fully characterize the residual nonlinear temporal patterns. To address this, the dynamic prediction of the latent variables employs an ARMA attention mechanism that integrates both autoregressive (AR) and moving average (MA) components. For the latent variables $T \in \mathbb{R}^{N\times l}$ obtained from DiPCA via Equation (14), the autoregressive component follows the AR structure of the classical ARMA model, in which past observations are weighted by fixed coefficients:
$$
o_t^{\text{AR}} = \sum_{i=1}^{t} w_{t,i}\, v_i \tag{16}
$$
where $v_i \in \mathbb{R}^{l\times d}$ represents the value vector at time $i$, and $w_{t,i} \in \mathbb{R}^{l\times d}$ denotes the attention weights derived from fixed parameter matrices rather than dynamic query–key interactions. This formulation constitutes a causal linear layer with a strictly lower triangular structure.
Following the moving average structure of the ARMA model, the prediction error is decomposed into past innovations weighted by MA coefficients:
$$
v_{t+1} = o_t^{\text{AR}} + o_t^{\text{MA}} + \epsilon_t = \sum_{i=1}^{t} w_{t,i}\, v_i + \sum_{j=1}^{t-1} \theta_{t-1,j}\, \epsilon_j + \epsilon_t \tag{17}
$$
where the AR residual $r_t = v_{t+1} - o_t^{\text{AR}}$ is decomposed as
$$
r_t = \sum_{j=1}^{t-1} \theta_{t-1,j}\, \epsilon_j + \epsilon_t \tag{18}
$$
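To avoid computationally expensive matrix operations while maintaining the MA structure, the MA weights are generated indirectly [27]. Instead of explicitly computing $\theta_{t-1,j}$, the AR residuals $r_j$ are used as value inputs for the MA component:
$$
\sum_{j=1}^{t-1} \beta_{t-1,j}\, r_j = \sum_{j=1}^{t-1} \theta_{t-1,j}\, \epsilon_j \tag{19}
$$
Stacking Equation (18) over time gives $r = (I + \Theta)\,\epsilon$, so Equation (19) establishes the matrix relationship $B r = \Theta \epsilon$ with the implicit transformation
$$
B = \Theta \left( I + \Theta \right)^{-1} \tag{20}
$$
$$
\Theta = B \left( I - B \right)^{-1} \tag{21}
$$
where $B$ and $\Theta$ are strictly lower triangular matrices collecting the weights $\beta_{t-1,j}$ and $\theta_{t-1,j}$, respectively.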
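The MA weight matrix $\Theta$ exhibits desirable temporal decay properties. When all strictly lower triangular entries of $B$ equal a constant $b$, the elements of $\Theta$ satisfy
$$
\theta_{ij} = b\,(1+b)^{i-j-1}, \quad \text{for } i > j \tag{22}
$$
where $b$ represents the characteristic decay parameter; the weights decay with increasing lag $i-j$ whenever $|1+b| < 1$. This decay pattern matches the behavior of a standard MA process, in which the influence of past innovations diminishes with increasing time lag.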
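The relationship between $B$ and $\Theta$ in Equations (20)–(22) can be checked numerically. The sketch below assumes, as in Equation (22), that every strictly lower triangular entry of $B$ equals $b$; the helper names `ma_weights_from_b` and `b_from_ma_weights` and the values $n = 6$, $b = -0.4$ are hypothetical choices for illustration.

```python
import numpy as np

def ma_weights_from_b(B):
    """Equation (21): Theta = B (I - B)^{-1}."""
    I = np.eye(B.shape[0])
    return B @ np.linalg.inv(I - B)

def b_from_ma_weights(Theta):
    """Equation (20): B = Theta (I + Theta)^{-1}."""
    I = np.eye(Theta.shape[0])
    return Theta @ np.linalg.inv(I + Theta)

# Illustrative setting: every strictly lower-triangular entry of B equals b.
n, b = 6, -0.4
B = b * np.tril(np.ones((n, n)), k=-1)
Theta = ma_weights_from_b(B)

# Compare against the closed form theta_ij = b (1 + b)^(i - j - 1) for i > j.
i, j = np.tril_indices(n, k=-1)
closed_form = b * (1 + b) ** (i - j - 1)
print(np.max(np.abs(Theta[i, j] - closed_form)))  # zero up to floating-point error
```

With $b = -0.4$, the factor $1 + b = 0.6$ lies inside $(-1, 1)$, so the recovered MA weights shrink geometrically with lag, as Equation (22) predicts.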
The MA component $o_t^{\text{MA}}$ is computed by a dedicated attention mechanism applied to the AR residuals. First, the queries and keys of the MA part are transformed by specialized functions:
$$
\tilde{Q}_t^{\text{MA}} = \phi_q^{\text{MA}}\!\left( Q_t^{\text{MA}} \right) \tag{23}
$$
$$
\tilde{K}_j^{\text{MA}} = \phi_k^{\text{MA}}\!\left( K_j^{\text{MA}} \right) \tag{24}
$$
where $\phi_q^{\text{MA}}$ and $\phi_k^{\text{MA}}$ are element-wise transformation functions designed to shape the distribution of the attention weights for the MA dynamics. The MA output is then obtained by applying the transformed attention weights to the residual values $r_j$:
$$
o_t^{\text{MA}} = \sum_{j=1}^{t-1} \frac{\tilde{Q}_t^{\text{MA}} \left( \tilde{K}_j^{\text{MA}} \right)^\top}{\sqrt{d_k}}\, r_j \tag{25}
$$
This formulation efficiently captures short-term dependencies and error corrections in the residual sequence, realizing the MA characteristic within the attention framework.
Finally, the output of the ARMA attention block is generated by combining the autoregressive and MA components, followed by a linear projection:
$$
o_t = \left( o_t^{\text{AR}} + o_t^{\text{MA}} \right) W_o \tag{26}
$$
where $W_o \in \mathbb{R}^{d\times d}$ is a learnable output projection matrix. The combined output $o_t$ serves as the one-step-ahead prediction of the latent variable scores at time $t+1$, leveraging both the historical pattern (via AR) and the recent innovation sequence (via MA) to improve dynamic prediction accuracy.
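A single-head forward pass of Equations (16)–(26) can be sketched in NumPy. Several details here are assumptions not fixed by the text above: the AR weights $w_{t,i}$ are simplified to scalars stored in a matrix `W_ar`, the MA queries and keys are linear projections of the value sequence through hypothetical matrices `Wq` and `Wk`, and $\tanh$ stands in for the unspecified transformations $\phi_q^{\text{MA}}$ and $\phi_k^{\text{MA}}$. Training (the loss and backpropagation in Algorithm 1) is omitted.

```python
import numpy as np

def arma_attention(V, W_ar, Wq, Wk, Wo, phi=np.tanh):
    """Single-head ARMA attention forward pass; output row t predicts v_{t+1}.

    V: (L, d) values; W_ar: (L, L) scalar AR weights (lower triangle used);
    Wq, Wk: (d, d_k) hypothetical query/key projections; Wo: (d, d) output projection.
    """
    o_ar = np.tril(W_ar) @ V              # Eq. (16): sum over i <= t of w_{t,i} v_i
    r = np.zeros_like(V)
    r[:-1] = V[1:] - o_ar[:-1]            # AR residuals r_t = v_{t+1} - o_t^AR; last row unused
    Q = phi(V @ Wq)                       # transformed queries, Eq. (23)
    K = phi(V @ Wk)                       # transformed keys, Eq. (24)
    scores = (Q @ K.T) / np.sqrt(Wq.shape[1])
    o_ma = np.tril(scores, k=-1) @ r      # Eq. (25): only residuals r_j with j <= t - 1
    return (o_ar + o_ma) @ Wo             # Eq. (26)

rng = np.random.default_rng(0)
L, d, dk = 10, 4, 3
V = rng.normal(size=(L, d))
W_ar = 0.1 * rng.normal(size=(L, L))
Wq, Wk = rng.normal(size=(d, dk)), rng.normal(size=(d, dk))
out = arma_attention(V, W_ar, Wq, Wk, np.eye(d))
```

Row $t$ of the output depends only on $v_1, \ldots, v_t$, because the residual $r_{t-1}$ uses $v_t$ at most and the MA mask excludes $r_t$. The block therefore stays causal: perturbing a later value leaves all earlier predictions unchanged.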

3.4. Control Limit Calculation and Fault Diagnosis

Once the ARMA-Transformer predicts the latent scores, the prediction error is decomposed into two complementary components: the dynamic residual in the latent space and the static residual in the original feature space. These residuals form the basis for fault detection. The control limits of the monitoring statistics are established through KDE, which characterizes the empirical distribution of normal operation data. For each of the three monitoring statistics, the corresponding control limit $\phi_{\lim}$ is determined as the $(1-\alpha)$-quantile of the KDE-fitted distribution, so that a fraction $\alpha$ of normal samples is expected to exceed it. The KDE-based probability density function is
$$
f_{\text{KDE}}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left( \frac{x - \phi_i}{h} \right) \tag{27}
$$
where $K(\cdot)$ is the Gaussian kernel function, $h$ is the bandwidth parameter, and $\phi_i$ $(i = 1, \ldots, n)$ are the monitoring statistics observed on normal operation data.
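The $(1-\alpha)$-quantile of the Gaussian KDE in Equation (27) has no closed form, but the KDE's cumulative distribution is an average of Gaussian CDFs, so the limit can be found by bisection. This NumPy sketch assumes Silverman's rule of thumb for $h$, since the bandwidth selection rule is not stated; the function name `kde_control_limit` and the synthetic data are hypothetical.

```python
import math
import numpy as np

def kde_control_limit(stats, alpha=0.01, h=None, n_iter=100):
    """(1 - alpha)-quantile of a Gaussian KDE fitted to normal-operation statistics."""
    x = np.asarray(stats, dtype=float)
    if h is None:
        # Silverman's rule of thumb: an assumed bandwidth choice.
        h = 1.06 * x.std(ddof=1) * x.size ** (-1 / 5)
    erf = np.vectorize(math.erf)

    def kde_cdf(c):
        # Integral of Eq. (27) up to c: mean of Gaussian CDFs centered at each phi_i.
        return np.mean(0.5 * (1.0 + erf((c - x) / (h * math.sqrt(2.0)))))

    lo, hi = x.min() - 10 * h, x.max() + 10 * h   # CDF is ~0 at lo and ~1 at hi
    for _ in range(n_iter):                        # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
normal_stats = rng.normal(size=5000)               # synthetic stand-in for a statistic
limit = kde_control_limit(normal_stats, alpha=0.01)
```

For standard normal data the result lands slightly above the Gaussian 99% quantile of about 2.33, because the kernel smoothing adds variance $h^2$ to the fitted distribution.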
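Three monitoring statistics are employed for comprehensive fault detection. The dynamic residual statistic is
$$
\phi_v(i) = v_i^\top \Sigma_v^{-1} v_i \tag{28}
$$
where $v_i = T_i - \hat{T}_i$ is the dynamic residual vector and $\Sigma_v$ is its covariance matrix. The static residual is monitored through
$$
T_r^2(i) = E_r(i)^\top \Sigma_r^{-1} E_r(i) \tag{29}
$$
$$
Q_r(i) = \left\| E_i - \hat{E}_i \right\|_2^2 \tag{30}
$$
where $E_i = S_i - \hat{T}_i P^\top$ is the static residual vector, $E_r(i)$ denotes its PCA-reduced representation with covariance matrix $\Sigma_r$, and $\hat{E}_i$ is the reconstruction of $E_i$ from $E_r(i)$.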
A process observation is flagged as faulty when any monitoring statistic exceeds its control limit.
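Given normal-operation latent scores $T$, predictions $\hat{T}$, features $S$, and loadings $P$, the three statistics in Equations (28)–(30) can be sketched as below. This is an illustrative sketch under assumptions not stated in the text: $\Sigma_v$ and $\Sigma_r$ are sample covariances of normal-data residuals, the residual PCA is an uncentered SVD of $E$, and the number of retained residual components `a` is a hypothetical choice. The function names are likewise hypothetical.

```python
import numpy as np

def fit_residual_model(T, T_hat, S, P, a):
    """Fit covariances and residual PCA on normal data; a = retained residual PCs (hypothetical)."""
    V = T - T_hat                          # dynamic residuals v_i
    E = S - T_hat @ P.T                    # static residuals E_i
    _, _, Vt = np.linalg.svd(E, full_matrices=False)
    Pe = Vt[:a].T                          # residual PCA loadings (uncentered, an assumption)
    Er = E @ Pe                            # PCA-reduced static residuals E_r
    return {
        "Sv_inv": np.linalg.inv(np.atleast_2d(np.cov(V, rowvar=False))),
        "Sr_inv": np.linalg.inv(np.atleast_2d(np.cov(Er, rowvar=False))),
        "Pe": Pe,
    }

def monitoring_statistics(model, T, T_hat, S, P):
    """Return phi_v, T_r^2, and Q_r for each row (Eqs. (28)-(30))."""
    V = T - T_hat
    E = S - T_hat @ P.T
    Er = E @ model["Pe"]
    E_hat = Er @ model["Pe"].T             # reconstruction of E from its reduced form
    phi_v = np.einsum("ij,jk,ik->i", V, model["Sv_inv"], V)
    T2_r = np.einsum("ij,jk,ik->i", Er, model["Sr_inv"], Er)
    Q_r = np.sum((E - E_hat) ** 2, axis=1)
    return phi_v, T2_r, Q_r
```

Keeping all residual components (`a` equal to the number of features) drives $Q_r$ to zero, so `a` trades sensitivity between $T_r^2$ and $Q_r$.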

3.5. Algorithms

The operation of the ARMA-Transformer based on DiPCA consists of two sequential parts. First, historical normal data is used to train the model and compute statistical control limits, as formalized in Algorithm 1. Subsequently, the trained model is deployed for continuous, real-time incipient fault detection on new data samples, with the exact steps provided in Algorithm 2.
Algorithm 1. Offline Training
Input: $X_q$: input data; $w$: sliding-window width; $n_{sv}$: number of singular values; $s$: lag order; $l$: number of latent variables
Output: $P, W, R$: DiPCA matrices; $w_{t,i}, \beta_{t-1,j}, W_o$: trained parameters; $\phi_{v,\lim}, T^2_{r,\lim}, Q_{r,\lim}$: control limits
1: Normalize the sliding windows
2: Perform SVD on each window to obtain the sequences $\{s_j\}$; retain the top $n_{sv}$ to form $S$
3: Extract $l$ dynamic latent variables $T$ and obtain $P$, $W$ (Equations (7)–(11))
4: Compute $R$ (Equations (14) and (15))
5: for each training epoch do
6:   Input $T$ into the ARMA-Transformer
7:   Compute the loss between the predicted and actual $T$ (Equations (16)–(26))
8:   Backpropagate the loss and update the model parameters $w_{t,i}, \beta_{t-1,j}, W_o$
9: end for
10: Perform a forward pass on $T$ to obtain the predicted scores $\hat{T}$
11: Compute $\phi_v$, $T_r^2$, $Q_r$ on normal data (Equations (28)–(30))
12: Compute the control limits $\phi_{v,\lim}, T^2_{r,\lim}, Q_{r,\lim}$ (Equation (27))
13: return $P, W, R, w_{t,i}, \beta_{t-1,j}, W_o, \phi_{v,\lim}, T^2_{r,\lim}, Q_{r,\lim}$
Algorithm 2. Online Detection
Input: $X_q$: new sample; $P, W, R, w_{t,i}, \beta_{t-1,j}, W_o$: trained parameters; $s$: lag order; $\phi_{v,\lim}, T^2_{r,\lim}, Q_{r,\lim}$: control limits
Output: status of $X_q$
1: Form the sliding window ending at $X_q$, perform SVD, and retain the top $n_{sv}$ singular values to form $S$
2: Project to the latent space $T$ (Equation (14))
3: for each predicted step do
4:   Predict $\hat{T}$ (Equations (16)–(26))
5:   Compute $\phi_v$, $T_r^2$, $Q_r$ (Equations (28)–(30))
6:   if $\phi_v > \phi_{v,\lim}$ or $T_r^2 > T^2_{r,\lim}$ or $Q_r > Q_{r,\lim}$ then
7:     Mark step as abnormal
8:   else
9:     Mark step as normal
10:   end if
11: end for
12: if any abnormal step exists then
13:   status = fault
14: else
15:   status = normal
16: end if
17: return status of $X_q$
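The decision logic of Algorithm 2 (steps 6–16) reduces to an OR over the three statistics at each predicted step, followed by an any-step rule across the prediction horizon. A minimal sketch, in which the function name `detect` and the `limits` dictionary keys are hypothetical:

```python
import numpy as np

def detect(phi_v, T2_r, Q_r, limits):
    """Algorithm 2 decision rule over the predicted steps.

    A step is abnormal if any statistic exceeds its control limit; the sample
    is declared faulty if any predicted step is abnormal.
    """
    abnormal = (
        (np.asarray(phi_v) > limits["phi_v"])
        | (np.asarray(T2_r) > limits["T2_r"])
        | (np.asarray(Q_r) > limits["Q_r"])
    )
    status = "fault" if abnormal.any() else "normal"
    return status, abnormal
```

The OR fusion maximizes sensitivity; the price is that the overall false alarm rate is bounded by the sum, not the individual value, of the three per-statistic rates.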

4. Simulation

In this section, the widely used TEP benchmark and the CWRU bearing dataset are used to evaluate the performance of the proposed method.

4.1. Tennessee Eastman Process

This chemical plant simulator, originally created at Eastman Chemical Company, is widely accepted in process systems engineering for benchmarking fault detection algorithms because it exhibits pronounced nonlinear and time-varying behavior. The simulation integrates five unit operations: a reactor, a product condenser, a vapor–liquid separator, a recycle compressor, and a stripper. Thirty-three signals are logged: twenty-two continuous measurements (XMEAS 1–22) and eleven controller outputs (XMV 1–11), which together describe the essential dynamic and operational states of the plant. Twenty-one fault scenarios are predefined within the simulator. This study focuses mainly on faults 3 and 9: fault 3 is a step change in the D feed temperature, and fault 9 is a random variation in the D feed temperature. Figure 2 shows the system structure of TEP.
The simulation data were generated using a closed-loop version of the TEP [28], available at http://depts.Washington.edu/control/LARRY/TE/download.html (accessed on 10 April 2026). Both the training and test datasets were simulated for 200 h with a sampling interval of three minutes, giving 4000 samples per run. In each test dataset, the fault was introduced after 100 h of simulation. With the exception of fault 6, 4000 training samples and 4000 test samples were obtained for each fault type; fault 0 denotes the normal operating condition. The fault detection rate (FDR) was calculated on the last 2000 sampling instants of each test dataset, that is, on the samples after fault introduction, while the false alarm rate (FAR) was evaluated on the fault 0 dataset.
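The evaluation protocol above can be expressed as two small helpers. They assume one boolean alarm per sample of a test run; computing the FAR over the entire fault 0 run is an assumption, since the text does not say which fault 0 samples are used, and the function names are hypothetical.

```python
import numpy as np

def fdr(alarms_fault_run, n_eval=2000):
    """FDR (%) over the last n_eval samples of a faulty test run (all post-fault in TEP)."""
    a = np.asarray(alarms_fault_run, dtype=bool)[-n_eval:]
    return 100.0 * a.mean()

def far(alarms_normal_run):
    """FAR (%) over a fault 0 run; using every sample of the run is an assumption."""
    return 100.0 * np.asarray(alarms_normal_run, dtype=bool).mean()

alarms = np.zeros(4000, dtype=bool)
alarms[-2000::2] = True          # every other post-fault sample raises an alarm
print(fdr(alarms))               # 50.0
```

Restricting the FDR to the last 2000 samples keeps the 2000 pre-fault samples of each test run, which are normal, from diluting the detection rate.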
A comprehensive performance comparison of the proposed method against other fault detection approaches on the TEP dataset is presented in Table 1. The compared methods include the traditional statistical approaches PCA and DPCA, as well as the deep learning-based approaches Transformer, LSTM-VAE [29], and 1DRGANomaly [30]. The proposed method achieves an average FDR of 82.97%, significantly outperforming PCA and DPCA. For the challenging incipient faults 3 and 9, the Transformer achieves FDRs of only 2.70% and 3.35%, LSTM-VAE achieves 9.13% and 0.04%, and 1DRGANomaly achieves 1.90% and 1.90%, while the proposed method attains 77.80% and 69.50%. These results highlight the advantage of embedding the ARMA-Transformer within the DiPCA statistical framework over both traditional and standalone deep learning approaches.
An ablation study quantifying the contribution of each component is presented in Table 2. For the SVD-based methods, a sliding-window size of 1500 was used, and the top five singular values were retained from each window. All models were trained with a fixed random seed of 42, the Adam optimizer with a learning rate of 0.001, and 100 training epochs. The lag order was s = 2, and the number of latent variables was l = 5 for DiPCA. The ARMA-Transformer used a hidden dimension of 64 with four attention heads. Control limits were computed via KDE at a confidence level of 99% (α = 0.01). The results show that the full model achieves an average FDR of 82.97%. Removing the SVD-based feature extraction causes a drop of 19.30 percentage points, replacing the ARMA-Transformer with the original linear AR component of DiPCA reduces the average FDR by 17.93 percentage points, and removing the DiPCA latent variable structure leads to a decrease of 7.51 percentage points. These results demonstrate that all three components contribute meaningfully to the overall performance, with SVD providing noise suppression and compact dynamic representation, DiPCA extracting autocorrelated latent structure, and the ARMA-Transformer capturing nonlinear temporal patterns that the linear AR model cannot characterize.
For detailed illustration, the detection trajectories obtained by the three methods are compared in Figure 3 and Figure 4, where the red dashed line denotes the control limit and the green dotted line marks the time instant at which the fault is introduced.

4.2. CWRU Bearing Dataset

The CWRU bearing dataset serves as an internationally recognized benchmark for validating bearing fault diagnosis algorithms. In this study, vibration signals were acquired at a sampling frequency of 48 kHz from the experimental platform, which consists of a motor, a torque transducer, and related data acquisition equipment. The dataset includes artificially introduced single-point faults on the inner race, outer race (at 3, 6, and 12 o’clock positions), and rolling elements, generated via electrical discharge machining to simulate realistic localized damage. A total of 13 distinct fault conditions are systematically categorized based on fault location and size, providing a comprehensive foundation for evaluating diagnostic model performance under varying fault types and severity levels.
The performance of the proposed method on the CWRU dataset is comprehensively summarized in Table 3. To ensure a fair comparison, all deep models were configured with consistent hyperparameters: a fixed random seed of 42, a sliding-window size of 1000, the Adam optimizer with a learning rate of 0.001, and 100 training epochs. For the proposed method, the top 5 singular values were retained, the lag order was s = 2, the number of latent variables was l = 5 for DiPCA, and the ARMA-Transformer used a hidden dimension of 64 with four attention heads. Control limits were computed via KDE at a confidence level of 99% (α = 0.01). The comparative results further highlight the advantage of the integrated framework over the individual DiPCA and ARMA baselines. For the ball fault with a fault size of 14 mils, DiPCA achieves an FDR of only 16.35% and ARMA achieves 66.20%, whereas the proposed method significantly improves the detection rate to 92.75%. For the inner race fault with a fault size of 14 mils, DiPCA achieves only 28.90% and ARMA achieves 55.85%, while the proposed method attains 74.20%. For the outer race fault at the 12 o’clock position with a fault size of 21 mils, DiPCA achieves only 39.65% and ARMA achieves 86.30%, compared to 95.20% achieved by the proposed method. In terms of average FDR, the proposed method achieves 94.74%, substantially outperforming DiPCA (51.61%) and ARMA (90.53%). These results collectively demonstrate the superiority and generalization capability of the proposed integrated monitoring framework in bearing fault detection tasks.

5. Discussion

Because no physical model is required, data-driven fault detection is a widely studied research topic in dynamic processes. The experimental findings presented in this study demonstrate the efficacy of the ARMA-Transformer DiPCA framework for incipient fault detection in complex industrial systems. The superior performance observed across both the TEP and the CWRU bearing datasets can be attributed to several methodological innovations inherent in the proposed approach.
Primarily, the integration of a masked linear ARMA attention mechanism within the Transformer architecture addresses a fundamental limitation of conventional process monitoring techniques. Traditional statistical methodologies, including PCA and DPCA, are predicated on linear assumptions. Consequently, they cannot adequately capture the complex temporal dependencies of industrial processes. By incorporating ARMA dynamics into the attention mechanism, the proposed method effectively models both autoregressive patterns and MA components. This enables a more precise characterization of process dynamics and fault signatures.
Furthermore, the synergistic combination of deep learning components with the established DiPCA statistical framework constitutes a robust monitoring system. While deep learning models exhibit superior capability in capturing complex nonlinear relationships, they often lack the statistical rigor required for reliable process monitoring. Here, the DiPCA foundation provides a statistical framework for residual analysis and control limit determination, ensuring that the proposed method maintains a low average FAR of 0.067% across all TEP fault scenarios while simultaneously achieving enhanced detection sensitivity. This low false alarm rate, together with the interpretable control limits derived from the statistical framework, helps reduce operator burden in practical applications.
To assess the practical deployability of the proposed method, its computational complexity and real-time performance are examined. All experiments were conducted on a computer equipped with an Intel Core Ultra 9 285H CPU and 16 GB of RAM, using Python 3.13 and PyTorch 2.8.0 (CPU-only). Across the 22 TEP fault scenarios, the average training time was 5.3 s, and the average test time was 3.4 s. The test time corresponds to processing 4000 samples, yielding an average inference latency of approximately 0.85 ms per sample. This per-sample latency is well within the sampling interval of three minutes in the TEP benchmark, confirming that the proposed method is suitable for real-time industrial process monitoring. Notably, these results were achieved without GPU acceleration, indicating that the method can be deployed on modest industrial computing hardware. The short training time further suggests that the model can be periodically retrained or updated without disrupting online operations.
The performance variations across fault categories merit closer examination. For TEP faults 3 and 9, the proposed method improves substantially on conventional approaches (FDRs of 77.80% and 69.50%, versus at most 12.85% for the baseline methods in Table 1). This suggests that the ARMA-Transformer component is particularly effective at detecting faults characterized by subtle, slowly evolving deviations. By contrast, fault 5 goes undetected (0.00% FDR). This may be attributed to the dynamic characteristics of the condenser cooling-water inlet temperature step, which presents a greater detection challenge because it introduces only a subtle shift in a single variable. Since the sliding-window SVD captures the dominant energy structure of the windowed data, this low-amplitude perturbation is easily diluted by other high-energy variables. As a result, the singular value features carry insufficient fault information for the downstream DiPCA and ARMA-Transformer stages. Fault 15 (19.40% FDR), a condenser cooling-water valve sticking fault, produces intermittent transient deviations that occur only when the valve is commanded to move. The fixed-weight AR attention captures stable temporal patterns, and the MA component further smooths short-term residual fluctuations through its exponential decay structure. The transient signatures may therefore be partially attenuated before reaching the monitoring statistics, leading to discontinuous threshold exceedances and a lower overall FDR. These cases indicate that, although the deep integration of the ARMA-Transformer with DiPCA enhances sensitivity to slowly evolving incipient faults, the reliance on singular value features and the smoothing effect of the MA decay structure can reduce responsiveness to isolated transient events and single-variable perturbations.
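The dilution argument for fault 5 can be illustrated with a small synthetic experiment. The sketch below is a stand-in built on stated assumptions, not the paper's feature extractor. It computes the singular values of each raw (uncentred) sliding window on synthetic data in which one variable has 20 times the standard deviation of the others, and it adds a 0.5-unit step to a single low-energy variable. The window length, variable count, and step size are hypothetical.

```python
import numpy as np

def sliding_window_singular_values(X, window):
    """Singular values of every length-`window` block of X (n_samples x n_vars).

    Returns shape (n_samples - window + 1, min(window, n_vars)), each row in
    descending order. Windows are NOT mean-centred here (an assumption);
    centring would remove a constant offset inside a fully post-step window.
    """
    n = X.shape[0]
    return np.array([np.linalg.svd(X[s:s + window], compute_uv=False)
                     for s in range(n - window + 1)])

rng = np.random.default_rng(1)
n, m, window = 400, 10, 50              # hypothetical sizes
X_normal = rng.standard_normal((n, m))
X_normal[:, 0] *= 20.0                  # one high-energy variable dominates each window
X_fault = X_normal.copy()
X_fault[200:, 9] += 0.5                 # small step in a single low-energy variable

S_normal = sliding_window_singular_values(X_normal, window)
S_fault = sliding_window_singular_values(X_fault, window)

# Relative change of the leading singular value caused by the step.
rel_change = np.abs(S_fault[:, 0] - S_normal[:, 0]) / S_normal[:, 0]
print(f"max relative change of the leading singular value: {rel_change.max():.4f}")
```

By Weyl's inequality, the step can move the leading singular value by at most the spectral norm of the perturbation (0.5·√50 ≈ 3.5 here). That bound is small next to a leading singular value of roughly 20·√50 ≈ 141, so the dominant feature barely registers the fault.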
For the CWRU dataset, the high FDRs across most fault conditions (average 94.74%) support the method's ability to generalize to mechanical systems. The main exception, condition 5 (14-mil inner-race defect, 74.20% FDR), suggests that certain fault locations and sizes may pose specific challenges for vibration-based detection. This could be attributable to the directional sensitivity of vibration sensors or to the distinct propagation patterns of 14-mil inner-race faults. At this intermediate defect size, the fault-induced impulses are weaker than those of larger (21-mil) defects. At the same time, the greater randomness of the larger damaged area makes the fault signature less periodic than that of smaller (7-mil) defects. The sliding-window SVD extracts dominant singular value patterns and may not fully preserve these weak, partially irregular impulses. The fixed-weight AR attention captures stable latent dynamics, and the MA component applies temporal smoothing to the residuals. For vibration signals with irregular transient impulses, this smoothing can attenuate fault signatures before they reach the monitoring statistics. Despite these effects, the FDR of 74.20% remains substantially higher than the DiPCA-only result of 28.90% and the ARMA-only result of 55.85%, indicating that the integrated framework retains considerable fault sensitivity even under less favorable conditions.
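The attenuation of isolated transients can be seen with a simplified stand-in for the MA smoothing. The sketch below uses a causal moving average with exponentially decaying, normalized weights. The window length `q`, the decay factor `lam`, and the control limit of 0.5 are hypothetical, and this is not the paper's MA attention component. A unit impulse emerges with a peak equal to the first weight (about 0.434 here), whereas a persistent unit shift passes through at full height.

```python
import numpy as np

def exp_decay_ma(residuals, q=5, lam=0.6):
    """Causal moving average with exponentially decaying weights.

    y_t = sum_{k=0}^{q-1} w_k r_{t-k}, with w_k = lam**k / sum_j lam**j.
    A simplified stand-in for MA smoothing; q and lam are hypothetical.
    """
    w = lam ** np.arange(q)
    w /= w.sum()                                    # weights sum to 1
    # Full convolution gives y[t] = sum_k w[k] r[t-k]; keep the causal part.
    return np.convolve(residuals, w)[:len(residuals)]

r_impulse = np.zeros(50)
r_impulse[20] = 1.0                                 # isolated transient deviation
r_step = np.zeros(50)
r_step[20:] = 1.0                                   # persistent deviation

limit = 0.5                                         # hypothetical control limit
impulse_peak = exp_decay_ma(r_impulse).max()        # shrinks to w_0, below the limit
step_level = exp_decay_ma(r_step)[-1]               # settles at the full shift of 1.0
print(f"impulse peak {impulse_peak:.3f}, step level {step_level:.3f}, limit {limit}")
```

Under these assumptions, the transient never crosses the limit while the sustained shift does. This is the qualitative behavior described for irregular impulses and valve-sticking transients.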
These findings have direct practical implications for industrial applications. Maintaining low FARs while achieving high detection sensitivity addresses a critical operational requirement in the process industries: false alarms trigger unnecessary production interruptions, whereas missed detections allow faults to develop into economic losses. Moreover, the method's effectiveness in detecting incipient faults suggests potential for predictive maintenance, enabling intervention before faults escalate into catastrophic failures.

6. Conclusions

This research addresses the core challenge of detecting incipient faults in complex industrial processes and proposes an enhanced DiPCA based on an ARMA-Transformer. The method combines the ARMA structure, which models temporal patterns, with the Transformer architecture, which captures long-range dependencies. Because the linear dynamic model of the original DiPCA cannot adequately characterize complex process dynamics, a masked linear ARMA attention mechanism is introduced to replace this linear dynamic component. Tests on the TEP show that the method detects the typical incipient faults 3 and 9 with FDRs of 77.80% and 69.50%, far higher than those of conventional methods. Tests on the CWRU bearing dataset, with an average FDR of 94.74% across 13 fault conditions, further confirm that the method is effective for mechanical systems. The main contribution of this work is an embedded architecture that integrates deep learning with traditional statistical process monitoring. The neural network is placed inside DiPCA as its dynamic engine rather than attached as an external module, offering an effective solution for industrial process safety monitoring. However, because the current validation relies mainly on standard benchmarks, the effectiveness of the proposed method under time-varying process dynamics and real industrial conditions remains to be investigated. Future work will include validation on real minute-level industrial datasets from large-scale chemical enterprises to assess the method's performance under actual sensor noise, variable coupling, and temporal irregularities.

Author Contributions

Conceptualization, methodology, software, validation, formal analysis, X.K., H.H. and M.C.; investigation, M.C.; resources, data curation, X.K. and H.H.; writing—original draft preparation, writing—review and editing, X.K., H.H. and M.C.; visualization, supervision, project administration, M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Malluhi, B.; Nounou, H.; Nounou, M. Enhanced Multiscale Principal Component Analysis for Improved Sensor Fault Detection and Isolation. Sensors 2022, 22, 5564.
  2. Geladi, P.; Kowalski, B.R. Partial least-squares regression: A tutorial. Anal. Chim. Acta 1986, 185, 1–17.
  3. Tong, C.; Palazoglu, A.; Yan, X. Improved ICA for process monitoring based on ensemble learning and Bayesian inference. Chemom. Intell. Lab. Syst. 2014, 135, 141–149.
  4. Kazemi, P.; Masoumian, A.; Martin, P. Fault Detection and Isolation for Time-Varying Processes Using Neural-Based Principal Component Analysis. Processes 2024, 12, 1218.
  5. Kaspar, M.H.; Ray, W.H. Dynamic PLS modelling for process control. Chem. Eng. Sci. 1993, 48, 3447–3461.
  6. Lee, J.-M.; Yoo, C.; Choi, S.W.; Vanrolleghem, P.A.; Lee, I.-B. Non-linear process monitoring using kernel principal component analysis. Chem. Eng. Sci. 2004, 59, 223–234.
  7. Rosipal, R.; Trejo, L.J. Kernel Partial Least Squares Regression in Reproducing Kernel Hilbert Space. J. Mach. Learn. Res. 2002, 2, 97–123.
  8. Lee, K.B.; Cheon, S.; Kim, C.O. A Convolutional Neural Network for Fault Classification and Diagnosis in Semiconductor Manufacturing Processes. IEEE Trans. Semicond. Manuf. 2017, 30, 135–142.
  9. Yu, J.; Yan, X. A New Deep Model Based on the Stacked Autoencoder with Intensified Iterative Learning Style for Industrial Fault Detection. Process Saf. Environ. Prot. 2021, 153, 47–59.
  10. Li, Z.; Tian, L.; Jiang, Q.; Yan, X. Distributed-ensemble stacked autoencoder model for non-linear process monitoring. Inf. Sci. 2021, 542, 302–316.
  11. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. arXiv 2021, arXiv:2106.13008.
  12. Salehinejad, H.; Sankar, S.; Barfett, J.; Colak, E.; Valaee, S. Recent Advances in Recurrent Neural Networks. arXiv 2017, arXiv:1801.01078.
  13. Zeng, L.; Jin, Q.; Lin, Z.; Zheng, C.; Wu, Y.; Wu, X.; Gao, X. Dual-attention LSTM autoencoder for fault detection in industrial complex dynamic processes. Process Saf. Environ. Prot. 2024, 185, 1145–1159.
  14. Liu, D.; Shang, J.; Chen, M. Principal Component Analysis-Based Ensemble Detector for Incipient Faults in Dynamic Processes. IEEE Trans. Ind. Inf. 2021, 17, 5391–5401.
  15. Liu, D.; Wang, M.; Chen, M. Feature Ensemble Net: A Deep Framework for Detecting Incipient Faults in Dynamical Processes. IEEE Trans. Ind. Inf. 2022, 18, 8618–8628.
  16. Wang, M.; Cheng, F.; Chen, K.; Qiu, G.; Cheng, Y.; Chen, M. Incipient fault detection based on dense ensemble net. Neurocomputing 2024, 601, 128211.
  17. Miao, Y.; Li, Z.; Chen, M. Time/Frequency Feature-Driven Ensemble Learning for Fault Detection. Processes 2024, 12, 2099.
  18. Markovic, N.; Stoetzel, T.; Staudt, V.; Kolossa, D. Hybrid Condition Monitoring for Power Converters: Learning-Based Methods With Statistical Guarantees. IEEE Access 2023, 11, 31855–31865.
  19. George, A.; Shepherd, W.; Tait, S.; Mihaylova, L.; Anderson, S.R. Explainable Deep Anomaly Detection with Sequential Hypothesis Testing for Robotic Sewer Inspection. arXiv 2025, arXiv:2507.22546.
  20. Mei, M.; Li, Y.; Qian, Y.; Jia, Z. Calibrated Prediction Set in Fault Detection with Risk Guarantees via Significance Tests. arXiv 2025, arXiv:2508.01208.
  21. Dong, Y.; Qin, S.J. A novel dynamic PCA algorithm for dynamic data modeling and process monitoring. J. Process Control 2018, 67, 1–11.
  22. Bai, Y.; Xiang, S.; Cheng, F.; Zhao, J. A Dynamic-Inner LSTM Prediction Method for Key Alarm Variables Forecasting in Chemical Process. Chin. J. Chem. Eng. 2023, 55, 266–276.
  23. Zorzi, M. On the Identification of ARMA Graphical Models. IEEE Trans. Autom. Control 2025, 70, 403–414.
  24. Yin, S.; Ding, S.X.; Haghani, A.; Hao, H.; Zhang, P. A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process. J. Process Control 2012, 22, 1567–1581.
  25. Hendriks, J.; Dumond, P.; Knox, D.A. Towards better benchmarking using the CWRU bearing fault dataset. Mech. Syst. Signal Process. 2022, 169, 108732.
  26. Tobin, R.J.; Houghton, C.J. A kernel-based calculation of information on a metric space. Entropy 2013, 15, 4540–4560.
  27. Lu, J.; Han, X.; Sun, Y.; Yang, S. Autoregressive Moving-Average Attention Mechanism for Time Series Forecasting. arXiv 2024, arXiv:2410.03159.
  28. Bathelt, A.; Ricker, N.L.; Jelali, M. Revision of the Tennessee Eastman Process Model. IFAC-PapersOnLine 2015, 48, 309–314.
  29. Han, P.; Ellefsen, A.L.; Li, G.; Holmeset, F.T.; Zhang, H. Fault Detection with LSTM-Based Variational Autoencoder for Maritime Components. IEEE Sens. J. 2021, 21, 21903–21912.
  30. Deng, X.; Xiao, L.; Liu, X.; Zhang, X. One-Dimensional Residual GANomaly Network-Based Deep Feature Extraction Model for Complex Industrial System Fault Detection. IEEE Trans. Instrum. Meas. 2023, 72, 3520013.
Figure 1. The scheme of ARMA-Transformer based on DiPCA.
Figure 2. The system structure of TEP.
Figure 3. Detection performance of fault 3 in TEP. (a) DiPCA; (b) ARMA; (c) the proposed method.
Figure 4. Detection performance of fault 9 in TEP. (a) DiPCA; (b) ARMA; (c) the proposed method.
Table 1. FDRs (%) of different methods for TEP.
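| Fault | Type | PCA (T²) | PCA (SPE) | DPCA | Transformer | LSTM-VAE | 1DRGANomaly | The Proposed Method |
|---|---|---|---|---|---|---|---|---|
| 0 | / | 1.70 | 2.10 | 0.00 | 0.00 | 0.90 | 0.45 | 0.00 |
| 1 | Step | 99.95 | 99.95 | 100.00 | 86.65 | 99.70 | 99.90 | 99.60 |
| 2 | Step | 99.95 | 99.80 | 99.80 | 86.26 | 99.10 | 99.70 | 99.30 |
| 3 | Step | 5.70 | 10.25 | 2.70 | 9.13 | 0.70 | 1.90 | 77.80 |
| 4 | Step | 99.95 | 99.95 | 100.00 | 86.91 | 0.20 | 99.95 | 99.80 |
| 5 | Step | 3.35 | 4.00 | 1.85 | 0.00 | 1.30 | 0.70 | 0.00 |
| 6 | Step | 100.00 | 100.00 | 99.30 | 32.13 | 99.30 | 99.30 | 98.60 |
| 7 | Step | 100.00 | 100.00 | 100.00 | 86.96 | 0.00 | 99.95 | 100.00 |
| 8 | Random variation | 99.65 | 99.65 | 99.65 | 85.87 | 97.75 | 99.15 | 98.60 |
| 9 | Random variation | 7.70 | 12.85 | 3.35 | 0.04 | 1.40 | 1.90 | 69.50 |
| 10 | Random variation | 93.55 | 95.30 | 91.90 | 85.30 | 2.85 | 72.71 | 97.30 |
| 11 | Random variation | 98.70 | 99.45 | 99.95 | 86.78 | 55.52 | 97.15 | 99.75 |
| 12 | Random variation | 46.50 | 61.50 | 53.55 | 84.61 | 10.24 | 40.58 | 95.65 |
| 13 | Slow drift | 97.65 | 97.55 | 99.85 | 85.91 | 95.40 | 99.15 | 98.85 |
| 14 | Sticking | 99.90 | 99.90 | 99.95 | 86.65 | 34.28 | 97.50 | 99.45 |
| 15 | Sticking | 3.05 | 2.50 | 2.45 | 0.00 | 1.05 | 1.15 | 19.40 |
| 16 | Unknown | 1.80 | 2.40 | 1.30 | 0.00 | 0.30 | 1.15 | 48.25 |
| 17 | Unknown | 99.10 | 99.15 | 99.70 | 86.26 | 62.82 | 98.45 | 99.15 |
| 18 | Unknown | 87.05 | 93.20 | 90.50 | 84.57 | 15.99 | 61.17 | 96.90 |
| 19 | Unknown | 99.90 | 99.85 | 99.90 | 86.39 | 31.98 | 97.25 | 99.20 |
| 20 | Unknown | 99.30 | 99.30 | 98.75 | 85.35 | 53.12 | 98.50 | 98.10 |
| 21 | Constant position | 2.90 | 3.65 | 1.25 | 0.00 | 0.80 | 0.25 | 47.10 |
| Average | / | 68.84 | 70.49 | 68.84 | 59.32 | 36.37 | 65.12 | 82.97 |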
Table 2. Ablation study: FDRs (%) of different component combinations on the TEP dataset.
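| Fault | Type | SVD + DiPCA | SVD + ARMA | DiPCA + ARMA | The Proposed Method |
|---|---|---|---|---|---|
| 0 | / | 0.00 | 0.00 | 0.00 | 0.00 |
| 1 | Step | 99.85 | 99.80 | 100.00 | 99.60 |
| 2 | Step | 95.90 | 99.25 | 99.35 | 99.30 |
| 3 | Step | 0.00 | 36.25 | 0.00 | 77.80 |
| 4 | Step | 98.50 | 100.00 | 100.00 | 99.80 |
| 5 | Step | 0.10 | 8.70 | 0.00 | 0.00 |
| 6 | Step | 99.30 | 99.30 | 100.00 | 98.60 |
| 7 | Step | 100.00 | 100.00 | 100.00 | 100.00 |
| 8 | Random variation | 96.65 | 98.95 | 98.65 | 98.60 |
| 9 | Random variation | 0.15 | 28.05 | 0.05 | 69.50 |
| 10 | Random variation | 85.05 | 98.10 | 78.13 | 97.30 |
| 11 | Random variation | 97.95 | 99.90 | 95.40 | 99.75 |
| 12 | Random variation | 2.35 | 96.75 | 6.86 | 95.65 |
| 13 | Slow drift | 99.00 | 98.90 | 99.05 | 98.85 |
| 14 | Sticking | 99.15 | 99.80 | 99.80 | 99.45 |
| 15 | Sticking | 0.10 | 3.65 | 0.00 | 19.40 |
| 16 | Unknown | 0.15 | 11.50 | 0.00 | 48.25 |
| 17 | Unknown | 99.30 | 99.25 | 96.70 | 99.15 |
| 18 | Unknown | 95.40 | 97.30 | 69.62 | 96.90 |
| 19 | Unknown | 99.70 | 99.50 | 97.80 | 99.20 |
| 20 | Unknown | 97.05 | 98.25 | 95.65 | 98.10 |
| 21 | Constant position | 0.10 | 11.45 | 0.00 | 47.10 |
| Average | / | 65.04 | 75.46 | 63.67 | 82.97 |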
Table 3. FDRs (%) for different fault types in the CWRU bearing dataset.
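| Index | Fault Location | Fault Size (mils) | DiPCA | ARMA | The Proposed Method |
|---|---|---|---|---|---|
| 1 | Ball | 7 | 21.05 | 99.30 | 96.45 |
| 2 | Ball | 14 | 16.35 | 66.20 | 92.75 |
| 3 | Ball | 21 | 28.30 | 96.35 | 91.65 |
| 4 | Inner race | 7 | 69.55 | 98.50 | 97.85 |
| 5 | Inner race | 14 | 28.90 | 55.85 | 74.20 |
| 6 | Inner race | 21 | 79.25 | 96.80 | 98.35 |
| 7 | Outer race, 3 o'clock position | 7 | 82.40 | 99.40 | 98.80 |
| 8 | Outer race, 3 o'clock position | 21 | 89.35 | 100.00 | 100.00 |
| 9 | Outer race, 6 o'clock position | 7 | 93.15 | 99.65 | 98.95 |
| 10 | Outer race, 6 o'clock position | 14 | 17.30 | 92.60 | 96.85 |
| 11 | Outer race, 6 o'clock position | 21 | 67.85 | 86.75 | 92.20 |
| 12 | Outer race, 12 o'clock position | 7 | 37.85 | 99.25 | 98.40 |
| 13 | Outer race, 12 o'clock position | 21 | 39.65 | 86.30 | 95.20 |
| Average FDR | / | / | 51.61 | 90.53 | 94.74 |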
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kong, X.; Huang, H.; Chen, M. Industrial Process Monitoring Based on DiPCA and ARMA-Transformer. Processes 2026, 14, 1504. https://doi.org/10.3390/pr14101504

AMA Style

Kong X, Huang H, Chen M. Industrial Process Monitoring Based on DiPCA and ARMA-Transformer. Processes. 2026; 14(10):1504. https://doi.org/10.3390/pr14101504

Chicago/Turabian Style

Kong, Xiaoran, Hanxuan Huang, and Maoyin Chen. 2026. "Industrial Process Monitoring Based on DiPCA and ARMA-Transformer" Processes 14, no. 10: 1504. https://doi.org/10.3390/pr14101504

APA Style

Kong, X., Huang, H., & Chen, M. (2026). Industrial Process Monitoring Based on DiPCA and ARMA-Transformer. Processes, 14(10), 1504. https://doi.org/10.3390/pr14101504
