1. Introduction
Fault detection plays a fundamental role in ensuring the safety and efficiency of industrial processes. Modern industrial systems are becoming increasingly complex. Therefore, the early and accurate detection of incipient faults is essential. These faults usually have subtle amplitudes and evolve slowly. If such faults remain undetected, they may progressively accumulate, leading to severe production disruptions and economic losses. Consequently, reliable and real-time fault detection technologies are required. These technologies must also handle noise and complex process dynamics effectively.
Over the past decades, numerous data-driven methods have been proposed for process monitoring. Principal Component Analysis (PCA) [1], Partial Least Squares (PLS) [2], and Independent Component Analysis (ICA) [3] represent foundational approaches. However, they rely on linear and static assumptions. These methods ignore temporal dependencies and cannot capture the gradual evolution of incipient faults. To address dynamics, extensions such as Dynamic PCA (DPCA) [4] and Dynamic PLS (DPLS) [5] were developed. They incorporate time-lagged variables but remain fundamentally linear, and the augmented dimensionality grows rapidly with lag order. This curse of dimensionality is compounded by the fact that these methods offer no remedy for nonlinear process behaviors. Nonlinear variants, including Kernel PCA (KPCA) [6] and Kernel PLS (KPLS) [7], map data into high-dimensional feature spaces to capture nonlinearities. Nevertheless, they still struggle with incipient faults whose weak amplitudes are easily masked by noise and normal variations. Moreover, quality-driven methods like PLS and its variants introduce additional detection delays due to their reliance on the availability and timeliness of quality measurements, which is particularly unsuitable for early fault scenarios. Collectively, despite these methodological advances from linear static to dynamic and nonlinear frameworks, existing approaches still fall short of providing reliable detection for incipient faults.
Deep learning methods have also been applied to process monitoring. Convolutional Neural Networks (CNNs) [8] effectively extract local features from raw data. Stacked Autoencoders (SAEs) [9] and their variants, such as Deep Ensemble SAE (DE-SAE) [10], excel at nonlinear dimensionality reduction. However, their reconstruction objectives are mainly static. This limits their capacity to model dynamic evolution patterns. More recently, the Transformer architecture [11] has gained attention. Its self-attention mechanism can capture long-range dependencies and overcomes the vanishing gradient limitations of Recurrent Neural Networks (RNNs) [12]. Subsequent studies have further integrated attention mechanisms with Long Short-Term Memory (LSTM) [13] to enhance the identification of critical time steps. Nevertheless, applying standard Transformer architectures to industrial process monitoring remains non-trivial: industrial data exhibit strong autocorrelation and high noise levels, which differ fundamentally from the linguistic data for which Transformers were originally designed. Given these limitations, deep learning methods alone are often insufficient for detecting subtle incipient faults.
Recent research has explored hybrid methods that integrate statistical and deep learning paradigms. These methods combine the interpretability of traditional multivariate statistical process monitoring (MSPM) techniques with the representational power of deep neural networks. These efforts span three primary directions.
The first focuses on ensemble learning frameworks that fuse multiple detectors or features, progressing from a PCA ensemble detector (PCAED) [14] for detecting specific incipient faults, to a time-domain Feature Ensemble Net (FENet) [15], and further to a dense FENet [16] that integrates multiple basic detectors and a time/frequency feature-driven ensemble method [17] that combines time-domain FENet with a power spectral density (PSD)-based frequency-domain network via Bayesian inference.
The second direction introduces statistical testing on network outputs to enhance detection reliability. Representative works include a hybrid framework combining ANN-based classification with sequential hypothesis testing for parameter change detection in power converters [18], a system integrating deep anomaly detection with sequential probability ratio testing for temporal evidence aggregation [19], and a method that couples significance testing with conformal prediction to construct prediction sets with formal risk guarantees for fault detection [20].
The third direction combines DiPCA [21] with recurrent networks such as LSTM for dynamic process monitoring [22]. While effective, this approach treats the neural network as an external predictor applied after DiPCA. The proposed method differs in that the ARMA-Transformer, which builds on the ARMA model [23], replaces the linear AR component inside DiPCA, preserving the autoregressive form, the unsupervised learning setting, and the statistical interpretability of the DiPCA framework.
Despite these advances, a closer examination reveals a fundamental distinction: in existing hybrid methods, the neural network and the statistical component are coupled externally, operating either as parallel detection modules in ensemble frameworks or as post hoc decision layers in reliability-oriented approaches. This external coupling limits how organically the neural network can be integrated with the statistical decomposition and residual analysis that underpin traditional MSPM frameworks.
In this work, we explore a distinct architectural paradigm: embedding an ARMA-Transformer directly as the dynamic engine inside DiPCA, replacing its original linear autoregressive component while preserving the full DiPCA statistical backbone. Unlike external coupling, the neural network becomes an integral part of the MSPM framework. This deep integration enables the network’s nonlinear dynamic modeling capability to directly enhance the latent variables extraction that is central to DiPCA, while the established statistical mechanisms of DiPCA remain intact to ensure reliable fault detection decisions. The main contributions of the proposed method are listed as follows:
- (1) A new architecture is proposed that integrates deep learning with traditional statistical process monitoring, offering an effective solution for industrial process safety monitoring. Traditional statistical methods like DiPCA are interpretable and provide rigorous control limits, but their linear dynamics cannot capture complex nonlinear patterns in modern industrial processes. Deep learning methods like Transformers excel at nonlinear modeling, but they lack statistical inferential power and are difficult to interpret. The proposed architecture combines the strengths of both: the ARMA-Transformer handles nonlinear dynamic prediction, while DiPCA provides the statistical framework for residual analysis and fault decision.
- (2) The novelty lies in embedding a neural network as the dynamic core inside DiPCA, rather than adding it as an external module. The original DiPCA relies on a linear autoregressive model to capture temporal dependencies. In this work, an ARMA-Transformer takes over this role at the same position, while the projection matrices, residual deflation, and control limit calculation of DiPCA remain unchanged.
The remainder of this paper is organized as follows. Section 2 gives the problem formulation. Section 3 introduces the details of the proposed ARMA-Transformer DiPCA framework. Section 4 presents experimental results and analysis on the Tennessee Eastman Process (TEP) [24] and the Case Western Reserve University (CWRU) [25] bearing dataset. Finally, Section 5 discusses the findings, and Section 6 concludes the paper and outlines future work.
2. Problem Formulation
Despite extensive research on fault detection in industrial processes, the reliable detection of incipient faults under unsupervised conditions remains a significant challenge. These faults are typically characterized by their small amplitude and susceptibility to process noise and operating fluctuations. In addition, their fault features often manifest only in a subset of process variables. Unlike abrupt faults that produce sustained large deviations, incipient faults evolve slowly with amplitudes comparable to normal process variations, making them statistically indistinguishable from normal behavior when examined on a per-sample basis.
Existing methods such as DiPCA have demonstrated effectiveness in dynamic process monitoring. However, the linear autoregressive component at the core of DiPCA inherently limits its sensitivity to the subtle, nonlinear evolution patterns of incipient faults in complex industrial environments. This limitation becomes particularly pronounced under unsupervised conditions without labeled fault data.
To address this critical limitation, an enhanced DiPCA framework is proposed in this paper. Compared to traditional DiPCA, the proposed method introduces two key innovations: sliding-window singular value decomposition for dynamic feature extraction, and a masked linear ARMA attention model that replaces the linear dynamic component to generate more informative dynamic residuals. By combining residual analysis with multi-index thresholds and fusion rules, and determining control limits via kernel density estimation (KDE) [26], the framework achieves improved detection performance and numerical robustness, particularly for faults 3 and 9 in the TEP, as well as on the CWRU bearing dataset. It should be noted that simply appending deep learning models as external modules to an MSPM framework does not resolve the fundamental limitation. The linear dynamic component of DiPCA remains unchanged, and the statistical decomposition and residual analysis that define the monitoring backbone are not directly influenced by the deep learning component. A more effective approach is to replace the linear dynamic core of DiPCA with a neural network that serves as its internal dynamic engine, allowing the nonlinear modeling capacity to directly participate in the latent variables extraction and residual generation that are central to fault detection.
3. ARMA-Transformer Based on DiPCA
In this section, we describe the proposed ARMA-Transformer based on DiPCA. First, sliding-window singular-value sequences are computed to construct the features. Then, the dynamic latent variables are extracted from these features through DiPCA. Finally, an autoregressive Transformer with masked linear ARMA components is employed for multi-step forecasting of the latent variables. The overall framework of the proposed ARMA-Transformer based on DiPCA is illustrated in Figure 1.
3.1. Sliding-Window Singular Values
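For the real-time data stream of industrial processes, set an appropriate window size $w$, and select the $w$ samples immediately preceding the current sample to form a sliding-window matrix. Assume the sampling instant is $k$, and the number of sensors is $m$. Hence, the sliding-window matrix is given by
$$X_k = \begin{bmatrix} \mathbf{x}(k-w) & \mathbf{x}(k-w+1) & \cdots & \mathbf{x}(k-1) \end{bmatrix}^{\top} \in \mathbb{R}^{w \times m}. \tag{1}$$
The standardization of $X_k$ yields the matrix $\tilde{X}_k$:
$$\tilde{X}_k = \left(X_k - \mathbf{1}\boldsymbol{\mu}_k\right)\operatorname{diag}(\mathbf{d}_k)^{-1}, \tag{2}$$
where $\mathbf{1} \in \mathbb{R}^{w}$ denotes a column vector of ones, and $\boldsymbol{\mu}_k$ and $\mathbf{d}_k$ are row vectors of column-wise means and standard deviations:
$$\boldsymbol{\mu}_k = \frac{1}{w}\sum_{i=1}^{w}\mathbf{x}_i, \tag{3}$$
$$\mathbf{d}_k = \left[\frac{1}{w-1}\sum_{i=1}^{w}\left(\mathbf{x}_i-\boldsymbol{\mu}_k\right)\odot\left(\mathbf{x}_i-\boldsymbol{\mu}_k\right)\right]^{\circ 1/2}, \tag{4}$$
where $\mathbf{x}_i$ is the i-th row of $X_k$, and the product $\odot$ and the square root are taken element-wise. After standardization, each column of $\tilde{X}_k$ has zero mean and unit variance.
Performing singular value decomposition on the standardized sliding-window matrix $\tilde{X}_k$ yields
$$\tilde{X}_k = U_k \Sigma_k V_k^{\top}, \tag{5}$$
where $U_k \in \mathbb{R}^{w \times w}$ and $V_k \in \mathbb{R}^{m \times m}$ are orthogonal, and (for $w \ge m$) the diagonal entries of $\Sigma_k$ are the $m$ non-negative singular values $\sigma_1(k) \ge \sigma_2(k) \ge \cdots \ge \sigma_m(k)$ at time index $k$. By sliding the window along the time series, we collect, for each $j = 1, \dots, m$, the sequence $\{\sigma_j(k)\}_k$, forming $m$ singular-value time series. Selecting the top $r$ of these yields a multivariate sequence
$$\mathbf{s}_k = \begin{bmatrix} \sigma_1(k) & \sigma_2(k) & \cdots & \sigma_r(k) \end{bmatrix}, \tag{6}$$
whose values, stacked over time as the rows of the feature matrix $S$, serve as the input to the DiPCA stage. In this work, $r = 5$ is used. These singular value sequences compactly represent the dominant energy structure within each window. DiPCA is then applied to extract latent variables that exhibit strong temporal autocorrelation from this representation.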
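The windowing, standardization, and SVD steps of this subsection can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the parameter name `r` for the number of retained singular values, the use of the sample standard deviation (`ddof=1`), the guard for constant columns, the choice to let each window end at the most recent available row, and the synthetic data are all hypothetical.

```python
import numpy as np


def sliding_window_singular_values(X, w, r):
    """Return one row per window holding the top-r singular values of the
    standardized window of width w taken from X (shape: samples x sensors)."""
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    features = []
    for k in range(w, n_samples + 1):
        window = X[k - w:k]                      # w consecutive samples
        mu = window.mean(axis=0)                 # column-wise means
        sd = window.std(axis=0, ddof=1)          # column-wise standard deviations
        sd = np.where(sd > 0, sd, 1.0)           # assumption: leave constant columns unscaled
        Z = (window - mu) / sd                   # standardized window
        sv = np.linalg.svd(Z, compute_uv=False)  # singular values in descending order
        features.append(sv[:r])
    return np.vstack(features)


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))               # synthetic stand-in for process data
S_demo = sliding_window_singular_values(X_demo, w=50, r=5)
print(S_demo.shape)  # (151, 5)
```

Each row of the returned array plays the role of one feature vector for the DiPCA stage. Because every standardized column has unit sample variance, the squared singular values of a window sum to the number of sensors times `w - 1`, which gives a quick sanity check on the decomposition.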
3.2. DiPCA
DiPCA integrates dynamic data modeling with PCA to extract latent variables with autocorrelation structures from high-dimensional process data.
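In the dynamic latent variables extraction stage, it iteratively extracts $l$ dynamic latent variables $\mathbf{t}_j = S\mathbf{w}_j$, $j = 1, \dots, l$, where $\mathbf{t}_j$ represents the $j$-th latent variable scores and $\mathbf{w}_j$ is the $j$-th projection vector. The projection vector $\mathbf{w}_j$ is obtained by solving the optimization problem:
$$\max_{\mathbf{w}_j,\,\boldsymbol{\beta}} \; J = \sum_{i=s+1}^{N} t_{i,j} \sum_{d=1}^{s} \beta_d\, t_{i-d,j} \tag{7}$$
subject to the constraints $\|\mathbf{w}_j\| = 1$ and $\|\boldsymbol{\beta}\| = 1$, where $s$ is the lag order, $\boldsymbol{\beta} = [\beta_1, \dots, \beta_s]^{\top}$ is the autoregressive coefficient vector, and $t_{i,j} = \mathbf{s}_i \mathbf{w}_j$ is the $i$-th score of the $j$-th component, where $\mathbf{s}_i$ denotes the i-th row of the singular value feature matrix S, which has $N$ rows.
The optimization is solved iteratively through alternating updates. With $\mathbf{w}_j$ fixed, $\boldsymbol{\beta}$ is obtained by solving the least squares problem:
$$\boldsymbol{\beta} = \arg\min_{\boldsymbol{\beta}} \sum_{i=s+1}^{N} \Big( t_{i,j} - \sum_{d=1}^{s} \beta_d\, t_{i-d,j} \Big)^{2} \tag{8}$$
and then normalized as $\boldsymbol{\beta} \leftarrow \boldsymbol{\beta}/\|\boldsymbol{\beta}\|$. Then, with $\boldsymbol{\beta}$ fixed, $\mathbf{w}_j$ is updated via gradient ascent followed by normalization, where the gradient is given by
$$\nabla_{\mathbf{w}_j} J = \sum_{i=s+1}^{N} \sum_{d=1}^{s} \beta_d \left( t_{i-d,j}\, \mathbf{s}_i^{\top} + t_{i,j}\, \mathbf{s}_{i-d}^{\top} \right). \tag{9}$$
This process iterates until convergence is achieved by monitoring relative changes in $\mathbf{w}_j$ and $\boldsymbol{\beta}$. Upon convergence, the loading vector is computed as
$$\mathbf{p}_j = \frac{S^{\top}\mathbf{t}_j}{\mathbf{t}_j^{\top}\mathbf{t}_j} \tag{10}$$
and the residual matrix is updated by removing the extracted component:
$$S \leftarrow S - \mathbf{t}_j \mathbf{p}_j^{\top}. \tag{11}$$
In the projection and reconstruction matrices stage, the final projection and loading matrices are
$$W = [\mathbf{w}_1, \dots, \mathbf{w}_l], \tag{12}$$
$$P = [\mathbf{p}_1, \dots, \mathbf{p}_l]. \tag{13}$$
The latent variables and data reconstruction are given by
$$T = S R, \tag{14}$$
$$\hat{S} = T P^{\top}, \tag{15}$$
where $S$ here denotes the original (undeflated) feature matrix and $R = W\left(P^{\top} W\right)^{-1}$ is the projection matrix.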
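The alternating procedure described in this subsection (a least-squares step for the autoregressive coefficients, a normalized gradient-ascent step for the projection vector, then loading computation and deflation) can be sketched as follows. This is an illustrative sketch only, not the authors' code: the function name `dipca_fit`, the centring of the features, the starting vectors, the fixed step size, the stopping rule, and the synthetic AR(1) data are all assumptions.

```python
import numpy as np


def dipca_fit(S, s, l, n_iter=200, step=0.1, tol=1e-8):
    """Sketch of DiPCA extraction: returns W, P, R for feature matrix S (rows = time)."""
    X = np.asarray(S, dtype=float)
    X = X - X.mean(axis=0)                 # centring is an assumption of this sketch
    N, m = X.shape
    W = np.zeros((m, l))
    P = np.zeros((m, l))
    for j in range(l):
        w = np.ones(m) / np.sqrt(m)        # arbitrary unit-norm starting point
        beta = np.ones(s) / np.sqrt(s)
        for _ in range(n_iter):
            t = X @ w
            # Column d holds the scores lagged by d + 1 steps.
            T_lag = np.column_stack([t[s - d - 1:N - d - 1] for d in range(s)])
            # beta step: least squares of current scores on lagged scores, then normalize.
            b, *_ = np.linalg.lstsq(T_lag, t[s:], rcond=None)
            b_norm = np.linalg.norm(b)
            beta_new = b / b_norm if b_norm > 0 else beta
            # w step: gradient of sum_i t_i * sum_d beta_d * t_{i-d} with respect to w.
            X_lag = sum(beta_new[d] * X[s - d - 1:N - d - 1] for d in range(s))
            A = X[s:].T @ X_lag
            grad = (A + A.T) @ w
            w_new = w + step * grad / (np.linalg.norm(grad) + 1e-12)
            w_new = w_new / np.linalg.norm(w_new)
            change = np.linalg.norm(w_new - w) + np.linalg.norm(beta_new - beta)
            w, beta = w_new, beta_new
            if change < tol:
                break
        t = X @ w
        p = X.T @ t / (t @ t)              # loading vector
        X = X - np.outer(t, p)             # deflation: remove the extracted component
        W[:, j], P[:, j] = w, p
    R = W @ np.linalg.inv(P.T @ W)         # projection matrix
    return W, P, R


rng = np.random.default_rng(1)
noise = rng.normal(size=(300, 5))
S_demo = np.zeros_like(noise)
for k in range(1, 300):                    # synthetic AR(1) features
    S_demo[k] = 0.8 * S_demo[k - 1] + noise[k]
W, P, R = dipca_fit(S_demo, s=2, l=3)
T = (S_demo - S_demo.mean(axis=0)) @ R     # latent scores from the centred features
print(T.shape)  # (300, 3)
```

Because each component is removed by deflation before the next one is extracted, the score columns of `T` are mutually orthogonal whatever projection vectors the inner loop settles on, which is a convenient check that the projection matrix was assembled correctly.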
3.3. Dynamic Prediction with ARMA Attention Formulation
The latent variables T extracted by DiPCA capture the dominant autocorrelated dynamics of the process, but the linear AR model within DiPCA cannot fully characterize the residual nonlinear temporal patterns. To address this, the dynamic prediction of latent variables employs an ARMA attention mechanism that integrates both autoregressive (AR) and moving average (MA) components. For the latent variables
obtained from DiPCA via Equation (14), the autoregressive component follows the AR structure of the classical ARMA model, where past observations are weighted by fixed coefficients:
where
represents the value vector at time
, and
denotes the attention weights derived from fixed parameter matrices rather than dynamic query–key interactions. This formulation constitutes a causal linear layer with a strictly lower triangular structure.
Following the moving average structure of the ARMA model, the prediction error is decomposed into past innovations weighted by MA coefficients:
where the AR residual term
is decomposed as
In the indirect MA weight generation stage, to avoid computationally expensive matrix operations while maintaining the MA structure, an indirect weight generation scheme is employed [27]. Instead of explicitly computing
, the AR residuals
are utilized as value inputs for the MA component:
This establishes the matrix relationship
with the implicit transformation:
The MA weight matrix
exhibits desirable temporal decay properties, with elements satisfying
where
represents the characteristic decay parameter. This decay pattern matches the behavior of a standard MA process, where the influence of past innovations diminishes with increasing time lag.
In the MA component via the Transformed Queries and Keys stage, the MA component
is computed using a dedicated attention mechanism applied to the AR residuals. First, the queries and keys for the MA part are transformed via specialized functions:
where
and
are element-wise transformation functions designed to shape the distribution of attention weights for the MA dynamics. The MA output is then obtained by applying the transformed attention weights to the residual values
:
This formulation efficiently captures the short-term dependencies and error corrections in the residual sequence, corresponding to the MA characteristic within the attention framework.
In the final ARMA output and prediction stage, the final output of the ARMA attention block is generated by integrating the autoregressive and MA components, followed by a linear projection:
where
is a learnable output projection matrix. This combined output
serves as the one-step-ahead prediction for the latent variable scores at time
, effectively leveraging both the historical pattern (via AR) and the recent innovation sequence (via MA) to enhance dynamic prediction accuracy.
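To make the AR-plus-MA structure concrete, the sketch below implements a simplified stand-in: a causal linear layer with fixed, strictly lower-triangular weights for the AR part, and geometrically decaying weights applied to past AR residuals for the MA part, followed by an output projection. It is not the paper's formulation. In particular, the weights here are plain arrays rather than learned parameters, the MA weights are written explicitly instead of being generated indirectly from transformed queries and keys, and the names `arma_predict`, `A`, `theta`, and `W_out` are hypothetical.

```python
import numpy as np


def arma_predict(V, A, theta, W_out):
    """V: (L, d) latent-score sequence; A: (L, L) AR weights; theta: MA decay in (0, 1);
    W_out: (d, d) output projection. Row t of the result uses only rows < t of V."""
    L = V.shape[0]
    A_causal = np.tril(A, k=-1)                  # keep only the strictly lower triangle
    o_ar = A_causal @ V                          # AR part: weighted sum of past values
    resid = V - o_ar                             # AR residuals (innovations)
    lag = np.arange(L)[:, None] - np.arange(L)[None, :]
    # MA weights: theta**lag for past residuals, zero for present and future ones.
    B = np.where(lag > 0, theta ** np.clip(lag, 1, None), 0.0)
    o_ma = B @ resid                             # MA part: decaying weights on past residuals
    return (o_ar + o_ma) @ W_out                 # combine and project


rng = np.random.default_rng(2)
V_demo = rng.normal(size=(10, 3))                # synthetic latent scores
A_demo = rng.normal(scale=0.1, size=(10, 10))    # arbitrary fixed AR weights
W_demo = np.eye(3)                               # identity projection for the demo
out_demo = arma_predict(V_demo, A_demo, theta=0.5, W_out=W_demo)
print(out_demo.shape)  # (10, 3)
```

The strict lower-triangular mask is what makes row `t` a genuine one-step-ahead prediction: changing any value at or after time `t` leaves that row unchanged.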
3.4. Control Limit Calculation and Fault Diagnosis
Once the ARMA-Transformer predicts the latent scores, the prediction error is decomposed into two complementary components: the dynamic residual in the latent space and the static residual in the original feature space. These residuals form the basis for fault detection. The control limits for monitoring statistics are established through KDE to characterize the empirical distribution of normal operation data. For each of the three monitoring statistics, the corresponding control limit $J_{\lim}$ is determined as the $(1-\alpha)$-quantile of the KDE-fitted distribution. The KDE-based probability density function is formulated as
$$\hat{f}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right), \tag{27}$$
where $K(\cdot)$ is the Gaussian kernel function, $h$ is the bandwidth parameter, and $x_i$, $i = 1, \dots, n$, are the observed monitoring statistics from normal operation data.
Three monitoring statistics are employed for comprehensive fault detection:
where
represents the dynamic residual vector.
where
denotes the PCA-reduced static residual.
where
represents the static residual vector.
A process observation is flagged as faulty when any monitoring statistic exceeds its control limit.
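The KDE-based control limit and the "any statistic exceeds its limit" rule can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the function names, the normal-reference bandwidth rule `1.06 * std * n**(-1/5)`, the bisection search for the quantile, and the synthetic squared-Gaussian statistics are illustrative choices, not details taken from the paper.

```python
import random
from statistics import NormalDist, stdev


def kde_control_limit(stats, alpha=0.01, bandwidth=None):
    """Return the (1 - alpha)-quantile of a Gaussian-kernel KDE fitted to stats."""
    n = len(stats)
    if bandwidth is None:
        bandwidth = 1.06 * stdev(stats) * n ** (-0.2)   # assumed rule-of-thumb bandwidth
    std_normal = NormalDist()

    def kde_cdf(x):
        # The CDF of a Gaussian-kernel KDE is the average of shifted normal CDFs.
        return sum(std_normal.cdf((x - xi) / bandwidth) for xi in stats) / n

    lo = min(stats) - 10.0 * bandwidth
    hi = max(stats) + 10.0 * bandwidth
    for _ in range(60):                                  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def is_faulty(values, limits):
    """Flag an observation when any monitoring statistic exceeds its control limit."""
    return any(v > lim for v, lim in zip(values, limits))


random.seed(0)
normal_stats = [random.gauss(0.0, 1.0) ** 2 for _ in range(500)]   # synthetic statistics
limit = kde_control_limit(normal_stats, alpha=0.01)
print(is_faulty([limit + 1.0, 0.0, 0.0], [limit, limit, limit]))  # True
```

In practice one limit would be computed per monitoring statistic from its own normal-operation values; the demo reuses a single limit three times only to keep the example short.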
3.5. Algorithms
The operation of the ARMA-Transformer based on DiPCA consists of two sequential parts. First, historical normal data is used to train the model and compute statistical control limits, as formalized in Algorithm 1. Subsequently, the trained model is deployed for continuous, real-time incipient fault detection on new data samples, with the exact steps provided in Algorithm 2.
| Algorithm 1. Offline Training |
| Input: normal training data; w: sliding window width; number of retained singular values; s: lag order; l: number of latent variables |
| Output: P, W, R: DiPCA matrices; trained ARMA-Transformer parameters; control limits of the three monitoring statistics |
| 1: Normalize sliding windows |
| 2: Perform SVD on each window to obtain the singular-value sequences, and retain the top ones to form S |
| 3: Extract l dynamic latent variables T and obtain P, W (Equations (7)–(11)) |
| 4: Compute R (Equations (14) and (15)) |
| 5: for each training epoch do |
| 6: Input T into transformer |
| 7: Compute loss between predicted and actual T (Equations (16)–(26)) |
| 8: Backpropagate loss and update model parameters |
| 9: end for |
| 10: Perform forward pass on T to obtain the predicted latent scores |
| 11: Compute the three monitoring statistics on normal data (Equations (28)–(30)) |
| 12: Compute the corresponding control limits (Equation (27)) |
| 13: return P, W, R, the trained parameters, and the control limits |
| Algorithm 2. Online Detection |
| Input: new sample; P, W, R and the trained ARMA-Transformer parameters; s: lag order; control limits of the three monitoring statistics |
| Output: status of the new sample |
| 1: Form the sliding window for the new sample, perform SVD, and form S |
| 2: Project to latent space T (Equation (14)) |
| 3: for each predicted step do |
| 4: Predict T (Equations (16)–(26)) |
| 5: Compute the three monitoring statistics (Equations (28)–(30)) |
| 6: if any monitoring statistic exceeds its control limit then |
| 7: Mark step as abnormal |
| 8: else |
| 9: Mark step as normal |
| 10: end if |
| 11: end for |
| 12: if any abnormal step exists then |
| 13: status = fault |
| 14: else |
| 15: status = normal |
| 16: end if |
| 17: return status of the new sample |
4. Simulation
In this section, the well-known TEP benchmark and the CWRU bearing dataset are used to evaluate the performance of the proposed methodology.
4.1. Tennessee Eastman Process
This chemical plant simulator, originally created at Eastman Chemical Company, is widely accepted in process systems engineering for benchmarking fault detection algorithms because it exhibits pronounced nonlinear and time-varying behavior. The simulation integrates five unit operations: a reactor, product condenser, vapor–liquid separator, recycle compressor, and stripper. Thirty-three signals are logged: twenty-two continuous measurements (XMEAS 1–22) and eleven controller outputs (XMV 1–11). Together, they describe the essential dynamic and operational states of the plant. Twenty-one fault scenarios are predefined within the simulator. In this study, faults 3 and 9 are mainly examined. Fault 3 is a step change in the D feed temperature, and fault 9 is a random variation in the D feed temperature.
Figure 2 shows the system structure of TEP.
The simulation data were generated using a closed-loop version of the TEP [28], available at http://depts.Washington.edu/control/LARRY/TE/download.html (accessed on 10 April 2026). Both the training and test datasets were simulated for 200 h with a sampling interval of three minutes. For each test dataset, a fault was introduced after 100 h of simulation. With the exception of fault 6, 4000 training samples and 4000 test samples were obtained for each fault type, with the fault present in the last 2000 test samples. It should be noted that fault 0 represents the normal operating condition. The fault detection rate (FDR) was calculated based on the last 2000 sampling instants of each test dataset, while the false alarm rate (FAR) was evaluated using the fault 0 dataset.
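The sample counts and evaluation metrics in this setup follow from simple arithmetic, sketched below. The helper names and the synthetic alarm sequence are hypothetical; only the simulation length, the sampling interval, and the fault onset time come from the text.

```python
def samples_for(hours, interval_minutes):
    """Number of samples logged over the given duration."""
    return int(hours * 60 / interval_minutes)


def fault_detection_rate(alarms, fault_start):
    """Percentage of samples flagged among those recorded after fault onset."""
    faulty = alarms[fault_start:]
    return 100.0 * sum(faulty) / len(faulty)


def false_alarm_rate(alarms):
    """Percentage of samples flagged in a fault-free (fault 0) run."""
    return 100.0 * sum(alarms) / len(alarms)


n_total = samples_for(200, 3)        # 200 h at one sample every 3 min
fault_start = samples_for(100, 3)    # fault introduced after 100 h
# Synthetic alarm sequence: no alarms before onset, three of every four samples after it.
alarms = [False] * fault_start + [i % 4 != 0 for i in range(n_total - fault_start)]
print(n_total, fault_start, fault_detection_rate(alarms, fault_start))  # 4000 2000 75.0
```

The FDR is thus computed over the 2000 samples after fault onset, while the FAR uses every sample of the fault-free run.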
A comprehensive performance comparison of the proposed method against other fault detection approaches on the TEP dataset is presented in Table 1. The compared methods include the traditional statistical approaches PCA and DPCA, as well as the deep learning-based approaches Transformer, LSTM-VAE [29], and 1DRGANomaly [30]. The proposed method achieves an average FDR of 82.97%, significantly outperforming PCA and DPCA. For the challenging incipient faults 3 and 9, the Transformer achieves FDRs of only 2.70% and 3.35%, LSTM-VAE achieves 9.13% and 0.04%, and 1DRGANomaly achieves 1.90% and 1.90%, while the proposed method attains 77.80% and 69.50%. These results highlight the advantage of embedding the ARMA-Transformer within the DiPCA statistical framework over both traditional and standalone deep learning approaches.
An ablation study quantifying the contribution of each component is presented in Table 2. For the SVD-based methods, a sliding-window size of 1500 was used, and the top five singular values were retained from each window. All models were trained with a fixed random seed of 42, the Adam optimizer with a learning rate of 0.001, and 100 training epochs. The lag order was s = 2, and the number of latent variables was l = 5 for DiPCA. The ARMA-Transformer used a hidden dimension of 64 with four attention heads. Control limits were computed via KDE at a confidence level of 99% (α = 0.01). The results show that the full model achieves an average FDR of 82.97%. Removing the SVD-based feature extraction causes a drop of 19.30 percentage points, replacing the ARMA-Transformer with the original linear AR component of DiPCA reduces the average FDR by 17.93 percentage points, and removing the DiPCA latent variable structure leads to a decrease of 7.51 percentage points. These results demonstrate that all three components contribute meaningfully to the overall performance, with SVD providing noise suppression and compact dynamic representation, DiPCA extracting autocorrelated latent structure, and the ARMA-Transformer capturing nonlinear temporal patterns that the linear AR model cannot characterize.
For detailed illustration, the detection trajectories obtained by the three methods are compared in Figure 3 and Figure 4, where the red dashed line denotes the control limit and the green dotted line marks the time instant at which the fault is introduced.
4.2. CWRU Bearing Dataset
The CWRU bearing dataset serves as an internationally recognized benchmark for validating bearing fault diagnosis algorithms. In this study, vibration signals were acquired at a sampling frequency of 48 kHz from the experimental platform, which consists of a motor, a torque transducer, and related data acquisition equipment. The dataset includes artificially introduced single-point faults on the inner race, outer race (at 3, 6, and 12 o’clock positions), and rolling elements, generated via electrical discharge machining to simulate realistic localized damage. A total of 13 distinct fault conditions are systematically categorized based on fault location and size, providing a comprehensive foundation for evaluating diagnostic model performance under varying fault types and severity levels.
The performance of the proposed method on the CWRU dataset is comprehensively summarized in Table 3. To ensure a fair comparison, all deep models were configured with consistent hyperparameters: a fixed random seed of 42, a sliding-window size of 1000, the Adam optimizer with a learning rate of 0.001, and 100 training epochs. For the proposed method, the top 5 singular values were retained, the lag order was s = 2, the number of latent variables was l = 5 for DiPCA, and the ARMA-Transformer used a hidden dimension of 64 with four attention heads. Control limits were computed via KDE at a confidence level of 99% (α = 0.01). The comparative results further highlight the advantage of the integrated framework over the individual DiPCA and ARMA baselines. For the ball fault with a fault size of 14 mils, DiPCA achieves an FDR of only 16.35% and ARMA achieves 66.20%, whereas the proposed method significantly improves the detection rate to 92.75%. For the inner race fault with a fault size of 14 mils, DiPCA achieves only 28.90% and ARMA achieves 55.85%, while the proposed method attains 74.20%. For the outer race fault at the 12 o’clock position with a fault size of 21 mils, DiPCA achieves only 39.65% and ARMA achieves 86.30%, compared to 95.20% achieved by the proposed method. In terms of average FDR, the proposed method achieves 94.74%, substantially outperforming DiPCA (51.61%) and ARMA (90.53%). These results collectively demonstrate the superiority and generalization capability of the proposed integrated monitoring framework in bearing fault detection tasks.
5. Discussion
Because no physical model is required, data-driven fault detection is a widely studied research topic in dynamic processes. The experimental findings presented in this study demonstrate the efficacy of the ARMA-Transformer DiPCA framework for incipient fault detection in complex industrial systems. The superior performance observed across both the TEP and the CWRU bearing datasets can be attributed to several methodological innovations inherent in the proposed approach.
Primarily, the integration of a masked linear ARMA attention mechanism within the Transformer architecture addresses a fundamental limitation of conventional process monitoring techniques. Traditional statistical methodologies, including PCA and DPCA, are predicated on linear assumptions. Consequently, they cannot adequately capture the complex temporal dependencies of industrial processes. By incorporating ARMA dynamics into the attention mechanism, the proposed method effectively models both autoregressive patterns and MA components. This enables a more precise characterization of process dynamics and fault signatures.
Furthermore, the synergistic combination of deep learning components with the established DiPCA statistical framework constitutes a robust monitoring system. While deep learning models exhibit superior capability in capturing complex nonlinear relationships, they often lack the statistical rigor required for reliable process monitoring. Here, the DiPCA foundation provides a statistical framework for residual analysis and control limit determination, ensuring that the proposed method maintains a low average FAR of 0.067% across all TEP fault scenarios while simultaneously achieving enhanced detection sensitivity. This low false alarm rate, together with the interpretable control limits derived from the statistical framework, helps reduce operator burden in practical applications.
To assess the practical deployability of the proposed method, its computational complexity and real-time performance are examined. All experiments were conducted on a computer equipped with an Intel Core Ultra 9 285H CPU and 16 GB of RAM, using Python 3.13 and PyTorch 2.8.0 (CPU-only). Across the 22 TEP fault scenarios, the average training time was 5.3 s, and the average test time was 3.4 s. The test time corresponds to processing 4000 samples, yielding an average inference latency of approximately 0.85 ms per sample. This per-sample latency is well within the sampling interval of three minutes in the TEP benchmark, confirming that the proposed method is suitable for real-time industrial process monitoring. Notably, these results were achieved without GPU acceleration, indicating that the method can be deployed on modest industrial computing hardware. The short training time further suggests that the model can be periodically retrained or updated without disrupting online operations.
The observed performance variations across different fault categories merit further investigation. For TEP faults 3 and 9, the proposed method demonstrates substantial improvements over conventional approaches, suggesting that the ARMA-Transformer component is particularly effective at detecting faults characterized by subtle, slowly evolving deviations. The comparatively lower performance on fault 5 relative to faults 3 and 9 may be attributed to the specific dynamic characteristics associated with condenser cooling-water temperature perturbations, which may present greater detection challenges. For fault 5, the step change introduces only a subtle shift in a single variable. Since the sliding-window SVD captures the dominant energy structure of the windowed data, this low-amplitude perturbation is easily diluted by other high-energy variables, resulting in singular value features that carry insufficient fault information for the downstream DiPCA and ARMA-Transformer stages. For fault 15, which also exhibits modest detection rates, the valve sticking fault produces intermittent transient deviations that occur only when the valve is commanded to move. The fixed-weight AR attention captures stable temporal patterns, and the MA component further smooths short-term residual fluctuations through its exponential decay structure. As a result, the transient signatures may be partially attenuated before reaching the monitoring statistics, leading to discontinuous threshold exceedances and a lower overall FDR. These cases indicate that while the deep integration of the ARMA-Transformer with DiPCA enhances sensitivity to slowly evolving incipient faults, the reliance on singular value features and the smoothing effect of the MA decay structure can reduce responsiveness to isolated transient events and single-variable perturbations.
For the CWRU dataset, the consistently high performance across the majority of fault conditions substantiates the method’s generalization capability to mechanical systems. The exception of condition 5 (74.20% FDR) indicates that certain fault orientations may pose specific challenges for vibration-based detection methodologies. This phenomenon could be attributable to the directional sensitivity of vibration sensors or to distinct propagation patterns associated with faults at the inner race with a 14 mil defect. At this intermediate defect size (14 mils), the fault-induced impulses are weaker than those of larger defects (21 mils), while the increased randomness from the larger damaged area makes the fault signature less periodic than that of smaller defects (7 mils). The sliding-window SVD extracts dominant singular value patterns and may not fully preserve these weak and partially irregular impulses. The fixed-weight AR attention captures stable latent dynamics, and the MA component applies temporal smoothing to the residuals. For vibration signals with irregular transient impulses, this smoothing can attenuate fault signatures before they reach the monitoring statistics. Despite these effects, the FDR of 74.20% remains substantially higher than the DiPCA-only result of 28.90% and the ARMA-only result of 55.85%, indicating that the integrated framework retains considerable fault sensitivity even under suboptimal conditions.
The practical implications of these findings hold significant relevance for industrial applications. The ability of the proposed method to maintain low FARs while achieving high detection sensitivity addresses a critical operational requirement in process industries. False alarms and undetected faults can lead to unnecessary production interruptions and economic losses. Moreover, the method’s efficacy in detecting small-scale faults underscores its potential for predictive maintenance applications, facilitating early intervention prior to the escalation of faults into catastrophic failures.
6. Conclusions
This research addresses the core challenge of detecting incipient faults in complex industrial processes. It proposes an enhanced DiPCA based on an ARMA-Transformer. The method combines the ARMA structure for modeling temporal patterns with the Transformer architecture for capturing long-range dependencies. To overcome the limited sensitivity of the linear dynamic component of DiPCA to nonlinear fault evolution, a masked linear ARMA attention mechanism is introduced to replace that component. Tests on the TEP show that this method detects typical incipient faults 3 and 9 at much higher rates than conventional methods. In addition, tests on the CWRU bearing dataset confirm that the method works well for mechanical systems. The main contribution of this work is an embedded architecture that integrates deep learning with traditional statistical process monitoring. The neural network is placed inside DiPCA as its dynamic engine rather than added as an external module, offering an effective solution for industrial process safety monitoring. However, the effectiveness of the proposed method under time-varying process dynamics and real industrial conditions remains to be further investigated, as current validation primarily focuses on standard benchmarks. Future work will include validation on real minute-level industrial datasets from large-scale chemical enterprises to assess the method’s performance under actual sensor noise, variable coupling, and temporal irregularities.