1. Introduction
Anomaly detection in Prognostics and Health Management (PHM) refers to the automated identification of deviations from expected system behavior and is essential for enabling proactive maintenance and preventing unexpected failures. Early approaches primarily relied on rule-based systems and handcrafted indicators. As multivariate sensor data became more widely available, these were largely replaced by data-driven machine learning techniques such as Support Vector Machines (SVM), Random Forests, k-Nearest Neighbors (KNN), and Principal Component Analysis (PCA) [1, 2, 3, 4]. In recent years, deep learning models including Autoencoders (AE), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) networks have gained prominence due to their ability to capture complex nonlinear and temporal dependencies in time-series data [
5,
6]. However, despite their strong performance, many existing approaches remain sensitive to operating condition variability and offer limited interpretability, motivating the development of more robust and structurally informed anomaly detection frameworks. Anomaly detection can be formulated as either a supervised or unsupervised learning task. While supervised approaches rely on labeled samples from both normal and faulty conditions, such annotations are often scarce, costly, or unreliable in complex engineering systems. As a result, anomaly detection in PHM is predominantly addressed in an unsupervised setting in which models are trained on healthy data to characterize normal operating behavior and identify deviations from it [
6]. This formulation places strong emphasis on learning robust representations of normal dynamics that remain sensitive to subtle and early-stage degradations.
Recent advances in deep learning have enabled data-driven diagnostics for rotating machinery; however, widely adopted sequence modeling techniques, including Transformer-based self-attention, may struggle under noisy, non-stationary, or weakly separable operating conditions. Recurrence Quantification Analysis (RQA) provides a nonlinear dynamical framework for characterizing chaotic and recurrent structures in time-series signals [
7]. In the existing literature, RQA has predominantly been employed as an offline feature extraction tool within hybrid learning pipelines [
8].
While several recent studies have explored recurrence-based features and attention mechanisms for time series analysis, our approach differs fundamentally in its integration strategy. Recent work using recurrence plot images as CNN inputs for time series classification employs spatial feature extraction from RP visualizations, whereas our method computes scalar RQA metrics to directly modulate attention weights in LSTM autoencoders, preserving sequential information while injecting dynamical structure [
9]. Similarly, multi-scale asymmetric recurrence plot approaches combined with Swin Transformers for bearing fault diagnosis apply vision transformers to RP images, treating the problem as spatial image processing rather than temporal sequence modeling [
10]. MTF-based methods with mixed attention residual networks convert signals to 2D Markov transition field images and apply spatial attention, fundamentally differing from our approach which computes RQA metrics directly from time series and embeds them into temporal attention scoring for LSTM autoencoders [
11]. Although physics-informed attention mechanisms have shown promise in PDE solving through PINNsFormer [
12], where physical laws guide network learning, our work extends this paradigm to industrial anomaly detection by embedding chaos-theoretic RQA descriptors into attention scoring rather than differential equation constraints, thereby bridging physics-informed deep learning with practical PHM applications.
To the best of our knowledge, no prior study has systematically integrated RQA-derived nonlinear dynamical descriptors directly into the attention scoring mechanism of LSTM-based autoencoders for unsupervised bearing anomaly detection. Embedding RQA-informed structural cues into attention computation therefore offers a principled means of guiding attention toward dynamically informative regions of the signal.
The main contributions of this study are summarized as follows:
We propose three distinct RQA-enhanced attention mechanisms, namely Hybrid QKVRQAA, Input-level RQA-Guided Channel Attention (CRQAA), and Encoder-level RQA-Guided Channel Attention (ERQAA), which differ in how and where RQA-derived nonlinear dynamical information is mathematically integrated into the attention pipeline.
We provide a systematic comparative analysis within individual datasets demonstrating how different RQA integration levels influence representation learning and anomaly detection performance under varying signal dynamical regimes, including non-stationary and noisy conditions.
Extensive experiments conducted on three publicly available bearing datasets (IMS, CWRU, and HUST) show that embedding RQA-derived dynamical descriptors into attention mechanisms consistently improves anomaly detection performance, particularly for signals exhibiting nonlinear and chaotic characteristics.
By computing RQA metrics directly from raw vibration signals, the proposed approach incorporates physics-informed dynamical priors into deep attention models, effectively bridging nonlinear dynamical system analysis and data-driven representation learning.
The remainder of this manuscript is organized to first review related work, then present the proposed methodology, followed by experimental evaluation, and finally conclude with directions for future research.
2. Related Work
PHM systems are typically implemented through a structured processing pipeline encompassing data acquisition, preprocessing, representation learning, anomaly scoring, and decision support. Although specific modeling techniques vary across applications, most anomaly detection frameworks follow a common workflow that integrates signal processing, feature learning, and decision-making components.
Figure 1 illustrates this general anomaly detection pipeline, which provides a contextual framework for positioning the methods reviewed in this section as well as the approach proposed later in the paper.
According to the literature, signal-based, model-based, and data-based methods, as well as deep learning and hybrid approaches, are widely used in anomaly detection [13]. Examples of signal-based methods include the Fast Fourier Transform (FFT), the Wavelet Transform (WT), and RQA [14, 15]. Among model-based methods, state-space modeling and the Kalman Filter (KF) are frequently mentioned [16]. Data-based machine learning methods include Isolation Forests (IF), Principal Component Analysis (PCA), One-Class SVM, and K-Means clustering. On the deep learning side, AE, LSTM-AE, CNNs, and attention-based models have been used for anomaly detection [17]. Furthermore, hybrid approaches combining attention-based mechanisms with data-driven methods have also been reported [
18]. Attention mechanisms enable the model to focus on critical time steps or sensor channels, allowing for the suppression of unnecessary information and more effective learning of long-term dependencies [
19]. Different types of attention have been proposed in the literature [
20]. Channel attention mechanisms are used to highlight important feature channels, particularly in convolutional architectures, while spatial and temporal attention mechanisms highlight critical regions in the input space and important time steps in sequential data, respectively. Multi-head attention can capture different types of relationships by processing the input in parallel attention subspaces, while external attention mechanisms aim to increase computational efficiency through external memory structures [21, 22, 23]. Self-attention mechanisms are widely used in time series modeling tasks due to their ability to selectively focus on the most informative parts of the sequence [24]. This structure is particularly advantageous for capturing slowly developing degradation trends and long-term dependencies [
25]. Compared to recurrent models, its ability to process the entire sequence simultaneously increases computational efficiency and facilitates adaptation to variable and noisy working conditions. Furthermore, attention weights explicitly reveal which time steps the model considers more important in its decision-making process, thereby supporting interpretability. When combined with LSTM architectures, attention operates as a temporal weighting mechanism that highlights informative segments of sequential data, whereas CNN-based models emphasize salient patterns across sensor channels. In contrast, Transformer architectures employ self-attention to jointly capture long-range temporal dependencies and global inter-sensor interactions [
24].
Many engineering systems, particularly rotating machinery, exhibit nonlinear dynamic behavior in which the system output is not directly proportional to its input and cannot be adequately described using linear equations. In rotating machines, nonlinear events arising from mechanical interactions, wear processes, and operating variability motivate the use of chaos-based analysis techniques for vibration signal interpretation. Chaotic analysis enables the characterization of nonlinear system dynamics, supports prediction and forecasting, facilitates anomaly detection, and provides insight into system stability and degradation mechanisms. Within this context, RQA has emerged as an effective tool for analyzing nonlinear and non-stationary time series.
Compared to traditional time-frequency methods such as Fourier Transform or Wavelet Analysis, RQA offers distinct advantages for nonlinear and non-stationary signals. RQA does not assume stationarity, requires relatively short time series, and can detect subtle changes in system dynamics that may not be apparent in spectral analysis [
26]. These characteristics make RQA particularly suitable for condition monitoring of rotating machinery, where transient events and nonlinear dynamics are prevalent.
RQA has proven valuable for assessing signal instability, which is a common property of real-world vibration data [
27,
28]. By extracting sensitive recurrence-based features, RQA-based methods improve the interpretability and robustness of fault diagnosis even under noisy operating conditions [
29]. Moreover, the instability of recurrence quantification measures has demonstrated strong predictive capability when characterizing complex dynamical behavior [
30]. Owing to their robustness to noise, ability to extract comprehensive dynamical features, and computational efficiency, RQA-based approaches are suitable for both exploratory analysis and real-time monitoring applications [
28,
29]. Consequently, RQA has been applied in a wide range of PHM tasks, including early detection of aircraft engine failures, identification of bearing and rotor faults via vibration analysis, RUL estimation in chaotic systems, and time series analysis of helicopter and aircraft sensor data [
7,
8].
In the literature, RQA has been applied to monitor transient accelerometer signals from auxiliary aircraft equipment such as fuel pumps, where it provides early warnings of degradation and improves mean time before failure estimation. Studies have shown that combining RQA with traditional diagnostic methods can enhance failure detection accuracy and maintenance planning for aircraft components, outperforming classical models such as k-Nearest Neighbor and Random Forest, and demonstrating strong potential for engineering applications, including aviation bearings [
7,
31]. Furthermore, the integration of RQA with Kalman filtering techniques has enabled the prediction of bearing failures by extracting entropy-based features from vibration signals and modeling degradation dynamics, with reported prediction horizons of up to 50 min prior to failure [
32]. Beyond mechanical systems, RQA has also been used to analyze surface pressure data on wing profiles, successfully distinguishing flow transitions at different angles of attack, and to interpret turbulence measurements by separating turbulent and non-turbulent segments using recurrence-based variables, thereby reducing subjectivity in boundary definitions [
33].
Despite the extensive body of work on anomaly detection in PHM, most existing studies employ nonlinear dynamical descriptors such as RQA either as standalone diagnostic indicators or as offline feature extraction tools integrated into conventional machine learning pipelines. In parallel, attention mechanisms in deep learning models are predominantly driven by data similarity measures and temporal correlations, which can be sensitive to noise and non-stationary operating conditions. As a result, the structural dynamical information captured by recurrence-based analysis remains largely untapped within attention scoring mechanisms. This gap motivates the present study, which systematically integrates RQA-derived nonlinear dynamical descriptors directly into the attention mechanism of LSTM-based autoencoders for unsupervised anomaly detection.
4. Results
In this study, an LSTM backbone was employed to capture temporal dependencies and extract sequential features. Within the anomaly detection framework, RQA-based attention mechanisms were integrated into an LSTM-AE architecture to enhance bearing anomaly detection performance. The proposed models differ according to the stage at which RQA information is incorporated: hybrid LSTM-AE-QKVRQAA combines RQA priors with QKV attention scores, LSTM-AE-CRQAA applies an RQA-Guided Channel Attention module at the input level, and the LSTM-AE-ERQAA introduces this guidance within the encoder’s latent representation. The overall system workflow is illustrated in
Figure 5.
In the proposed model, each input sequence is first processed by the LSTM layers embedded within the encoder part of the AE, which capture the temporal dependencies across different time windows. Following the encoder, three distinct attention mechanisms are examined. The features enriched with RQA measures derived from RP are fed into a dedicated attention module. The RQA vector is incorporated as a gating bias into the attention scores, enabling the computation of attention weights that are sensitive to the regularity and recurrence properties of the underlying system dynamics.
The decoder LSTM layers then reconstruct the input sequence from this enriched representation, while anomaly detection is performed by comparing the reconstruction error against a threshold determined from the learned normal-condition distribution during training. Through this process, the RQA-enhanced attention mechanism assigns higher weights to patterns that reflect healthy operating behavior, thereby improving the model’s capability to detect bearing anomalies. A schematic illustration of the implemented models is presented in
Figure 6.
In this study, three publicly available bearing datasets commonly used in the literature were employed. A sample time-series visualization of the IMS dataset is presented in
Figure 7. For the anomaly detection experiments, the first 531 samples of the signal were considered as healthy data, while the remaining portion was labeled as anomalous.
A time-series representation of the CWRU dataset is provided in
Figure 8. For the anomaly detection task, the first 230 samples were designated as healthy data, while the remaining portion of the signal was labeled as anomalous.
A time-series representation of the HUST dataset is shown in
Figure 9. The segment corresponding to normal operation begins at sample 2004 and ends at sample 2504.
Windowed samples were split into training/validation/test sets using a time-ordered strategy to avoid leakage due to overlapping windows. The model was trained on healthy windows only; validation was performed on the remaining healthy portion, and test evaluation was conducted on the full timeline to assess detection performance across the entire degradation process.
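The time-ordered split described above can be sketched as follows. This is a minimal illustration, not the study's code: the function name, the 80/20 training/validation ratio, the window counts, and the optional `gap` (a hypothetical buffer that keeps overlapping windows from straddling the training/validation boundary) are all assumptions.

```python
import numpy as np

def time_ordered_split(n_windows, n_healthy, train_frac=0.8, gap=0):
    """Split chronologically ordered window indices.

    train: the earliest healthy windows
    val:   the remaining healthy windows, after an optional gap
    test:  the full timeline (healthy and degraded windows)
    """
    if not 0 < n_healthy <= n_windows:
        raise ValueError("n_healthy must be in (0, n_windows]")
    n_train = int(n_healthy * train_frac)
    train_idx = np.arange(0, n_train)
    # Skipping `gap` windows avoids sharing raw samples across the boundary
    # when consecutive windows overlap (assumed mitigation, not from the paper).
    val_idx = np.arange(min(n_train + gap, n_healthy), n_healthy)
    test_idx = np.arange(0, n_windows)
    return train_idx, val_idx, test_idx

# Illustrative sizes only.
train_idx, val_idx, test_idx = time_ordered_split(
    n_windows=1000, n_healthy=500, train_frac=0.8, gap=3
)
```

No shuffling is applied at any point, so every validation window lies strictly later in time than every training window.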
As illustrated in the figures, the number of normal samples in both the CWRU and HUST datasets is considerably limited. Because normal time-series segments are scarce and stochastic variations in sensor measurements may degrade model performance, a noise-based data augmentation strategy was employed to increase the diversity of the training set. Each original sample was augmented by adding zero-mean Gaussian noise with a predefined standard deviation $\sigma$ (the noise level), generating new synthetic samples.
Mathematically, this process can be expressed as $\tilde{x} = x + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^{2} I)$, where $x$ denotes the original time series and $\varepsilon$ represents the noise vector. In this study, fixed values were used for the noise level $\sigma$ and for the number of replicas $k$. The $k$ noisy replicas produced from each sample were then combined with the original data, expanding the training set to $(k + 1)$ times its initial volume. To validate the impact of Gaussian noise augmentation on recurrence-based features, a quantitative sensitivity analysis was conducted. As shown in Table 1, moderate noise levels (σ ≤ 0.05) preserve recurrence-driven anomaly separability, whereas excessive noise (σ = 0.1) degrades RQA-sensitive structures, leading to reduced detection performance. This supports the percentile-based recurrence thresholding strategy introduced in the methodology section: it remains robust under realistic noise levels, and the analysis also reveals its practical operating limits.
This augmentation strategy improves the model’s robustness against measurement noise and environmental disturbances while reducing overfitting and enhancing generalization capability. Importantly, since recurrence plots were constructed using a percentile-based thresholding strategy, moderate noise levels (σ ≤ 0.05) do not significantly distort recurrence density, thereby ensuring consistent RQA-based feature extraction under realistic noise conditions.
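The augmentation step can be sketched in a few lines of NumPy. The function name, the array shapes, and the values `sigma=0.05` and `n_replicas=3` are illustrative assumptions, not the settings used in the study.

```python
import numpy as np

def augment_with_gaussian_noise(x, sigma, n_replicas, seed=42):
    """Return the original windows followed by n_replicas noisy copies.

    Each copy is x + eps with eps ~ N(0, sigma^2), drawn independently
    for every element of x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    replicas = [x + rng.normal(0.0, sigma, size=x.shape) for _ in range(n_replicas)]
    # Stack along the sample axis: result has (n_replicas + 1) * len(x) windows.
    return np.concatenate([x] + replicas, axis=0)

# Placeholder data: 10 healthy windows, 64 time steps, 1 channel.
healthy = np.zeros((10, 64, 1))
augmented = augment_with_gaussian_noise(healthy, sigma=0.05, n_replicas=3)
```

Only the healthy training windows would be augmented in this scheme; validation and test windows are left untouched so that evaluation reflects the recorded signals.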
To clarify the core concepts and procedural steps of the hybrid LSTM-AE-QKVRQAA model developed in this study, Algorithm 1 presents the fundamental pseudocode. The number of units in the LSTM layers and the dropout rates were optimized individually for each dataset (
Table 2).
| Algorithm 1. LSTM-AE-QKVRQAA |
| Input: |
| Output: |
| LSTM-AE-QKVRQAA model: |
1. The input layer is defined.
2. The LSTM layer is applied.
3. Layer normalization is applied.
4. Dropout is applied.
5. The LSTM layer is applied again.
6. Layer normalization is applied.
7. The time-axis average is taken with GlobalAveragePooling1D.
8. The bottleneck dense layer is applied.
9. A Q-K-V projection is performed with dense layers.
   - RP is extracted for each sample in the batch.
   - RQA metrics are calculated: RR, DET, LAM, Lmean, Lmax, ENTR, TT.
   - RQA features are projected onto the dense layer dimension.
   - The temporal-axis average and the RQA vector are multiplicatively fused to produce a scalar gate.
   - This scalar gate is broadcast to all (T, T) positions and combined with the classical scaled dot-product scores.
   - Attention weights are obtained using Softmax; they are multiplied by V and accumulated over time to produce the context vector.
10. A channel-level weighted summary is obtained.
11. It is passed through the dense layer and ReLU activation.
12. It is repeated across the time steps using RepeatVector and passed to the decoder.
13. The LSTM layer is applied.
14. Dropout is applied.
15. The LSTM layer is applied.
16. Final layer: Dense layer, activation: linear.
17. The model is trained using the Adam optimizer (learning rate: 0.001) and the MSE loss function.
18. The LSTM-AE-QKVRQAA model is returned.
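The attention step of Algorithm 1 can be illustrated with a small NumPy sketch. Several details here are assumptions: the `tanh` squashing of the gate, the random projection matrices, and the choice to scale the scores by `(1 + gate)`. The scaling is used because a single scalar added to every (T, T) score would cancel inside the softmax; the exact combination rule is not stated in this section.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def rqa_gated_attention(h, rqa, w_q, w_k, w_v, w_r):
    """h: (T, d) encoder features, rqa: (n_rqa,) RQA metric vector.

    Returns the context vector (d,), the attention weights (T, T),
    and the scalar gate.
    """
    q, k, v = h @ w_q, h @ w_k, h @ w_v            # Q-K-V projections
    r = rqa @ w_r                                  # project RQA features to dimension d
    # Multiplicative fusion of the temporal average with the projected RQA
    # vector, reduced to one scalar; tanh keeps it in (-1, 1) (assumption).
    gate = np.tanh(np.dot(h.mean(axis=0), r))
    scores = (q @ k.T) / np.sqrt(k.shape[-1])      # scaled dot-product scores, (T, T)
    # The scalar gate is broadcast to all (T, T) positions (assumed scaling form).
    weights = softmax(scores * (1.0 + gate), axis=-1)
    context = (weights @ v).sum(axis=0)            # accumulate over time
    return context, weights, gate

rng = np.random.default_rng(0)
T, d, n_rqa = 16, 8, 7                             # 7 metrics: RR, DET, LAM, Lmean, Lmax, ENTR, TT
h = rng.normal(size=(T, d))
rqa = rng.uniform(size=n_rqa)
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
w_r = rng.normal(scale=0.1, size=(n_rqa, d))
context, weights, gate = rqa_gated_attention(h, rqa, w_q, w_k, w_v, w_r)
```

With an all-zero RQA vector the gate is zero and the function reduces to standard scaled dot-product attention, which makes the contribution of the recurrence prior easy to isolate in an ablation.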
For multivariate time series such as the IMS dataset, which is recorded with four accelerometers, the signals were concatenated along the feature dimension before phase space reconstruction, resulting in a joint recurrence matrix that captures cross-channel temporal dependencies. This approach differs from computing separate RQA metrics for each channel and enables the attention mechanism to leverage inter-sensor correlations [
61].
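To make the recurrence computation concrete, the sketch below builds a recurrence plot with percentile-based thresholding and derives two of the RQA metrics (RR and DET). It is an illustration under assumptions: the embedding dimension, delay, and minimum line length are placeholder values, and the helper names are hypothetical. Multichannel input is handled by concatenating channels along the feature dimension before embedding, as described above.

```python
import numpy as np

def embed(x, dim=3, delay=1):
    """Time-delay embedding of x with shape (T,) or (T, C) -> (N, dim * C)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = x.shape[0] - (dim - 1) * delay
    return np.hstack([x[i * delay : i * delay + n] for i in range(dim)])

def recurrence_plot(x, dim=3, delay=1, rate=0.20):
    """Binary recurrence matrix whose threshold is the `rate` quantile of distances."""
    s = embed(x, dim, delay)
    dist = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
    off_diag = dist[~np.eye(len(s), dtype=bool)]
    eps = np.quantile(off_diag, rate)      # fixes the recurrence rate near `rate`
    return (dist <= eps).astype(int)

def diagonal_line_lengths(rp):
    """Lengths of diagonal runs of ones above the main diagonal."""
    lengths = []
    for k in range(1, rp.shape[0]):
        run = 0
        for v in np.diagonal(rp, offset=k):
            if v:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return np.array(lengths, dtype=int)

def rr_det(rp, l_min=2):
    """Recurrence rate and determinism, both excluding the main diagonal."""
    off = ~np.eye(rp.shape[0], dtype=bool)
    rr = rp[off].mean()
    lengths = diagonal_line_lengths(rp)
    total = lengths.sum()
    det = lengths[lengths >= l_min].sum() / total if total > 0 else 0.0
    return rr, det

rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 200)
rr_sine, det_sine = rr_det(recurrence_plot(np.sin(t)))
rr_noise, det_noise = rr_det(recurrence_plot(rng.normal(size=200)))
rp_joint = recurrence_plot(rng.normal(size=(200, 4)))  # four channels, one joint matrix
```

Because the threshold is a quantile of the pairwise distances, RR stays close to the chosen rate for both signals, while DET separates the periodic signal from white noise.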
Table 2 presents the hyperparameters that are not shared across all models.
Algorithm 2 provides the core pseudocode outlining the fundamental concepts and step-by-step procedure of the proposed LSTM-AE-CRQAA model.
| Algorithm 2. LSTM-AE-CRQAA |
| Input: |
| Output: |
| LSTM-AE-CRQAA model: |
1. The input layer is defined.
2. The LSTM layer is applied.
3. Layer normalization is applied.
4. Dropout is applied.
5. The LSTM layer is applied again.
6. Layer normalization is applied.
7. The time-axis average is taken with GlobalAveragePooling1D.
8. The bottleneck dense layer is applied.
   - RP is extracted for each sample in the batch.
   - RQA metrics are calculated: RR, DET, LAM, Lmean, Lmax, ENTR, TT.
   - The RQA vector is projected.
   - The input tensor is projected.
   - A channel-level multiplicative gate is applied.
9. The time-axis average is taken using the GlobalAveragePooling1D layer.
10. The encoder summary is combined with the channel weights.
11. The context vector is passed through a Dense layer and ReLU activation.
12. It is repeated across the time steps using RepeatVector and transferred to the decoder.
13. The LSTM layer is applied.
14. Dropout is applied.
15. The LSTM layer is applied.
16. Final layer: Dense layer, activation: linear.
17. The model is trained using the Adam optimizer (learning rate: 0.001) and the MSE loss function.
18. The LSTM-AE-CRQAA model is returned.
In the third model, LSTM-AE-ERQAA, the RQA metrics are computed from the encoder output rather than from the raw input sequence. All other computations and hyperparameter settings are kept identical to the LSTM-AE-CRQAA model. The objective of this design is to determine whether anomaly detection performance is more strongly influenced by RQA features derived directly from the input signal or by those computed from the latent representation produced by the encoder. To ensure full reproducibility, the random seeds were fixed at 42 for all experiments. The hyperparameters used in the model are summarized in
Table 3.
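In the threshold selection stage, the 3-sigma rule is employed. For each time window $w_i$, an anomaly score $s_i$ is computed, which corresponds to the reconstruction error of the autoencoder. Using the training set composed solely of fault-free samples, $\mathcal{D}_{\mathrm{train}} = \{w_1, \dots, w_N\}$, the location and scale parameters of the score distribution are estimated according to Equations (35) and (36):
$$\mu_s = \frac{1}{N} \sum_{i=1}^{N} s_i \qquad (35)$$
$$\sigma_s = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(s_i - \mu_s\right)^{2}} \qquad (36)$$
Since the reconstruction error is non-negative and anomalies appear only as unusually large values, a one-sided 3-sigma rule was applied to capture only the extreme values on the upper side of the distribution. According to the one-sided 3-sigma rule, the decision threshold $\tau$ is computed in Equation (37):
$$\tau = \mu_s + 3\sigma_s \qquad (37)$$
For any new window $w$ with anomaly score $s(w)$, the decision function is defined in Equation (38):
$$\hat{y}(w) = \begin{cases} 1 \ (\text{anomalous}), & s(w) > \tau \\ 0 \ (\text{normal}), & \text{otherwise} \end{cases} \qquad (38)$$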
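A minimal sketch of the one-sided 3-sigma thresholding follows. It assumes the anomaly score is the per-window mean squared reconstruction error and that the scale parameter is the population standard deviation; the function names and the example scores are illustrative only.

```python
import numpy as np

def reconstruction_error(x, x_hat):
    """Per-window mean squared error for arrays of shape (n_windows, T, C)."""
    return np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2, axis=(1, 2))

def fit_threshold(train_scores, k=3.0):
    """One-sided k-sigma threshold from scores of fault-free training windows."""
    s = np.asarray(train_scores, dtype=float)
    return s.mean() + k * s.std()   # np.std defaults to the population form (ddof=0)

def detect(scores, tau):
    """Return 1 for windows whose score exceeds the threshold, else 0."""
    return (np.asarray(scores) > tau).astype(int)

# Made-up scores for illustration.
train_scores = [0.10, 0.12, 0.11, 0.09, 0.13]
tau = fit_threshold(train_scores)
flags = detect([0.12, 0.15, 0.30], tau)
```

Only the upper tail is tested, so an unusually small reconstruction error is never flagged.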
The performance metrics of the models are presented in
Table 4.
The confusion matrices of the best-performing model on each of the three datasets, i.e., the model yielding the highest F1-score and AUC on that dataset, are visualized in Figure 10.
Table 4 summarizes the performance of the baseline LSTM-AE and the proposed RQA-guided attention models across the IMS, CWRU, and HUST datasets. Overall, the results indicate that incorporating RQA-derived dynamical descriptors into attention mechanisms consistently improves anomaly detection performance, with the gains becoming more pronounced as the dynamical complexity and noise level of the dataset increase. In the IMS dataset, which exhibits relatively regular and well-structured dynamics, the baseline LSTM-AE already achieves strong performance. Nevertheless, the proposed LSTM-AE-QKVRQAA model further improves the results, reaching an accuracy of 99.47%, an F1-score of 99.41%, and an AUC of 99.45%, indicating more balanced precision–recall behavior and improved sensitivity to subtle anomalies. This suggests that integrating RQA-informed dynamical priors into the QKV-based attention mechanism enhances temporal feature discrimination even in comparatively simple operating conditions. For the CWRU dataset, both LSTM-AE-QKVRQAA and LSTM-AE-CRQAA significantly outperform the baseline model by approximately 6–7% in terms of accuracy and F1-score, achieving near-perfect classification performance. These results highlight the effectiveness of RQA-derived nonlinear descriptors in guiding attention toward structurally meaningful recurrence patterns, which is particularly beneficial for distinguishing bearing fault conditions under varying operating loads. The most substantial improvement is observed on the HUST dataset, where the baseline LSTM-AE performs poorly due to strong noise and heterogeneous operating regimes. In contrast, RQA-guided models demonstrate a significant performance increase. Notably, the LSTM-AE-CRQAA model achieves an F1-score of 99.85% and an AUC of 99.00%, confirming the robustness of the proposed RQA-guided channel-attention mechanism under challenging and nonstationary conditions.
Overall, these findings demonstrate that the observed performance gains cannot be attributed solely to the backbone autoencoder architecture. Instead, they arise from the explicit incorporation of nonlinear dynamical information through RQA, which enriches feature representations, improves anomaly separability, and mitigates overfitting tendencies commonly observed in conventional attention-based models. Due to the superior performance achieved on the HUST dataset compared to existing approaches, an additional robustness analysis with respect to random seed initialization is conducted in this study. This analysis aims to evaluate the stability and reliability of the proposed RQA-guided attention mechanism beyond a single training run. The robustness results of the LSTM-AE-CRQAA model under different random seeds are summarized in
Table 5.
As reported in Table 5, the proposed method demonstrates consistent robustness against random initialization effects. While performance varies across seeds, high detection capability is largely preserved. In particular, SEED = 24 and SEED = 42 yield near-perfect results, with F1-scores exceeding 99.7% and AUC values close to 99%, indicating excellent separability between normal and anomalous bearing conditions. Under less favorable initializations (e.g., SEED = 5 and SEED = 1024), the model still achieves competitive performance, with F1-scores of 83.6% and 97.7%, respectively. The SEED = 0 case exhibits more conservative behavior characterized by perfect precision but reduced recall, indicating that random initialization primarily affects the operating decision threshold rather than the underlying feature representation. This behavior can be attributed to a tighter reconstruction error distribution, which leads to a higher effective threshold when statistical thresholding is applied, rather than to a degradation of the learned latent space. Importantly, an inspection of the learned parameter statistics across all seeds reveals stable weight distributions centered near zero with well-bounded standard deviations, indicating numerically stable training without pathological parameter growth. These observations confirm that the reported performance reflects the intrinsic modeling capacity of the proposed framework rather than a single favorable training run.
To further assess robustness across different datasets, the proposed LSTM-AE-QKVRQAA model was also evaluated on the IMS bearing dataset using the same set of random seeds (0, 5, 24, 42, and 1024). The corresponding results are summarized in
Table 6.
As shown in
Table 6, the proposed model demonstrates consistently strong performance across different random initializations. High recall values are maintained for all seeds, reaching or approaching 100% in several runs, which confirms reliable detection of anomalous bearing segments. While minor variations in precision and overall accuracy are observed due to stochastic training effects, the F1-score remains consistently high, ranging from approximately 96% to nearly 100%, and the AUC values consistently exceed 96%. The best overall performance is achieved under SEED = 1024, yielding an F1-score of 99.88% and an AUC of 99.90%. Importantly, none of the evaluated seeds results in a significant performance degradation, indicating that the effectiveness of the proposed approach is not dependent on a particular initialization. Moreover, the distributions of learned weights and biases remain highly consistent across different initializations, with parameter means centered near zero and comparable variance levels. This observation further confirms that the proposed RQA-guided attention mechanism is not overly sensitive to random initialization and exhibits robust and reproducible behavior across datasets with different dynamical characteristics. To ensure consistency and reproducibility across all experiments, a fixed random seed (SEED = 42) is used throughout the study for all datasets, unless otherwise stated.
Figure 11 presents a Spearman rank correlation analysis between individual RQA metrics, the LSTM-AE-CRQAA gating magnitude, and the reconstruction-based anomaly score.
The results indicate that the proposed RQA-guided gating mechanism is primarily influenced by DET and LAM, suggesting that structurally repetitive and quasi-stationary dynamics play a dominant role in modulating the attention gate. This observation is consistent with the design objective of LSTM-AE-CRQAA, where recurrence structure is exploited as a dynamical prior rather than a direct anomaly indicator. In contrast, the anomaly score exhibits stronger associations with ENTR, TT, and Lmean, reflecting increased dynamical complexity and disrupted recurrence patterns during anomalous operating conditions. These metrics capture variations in diagonal length distribution and temporal trapping behavior, which are known to increase under degradation or fault evolution. Notably, the RR remains nearly constant due to the adopted percentile-based thresholding strategy, explaining its negligible correlation with both the gating signal and the anomaly score. Overall, these findings demonstrate that CRQAA selectively leverages physically meaningful RQA descriptors instead of uniformly weighting all recurrence features.
Figure 12 analyzes the relationship between individual RQA descriptors, the LSTM-AE-QKVRQAA gating strength, and the reconstruction-based anomaly score. Compared with LSTM-AE-CRQAA (Figure 11), LSTM-AE-QKVRQAA exhibits substantially stronger correlations, indicating a tighter coupling between recurrence dynamics and the attention modulation process.
DET, LAM, and TT show the strongest correlations with both the gating magnitude and the anomaly score, suggesting that LSTM-AE-QKVRQAA emphasizes persistent and structured dynamical patterns that are also reflected in reconstruction error. This behavior indicates that the fusion-based attention mechanism integrates recurrence structure more directly into the context representation. Lmean demonstrates a stronger association with anomaly magnitude than with the gating signal, implying a secondary role in severity estimation rather than attention control. In contrast, RR and Lmax exhibit limited influence, which is expected given the percentile-based recurrence thresholding and the rarity of extreme diagonal structures in the analyzed signals. Overall, the results confirm that LSTM-AE-QKVRQAA preserves the interpretability of RQA-driven attention while strengthening the alignment between recurrence dynamics and anomaly severity.
Table 7 presents a sensitivity analysis of key RQA hyperparameters on the HUST dataset.
The results show that the proposed RQA-guided models are robust to moderate variations in embedding dimension and delay, confirming that RQA is primarily used as a structural descriptor rather than for precise attractor reconstruction. While a recurrence rate of 10% yields slightly higher performance on HUST, a fixed value of 20% was adopted throughout the main experiments to ensure consistency across datasets and to follow common practice in RQA-based studies. Notably, excessive recurrence density (30%) leads to a clear degradation in performance, indicating loss of discriminative recurrence structures.
5. Discussion
The performance of different RQA-aware attention architectures varies significantly across datasets due to their distinct signal characteristics. For the IMS dataset, which exhibits highly periodic dynamics, the LSTM-AE-CRQAA model shows limited temporal resolution, as channel-wise RQA aggregation reduces sensitivity to fine-grained phase variations in strongly periodic signals. In this case, the LSTM-AE-ERQAA model provides more stable results, as the encoder preserves dominant low-frequency periodic patterns sufficient for recurrence detection (F1-score = 93.68%). Nevertheless, the LSTM-AE-QKVRQAA model achieves the highest performance by jointly incorporating Q–K–V interactions and global RQA-derived deviation, benefiting from the stable dynamics that allow attention weights to be learned more clearly (F1-score = 99.41%).
For the CWRU dataset, the LSTM-AE-CRQAA model demonstrates superior performance, as RQA metrics computed directly from the input signal preserve more discriminative information in this high-noise environment. Applying RQA at the encoder output (LSTM-AE-ERQAA) leads to information loss, indicating that LSTM-AE-CRQAA is the most effective and lightweight approach for single-sensor systems with elevated noise levels.
The HUST bearing dataset [39] has been primarily utilized for supervised fault classification tasks [45, 62]. However, unsupervised anomaly detection, which is critical for real-world scenarios where labeled failure data is scarce, remains underexplored for this dataset. This study addresses this gap by introducing an RQA-aware attention framework specifically designed for unsupervised anomaly detection on HUST bearing data. On this dataset, the LSTM-AE-CRQAA model achieves exceptional performance (F1-score = 99.85%), while LSTM-AE-ERQAA performs poorly (F1-score = 57.53%). This stark contrast reveals a critical insight: encoder-based dimensionality reduction suppresses high-frequency chaotic signatures essential for RQA in noise-dominant signals. Analysis of the latent representations shows that the HUST encoder (bottleneck dimension = 8) filters out high-frequency components to reduce noise. While beneficial for reconstruction, this low-pass filtering effect removes fine-grained dynamical structures (e.g., short diagonal lines in the RP) that RQA relies on to distinguish chaotic from regular behavior.
The literature comparison with state-of-the-art methods is presented in Table 8.
While chaos theory and RQA have been extensively studied in bearing fault diagnosis, most existing work focuses on supervised fault classification rather than unsupervised anomaly detection. Several recent studies have explored unsupervised approaches on standard benchmark datasets, though with varying methodological frameworks and performance metrics. It should be noted that the comparison with prior studies is provided for contextual reference only, as differences in supervision level, preprocessing pipelines, and evaluation protocols prevent strict one-to-one benchmarking.
Studies employing RQA metrics for bearing fault diagnosis have predominantly focused on supervised classification frameworks. For example, experiments conducted on the CWRU bearing dataset using 12 kHz vibration recordings report anomaly detection accuracies reaching 96.97% [69].
The DCC method [68] represents a strong supervised baseline reported in the recent literature, achieving 100% AUC, 100% accuracy, and 100% F1-score on the CWRU dataset (12 kHz). DCC employs a non-reconstructive approach with spectral normalization, directly scoring normality without autoencoder reconstruction. The proposed LSTM-AE-CRQAA achieves 99.25% F1-score and 99.01% AUC on CWRU. However, a direct comparison with the proposed approach is not straightforward, as these studies address supervised multi-class fault identification, whereas the present work focuses on unsupervised anomaly detection using higher-resolution 48 kHz signals, which preserve richer high-frequency fault-related dynamics. In addition, several aspects critical for fair benchmarking (such as class-wise performance, dataset imbalance handling, and evaluation protocols) are either not consistently reported or differ substantially across studies. Consequently, the reported results should be interpreted as complementary rather than directly comparable to the proposed framework.
Ref. [63] proposed AE-AnoWGAN, an unsupervised framework combining autoencoders with Wasserstein GANs for bearing anomaly detection. Raw vibration signals are transformed into time–frequency spectrograms via continuous wavelet transform and processed through a multi-encoder, multi-decoder GAN architecture. On the IMS dataset, the method achieved an AUC of 92.00%. However, the authors do not specify which of the three operating-condition subsets was used, limiting reproducibility. In comparison, the proposed LSTM-AE-QKVRQAA reaches higher performance (99.41% F1-score, 99.45% AUC) through joint Q–K–V attention integration.
Ref. [64] introduced MRRAE, combining convolutional autoencoders with memory modules that store prototypical normal patterns. The model detects anomalies by measuring deviations from stored memory representations, achieving 97.97% accuracy and 97.73% F1-score on the IMS dataset. While MRRAE effectively preserves representative patterns through memory augmentation, it lacks the temporal focusing capability inherent in QKV-based attention mechanisms. The proposed LSTM-AE-QKVRQAA, by contrast, dynamically recalibrates attention weights using RQA-derived chaos metrics, enabling input-dependent adaptation to evolving signal dynamics without fixed memory templates.
Addressing the scarcity or complete absence of fault samples, ref. [65] proposed DIDAD, a dual-stream CNN-based framework. Feature extractors process normal and test data separately, with outputs fused through an autoencoder-based module. Validated on the IMS dataset, DIDAD achieved accuracy exceeding 98.00%. The proposed LSTM-AE-QKVRQAA attains comparable accuracy (99.47% on IMS) while delivering more balanced performance across multiple metrics due to RQA-enhanced attention that captures both reconstruction error and dynamical complexity.
Ref. [66] introduced VCEAD, employing autoencoder-based reconstruction error alongside TCN-based vibration forecasting. Anomalies are detected using a variable cumulative error criterion. On the IMS dataset, VCEAD achieved 96.72% accuracy and 97.74% F1-score, within two percentage points of the F1-score of LSTM-AE-QKVRQAA (99.41%). However, VCEAD relies on fixed threshold-based cumulative error, whereas the proposed RQA-aware attention provides adaptive anomaly scoring grounded in chaos-theoretic recurrence analysis, potentially offering better interpretability and robustness to non-stationary signals.
Ref. [67] proposed the DAAD framework, combining domain adaptation with unsupervised anomaly detection to address distribution shifts across operating conditions. On the CWRU dataset, DAAD achieved an AUC of 95.70%. Despite their effectiveness in cross-domain transfer via adversarial or distribution-based alignment, domain adaptation techniques conventionally depend on normal data from both source and target domains. In contrast, the proposed RQA-aware attention embeds chaos-theoretic invariants directly into the attention mechanism, enabling single-domain training while maintaining robustness to condition variations. On CWRU, LSTM-AE-CRQAA achieves 99.25% F1-score and 99.01% AUC without domain adaptation overhead.
All experiments in this study were conducted within individual datasets, following the commonly adopted evaluation protocol in unsupervised anomaly detection. While the proposed RQA-guided attention mechanism was validated on multiple bearing datasets exhibiting different dynamical characteristics (IMS, CWRU, and HUST), no explicit cross-dataset or domain-shift training–testing scenario was considered.
We acknowledge that such cross-dataset evaluation would provide stronger evidence regarding generalization under distributional shifts. However, in unsupervised anomaly detection, differences in sensor configuration, sampling frequency, operating conditions, and fault annotation standards across datasets often make direct cross-dataset transfer ill-posed without additional adaptation mechanisms. Investigating domain adaptation and cross-dataset generalization therefore constitutes an important direction for future work.
It should be noted that the reported correlations quantify association rather than causality; nevertheless, they provide useful insight into how different recurrence properties interact with the proposed attention mechanisms.
6. Conclusions
In this study, hybrid deep learning architectures were proposed to improve unsupervised bearing anomaly detection by systematically integrating recurrence quantification analysis (RQA) metrics into different stages of LSTM-based autoencoder models. RQA descriptors were embedded at the input level, at the encoder output, and within a QKV attention mechanism, resulting in three architectures: LSTM-AE-CRQAA, LSTM-AE-ERQAA, and LSTM-AE-QKVRQAA, respectively.
The proposed models were evaluated on three benchmark bearing datasets—IMS, CWRU, and HUST—characterized by different noise levels and dynamical behaviors. Experimental results demonstrate that RQA-enhanced attention mechanisms significantly improve anomaly detection performance by capturing nonlinear recurrence structures and temporal dependencies inherent in vibration signals. Among the proposed architectures, the hybrid LSTM-AE-QKVRQAA consistently achieved the most balanced and robust performance across datasets, highlighting the benefit of jointly modeling temporal attention and global RQA-based dynamical cues.
On the IMS dataset, LSTM-AE-QKVRQAA achieved a 99.41% F1-score and a 99.45% AUC, outperforming the baseline LSTM-AE. For the CWRU dataset, RQA-aware models improved accuracy and F1-score by approximately 6–7%, approaching near-perfect anomaly discrimination, with LSTM-AE-CRQAA achieving a 99.25% F1-score and a 99.01% AUC. On the more challenging HUST dataset, where the baseline model exhibited limited performance, the LSTM-AE-CRQAA architecture achieved an F1-score of 99.85% and an AUC of 99.00%, demonstrating strong robustness under noisy and heterogeneous operating conditions. Comparison with state-of-the-art methods, which is contextual given the differences in supervision level and evaluation protocol, indicates that LSTM-AE-QKVRQAA and LSTM-AE-CRQAA compare favorably with existing deep learning-based anomaly detection approaches, particularly in terms of robustness across datasets with different noise levels and dynamical characteristics. These findings support the view that embedding chaos-aware RQA descriptors into attention mechanisms provides an effective and principled way to model nonlinear dynamics, making the proposed framework well suited for practical PHM applications.
Despite the promising performance, several limitations should be acknowledged. First, RQA descriptors are computed using fixed embedding and recurrence parameters selected empirically, which may not optimally capture system dynamics under all operating conditions. Adaptive or data-driven parameter tuning could further improve robustness, especially in highly non-stationary environments. Second, the computation of recurrence plots and RQA metrics introduces additional computational cost compared to standard attention mechanisms. While acceptable for offline PHM analysis, this overhead may limit applicability in real-time or edge-based monitoring systems. In addition, the proposed models are trained in an offline manner and assume stationary degradation distributions. In realistic industrial settings, degradation patterns may evolve due to changing loads, environments, or maintenance actions. Incorporating online or continual learning strategies could help address such concept drift. Finally, this study focuses on univariate or globally aggregated multivariate recurrence analysis. Extending the framework to multi-scale and channel-wise RQA representations, as well as integrating physics-informed constraints, represents a promising direction for future research.