
Bearing Lifespan Reliability Prediction Method Based on Multiscale Feature Extraction and Dual Attention Mechanism

1 College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
2 Institute of Regulatory Science for Medical Devices, Sichuan University, Chengdu 610065, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(7), 3662; https://doi.org/10.3390/app15073662
Submission received: 14 February 2025 / Revised: 15 March 2025 / Accepted: 23 March 2025 / Published: 27 March 2025

Abstract
Accurate prediction of the remaining useful life (RUL) of rolling bearings is crucial for ensuring the safe operation of machinery and reducing maintenance losses. However, owing to the high nonlinearity and complexity of mechanical systems, traditional methods fail to meet the requirements of medium- and long-term prediction tasks. To address this issue, this paper proposed a recurrent neural network with a dual attention model. By employing path weight selection, the Discrete Fourier Transform, and a TopK selection mechanism, the prediction accuracy and generalization ability in complex time-series analysis were significantly improved. Evaluation results based on mean absolute error (MAE) and root mean square error (RMSE) indicated that the dual attention mechanism effectively focused on key features, optimized feature extraction, and improved prediction performance. An end-to-end RUL prediction model was established based on the MS-DAN network, and the effectiveness of the method was validated on the IEEE PHM 2012 Data Challenge dataset, providing more accurate decision support for equipment maintenance engineers.

1. Introduction

Accurate prediction of the remaining useful life (RUL) of rolling bearings is crucial for ensuring the safe operation of machinery and reducing maintenance costs [1,2,3]. Modern industrial equipment operates in complex and dynamic environments, where its reliability directly impacts production efficiency and safety [4,5,6]; accurate reliability assessment is therefore essential. However, the actual structure of the equipment is influenced by various internal and external time-varying effects, complex operational disturbances, measurement noise, and other factors, making it challenging to ensure the accuracy of long-term predictions [7,8]. To address this issue, this paper analyzed real operational data and developed new methods and tools for reliability evaluation.
In recent years, deep learning has attracted widespread attention in fields such as natural language processing, transfer learning, and computer vision [9,10,11,12]. Gomez et al. [13] proposed an improved temporal fusion Transformer method, which replaces the traditional long short-term memory (LSTM) [14] with a bidirectional long short-term memory (Bi-LSTM) encoder–decoder to enhance the capture of time-series features. This method integrates Bayesian optimization based on a tree-structured Parzen estimator for the state-of-health and RUL prediction of lithium batteries. Niazi et al. [15] developed a parallel neural network framework that employs multi-channel processing combined with the Time Transformer [16], LSTM, and other methods to capture spatiotemporal dependencies, improving efficiency and accuracy in handling multidimensional features. Zhu et al. [17] proposed the TACT model constrained by L2 regularization for RUL prediction. The model combines a multi-scale convolutional neural network (CNN) and a Transformer to extract local and global features simultaneously and introduces delayed prediction constraints to optimize training. Lin et al. [18] introduced a nonlinear multi-stage degradation model based on the Wiener process, incorporating a stage division method to automatically determine the number of stages, change-point locations, and drift model forms. Variational Bayesian methods were used to adaptively estimate parameters and derive the RUL analytically. Traditional RUL prediction models [1,2] had inherent limitations in extracting critical information; they lacked the ability to effectively represent significant features, particularly when dealing with multi-directional data [19]. With deepening research on equipment degradation processes, attention mechanisms have gradually been introduced into RUL prediction.
This mechanism helps dynamically capture key features and improves the model’s prediction performance in complex degradation processes. For example, Zhang et al. [20] proposed an algorithm that integrates an improved self-attention mechanism, temporal convolutional network (TCN) [21], and squeeze-and-excitation mechanism to weight the contributions of input features across both the time-step and channel dimensions, highlighting key features highly relevant to the RUL. Xu et al. [22] proposed an RUL prediction method based on an improved Transformer model that integrates attention mechanisms and deep learning, comprehensively considering spatiotemporal characteristics and various operating conditions. Ding et al. [23] designed a multi-scale convolution module [24] combined with the Swish activation function, embedding local feature learning into global sequence modeling. This approach simultaneously extracts local dependencies and global interaction information from raw time signals and transforms them into trainable class labels. Zhao et al. [25] proposed a novel gated attention mechanism called Capsule Neural Network [26].
These methods achieved significant results in equipment reliability assessment and RUL prediction but also exhibited notable shortcomings. First, the importance of different features may vary across degradation stages: certain features can be critical to the degradation process at specific stages, yet traditional methods often failed to dynamically weight or focus on features as the degradation stage changed, leading to the neglect of key features. Second, the models lacked adaptability: many traditional models relied on fixed parameters or feature extraction methods, making it difficult to adjust flexibly to varying degradation stages or environmental conditions. Although attention mechanisms could capture long-term dependencies, they still lacked sufficient adaptability in adjusting their focus. To address these issues, this paper proposed a time-series analysis method based on multi-scale feature extraction and path weight selection. As shown in Figure 1, this method employed the Discrete Fourier Transform (DFT) [27] to extract periodic components from time-series data and used a TopK selection mechanism [28] to retain the most critical path weights. This allowed the importance of feature paths to be dynamically adjusted according to the degradation stage of the equipment. In the healthy stage, when degradation signs were not yet apparent, the path weight selection mechanism automatically reduced the influence of paths with lower contributions; in the degradation stage, when key features became more critical, it enhanced the weights of these paths, improving the extraction of degradation-related information. After extracting path features, the model first used the proposed attention model, EM-Net, to extract initial features, ensuring an effective representation of the data in the feature space.
Subsequently, the model employed a recurrent neural network (RNN) [29] to progressively capture the temporal dependencies of the signals. Finally, the processed features were passed to the RUL prediction module, which incorporated activation functions and dropout regularization to prevent overfitting, ultimately generating accurate RUL predictions to provide precise lifespan estimations for the equipment.
This study made significant contributions to effectively capturing information during the equipment degradation process, extracting key information, and enhancing the model’s robustness against interference. Compared to existing methods, our model demonstrated superior prediction accuracy and generalization capability. In particular, by integrating path weight selection with an innovative approach based on one-dimensional convolution and spatial attention mechanisms, the model further improved its ability to predict the degradation process of bearings. The path weight selection mechanism dynamically adjusted the weights of feature paths according to different stages of bearing degradation, enabling more precise capture of degradation information. Experimental results showed that the proposed method outperformed other existing methods on both the PHM 2012 [30] bearing dataset and real-world equipment data, validating its effectiveness and superiority.
The main contributions of this paper were as follows:
1.
A path weight selection mechanism was proposed, which could dynamically adjust the weights of feature paths according to different stages of bearing degradation, thereby capturing degradation information more accurately;
2.
A dual attention mechanism was constructed, capable of flexibly capturing dependencies between channels and automatically adjusting the importance of each channel, which effectively enhanced the model’s feature representation capability;
3.
The MS-DAN prediction method was proposed, which enhanced the feature extraction capability during the equipment degradation process and demonstrated excellent performance in prediction accuracy.

2. Materials and Methods

2.1. Materials

This paper used the dataset provided by the IEEE PHM2012 Challenge to verify the effectiveness of the proposed RUL prediction method. The dataset was collected by the PRONOSTIA experimental platform, and the collection device is shown in Figure 2. The PRONOSTIA platform is an experimental setup designed and implemented by the French FEMTO-ST Institute, specifically for testing and verifying bearing fault detection, diagnosis, and prediction methods. The platform operated under three different working conditions (speed 1800 rpm/load 4000 N, speed 1650 rpm/load 4200 N, speed 1500 rpm/load 5000 N), with accelerometers installed on both the vertical and horizontal axes to measure the vibration signals of the rolling bearings. Vibration data were collected every 10 s for 0.1 s, with a sampling frequency of 25.6 kHz to obtain online health monitoring data (such as speed, load, temperature, and vibration). A total of 2560 sets of sample data were collected every 10 s. The deep groove ball bearing was chosen as the experimental bearing primarily because of its excellent load-bearing capacity and wide applicability, especially in environments where variable loads needed to be supported.
The PHM2012 dataset contained 17 full life cycle tests of bearings. It was divided into 6 training sets and 11 test sets. Both the training and test sets covered different speed and load conditions to verify the effectiveness of bearing fault diagnosis and prediction methods. Bearings 1-1 and 2-1 represented different loads and speeds to simulate the degradation process under actual operating conditions, as shown in Figure 3 and Table 1. Through the PRONOSTIA platform, researchers were able to obtain complete data from normal operation to failure of the bearing in a controlled environment, providing experimental data for training and validation of machine learning models. According to relevant studies in the literature [31,32], horizontal vibration signals typically provide more useful information than vertical vibration signals for tracking bearing degradation. Therefore, this paper only used horizontal vibration signals for the experiments.

2.2. Methods

2.2.1. Overview

The MS-DAN model, as shown in Figure 4, was built on the traditional RNN model with the addition of the attention mechanism and feature selection proposed in this paper. Based on the different stages of bearing degradation, it dynamically adjusted the weights of each feature path to more accurately capture degradation information. The MS-DAN model first performed feature selection and converted the time-domain signal into the frequency-domain signal through DFT to extract useful features. Then, the frequency-domain features were downsampled using kernel average pooling, and important features were highlighted by SoftMax weighting. The most important K features were retained, and the frequency-domain features were converted back to the time domain through Inverse Discrete Fourier Transform (IDFT). Next, the data entered the feature learning stage, where the features were further optimized through the feature extractor and enhanced feature modules. Through the attention mechanism, the model automatically assigned weights to different features, strengthened the areas containing more degradation information, and helped the model focus on important features. The cross-attention mechanism further enhanced the interactive information between features. In the RNN module, the model used recurrent neural networks to capture long-term dependencies in the time series. After passing through the RNN model, the loss L was calculated, and the model parameters were optimized through backpropagation.
Ultimately, this method not only enhanced the accuracy of RUL predictions but also improved the model’s adaptability to various operating conditions, providing more reliable support for maintenance decision-making of mechanical equipment. The pseudocode of the algorithm is shown in Algorithm 1.
Algorithm 1: Applying the proposed method to RUL prediction
Input: A set of dataset samples Learning_set = {(X1), (X2), …, (Xn)}. The Full_Test_Set is the test set. The number of learning epochs is M.
Output: The optimal model and its predicted RUL
1. Load the training set and validation set;
2. Begin:
3. Initialize all weights and biases;
4. For m = 1, 2, …, M do
5.   Extract features through the multiscale model → FR;
6.   Input FR to the MS-DAN;
7.   Calculate the output of the MS-DAN;
8.   Input the feature FMS into sequence X, and input it to the RNN;
9.   Calculate the output of the RNN layer;
10.  Calculate the RUL;
11.  Model Fit (Adam, (train X)) → M(m);
12.  Model Evaluate (M(m), (Val X)) → Rmae(m);
13. End For
14. Save the optimal model with the minimum Rmae over the M epochs;
15. End
16. Load the testing set;
17. Load the optimal model in terms of RUL performance.

2.2.2. Multi-Scale Partitioning

Multi-scale partitioning could be easily extended to the multivariate case by considering each variable independently. In the multi-scale module, we defined a set S = {S1, …, SM} containing M patch size values, where each patch size S corresponded to a patch partitioning operation. For the input time series Xi ∈ RH×d, where H is the length of the time series and d is the dimensionality of the features, the partitioning operation for a patch size S divided X into P patches (X1, X2, …, XP), where each patch Xi ∈ RS×d contained S time steps, as shown in Figure 5.
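As a concrete illustration, the patch partitioning above can be sketched in a few lines of plain Python (the function name and the choice to drop an incomplete trailing patch are ours, not from the paper):

```python
def partition_patches(x, patch_size):
    """Split a time series (a list of time steps, each a scalar or a
    feature vector) into non-overlapping patches of `patch_size` steps.
    A trailing fragment shorter than patch_size is dropped here; the
    paper does not specify how such a remainder is handled."""
    usable = (len(x) // patch_size) * patch_size
    return [x[i:i + patch_size] for i in range(0, usable, patch_size)]
```

For example, a series of length H = 10 with S = 3 yields P = 3 patches of 3 steps each.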
The extraction of periodic components was primarily achieved through the DFT. By applying the DFT to the input time series X, it was converted from the time domain to the frequency domain. fk represents a single frequency component.
$X_f = \mathrm{DFT}(X) = \{f_1, f_2, \ldots, f_x\}$
The amplitudes of various frequency components were computed, and the Top Kf frequencies with the largest amplitudes were selected. This selection process not only ensured the sparsity of the frequency domain but also effectively retained the most important frequency components, thereby reducing redundant information. Kf represented the number of selected frequency components.
$X_f = \mathrm{TopK}(\{X_f\}, K_f)$
The retained frequency components were then reconstructed in the time domain as a superposition of cosine terms, expressed by the following formula.
$X_f(t) = \sum_{k=1}^{K} A_k \cos(2\pi f_k t + \varphi_k)$
$A_k$ and $\varphi_k$ represent the amplitude and phase of the selected frequencies, respectively.
Using the IDFT, the selected frequency components were converted back to the time domain to obtain the periodic part Xf, reconstructing the periodic fluctuations in the time series, as described by the following formula.
$X_s = \mathrm{IDFT}(X_f, K_f)$
The residual component was then averaged through pooling using kernels of different sizes. Multiple convolution operations were performed on the residual component with different pooling kernels, and the output for each kernel was computed. Subsequently, the SoftMax function was applied to determine the weight of each kernel. Yk represents the output value after the pooling operation.
$Y_k = \mathrm{AvgPool}(X_s, \mathrm{kernel})$
The features of each segment were normalized using the SoftMax function to obtain the corresponding weights.
$W_k = \frac{\exp(Y_k)}{\sum_{i=1}^{p} \exp(Y_i)}$
The periodic components captured cyclical fluctuations, while the trend components reflected long-term changes in the time series. The TopK selection mechanism allowed us to retain the most important path weights, thereby optimizing the feature extraction process. The pseudocode of feature selection is shown in Algorithm 2.
$F = \sum_{k=1}^{K} W_k Y_k$
$W_{path} = \mathrm{softmax}(F)$
$\mathrm{TopK}(W_{path}, K)$
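The SoftMax weighting and TopK path selection in the equations above reduce to the following sketch (pure Python; names are ours, and a numerically stable SoftMax is assumed):

```python
import math

def softmax(values):
    """Numerically stable SoftMax over a list of scores."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def topk_paths(scores, k):
    """Return the indices of the k paths with the largest SoftMax weight."""
    weights = softmax(scores)
    return sorted(range(len(weights)),
                  key=lambda i: weights[i], reverse=True)[:k]
```

Because SoftMax is monotone, TopK over the weights equals TopK over the raw scores; the weights themselves are still needed for the weighted sum F.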
Algorithm 2: Feature Selection Using DFT
Input: Time series X ∈ RH×d. Number of selected components K.
Output: Selected_K
1. Perform DFT on the input time series X to obtain the frequency components:
2.   Xf = DFT(X);
3.   Xf = TopK({Xf}, K);
4.   Xs = IDFT(Xf);
5. Pooling(Xs, kernel):
6.   For each size in kernel
7.     Apply the pooling operation to the input Xs;
8.   Return Xpooled;
9. For each pooling kernel output
10.   Calculate its SoftMax weight;
11. Selected_K = TopK(Xpooled, K);
12. Return Selected_K.

2.2.3. Attention Mechanism

The introduction of the attention mechanism in deep learning significantly improved model performance, especially in handling sequence data and natural language processing tasks [33,34,35]. The attention mechanism assigned different weights to each element in the input sequence, allowing the model to focus on the most relevant parts for the current task when calculating the output. This enabled the model to allocate weighted attention across different input positions, thereby improving the efficiency of information utilization. This paper designed an EM-Net, which included an efficient channel attention network (ECA-Net) [36] and convolutional block attention module (CBAM) [37]. ECA-Net replaced the channel attention module in CBAM by generating channel attention through the weighted cross-channel information and used a simple 1D convolution to model the relationships between channels. This approach not only reduced the number of parameters but also improved computational efficiency, as shown in Figure 6.
Global average pooling was first performed on the input feature X1 ∈ RW×H×C. The pooling operation averaged along the spatial dimensions (W, H) of each channel, producing a feature g(X), where c is the number of channels.
$g(X) = \frac{1}{W \times H} \sum_{w=1}^{W} \sum_{h=1}^{H} X_1(w, h, c)$
The formula for calculating the kernel size k is based on the number of channels c.
$k = \left| \frac{\log_2(c)}{\gamma} + \frac{b}{\gamma} \right|$
To prevent the computed kernel size from being 1, which would make it ineffective at extracting inter-channel information, b = 1 and γ = 2 were set. A Conv1D with the computed kernel size k was then applied to g(X), and the convolution result was normalized by the Sigmoid function to obtain the weighting coefficient Fc for each channel.
$\sigma(x) = \frac{1}{1 + e^{-x}}$
$F_c = \sigma\left(\mathrm{Conv1D}_k(g(X))\right)$
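The adaptive kernel-size rule can be sketched as follows (this follows the original ECA-Net formulation, which additionally rounds k to the nearest odd integer; that rounding convention is our assumption here, not stated in the text):

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D-convolution kernel size from the channel count c:
    k = |log2(c)/gamma + b/gamma|, bumped to the next odd integer so
    the kernel is centered (per the original ECA-Net paper)."""
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1
```

With γ = 2 and b = 1 as set above, c = 64 channels give k = 3 and c = 256 give k = 5, so the receptive field grows slowly with channel count.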
The two pooled features were concatenated to form a joint feature containing information from the pooling operations. Then, a 7 × 7 convolutional layer was used to convolve the concatenated feature. The purpose of this step was to extract richer spatial features through the convolution operation and generate a spatial attention matrix Fd, which contained attention weights for each spatial location. This helped the model focus more on important regions, improving its ability to perceive and enhance the features.
$F_d = \sigma\left(\mathrm{Conv2D}_{7\times7}\left([\mathrm{AvgPool}(F_c \times X),\ \mathrm{MaxPool}(F_c \times X)]\right)\right)$
$X_2 = X_1 \times F_c \times F_d$
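The two pooled maps that feed the 7 × 7 convolution can be computed as below (a minimal pure-Python sketch of only the pooling half of the spatial attention; the convolution and Sigmoid steps are omitted, and the names are ours):

```python
def channelwise_pools(x):
    """Average- and max-pool an H x W x C feature (nested lists) along
    the channel axis, producing the two H x W maps that are concatenated
    before the 7x7 convolution in the spatial-attention branch."""
    height, width = len(x), len(x[0])
    avg = [[sum(x[h][w]) / len(x[h][w]) for w in range(width)]
           for h in range(height)]
    mx = [[max(x[h][w]) for w in range(width)]
          for h in range(height)]
    return avg, mx
```

The average map summarizes overall channel activity at each location, while the max map highlights the strongest single response; the convolution then learns where to attend from both.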

2.2.4. RNN

RNN was a type of neural network model capable of processing sequential data. This enabled it to excel in tasks such as speech recognition, natural language processing, and time series prediction [38,39]. Especially in RUL prediction, RNN could capture the temporal relationships in data by passing information through hidden states across time steps. To better extract features from equipment condition monitoring data, the independent recurrent neural network (IndRNN) [40] network was employed. IndRNN achieved this by making the update of each neuron independent, meaning that each neuron was updated based solely on its own state and input, without relying on the states of other neurons. This approach effectively avoided the gradient vanishing and explosion issues present [41,42,43] in RNN and LSTM. Since each neuron’s computation was independent, the information transfer over long time steps became more stable.
As shown in Figure 7, the hidden state ht was updated from the input Xt and the previous hidden state ht−1 at step t.
$h_t = \sigma(W X_t + u \odot h_{t-1} + b)$
The hidden state update of each neuron was calculated based on the current input and the hidden state from the previous time step. For the n neuron, the hidden state hn,t at time step t was given by the following equation.
$h_{n,t} = \sigma(W_n X_t + u_n h_{n,t-1} + b_n)$
Xt ∈ RM represents the input at time step t, ht−1 ∈ RN was the hidden state from the previous time step, Wn and un were the weights for the current input and the previous hidden state, respectively, bn was the bias term, and σ was the activation function. In this equation, the current hidden state was influenced not only by the input but also by the hidden state from the previous time step, reflecting the temporal dependency characteristic of recurrent neural networks. In IndRNN, the design of the loss function was crucial for the optimization process. A commonly used loss function was the mean absolute error, given by the following form.
$L = \frac{1}{M} \sum_{t=1}^{M} |y_t - \hat{y}_t|$
Here, $y_t$ represented the actual target output, while $\hat{y}_t$ was the predicted output of the network at time step t. The loss function was computed over the errors at all time steps, reflecting the difference between the network's output and the actual target. By minimizing this loss, IndRNN continuously adjusted its parameters to improve prediction accuracy. To minimize the loss function and optimize the network parameters, IndRNN used the backpropagation algorithm. The core idea of backpropagation was to gradually update the weights and biases in the network based on the gradient of the loss function with respect to each neuron.
$\frac{\partial J_n}{\partial h_{n,t}} = \frac{\partial J_n}{\partial h_{n,T}} \frac{\partial h_{n,T}}{\partial h_{n,t}} = \frac{\partial J_n}{\partial h_{n,T}} \prod_{k=t}^{T-1} \sigma'_{n,k+1} u_n = \frac{\partial J_n}{\partial h_{n,T}} u_n^{T-t} \prod_{k=t}^{T-1} \sigma'_{n,k+1}$
Since the hidden state update of each neuron was independent, IndRNN could accelerate the training process by performing independent backpropagation operations when computing gradients.
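A single IndRNN update, following the per-neuron equation above, can be sketched in pure Python (names are ours; tanh is assumed as the activation σ, which the paper does not specify):

```python
import math

def indrnn_step(x_t, h_prev, W, u, b, act=math.tanh):
    """One IndRNN time step: neuron n combines the full input x_t
    (through its weight row W[n]) with only its OWN previous state
    h_prev[n], scaled by the scalar recurrent weight u[n]."""
    return [act(sum(W[n][m] * x_t[m] for m in range(len(x_t)))
                + u[n] * h_prev[n] + b[n])
            for n in range(len(h_prev))]
```

Because the recurrent term is an element-wise scalar product rather than a full matrix multiplication, the gradient through time factors into powers of u_n, which is what keeps it from vanishing or exploding over long sequences.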

3. Results

3.1. Evaluation Criteria

This paper implemented the code in TensorFlow [44] to predict the remaining useful life of the PHM bearing data. Mean absolute error (MAE) [45] and root mean square error (RMSE) [46] were used as evaluation metrics, with lower values preferred. A smaller MAE indicated lower prediction error and thus higher prediction accuracy, while a smaller RMSE indicated greater stability in the predictions.
$\mathrm{MAE} = \frac{1}{m} \sum_{t=1}^{m} |y_t - \hat{y}_t|$
$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{t=1}^{m} (y_t - \hat{y}_t)^2}$
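The two metrics translate directly into code (a minimal sketch; function names are ours):

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error over paired true/predicted sequences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(y_true, y_pred)) / len(y_true))
```

Because RMSE squares each error before averaging, a few large mistakes raise RMSE faster than MAE, which is why the pair together indicates both accuracy and stability.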

3.2. Experimental Setup and Performance

As shown in Table 2, the proposed method exhibited a clear advantage over CNN, TCN, the gated recurrent unit (GRU), bidirectional GRU (BiGRU), and BiLSTM on both the Bearing 1 and Bearing 2 test sets, demonstrating lower MAE and RMSE than the other models. The network model outperformed the baseline models in direct prediction, highlighting its enhanced generalization. Adaptability was a main advantage of the proposed architecture, allowing it to select different scales for various temporal dynamics and thereby capture the complex temporal patterns present in different datasets. Table 3 and Figure 8 show the performance comparison of the model under different parameters.

3.3. Ablation Experiment

In order to validate the effectiveness of the proposed improved attention mechanism, ablation experiments were conducted with the following configurations: base model, base model + CBAM, and ours (Multiscale + CBAM + IndRNN). As shown in Table 4, the proposed model (ours) achieved the best performance across multiple metrics, demonstrating the effectiveness of integrating multiscale, CBAM, and IndRNN. Furthermore, the proposed model maintained a competitive parameter size, balancing performance and computational efficiency effectively.

3.4. Comparison of Different Modules

The evaluation results were presented. The main advantage of the model was its adaptability, which allowed it to select different scales according to varying temporal dynamics. Through this adaptive mechanism, the model was able to identify and capture complex temporal dependencies and dynamic changes in various time series data. This adaptability had great potential in real-world applications, enabling the model to better handle a wide range of time series problems with high prediction accuracy and stability, as shown in Table 5.

4. Discussion

To effectively predict the remaining useful life (RUL) from condition monitoring data, this paper proposed a multi-stage RUL prediction method. First, the real-time monitoring signals were preprocessed using path weight selection, whose purpose was to choose the most representative and important features from the monitoring signals so as to better reflect the degradation state of the equipment. As shown in Table 4, the ablation results indicated that applying path weight selection, compared to (base + EM-NET), improved performance on all bearing data, with an average reduction of 20% in MAE and 15% in RMSE. This improvement suggested that path weight selection effectively optimized the data features, thereby enhancing the model's prediction accuracy. In addition, this paper introduced an improved attention mechanism module, which strengthened the traditional attention mechanism to capture more degradation information and preserve more detailed features. The features processed by this module were then input into the RNN, which effectively captured long-term dependencies in the time series [19,47], allowing the model to better learn the degradation patterns and trends of the equipment. By combining path weight selection with the dual attention mechanism, the model was able to focus on regions containing more degradation information, emphasizing the features crucial for prediction. This integrated approach further improved the model's accuracy and robustness. Finally, the MS-DAN model demonstrated significant performance improvements on the PHM bearing dataset. As shown in Table 5, compared with newer models, the MS-DAN model reduced RMSE by an average of 13.51% and MAE by an average of 10.14%.
These results indicated that the MS-DAN model exhibited notable improvements in both prediction accuracy and generalization ability, providing a more reliable solution for condition monitoring and RUL prediction.
Multiscale selected the top K patch sizes for combination to adapt to different time series samples. The impact of different K values on the prediction results was evaluated, as shown in Figure 9. The results showed that performance with K = 2 and K = 3 outperformed K = 1 and K = 6, highlighting the advantage of adaptively modeling critical multi-scale features to enhance accuracy. Furthermore, different time series samples benefited from feature extraction with various patch sizes, but not all patch sizes were equally effective. These findings highlighted the adaptability of the model, emphasizing its ability to identify and apply optimal combinations of patch sizes to address the diverse periodic and trend patterns present in the samples.
Table 3 presents the hyperparameter settings for model training across six experiments, all using 50 epochs. Comparing Exp2 and Exp4 showed that, with other parameters held constant, a larger batch size led to better model performance. Comparing Exp1, Exp3, and Exp4, using SGD or RMSprop as the optimizer was less efficient in this complex task than Adam. Finally, comparing Exp4 and Exp6, a smaller learning rate allowed more precise model adjustments and prevented skipping over optimal weights.

5. Conclusions

This study developed a novel model that incorporates an improved attention mechanism, utilizing efficient one-dimensional convolution to generate channel weights. This design significantly reduces the number of parameters while avoiding dimensionality reduction operations, thereby enhancing the model's efficiency. The features extracted through path weight selection were input into the RNN and the enhanced attention module to further improve the model's predictive performance. Experimental results demonstrated that the model achieved excellent predictive performance on the condition monitoring dataset. In the future, this study plans to adopt adaptive techniques to further optimize the network model; for example, a multi-scale adaptive attention mechanism [48] could be introduced to extract information from both minor fluctuations and the long-term stability of equipment, thereby enhancing the model's capability to predict stability under different operating conditions. Additionally, the research team plans to collaborate further with West China Hospital to collect operational data from ventilators, monitors, and portable extracorporeal devices and apply the proposed method to the prediction of medical equipment stability.

Author Contributions

Conceptualization, X.L. and M.W.; methodology, X.L. and M.W.; software, X.L.; formal analysis, X.L.; investigation, X.L. and M.W.; resources, M.W.; data curation, X.L.; writing—original draft preparation, X.L.; writing—review and editing, X.L. and M.W.; visualization, X.L.; supervision, M.W.; project administration, M.W.; funding acquisition, M.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (Project No. 2022YFC2407600, Project No. 2022YFC3601000).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The research in this paper uses a publicly available dataset. The dataset can be downloaded from the following link: [https://github.com/wkzs111/phm-ieee-2012-data-challenge-dataset] (accessed on 23 March 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MAE: Mean absolute error
RMSE: Root mean square error
RUL: Remaining useful life
LSTM: Long short-term memory
Bi-LSTM: Bidirectional long short-term memory
CNN: Convolutional neural network
TCN: Temporal convolutional network
DFT: Discrete Fourier Transform
RNN: Recurrent neural network
ECA-Net: Efficient channel attention network
CBAM: Convolutional block attention module
IndRNN: Independent recurrent neural network
GRU: Gated recurrent unit
BiGRU: Bidirectional gated recurrent unit

References

  1. Wang, Y.; Zhao, Y.; Addepalli, S. Remaining useful life prediction using deep learning approaches: A review. Procedia Manuf. 2020, 49, 81–88. [Google Scholar]
  2. Ferreira, C.; Gonçalves, G. Remaining Useful Life prediction and challenges: A literature review on the use of Machine Learning Methods. J. Manuf. Syst. 2022, 63, 550–562. [Google Scholar]
  3. Zhang, Y.; Fang, L.; Qi, Z.; Deng, H. A review of remaining useful life prediction approaches for mechanical equipment. IEEE Sens. J. 2023, 23, 29991–30006. [Google Scholar]
  4. Zio, E. Some challenges and opportunities in reliability engineering. IEEE Trans. Reliab. 2016, 65, 1769–1782. [Google Scholar]
  5. Wang, Q.; Liu, W.; Xin, Z.; Yang, J.; Yuan, Q. Development and application of equipment maintenance and safety integrity management system. J. Loss Prev. Process Ind. 2011, 24, 321–332. [Google Scholar]
  6. Cepin, M.; Radim, B. Safety and Reliability. Theory and Applications; CRC Press: Boca Raton, FL, USA, 2017. [Google Scholar]
  7. Bagri, I.; Tahiry, K.; Hraiba, A.; Touil, A.; Mousrij, A. Vibration Signal Analysis for Intelligent Rotating Machinery Diagnosis and Prognosis: A Comprehensive Systematic Literature Review. Vibration 2024, 7, 1013–1062. [Google Scholar] [CrossRef]
  8. Zhang, P.; Chen, R.; Xu, X.; Yang, L.; Ran, M. Recent progress and prospective evaluation of fault diagnosis strategies for electrified drive powertrains: A comprehensive review. Measurement 2023, 222, 113711. [Google Scholar]
  9. Alyafeai, Z.; AlShaibani, M.S.; Ahmad, I. A survey on transfer learning in natural language processing. arXiv 2020, arXiv:2007.04239. [Google Scholar]
  10. Chai, J.; Zeng, H.; Li, A.; Ngai, E.W. Deep learning in computer vision: A critical review of emerging techniques and application scenarios. Mach. Learn. Appl. 2021, 6, 100134. [Google Scholar]
  11. Reza, M.; Mannan, M.; Mansor, M.; Ker, P.J.; Mahlia, T.M.I.; Hannan, M. Recent advancement of remaining useful life prediction of lithium-ion battery in electric vehicle applications: A review of modelling mechanisms, network configurations, factors, and outstanding issues. Energy Rep. 2024, 11, 4824–4848. [Google Scholar]
  12. Song, L.; Jin, Y.; Lin, T.; Zhao, S.; Wei, Z.; Wang, H. Remaining useful life prediction method based on the spatiotemporal graph and GCN nested parallel route model. IEEE Trans. Instrum. Meas. 2024, 73, 1–12. [Google Scholar]
  13. Gomez, W.; Wang, F.K.; Chou, J.H. Li-ion battery capacity prediction using improved temporal fusion transformer model. Energy 2024, 296, 131114. [Google Scholar]
  14. Hochreiter, S.; Schmidhuber, J. Long Short-term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [PubMed]
  15. Niazi, S.G.; Huang, T.; Zhou, H.; Bai, S.; Huang, H.-Z. Multi-scale time series analysis using TT-ConvLSTM technique for bearing remaining useful life prediction. Mech. Syst. Signal Process. 2024, 206, 110888. [Google Scholar]
  16. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
  17. Zhu, J.; Ma, J.; Wu, J. A regularized constrained two-stream convolution augmented transformer for aircraft engine remaining useful life prediction. Eng. Appl. Artif. Intell. 2024, 133, 108161. [Google Scholar]
  18. Lin, W.; Chai, Y.; Fan, L.; Zhang, K. Remaining useful life prediction using nonlinear multi-phase Wiener process and variational Bayesian approach. Reliab. Eng. Syst. Saf. 2024, 242, 109800. [Google Scholar]
  19. Kumar, A.; Parkash, C.; Vashishtha, G.; Tang, H.; Kundu, P.; Xiang, J. State-space modeling and novel entropy-based health indicator for dynamic degradation monitoring of rolling element bearing. Reliab. Eng. Syst. Saf. 2022, 221, 108356. [Google Scholar]
  20. Zhang, Q.; Liu, Q.; Ye, Q. An attention-based temporal convolutional network method for predicting remaining useful life of aero-engine. Eng. Appl. Artif. Intell. 2024, 127, 107241. [Google Scholar]
  21. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar]
  22. Xu, D.; Xiao, X.; Liu, J.; Sui, S. Spatio-temporal degradation modeling and remaining useful life prediction under multiple operating conditions based on attention mechanism and deep learning. Reliab. Eng. Syst. Saf. 2023, 229, 108886. [Google Scholar]
  23. Ding, Y.; Jia, M. Convolutional transformer: An enhanced attention mechanism architecture for remaining useful life estimation of bearings. IEEE Trans. Instrum. Meas. 2022, 71, 3515010. [Google Scholar]
  24. Cai, Z.; Fan, Q.; Vasconcelos, N. A unified multi-scale deep convolutional neural network for fast object detection. In Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; Volume 14, pp. 354–370. [Google Scholar]
  25. Zhao, C.; Huang, X.; Li, Y.; Li, S. A novel remaining useful life prediction method based on gated attention mechanism capsule neural network. Measurement 2022, 189, 110637. [Google Scholar]
  26. Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic routing between capsules. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
  27. Wang, Z. Fast algorithms for the discrete W transform and for the discrete Fourier transform. IEEE Trans. Acoust. Speech Signal Process. 1984, 32, 803–816. [Google Scholar]
  28. Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. arXiv 2022, arXiv:2211.14730. [Google Scholar]
  29. Elman, J.L. Finding structure in time. Cogn. Sci. 1990, 14, 179–211. [Google Scholar]
  30. Nectoux, P.; Gouriveau, R.; Medjaher, K. An experimental platform for bearings accelerated degradation tests. In Proceedings of the IEEE International Conference on Prognostics and Health Management IEEE, Beijing, China, 18–21 June 2012; pp. 23–25. [Google Scholar]
  31. Soualhi, A.; Medjaher, K.; Zerhouni, N. Bearing health monitoring based on Hilbert–Huang transform, support vector machine, and regression. IEEE Trans. Instrum. Meas. 2014, 64, 52–62. [Google Scholar]
  32. Singleton, R.K.; Strangas, E.G.; Aviyente, S. Extended Kalman filtering for remaining-useful-life estimation of bearings. IEEE Trans. Ind. Electron. 2014, 62, 1781–1790. [Google Scholar]
  33. Xu, L.; Huang, J.; Nitanda, A.; Asaoka, R.; Yamanishi, K. A Novel Global Spatial Attention Mechanism in Convolutional Neural Network for Medical Image Classification. arXiv 2020, arXiv:2007.15897. [Google Scholar]
  34. Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual Attention Network for Image Classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3156–3164. [Google Scholar]
  35. Liu, C.; Huang, L.; Wei, Z.; Zhang, W. Subtler mixed attention network on fine-grained image classification. Appl. Intell. 2021, 51, 7903–7916. [Google Scholar]
  36. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  37. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  38. Shastry, K.A.; Shastry, A. An integrated deep learning and natural language processing approach for continuous remote monitoring in digital health. Decis. Anal. J. 2023, 8, 100301. [Google Scholar]
  39. Wei, D.; Wang, B.; Lin, G.; Liu, D.; Dong, Z.; Liu, H.; Liu, Y. Research on unstructured text data mining and fault classification based on RNN-LSTM with malfunction inspection report. Energies 2017, 10, 406. [Google Scholar] [CrossRef]
  40. Li, S.; Li, W.; Cook, C.; Zhu, C.; Gao, Y. Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
  41. Zhao, B.; Li, S.; Gao, Y. IndRNN based long-term temporal recognition in the spatial and frequency domain. In Adjunct Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and the 2020 ACM International Symposium on Wearable Computers; Association for Computing Machinery: New York, NY, USA, 2020; pp. 368–372. [Google Scholar]
  42. Zhang, P.; Meng, J.; Luan, Y.; Liu, C. Plant miRNA-lncRNA Interaction Prediction with the Ensemble of CNN and IndRNN. Interdiscip. Sci. 2020, 12, 82–89. [Google Scholar] [PubMed]
  43. Liao, H. Image Classification Based on IndCRNN Module. In Proceedings of the ICVISP 2020: 2020 4th International Conference on Vision, Image and Signal Processing, Bangkok, Thailand, 9–11 December 2020; pp. 1–6. [Google Scholar]
  44. Pang, B.; Nijkamp, E.; Wu, Y.N. Deep learning with tensorflow: A review. J. Educ. Behav. Stat. 2020, 45, 227–248. [Google Scholar]
  45. Qiao, C.; Li, D.; Guo, Y.; Liu, C.; Jiang, T.; Dai, Q.; Li, D. Evaluation and development of deep neural networks for image super-resolution in optical microscopy. Nat. Methods 2021, 18, 194–202. [Google Scholar]
  46. Hodson, T. Root mean square error (RMSE) or mean absolute error (MAE): When to use them or not. Geosci. Model Dev. Discuss. 2022, 15, 5481–5487. [Google Scholar]
  47. Wei, G.; Zhao, J.; Feng, Y.; He, A.; Yu, J. A novel hybrid feature selection method based on dynamic feature importance. Appl. Soft Comput. 2020, 93, 106337. [Google Scholar]
  48. Shao, X.; Kim, C.-S. Adaptive multi-scale attention convolution neural network for cross-domain fault diagnosis. Expert Syst. Appl. 2024, 236, 121216. [Google Scholar]
Figure 1. The schematic illustration of the proposed model.
Figure 2. The PRONOSTIA platform.
Figure 3. Dataset of bearing.
Figure 4. The flowchart of the proposed method.
Figure 5. Multi-scale partitioning.
Figure 6. Layered architecture of EM-Net.
Figure 7. Basic structure of IndRNN.
Figure 8. Results of different hyperparameters of the experiment.
Figure 9. Network performance with different K. (a) MAE for bearings 1-3 to 1-7 with different K values; (b) RMSE for bearings 1-3 to 1-7 with different K values.
Table 1. Operating condition information of PHM2012 dataset.

Operating Condition | Radial Force/N | Rotational Speed/(r·min⁻¹) | Training Set | Testing Set
Condition 1 | 4000 | 1800 | Bearing 1-1, Bearing 1-2 | Bearing 1-3, Bearing 1-4, Bearing 1-5, Bearing 1-6, Bearing 1-7
Condition 2 | 4200 | 1650 | Bearing 2-1, Bearing 2-2 | Bearing 2-3, Bearing 2-4, Bearing 2-5, Bearing 2-6, Bearing 2-7
Condition 3 | 4400 | 1500 | Bearing 3-1, Bearing 3-2 | Bearing 3-3
Table 2. Results of metrics using different models.

Bearing | CNN MAE | CNN RMSE | TCN MAE | TCN RMSE | GRU MAE | GRU RMSE
1-3 | 0.161 | 0.193 | 0.108 | 0.122 | 0.102 | 0.133
1-4 | 0.105 | 0.128 | 0.105 | 0.142 | 0.096 | 0.135
1-5 | 0.162 | 0.193 | 0.185 | 0.251 | 0.153 | 0.228
1-6 | 0.145 | 0.168 | 0.155 | 0.186 | 0.198 | 0.265
1-7 | 0.125 | 0.149 | 0.172 | 0.256 | 0.182 | 0.236
2-3 | 0.154 | 0.195 | 0.196 | 0.235 | 0.205 | 0.221
2-4 | 0.112 | 0.158 | 0.093 | 0.131 | 0.087 | 0.132
2-5 | 0.151 | 0.186 | 0.189 | 0.215 | 0.191 | 0.238
2-6 | 0.179 | 0.203 | 0.205 | 0.218 | 0.212 | 0.256
2-7 | 0.184 | 0.216 | 0.195 | 0.232 | 0.196 | 0.245

Bearing | BiGRU MAE | BiGRU RMSE | BiLSTM MAE | BiLSTM RMSE | Proposed (ours) MAE | Proposed (ours) RMSE
1-3 | 0.089 | 0.108 | 0.079 | 0.085 | 0.089 | 0.105
1-4 | 0.095 | 0.115 | 0.084 | 0.103 | 0.058 | 0.075
1-5 | 0.128 | 0.156 | 0.106 | 0.132 | 0.072 | 0.084
1-6 | 0.102 | 0.139 | 0.133 | 0.195 | 0.085 | 0.103
1-7 | 0.106 | 0.122 | 0.085 | 0.105 | 0.048 | 0.059
2-3 | 0.152 | 0.187 | 0.126 | 0.156 | 0.065 | 0.075
2-4 | 0.128 | 0.153 | 0.066 | 0.085 | 0.097 | 0.109
2-5 | 0.151 | 0.208 | 0.132 | 0.166 | 0.080 | 0.098
2-6 | 0.165 | 0.212 | 0.092 | 0.108 | 0.092 | 0.102
2-7 | 0.159 | 0.195 | 0.132 | 0.176 | 0.112 | 0.119
Table 3. Experiment hyperparameters of the model.

Hyperparameters | Exp1 | Exp2 | Exp3 | Exp4 | Exp5 | Exp6
Epochs | 50 | 50 | 50 | 50 | 50 | 50
Batch size | 256 | 128 | 256 | 256 | 128 | 256
Optimizer | RMSprop | Adam | SGD | Adam | Adam | Adam
Learning rate | 10⁻³ | 10⁻³ | 10⁻³ | 10⁻³ | 10⁻³ | 10⁻⁴
Table 4. Ablation experiment of modules.

Model | Evaluated Metrics | Bearing 1-5 | Bearing 2-3 | Bearing 2-5
Base model | MAE | 0.128 | 0.121 | 0.133
Base model | RMSE | 0.145 | 0.133 | 0.177
Base model + EM-Net | MAE | 0.093 | 0.081 | 0.103
Base model + EM-Net | RMSE | 0.102 | 0.096 | 0.112
Ours (Base model + EM-Net + Multiscale) | MAE | 0.072 | 0.065 | 0.080
Ours (Base model + EM-Net + Multiscale) | RMSE | 0.084 | 0.075 | 0.098
Table 5. Comparison of prediction performance on Bearing 1-3.

Methods | MAE | RMSE
TCN + Hybrid Attention Mechanism | 0.095 | 0.109
Patch + PAS + Multiscale | 0.093 | 0.105
DMW-Trans | 0.099 | 0.137
MLP + Transformer | 0.111 | 0.140
Ours | 0.089 | 0.105
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Luo, X.; Wang, M. Bearing Lifespan Reliability Prediction Method Based on Multiscale Feature Extraction and Dual Attention Mechanism. Appl. Sci. 2025, 15, 3662. https://doi.org/10.3390/app15073662
