Article

Remaining Useful Life Prediction for Rolling Bearings Based on TCN–Transformer Networks Using Vibration Signals

1 Xi’an Key Laboratory of Extreme Environment and Protection Technology, School of Aerospace Engineering, Xi’an Jiaotong University, Xi’an 710049, China
2 Xi’an Institute of Electromechanical Information Technology, Xi’an 710065, China
3 Xi’an Modern Control Technology Research Institute, Xi’an 710065, China
* Authors to whom correspondence should be addressed.
Sensors 2025, 25(11), 3571; https://doi.org/10.3390/s25113571
Submission received: 22 April 2025 / Revised: 30 May 2025 / Accepted: 3 June 2025 / Published: 5 June 2025
(This article belongs to the Section Fault Diagnosis & Sensors)

Abstract

Remaining useful life (RUL) prediction plays a core role in industrial prognostics and health management (PHM) and requires data-driven models with strong predictive capability for accurate long time series prediction. Developing reliable deep learning-based models from multi-sensor monitoring data is fundamental for accurately predicting vibration trends during bearing operation and is crucial for bearing fault diagnosis and RUL prediction. In this work, a method for constructing a health index (HI) from vibration signals is developed to describe the performance features of rolling bearings; it comprises feature extraction, sensitive feature index selection, dimensionality reduction, and normalization. In addition, a new RUL prediction method, TCN–Transformer, is developed that efficiently learns and integrates local and global features of vibration signals, addressing the long time series prediction problem in RUL prediction. The TCN extracts local features, the Transformer learns global features, and the two are integrated through a specially designed feature fusion attention module. Both the HI, constructed from extracted time domain and frequency domain feature parameters, and the RUL prediction method were rigorously validated on the IEEE PHM 2012 Data Challenge dataset for rolling bearing prognostics. With the proposed HI construction method, the average comprehensive bearing performance index, used to evaluate RUL prediction accuracy, improves by 8.69% across the entire dataset compared with the original feature-based composite index. The proposed RUL prediction model predicts the RUL of rolling bearings under different conditions more accurately, reducing the RMSE and MAE by 14.62% and 9.26%, respectively, and improving the SCORE by 13.04%. These results underscore the efficacy and superiority of the approach for RUL prediction of rotating machinery under varying conditions.

1. Introduction

Deep learning methods have demonstrated exceptional performance across many areas of the industrial sector, particularly in health monitoring and the intelligent operation and maintenance of critical industrial components. Prognostics and health management (PHM), which includes fault detection, diagnosis, and remaining useful life (RUL) prediction, has recently attracted great research interest [1,2,3,4]. Deep learning-based PHM has great potential for deploying such maintenance strategies, making it possible to schedule efficient, just-in-time, and just-right maintenance [5,6]. Rolling bearings are key components in rotating machinery and directly affect the safety of the entire mechanical system. Vibration monitoring is crucial for early fault detection, localization, and differentiation [7,8,9,10,11]. Developing reliable deep learning-based models from multi-sensor monitoring data is fundamental for accurately predicting vibration trends during bearing operation and is crucial for bearing fault diagnosis and RUL prediction [12,13,14,15].
PHM employs various techniques to analyze monitoring data, extract discriminative knowledge, and assess the health status of mechanical equipment. It is generally expected to achieve three functions: health status monitoring, fault diagnosis, and RUL prediction. Among them, RUL estimation is considered the most challenging task, because the operating durations of mechanical equipment vary and it is difficult to accurately extract sensitive degradation features under different degradation modes [16,17,18]. Constructing a health index (HI) that describes performance features from continuous operational signals is a critical prerequisite for effective data-driven RUL prediction. Deep learning-based methods, such as Recurrent Neural Networks (RNNs) and their variants, are increasingly used to extract identifiable degradation features through their recurrent memory structures and have become a prominent area of research [19].
Current RUL prediction methods are generally divided into three categories: physical model-based methods, data-driven techniques, and hybrid strategies [20]. Physical model-based methods are built on component damage mechanisms and the deterioration laws of specific failure modes; prominent examples include Fatigue Crack Growth (FCG) [21] and Fatigue Spall Progression Life (FSPL) models [22]. These approaches describe structural degradation through physical mechanisms but typically require substantial prior knowledge, which makes accurate degradation estimation difficult under complex conditions [23,24,25]. In contrast, data-driven methods build models from sensor data without depending on particular degradation patterns, learning empirically from extensive historical data [26]. With the progress of machine learning, data-driven approaches are increasingly used in industrial applications.
The advent of machine learning technologies has significantly influenced the development of RUL prediction. Bearing sensor monitoring data are time series; therefore, bearing degradation trend prediction is essentially a time series regression problem [27], and the ability of the model to learn effective temporal information is crucial to the final prediction result. Traditional machine learning-based prediction methods usually require a separate feature extraction step: a set of features is first extracted from condition monitoring data and then fed into a machine learning model that performs the RUL prediction task. These methods do not consider the correlations between time series signals that reflect changes in the health state of mechanical equipment. In addition, they typically rely on manually extracting features from raw sensor data, estimating health indicators and degradation states, and predicting RUL using failure thresholds. With advances in deep learning technologies such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers [28], deep learning-based RUL prediction has attracted increasing attention. Deep learning techniques can analyze high-dimensional data and extract features automatically. Deep learning-based methods have achieved remarkable results across various fields, primarily because they can map the relationship between degradation paths and measurement data and automatically learn degradation features, eliminating the need for manual feature extraction and expert knowledge of mechanical systems. Among these methods, RNN and Long Short-Term Memory (LSTM) models have been particularly prominent in RUL prediction tasks because they effectively exploit temporal information [29]. Both LSTM and bidirectional LSTM (BiLSTM) models also perform strongly in time domain health monitoring applications [30,31].
However, as the service life of mechanical equipment continues to extend, long-term degradation behavior prediction becomes increasingly essential, and the shortcomings of RNN-based prediction frameworks are gradually exposed, mainly in the following aspects: (1) RNNs’ inability to process time series in parallel, necessitating strict chronological order; (2) difficulties in memorizing long-term historical data, leading to error accumulation in predictions; and (3) increased computational complexity due to the intricate gating structures of RNN variants like LSTM and Gated Recurrent Units (GRUs) [32]. Therefore, how to process long time series efficiently and accurately has become an urgent problem that needs to be solved.
Furthermore, existing models have not fully succeeded in learning both local correlation features and global features, which is key to RUL prediction. Recently, Transformer-based models have successfully learned global features for different types of data, including time series [33]. Compared with RNNs, their attention mechanisms speed up training, support parallel computing, and improve accuracy. The unique output mechanism of Transformer-based models can greatly reduce error accumulation during prediction. To address the specific challenges of time series modeling, many Transformer variants have been developed and successfully applied to a variety of time series tasks, including prediction, anomaly detection, and classification. However, directly applying these models to multivariate time series data for bearing vibration prediction may not fully exploit inherent characteristics of the data, such as temporal dynamics and the relationships between different dimensions. Such direct application often fails to capture the overall feature distribution of the time series, which limits prediction performance.
In contrast, CNN-based architectures demonstrate superior local pattern extraction capabilities through their hierarchical filter structures [34]. Temporal Convolutional Networks (TCNs) [35] further enhance this advantage by incorporating dilated convolutional layers, preserving CNN’s inherent local feature extraction while effectively capturing long-range temporal dependencies in sequential data [36]. This complementary functionality motivates the integration of Transformer architectures with TCNs for enhanced time series representation learning. While existing studies have employed serial arrangements of these models for temporal data processing [37], such sequential architectures often neglect the intrinsic interplay between local and global features in vibration signals. A more effective approach requires independent learning of hierarchical features, followed by deliberate fusion through optimized integration mechanisms.
The main contributions of this study are summarized as follows:
A method for constructing a health index based on vibration signals (HIVS) is developed to describe the performance features of rolling bearings. The bearing vibration signals can be decomposed into eight wavelet packets using wavelet packet transformation, and an initial feature set of 32 feature indexes is built to capture the signal characteristics. These indexes are derived from the original vibration signals across the time domain, frequency domain, and time–frequency domain. Irrelevant and redundant features are then filtered out, retaining eight key sensitive feature indexes. Finally, principal component analysis (PCA) reduces these sensitive feature indexes from a high-dimensional space to one dimension, and the result is normalized to construct an HIVS that clearly indicates the performance status of the bearings.
A new RUL prediction model, named TCN–Transformer, has been developed to efficiently learn and integrate both local and global features from bearing vibration signals, thereby addressing the challenges associated with long time series predictions in RUL estimation. The model utilizes TCN to integrate signals from different frequency domains, while leveraging the Transformer’s capabilities to process time domain signals, effectively handling the complexities of bearing life evolution.
The subsequent sections of this paper are structured as follows: In Section 2, the methods of feature extraction and HI construction for rolling bearings using vibration signals are proposed, including feature extraction, sensitive feature index selection, dimensionality reduction, and normalization. In Section 3, the RUL prediction model, TCN–Transformer, is proposed, and the RUL process for rolling bearings is introduced. In Section 4, both the HI construction and RUL prediction methods are tested and verified on the IEEE PHM 2012 Data Challenge dataset for RUL prediction of rolling bearings. Finally, conclusions are summarized in Section 5.

2. Feature Extraction and Health Index Construction Method

Rolling bearings generate complex vibration signals during operation, which contain a large amount of information about their health status. Through effective feature extraction, key indexes describing the bearing degradation state, such as vibration amplitude, frequency distribution, etc., can be identified from these signals. These indexes can intuitively reflect the working status of the bearing and are important indicators for evaluating bearing performance.
This work first extracts 32 feature indexes in the time domain, frequency domain, and time–frequency domain from the original vibration signals to construct an initial feature set. Then, based on four feature evaluation indices (monotonicity, correlation, predictability, and robustness), irrelevant and redundant features are filtered out of the original feature set, and eight key sensitive feature indexes that accurately reflect the performance of rolling bearings are retained. Finally, these sensitive feature indexes are reduced from a high-dimensional space to one dimension and then normalized, thereby constructing an HI that clearly indicates the performance status of the bearings.

2.1. Feature Extraction Method

In our previous work [8], it was demonstrated that vibration signals of rolling bearings can be decomposed into multiple characteristic waveforms in the frequency domain through empirical functional decomposition.
As bearing performance degrades over time, both frequency domain and time domain characteristics change. Because the four primary failure modes of bearings (fatigue pitting, plastic deformation, wear, and cage damage) are mutually independent, selecting a few dominant features to characterize the bearing state is important for reducing the number of model parameters.
1. Time domain feature extraction
Feature extraction methods usually fall into three categories: time domain analysis, frequency domain analysis, and time–frequency domain analysis. Time domain features include basic statistics (such as mean, variance, and peak) and higher-order statistical indicators (such as skewness and kurtosis). These features describe the distribution characteristics and change patterns of vibration signals from different perspectives. However, many of these parameters become less reliable once the fault has progressed beyond a certain level. By extracting and analyzing these time domain features, a bearing HI can be constructed, enabling early identification and prediction of bearing performance degradation. In this work, 10 dimensional and 9 dimensionless time domain features are extracted from the original vibration signals of rolling bearings, as listed in Table 1.
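Table 1 is not reproduced in this section, so the exact feature list cannot be shown here. As a minimal sketch, assuming one vibration snapshot `x` stored as a 1D NumPy array, the hypothetical function below computes a few commonly used dimensional and dimensionless time domain features; the selection is illustrative and is not claimed to match Table 1.

```python
import numpy as np

def time_domain_features(x: np.ndarray) -> dict:
    """Illustrative time domain features of one vibration snapshot (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    mean = x.mean()
    std = x.std()                          # population standard deviation
    rms = np.sqrt(np.mean(x ** 2))
    peak = abs_x.max()
    mean_abs = abs_x.mean()
    sqrt_amp = np.mean(np.sqrt(abs_x)) ** 2  # square-root amplitude
    # Standardized third and fourth central moments
    skewness = np.mean((x - mean) ** 3) / std ** 3
    kurtosis = np.mean((x - mean) ** 4) / std ** 4
    return {
        # dimensional features (carry the units of x)
        "mean": mean, "std": std, "rms": rms, "peak": peak,
        "peak_to_peak": x.max() - x.min(),
        # dimensionless features (ratios, independent of signal scale)
        "skewness": skewness, "kurtosis": kurtosis,
        "crest_factor": peak / rms,
        "shape_factor": rms / mean_abs,
        "impulse_factor": peak / mean_abs,
        "clearance_factor": peak / sqrt_amp,
    }
```

For a pure sine wave the crest factor is about 1.414 and the kurtosis is 1.5; impulsive bearing defects typically push both upward, which is why such ratios are popular degradation indicators.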
2. Frequency domain feature extraction
Frequency domain feature extraction is one of the key techniques for understanding and analyzing the health state of rolling bearings. Converting the vibration signal from the time domain to the frequency domain reveals the main frequency components and their amplitudes, which is essential for detecting specific bearing fault types. Frequency domain analysis is usually implemented with the Fourier transform, which decomposes time series data into a series of frequency components and thereby reveals the characteristic frequencies and harmonics of the operating bearing. Frequency domain analysis has clear advantages over time domain analysis in identifying complex fault modes: it can more accurately distinguish and locate early signs of bearing failure, especially small changes masked by background noise. In this work, five frequency domain features are extracted from the original vibration signals of rolling bearings, as listed in Table 2.
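Table 2 is not reproduced here either. The following sketch, assuming a real-valued snapshot `x` sampled at `fs` Hz, computes five common spectral features from the one-sided FFT amplitude spectrum; the function name and the choice of features are hypothetical and only illustrate the Fourier-based approach described above.

```python
import numpy as np

def frequency_domain_features(x: np.ndarray, fs: float) -> dict:
    """Illustrative spectral features of one snapshot (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # remove the DC component
    amp = np.abs(np.fft.rfft(x)) / len(x)     # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = amp ** 2
    total = power.sum()
    mean_freq = np.sum(freqs * power) / total              # spectral centroid
    rms_freq = np.sqrt(np.sum(freqs ** 2 * power) / total)
    std_freq = np.sqrt(np.sum((freqs - mean_freq) ** 2 * power) / total)
    return {
        "mean_amplitude": amp.mean(),
        "mean_frequency": mean_freq,
        "rms_frequency": rms_freq,
        "frequency_std": std_freq,
        "peak_frequency": freqs[np.argmax(amp)],
    }
```

A 1000 Hz tone sampled at 25.6 kHz (a rate used here only as an example) yields a peak frequency and spectral centroid of 1000 Hz with almost zero spread; as a fault develops, energy spreads into defect frequencies and harmonics, shifting the centroid and widening the spread.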
3. Time–frequency domain feature extraction
When the performance of rolling bearings degrades, the energy of each node after wavelet packet decomposition will also change accordingly. Therefore, the wavelet packet energy of the vibration signal after wavelet packet decomposition can be used to select certain specific sub-band energies as characteristic indexes to characterize the degradation of rolling bearings. According to the literature [38], the Haar wavelet is used as the basis function to perform a three-layer wavelet packet decomposition on the vibration signals of the rolling bearing in this work. After decomposition, eight sub-bands are obtained, and the energy ratios of the eight sub-bands are used as the time–frequency feature indexes (S25–S32). The energy of the wavelet packet sub-band is defined as follows:
$$E_j^i(t) = \sum_{n=1}^{N} \left| x_{j,n}^{i}(t) \right|^2$$
where $E_j^i(t)$ is the wavelet packet energy of the $i$-th node at decomposition level $j$, and $N$ is the length of the node signal $x_j^i(t)$ after wavelet packet decomposition.
The wavelet packet energy ratio reflects the performance degradation of the rolling bearing through the share of energy that the wavelet packet reconstructed signal carries at different scales. The energy ratio $p_j^i$ of each sub-band after wavelet packet decomposition is defined as follows:
$$p_j^i = \frac{E_j^i(t)}{\sum_{i=0}^{2^j-1} E_j^i(t)}$$
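The three-level Haar decomposition and the energy ratios $p_3^i$ can be sketched directly in NumPy. This is a minimal illustration, assuming a 1D snapshot `x`; the function name is hypothetical, and the eight nodes come out in natural (Paley) order rather than frequency order. A library such as PyWavelets (`pywt.WaveletPacket`) could be used instead, but its boundary handling may differ slightly from this plain implementation.

```python
import numpy as np

def haar_wavelet_packet_energy_ratios(x: np.ndarray, level: int = 3) -> np.ndarray:
    """Energy ratio p_j^i of every terminal node of a Haar wavelet packet tree."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            node = node[: len(node) // 2 * 2]               # drop a trailing odd sample
            even, odd = node[0::2], node[1::2]
            next_nodes.append((even + odd) / np.sqrt(2))    # approximation (low-pass)
            next_nodes.append((even - odd) / np.sqrt(2))    # detail (high-pass)
        nodes = next_nodes                                   # 2**level nodes at the end
    energies = np.array([np.sum(n ** 2) for n in nodes])    # E_j^i for each node
    return energies / energies.sum()                         # p_j^i, sums to 1
```

With `level=3` this returns eight ratios, matching the eight time–frequency feature indexes S25–S32; because the Haar transform is orthonormal, the node energies add up to the energy of the (even-length) input.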

2.2. Constructing the Sensitive Feature Set for Rolling Bearings

The goal of feature selection is to find a feature subset that is effective for performance assessment, keeping prediction performance at a good level while reducing the feature dimension. Constructing a sensitive feature set means selecting, from a large number of original or derived features, those that are most sensitive to changes in the bearing state and best reflect its health status. This not only significantly reduces data dimensionality and the computational cost of model training but also improves the robustness and interpretability of prediction models. By eliminating redundant and irrelevant features, feature selection focuses on the most representative indexes, providing a more accurate assessment of bearing performance. This work defines four evaluation indices for feature selection, namely the monotonicity, correlation, predictability, and robustness indices, and formulates the selection of optimal features as a combinatorial optimization problem to construct the sensitive feature set for rolling bearings.
Monotonicity refers to the unidirectional change tendency of features as performance degrades. This work uses Spearman’s rank correlation coefficient as the monotonicity index [39], and the formula is as follows:
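$$\mathrm{Mon}(Y) = 1 - \frac{6\sum_{i=1}^{N} d_i^2}{N\left(N^2 - 1\right)}$$
where $d_i$ is the difference between the ranks of the two variables (the feature value and its monitoring time) at the $i$-th monitoring point, and $N$ is the total number of monitoring times during the entire performance degradation process.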
Temporal correlation emphasizes the dependence of features on time series. This work uses the Pearson correlation coefficient to describe the correlation between features and time series. The formula is as follows:
$$\mathrm{Corr}(Y,T) = \frac{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)\left(t_i - \bar{t}\right)}{\sqrt{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2 \sum_{i=1}^{N}\left(t_i - \bar{t}\right)^2}}$$
where $Y = [y_1, y_2, \ldots, y_N]$ is the performance feature sequence, $t_i$ is the $i$-th monitoring moment, and $\bar{t}$ and $\bar{y}$ are their means. $N$ is the total number of monitoring times in the entire performance process.
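The two indices above translate directly into NumPy. A minimal sketch, assuming a feature sequence `y` and monitoring times `t` with no tied values (so that simple ranks suffice); the helper names are hypothetical. The formulas in the text are signed, so a steadily decreasing feature scores −1; some studies take absolute values instead, which would be an additional assumption here.

```python
import numpy as np

def ranks(v) -> np.ndarray:
    """Ranks 1..N of the entries of v (assumes no ties)."""
    order = np.argsort(v)
    r = np.empty(len(v))
    r[order] = np.arange(1, len(v) + 1)
    return r

def monotonicity(y, t) -> float:
    """Spearman rank correlation: Mon(Y) = 1 - 6 * sum(d_i^2) / (N (N^2 - 1))."""
    d = ranks(y) - ranks(t)
    n = len(y)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

def correlation(y, t) -> float:
    """Pearson correlation Corr(Y, T) between a feature and the monitoring times."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    dy, dt = y - y.mean(), t - t.mean()
    return np.sum(dy * dt) / np.sqrt(np.sum(dy ** 2) * np.sum(dt ** 2))
```

An exponentially growing feature gets a perfect monotonicity of 1 even though it is not linear in time, whereas the Pearson correlation rewards near-linear trends; using both therefore captures complementary aspects of the degradation trend.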
Predictability means that a feature provides enough information to predict future states. The predictability index is defined as follows:
$$\mathrm{Pre}(Y) = \exp\left(-\frac{\sigma\left(y_f\right)}{\left|\bar{y}_f - \bar{y}_s\right|}\right)$$
where $y$ is the performance characteristic, $\bar{y}_s$ is its mean at the initial moment, $\bar{y}_f$ is its mean at the failure moment, and $\sigma(y_f)$ is its standard deviation at the failure moment.
Robustness refers to the ability of a feature set to maintain its predictive performance in the presence of disturbances and noise. The robustness index is defined as follows:
$$\mathrm{Rob}(Y) = \frac{1}{N}\sum_{i=1}^{N} \exp\left(-\left|\frac{y_i - \tilde{y}_i}{y_i}\right|\right)$$
where $Y = [y_1, y_2, \ldots, y_N]$ is the performance feature sequence, $N$ is the total number of monitoring times in the entire performance process, and $\tilde{Y} = [\tilde{y}_1, \tilde{y}_2, \ldots, \tilde{y}_N]$ is the trend sequence of the corresponding performance feature.
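The predictability and robustness indices can be sketched as follows. Two assumptions are made that the text does not specify: the start and failure values are taken as one value per run-to-failure bearing, and the trend sequence $\tilde{Y}$ is approximated with a centered moving average (edge-padded). All function names are hypothetical.

```python
import numpy as np

def predictability(start_values, failure_values) -> float:
    """Pre(Y) = exp(-std(y_f) / |mean(y_f) - mean(y_s)|).

    start_values / failure_values: the feature value at the initial and at the
    failure moment for each run-to-failure bearing (one entry per bearing).
    """
    ys = np.asarray(start_values, dtype=float)
    yf = np.asarray(failure_values, dtype=float)
    return float(np.exp(-yf.std() / abs(yf.mean() - ys.mean())))

def moving_average_trend(y, half_window: int = 5) -> np.ndarray:
    """Centered moving average used as the trend ~y (assumption, not from the text)."""
    kernel = np.ones(2 * half_window + 1) / (2 * half_window + 1)
    padded = np.pad(np.asarray(y, dtype=float), half_window, mode="edge")
    return np.convolve(padded, kernel, mode="valid")   # same length as y

def robustness(y, trend) -> float:
    """Rob(Y) = mean(exp(-|(y_i - ~y_i) / y_i|)); all y_i must be nonzero."""
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.exp(-np.abs((y - trend) / y))))
```

Both indices lie in (0, 1]: predictability approaches 1 when all bearings fail at a similar feature value far from their starting value, and robustness approaches 1 when the feature follows its smoothed trend closely.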
Evaluating features with a single indicator is often insufficient for performance degradation assessment. To make full use of multiple evaluation indices, and given the important role of monotonicity, correlation, predictability, and robustness in rolling bearing performance prediction, a weighted linear combination of these indices is constructed to determine the performance degradation features, assuming that the discrete features can be approximated linearly. The comprehensive index is defined as follows:
$$J = w_1\,\mathrm{Mon}(Y) + w_2\,\mathrm{Corr}(Y,T) + w_3\,\mathrm{Pre}(Y) + w_4\,\mathrm{Rob}(Y), \quad w_i > 0, \quad \sum_{i=1}^{4} w_i = 1$$
For characterizing and predicting rolling bearing performance, the key is to select features that reflect the overall degradation trend, because the degradation of rolling bearings is monotonic and irreversible. Monotonicity should therefore carry a relatively high weight in the comprehensive index. Performance degradation is also a continuous process whose changes follow a certain regularity over time; accounting for temporal correlation helps capture the dynamic degradation process and extract features that reflect the degradation trend, improving the time sensitivity and accuracy of the prediction model. In experiments, however, most extracted features had high robustness values, so the robustness index discriminated poorly between them and was given a lower weight. Based on the parameter settings and experimental results reported in the literature [40], the weights $w_1$, $w_2$, $w_3$, and $w_4$ are set to 0.4, 0.4, 0.1, and 0.1, respectively.
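With the weights fixed, $J$ is a dot product per feature. The sketch below assumes the four index values of each of the 32 candidate features are already stacked in a `(n_features, 4)` array; ranking by $J$ and keeping the top eight is a simplification used here for illustration, since the combinatorial optimization procedure itself is not detailed in this section. Function names are hypothetical.

```python
import numpy as np

# Weights from the text, in the order [Mon, Corr, Pre, Rob].
WEIGHTS = np.array([0.4, 0.4, 0.1, 0.1])

def comprehensive_index(scores: np.ndarray, weights: np.ndarray = WEIGHTS) -> np.ndarray:
    """J for each feature; scores has shape (n_features, 4) = [Mon, Corr, Pre, Rob]."""
    if np.any(weights <= 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be positive and sum to 1")
    return np.asarray(scores, dtype=float) @ weights

def select_sensitive_features(scores, k: int = 8) -> np.ndarray:
    """Indices of the k features with the largest J, best first (simplified ranking)."""
    j = comprehensive_index(scores)
    return np.argsort(j)[::-1][:k]
```

With the 0.4/0.4/0.1/0.1 weighting, a feature that is perfectly monotonic but uncorrelated, unpredictable, and non-robust still scores 0.4, which reflects the emphasis the text places on monotonicity and temporal correlation.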

2.3. Dimensionality Reduction Method for Sensitive Feature Index

PCA is a widely adopted statistical technique for data dimensionality reduction and feature extraction [41]. It converts multiple possibly correlated variables into a series of linearly uncorrelated variables, called principal components, through an orthogonal transformation. PCA offers clear advantages for reducing the dimensionality of sensitive feature indexes: it lowers model complexity, improves computational efficiency, reduces the risk of overfitting, and can enhance the generalization ability of the model.
The dimensionality reduction process of sensitive feature indexes based on PCA is shown in Figure 1, and the details are as follows:
  • Feature extraction of vibration signals. A total of 32 feature indexes are used: 10 dimensional time domain indexes (S1–S10), 9 dimensionless time domain indexes (S11–S19), 5 frequency domain features (S20–S24), and 8 time–frequency domain indexes (S25–S32) given by the sub-band energy ratios. These 32 feature indexes are extracted from the original vibration signal to form the original feature set.
  • Sensitive feature index selection based on evaluation indices. Based on the comprehensive index that takes into account the monotonicity, correlation, predictability, and robustness of the features, eight sensitive feature indexes of rolling bearings are selected.
  • Dimensionality reduction of sensitive feature indexes. The selected sensitive degradation features are fed into the PCA algorithm, and the first principal component is extracted as the rolling bearing performance degradation feature index after dimensionality reduction.
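The PCA step can be sketched with a singular value decomposition. This is a minimal illustration, assuming an `(N_times, 8)` matrix of selected features with no constant columns; standardizing each column and flipping the sign so that the component grows with time are assumptions, not steps stated in the text, and the function name is hypothetical. `sklearn.decomposition.PCA(n_components=1)` would give the same component up to sign.

```python
import numpy as np

def first_principal_component(features: np.ndarray) -> np.ndarray:
    """Project an (N_times, n_features) matrix onto its first principal component."""
    X = np.asarray(features, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns (assumption)
    # Rows of vt are the principal axes, ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    pc1 = X @ vt[0]
    # The sign of a principal axis is arbitrary; orient PC1 to increase over time.
    if np.corrcoef(pc1, np.arange(len(pc1)))[0, 1] < 0:
        pc1 = -pc1
    return pc1
```

When the eight features share a common degradation trend, the first component concentrates that shared trend and averages out feature-specific noise.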

2.4. Constructing the Health Index for Rolling Bearing

After selection, the eight feature indexes are stacked column-wise into a 2D array, in which each column is one feature vector. This array is fed into the PCA algorithm, and the first principal component is extracted as the performance degradation feature index of the rolling bearing after dimensionality reduction. Outliers are then removed from this index, and it is normalized with min–max normalization to obtain the health index. Finally, the health index is smoothed with a moving average to obtain the final performance degradation trend of the rolling bearing. The normalization formula is as follows:
$$x_{\mathrm{nor}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x_{\mathrm{nor}}$ is the normalized value, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the sequence, respectively. The moving average formula is as follows:
$$y_t = \frac{\sum_{i=1}^{n}\left(x_{t-i} + x_{t+i}\right) + x_t}{2n + 1}$$
where $y_t$ is the smoothed value at time $t$, $x_t$ is the value of the original sequence at time $t$, and $n$ is the half-width of the sliding window, which spans $2n + 1$ samples.
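Both formulas above are one-liners in NumPy. In this minimal sketch (hypothetical function names), the moving average returns only positions where the full $2n+1$ window fits, so the output is $2n$ samples shorter than the input; how the edges are handled in the original work is not specified, and outlier removal is omitted.

```python
import numpy as np

def min_max_normalize(x) -> np.ndarray:
    """x_nor = (x - x_min) / (x_max - x_min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def centered_moving_average(x, n: int) -> np.ndarray:
    """y_t = (x_t + sum_{i=1..n} (x_{t-i} + x_{t+i})) / (2n + 1), full windows only."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(2 * n + 1) / (2 * n + 1)
    return np.convolve(x, kernel, mode="valid")
```

For example, normalizing `[2, 4, 6]` gives `[0, 0.5, 1]`, and smoothing `[1, 2, 3, 4, 5]` with `n = 1` gives `[2, 3, 4]`.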

3. TCN–Transformer Networks and RUL Prediction

As a CNN-based structural innovation, TCN processes long sequences in parallel by applying the same filter across each layer. Compared with traditional recurrent models such as LSTM and GRU, TCN has a simpler and clearer structure while also improving accuracy [35]. The receptive field of a TCN can be adjusted by stacking more causal convolution layers, increasing the dilation factor, or enlarging the filter size, which gives flexible control over the model's memory length. TCN also avoids the exploding and vanishing gradient problems common in RNNs, because its backpropagation path differs from the temporal direction of the sequence, and it can handle input sequences of different lengths. In addition, TCN has low memory requirements when training on long sequences, which can significantly shorten the training cycle.
Given the Transformer’s limitations in effectively using information across multi-parameter temporal domains, this study adopts two key strategies: (1) manual selection of critical feature parameters, and (2) processing of cross-feature temporal variations through a dedicated TCN module. In this architecture, the Transformer is dedicated to applying attention mechanisms to these processed features to perform RUL prediction.
1. Causal convolution
TCN must satisfy two requirements: (1) the network output must have the same length as the input, which is achieved with a one-dimensional fully convolutional network (FCN) in which zero padding keeps the length of every layer unchanged; (2) future inputs must not affect past outputs, which is achieved with causal convolution, where the output at any time t depends only on the current and previous input elements.
2. Dilated convolution
When processing historical data, causal convolution needs more hidden layers as the length of the history grows, which requires a deeper network or larger filters. To solve this problem, dilated convolution was introduced [24]. By inserting holes (gaps) between the taps of a standard convolution filter, dilated convolution expands the receptive field so that the output covers a wider range of information without the information loss caused by pooling layers.
3. Residual module
To address the performance degradation caused by increasing network depth, a residual module is used. In the residual module, the rectified linear unit (ReLU) serves as the activation function, and weight normalization is applied to the convolution filters. To further improve generalization, spatial dropout is added after each dilated convolution for regularization; that is, the output of an entire channel is randomly set to zero at each training step. In addition, to handle a shape mismatch between the input and output of a block, TCN introduces an additional 1 × 1 convolution layer so that the residual input and the block output have the same shape before they are added, as shown in Figure 2.
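The three building blocks can be illustrated in a few lines of NumPy. This is a single-channel sketch with hypothetical function names: weight normalization, spatial dropout, and the 1 × 1 convolution are omitted, and the receptive-field helper assumes the common TCN layout in which each residual block holds two dilated causal convolutions sharing one dilation factor (doubling dilations such as 1, 2, 4, 8 are a typical choice, not one stated in the text).

```python
import numpy as np

def causal_dilated_conv1d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """y[t] = sum_j w[j] * x[t - j*dilation]; left zero padding keeps len(y) == len(x)."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    # Index pad + t - j*dilation never looks past time t, so the output is causal.
    return np.array([
        sum(w[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

def residual_block(x, w1, w2, dilation: int) -> np.ndarray:
    """Simplified residual block: ReLU(x + F(x)), F = two ReLU causal dilated convs."""
    h = np.maximum(causal_dilated_conv1d(x, w1, dilation), 0.0)
    h = np.maximum(causal_dilated_conv1d(h, w2, dilation), 0.0)
    return np.maximum(np.asarray(x, dtype=float) + h, 0.0)

def receptive_field(kernel_size: int, dilations, convs_per_block: int = 2) -> int:
    """Receptive field of stacked blocks with `convs_per_block` convs per dilation."""
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)
```

With kernel size 3 and four blocks dilated by 1, 2, 4, and 8, the receptive field is 1 + 2 × 2 × 15 = 61 samples; without dilation, reaching the same span would require roughly eight times as many layers.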

3.1. Construction of TCN–Transformer Networks

The TCN–Transformer architecture integrates TCN and Transformer components through hierarchical parallelization. The TCN layer extracts local temporal features from the input sequence, while the Transformer module captures global data patterns. These learned representations are then combined by a multi-head feature fusion attention module. Finally, the outputs of both branches are concatenated and passed through a fully connected layer that projects them to the target dimension while preserving the original time series structure. Compared with conventional TCN and Transformer models, TCN–Transformer introduces two key innovations: (1) a hierarchical parallel architecture that simultaneously uses the Transformer block’s local window self-attention and the TCN block’s deep convolutional operations; (2) a specialized multi-head feature fusion attention module for effective integration of branch features. Figure 3 illustrates the framework of the TCN–Transformer network.
(1) Hierarchical parallel design
As shown in Figure 3, the TCN–Transformer network comprises parallel computation flows for the Temporal Convolutional Network (TCN) and the Transformer (Trans) modules. In the TCN flow, two dilated causal convolution layers with weight normalization form one hidden layer of the TCN model. The input to the TCN layer is either the preprocessed rolling bearing vibration signal data or the fused local and global features from the previous layer, and its output is the local features of the bearing signal. The Transformer module contains a multi-head self-attention mechanism and a feedforward neural network, both equipped with layer normalization. Its input is likewise either the preprocessed vibration signal data or the fused local and global features from the previous layer, and its output captures the global features of the bearing’s full-life vibration signal [42].
(2) Multi-head feature fusion attention module
The multi-head feature fusion attention module contains two attention mechanisms that establish an interaction between the two parallel branches to fuse local and global features. As shown on the right side of Figure 3, the output of the TCN layer $\tilde{Y}_i \in \mathbb{R}^{t \times c}$ and the output of the Transformer module $\tilde{Z}_i \in \mathbb{R}^{t \times c}$ interact in the module to bidirectionally fuse the local features $\tilde{Y}_i$ and the global features $\tilde{Z}_i$. Specifically, the TCN layer output is updated through a residual connection with the fusion attention module, giving the new TCN layer output $Y_{i+1}$:
$$Y_{i+1} = \tilde{Y}_i + A_{\tilde{Z}_i \to \tilde{Y}_i} Z_i W_e^{v}$$
where $W_e^{v}$ is the learnable parameter of the embedding layer, and $A_{\tilde{Z}_i \to \tilde{Y}_i}$ is the fusion matrix from the Transformer to the TCN, computed by matrix multiplication followed by the softmax function:
$$A_{\tilde{Z}_i \to \tilde{Y}_i} = \mathrm{softmax}\left(\frac{\tilde{Y}_i W_e^{q} \left(\tilde{Z}_i W_e^{k}\right)^{T}}{\sqrt{c}}\right)$$
where $W_e^{q}$ and $W_e^{k}$ are the learnable parameters of the two linear layers. Similarly, the updated output of the Transformer module is defined as follows:
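$$Z_{i+1} = \tilde{Z}_i + A_{\tilde{Y}_i \to \tilde{Z}_i} Y_i W_d^{v}$$
$$A_{\tilde{Y}_i \to \tilde{Z}_i} = \mathrm{softmax}\left(\frac{\tilde{Z}_i W_d^{q} \left(\tilde{Y}_i W_d^{k}\right)^{T}}{\sqrt{c}}\right)$$
where $W_d^{v}$, $W_d^{q}$, and $W_d^{k}$ are the learnable parameters of the three linear layers, and $A_{\tilde{Y}_i \to \tilde{Z}_i}$ is the fusion matrix from the TCN to the Transformer.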
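A single-head NumPy sketch of the two fusion updates may help make the data flow concrete. Queries and keys come from the branch outputs $\tilde{Y}_i$ and $\tilde{Z}_i$; the value terms use $Y_i$ and $Z_i$ as written in the update equations, so the sketch passes them in as separate arguments. The single head, the random weights, and all function names are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fusion_matrix(q_src, k_src, w_q, w_k):
    """softmax(q_src W_q (k_src W_k)^T / sqrt(c)); result has shape (t, t)."""
    c = q_src.shape[-1]
    scores = (q_src @ w_q) @ (k_src @ w_k).T / np.sqrt(c)
    return softmax(scores, axis=-1)

def feature_fusion(y_tilde, z_tilde, y_val, z_val, w_e, w_d):
    """One bidirectional fusion step (single head).

    y_tilde, z_tilde: TCN output and Transformer output, each of shape (t, c).
    y_val, z_val:     value inputs Y_i and Z_i used in the update equations.
    w_e, w_d:         dicts of (c, c) projection matrices with keys 'q', 'k', 'v'.
    """
    a_z_to_y = fusion_matrix(y_tilde, z_tilde, w_e["q"], w_e["k"])  # Transformer -> TCN
    a_y_to_z = fusion_matrix(z_tilde, y_tilde, w_d["q"], w_d["k"])  # TCN -> Transformer
    y_next = y_tilde + a_z_to_y @ z_val @ w_e["v"]  # residual update of the TCN branch
    z_next = z_tilde + a_y_to_z @ y_val @ w_d["v"]  # residual update of the Transformer branch
    return y_next, z_next, a_z_to_y, a_y_to_z

rng = np.random.default_rng(0)
t, c = 6, 8
y_tilde, z_tilde, y_val, z_val = (rng.normal(size=(t, c)) for _ in range(4))
w_e = {name: rng.normal(size=(c, c)) / np.sqrt(c) for name in ("q", "k", "v")}
w_d = {name: rng.normal(size=(c, c)) / np.sqrt(c) for name in ("q", "k", "v")}
y_next, z_next, a_zy, a_yz = feature_fusion(y_tilde, z_tilde, y_val, z_val, w_e, w_d)
print(y_next.shape, z_next.shape)  # (6, 8) (6, 8)
```

A multi-head variant would split the c channels into h groups, compute one fusion matrix per group, and concatenate the per-head results before adding the residual.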

3.2. RUL Prediction Based on TCN–Transformer

This section describes in detail the RUL prediction process for rolling bearings based on the TCN–Transformer networks. The flowchart is shown in Figure 4, and the specific steps are as follows:
(1) Data input. The original vibration signal data of the rolling bearing are processed: following the proposed method, features are extracted from the original vibration signal in the time domain, frequency domain, and time–frequency domain, and sensitive features are then selected to construct a feature set. The selected sensitive degradation feature data are input into the model to train it for remaining life prediction.
(2) Dataset division. Following the most commonly used division scheme, the dataset is divided into training, validation, and test sets in a ratio of 7:1:2.
(3) Model training. The training set data are input into the constructed TCN–Transformer networks, which are trained through forward propagation, backpropagation, and parameter optimization to obtain the network with the optimal parameters.
(4) Model prediction. The test set data are input into the optimal model obtained in step (3), which outputs the RUL prediction of the rolling bearing.
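The dataset division in step (2) can be sketched in a few lines of Python. The section states only the 7:1:2 ratio and does not say whether samples are shuffled, so this sketch assumes a chronological split; the function name is hypothetical. Integer arithmetic avoids floating-point rounding in the split sizes.

```python
def split_7_1_2(samples):
    """Split an ordered sequence of samples into train/validation/test at 7:1:2.

    Assumption: samples stay in time order (no shuffling), so validation and
    test samples come after the training samples.
    """
    n = len(samples)
    n_train = n * 7 // 10
    n_val = n // 10
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test

train, val, test = split_7_1_2(list(range(100)))
print(len(train), len(val), len(test))  # 70 10 20
```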

4. Results and Discussion

4.1. Verification of Feature Extraction and Health Index Construction

The IEEE PHM 2012 Data Challenge released a full-lifecycle rolling bearing dataset collected on the PRONOSTIA experimental platform, which was designed to test bearing fault detection, diagnosis, and RUL prediction methods [43]. The main goal of PRONOSTIA is to provide experimental data that describe the degradation process of rolling bearings throughout their service life. The IEEE PHM 2012 dataset provides accelerated degradation test data for a total of 17 bearings: 7 bearings under condition 1 (4000 N, 1800 rpm), 7 bearings under condition 2 (4200 N, 1650 rpm), and 3 bearings under condition 3 (5000 N, 1500 rpm). In this work, all 17 bearings were used as a benchmark to test the prediction method, but only the results for the seven bearings under condition 1, labeled Bearing 1-1, 1-2, 1-3, 1-4, 1-5, 1-6, and 1-7, were selected for further investigation. When vibration signals are used to track the degradation state of rolling bearings, the horizontal vibration signal usually carries more degradation information than the vertical one [44]. Therefore, this work uses only the horizontal vibration data.
The 32 feature indexes introduced above are extracted to form the original dataset for the seven bearings, and the sensitive indexes are then selected from them. Because the failure vibration signal of a bearing can be represented by eight wavelet packets, the 32 indexes provide a comprehensive basis for assessing bearing failure. For each bearing's dataset, the monotonicity, correlation, predictive, robustness, and comprehensive indexes are calculated. The statistical results of the feature indexes are shown in Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9; for brevity, only bearings 1-1, 1-2, and 1-3 are plotted. Figure 9 shows the comprehensive index statistics of these three bearings together with the average comprehensive index of all seven bearings under working condition 1.
The feature indexes are ranked by the average value of their comprehensive index, as shown in Figure 10, and the eight indexes with the highest averages are selected: the energy ratio of the third frequency sub-band (S27), the energy ratio of the seventh frequency sub-band (S31), the energy ratio of the fourth frequency sub-band (S28), the energy ratio of the eighth frequency sub-band (S32), the kurtosis index (S14), the energy ratio of the second frequency sub-band (S26), the center of gravity frequency (S20), and the minimum value (S3). Figure 11 further shows the degradation trends of these eight sensitive feature indexes over time.
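The ranking-and-selection step reduces to averaging each feature's comprehensive index over the bearings and keeping the highest-ranked features. The NumPy sketch below assumes the comprehensive index values are already available as a bearings-by-features array; the function name and the toy values are hypothetical and are not taken from Figure 10.

```python
import numpy as np

def select_top_features(comp_index, names, k=8):
    """Rank features by their comprehensive index averaged over bearings.

    comp_index: array-like of shape (n_bearings, n_features).
    Returns the names and mean scores of the k highest-ranked features.
    """
    mean_ci = np.asarray(comp_index, dtype=float).mean(axis=0)
    order = np.argsort(mean_ci)[::-1][:k]  # indices sorted by descending mean
    return [names[i] for i in order], mean_ci[order]

# Toy example with two bearings and four features (hypothetical values)
toy = [[0.1, 0.9, 0.5, 0.3],
       [0.3, 0.7, 0.5, 0.1]]
top, scores = select_top_features(toy, ["S1", "S2", "S3", "S4"], k=2)
print(top)  # ['S2', 'S3']
```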
Figure 12 shows the health index trends of the seven bearings obtained with the proposed feature extraction, selection, and PCA dimensionality reduction methods. The resulting health indexes of the rolling bearings show good monotonicity and time-series correlation. The monotonicity, correlation, predictability, robustness, and comprehensive indexes of the selected feature indexes for the seven datasets are then calculated and listed in Table 3; all of them are relatively good, exceeding 0.8. Meanwhile, the PCA result indicates that the seven main features account for 91.387% of the variance in the data, which is generally sufficient explanatory power for industrial problems. In addition, every comprehensive index obtained in this work exceeds the highest original comprehensive index of the corresponding feature set. The average comprehensive index of the seven bearings is 0.8995, an average improvement of 8.69% over the original feature-based composite index.
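The HI construction summarized above (feature selection, PCA reduction, normalization) can be sketched in NumPy as follows. The section does not state how the retained components are merged into a single curve, so this sketch assumes z-scoring before PCA, projection on the first principal component, and min-max normalization; it omits the outlier-removal step, and the synthetic features are illustrative, not PRONOSTIA data.

```python
import numpy as np

def pca_health_index(features):
    """Build a [0, 1] health index from a (T, F) array of selected features.

    Assumptions: features are z-scored, the HI is the projection on the first
    principal component, and outlier removal is omitted.
    Returns the HI and the explained-variance ratio of every component.
    """
    x = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-12)
    _, s, vt = np.linalg.svd(x, full_matrices=False)  # PCA via SVD
    explained = s ** 2 / np.sum(s ** 2)
    hi = x @ vt[0]
    # The sign of a principal component is arbitrary; orient the HI to rise over time
    if np.corrcoef(np.arange(len(hi)), hi)[0, 1] < 0:
        hi = -hi
    hi = (hi - hi.min()) / (hi.max() - hi.min())  # min-max normalization
    return hi, explained

# Synthetic degradation-like features (hypothetical, for illustration only)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
feats = np.column_stack([t, t ** 2, -t, np.exp(t)]) + 0.02 * rng.normal(size=(200, 4))
hi, explained = pca_health_index(feats)
print(round(float(explained[0]), 3))
```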

4.2. Experiment and Verification of the TCN–Transformer Networks

This section uses the IEEE PHM 2012 rolling bearing full-life dataset to verify the proposed method. Details of the dataset, such as the working conditions and actual life of each bearing, are listed in Table 4. To better verify the effectiveness of the proposed method, six groups of RUL estimation tasks are set up following the task setting of previous literature [45]; the specific arrangement is shown in Table 5.
The training of the neural network based on 32 feature indexes was conducted on an Ubuntu 18.04 server equipped with four NVIDIA 2080 Ti 11 GB graphics cards, using PyTorch 1.0 as the deep learning framework.
Data preprocessing comes first. Although deep learning models can extract features directly from raw vibration signals (an approach widely adopted in PHM applications), processing full-lifecycle raw data is computationally demanding and can compromise prediction accuracy, which makes this approach impractical here. To address these challenges in long-sequence RUL prediction, the optimized eight-feature degradation set is used as the model input. This significantly reduces the data dimensionality while preserving degradation signatures, and it reduces the noise interference present in raw signals. This preprocessing strategy balances computational efficiency with predictive performance.
Next, RUL labeling and normalization are applied to standardize the prognostic framework. Because operating conditions vary significantly, the absolute RUL values (measured in cycles) differ in magnitude. Using these raw values directly as training labels would slow the model's convergence during training and degrade its generalization during deployment. To address this, normalized RUL values are adopted, a well-established practice in prognostic modeling [46]. The normalization computes the relative degradation state as the ratio of the current RUL to the total life:
$$RUL_t = T_{total} - t$$
$$RUL_t^{norm} = \frac{RUL_t}{T_{total}}$$
where $T_{total}$ is the total lifetime, and the normalized remaining life $RUL_t^{norm}$ lies between 0 and 1.
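The labeling step amounts to a linear countdown divided by the total life. In the minimal NumPy sketch below, the function name and the convention that t counts monitoring snapshots from 1 to $T_{total}$ are assumptions; with this convention the label reaches 0 at failure.

```python
import numpy as np

def normalized_rul_labels(total_life):
    """Linear labels RUL_t = T_total - t, normalized by T_total, for t = 1..T_total.

    Assumption: t counts monitoring snapshots, so the label falls from
    (T_total - 1) / T_total to 0 at the final (failure) snapshot.
    """
    t = np.arange(1, total_life + 1)
    rul = total_life - t
    return rul / total_life

print(normalized_rul_labels(5))  # [0.8 0.6 0.4 0.2 0. ]
```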
Effective sample embedding strongly influences the predictive capability of data-driven prognostic models. Using data from a single isolated time step as model input fails to capture the temporal dependencies between current and historical degradation states. To address this limitation, a temporal window embedding strategy [46] is implemented that explicitly models degradation continuity through causal relationships. This method uses a fixed time window $W_L$ to concatenate the monitoring data of consecutive time steps, and each window is treated as an independent input sample. The embedded sample consists of the monitoring data $MS_t$ at the current time step and the data $MS_s$ of the previous $L - 1$ time steps:
$$MS_t^{Input} = \left(MS_{t-L+1}, \ldots, MS_{t-2}, MS_{t-1}, MS_t\right)_{W_L}$$
where L is the time window length. To obtain as many training samples as possible, the time window is moved with a step of 1.
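The window embedding can be sketched with NumPy slicing. Labeling each window with the normalized RUL of its last time step is a common convention assumed here rather than stated in the text, and the function name and toy data are hypothetical.

```python
import numpy as np

def make_window_samples(ms, rul_norm, window_len, step=1):
    """Embed a (T, F) monitoring sequence into overlapping windows.

    Each sample stacks MS_{t-L+1}, ..., MS_t (L = window_len) and, as an
    assumption, is labeled with the normalized RUL of its last step t.
    Returns X of shape (num_windows, window_len, F) and y of shape (num_windows,).
    Requires window_len <= T.
    """
    starts = range(0, len(ms) - window_len + 1, step)
    x = np.stack([ms[s:s + window_len] for s in starts])
    y = np.array([rul_norm[s + window_len - 1] for s in starts])
    return x, y

ms = np.arange(20, dtype=float).reshape(10, 2)  # 10 time steps, 2 features (toy data)
rul = np.linspace(1.0, 0.0, 10)
x, y = make_window_samples(ms, rul, window_len=4)
print(x.shape, y.shape)  # (7, 4, 2) (7,)
```

With a step of 1, a sequence of T time steps yields T - L + 1 samples, which is why the unit step maximizes the number of training samples.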
To enable comparison with existing advanced prediction models, three criteria are used to evaluate prediction performance: the root mean square error (RMSE), the mean absolute error (MAE), and the scoring function (SCORE). The first two measure the fitting ability and prediction accuracy of each model and are calculated as follows:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$
where $\hat{y}_i$ is the predicted value at time i, $y_i$ is the true value at time i, and n is the total number of samples. The last criterion evaluates the rationality of the predictions: early and late predictions are not judged by the same standard. In a real service environment, the risk posed by an early prediction (underestimating the RUL) is much smaller than that of a late prediction (overestimating the RUL). A good RUL prediction therefore tends to be conservative, and the scoring function penalizes late predictions more heavily. SCORE is defined as follows [43]:
$$SCORE = \frac{1}{N-1}\sum_{i=1}^{N-1} A_i, \qquad A_i = \begin{cases} e^{-\ln(0.5)\cdot\left(Er_i/5\right)}, & Er_i \le 0 \\ e^{+\ln(0.5)\cdot\left(Er_i/20\right)}, & Er_i > 0 \end{cases}$$
$$Er_i = \frac{y_i - \hat{y}_i}{y_i} \times 100\%$$
where N is the length of the prediction data, and $Er_i$ is the percentage error.
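The three criteria can be sketched in NumPy. The SCORE function follows the asymmetric exponential form above and averages over the first N - 1 points; the sketch assumes the final point is excluded because its true normalized RUL is 0, which would make $Er_i$ undefined. Function names are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    d = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

def mae(y_true, y_pred):
    d = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(d)))

def phm_score(y_true, y_pred):
    """Asymmetric SCORE averaged over the first N - 1 points.

    Assumption: the last point is dropped because its true normalized RUL is 0.
    """
    y_true = np.asarray(y_true, dtype=float)[:-1]
    y_pred = np.asarray(y_pred, dtype=float)[:-1]
    er = (y_true - y_pred) / y_true * 100.0  # percentage error Er_i
    a = np.where(er <= 0,
                 np.exp(-np.log(0.5) * (er / 5.0)),  # late (overestimated) RUL: steep penalty
                 np.exp(np.log(0.5) * (er / 20.0)))  # early (underestimated) RUL: mild penalty
    return float(a.mean())

y_true = [0.5, 0.0]
print(phm_score(y_true, [0.4, 0.0]))  # early by 20%: about 0.5
print(phm_score(y_true, [0.6, 0.0]))  # late by 20%: about 0.0625
```

The two calls show the intended asymmetry: an error of the same magnitude costs far more when the predicted RUL is too long.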
To further clarify the advantages of TCN–Transformer in RUL prediction, two comparisons are made. The first compares the RUL prediction performance of the TCN–Transformer networks with that of the TCN model and the Transformer model; the second compares the proposed model with several baseline and advanced prediction models. To quantify the improvement achieved by the TCN–Transformer networks on each task, the improvement index (IMP) is calculated as follows:
$$IMP = \left(1 - \frac{TT}{CMT}\right) \times 100\%$$
where TT is the evaluation index value of the TCN–Transformer networks and CMT is that of the current best-performing model. The hyperparameter settings used during model training are listed in Table 6. In addition, 10 cross-validation experiments are performed, with 70% of the data used for training, 10% for validation, and 20% for testing, and the results for each task are averaged to reduce the effect of randomness on the predictions.
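For reference, the IMP formula is a one-liner. The values in the usage line below are hypothetical and are not results from Table 7. As written, the expression is positive when TT is smaller than CMT, which matches lower-is-better metrics such as RMSE and MAE.

```python
def imp(tt, cmt):
    """Improvement index in percent: IMP = (1 - TT / CMT) * 100.

    tt:  metric value of the proposed model
    cmt: metric value of the current best comparison model
    """
    return (1.0 - tt / cmt) * 100.0

# Hypothetical values: proposed model RMSE 0.08 vs. best comparison model RMSE 0.10
print(round(imp(0.08, 0.10), 2))  # 20.0
```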
Figure 13 compares the predicted and actual RUL values for the six tasks, including a detailed error analysis. The TCN–Transformer prediction curve closely tracks the true RUL, which shows that the model captures the bearings' degradation characteristics and provides initial validation of its efficacy. The predictions are particularly accurate over approximately the first 80% of the lifespan, while most deviations emerge in the final degradation stage. This may stem from the linear degradation assumption traditionally used for RUL labels, which holds during stable operation. Actual failure processes, however, often exhibit complex nonlinear behavior, especially during accelerated deterioration, where degradation rates can grow exponentially. In addition, degradation rates in the late stages vary considerably among bearings, so the assigned training labels diverge from the actual degradation states. As a result, the prediction errors of the model occur predominantly in the later degradation stages, and the cumulative effect of these errors amplifies the observed divergence.
Table 7 compares the TCN–Transformer networks with other models. The baseline models are RNN, LSTM, and GRU, and the five state-of-the-art prediction models are Dual-LSTM [47], LSTM-AON [48], BiGRU-GSA [49], TCN-RSA [46], and TFT [45]. Compared with both the baseline and the state-of-the-art models, including TFT, the best performer among the compared models, the proposed TCN–Transformer networks achieve the best results on all evaluation indexes. The IMP values of RMSE, MAE, and SCORE are 14.62%, 9.26%, and 13.04%, respectively, showing that the proposed model is more competitive in RUL prediction. In terms of prediction rationality in particular, the SCORE of the TCN–Transformer networks is markedly higher than those of the other models, and SCORE is the key criterion for evaluating the reliability of RUL prediction. Figure 14 visually compares TCN–Transformer with the other models and shows more intuitively that the proposed model achieves the best results on all three evaluation indexes. This comparative analysis further confirms the advantages of TCN–Transformer.

4.3. Ablation Experiment and Results of TCN–Transformer Networks

The TCN–Transformer networks are compared with the TCN model and the Transformer model on the six tasks. Table 8 shows the results of the TCN–Transformer ablation experiment, and Table 5 presents the composition of the experimental dataset. Compared with the TCN model and the Transformer model, the TCN–Transformer networks improve prediction on all tasks. The average IMP of RMSE, MAE, and SCORE reaches 39.67%, 38.07%, and 26.63%, respectively, which shows that the proposed networks clearly improve both the accuracy and the rationality of RUL prediction. In addition, the IMP of RMSE and MAE is considerably higher than that of SCORE, indicating that the RUL prediction curve of the TCN–Transformer networks agrees more closely with the true RUL curve. This comparison shows that the proposed TCN–Transformer network outperforms the original TCN and Transformer models, and it supports the effectiveness of both the hierarchical parallel design and the multi-head feature fusion attention module.
The main purpose of this paper is to develop an advanced technical framework for performance monitoring and RUL prediction of rolling bearings, contributing to health monitoring and maintenance methods for key mechanical components in intelligent manufacturing. First, by extracting and selecting key features, an HI set that accurately reflects bearing performance degradation is constructed, addressing the problem of characterizing the performance degradation state. Second, by combining the TCN and Transformer models, TCN–Transformer networks are proposed that efficiently learn and integrate local and global features, providing a new solution for RUL prediction. These methods demonstrate the potential of deep learning for monitoring and prediction in complex systems and also support maintenance decisions in practical applications, especially preventive maintenance. Moreover, their application is not limited to rolling bearings or rotating machinery: they can be extended to performance monitoring and life prediction of other key industrial components, with a wider impact on the field of intelligent manufacturing.
Building on this work, future research will focus on the mathematical interpretability of the model, parameter optimization, and improving the performance of individual modules. The sampling frequency of the sensor is also an important factor in studying model performance, which will require a more systematic experimental platform and data analysis.

5. Conclusions

In this work, a method for constructing a HIVS is first developed to describe the performance features of rolling bearings. Then, a new RUL prediction model, TCN–Transformer, is developed to efficiently solve the long time series prediction problem in RUL prediction. The conclusions are summarized as follows:
  • A method for constructing a HIVS was developed to describe the performance features of rolling bearings. Eight sensitive feature indexes that accurately reflect rolling bearing performance were selected from the 32 indexes to construct the feature set; the sensitive features obtained after dimensionality reduction were then cleaned of outliers and normalized to obtain the HI. The average comprehensive index of the bearings improved by 8.69%.
  • The TCN–Transformer employs a hierarchical parallel architecture combining TCN and Transformer modules, achieving higher computational efficiency and a more compact network scale. Compared with classical standalone TCN or Transformer networks, our approach significantly reduces the required number of channels through feature compression. The outputs from the TCN and Transformer modules interact through a novel multi-head feature fusion attention mechanism, enabling bidirectional integration of local temporal patterns (captured by TCN) and global dependencies (learned by Transformer). This specialized attention module dynamically prioritizes the most discriminative features extracted by both sub-networks, ensuring precise focus on performance-critical characteristics for RUL prediction.
  • Compared with existing methods, the proposed TCN–Transformer demonstrates superior accuracy in predicting the RUL of rolling bearings across diverse operating conditions. In the ablation studies, TCN–Transformer outperforms both the standalone TCN and Transformer models, with consistent improvements across all evaluation tasks. Compared with state-of-the-art methods, TCN–Transformer reduces RMSE and MAE by 14.62% and 9.26%, respectively, while improving the SCORE metric by 13.04%. These results support the superiority of our approach in RUL prediction.

Author Contributions

Conceptualization, X.J., Y.J., and S.F.; Methodology, X.J. and Y.J.; Validation, H.J. and S.F.; Formal Analysis, Y.J., S.L. and K.L.; Investigation, X.J., Y.J., S.L., K.L. and J.X.; Writing—Original Draft, X.J.; Writing—Review and Editing, X.J. and S.F.; Visualization, J.X.; Supervision, H.J. and S.F.; Funding Acquisition, X.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (12102320) and the National Major Science and Technology Project (J2019-IV-0003-0070). The APC was funded by the National Natural Science Foundation of China (12102320).

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Zhao, Z.; Liang, B.; Wang, X.; Lu, W. Remaining useful life prediction of aircraft engine based on degradation pattern learning. Reliab. Eng. Syst. Saf. 2017, 164, 74–83.
  2. AlShorman, O.; Irfan, M.; Saad, N.; Zhen, D.; Haider, N.; Glowacz, A.; AlShorman, A. A review of artificial intelligence methods for condition monitoring and fault diagnosis of rolling element bearings for induction motor. Shock Vib. 2020, 2020, 1–20.
  3. Li, Y.; Wang, S.; Li, N.; Deng, Z. Multiscale symbolic diversity entropy: A novel measurement approach for time-series analysis and its application in fault diagnosis of planetary gearboxes. IEEE Trans. Industr. Inform. 2022, 18, 1121–1131.
  4. Ma, M.; Mao, Z. Deep wavelet sequence-based gated recurrent units for the prognosis of rotating machinery. Struct. Health Monit. 2021, 20, 1794–1804.
  5. Zio, E. Prognostics and Health Management (PHM): Where are we and where do we (need to) go in theory and practice. Reliab. Eng. Syst. Saf. 2022, 218 Pt A, 108119.
  6. Ahang, M.; Jalayer, M.; Shojaeinasab, A.; Ogunfowora, O.; Charter, T.; Najjaran, H. Synthesizing Rolling Bearing Fault Samples in New Conditions: A Framework Based on a Modified CGAN. Sensors 2022, 22, 5413.
  7. Forest, F.; Fink, O. Calibrated Adaptive Teacher for Domain-Adaptive Intelligent Fault Diagnosis. Sensors 2024, 24, 7539.
  8. Lv, K.L.; Jiang, H.N.; Fu, S.N.; Du, T.C.; Jin, X.C.; Fan, X.L. A predictive analytics framework for rolling bearing vibration signal using deep learning and time series techniques. Comput. Electr. Eng. 2024, 117, 109314.
  9. Giraudo, L.; Di Maggio, L.G.; Giorio, L.; Delprete, C. Dynamic Multibody Modeling of Spherical Roller Bearings with Localized Defects for Large-Scale Rotating Machinery. Sensors 2025, 25, 2419.
  10. Liu, X.R.; Yan, C.F.; Ming Lv Wu, L.X. Multi-rolling element faults diagnosis of rolling bearing based on time-frequency analysis and multi-curves extraction. Meas. Sci. Technol. 2024, 35, 106113.
  11. Jiang, L.L.; Shi, C.Z.; Sheng, H.S.; Li, X.J.; Yang, T.G. Lightweight CNN architecture design for rolling bearing fault diagnosis. Meas. Sci. Technol. 2024, 35, 126142.
  12. Wang, T.; Li, X.; Wang, W.; Du, J.; Yang, X. A spatiotemporal feature learning-based RUL estimation method for predictive maintenance. Measurement 2023, 214, 112824.
  13. Guo, J.X.; Zhang, T.Y.; Xue, K.L.; Liu, J.H.; Wu, J.; Zhao, Y.D. Fault diagnosis of rolling bearing based on parameter-adaptive re-constraint VMD optimized by SABO. Meas. Sci. Technol. 2025, 36, 016174.
  14. Kiakojouri, A.; Wang, L. A Generalized Convolutional Neural Network Model Trained on Simulated Data for Fault Diagnosis in a Wide Range of Bearing Designs. Sensors 2025, 25, 2378.
  15. Wang, H.; An, J.; Yang, J.; Xu, S.; Wang, Z.M.; Cao, Y.; Yuan, W.Q. Remaining useful life prediction method of bearings based on the interactive learning strategy. Comput. Electr. Eng. 2025, 121, 109853.
  16. Xu, Z.; Guo, Y.; Saleh, J.H. Accurate remaining useful life prediction with uncertainty quantification: A deep learning and nonstationary gaussian process approach. IEEE Trans. Reliab. 2021, 71, 443–456.
  17. Hai, B.; Jiang, H.K.; Yao, P.; Wang, K.B.; Yao, R.H. Rolling bearing fault feature extraction using non-convex periodic group sparse method. Meas. Sci. Technol. 2021, 32, 105005.
  18. Hu, C.F.; Liu, Z.J.; Xiao, X.W.; Jin, Y.F.; Wang, T.; Zhou, L.H.; Su, L. A degradation evaluation method with the convolutional neural network for the cyclic symmetry rolling bearing. Meas. Sci. Technol. 2025, 36, 016188.
  19. Chang, Y.; Chen, Q.; Chen, J.; He, S.; Li, F.; Zhou, Z. Intelligent fault diagnosis scheme via multi-module supervised-learning network with essential features capture-regulation strategy. ISA Trans. 2022, 129, 459–475.
  20. Zhu, J.; Chen, N.; Shen, C. A new data-driven transferable remaining useful life prediction approach for bearing under different working conditions. Mech. Syst. Signal Process. 2020, 139, 106602.
  21. Shih, Y.S.; Chen, J.J. Analysis of fatigue crack growth on a cracked shaft. Int. J. Fatigue 1997, 19, 477–485.
  22. Choi, Y.; Liu, C.R. Spall progression life model for rolling contact verified by finish hard machined surfaces. Wear 2007, 262, 24–35.
  23. Chen, C.; Liu, Y.; Wang, S.; Sun, X.; Cairano-Gilfedder, C.; Titmus, S.; Syntetos, A.A. Predictive maintenance using cox proportional hazard deep learning. Adv. Eng. Inform. 2020, 44, 101054.
  24. Aremu, O.O.; Hyland-Wood, D.; McAree, P.R. A Relative Entropy Weibull-SAX framework for health indices construction and health stage division in degradation modeling of multivariate time series asset data. Adv. Eng. Inform. 2019, 40, 121–134.
  25. Caravaca, C.F.; Flamant, Q.; Anglada, M.; Gremillard, L.; Chevalier, J. Impact of sandblasting on the mechanical properties and aging resistance of alumina and zirconia based ceramics. J. Eur. Ceram. Soc. 2018, 38, 915–925.
  26. Wu, R.T.; Jahanshahi, M.R. Data fusion approaches for structural health monitoring and system identification: Past, present, and future. Struct. Health Monit. 2020, 19, 552–586.
  27. Ye, R.; Dai, Q. A novel transfer learning framework for time series forecasting. Knowl.-Based Syst. 2018, 156, 74–99.
  28. Dang, D.Z.; Su, B.Y.; Wang, Y.W.; Ao, W.K.; Ni, Y.Q. A pencil lead break-triggered, adversarial autoencoder-based approach for rapid and robust rail damage detection. Eng. Appl. Artif. Intel. 2025, 150, 110637.
  29. Mao, W.; He, J.; Tang, J.; Li, Y. Predicting remaining useful life of rolling bearings based on deep feature representation and long short-term memory neural network. Adv. Mech. Eng. 2018, 10, 1–18.
  30. Wang, Y.L.; Lu, Y.; Tan, Y.K.; Ao, W.K.; Ni, Y.-Q.; Tang, Q.-C. Bayesian optimization bidirectional LSTM approach for the condition assessment of underground-operating trains. J. Civ. Struct. Health Monit. 2025.
  31. Silka, J.; Wieczorek, M.; Wozniak, M. Recurrent neural network model for high-speed train vibration prediction from time series. Neural Comput. Appl. 2022, 34, 13305–13318.
  32. Caterini, A.L.; Chang, D.E. Recurrent neural networks. In Deep Neural Networks in a Mathematical Framework; SpringerBriefs in Computer Science; Springer: Cham, Switzerland, 2018.
  33. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67.
  34. Andrew, G.; Menglong, Z. Efficient convolutional neural networks for mobile vision applications, mobilenets. arXiv 2017, arXiv:1704.04861.
  35. Van Den Oord, A.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K. Wavenet: A generative model for raw audio. arXiv 2016, arXiv:1609.03499.
  36. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271.
  37. Lin, L.; Xu, B.; Wu, W.; Richardson, T.W.; Bernal, E.A. Medical Time Series Classification with Hierarchical Attention-based Temporal Convolutional Networks: A Case Study of Myotonic Dystrophy Diagnosis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Long Beach, CA, USA, 15–20 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 83–86.
  38. Zheng, X.; Qian, Y.; Wang, S. GRU prediction for performance degradation of rolling bearings based on optimal wavelet packet and Mahalanobis distance. J. Vib. Shock 2020, 39, 9–46+63. (In Chinese).
  39. Moradi, M.; Broer, A.; Chiachío, J.; Benedictus, R.; Loutas, T.H.; Zarouchas, D. Intelligent health indicator construction for prognostics of composite structures utilizing a semi-supervised deep neural network and SHM data. Eng. Appl. Artif. Intel. 2023, 117, 105502.
  40. Wei, X.P. Deep Learning Based Health State Assessment and Remaining Life Prediction of Rolling Bearings. Master’s Thesis, Southwest Jiaotong University, Chengdu, China, 2021. (In Chinese).
  41. Ao, W.K.; Hester, D.; O’Higgins, C.; Brownjohn, J. Tracking long-term modal behaviour of a footbridge and identifying potential SHM approaches. J. Civil Struct. Health Monit. 2025, 14, 1311–1337.
  42. Liu, Y.; Wijewickrema, S.; Li, A.; Bester, C.; O’Leary, S.; Bailey, J. Time-transformer: Integrating local and global features for better time series generation. arXiv 2023, arXiv:2312.11714.
  43. Nectoux, P.; Gouriveau, R.; Medjaher, K.; Ramasso, E.; Morello, B.; Zerhouni, N.; Varnier, C. PRONOSTIA: An experimental platform for bearings accelerated degradation tests. In Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, CO, USA, 18–21 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 1–8. Available online: https://hal.science/hal-00719503v1 (accessed on 2 June 2025).
  44. Soualhi, A.; Medjaher, K.; Zerhouni, N. Bearing health monitoring based on hilbert-huang transform, support vector machine, and regression. IEEE Trans. Instrum. Meas. 2015, 64, 52–62.
  45. Chang, Y.; Li, F.; Chen, J.; Liu, Y.; Li, Z. Efficient temporal flow Transformer accompanied with multi-head probsparse self-attention mechanism for remaining useful life prognostics. Reliab. Eng. Syst. Saf. 2022, 226, 108701.
  46. Cao, Y.; Ding, Y.; Jia, M.; Tian, R. A novel temporal convolutional network with residual self-attention mechanism for remaining useful life prediction of rolling bearings. Reliab. Eng. Syst. Saf. 2021, 215, 107813.
  47. Shi, Z.; Chehade, A. A dual-LSTM framework combining change point detection and remaining useful life prediction. Reliab. Eng. Syst. Saf. 2021, 205, 107257.
  48. Xiang, S.; Qin, Y.; Zhu, C.; Wang, Y.; Chen, H. LSTM networks based on attention ordered neurons for gear remaining life prediction. ISA Trans. 2020, 106, 343–354.
  49. Chang, Y.; Chen, J.; Lv, H.; Liu, S. Heterogeneous bi-directional recurrent neural network combining fusion health indicator for predictive analytics of rotating machinery. ISA Trans. 2022, 122, 409–423.
Figure 1. Flowchart of dimensionality reduction in sensitive feature indexes based on principal component analysis method.
Figure 2. Framework of TCN model.
Figure 3. Framework of proposed TCN–Transformer networks.
Figure 4. The remaining useful life prediction process of rolling bearings based on the TCN–Transformer networks.
Figure 5. A statistical chart of the monotonicity index of the 32 feature indexes.
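This section does not state the formula behind the monotonicity index in Figure 5. A definition widely used in the prognostics literature, assumed here, counts positive and negative increments: Mon = |#(dF > 0) - #(dF < 0)| / (K - 1) for a feature sequence of length K. The function name is hypothetical.

```python
def monotonicity(series):
    """Mon = |#(positive steps) - #(negative steps)| / (K - 1).

    Assumed definition: 1 means strictly monotonic, and values near 0 mean
    the feature rises and falls equally often.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    pos = sum(1 for d in diffs if d > 0)
    neg = sum(1 for d in diffs if d < 0)
    return abs(pos - neg) / (len(series) - 1)
```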
Figure 6. A statistical chart of the correlation index of the 32 feature indexes.
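The exact correlation index behind Figure 6 is also not given in this section. A common choice, assumed here, is the absolute Pearson correlation between a feature sequence and its time index. Under this choice, a feature that drifts steadily with operating time scores close to 1. The function name is hypothetical.

```python
import math


def correlation_with_time(series):
    """|Pearson correlation| between a feature sequence and its time index.

    Assumed definition; a constant series would divide by zero.
    """
    k = len(series)
    t = list(range(k))
    mt, mf = sum(t) / k, sum(series) / k
    cov = sum((ti - mt) * (fi - mf) for ti, fi in zip(t, series))
    var_t = sum((ti - mt) ** 2 for ti in t)
    var_f = sum((fi - mf) ** 2 for fi in series)
    return abs(cov) / math.sqrt(var_t * var_f)
```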
Figure 7. A statistical chart of the predictive index of the 32 feature indexes.
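The predictive index in Figure 7 is not defined in this section. One widely used population-level metric, assumed here, is prognosability. It rewards features whose failure values agree across units relative to how far each unit travels from its starting value: Pre = exp(-std(x_fail) / mean(|x_start - x_fail|)). A population-level metric would be consistent with Table 3, where the predictive index is 0.9428 for every bearing. Still, the formula below is an assumption, and the function name is hypothetical.

```python
import math
import statistics


def prognosability(start_values, end_values):
    """Pre = exp(-std(end values) / mean(|end - start|)) over several units.

    start_values[i] and end_values[i] are the feature values of unit i at the
    start of monitoring and at failure. Uses the population standard deviation.
    """
    spread = statistics.pstdev(end_values)
    travel = statistics.fmean([abs(e - s) for s, e in zip(start_values, end_values)])
    return math.exp(-spread / travel)
```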
Figure 8. A statistical chart of the robustness index of the 32 feature indexes.
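For the robustness index in Figure 8, a common definition, assumed here because the section does not give one, splits each feature into a smooth trend and a residual. The index is then Rob = (1/K) * sum(exp(-|r_k / x_k|)), where r_k = x_k minus its trend. The trailing moving average and its window of 5 samples are hypothetical choices, as are the function names.

```python
import math


def moving_average(series, window=5):
    """Trailing moving average used as the smooth trend (window is assumed)."""
    out = []
    for k in range(len(series)):
        chunk = series[max(0, k - window + 1):k + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def robustness(series, window=5):
    """Rob = mean(exp(-|residual / value|)) with residual = value - trend.

    Assumed definition; values must be non-zero. 1 means no fluctuation.
    """
    trend = moving_average(series, window)
    return sum(math.exp(-abs((x - m) / x)) for x, m in zip(series, trend)) / len(series)
```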
Figure 9. A statistical chart of the comprehensive index of the 32 feature indexes.
Figure 10. Ranking of the values of the comprehensive indexes of the 32 feature indexes.
Figure 11. The selected sensitive feature indexes for Bearing 1-1: (a) the energy ratio of the third frequency sub-band (S27), (b) the energy ratio of the seventh frequency sub-band (S31), (c) the energy ratio of the fourth frequency sub-band (S28), (d) the energy ratio of the eighth frequency sub-band (S32), (e) the kurtosis factor (S14), (f) the energy ratio of the second frequency sub-band (S26), (g) the centroid (center of gravity) frequency (S20), and (h) the minimum value (S3).
Figure 12. The health index trend over time of the seven bearings obtained using the proposed method.
Figure 13. RUL prediction results for rolling bearings: (a) Task A; (b) Task B; (c) Task C; (d) Task D; (e) Task E; (f) Task F.
Figure 14. Comparison of evaluation indices of TCN–Transformer and other models.
Table 1. Dimensional and dimensionless time domain feature indexes used in this work.

| Dimensional index | Function | Dimensionless index | Function |
|---|---|---|---|
| Mean absolute value (S1) | $x_{av} = \frac{1}{N}\sum_{i=1}^{N} \lvert x_i \rvert$ | Skewness (S11) | $x_{ske} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^3}{(N-1)\,x_\sigma^3}$ |
| Peak (S2) | $x_p = \max \lvert x_i \rvert$ | Kurtosis (S12) | $x_{kur} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^4}{(N-1)\,x_\sigma^4}$ |
| Minimum (S3) | $x_{\min} = \min(x_i)$ | Skewness factor (S13) | $\alpha = \frac{x_{ske}}{x_{rms}^3}$ |
| Mean value (S4) | $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$ | Kurtosis factor (S14) | $\beta = \frac{x_{kur}}{x_{rms}^4}$ |
| Maximum (S5) | $x_{\max} = \max(x_i)$ | Crest factor (S15) | $C_f = \frac{x_p}{x_{rms}}$ |
| Root mean square (S6) | $x_{rms} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$ | Shape factor (S16) | $S_f = \frac{x_{rms}}{x_{av}}$ |
| Root amplitude (S7) | $x_r = \left(\frac{1}{N}\sum_{i=1}^{N} \sqrt{\lvert x_i \rvert}\right)^2$ | Impulse factor (S17) | $I_f = \frac{x_p}{x_{av}}$ |
| Variance (S8) | $D_x = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2$ | Clearance factor (S18) | $CL_f = \frac{x_p}{x_r}$ |
| Standard deviation (S9) | $x_\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2}$ | Coefficient of variation (S19) | $K_v = D_x / x_{av}$ |
| Maximum to minimum difference (S10) | $x_{pp} = \max(x_i) - \min(x_i)$ | | |

Note: $x_i$ denotes the $i$-th sample of the vibration signal sequence collected by the sensor, $x = [x_1, x_2, \ldots, x_N]$, and $N$ denotes the number of data points.
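Table 1 is compact and easy to misread, so the pure-Python sketch below evaluates S1 to S19 for one vibration snapshot exactly as the table defines them. It is an illustrative reading, not the authors' code, and the function name is hypothetical. It deliberately keeps the table's own conventions, such as the (N - 1) denominators in S11 and S12 and S19 = D_x / x_av, even where other texts define these quantities differently. A constant signal would cause division by zero and is not handled.

```python
import math


def time_domain_features(x):
    """Time-domain indexes S1-S19 following the definitions in Table 1."""
    n = len(x)
    mean = sum(x) / n                                              # S4
    x_av = sum(abs(v) for v in x) / n                              # S1
    x_p = max(abs(v) for v in x)                                   # S2
    x_min, x_max = min(x), max(x)                                  # S3, S5
    x_rms = math.sqrt(sum(v * v for v in x) / n)                   # S6
    x_r = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2             # S7
    var = sum((v - mean) ** 2 for v in x) / n                      # S8
    std = math.sqrt(var)                                           # S9
    ske = sum((v - mean) ** 3 for v in x) / ((n - 1) * std ** 3)   # S11
    kur = sum((v - mean) ** 4 for v in x) / ((n - 1) * std ** 4)   # S12
    return {
        "S1": x_av, "S2": x_p, "S3": x_min, "S4": mean, "S5": x_max,
        "S6": x_rms, "S7": x_r, "S8": var, "S9": std, "S10": x_max - x_min,
        "S11": ske, "S12": kur,
        "S13": ske / x_rms ** 3, "S14": kur / x_rms ** 4,
        "S15": x_p / x_rms, "S16": x_rms / x_av, "S17": x_p / x_av,
        "S18": x_p / x_r, "S19": var / x_av,
    }
```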
Table 2. Frequency domain feature indexes used in this work.

| Index | Function |
|---|---|
| Centroid frequency (S20) | $f_c = \frac{\sum_{k=0}^{N-1} f_k X(k)}{\sum_{k=0}^{N-1} X(k)}$ |
| Average frequency (S21) | $f_m = \frac{1}{N}\sum_{k=0}^{N-1} X(k)$ |
| Standard deviation of frequency (S22) | $\sigma_f = \sqrt{\frac{\sum_{k=0}^{N-1} (f_k - f_m)^2 X(k)}{\sum_{k=0}^{N-1} X(k)}}$ |
| Root mean square of frequency (S23) | $f_{rms} = \sqrt{\frac{\sum_{k=0}^{N-1} f_k^2}{N}}$ |
| Variance of frequency (S24) | $\sigma_f^2 = \frac{\sum_{k=0}^{N-1} (f_k - f_m)^2 X(k)}{\sum_{k=0}^{N-1} X(k)}$ |

Note: $X(k)$ is the amplitude of the $k$-th spectral line, $f_k$ is its frequency, and $N$ is the number of spectral lines.
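A matching sketch for the frequency-domain indexes of Table 2 is given below. The section does not specify the spectrum, so it assumes that X(k) is the one-sided DFT amplitude spectrum of a snapshot of L samples taken at `fs` Hz. It also assumes f_k = k * fs / L and that N in Table 2 counts spectral lines. The sketch follows the table literally, including the use of f_m in S22 and S24. As printed, S23 contains no X(k) term, so it depends only on the frequency grid. The function names are hypothetical, and a practical version would use an FFT routine instead of the explicit O(L^2) DFT.

```python
import cmath
import math


def amplitude_spectrum(x, fs):
    """One-sided DFT amplitude spectrum of x sampled at fs Hz.

    Returns (f_k, X(k)) for k = 0 .. len(x) // 2, using a plain DFT for clarity.
    """
    n = len(x)
    lines = n // 2 + 1
    freqs = [k * fs / n for k in range(lines)]
    amps = [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
            for k in range(lines)]
    return freqs, amps


def frequency_features(x, fs):
    """S20-S24 read literally from Table 2, with N = number of spectral lines."""
    f, X = amplitude_spectrum(x, fs)
    n = len(X)
    total = sum(X)  # zero for an all-zero signal, which is not handled
    f_c = sum(fk * xk for fk, xk in zip(f, X)) / total                  # S20
    f_m = total / n                                                     # S21
    var_f = sum((fk - f_m) ** 2 * xk for fk, xk in zip(f, X)) / total   # S24
    return {
        "S20": f_c,
        "S21": f_m,
        "S22": math.sqrt(var_f),
        "S23": math.sqrt(sum(fk * fk for fk in f) / n),  # no X(k) term, as printed
        "S24": var_f,
    }
```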
Table 3. Calculated evaluation indexes using the eight sensitive feature indexes for the seven rolling bearings.

| Bearing | Monotonicity index | Correlation index | Predictive index | Robustness index | Comprehensive index | Original maximum comprehensive index | Improvement |
|---|---|---|---|---|---|---|---|
| 1-1 | 0.9458 | 0.9654 | 0.9428 | 0.8943 | 0.9482 | 0.8904 | 6.5% |
| 1-2 | 0.8664 | 0.8628 | 0.9428 | 0.8805 | 0.8720 | 0.8173 | 6.7% |
| 1-3 | 0.9593 | 0.9573 | 0.9428 | 0.8801 | 0.9490 | 0.8524 | 11.3% |
| 1-4 | 0.8033 | 0.7707 | 0.9428 | 0.9097 | 0.8149 | 0.7518 | 8.4% |
| 1-5 | 0.8779 | 0.8952 | 0.9428 | 0.8893 | 0.8924 | 0.8218 | 8.6% |
| 1-6 | 0.9017 | 0.9132 | 0.9428 | 0.8982 | 0.9140 | 0.8096 | 12.9% |
| 1-7 | 0.9167 | 0.8978 | 0.9428 | 0.8587 | 0.9059 | 0.8512 | 6.4% |
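The Improvement column of Table 3 is consistent with the relative gain of the comprehensive index over the original maximum comprehensive index, (new - original) / original. The snippet below recomputes the column from the tabulated values. It shows that the mean over the seven bearings is about 8.69%, the figure quoted in the abstract.

```python
# (comprehensive index with the proposed HI, original maximum comprehensive index)
TABLE3 = {
    "1-1": (0.9482, 0.8904), "1-2": (0.8720, 0.8173), "1-3": (0.9490, 0.8524),
    "1-4": (0.8149, 0.7518), "1-5": (0.8924, 0.8218), "1-6": (0.9140, 0.8096),
    "1-7": (0.9059, 0.8512),
}

# Relative improvement in percent for each bearing.
improvement = {b: 100 * (new - old) / old for b, (new, old) in TABLE3.items()}
average = sum(improvement.values()) / len(improvement)

for bearing, value in improvement.items():
    print(f"Bearing {bearing}: {value:.1f}%")
print(f"Average over the seven bearings: {average:.2f}%")
```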
Table 4. Details of the IEEE PHM 2012 bearing data used in this section.

| Dataset | Load (N) | Rotation speed (rpm) |
|---|---|---|
| Dataset 1 | 4000 | 1800 |
| Dataset 2 | 4200 | 1650 |

| Bearing (Dataset 1) | Actual life | Bearing (Dataset 2) | Actual life |
|---|---|---|---|
| Bearing 1-1 | 7 h 47 min 00 s | Bearing 2-1 | 2 h 31 min 40 s |
| Bearing 1-2 | 2 h 25 min 00 s | Bearing 2-2 | 2 h 12 min 40 s |
| Bearing 1-3 | 5 h 00 min 10 s | Bearing 2-3 | 3 h 20 min 10 s |
| Bearing 1-4 | 3 h 09 min 40 s | Bearing 2-4 | 1 h 41 min 50 s |
| Bearing 1-5 | 6 h 23 min 29 s | Bearing 2-5 | 5 h 33 min 30 s |
| Bearing 1-6 | 4 h 10 min 11 s | Bearing 2-6 | 1 h 35 min 10 s |
Table 5. Remaining useful life prediction task description using IEEE PHM 2012 dataset.

| Task | Training bearings | Test bearing |
|---|---|---|
| A | Bearing 1-1, 1-2, 1-3 | Bearing 1-4 |
| B | Bearing 1-1, 1-2, 1-3 | Bearing 1-5 |
| C | Bearing 1-1, 1-2, 1-3 | Bearing 1-6 |
| D | Bearing 2-1, 2-2, 2-3 | Bearing 2-4 |
| E | Bearing 2-1, 2-2, 2-3 | Bearing 2-5 |
| F | Bearing 2-1, 2-2, 2-3 | Bearing 2-6 |
Table 6. Hyperparameter values of TCN–Transformer networks.

| Hyperparameter | Value | Hyperparameter | Value |
|---|---|---|---|
| Batch size | 32 | Epochs | 10 |
| Activation function | GELU | Learning rate | 0.0001 |
| Embedding dimension | 64 | Hidden unit dimension | 256 |
| Temporal window length | 30 | Loss function | MSE |
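Table 6 lists a temporal window length of 30. A common way to turn a health-index sequence into supervised samples is a sliding window. Each input is 30 consecutive HI values, and the target is a label aligned with the window's last step. This construction is assumed here, since the section does not spell it out. So are the normalized RUL label, taken to fall linearly from 1 to 0 over the bearing's life, and the function name `make_windows`. With a batch size of 32, each training step would then see 32 such windows.

```python
def make_windows(hi, window=30):
    """Slice a health-index series into (input window, target) pairs.

    Hypothetical construction: the target is the normalized RUL at the last
    step of each window, assuming RUL falls linearly from 1 to 0 over life.
    """
    n = len(hi)
    rul = [1.0 - t / (n - 1) for t in range(n)]  # assumed linear RUL label
    samples = []
    for end in range(window, n + 1):
        samples.append((hi[end - window:end], rul[end - 1]))
    return samples
```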
Table 7. Comparison of results of TCN–Transformer and other advanced models.

| Model | Average RMSE | Average MAE | Average SCORE |
|---|---|---|---|
| RNN | 0.1093 | 0.0901 | 0.2129 |
| LSTM | 0.0969 | 0.0815 | 0.2004 |
| GRU | 0.0991 | 0.0831 | 0.2496 |
| Dual-LSTM | 0.1055 | 0.0728 | 0.2714 |
| LSTM-AON | 0.0873 | 0.0695 | 0.3286 |
| BiGRU-GSA | 0.0852 | 0.0634 | 0.4553 |
| TCN-RSA | 0.0765 | 0.0529 | 0.507 |
| TFT | 0.0602 | 0.0432 | 0.5614 |
| TCN–Transformer | 0.0514 | 0.0392 | 0.6346 |
| IMP (vs. TFT) | 14.62% | 9.26% | 13.04% |
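The IMP row of Table 7 matches the relative improvement of TCN–Transformer over the strongest baseline, TFT. It is a reduction for RMSE and MAE and an increase for SCORE. The sketch below defines RMSE and MAE in their standard form and reproduces the three IMP values from the tabulated averages. SCORE itself is not re-implemented because its definition is not given in this section. The helper name `improvement` is hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted RUL values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted RUL values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def improvement(proposed, baseline, higher_is_better=False):
    """Relative improvement in percent over a baseline value."""
    if higher_is_better:
        return 100 * (proposed - baseline) / baseline
    return 100 * (baseline - proposed) / baseline


# Table 7 averages: TCN-Transformer versus TFT.
print(f"RMSE  {improvement(0.0514, 0.0602):.2f}%")
print(f"MAE   {improvement(0.0392, 0.0432):.2f}%")
print(f"SCORE {improvement(0.6346, 0.5614, higher_is_better=True):.2f}%")
```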
Table 8. Results of TCN–Transformer ablation experiment.

| Task | RMSE: TCN–Transformer | RMSE: Transformer | RMSE: TCN | MAE: TCN–Transformer | MAE: Transformer | MAE: TCN | SCORE: TCN–Transformer | SCORE: Transformer | SCORE: TCN |
|---|---|---|---|---|---|---|---|---|---|
| A | 0.0312 | 0.0177 | 0.6225 | 0.0227 | 0.0135 | 0.5131 | 0.6356 | 0.5290 | 0.1226 |
| B | 0.0660 | 0.0491 | 1.0182 | 0.0483 | 0.0406 | 0.8467 | 0.5673 | 0.5384 | 0.0724 |
| C | 0.0607 | 0.1208 | 0.9761 | 0.0530 | 0.1077 | 0.8184 | 0.5916 | 0.4122 | 0.0700 |
| D | 0.0275 | 0.1278 | 0.5127 | 0.0226 | 0.0624 | 0.4201 | 0.6764 | 0.5465 | 0.1961 |
| E | 0.0800 | 0.1257 | 1.0087 | 0.0436 | 0.0894 | 0.8567 | 0.6178 | 0.3828 | 0.0491 |
| F | 0.0430 | 0.0703 | 0.4298 | 0.0448 | 0.0662 | 0.3498 | 0.7188 | 0.5743 | 0.2675 |
| Average | 0.0514 | 0.0852 | 0.7613 | 0.0392 | 0.0633 | 0.6341 | 0.6346 | 0.4972 | 0.1296 |
| IMP | 39.67% | | | 38.07% | | | 26.63% | | |
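The Average and IMP rows of Table 8 can be checked directly from the per-task values. The snippet below averages each column over tasks A to F and rounds to four decimals. It then computes IMP as the relative change of TCN–Transformer with respect to the Transformer-only variant, using those rounded averages. This reading of IMP is inferred from the numbers rather than stated in the table. It reproduces the Average row and the RMSE and MAE improvements (39.67% and 38.07%). For SCORE the same calculation gives about 27.63%, whereas the table lists 26.63%, so that entry may contain a typographical error.

```python
# Per-task ablation results copied from Table 8 (tasks A to F).
RESULTS = {
    "RMSE": {
        "TCN-Transformer": [0.0312, 0.0660, 0.0607, 0.0275, 0.0800, 0.0430],
        "Transformer": [0.0177, 0.0491, 0.1208, 0.1278, 0.1257, 0.0703],
        "TCN": [0.6225, 1.0182, 0.9761, 0.5127, 1.0087, 0.4298],
    },
    "MAE": {
        "TCN-Transformer": [0.0227, 0.0483, 0.0530, 0.0226, 0.0436, 0.0448],
        "Transformer": [0.0135, 0.0406, 0.1077, 0.0624, 0.0894, 0.0662],
        "TCN": [0.5131, 0.8467, 0.8184, 0.4201, 0.8567, 0.3498],
    },
    "SCORE": {
        "TCN-Transformer": [0.6356, 0.5673, 0.5916, 0.6764, 0.6178, 0.7188],
        "Transformer": [0.5290, 0.5384, 0.4122, 0.5465, 0.3828, 0.5743],
        "TCN": [0.1226, 0.0724, 0.0700, 0.1961, 0.0491, 0.2675],
    },
}

# Average row: mean over the six tasks, rounded to four decimals as tabulated.
avg = {metric: {model: round(sum(v) / len(v), 4) for model, v in models.items()}
       for metric, models in RESULTS.items()}

# IMP row: change of the full model relative to the Transformer-only variant,
# computed from the rounded averages (lower is better for RMSE and MAE).
imp = {}
for metric, row in avg.items():
    full, base = row["TCN-Transformer"], row["Transformer"]
    change = (full - base) if metric == "SCORE" else (base - full)
    imp[metric] = 100 * change / base

print(avg)
print({m: f"{v:.2f}%" for m, v in imp.items()})
```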
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Jin, X.; Ji, Y.; Li, S.; Lv, K.; Xu, J.; Jiang, H.; Fu, S. Remaining Useful Life Prediction for Rolling Bearings Based on TCN–Transformer Networks Using Vibration Signals. Sensors 2025, 25, 3571. https://doi.org/10.3390/s25113571

Chicago/Turabian Style

Jin, Xiaochao, Yaping Ji, Shiteng Li, Kailang Lv, Jianzheng Xu, Haonan Jiang, and Shengnan Fu. 2025. "Remaining Useful Life Prediction for Rolling Bearings Based on TCN–Transformer Networks Using Vibration Signals" Sensors 25, no. 11: 3571. https://doi.org/10.3390/s25113571

