Article

An Adaptive BiGRU-ASSA-iTransformer Method for Remaining Useful Life Prediction of Bearing in Aerospace Manufacturing

1 Shanghai Engineering Research Center of Industrial Big Data and Intelligent System, Institute of Artificial Intelligence, Donghua University, Shanghai 201620, China
2 College of Information Science and Technology, Donghua University, Shanghai 201620, China
* Author to whom correspondence should be addressed.
Actuators 2025, 14(5), 238; https://doi.org/10.3390/act14050238
Submission received: 26 March 2025 / Revised: 7 May 2025 / Accepted: 8 May 2025 / Published: 9 May 2025
(This article belongs to the Section Actuators for Manufacturing Systems)

Abstract

In aerospace manufacturing, the reliability of machining equipment, particularly spindle bearings, is critical to maintaining productivity, as bearing health significantly constrains operational efficiency. Accurate prediction of the remaining useful life (RUL) of bearings can preempt failures, reduce downtime, and boost productivity. While conventional BiGRU-based models for bearing RUL prediction have shown promise, they often overlook handcrafted time-series features that could enhance accuracy. This study introduces a novel model, BiGRU-ASSA-iTransformer, that integrates deep learning and handcrafted feature extraction to improve RUL prediction. The approach employs two parallel processes with a fusion step: First, a bi-directional gated recurrent unit (BiGRU) captures dynamic degradation features from raw vibration signals, with an adaptive sparse self-attention (ASSA) mechanism emphasizing short-term degradation cues. Second, 13 time-domain, frequency-domain, and statistical features, derived from traditional expertise, are processed using iTransformer to encode temporal correlations. These outputs are then fused via an attention mechanism. Experiments on the PHM 2012 and XJTU-SY datasets demonstrate that this model achieves the lowest prediction error and highest accuracy compared to existing methods, highlighting the value of combining handcrafted and deep learning approaches for robust RUL prediction in aerospace applications.

1. Introduction

With the rapid development of industrial automation and intelligent manufacturing, machining workshops for complex aerospace structural parts place increasingly high demands on the operational reliability of key manufacturing resources. In these workshops, the reliability of manufacturing equipment is crucial to production efficiency, and five-axis machining centers and mill-turn centers are widely used as key resources for high-precision structural part machining [1,2]. However, such equipment is subject to various types of failures, including electrical faults, mechanical wear, transmission system malfunctions, and control system abnormalities [3]. Failure probabilities vary with the complexity of the working conditions, and analysis shows that mechanical wear is one of the main sources of failure. Among mechanical failures, spindle failure is particularly critical because of its direct impact on machining accuracy and equipment continuity. Within spindle failures, bearing failure is the most prominent problem: the combined effects of long-term high loads, over-speed operation, and environmental factors such as vibration and high temperature significantly restrict the overall availability of the equipment, making bearings the core reliability bottleneck [4,5,6]. Accurate prediction of the remaining useful life (RUL) of bearings can provide early warning of failures, reduce unplanned downtime, significantly improve productivity, and reduce maintenance costs [7].
Currently, bearing RUL prediction methods can be categorized into model-driven, data-driven, and hybrid approaches [8,9]. Model-driven methods rely on physical degradation models, but their accuracy often suffers under complex, noisy operating conditions due to parameter sensitivities, limiting their applicability. Hybrid methods attempt to integrate physical models with data but frequently face challenges related to data quality and modeling complexity. In contrast, data-driven methods exhibit greater flexibility by mining degradation patterns directly from sensor data, an advantage that has become more pronounced following the advent of deep learning. Recurrent neural networks (RNNs) and their variants (e.g., LSTM and GRU) have demonstrated robust performance in time-series modeling [10], and bidirectional gated recurrent units (BiGRU) offer enhanced capabilities for capturing bearing degradation processes via bidirectional propagation. However, the conventional BiGRU method still has shortcomings: on one hand, it typically relies solely on the final time-step features for prediction, overlooking the potential contributions of earlier degradation stages [11]; on the other hand, it often neglects handcrafted features derived from expert domain knowledge, limiting further improvements in accuracy.
Therefore, this paper proposes a rolling bearing RUL prediction model (BiGRU-ASSA-iTransformer, BAIT-RUL for short). The model integrates four modules, namely, an adaptive sparse self-attention mechanism (ASSA) [12], a bi-directional gated recurrent unit (BiGRU), iTransformer [13], and handcrafted feature extraction, and consists of two parallel processes: (1) A bi-directional gated recurrent unit (BiGRU) captures the dynamic degradation features of the original vibration signals over the whole life cycle, and the ASSA mechanism focuses attention on the key time steps of bearing degradation. (2) Thirteen handcrafted feature sequences related to bearing degradation (time-domain, frequency-domain, and statistical features) are extracted based on domain knowledge, and their intrinsic temporal correlations are modeled through iTransformer. Finally, the dynamic features captured by BiGRU and the handcrafted time-series features optimized by iTransformer are fused by the attention mechanism to further improve the prediction accuracy.
The structure of this paper is as follows: Section 2 analyzes the development status and technical characteristics of existing RUL prediction methods; Section 3 describes the construction process of the BiGRU-ASSA-iTransformer model and its theoretical foundation; Section 4 introduces the used datasets and experimental setups, and analyzes the results; Section 5 discusses the significance of the experiments; and Section 6 summarizes the research results and looks forward to the future direction.

2. Related Works

Bearing remaining useful life (RUL) prediction is a key technology in the field of machinery health management, and the widely used prediction methods mainly include model-driven, data-driven, and hybrid methods. Among them, data-driven methods have gradually dominated the research in recent years due to their advantages of not requiring accurate mathematical modeling, high adaptability, and automatic extraction of complex features [14,15,16]. In this section, we review the application of attention mechanisms and handcrafted feature extraction methods in bearing RUL prediction, analyze their technical characteristics and deficiencies, and thereby present the motivation for the research in this paper.

2.1. Attention Mechanism in Bearing Remaining Life Prediction

The attention mechanism extracts key information more effectively by dynamically assigning different weights to each part of a sequence, thus improving the model’s ability to handle complex time-series data. Initially, this mechanism was widely used in natural language processing, and in recent years, it has gradually been applied to the bearing RUL prediction task [17,18,19]. Peng et al. [20] proposed a multiscale temporal convolutional Transformer (MTCT), which combines a convolutional self-attention (CsA) mechanism with a temporal convolution network (TCN) attention module to extract long-term degradation features and local contextual correlations, achieving prediction directly from raw run-to-failure data. Zhao et al. [21] introduced a multi-head self-attention mechanism (MSM) to enhance feature representation and sequence modeling in RUL prediction, and analyzed the effect of head number on model accuracy, robustness, and interpretability by visualizing attention weight distributions and applying graph theory to interpret attention behavior. Similarly, the information-guided attention network (IGAN) designed by Wang et al. [22] utilizes multi-scale dilated convolution and a temporal attention mechanism to dynamically focus on different temporal locations of degradation features and enhance the model’s adaptability to the bearing degradation process. The adaptive stage-divided long temporal attention network (AD-LTAN) proposed by Gao et al. [23] identifies degradation initiation points through adaptive health stage dividing, and combines multilayer dilated convolution and a temporal attention mechanism to improve the prediction accuracy of long-life bearings. In addition, Xiang et al. [24] proposed a novel LSTM network with attention-ordered neurons (LSTM-AON) for gear remaining useful life (RUL) prediction, enhancing robustness and long-term prediction accuracy by integrating an attention-guided tree structure. The attention mechanism guides the hierarchical division of input and historical information, assigning physical meaning to different attention levels, thereby improving the network’s ability to prioritize and retain critical degradation information for accurate RUL prediction.
The standard self-attention mechanism can capture long-range dependencies by modeling the relationships among all elements of a sequence, but it suffers from high computational complexity, is easily disturbed by redundant information, and therefore has limitations in complex aerospace machining environments. Bearings in aerospace machining workshops operate under high loads, their degradation processes are markedly nonlinear, and capturing information from localized critical degradation phases is especially demanding. However, most existing attention mechanisms fail to achieve adaptive attention to local short-term features, which limits model performance under such complex working conditions.
The adaptive sparse self-attention mechanism (ASSA) was first proposed by Zhou et al. [12] in 2024 in an image recovery task. This mechanism adaptively filters low correlation features and highlights key features by combining sparse self-attention (SSA) and dense self-attention (DSA). SSA employs a sparse filtering mechanism based on the squared ReLU function, which efficiently suppresses redundant information interference, whereas the DSA retains core information through the softmax mechanism. In this study, considering that the ASSA mechanism can adaptively focus on more important regions in an image or sequence, it is introduced into the field of bearing RUL prediction to realize the effective capture of critical short-term information during bearing degradation.

2.2. Handcrafted Features in Bearing Remaining Life Prediction

In parallel with attention-based techniques, handcrafted feature extraction—using time-, frequency-, or statistical-domain features—remains valuable in bearing RUL prediction because it leverages established domain knowledge [25]. Researchers have demonstrated that combining handcrafted features with deep learning can enhance model performance. For example, Cao et al. [26] proposed a multidomain hybrid feature approach incorporating time, frequency, and entropy features for improved health indicator (HI) representation, while Niazi et al. [27] incorporated adaptive windows to refine handcrafted features over varying degradation stages. Likewise, Yang et al. [28] extracted multiple time-domain and entropy-based features—including variance, RMS, energy, sample entropy, shape factor, and Rényi entropy—from reconstructed vibration signals obtained via PCHIP-LCD and ISC selection. These features were then fused using improved independent component analysis and Mahalanobis distance to form a sensitive degradation indicator for RUL prediction. Other studies, such as that by Cui et al. [29], utilized graph convolutional networks (GCNs) in tandem with GRU for spatio-temporal feature learning and cross-domain feature alignment.
Although the combination of handcrafted features and deep learning shows some potential in RUL prediction, its application in complex aerospace structural part machining workshop scenarios still faces challenges. The degradation process of aerospace machining bearings is affected by the superposition of multiple sources of interference, and the traditional handcrafted features are usually used as static inputs, ignoring the dynamically changing characteristics of the features over time and failing to comprehensively capture the dynamic evolution of the bearing degradation process.
To solve the problem of dynamic modeling of feature sequences, Liu et al. [13] proposed the dimension-inverted Transformer (iTransformer). The model was initially designed for the generalized multivariate long-sequence prediction task. iTransformer efficiently models the dynamic correlations among variables by inverting the temporal and variable dimensions of the traditional Transformer so that each variable is independently embedded as a token. In this paper, we argue that this dynamic feature encoding approach is also advantageous for modeling bearing degradation, and thus iTransformer is introduced to model the temporal dynamics of handcrafted features and more effectively capture feature evolution during bearing degradation.
In summary, the current data-driven bearing RUL prediction methods still face the following deficiencies in complex aerospace processing environments: (1) the existing attention mechanism lacks precise attention to local short-term degradation of key features and sparse optimization; (2) the handcrafted feature methods tend to neglect the dynamic evolution process of the features, which makes it difficult to effectively characterize the changing law of degradation features under complex environments; (3) the deep learning methods lack effective combination with domain knowledge and fail to achieve dynamic feature modeling, thus failing to realize the efficient integration of dynamic features and handcrafted features.
To address the above deficiencies, this paper proposes a bearing RUL prediction model, BAIT-RUL, that combines BiGRU, the adaptive sparse self-attention mechanism (ASSA) [12], and the dimension-inverted Transformer (iTransformer) [13]. The model utilizes BiGRU to capture the long-term dependent features of the degradation process, applies ASSA to focus on the critical time steps and assign higher weights to those containing richer degradation information, introduces iTransformer to dynamically encode the handcrafted feature sequences in chronological order, and finally integrates the deep features with the temporally optimized features through the attention mechanism to enhance the prediction accuracy and robustness of the model under complex working conditions.

3. Methods

In this section, a bearing remaining useful life (RUL) prediction model, BAIT-RUL, is proposed, combining deep learning and domain knowledge to enhance prediction accuracy and robustness. The model processes raw vibration signals and handcrafted features separately through two parallel branches and an attention fusion mechanism to comprehensively capture dynamic changes and temporal correlations. The architecture of BAIT-RUL is illustrated in Figure 1.
The BiGRU-ASSA branch employs a three-layer bidirectional GRU (BiGRU) to extract dynamic features from raw vibration signals, capturing long-term temporal dependencies. An adaptive sparse self-attention (ASSA) mechanism dynamically weights the BiGRU outputs to emphasize short-term degradation features critical for RUL prediction. The iTransformer branch processes 13 handcrafted features (e.g., time-domain, frequency-domain, and statistical features) derived from domain knowledge, using an improved Transformer (iTransformer) to capture global dependencies and evolutionary trends in bearing degradation. These branches are complementary: the BiGRU-ASSA branch focuses on local, dynamic patterns, while the iTransformer branch models global trends, with their feature representations correlated through the attention fusion mechanism. The fusion mechanism computes weighted correlations between the branches’ outputs using multi-head attention, dynamically balancing the contributions of local and global features. For instance, in early degradation stages, the fusion mechanism may prioritize BiGRU-ASSA’s dynamic features to capture subtle signal changes, while in later stages, it may emphasize iTransformer’s handcrafted features to reflect stable trends. This correlation and interdependence enable the model to adapt to varying degradation phases, enhancing robustness.
The features from the two branches are fused via the attention mechanism, and RUL predictions are generated after dimensionality reduction by a fully connected layer. As shown in Figure 1, BAIT-RUL effectively integrates dynamic degradation information with the temporal characterization of handcrafted features.
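To make the fusion step concrete, the following is a minimal PyTorch sketch of fusing the two branch outputs with multi-head attention and a fully connected regression head. The module name, head count, and tensor shapes are illustrative assumptions; only the reported dimensions (a 128-dimensional BiGRU-ASSA output and a 13 × 16-dimensional iTransformer output) are taken from the paper.

```python
# Minimal sketch (PyTorch): fusing the BiGRU-ASSA output with the iTransformer
# output via multi-head attention. Module and parameter names are illustrative.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dyn_dim=128, hand_dim=16, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(hand_dim, dyn_dim)          # align feature dimensions
        self.attn = nn.MultiheadAttention(dyn_dim, n_heads, batch_first=True)
        self.head = nn.Linear(dyn_dim, 1)                 # RUL regression head

    def forward(self, dyn_feat, hand_feat):
        # dyn_feat:  (batch, 1, 128)  dynamic features from BiGRU-ASSA
        # hand_feat: (batch, 13, 16)  per-feature tokens from iTransformer
        hand_feat = self.proj(hand_feat)                  # (batch, 13, 128)
        fused, _ = self.attn(query=dyn_feat, key=hand_feat, value=hand_feat)
        return self.head(fused.squeeze(1))                # (batch, 1) predicted RUL

fusion = AttentionFusion()
rul = fusion(torch.randn(8, 1, 128), torch.randn(8, 13, 16))
```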

3.1. BiGRU-ASSA Module

To fully utilize the temporal information and dynamic features of the original vibration signals, this study designs a feature extraction path based on a bidirectional gated recurrent unit (BiGRU) with an adaptive sparse self-attention mechanism (ASSA). The structure of this feature extraction branch is illustrated in Figure 2. Specifically, the BiGRU module captures the long-term dependencies of the signals through a bidirectional propagation mechanism, while the ASSA module dynamically focuses on the important features at key time steps during the degradation process. The following sections describe the theoretical foundations and technical details of these two modules, respectively.

3.1.1. Bidirectional Gated Recurrent Unit

The remaining life prediction of bearings usually relies on time-series data collected from sensors (e.g., horizontal vs. vertical vibration signals) [30]. Recurrent neural networks (RNNs) can effectively capture the temporal dependence of the data due to their property of connecting nodes along a time series. However, traditional RNNs are prone to the problem of gradient vanishing or explosion during training, which limits their ability to learn long sequences. To overcome this problem, gated recurrent units (GRUs) have been developed, which selectively retain and forget historical information through the reset gate and update gate mechanisms, thus enhancing the ability to capture long-term dependencies and simplifying the network structure to improve computational efficiency [31].
Although GRU effectively improves the RNN’s ability to model long-term dependencies, unidirectional GRU only utilizes historical information and lacks effective mining of future information. To address this issue, this paper employs a bidirectional gated recurrent unit (BiGRU). The BiGRU contains two GRU structures with opposite directions, which can more comprehensively capture the characteristics of the dynamic evolution of the bearing degradation process by considering both past and future sequence information through forward and backward propagation [32]. For example, weak degradation signals in the early stages of a bearing may be more evident in subsequent stages of development, and a backward-propagating GRU can effectively capture and enhance such features.
In the specific implementation, let the input and hidden state at time step $t$ be $x_t$ and $h_t$, respectively; the reset gate $r_t$ and update gate $z_t$ of the GRU then dynamically regulate the information flow through the following equations:

$$r_t = \sigma\left(W_{xr} x_t + W_{hr} h_{t-1} + b_r\right),$$

$$z_t = \sigma\left(W_{xz} x_t + W_{hz} h_{t-1} + b_z\right),$$

where $\sigma$ is the Sigmoid activation function, $\{W_{xr}, W_{hr}, W_{xz}, W_{hz}\}$ is the set of weight matrices, and $b_r$, $b_z$ are the bias vectors. The update gate $z_t$ controls the fusion ratio of the historical state to the current input, and the reset gate $r_t$ determines how much of the historical memory is retained. This gating mechanism enables the GRU to adaptively filter noise and retain key features associated with degradation.

At time $t$, the hidden state update formulas of the GRU are:

$$\tilde{h}_t = \tanh\left(W_{xh} x_t + W_{hh}\left(r_t \odot h_{t-1}\right) + b_h\right),$$

$$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t,$$

where $\tilde{h}_t$ is the candidate hidden state, $\tanh(\cdot)$ is the hyperbolic tangent function, the symbol $\odot$ denotes the Hadamard product, and $W_{xh}$ and $W_{hh}$ are two weight matrices. In BiGRU, the final hidden state integrates the forward and backward outputs to fully characterize the input sequence.
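A minimal sketch of a single GRU step following the gate and state-update equations above; the weight shapes, right-multiplication convention, and bias names are illustrative assumptions rather than the authors' implementation.

```python
# One GRU time step: reset gate, update gate, candidate state, new hidden state.
import torch

def gru_step(x_t, h_prev, W_xr, W_hr, b_r, W_xz, W_hz, b_z, W_xh, W_hh, b_h):
    r_t = torch.sigmoid(x_t @ W_xr + h_prev @ W_hr + b_r)           # reset gate
    z_t = torch.sigmoid(x_t @ W_xz + h_prev @ W_hz + b_z)           # update gate
    h_cand = torch.tanh(x_t @ W_xh + (r_t * h_prev) @ W_hh + b_h)   # candidate state
    return (1 - z_t) * h_prev + z_t * h_cand                        # new hidden state

# Toy usage with hypothetical dimensions (input size 4, hidden size 64).
H, D = 64, 4
params = [torch.randn(D, H), torch.randn(H, H), torch.zeros(H),   # reset gate
          torch.randn(D, H), torch.randn(H, H), torch.zeros(H),   # update gate
          torch.randn(D, H), torch.randn(H, H), torch.zeros(H)]   # candidate state
h = gru_step(torch.randn(1, D), torch.zeros(1, H), *params)
```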
As an improved structure of the GRU, BiGRU consists of two GRU networks with opposite propagation directions. Figure 3 shows a BiGRU network with $K$ GRU units, where each node in the BiGRU contains information about the entire input sequence, allowing a better synthesis of feature extraction for all input samples.
Due to its strong sequential modeling capabilities, the BiGRU network has successfully been used for machine RUL prediction [33,34]. However, the BiGRU network used in these studies gives the same weight to all time steps in the bearing degradation process, ignoring the important contribution of critical time steps in the bearing degradation process, as shown in Figure 3.
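A minimal sketch of the three-layer bidirectional GRU used as the dynamic-feature extractor, here built with PyTorch's stock `nn.GRU` as a stand-in; the input size of 4, window length of 30, and hidden size of 64 follow the parameter settings reported later in Section 3.3.5.

```python
# Three-layer bidirectional GRU over raw vibration windows.
import torch
import torch.nn as nn

bigru = nn.GRU(input_size=4, hidden_size=64, num_layers=3,
               batch_first=True, bidirectional=True)

x = torch.randn(8, 30, 4)     # (batch, 30 time steps, 4 raw-signal channels)
out, _ = bigru(x)             # (8, 30, 128): forward and backward states concatenated
```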

3.1.2. Adaptive Sparse Self-Attention

In the bearing remaining useful life (RUL) prediction task, different time steps of the time-series data contribute unevenly to the final prediction [11]. In a complex aerospace structural component machining workshop, spindle bearings are affected by high loads, high speeds, and multi-source disturbances (e.g., high temperatures and high vibrations), and the degradation stage contains more critical information. In contrast, the contribution of the smooth data in the normal operation stage is smaller. The traditional BiGRU network can capture the dependencies in the time-series data, but it is unable to weight the importance of features and time steps, resulting in irrelevant information that may interfere with the model performance. For this reason, this paper introduces the adaptive sparse self-attention (ASSA) mechanism into the BiGRU network to enhance the model’s ability to focus on key features of aerospace machine bearing degradation.
ASSA dynamically filters information valuable for RUL prediction and suppresses redundant noise by combining sparse self-attention (SSA) and dense self-attention (DSA) branches. Specifically, the SSA branch employs squared-ReLU-based attention computation to filter out features with low query-key matching scores and ensure sparse attention to critical degradation time steps, while the DSA branch utilizes a softmax layer to retain dense feature information and compensate for over-sparsity. ASSA adaptively fuses the outputs of the two branches to weight the temporal features extracted by BiGRU and propagates the weighted results through the network. This dual-branch design allows the model to flexibly adapt to the complex patterns of bearing degradation in aerospace machining workshops [12].
In the application, the ASSA module receives the dynamic feature sequence output from BiGRU, and by assigning different weights to the features at each time step, it highlights the significant change points in the degradation process of the aerospace bearings, thus improving the accuracy and robustness of the prediction. Compared with the use of BiGRU alone, the incorporation of ASSA enables the model to automatically identify the time steps with the most predictive value and reduces the interference of irrelevant information, which is especially suitable for the demand of high-precision RUL prediction under complex aerospace working conditions.
First, given the input one-dimensional time-series features $X \in \mathbb{R}^{T \times C}$ (where $T$ is the number of time steps and $C$ is the feature dimension), the time series is divided into several non-overlapping segments according to a fixed window, and features are learned for each segment. Next, ASSA computes the attention scores by generating the query, key, and value matrices $Q$, $K$, and $V$:

$$Q = XW_Q, \quad K = XW_K, \quad V = XW_V,$$

where $W_Q, W_K, W_V \in \mathbb{R}^{C \times d}$ are linear projection matrices and $d$ is the projection dimension. The attention calculation can be defined as:

$$A = f\left(\frac{QK^{T}}{\sqrt{d}} + B\right)V,$$

where $A$ denotes the estimated attention, $B$ is the learnable relative positional bias, and $f(\cdot)$ is the scoring function that accounts for the temporal correlation of the time series.
ASSA utilizes two self-attention mechanisms to process the input features. The first is sparse self-attention (SSA), which filters out features with low matching scores through a ReLU-based squaring layer, removing noise and highlighting key time-series information. Its calculation formula is:

$$\mathrm{SSA} = \mathrm{ReLU}^{2}\left(\frac{QK^{T}}{\sqrt{d}} + B\right).$$

The second is dense self-attention (DSA), which computes the attention scores of all query-key pairs through a softmax layer, preserving the global dependency of the time series:

$$\mathrm{DSA} = \mathrm{SoftMax}\left(\frac{QK^{T}}{\sqrt{d}} + B\right).$$

To adaptively regulate these two mechanisms, ASSA introduces a weighting scheme that dynamically adjusts their contributions according to the performance of each branch. The final attention matrix $A$ is given by:

$$A = \left(w_1 \cdot \mathrm{SSA} + w_2 \cdot \mathrm{DSA}\right)V,$$

where $w_1, w_2 \in \mathbb{R}^{1}$ are two normalized weights for adaptively adjusting the two branches, obtained by:

$$w_n = \frac{e^{a_n}}{\sum_{i=1}^{N} e^{a_i}}, \quad n = 1, 2,$$

where $\{a_1, a_2\}$ are learnable parameters that provide adaptive control over the SSA and DSA branches. With this fusion strategy, the model can flexibly adjust the sparseness and denseness of the attention at different stages of bearing degradation to more accurately capture the evolutionary trend of key time steps and features, optimizing the modeling of degraded features in the time series.
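The following is a minimal sketch of the ASSA computation described above: squared-ReLU sparse attention, softmax dense attention, and softmax-normalized adaptive fusion of the two branches. The tensor shapes, projection dimension, and the omission of the relative positional bias $B$ are simplifying assumptions for illustration.

```python
# ASSA sketch: sparse (ReLU^2) and dense (softmax) attention fused with
# learnable, softmax-normalized weights.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASSA(nn.Module):
    def __init__(self, dim=128, proj_dim=64):
        super().__init__()
        self.W_q = nn.Linear(dim, proj_dim, bias=False)
        self.W_k = nn.Linear(dim, proj_dim, bias=False)
        self.W_v = nn.Linear(dim, proj_dim, bias=False)
        self.alpha = nn.Parameter(torch.zeros(2))    # learnable {a1, a2}
        self.scale = math.sqrt(proj_dim)

    def forward(self, x):                            # x: (batch, T, dim) from BiGRU
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        scores = q @ k.transpose(-2, -1) / self.scale        # (batch, T, T)
        ssa = F.relu(scores) ** 2                            # sparse branch
        dsa = F.softmax(scores, dim=-1)                      # dense branch
        w = F.softmax(self.alpha, dim=0)                     # normalized branch weights
        return (w[0] * ssa + w[1] * dsa) @ v                 # weighted attention output

assa = ASSA()
out = assa(torch.randn(8, 30, 128))                          # -> (8, 30, 64)
```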

3.2. iTransformer Module

To leverage the temporal information in domain knowledge-based handcrafted features, we employ an inverted Transformer (iTransformer) to optimize the time-series representations of 13 handcrafted features (e.g., time-domain, frequency-domain, and statistical features). The structure is shown in Figure 4. While end-to-end deep learning methods can extract degradation features, they often fail to incorporate a priori knowledge of physical degradation mechanisms in complex machining environments. The iTransformer processes handcrafted feature sequences to capture global dependencies and evolutionary trends, complementing the local, dynamic features extracted by the BiGRU-ASSA branch. The resulting global feature representations are correlated with BiGRU-ASSA’s dynamic features through the attention fusion mechanism, integrating local and global information to enhance RUL prediction accuracy. The handcrafted feature extraction process and iTransformer model are described in detail below.

3.2.1. Handcrafted Features

In the complex environment of an aerospace structural component machining workshop, spindle bearing degradation is influenced by high loads, high speeds, and multiple sources of disturbances (e.g., vibration, temperature variations), exhibiting significant time-series evolutionary characteristics. To effectively characterize these properties, this study draws on Refs. [25,26,27,28,29] to extract 13 handcrafted features from the original vibration signals, which have proven effective for predicting the remaining bearing life. These include time-domain features (e.g., root mean square (RMS), kurtosis, crest factor), frequency-domain features (e.g., power spectral density, spectral centroid), and statistical features (e.g., energy entropy). These features are computed using signal processing techniques and are designed to capture physical patterns in bearing degradation, such as amplitude changes, frequency distribution shifts, and signal complexity variations. The features are selected based on their strong correlation with the degradation process, providing physically meaningful and informative inputs for subsequent modeling.
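A minimal sketch of how a few representative handcrafted features of this kind could be computed from one vibration segment. The paper uses 13 features in total; only a subset is shown here, the formulas follow standard signal-processing definitions rather than the authors' exact implementation, and the sampling frequency `fs` is an assumed parameter.

```python
# Representative handcrafted features from a 1-D vibration segment.
import numpy as np
from scipy.stats import kurtosis

def handcrafted_features(x, fs=25600):
    """x: 1-D vibration segment; fs: sampling frequency in Hz (assumed)."""
    rms = np.sqrt(np.mean(x ** 2))                        # root mean square
    crest_factor = np.max(np.abs(x)) / rms                # peak / RMS
    kurt = kurtosis(x)                                    # impulsiveness of the signal
    spectrum = np.abs(np.fft.rfft(x)) ** 2                # power spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectral_centroid = np.sum(freqs * spectrum) / np.sum(spectrum)
    p = spectrum / np.sum(spectrum)
    energy_entropy = -np.sum(p * np.log(p + 1e-12))       # spectral energy entropy
    return np.array([rms, crest_factor, kurt, spectral_centroid, energy_entropy])
```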

3.2.2. iTransformer

The Transformer, proposed by Vaswani et al. in 2017, is a neural network architecture based on a self-attention mechanism originally designed for natural language processing tasks [35]. The traditional Transformer tokenizes by time step and faces the challenges of high computational complexity and confounded multivariate information in time-series prediction. For this reason, Liu et al. proposed iTransformer in 2024 to improve the traditional architecture through a dimension inversion (inverted) approach [13]. The method independently embeds each variable of a time series as a token, as shown in Figure 4; captures inter-variable correlations using the self-attention mechanism; and models the temporal evolution of each variable with the help of feed-forward neural networks (FFNs), thus improving the efficiency of long-sequence prediction and reducing information aliasing.
In this study, iTransformer is introduced for time-series dynamic modeling of the handcrafted features to effectively mine their dynamic evolution laws. The specific implementation steps are as follows:
1. Feature embedding (Embedding)

iTransformer adopts variable-level tokenization; all handcrafted features are first embedded, and each feature variable is mapped to a high-dimensional representation space:

$$h_n^{0} = \mathrm{Embedding}\left(X_{:,n}\right),$$

where $X_{:,n}$ denotes the $n$-th handcrafted feature sequence, and the embedding is implemented by a multilayer perceptron (MLP) to capture the initial representation of each feature.

2. Inter-feature interaction modeling (Self-Attention)

iTransformer computes the dependencies between different handcrafted features through the self-attention mechanism to extract global information and improve the feature representation. The encoding process is:

$$H^{l+1} = \mathrm{TrmBlock}\left(H^{l}\right), \quad l = 0, \ldots, L-1,$$

where $H$ is the embedded representation of all features and TrmBlock consists of multiple Transformer encoder layers, each containing multi-head self-attention (MHSA) and a feed-forward network (FFN).

3. Time-dependent modeling (FFN)
Since iTransformer adopts a feed-forward neural network (FFN) to independently learn the temporal evolution relationship of each feature, it can better portray the dynamic trend of handcrafted features without losing information due to feature overlapping.
The introduction of iTransformer not only preserves the domain knowledge of handcrafted features but also enhances their temporal correlation representation through variable-level modeling, providing high-quality inputs for subsequent feature fusion and RUL prediction.
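A minimal sketch of the inverted-tokenization idea used by this branch: each of the 13 handcrafted feature series is embedded as one token, self-attention mixes information across feature tokens, and a feed-forward projection models each feature's temporal evolution. Layer sizes and the use of PyTorch's stock encoder are illustrative assumptions; only the 13 × 16-dimensional output follows the paper's reported setting.

```python
# Inverted tokenization: one token per handcrafted feature series.
import torch
import torch.nn as nn

class ITransformerBranch(nn.Module):
    def __init__(self, seq_len=30, n_feats=13, d_model=64, out_dim=16, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(seq_len, d_model)              # variable-level tokenization
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        self.ffn = nn.Linear(d_model, out_dim)                # per-variable temporal projection

    def forward(self, x):                                     # x: (batch, seq_len, n_feats)
        tokens = self.embed(x.transpose(1, 2))                # (batch, 13, d_model)
        tokens = self.encoder(tokens)                         # attention across feature tokens
        return self.ffn(tokens)                               # (batch, 13, out_dim)

branch = ITransformerBranch()
out = branch(torch.randn(8, 30, 13))                          # -> (8, 13, 16)
```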

3.3. Experimental Setup

The experiments in this study were conducted on a computer equipped with a 13th Gen Intel(R) Core (TM) i9-13900HX 2.20 GHz processor (Intel Corporation, Santa Clara, CA, USA), 32 GB RAM, and an NVIDIA RTX 4060 GPU (NVIDIA Corporation, Santa Clara, CA, USA; CUDA 11.8). The operating system was Windows 11, the programming language was Python 3.9.21, and the deep learning framework was PyTorch 1.10.0. All experiments were conducted in the same hardware environment and hyperparameter settings to ensure the reproducibility and fairness of the experiments.

3.3.1. Bearing Datasets

To comprehensively verify the effectiveness and generalization ability of the proposed method, two widely used public bearing datasets are used in this paper: the PRONOSTIA bearing dataset from the IEEE PHM 2012 Data Challenge [36] and the XJTU-SY rolling bearing dataset [37]. These two datasets are collected under different operating conditions and cover a variety of degradation modes, which can effectively evaluate the adaptability and robustness of the model in diverse environments.
1. PRONOSTIA Dataset
The PRONOSTIA dataset from the IEEE PHM 2012 Data Challenge, provided by the French FEMTO-ST Institute, is a widely utilized public dataset for rolling bearing health monitoring and residual life prediction. PRONOSTIA employs constant operating conditions to study the degradation behavior of bearings under stable settings. The experimental platform comprises a motor, housing, and load system, with bearings operated at constant speed and load until failure. Three sets of bearing vibration signals under different operating conditions (varying speed–load combinations) were collected, with each condition capturing the full-life data of multiple bearings. As shown in Figure 5, the vibration signal waveforms of rolling bearings 1_1, 2_1, and 3_1, recorded in both horizontal and vertical directions, exhibit a progressive widening trend in signal distribution over time, despite unavoidable noise in the data, indicating that the signals carry significant characteristic information reflecting the gradual deterioration of the bearing’s health condition. The training and test sets for the experiment are divided as detailed in Table 1.
2. XJTU-SY Dataset
The XJTU-SY rolling bearing dataset was jointly collected by Xi’an Jiaotong University (XJTU) and Shenyang Machine Tool Group (SY) and is widely used for residual life prediction research. The dataset consists of 15 sets of full life-cycle vibration signals of rolling bearings under three different operating conditions, which were collected by performing several accelerated degradation tests. To record the entire degradation process, i.e., from normal condition to severe failure, each accelerated degradation test was performed until the maximum amplitude of the horizontal or vertical vibration signals exceeded a threshold value [38]. As illustrated in Figure 6, the temporal vibration waveforms of bearings 1_1, 2_1, and 3_1, recorded in both horizontal and vertical directions, reveal a progressive evolution in signal characteristics over the bearing life cycle, despite the presence of noise, highlighting critical patterns associated with the degradation process. The training and test sets for the experiment are divided as shown in Table 2. This dataset is detailed in Table 3, which outlines operational data such as conditions, sample sizes, work times, and fault types for each bearing.

3.3.2. Data Preprocessing

In data preprocessing, this paper adopts three steps, namely normalization, sliding-window division, and remaining useful life (RUL) label calculation, to ensure the standardization of the input data and the completeness of the time-series information and to provide reasonable prediction targets.
(1) Data Normalization
To reduce the magnitude difference between features and improve the stability of model training, Z-score normalization is applied to all input features in this paper:
$$X' = \frac{X - \mu}{\sigma},$$

where $X$ is the original feature value, $X'$ is the normalized value, $\mu$ is the mean of the feature, and $\sigma$ is its standard deviation. This method centers the data distribution and improves the convergence speed of the model.
(2) Sliding Window Division
A sliding window is used to divide the data to maintain the temporal information and improve the model’s ability to learn the long-term dependencies. The length of the sliding window is 30, and the step size is 10.
(3) RUL Label Calculation
In the RUL prediction task, the dataset needs to provide the remaining life label for each time step to be used as a supervisory signal to guide the model training. For each bearing dataset, let the total lifetime of the bearing be $T$ (i.e., the number of time steps contained in the dataset); for time step $i$, the RUL is defined as:

$$\mathrm{RUL}_i = \frac{T - i}{T},$$

where $T$ is the total number of running time steps of the bearing and $i$ is the current time step. The calculated RUL values are normalized to [0, 1] to ensure that the RUL of different bearings is comparable.
The computed RUL labels are added to the corresponding datasets for subsequent training and evaluation.
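A minimal sketch of this preprocessing pipeline: Z-score normalization, sliding-window segmentation (window 30, step 10), and normalized RUL labels. The array layout and the choice of labeling each window with the RUL of its last time step are assumptions for illustration.

```python
# Preprocessing sketch: normalization, windowing, and RUL labels.
import numpy as np

def preprocess(features, window=30, step=10):
    """features: (T, C) array of per-time-step features for one bearing."""
    z = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-12)  # Z-score
    T = len(z)
    rul = (T - np.arange(T)) / T                  # RUL_i = (T - i) / T, in [0, 1]
    windows, labels = [], []
    for start in range(0, T - window + 1, step):
        windows.append(z[start:start + window])   # (window, C) segment
        labels.append(rul[start + window - 1])    # label at the window's last time step
    return np.stack(windows), np.array(labels)

X, y = preprocess(np.random.randn(500, 4))
```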

3.3.3. Handcrafted Features Calculation

In this experiment, 13 types of handcrafted features are extracted from the original vibration signals, including time-domain features (e.g., mean and variance), vibration features (e.g., peak value), frequency-domain features (e.g., spectral energy ratio, spectral flatness), and nonlinear features (e.g., kurtosis, entropy, and fractal dimension), to comprehensively characterize the evolutionary trend of bearing degradation. These features reflect the statistical distribution, vibration intensity, spectral characteristics, and nonlinear dynamics of the signal, respectively. The extracted feature visualization for PRONOSTIA dataset Bearing1_3 is illustrated in Figure 7.
In the feature visualization of the PRONOSTIA dataset Bearing1_3, the mean and variance indicate the overall level and volatility of the signal, the peak value and kurtosis highlight local shocks, the spectral energy ratio reveals changes in frequency distribution, and the entropy and fractal dimension quantify signal complexity. To eliminate scale differences, all features are normalized after extraction to ensure consistency and effectiveness in model training.

3.3.4. Evaluation Metrics

To quantitatively assess the performance of the proposed model and the comparative model in the prediction of the remaining useful life (RUL) of rolling bearings, this paper adopts the root mean square error (RMSE) and the mean absolute error (MAE) as the assessment indexes [39,40]. The specific definitions are as follows:
1. Root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^{2}},$$

2. Mean absolute error (MAE):

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|,$$
where $\hat{y}_i$ and $y_i$ denote the predicted and actual life percentage of the tested bearings, and $N$ is the number of tested samples. RMSE measures the overall deviation between the predicted value and the actual value, and MAE reflects the average absolute error of the prediction. The lower the values of RMSE and MAE, the better the prediction performance of the model for RUL.
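The two metrics translate directly into code; a minimal sketch follows, with `y_pred` and `y_true` assumed to be NumPy arrays of predicted and actual life percentages.

```python
# RMSE and MAE as defined above.
import numpy as np

def rmse(y_pred, y_true):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mae(y_pred, y_true):
    return np.mean(np.abs(y_pred - y_true))
```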

3.3.5. Model Parameters

The BAIT-RUL model parameters are set as follows: The model contains two parallel branches. The first branch uses a three-layer bi-directional GRU to process the original vibration signals, with an input feature dimension of 4, a time step of 30, 64 hidden units per layer, and an output dimension of 128, and extracts dynamic degradation features in combination with the adaptive sparse self-attention mechanism (ASSA, with an input dimension of 128). The second branch uses the iTransformer module to process the 13 handcrafted features, taking 13-dimensional temporal sequences as input, optimizing the temporal correlations of the features through an encoder, and outputting a 13 × 16-dimensional feature representation. After the features of the two branches are fused, they are mapped to one dimension through a fully connected layer, and the output is the predicted remaining useful life of the bearing.
The model is trained using the Adam optimizer with an initial learning rate of 0.001, momentum parameters β1 = 0.9, β2 = 0.999, combined with a cosine annealing learning rate schedule, and a minimum learning rate of 0.0001. The training process consists of 5 iterations, 32 rounds of training per iteration, and a batch size of 64. To ensure robustness against the stochasticity in training, multiple runs were conducted with random seeds (100, 200, 300, 400, 500), and seed 100 was selected based on consistent performance across the XJTU-SY and PRONOSTIA datasets. To prevent overfitting, dropout regularization is introduced into the key module, and the fully connected layer L2 regularization is applied (coefficient 0.01). The state is printed every 10 batches during training, and an early stopping strategy is applied to terminate training if the validation set loss does not drop for 10 consecutive epochs.
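A minimal sketch of the reported optimization setup: Adam with learning rate 0.001 and β1/β2 of 0.9/0.999, cosine annealing down to 0.0001, and early stopping after 10 epochs without validation improvement. The placeholder model, the toy validation loss, and the epoch count of 32 are assumptions for illustration; the dropout and the layer-specific L2 penalty on the fully connected layer are omitted here.

```python
# Training-loop sketch with cosine annealing and early stopping.
import torch
import torch.nn as nn

model = nn.Linear(128, 1)   # placeholder for the BAIT-RUL network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=32, eta_min=1e-4)

best_loss, patience, wait = float("inf"), 10, 0
for epoch in range(32):
    # ... one pass over the training batches would go here ...
    scheduler.step()
    val_loss = float(torch.rand(1))          # placeholder for the validation loss
    if val_loss < best_loss:
        best_loss, wait = val_loss, 0
    else:
        wait += 1
        if wait >= patience:                 # early stopping
            break
```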

4. Results

4.1. Experimental Results

This section conducts a systematic comparative analysis of the proposed BAIT-RUL model against state-of-the-art baseline models, including LSTM [41], Transformer [42], Transformer-LSTM [43], CNN-BiLSTM [44], and BiGRU-Transformer-Attention [45], in terms of prediction performance on the PRONOSTIA and XJTU-SY rolling bearing datasets.

4.1.1. PRONOSTIA Bearing Dataset

Figure 8 shows the remaining life (RUL) prediction results of various models on the PRONOSTIA dataset for the first working condition, Bearing1_3. Each figure illustrates the actual RUL (blue line), predicted RUL (red line), 95% confidence interval (shaded red area), and absolute prediction error (gray area). In terms of the overall trend, the BAIT-RUL model proposed in this paper demonstrates higher consistency and prediction accuracy compared to the other models, maintaining stable predictions throughout the entire bearing life cycle. The quantitative comparison of RMSE and MAE on the test set is summarized in Table 4, which further supports the performance improvement of BAIT-RUL in numerical prediction.
The LSTM model initially tracks the actual RUL effectively but exhibits increasing deviation, with a widening 95% confidence interval and rising absolute error, highlighting its capability for short-term dependencies but limitations in capturing the complex, nonlinear dynamics of long-term degradation. The Transformer model performs well in the early stages, with the predicted RUL closely matching the actual RUL, yet it shows fluctuations, an expanding 95% confidence interval, and growing error as degradation intensifies, indicating its strength in long-sequence modeling but weakness in addressing local nonlinear features and abrupt changes. The Transformer-LSTM hybrid enhances mid-stage alignment with the actual RUL by leveraging LSTM’s memory, but a persistently wide 95% confidence interval and increasing error in later stages reveal its difficulty in capturing local mutations, likely due to LSTM’s memory decay.
The CNN-BiLSTM model reasonably tracks the actual RUL in the early and middle stages, maintaining a moderately stable 95% confidence interval, but displays greater deviation and error in the late stages, suggesting that while it benefits from CNN’s local feature extraction and BiLSTM’s bidirectional temporal modeling, it struggles with long-term dynamic trends as degradation complexity increases. The BiGRU-Transformer-Attention model surpasses others in the mid-degradation phase, achieving closer RUL alignment and a narrower 95% confidence interval, yet it over-predicts with rising error and a wider interval in the final stage, indicating that its attention mechanism, while effective for key features, lacks precision for short-term abrupt changes. In contrast, the BAIT-RUL model consistently excels, maintaining a tight 95% confidence interval and minimal absolute error throughout, underscoring its robustness in effectively balancing global trend modeling with sensitivity to local degradation patterns.
The BAIT-RUL model proposed in this paper maintains stable prediction accuracy across different degradation stages, as illustrated in Figure 8. The prediction curves align closely with the actual RUL curves, with errors kept to within 0.05 in most intervals. This performance is further supported by the numerical comparison results in Table 4, which show a reduction in RMSE and MAE for BAIT-RUL compared to other models.
The absolute prediction errors shown in the figure further illustrate the stability of the predictions of the model proposed in this paper across the entire bearing life cycle. Compared with other models, the error fluctuation is minimized and more evenly distributed, reflecting good adaptability to different degradation stages and patterns. Figure 9 presents the boxplot analysis of RMSE and MAE for all models, which provides additional insights into the performance consistency across different bearings. The compact distribution of the BAIT-RUL model in Figure 9 suggests that it maintains relatively stable performance across varying degradation patterns.
Table 4 summarizes the RMSE and MAE values for all models on the PRONOSTIA dataset, showing that the proposed model achieves lower prediction errors for 10 out of 11 bearings across the three operating conditions. For the remaining bearing (Bearing1_4), the model still demonstrates competitive performance.

4.1.2. XJTU-SY Bearing Dataset

Figure 10 demonstrates the prediction results for Bearing1_3 of the XJTU-SY dataset. Overall, the BAIT-RUL model prediction curves are highly consistent with the real RUL, exhibiting high accuracy and stability, which aligns with the conclusions drawn from the PRONOSTIA dataset. Table 5 provides a numerical comparison of RMSE and MAE across different models, further validating the performance advantage of BAIT-RUL.
The LSTM model exhibits significant deviations in both the early and late stages of degradation, with the predicted RUL diverging markedly from the actual RUL, a wide 95% confidence interval, and high absolute prediction error, indicating its difficulty in accurately capturing the degradation trend across all stages. The Transformer and Transformer-LSTM models perform stably in the early and middle stages of degradation, but their errors increase in the later stages. The former struggles to capture local nonlinear features, while the latter, despite LSTM improving long-term trend modeling, remains insensitive to local mutations.
The CNN-BiLSTM model performs well in the early stages, with the predicted RUL closely following the actual RUL, but as the degradation process becomes more complex, the error increases significantly in the later stages, with rising absolute error, suggesting limitations in adapting to long-term dynamic changes. The BiGRU-Transformer-Attention model improves prediction smoothness but still exhibits errors at the end of the degradation, with the attention mechanism showing limitations in capturing short-term features. Figure 11 presents the RMSE and MAE distribution for all models using boxplots, further highlighting the stability of BAIT-RUL’s predictions.
The BAIT-RUL model outperforms the benchmark model on Bearing1_3. However, BAIT-RUL slightly underperforms Transformer-LSTM on Bearing2_4, possibly because Bearing2_4’s degradation is more continuous and linear, whereas LSTM holds an advantage for such trends. Overall, the results summarized in Table 5 and Figure 11 confirm that the BiGRU-ASSA-iTransformer model maintains strong generalization and prediction ability across complex degradation patterns on both the XJTU-SY and PRONOSTIA datasets.

5. Discussion

5.1. Ablation Experiment

In this section, ablation experiments were conducted to verify the validity of the proposed model and to assess the influence of each component module on the prediction of the remaining bearing life. The experiments were conducted on Bearing1_1 and Bearing2_4 of the PRONOSTIA dataset and Bearing1_3 and Bearing2_5 of the XJTU-SY dataset, and five different variants were used for the comparative analysis to examine the contributions of the different modules. Due to space constraints, only the prediction results for Bearing2_4 of the PRONOSTIA dataset are shown in this paper. Figure 12 presents the overall performance comparison of different ablation models on the PRONOSTIA dataset, while Table 6 summarizes the RMSE and MAE results for Bearing1_3 and Bearing2_4 (PRONOSTIA) and Bearing1_3 and Bearing2_5 (XJTU-SY).
BiGRU, as a benchmark model, relies heavily on a two-way loop structure to model time-series relationships. However, the model fails to fully utilize the global patterns and short-term change characteristics in time series, and thus has some limitations in predicting the degradation trend. For this reason, the study introduces the iTransformer to strengthen the temporal encoding capability of the features and incorporates the ASSA mechanism to focus on short-term localized changes. As shown in Figure 12, removing the iTransformer component significantly degrades the model’s long-term prediction ability.
On the other hand, the ASSA mechanism mainly focuses on short-term feature extraction, playing a key role in identifying local degradation trend changes. Table 6 shows that removing the ASSA mechanism leads to increased RMSE and MAE, indicating reduced sensitivity to short-term variations.
The experiments also validate the necessity of the bidirectional BiGRU structure. When a unidirectional GRU is used instead of the BiGRU, the predictive ability of the model decreases. In contrast, the bidirectional structure better captures past and future information, improving the accuracy of modeling degradation trends.
Figure 12 further confirms that the BAIT-RUL model achieves lower prediction errors compared to its ablated variants, demonstrating the contribution of each module to the overall performance.

5.2. Impact of ASSA on Feature Weighting in RUL Prediction

To deeply investigate the role of ASSA in the prediction of RUL of bearings, the attention matrix of the test samples is illustrated in Figure 13. The traditional BiGRU model, in the absence of an attention mechanism, assigns equal weights to the features at all time steps, ignoring the fact that the critical degradation stage may have a higher contribution. However, the dynamic nature of the bearing degradation process suggests that there are significant differences in the contribution of features from different time steps to RUL prediction.
The attention matrix visualization clearly reveals this property through a 3D heat map, where both the horizontal (key position) and vertical (query position) coordinates denote a sequence of time steps ranging from 0 to 30, reflecting the 30-time-step length of the model inputs. The time series is divided by a sliding-window method (window size of 30, step size of 10), with each window covering a shorter segment of the signal in order to control the computational complexity while capturing short-term dynamic changes. As shown in the heat map, the most recent time steps are assigned larger attention weights, indicating their higher importance for RUL prediction. This phenomenon is consistent with the physical law of bearing degradation: the later stages of degradation are usually accompanied by significant feature changes (e.g., a sharp increase in vibration amplitude), which contribute more critically to the RUL prediction.
Although it is difficult to directly explain the physical meanings of the high-level features extracted by BiGRU, the introduction of ASSA significantly enhances the model’s ability to model the degradation process by quantifying the differences in the contributions of the features at each time step. ASSA not only highlights the features at the critical time step but also retains the potential information at other time steps, ensuring the comprehensiveness of feature utilization. This dynamic weight allocation strategy effectively enhances the prediction accuracy of the BAIT-RUL model, especially in the late stage of degradation showing lower prediction errors, which fully demonstrates the important value of ASSA in the prediction of complex degradation patterns.

6. Conclusions

In this paper, a rolling bearing remaining useful life (RUL) prediction model (BAIT-RUL) incorporating handcrafted features is proposed to enhance the operational reliability prediction capability of spindle bearings in complex aerospace structural component machining workshops. Aiming at the limitations of traditional prediction models, which pay insufficient attention to differences in time-step contributions and underutilize handcrafted features, the BAIT-RUL model constructs two parallel feature processing branches by integrating a bidirectional gated recurrent unit (BiGRU), an adaptive sparse self-attention mechanism (ASSA), an iTransformer, and a handcrafted feature extraction module. The BiGRU-ASSA branch effectively captures the dynamic degradation features in the original vibration signals and uses ASSA to focus on later time steps, extracting the rich degradation information contained in these phases. The iTransformer branch, in turn, optimizes the temporal correlations of 13 handcrafted features through the encoder to fully exploit the potential value of domain knowledge. Finally, the model fuses the features of the two branches through the attention mechanism, significantly reducing the prediction error. The experimental results show that BAIT-RUL outperforms existing state-of-the-art methods on the PRONOSTIA dataset (used in the IEEE PHM 2012 Data Challenge) and the XJTU-SY dataset.
Despite the promising results achieved in this study, there are still some limitations that deserve further exploration. (1) Although the 13 handcrafted features adopted in this paper cover the time domain, frequency domain, and statistical domain, they may not fully capture the specific degradation patterns under extreme working conditions (e.g., high loads, high-speed rotation, or low-temperature environments) encountered in aerospace spindle bearings. Future research could introduce an aerospace condition-based feature screening mechanism or apply advanced feature engineering techniques to improve the quality of handcrafted features and enhance adaptability to aerospace applications. (2) This study validated the model on the PRONOSTIA and XJTU-SY datasets; however, these datasets do not fully represent the diversity of bearing types under complex aerospace working conditions. In the future, the model could be extended to more aerospace industry scenarios (e.g., different working conditions and bearing types) for validation, further improving its generalization ability.

Author Contributions

Conceptualization, Y.L. and Y.C.; methodology, Q.Q.; writing—original draft preparation, Q.Q.; writing—review and editing, Y.L.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China under grant No. 2022YFB3302700 and the National Natural Science Foundation of China under grant No. 52375486.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RUL: Remaining useful life
BiGRU: Bidirectional gated recurrent unit
ASSA: Adaptive sparse self-attention
BAIT-RUL: BiGRU-ASSA-iTransformer for remaining useful life prediction
GRU: Gated recurrent unit
CNN: Convolutional neural network
LSTM: Long short-term memory network
MLP: Multi-layer perceptron
RMSE: Root mean square error
MAE: Mean absolute error

References

  1. Yang, G.; Tan, Q.; Tian, Z.; Jiang, X.; Chen, K.; Lu, Y.; Liu, W.; Yuan, P. Integrated Optimization of Process Planning and Scheduling for Aerospace Complex Component Based on Honey-Bee Mating Algorithm. Appl. Sci. 2023, 13, 5190. [Google Scholar] [CrossRef]
  2. Adamopoulou, E.; Daskalakis, E. Applications and Technologies of Big Data in the Aerospace Domain. Electronics 2023, 12, 2225. [Google Scholar] [CrossRef]
  3. Tang, Y.; Zhang, J.; Tian, H.; Liu, H.; Zhao, W. Optimization Method of Spindle Speed with the Consideration of Chatter and Forced Vibration for Five-Axis Flank Milling. Int. J. Adv. Manuf. Technol. 2023, 125, 3159–3169. [Google Scholar] [CrossRef]
  4. Peng, J.; Yin, M.; Cao, L.; Xie, L.-F.; Wang, X.-J.; Yin, G.-F. Study on the Thermally Induced Spindle Angular Errors of a Five-Axis CNC Machine Tool. Adv. Manuf. 2023, 11, 75–92. [Google Scholar] [CrossRef]
  5. Wang, Z.; Wang, S.; Wang, S.; Zhao, Z.; Yang, T.; Su, Z. Prediction of Five-Axis Machining-Induced Residual Stress Based on Cutting Parameter Identification. J. Manuf. Processes 2023, 103, 320–336. [Google Scholar] [CrossRef]
  6. Mao, W.; Liu, Y.; Ding, L.; Safian, A.; Liang, X. A New Structured Domain Adversarial Neural Network for Transfer Fault Diagnosis of Rolling Bearings under Different Working Conditions. IEEE Trans. Instrum. Meas. 2021, 70, 3509013. [Google Scholar] [CrossRef]
  7. Xu, J.; Ma, B.; Fan, Y.; Ding, X. ATPRINPM: A Single-Source Domain Generalization Method for the Remaining Useful Life Prediction of Unknown Bearings. In Proceedings of the 2022 International Conference on Sensing, Measurement & Data Analytics in the Era of Artificial Intelligence (ICSMD), Harbin, China, 30 November–2 December 2022; pp. 1–6. [Google Scholar]
  8. Chen, J.; Huang, R.; Chen, Z.; Mao, W.; Li, W. Transfer Learning Algorithms for Bearing Remaining Useful Life Prediction: A Comprehensive Review from an Industrial Application Perspective. Mech. Syst. Signal Process. 2023, 193, 110239. [Google Scholar] [CrossRef]
  9. Jin, Y.; Yang, X.; Liu, J.; Yang, Y.; Hei, X.; Shangguan, A. An Improved Nonlinear Health Index CRRMS for the Remaining Useful Life Prediction of Rolling Bearings. Actuators 2025, 14, 88. [Google Scholar] [CrossRef]
  10. Zhong, Z.; Zhao, Y.; Yang, A.; Zhang, H.; Zhang, Z. Prediction of Remaining Service Life of Rolling Bearings Based on Convolutional and Bidirectional Long- and Short-Term Memory Neural Networks. Lubricants 2022, 10, 170. [Google Scholar] [CrossRef]
  11. Chen, Z.; Wu, M.; Zhao, R.; Guretno, F.; Yan, R.; Li, X. Machine Remaining Useful Life Prediction via an Attention-Based Deep Learning Approach. IEEE Trans. Ind. Electron. 2021, 68, 2521–2531. [Google Scholar] [CrossRef]
  12. Zhou, S.; Chen, D.; Pan, J.; Shi, J.; Yang, J. Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16 June 2024; pp. 2952–2963. [Google Scholar]
  13. Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. arXiv 2023, arXiv:2310.06625. [Google Scholar] [CrossRef]
  14. Shen, J.; Zhou, H.; Jin, M.; Jin, Z.; Wang, Q.; Mu, Y.; Hong, Z. RUL Prediction of Rolling Bearings Based on Fruit Fly Optimization Algorithm Optimized CNN-LSTM Neural Network. Lubricants 2025, 13, 81. [Google Scholar] [CrossRef]
  15. Shutin, D.; Bondarenko, M.; Polyakov, R.; Stebakov, I.; Savin, L. Method for On-Line Remaining Useful Life and Wear Prediction for Adjustable Journal Bearings Utilizing a Combination of Physics-Based and Data-Driven Models: A Numerical Investigation. Lubricants 2023, 11, 33. [Google Scholar] [CrossRef]
  16. Wei, Y.; Wu, D.; Terpenny, J. Bearing Remaining Useful Life Prediction Using Self-Adaptive Graph Convolutional Networks with Self-Attention Mechanism. Mech. Syst. Signal Process. 2023, 188, 110010. [Google Scholar] [CrossRef]
  17. Rathore, M.S.; Harsha, S.P. An Attention-Based Stacked BiLSTM Framework for Predicting Remaining Useful Life of Rolling Bearings. Appl. Soft Comput. 2022, 131, 109765. [Google Scholar] [CrossRef]
  18. Park, Y.-I.; Song, J.W.; Kang, S.-J. Pseudo-Label-Vector-Guided Parallel Attention Network for Remaining Useful Life Prediction. IEEE Trans. Ind. Informat. 2023, 19, 5602–5611. [Google Scholar] [CrossRef]
  19. Nie, L.; Xu, S.; Zhang, L. Multi-Head Attention Network with Adaptive Feature Selection for RUL Predictions of Gradually Degrading Equipment. Actuators 2023, 12, 158. [Google Scholar] [CrossRef]
  20. Peng, H.; Jiang, B.; Mao, Z.; Liu, S. Local Enhancing Transformer with Temporal Convolutional Attention Mechanism for Bearings Remaining Useful Life Prediction. IEEE Trans. Instrum. Meas. 2023, 72, 3522312. [Google Scholar] [CrossRef]
  21. Zhao, Q.; Zhang, X.; Wang, F.; Fan, P.; Mbeka, E. The Effect of the Head Number for Multi-Head Self-Attention in Remaining Useful Life Prediction of Rolling Bearing and Interpretability. Neurocomputing 2025, 616, 128946. [Google Scholar] [CrossRef]
  22. Wang, L.; Cao, H.; Chen, X. Information Guided Attention Network for Bearing Remaining Useful Life Prediction Adaptive to Working Conditions and Fault Modes. Eng. Appl. Artif. Intell. 2025, 147, 110197. [Google Scholar] [CrossRef]
  23. Gao, P.; Wang, J.; Shi, Z.; Ming, W.; Chen, M. Long-Term Temporal Attention Neural Network with Adaptive Stage Division for Remaining Useful Life Prediction of Rolling Bearings. Reliab. Eng. Syst. Saf. 2024, 251, 110218. [Google Scholar] [CrossRef]
  24. Xiang, S.; Qin, Y.; Zhu, C.; Wang, Y.; Chen, H. LSTM Networks Based on Attention Ordered Neurons for Gear Remaining Life Prediction. ISA Trans. 2020, 106, 343–354. [Google Scholar] [CrossRef]
  25. Zhou, K.; Tang, J. A Wavelet Neural Network Informed by Time-Domain Signal Preprocessing for Bearing Remaining Useful Life Prediction. Appl. Math. Modell. 2023, 122, 220–241. [Google Scholar] [CrossRef]
  26. Cao, X.; Zhang, F.; Zhao, J.; Duan, Y.; Guo, X. Remaining Useful Life Prediction of Rolling Bearing Based on Multi-Domain Mixed Features and Temporal Convolutional Networks. Appl. Sci. 2024, 14, 2354. [Google Scholar] [CrossRef]
  27. Niazi, S.G.; Huang, T.; Zhou, H.; Bai, S.; Huang, H.-Z. Multi-Scale Time Series Analysis Using TT-ConvLSTM Technique for Bearing Remaining Useful Life Prediction. Mech. Syst. Signal Process. 2024, 206, 110888. [Google Scholar] [CrossRef]
  28. Yang, C.; Ma, J.; Wang, X.; Li, X.; Li, Z.; Luo, T. A Novel Based-Performance Degradation Indicator RUL Prediction Model and Its Application in Rolling Bearing. ISA Trans. 2022, 121, 349–364. [Google Scholar] [CrossRef]
  29. Cui, L.; Xiao, Y.; Liu, D.; Han, H. Digital Twin-Driven Graph Domain Adaptation Neural Network for Remaining Useful Life Prediction of Rolling Bearing. Reliab. Eng. Syst. Saf. 2024, 245, 109991. [Google Scholar] [CrossRef]
  30. Xu, Z.; Bashir, M.; Liu, Q.; Miao, Z.; Wang, X.; Wang, J.; Ekere, N. A Novel Health Indicator for Intelligent Prediction of Rolling Bearing Remaining Useful Life Based on Unsupervised Learning Model. Comput. Ind. Eng. 2023, 176, 108999. [Google Scholar] [CrossRef]
  31. Wen, L.; Su, S.; Li, X.; Ding, W.; Feng, K. GRU-AE-Wiener: A Generative Adversarial Network Assisted Hybrid Gated Recurrent Unit with Wiener Model for Bearing Remaining Useful Life Estimation. Mech. Syst. Signal Process. 2024, 220, 111663. [Google Scholar] [CrossRef]
  32. Wang, X.; Xie, G.; Zhang, Y.; Liu, H.; Zhou, L.; Liu, W.; Gao, Y. The Application of a BiGRU Model with Transformer-Based Error Correction in Deformation Prediction for Bridge SHM. Buildings 2025, 15, 542. [Google Scholar] [CrossRef]
  33. Zhang, B.; Yin, Y.; Li, B.; He, S.; Song, J. A Hybrid Algorithm for Predicting the Remaining Service Life of Hybrid Bearings Based on Bidirectional Feature Extraction. Measurement 2025, 242, 116152. [Google Scholar] [CrossRef]
  34. A Deep Transfer Network Based on Dual-Task Learning for Predicting the Remaining Useful Life of Rolling Bearings. Available online: https://colab.ws/articles/10.1088%2F1361-6501%2Fadafd2 (accessed on 20 March 2025).
  35. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
  36. Nectoux, P.; Gouriveau, R.; Medjaher, K.; Ramasso, E.; Chebel-Morello, B.; Zerhouni, N.; Varnier, C. PRONOSTIA: An Experimental Platform for Bearings Accelerated Degradation Tests. In Proceedings of the Conference on Prognostics and Health Management, IEEE Catalog Number: CPF12PHM-CDR. Denver, CO, USA, 18–21 June 2012; pp. 1–8. [Google Scholar]
  37. Lei, Y.; Han, T.; Wang, B.; Li, N.; Yan, T.; Yang, J. XJTU-SY Rolling Element Bearing Accelerated Life Test Datasets: A Tutorial. Researchgate 2019, 16, 001. [Google Scholar] [CrossRef]
  38. Wang, B.; Lei, Y.; Li, N.; Li, N. A Hybrid Prognostics Approach for Estimating Remaining Useful Life of Rolling Element Bearings. IEEE Trans. Rel. 2020, 69, 401–412. [Google Scholar] [CrossRef]
  39. Meng, Z.; Ma, S.; Cao, W.; Li, J.; Cao, L.; Fan, F.; Wang, X. A Remaining Useful Life Prediction Method of Rolling Bearings by RSA-BAFT Combined with Copula Entropy Feature Selection. Expert Syst. Appl. 2025, 275, 127100. [Google Scholar] [CrossRef]
  40. Guo, W.; Li, F.; Zhang, P.; Luo, L. A Stage-Related Online Incremental Transfer Learning-Based Remaining Useful Life Prediction Method of Bearings. Appl. Soft Comput. 2025, 169, 112491. [Google Scholar] [CrossRef]
  41. Ma, M.; Mao, Z. Deep-Convolution-Based LSTM Network for Remaining Useful Life Prediction. IEEE Trans. Ind. Informat. 2021, 17, 1658–1667. [Google Scholar] [CrossRef]
  42. Kim, S.; Seo, Y.-H.; Park, J. Transformer-Based Novel Framework for Remaining Useful Life Prediction of Lubricant in Operational Rolling Bearings. Reliab. Eng. Syst. Saf. 2024, 251, 110377. [Google Scholar] [CrossRef]
  43. Mu, H.; Zhai, X.; Yin, D.; Qiao, F. A Method of Remaining Useful Life Prediction of Multi-Source Signals Aero-Engine Based on RF-Transformer-LSTM. In Proceedings of the 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Prague, Czech Republic, 9–12 October 2022; pp. 2502–2507. [Google Scholar]
  44. Jia, J.; Yang, Y.; Guo, J.; Dai, L. A Hybrid CNN-BiLSTM and Wiener Process-Based Prediction Approach of Remaining Useful Life for Rolling Bearings. Comput. Res. Prog. Appl. Sci. Eng. 2022, 8, 2817. [Google Scholar] [CrossRef]
  45. An, X.; Zhang, C.; Liu, C.; Liu, G.; Hao, J. Residual Life Prediction of Rolling Bearings Based on Transformer-BiGRU-Attention Model with Improved Sparrow Optimization Algorithm. In Proceedings of the TEPEN International Workshop on Fault Diagnostic and Prognostic, Qingdao, China, 8–11 May 2024; Springer Nature: Cham, Switzerland, 2024; pp. 23–33. [Google Scholar]
Figure 1. Architecture of the proposed BAIT-RUL model.
Figure 2. Structure of the BiGRU-ASSA branch.
Figure 3. Standard BiGRU for regression problems.
Figure 4. Structure of the iTransformer branch.
Figure 5. Vibration time-series plots for bearings 1_1, 2_1, and 3_1 in the PRONOSTIA dataset.
Figure 6. Vibration time-series plots for bearings 1_1, 2_1, and 3_1 in the XJTU-SY dataset.
Figure 7. Handcrafted feature extraction for PRONOSTIA dataset Bearing1_3.
Figure 8. Prediction results for Bearing1_3 in the PRONOSTIA dataset using different approaches: (a) LSTM; (b) Transformer; (c) Transformer-LSTM; (d) CNN-BiLSTM; (e) BiGRU-Transformer-Attention; (f) BAIT-RUL.
Figure 9. Boxplot of RMSE and MAE distribution for different models on the PRONOSTIA dataset.
Figure 10. Prediction results for Bearing1_3 in the XJTU-SY dataset using different approaches: (a) LSTM; (b) Transformer; (c) Transformer-LSTM; (d) CNN-BiLSTM; (e) BiGRU-Transformer-Attention; (f) BAIT-RUL.
Figure 11. Boxplot of RMSE and MAE distribution for different models on the XJTU-SY dataset.
Figure 12. Performance comparison of ablation models on the PRONOSTIA dataset.
Figure 13. Attention matrix of one sample.
Table 1. Division of training and test sets in the PRONOSTIA dataset.
Dataset | Condition 1 | Condition 2 | Condition 3
Learning set | Bearing1_1, Bearing1_2 | Bearing2_1, Bearing2_2 | Bearing3_1, Bearing3_2
Test set | Bearing1_3, Bearing1_4, Bearing1_5, Bearing1_6, Bearing1_7 | Bearing2_3, Bearing2_4, Bearing2_5, Bearing2_6, Bearing2_7 | Bearing3_3
Table 2. Division of training and test sets in the XJTU-SY dataset.
Dataset | Condition 1 | Condition 2 | Condition 3
Learning set | Bearing1_1, Bearing1_2 | Bearing2_1, Bearing2_2, Bearing2_3 | Bearing3_1, Bearing3_2, Bearing3_3
Test set | Bearing1_3, Bearing1_4, Bearing1_5 | Bearing2_4, Bearing2_5 | Bearing3_4, Bearing3_5
Table 3. Summary of the XJTU-SY bearing dataset.
Condition | Dataset | Sample Size | Work Time | Fault Type
1 | Bearing1_1 | 123 | 2 h 3 min | Outer race fault
1 | Bearing1_2 | 161 | 2 h 41 min | Outer race fault
1 | Bearing1_3 | 158 | 2 h 38 min | Outer race fault
1 | Bearing1_4 | 122 | 2 h 2 min | Cage fault
1 | Bearing1_5 | 52 | 52 min | Inner race fault
2 | Bearing2_1 | 491 | 8 h 11 min | Inner race fault
2 | Bearing2_2 | 161 | 2 h 41 min | Outer race fault
2 | Bearing2_3 | 533 | 8 h 53 min | Cage fault
2 | Bearing2_4 | 42 | 42 min | Outer race fault
2 | Bearing2_5 | 339 | 5 h 39 min | Outer race fault
3 | Bearing3_1 | 2538 | 42 h 18 min | Outer race fault
3 | Bearing3_2 | 2496 | 41 h 18 min | Compound fault
3 | Bearing3_3 | 371 | 6 h 11 min | Inner race fault
3 | Bearing3_4 | 1515 | 25 h 15 min | Inner race fault
3 | Bearing3_5 | 114 | 1 h 54 min | Outer race fault
Table 4. RMSE and MAE comparison of different models on the PRONOSTIA dataset (each cell: RMSE/MAE).
Test Bearing | LSTM | Transformer | Transformer-LSTM | CNN-BiLSTM | BiGRU-Transformer-Attention | BAIT-RUL
Bearing1_3 | 0.110/0.102 | 0.086/0.069 | 0.054/0.050 | 0.081/0.063 | 0.052/0.035 | 0.038/0.033
Bearing1_4 | 0.116/0.104 | 0.090/0.071 | 0.063/0.045 | 0.095/0.078 | 0.034/0.028 | 0.046/0.036
Bearing1_5 | 0.123/0.112 | 0.091/0.082 | 0.080/0.067 | 0.058/0.042 | 0.068/0.056 | 0.024/0.016
Bearing1_6 | 0.118/0.101 | 0.103/0.091 | 0.101/0.085 | 0.073/0.054 | 0.074/0.057 | 0.057/0.044
Bearing1_7 | 0.132/0.121 | 0.106/0.090 | 0.059/0.048 | 0.104/0.091 | 0.063/0.054 | 0.048/0.042
Bearing2_3 | 0.138/0.120 | 0.123/0.100 | 0.099/0.088 | 0.124/0.097 | 0.072/0.054 | 0.061/0.039
Bearing2_4 | 0.171/0.158 | 0.135/0.110 | 0.125/0.099 | 0.121/0.097 | 0.088/0.076 | 0.054/0.040
Bearing2_5 | 0.165/0.152 | 0.117/0.104 | 0.087/0.079 | 0.146/0.118 | 0.126/0.091 | 0.082/0.062
Bearing2_6 | 0.123/0.112 | 0.098/0.082 | 0.090/0.075 | 0.120/0.090 | 0.078/0.061 | 0.054/0.038
Bearing2_7 | 0.131/0.109 | 0.105/0.090 | 0.097/0.086 | 0.123/0.084 | 0.059/0.050 | 0.050/0.040
Bearing3_3 | 0.089/0.076 | 0.056/0.044 | 0.041/0.035 | 0.055/0.042 | 0.182/0.171 | 0.029/0.021
Note: Values in bold denote the optimal performance among all comparative methods.
Table 5. RMSE and MAE comparison of different models on the XJTU-SY dataset (each cell: RMSE/MAE).
Test Bearing | LSTM | Transformer | Transformer-LSTM | CNN-BiLSTM | BiGRU-Transformer-Attention | BAIT-RUL
Bearing1_3 | 0.064/0.058 | 0.042/0.033 | 0.030/0.024 | 0.035/0.030 | 0.026/0.021 | 0.008/0.007
Bearing1_4 | 0.129/0.108 | 0.102/0.086 | 0.091/0.072 | 0.078/0.063 | 0.093/0.074 | 0.034/0.026
Bearing1_5 | 0.087/0.076 | 0.068/0.055 | 0.052/0.044 | 0.072/0.067 | 0.062/0.059 | 0.046/0.042
Bearing2_4 | 0.066/0.053 | 0.051/0.046 | 0.024/0.018 | 0.056/0.052 | 0.052/0.047 | 0.041/0.035
Bearing2_5 | 0.120/0.114 | 0.112/0.104 | 0.103/0.078 | 0.176/0.149 | 0.227/0.192 | 0.068/0.061
Bearing3_4 | 0.092/0.079 | 0.078/0.061 | 0.061/0.049 | 0.045/0.037 | 0.034/0.029 | 0.023/0.021
Bearing3_5 | 0.146/0.125 | 0.131/0.098 | 0.099/0.086 | 0.079/0.068 | 0.127/0.098 | 0.064/0.053
Note: Values in bold denote the optimal performance among all comparative methods.
Table 6. RMSE and MAE results of ablation models on the PRONOSTIA and XJTU-SY datasets (each cell: RMSE/MAE).
Dataset | Test Bearing | BiGRU | BiGRU + iTransformer | BiGRU + ASSA | GRU + ASSA + iTransformer | BAIT-RUL
PRONOSTIA | Bearing1_3 | 0.098/0.090 | 0.067/0.058 | 0.048/0.034 | 0.076/0.061 | 0.038/0.033
PRONOSTIA | Bearing2_4 | 0.119/0.090 | 0.066/0.058 | 0.069/0.054 | 0.071/0.048 | 0.054/0.040
XJTU-SY | Bearing1_3 | 0.068/0.055 | 0.014/0.012 | 0.024/0.020 | 0.018/0.015 | 0.008/0.007
XJTU-SY | Bearing2_5 | 0.199/0.168 | 0.071/0.063 | 0.150/0.130 | 0.187/0.156 | 0.068/0.061
Note: Values in bold denote the optimal performance among all comparative methods.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
