Article

An Adaptive BiGRU-ASSA-iTransformer Method for Remaining Useful Life Prediction of Bearing in Aerospace Manufacturing

1 Shanghai Engineering Research Center of Industrial Big Data and Intelligent System, Institute of Artificial Intelligence, Donghua University, Shanghai 201620, China
2 College of Information Science and Technology, Donghua University, Shanghai 201620, China
* Author to whom correspondence should be addressed.
Actuators 2025, 14(5), 238; https://doi.org/10.3390/act14050238
Submission received: 26 March 2025 / Revised: 7 May 2025 / Accepted: 8 May 2025 / Published: 9 May 2025
(This article belongs to the Section Actuators for Manufacturing Systems)

Abstract

In aerospace manufacturing, the reliability of machining equipment, particularly spindle bearings, is critical to maintaining productivity, as bearing health significantly constrains operational efficiency. Accurate prediction of the remaining useful life (RUL) of bearings can preempt failures, reduce downtime, and boost productivity. While conventional BiGRU-based models for bearing RUL prediction have shown promise, they often overlook handcrafted time-series features that could enhance accuracy. This study introduces a novel model, BiGRU-ASSA-iTransformer, that integrates deep learning and handcrafted feature extraction to improve RUL prediction. The approach employs two parallel processes with a fusion step: First, a bi-directional gated recurrent unit (BiGRU) captures dynamic degradation features from raw vibration signals, with an adaptive sparse self-attention (ASSA) mechanism emphasizing short-term degradation cues. Second, 13 time-domain, frequency-domain, and statistical features, derived from traditional expertise, are processed using iTransformer to encode temporal correlations. These outputs are then fused via an attention mechanism. Experiments on the PHM 2012 and XJTU-SY datasets demonstrate that this model achieves the lowest prediction error and highest accuracy compared to existing methods, highlighting the value of combining handcrafted and deep learning approaches for robust RUL prediction in aerospace applications.

1. Introduction

With the rapid development of industrial automation and intelligent manufacturing, machining workshops for complex aerospace structural parts place increasingly high demands on the operational reliability of key manufacturing resources. In these workshops, the reliability of manufacturing equipment is crucial to production efficiency, and five-axis machining centers and mill-turn centers are widely used as key resources for high-precision structural part machining [1,2]. However, such equipment is subject to various types of failures, including electrical faults, mechanical wear, transmission system malfunctions, and control system abnormalities [3]. Failure probabilities vary with the complexity of the working conditions, and analysis shows that mechanical wear is one of the main sources of failure. Among mechanical failures, spindle failure is particularly critical because of its direct impact on machining accuracy and equipment continuity. Within spindle failures, bearing failure is the most prominent problem: the combined effects of long-term high loads, over-speed operation, and environmental factors such as vibration and high temperature significantly restrict the overall availability of the equipment, making bearings the core reliability bottleneck [4,5,6]. Accurate prediction of the remaining useful life (RUL) of bearings can provide early warning of failures, reduce unplanned downtime, significantly improve productivity, and reduce maintenance costs [7].
Currently, bearing RUL prediction methods can be categorized into model-driven, data-driven, and hybrid approaches [8,9]. Model-driven methods rely on physical degradation models, but their accuracy often suffers under complex, noisy operating conditions due to parameter sensitivities, limiting their applicability. Hybrid methods attempt to integrate physical models with data but frequently face challenges related to data quality and modeling complexity. In contrast, data-driven methods exhibit greater flexibility by mining degradation patterns directly from sensor data, an advantage that has become more pronounced following the advent of deep learning. Recurrent neural networks (RNNs) and their variants (e.g., LSTM and GRU) have demonstrated robust performance in time-series modeling [10], and bidirectional gated recurrent units (BiGRU) offer enhanced capabilities for capturing bearing degradation processes via bidirectional propagation. However, the conventional BiGRU method still has shortcomings: on one hand, it typically relies solely on the final time-step features for prediction, overlooking the potential contributions of earlier degradation stages [11]; on the other hand, it often neglects handcrafted features derived from expert domain knowledge, limiting further improvements in accuracy.
Therefore, this paper proposes a rolling bearing RUL prediction model (BiGRU-ASSA-iTransformer, BAIT-RUL for short). The model integrates four modules, namely, an adaptive sparse self-attention mechanism (ASSA) [12], a bi-directional gated recurrent unit (BiGRU), iTransformer [13], and handcrafted feature extraction, and consists of two parallel processes: (1) A bi-directional gated recurrent unit (BiGRU) captures the dynamic degradation features of the original vibration signals over the whole life cycle, and the ASSA mechanism focuses attention on the key time steps of bearing degradation. (2) Thirteen handcrafted feature sequences related to bearing degradation (time-domain, frequency-domain, and statistical features) are extracted based on domain knowledge, and their intrinsic temporal correlations are modeled through iTransformer. Finally, the dynamic features captured by BiGRU and the handcrafted time-series features optimized by iTransformer are fused by the attention mechanism to further improve the prediction accuracy.
The structure of this paper is as follows: Section 2 analyzes the development status and technical characteristics of existing RUL prediction methods; Section 3 describes the construction process of the BiGRU-ASSA-iTransformer model and its theoretical foundation; Section 4 introduces the used datasets and experimental setups, and analyzes the results; Section 5 discusses the significance of the experiments; and Section 6 summarizes the research results and looks forward to the future direction.

2. Related Works

Bearing remaining useful life (RUL) prediction is a key technology in the field of machinery health management, and the widely used prediction methods mainly include model-driven, data-driven, and hybrid methods. Among them, data-driven methods have gradually dominated the research in recent years due to their advantages of not requiring accurate mathematical modeling, high adaptability, and automatic extraction of complex features [14,15,16]. In this section, we review the application of attention mechanisms and handcrafted feature extraction methods in bearing RUL prediction, analyze their technical characteristics and deficiencies, and thereby present the motivation for the research in this paper.

2.1. Attention Mechanism in Bearing Remaining Life Prediction

The attention mechanism extracts key information more effectively by dynamically assigning different weights to each part of a sequence, thus improving the model’s ability to handle complex time-series data. Initially, this mechanism was widely used in natural language processing, and in recent years, it has gradually been applied to the bearing RUL prediction task [17,18,19]. Peng et al. [20] proposed a multiscale temporal convolutional Transformer (MTCT), which combines a convolutional self-attention (CsA) mechanism with a temporal convolution network (TCN) attention module to extract long-term degradation features and local contextual correlations, achieving prediction directly from raw run-to-failure data. Zhao et al. [21] introduced a multi-head self-attention mechanism (MSM) to enhance feature representation and sequence modeling in RUL prediction, and analyzed the effect of head number on model accuracy, robustness, and interpretability by visualizing attention weight distributions and applying graph theory to interpret attention behavior. Similarly, the information-guided attention network (IGAN) designed by Wang et al. [22] utilizes multi-scale dilated convolution and a temporal attention mechanism to dynamically focus on different temporal locations of degradation features and enhance the model’s adaptability to the bearing degradation process. The adaptive stage-divided long temporal attention network (AD-LTAN) proposed by Gao et al. [23] identifies degradation initiation points through adaptive health stage dividing, and combines multilayer dilated convolution and a temporal attention mechanism to improve the prediction accuracy of long-life bearings. In addition, Xiang et al. [24] proposed a novel LSTM network with attention-ordered neurons (LSTM-AON) for gear remaining useful life (RUL) prediction, enhancing robustness and long-term prediction accuracy by integrating an attention-guided tree structure. The attention mechanism guides the hierarchical division of input and historical information, assigning physical meaning to different attention levels, thereby improving the network’s ability to prioritize and retain critical degradation information for accurate RUL prediction.
The standard self-attention mechanism can capture long-range dependencies by modeling the relationships among all elements of a sequence, but it suffers from high computational complexity, is easily disturbed by redundant information, and therefore has limitations in complex aerospace machining environments. Bearings in aerospace machining workshops operate under high loads, their degradation processes are markedly nonlinear, and capturing information from localized critical degradation phases is especially demanding. However, most existing attention mechanisms fail to achieve adaptive attention to local short-term features, which limits model performance under such complex working conditions.
The adaptive sparse self-attention mechanism (ASSA) was first proposed by Zhou et al. [12] in 2024 in an image recovery task. This mechanism adaptively filters low correlation features and highlights key features by combining sparse self-attention (SSA) and dense self-attention (DSA). SSA employs a sparse filtering mechanism based on the squared ReLU function, which efficiently suppresses redundant information interference, whereas the DSA retains core information through the softmax mechanism. In this study, considering that the ASSA mechanism can adaptively focus on more important regions in an image or sequence, it is introduced into the field of bearing RUL prediction to realize the effective capture of critical short-term information during bearing degradation.

2.2. Handcrafted Features in Bearing Remaining Life Prediction

In parallel with attention-based techniques, handcrafted feature extraction—using time-, frequency-, or statistical-domain features—remains valuable in bearing RUL prediction because it leverages established domain knowledge [25]. Researchers have demonstrated that combining handcrafted features with deep learning can enhance model performance. For example, Cao et al. [26] proposed a multidomain hybrid feature approach incorporating time, frequency, and entropy features for improved health indicator (HI) representation, while Niazi et al. [27] incorporated adaptive windows to refine handcrafted features over varying degradation stages. Likewise, Yang et al. [28] extracted multiple time-domain and entropy-based features—including variance, RMS, energy, sample entropy, shape factor, and Rényi entropy—from reconstructed vibration signals obtained via PCHIP-LCD and ISC selection. These features were then fused using improved independent component analysis and Mahalanobis distance to form a sensitive degradation indicator for RUL prediction. Other studies, such as that by Cui et al. [29], utilized graph convolutional networks (GCNs) in tandem with GRU for spatio-temporal feature learning and cross-domain feature alignment.
Although the combination of handcrafted features and deep learning shows some potential in RUL prediction, its application in complex aerospace structural part machining workshop scenarios still faces challenges. The degradation process of aerospace machining bearings is affected by the superposition of multiple sources of interference, and the traditional handcrafted features are usually used as static inputs, ignoring the dynamically changing characteristics of the features over time and failing to comprehensively capture the dynamic evolution of the bearing degradation process.
To solve the problem of dynamic modeling of feature sequences, Liu et al. [13] proposed the dimension-inverted Transformer (iTransformer). The model was initially designed for the generalized multivariate long-sequence prediction task. iTransformer efficiently models the dynamic correlations among variables by inverting the temporal and variable dimensions of the traditional Transformer so that each variable is independently embedded as a token. In this paper, we argue that this dynamic feature encoding approach is also advantageous for modeling bearing degradation, and thus iTransformer is introduced to model the temporal dynamics of handcrafted features and more effectively capture feature evolution during bearing degradation.
In summary, the current data-driven bearing RUL prediction methods still face the following deficiencies in complex aerospace processing environments: (1) the existing attention mechanism lacks precise attention to local short-term degradation of key features and sparse optimization; (2) the handcrafted feature methods tend to neglect the dynamic evolution process of the features, which makes it difficult to effectively characterize the changing law of degradation features under complex environments; (3) the deep learning methods lack effective combination with domain knowledge and fail to achieve dynamic feature modeling, thus failing to realize the efficient integration of dynamic features and handcrafted features.
To address the above deficiencies, this paper proposes a bearing RUL prediction model, BAIT-RUL, that combines BiGRU, the adaptive sparse self-attention mechanism (ASSA) [12], and the dimension-inverted Transformer (iTransformer) [13]. The model utilizes BiGRU to capture the long-term dependent features of the degradation process, applies ASSA to focus on the critical time steps and assign higher weights to those containing richer degradation information, introduces iTransformer to dynamically encode the handcrafted feature sequences in chronological order, and finally integrates the deep features with the temporally optimized features through the attention mechanism to enhance the prediction accuracy and robustness of the model under complex working conditions.

3. Methods

In this section, a bearing remaining useful life (RUL) prediction model, BAIT-RUL, is proposed, combining deep learning and domain knowledge to enhance prediction accuracy and robustness. The model processes raw vibration signals and handcrafted features separately through two parallel branches and an attention fusion mechanism to comprehensively capture dynamic changes and temporal correlations. The architecture of BAIT-RUL is illustrated in Figure 1.
The BiGRU-ASSA branch employs a three-layer bidirectional GRU (BiGRU) to extract dynamic features from raw vibration signals, capturing long-term temporal dependencies. An adaptive sparse self-attention (ASSA) mechanism dynamically weights the BiGRU outputs to emphasize short-term degradation features critical for RUL prediction. The iTransformer branch processes 13 handcrafted features (e.g., time-domain, frequency-domain, and statistical features) derived from domain knowledge, using an improved Transformer (iTransformer) to capture global dependencies and evolutionary trends in bearing degradation. These branches are complementary: the BiGRU-ASSA branch focuses on local, dynamic patterns, while the iTransformer branch models global trends, with their feature representations correlated through the attention fusion mechanism. The fusion mechanism computes weighted correlations between the branches’ outputs using multi-head attention, dynamically balancing the contributions of local and global features. For instance, in early degradation stages, the fusion mechanism may prioritize BiGRU-ASSA’s dynamic features to capture subtle signal changes, while in later stages, it may emphasize iTransformer’s handcrafted features to reflect stable trends. This correlation and interdependence enable the model to adapt to varying degradation phases, enhancing robustness.
The features from the two branches are fused via the attention mechanism, and RUL predictions are generated after dimensionality reduction by a fully connected layer. As shown in Figure 1, BAIT-RUL effectively integrates dynamic degradation information with the temporal characterization of handcrafted features.
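To make the fusion step concrete, the following is a minimal PyTorch sketch of fusing the two branch outputs with multi-head attention and a fully connected regression head. The module name, head count, and tensor shapes are illustrative assumptions; only the reported dimensions (a 128-dimensional BiGRU-ASSA output and a 13 × 16-dimensional iTransformer output) are taken from the paper.

```python
# Minimal sketch (PyTorch): fusing the BiGRU-ASSA output with the iTransformer
# output via multi-head attention. Module and parameter names are illustrative.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dyn_dim=128, hand_dim=16, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(hand_dim, dyn_dim)          # align feature dimensions
        self.attn = nn.MultiheadAttention(dyn_dim, n_heads, batch_first=True)
        self.head = nn.Linear(dyn_dim, 1)                 # RUL regression head

    def forward(self, dyn_feat, hand_feat):
        # dyn_feat:  (batch, 1, 128)  dynamic features from BiGRU-ASSA
        # hand_feat: (batch, 13, 16)  per-feature tokens from iTransformer
        hand_feat = self.proj(hand_feat)                  # (batch, 13, 128)
        fused, _ = self.attn(query=dyn_feat, key=hand_feat, value=hand_feat)
        return self.head(fused.squeeze(1))                # (batch, 1) predicted RUL

fusion = AttentionFusion()
rul = fusion(torch.randn(8, 1, 128), torch.randn(8, 13, 16))
```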

3.1. BiGRU-ASSA Module

To fully utilize the temporal information and dynamic features of the original vibration signals, this study designs a feature extraction path based on a bidirectional gated recurrent unit (BiGRU) with an adaptive sparse self-attention mechanism (ASSA). The structure of this feature extraction branch is illustrated in Figure 2. Specifically, the BiGRU module captures the long-term dependencies of the signals through a bidirectional propagation mechanism, while the ASSA module dynamically focuses on the important features at key time steps during the degradation process. The following sections describe the theoretical foundations and technical details of these two modules, respectively.

3.1.1. Bidirectional Gated Recurrent Unit

The remaining life prediction of bearings usually relies on time-series data collected from sensors (e.g., horizontal vs. vertical vibration signals) [30]. Recurrent neural networks (RNNs) can effectively capture the temporal dependence of the data due to their property of connecting nodes along a time series. However, traditional RNNs are prone to the problem of gradient vanishing or explosion during training, which limits their ability to learn long sequences. To overcome this problem, gated recurrent units (GRUs) have been developed, which selectively retain and forget historical information through the reset gate and update gate mechanisms, thus enhancing the ability to capture long-term dependencies and simplifying the network structure to improve computational efficiency [31].
Although GRU effectively improves the RNN’s ability to model long-term dependencies, unidirectional GRU only utilizes historical information and lacks effective mining of future information. To address this issue, this paper employs a bidirectional gated recurrent unit (BiGRU). The BiGRU contains two GRU structures with opposite directions, which can more comprehensively capture the characteristics of the dynamic evolution of the bearing degradation process by considering both past and future sequence information through forward and backward propagation [32]. For example, weak degradation signals in the early stages of a bearing may be more evident in subsequent stages of development, and a backward-propagating GRU can effectively capture and enhance such features.
In the specific implementation, let the input and hidden state at time step $t$ be $x_t$ and $h_t$, respectively; the reset gate $r_t$ and update gate $z_t$ of the GRU then dynamically regulate the information flow through the following equations:

$$r_t = \sigma\left(W_{xr} x_t + W_{hr} h_{t-1} + b_r\right),$$

$$z_t = \sigma\left(W_{xz} x_t + W_{hz} h_{t-1} + b_z\right),$$

where $\sigma$ is the Sigmoid activation function, $\{W_{xr}, W_{hr}, W_{xz}, W_{hz}\}$ is the set of weight matrices, and $b_r$, $b_z$ are the bias vectors. The update gate $z_t$ controls the fusion ratio of the historical state to the current input, and the reset gate $r_t$ determines how much of the historical memory is retained. This gating mechanism enables the GRU to adaptively filter noise and retain key features associated with degradation.

At time $t$, the hidden state update formulas of the GRU are:

$$\tilde{h}_t = \tanh\left(W_{xh} x_t + W_{hh}\left(r_t \odot h_{t-1}\right) + b_h\right),$$

$$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t,$$

where $\tilde{h}_t$ is the candidate hidden state, $\tanh(\cdot)$ is the hyperbolic tangent function, the symbol $\odot$ denotes the Hadamard product, and $W_{xh}$ and $W_{hh}$ are two weight matrices. In BiGRU, the final hidden state integrates the forward and backward outputs to fully characterize the input sequence.
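A minimal sketch of a single GRU step following the gate and state-update equations above; the weight shapes, right-multiplication convention, and bias names are illustrative assumptions rather than the authors' implementation.

```python
# One GRU time step: reset gate, update gate, candidate state, new hidden state.
import torch

def gru_step(x_t, h_prev, W_xr, W_hr, b_r, W_xz, W_hz, b_z, W_xh, W_hh, b_h):
    r_t = torch.sigmoid(x_t @ W_xr + h_prev @ W_hr + b_r)           # reset gate
    z_t = torch.sigmoid(x_t @ W_xz + h_prev @ W_hz + b_z)           # update gate
    h_cand = torch.tanh(x_t @ W_xh + (r_t * h_prev) @ W_hh + b_h)   # candidate state
    return (1 - z_t) * h_prev + z_t * h_cand                        # new hidden state

# Toy usage with hypothetical dimensions (input size 4, hidden size 64).
H, D = 64, 4
params = [torch.randn(D, H), torch.randn(H, H), torch.zeros(H),   # reset gate
          torch.randn(D, H), torch.randn(H, H), torch.zeros(H),   # update gate
          torch.randn(D, H), torch.randn(H, H), torch.zeros(H)]   # candidate state
h = gru_step(torch.randn(1, D), torch.zeros(1, H), *params)
```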
As an improved structure of the GRU, BiGRU consists of two GRU networks with opposite propagation directions. Figure 3 shows a BiGRU network with $K$ GRU units, where each node in the BiGRU contains information about the entire input sequence, allowing a better synthesis of feature extraction for all input samples.
Due to its strong sequential modeling capabilities, the BiGRU network has successfully been used for machine RUL prediction [33,34]. However, the BiGRU network used in these studies gives the same weight to all time steps in the bearing degradation process, ignoring the important contribution of critical time steps in the bearing degradation process, as shown in Figure 3.
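A minimal sketch of the three-layer bidirectional GRU used as the dynamic-feature extractor, here built with PyTorch's stock `nn.GRU` as a stand-in; the input size of 4, window length of 30, and hidden size of 64 follow the parameter settings reported later in Section 3.3.5.

```python
# Three-layer bidirectional GRU over raw vibration windows.
import torch
import torch.nn as nn

bigru = nn.GRU(input_size=4, hidden_size=64, num_layers=3,
               batch_first=True, bidirectional=True)

x = torch.randn(8, 30, 4)     # (batch, 30 time steps, 4 raw-signal channels)
out, _ = bigru(x)             # (8, 30, 128): forward and backward states concatenated
```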

3.1.2. Adaptive Sparse Self-Attention

In the bearing remaining useful life (RUL) prediction task, different time steps of the time-series data contribute unevenly to the final prediction [11]. In a complex aerospace structural component machining workshop, spindle bearings are affected by high loads, high speeds, and multi-source disturbances (e.g., high temperatures and high vibrations), and the degradation stage contains more critical information. In contrast, the contribution of the smooth data in the normal operation stage is smaller. The traditional BiGRU network can capture the dependencies in the time-series data, but it is unable to weight the importance of features and time steps, resulting in irrelevant information that may interfere with the model performance. For this reason, this paper introduces the adaptive sparse self-attention (ASSA) mechanism into the BiGRU network to enhance the model’s ability to focus on key features of aerospace machine bearing degradation.
ASSA dynamically filters information valuable for RUL prediction and suppresses redundant noise by combining sparse self-attention (SSA) and dense self-attention (DSA) branches. Specifically, the SSA branch employs squared-ReLU-based attention computation to filter out features with low query-key matching scores and ensure sparse attention to critical degradation time steps, while the DSA branch utilizes a softmax layer to retain dense feature information and compensate for over-sparsity. ASSA adaptively fuses the outputs of the two branches to weight the temporal features extracted by BiGRU and propagates the weighted results through the network. This dual-branch design allows the model to flexibly adapt to the complex patterns of bearing degradation in aerospace machining workshops [12].
In the application, the ASSA module receives the dynamic feature sequence output from BiGRU, and by assigning different weights to the features at each time step, it highlights the significant change points in the degradation process of the aerospace bearings, thus improving the accuracy and robustness of the prediction. Compared with the use of BiGRU alone, the incorporation of ASSA enables the model to automatically identify the time steps with the most predictive value and reduces the interference of irrelevant information, which is especially suitable for the demand of high-precision RUL prediction under complex aerospace working conditions.
First, given the input one-dimensional time-series features $X \in \mathbb{R}^{T \times C}$ (where $T$ is the number of time steps and $C$ is the feature dimension), the time series is divided into several non-overlapping segments according to a fixed window, and features are learned for each segment. Next, ASSA computes the attention scores by generating the query, key, and value matrices $Q$, $K$, and $V$:

$$Q = XW_Q, \quad K = XW_K, \quad V = XW_V,$$

where $W_Q, W_K, W_V \in \mathbb{R}^{C \times d}$ are linear projection matrices and $d$ is the projection dimension. The attention calculation can be defined as:

$$A = f\left(\frac{QK^{T}}{\sqrt{d}} + B\right)V,$$

where $A$ denotes the estimated attention, $B$ is the learnable relative positional bias, and $f(\cdot)$ is the scoring function that accounts for the temporal correlation of the time series.
ASSA utilizes two self-attention mechanisms to process the input features. The first is sparse self-attention (SSA), which filters out features with low matching scores through a ReLU-based squaring layer, removing noise and highlighting key time-series information. Its calculation formula is:

$$\mathrm{SSA} = \mathrm{ReLU}^{2}\left(\frac{QK^{T}}{\sqrt{d}} + B\right).$$

The second is dense self-attention (DSA), which computes the attention scores of all query-key pairs through a softmax layer, preserving the global dependency of the time series:

$$\mathrm{DSA} = \mathrm{SoftMax}\left(\frac{QK^{T}}{\sqrt{d}} + B\right).$$

To adaptively regulate these two mechanisms, ASSA introduces a weighting scheme that dynamically adjusts their contributions according to the performance of each branch. The final attention matrix $A$ is given by:

$$A = \left(w_1 \cdot \mathrm{SSA} + w_2 \cdot \mathrm{DSA}\right)V,$$

where $w_1, w_2 \in \mathbb{R}^{1}$ are two normalized weights for adaptively adjusting the two branches, obtained by:

$$w_n = \frac{e^{a_n}}{\sum_{i=1}^{N} e^{a_i}}, \quad n = 1, 2,$$

where $\{a_1, a_2\}$ are learnable parameters that provide adaptive control over the SSA and DSA branches. With this fusion strategy, the model can flexibly adjust the sparseness and denseness of the attention at different stages of bearing degradation to more accurately capture the evolutionary trend of key time steps and features, optimizing the modeling of degraded features in the time series.
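The following is a minimal sketch of the ASSA computation described above: squared-ReLU sparse attention, softmax dense attention, and softmax-normalized adaptive fusion of the two branches. The tensor shapes, projection dimension, and the omission of the relative positional bias $B$ are simplifying assumptions for illustration.

```python
# ASSA sketch: sparse (ReLU^2) and dense (softmax) attention fused with
# learnable, softmax-normalized weights.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASSA(nn.Module):
    def __init__(self, dim=128, proj_dim=64):
        super().__init__()
        self.W_q = nn.Linear(dim, proj_dim, bias=False)
        self.W_k = nn.Linear(dim, proj_dim, bias=False)
        self.W_v = nn.Linear(dim, proj_dim, bias=False)
        self.alpha = nn.Parameter(torch.zeros(2))    # learnable {a1, a2}
        self.scale = math.sqrt(proj_dim)

    def forward(self, x):                            # x: (batch, T, dim) from BiGRU
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        scores = q @ k.transpose(-2, -1) / self.scale        # (batch, T, T)
        ssa = F.relu(scores) ** 2                            # sparse branch
        dsa = F.softmax(scores, dim=-1)                      # dense branch
        w = F.softmax(self.alpha, dim=0)                     # normalized branch weights
        return (w[0] * ssa + w[1] * dsa) @ v                 # weighted attention output

assa = ASSA()
out = assa(torch.randn(8, 30, 128))                          # -> (8, 30, 64)
```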

3.2. iTransformer Module

To leverage the temporal information in domain knowledge-based handcrafted features, we employ an inverted Transformer (iTransformer) to optimize the time-series representations of 13 handcrafted features (e.g., time-domain, frequency-domain, and statistical features). The structure is shown in Figure 4. While end-to-end deep learning methods can extract degradation features, they often fail to incorporate a priori knowledge of physical degradation mechanisms in complex machining environments. The iTransformer processes handcrafted feature sequences to capture global dependencies and evolutionary trends, complementing the local, dynamic features extracted by the BiGRU-ASSA branch. The resulting global feature representations are correlated with BiGRU-ASSA’s dynamic features through the attention fusion mechanism, integrating local and global information to enhance RUL prediction accuracy. The handcrafted feature extraction process and iTransformer model are described in detail below.

3.2.1. Handcrafted Features

In the complex environment of an aerospace structural component machining workshop, spindle bearing degradation is influenced by high loads, high speeds, and multiple sources of disturbances (e.g., vibration, temperature variations), exhibiting significant time-series evolutionary characteristics. To effectively characterize these properties, this study draws on Refs. [25,26,27,28,29] to extract 13 handcrafted features from the original vibration signals, which have proven effective for predicting the remaining bearing life. These include time-domain features (e.g., root mean square (RMS), kurtosis, crest factor), frequency-domain features (e.g., power spectral density, spectral centroid), and statistical features (e.g., energy entropy). These features are computed using signal processing techniques and are designed to capture physical patterns in bearing degradation, such as amplitude changes, frequency distribution shifts, and signal complexity variations. The features are selected based on their strong correlation with the degradation process, providing physically meaningful and informative inputs for subsequent modeling.
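A minimal sketch of how a few representative handcrafted features of this kind could be computed from one vibration segment. The paper uses 13 features in total; only a subset is shown here, the formulas follow standard signal-processing definitions rather than the authors' exact implementation, and the sampling frequency `fs` is an assumed parameter.

```python
# Representative handcrafted features from a 1-D vibration segment.
import numpy as np
from scipy.stats import kurtosis

def handcrafted_features(x, fs=25600):
    """x: 1-D vibration segment; fs: sampling frequency in Hz (assumed)."""
    rms = np.sqrt(np.mean(x ** 2))                        # root mean square
    crest_factor = np.max(np.abs(x)) / rms                # peak / RMS
    kurt = kurtosis(x)                                    # impulsiveness of the signal
    spectrum = np.abs(np.fft.rfft(x)) ** 2                # power spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectral_centroid = np.sum(freqs * spectrum) / np.sum(spectrum)
    p = spectrum / np.sum(spectrum)
    energy_entropy = -np.sum(p * np.log(p + 1e-12))       # spectral energy entropy
    return np.array([rms, crest_factor, kurt, spectral_centroid, energy_entropy])
```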

3.2.2. iTransformer

The Transformer, proposed by Vaswani et al. in 2017, is a neural network architecture based on a self-attention mechanism originally designed for natural language processing tasks [35]. The traditional Transformer tokenizes by time step and faces the challenges of high computational complexity and confounded multivariate information in time-series prediction. For this reason, Liu et al. proposed iTransformer in 2024 to improve the traditional architecture through a dimension inversion (inverted) approach [13]. The method independently embeds each variable of a time series as a token, as shown in Figure 4; captures inter-variable correlations using the self-attention mechanism; and models the temporal evolution of each variable with the help of feed-forward neural networks (FFNs), thus improving the efficiency of long-sequence prediction and reducing information aliasing.
In this study, iTransformer is introduced for time-series dynamic modeling of the handcrafted features to effectively mine their dynamic evolution laws. The specific implementation steps are as follows:
1. Feature embedding (Embedding)

iTransformer adopts variable-level tokenization; all handcrafted features are first embedded, and each feature variable is mapped to a high-dimensional representation space:

$$h_n^{0} = \mathrm{Embedding}\left(X_{:,n}\right),$$

where $X_{:,n}$ denotes the $n$-th handcrafted feature sequence, and the embedding is implemented by a multilayer perceptron (MLP) to capture the initial representation of each feature.

2. Inter-feature interaction modeling (Self-Attention)

iTransformer computes the dependencies between different handcrafted features through the self-attention mechanism to extract global information and improve the feature representation. The encoding process is:

$$H^{l+1} = \mathrm{TrmBlock}\left(H^{l}\right), \quad l = 0, \ldots, L-1,$$

where $H$ is the embedded representation of all features and TrmBlock consists of multiple Transformer encoder layers, each containing multi-head self-attention (MHSA) and a feed-forward network (FFN).

3. Time-dependent modeling (FFN)
Since iTransformer adopts a feed-forward neural network (FFN) to independently learn the temporal evolution relationship of each feature, it can better portray the dynamic trend of handcrafted features without losing information due to feature overlapping.
The introduction of iTransformer not only preserves the domain knowledge of handcrafted features but also enhances their temporal correlation representation through variable-level modeling, providing high-quality inputs for subsequent feature fusion and RUL prediction.
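A minimal sketch of the inverted-tokenization idea used by this branch: each of the 13 handcrafted feature series is embedded as one token, self-attention mixes information across feature tokens, and a feed-forward projection models each feature's temporal evolution. Layer sizes and the use of PyTorch's stock encoder are illustrative assumptions; only the 13 × 16-dimensional output follows the paper's reported setting.

```python
# Inverted tokenization: one token per handcrafted feature series.
import torch
import torch.nn as nn

class ITransformerBranch(nn.Module):
    def __init__(self, seq_len=30, n_feats=13, d_model=64, out_dim=16, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(seq_len, d_model)              # variable-level tokenization
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        self.ffn = nn.Linear(d_model, out_dim)                # per-variable temporal projection

    def forward(self, x):                                     # x: (batch, seq_len, n_feats)
        tokens = self.embed(x.transpose(1, 2))                # (batch, 13, d_model)
        tokens = self.encoder(tokens)                         # attention across feature tokens
        return self.ffn(tokens)                               # (batch, 13, out_dim)

branch = ITransformerBranch()
out = branch(torch.randn(8, 30, 13))                          # -> (8, 13, 16)
```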

3.3. Experimental Setup

The experiments in this study were conducted on a computer equipped with a 13th Gen Intel(R) Core (TM) i9-13900HX 2.20 GHz processor (Intel Corporation, Santa Clara, CA, USA), 32 GB RAM, and an NVIDIA RTX 4060 GPU (NVIDIA Corporation, Santa Clara, CA, USA; CUDA 11.8). The operating system was Windows 11, the programming language was Python 3.9.21, and the deep learning framework was PyTorch 1.10.0. All experiments were conducted in the same hardware environment and hyperparameter settings to ensure the reproducibility and fairness of the experiments.

3.3.1. Bearing Datasets

To comprehensively verify the effectiveness and generalization ability of the proposed method, two widely used public bearing datasets are used in this paper: the PRONOSTIA bearing dataset from the IEEE PHM 2012 Data Challenge [36] and the XJTU-SY rolling bearing dataset [37]. These two datasets are collected under different operating conditions and cover a variety of degradation modes, which can effectively evaluate the adaptability and robustness of the model in diverse environments.
1. PRONOSTIA Dataset
The PRONOSTIA dataset from the IEEE PHM 2012 Data Challenge, provided by the French FEMTO-ST Institute, is a widely utilized public dataset for rolling bearing health monitoring and residual life prediction. PRONOSTIA employs constant operating conditions to study the degradation behavior of bearings under stable settings. The experimental platform comprises a motor, housing, and load system, with bearings operated at constant speed and load until failure. Three sets of bearing vibration signals under different operating conditions (varying speed–load combinations) were collected, with each condition capturing the full-life data of multiple bearings. As shown in Figure 5, the vibration signal waveforms of rolling bearings 1_1, 2_1, and 3_1, recorded in both horizontal and vertical directions, exhibit a progressive widening trend in signal distribution over time, despite unavoidable noise in the data, indicating that the signals carry significant characteristic information reflecting the gradual deterioration of the bearing’s health condition. The training and test sets for the experiment are divided as detailed in Table 1.
2. XJTU-SY Dataset
The XJTU-SY rolling bearing dataset was jointly collected by Xi’an Jiaotong University (XJTU) and Shenyang Machine Tool Group (SY) and is widely used for residual life prediction research. The dataset consists of 15 sets of full life-cycle vibration signals of rolling bearings under three different operating conditions, which were collected by performing several accelerated degradation tests. To record the entire degradation process, i.e., from normal condition to severe failure, each accelerated degradation test was performed until the maximum amplitude of the horizontal or vertical vibration signals exceeded a threshold value [38]. As illustrated in Figure 6, the temporal vibration waveforms of bearings 1_1, 2_1, and 3_1, recorded in both horizontal and vertical directions, reveal a progressive evolution in signal characteristics over the bearing life cycle, despite the presence of noise, highlighting critical patterns associated with the degradation process. The training and test sets for the experiment are divided as shown in Table 2. This dataset is detailed in Table 3, which outlines operational data such as conditions, sample sizes, work times, and fault types for each bearing.

3.3.2. Data Preprocessing

In data preprocessing, this paper adopts three steps, namely normalization, sliding-window division, and remaining useful life (RUL) label calculation, to ensure the standardization of the input data and the completeness of the time-series information and to provide reasonable prediction targets.
(1) Data Normalization
To reduce the magnitude difference between features and improve the stability of model training, Z-score normalization is applied to all input features in this paper:
$$X' = \frac{X - \mu}{\sigma},$$

where $X$ is the original feature value, $X'$ is the normalized value, $\mu$ is the mean of the feature, and $\sigma$ is its standard deviation. This method centers the data distribution and improves the convergence speed of the model.
(2) Sliding Window Division
A sliding window is used to divide the data to maintain the temporal information and improve the model’s ability to learn the long-term dependencies. The length of the sliding window is 30, and the step size is 10.
(3) RUL Label Calculation
In the RUL prediction task, the dataset needs to provide the remaining life label for each time step to be used as a supervisory signal to guide the model training. For each bearing dataset, let the total lifetime of the bearing be $T$ (i.e., the number of time steps contained in the dataset); for time step $i$, the RUL is defined as:

$$\mathrm{RUL}_i = \frac{T - i}{T},$$

where $T$ is the total number of running time steps of the bearing and $i$ is the current time step. The calculated RUL values are normalized to [0, 1] to ensure that the RUL of different bearings is comparable.
The computed RUL labels are added to the corresponding datasets for subsequent training and evaluation.
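A minimal sketch of this preprocessing pipeline: Z-score normalization, sliding-window segmentation (window 30, step 10), and normalized RUL labels. The array layout and the choice of labeling each window with the RUL of its last time step are assumptions for illustration.

```python
# Preprocessing sketch: normalization, windowing, and RUL labels.
import numpy as np

def preprocess(features, window=30, step=10):
    """features: (T, C) array of per-time-step features for one bearing."""
    z = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-12)  # Z-score
    T = len(z)
    rul = (T - np.arange(T)) / T                  # RUL_i = (T - i) / T, in [0, 1]
    windows, labels = [], []
    for start in range(0, T - window + 1, step):
        windows.append(z[start:start + window])   # (window, C) segment
        labels.append(rul[start + window - 1])    # label at the window's last time step
    return np.stack(windows), np.array(labels)

X, y = preprocess(np.random.randn(500, 4))
```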

3.3.3. Handcrafted Features Calculation

In this experiment, 13 types of handcrafted features are extracted from the original vibration signals, including time-domain features (e.g., mean and variance), vibration features (e.g., peak value), frequency-domain features (e.g., spectral energy ratio, spectral flatness), and nonlinear features (e.g., kurtosis, entropy, and fractal dimension), to comprehensively characterize the evolutionary trend of bearing degradation. These features reflect the statistical distribution, vibration intensity, spectral characteristics, and nonlinear dynamics of the signal, respectively. The extracted feature visualization for PRONOSTIA dataset Bearing1_3 is illustrated in Figure 7.
In the feature visualization of the PRONOSTIA dataset Bearing1_3, the mean and variance indicate the overall level and volatility of the signal, the peak value and kurtosis highlight local shocks, the spectral energy ratio reveals changes in frequency distribution, and the entropy and fractal dimension quantify signal complexity. To eliminate scale differences, all features are normalized after extraction to ensure consistency and effectiveness in model training.

3.3.4. Evaluation Metrics

To quantitatively assess the performance of the proposed model and the comparative model in the prediction of the remaining useful life (RUL) of rolling bearings, this paper adopts the root mean square error (RMSE) and the mean absolute error (MAE) as the assessment indexes [39,40]. The specific definitions are as follows:
1. Root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^{2}},$$

2. Mean absolute error (MAE):

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|,$$
where $\hat{y}_i$ and $y_i$ denote the predicted and actual life percentage of the tested bearings, and $N$ is the number of tested samples. RMSE measures the overall deviation between the predicted value and the actual value, and MAE reflects the average absolute error of the prediction. The lower the values of RMSE and MAE, the better the prediction performance of the model for RUL.
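The two metrics translate directly into code; a minimal sketch follows, with `y_pred` and `y_true` assumed to be NumPy arrays of predicted and actual life percentages.

```python
# RMSE and MAE as defined above.
import numpy as np

def rmse(y_pred, y_true):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mae(y_pred, y_true):
    return np.mean(np.abs(y_pred - y_true))
```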

3.3.5. Model Parameters

The BAIT-RUL model parameters are set as follows: The model contains two parallel branches. The first branch uses a three-layer bi-directional GRU to process the original vibration signals, with an input feature dimension of 4, a time step of 30, 64 hidden units per layer, and an output dimension of 128, and extracts dynamic degradation features in combination with the adaptive sparse self-attention mechanism (ASSA, with an input dimension of 128). The second branch uses the iTransformer module to process the 13 handcrafted features, taking 13-dimensional temporal sequences as input, optimizing the temporal correlations of the features through an encoder, and outputting a 13 × 16-dimensional feature representation. After the features of the two branches are fused, they are mapped to one dimension through a fully connected layer, and the output is the predicted remaining useful life of the bearing.
The model is trained using the Adam optimizer with an initial learning rate of 0.001, momentum parameters β1 = 0.9, β2 = 0.999, combined with a cosine annealing learning rate schedule, and a minimum learning rate of 0.0001. The training process consists of 5 iterations, 32 rounds of training per iteration, and a batch size of 64. To ensure robustness against the stochasticity in training, multiple runs were conducted with random seeds (100, 200, 300, 400, 500), and seed 100 was selected based on consistent performance across the XJTU-SY and PRONOSTIA datasets. To prevent overfitting, dropout regularization is introduced into the key module, and the fully connected layer L2 regularization is applied (coefficient 0.01). The state is printed every 10 batches during training, and an early stopping strategy is applied to terminate training if the validation set loss does not drop for 10 consecutive epochs.
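A minimal sketch of the reported optimization setup: Adam with learning rate 0.001 and β1/β2 of 0.9/0.999, cosine annealing down to 0.0001, and early stopping after 10 epochs without validation improvement. The placeholder model, the toy validation loss, and the epoch count of 32 are assumptions for illustration; the dropout and the layer-specific L2 penalty on the fully connected layer are omitted here.

```python
# Training-loop sketch with cosine annealing and early stopping.
import torch
import torch.nn as nn

model = nn.Linear(128, 1)   # placeholder for the BAIT-RUL network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=32, eta_min=1e-4)

best_loss, patience, wait = float("inf"), 10, 0
for epoch in range(32):
    # ... one pass over the training batches would go here ...
    scheduler.step()
    val_loss = float(torch.rand(1))          # placeholder for the validation loss
    if val_loss < best_loss:
        best_loss, wait = val_loss, 0
    else:
        wait += 1
        if wait >= patience:                 # early stopping
            break
```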

4. Results

4.1. Experimental Results

This section conducts a systematic comparative analysis of the proposed BAIT-RUL model against state-of-the-art baseline models, including LSTM [41], Transformer [42], Transformer-LSTM [43], CNN-BiLSTM [44], and BiGRU-Transformer-Attention [45], in terms of prediction performance on the PRONOSTIA and XJTU-SY rolling bearing datasets.

4.1.1. PRONOSTIA Bearing Dataset

Figure 8 shows the remaining life (RUL) prediction results of various models on the PRONOSTIA dataset for the first working condition, Bearing1_3. Each figure illustrates the actual RUL (blue line), predicted RUL (red line), 95% confidence interval (shaded red area), and absolute prediction error (gray area). In terms of the overall trend, the BAIT-RUL model proposed in this paper demonstrates higher consistency and prediction accuracy compared to the other models, maintaining stable predictions throughout the entire bearing life cycle. The quantitative comparison of RMSE and MAE on the test set is summarized in Table 4, which further supports the performance improvement of BAIT-RUL in numerical prediction.
The LSTM model initially tracks the actual RUL effectively but exhibits increasing deviation, with a widening 95% confidence interval and rising absolute error, highlighting its capability for short-term dependencies but limitations in capturing the complex, nonlinear dynamics of long-term degradation. The Transformer model performs well in the early stages, with the predicted RUL closely matching the actual RUL, yet it shows fluctuations, an expanding 95% confidence interval, and growing error as degradation intensifies, indicating its strength in long-sequence modeling but weakness in addressing local nonlinear features and abrupt changes. The Transformer-LSTM hybrid enhances mid-stage alignment with the actual RUL by leveraging LSTM’s memory, but a persistently wide 95% confidence interval and increasing error in later stages reveal its difficulty in capturing local mutations, likely due to LSTM’s memory decay.
The CNN-BiLSTM model reasonably tracks the actual RUL in the early and middle stages, maintaining a moderately stable 95% confidence interval, but displays greater deviation and error in the late stages, suggesting that while it benefits from CNN’s local feature extraction and BiLSTM’s bidirectional temporal modeling, it struggles with long-term dynamic trends as degradation complexity increases. The BiGRU-Transformer-Attention model surpasses others in the mid-degradation phase, achieving closer RUL alignment and a narrower 95% confidence interval, yet it over-predicts with rising error and a wider interval in the final stage, indicating that its attention mechanism, while effective for key features, lacks precision for short-term abrupt changes. In contrast, the BAIT-RUL model consistently excels, maintaining a tight 95% confidence interval and minimal absolute error throughout, underscoring its robustness in effectively balancing global trend modeling with sensitivity to local degradation patterns.
The BAIT-RUL model proposed in this paper maintains stable prediction accuracy across different degradation stages, as illustrated in Figure 8. The prediction curves align closely with the actual RUL curves, with errors kept to within 0.05 in most intervals. This performance is further supported by the numerical comparison results in Table 4, which show a reduction in RMSE and MAE for BAIT-RUL compared to other models.
The absolute prediction errors shown in the figure further illustrate the stability of the predictions of the model proposed in this paper across the entire bearing life cycle. Compared with other models, the error fluctuation is minimized and more evenly distributed, reflecting good adaptability to different degradation stages and patterns. Figure 9 presents the boxplot analysis of RMSE and MAE for all models, which provides additional insights into the performance consistency across different bearings. The compact distribution of the BAIT-RUL model in Figure 9 suggests that it maintains relatively stable performance across varying degradation patterns.
Table 4 summarizes the RMSE and MAE values for all models on the PRONOSTIA dataset, showing that the proposed model achieves lower prediction errors for 10 out of 11 bearings across the three operating conditions. For the remaining bearing (Bearing1_4), the model still demonstrates competitive performance.

4.1.2. XJTU-SY Bearing Dataset

Figure 10 demonstrates the prediction results for Bearing1_3 of the XJTU-SY dataset. Overall, the BAIT-RUL model prediction curves are highly consistent with the real RUL, exhibiting high accuracy and stability, which aligns with the conclusions drawn from the PRONOSTIA dataset. Table 5 provides a numerical comparison of RMSE and MAE across different models, further validating the performance advantage of BAIT-RUL.
The LSTM model exhibits significant deviations in both the early and late stages of degradation, with the predicted RUL diverging markedly from the actual RUL, a wide 95% confidence interval, and high absolute prediction error, indicating its difficulty in accurately capturing the degradation trend across all stages. The Transformer and Transformer-LSTM models perform stably in the early and middle stages of degradation, but their errors increase in the later stages. The former struggles to capture local nonlinear features, while the latter, despite LSTM improving long-term trend modeling, remains insensitive to local mutations.
The CNN-BiLSTM model performs well in the early stages, with the predicted RUL closely following the actual RUL, but as the degradation process becomes more complex, the error increases significantly in the later stages, with rising absolute error, suggesting limitations in adapting to long-term dynamic changes. The BiGRU-Transformer-Attention model improves prediction smoothness but still exhibits errors at the end of the degradation, with the attention mechanism showing limitations in capturing short-term features. Figure 11 presents the RMSE and MAE distribution for all models using boxplots, further highlighting the stability of BAIT-RUL’s predictions.
The BAIT-RUL model outperforms the benchmark model on Bearing1_3. However, BAIT-RUL slightly underperforms Transformer-LSTM on Bearing2_4, possibly because Bearing2_4’s degradation is more continuous and linear, whereas LSTM holds an advantage for such trends. Overall, the results summarized in Table 5 and Figure 11 confirm that the BiGRU-ASSA-iTransformer model maintains strong generalization and prediction ability across complex degradation patterns on both the XJTU-SY and PRONOSTIA datasets.

5. Discussion

5.1. Ablation Experiment

In this section, ablation experiments were conducted to verify the validity of the proposed model and to assess the influence of each component module on the prediction of the remaining bearing life. The experiments were conducted on Bearing1_1 and Bearing2_4 of the PRONOSTIA dataset and Bearing1_3 and Bearing2_5 of the XJTU-SY dataset, and five different variants were used for the comparative analysis to examine the contributions of the different modules. Due to space constraints, only the prediction results for Bearing2_4 of the PRONOSTIA dataset are shown in this paper. Figure 12 presents the overall performance comparison of different ablation models on the PRONOSTIA dataset, while Table 6 summarizes the RMSE and MAE results for Bearing1_3 and Bearing2_4 (PRONOSTIA) and Bearing1_3 and Bearing2_5 (XJTU-SY).
BiGRU, as a benchmark model, relies heavily on a two-way loop structure to model time-series relationships. However, the model fails to fully utilize the global patterns and short-term change characteristics in time series, and thus has some limitations in predicting the degradation trend. For this reason, the study introduces the iTransformer to strengthen the temporal encoding capability of the features and incorporates the ASSA mechanism to focus on short-term localized changes. As shown in Figure 12, removing the iTransformer component significantly degrades the model’s long-term prediction ability.
On the other hand, the ASSA mechanism mainly focuses on short-term feature extraction, playing a key role in identifying local degradation trend changes. Table 6 shows that removing the ASSA mechanism leads to increased RMSE and MAE, indicating reduced sensitivity to short-term variations.
The experiments also validate the necessity of the bidirectional BiGRU structure. When a unidirectional GRU is used instead of the BiGRU, the predictive ability of the model decreases. In contrast, the bidirectional structure better captures past and future information, improving the accuracy of modeling degradation trends.
Figure 12 further confirms that the BAIT-RUL model achieves lower prediction errors compared to its ablated variants, demonstrating the contribution of each module to the overall performance.

5.2. Impact of ASSA on Feature Weighting in RUL Prediction

To deeply investigate the role of ASSA in the prediction of RUL of bearings, the attention matrix of the test samples is illustrated in Figure 13. The traditional BiGRU model, in the absence of an attention mechanism, assigns equal weights to the features at all time steps, ignoring the fact that the critical degradation stage may have a higher contribution. However, the dynamic nature of the bearing degradation process suggests that there are significant differences in the contribution of features from different time steps to RUL prediction.
The attention matrix visualization clearly reveals this property through a 3D heat map, where both the horizontal (key position) and vertical (query position) coordinates denote a sequence of time steps ranging from 0 to 30, reflecting the 30-time-step length of the model inputs. The time series is divided by a sliding-window method (window size of 30, step size of 10), with each window covering a shorter segment of the signal in order to control the computational complexity while capturing short-term dynamic changes. As shown in the heat map, the most recent time steps are assigned larger attention weights, indicating their higher importance for RUL prediction. This phenomenon is consistent with the physical law of bearing degradation: the later stages of degradation are usually accompanied by significant feature changes (e.g., a sharp increase in vibration amplitude), which contribute more critically to the RUL prediction.
Although it is difficult to directly explain the physical meanings of the high-level features extracted by BiGRU, the introduction of ASSA significantly enhances the model’s ability to model the degradation process by quantifying the differences in the contributions of the features at each time step. ASSA not only highlights the features at the critical time step but also retains the potential information at other time steps, ensuring the comprehensiveness of feature utilization. This dynamic weight allocation strategy effectively enhances the prediction accuracy of the BAIT-RUL model, especially in the late stage of degradation showing lower prediction errors, which fully demonstrates the important value of ASSA in the prediction of complex degradation patterns.

6. Conclusions

In this paper, a rolling bearing remaining useful life (RUL) prediction model (BAIT-RUL) incorporating handcrafted features is proposed to enhance the operational reliability prediction capability of spindle bearings in complex aerospace structural component machining workshops. Aiming at the limitations of traditional prediction models, which pay insufficient attention to differences in time-step contributions and underutilize handcrafted features, the BAIT-RUL model constructs two parallel feature processing branches by integrating a bidirectional gated recurrent unit (BiGRU), an adaptive sparse self-attention mechanism (ASSA), an iTransformer, and a handcrafted feature extraction module. The BiGRU-ASSA branch effectively captures the dynamic degradation features in the original vibration signals and uses ASSA to focus on later time steps, extracting the rich degradation information contained in these phases. The iTransformer branch, in turn, optimizes the temporal correlations of 13 handcrafted features through the encoder to fully exploit the potential value of domain knowledge. Finally, the model fuses the features of the two branches through the attention mechanism, significantly reducing the prediction error. The experimental results show that BAIT-RUL outperforms existing state-of-the-art methods on the PRONOSTIA dataset (used in the IEEE PHM 2012 Data Challenge) and the XJTU-SY dataset.
Despite the promising results achieved in this study, there are still some limitations that deserve further exploration. (1) Although the 13 handcrafted features adopted in this paper cover the time domain, frequency domain, and statistical domain, they may not fully capture the specific degradation patterns under extreme working conditions (e.g., high loads, high-speed rotation, or low-temperature environments) encountered in aerospace spindle bearings. Future research could introduce an aerospace condition-based feature screening mechanism or apply advanced feature engineering techniques to improve the quality of handcrafted features and enhance adaptability to aerospace applications. (2) This study validated the model on the PRONOSTIA and XJTU-SY datasets; however, these datasets do not fully represent the diversity of bearing types under complex aerospace working conditions. In the future, the model could be extended to more aerospace industry scenarios (e.g., different working conditions and bearing types) for validation, further improving its generalization ability.

Author Contributions

Conceptualization, Y.L. and Y.C.; methodology, Q.Q.; writing—original draft preparation, Q.Q.; writing—review and editing, Y.L.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China under grant No. 2022YFB3302700 and the National Natural Science Foundation of China under grant No. 52375486.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RUL: Remaining useful life
BiGRU: Bidirectional gated recurrent unit
ASSA: Adaptive sparse self-attention
BAIT-RUL: BiGRU-ASSA-iTransformer for remaining useful life prediction
GRU: Gated recurrent unit
CNN: Convolutional neural network
LSTM: Long short-term memory network
MLP: Multi-layer perceptron
RMSE: Root mean square error
MAE: Mean absolute error

References

  1. Yang, G.; Tan, Q.; Tian, Z.; Jiang, X.; Chen, K.; Lu, Y.; Liu, W.; Yuan, P. Integrated Optimization of Process Planning and Scheduling for Aerospace Complex Component Based on Honey-Bee Mating Algorithm. Appl. Sci. 2023, 13, 5190. [Google Scholar] [CrossRef]
  2. Adamopoulou, E.; Daskalakis, E. Applications and Technologies of Big Data in the Aerospace Domain. Electronics 2023, 12, 2225. [Google Scholar] [CrossRef]
  3. Tang, Y.; Zhang, J.; Tian, H.; Liu, H.; Zhao, W. Optimization Method of Spindle Speed with the Consideration of Chatter and Forced Vibration for Five-Axis Flank Milling. Int. J. Adv. Manuf. Technol. 2023, 125, 3159–3169. [Google Scholar] [CrossRef]
  4. Peng, J.; Yin, M.; Cao, L.; Xie, L.-F.; Wang, X.-J.; Yin, G.-F. Study on the Thermally Induced Spindle Angular Errors of a Five-Axis CNC Machine Tool. Adv. Manuf. 2023, 11, 75–92. [Google Scholar] [CrossRef]
  5. Wang, Z.; Wang, S.; Wang, S.; Zhao, Z.; Yang, T.; Su, Z. Prediction of Five-Axis Machining-Induced Residual Stress Based on Cutting Parameter Identification. J. Manuf. Processes 2023, 103, 320–336. [Google Scholar] [CrossRef]
  6. Mao, W.; Liu, Y.; Ding, L.; Safian, A.; Liang, X. A New Structured Domain Adversarial Neural Network for Transfer Fault Diagnosis of Rolling Bearings under Different Working Conditions. IEEE Trans. Instrum. Meas. 2021, 70, 3509013. [Google Scholar] [CrossRef]
  7. Xu, J.; Ma, B.; Fan, Y.; Ding, X. ATPRINPM: A Single-Source Domain Generalization Method for the Remaining Useful Life Prediction of Unknown Bearings. In Proceedings of the 2022 International Conference on Sensing, Measurement & Data Analytics in the Era of Artificial Intelligence (ICSMD), Harbin, China, 30 November–2 December 2022; pp. 1–6. [Google Scholar]
  8. Chen, J.; Huang, R.; Chen, Z.; Mao, W.; Li, W. Transfer Learning Algorithms for Bearing Remaining Useful Life Prediction: A Comprehensive Review from an Industrial Application Perspective. Mech. Syst. Signal Process. 2023, 193, 110239. [Google Scholar] [CrossRef]
  9. Jin, Y.; Yang, X.; Liu, J.; Yang, Y.; Hei, X.; Shangguan, A. An Improved Nonlinear Health Index CRRMS for the Remaining Useful Life Prediction of Rolling Bearings. Actuators 2025, 14, 88. [Google Scholar] [CrossRef]
  10. Zhong, Z.; Zhao, Y.; Yang, A.; Zhang, H.; Zhang, Z. Prediction of Remaining Service Life of Rolling Bearings Based on Convolutional and Bidirectional Long- and Short-Term Memory Neural Networks. Lubricants 2022, 10, 170. [Google Scholar] [CrossRef]
  11. Chen, Z.; Wu, M.; Zhao, R.; Guretno, F.; Yan, R.; Li, X. Machine Remaining Useful Life Prediction via an Attention-Based Deep Learning Approach. IEEE Trans. Ind. Electron. 2021, 68, 2521–2531. [Google Scholar] [CrossRef]
  12. Zhou, S.; Chen, D.; Pan, J.; Shi, J.; Yang, J. Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16 June 2024; pp. 2952–2963. [Google Scholar]
  13. Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. arXiv 2023, arXiv:2310.06625. [Google Scholar] [CrossRef]
  14. Shen, J.; Zhou, H.; Jin, M.; Jin, Z.; Wang, Q.; Mu, Y.; Hong, Z. RUL Prediction of Rolling Bearings Based on Fruit Fly Optimization Algorithm Optimized CNN-LSTM Neural Network. Lubricants 2025, 13, 81. [Google Scholar] [CrossRef]
  15. Shutin, D.; Bondarenko, M.; Polyakov, R.; Stebakov, I.; Savin, L. Method for On-Line Remaining Useful Life and Wear Prediction for Adjustable Journal Bearings Utilizing a Combination of Physics-Based and Data-Driven Models: A Numerical Investigation. Lubricants 2023, 11, 33. [Google Scholar] [CrossRef]
  16. Wei, Y.; Wu, D.; Terpenny, J. Bearing Remaining Useful Life Prediction Using Self-Adaptive Graph Convolutional Networks with Self-Attention Mechanism. Mech. Syst. Signal Process. 2023, 188, 110010. [Google Scholar] [CrossRef]
  17. Rathore, M.S.; Harsha, S.P. An Attention-Based Stacked BiLSTM Framework for Predicting Remaining Useful Life of Rolling Bearings. Appl. Soft Comput. 2022, 131, 109765. [Google Scholar] [CrossRef]
  18. Park, Y.-I.; Song, J.W.; Kang, S.-J. Pseudo-Label-Vector-Guided Parallel Attention Network for Remaining Useful Life Prediction. IEEE Trans. Ind. Informat. 2023, 19, 5602–5611. [Google Scholar] [CrossRef]
  19. Nie, L.; Xu, S.; Zhang, L. Multi-Head Attention Network with Adaptive Feature Selection for RUL Predictions of Gradually Degrading Equipment. Actuators 2023, 12, 158. [Google Scholar] [CrossRef]
  20. Peng, H.; Jiang, B.; Mao, Z.; Liu, S. Local Enhancing Transformer with Temporal Convolutional Attention Mechanism for Bearings Remaining Useful Life Prediction. IEEE Trans. Instrum. Meas. 2023, 72, 3522312. [Google Scholar] [CrossRef]
  21. Zhao, Q.; Zhang, X.; Wang, F.; Fan, P.; Mbeka, E. The Effect of the Head Number for Multi-Head Self-Attention in Remaining Useful Life Prediction of Rolling Bearing and Interpretability. Neurocomputing 2025, 616, 128946. [Google Scholar] [CrossRef]
  22. Wang, L.; Cao, H.; Chen, X. Information Guided Attention Network for Bearing Remaining Useful Life Prediction Adaptive to Working Conditions and Fault Modes. Eng. Appl. Artif. Intell. 2025, 147, 110197. [Google Scholar] [CrossRef]
  23. Gao, P.; Wang, J.; Shi, Z.; Ming, W.; Chen, M. Long-Term Temporal Attention Neural Network with Adaptive Stage Division for Remaining Useful Life Prediction of Rolling Bearings. Reliab. Eng. Syst. Saf. 2024, 251, 110218. [Google Scholar] [CrossRef]
  24. Xiang, S.; Qin, Y.; Zhu, C.; Wang, Y.; Chen, H. LSTM Networks Based on Attention Ordered Neurons for Gear Remaining Life Prediction. ISA Trans. 2020, 106, 343–354. [Google Scholar] [CrossRef]
  25. Zhou, K.; Tang, J. A Wavelet Neural Network Informed by Time-Domain Signal Preprocessing for Bearing Remaining Useful Life Prediction. Appl. Math. Modell. 2023, 122, 220–241. [Google Scholar] [CrossRef]
  26. Cao, X.; Zhang, F.; Zhao, J.; Duan, Y.; Guo, X. Remaining Useful Life Prediction of Rolling Bearing Based on Multi-Domain Mixed Features and Temporal Convolutional Networks. Appl. Sci. 2024, 14, 2354. [Google Scholar] [CrossRef]
  27. Niazi, S.G.; Huang, T.; Zhou, H.; Bai, S.; Huang, H.-Z. Multi-Scale Time Series Analysis Using TT-ConvLSTM Technique for Bearing Remaining Useful Life Prediction. Mech. Syst. Signal Process. 2024, 206, 110888. [Google Scholar] [CrossRef]
  28. Yang, C.; Ma, J.; Wang, X.; Li, X.; Li, Z.; Luo, T. A Novel Based-Performance Degradation Indicator RUL Prediction Model and Its Application in Rolling Bearing. ISA Trans. 2022, 121, 349–364. [Google Scholar] [CrossRef]
  29. Cui, L.; Xiao, Y.; Liu, D.; Han, H. Digital Twin-Driven Graph Domain Adaptation Neural Network for Remaining Useful Life Prediction of Rolling Bearing. Reliab. Eng. Syst. Saf. 2024, 245, 109991. [Google Scholar] [CrossRef]
  30. Xu, Z.; Bashir, M.; Liu, Q.; Miao, Z.; Wang, X.; Wang, J.; Ekere, N. A Novel Health Indicator for Intelligent Prediction of Rolling Bearing Remaining Useful Life Based on Unsupervised Learning Model. Comput. Ind. Eng. 2023, 176, 108999. [Google Scholar] [CrossRef]
  31. Wen, L.; Su, S.; Li, X.; Ding, W.; Feng, K. GRU-AE-Wiener: A Generative Adversarial Network Assisted Hybrid Gated Recurrent Unit with Wiener Model for Bearing Remaining Useful Life Estimation. Mech. Syst. Signal Process. 2024, 220, 111663. [Google Scholar] [CrossRef]
  32. Wang, X.; Xie, G.; Zhang, Y.; Liu, H.; Zhou, L.; Liu, W.; Gao, Y. The Application of a BiGRU Model with Transformer-Based Error Correction in Deformation Prediction for Bridge SHM. Buildings 2025, 15, 542. [Google Scholar] [CrossRef]
  33. Zhang, B.; Yin, Y.; Li, B.; He, S.; Song, J. A Hybrid Algorithm for Predicting the Remaining Service Life of Hybrid Bearings Based on Bidirectional Feature Extraction. Measurement 2025, 242, 116152. [Google Scholar] [CrossRef]
  34. A Deep Transfer Network Based on Dual-Task Learning for Predicting the Remaining Useful Life of Rolling Bearings. Available online: https://colab.ws/articles/10.1088%2F1361-6501%2Fadafd2 (accessed on 20 March 2025).
  35. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
  36. Nectoux, P.; Gouriveau, R.; Medjaher, K.; Ramasso, E.; Chebel-Morello, B.; Zerhouni, N.; Varnier, C. PRONOSTIA: An Experimental Platform for Bearings Accelerated Degradation Tests. In Proceedings of the Conference on Prognostics and Health Management, IEEE Catalog Number: CPF12PHM-CDR. Denver, CO, USA, 18–21 June 2012; pp. 1–8. [Google Scholar]
  37. Lei, Y.; Han, T.; Wang, B.; Li, N.; Yan, T.; Yang, J. XJTU-SY Rolling Element Bearing Accelerated Life Test Datasets: A Tutorial. Researchgate 2019, 16, 001. [Google Scholar] [CrossRef]
  38. Wang, B.; Lei, Y.; Li, N.; Li, N. A Hybrid Prognostics Approach for Estimating Remaining Useful Life of Rolling Element Bearings. IEEE Trans. Rel. 2020, 69, 401–412. [Google Scholar] [CrossRef]
  39. Meng, Z.; Ma, S.; Cao, W.; Li, J.; Cao, L.; Fan, F.; Wang, X. A Remaining Useful Life Prediction Method of Rolling Bearings by RSA-BAFT Combined with Copula Entropy Feature Selection. Expert Syst. Appl. 2025, 275, 127100. [Google Scholar] [CrossRef]
  40. Guo, W.; Li, F.; Zhang, P.; Luo, L. A Stage-Related Online Incremental Transfer Learning-Based Remaining Useful Life Prediction Method of Bearings. Appl. Soft Comput. 2025, 169, 112491. [Google Scholar] [CrossRef]
  41. Ma, M.; Mao, Z. Deep-Convolution-Based LSTM Network for Remaining Useful Life Prediction. IEEE Trans. Ind. Informat. 2021, 17, 1658–1667. [Google Scholar] [CrossRef]
  42. Kim, S.; Seo, Y.-H.; Park, J. Transformer-Based Novel Framework for Remaining Useful Life Prediction of Lubricant in Operational Rolling Bearings. Reliab. Eng. Syst. Saf. 2024, 251, 110377. [Google Scholar] [CrossRef]
  43. Mu, H.; Zhai, X.; Yin, D.; Qiao, F. A Method of Remaining Useful Life Prediction of Multi-Source Signals Aero-Engine Based on RF-Transformer-LSTM. In Proceedings of the 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Prague, Czech Republic, 9–12 October 2022; pp. 2502–2507. [Google Scholar]
  44. Jia, J.; Yang, Y.; Guo, J.; Dai, L. A Hybrid CNN-BiLSTM and Wiener Process-Based Prediction Approach of Remaining Useful Life for Rolling Bearings. Comput. Res. Prog. Appl. Sci. Eng. 2022, 8, 2817. [Google Scholar] [CrossRef]
  45. An, X.; Zhang, C.; Liu, C.; Liu, G.; Hao, J. Residual Life Prediction of Rolling Bearings Based on Transformer-BiGRU-Attention Model with Improved Sparrow Optimization Algorithm. In Proceedings of the TEPEN International Workshop on Fault Diagnostic and Prognostic, Qingdao, China, 8–11 May 2024; Springer Nature: Cham, Switzerland, 2024; pp. 23–33. [Google Scholar]
Figure 1. Architecture of the proposed BAIT-RUL model.
Figure 2. Structure of the BiGRU-ASSA branch.
Figure 3. Standard BiGRU for regression problems.
Figure 4. Structure of the iTransformer branch.
Figure 5. Vibration time-series plots for bearings 1_1, 2_1, and 3_1 in the PRONOSTIA dataset.
Figure 6. Vibration time-series plots for bearings 1_1, 2_1, and 3_1 in the XJTU-SY dataset.
Figure 7. Handcrafted feature extraction for PRONOSTIA dataset Bearing1_3.
Figure 8. Prediction results for Bearing1_3 in the PRONOSTIA dataset using different approaches: (a) LSTM; (b) Transformer; (c) Transformer-LSTM; (d) CNN-BiLSTM; (e) BiGRU-Transformer-Attention; (f) BAIT-RUL.
Figure 9. Boxplot of RMSE and MAE distribution for different models on the PRONOSTIA dataset.
Figure 10. Prediction results for Bearing1_3 in the XJTU-SY dataset using different approaches: (a) LSTM; (b) Transformer; (c) Transformer-LSTM; (d) CNN-BiLSTM; (e) BiGRU-Transformer-Attention; (f) BAIT-RUL.
Figure 11. Boxplot of RMSE and MAE distribution for different models on the XJTU-SY dataset.
Figure 12. Performance comparison of ablation models on the PRONOSTIA dataset.
Figure 13. Attention matrix of one sample.
Table 1. Division of training and test sets in the PRONOSTIA dataset.
Dataset | Condition 1 | Condition 2 | Condition 3
Learning set | Bearing1_1, Bearing1_2 | Bearing2_1, Bearing2_2 | Bearing3_1, Bearing3_2
Test set | Bearing1_3, Bearing1_4, Bearing1_5, Bearing1_6, Bearing1_7 | Bearing2_3, Bearing2_4, Bearing2_5, Bearing2_6, Bearing2_7 | Bearing3_3
Table 2. Division of training and test sets in the XJTU-SY dataset.
Dataset | Condition 1 | Condition 2 | Condition 3
Learning set | Bearing1_1, Bearing1_2 | Bearing2_1, Bearing2_2, Bearing2_3 | Bearing3_1, Bearing3_2, Bearing3_3
Test set | Bearing1_3, Bearing1_4, Bearing1_5 | Bearing2_4, Bearing2_5 | Bearing3_4, Bearing3_5
Table 3. Summary of the XJTU-SY bearing dataset.
Condition | Dataset | Sample Size | Work Time | Fault Type
1 | Bearing1_1 | 123 | 2 h 3 min | Outer race fault
1 | Bearing1_2 | 161 | 2 h 41 min | Outer race fault
1 | Bearing1_3 | 158 | 2 h 38 min | Outer race fault
1 | Bearing1_4 | 122 | 2 h 2 min | Cage fault
1 | Bearing1_5 | 52 | 52 min | Inner race fault
2 | Bearing2_1 | 491 | 8 h 11 min | Inner race fault
2 | Bearing2_2 | 161 | 2 h 41 min | Outer race fault
2 | Bearing2_3 | 533 | 8 h 53 min | Cage fault
2 | Bearing2_4 | 42 | 42 min | Outer race fault
2 | Bearing2_5 | 339 | 5 h 39 min | Outer race fault
3 | Bearing3_1 | 2538 | 42 h 18 min | Outer race fault
3 | Bearing3_2 | 2496 | 41 h 18 min | Compound fault
3 | Bearing3_3 | 371 | 6 h 11 min | Inner race fault
3 | Bearing3_4 | 1515 | 25 h 15 min | Inner race fault
3 | Bearing3_5 | 114 | 1 h 54 min | Outer race fault
Table 4. RMSE and MAE comparison of different models on the PRONOSTIA dataset (each cell: RMSE/MAE).
Test Bearing | LSTM | Transformer | Transformer-LSTM | CNN-BiLSTM | BiGRU-Transformer-Attention | BAIT-RUL
Bearing1_3 | 0.110/0.102 | 0.086/0.069 | 0.054/0.050 | 0.081/0.063 | 0.052/0.035 | 0.038/0.033
Bearing1_4 | 0.116/0.104 | 0.090/0.071 | 0.063/0.045 | 0.095/0.078 | 0.034/0.028 | 0.046/0.036
Bearing1_5 | 0.123/0.112 | 0.091/0.082 | 0.080/0.067 | 0.058/0.042 | 0.068/0.056 | 0.024/0.016
Bearing1_6 | 0.118/0.101 | 0.103/0.091 | 0.101/0.085 | 0.073/0.054 | 0.074/0.057 | 0.057/0.044
Bearing1_7 | 0.132/0.121 | 0.106/0.090 | 0.059/0.048 | 0.104/0.091 | 0.063/0.054 | 0.048/0.042
Bearing2_3 | 0.138/0.120 | 0.123/0.100 | 0.099/0.088 | 0.124/0.097 | 0.072/0.054 | 0.061/0.039
Bearing2_4 | 0.171/0.158 | 0.135/0.110 | 0.125/0.099 | 0.121/0.097 | 0.088/0.076 | 0.054/0.040
Bearing2_5 | 0.165/0.152 | 0.117/0.104 | 0.087/0.079 | 0.146/0.118 | 0.126/0.091 | 0.082/0.062
Bearing2_6 | 0.123/0.112 | 0.098/0.082 | 0.090/0.075 | 0.120/0.090 | 0.078/0.061 | 0.054/0.038
Bearing2_7 | 0.131/0.109 | 0.105/0.090 | 0.097/0.086 | 0.123/0.084 | 0.059/0.050 | 0.050/0.040
Bearing3_3 | 0.089/0.076 | 0.056/0.044 | 0.041/0.035 | 0.055/0.042 | 0.182/0.171 | 0.029/0.021
Note: Values in bold denote the optimal performance among all comparative methods.
Table 5. RMSE and MAE comparison of different models on the XJTU-SY dataset (each cell: RMSE/MAE).
Test Bearing | LSTM | Transformer | Transformer-LSTM | CNN-BiLSTM | BiGRU-Transformer-Attention | BAIT-RUL
Bearing1_3 | 0.064/0.058 | 0.042/0.033 | 0.030/0.024 | 0.035/0.030 | 0.026/0.021 | 0.008/0.007
Bearing1_4 | 0.129/0.108 | 0.102/0.086 | 0.091/0.072 | 0.078/0.063 | 0.093/0.074 | 0.034/0.026
Bearing1_5 | 0.087/0.076 | 0.068/0.055 | 0.052/0.044 | 0.072/0.067 | 0.062/0.059 | 0.046/0.042
Bearing2_4 | 0.066/0.053 | 0.051/0.046 | 0.024/0.018 | 0.056/0.052 | 0.052/0.047 | 0.041/0.035
Bearing2_5 | 0.120/0.114 | 0.112/0.104 | 0.103/0.078 | 0.176/0.149 | 0.227/0.192 | 0.068/0.061
Bearing3_4 | 0.092/0.079 | 0.078/0.061 | 0.061/0.049 | 0.045/0.037 | 0.034/0.029 | 0.023/0.021
Bearing3_5 | 0.146/0.125 | 0.131/0.098 | 0.099/0.086 | 0.079/0.068 | 0.127/0.098 | 0.064/0.053
Note: Values in bold denote the optimal performance among all comparative methods.
Table 6. RMSE and MAE results of ablation models on the PRONOSTIA and XJTU-SY datasets (each cell: RMSE/MAE).
Dataset | Test Bearing | BiGRU | BiGRU + iTransformer | BiGRU + ASSA | GRU + ASSA + iTransformer | BAIT-RUL
PRONOSTIA | Bearing1_3 | 0.098/0.090 | 0.067/0.058 | 0.048/0.034 | 0.076/0.061 | 0.038/0.033
PRONOSTIA | Bearing2_4 | 0.119/0.090 | 0.066/0.058 | 0.069/0.054 | 0.071/0.048 | 0.054/0.040
XJTU-SY | Bearing1_3 | 0.068/0.055 | 0.014/0.012 | 0.024/0.020 | 0.018/0.015 | 0.008/0.007
XJTU-SY | Bearing2_5 | 0.199/0.168 | 0.071/0.063 | 0.150/0.130 | 0.187/0.156 | 0.068/0.061
Note: Values in bold denote the optimal performance among all comparative methods.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
