Remaining Useful Life Prediction for Rotating Machinery via Multi-Graph-Based Spatiotemporal Feature Fusion

Cao, Xiangang; Gao, Chenjian; Zhang, Xinyuan

doi:10.3390/app16062738

by Xiangang Cao ¹,², Chenjian Gao ¹,²,* and Xinyuan Zhang ¹,²

¹ School of Mechanical Engineering, Xi’an University of Science and Technology, Xi’an 710054, China

² Shanxi Province Key Laboratory of Mine Electromechanical Equipment Intelligent Detection and Control, Xi’an 710054, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(6), 2738; https://doi.org/10.3390/app16062738

Submission received: 10 February 2026 / Revised: 4 March 2026 / Accepted: 9 March 2026 / Published: 13 March 2026

Abstract

Rotating machinery serves as a critical component in various engineering systems, making accurate prediction of its Remaining Useful Life (RUL) essential for ensuring operational stability. To address the limitations of mainstream RUL prediction models in comprehensively capturing spatial correlations among multiple sensors, this paper proposes a multi-graph-structured spatiotemporal feature fusion model for RUL prediction of rotating machinery. Rather than constructing a single correlation graph, the model first builds two distinct graphs: a prior correlation graph based on the structural mechanism of the rotating machinery and a similarity correlation graph derived from monitoring data distribution characteristics. These dual-perspective graphs collectively characterize the potential spatial dependencies among multiple sensors. Subsequently, a Graph Attention Network (GAT) is introduced to aggregate spatial features from both graphs, and a feature concatenation fusion strategy is adopted to achieve a comprehensive representation of the inter-sensor spatial dependencies. Finally, a Long Short-Term Memory (LSTM) network is employed to extract temporal evolution features from the operational data. The effective fusion of these spatial and temporal features enhances the model’s RUL prediction performance. Simulation experiments conducted on the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset validated the robustness of the proposed method.

Keywords:

Graph Attention Networks (GAT); Remaining Useful Life (RUL) prediction; spatial correlation; Long Short-Term Memory (LSTM)

1. Introduction

In the context of intelligent manufacturing development, the intelligent operation and maintenance of rotating machinery has become a core link in achieving high efficiency in industrial production. As a key technology of intelligent operation and maintenance, the prediction accuracy of Remaining Useful Life (RUL) directly determines the scientific soundness and rationality of equipment maintenance decisions [1,2]. With the advancement of industrial monitoring technology, the multi-sensor and multi-parameter monitoring mode has become the mainstream approach for data-driven RUL prediction. However, in practical applications, the latent spatial correlations among sensors have not been fully explored and utilized. How to accurately capture the spatial dependencies among multiple sensors and reasonably construct a sensor correlation representation model has thus become a key issue for further improving the accuracy of data-driven RUL prediction.

Feature fusion is a key technique in multi-source information processing and intelligent state perception. Its core lies in the organic integration of monitoring information from different sources, dimensions, and representations, thereby obtaining more comprehensive and discriminative feature representations by weakening redundant information and enhancing complementary information. In the field of rotating machinery life prediction, a single sensor or feature is often insufficient to fully describe the complex degradation process of equipment. Feature fusion effectively integrates multi-sensor information, fully exploiting the correlations and complementarities among data, thus providing solid support for high-precision RUL prediction [3,4,5].

Currently, the methods for predicting the RUL of turbofan engines can be broadly categorized into two types: model-based methods and data-driven methods [6,7]. Model-based methods typically rely on physical models or prior knowledge to predict remaining life by establishing mathematical models of engine performance degradation. These methods often base their models on fundamental theories such as thermodynamics, mechanics, or chemical reactions to simulate the degradation process of the engine under different operating conditions. Typical model-based approaches include physics-based degradation models and accelerated-aging test models. However, model-based methods have certain limitations, particularly when the model fails to accurately capture complex nonlinear degradation patterns, resulting in potentially inaccurate predictions. Additionally, constructing these models generally requires complex engineering expertise and a significant amount of prior data, and they can struggle to handle large-scale, multivariate complexities in real-world applications.

To overcome these limitations, data-driven methods have gained widespread attention in recent years, particularly with the application of deep learning (DL) techniques [8]. Unlike model-based methods, data-driven approaches train on extensive historical operational data, learning potential patterns directly from the data, thereby better adapting to the nonlinear and time-varying characteristics of engine operations. With the help of DL, data-driven methods can automatically extract features, significantly improving the accuracy and reliability of RUL predictions. Current DL methods mainly focus on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) [9]. CNNs, with their strong feature extraction capabilities, effectively capture local spatial features in engine operational data. Li et al. [10] proposed a two-dimensional (2D) deep CNN to predict the RUL of turbofan engines. Li et al. [11] introduced a multi-scale deep CNN (MS-DCNN) that utilizes three convolutional kernels of different sizes in parallel to extract features at various scales. RNNs, due to their superior modeling capabilities for time-series data, can handle long-term dynamic changes during engine operation, making them particularly suitable for capturing temporal dependencies in the degradation process of engines. Wu et al. [12] proposed a Long Short-Term Memory (LSTM) network combined with optimization algorithms and employed dynamic differential techniques to extract new features from raw engine health monitoring data. Zhang et al. [13] designed an RUL prediction model integrating a two-layer LSTM with an attention mechanism, demonstrating the effectiveness of the approach through experimental results. Additionally, some researchers have combined CNNs and RNNs. Li et al. [14] introduced a Directed Acyclic Graph (DAG) network combining LSTM and CNN to predict RUL. Ma et al. [15] developed a Convolutional Long Short-Term Memory (CLSTM) network, enhancing the accuracy of RUL predictions for turbofan engines.

Despite the effectiveness of the aforementioned methods for predicting the RUL of engines, they often overlook the potential spatial relationships between sensors. For example, when the pressure sensor readings increase for the same component, the temperature sensor readings may also increase correspondingly, indicating a significant correlation between the two. Relevant literature [16,17] highlights the importance of focusing on sensor features that contain more degradation information. Liu et al. [18] proposed an RUL prediction method based on feature attention, applying the attention mechanism directly to the input data and assigning greater weights to important features. Song et al. [19] introduced an attention mechanism for weighted sequence data RUL prediction, which assigns weights to different sensors and time steps, enhancing prediction accuracy. These methods have achieved good results in processing Euclidean data, but much real-world data is generated in non-Euclidean spaces, which these methods struggle to handle effectively.

Recently, some scholars have attempted to utilize Graph Neural Networks (GNNs) to capture the spatial correlations between sensors. The construction of correlation graphs in GNNs can reflect the interactive relationships among sensors. Currently, most GNN correlation graphs are constructed based solely on prior knowledge of engine structures or on feature similarity between sensors. Kong et al. [20] considered prior knowledge of turbofan engine structural information and the influences among sensors to construct a sensor interaction structure graph for turbofan engines. Zhang et al. [21] employed the K-Nearest Neighbors (KNN) algorithm to calculate the Euclidean distances between data, ultimately constructing an interaction structure graph. While both the prior correlation graph and the similarity correlation graph have their advantages, a correlation graph from a single perspective is insufficient to comprehensively reflect the spatial relationships between sensors, thereby limiting their RUL prediction performance.

To address the challenges of inadequate characterization of sensor spatial correlations and insufficient integration of spatiotemporal features in multi-sensor RUL prediction, this paper proposes a hybrid GAT-LSTM model that integrates multi-correlation graphs. The proposed method constructs sensor prior correlation graphs and data-driven similarity correlation graphs from two perspectives: prior structural mechanisms of the equipment and feature similarity (Pearson correlation coefficient) among sensors, enabling a more comprehensive modeling of complex spatial dependencies among sensors. On this basis, GAT is employed to complementarily fuse spatial features from the dual-graph structure, effectively capturing the coupling relationships among sensors. Subsequently, LSTM is utilized to deeply extract temporal degradation trends from monitoring data, achieving joint spatiotemporal feature modeling to fully characterize the degradation process over the equipment’s entire lifecycle. The main innovations and contributions of this paper are as follows:

(1): A multi-sensor spatial correlation fusion mechanism based on dual-correlation graphs is proposed. Unlike traditional methods that rely on a single graph structure or simple correlation modeling, this approach simultaneously leverages prior equipment knowledge and data feature similarity to construct dual-correlation graphs, depicting sensor relationships from both physical layout and data dimensions. This provides more comprehensive spatial features to support accurate RUL prediction.
(2): A novel spatiotemporal fusion prediction framework combining GAT and LSTM is established. By organically integrating GAT with LSTM, the framework achieves deep fusion of spatial features and temporal evolution patterns, overcoming the limitations of traditional methods that focus solely on time-series modeling or single spatial feature extraction. This enables a more holistic capture of spatiotemporal coupling patterns during equipment degradation.
(3): Validation of the proposed method is conducted on the publicly available C-MAPSS aircraft engine dataset. Comparative and ablation experiments demonstrate that the proposed method outperforms various mainstream approaches in prediction accuracy, offering new insights for multi-sensor RUL prediction.

2. Theoretical Foundation

2.1. Graph Attention Network

The GAT is a deep learning method that incorporates attention mechanisms to process graph-structured data, enabling the dynamic assignment of varying importance weights to different nodes’ neighbors within a graph [22,23]. Compared to Graph Convolutional Networks (GCN), the key advantage of GAT lies in its ability to learn attention weights, allowing for more flexible capture of inter-node relationships. This is particularly suitable for scenarios where nodes exhibit complex dependencies.

In GAT, the attention weight $a_{ij}$ between each node $v_{i}$ and its neighboring node $v_{j}$ is obtained by calculating the similarity between the node features. First, the feature vectors $h_{i}$ and $h_{j}$ of nodes $v_{i}$ and $v_{j}$, respectively, are mapped to a new space.

$$e_{ij} = a^{T} [W h_{i} \parallel W h_{j}] \tag{1}$$

where $W$ is a weight matrix, $h_{i}$ and $h_{j}$ are the feature vectors of nodes $v_{i}$ and $v_{j}$, respectively, $\parallel$ denotes the concatenation operation, and $a$ is a learnable attention vector. Then, the attention scores between the nodes are computed using the LeakyReLU activation function.

$$a_{ij} = \frac{\exp(\mathrm{LeakyReLU}(e_{ij}))}{\sum_{k \in N(i)} \exp(\mathrm{LeakyReLU}(e_{ik}))} \tag{2}$$

where $N(i)$ denotes the set of neighbors of node $v_{i}$, and the denominator normalizes the attention scores of all neighbors of node $v_{i}$, as shown in Figure 1a, to ensure that the sum of weights of all neighboring nodes is 1. Next, the calculated attention weights $a_{ij}$ are used to perform a weighted sum of the features of the neighboring nodes $v_{j}$ of node $v_{i}$ to update the representation of node $v_{i}$.

$$h_{i}' = \mathrm{ReLU}\left(\sum_{j \in N(i)} a_{ij} W h_{j}\right) \tag{3}$$

This process ensures that each node’s representation depends not only on the features of its neighboring nodes but also on the relative importance of those neighbors within the current graph. To further enhance the model’s expressive power, a multi-head attention mechanism is introduced. In this mechanism, multiple attention heads operate in parallel, with each head independently computing attention weights and generating new node representations. Finally, the outputs of all heads are averaged to obtain the final representation of the node.

$$h_{i}' = \frac{1}{K} \sum_{k=1}^{K} \mathrm{ReLU}\left(\sum_{j \in N(i)} a_{ij}^{(k)} W^{(k)} h_{j}\right) \tag{4}$$

where $K$ denotes the number of attention heads, and $a_{ij}^{(k)}$ and $W^{(k)}$ represent the attention coefficients and weight matrix of the $k$-th attention head, respectively. Through this multi-head averaging mechanism, the model can learn different features in various subspaces, as shown in Figure 1b, and enhance model stability through averaging.
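Equations (1) to (4) can be traced on a toy graph with a few lines of plain Python. The sketch below is illustrative only and is not the authors' implementation: the function names (`gat_head`, `gat_multi_head`), the LeakyReLU slope of 0.2, the three-node graph, and all numeric values are assumptions chosen for the example.

```python
import math

def leaky_relu(x, slope=0.2):  # slope 0.2 is an assumed value
    return x if x > 0 else slope * x

def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def gat_head(H, neighbors, W, a):
    """One attention head. H: node features; neighbors[i]: indices in N(i);
    W: weight matrix; a: attention vector of length 2 * len(W)."""
    Wh = [matvec(W, h) for h in H]
    out, attn = [], []
    for i, nbrs in enumerate(neighbors):
        # Eq. (1): e_ij = a^T [W h_i || W h_j]
        e = [sum(ak * z for ak, z in zip(a, Wh[i] + Wh[j])) for j in nbrs]
        # Eq. (2): softmax of LeakyReLU(e_ij) over the neighbors of node i
        ex = [math.exp(leaky_relu(v)) for v in e]
        total = sum(ex)
        alpha = [v / total for v in ex]
        # Eq. (3): ReLU of the attention-weighted sum of W h_j
        agg = [sum(al * Wh[j][d] for al, j in zip(alpha, nbrs))
               for d in range(len(Wh[i]))]
        out.append([max(0.0, v) for v in agg])
        attn.append(alpha)
    return out, attn

def gat_multi_head(H, neighbors, heads):
    """Eq. (4): average the outputs of K heads; heads is a list of (W, a)."""
    outs = [gat_head(H, neighbors, W, a)[0] for W, a in heads]
    K = len(outs)
    return [[sum(o[i][d] for o in outs) / K for d in range(len(outs[0][i]))]
            for i in range(len(H))]

# Toy example (all values hypothetical): 3 nodes with 2 features each.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 0.25, 0.75]
out, attn = gat_head(H, neighbors, W, a)
avg = gat_multi_head(H, neighbors, [(W, a), (W, a)])
```

Each row of `attn` sums to 1, which is the normalization described for Equation (2). In the example every node lists itself as a neighbor; whether self-loops are used in the paper's graphs is not stated, so this is also an assumption.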

2.2. Long Short-Term Memory Network

LSTM is a special type of RNN designed to address the gradient vanishing and exploding problems encountered by traditional RNNs when processing long sequences. LSTM incorporates gating mechanisms to effectively capture long-term dependencies [24]. The core idea of LSTM is to use three “gates” (input gate, forget gate, output gate) to control the flow of information, as shown in Figure 2. Each gate’s function is to determine which information should be remembered, which should be forgotten, and which information will influence the current output. The computation process of an LSTM unit includes the following steps.

Forget gate: Determines how much of the previous memory to discard. The output $f_{t}$ of the forget gate is computed based on the current input $x_{t}$ and the previous hidden state $o_{t-1}$.

$$f_{t} = \sigma(W_{f} [o_{t-1}, x_{t}] + b_{f}) \tag{5}$$

where $W_{f}$ is the weight matrix of the forget gate, $b_{f}$ is the bias, and $\sigma$ is the Sigmoid activation function, which limits the output to the range (0, 1). Because $f_{t}$ multiplies the previous cell state in Equation (8), it represents the proportion of the previous memory that is retained: values near 0 discard the memory, and values near 1 keep it.

Input gate: Controls the update of memory based on the current input. First, the candidate memory vector $\tilde{C}_{t}$ is computed, and then the gate decides how much new information to incorporate into the cell state.

$$\tilde{C}_{t} = \tanh(W_{C} [o_{t-1}, x_{t}] + b_{C}) \tag{6}$$

$$i_{t} = \sigma(W_{i} [o_{t-1}, x_{t}] + b_{i}) \tag{7}$$

where $\tilde{C}_{t}$ is the candidate memory, and $i_{t}$ is the output of the input gate, representing the degree of new information update.

Update cell state: The cell state $C_{t}$ of the LSTM is updated by combining the forget gate and the input gate.

$$C_{t} = f_{t} \cdot C_{t-1} + i_{t} \cdot \tilde{C}_{t} \tag{8}$$

where $f_{t} \cdot C_{t-1}$ represents the retention of part of the previous state’s memory, and $i_{t} \cdot \tilde{C}_{t}$ denotes the incorporation of new information into the memory.

Output gate: Determines the output $o_{t}$ at the current time step, which is computed based on the updated cell state $C_{t}$ and the current input $x_{t}$.

$$l_{t} = \sigma(W_{l} [o_{t-1}, x_{t}] + b_{l}) \tag{9}$$

$$o_{t} = l_{t} \cdot \tanh(C_{t}) \tag{10}$$

where $l_{t}$ is the output of the output gate, and $o_{t}$ is the hidden state at the current time step, representing the network’s output. Through the synergistic operation of these gates, LSTM can selectively “remember” important information and “forget” irrelevant or outdated information, thereby effectively capturing long-term dependencies in longer sequences.
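A single LSTM time step following Equations (5) to (10) can be sketched in plain Python, using the same symbols as the text ($o_t$ for the hidden state, $l_t$ for the output gate). The helper names (`sigmoid`, `affine`, `lstm_step`), the parameter layout, and the all-zero weights in the example are assumptions made for illustration, not the trained model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    # Computes W v + b for a matrix given as a list of rows.
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, o_prev, C_prev, params):
    """One LSTM step. params maps 'f', 'i', 'C', 'l' to (W, b) pairs
    (hypothetical layout chosen for this sketch)."""
    z = o_prev + x_t                                            # [o_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(*params["f"], z)]           # Eq. (5)
    C_tilde = [math.tanh(v) for v in affine(*params["C"], z)]   # Eq. (6)
    i = [sigmoid(v) for v in affine(*params["i"], z)]           # Eq. (7)
    C_t = [fv * c + iv * ct                                     # Eq. (8)
           for fv, c, iv, ct in zip(f, C_prev, i, C_tilde)]
    l = [sigmoid(v) for v in affine(*params["l"], z)]           # Eq. (9)
    o_t = [lv * math.tanh(c) for lv, c in zip(l, C_t)]          # Eq. (10)
    return o_t, C_t

# Example with hidden size 1 and input size 1; all weights and biases are zero,
# so every gate outputs 0.5 and the candidate memory is 0.
zero = ([[0.0, 0.0]], [0.0])
params = {"f": zero, "i": zero, "C": zero, "l": zero}
o_t, C_t = lstm_step([3.0], [0.0], [2.0], params)
```

With zero weights the forget gate equals 0.5, so the cell state is halved from 2.0 to 1.0, which illustrates that $f_t$ scales how much of the previous memory is carried forward.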

3. Framework of RUL Prediction Based on GAT-LSTM

3.1. Overview of the Research Framework

Figure 3 illustrates the detailed framework of this study. The proposed RUL framework comprises three main components, as follows.

3.2. Data Preprocessing

The signals collected by different sensors are used as input data for the prediction model. First, the sensors that can characterize degradation are selected from the numerous sensors monitoring the health status of the turbofan engine. Data preprocessing, including normalization, is then performed to prepare the data for subsequent analysis.

3.3. Association Graph Learning

This study comprehensively reflects the spatial relationships between sensors by constructing both a priori graph structures and similarity measurement graph structures. Sensors are mapped as nodes in the graph, each containing a segment of feature data, and each edge represents whether there is an interaction between sensors. The collection of edges forms the adjacency matrix. Through the form of nodes and edges, the relationships between sensors can be clearly presented.

Based on the a priori graph structure, correlation is considered to exist between two sensors if they are mounted on the same component, are physically adjacent in the engine layout, or monitor similar physical parameters. As shown in Figure 4, the colors of the sensors correspond to those of the components on which they are mounted, and the figure also illustrates the sensor connection topology constructed based on the above criteria. Based on this, the adjacency matrix of the prior graph is constructed as follows:

$$A_{prior}(i, j) = \begin{cases} 1, & i \rightleftharpoons j \\ 0, & \text{otherwise} \end{cases} \tag{11}$$

where $\rightleftharpoons$ represents the connections between sensors $i$ and $j$ in Figure 4, with a value of 1 indicating the presence of a physical or functional correlation between the two sensors, and 0 indicating the absence of such correlation.
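As a minimal sketch of Equation (11), the prior adjacency matrix can be filled in from a list of connected sensor pairs. The edge list below is hypothetical, since the actual connections are those drawn in Figure 4, and treating each connection as symmetric is an assumption of this example.

```python
def prior_adjacency(num_sensors, edges):
    """Build a 0/1 adjacency matrix from (i, j) sensor index pairs."""
    A = [[0] * num_sensors for _ in range(num_sensors)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # assumption: the connection is undirected
    return A

# Hypothetical connections among four sensors (not taken from Figure 4).
edges = [(0, 1), (1, 2)]
A_prior = prior_adjacency(4, edges)
```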

Figure 4. Construction of the prior knowledge-based correlation graph.

Based on the similarity graph structure, to accurately capture associations between sensors at the data level, the Pearson correlation coefficient is employed to quantify the feature similarity of sensor monitoring data. A threshold is applied to screen these correlations, enabling the sparse construction of the association graph, as illustrated in Figure 5. For any two sensors $i$ and $j$, let their preprocessed features be $X = [x_{1}, x_{2}, \dots, x_{N}]$ and $Y = [y_{1}, y_{2}, \dots, y_{N}]$, respectively (where $N$ is the length of the time-series data). The Pearson correlation coefficient $r_{ij}$ between them is defined as:

$$r_{ij} = \frac{\sum_{k=1}^{N} (x_{k} - \bar{X})(y_{k} - \bar{Y})}{\sqrt{\sum_{k=1}^{N} (x_{k} - \bar{X})^{2}} \sqrt{\sum_{k=1}^{N} (y_{k} - \bar{Y})^{2}}} \tag{12}$$

where $\bar{X}$ and $\bar{Y}$ represent the mean values of the time-series data $X$ and $Y$, respectively. The numerator corresponds to the covariance of the two sequences, while the denominator corresponds to the product of their standard deviations (the common $1/N$ factors cancel). A larger absolute value of $r_{ij}$ indicates a stronger similarity in the data characteristics between the sensors: a value greater than 0 denotes a positive correlation, whereas a value less than 0 indicates a negative correlation. By calculating $r_{ij}$ pairwise for all sensors, an $M \times M$ Pearson correlation coefficient matrix can be obtained, where $M$ is the number of sensors selected for modeling.

Figure 5. Construction of the similarity-based association graph.

To avoid introducing redundant information from weakly correlated sensors and to reduce the computational complexity of the graph structure, a correlation coefficient threshold $\theta$ is set to filter the associations, thereby achieving sparsely connected networks. The adjacency matrix $A_{s}$ of the similarity-based association graph is defined as follows:

$$A_{s}(i, j) = \begin{cases} 1, & |r_{ij}| > \theta \\ 0, & |r_{ij}| \leq \theta \end{cases} \tag{13}$$

where $|r_{ij}|$ represents the absolute value of the correlation coefficient. With $\theta = 0.7$, when $|r_{ij}| > 0.7$, it is considered that there exists strong feature similarity between sensors $i$ and $j$, and the corresponding adjacency matrix element is set to 1; otherwise, the correlation is judged as weak, the element is set to 0, and no connecting edge is established.
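The construction of the similarity graph in Equations (12) and (13) can be sketched as follows. The function names and the four short series are illustrative assumptions; returning a correlation of 0 for a constant series (zero denominator) is also an assumption, as the paper does not discuss that case.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences, Eq. (12)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (math.sqrt(sum((a - mx) ** 2 for a in x))
           * math.sqrt(sum((b - my) ** 2 for b in y)))
    return num / den if den > 0 else 0.0  # assumption: constant series -> 0

def similarity_adjacency(series, theta=0.7):
    """Eq. (13): connect sensors i and j when |r_ij| exceeds theta."""
    M = len(series)
    return [[1 if abs(pearson(series[i], series[j])) > theta else 0
             for j in range(M)] for i in range(M)]

# Hypothetical preprocessed series for four sensors.
series = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],    # rises with sensor 0
    [4.0, 3.0, 2.0, 1.0],    # falls as sensor 0 rises
    [1.0, -1.0, -1.0, 1.0],  # uncorrelated with sensor 0
]
A_s = similarity_adjacency(series)
```

Because the threshold is applied to the absolute value, the strongly negatively correlated pair (sensors 0 and 2) is connected just like the positively correlated pair (sensors 0 and 1).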

3.4. Remaining Useful Life Prediction Model

The correlation graph and its node features are used as model inputs. The spatial and structural information of the sensors is aggregated through a GAT. The aggregated spatial features from the prior graph and similarity graph are then concatenated, resulting in richer spatial features. These concatenated features are passed to the LSTM for deeper extraction of temporal features. Subsequently, a fully connected layer is used to establish a predictive model between the aggregated features and RUL, thereby enabling better prediction of the equipment’s degradation trend under complex conditions.

4. Experiment

4.1. Experimental Data Sources

The RUL algorithm presented in this paper is validated through simulation experiments using the turbofan engine degradation dataset released by NASA Ames Research Center. The structure of the engine, as shown in Figure 6, includes key components such as the fan, high-pressure compressor (HPC), low-pressure compressor (LPC), high-pressure turbine (HPT), low-pressure turbine (LPT), and nozzle.

The dataset is partitioned into four subsets, as detailed in Table 1. FD001 and FD003 simulate scenarios under a single operating condition, while FD002 and FD004 replicate more complex scenarios involving dynamic transitions between multiple operating conditions. Each subset is accompanied by a training set, a test set, and the corresponding test set labels. The sensor monitoring data in the training sets consist of complete, full life-cycle sequences, which comprehensively reflect the entire process from initial normal operation to failure. In contrast, the test sets contain only fragmented monitoring data from segments of the engine life cycles.

The FD001–FD004 datasets each contain 26 columns of data. The first column represents the identification number of different engines in the dataset. The second column indicates the operating time of each engine in cycles. Columns 3 to 5 provide three different operating conditions simulated for engine operation: altitude of engine operation, Mach number of the aircraft, and throttle angle of the aircraft. Columns 6 to 26 are the measurements of 21 sensors obtained from the C-MAPSS platform simulating the engine’s operation to failure. Information about the sensors is provided in Table 2.

4.2. Data Processing

4.2.1. Sensor Selection

The dataset contains monitoring data from 21 sensor channels. However, in the actual performance degradation analysis, data from some sensors do not carry effective information related to engine performance deterioration, necessitating a preliminary screening of the sensor data. Figure 7 illustrates the variation trends of each sensor’s monitoring data with the number of engine operating cycles. It can be clearly observed that the readings from Sensors 1, 5, 6, 10, 16, 18, and 19 show no significant changes as the number of operating cycles increases, remaining consistently stable. These seven non-informative sensor channels are therefore excluded. Ultimately, the remaining 14 sensor channels are retained as the foundational data for subsequent model analysis.
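The column layout and the sensor screening described above can be expressed as a short parsing sketch. It assumes each record is a whitespace-separated text row with the 26 columns in the stated order; the function name `parse_row` and the synthetic row are hypothetical.

```python
# Sensors excluded in the text because their readings stay constant.
DROPPED = {1, 5, 6, 10, 16, 18, 19}
KEPT = [s for s in range(1, 22) if s not in DROPPED]

def parse_row(line):
    """Split one row into engine id, cycle, operating settings, and the
    readings of the retained sensors (assumes whitespace-separated columns)."""
    v = line.split()
    unit, cycle = int(v[0]), int(v[1])
    settings = [float(t) for t in v[2:5]]    # columns 3 to 5
    sensors = [float(t) for t in v[5:26]]    # columns 6 to 26
    selected = [sensors[s - 1] for s in KEPT]
    return unit, cycle, settings, selected

# Synthetic row in which sensor k simply reads the value k.
row = "1 1 0.0 0.0 100.0 " + " ".join(str(float(s)) for s in range(1, 22))
unit, cycle, settings, selected = parse_row(row)
```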

4.2.2. Data Normalization

Since the numerical scales of the data from different sensors vary significantly, such magnitude differences can easily cause the model to be excessively biased towards high-value features, thereby compromising the accuracy of subsequent RUL predictions. Therefore, it is necessary to apply normalization to the filtered sensor data to eliminate interference caused by dimensional differences. The Min-Max normalization method is employed to normalize the raw sensor data, mapping all values to the [0, 1] interval so that features across all dimensions are placed on a comparable scale. The calculation formula for Min-Max normalization is as follows:

$$x_{norm} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{14}$$

where $x_{norm}$ represents the normalized sensor data, $x$ is the original data, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the original data, respectively.
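Equation (14) applied to one sensor channel can be sketched as below. Mapping a constant channel to zeros is an assumption added to avoid division by zero; the function name is illustrative.

```python
def min_max_normalize(values):
    """Scale one sensor channel to the [0, 1] interval, Eq. (14)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: constant channel -> zeros
    return [(v - lo) / (hi - lo) for v in values]

normalized = min_max_normalize([10.0, 15.0, 20.0])
```

A common practice, which the text does not specify, is to compute $x_{\min}$ and $x_{\max}$ on the training set and reuse them for the test set.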

4.2.3. Label Building

The normalized data is divided into several graph samples using the sliding window technique. For a multi-sensor time series of length $T$, the input data is segmented by sliding a time window of size $S$ along the time axis with a step size $l$ of 1, resulting in $T - S + 1$ graph samples, as illustrated in Figure 8. Meanwhile, a piecewise linear function [25] is used as the label for the samples, as illustrated in Figure 9.
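The sliding-window segmentation can be sketched as follows, where each element of `series` stands for the multi-sensor reading at one time step. The function name and the toy series are assumptions for illustration.

```python
def sliding_windows(series, S, step=1):
    """Return every window of S consecutive time steps, moving by `step`."""
    T = len(series)
    return [series[t:t + S] for t in range(0, T - S + 1, step)]

# Toy series with T = 5 time steps and window size S = 3.
windows = sliding_windows([0, 1, 2, 3, 4], S=3)
```

With $T = 5$ and $S = 3$ this yields $T - S + 1 = 3$ windows; a series shorter than the window yields none.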

In Figure 9, $R_{early}$ represents the constant RUL value during the health stage. For studies conducted on this dataset, the commonly adopted threshold generally falls within the range of 110 to 130. In this paper, the RUL label determination threshold $R_{early}$ is set to 125. The RUL label is calculated as follows:

$$RUL = \begin{cases} R_{early}, & RUL \geq R_{early} \\ RUL, & \text{otherwise} \end{cases} \tag{15}$$

where the $RUL$ on the right-hand side denotes the actual number of remaining cycles.
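The piecewise linear label of Equation (15) amounts to capping the actual remaining cycles at $R_{early}$. The sketch below assumes a run-to-failure engine whose actual RUL at cycle $t$ is the total number of cycles minus $t$ (so the last cycle has RUL 0); that convention and the function name are assumptions.

```python
def rul_labels(total_cycles, r_early=125):
    """Capped RUL label for cycles t = 1 .. total_cycles, Eq. (15)."""
    return [min(r_early, total_cycles - t) for t in range(1, total_cycles + 1)]

# Hypothetical engine that fails after 200 cycles.
labels = rul_labels(200)
```

For this engine the label stays at 125 through cycle 75 and then decreases by one per cycle down to 0.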

4.2.4. Evaluation Indicators

This paper employs the Root Mean Squared Error (RMSE) and the Scoring Function to evaluate the method. The formulations of the two evaluation metrics are shown below:

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} d_{i}^{2}} \tag{16}$$

$$Score = \sum_{i=1}^{N} \begin{cases} e^{-\frac{d_{i}}{13}} - 1, & d_{i} < 0 \\ e^{\frac{d_{i}}{10}} - 1, & d_{i} \geq 0 \end{cases} \tag{17}$$

where $N$ is the number of samples in the test set, $d_{i} = \tilde{y}_{i} - y_{i}$ represents the difference between the predicted value and the true value for a single sample, $\tilde{y}_{i}$ denotes the predicted RUL for the $i$-th sample, and $y_{i}$ denotes the true RUL for the $i$-th sample. The RMSE metric reflects the overall magnitude of the prediction error, where a smaller value indicates higher overall prediction accuracy. In the scoring function, the constants 13 and 10 are penalty coefficients commonly used in the field, which control the severity of penalties for under-prediction and over-prediction, respectively; because the divisor is smaller for $d_{i} \geq 0$, over-prediction is penalized more heavily. A smaller Score value represents stronger model reliability.
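Both metrics are straightforward to compute from paired predictions and true values. The following sketch implements Equations (16) and (17) directly; the function names and the sample values are illustrative.

```python
import math

def rmse(pred, true):
    """Root mean squared error, Eq. (16)."""
    d = [p - t for p, t in zip(pred, true)]
    return math.sqrt(sum(v * v for v in d) / len(d))

def score(pred, true):
    """Asymmetric scoring function, Eq. (17), with d = predicted - true."""
    total = 0.0
    for p, t in zip(pred, true):
        d = p - t
        if d < 0:
            total += math.exp(-d / 13) - 1  # under-prediction
        else:
            total += math.exp(d / 10) - 1   # over-prediction
    return total

# Hypothetical predictions: one 10 cycles too low, one 10 cycles too high.
example_rmse = rmse([90.0, 60.0], [100.0, 50.0])
late = score([60.0], [50.0])
early = score([40.0], [50.0])
```

For the same absolute error of 10 cycles, `late` exceeds `early`, reflecting the heavier penalty on predicting a longer life than the engine actually has.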

4.3. RUL Prediction Results

To mitigate the interference caused by experimental stochasticity, the experiment was repeated ten times, and the average of the ten runs was taken as the final prediction result. The RUL prediction results of the proposed method on the FD001–FD004 test sets are presented in Figure 10, where blue circles represent the actual RUL of the test set engines and green asterisks denote the predicted values obtained by our method. From an overall trend perspective, although deviations exist between the predicted and actual values for some samples, the prediction curve generated by the model closely aligns with the actual degradation trend of the engines, without any significant systematic bias. Further analysis of the performance on each sub-dataset reveals that the model achieves higher prediction accuracy on FD001 and FD003, while the deviations are slightly larger on FD002 and FD004. This is primarily because FD002 and FD004 contain not only a larger number of engine test units but also diverse operating conditions and compound fault modes. Even in such highly complex test scenarios, however, the model maintains stable predictive capability, which supports the robustness of the GAT-LSTM model under complex operating conditions.

Additionally, one engine was selected from each test set (Engine 20 from FD001, Engine 242 from FD002, Engine 46 from FD003, and Engine 71 from FD004), and their full-lifecycle RUL prediction curves were visualized, as shown in Figure 11. It can be observed that the predicted RUL curves generally align well with the actual degradation trends. However, in the early degradation stage, due to weak sensor signals and insufficiently manifested fault features, the predicted values exhibit certain fluctuations. As the engines gradually approach the failure threshold, the prediction errors significantly decrease, indicating that the model’s sensitivity to degradation features progressively strengthens. Nevertheless, some deviations persist in the later degradation stage, which may be attributed to the sharp decline in engine performance during the late lifecycle and abrupt changes in correlations among sensors.

4.4. Comparison with State-of-the-Art Methods

To fully demonstrate the superiority of the proposed model, it is compared with other publicly available RUL prediction models, including traditional machine learning (ML) methods such as Extreme Learning Machine (ELM) [26]; CNN-based methods (CNN [27] and MS-DCNN [11]); RNN-based methods (LSTMNN [28] and BiLSTM-ED [29]); hybrid methods combining CNN and RNN (CNN-LSTM [30] and AGCNN [18]), where AGCNN considers sensor correlations; and GNN-based methods (STFA [20] and HGNN-AGCF [31]). The experimental comparison results on the four test sets are shown in Table 3.

As shown in Table 3, the proposed GAT-LSTM model achieves the best predictive performance on the FD002 and FD004 datasets, which feature higher operational complexity, significantly outperforming all compared methods. On the relatively simpler FD001 and FD003 datasets, its prediction accuracy is slightly lower than that of the top-performing baseline model, yet the overall difference is minimal, placing it within the same performance level. This result indicates that GAT-LSTM exhibits stronger adaptability to complex operating conditions and can effectively extract deep degradation features from multi-condition data. In simpler scenarios, where sensor correlations remain relatively stable, the dual-angle spatial relationship modeling adopted in this study may introduce some information redundancy; nevertheless, the proposed method still maintains strong predictive capability.

Traditional machine learning methods and deep learning models that neglect spatial correlations among sensors fail to exploit the potential interactive information between sensors, thus encountering performance bottlenecks in RUL prediction tasks. This limitation is particularly pronounced on multi-condition datasets such as FD002 and FD004. The AGCNN model, which preliminarily captures sensor correlations by introducing correlation weights, achieves significantly improved performance compared to traditional methods, confirming the critical role of spatial correlation features in RUL prediction. However, AGCNN lacks inherent capability for graph-structured representation when modeling spatial relationships, which still constrains its performance improvement. Further examination of GNN-based methods reveals that STFA relies solely on the structural spatial characteristics of the engine to measure sensor correlations, while HGNN-AGCF constructs graphs using both prior knowledge of the engine and similarity between node feature vectors, yet it ignores temporal dependencies among data. In contrast, the proposed GAT-LSTM model comprehensively captures spatial correlations among sensors from both prior knowledge and data similarity perspectives, while incorporating an LSTM module to extract temporal dependency features. This synergistic modeling of spatiotemporal information enables the model to maintain excellent predictive performance under complex operating conditions.

4.5. Ablation Experiments

To clarify the actual contribution of each component in the GAT-LSTM model, a series of ablation studies were designed. The first variant, GAT-LSTM-Prio, retains only the prior knowledge-based correlation graph as input to the GAT module, aiming to validate the limitations of using a single prior graph for capturing spatial features. The second variant, GAT-LSTM-Corr, employs the Pearson correlation-based similarity graph as the sole input to the GAT, investigating the feature representation capability of a single data-driven correlation graph. The third variant, GCN-LSTM, replaces the GAT module with a Graph Convolutional Network (GCN) while keeping the dual-graph fusion strategy and the LSTM module unchanged, thereby verifying the unique advantage of the graph attention mechanism in dynamically assigning weights to sensor correlations. The fourth variant, GAT-FC, directly removes the LSTM temporal feature extraction module and feeds the fused spatial features from the dual graphs into a fully connected layer for RUL prediction, which serves to demonstrate the importance of modeling temporal dependencies for predicting long-term degradation processes.
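The four ablation variants can be summarized as a small configuration table. The sketch below is only an illustrative encoding of the description above; the class name `Variant`, its field names, and the helper `changed_fields` are hypothetical and do not come from the authors' implementation.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Variant:
    name: str
    use_prior_graph: bool       # graph from the structural mechanism of the engine
    use_similarity_graph: bool  # Pearson correlation-based similarity graph
    graph_layer: str            # "GAT" or "GCN"
    head: str                   # "LSTM", or "FC" when the LSTM module is removed


# Complete model: both graphs, GAT aggregation, LSTM temporal module.
FULL = Variant("GAT-LSTM", True, True, "GAT", "LSTM")

ABLATIONS = [
    Variant("GAT-LSTM-Prio", True, False, "GAT", "LSTM"),
    Variant("GAT-LSTM-Corr", False, True, "GAT", "LSTM"),
    Variant("GCN-LSTM", True, True, "GCN", "LSTM"),
    Variant("GAT-FC", True, True, "GAT", "FC"),
]


def changed_fields(variant, reference=FULL):
    """Return the names of the settings in which `variant` differs from `reference`."""
    a, b = asdict(variant), asdict(reference)
    return [key for key in a if key != "name" and a[key] != b[key]]


for v in ABLATIONS:
    print(f"{v.name}: changes {changed_fields(v)}")
```

Written this way, each variant differs from the complete model in exactly one setting, which is what allows a performance change to be attributed to a single component.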

The comparative results of the ablation models are presented in Table 4. The complete GAT-LSTM achieves the lowest RMSE and Score on all four test sets. Both single-graph variants, GAT-LSTM-Prio and GAT-LSTM-Corr, underperform the complete model, which indicates that constructing an association graph from a single perspective is insufficient. Their Score values deteriorate sharply on the complex-condition datasets FD002 and FD004 (for example, 3540.33 and 4320.18 for GAT-LSTM-Prio, against 1989.67 and 2152.34 for the complete model), indicating that modeling spatial relationships from only one viewpoint lacks generalizability in complex operating environments. The GCN-LSTM variant also shows a clear performance drop relative to the complete model. This is because GCN aggregates neighboring nodes with fixed weights determined by the graph structure and cannot distinguish the importance of different sensor correlations, whereas GAT learns attention weights over neighboring nodes, enabling more precise capture of dependencies among critical sensors. Furthermore, the GAT-FC variant, which removes the LSTM module, shows the largest decline of all variants on all four sub-datasets, demonstrating the importance of temporal modeling. In summary, the ablation experiments support the necessity of each core component in the GAT-LSTM model.
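The contrast between fixed GCN weights and learned GAT weights can be illustrated with a minimal pure-Python sketch. It makes several simplifying assumptions that are not taken from the paper: a single attention head, no learned linear projection of the node features, a hand-picked attention vector `attn`, and a toy three-sensor graph. The function names `gcn_weights` and `gat_weights` are hypothetical.

```python
import math


def gcn_weights(adj):
    """Fixed GCN weights 1 / sqrt(d_i * d_j) on the graph with self-loops added."""
    n = len(adj)
    a_hat = [[1.0 if i == j else float(adj[i][j]) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gat_weights(adj, feats, attn, slope=0.2):
    """Attention weights: softmax over neighbours of LeakyReLU(attn . [h_i || h_j])."""
    n = len(adj)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in range(n) if j == i or adj[i][j]]
        scores = []
        for j in nbrs:
            z = sum(a * x for a, x in zip(attn, feats[i] + feats[j]))
            scores.append(z if z > 0 else slope * z)  # LeakyReLU
        m = max(scores)  # subtract the maximum for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        for j, e in zip(nbrs, exps):
            weights[i][j] = e / total
    return weights


# Toy graph: sensor 0 is connected to sensors 1 and 2.
adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
feats = [[1.0], [0.5], [2.0]]  # one illustrative feature per sensor
attn = [0.3, 1.0]              # illustrative attention vector, not a trained one

print("GCN row 0:", gcn_weights(adj)[0])
print("GAT row 0:", gat_weights(adj, feats, attn)[0])
```

In this toy case the GCN weights that sensor 0 assigns to sensors 1 and 2 are identical because they depend only on node degrees, while the attention weights differ with the node features and sum to one over the neighbourhood.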

4.6. Model Complexity Analysis

To comprehensively evaluate the computational cost, Table 5 presents a comparison of the number of parameters and the training time per epoch between the proposed GAT-LSTM model and other compared methods.

As shown in Table 5, the proposed GAT-LSTM model has 227,854 parameters, more than models such as ELM and CNN but fewer than AGCNN and HGNN-AGCF. Its training time per epoch (15.12 s) is slightly higher than that of STFA and CNN-LSTM, which is attributable to the dual-graph construction and spatiotemporal fusion modules, yet lower than that of AGCNN and HGNN-AGCF. Although the proposed method is not the most economical in parameter count or training time, the cost is acceptable given its predictive performance on the complex-condition datasets.

Additionally, to assess the feasibility of the proposed GAT-LSTM model for online RUL prediction, its inference latency was evaluated. Inference time refers to the duration required for the trained model to process a single input sample and output the RUL prediction. All tests were conducted on a computer equipped with an Intel Core i7-12700H CPU, 16 GB RAM, and an NVIDIA RTX 3060 GPU. To mitigate randomness in the test results, inference was repeated 20 times for each sample and the times were averaged. The average inference latency of the proposed GAT-LSTM model is approximately 12.4 ms per sample, which is substantially lower than the general response time requirements for online health monitoring in industrial systems. Therefore, the proposed model meets the practical application demands for real-time RUL prediction.
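A latency measurement of this kind can be sketched as follows. This is a minimal illustration rather than the authors' test harness: the function `mean_latency_ms`, the `warmup` calls, and the stand-in `dummy_predict` model are assumptions introduced here, and only the 20 repetitions per sample come from the text.

```python
import statistics
import time


def mean_latency_ms(predict, sample, repeats=20, warmup=3):
    """Average wall-clock time of predict(sample) in milliseconds.

    `warmup` calls are run first and discarded (an assumption, not stated
    in the paper) so that one-off start-up costs do not skew the mean.
    """
    for _ in range(warmup):
        predict(sample)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)


def dummy_predict(window):
    """Stand-in for a trained model: averages a (time steps x sensors) window."""
    return sum(sum(row) for row in window) / len(window)


# Hypothetical input window of 30 time steps and 14 sensor channels.
window = [[0.5] * 14 for _ in range(30)]
print(f"mean latency: {mean_latency_ms(dummy_predict, window):.4f} ms")
```

If the model runs on a GPU, asynchronous execution generally has to be synchronized before the clock is read; the text does not state how this was handled.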

5. Conclusions

To address the challenge that existing mainstream methods struggle to effectively model spatiotemporal dependencies in multi-sensor data, this paper proposes a GAT-LSTM hybrid method for RUL prediction of rotating machinery. By integrating dual-aspect correlation graphs based on prior knowledge and data similarity, the method comprehensively captures spatial interactions among sensors and, in combination with an LSTM module, extracts temporal degradation features, achieving synergistic spatiotemporal modeling.

Experimental results on the C-MAPSS dataset demonstrate that the proposed method effectively mines spatiotemporal dependencies within sensor networks, performing particularly well on the FD002 and FD004 subsets, which feature higher operational complexity. Specifically, compared with the best baseline by RMSE (STFA), the RMSE is reduced by 4.96% and 9.57%, respectively. Comparative experiments and ablation studies further validate the effectiveness of the dual-aspect spatial modeling mechanism and the spatiotemporal feature synergy strategy. These results indicate that the proposed model is capable of handling RUL prediction tasks under complex operating conditions, offering a feasible technical solution for the health management of rotating machinery.
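The two reduction figures can be checked against the RMSE values in Table 3, taking STFA (19.17 on FD002, 21.41 on FD004) as the best baseline by RMSE. The helper name `relative_reduction` is introduced here for illustration only.

```python
def relative_reduction(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return (baseline - proposed) / baseline * 100.0


# (best baseline RMSE, GAT-LSTM RMSE) as listed in Table 3
rmse = {"FD002": (19.17, 18.22), "FD004": (21.41, 19.36)}

for subset, (baseline, proposed) in rmse.items():
    print(f"{subset}: {relative_reduction(baseline, proposed):.2f}%")
# FD002: 4.96%
# FD004: 9.57%
```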

Despite the promising predictive performance achieved by the proposed method, certain limitations remain. First, in actual degradation processes, the correlations among sensors often evolve dynamically with equipment status, which is not fully considered in the current model. Second, the model training relies on complete full-lifecycle data, and its performance may be constrained in scenarios where only partial life data are available. To address these issues, future research can be pursued in the following directions: exploring updating mechanisms for dynamic spatial correlations among sensors to capture the time-varying nature of sensor relationships; investigating adaptive fusion methods for multi-aspect correlation graphs to further enhance model generalization; and introducing transfer learning strategies to improve model robustness in scenarios with missing data.

Author Contributions

Conceptualization, C.G. and X.C.; methodology, C.G.; software, C.G.; validation, X.Z. and C.G.; formal analysis, C.G.; investigation, C.G.; resources, X.Z.; data curation, C.G.; writing—original draft preparation, C.G.; writing—review and editing, X.C. and X.Z.; visualization, C.G.; supervision, X.Z.; project administration, X.C.; funding acquisition, X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (Grant Nos. 52274158 and 51834006) and the Science and Technology Program of Shaanxi Province (2024QY2-GJHX-09).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used in this study, the Turbofan Engine Degradation Simulation Dataset, is openly available in the NASA repository (https://data.nasa.gov/dataset/cmapss-jet-engine-simulated-data) (accessed on 17 July 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Lei, Y.; Li, N.; Guo, L.; Li, N.; Yan, T.; Lin, J. Machinery health prognostics: A systematic review from data acquisition to RUL prediction. Mech. Syst. Signal Process. 2018, 104, 799–834.
Wang, T.; Tang, X.; Lu, J.; Liu, F. A novel spatio-temporal hybrid neural network for remaining useful life prediction. J. Supercomput. 2023, 79, 19095–19117.
Jia, B.; Liang, G.; Huang, Z.; Song, X.; Liao, Z. Rotating Machinery Structural Faults Feature Enhancement and Diagnosis Based on Multi-Sensor Information Fusion. Machines 2025, 13, 553.
Jose, J.S.; Yosimar, A.J.; Miguel, D.; Jesus, R.R.; Alfredo, R.O. Condition monitoring strategy based on an optimized selection of high-dimensional set of hybrid features to diagnose and detect multiple and combined faults in an induction motor. Measurement 2021, 178, 109404.
Zhou, H.; Huang, Q.; Zhou, C.; He, P.; Zhe, N.; Wang, H. Rotating machinery fault diagnosis method based on temporal–spatial vibration feature fusion extraction. IEEE Sens. J. 2024, 25, 1184–1197.
Zhang, Y.; Li, Y.; Wang, Y.; Yang, Y.; Wei, X. Adaptive spatio-temporal graph information fusion for remaining useful life prediction. IEEE Sens. J. 2021, 22, 3334–3347.
Zhang, X.; Leng, Z.; Zhao, Z.; Li, M.; Yu, D.; Chen, X. Spatial-temporal dual-channel adaptive graph convolutional network for remaining useful life prediction with multi-sensor information fusion. Adv. Eng. Inf. 2023, 57, 102120.
Yang, L.; Chen, Y.; Ma, X.; Qiu, Q.; Peng, R. A prognosis-centered intelligent maintenance optimization framework under uncertain failure threshold. IEEE Trans. Reliab. 2023, 73, 115–130.
Liang, P.; Li, Y.; Wang, B.; Yuan, X.; Zhang, L. Remaining useful life prediction via a deep adaptive transformer framework enhanced by graph attention network. Int. J. Fatigue 2023, 174, 107722.
Li, X.; Ding, Q.; Sun, J. Remaining useful life estimation in prognostics using deep convolution neural networks. Reliab. Eng. Syst. Saf. 2018, 172, 1–11.
Li, H.; Zhao, W.; Zhang, Y.; Zio, E. Remaining useful life prediction using multi-scale deep convolutional neural network. Appl. Soft Comput. 2020, 89, 106113.
Wu, Y.; Yuan, M.; Dong, S.; Lin, L.; Liu, Y. Remaining useful life estimation of engineered systems using vanilla LSTM neural networks. Neurocomputing 2018, 275, 167–179.
Zhang, H.; Zhang, Q.; Shao, S.; Niu, T.; Yang, X. Attention-based LSTM network for rotatory machine remaining useful life prediction. IEEE Access 2020, 8, 132188–132199.
Li, J.; Li, X.; He, D. A directed acyclic graph network combined with CNN and LSTM for remaining useful life prediction. IEEE Access 2019, 7, 75464–75475.
Ma, M.; Mao, Z. Deep-convolution-based LSTM network for remaining useful life prediction. IEEE Trans. Ind. Inf. 2020, 17, 1658–1667.
Wang, G.; Zhang, Y.; Lu, M.; Wu, Z. Hierarchical graph neural network with adaptive cross-graph fusion for remaining useful life prediction. Meas. Sci. Technol. 2023, 34, 055112.
Xiao, Y.; Cui, L.; Liu, D. Multi-graph attention fusion graph neural network for remaining useful life prediction of rolling bearings. Meas. Sci. Technol. 2024, 35, 106125.
Liu, H.; Liu, Z.; Jia, W.; Lin, X. Remaining useful life prediction using a novel feature-attention-based end-to-end approach. IEEE Trans. Ind. Inf. 2020, 17, 1197–1207.
Song, Y.; Gao, S.; Li, Y.; Jia, L.; Li, Q.; Pang, F. Distributed attention-based temporal convolutional network for remaining useful life prediction. IEEE Internet Things J. 2020, 8, 9594–9602.
Kong, Z.; Jin, X.; Xu, Z.; Zhang, B. Spatio-temporal fusion attention: A novel approach for remaining useful life prediction based on graph neural network. IEEE Trans. Instrum. Meas. 2022, 71, 3515912.
Zhang, D.; Stewart, E.; Entezami, M.; Roberts, C.; Yu, D. Intelligent acoustic-based fault diagnosis of roller bearings using a deep graph convolutional network. Measurement 2020, 156, 107585.
Chen, X.; Zeng, M. Convolution-graph attention network with sensor embeddings for remaining useful life prediction of turbofan engines. IEEE Sens. J. 2023, 23, 15786–15794.
Vrahatis, A.G.; Lazaros, K.; Kotsiantis, S. Graph attention networks: A comprehensive review of methods and applications. Future Internet 2024, 16, 318.
Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270.
Chen, Z.; Wu, M.; Zhao, R.; Guretno, F.; Yan, R.; Li, X. Machine remaining useful life prediction via an attention-based deep learning approach. IEEE Trans. Ind. Electron. 2020, 68, 2521–2531.
Zhang, C.; Lim, P.; Qin, A.K.; Tan, K.C. Multiobjective deep belief networks ensemble for remaining useful life estimation in prognostics. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 2306–2318.
Sateesh Babu, G.; Zhao, P.; Li, X.L. Deep convolutional neural network based regression approach for estimation of remaining useful life. In Database Systems for Advanced Applications, Proceedings of the 21st International Conference, DASFAA 2016, Dallas, TX, USA, 16–19 April 2016; Part I 21; Springer International Publishing: Cham, Switzerland, 2016; pp. 214–228.
Liao, Y.; Zhang, L.; Liu, C. Uncertainty prediction of remaining useful life using long short-term memory network based on bootstrap method. In 2018 IEEE International Conference on Prognostics and Health Management (ICPHM); IEEE: Piscataway, NJ, USA, 2018; pp. 1–8.
Yu, W.; Kim, I.Y.; Mechefske, C. Remaining useful life estimation using a bidirectional recurrent neural network based autoencoder scheme. Mech. Syst. Signal Process. 2019, 129, 764–780.
Wu, Z.; Yu, S.; Zhu, X.; Ji, Y.; Pecht, M. A weighted deep domain adaptation method for industrial fault prognostics according to prior distribution of complex working conditions. IEEE Access 2019, 7, 139802–139814.
Wang, L.; Cao, H.; Xu, H.; Liu, H. A gated graph convolutional network with multi-sensor signals for remaining useful life prediction. Knowl. Based Syst. 2022, 252, 109340.

Figure 1. GAT model: (a) the attention mechanism used in GAT, and (b) multi-head attention performed by node 1 on its neighboring nodes.

Figure 2. The basic structure of an LSTM cell.

Figure 3. The RUL prediction framework for turbofan engines based on GAT-LSTM.

Figure 6. Diagram of the turbofan engine.

Figure 7. The data tendency of all engines.

Figure 8. The process of time window processing.

Figure 9. RUL of a machine.

Figure 10. The results of the RUL predictions for the C-MAPSS dataset.

Figure 11. Testing examples of RUL predictive performance demonstration. (a) Test #20 of FD001; (b) Test #242 of FD002; (c) Test #46 of FD003; (d) Test #71 of FD004.

Table 1. Information of the C-MAPSS dataset.

Dataset	FD001	FD002	FD003	FD004
Engine units for training	100	260	100	249
Engine units for testing	100	259	100	248
Operating conditions	1	6	1	6
Fault modes	1	1	2	2

Table 2. Turbofan engine data set introduction.

Index	Symbol	Description	Units
1	T2	Total temperature at fan inlet	°R
2	T24	Total temperature at LPC outlet	°R
3	T30	Total temperature at HPC outlet	°R
4	T50	Total temperature at LPT outlet	°R
5	P2	Pressure at fan inlet	psia
6	P15	Total pressure in bypass-duct	psia
7	P30	Total pressure at HPC outlet	psia
8	Nf	Physical fan speed	rpm
9	Nc	Physical core speed	rpm
10	epr	Engine pressure ratio (P50/P2)	-
11	Ps30	Static pressure at HPC outlet	psia
12	phi	Ratio of fuel flow to Ps30	pps/psi
13	NRf	Corrected fan speed	rpm
14	NRc	Corrected core speed	rpm
15	BPR	Bypass Ratio	-
16	farB	Burner fuel-air ratio	-
17	htBleed	Bleed Enthalpy	-
18	Nf_dmd	Demanded fan speed	rpm
19	PCNfR_dmd	Demanded corrected fan speed	rpm
20	W31	HPT coolant bleed	lbm/s
21	W32	LPT coolant bleed	lbm/s

Table 3. Comparison with other models on the C-MAPSS dataset.

Methods	FD001 RMSE	FD001 Score	FD002 RMSE	FD002 Score	FD003 RMSE	FD003 Score	FD004 RMSE	FD004 Score
ELM [23]	17.27	523	37.28	498,150	18.90	573.78	38.43	121,414
CNN [24]	18.45	1286	30.29	13,570	19.82	1596	29.16	7886
MS-DCNN [25]	11.44	196.22	19.35	3747	11.67	241.89	22.22	4844
LSTMNN [26]	14.89	481	26.86	7982	15.11	493	27.11	5200
BiLSTM-ED [27]	14.74	273	22.07	3099	17.48	574	23.49	3202
CNN-LSTM [11]	14.40	290	27.23	9869	14.32	316	26.69	6594
AGCNN [28]	12.42	225.51	19.43	1492.76	13.39	227.09	21.50	3392.6
STFA [26]	11.35	194.44	19.17	2493.09	11.64	224.53	21.41	2760.13
HGNN-AGCF [29]	12.58	218.04	21.67	4584.97	12.40	248.47	22.43	2737.86
GAT-LSTM	11.78	214.12	18.22	1989.67	11.91	217.85	19.36	2152.34

Table 4. Comparative Results of Ablation Experiments.

Model Variant	FD001 RMSE	FD001 Score	FD002 RMSE	FD002 Score	FD003 RMSE	FD003 Score	FD004 RMSE	FD004 Score
GAT-LSTM-Prio	13.05	295.41	20.85	3540.33	12.80	268.12	21.45	4320.18
GAT-LSTM-Corr	12.31	252.15	20.38	3315.67	13.55	305.74	20.72	4024.56
GCN-LSTM	12.40	260.10	19.55	2750.50	12.85	255.20	20.38	3750.88
GAT-FC	14.55	380.28	21.80	4010.45	14.88	410.31	22.74	5450.92
GAT-LSTM	11.78	214.12	18.22	1989.67	11.91	217.85	19.36	2152.34

Table 5. Computational complexity comparison of different models.

Methods	Parameters	Training (s/Epoch)
ELM	12,284	0.82
CNN	84,752	4.18
MS-DCNN	155,892	7.45
LSTMNN	97,628	6.76
BiLSTM-ED	188,543	11.32
CNN-LSTM	214,967	14.75
AGCNN	241,879	16.18
STFA	197,531	13.48
HGNN-AGCF	264,952	18.76
GAT-LSTM	227,854	15.12


