Remaining Useful Life Prediction for Aero-Engines Based on Multi-Scale Dilated Fusion Attention Model

Xiao, Guosong; Jin, Chenfeng; Bai, Jie

doi:10.3390/app15179813


by Guosong Xiao ¹,², Chenfeng Jin ³ and Jie Bai ¹,²,*

¹ Key Laboratory of Civil Aircraft Airworthiness Technology, Civil Aviation University of China, Tianjin 300300, China

² Institute of Scientific and Technological Innovation, Civil Aviation University of China, Tianjin 300300, China

³ School of Aeronautical Engineering, Civil Aviation University of China, Tianjin 300300, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(17), 9813; https://doi.org/10.3390/app15179813

Submission received: 14 July 2025 / Revised: 3 September 2025 / Accepted: 4 September 2025 / Published: 7 September 2025

(This article belongs to the Section Mechanical Engineering)


Abstract

To address the limitations of CNNs and RNNs in handling complex operating conditions, multi-scale degradation patterns, and long-term dependencies—with attention mechanisms often failing to highlight key degradation features—this paper proposes a remaining useful life (RUL) prediction framework based on a multi-scale dilated fusion attention (MDFA) module. The MDFA leverages parallel dilated convolutions with varying dilation rates to expand receptive fields, while a global-pooling branch captures sequence-level degradation trends. Additionally, integrated channel and spatial attention mechanisms enhance the model’s ability to emphasize informative features and suppress noise, thereby improving overall prediction robustness. The proposed method is evaluated on NASA’s C-MAPSS and N-CMAPSS datasets, achieving MAE values of 0.018–0.026, RMSE values of 0.021–0.032, and R² scores above 0.987, demonstrating superior accuracy and stability compared to existing baselines. Furthermore, to verify generalization across domains, experiments on the PHM2012 bearing dataset show similar performance (MAE: 0.023–0.026, RMSE: 0.031–0.032, R²: 0.987–0.995), confirming the model’s effectiveness under diverse operating conditions and its adaptability to different degradation behaviors. This study provides a practical and interpretable deep-learning solution for RUL prediction, with broad applicability to aero-engine prognostics and other industrial health-monitoring tasks.

Keywords: deep learning; remaining useful life (RUL) prediction; multi-scale dilated fusion attention (MDFA) module; channel attention mechanism; spatial attention mechanism

1. Introduction

Failures of aircraft or gas turbine engines can cause substantial economic losses and, in extreme cases, threaten flight safety. Despite technological advances improving turbine–engine reliability, their complex mechanical structures remain vulnerable to multi-mode failures and abnormal degradation. The International Air Transport Association (IATA) reports that the average service life of commercial aircraft rose from 21.3 years in 2000 to 28.6 years by 2023, highlighting the growing importance of engine reliability. Similarly, AviationWeek projects the global engine maintenance, repair, and overhaul (MRO) market will reach USD 50.3 billion by 2025 (CAGR 4.7%). In this context, predictive maintenance has emerged as a proactive strategy to minimize unplanned downtime and cut maintenance costs. At its core lies evaluation of an engine’s current status and estimation of its remaining useful life (RUL)—the number of cycles or time it can operate normally before failure. Achieving accurate RUL predictions therefore depends on advanced health-monitoring systems and robust prognostic models.

Current RUL prediction approaches for aero-engines can be broadly classified into three categories: physics-based models, data-driven models, and expert system-based methods [1,2]. Physics-based models [3] establish detailed simulations of engine components to analyze failure mechanisms. For example, Saxena et al. [4] proposed a framework based on compressor efficiency degradation equations, while Liu et al. [5] applied Paris’ law to model turbine disk crack propagation under low-cycle fatigue via finite-element simulations. Although these methods enable insight into underlying physical mechanisms, they often incur high development and maintenance costs, depend heavily on idealized assumptions (e.g., steady-state conditions), and lack flexibility for capturing sudden failures under transient shock loads.

Data-driven approaches, by contrast, eliminate the need for detailed physical models or extensive prior knowledge [6], relying on machine learning and deep learning algorithms to uncover hidden patterns in historical monitoring data. Deep learning architectures, in particular, have demonstrated strong capability in extracting informative features from high-dimensional signals, thereby facilitating more accurate health assessments [7]. Such methods reduce the complexity associated with physics-based model construction and can deliver reliable RUL predictions [8] even when detailed degradation mechanisms are not fully understood. As a result, data-driven techniques have become a central focus within Prognostics and Health Management (PHM). However, data-driven methods still face challenges in handling diverse operating conditions and complex degradation patterns.

When operational or monitoring data are incomplete or unavailable, expert system-based methods offer an alternative by using statistical inference grounded in expert knowledge and historical maintenance records. Common statistical distributions include Poisson, Weibull, exponential, log-normal, and inverse Gaussian [9,10]. For instance, Liu et al. [11] employed lifecycle data from similar engines and current operational cycles to develop a Weibull-based RUL model within a reliability framework. Bai et al. [12] further extended this concept to a fleet-level model using inverse Gaussian distributions to analyze performance drift and failure sequences, estimating average engine lifespans at a macro scale. Although expert system approaches can perform adequately with sparse data, they often struggle to adapt when underlying operating conditions change significantly, limiting their general applicability.

Within the data-driven category, methods can be further subdivided into traditional machine learning techniques and deep learning-based approaches. Traditional machine learning models often face difficulties when processing high-dimensional data, leading to trade-offs between predictive accuracy and computational efficiency. Deep learning models, by contrast, exhibit superior feature extraction capabilities and have thus become the mainstream. Representative deep-learning methods include convolutional neural networks (CNNs) and long short-term memory networks (LSTMs), which have shown strong performance on large-scale operational datasets. For example, Li et al. [13] proposed a deep CNN for RUL estimation; Chui et al. [14] designed a sensor-time–based LSTM-RNN encoder–decoder model; and Wahid et al. [15] developed a Transformer-based architecture for adaptive degradation modeling. To further refine temporal feature representation, Shi et al. [16] introduced attention-based dynamic weighting to address memory decay, and Qin et al. [17] proposed a multi-scale fusion framework that extracts degradation features at various scales before feeding them into fully connected networks for final RUL prediction.

Despite these advances, existing data-driven approaches still exhibit notable limitations. First, traditional CNNs and RNNs excel at capturing local patterns but struggle to model long-term dependencies inherent in full-lifecycle degradation. Although LSTM and GRU variants partially alleviate gradient vanishing, they still suffer from low learning efficiency, hindering their ability to capture subtle, long-term degradation trends. Second, Transformer-based methods, while effective for modeling global dependencies, tend to be parameter-intensive and may show unstable performance when data are limited. Third, many models focus solely on single-scale feature extraction or global channel attention, neglecting selective emphasis on temporally critical regions; this oversight can lead to insufficient recognition of RUL-relevant patterns and high information redundancy, ultimately constraining both prediction accuracy and computational efficiency.

To address these challenges, this paper proposes a novel RUL prediction model for aero-engines based on a multi-scale dilated fusion attention (MDFA) mechanism. First, to overcome the limited receptive field of traditional CNNs, the model introduces multi-scale dilated convolution modules with parallel branches of varying dilation rates, thereby enhancing feature extraction across different temporal scales. Second, to capture locally important temporal signals often overlooked by existing methods, the model integrates both channel and spatial attention mechanisms, enabling selective emphasis on informative dimensions and time steps. This synergistic design enhances the depth and sensitivity of feature representation, alleviating constraints in long-term dependency modeling and redundancy control found in conventional approaches. Experimental validation demonstrates that the MDFA model achieves superior robustness, precision, and generalization performance across diverse degradation scenarios.

2. Methods

2.1. Multi-Scale Dilated Convolutional Network

Time-series data from multiple aero-engine sensors exhibit strong nonlinearity and temporal dependencies. Traditional CNNs enlarge the receptive field by stacking layers, which increases model complexity and can cause vanishing or exploding gradients. To address these issues, this study incorporates a multi-scale dilated convolution (DC) module, which enlarges the receptive field without deepening the network and thus improves long-range dependency capture for more accurate RUL prediction. As shown in Figure 1, the yellow blocks represent the temporal feature maps along the sequence at each layer, and the blue blocks mark the feature at the current time step (input, middle layers, and output). The network comprises three parallel convolutional branches with dilation rates of 1, 2, and 4, enabling extraction of degradation features across various temporal scales.

The receptive field of a dilated convolution grows exponentially with depth, mathematically described as Equation (1):

R_{N} = 1 + \sum_{i=1}^{N} (K - 1) \times (I_{i} - 1) \times \prod_{j=1}^{i} S_{j}

(1)

where N denotes the number of dilated convolutional layers, K denotes the kernel size of each layer, I_{i} is the dilation rate of the i-th layer, and S_{j} is the stride of the j-th layer.

Figure 1 illustrates a deep neural network architecture constructed by stacking multiple dilated convolutional layers. Each layer employs a convolutional kernel of size 1 × 3, which operates along the temporal dimension to capture local dependencies between adjacent input points while maintaining computational efficiency. Dilated convolutions expand the receptive field by inserting zeros between kernel elements [18], so as the dilation rate increases, the effective receptive field of the 1 × 3 kernel widens and the network can model long-range dependencies without increasing the number of parameters. The receptive field also enlarges progressively as the network deepens. The mathematical formulation is presented in Equation (2).

F_{\mathrm{out}}(t) = \sum_{k=0}^{K-1} W(k) \cdot X(t - τ \cdot k)

(2)

where F_{\mathrm{out}}(t) denotes the output feature at time step t, W(k) represents the weight of the k-th element of the convolution kernel, K is the kernel size, τ is the dilation rate, and X(t - τ \cdot k) indicates the input value at time step t − τ·k.
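Equation (2) can be illustrated with a minimal pure-Python sketch. The function names, the toy input, the all-ones weights, and the choice to treat samples before the start of the sequence as zero are assumptions made for illustration; they are not taken from the paper's implementation.

```python
def dilated_conv1d(x, w, tau):
    """Equation (2): out[t] = sum_k w[k] * x[t - tau * k]."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            idx = t - tau * k
            if idx >= 0:  # assumption: samples before the sequence start count as zero
                acc += wk * x[idx]
        out.append(acc)
    return out


def kernel_span(kernel_size, tau):
    """Number of time steps covered by a single dilated kernel."""
    return tau * (kernel_size - 1) + 1


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # toy sequence
w = [1.0, 1.0, 1.0]  # hypothetical weights: a plain sum over the three taps
for tau in (1, 2, 4):
    print(tau, kernel_span(3, tau), dilated_conv1d(x, w, tau))
```

With three taps, the same number of weights covers 3, 5, and 9 time steps for dilation rates 1, 2, and 4, which is the sense in which the receptive field widens without adding parameters.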

2.2. Dual Attention Channels

The dual attention mechanism comprises channel and spatial attention modules.

For clarity of the schematic in Figure 2, the green blocks indicate the three parallel 2D convolutional branches, the orange block denotes the global average-pooling branch, and the light-green stack represents the merged multi-scale feature maps (H × W × C). The light-yellow and yellow bars correspond to the channel-attention gating MLP (ReLU→Sigmoid), the multicolored square depicts the spatial attention map, and the final multicolored cube represents the recalibrated output tensor. Colors are for visual distinction only and carry no quantitative meaning.

For channel attention, global average pooling is first applied to the merged feature maps to capture global statistical descriptors of each channel, after which a 1 × 1 convolution generates channel-wise attention weights that are normalized via a Sigmoid activation to emphasize informative channels. In the spatial attention module, a 1 × 1 convolution compresses the feature maps along the channel axis to produce a spatial attention map, which is similarly normalized by a Sigmoid function to highlight critical time steps in the sequence. The Sigmoid function is adopted because it maps values into the range [0, 1], which is well-suited for representing attention weights as probabilities and ensures numerical stability during training. In contrast, functions such as Tanh or ReLU could introduce negative values or unbounded outputs, making the attention weights less interpretable and potentially hindering convergence. These attention weights are then broadcast and multiplied element-wise with the input features across channels, achieving channel-wise reweighting, as formulated in Equations (3) and (4).

Z_{C} = σ(W_{2} \times δ(W_{1} \times \mathrm{AvgPool}(U)))

(3)

U^{'} = U \otimes Z_{C}

(4)

The spatial attention mechanism is computed as shown in Equation (5):

S = f_{1 \times 1} (U), S \in R^{B \times 1 \times H \times W}

(5)

The Sigmoid function is used to normalize the weights into the range [0, 1], as expressed in Equation (6):

Z_{S} = σ (S), Z_{S} \in {[0, 1]}^{B \times 1 \times H \times W}

(6)

Next, the spatial attention map is broadcast to the original number of channels and multiplied element-wise with the input features to generate the spatially weighted output, as described in Equation (7):

U^{″} = U \otimes Z_{S}

(7)

where \mathrm{AvgPool}(·) denotes global average pooling; W_{1} and W_{2} are the weight matrices of the fully connected layers in the channel attention block; δ and σ denote the ReLU and Sigmoid activation functions, respectively; f_{1 \times 1}(·) indicates a 1 × 1 convolution; and \otimes represents element-wise multiplication after the attention weights are broadcast to the shape of U.
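The following is a small sketch of Equations (3) to (7) on a toy feature map, written in plain Python so that each step is visible. It simplifies the H × W spatial grid to a single time axis, omits the batch dimension, and uses hand-picked weights; the function names and all numeric values are hypothetical and serve only to show the data flow.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(U, W1, W2):
    """Equations (3)-(4). U is a C x T list of lists; W1 is r x C; W2 is C x r."""
    pooled = [sum(row) / len(row) for row in U]  # AvgPool(U): one value per channel
    hidden = [max(0.0, sum(w * p for w, p in zip(wr, pooled))) for wr in W1]  # ReLU
    z_c = [sigmoid(sum(w * h for w, h in zip(wr, hidden))) for wr in W2]
    reweighted = [[z_c[c] * v for v in U[c]] for c in range(len(U))]
    return reweighted, z_c


def spatial_attention(U, w):
    """Equations (5)-(7). w holds one weight per channel (a bias-free 1 x 1 convolution)."""
    steps = len(U[0])
    s = [sum(w[c] * U[c][t] for c in range(len(U))) for t in range(steps)]
    z_s = [sigmoid(v) for v in s]
    reweighted = [[z_s[t] * U[c][t] for t in range(steps)] for c in range(len(U))]
    return reweighted, z_s


U = [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]]  # toy input: 2 channels, 3 time steps
W1 = [[1.0, 0.0], [0.0, 1.0]]           # hypothetical identity weights
W2 = [[1.0, 0.0], [0.0, 1.0]]
u_channel, z_c = channel_attention(U, W1, W2)
u_spatial, z_s = spatial_attention(U, [1.0, -1.0])
print(z_c)
print(z_s)
```

Both weight vectors stay strictly between 0 and 1 because of the Sigmoid, so multiplying them back into U can only attenuate features, never flip their sign.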

The outputs of the channel and spatial attention modules are fused using element-wise maximum, forming a joint attention feature map, as defined in Equation (8):

F_{\mathrm{fused}} = \max(F_{\mathrm{channel}}, F_{\mathrm{spatial}})

(8)

where F_{\mathrm{spatial}} and F_{\mathrm{channel}} denote the spatially and channel-weighted features, respectively. The fused attention is further modulated with the original merged features via element-wise multiplication, as shown in Equations (9) and (10):

F_{\mathrm{enhanced}} = F_{\mathrm{fused}} \cdot F_{u}

(9)

F_{u} = \mathrm{Concat}(F_{r-1}, F_{r-2}, F_{r-4}, F_{\mathrm{global}})

(10)

where F_{r-1}, F_{r-2}, and F_{r-4} are the outputs of the dilated branches with dilation rates 1, 2, and 4, and F_{\mathrm{global}} is the output of the global average-pooling branch.

A final 1 × 1 convolution layer is applied to reduce dimensionality and integrate features, producing the final output representation.
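As a rough illustration of Equations (8) and (9), the sketch below fuses two already-weighted feature maps with an element-wise maximum and then modulates the merged features. The function name and the 2 × 2 toy values are hypothetical, and the final 1 × 1 convolution is left out.

```python
def fuse_attention(f_channel, f_spatial, f_u):
    """Element-wise max fusion (Eq. 8) followed by element-wise modulation (Eq. 9)."""
    fused = [[max(a, b) for a, b in zip(row_c, row_s)]
             for row_c, row_s in zip(f_channel, f_spatial)]
    enhanced = [[f * u for f, u in zip(row_f, row_u)]
                for row_f, row_u in zip(fused, f_u)]
    return fused, enhanced


f_channel = [[0.2, 0.9], [0.5, 0.1]]  # toy channel-weighted features
f_spatial = [[0.6, 0.3], [0.4, 0.8]]  # toy spatially weighted features
f_u = [[1.0, 2.0], [3.0, 4.0]]        # toy merged multi-scale features
fused, enhanced = fuse_attention(f_channel, f_spatial, f_u)
print(fused)
print(enhanced)
```

Taking the maximum keeps, at every position, whichever attention path responded more strongly, so a feature is preserved if either the channel view or the spatial view considers it important.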

2.3. MDFA-Based RUL Prediction Model Architecture

Figure 2 presents the overall structure of the proposed multi-scale dilated fusion attention (MDFA) network. This model integrates multi-scale dilated convolutions with dual attention mechanisms to deeply mine degradation patterns from bearing vibration signals. The first module comprises three parallel convolutional branches, each configured with a distinct dilation rate to capture spatial features at different receptive fields. Through this multi-scale structure, the model effectively extracts representative features from the input sensor data across various temporal scales.

In the second stage of the MDFA network, feature aggregation is achieved through both channel and spatial attention mechanisms. The channel attention mechanism first applies global average pooling to the feature maps from the three parallel convolutional branches (each with a different dilation rate), producing a global descriptor for each channel. Two fully connected layers, activated by ReLU and Sigmoid, respectively, then learn channel importance weights, which are multiplied element-wise with the original features along the channel dimension to enhance informative channels. The ReLU activation introduces nonlinearity and prevents negative responses in the hidden layer, thereby improving feature representation capacity, while the subsequent Sigmoid activation constrains the learned weights to the range [0, 1], making them interpretable as attention coefficients. If alternative activations were used in the output layer, such as Tanh or ReLU, the weights might become negative or unbounded, which would weaken their suitability for attention modeling and hinder network stability. Simultaneously, the spatial attention mechanism compresses the multi-scale features along the channel axis, using a 1 × 1 convolution followed by a Sigmoid activation to generate spatial attention weights that emphasize critical time steps. These weights are broadcast and multiplied element-wise with the input features for spatial reweighting. The outputs of both attention modules are fused via an element-wise maximum, consistent with Equation (8), to yield an enhanced feature representation, which is passed through a final 1 × 1 convolution layer for dimensionality reduction and feature integration, producing the network’s deep output features.

Although the input consists of 1D time-series sensor signals, we reshape the data into a two-dimensional format (time × feature/channel), which allows the convolution kernels to simultaneously capture temporal dependencies and inter-sensor correlations. Compared with strictly 1D convolutions, the 2D formulation provides richer local feature interactions across both dimensions while remaining computationally efficient, thereby better aligning with the multiscale attention design of MDFA.
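One common way to obtain such time × feature inputs is a sliding window over each engine's sensor record. The sketch below assumes that approach; the function name, the window length of 3, and the stride of 1 are placeholders rather than values taken from the paper.

```python
def sliding_windows(series, window, stride=1):
    """Cut a T x F record (list of rows) into overlapping window x F matrices."""
    return [series[start:start + window]
            for start in range(0, len(series) - window + 1, stride)]


# Toy record: 5 time steps, 2 sensor features per step.
series = [[float(t), 10.0 * t] for t in range(5)]
windows = sliding_windows(series, window=3)
print(len(windows))
print(windows[0])
```

Each window is a small 2D array whose rows are consecutive time steps and whose columns are sensors, so a 2D kernel sliding over it mixes information along both axes.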

3. Experimental Procedure

3.1. The Settings of Model Parameters

The proposed MDFA model was validated on benchmark datasets for aero-engine RUL prediction, specifically using NASA’s C-MAPSS dataset, which contains sensor data from gas-path and mechanical subsystems of aircraft engines. Through extensive experiments, the optimal dilation-rate configuration was identified as {1, 2, 4}, balancing the extraction of local details with modeling of global degradation trends while avoiding redundant computations from overlapping receptive fields. Compared to other configurations, such as {2, 4, 8}, {1, 3, 5}, and {1, 2, 4, 8}, the {1, 2, 4} setup maintains greater feature diversity without unnecessary complexity. The detailed hyperparameter settings are listed in Table 1.

Although a comprehensive ablation analysis is not included, we observed in repeated preliminary experiments that using larger dilation rates (e.g., {2, 4, 8}) often led to oversmoothing, where short-term degradation cues were diluted, while irregular intervals, such as {1, 3, 5}, produced less consistent receptive-field coverage and unstable convergence. In contrast, {1, 2, 4} not only yielded lower validation loss on multiple subsets but also provided a more interpretable multi-scale representation (fine, medium, and coarse temporal resolution). These empirical findings support the choice of {1, 2, 4} as a practical and effective configuration for this task.
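To make the coverage argument concrete, the sketch below lists which time lags each candidate dilation set can sample, assuming a kernel of size 3 along the time axis (as in the 1 × 3 kernel described for Figure 1). It only enumerates tap positions under that assumption; it does not reproduce the validation results mentioned above, and the helper names are hypothetical.

```python
def branch_taps(tau, kernel_size=3):
    """Time lags sampled by one branch: 0, tau, 2*tau, ..."""
    return [tau * k for k in range(kernel_size)]


def summarize(rates, kernel_size=3):
    """Per-branch span and the union of lags sampled by all branches."""
    spans = [tau * (kernel_size - 1) + 1 for tau in rates]
    lags = sorted({lag for tau in rates for lag in branch_taps(tau, kernel_size)})
    return spans, lags


for rates in [(1, 2, 4), (2, 4, 8), (1, 3, 5), (1, 2, 4, 8)]:
    spans, lags = summarize(rates)
    print(rates, spans, lags)
```

Under these assumptions, {2, 4, 8} never samples the immediately preceding time step (lag 1), which fits the remark that short-term cues are diluted, whereas {1, 2, 4} covers lags that double from 1 to 8.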

The hyperparameter values in Table 1 were determined through a combination of grid search and empirical tuning on the training and validation sets. We first adopted commonly used settings in related RUL prediction studies as initial references, and then adjusted parameters such as learning rate, dropout, and window size based on validation performance to achieve a balance between model accuracy and generalization.

The MDFA model was trained with an initial learning rate of 0.0001, which was dynamically reduced during training to speed up convergence and avoid getting stuck in suboptimal minima. A dropout rate of 0.3 was applied after key layers to prevent overfitting and improve generalization on unseen operating conditions. We used the Adam optimizer, which was chosen for its adaptive learning-rate adjustments and robustness when handling sparse gradients and large datasets, and adopted mean squared error (MSE) as the loss function to penalize large deviations between predicted and actual RUL values. Table 2 summarizes the input–output tensor dimensions, where B is the batch size, C is the per-time-step feature dimension, and H is the sequence length.

Within the multi-scale dilated convolution module, three parallel branches with dilation rates {1, 2, 4} and 3 × 3 kernels process single-channel input, forming receptive fields that capture degradation features at different temporal scales without excessively deepening the network. A fourth branch applies global average pooling to aggregate long-term trends across the entire sequence. These four outputs are concatenated and fused to ensure consistent dimensionality before passing to the dual attention stage. The channel attention submodule learns channel-wise importance via global pooling followed by two fully connected layers (ReLU then Sigmoid), reweighting each feature channel. Simultaneously, the spatial attention submodule applies global pooling along the channel axis, followed by a 1 × 1 convolution and Sigmoid activation to highlight critical time steps. By multiplying these attention weights back into the fused feature maps, the network emphasizes the most informative dimensions and temporal locations. Finally, a 1 × 1 convolution compresses the feature map, reducing computational cost while preserving the enriched, multi-scale representation needed for accurate RUL prediction. This design balances local detail extraction, global trend modeling, and computational efficiency.

3.2. Experimental Introduction

3.2.1. Dataset Description

The NASA C-MAPSS [19] (Commercial Modular Aero-Propulsion System Simulation) dataset was employed to validate the effectiveness of the proposed method. C-MAPSS is a high-fidelity computer model used to simulate the degradation of large commercial turbofan engines. It includes atmospheric models, allowing simulations under various conditions: (1) altitudes ranging from 0 to 40,000 feet, (2) flight Mach numbers from 0 to 0.90, and (3) sea-level temperatures from −60 to 103 °F.

As shown in Table 3, the C-MAPSS dataset comprises four subsets, each simulating engine degradation under different fault modes and operating conditions.

In real-world applications, performance degradation in engines is generally not evident during the early operational phase. However, as operational time increases, engine health progressively deteriorates [20]. To reflect this behavior, a piecewise linear function is employed to annotate the remaining useful life (RUL) of each sample. Specifically, during the early stage of engine operation, the components maintain high performance with minimal degradation, resulting in a constant RUL value.

According to the RUL annotation strategy described above and the method proposed by Al-Khazraji et al. [21], the RUL for each data sample is calculated as shown in Equation (11):

RUL_{i} = \begin{cases} RUL_{\max}, & C_{\max} - C_{t} \geq RUL_{\max} \\ C_{\max} - C_{t}, & 0 \leq C_{\max} - C_{t} < RUL_{\max} \end{cases}

(11)

where RUL_{i} represents the RUL label assigned to the i-th sample, RUL_{\max} is the constant initial RUL used during the early stable stage, C_{\max} represents the maximum number of operational cycles for a given engine unit, and C_{t} denotes the current time step within its life cycle. The difference C_{\max} − C_{t} corresponds to the remaining number of operational cycles from time step t.

In RUL prediction tasks, setting an appropriate initial RUL value is crucial for effective model training and accurate forecasting. Following the approach in [22], the initial RUL is assigned a value within the range of 120 to 130, which aligns with the health characteristics of engine components during the operational stage. Accordingly, this study set the initial RUL to 125. The linear degradation curves in Figure 3 were derived directly from the C-MAPSS dataset by applying the piecewise linear RUL labeling strategy commonly used in RUL prediction studies. Specifically, the RUL was set to a constant initial value (125 cycles in this study) during the early stable stage, and then decreased linearly with each subsequent cycle until failure, thereby generating the straight-line degradation trajectories shown in the plots.
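The piecewise linear labeling can be sketched in a few lines, assuming a run-to-failure trajectory whose cycles are numbered from 1 and using the initial RUL of 125 adopted here. The function name and the 200-cycle example unit are hypothetical.

```python
def piecewise_rul(total_cycles, rul_max=125):
    """RUL label for cycles 1..total_cycles of one run-to-failure unit."""
    # Remaining cycles are total_cycles - t, capped at rul_max during the early stage.
    return [min(rul_max, total_cycles - t) for t in range(1, total_cycles + 1)]


labels = piecewise_rul(200)  # hypothetical unit that fails at cycle 200
print(labels[0], labels[74], labels[75], labels[-1])
```

For this example unit the label stays at 125 through cycle 75 and then decreases by one per cycle until it reaches 0 at failure.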

Figure 3a–d show the RUL trajectories of engine 3 in the FD001 through FD004 subsets. In each case, the early phase (highlighted in green) exhibits a flat plateau where the RUL remains constant at 125, corresponding roughly to cycles 54, 81, 97, and 182 for the respective subsets. This plateau reflects a quasi-steady-state region in which engine degradation is negligible and operating conditions are stable. Once the operational cycle surpasses a critical turning point (marked by blue dots), RUL begins to decline almost linearly, signaling the transition from stable operation to progressive degradation. The linear descent often corresponds to the accumulation of wear or damage mechanisms, such as material fatigue or erosion, that intensify once certain thresholds are crossed. Eventually, the RUL reaches zero, denoting failure. Identifying this turning point is crucial for prognostic models, as it delineates the boundary between normal operation and active degradation; accurate detection of this inflection can markedly improve RUL prediction by focusing the model’s attention on features that emerge only after degradation initiates.

3.2.2. C-MAPSS Data Preprocessing

Raw sensor signals in aero-engine RUL modeling often contain noise, redundant information, and strong inter-variable coupling, which can increase model complexity and degrade both prediction accuracy and generalization performance [23]. To mitigate these issues, Principal Component Analysis (PCA) is applied to the multi-dimensional sensor data from the C-MAPSS dataset for dimensionality reduction. PCA transforms the original correlated variables into a set of orthogonal principal components, retaining the majority of the information while reducing redundancy. The optimal number of principal components is selected based on the cumulative contribution rate. Figure 4 presents the contribution rates of the sensor features across each sub-dataset, guiding the choice of components that capture sufficient variance without unnecessary complexity.
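Selecting components by cumulative contribution rate can be sketched as follows, given the eigenvalues of the sensor covariance matrix. The eigenvalues and the thresholds below are made-up illustration values; the text does not state which threshold was used.

```python
def components_for_threshold(eigenvalues, threshold=0.95):
    """Smallest number of components whose cumulative contribution reaches threshold."""
    # threshold=0.95 is a hypothetical default, not a value reported in the paper.
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    cumulative = 0.0
    rates = []
    for count, value in enumerate(ordered, start=1):
        cumulative += value / total
        rates.append(cumulative)
        if cumulative >= threshold:
            return count, rates
    return len(ordered), rates


eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]  # toy variances along five principal axes
count, rates = components_for_threshold(eigenvalues, threshold=0.9)
print(count, rates)
```

In this toy case the first four components explain 93.75% of the variance, so a 90% threshold keeps four of the five components.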

To ensure that the selected features are applicable across various operating conditions and fault modes, the sensor contribution rates across all subdatasets were comprehensively analyzed. A set of key sensor variables that consistently exhibit significant contributions across different scenarios was selected, as summarized in Table 4.

To evaluate the effectiveness of the proposed approach, three commonly used performance metrics were adopted: root mean squared error (RMSE), mean absolute error (MAE), and the scoring function (Score). These three metrics were chosen because they are the most widely used in the prognostics community—particularly in the PHM Data Challenge—and provide complementary perspectives: RMSE emphasizes large errors, MAE reflects average prediction deviation, and Score incorporates asymmetric penalization for early versus late predictions. Compared with variance, confidence intervals, or statistical significance tests, which primarily quantify result uncertainty, these three measures directly capture predictive accuracy and are thus more suitable for fair benchmarking against prior studies. These metrics are defined as follows:

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i}^{p} - y_{i}^{t})}^{2}}

(12)

MAE = \frac{1}{N} \sum_{i = 1}^{N} | y_{i}^{p} - y_{i}^{t} |

(13)

Score = \begin{cases} \sum_{i=1}^{N} \left( \exp\left(-\frac{e_{i}}{13}\right) - 1 \right), & e_{i} < 0 \ (\text{early prediction}) \\ \sum_{i=1}^{N} \left( \exp\left(\frac{e_{i}}{10}\right) - 1 \right), & e_{i} > 0 \ (\text{late prediction}) \end{cases}

(14)

where N denotes the total number of test samples, and e_{i} = \hat{y}_{i} - y_{i} is the prediction error, with \hat{y}_{i} (written y_{i}^{p} in Equations (12) and (13)) representing the predicted RUL and y_{i} (written y_{i}^{t}) the true RUL. A negative error (e_{i} < 0) corresponds to an early prediction (the model predicts a shorter life than reality), whereas a positive error (e_{i} > 0) corresponds to a late prediction (the model predicts a longer life than reality). The constants (13 for early prediction and 10 for late prediction) control the penalty strength, so a late prediction is penalized more heavily than an early prediction of the same magnitude. A smaller Score indicates smaller errors and less severe penalties, thereby reflecting better predictive performance.
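The three metrics in Equations (12) to (14) can be written directly in plain Python. The function names and the two-sample example are illustrative; a zero error falls into the second branch here, where it contributes nothing to the Score.

```python
import math


def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def score(pred, true):
    """Asymmetric scoring function: late predictions cost more than early ones."""
    total = 0.0
    for p, t in zip(pred, true):
        e = p - t  # prediction error: predicted minus true RUL
        total += (math.exp(-e / 13) - 1) if e < 0 else (math.exp(e / 10) - 1)
    return total


pred = [100.0, 50.0]  # toy predictions: one late by 10 cycles, one early by 10 cycles
true = [90.0, 60.0]
print(rmse(pred, true), mae(pred, true), score(pred, true))
```

Both toy errors have magnitude 10, so RMSE and MAE treat them identically, while the Score assigns the late prediction a larger penalty than the early one.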

3.2.3. Experimental Analysis

The proposed MDFA model was evaluated on the C-MAPSS dataset. Due to computational constraints, RUL prediction results of a subset of engines were randomly selected for visualization. Figure 5 illustrates the comparison between predicted and actual RUL values for selected engine units across different subdatasets. The predicted values are outputs of the model at each time step, while the corresponding RUL files provide the ground truth labels.

Figure 5a–d compare the RUL prediction performance of the MDFA model across all four subdataset test sets. It is evident that prediction errors in FD002 and FD004 exceeded those in FD001 and FD003, which can be traced back to the operational complexity: FD001 and FD003 each encompass a single, relatively simple operating mode, whereas FD002 and FD004 involve six distinct operating scenarios. This added variability not only introduced sudden shifts in sensor patterns but also altered degradation trajectories, making it more challenging for any model to generalize. As a result, the MDFA’s multiscale receptive fields and attention modules, designed to capture both local and global features, still exhibited larger deviations under these heterogeneous conditions. Nonetheless, even in FD002 and FD004, the predicted RUL curves follow the ground truth trends closely, demonstrating that MDFA maintains strong robustness. Occasional spikes in error typically coincide with abrupt operating condition changes, suggesting that future work could benefit from incorporating explicit operating status encoding or online domain adaptation strategies to further reduce these deviations. While Figure 5 mainly provides a qualitative comparison, the purpose of this visualization is to illustrate overall prediction trends and the robustness of the proposed model under different operating conditions. A detailed quantitative error distribution analysis will be considered in our future work to complement the visual results.

To illustrate individual engine performance, Figure 6 visualizes the RUL prediction for the third engine in each subdataset (FD001–FD004). In FD001 and FD003, where operating modes remained consistent, the predicted and actual curves overlap almost entirely, indicating that MDFA successfully captured gradual degradation without being distracted by noise or redundant information. For FD002 and FD004, the trajectories still align closely overall, but occasional misalignments correspond to points where the engine’s operating mode shifted, reflecting transient sensor behavior that momentarily masked degradation signals. This observation underlines the importance of the MDFA’s attention mechanisms: by selectively reweighting informative channels and time steps, the model minimizes the impact of non-degradation-related fluctuations. Across all four cases, MDFA demonstrated high prediction accuracy and stability, confirming its effectiveness in both simple and complex operating environments.

The experimental results across the four subdatasets indicate that the MDFA model demonstrates strong agreement between the predicted RUL curves and the actual degradation trajectories, with the majority of predictions falling within the 95% confidence interval. This reflects the model’s robust generalization ability and high prediction stability. Specifically, in FD001 and FD003—which involve single operating conditions and fault types—the MDFA model accurately captured the degradation trends of the engine units, yielding predictions closely aligned with the ground truth. These results underscore the model’s capability in temporal feature extraction and sequence modeling under relatively simple conditions.

In contrast, for FD002 and FD004, which contain multiple operating regimes and compound fault scenarios, the model exhibits minor prediction fluctuations at certain time steps. Nevertheless, the overall degradation patterns remain consistent with the actual RUL curves, indicating strong adaptability and resilience to complex operational variations and fault couplings. In Figure 6, the third engine from each subdataset was chosen as a representative case to illustrate the prediction process in a clear and consistent manner across datasets. This specific selection does not affect the generality of our conclusions, as similar trends were observed for other engines during our experiments. The choice of a single engine per dataset was made primarily for clarity of presentation, avoiding overly cluttered figures while still conveying the model’s typical prediction behavior. It is worth noting that the predicted RUL curves occasionally lag behind the true degradation trajectories. This lag arises from the model’s conservative learning of degradation dynamics and acts as a safeguard against premature failure alarms, which can be beneficial in predictive maintenance scenarios. A more detailed quantification of this lag effect will be considered in future work.

In summary, the MDFA model not only achieved high prediction accuracy under simple operating conditions but also maintained reliable performance under more complex settings, thus confirming its broad applicability to practical RUL prediction tasks in aerospace systems.

3.2.4. Comparative Experiment

To further evaluate the proposed MDFA model’s effectiveness, we compared its performance with several advanced deep learning approaches, including deep convolutional neural networks (DCNNs), temporal convolutional networks (TCNs) [24], trend-aware fully convolutional networks (TaFCNs) [25], squeeze and excitation networks (SeNets) [26], standard convolutional neural networks (CNNs), gated recurrent units (GRUs), and various hybrid architectures. Table 5 presents the root mean squared error (RMSE) and Score metrics for all models across the four C-MAPSS subdatasets. MDFA consistently outperformed these baselines. In terms of RMSE, it achieved relative improvements of 7.6 percent on FD001, 3.9 percent on FD002, 1.1 percent on FD003 and 3.1 percent on FD004. For the Score metric, its gains were 0.5 percent on FD001, 7.4 percent on FD002, 2.3 percent on FD003, and 18.8 percent on FD004. The performance gap was most pronounced on FD002 and FD004, which involve more complex operating conditions, varied fault patterns and different training sample sizes. In contrast, FD001 and FD003 presented simpler operating scenarios, leading to more accurate predictions across all models. These results demonstrate that MDFA not only achieves higher accuracy in predicting remaining useful life, but also generalizes reliably in both simple and complex degradation environments, highlighting its strong potential for practical use in aero-engine health prognostics.
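
For readers less familiar with the two metrics compared here, the following minimal sketch shows how they can be computed. It assumes that the Score metric is the standard asymmetric C-MAPSS scoring function from the PHM08 challenge, which penalizes late predictions more heavily than early ones; the RUL values and the function names `rmse` and `cmapss_score` are hypothetical and chosen only for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted RUL values."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)

def cmapss_score(y_true, y_pred):
    """Asymmetric score (assumed PHM08 form): late predictions cost more than early ones."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        d = p - t  # d > 0 means the model over-estimates the remaining life
        if d >= 0:
            total += math.exp(d / 10.0) - 1.0
        else:
            total += math.exp(-d / 13.0) - 1.0
    return total

# Hypothetical RUL values in cycles, for illustration only.
true_rul = [112, 98, 69, 82, 91]
early = [t - 10 for t in true_rul]  # every prediction 10 cycles too early
late = [t + 10 for t in true_rul]   # every prediction 10 cycles too late

print(rmse(true_rul, early), rmse(true_rul, late))
print(cmapss_score(true_rul, early), cmapss_score(true_rul, late))
```

Both hypothetical predictors have the same RMSE, yet the late predictor receives the larger Score, which is why the two metrics can rank models differently.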

3.3. Experimental Analysis on the N-CMAPSS Dataset

3.3.1. Dataset Description

To further assess the effectiveness and generalizability of the proposed MDFA model, we conducted additional RUL prediction experiments on the N-CMAPSS dataset. The N-CMAPSS dataset [27] comprises eight subdatasets, aggregating operational data from 128 aircraft engine units under various degradation modes. These fault scenarios affect critical engine components, including the fan, low/high-pressure compressors, and low/high-pressure turbines, primarily impacting their flow and efficiency. Each subdataset is stored in a file and consists of two distinct parts: the development set and the test set. Both parts include six categories of variables: operational settings w, measured signals, virtual sensor readings, engine health parameters θ, remaining useful life, and auxiliary monitoring indicators. In this study, we focused on the DS02 subdataset, which provides complete degradation trajectories of 10 engine units from their healthy state to system failure. This dataset served as a critical benchmark for evaluating the RUL prediction capability of the proposed model.

3.3.2. Data Processing Procedure

The development and test sets were first loaded using appropriate data handling libraries and then concatenated to construct a unified input set. To evaluate the relevance of each sensor variable to the RUL prediction task, we employed the ExtraTreesRegressor algorithm to compute the feature importance scores, with the ranked results illustrated in Figure 7.

In accordance with the threshold suggested by [28], we selected variables with importance coefficients greater than 0.01 as model input features. This ensures that the selected variables contribute significantly to the predictive performance and provide sufficient informational content. The final selected features, their importance scores, and corresponding physical interpretations are summarized in Table 6.
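
The screening step described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the sensor names `s0` to `s5`, the sample size, and the number of trees are assumptions, while `ExtraTreesRegressor` and the 0.01 threshold come from the text.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
names = [f"s{i}" for i in range(6)]      # hypothetical sensor names
X = rng.normal(size=(500, len(names)))   # synthetic stand-in for sensor readings
# Synthetic target that depends only on the first two columns.
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.normal(size=500)

model = ExtraTreesRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importance = dict(zip(names, model.feature_importances_))

THRESHOLD = 0.01  # importance cut-off quoted in the text
ranked = sorted(importance.items(), key=lambda kv: -kv[1])
selected = [name for name, value in ranked if value > THRESHOLD]
print(selected)
```

Because impurity-based importances are relative, the number of variables that pass a fixed threshold depends on how many candidate variables are included in the ranking.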

To comprehensively evaluate the predictive performance of the proposed model, we compared it against several state-of-the-art deep learning models, including fully connected networks, convolutional neural networks, and gated recurrent unit (GRU) models. Prediction quality was quantified with two metrics, MAE and RMSE.

3.3.3. Comparative Experiment Analysis

Figure 8 shows the RUL predictions on the N-CMAPSS DS02 subset, where each model’s output is compared with the ground truth. While all models captured the overall degradation trend, MDFA outperformed them by fusing multi-scale features and focusing on critical local information, yielding prediction curves that align more closely with actual degradation paths. Its enhanced feature extraction and temporal representation demonstrate robustness in complex scenarios.

To further validate the effectiveness of the proposed MDFA model for remaining useful life (RUL) prediction, a series of comparative experiments were conducted against several representative deep learning models, including TCN, TaFCN, DCNN, SeNet, CNN, GRU, and their hybrid architectures. The results are shown in Table 7.

MDFA consistently outperforms baseline models in RMSE and MAE. GRU captures long-range dependencies but misses critical local features, limiting accuracy. CNN-GRU hybrids improve local pattern extraction but lack depth for long-term trends. SeNet’s channel-wise attention enhances feature focus, and TaFCN’s temporal convolution with attention highlights key temporal features; however, both struggle with multi-scale cyclical degradation patterns. In contrast, MDFA’s multi-scale feature fusion combined with selective attention more precisely extracts critical degradation patterns, delivering robust, generalizable RUL predictions in complex scenarios.

3.4. Experimental Study Based on Bearing Dataset

3.4.1. Dataset Description

To assess generalization beyond aero-engine data, we applied the MDFA model to the PHM2012 bearing dataset, a standard benchmark for rotating machinery RUL prediction. The PHM2012 data were collected on the PRONOSTIA accelerated degradation test platform [29], which synchronously records full-lifecycle bearing data via acceleration and temperature sensors; vibration signals were sampled at 25.6 kHz (temperature at 10 Hz), with 0.1 s segments captured every 10 s. Experiments were terminated when the vibration amplitude exceeded 20 g, simulating bearing failure [30]. The dataset comprised 17 degradation sequences under three operating conditions (Table 8). Previous research indicates that horizontal vibrations more sensitively reflect bearing degradation than vertical ones [31]; consequently, we used horizontal vibration data for RUL prediction. Under Operating Condition 1, bearing 2_1 was chosen as a representative case, and its full-lifecycle horizontal vibration signal is shown in Figure 9.
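
The acquisition scheme above fixes the size of each vibration snapshot and the time base of the run-to-failure record. A short sketch of that arithmetic follows; the helper names and the assumption that failure coincides with the final recording are illustrative rather than taken from the paper.

```python
SAMPLING_RATE_HZ = 25_600  # vibration sampling rate stated in the text
SNAPSHOT_S = 0.1           # length of each recorded segment
PERIOD_S = 10              # one segment is recorded every 10 s

# Number of vibration samples in a single 0.1 s snapshot.
samples_per_snapshot = round(SAMPLING_RATE_HZ * SNAPSHOT_S)

def elapsed_time_s(snapshot_index):
    """Start time of a snapshot, counted from the beginning of the test."""
    return snapshot_index * PERIOD_S

def linear_rul_s(snapshot_index, n_snapshots):
    """Time left until the last snapshot, assuming failure at the final recording."""
    return (n_snapshots - 1 - snapshot_index) * PERIOD_S

print(samples_per_snapshot)   # 2560
print(linear_rul_s(0, 100))   # 990
```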

The lifecycle of a bearing can typically be divided into four distinct stages:
(1) Stable health stage: the vibration signal amplitude remains relatively low, indicating that the bearing operates under normal, healthy conditions.
(2) Incipient defect stage: as time progresses, minor internal defects begin to develop, leading to a gradual increase in the vibration signal amplitude.
(3) Severe fault propagation stage: the defects become more pronounced, and the vibration amplitude continues to rise significantly as the bearing deteriorates further.
(4) Late degradation stage: the vibration acceleration amplitude increases rapidly, eventually reaching its peak value. At this point, the bearing is considered to have reached complete failure.

3.4.2. PHM 2012 Dataset Preprocessing

The raw data from the PHM2012 dataset undergo several preprocessing steps, including noise reduction and normalization, to enhance data quality and consistency. Following this, 13 informative features are extracted through time–frequency domain analysis. These features encompass kurtosis, entropy, waveform indicators, spectral characteristics, vibration metrics, and various statistical descriptors, which are critical for accurately estimating the remaining useful life (RUL) of bearings. During model training, RUL values are normalized to the [0, 1] range using an independent scaler. Each data sample is then labeled according to the RUL at the end of a defined sliding time window, and by varying window lengths, three-dimensional tensors are constructed to capture temporal dependencies for deep sequence modeling. To quantitatively assess the prediction performance of the proposed MDFA-based network, standard regression metrics are used, including the coefficient of determination (R² score), MAE, and RMSE.
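
The labeling and windowing steps can be illustrated with a minimal sketch. The sequence length, the window length of 30, the linear RUL target, and the use of min-max scaling for the [0, 1] normalization are assumptions made for illustration; only the feature count of 13 and the end-of-window labeling come from the text.

```python
import numpy as np

def make_windows(features, rul, window):
    """Slice a (T, F) feature matrix into (N, window, F) samples labeled by end-of-window RUL."""
    n = features.shape[0] - window + 1
    X = np.stack([features[i:i + window] for i in range(n)])
    y = rul[window - 1:]  # label = RUL at the last time step of each window
    return X, y

T, F = 200, 13  # T is hypothetical; 13 features as described in the text
features = np.random.default_rng(0).normal(size=(T, F))

rul = np.arange(T - 1, -1, -1, dtype=float)                # linear run-to-failure target
rul_scaled = (rul - rul.min()) / (rul.max() - rul.min())   # min-max scaling to [0, 1]

X, y = make_windows(features, rul_scaled, window=30)
print(X.shape, y.shape)  # (171, 30, 13) (171,)
```

Changing the `window` argument changes both the number of samples and the temporal context seen by the network, which is the trade-off explored when window lengths are varied.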

3.4.3. Comparative Experiments

To evaluate the predictive accuracy of the proposed MDFA model, experiments were conducted under two operating conditions using the PHM2012 dataset. In each condition, bearings 1 and 2 were used for training, and bearing 3 was reserved for testing. Figure 10 and Figure 11 present the RUL prediction results on test bearings 1_3 and 2_3, respectively.

Under Operating Condition 1, while TCN and GRU achieved reasonable accuracy, the MDFA model outperformed them by more effectively focusing on critical local degradation patterns through its channel and spatial attention mechanisms. GRU-based variants with added convolution or attention modules showed limited improvement, as they often missed key local features. Global attention mechanisms tended to smooth out early transient signals, resulting in delayed predictions. The DCNN model suffered from overfitting due to noise amplification and lack of regularization. In the more complex Operating Condition 2, all models exhibited higher prediction errors, but MDFA maintained robust performance. Its predictions closely followed the true RUL trajectory, demonstrating strong sensitivity to early degradation and failure phases.

To further validate its superiority, MDFA was compared against a range of models, including GRU, CNN, hybrid GRU-CNN, TCN, Transformer, TaFCN, and SeNet. Under Operating Condition 1, MDFA achieved MAE reductions of up to 87.8% and RMSE reductions up to 88.8%, along with significant improvements in R² scores. These results confirm that MDFA’s combination of multi-scale dilated convolutions and dual attention mechanisms enables accurate extraction of degradation features and enhanced focus on key temporal patterns, leading to superior RUL prediction accuracy.

3.4.4. Generalization Experiments

Given the nonstationary nature of real-world industrial environments, evaluating model performance under a single condition is insufficient. To assess the generalization and robustness of the proposed MDFA model, a cross-condition experiment was conducted. Bearing 1 from Operating Condition 1 and bearing 2 from Operating Condition 2 were used for training, while bearings 1 and 3 from Operating Condition 3 were used for testing.

As shown in Figures 10–12 and summarized in Table 9, traditional deep models exhibited noticeable performance degradation under cross-condition settings. For instance, the DCNN model achieved an R² of −0.351 on bearings 3_1 and 3_3, performing worse than a mean predictor. In contrast, the MDFA model achieved an R² of 0.987, MAE of 0.023, and RMSE of 0.031 on the same bearings. Compared with GRU, CNN, TCN, Transformer, and newer models like TaFCN and SeNet, MDFA consistently delivered superior results, with significant reductions in MAE and RMSE and improvements in R².
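
The remark that a negative R² is worse than a mean predictor follows directly from the definition R² = 1 − SS_res/SS_tot. The sketch below uses hypothetical normalized RUL values to show that predicting the mean gives R² = 0, while a prediction with a reversed trend gives a negative value.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 0.75, 0.5, 0.25, 0.0]    # hypothetical normalized RUL
mean_pred = [0.5] * 5                   # always predicts the mean of y_true
bad_pred = [0.0, 0.25, 0.5, 0.75, 1.0]  # degradation trend reversed

print(r2_score(y_true, mean_pred))  # 0.0
print(r2_score(y_true, bad_pred))   # -3.0
```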

Overall, MDFA achieved the best performance across all test bearings, demonstrating strong generalization to varying speeds, loads, and degradation patterns. Its multi-scale dilated convolution module effectively captures degradation cues over different temporal scales, while the combined channel and spatial attention mechanisms enhance feature sensitivity and suppress noise. These capabilities make MDFA highly reliable for RUL prediction under complex real-world conditions.

Although the proposed MDFA model incorporates multiple parallel dilated convolution branches, dual attention modules, and a post-fusion convolutional layer, the additional computational cost remains moderate. As shown in Table 9, the average inference time per sample for MDFA was 195 ms, which is only slightly higher than those of the lightweight models, such as CNN (142 ms) and GRU (167 ms), yet substantially lower than those of Transformer (310 ms) and CNN-GRU-Attention (265 ms). This demonstrates that MDFA achieves a favorable balance between accuracy and efficiency; it consistently outperforms baseline models in prediction accuracy while maintaining computational demands at a practical level suitable for real-time prognostics applications.
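
An average per-sample inference time of this kind can be measured as in the sketch below. This is not the authors' benchmarking procedure: the `dummy_model` stand-in, the number of repeats, and the sample shapes are hypothetical, and absolute timings depend on the hardware used.

```python
import time

def average_inference_ms(predict, samples, repeats=5):
    """Average wall-clock time per sample, in milliseconds, over several passes."""
    predict(samples[0])  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(repeats):
        for sample in samples:
            predict(sample)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(samples))

def dummy_model(window):
    """Hypothetical stand-in for a trained network: averages the row sums of the window."""
    return sum(sum(row) for row in window) / len(window)

# Twenty hypothetical input windows of 30 time steps and 13 features each.
samples = [[[0.1] * 13 for _ in range(30)] for _ in range(20)]
avg_ms = average_inference_ms(dummy_model, samples)
print(f"{avg_ms:.4f} ms per sample")
```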

4. Discussion

This study proposes a novel remaining useful life (RUL) prediction framework based on multi-scale dilated convolution and fusion attention (MDFA), aimed at improving prognostic accuracy and robustness for aeroengine bearings. Extensive experiments on the NASA C-MAPSS and N-CMAPSS datasets demonstrate that the MDFA model consistently outperforms conventional deep learning approaches, such as CNN, GRU, TCN, Transformer, SeNet, and TaFCN. Specifically, MDFA achieved MAE values as low as 0.018–0.026, RMSE values of 0.021–0.032, and R² scores above 0.987 across multiple test subdatasets, highlighting its superior predictive accuracy and stability under diverse operational conditions.

To further assess generalization, cross-domain validation was conducted using the PHM2012 bearing dataset, which features variable speeds, loads, and degradation modes typical of real-world rotating machinery. The MDFA model maintained consistently low MAE (0.023–0.026), RMSE (0.031–0.032), and high R² (0.987–0.995) across all test bearings, demonstrating strong adaptability to complex, non-stationary degradation patterns. These results confirm that the MDFA framework effectively combines multi-scale temporal representation with attention-based feature refinement, offering a robust and generalizable solution for accurate RUL prediction in both aero-engine systems and broader mechanical prognostics applications.

5. Conclusions

The proposed MDFA framework demonstrates clear advantages, including its ability to capture degradation dynamics at multiple temporal scales and to selectively emphasize informative features through dual attention mechanisms. These design choices enable the model to achieve superior accuracy and robustness compared to conventional approaches, particularly in complex, multi-operating-condition datasets. Furthermore, the cross-domain validation results suggest that MDFA generalizes well to unseen scenarios, making it a promising solution for real-world prognostics across different machinery types.

However, despite these strengths, several limitations and practical challenges remain. The model’s reliance on extensive historical sensor data may limit its applicability in situations with sparse or noisy measurements, and its computational complexity, due to multi-scale convolutions and attention operations, could hinder real-time deployment in embedded or resource-constrained environments. Additionally, the tuning of hyperparameters, such as dilation rates, attention configurations, and input window sizes, may require significant domain expertise, potentially restricting straightforward adoption in industrial settings. Future work could focus on lightweight implementations, adaptive hyperparameter optimization, and integration with online learning to address these challenges.

Moreover, it should be noted that the current evaluation of the MDFA framework is limited to three publicly available datasets, primarily covering aero-engine and bearing degradation scenarios. The C-MAPSS dataset focuses on high-pressure compressor (HPC) and fan-related faults, while the PHM2012 bearing dataset captures gradual degradation reflected through vibration signals. Consequently, the model’s performance under other industrial machinery components, such as pumps or gearboxes, remains untested.

Additionally, early-stage, intermittent, or transient faults, which often occur in real-world operational environments, have not been assessed. This limitation restricts the demonstrated applicability of MDFA to a broader range of practical degradation patterns. Future work could extend evaluation to more diverse fault types and machinery systems to further validate the model’s generalization capability.

Author Contributions

Conceptualization, G.X. and C.J.; methodology, C.J.; software, C.J.; validation, G.X., C.J. and J.B.; formal analysis, C.J.; investigation, C.J.; resources, G.X.; data curation, C.J.; writing—original draft preparation, C.J.; writing—review and editing, G.X. and J.B.; visualization, C.J.; supervision, G.X. and J.B.; project administration, G.X.; funding acquisition, G.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Conflicts of Interest

The authors declare no potential conflicts of interest.

References

Shutin, D.; Bondarenko, M.; Polyakov, R.; Stebakov, I.; Savin, L. Method for on-line remaining useful life and wear prediction for adjustable journal bearings utilizing a combination of physics-based and data-driven models: A numerical investigation. Lubricants 2023, 11, 33.
Fernandes, P.H.E.; Silva, G.C.; Pitz, D.B.; Schnelle, M.; Koschek, K.; Nagel, C.; Beber, V.C. Data-driven, physics-based, or both: Fatigue prediction of structural adhesive joints by artificial intelligence. Appl. Mech. 2023, 4, 334–355.
Liu, C.; Li, Q.; Wang, K. State-of-charge estimation and remaining useful life prediction of supercapacitors. Renew. Sustain. Energy Rev. 2021, 150, 111408.
Saxena, A.; Goebel, K.; Simon, D.; Eklund, N. Damage propagation modeling for aircraft engine run-to-failure simulation. In Proceedings of the 2008 International Conference on Prognostics and Health Management, Denver, CO, USA, 6–9 October 2008; pp. 1–9.
Zhao, B.; Xie, L.; Song, J.; Ren, J.; Wang, B.; Zhang, S. Fatigue life prediction of aero-engine compressor disk based on a new stress field intensity approach. Int. J. Mech. Sci. 2020, 165, 105190.
Jardine, A.K.S.; Lin, D.; Banjevic, D. A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mech. Syst. Signal Process. 2006, 20, 1483–1510.
Zhang, J.; Jiang, Y.; Wu, S.; Li, X.; Luo, H.; Yin, S. Prediction of remaining useful life based on bidirectional gated recurrent unit with temporal self-attention mechanism. Reliab. Eng. Syst. Saf. 2022, 221, 108297.
Lei, Y.; Li, N.; Guo, L.; Li, N.; Yan, T.; Lin, J. Machinery health prognostics: A systematic review from data acquisition to RUL prediction. Mech. Syst. Signal Process. 2018, 104, 799–834.
Muller, A.; Suhner, M.C.; Iung, B. Formalisation of a new prognosis model for supporting proactive maintenance implementation on industrial system. Reliab. Eng. Syst. Saf. 2008, 93, 234–253.
Pham, H.; Lai, C.D. On recent generalizations of the Weibull distribution. IEEE Trans. Reliab. 2007, 56, 454–458.
Liu, J.; Lei, F.; Pan, C.; Hu, D.; Zuo, H. Prediction of remaining useful life of multi-stage aero-engine based on clustering and LSTM fusion. Reliab. Eng. Syst. Saf. 2021, 214, 107807.
Fang, B.; Hongfu, Z.; Shuhong, R. Average life prediction for aero-engine fleet based on performance degradation data. In Proceedings of the 2010 Prognostics and System Health Management Conference, Macao, China, 12–14 January 2010; pp. 1–6.
Li, X.; Ding, Q.; Sun, J.Q. Remaining useful life estimation in prognostics using deep convolution neural networks. Reliab. Eng. Syst. Saf. 2018, 172, 1–11.
Chui, K.T.; Gupta, B.B.; Vasant, P. A genetic algorithm optimized RNN-LSTM model for remaining useful life prediction of turbofan engine. Electronics 2021, 10, 285.
Wahid, A.; Yahya, M.; Breslin, J.G.; Intizar, M.A. Self-attention transformer-based architecture for remaining useful life estimation of complex machines. Procedia Comput. Sci. 2023, 217, 456–464.
Shi, H.; Xie, S.; Zhang, X.; Shi, G.; Wu, B. Remaining useful life prediction of weighted k-out-of-n systems based on dynamic random weights of importance. Comput. Ind. Eng. 2023, 183, 109540.
Almasi, A. Latest lessons learned, modern condition monitoring and advanced predictive maintenance for gas turbines. Aust. J. Mech. Eng. 2015, 14, 199–211.
Cui, L.; Huang, J.; Zhang, F. Quantitative and localization diagnosis of a defective ball bearing based on vertical–horizontal synchronization signal analysis. IEEE Trans. Ind. Electron. 2017, 64, 8695–8706.
Arias Chao, M.; Kulkarni, C.; Goebel, K.; Fink, O. Aircraft engine run-to-failure dataset under real flight conditions for prognostics and diagnostics. Data 2021, 6, 5.
Zhao, S.; Pang, Y.; Chen, J.; Liu, J. Predication of remaining useful life of aircraft engines based on multi-head attention and LSTM. In Proceedings of the 2022 IEEE 6th Information Technology and Mechatronics Engineering, Chongqing, China, 4–6 March 2022.
Al-Khazraji, H.; Nasser, A.R.; Hasan, A.M.; Al Mhdawi, A.K.; Al-Raweshidy, H.; Humaidi, A.J. Aircraft engines remaining useful life prediction based on a hybrid model of autoencoder and deep belief network. IEEE Access 2022, 10, 82156–82163.
Chen, Z.; Wu, M.; Zhao, R.; Guretno, F.; Yan, R.; Li, X. Machine remaining useful life prediction via an attention-based deep learning approach. IEEE Trans. Ind. Electron. 2020, 68, 2521–2531.
Cheng, Y.; Wu, J.; Zhu, H.; Or, S.W.; Shao, X. Remaining Useful Life Prognosis Based on Ensemble Long Short-Term Memory Neural Network. IEEE Trans. Instrum. Meas. 2021, 70, 1–12.
Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271.
Fan, L.; Chai, Y.; Chen, X. Trend attention fully convolutional network for remaining useful life estimation. Reliab. Eng. Syst. Saf. 2022, 225, 108590.
Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
Xu, T.; Han, G.; Zhu, H.; Taleb, T.; Peng, J. Multi-resolution LSTM-based prediction model for remaining useful life of aero-engine. IEEE Trans. Veh. Technol. 2023, 73, 1931–1941.
Liaw, A.; Wiener, M. Classification and Regression by Random Forest. R News 2002, 23, 25–31.
Zhang, J.; Luan, Z.; Ni, L.; Qi, L.; Gong, X. MSDANet: A multi-scale dilation attention network for medical image segmentation. Biomed. Signal Process. Control 2024, 90, 105889.
Li, F.; Zhou, Y.; Chen, Y.L.; Li, J.; Dong, Z.; Tan, M. Multi-scale attention-based lightweight network with dilated convolutions for infrared and visible image fusion. Complex Intell. Syst. 2024, 10, 705–719.
Gaihua, W.; Tianlun, Z.; Yingying, D.; Jinheng, L.; Lei, C. A serial-parallel self-attention network joint with multi-scale dilated convolution. IEEE Access 2021, 9, 71909–71919.

Figure 1. The structure of the multi-layer stacked dilated convolution network.

Figure 2. The structure of the MDFA network.

Figure 3. Remaining useful life (RUL) curves of engine 3 under different datasets. (a) RUL degradation trajectory for engine unit 3 in FD001 subset of C-MAPSS dataset; (b) RUL degradation trajectory for engine unit 3 in FD002 subset of C-MAPSS dataset; (c) RUL degradation trajectory for engine unit 3 in FD003 subset of C-MAPSS dataset; (d) RUL degradation trajectory for engine unit 3 in FD004 subset of C-MAPSS dataset.

Figure 4. Sensor contribution rates for the C-MAPSS subdatasets. (a) FD001 dataset sensor contribution rate; (b) FD002 dataset sensor contribution rate; (c) FD003 dataset sensor contribution rate; (d) FD004 dataset sensor contribution rate.

Figure 5. The prediction performance of the subdataset engines on the C-MAPSS dataset. (a) The prediction results of engines in the FD001 dataset. (b) The prediction results of engines in the FD002 dataset. (c) The prediction results of engines in the FD003 dataset. (d) The prediction results of engines in the FD004 dataset.

Figure 6. The predicted remaining useful life on the C-MAPSS dataset. (a) Comparative analysis of actual and predicted remaining useful life during flight cycles using FD001 subset data. (b) Comparative analysis of actual and predicted remaining useful life during flight cycles using FD002 subset data. (c) Comparative analysis of actual and predicted remaining useful life during flight cycles using FD003 subset data. (d) Comparative analysis of actual and predicted remaining useful life during flight cycles using FD004 subset data.

Figure 7. Feature importance for RUL prediction in aircraft engines using the ExtraTreesRegressor algorithm.

Figure 8. Comparison of remaining useful life prediction for the N-CMAPSS dataset using different models. (a) CNN model; (b) DCNN model; (c) GRU model; (d) CNN-GRU model; (e) CNN-GRU-Attention model; (f) SeNet model; (g) TCN model; (h) MDFA model.

Figure 9. Bearing 2_1 life cycle.

Figure 10. Remaining life prediction for bearing 1_3 using different models. (a) CNN-GRU model; (b) MDFA model; (c) GRU model; (d) TCN model; (e) DCNN model; (f) CNN-GRU-Attention model.

Figure 11. Remaining life prediction for bearing 2_3 using different models. (a) CNN-GRU model; (b) MDFA model; (c) GRU model; (d) TCN model; (e) DCNN model; (f) CNN-GRU-Attention model.

Figure 12. Remaining life prediction for bearings 3_1 and 3_3 using different models. (a) CNN-GRU model; (b) MDFA model; (c) GRU model; (d) TCN model; (e) DCNN model; (f) CNN-GRU-Attention model.

Table 1. Model hyperparameter configuration.

Parameter	Value	Description
batch_size	32	Training batch size
window_size	30	Sliding window size
epoch	100	Number of training epochs
optim	Adam	Optimization algorithm
loss function	MSE	Mean squared error
lr	0.0001	Initial learning rate
Dropout	0.3	Dropout rate for regularization

Table 2. Network parameter configuration.

Network Layer		Configuration	Parameters
Multi-scale dilated convolutional	Branch 1	Input	[B,C,H]
		Kernel	3
		Dilation	1
		Output	[B,C,H]
	Branch 2	Input	[B,C,H]
		Kernel	3
		Dilation	2
		Output	[B,C,H]
	Branch 3	Input	[B,C,H]
		Kernel	3
		Dilation	4
		Output	[B,C,H]
	Average pooling	Input	[B,C,H]
	Feature fusion	Output	[B,4C,H]
Channel attention module and spatial attention module	Average pooling	Input	[B,4C,H]
	Average pooling	Output	[B,C,1]
	1 × 1 Convolution	Input	[B,4C,H]
		Output	[B,C,1]
		Kernel	3
		Activation	Sigmoid
		Dropout	0.3
	1 × 1 Convolution	Input	[B,4C,H]
		Output	[B,C,1]
		Kernel	3
		Dropout	0.3
		Activation	Sigmoid
Feature fusion module		Input	[B,4C,H]
Feature fusion module		Output	[B,C,H]
Dimensionality reduction	1 × 1 Convolution	Input	[B,C,H]
		Output	[B,1,1]
		Kernel	1
		Dropout	0.3
		Activation	ReLU
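
The branch settings in Table 2 determine how much temporal context each dilated convolution covers. The sketch below works through that arithmetic; it assumes "same" padding (suggested by the unchanged [B,C,H] output shape) and that the fused width of 4C arises from concatenating the three dilated branches with the pooling branch. The function names and the example channel count are hypothetical.

```python
def receptive_field(kernel, dilation):
    """Number of input time steps covered by one dilated 1-D convolution."""
    return 1 + (kernel - 1) * dilation

def same_padding(kernel, dilation):
    """Padding per side that keeps the sequence length unchanged (odd kernels)."""
    return dilation * (kernel - 1) // 2

def fused_channels(channels, n_dilated=3, n_pool=1):
    """Channel count after concatenating all parallel branches."""
    return channels * (n_dilated + n_pool)

# (kernel, dilation) pairs taken from Table 2.
branches = {"Branch 1": (3, 1), "Branch 2": (3, 2), "Branch 3": (3, 4)}
fields = {name: receptive_field(k, d) for name, (k, d) in branches.items()}
pads = {name: same_padding(k, d) for name, (k, d) in branches.items()}

print(fields)              # {'Branch 1': 3, 'Branch 2': 5, 'Branch 3': 9}
print(pads)                # {'Branch 1': 1, 'Branch 2': 2, 'Branch 3': 4}
print(fused_channels(16))  # 64, for a hypothetical C = 16
```

With a kernel size of 3, doubling the dilation rate widens the covered context from 3 to 5 to 9 time steps without adding parameters.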

Table 3. The C-MAPSS dataset.

Dataset	FD001	FD002	FD003	FD004
Operating conditions	1	6	1	6
Number of fault modes	1	1	2	2
Training samples	100	260	100	249
Testing samples	100	259	100	248
Fault types	HPC malfunction	HPC malfunction	HPC and Fan malfunction	HPC and Fan malfunction

Table 4. The key sensor variables.

Symbol	Names of Sensors
Ps30	High-Pressure Compressor Exit Static Pressure
T24	Low-Pressure Compressor Exit Total Temperature
T30	High-Pressure Compressor Exit Total Temperature
P30	High-Pressure Compressor Exit Total Pressure
Phi	Fuel Flow
Nf	Low-Pressure Rotor Speed (Fan Speed)
Nc	High-Pressure Rotor Speed (Core Speed)
W31/W32	High/Low-Pressure Turbine Cooling Air Flow

Table 5. Evaluation metrics comparison across different datasets.

Model	FD001		FD002		FD003		FD004
Model	RMSE	Score	RMSE	Score	RMSE	Score	RMSE	Score
CNN	18.33	1275.6	25.63	9865.5	19.63	1693.9	24.65	7589.6
TaFCN	13.99	301.72	17.06	1491.85	12.02	372.85	19.85	6856.89
DCNN	12.75	419.65	17.52	2301.56	13.86	331.63	24.56	10,563.56
CNN-GRU-Attention	13.69	300.56	18.23	1599.98	13.78	310.56	24.31	5909.66
SeNet	15.11	402.51	21.46	3663.81	15.64	606.60	23.75	4951.71
CNN-GRU	15.94	414.73	20.49	3459.05	13.24	334.93	24.39	8765.70
TCN	17.47	1205.79	17.56	3156.92	16.80	627.56	23.79	10,490.58
MDFA	11.78	299.08	16.38	1380.65	11.89	303.34	19.23	3566.44
MsDCNN	15.72	400.49	18.02	1888.45	15.01	393.93	22.27	4394.10
Improvement rate	7.6%	0.5%	3.9%	7.4%	1.1%	2.3%	3.1%	18.8%
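
The percentages in the last row of Table 5 can be reproduced as the relative gain of MDFA over the best baseline in each column, (best baseline − MDFA) / best baseline. The sketch below performs this check using the values from the table; the assumption that the comparison is made against the best baseline per column is ours, although it matches the reported figures up to rounding.

```python
# Columns: FD001 RMSE, FD001 Score, FD002 RMSE, FD002 Score,
#          FD003 RMSE, FD003 Score, FD004 RMSE, FD004 Score (values from Table 5).
baselines = {
    "CNN": [18.33, 1275.6, 25.63, 9865.5, 19.63, 1693.9, 24.65, 7589.6],
    "TaFCN": [13.99, 301.72, 17.06, 1491.85, 12.02, 372.85, 19.85, 6856.89],
    "DCNN": [12.75, 419.65, 17.52, 2301.56, 13.86, 331.63, 24.56, 10563.56],
    "CNN-GRU-Attention": [13.69, 300.56, 18.23, 1599.98, 13.78, 310.56, 24.31, 5909.66],
    "SeNet": [15.11, 402.51, 21.46, 3663.81, 15.64, 606.60, 23.75, 4951.71],
    "CNN-GRU": [15.94, 414.73, 20.49, 3459.05, 13.24, 334.93, 24.39, 8765.70],
    "TCN": [17.47, 1205.79, 17.56, 3156.92, 16.80, 627.56, 23.79, 10490.58],
    "MsDCNN": [15.72, 400.49, 18.02, 1888.45, 15.01, 393.93, 22.27, 4394.10],
}
mdfa = [11.78, 299.08, 16.38, 1380.65, 11.89, 303.34, 19.23, 3566.44]

def improvement_percent(best_baseline, proposed):
    """Relative reduction of a lower-is-better metric, in percent."""
    return 100.0 * (best_baseline - proposed) / best_baseline

# Best (lowest) baseline value in each column.
best = [min(row[i] for row in baselines.values()) for i in range(len(mdfa))]
rates = [improvement_percent(b, m) for b, m in zip(best, mdfa)]
print([round(r, 2) for r in rates])
```

Both RMSE and Score are lower-is-better metrics, so a positive rate indicates that MDFA improves on the strongest baseline for that column.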

Table 6. Critical sensor feature screening results for aircraft engines based on contribution rate threshold.

Sensor Code	Sensor Name	Importance Score
LPT_eff_mod	Low-Pressure Turbine Efficiency Correction Factor	0.101
HPT_eff_mod	High-Pressure Turbine Efficiency Correction Factor	0.532
LPT_flow_mod	Low-Pressure Turbine Flow Correction Factor	0.200
T50	Low-Pressure Turbine Outlet Temperature	0.036
SmHPC	High-Pressure Compressor Surge Margin	0.015
SmLPC	Low-Pressure Compressor Surge Margin	0.052
phi	Fuel Flow to High-Pressure Compressor Static Pressure Ratio	0.019

Table 7. The comparison experiment results on engine 2.

Model	RMSE	MAE
CNN	10.17	5.03
DCNN	9.49	5.88
SeNet	12.41	5.39
CNN-GRU	9.86	5.14
CNN-GRU-Attention	9.53	4.42
TCN	10.38	5.59
MDFA	8.95	4.31
GRU	10.44	5.02

Table 8. Experimental dataset (PHM2012).

Operating Condition	Radial Load (N)	Rotational Speed (rpm)	Training Data	Test Data
Condition 1	4000	1800	Bearing1_1, Bearing1_2	Bearing1_3, Bearing1_4, Bearing1_5, Bearing1_6, Bearing1_7
Condition 2	4200	1650	Bearing2_1, Bearing2_2	Bearing2_3, Bearing2_4, Bearing2_5, Bearing2_6, Bearing2_7
Condition 3	5000	1500	Bearing3_1, Bearing3_2	Bearing3_3

Table 9. Remaining life prediction results of different models.

Model	Bearing 1_3			Bearing 2_3			Bearing 3_1, 3_3			Average Inference Time (ms)
	RMSE	$R^{2}$ Score	MAE	RMSE	$R^{2}$ Score	MAE	RMSE	$R^{2}$ Score	MAE	
CNN	0.100	0.874	0.078	0.327	0.619	0.270	0.160	0.645	0.122	142
TaFCN	0.041	0.980	0.027	0.081	0.917	0.054	0.058	0.956	0.046	171
DCNN	0.182	0.158	0.153	0.825	0.747	0.402	0.317	−0.351	0.180	165
CNN-GRU	0.188	0.584	0.148	0.194	0.567	0.144	0.153	0.682	0.128	235
SeNet	0.159	0.684	0.130	0.089	0.902	0.066	0.166	0.627	0.138	180
CNN-GRU-Attention	0.049	0.969	0.034	0.071	0.936	0.059	0.047	0.969	0.037	265
Transformer	0.108	0.854	0.094	0.119	0.823	0.094	0.097	0.973	0.069	310
MDFA	0.021	0.995	0.018	0.032	0.989	0.026	0.031	0.987	0.023	195
TCN	0.041	0.979	0.029	0.085	0.909	0.069	0.031	0.986	0.026	188
GRU	0.048	0.970	0.041	0.056	0.961	0.031	0.030	0.985	0.023	167
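
For readers who want to recompute the three accuracy metrics in Table 9, the following is a minimal sketch using their standard definitions. The magnitudes in the table suggest that RUL labels are normalized to [0, 1], which is assumed here. The sample values are hypothetical and are not data from the paper. The second prediction illustrates how $R^{2}$ becomes negative, as in the DCNN entry of −0.351, when a model fits worse than predicting the mean label.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical normalized RUL labels (1 = healthy, 0 = failure).
y_true = [1.0, 0.75, 0.5, 0.25, 0.0]
close_pred = [0.95, 0.8, 0.5, 0.2, 0.05]    # tracks the degradation trend
reversed_pred = [0.0, 0.25, 0.5, 0.75, 1.0]  # trend predicted backwards

print(rmse(y_true, close_pred), mae(y_true, close_pred), r2_score(y_true, close_pred))
print(r2_score(y_true, reversed_pred))  # negative: worse than the mean predictor
```
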

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

