1. Introduction
Failures of aircraft or gas turbine engines can cause substantial economic losses and, in extreme cases, threaten flight safety. Despite technological advances improving turbine–engine reliability, their complex mechanical structures remain vulnerable to multi-mode failures and abnormal degradation. The International Air Transport Association (IATA) reports that the average service life of commercial aircraft rose from 21.3 years in 2000 to 28.6 years by 2023, highlighting the growing importance of engine reliability. Similarly, AviationWeek projects the global engine maintenance, repair, and overhaul (MRO) market will reach USD 50.3 billion by 2025 (CAGR 4.7%). In this context, predictive maintenance has emerged as a proactive strategy to minimize unplanned downtime and cut maintenance costs. At its core lies evaluation of an engine’s current status and estimation of its remaining useful life (RUL)—the number of cycles or time it can operate normally before failure. Achieving accurate RUL predictions therefore depends on advanced health-monitoring systems and robust prognostic models.
Current RUL prediction approaches for aero-engines can be broadly classified into three categories: physics-based models, data-driven models, and expert system-based methods [1,2]. Physics-based models [3] establish detailed simulations of engine components to analyze failure mechanisms. For example, Saxena et al. [4] proposed a framework based on compressor efficiency degradation equations, while Liu et al. [5] applied Paris’ law to model turbine disk crack propagation under low-cycle fatigue via finite-element simulations. Although these methods provide insight into underlying physical mechanisms, they often incur high development and maintenance costs, depend heavily on idealized assumptions (e.g., steady-state conditions), and lack flexibility for capturing sudden failures under transient shock loads.
Data-driven approaches, by contrast, eliminate the need for detailed physical models or extensive prior knowledge [6], relying on machine learning and deep learning algorithms to uncover hidden patterns in historical monitoring data. Deep learning architectures, in particular, have demonstrated strong capability in extracting informative features from high-dimensional signals, thereby facilitating more accurate health assessments [7]. Such methods reduce the complexity associated with physics-based model construction and can deliver reliable RUL predictions [8] even when detailed degradation mechanisms are not fully understood. As a result, data-driven techniques have become a central focus within Prognostics and Health Management (PHM). However, data-driven methods still face challenges in handling diverse operating conditions and complex degradation patterns.
When operational or monitoring data are incomplete or unavailable, expert system-based methods offer an alternative by using statistical inference grounded in expert knowledge and historical maintenance records. Common statistical distributions include Poisson, Weibull, exponential, log-normal, and inverse Gaussian [9,10]. For instance, Liu et al. [11] employed lifecycle data from similar engines and current operational cycles to develop a Weibull-based RUL model within a reliability framework. Bai et al. [12] further extended this concept to a fleet-level model using inverse Gaussian distributions to analyze performance drift and failure sequences, estimating average engine lifespans at a macro scale. Although expert system approaches can perform adequately with sparse data, they often struggle to adapt when underlying operating conditions change significantly, limiting their general applicability.
Within the data-driven category, methods can be further subdivided into traditional machine learning techniques and deep learning-based approaches. Traditional machine learning models often face difficulties when processing high-dimensional data, leading to trade-offs between predictive accuracy and computational efficiency. Deep learning models, by contrast, exhibit superior feature extraction capabilities and have thus become the mainstream approach. Representative deep learning methods include convolutional neural networks (CNNs) and long short-term memory networks (LSTMs), which have shown strong performance on large-scale operational datasets. For example, Li et al. [13] proposed a deep CNN for RUL estimation; Chui et al. [14] designed a sensor-time–based LSTM-RNN encoder–decoder model; and Wahid et al. [15] developed a Transformer-based architecture for adaptive degradation modeling. To further refine temporal feature representation, Shi et al. [16] introduced attention-based dynamic weighting to address memory decay, and Qin et al. [17] proposed a multi-scale fusion framework that extracts degradation features at various scales before feeding them into fully connected networks for final RUL prediction.
Despite these advances, existing data-driven approaches still exhibit notable limitations. First, traditional CNNs and RNNs excel at capturing local patterns but struggle to model the long-term dependencies inherent in full-lifecycle degradation. Although LSTM and GRU variants partially alleviate the vanishing-gradient problem, they still suffer from low learning efficiency, which hinders their ability to capture subtle, long-term degradation trends. Second, Transformer-based methods, while effective for modeling global dependencies, tend to be parameter-intensive and may show unstable performance when data are limited. Third, many models focus solely on single-scale feature extraction or global channel attention, neglecting selective emphasis on temporally critical regions; this oversight can lead to insufficient recognition of RUL-relevant patterns and high information redundancy, ultimately constraining both prediction accuracy and computational efficiency.
To address these challenges, this paper proposes a novel RUL prediction model for aero-engines based on a multi-scale dilated fusion attention (MDFA) mechanism. First, to overcome the limited receptive field of traditional CNNs, the model introduces multi-scale dilated convolution modules with parallel branches of varying dilation rates, thereby enhancing feature extraction across different temporal scales. Second, to capture locally important temporal signals often overlooked by existing methods, the model integrates both channel and spatial attention mechanisms, enabling selective emphasis on informative dimensions and time steps. This synergistic design enhances the depth and sensitivity of feature representation, alleviating constraints in long-term dependency modeling and redundancy control found in conventional approaches. Experimental validation demonstrates that the MDFA model achieves superior robustness, precision, and generalization performance across diverse degradation scenarios.
3. Experimental Procedure
3.1. The Settings of Model Parameters
The proposed MDFA model was validated on benchmark datasets for aero-engine RUL prediction, specifically using NASA’s C-MAPSS dataset, which contains sensor data from gas-path and mechanical subsystems of aircraft engines. Through extensive experiments, the optimal dilation-rate configuration was identified as {1, 2, 4}, balancing the extraction of local details with modeling of global degradation trends while avoiding redundant computations from overlapping receptive fields. Compared to other configurations, such as {2, 4, 8}, {1, 3, 5}, and {1, 2, 4, 8}, the {1, 2, 4} setup maintains greater feature diversity without unnecessary complexity. The detailed hyperparameter settings are listed in Table 1.
Although a comprehensive ablation analysis is not included, we observed in repeated preliminary experiments that using larger dilation rates (e.g., {2, 4, 8}) often led to oversmoothing, where short-term degradation cues were diluted, while irregular intervals, such as {1, 3, 5}, produced less consistent receptive-field coverage and unstable convergence. In contrast, {1, 2, 4} not only yielded lower validation loss on multiple subsets but also provided a more interpretable multi-scale representation (fine, medium, and coarse temporal resolution). These empirical findings support the choice of {1, 2, 4} as a practical and effective configuration for this task.
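The trade-off between these dilation configurations can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a kernel of size 3 along the time axis (the function names are illustrative and not part of the original implementation): a single dilated kernel with dilation rate d spans (k − 1)·d + 1 time steps.

```python
def dilated_span(kernel_size: int, dilation: int) -> int:
    """Number of consecutive time steps spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def branch_spans(kernel_size: int, dilations):
    """Map each dilation rate to the span of its branch."""
    return {d: dilated_span(kernel_size, d) for d in dilations}


if __name__ == "__main__":
    # Configurations discussed in the text, with an assumed temporal kernel size of 3.
    print(branch_spans(3, [1, 2, 4]))  # {1: 3, 2: 5, 4: 9}
    print(branch_spans(3, [2, 4, 8]))  # {2: 5, 4: 9, 8: 17}
    print(branch_spans(3, [1, 3, 5]))  # {1: 3, 3: 7, 5: 11}
```

Under this assumption, {1, 2, 4} covers 3, 5, and 9 steps, whereas {2, 4, 8} has no branch narrower than 5 steps, which is consistent with the oversmoothing of short-term cues noted above.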
The hyperparameter values in Table 1 were determined through a combination of grid search and empirical tuning on the training and validation sets. We first adopted commonly used settings in related RUL prediction studies as initial references, and then adjusted parameters such as learning rate, dropout, and window size based on validation performance to achieve a balance between model accuracy and generalization.
The MDFA model was trained with an initial learning rate of 0.0001, which was dynamically reduced during training to speed up convergence and avoid getting stuck in suboptimal minima. A dropout rate of 0.3 was applied after key layers to prevent overfitting and improve generalization on unseen operating conditions. We used the Adam optimizer, which was chosen for its adaptive learning-rate adjustments and robustness when handling sparse gradients and large datasets, and adopted mean squared error (MSE) as the loss function to penalize large deviations between predicted and actual RUL values.
Table 2 summarizes the input–output tensor dimensions, where B is the batch size, C is the per-time-step feature dimension, and H is the sequence length.
Within the multi-scale dilated convolution module, three parallel branches with dilation rates {1, 2, 4} and 3 × 3 kernels process single-channel input, forming receptive fields that capture degradation features at different temporal scales without excessively deepening the network. A fourth branch applies global average pooling to aggregate long-term trends across the entire sequence. These four outputs are concatenated and fused to ensure consistent dimensionality before passing to the dual attention stage. The channel attention submodule learns channel-wise importance via global pooling followed by two fully connected layers (ReLU then Sigmoid), reweighting each feature channel. Simultaneously, the spatial attention submodule applies global pooling along the channel axis, followed by a 1 × 1 convolution and Sigmoid activation to highlight critical time steps. By multiplying these attention weights back into the fused feature maps, the network emphasizes the most informative dimensions and temporal locations. Finally, a 1 × 1 convolution compresses the feature map, reducing computational cost while preserving the enriched, multi-scale representation needed for accurate RUL prediction. This design balances local detail extraction, global trend modeling, and computational efficiency.
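The data flow through the parallel branches and the two attention gates can be illustrated with a deliberately simplified sketch. It assumes a single 1-D input sequence, a shared hand-picked kernel, and parameter-free gates (a sigmoid of pooled values) standing in for the learned fully connected and 1 × 1 convolution layers; all function names are hypothetical, and the code is not the authors' implementation.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Zero-padded ('same') 1-D dilated cross-correlation over a list of numbers."""
    half = (len(kernel) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - half) * dilation
            if 0 <= idx < len(x):  # positions outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def multi_scale_features(x, kernel, dilations=(1, 2, 4)):
    """One channel per dilation rate plus a global-average-pooling channel."""
    channels = [dilated_conv1d(x, kernel, d) for d in dilations]
    mean = sum(x) / len(x)
    channels.append([mean] * len(x))  # fourth branch: long-term trend
    return channels


def channel_gate(channels):
    """Stand-in for channel attention: sigmoid of each channel's temporal mean."""
    return [sigmoid(sum(c) / len(c)) for c in channels]


def temporal_gate(channels):
    """Stand-in for spatial attention: sigmoid of the cross-channel mean per time step."""
    n = len(channels)
    steps = len(channels[0])
    return [sigmoid(sum(c[t] for c in channels) / n) for t in range(steps)]


def apply_attention(channels):
    """Reweight every feature value by its channel gate and its time-step gate."""
    cg = channel_gate(channels)
    tg = temporal_gate(channels)
    return [[cg[i] * tg[t] * v for t, v in enumerate(c)]
            for i, c in enumerate(channels)]
```

A trained model would replace the fixed gates with learned layers and follow the reweighting with the 1 × 1 compression convolution; the sketch only shows how branch outputs of equal length are stacked and then scaled along the channel and time axes.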
3.2. Experimental Introduction
3.2.1. Dataset Description
The NASA C-MAPSS [19] (Commercial Modular Aero-Propulsion System Simulation) dataset was employed to validate the effectiveness of the proposed method. C-MAPSS is a high-fidelity computer model used to simulate the degradation of large commercial turbofan engines. It includes atmospheric models, allowing simulations under various conditions: (1) altitudes ranging from 0 to 40,000 feet, (2) flight Mach numbers from 0 to 0.90, and (3) sea-level temperatures from −60 to 103 °F.
As shown in Table 3, the C-MAPSS dataset comprises four subsets, each simulating engine degradation under different fault modes and operating conditions.
In real-world applications, performance degradation in engines is generally not evident during the early operational phase. However, as operational time increases, engine health progressively deteriorates [20]. To reflect this behavior, a piecewise linear function is employed to annotate the remaining useful life (RUL) of each sample. Specifically, during the early stage of engine operation, the components maintain high performance with minimal degradation, resulting in a constant RUL value.
According to the RUL annotation strategy described above and the method proposed by Al-Khazraji et al. [21], the RUL for each data sample is calculated as shown in Equation (11):
$$\mathrm{RUL}_t = T_{\max} - t \qquad (11)$$
where $\mathrm{RUL}_t$ represents the computed variable, $T_{\max}$ represents the maximum number of operational cycles for a given engine unit, and $t$ denotes the current time step within its life cycle. The calculated variable corresponds to the remaining number of operational cycles from time step $t$.
In RUL prediction tasks, setting an appropriate initial RUL value is crucial for effective model training and accurate forecasting. Following the approach in [22], the initial RUL is assigned a value within the range of 120 to 130, which aligns with the health characteristics of engine components during the operational stage. Accordingly, this study set the initial RUL to 125. The linear degradation curves in Figure 3 were derived directly from the C-MAPSS dataset by applying the piecewise linear RUL labeling strategy commonly used in RUL prediction studies. Specifically, the RUL was set to a constant initial value (125 cycles in this study) during the early stable stage, and then decreased linearly with each subsequent cycle until failure, thereby generating the straight-line degradation trajectories shown in the plots.
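The piecewise linear labeling described above can be sketched in a few lines. This is a minimal illustration, assuming cycles are numbered from 1 and the label at the final cycle is 0; the function name and the example engine length are hypothetical.

```python
def piecewise_rul(total_cycles: int, cap: int = 125):
    """RUL labels for cycles 1..total_cycles: linear countdown clipped at `cap`."""
    return [min(cap, total_cycles - t) for t in range(1, total_cycles + 1)]


if __name__ == "__main__":
    labels = piecewise_rul(200)  # hypothetical engine that fails at cycle 200
    # The plateau lasts while total_cycles - t >= cap, i.e. up to cycle 75 here.
    print(labels[0], labels[74], labels[75], labels[-1])  # 125 125 124 0
```

For an engine whose total life is shorter than the cap, the plateau disappears and the labels decrease linearly from the first cycle.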
Figure 3a–d show the RUL trajectories of engine 3 in the FD001 through FD004 subsets. In each case, the early phase (highlighted in green) exhibits a flat plateau where the RUL remains constant at 125, corresponding roughly to cycles 54, 81, 97, and 182 for the respective subsets. This plateau reflects a quasi-steady-state region in which engine degradation is negligible and operating conditions are stable. Once the operational cycle surpasses a critical turning point (marked by blue dots), RUL begins to decline almost linearly, signaling the transition from stable operation to progressive degradation. The linear descent often corresponds to the accumulation of wear or damage mechanisms, such as material fatigue or erosion, that intensify once certain thresholds are crossed. Eventually, the RUL reaches zero, denoting failure. Identifying this turning point is crucial for prognostic models, as it delineates the boundary between normal operation and active degradation; accurate detection of this inflection can markedly improve RUL prediction by focusing the model’s attention on features that emerge only after degradation initiates.
3.2.2. C-MAPSS Data Preprocessing
Raw sensor signals in aero-engine RUL modeling often contain noise, redundant information, and strong inter-variable coupling, which can increase model complexity and degrade both prediction accuracy and generalization performance [23]. To mitigate these issues, Principal Component Analysis (PCA) is applied to the multi-dimensional sensor data from the C-MAPSS dataset for dimensionality reduction. PCA transforms the original correlated variables into a set of orthogonal principal components, retaining the majority of the information while reducing redundancy. The optimal number of principal components is selected based on the cumulative contribution rate. Figure 4 presents the contribution rates of the sensor features across each sub-dataset, guiding the choice of components that capture sufficient variance without unnecessary complexity.
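Selecting the number of components from the cumulative contribution rate can be sketched as follows. The threshold of 0.95 and the eigenvalues in the example are assumptions made purely for illustration; the source does not state the cut-off that was used.

```python
def components_for_threshold(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose cumulative
    contribution rate (explained-variance share) reaches `threshold`."""
    total = sum(eigenvalues)
    running = 0.0
    ordered = sorted(eigenvalues, reverse=True)
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)


if __name__ == "__main__":
    # Hypothetical covariance eigenvalues; contribution rates are 0.5, 0.3, 0.1, 0.1.
    print(components_for_threshold([5.0, 3.0, 1.0, 1.0], threshold=0.85))  # 3
```

The eigenvalues would normally come from the covariance matrix of the standardized sensor data, computed on the training set only so that the test set does not influence the projection.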
To ensure that the selected features are applicable across various operating conditions and fault modes, the sensor contribution rates across all subdatasets were comprehensively analyzed. A set of key sensor variables that consistently exhibit significant contributions across different scenarios was selected, as summarized in Table 4.
To evaluate the effectiveness of the proposed approach, three commonly used performance metrics were adopted: root mean squared error (RMSE), mean absolute error (MAE), and the scoring function (Score). These three metrics were chosen because they are the most widely used in the prognostics community, particularly in the PHM Data Challenge, and provide complementary perspectives: RMSE emphasizes large errors, MAE reflects average prediction deviation, and Score incorporates asymmetric penalization for early versus late predictions. Compared with variance, confidence intervals, or statistical significance tests, which primarily quantify result uncertainty, these three measures directly capture predictive accuracy and are thus more suitable for fair benchmarking against prior studies. These metrics are defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} e_i^{2}}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \lvert e_i \rvert$$
$$\mathrm{Score} = \sum_{i=1}^{N} s_i, \qquad s_i = \begin{cases} e^{-e_i/13} - 1, & e_i < 0 \\ e^{e_i/10} - 1, & e_i \ge 0 \end{cases}$$
where $N$ denotes the total number of test samples, and $e_i = \hat{y}_i - y_i$ is the prediction error, with $\hat{y}_i$ representing the predicted RUL and $y_i$ the true RUL. A negative error ($e_i < 0$) corresponds to an early prediction (the model predicts a shorter life than reality), whereas a positive error ($e_i > 0$) corresponds to a late prediction (the model predicts a longer life than reality). The constants (13 for early prediction and 10 for late prediction) control the penalty strength. A smaller Score indicates smaller errors and less severe penalties, thereby reflecting better predictive performance.
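The three metrics can be computed with the short sketch below, which takes the error as predicted minus true RUL and uses the constants 13 and 10 given above. The function names and the sample values are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def phm_score(y_true, y_pred, early=13.0, late=10.0):
    """Asymmetric score: late predictions (positive error) are penalized more heavily."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        e = p - t  # negative: early prediction; positive: late prediction
        total += math.exp(-e / early) - 1.0 if e < 0 else math.exp(e / late) - 1.0
    return total


if __name__ == "__main__":
    true_rul = [50.0, 50.0]
    print(phm_score(true_rul, [40.0, 50.0]))  # one prediction 10 cycles early
    print(phm_score(true_rul, [60.0, 50.0]))  # one prediction 10 cycles late (larger)
```

Because the late branch divides by the smaller constant, an overestimate of a given size always costs more than an underestimate of the same size, which matches the safety-oriented intent of the metric.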
3.2.3. Experimental Analysis
The proposed MDFA model was evaluated on the C-MAPSS dataset. Due to computational constraints, RUL prediction results of a subset of engines were randomly selected for visualization.
Figure 5 illustrates the comparison between predicted and actual RUL values for selected engine units across different subdatasets. The predicted values are outputs of the model at each time step, while the corresponding RUL files provide the ground truth labels.
Figure 5a–d compare the RUL prediction performance of the MDFA model across all four subdataset test sets. It is evident that prediction errors in FD002 and FD004 exceeded those in FD001 and FD003, which can be traced back to the operational complexity: FD001 and FD003 each encompass a single, relatively simple operating mode, whereas FD002 and FD004 involve six distinct operating scenarios. This added variability not only introduced sudden shifts in sensor patterns but also altered degradation trajectories, making it more challenging for any model to generalize. As a result, the MDFA’s multiscale receptive fields and attention modules, designed to capture both local and global features, still exhibited larger deviations under these heterogeneous conditions. Nonetheless, even in FD002 and FD004, the predicted RUL curves follow the ground truth trends closely, demonstrating that MDFA maintains strong robustness. Occasional spikes in error typically coincide with abrupt operating condition changes, suggesting that future work could benefit from incorporating explicit operating status encoding or online domain adaptation strategies to further reduce these deviations. While Figure 5 mainly provides a qualitative comparison, the purpose of this visualization is to illustrate overall prediction trends and the robustness of the proposed model under different operating conditions. A detailed quantitative error distribution analysis will be considered in our future work to complement the visual results.
To illustrate individual engine performance, Figure 6 visualizes the RUL prediction for the third engine in each subdataset (FD001–FD004). In FD001 and FD003, where operating modes remained consistent, the predicted and actual curves overlap almost entirely, indicating that MDFA successfully captured gradual degradation without being distracted by noise or redundant information. For FD002 and FD004, the trajectories still align closely overall, but occasional misalignments correspond to points where the engine’s operating mode shifted, reflecting transient sensor behavior that momentarily masked degradation signals. This observation underlines the importance of the MDFA’s attention mechanisms: by selectively reweighting informative channels and time steps, the model minimizes the impact of non-degradation-related fluctuations. Across all four cases, MDFA demonstrated high prediction accuracy and stability, confirming its effectiveness in both simple and complex operating environments.
The experimental results across the four subdatasets indicate that the MDFA model demonstrates strong agreement between the predicted RUL curves and the actual degradation trajectories, with the majority of predictions falling within the 95% confidence interval. This reflects the model’s robust generalization ability and high prediction stability. Specifically, in FD001 and FD003—which involve single operating conditions and fault types—the MDFA model accurately captured the degradation trends of the engine units, yielding predictions closely aligned with the ground truth. These results underscore the model’s capability in temporal feature extraction and sequence modeling under relatively simple conditions.
In contrast, for FD002 and FD004, which contain multiple operating regimes and compound fault scenarios, the model exhibits minor prediction fluctuations at certain time steps. Nevertheless, the overall degradation patterns remain consistent with the actual RUL curves, indicating strong adaptability and resilience to complex operational variations and fault couplings. In Figure 6, the third engine from each subdataset was chosen as a representative case to illustrate the prediction process in a clear and consistent manner across datasets. This specific selection does not affect the generality of our conclusions, as similar trends were observed for other engines during our experiments. The choice of a single engine per dataset was made primarily for clarity of presentation, avoiding overly cluttered figures while still conveying the model’s typical prediction behavior. It is worth noting that the predicted RUL curves occasionally lag behind the true degradation trajectories. This lag arises from the model’s conservative learning of degradation dynamics and acts as a safeguard against premature failure alarms, which can be beneficial in predictive maintenance scenarios. A more detailed quantification of this lag effect will be considered in future work.
In summary, the MDFA model not only achieved high prediction accuracy under simple operating conditions but also maintained reliable performance under more complex settings, thus confirming its broad applicability to practical RUL prediction tasks in aerospace systems.
3.2.4. Comparative Experiment
To further evaluate the proposed MDFA model’s effectiveness, we compared its performance with several advanced deep learning approaches, including deep convolutional neural networks (DCNNs), temporal convolutional networks (TCNs) [24], trend-aware fully convolutional networks (TaFCNs) [25], squeeze-and-excitation networks (SeNets) [26], standard convolutional neural networks (CNNs), gated recurrent units (GRUs), and various hybrid architectures.
Table 5 presents the root mean squared error (RMSE) and Score metrics for all models across the four C-MAPSS subdatasets. MDFA consistently outperformed these baselines. In terms of RMSE, it achieved relative improvements of 7.6 percent on FD001, 3.9 percent on FD002, 1.1 percent on FD003 and 3.1 percent on FD004. For the Score metric, its gains were 0.5 percent on FD001, 7.4 percent on FD002, 2.3 percent on FD003, and 18.8 percent on FD004. The performance gap was most pronounced on FD002 and FD004, which involve more complex operating conditions, varied fault patterns and different training sample sizes. In contrast, FD001 and FD003 presented simpler operating scenarios, leading to more accurate predictions across all models. These results demonstrate that MDFA not only achieves higher accuracy in predicting remaining useful life, but also generalizes reliably in both simple and complex degradation environments, highlighting its strong potential for practical use in aero-engine health prognostics.
3.3. Experimental Analysis on the N-CMAPSS
3.3.1. Dataset Description
To further assess the effectiveness and generalizability of the proposed MDFA model, we conducted additional RUL prediction experiments on the N-CMAPSS dataset. The N-CMAPSS dataset [27] comprises eight subdatasets, aggregating operational data from 128 aircraft engine units under various degradation modes. These fault scenarios affect critical engine components, including the fan, low/high-pressure compressors, and low/high-pressure turbines, primarily impacting their flow and efficiency. Each subdataset is stored in a file and consists of two distinct parts: the development set and the test set. Both parts include six categories of variables: operational settings w, measured signals, virtual sensor readings, engine health parameters θ, remaining useful life, and auxiliary monitoring indicators. In this study, we focused on the DS02 subdataset, which provides complete degradation trajectories of 10 engine units from their healthy state to system failure. This dataset served as a critical benchmark for evaluating the RUL prediction capability of the proposed model.
3.3.2. Data Processing Procedure
The development and test sets were first loaded using appropriate data handling libraries and then concatenated to construct a unified input set. To evaluate the relevance of each sensor variable to the RUL prediction task, we employed the ExtraTreesRegressor algorithm to compute the feature importance scores, with the ranked results illustrated in Figure 7.
In accordance with the threshold suggested by [28], we selected variables with importance coefficients greater than 0.01 as model input features. This ensures that the selected variables contribute significantly to the predictive performance and provide sufficient informational content. The final selected features, their importance scores, and corresponding physical interpretations are summarized in Table 6.
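The thresholding step can be illustrated with a short sketch. The variable names and importance values below are hypothetical placeholders rather than the scores reported in Table 6; in practice, the scores would come from a fitted tree-ensemble regressor.

```python
def select_features(importances, threshold=0.01):
    """Keep variables whose importance exceeds `threshold`, ranked from high to low."""
    kept = [(name, score) for name, score in importances.items() if score > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical importance scores for illustration only.
    scores = {"var_a": 0.420, "var_b": 0.008, "var_c": 0.150, "var_d": 0.010}
    print(select_features(scores))  # [('var_a', 0.42), ('var_c', 0.15)]
```

Note that the comparison is strict, so a variable whose score equals the threshold exactly is excluded, matching the "greater than 0.01" criterion in the text.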
To comprehensively evaluate the predictive performance of the proposed model, we compared it against several representative deep learning models, including fully connected networks, convolutional neural networks, and gated recurrent unit (GRU) models. Performance was measured with two metrics, MAE and RMSE.
3.3.3. Comparative Experiment Analysis
Figure 8 shows the RUL predictions on the N-CMAPSS DS02 subset, where each model’s output is compared with the ground truth. While all models captured the overall degradation trend, MDFA outperformed them by fusing multi-scale features and focusing on critical local information, yielding prediction curves that align more closely with actual degradation paths. Its enhanced feature extraction and temporal representation demonstrate robustness in complex scenarios.
To further validate the effectiveness of the proposed MDFA model for remaining useful life (RUL) prediction, a series of comparative experiments were conducted against several representative deep learning models, including TCN, TaFCN, DCNN, SeNet, CNN, GRU, and their hybrid architectures. The results are shown in Table 7.
MDFA consistently outperforms baseline models in RMSE and MAE. GRU captures long-range dependencies but misses critical local features, limiting accuracy. CNN-GRU hybrids improve local pattern extraction but lack depth for long-term trends. SeNet’s channel-wise attention enhances feature focus, and TaFCN’s temporal convolution with attention highlights key temporal features; however, both struggle with multi-scale cyclical degradation patterns. In contrast, MDFA’s multi-scale feature fusion combined with selective attention more precisely extracts critical degradation patterns, delivering robust, generalizable RUL predictions in complex scenarios.
3.4. Experimental Study Based on Bearing Dataset
3.4.1. Dataset Description
To assess generalization beyond aero-engine data, we applied the MDFA model to the PHM2012 bearing dataset, a standard benchmark for rotating machinery RUL prediction. The PHM2012 data were collected on the PRONOSTIA accelerated degradation test platform [29], which synchronously records full-lifecycle bearing data via acceleration and temperature sensors; vibration signals were sampled at 25.6 kHz (temperature at 10 Hz), with 0.1 s segments captured every 10 s. Experiments were terminated when the vibration amplitude exceeded 20 g, simulating bearing failure [30]. The dataset comprised 17 degradation sequences under three operating conditions (Table 8). Previous research indicates that horizontal vibrations more sensitively reflect bearing degradation than vertical ones [31]; consequently, we used horizontal vibration data for RUL prediction. Under Operating Condition 1, bearing 2_1 was chosen as a representative case, and its full-lifecycle horizontal vibration signal is shown in Figure 9.
The lifecycle of a bearing can typically be divided into four distinct stages: (1) the stable health stage, during which the vibration signal amplitude remains relatively low, indicating that the bearing operates under normal, healthy conditions; (2) the incipient defect stage, in which minor internal defects begin to develop as time progresses, leading to a gradual increase in the vibration signal amplitude; (3) the severe fault propagation stage, in which the defects become more pronounced and the vibration amplitude continues to rise significantly as the bearing deteriorates further; and (4) the late degradation stage, in which the vibration acceleration amplitude increases rapidly and eventually reaches its peak value, at which point the bearing is considered to have reached complete failure.
3.4.2. PHM 2012 Dataset Preprocessing
The raw data from the PHM2012 dataset undergo several preprocessing steps, including noise reduction and normalization, to enhance data quality and consistency. Following this, 13 informative features are extracted through time–frequency domain analysis. These features encompass kurtosis, entropy, waveform indicators, spectral characteristics, vibration metrics, and various statistical descriptors, which are critical for accurately estimating the remaining useful life (RUL) of bearings. During model training, RUL values are normalized to the [0, 1] range using an independent scaler. Each data sample is then labeled according to the RUL at the end of a defined sliding time window, and by varying window lengths, three-dimensional tensors are constructed to capture temporal dependencies for deep sequence modeling. To quantitatively assess the prediction performance of the proposed MDFA-based network, standard regression metrics are used, including the coefficient of determination (R² score), MAE, and RMSE.
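The window construction and label normalization described above can be sketched as follows. This is a minimal illustration with plain Python lists, assuming min-max scaling for the independent RUL scaler and a stride of one step; the function names and the toy data are hypothetical.

```python
def min_max_scale(values):
    """Scale a sequence to the [0, 1] range (all zeros if the sequence is constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def make_windows(features, rul, window):
    """Slice per-step feature vectors into overlapping windows.

    Returns X with shape (n_windows, window, n_features) as nested lists,
    and y holding the RUL at the last step of each window.
    """
    X, y = [], []
    for end in range(window, len(features) + 1):
        X.append(features[end - window:end])
        y.append(rul[end - 1])
    return X, y


if __name__ == "__main__":
    # Toy sequence: 6 time steps, 2 features per step.
    feats = [[i, 2 * i] for i in range(6)]
    labels = min_max_scale([5, 4, 3, 2, 1, 0])
    X, y = make_windows(feats, labels, window=3)
    print(len(X), len(X[0]), len(X[0][0]))  # 4 3 2
```

A sequence of length T yields T − window + 1 samples, so longer windows trade the number of training samples for more temporal context per sample.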
3.4.3. Comparative Experiments
To evaluate the predictive accuracy of the proposed MDFA model, experiments were conducted under two operating conditions using the PHM2012 dataset. In each condition, bearings 1 and 2 were used for training, and bearing 3 was reserved for testing. Figures 10 and 11 present the RUL prediction results on test bearings 1_3 and 2_3, respectively.
Under Operating Condition 1, while TCN and GRU achieved reasonable accuracy, the MDFA model outperformed them by more effectively focusing on critical local degradation patterns through its channel and spatial attention mechanisms. GRU-based variants with added convolution or attention modules showed limited improvement, as they often missed key local features. Global attention mechanisms tended to smooth out early transient signals, resulting in delayed predictions. The DCNN model suffered from overfitting due to noise amplification and lack of regularization. In the more complex Operating Condition 2, all models exhibited higher prediction errors, but MDFA maintained robust performance. Its predictions closely followed the true RUL trajectory, demonstrating strong sensitivity to early degradation and failure phases.
To further validate its superiority, MDFA was compared against a range of models, including GRU, CNN, hybrid GRU-CNN, TCN, Transformer, TaFCN, and SeNet. Under Operating Condition 1, MDFA achieved MAE reductions of up to 87.8% and RMSE reductions of up to 88.8%, along with significant improvements in R² scores. These results confirm that MDFA’s combination of multi-scale dilated convolutions and dual attention mechanisms enables accurate extraction of degradation features and enhanced focus on key temporal patterns, leading to superior RUL prediction accuracy.
3.4.4. Generalization Experiments
Given the nonstationary nature of real-world industrial environments, evaluating model performance under a single condition is insufficient. To assess the generalization and robustness of the proposed MDFA model, a cross-condition experiment was conducted. Bearing 1 from Operating Condition 1 and bearing 2 from Operating Condition 2 were used for training, while bearings 1 and 3 from Operating Condition 3 were used for testing.
As shown in Figure 10, Figure 11, and Figure 12 and summarized in Table 9, traditional deep models exhibited noticeable performance degradation under cross-condition settings. For instance, the DCNN model achieved an R2 of −0.351 on bearing 1_3, performing worse than a mean predictor. In contrast, the MDFA model achieved an R2 of 0.987, MAE of 0.023, and RMSE of 0.03. Compared with GRU, CNN, TCN, Transformer, and newer models like TaFCN and SeNet, MDFA consistently delivered superior results, with significant reductions in MAE and RMSE and improvements in R2.
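To make the interpretation of a negative R2 concrete, the three metrics can be written out directly. The sketch below uses the standard definitions (R2 = 1 − SS_res/SS_tot); the function names and the toy values are illustrative assumptions and are not taken from the experiments. A predictor that always outputs the mean of the true values scores exactly 0, so any model scoring below 0 has a larger squared error than that baseline.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical normalized RUL trajectory and two deliberately poor predictors.
y_true = [1.0, 0.75, 0.5, 0.25, 0.0]
mean_pred = [0.5] * 5          # always predicts the mean of y_true
reversed_pred = y_true[::-1]   # predicts the opposite trend

print(r2_score(y_true, mean_pred))      # 0.0
print(r2_score(y_true, reversed_pred))  # -3.0
print(mae(y_true, reversed_pred), rmse(y_true, reversed_pred))
```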
Overall, MDFA achieved the best performance across all test bearings, demonstrating strong generalization to varying speeds, loads, and degradation patterns. Its multi-scale dilated convolution module effectively captures degradation cues over different temporal scales, while the combined channel and spatial attention mechanisms enhance feature sensitivity and suppress noise. These capabilities make MDFA highly reliable for RUL prediction under complex real-world conditions.
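The way parallel dilated branches cover different temporal scales can be illustrated with a small sketch. It is not the MDFA implementation: the kernel, the dilation rates (1, 2, 4), and the function names are assumptions chosen for illustration, the weights are fixed rather than learned, and the channel and spatial attention stages are omitted. For kernel size k and dilation d, a single layer has a receptive field of (k − 1)·d + 1 time steps.

```python
from typing import List, Sequence


def dilated_conv1d(
    signal: Sequence[float], kernel: Sequence[float], dilation: int
) -> List[float]:
    """Causal 1-D dilated convolution with implicit left zero-padding.

    Output has the same length as the input; output[t] depends only on
    signal[t], signal[t - dilation], signal[t - 2 * dilation], ...
    """
    out: List[float] = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of input time steps spanned by one dilated convolution layer."""
    return (kernel_size - 1) * dilation + 1


def multi_scale_branches(
    signal: Sequence[float], kernel: Sequence[float], dilations: Sequence[int]
) -> List[List[float]]:
    """Apply one branch per dilation rate and stack the results as channels."""
    return [dilated_conv1d(signal, kernel, d) for d in dilations]


# Hypothetical input: a unit impulse at t = 4 in a 13-step signal.
signal = [0.0] * 13
signal[4] = 1.0
kernel = [1.0, 1.0, 1.0]   # illustrative fixed weights
dilations = (1, 2, 4)      # illustrative rates, not the paper's settings

branches = multi_scale_branches(signal, kernel, dilations)
for d, branch in zip(dilations, branches):
    hits = [t for t, v in enumerate(branch) if v != 0.0]
    print(f"dilation={d} receptive_field={receptive_field(len(kernel), d)} responds_at={hits}")
```

The same impulse influences a progressively wider span of output steps as the dilation grows, while each branch still uses only three weights. In a full model, the stacked branch outputs would then be reweighted by the attention modules before fusion.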
Although the proposed MDFA model incorporates multiple parallel dilated convolution branches, dual attention modules, and a post-fusion convolutional layer, the additional computational cost remains moderate. As shown in Table 9, the average inference time per sample for MDFA was 195 ms, which is only slightly higher than those of the lightweight models, such as CNN (142 ms) and GRU (167 ms), yet substantially lower than those of Transformer (310 ms) and CNN-GRU-Attention (265 ms). This demonstrates that MDFA achieves a favorable balance between accuracy and efficiency; it consistently outperforms baseline models in prediction accuracy while maintaining computational demands at a practical level suitable for real-time prognostics applications.
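The measurement protocol behind the per-sample inference times is not specified here, so the following is only one plausible way to obtain such a figure. The function name `mean_inference_ms`, the warm-up and repeat counts, and the stand-in `dummy_predict` model are all assumptions for illustration.

```python
import time
from typing import Callable, Sequence


def mean_inference_ms(
    predict: Callable[[Sequence[float]], float],
    samples: Sequence[Sequence[float]],
    warmup: int = 3,
    repeats: int = 10,
) -> float:
    """Average wall-clock time per sample in milliseconds."""
    if not samples or repeats < 1:
        raise ValueError("need at least one sample and one repeat")
    # Warm-up calls are excluded so one-time setup costs do not skew the mean.
    for sample in samples[:warmup]:
        predict(sample)
    start = time.perf_counter()
    for _ in range(repeats):
        for sample in samples:
            predict(sample)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(samples))


def dummy_predict(window: Sequence[float]) -> float:
    """Stand-in for a trained model's forward pass (hypothetical)."""
    return sum(window) / len(window)


samples = [[float(i + j) for j in range(64)] for i in range(20)]
latency_ms = mean_inference_ms(dummy_predict, samples)
print(f"{latency_ms:.4f} ms per sample")
```

Timings obtained this way depend on hardware, batch size, and software stack, so values from different setups are comparable only when those conditions are held fixed.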
4. Discussion
This study proposes a novel remaining useful life (RUL) prediction framework based on multi-scale dilated convolution and fusion attention (MDFA), aimed at improving prognostic accuracy and robustness for aero-engine bearings. Extensive experiments on the NASA C-MAPSS and N-CMAPSS datasets demonstrate that the MDFA model consistently outperforms conventional deep learning approaches, such as CNN, GRU, TCN, Transformer, SeNet, and TaFCN. Specifically, MDFA achieved MAE values as low as 0.018–0.026, RMSE values of 0.021–0.032, and R2 scores above 0.987 across multiple test subdatasets, highlighting its superior predictive accuracy and stability under diverse operational conditions.
To further assess generalization, cross-domain validation was conducted using the PHM2012 bearing dataset, which features variable speeds, loads, and degradation modes typical of real-world rotating machinery. The MDFA model maintained consistently low MAE (0.023–0.026), RMSE (0.031–0.032), and high R2 (0.987–0.995) across all test bearings, demonstrating strong adaptability to complex, non-stationary degradation patterns. These results confirm that the MDFA framework effectively combines multi-scale temporal representation with attention-based feature refinement, offering a robust and generalizable solution for accurate RUL prediction in both aero-engine systems and broader mechanical prognostics applications.
5. Conclusions
The proposed MDFA framework demonstrates clear advantages, including its ability to capture degradation dynamics at multiple temporal scales and to selectively emphasize informative features through dual attention mechanisms. These design choices enable the model to achieve superior accuracy and robustness compared to conventional approaches, particularly in complex, multi-operating-condition datasets. Furthermore, the cross-domain validation results suggest that MDFA generalizes well to unseen scenarios, making it a promising solution for real-world prognostics across different machinery types.
However, despite these strengths, several limitations and practical challenges remain. The model’s reliance on extensive historical sensor data may limit its applicability in situations with sparse or noisy measurements, and its computational complexity, due to multi-scale convolutions and attention operations, could hinder real-time deployment in embedded or resource-constrained environments. Additionally, the tuning of hyperparameters, such as dilation rates, attention configurations, and input window sizes, may require significant domain expertise, potentially restricting straightforward adoption in industrial settings. Future work could focus on lightweight implementations, adaptive hyperparameter optimization, and integration with online learning to address these challenges.
Moreover, the current evaluation of the MDFA framework is limited to three publicly available datasets, primarily covering aero-engine and bearing degradation scenarios. The C-MAPSS dataset focuses on high-pressure compressor (HPC) and fan-related faults, while the PHM2012 bearing dataset captures gradual degradation reflected in vibration signals. Consequently, the model’s performance on other industrial machinery components, such as pumps or gearboxes, remains untested.
Additionally, early-stage, intermittent, or transient faults, which often occur in real-world operational environments, have not been assessed. This limitation restricts the demonstrated applicability of MDFA to a broader range of practical degradation patterns. Future work could extend evaluation to more diverse fault types and machinery systems to further validate the model’s generalization capability.