Remaining Useful Life Prediction of Airplane Engine Based on Bidirectional Mamba and Causal Discovery

Li, Min; Zhu, Longxia; Luo, Meiling; Ke, Ting

doi:10.3390/s25113429

by Min Li *, Longxia Zhu, Meiling Luo and Ting Ke

College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China

* Author to whom correspondence should be addressed.

Sensors 2025, 25(11), 3429; https://doi.org/10.3390/s25113429

Submission received: 27 April 2025 / Revised: 25 May 2025 / Accepted: 27 May 2025 / Published: 29 May 2025

(This article belongs to the Section Industrial Sensors)

Abstract

Remaining Useful Life (RUL) prediction plays a critical role in prognostics and health management systems, helping to increase the reliability and safety of modern industrial equipment. The recently proposed Mamba deep learning model aims to balance predictive performance and computational cost. This paper presents a multimodal RUL prediction model, Cau–BiMamba–LSTM, which combines causal discovery, a bidirectional Mamba (BiMamba), an attention mechanism, and Long Short-Term Memory (LSTM). The framework uses maximum information transfer entropy and simple exponential smoothing to build a causal graph model that extracts groups of feature variables. LSTM models long-range dependencies; the attention mechanism dynamically shifts its focus according to the temporal context; finally, the bidirectional state space model captures contextual information in both time directions for a richer insight into the underlying data patterns. Tests conducted on the C-MAPSS dataset confirm that this model achieves superior predictive accuracy and robustness. Moreover, the model achieves high predictive performance on very complex, long time–series and provides fast responses.

Keywords:

remaining useful life prediction; Mamba; state space model; causal discovery

1. Introduction

In the age of Industry 4.0, fault risk management and reliability assessment are essential for maintaining the stability and integrity of industrial equipment operations. Equipment maintenance is shifting from passive maintenance to digital precision forecasting, in which faults are detected with high precision under complex operating conditions [1]; this in turn reduces the impact of equipment failure on the production process. Recent advances in GPU hardware for deep learning have accelerated the study of artificial-intelligence-based models for fault prediction and health diagnosis. Accurate prediction of the RUL and health status of equipment enhances its long-term stability and reduces maintenance costs. Thus, RUL prediction has become one of the most important research directions in the field of fault diagnosis and health management.

Current research methods for RUL prediction can be classified broadly into two groups: physics-based methods [2] and data-driven methods [3,4,5,6]. Physics-based methodologies take into account the physical laws and mathematical structures of equipment, explaining the physical processes and mechanisms of failure in a reasonable way. However, the failure processes of equipment very often involve complex nonlinear relationships and large uncertainties, meaning that those mechanisms are rarely understood well enough to be modeled explicitly. In addition, the accuracy of these methods relies heavily on reasonable estimates of system parameters; even minor errors in the parameters can cause the predictions to deviate significantly from the optimal outcome [7]. They also encounter great difficulties in tackling sudden or stochastic failures. On the other hand, data-driven RUL prediction methods automatically identify nonlinear dependencies associated with the equipment defect process, with minimal reliance on physical parameters. These methods can be broadly applied to various types of equipment. In recent years, data-driven RUL prediction methods based on deep learning have shown great potential, with published studies using Convolutional Neural Networks (CNNs) [8,9], Long Short-Term Memory (LSTM) networks [10], Generative Adversarial Networks (GANs) [11], Graph Neural Networks (GNNs) [12,13,14], attention mechanisms (AMs) [15,16], and Transformer models [17,18]. Keshun et al. [19] proposed an RUL prediction model based on a three-dimensional attention mechanism, CNN, and BiLSTM that enhances prediction accuracy while achieving interpretability. Shi et al. [5] introduced a novel lightweight RUL prediction model integrating exponential smoothing, attention mechanisms, and LSTMs. Such models are well suited to fault diagnosis or health prediction applications that demand quick responses. However, the performance of existing hybrid models using attention mechanisms and classical neural networks typically degrades with longer sequences.

State space models (SSMs) are mathematical representations derived from control systems theory, which describe how states evolve in dynamic systems and define the relationships between inputs and outputs. SSMs rely mainly on their state and output equations, using hidden states to preserve historical information [20]. In addition, by alleviating the vanishing gradient issue typical in traditional recurrent neural networks, this approach enables more effective modeling of long-range dependencies. By integrating recursive inference with convolutional training mechanisms, SSMs effectively balance real-time processing efficiency and parallel computation, significantly improving computational performance. Furthermore, SSMs possess the ability to dynamically adapt their parameters, allowing for flexible alignment with the dynamic properties of diverse systems. SSMs are widely applied in time–series analysis, control systems, and signal processing. For examples of RUL prediction models based on SSMs, one may refer to [21,22]. However, traditional SSMs face challenges with high computational complexity when processing long sequences. Mamba [23] is an improved state space model that was developed to overcome traditional SSMs’ limitations in modeling long sequences. It has linear complexity, allowing it to efficiently model long sequences while capturing their dynamic changes. It excels in tasks such as language modeling and time–series forecasting, while also offering high computational efficiency. Currently, the application of Mamba in RUL prediction is still relatively limited. For example, Liang and Zhao [24] proposed a Mamba-based state space model for early RUL prediction of lithium-ion batteries and demonstrated the method’s strengths in terms of prediction performance, robustness, and efficiency. Zhu et al. [25] integrated attention–Mamba networks with the Physics-Informed Neural Networks (PINNs) framework, incorporating hard-to-detect physical information into the neural network to improve the model’s RUL prediction accuracy.

Feature selection is a crucial step in modern data-driven modeling [26], focusing on identifying and extracting the most relevant features from the original feature set to reduce redundancy and retain essential information. This process is designed with the aim of enhancing model performance, interpretability, and computational efficiency. Causal discovery algorithms are designed to uncover causal relationships between variables from observational data. Studies on improving feature selection through causal structure learning can be found in works such as [27]. Transfer entropy (TE) is an information–theoretic metric used to measure the directional flow of information or causal influence between time–series. In the context of equipment life prediction, applying the TE algorithm helps identify and select key causal relationships from large datasets, leading to more accurate prediction models. Causal discovery has been widely applied in time–series forecasting, such as in [28,29], which demonstrated the effectiveness of this method in improving model performance.

Although the transfer entropy algorithm demonstrates excellent performance in causality identification, its computational complexity is considerable, particularly when handling high-dimensional time–series data [30], as the computational burden grows significantly. Additionally, the TE algorithm is sensitive to noise, and the presence of noise in the data may lead to erroneous causality inferences. Inspired by [5,31], this paper proposes an RUL prediction model based on Bidirectional Mamba (BiMamba) and causal discovery algorithms. Firstly, the impact of random noise is effectively reduced through exponential smoothing techniques. Subsequently, the maximum information transfer entropy method, as described in [31], is employed to construct a causal graph of feature variables; based on these, key feature variables are selected. Then, utilizing these key feature variables, a model (namely, Cau–BiMamba–LSTM) combining BiMamba, LSTM, and causality is applied for RUL prediction.

This paper makes the following key contributions:

(1): To tackle the noise issue in time–series data, exponential smoothing is employed to weight and average the data points. Furthermore, the maximum information transfer entropy algorithm is utilized to identify more accurate causal relationships. This causality-driven feature selection method enhances the interpretability and prediction accuracy of the model.
(2): This paper is the first to apply the BiMamba model to the RUL prediction of aircraft engines. By integrating a hybrid model that combines bidirectional processing mechanisms, Mamba, attention mechanisms, and LSTM, the ability to model long sequences is significantly improved. The approach takes full advantage of Mamba’s low complexity and high computational efficiency, achieving enhanced accuracy while minimizing computational resource usage.
(3): Our model demonstrates superior performance on the C-MAPSS dataset, highlighting its potential as a versatile method for predicting RUL. The Cau–BiMamba–LSTM model achieves optimal performance in terms of RMSE and SCORE on the C-MAPSS dataset, with a parameter count as low as 3323. The prediction accuracy on the FD002 and FD004 datasets outperforms most existing models. Specifically, on the most complex sub-dataset, FD004, the RMSE reaches 14.37 and SCORE reaches 948, making it the best-performing model in terms of prediction accuracy compared to all other models.

The sections of the paper are organized as follows: in Section 2, preliminaries are discussed; in Section 3, the proposed Cau–BiMamba–LSTM prediction framework is presented; in Section 4, the experimental setup and the efficiency of the approach are developed; and finally, Section 5 concludes the paper and proposes some further research directions.

2. Preliminaries

2.1. State Space Model and Mamba

The traditional state space model is used conventionally in control system theory to describe the dynamic behavior of systems and estimate states from observational data. It transforms the input, $x(t) \in \mathbb{R}$, into the output, $y(t) \in \mathbb{R}$, with the help of the hidden state, $h(t) \in \mathbb{R}^{N \times 1}$ [32]:

$$h'(t) = A h(t) + B x(t), \qquad y(t) = C h(t), \tag{1}$$

where $h'(t)$ is the derivative of the current state, $A \in \mathbb{R}^{N \times N}$ is the state transition matrix, which describes how the state changes over time, $B \in \mathbb{R}^{N \times 1}$ represents the matrix that defines the input’s impact on state changes, and $C \in \mathbb{R}^{1 \times N}$ is the output matrix.

To handle discrete rather than continuous data, the Zero-Order Hold technique [33] is employed, which converts the discretized signals into continuous signals suitable for the SSM. Specifically, a time-scale parameter $\Delta \in \mathbb{R}$ is used to transform the continuous parameters $A$ and $B$ into their corresponding discrete parameters:

$$\bar{A} = \exp(\Delta A), \qquad \bar{B} = (\Delta A)^{-1} (\exp(\Delta A) - I) \, \Delta B. \tag{2}$$

The SSM can then be rephrased in discrete form using the step size $\Delta$ as follows:

$$h_{t} = \bar{A} h_{t-1} + \bar{B} x_{t}, \qquad y_{t} = C h_{t}. \tag{3}$$

In addition, the model can use global convolution calculations to produce the output:

$$\bar{K} = (C \bar{B}, C \bar{A} \bar{B}, \dots, C \bar{A}^{M-1} \bar{B}), \qquad y = x * \bar{K}. \tag{4}$$

Here, $M$ represents the length of the input sequence, $x$; the convolution kernel is denoted by $\bar{K} \in \mathbb{R}^{M}$. Clearly, the discrete SSM shares a similar structure to a recurrent neural network but offers superior parallel computing capabilities, which traditional RNNs, reliant on nonlinear activation functions, cannot achieve.
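The equivalence between the recurrent form and the convolutional form of the discrete SSM can be checked numerically. The following is a minimal sketch for the scalar case (state dimension N = 1), written with the Python standard library only; the function names and the example values of a, b, c, and the step size are illustrative assumptions, not settings taken from the paper.

```python
import math


def discretize_zoh(a: float, b: float, delta: float) -> tuple[float, float]:
    """Zero-order-hold discretization of a scalar SSM (N = 1, a != 0)."""
    a_bar = math.exp(delta * a)
    # (delta*a)^-1 * (exp(delta*a) - 1) * delta*b
    b_bar = (a_bar - 1.0) / (delta * a) * (delta * b)
    return a_bar, b_bar


def run_recurrent(a_bar: float, b_bar: float, c: float, xs: list[float]) -> list[float]:
    """Recurrent form: h_t = a_bar*h_{t-1} + b_bar*x_t, y_t = c*h_t."""
    h, ys = 0.0, []
    for x in xs:
        h = a_bar * h + b_bar * x
        ys.append(c * h)
    return ys


def run_convolution(a_bar: float, b_bar: float, c: float, xs: list[float]) -> list[float]:
    """Convolutional form: y = x * K with kernel entries K_k = c * a_bar**k * b_bar."""
    m = len(xs)
    kernel = [c * (a_bar ** k) * b_bar for k in range(m)]
    # Causal convolution: y_t = sum_{k=0}^{t} K_k * x_{t-k}
    return [sum(kernel[k] * xs[t - k] for k in range(t + 1)) for t in range(m)]


# Illustrative values only
a_bar, b_bar = discretize_zoh(a=-0.5, b=1.0, delta=0.1)
xs = [1.0, 0.0, 2.0, -1.0, 0.5]
y_rec = run_recurrent(a_bar, b_bar, 1.5, xs)
y_conv = run_convolution(a_bar, b_bar, 1.5, xs)
```

Both functions return the same sequence up to floating-point error. The recurrent form processes one step at a time, which suits inference, whereas the kernel in the convolutional form can be precomputed and applied to the whole sequence at once, which suits parallel training.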

Traditional SSMs are time-invariant, meaning their $A$, $B$, $C$ and $\Delta$ are independent of the model’s input variables. This limitation restricts the ability to model context effectively and impacts overall performance [34]. To address this issue, [23] proposed Mamba as a potential alternative. One of the core innovations of Mamba is the introduction of the time-varying selection mechanism. The state transition matrices are defined as follows:

$$A_{t} = f_{A}(x(t)), \qquad B_{t} = f_{B}(x(t)) \tag{5}$$

The proposed mechanism dynamically adjusts the model’s weight matrices, enabling better adaptation to changes in the input sequence and thereby enhancing model performance. Moreover, Mamba introduces a hardware-aware algorithm that boosts computational efficiency through parallel scanning, kernel fusion, and the recomputation of intermediate results rather than storing them in memory. The structure of Mamba is depicted in Figure 1.

The BiMamba architecture, shown in Figure 2 and introduced by Liu et al. [32], significantly enhances model performance by leveraging dual-branch attention mechanisms to improve feature extraction and fusion.

First, the input sequence $X_{l-1}$ is normalized by $X'_{l-1} = \mathrm{Norm}(X_{l-1})$. Then, $X'_{l-1}$ is linearly projected onto $x$ and $z$: $x = \mathrm{Linear}^{x}(X'_{l-1})$, $z = \mathrm{Linear}^{z}(X'_{l-1})$. Here, $x$ is used for subsequent state space modeling, and $z$ is used for the gating mechanism. Next, the sequence undergoes forward and backward processing. For the forward operation, $o$, passing $x$ through a 1D convolutional layer and applying the SiLU activation function yields $x'_{o}$. Then, we compute the linear projections of the SSM parameters, $B_{o} = \mathrm{Linear}_{o}^{B}(x'_{o})$ and $C_{o} = \mathrm{Linear}_{o}^{C}(x'_{o})$, and the time-step parameter is calculated as $\Delta_{o} = \log(1 + \exp(\mathrm{Linear}_{o}^{\Delta}(x'_{o})) + \mathrm{Parameter}_{o}^{\Delta})$, where $\mathrm{Parameter}_{o}^{\Delta}$ is a learnable parameter. Subsequently, using $\bar{A}_{o} = \Delta_{o} \cdot \mathrm{Parameter}_{o}^{\Delta}$ and $\bar{B}_{o} = \Delta_{o} \cdot B_{o}$, the system’s continuous-time representation is converted to a discrete-time representation, making the SSM more efficient for computation on a computer. The obtained matrices $\bar{A}_{o}$, $\bar{B}_{o}$, $C_{o}$, and $x'_{o}$ are fed into the state space model to produce the output of the forward pass, $y_{forward}$. The backward pass follows the same process, with the outputs from both the forward pass, $y_{forward}$, and the backward pass, $y_{backward}$, being multiplied with $z$ after applying the nonlinear SiLU activation to compute $y'_{forward}$ and $y'_{backward}$, respectively. Ultimately, the final output $X_{l}$ is derived using the equation $X_{l} = \mathrm{Linear}^{T}(y'_{forward} + y'_{backward}) + X_{l-1}$.
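The data flow of the bidirectional block can be illustrated with a heavily simplified toy version. The sketch below works on a single scalar channel with a scalar hidden state, omits normalization and the linear projections, and replaces the 1D convolution with a plain SiLU activation; the names `selective_scan` and `bimamba_block`, the scalar weights, and the exponential form used for the discretized state transition are all assumptions made for illustration and are not the implementation used in this paper.

```python
import math


def silu(v: float) -> float:
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def softplus(v: float) -> float:
    """Softplus, log(1 + exp(v)); keeps the step size positive."""
    return math.log1p(math.exp(v))


def selective_scan(xs, a, w_b, w_c, w_delta):
    """One-directional scan whose B, C, and step size depend on the input."""
    h, ys = 0.0, []
    for x in xs:
        delta = softplus(w_delta * x)   # input-dependent step size
        a_bar = math.exp(delta * a)     # assumed exponential discretization of A
        b_bar = delta * (w_b * x)       # simplified B_bar = delta * B
        h = a_bar * h + b_bar * x
        ys.append((w_c * x) * h)        # input-dependent output projection C
    return ys


def bimamba_block(xs, z, a=-1.0, w_b=1.0, w_c=1.0, w_delta=1.0):
    """Forward scan + backward scan, gated by SiLU(z), plus a residual connection."""
    x_act = [silu(x) for x in xs]       # stands in for Conv1d followed by SiLU
    fwd = selective_scan(x_act, a, w_b, w_c, w_delta)
    # Backward branch: scan the reversed sequence, then restore the time order
    bwd = selective_scan(x_act[::-1], a, w_b, w_c, w_delta)[::-1]
    gate = [silu(g) for g in z]
    return [x + (f + b) * g for x, f, b, g in zip(xs, fwd, bwd, gate)]


xs = [1.0, 2.0, 1.0]
out_gated = bimamba_block(xs, [0.5, 0.5, 0.5])
out_closed = bimamba_block(xs, [0.0, 0.0, 0.0])
```

With a gate input of zero the block reduces to the residual path and returns its input unchanged, and a time-symmetric input produces a time-symmetric output because the forward and backward branches share the same scan.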

2.2. Attention Mechanism

2.2.1. Self-Attention Mechanism

The self-attention mechanism is a technique used to capture dependencies between elements in sequence data, initially introduced by Vaswani et al. in 2017 [35], and has become a fundamental part of the transformer model. The self-attention mechanism dynamically assigns attention weights by calculating the relevance of each element in the sequence to other elements, thereby better capturing global dependencies. Let the input be $X \in \mathbb{R}^{N \times d}$, where $N$ is the sequence length and $d$ is the feature dimension. The definitions of query (denoted as $Q$), key (denoted as $K$) and value (denoted as $V$) are as follows:

$$Q = X W_{Q}, \qquad K = X W_{K}, \qquad V = X W_{V} \tag{6}$$

where $W_{Q} \in \mathbb{R}^{d \times d_{k}}$, $W_{K} \in \mathbb{R}^{d \times d_{k}}$, $W_{V} \in \mathbb{R}^{d \times d_{k}}$ are the weight matrices. The attention weights are:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) \tag{7}$$

The output is:

$$\mathrm{output} = \mathrm{Attention}(Q, K, V) \cdot V \tag{8}$$

The self-attention mechanism captures long-range dependencies by transforming the input data into matrix format. It aggregates the input features with weighted attention, emphasizing key information while suppressing details that are less relevant to the current task, effectively capturing long-distance dependencies and global information.
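As a concrete illustration, the scaled dot-product computation can be written in a few lines of plain Python. This is a minimal sketch using nested lists as matrices; the helper names and the identity weight matrices in the example are assumptions chosen to keep the arithmetic easy to follow, not values from the paper.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Return (attention weights, output) for an input x of shape N x d."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]      # transpose of K
    scores = matmul(q, k_t)                   # N x N relevance scores
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return weights, matmul(weights, v)


# Illustrative input: sequence length N = 3, feature dimension d = 2
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, output = self_attention(x, identity, identity, identity)
```

Each row of `weights` sums to one, so every output row is a convex combination of the value vectors, weighted by how relevant each position is to the query position.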

2.2.2. Additive Attention Mechanism

The additive attention mechanism, first introduced by [36], is a commonly used attention technique. Compared to the traditional dot product attention, the additive attention mechanism offers a more stable computational process, particularly when handling high-dimensional data. It effectively mitigates potential numerical issues that may arise in the dot product method. For each query–key pair $(Q, K)$, the alignment score is computed using the tanh activation function:

$$e(Q, K) = V^{T} \tanh(W_{Q} Q + W_{K} K + b) \tag{9}$$

Attention weights are:

$$\alpha(Q, K) = \frac{e(Q, K)}{\sum_{K'} e(Q, K')} \tag{10}$$

The output is a weighted sum of all value vectors $V$.

2.3. Exponential Smoothing

Time–series data often contain various random fluctuations or noise, which may not be related to the true underlying trend. Exponential smoothing, through weighted averaging, enables the model to focus more on recent genuine changes, thereby effectively reducing noise [37]. The goal is to smooth the data and minimize the interference of random fluctuations on the model. Its mathematical expression is:

$$\hat{x}_{t} = \frac{\sum_{i=0}^{t} \omega_{i} x_{t-i}}{\sum_{i=0}^{t} \omega_{i}} \tag{11}$$

There are various exponential smoothing methods, and the definition of the smoothing coefficient $\omega_{i}$ can vary; for example, ref. [5] uses $\omega_{i} = (1 - \gamma)^{i}$ with $\gamma = \frac{2}{1 + s}$. The formula for simple exponential smoothing is as follows:

$$\hat{x}_{t} = \alpha \cdot x_{t} + (1 - \alpha) \cdot \hat{x}_{t-1} \tag{12}$$

where $x_{t}$ represents the actual observed value and $\hat{x}_{t}$ is the smoothed value. When $\alpha$ is close to 1, the model becomes more sensitive to the most recent observations, with a weaker smoothing effect on historical data, making it suitable for situations where the data changes rapidly. When $\alpha$ is close to 0, the model places more emphasis on smoothing the historical data, making it more suitable for situations where the data changes slowly.
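The effect of the smoothing factor can be seen in a short example. The sketch below implements simple exponential smoothing in its usual recursive form, in which the previous smoothed value is carried forward; initializing the recursion with the first observation is an assumption, as is the toy sensor sequence.

```python
def simple_exponential_smoothing(xs: list[float], alpha: float) -> list[float]:
    """Recursive smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = [xs[0]]  # assumption: start from the first observation
    for x in xs[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Illustrative noisy readings
readings = [10.0, 12.0, 11.0, 15.0]
smooth_03 = simple_exponential_smoothing(readings, alpha=0.3)
smooth_10 = simple_exponential_smoothing(readings, alpha=1.0)
```

With a factor of 0.3 the jump from 11 to 15 in the last reading moves the smoothed value only to about 12.0, while a factor of 1.0 reproduces the raw readings and performs no smoothing at all.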

2.4. Maximum Information Transfer Entropy

Information entropy can be utilized to measure causal relationships. Assuming that the information transfer between two variables will reduce the uncertainty of the system, it is termed as transfer entropy [38]. When the transfer entropy from X to Y is greater than from Y to X, we designate X as the cause and Y as the effect, thus establishing a causal relationship between the two variables. Causal network graphs utilize nodes to represent variables and edges to illustrate the causal connections between them. This graphical model enables us to understand the interactions and mutual influences among various components of mechanical equipment more clearly, providing an important foundation of accuracy for constructing predictive models.

Transfer entropy (TE), introduced by Schreiber in 2000 [38], quantifies the directed information flow or causal influence between two stochastic processes or time–series. Maximum information transfer entropy (MITE) further optimizes the calculation of transfer entropy to capture the strongest information transfer relationships between systems. In reference [31], researchers introduced a causal modeling approach utilizing maximum information transfer entropy (MITE-CM) for analyzing causality in industrial control systems. This method combines transfer entropy with the Maximum Information Coefficient (MIC) network to measure causal interactions within systems. For more comprehensive information on the MITE-CM technique, consult references [29,31].
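To make the directional nature of transfer entropy concrete, the following sketch estimates plain transfer entropy for discrete-valued series with a history length of one, using simple frequency counts. It is not the MITE-CM algorithm of [31]; the estimator, the function name, and the synthetic data (a target that copies the source with a one-step delay) are assumptions made purely for illustration.

```python
import math
import random
from collections import Counter


def transfer_entropy(source: list[int], target: list[int]) -> float:
    """Plug-in estimate of TE(source -> target) in bits, history length 1."""
    # Joint counts of (y_{t+1}, y_t, x_t)
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    n = sum(triples.values())
    pairs_yx, pairs_yy, singles_y = Counter(), Counter(), Counter()
    for (y_next, y_prev, x_prev), count in triples.items():
        pairs_yx[(y_prev, x_prev)] += count
        pairs_yy[(y_next, y_prev)] += count
        singles_y[y_prev] += count
    te = 0.0
    for (y_next, y_prev, x_prev), count in triples.items():
        p_given_both = count / pairs_yx[(y_prev, x_prev)]             # p(y_{t+1} | y_t, x_t)
        p_given_self = pairs_yy[(y_next, y_prev)] / singles_y[y_prev]  # p(y_{t+1} | y_t)
        te += (count / n) * math.log2(p_given_both / p_given_self)
    return te


rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]                  # y repeats x one step later, so x drives y
te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
```

Because the past of `x` fully determines the next value of `y`, the estimate in the x-to-y direction is close to one bit, while the estimate in the reverse direction stays near zero. Under the rule described above, `x` would therefore be designated the cause and `y` the effect.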

3. Methodology

The authors of [29] employ the MITE-CM algorithm to perform causal analysis on the C-MAPSS dataset for aircraft engines, thereby improving prediction performance. However, they do not consider the impact of noise in the data on the causal relationships, which leads to inaccurate causal inference. To address this limitation, we utilize an approach that combines exponential smoothing with the MITE-CM algorithm for causal feature selection and introduce a new BiMamba module, integrating models such as LSTM and attention mechanisms for RUL prediction. The network architecture of our proposed Cau–BiMamba–LSTM model is illustrated in Figure 3.

First, this paper employs the simple exponential smoothing method and the MITE-CM algorithm for causal feature selection. Subsequently, the selected features are processed through an encoder layer, an aggregated encoding feature (AEF) module, an aggregated original feature (AOF) layer, and a decoder layer for RUL prediction. Compared to the methods proposed in [5,29], our innovation lies in the design of a BiMamba-based aggregated encoding feature layer. By leveraging a BiMamba module combined with residual networks, our approach enables the more precise extraction of data features. The architecture of the BiMamba module, as described in [32], is depicted in Figure 2.

We utilize a self-attention mechanism to aggregate the original features of key variables, employ LSTM for encoding, and leverage a combination of BiMamba, residual connections, and additive attention mechanisms for aggregating the encoded features. The two types of aggregated features are then concatenated and decoded using LSTM. The LSTM output is processed by a fully connected layer to produce the final decoded result, thereby achieving RUL prediction. In the BiMamba-based aggregated encoding feature layer, a self-attention mechanism is initially applied, with its output acting as the input for the first BiMamba module. The output from this module is then combined with the self-attention mechanism output through a residual connection and passed into the second BiMamba module. This design effectively harnesses the capabilities of BiMamba, allowing it to capture a wide range of features and produce a more detailed representation, which enhances the accuracy of RUL prediction.

The Cau–BiMamba–LSTM model has advantages in the following three aspects:

(1): Information flow perspective: BiMamba and LSTM exhibit complementarity. BiMamba excels at capturing long-range dependencies and effectively modeling complex long-term trends in time–series data [32]. On the other hand, LSTM is better suited for local pattern recognition, as it can remember and forget specific information within shorter time spans. By combining these two models, the hybrid model leverages their respective strengths, capturing both long-term trends and short-term fluctuations. This complementarity enhances the model’s performance when handling complex time–series data.
(2): Computational complexity: The computational complexity of the Cau–BiMamba–LSTM model remains linear [5,32]. When dealing with large-scale datasets, integrating multiple models can significantly improve performance. However, the addition of models often results in increased computational burden. Through optimized design, the proposed hybrid model maintains high performance without a significant increase in computational cost, ensuring that the complexity grows linearly.
(3): Innovative causal feature selection and effective fusion with attention mechanism: The hybrid model innovatively combines causal feature selection with an attention mechanism. It fully leverages the advantages of transfer entropy theory for feature selection. The model uses exponential smoothing to remove noise and employs maximum transfer entropy for causal feature selection to enhance subsequent prediction accuracy. Additionally, the attention mechanism is incorporated to dynamically focus on important features. This allows the model to automatically prioritize features that contribute more significantly to the prediction, achieving effective fusion of information. By combining transfer entropy for causal feature selection with attention mechanisms for feature weighting, the hybrid model efficiently utilizes input features and improves overall prediction performance.

In summary, the Cau–BiMamba–LSTM model integrates various independent yet complementary modules, including exponential smoothing, causal feature selection, BiMamba, LSTM, and attention mechanisms. This combination not only fully leverages the strengths of each module but also overcomes the limitations that individual modules may have. Exponential smoothing reduces the impact of noise, the MITE-CM algorithm enhances causal inference capabilities, and the BiMamba module, through the integration of LSTM and the attention mechanism, improves feature learning and modeling of temporal dependencies. As a result, the model effectively improves the accuracy and reliability of RUL prediction by comprehensively addressing noise suppression, causal relationship discovery, and the learning of both long-term and short-term features in time–series.

4. Experimental Procedure and Analysis

4.1. Dataset

The C-MAPSS dataset [39], created by NASA, simulates real turbofan engines. It serves as an open-access dataset for studies in health monitoring and RUL prediction. The dataset includes four subsets, each representing different operating conditions and fault scenarios. The training set contains sampled values of various state parameters taken at different time points throughout a complete cycle, ranging from normal operation to failure. In contrast, the test set contains state parameters at a specific time point just before failure, along with the corresponding remaining lifespan. The dataset comprises 26 columns: the initial column identifies the engine ID, followed by the current operational cycle in the second column. The subsequent three columns outline the operating conditions, while columns 6 through 26 provide numerical data from 21 sensor readings. Table 1 offers a comprehensive overview of the C-MAPSS dataset.
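The column layout described above maps naturally onto a small record type. The sketch below parses one row into its four groups of fields; it assumes that the data files are whitespace-separated text, and the record type, function name, and example row are hypothetical and serve only to illustrate the layout.

```python
from typing import NamedTuple


class CmapssRecord(NamedTuple):
    engine_id: int                # column 1
    cycle: int                    # column 2
    settings: tuple[float, ...]   # columns 3-5: operating conditions
    sensors: tuple[float, ...]    # columns 6-26: 21 sensor readings


def parse_row(line: str) -> CmapssRecord:
    """Split one whitespace-separated row into the 26 documented columns."""
    fields = line.split()
    if len(fields) != 26:
        raise ValueError(f"expected 26 columns, got {len(fields)}")
    return CmapssRecord(
        engine_id=int(fields[0]),
        cycle=int(fields[1]),
        settings=tuple(float(v) for v in fields[2:5]),
        sensors=tuple(float(v) for v in fields[5:26]),
    )


# Hypothetical row: engine 1, cycle 7, three settings, 21 sensor values
example_row = "1 7 0.1 0.2 100.0 " + " ".join(["0.5"] * 21)
record = parse_row(example_row)
```

Keeping the operating conditions and the sensor readings in separate fields makes it straightforward to restrict later processing, such as smoothing and causal feature selection, to the sensor columns.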

The C-MAPSS dataset consists of four sub-datasets (FD001-FD004), with increasing complexity: FD001 contains a single operational condition and fault mode, while FD004 includes six operational conditions and two composite fault modes. Each sub-dataset’s training set contains full lifetime data from 100 to 249 engines, while the test set contains partial data from 100 to 248 engines. The challenges in processing the dataset mainly lie in operational condition shifts, noise interference, and the complex coupling of fault characteristics. For instance, the standard deviation of sensor noise in FD003 reaches 0.5%, and FD002 covers six operational conditions that span flight altitudes from sea level to 35,000 feet. These characteristics make the dataset an important benchmark for evaluating the robustness of predictive models, particularly in terms of handling multimodal degradation trajectories and generalization capabilities, and offer significant research value.

4.2. Evaluation Criteria

The model’s performance is evaluated using two metrics: Root Mean Square Error (RMSE) and the SCORE function. Among these, RMSE is a standard tool for measuring the prediction accuracy of regression models. Given a set of true values $y_{1}, y_{2}, \dots, y_{N}$ and the corresponding model predictions $\hat{y}_{1}, \hat{y}_{2}, \dots, \hat{y}_{N}$, where $N$ represents the total count of data points, the RMSE is calculated using Equation (13). A lower RMSE value reflects improved model prediction accuracy. The definition of the SCORE function can be found in Equation (14), where $d_{i}$ represents the deviation between the predicted RUL and the actual RUL, defined as $d_{i} = \hat{y}_{i} - y_{i}$. Since early fault detection is crucial for engines, when $d_{i} < 0$, the predicted RUL underestimates the actual RUL, implying there is still time for maintenance, and thus the penalty is smaller. Conversely, when $d_{i} > 0$, the predicted RUL overestimates the true RUL, which could lead to machine failure without timely repair, resulting in a larger penalty.

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_{i} - y_{i})^{2}} \tag{13}$$

$$\mathrm{SCORE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} S_{i}}, \qquad S_{i} = \begin{cases} e^{-\frac{d_{i}}{13}} - 1, & \text{for } d_{i} < 0 \\ e^{\frac{d_{i}}{10}} - 1, & \text{for } d_{i} \geq 0 \end{cases} \tag{14}$$
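The asymmetry of the penalty is easy to verify numerically. The following sketch computes the RMSE and the per-engine penalty term $S_{i}$ only; the function names and the example RUL values are illustrative assumptions, and the aggregation of the penalties into a single SCORE value is deliberately left out.

```python
import math


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean square error between true and predicted RUL values."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


def score_penalty(d: float) -> float:
    """Per-engine penalty S_i, where d = predicted RUL - true RUL."""
    if d < 0:
        return math.exp(-d / 13.0) - 1.0   # early prediction: milder penalty
    return math.exp(d / 10.0) - 1.0        # late prediction: harsher penalty


# Illustrative values: one engine predicted 10 cycles early, one 10 cycles late
y_true = [100.0, 50.0]
y_pred = [90.0, 60.0]
error = rmse(y_true, y_pred)
early = score_penalty(y_pred[0] - y_true[0])
late = score_penalty(y_pred[1] - y_true[1])
```

Both predictions are off by ten cycles and contribute equally to the RMSE, but the late prediction receives a penalty of about 1.72 compared with about 1.16 for the early one, which reflects the higher cost of overestimating the remaining life.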

4.3. Experimental Setup

All experiments were implemented on a PC with a 13th Gen Intel(R) Core(TM) i9-13900HX CPU (2.20 GHz) and 32 GB of RAM, and the programming platform was Python 3.10.

In the causal algorithm based on simple exponential smoothing and the MITE-CM algorithm, the smoothing factor $\alpha$ is set to 0.3. In the encoder layer, a two-layer LSTM with six hidden units is used. The AEF module employs an additive mechanism for the attention scoring function, whereas the AOF layer adopts a concatenation-based mechanism. Table 2 displays the smoothing rate values in the exponential smoothing method during data preprocessing, the hidden units and attention size in the AEF module, along with the attention size in the AOF layer. The decoder layer uses a two-layer LSTM, each layer with six hidden units, and a fully connected layer also containing six hidden units. These parameters were obtained through experimental tuning. During the training process, the batch size was set to 128, and the number of epochs was set to 40.

The grid search validation results for the smoothing factor (s value) and the validation results for the number of hidden units in the AEF module are presented in Table 3 and Table 4. These also serve as the basis for the parameter settings in Table 2.

5. Results and Analysis

5.1. Causality-Driven Feature Selection

Using the causal algorithm based on simple exponential smoothing and the MITE-CM algorithm, the resulting causal graphs are shown in Figure 4. These graphs differ from the results presented in the literature [29].

From Figure 4, we select the key feature variables, as shown in Table 5.

5.2. RUL Prediction Result

Figure 5 compares the predicted and true RUL values for each engine across the four data subsets. The chart displays predicted and actual values for various engines, organized by engine ID. The blue line represents the predicted RUL values of the engines, while the orange line represents the true RUL values. It is evident that the proposed prediction framework achieves highly accurate results. In Figure 5, we not only list the actual RUL values and predicted RUL values for each engine, but also provide the deviation values, which are visualized using bar charts to show the prediction bias for each sub-dataset.

We evaluated the proposed approach against other established RUL prediction methods which are based on attention, LSTM, or Mamba, including GCU-Transformer [40], BIGRU-TSAM [41], DA-Transformer [15], GA-Transformer [42], CNN-BiLSTM-3DAttention [19], DA-LSTM [5], Cau-DA-LSTM [29], Mamba-PINN [25], and ABiTCI [43]. The results of the comparison are presented in Table 6, with optimal results in bold and suboptimal results underlined. The same format is applied to Table 7, Table 8 and Table 9 for consistency. As shown in Table 7, the Cau–BiMamba–LSTM model achieves the lowest mean and variance in RMSE, along with the lowest mean and the second-lowest variance in SCORE. This indicates that our proposed method delivers the best overall performance across the four subsets. Furthermore, the lower variance suggests that the exponential smoothing-based model exhibits strong generalization capability and robustness.

As demonstrated in Table 6 and Figure 6, the Cau–BiMamba–LSTM model performs particularly well on the FD004 and FD002 datasets. On FD004, the most challenging subset of C-MAPSS, the Cau–BiMamba–LSTM model achieves the best RMSE and SCORE among all compared models. This indicates that the model has a strong capability in handling intricate data patterns and long-term dependencies, enhancing its suitability for industrial applications such as fault detection. On FD002, the model's RMSE is marginally higher than that of the best model (13.09 versus 12.46 for ABiTCI [43]) and is the second-best result. The introduction of causal discovery algorithms to extract key features and the incorporation of the BiMamba module enable the model to efficiently capture the complex dependencies in the equipment degradation process, showing good generalization ability across different datasets and outperforming most comparison models overall. In conclusion, the Cau–BiMamba–LSTM model demonstrates excellent performance in RUL prediction tasks. Its efficiency, accuracy, and lightweight design make it promising for practical industrial applications.

Table 8 shows that the proposed Cau–BiMamba–LSTM has the fewest parameters among the compared models (3323) and a moderate computational cost ($2.31 \times 10^{6}$ FLOPs), making it a lightweight RUL prediction model. This also demonstrates that integrating the Mamba model with the LSTM model does not significantly increase computational complexity.

The essential difference between FD002/FD004 and FD001/FD003 lies in the coupling of multiple operating conditions with complex fault modes. In FD002/FD004, the fault degradation trajectories are disturbed by the changing operating conditions, and the fault evolution process is influenced by the coupling of multiple physical fields, resulting in stronger nonlinearity and randomness. Our model suppresses such disturbances through noise-robust mechanisms and explicitly captures the complex dependencies during condition switching through bidirectional state space modeling and a condition-aware mechanism. In the single-operating-condition environment of FD001 and FD003, by contrast, such complex modeling can introduce unnecessary computational overhead and thereby affect performance. The benchmark models have been widely validated in simpler scenarios such as FD001/FD003, but they show limited predictive stability in complex multi-condition environments. Through its structured design, our model balances modeling capability and computational efficiency for complex scenarios. Its core objective is to enhance the reliability of fault prediction under coupled multi-condition operation, which is more common in engineering practice and therefore better aligned with the key needs of industrial health state prediction. Moreover, compared with existing models, it achieves the best average performance across the four subsets (Table 7).

5.3. Ablation Study

The ablation study compares the performance of different model configurations using the RMSE and SCORE metrics on the C-MAPSS dataset (Table 9). The full Cau–BiMamba–LSTM model achieves the lowest RMSE and SCORE values on all four subsets, demonstrating its effectiveness. Table 9 shows that the ability to model long sequences is crucial, that the MITE-CM algorithm for causal feature selection is effective, and that exponential smoothing for noise reduction is necessary. This confirms the necessity of each core module.
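To make the contribution of each module easier to read off Table 9, the relative RMSE increase of each ablated variant over the full model can be computed as follows. This is a small illustrative calculation on the values in Table 9; the function name is hypothetical.

```python
SUBSETS = ["FD001", "FD002", "FD003", "FD004"]

# RMSE of the full model, copied from Table 9.
FULL = dict(zip(SUBSETS, [13.16, 13.09, 13.85, 14.37]))

# RMSE of each ablated variant, copied from Table 9.
ABLATIONS = {
    "No BiMamba in AEF module": dict(zip(SUBSETS, [14.22, 13.18, 14.68, 14.54])),
    "No causal feature selection": dict(zip(SUBSETS, [13.45, 13.45, 14.24, 17.80])),
    "Causal feature selection without exponential smoothing": dict(
        zip(SUBSETS, [14.53, 13.56, 14.60, 18.67])
    ),
}


def rmse_increase_pct(variant, subset):
    """Relative RMSE increase (in percent) of a variant over the full model."""
    base = FULL[subset]
    return 100.0 * (ABLATIONS[variant][subset] - base) / base


for variant in ABLATIONS:
    row = ", ".join(f"{s}: {rmse_increase_pct(variant, s):+.1f}%" for s in SUBSETS)
    print(f"{variant}: {row}")
```

On FD004, for example, removing causal feature selection raises the RMSE from 14.37 to 17.80, an increase of about 23.9%, whereas removing BiMamba raises it by about 1.2%.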

6. Conclusions

This study proposes a lightweight Cau–BiMamba–LSTM model, enhancing RUL prediction accuracy and robustness. By integrating causal algorithms for feature selection, the BiMamba module for efficient sequence modeling, and the attention module for feature extraction, the model achieves notable performance improvements on the C-MAPSS dataset. Experimental results demonstrate that the proposed model reduces the RMSE to 14.37 and the SCORE to 948 on the C-MAPSS FD004 dataset, surpassing existing models (based on attention, LSTM, or Mamba) on the FD004 dataset, while also achieving a near-optimal level on the FD002 dataset. This validates the model’s generalization ability across multiple datasets, particularly its excellent performance under complex data distributions and long sequence dependencies. Furthermore, the model’s lightweight design enables deployment on resource-limited edge devices, meeting the demands of real-time prediction. Existing pure-data-driven RUL prediction models often face challenges in balancing prediction accuracy, computational complexity, and interpretability. The proposed model addresses these challenges by incorporating feature selection interpretability, making it both lightweight and highly accurate. It shows enhanced stability and robustness when dealing with diverse operating conditions and complex time–series data.

The model still has some limitations: While the attention mechanism can highlight the importance of different features in the prediction, it does not provide clear physical or system-level explanations, leading to a lack of interpretability in the model’s decision-making process. Additionally, it does not consider physical constraints and lacks effective integration of physical knowledge, which limits the model’s generalization ability to some extent. Future work will focus on integrating multi-source data, including time–series and causal graph data. We aim to combine Mamba with physics-informed networks to build an interpretable prediction framework. Additionally, we plan to consider integrating data-driven models with physical models to construct a hybrid prediction framework, further enhancing prediction performance and interpretability.

Author Contributions

Conceptualization, methodology, and writing—review and editing, M.L. (Min Li); software, investigation, and writing—original draft preparation, L.Z.; validation, formal analysis, resources, data curation, and visualization, M.L. (Meiling Luo); supervision, project administration, and funding acquisition, T.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (Grant No. 12401668), the Tianjin Science and Technology Plan project (Grant No. 23YDTPJC00470) and the research project of China National Railway Group Co., Ltd. (Grant No. L2022G004).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The C-MAPSS dataset is a U.S. Government Work in the public domain, hosted by NASA’s Prognostics Center of Excellence (PCoE) with open access for non-commercial research. Our manuscript has fully complied with NASA’s terms through the following measures: (1) proper citation of the original technical report (Saxena et al., 2008 [39]); (2) exclusive use of the data for academic research purposes. The C-MAPSS dataset used in this study is publicly available from NASA’s Prognostics Data Repository (https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/) under the U.S. Government Work policy.

Acknowledgments

The authors appreciate the valuable feedback from the reviewers.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Li, Y.; Li, J.; Zhang, X.; Wen, S.; Zhang, Z.; Zhang, G. Nonlinear prediction and analysis of the precision remaining useful life of the key meta-action unit of CNC machine tools with incomplete maintenance. Comput. Ind. Eng. 2023, 183, 109460.
2. El-Dalahmeh, M.; Al-Greer, M.; El-Dalahmeh, M.; Bashir, I. Physics-based model informed smooth particle filter for remaining useful life prediction of lithium-ion battery. Measurement 2023, 214, 112838.
3. Zhao, Y.; Zhang, W.; Yan, R. Remaining Useful Life Prediction of Aero-Engine Based on Data-Driven Approach Using LSTM Network. IEEE Access 2022, 10, 25359–25370.
4. Liang, P.; Li, Y.; Wang, B.; Yuan, X.; Zhang, L. Remaining useful life prediction via a deep adaptive transformer framework enhanced by graph attention network. Int. J. Fatigue 2023, 174, 107722.
5. Shi, J.; Zhong, J. A dual attention LSTM lightweight model based on exponential smoothing for remaining useful life prediction. Reliab. Eng. Syst. Saf. 2024, 243, 109821.
6. Liu, F.; Liu, S.; Chai, Y. Enhanced Mamba model with multi-head attention mechanism and learnable scaling parameters for remaining useful life prediction. Sci. Rep. 2025, 15, 7178.
7. Xiang, Y.; Arora, J.S.; Abdel-Malek, K. Physics-based modeling and simulation of human walking: A review of optimization-based and other approaches. Struct. Multidiscip. Optim. 2020, 42, 1–23.
8. Li, X.; Ding, Q. Remaining useful life estimation in prognostics using deep convolution neural networks. Reliab. Eng. Syst. Saf. 2018, 172, 1–11.
9. Sun, S.; Wang, J.; Xiao, Y. Few-shot RUL prediction for engines based on CNN-GRU model. Sci. Rep. 2024, 14, 16041.
10. Zhao, C.; Huang, X.; Li, Y. A novel cap-LSTM model for remaining useful life prediction. IEEE Sens. J. 2021, 21, 23498–23509.
11. Behera, S.; Misra, R. Generative adversarial networks based remaining useful life estimation for IIoT. Comput. Electr. Eng. 2021, 92, 107195.
12. Zhou, L.; Wang, H. MST-GAT: A multi-perspective spatial-temporal graph attention network for multi-sensor equipment remaining useful life prediction. Inf. Fusion 2024, 110, 102462.
13. Wang, Y.; Xu, Y.; Yang, J. Fully-connected spatial-temporal graph for multivariate time-series data. Proc. AAAI Conf. Artif. Intell. 2024, 38, 15715–15724.
14. Liu, S.; Lv, C.; Song, F. Remaining useful life prediction integrating working conditions and uncertainty quantification based on multilayer graph neural networks. J. Braz. Soc. Mech. Sci. Eng. 2025, 47, 77.
15. Liu, L.; Song, X.; Zhou, Z. Aircraft engine remaining useful life estimation via a double attention-based data-driven architecture. Reliab. Eng. Syst. Saf. 2022, 221, 108330.
16. Li, X.; Jiang, H.; Liu, Y.; Wang, T.; Li, Z. An integrated deep multiscale feature fusion network for aeroengine remaining useful life prediction with multisensory data. Knowl.-Based Syst. 2022, 235, 107652.
17. Kim, G.; Choi, J.G.; Lim, S. Using transformer and a reweighting technique to develop a remaining useful life estimation method for turbofan engines. Eng. Appl. Artif. Intell. 2024, 133, 108475.
18. Zhang, Z.; Chen, P.; Xing, C. A data augmentation boosted dual informer framework for the performance degradation prediction of aero-engines. IEEE Sens. J. 2023, 23, 12018–12030.
19. Keshun, Y.; Guangqi, Q.; Yingkui, G. A 3-D attention-enhanced hybrid neural network for turbofan engine remaining life prediction using CNN and BiLSTM models. IEEE Sens. J. 2023, 24, 21893–21905.
20. Lv, X.; Sun, Y.; Zhang, K.; Qu, S.; Zhu, X.; Fan, Y.; Wu, Y.; Hua, E.; Long, X.; Ding, N.; et al. Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models. arXiv 2025, arXiv:2503.11224.
21. Hu, Y.; Liu, S.; Lu, H. Remaining useful life model and assessment of mechanical products: A brief review and a note on the state space model method. Chin. J. Mech. Eng. 2019, 32, 15.
22. Farhat, M.T.; Moghaddass, R. State-space modeling for degrading systems with stochastic neural networks and dynamic Bayesian layers. IISE Trans. 2024, 56, 497–514.
23. Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752.
24. Liang, Y.; Zhao, S. Early Prediction of Remaining Useful Life for Lithium-Ion Batteries with the State Space Model. Energies 2024, 17, 6326.
25. Zhu, Q.; Shi, Y.; Feng, Y. Physics-Informed Neural Networks for RUL Prediction. In Proceedings of the 2024 China Automation Congress (CAC), Qingdao, China, 1–3 November 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 6361–6366.
26. Dong, Y.; Yang, T.; Xing, Y.; Du, J.; Meng, Q. Data-driven modeling methods and techniques for pharmaceutical processes. Processes 2023, 11, 2096.
27. Yu, K.; Guo, X.; Liu, L. Causality-based feature selection: Methods and evaluations. ACM Comput. Surv. (CSUR) 2020, 53, 1–36.
28. Duan, Z.; Xu, H.; Huang, Y.; Feng, J.; Wang, Y. Multivariate time series forecasting with transfer entropy graph. Tsinghua Sci. Technol. 2022, 28, 141–149.
29. Li, M.; Luo, M.; Ke, T. Interpretable Remaining Useful Life Prediction Based on Causal Feature Selection and Deep Learning. In Proceedings of the International Conference on Intelligent Computing, Tianjin, China, 5–8 August 2024; Springer Nature: Singapore, 2024; pp. 148–160.
30. Zhang, X.; Hu, W.; Yang, F.; Cao, W.; Wu, M. A new transfer entropy approach based on information granulation and clustering for root cause analysis. Control Eng. Pract. 2023, 140, 105669.
31. Zhang, R.; Cao, Z.; Wu, K. ICS causality modeling based on maximum information transfer entropy. Appl. Res. Comput. 2021, 38, 800–804.
32. Liu, Y.; Tian, Y.; Zhao, Y. Vmamba: Visual state space model. Adv. Neural Inf. Process. Syst. 2024, 37, 103031–103063.
33. Pechlivanidou, G.; Karampetakis, N. Zero-order hold discretization of general state space systems with input delay. IMA J. Math. Control Inf. 2022, 39, 708–730.
34. Qu, H.; Ning, L.; An, R. A survey of mamba. arXiv 2024, arXiv:2408.01129.
35. Vaswani, A.; Shazeer, N.; Parmar, N. Attention is all you need. arXiv 2017, arXiv:1706.03762.
36. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
37. Tahmid, A.; Azmain, M.A.; Chowdhury, M.S.; Hossam-E-Haider, M. Exponential smoothing technique in filtration of distorted radar signal. In Proceedings of the 2022 International Conference for Advancement in Technology (ICONAT), Goa, India, 21–22 January 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–5.
38. Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 2000, 85, 461–464.
39. Saxena, A.; Goebel, K.; Simon, D.; Eklund, N. Damage propagation modeling for aircraft engine run-to-failure simulation. In Proceedings of the 2008 International Conference on Prognostics and Health Management, Denver, CO, USA, 6–9 October 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 1–9.
40. Mo, Y.; Wu, Q.; Li, X.; Huang, B. Remaining useful life estimation via transformer encoder enhanced by a gated convolutional unit. J. Intell. Manuf. 2021, 632, 1997–2006.
41. Zhang, J.; Jiang, Y.; Wu, S.; Li, X.; Luo, H.; Yin, S. Prediction of remaining useful life based on bidirectional gated recurrent unit with temporal self-attention mechanism. Reliab. Eng. Syst. Saf. 2022, 221, 108297.
42. Mo, H.; Iacca, G. Evolutionary neural architecture search on transformers for RUL prediction. Mater. Manuf. Processes 2023, 38, 1881–1898.
43. Wang, J.; Lu, Z.; Zhou, J.; Schröder, K.U.; Liang, X. A novel remaining useful life prediction method under multiple operating conditions based on attention mechanism and deep learning. Adv. Eng. Inform. 2025, 64, 103083.

Figure 1. Graphical representation of Mamba Block [23].

Figure 2. BiMamba module in [32].

Figure 3. Structure of the Cau–BiMamba–LSTM framework.

Figure 4. Causal network graphs on C-MAPSS dataset. The causal relationship diagrams for FD001, FD002, FD003, and FD004 are represented by (a–d), respectively.

Figure 5. Prediction results of Cau–BiMamba–LSTM; (a–d) represent the comparison results of the true values and predicted values for FD001, FD002, FD003, and FD004, respectively. The blue bars represent the prediction bias for each engine in each sub-dataset.

Figure 6. Visualization of model performance comparison. (a) Comparison of RMSE for the ten models; (b) comparison of SCORE.

Table 1. Overview of the C-MAPSS dataset.

Sub-Dataset	FD001	FD002	FD003	FD004
Training engines	100	260	100	249
Testing engines	100	259	100	248
Operating conditions	1	6	1	6
Failure mode	1	1	2	2

Table 2. Hyperparameter settings in datasets.

	FD001	FD002	FD003	FD004
Hidden units number in AEF module	8	8	10	6
Attention size in AEF module	1	1	3	3
Attention size in AOF layer	16	16	6	6
Smooth rate (s value)	25	30	30	30

Table 3. Sensitivity of hidden units number in AEF module.

Hidden Units Number in AEF Module	RMSE				SCORE
	FD001	FD002	FD003	FD004	FD001	FD002	FD003	FD004
2	15.30	18.76	22.59	22.82	405	1845	3356	6030
4	14.55	13.79	15.94	15.38	383	862	747	1171
6	14.15	13.50	14.64	14.37	391	900	546	948
8	13.16	13.09	13.73	15.26	264	793	451	1180
10	13.04	13.44	13.85	14.55	272	858	426	1022

Table 4. Sensitivity of smooth rate (s value).

Smooth Rate (s value)	RMSE				SCORE
	FD001	FD002	FD003	FD004	FD001	FD002	FD003	FD004
20	14.15	14.44	14.72	14.78	330	1020	580	958
25	13.16	13.56	14.35	14.59	264	821	519	990
30	13.49	13.09	13.85	14.37	290	793	426	948
35	13.98	13.53	14.48	14.58	319	889	465	1035

Table 5. Variables utilized in the proposed Cau–BiMamba–LSTM model.

	FD001 and FD003	FD002 and FD004
Variable name	ID	ID
Sensor signal	2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14, 15, 17, 20, 21	2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14, 15, 17, 20
Operational setting	2	1, 2

Table 6. Model performance comparison on the C-MAPSS dataset.

Methods	RMSE				SCORE
	FD001	FD002	FD003	FD004	FD001	FD002	FD003	FD004
GCU-Transformer [40], 2021	11.27	22.81	11.42	24.86	N/A	N/A	N/A	N/A
BIGRU-TSAM [41], 2022	12.56	18.94	12.45	20.47	213	2264	233	3610
DA-Transformer [15], 2022	12.25	17.08	13.39	19.86	198	1575	290	1741
GA-Transformer [42], 2023	11.63	15.99	11.35	20.15	215	1133	228	2672
CNN-BiLSTM-3DAttention [19], 2023	13.12	13.93	12.15	20.24	231	760	196	1710
DA-LSTM [5], 2024	12.62	13.22	13.34	16.25	263	842	360	1372
Cau-DA-LSTM [29], 2024	13.87	13.45	14.31	14.93	321	853	369	1129
Mamba-PINN [25], 2024	N/A	N/A	N/A	18.18	N/A	N/A	N/A	N/A
ABiTCI [43], 2025	N/A	12.46	N/A	15.57	N/A	736	N/A	1003
Proposed Cau–BiMamba–LSTM	13.16	13.09	13.85	14.37	264	793	426	948

Table 7. Mean and variance of different models.

Methods	RMSE		SCORE
	$μ$ (RMSE)	$σ$ (RMSE)	$μ$ (SCORE)	$σ$ (SCORE)
GCU-Transformer [40], 2021	17.59	6.96	N/A	N/A
BIGRU-TSAM [41], 2022	16.11	4.81	1530.00	1522.13
DA-Transformer [15], 2022	15.65	3.30	951.00	764.03
GA-Transformer [42], 2023	14.78	3.96	1062.00	1103.01
CNN-BiLSTM-3DAttention [19], 2023	14.86	3.78	724.25	738.03
DA-LSTM [5], 2024	13.86	1.51	709.25	476.27
Cau-DA-LSTM [29], 2024	14.14	0.67	667.50	370.42
Mamba-PINN [25], 2024	18.18 *	N/A *	N/A	N/A
ABiTCI [43], 2025	14.02 **	1.45 **	869.50 **	133.50 **
Proposed Cau–BiMamba–LSTM	13.62	0.47	607.75	288.50

Asterisks mark values whose calculation requires special explanation. A single asterisk (*) indicates that the model has only one column of RMSE or SCORE data. Double asterisks (**) indicate that the model has only two columns of RMSE or SCORE data.

Table 8. Comparison of computational cost.

Model	Proposed Cau–BiMamba–LSTM	DA-LSTM [5]	Cau-DA-LSTM [29]	CNN-BiLSTM-3DAttention [19]	DA-Transformer [15]	BIGRU-TSAM [41]	GCU-Transformer [40]
Number of parameters	3323	3550	3898	151,900	116,591	2,825,443	399,700
FLOPs	$2.31 \times 10^{6}$	$1.27 \times 10^{5}$	$6.39 \times 10^{6}$	$1.70 \times 10^{5}$	$7.44 \times 10^{6}$	$1.68 \times 10^{8}$	$3.93 \times 10^{5}$

Table 9. Ablation study.

Methods	RMSE				SCORE
	FD001	FD002	FD003	FD004	FD001	FD002	FD003	FD004
No BiMamba in AEF module	14.22	13.18	14.68	14.54	359	810	547	987
No causal feature selection	13.45	13.45	14.24	17.80	333	972	621	2158
Causal feature selection without exponential smoothing	14.53	13.56	14.60	18.67	416	958	575	2611
Proposed Cau–BiMamba–LSTM	13.16	13.09	13.85	14.37	264	793	426	948

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
