1. Introduction
Prognostic and Health Management (PHM) of complex systems such as aircraft engines leverages expert domain knowledge and advanced sensor analysis to provide real-time monitoring of the health status of vital system components. The precise estimation of Remaining Useful Life (RUL) is an essential health-state indicator in modern PHM systems. By providing vital information about equipment degradation, RUL estimation helps prevent catastrophic failures, unplanned shutdowns, and financial losses [1].
RUL estimation techniques are classified into two main classes: model-based and data-driven methods [2]. Model-based methods require prior domain knowledge of complex systems from experts to build a physical model, making their application difficult. On the other hand, recent advancements in sensor and communication technology have improved data collection, enhancing the effectiveness of data-driven methods and giving them a clear advantage.
Data-driven methods can be further classified into conventional Machine Learning (ML) techniques [3] and Deep Learning (DL) approaches [4]. The initial stages of data-driven methods are data collection and pre-processing. Conventional ML methods typically rely on handcrafted statistical features to represent the degradation process, whereas many advanced DL frameworks are designed to learn complex hierarchical features directly from raw sensor data, often reducing the need for extensive manual feature engineering. However, the specific framework, the characteristics of the dataset and the designed architecture of the network influence the effectiveness of the automatic feature extraction [5].
Several studies have employed conventional machine learning methods to represent the degradation process and estimate the RUL. In one study, Empirical Mode Decomposition (EMD) is used to analyze non-linear and non-stationary data and extract robust features, which are then fed into a Random Forest (RF) model optimized through Bayesian optimization for superior RUL prediction [6]. To achieve adaptability and real-time capability of RUL prediction, another research effort uses the Unscented Kalman Filter (UKF) to recursively update the degradation parameters within a logistic regression model [7]. While this method offers advantages in online adaptability, it can be sensitive to the initial state estimation and the underlying assumptions about the system's dynamics. A hybrid similarity-based clustering method enhances the performance of ML regression models by training them on groups clustered according to a degradation index [8]. Additionally, a Support Vector Regression model is presented in [9], while the XGBoost algorithm, an ensemble method that uses decision trees as base learners, achieves the best performance among conventional machine learning algorithms [10]. An extensive comparison of kernel Adaptive Filtering methods is presented in [11], showing effectiveness and reliability in cases where computational cost is crucial.
Deep learning models are well suited to accurately predicting remaining useful life (RUL), with the compelling advantage of learning complex hierarchical features directly from raw sensor data. In one study [12], a combination of four deep neural networks with an attention mechanism achieved impressive accuracy in RUL prognosis. Several research papers combine recurrent neural networks with convolutional layers or attention mechanisms to integrate spatial and temporal information. Long Short-Term Memory (LSTM) and bidirectional gated recurrent unit layers are utilized alongside a temporal self-attention mechanism in [13,14]. In another approach, the features extracted from ensemble empirical mode decomposition and wavelet packet transform are fed into genetically optimized RNN and LSTM blocks [15]. Convolutional operations effectively capture spatial knowledge through the extraction of multi-scale features and produce RUL predictions in [16,17]. In another study [18], a two-phase DL model combines a reformed convolutional LSTM layer with an attention mechanism to effectively estimate RUL. Also, with the use of variational inference, the degradation representation is encoded in a latent space, improving regularization and overall performance [19]. In another approach, a Graph Spatial-Temporal Neural Network effectively captures the correlations between sensor responses and health states [20], and an Embedded Attention-based Parallel network enhances the representation capability of the extracted features [21]. An innovative neural approach uses ordinary differential equations (ODEs) as a residual network and shows the validity of ODEs in modeling the degradation process [22].
As discussed in the architectures above, most neural frameworks for RUL estimation use explicit layers to directly produce output from input based on their mathematical formulations. In the current research paper, we employ an implicit technique, where the input–output mapping is not defined by a fixed computational architecture but is dynamically estimated as the fixed point of a dynamic equation that describes the system.
Implicit layers typically take one of the following forms:
Neural Ordinary Differential Equations (Neural ODEs), where the model learns a function that represents the evolving dynamics of the hidden state over time. This continuous transformation is solved using numerical integration methods. Because these models share parameters across time, they are memory-efficient [23].
Optimization-based implicit layers, which define their output as the solution of a mathematical optimization problem embedded within the neural architecture. Gradients are estimated using implicit differentiation. The model not only learns from data but also guarantees the satisfaction of certain mathematical properties or constraints [24].
Deep Equilibrium Models (DEMs), where the hidden representation is computed as the equilibrium of a fixed-point equation. This fixed point is computed through iterative root-finding methods, allowing the model to represent arbitrarily deep computations without explicitly stacking layers, while using constant memory during training [25].
While these frameworks provide valuable approaches for implicit modeling, they often lack internal expressive mechanisms for modeling multivariate spatio-temporal dependencies in the time-series sensor data of complex systems.
In this research study, we use implicit layers as equilibrium blocks. A fixed-point equation, derived from the underlying architecture of the implicit layer, describes the nonlinear dynamics of the system. This state equation is solved using iterative numerical solvers, producing a converged representation that corresponds to the equilibrium point of the fixed-point equation. So, instead of stacking a predetermined number of layers, we estimate the convergence point of an implicit layer, considering that the equilibrium point more accurately reflects the system dynamics.
Explicit deep networks use a fixed number of operating layers. In contrast, implicit models operate with an adaptive number of iterations until convergence. This flexibility allows them to converge effectively to an equilibrium solution, making them a more robust option for various applications. However, applying classic back-propagation becomes challenging because memory requirements increase significantly due to the unrolling of multiple iterations. Instead, we use implicit differentiation, based on the Implicit Function Theorem (IFT), to compute gradients without unrolling the intermediate iterations [24].
The Deep Equilibrium (DE) block integrates convolutional operations and a novel attention-based Dual-Input Interconnection mechanism, created specifically for implicit deep models on multivariate time-series data. The convolutional component extracts local spatial and temporal patterns from raw sensor inputs, producing an input feature map representing short-term dependencies. The input feature mapping and a latent representation vector, which encodes the internal health state of the system, are dynamically processed by the Dual-Input Interconnection mechanism. This allows the model to perform a cross-attention-like operation, where the input mapping is projected as keys and values and the health state as queries in a shared embedding space. So, the latent state is adaptively updated based on the most relevant observed patterns of the input. The equilibrium state is used as a health indicator for system monitoring since it captures long-term degradation patterns and local sensor behavior. So, the DE block allows for a highly expressive and memory-efficient representation that captures the dynamics of the underlying system by iteratively updating the latent state until convergence. The architecture differs from conventional attention-based architectures in DE models, where self-attention is applied only to the latent representation vector and the input vector mapping is used as a residual connection.
A fundamental challenge of DE models is maintaining stability during the training process, since slight deviations in the input can result in significant deviations in the computed fixed point. Researchers are actively addressing these stability issues by implementing regularization techniques. In the current research effort, we address instability by incorporating Group Normalization across the layers of the DE block. This technique divides the channels of a feature map into groups and normalizes the values within each group independently. By doing so, we control the magnitude of the activations and help prevent instability.
As a final key component of the framework, we employ a feed-forward neural regression model augmented with the Monte Carlo (MC) Dropout technique [26,27]. Unlike traditional models that disable dropout during inference, this method keeps dropout active throughout, resulting in varied outputs for each forward pass. The mean of the multiple stochastic responses is the estimated RUL, and their variance measures the uncertainty of each prediction. In this way, we increase the neural network's generalization capability and improve performance. After the training process, the uncertainty of each prediction is calibrated on a validation set, enhancing the overall reliability of the model. The main component of the calibration process is a Gaussian Mixture Model, which groups the validation samples. Subsequently, each group is calibrated by estimating a scaling factor through an optimization problem. Therefore, by incorporating uncertainty into the prediction process, the model's reliability is improved and an insightful confidence interval is provided. In this way, the proposed framework becomes more robust, reliable, and informative.
The CMAPSS (Commercial Modular Aero-Propulsion System Simulation) dataset [28] is a commonly used benchmark in PHM, particularly for RUL estimation, and is used to evaluate the presented framework. This dataset is generated from a detailed simulation model of turbofan engines, which captures their complex dynamics and degradation behavior. CMAPSS consists of four sub-datasets, each with a different level of complexity resulting from varying operating conditions and fault modes. The experimental results demonstrate competitive performance comparable to recent state-of-the-art frameworks. Moreover, the proposed DEM can be applied to any complex system, since it has a general-purpose design, provided that multivariate sequential sensor data is available.
The contributions of the research paper are as follows:
We propose a Deep Equilibrium Model (DEM) for RUL estimation that effectively captures both spatial and temporal patterns in multivariate sensor data through implicit modeling. The architecture consistently achieves competitive performance across diverse operating conditions on the CMAPSS dataset.
The core element of the Deep Equilibrium Model is a novel Dual-Input Interconnection Attention Block, which enables iterative and adaptive updates of the latent degradation representation by jointly processing the internal health state and the spatio-temporal features extracted from convolutional blocks. Unlike standard Transformer self-attention mechanisms used in DE frameworks, which typically operate only on the latent representation and incorporate input features as a static residual, the proposed attention-based block performs a cross-attention-like interaction between two distinct inputs. This design enhances the model’s ability to capture complex degradation dynamics, leading to a more expressive and context-aware health representation.
The Calibrated Monte Carlo Dropout technique improves the reliability of the framework, providing a confidence interval for each estimation. An innovative calibration method based on a Gaussian Mixture Clustering Model is presented.
The remainder of the paper is structured as follows.
Section 2 describes the general principles of DE models, focusing on the estimation of the fixed point during the forward pass and the application of implicit differentiation in the backward pass.
Section 3 describes in detail the proposed architecture, focusing on the innovative design of the Dual-Input Interconnection layer.
Section 4 presents the experimental analysis and the simulation results, and the final section summarizes the conclusions and outlines future work.
2. Deep Equilibrium Models
DE models are implicit layers built from modern DL architectures. Their goal is to reach an equilibrium point that captures the system's non-linear dynamics, using root-finding techniques rather than explicit layer stacking. The characteristic fixed-point equation of a DE model is

$$ z = f_\theta(z, x), \quad (1) $$

where $x$ is the input of the equilibrium layer, $z$ denotes the latent representation state, and $f_\theta$ is the implicit function that defines the balance between internal representation states and external influences. Also, we denote by $z^*$ the equilibrium representation vector of Equation (1), which is derived using a fixed-point solver.
The first part of Figure 1 illustrates the architecture of the DE layer. The second part shows its unfolding process until it reaches convergence. We observe its similarity with RNNs, since both update an internal representation, but with a significant difference: recurrent networks process the input sequentially, one element at a time, each corresponding to a specific time step. In contrast, the DE layer processes all the input information at each update step and updates its internal representation until convergence. So, the DE layer analyzes input information from consecutive time steps in parallel, capturing temporal coherence. This approach, along with internal mechanisms that capture spatial knowledge, enhances the model's ability to understand the complex non-linear dynamics of modern systems.
Alternatively, DE layers are strongly similar to Residual Networks (ResNets), as we can observe through their unfolding. While ResNets achieve depth by stacking multiple layers, DE layers effectively achieve depth through the iterative update of an internal representation until it reaches a fixed point. Moreover, DE layers share parameters across the updating process, making them more robust to over-fitting. Also, the training of DE layers is more computationally efficient than ResNets’ since it is based on implicit differentiation.
One crucial aspect of DE training is the existence and uniqueness of fixed points. The stability of the process is an active research topic. One key requirement for ensuring convergence is 1-Lipschitz continuity during training, which guarantees that the function does not increase distances between inputs, leading to controlled updates in the iterative fixed-point solver [29]. A function $f$ is 1-Lipschitz when

$$ \| f(x_1) - f(x_2) \| \le \| x_1 - x_2 \| \quad \forall\, x_1, x_2. \quad (2) $$

Spectral normalization constrains the spectral norm of the Jacobian matrix so that its largest singular value is less than 1. This makes the mapping 1-Lipschitz and guarantees fixed-point convergence [30]. By applying 1-Lipschitz activation functions such as ReLU, SoftThreshold, and GroupSort, we preserve bounded transformations and reduce instability [31]. Also, the use of monotone operators and energy-based stability techniques such as Lyapunov theory [32] provides theoretical guarantees for the convergence of DE layers.
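For illustration, a minimal Python sketch of how such a contraction constraint can be imposed in practice is given below; the module name, dimensions, and layer choices are illustrative assumptions and do not correspond to the exact implementation of the proposed model.

```python
import torch
import torch.nn as nn

# Minimal sketch (not the paper's architecture): constraining a candidate
# implicit function f(z, x) to be approximately 1-Lipschitz in z by applying
# spectral normalization to its weight and using a 1-Lipschitz activation.
class ContractiveCell(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # spectral_norm rescales the weight so its largest singular value is ~1
        self.lin_z = nn.utils.spectral_norm(nn.Linear(dim, dim, bias=False))
        self.lin_x = nn.Linear(dim, dim)
        self.act = nn.ReLU()  # ReLU is 1-Lipschitz

    def forward(self, z, x):
        return self.act(self.lin_z(z) + self.lin_x(x))
```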
2.1. Forward Pass of DE Layer
Unlike explicit layers that transform the input using their nonlinear mapping function, DE layers output the fixed point $z^*$, determined by solving Equation (1). This root-finding problem can be solved efficiently using numerical methods such as simple fixed-point iteration until convergence, Newton's and Broyden's methods, or Anderson Acceleration. Fixed-point iteration applies Equation (1) repeatedly until convergence. The approach is simple and intuitive but may be slow, especially if the function $f$ has a contraction (Lipschitz) constant near 1.
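A minimal sketch of plain fixed-point iteration is shown below, assuming that $f$ is implemented as a callable f(z, x) returning a tensor of the same shape as z; the function name, tolerance, and iteration limit are illustrative.

```python
import torch

def fixed_point_iteration(f, x, z0, tol=1e-4, max_iter=100):
    """Naive fixed-point solver: repeatedly apply z <- f(z, x) until the
    relative residual ||f(z, x) - z|| / ||z|| falls below `tol`.
    Convergence is only guaranteed when f is a contraction in z."""
    z = z0
    for k in range(max_iter):
        z_next = f(z, x)
        res = (z_next - z).norm() / (z.norm() + 1e-8)
        z = z_next
        if res < tol:
            break
    return z, k, res
```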
Newton's Method updates $z$ as

$$ z_{k+1} = z_k - \big(J_f(z_k) - I\big)^{-1}\big(f(z_k, x) - z_k\big), \quad (3) $$

where $J_f = \partial f / \partial z$ is the Jacobian of $f$. The estimation of the inverse Jacobian required in Equation (3) is often expensive.
To decrease the computational cost, Broyden's method approximates the inverse Jacobian $\big(J_f - I\big)^{-1}$ by a matrix $B$ that is refined through an extra iterative process of low-rank updates:

$$ B_{k+1} = B_k + \frac{\big(\Delta z_k - B_k \Delta g_k\big)\, \Delta z_k^{\top} B_k}{\Delta z_k^{\top} B_k \Delta g_k}, \quad (4) $$

where $\Delta z_k = z_{k+1} - z_k$ and $\Delta g_k = g(z_{k+1}) - g(z_k)$ with $g(z) = f(z, x) - z$. This method avoids direct Jacobian inversion while still achieving fast convergence.
Anderson Acceleration, to reduce oscillations and thus enhance the convergence speed, uses a linear combination of past iteration estimates:

$$ z_{k+1} = \sum_{i=0}^{m-1} \alpha_i\, f(z_{k-i}, x). \quad (5) $$

The coefficients $\alpha_i$ of Equation (5) are estimated by solving a least-squares problem whose objective is to minimize the residual $f(z, x) - z$. The mathematical expression of the optimization problem is:

$$ \min_{\alpha} \Big\| \sum_{i=0}^{m-1} \alpha_i \big( f(z_{k-i}, x) - z_{k-i} \big) \Big\|_2 \quad \text{s.t.} \quad \sum_{i=0}^{m-1} \alpha_i = 1. \quad (6) $$

The Anderson method significantly accelerates convergence, with the disadvantage of a higher memory cost from storing past iterates. The main advantage of Anderson acceleration over Broyden's method is its typically faster and more robust convergence, especially in high-dimensional and non-linear fixed-point problems.
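The following Python sketch illustrates Anderson acceleration for a single sample, following Equations (5)-(6); the function signature, the history depth m, and the small ridge term lam are implementation conveniences rather than choices taken from the paper.

```python
import torch

def anderson_acceleration(f, z0, m=5, max_iter=50, tol=1e-4, lam=1e-4):
    """Sketch of Anderson acceleration for the fixed point z = f(z): the next
    iterate is a constrained least-squares combination of the last m function
    evaluations. `lam` stabilizes the small linear system numerically."""
    z_hist = [z0.reshape(-1)]
    f_hist = [f(z0).reshape(-1)]
    z = f_hist[-1]
    for _ in range(1, max_iter):
        fz = f(z.view_as(z0)).reshape(-1)
        z_hist.append(z)
        f_hist.append(fz)
        z_hist, f_hist = z_hist[-m:], f_hist[-m:]
        G = torch.stack(f_hist) - torch.stack(z_hist)   # residuals g_i = f(z_i) - z_i
        n = G.shape[0]
        # Minimize ||sum_i alpha_i g_i|| subject to sum_i alpha_i = 1.
        A = G @ G.T + lam * torch.eye(n, dtype=G.dtype, device=G.device)
        alpha = torch.linalg.solve(A, torch.ones(n, dtype=G.dtype, device=G.device))
        alpha = alpha / alpha.sum()
        z_new = (alpha[:, None] * torch.stack(f_hist)).sum(dim=0)
        res = (fz - z).norm() / (z.norm() + 1e-8)
        z = z_new
        if res < tol:
            break
    return z.view_as(z0)
```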
2.2. Backward Pass of DE Layer—Implicit Differentiation
The forward pass of DE models requires iterative solvers, making conventional backpropagation infeasible due to the increased storage demand of the intermediate gradients generated during the process. To overcome this problem, gradients are computed using the Implicit Function Theorem (IFT), a technique that avoids unrolling the iterative process [25].
The goal is to compute the gradient of the loss with respect to the parameters,

$$ \frac{\partial \ell}{\partial w} = \frac{\partial \ell}{\partial z^*}\, \frac{\partial z^*}{\partial w}, \quad (7) $$

where $w$ denotes the parameters of the DE layer and $\ell$ is the training loss. The IFT states that the gradient of the fixed point $z^*$ with respect to the parameters can be derived without storing the solver's intermediate iterations.

We consider the fixed-point equation $z^* = f(z^*, x; w)$, and by taking the total derivative with respect to $w$ and applying the chain rule, we have

$$ \frac{\partial z^*}{\partial w} = \frac{\partial f}{\partial z^*}\, \frac{\partial z^*}{\partial w} + \frac{\partial f}{\partial w}. \quad (8) $$

By solving Equation (8) for $\partial z^* / \partial w$, we get

$$ \frac{\partial z^*}{\partial w} = \Big( I - \frac{\partial f}{\partial z^*} \Big)^{-1} \frac{\partial f}{\partial w}, \quad (9) $$

and substituting into Equation (7),

$$ \frac{\partial \ell}{\partial w} = \frac{\partial \ell}{\partial z^*} \Big( I - \frac{\partial f}{\partial z^*} \Big)^{-1} \frac{\partial f}{\partial w}. \quad (10) $$

Setting $u^{\top} = \frac{\partial \ell}{\partial z^*} \big( I - \frac{\partial f}{\partial z^*} \big)^{-1}$, Equation (10) becomes

$$ \frac{\partial \ell}{\partial w} = u^{\top}\, \frac{\partial f}{\partial w}. \quad (11) $$

Instead of inverting the matrix $\big( I - \partial f / \partial z^* \big)$ directly, which is expensive for large-scale problems, we solve the following equation for $u$ using an iterative solver:

$$ u^{\top} \Big( I - \frac{\partial f}{\partial z^*} \Big) = \frac{\partial \ell}{\partial z^*}. \quad (12) $$

The term $\partial f / \partial z^*$ denotes the Jacobian of $f$ with respect to $z$ evaluated at the fixed point $z^*$. Also, $u^{\top}\, \partial f / \partial z^*$ is a vector-Jacobian product, which can be efficiently computed with modern auto-grad packages without explicitly forming the full Jacobian. Finally, to estimate $u$ we follow the iterative update rule:

$$ u_{k+1}^{\top} = \frac{\partial \ell}{\partial z^*} + u_k^{\top}\, \frac{\partial f}{\partial z^*}. \quad (13) $$
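The sketch below illustrates how this forward/backward scheme can be wired together in PyTorch; the names f, solver, and the iteration counts are illustrative stand-ins (solver is any fixed-point solver, such as the sketches above, that returns the converged tensor), not the exact implementation used in the paper.

```python
import torch

def deq_fixed_point(f, x, z_init, solver, backward_iters=50):
    """Sketch of a DEQ forward/backward pass via the Implicit Function Theorem:
    the forward solver is not unrolled, and gradients follow Eqs. (11)-(13)."""
    # Forward: find z* = f(z*, x) without tracking the solver's iterations.
    with torch.no_grad():
        z_star = solver(lambda z: f(z, x), z_init)

    # One tracked evaluation re-attaches z* to the graph (provides df/dw).
    z_star = f(z_star.requires_grad_(), x)

    # A second, detached evaluation is used only for vector-Jacobian products.
    z0 = z_star.clone().detach().requires_grad_()
    f0 = f(z0, x)

    def backward_hook(grad):          # grad = dl/dz*
        u = torch.zeros_like(grad)
        for _ in range(backward_iters):
            # Equation (13): u <- dl/dz* + u (df/dz*), a vector-Jacobian product
            u = grad + torch.autograd.grad(f0, z0, grad_outputs=u,
                                           retain_graph=True)[0]
        return u

    z_star.register_hook(backward_hook)
    return z_star
```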
3. Architecture of DE Layer
The inner architecture of the DE layer is crucial, since it determines its expressiveness and effectiveness and ensures that the model can learn complex representations of the data. Multi-sensor time-series data, commonly used in RUL estimation, have both spatial and temporal dependencies: sensor responses are sequential, variations in the reading of one sensor affect the others, and sensor readings change over time, reflecting the system's future behavior. These challenges can be addressed using spatio-temporal DL layers that integrate correlations among sensors and capture long-range dependencies.
Figure 2 provides a visual diagram of the architecture of the proposed framework for RUL estimation. The equilibrium model (DE block) consists of two key components: a convolutional block (Figure 3) and a Dual-Input Interconnection mechanism (Figure 4). Additionally, a Monte Carlo Dropout Feedforward Neural Network is incorporated as the final block, improving both the performance and reliability of the framework.
3.1. Deep Equilibrium Block
As the initial component of the DE block, a convolutional block is employed to capture local temporal patterns and extract hierarchical features from the input sequence. Convolutional Neural Networks (CNNs) consist of multiple convolutional blocks that automatically learn adaptive, hierarchical representations of the input. CNNs are well known for their success in processing visual data using 2D convolutions; however, they are also effective for one-dimensional data such as time series. A 1D convolutional layer applies filters along a single dimension, making it particularly suitable for extracting features from sequential data.
The mathematical formulation of a 1D convolutional layer for an input signal $x$ is defined as:

$$ y[n] = \sum_{m=0}^{M-1} w[m]\, x[n - m], \quad (14) $$

where $w$ denotes the convolutional kernel of length $M$ and $y$ is the produced output. Furthermore, this operation can be extended to multiple input and output channels and can incorporate padding and stride parameters to control both the output size and the computational cost [33].
We consider each sensor time series as an input channel and apply shared kernel filters across time steps. In Figure 3, we can see that three convolutional layers are used, each followed by a ReLU activation function. The first convolutional layer increases the number of channels by applying a kernel filter of size 1, while the other two extract low-level and high-level temporal features. Also, a residual connection between the last two convolutional layers enhances the learning of diverse representations across time. So, the main goal of the convolutional block is to capture local patterns in the input and robustly represent short-term dependencies.
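A minimal sketch of such a convolutional block is given below; the channel counts, kernel sizes, and class name are illustrative assumptions rather than the exact values used in the proposed architecture.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Sketch of the convolutional block described above: a kernel-size-1 layer
    that expands the channels, followed by two temporal convolutions with a
    residual connection between their outputs."""
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        self.expand = nn.Conv1d(in_channels, hidden_channels, kernel_size=1)
        self.conv1 = nn.Conv1d(hidden_channels, hidden_channels,
                               kernel_size, padding=kernel_size // 2)
        self.conv2 = nn.Conv1d(hidden_channels, hidden_channels,
                               kernel_size, padding=kernel_size // 2)
        self.act = nn.ReLU()

    def forward(self, x):             # x: (batch, sensors, time)
        h = self.act(self.expand(x))  # channel expansion, kernel size 1
        h1 = self.act(self.conv1(h))  # low-level temporal features
        h2 = self.act(self.conv2(h1)) # high-level temporal features
        return h1 + h2                # residual connection between the last two layers
```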
The DE block incorporates a Dual-Input Interconnection mechanism, allowing dynamic feature interaction between two different input mappings. The first input is the latent representation $z$ that evolves over time, while the second input is the extracted feature mapping $\bar{x}$, which is produced by the preceding convolutional block and captures short-term spatio-temporal patterns from the sensor data. We linearly project $z$ and $\bar{x}$ using the learnable weight matrices $W_Q$, $W_K$ and $W_V$ to obtain the Query, Key and Value as

$$ Q = z\, W_Q, \qquad K = \bar{x}\, W_K, \qquad V = \bar{x}\, W_V. \quad (15) $$

The transformation of the latent vector $z$ and the convolutional feature mapping $\bar{x}$ to Query $Q$, Key $K$ and Value $V$ allows their interaction, since it is a projection into a shared embedding space. The Query represents the information that the model is looking for, based on its internal understanding, which is the latent vector $z$. The Key provides an encoding of the input mapping $\bar{x}$, while the Value contains the information of the input that will be operated on, based on the interaction between the Query and the Key.
In the sequel, we estimate the interaction weight $A$ as

$$ A = \mathrm{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d}} \right), \quad (16) $$

representing the weighted similarity score between Query and Key and providing information about the relevance of each part of the input to the current state of the system. The Softmax layer transforms the scores into a probability distribution over the input vector mapping, focusing on its most important parts. The final Dual Interconnection Output is computed as:

$$ z' = z + \big( A\, V \big)\, W_O, \quad (17) $$

where $W_O$ is a learnable weight matrix. The value projection $V$ of the input in the latent space is weighted by the attention score $A$, amplifying the most important features. Finally, the result is added to the latent representation vector $z$, establishing a residual connection. The residual connection prevents information loss and enhances the flow of gradients during training. So, through the adaptive process described in Equation (17) and illustrated in Figure 4, the latent space vector $z$ is updated dynamically, combining long-term dependency capture with the fusion of multi-channel input information.
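The sketch below illustrates this cross-attention-like update in PyTorch; the single-head formulation, tensor layout, and class name are simplifying assumptions and not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualInputInterconnection(nn.Module):
    """Sketch of the Dual-Input Interconnection update: queries come from the
    latent health state z, keys and values from the convolutional feature map,
    and the attended values update z through a residual connection."""
    def __init__(self, dim):
        super().__init__()
        self.w_q = nn.Linear(dim, dim, bias=False)
        self.w_k = nn.Linear(dim, dim, bias=False)
        self.w_v = nn.Linear(dim, dim, bias=False)
        self.w_o = nn.Linear(dim, dim, bias=False)

    def forward(self, z, x_feat):                 # z: (B, T, d), x_feat: (B, T, d)
        q = self.w_q(z)                           # queries from the latent health state
        k = self.w_k(x_feat)                      # keys from the convolutional features
        v = self.w_v(x_feat)                      # values from the convolutional features
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        attn = F.softmax(scores, dim=-1)          # interaction weights A, Eq. (16)
        return z + self.w_o(attn @ v)             # residual update of z, Eq. (17)
```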
Figure 2 shows the components of the proposed model, where we can observe the use of ReLU activation functions and Group Normalization (GroupNorm) layers at the inputs of the DE layer to achieve stability during training and accelerate convergence. GroupNorm layers [34] divide the feature mapping into groups and normalize each group independently. GroupNorm is more robust than its alternative, Batch Normalization, since it does not depend on batch statistics. With the use of GroupNorm, we maintain the spatial coherence of the convolutional block output by normalizing a collection of channels rather than the complete feature mapping or each individual channel. Also, the combination of ReLU activation functions, which are 1-Lipschitz, with GroupNorm layers ensures non-expansive outputs while maintaining the ability to represent complex patterns.
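For illustration, the snippet below shows GroupNorm applied to a (batch, channels, time) feature map; the batch size, channel count, and number of groups are illustrative values only.

```python
import torch
import torch.nn as nn

# Illustrative GroupNorm usage: 32 channels split into 4 groups, each group
# normalized independently of the batch size (values are placeholders).
x = torch.randn(8, 32, 15)                       # batch of 8, 32 channels, window of 15
gn = nn.GroupNorm(num_groups=4, num_channels=32)
y = gn(x)                                        # same shape, per-group normalization
```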
3.2. Monte Carlo Dropout Feed-Forward Neural Network
As the final element of the framework, we employ a Monte Carlo Dropout feed-forward neural network with three linear layers as the regression model. The outputs of the first two layers are passed through ReLU activation functions and dropout layers. The difference from a standard network is that during inference we keep dropout enabled and pass the same input through the network multiple times. Finally, we take the mean of the outputs as the final prediction and their standard deviation as the uncertainty. In this way, we obtain for each input an uncertainty about the model's prediction, as a measure of its confidence.
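A minimal sketch of this inference procedure is given below, assuming model is any network containing dropout layers; the helper name and the number of stochastic passes are illustrative.

```python
import torch

def mc_dropout_predict(model, x, n_samples=100):
    """Sketch of Monte Carlo Dropout inference: dropout stays active at test
    time and the same input is passed through the network several times.
    The mean of the stochastic outputs is the RUL estimate and their standard
    deviation is the (uncalibrated) predictive uncertainty."""
    model.train()  # keeps dropout active; in practice only dropout modules need train mode
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)
```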
Also, to improve the reliability of the Monte Carlo Dropout network, a calibration method is applied that dynamically adjusts the uncertainties. The calibration is performed on a validation set derived from the training dataset.
Rather than applying a single scaling factor to all uncertainties, we group the validation samples based on their mean predictions and standard deviations and subsequently estimate a different scaling factor for each group. The assumption is that the model's uncertainty should be inflated in regions where it predicts with large errors, while smaller errors result in less uncertainty inflation. To find these regions, we use Gaussian Mixture clustering, exploiting its strength in building robust elliptical clusters in a probabilistic way. The adjustment of the predicted standard deviations within each cluster is designed such that approximately 95% of the actual target values fall within the estimated prediction interval. This adaptive scaling ensures that the uncertainty estimates accurately reflect the model's reliability, providing trustworthy confidence intervals.
The mathematical formulation of the uncertainty calibration method is as follows. Let $\mu_i$, $\sigma_i$ and $y_i$ be the predicted mean, standard deviation and true target for sample $i$, respectively. We want to adjust $\sigma_i$ by a scaling factor $k$ such that the probability that $y_i$ falls within the range $[\mu_i - k\,\sigma_i,\; \mu_i + k\,\sigma_i]$ equals our predefined confidence level $c$.

We use the validation dataset and build a Gaussian Mixture Model (GMM) to assign each validation sample to a cluster based on the prediction means and standard deviations of the Monte Carlo Dropout network. Finally, to find the optimum scaling factor $k_f$ for each cluster $f$, we solve

$$ k_f = \arg\min_{k} \left| \frac{1}{N_f} \sum_{i \in f} \mathbb{1}\!\left[\, |y_i - \mu_i| \le k\,\sigma_i \,\right] - c \right|, $$

where $\mathbb{1}[\cdot]$ is the indicator function and $N_f$ is the number of samples in cluster $f$. At inference, we assign each point $j$ to a cluster $g$ using the GMM and scale the uncertainties as $\tilde{\sigma}_j = k_g\, \sigma_j$.
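The sketch below illustrates one way this GMM-based calibration could be realized with scikit-learn; the cluster count, the 1-D search over k, and all function names are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def calibrate_uncertainties(mu_val, sigma_val, y_val, n_clusters=3, conf=0.95):
    """Sketch: cluster validation samples by (mean, std) with a GMM and pick a
    per-cluster scaling factor k_f so that roughly `conf` of the true targets
    fall inside mu +/- k_f * sigma."""
    feats = np.column_stack([mu_val, sigma_val])
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(feats)
    labels = gmm.predict(feats)

    scales = {}
    for f in np.unique(labels):
        idx = labels == f
        best_k = 1.0
        for k in np.linspace(0.5, 10.0, 200):      # simple 1-D search over k
            cover = np.mean(np.abs(y_val[idx] - mu_val[idx]) <= k * sigma_val[idx])
            if cover >= conf:
                best_k = k
                break
        scales[f] = best_k
    return gmm, scales

def scaled_sigma(gmm, scales, mu, sigma):
    """At inference, assign each sample to a cluster and scale its uncertainty."""
    labels = gmm.predict(np.column_stack([mu, sigma]))
    return sigma * np.array([scales[g] for g in labels])
```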
4. Experimental Analysis and Results
4.1. Description of Dataset—CMAPSS Dataset
To test and validate the effectiveness of the proposed Deep Equilibrium neural framework for RUL prediction, we use the CMAPSS (Commercial Modular Aero-Propulsion System Simulation) dataset provided by the NASA data repository as an evaluation benchmark [28]. The dataset provides realistic simulations of aircraft engine degradation over time and contains four sub-datasets (FD001, FD002, FD003 and FD004), each corresponding to different operational conditions and fault modes. The dataset includes a combination of three operational parameters (altitude, Mach number, and fuel flow), which create diverse degradation conditions. Fault modes refer to the mechanisms that lead to the degradation of the aircraft engine; in the CMAPSS dataset, two specific fault modes are represented: High-Pressure Compressor (HPC) degradation and fan degradation. The training dataset consists of run-to-failure simulations, where each engine starts from a healthy condition and gradually degrades until failure, so the last sample for each engine is considered failed, meaning that the target RUL equals zero. In contrast, in the testing dataset, the engine simulations are terminated at a point before overall failure, with the corresponding true RUL values provided.
Table 1 and Table 2 provide a detailed description of the characteristics of each sub-dataset and information about the sensor readings, respectively.
4.2. Data Pre-Processing
The data pre-processing process is crucial and can be broken down into smaller parts. Initially, the multivariate time-series data is normalized so that the response of each sensor follows a standard normal with a mean of 0 and a standard deviation of 1. The fact that FD002 and FD004 sub-datasets relate to various operational conditions is of important consideration during the normalization process. To enhance data representation at these two sub-datasets, we first apply the k-means clustering algorithm according to the operational conditions, and then we normalize each cluster independently. Therefore, denoting that the
sample point belongs to the
cth operational cluster, the normalized process is given by the equation
where
and
are the mean and the std of the cluster, respectively.
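A minimal sketch of this condition-wise normalization is shown below; the cluster count, array layout, and function name are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def condition_wise_normalize(X_sensors, X_conditions, n_clusters=6):
    """Sketch of the normalization step for FD002/FD004: samples are clustered
    by their operational settings and each sensor is standardized within its
    own cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_conditions)
    labels = km.labels_
    X_norm = np.empty_like(X_sensors, dtype=float)
    for c in range(n_clusters):
        idx = labels == c
        mu = X_sensors[idx].mean(axis=0)
        std = X_sensors[idx].std(axis=0) + 1e-8    # avoid division by zero
        X_norm[idx] = (X_sensors[idx] - mu) / std
    return X_norm, km
```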
Next, a feature selection process takes place, since several sensor readings in the dataset remain constant over time or do not provide meaningful information about the degradation process. The Random Forest ensemble regressor is used to analyze and quantify the importance of each feature. The Random Forest ensemble builds a set of decision trees, each trained on a random subset of the data and a random subset of features at each split; the final estimation is the mean of the predictions of all the individual trees. The significance of each feature is evaluated as a measure of its contribution to the reduction in the variance of the predictions (Mean Decrease in Impurity, MDI) [35]. This method is selected due to its robustness and ability to handle non-linear relationships. Other feature selection methods, such as mutual information or recursive feature elimination (RFE), could also be applied, leading to alternative feature subsets and influencing model performance. A comparative analysis of feature selection strategies is left as future work, to explore whether such alternatives could yield improvements in RUL estimation accuracy, especially under varying operational conditions.
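For illustration, a minimal sketch of MDI-based feature selection with scikit-learn is given below; the threshold value, estimator count, and function name are illustrative (the configured threshold is discussed in Section 4.3).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def select_features(X, y, feature_names, threshold=0.01):
    """Sketch: fit a Random Forest and keep only features whose impurity-based
    importance (MDI) is at least `threshold`."""
    rf = RandomForestRegressor(n_estimators=100, random_state=0, n_jobs=-1)
    rf.fit(X, y)
    keep = rf.feature_importances_ >= threshold
    selected = [name for name, k in zip(feature_names, keep) if k]
    return selected, rf.feature_importances_
```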
The next pre-processing step applies an Exponentially Weighted Average Smoothing (EWAS) function to each sensor reading. The CMAPSS time-series data, as measurements of sensors in turbofan engines, contain high-frequency noise and fluctuations. By applying the EWAS method, we smooth out noise, remove sudden spikes and improve the identification of degradation trends in the time-series data. The EWAS process is described by the following formula:

$$ s_t = \alpha\, x_t + (1 - \alpha)\, s_{t-1}, $$

where $\alpha$ is the smoothing factor, and $x_t$ and $s_t$ are the sensor reading and the smoothed value at time $t$, respectively.
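A minimal sketch of this smoothing applied to a single sensor channel follows; the default smoothing factor here is illustrative (the configured value is given in Section 4.3).

```python
import numpy as np

def ewas(x, alpha=0.1):
    """Sketch of Exponentially Weighted Average Smoothing:
    s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    s = np.empty_like(np.asarray(x, dtype=float))
    s[0] = x[0]
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    return s

# Equivalent with pandas: pd.Series(x).ewm(alpha=alpha, adjust=False).mean()
```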
A common practice in the literature is to limit the maximum value of the RUL, considering that, for an initial period, the engines are healthy and degradation has not yet started; after degradation begins, the RUL decreases linearly. To allow fair comparisons with papers that evaluate performance on the same dataset, we adopt this practice and set the maximum RUL value to 125 engine cycles. Also, as a final pre-processing step, we normalize each target value by dividing it by the maximum value, resulting in a target range of [0, 1] for the regression task.
4.3. Setting Hyper-Parameters and Configuration
We construct the input of the DE model using 15 subsequent sensor readings, so the window interval of the input is set to 15. During the feature selection process, we use a significance threshold of , meaning that features that contribute less than this amount are rejected. In the experimental setup, to effectively remove noise and capture the degradation trends, we set the smoothing factor of EWAS to . Figure 5 shows the smoothing process for the FD001 sub-dataset.
We set the training batch size to 256. The Adam optimizer is used to train the DE model with a learning rate of . Also, a reduction scheduler of the learning rate by is applied every 10 epochs. The training is completed after a fixed number of 35 epochs.
Given that the number of input channels is D, each convolutional block utilizes filters. For each sub-dataset, the number of input channels D varies due to the applied feature selection process. Therefore, the output of the concatenation layer in the convolutional block produces a feature mapping of channels. Also, the GroupNorm layers employ 4 normalization groups, and the linear layers of the Dual-Input Interconnection block do not change the dimension of the feature mapping. The dropout ratio of the Monte Carlo feedforward NN is set to .
Finally, to estimate the fixed-point of the DE model, we use the Anderson Acceleration method. In the utilization of Anderson Acceleration, which enhances the convergence of the forward pass of the DE model, we set the maximum number of iterations to 200 and the relative residual tolerance to . The weights of the linear projections in the Dual-Input Interconnection Block are initialized with random values drawn from a normal distribution centered at 0, with a standard deviation of 0.01.
The hyper-parameters (learning rate, batch size, dropout ratio, number of convolutional filters, attention projection dimension) were selected through a combination of trial-and-error experimentation and reference to values commonly reported in the related literature [1,2,3,12,17]. A limited manual tuning process was employed on a validation split from the training data, where the RMSE performance metric was monitored. A full grid search over the hyper-parameters could further improve the performance of the RUL estimation framework, but it was not conducted in this study.
4.4. Experimental Results
To evaluate and compare the performance of the proposed DE model against state-of-the-art frameworks, we use two standard metrics: the Root Mean Square Error (RMSE) and the PHM08 scoring metric. The RMSE measures the average magnitude of the prediction error, is sensitive to large deviations, and intuitively tells us how far, on average, the model's predictions are from the true RUL values. The PHM08 score is a widely adopted tool for evaluating RUL predictions, since it penalizes early and late predictions with different scaling factors for each type of error. In the current research paper, we used separate scaling factors for early and for late predictions, chosen such that late predictions incur a larger penalty, an important aspect in real-world scenarios.
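For illustration, the sketch below computes the two metrics using the widely used exponential form of the PHM08 score; the constants 13 and 10 are the commonly reported defaults and are assumptions here, since the paper's exact scaling factors are stated in the text above.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error between predicted and true RUL values."""
    return float(np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)))

def phm08_score(y_true, y_pred, a1=13.0, a2=10.0):
    """Commonly used exponential PHM08 score: early predictions (d < 0) are
    penalized by exp(-d/a1) - 1 and late predictions (d >= 0) by exp(d/a2) - 1,
    so late errors grow faster. Constants are illustrative defaults."""
    d = np.asarray(y_pred) - np.asarray(y_true)
    return float(np.sum(np.where(d < 0, np.exp(-d / a1) - 1, np.exp(d / a2) - 1)))
```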
Table 3 presents the performance of various state-of-the-art frameworks in predicting RUL on the CMAPSS dataset. Comparing the results, we notice that the proposed DE model achieves better performance, particularly on the more challenging sub-datasets FD002 and FD004, which involve multiple working conditions. Moreover, for the extremely difficult sub-dataset FD004, the presented model shows an improvement in the RMSE metric by compared to the second-best model and by compared to the average performance of the models under comparison. Also, results on the PHM08 scoring metric indicate that the model performs better in both early and late predictions. An interesting comparison is with the Neural ODE model, which represents implicit deep learning approaches. We observe that our proposed DEM consistently outperforms the Neural ODE, demonstrating its ability to capture degradation patterns in RUL prediction.
Figure 6 displays the output of the DE model alongside the actual RUL values for the testing samples of each sub-dataset. The shaded red region indicates the confidence interval, which is determined by the calibrated standard deviation of the predictions. The plots show that the model effectively tracks the degradation trend of the engines, especially in the critical final stages before failure. Furthermore, the green scatter points indicate predictions that fall within the confidence interval, while the red scatter points correspond to significant prediction errors lying outside the confidence interval.
4.5. Comparison Without Monte Carlo Dropout Technique
The final component of the DE model is a Monte Carlo Dropout neural model with three linear layers. By utilizing the MC Dropout technique, we prevent over-fitting of the network during training, enhance generalization, and provide a confidence interval.
Table 4 presents the performance of the DE model with and without Monte Carlo Dropout enabled during inference across the four sub-datasets. We notice that using MC Dropout during inference slightly improves the performance, achieving lower RMSE and score values. The application of MC Dropout not only enhances the model's predictive accuracy but also provides a crucial reliability tool, which is highly valuable for real-world PHM applications.
4.6. Effectiveness of the Proposed Calibration Method on Predictive Uncertainty
Table 5 evaluates the prediction interval coverage before and after applying the proposed calibration method across the four sub-datasets. As coverage performance, we define the ratio of true RUL target values that fall within the confidence intervals of the predictions. We notice that the proposed calibration method, based on GMM clustering, significantly improves the coverage ratio for all sub-datasets, constructing prediction intervals that better reflect the true variability of the data and the model's uncertainty. Indeed, the confidence coverage has increased by for sub-dataset FD001, by for sub-dataset FD002, by for sub-dataset FD003 and by for sub-dataset FD004. So, the proposed overall DE model enhances maintenance decisions by providing not only predictions but also reliable confidence intervals.
4.7. Convergence Behavior of the Deep Equilibrium (DE) Model During Training
Figure 7 demonstrates the convergence dynamics of the proposed DE model during training. The top sub-plot illustrates the residual errors as the model converges to the fixed point for each training epoch. The bottom one shows the number of fixed-point iterations needed for convergence for each training epoch.
As we can observe, the residual errors remain consistently low after the initial epochs. Also, the number of convergence iterations for all sub-datasets is significantly smaller than the maximum of 200 iterations set for the Anderson Acceleration algorithm. So, we notice that the DE model reliably reaches a fixed point with minimal error within a small number of iterations. The steady residuals and the low iteration counts, even for the FD002 and FD004 sub-datasets where the operating conditions vary, demonstrate the robustness of the proposed DE model during the training process.
4.8. Computational Cost and Time Overhead
To evaluate the practicality and feasibility in real-world applications, we present an analysis of the computational requirements of the proposed Bayesian Deep Equilibrium framework on the FD001 sub-dataset of CMAPSS. The model was trained on a system with an NVIDIA GeForce GTX 1060 with 6 GB of memory, using a batch size of 256 over 35 epochs. The sequence window is set to 15, yielding a model with trainable parameters, making it lightweight and suitable even for low-resource environments. The training time was 17 min, which corresponds to an average of s per epoch.
A computational analysis of the inference process is also provided. To estimate predictive uncertainty during inference, we use 100 Monte Carlo forward passes in the Bayesian framework. This increases the inference overhead by around . However, the response time for each engine is s, making the model suitable for real-time industrial applications.
4.9. Failure Analysis on the CMAPSS Dataset
It is crucial to provide deeper insights into the robustness of the DE model observed within the CMAPSS dataset and analyze the specific conditions where its predictive performance decreased. Observing the performance across the four CMAPSS sub-datasets, we notice increased sensitivity and higher error rates for datasets with multiple operating conditions. The model has difficulties with engines that show sudden degradation patterns or irregular sensor responses in different operating modes. Furthermore, sensor noise and sudden measurement anomalies are commonly encountered in realistic aircraft engine operations, disrupting the equilibrium convergence process and resulting in significant prediction inaccuracies. The model’s reliance on strict Lipschitz constraints for stability further contributes to sensitivity since small deviations can considerably affect performance, underscoring the need for careful parameter initialization, regularization, and normalization strategies in practical applications.