Health Assessment for Gas Turbines Based on Domain-Adversarial Neural Network

Ma, Bingzhou; Fu, Xueting; Lu, Feng; Deng, Daming; An, Haoran; Li, Qiuhong

doi:10.3390/aerospace13040332


by Bingzhou Ma ^1,2, Xueting Fu ^3, Feng Lu ^1, Daming Deng ^1, Haoran An ^1 and Qiuhong Li ^1,*

^1 Jiangsu Province Key Laboratory of Aerospace Power System, College of Energy and Power Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

^2 Jiangsu Huaiyin Power Generation Co., Ltd., Huai’an 223002, China

^3 Jiangsu Xinhai Power Generation Co., Ltd., Lianyungang 222023, China

^* Author to whom correspondence should be addressed.

Aerospace 2026, 13(4), 332; https://doi.org/10.3390/aerospace13040332

Submission received: 29 January 2026 / Revised: 20 March 2026 / Accepted: 30 March 2026 / Published: 2 April 2026

(This article belongs to the Section Aeronautics)


Abstract

To address the challenges of limited access to full-life-cycle data and insufficient labeled samples in gas turbine health management, a Bidirectional Long Short-Term Memory-Domain Adversarial Neural Network (BiLSTM-DANN) is adopted to achieve cross-domain health assessment for gas turbines. The model extracts temporal health features with a two-layer BiLSTM network and integrates DANN to achieve cross-domain feature alignment, thereby learning domain-invariant health representations. The simulation results demonstrate that the BiLSTM-DANN model outperforms the traditional BiLSTM and DCNN models on both the FD001 and FD003 datasets of C-MAPSS. Health assessment tests conducted on real gas turbine operation datasets indicate that the BiLSTM-DANN model can effectively depict the long-term operational health evolution trend of the entire unit and accurately reflect the health changes of the gas turbine before and after water washing. Therefore, the method studied in this paper provides a transferable solution for assessing the health of the entire gas turbine under conditions of scarce labels.

Keywords: gas turbine; health assessment; DANN; BiLSTM

1. Introduction

With the continuous growth of energy demand and the rapid development of energy infrastructure, gas turbines (GTs) have become increasingly important as core power equipment in aviation, marine propulsion, industrial power generation, and long-distance natural gas pipeline boosting [1,2]. As complex rotating machinery operating under high temperature, high pressure, and high mechanical stress, GTs inevitably undergo progressive performance degradation during long-term service due to mechanisms such as fouling, wear, erosion, and thermal damage [3]. The gradual degradation of key components, including the compressor, combustor, and turbine, alters the thermodynamic matching of the entire unit, leading to reduced efficiency, decreased power output, increased fuel consumption, and even unplanned shutdowns. Therefore, accurate evaluation and prediction of the GT health state are of great significance for intelligent operation and maintenance, predictive maintenance, and the safe and economical operation of GT systems.

However, most current research on GT health management focuses on fault diagnosis and anomaly detection for specific components. For example, Wang et al. [4,5] systematically analyzed gas path fault diagnosis of GTs using thermodynamic models and knowledge-based models, demonstrating the advantages of physics-based methods in locating gas path performance deviations. The core idea is to identify faults from the deviations between actual values and standard values, but both small-deviation methods and nonlinear diagnosis methods suffer from complex modeling, susceptibility to external interference, and low model accuracy. Yan et al. [6] developed a semi-supervised anomaly detection framework for GT combustors based on deep representation learning from multivariate time-series measurement data, showing that data-driven monitoring methods can detect early combustor anomalies even when fault samples are scarce. Tang et al. [7] proposed a fault diagnosis method for GT rotors using deep transfer learning, in which a deep learning model pre-trained on bearing datasets is transferred to the GT rotor domain, demonstrating the potential of cross-domain feature transfer in data-scarce scenarios. Sarwar et al. [8,9] improved the accuracy and efficiency of fault diagnosis using machine learning methods such as artificial neural networks and principal component analysis, confirming the applicability of data-driven methods in this field. Bunyan et al. [10] studied gas turbine thermal condition monitoring and predictive maintenance based on machine learning. By extracting features from exhaust temperature data and combining them with the XGBoost model, they achieved effective discrimination between healthy and faulty states, further demonstrating the potential of data-driven methods in GT health management. An et al. [11] proposed a GT fault diagnosis method based on sliding radar charts. By exploiting the spatial distribution and temporal variation of combustor outlet temperature sensor readings and validating the method on real power plant data, they showed that combining spatial and temporal information can improve GT fault diagnosis performance. Fahmi et al. [12] developed an anomaly detection model that combines a temporal convolutional network with an autoencoder and further introduces a multi-head attention mechanism, achieving high-accuracy GT fault detection on real power plant vibration data. Yu et al. [13] proposed a GT fault diagnosis method based on a feature-fusion cascaded neural network, which identifies both incipient and abrupt faults with a limited number of measurement points, further indicating the potential of deep feature fusion in GT fault identification.

Nevertheless, research on modeling the continuous degradation process of overall GT performance and reliably predicting health state remains relatively scarce. Fundamentally, high-precision overall degradation modeling and health assessment rely heavily on complete life-cycle data from initial health status to complete failure [14,15]. Under actual operating conditions, GTs have a design life of tens of thousands to hundreds of thousands of hours, making the collection of complete life-cycle data impractical in terms of time and economic costs. Additionally, operational data often lacks clear health parameter labels [16,17]. The scarcity of full-life-cycle data and the lack of health parameter labels are key issues in research related to GT health assessment.

In sharp contrast, remaining useful life (RUL) prediction for aero-engines, which share similar structures and operating principles with GTs, has developed into a relatively mature research direction. Benefiting from open and detailed benchmark datasets such as C-MAPSS released by NASA, a systematic body of research has been established in this field, and numerous data-driven RUL prediction methods have emerged [18]. Wu et al. [19] proposed a method based on a Deep Convolutional Neural Network (DCNN) that can effectively extract deep spatial features from the data. Huang et al. [20] adopted Bidirectional Long Short-Term Memory (BiLSTM) neural networks, which characterize health evolution trends more comprehensively by simultaneously learning the forward and backward dependencies of time-series data. These methods indicate that deep learning can effectively capture complex temporal degradation patterns, laying a methodological foundation for high-precision health assessment.

Furthermore, to address the cross-domain prediction challenges often faced in actual aero-engine operations—i.e., distribution differences between training data (source domain) and test data (target domain)—solutions centered on transfer learning have gradually been developed. Relevant studies have proposed various methods to mitigate the decline in prediction performance caused by inter-domain distribution inconsistencies, such as weight-optimized transfer learning [21], deep domain adversarial networks [22], and Domain Adversarial Neural Network (DANN) [23]. Among these, DANN forces the network to learn domain-invariant features that strip domain-specific characteristics through adversarial training between the feature extractor and domain discriminator, significantly improving the model’s generalization capability under distribution differences. This method provides an important theoretical tool for solving cross-equipment and cross-operating condition prediction tasks. Summarizing the above progress, in the aero-engine field, relying on the C-MAPSS dataset, researchers have not only established mature deep learning prediction models but also developed domain adaptation technologies represented by DANN to overcome data distribution differences.

Considering the similarities in structure and degradation mechanisms between aero-engines and GTs, as well as the availability of rich labeled health parameter data and mature cross-domain adaptation methods in the aero-engine field, this paper investigates a cross-domain GT health assessment method based on DANN domain adaptation. Using label-rich C-MAPSS data as the source domain, the method leverages a domain adversarial training mechanism to drive the BiLSTM feature extractor to learn domain-invariant health features. The resulting assessment model can be transferred to the GT target domain, where health parameter labels are scarce, thereby achieving reliable assessment of GT health.

The remainder of this paper is organized as follows. Section 2 presents the BiLSTM-DANN; Section 3 conducts simulations and discussion; Section 4 provides a summary.

2. Methodology

2.1. Bidirectional Long Short-Term Memory Neural Network

The BiLSTM neural network is an enhanced recurrent neural network (RNN) architecture composed of two independent LSTM networks. These two LSTM networks process sequence data along the forward and backward directions of the time axis, respectively.

As a specialized type of RNN, LSTM effectively mitigates the issues of gradient vanishing and gradient explosion encountered by traditional RNNs when processing long-term sequence data, enabling relatively efficient capture of dependencies within extended sequences. LSTM incorporates three gating units—the input gate, output gate, and forget gate. These gating units facilitate the extraction of internal correlation information from long-term sequence data. The unit structure of LSTM is illustrated in Figure 1.

At time step t, the internal computation within the neuron proceeds as follows:

f_{t} = σ (W_{f} x_{t} + V_{f} h_{t - 1} + b_{f})

(1)

i_{t} = σ (W_{i} x_{t} + V_{i} h_{t - 1} + b_{i})

(2)

o_{t} = σ (W_{o} x_{t} + V_{o} h_{t - 1} + b_{o})

(3)

{\tilde{c}}_{t} = g (W_{c} x_{t} + V_{c} h_{t - 1} + b_{c})

(4)

c_{t} = f_{t} \otimes c_{t - 1} + i_{t} \otimes {\tilde{c}}_{t}

(5)

h_{t} = o_{t} \otimes g (c_{t})

(6)

where x_t denotes the input at time step t; W, V, and b represent the weight parameters and bias; h_t denotes the hidden state at time step t; σ and g denote the Sigmoid function and hyperbolic tangent function, respectively; f_t, i_t, and o_t denote the forget gate, input gate, and output gate at time step t, respectively; {\tilde{c}}_{t} denotes the candidate memory cell state at time step t; c_t denotes the memory cell at time step t; and \otimes denotes the Hadamard product.
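As an illustration, Equations (1)–(6) can be sketched in plain Python for a single time step. This is only a minimal sketch: the helper names (`sigmoid`, `affine`, `lstm_step`), the list-based vector representation, and the all-zero example parameters are assumptions made for illustration and are not taken from the paper's implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(gate_params, x, h):
    """Return W x + V h + b for one gate; matrices are lists of rows."""
    W, V, b = gate_params
    return [
        sum(w * xj for w, xj in zip(W_row, x))
        + sum(v * hj for v, hj in zip(V_row, h))
        + b_k
        for W_row, V_row, b_k in zip(W, V, b)
    ]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps "f", "i", "o", "c" to (W, V, b) tuples."""
    f_t = [sigmoid(z) for z in affine(params["f"], x_t, h_prev)]        # Eq. (1)
    i_t = [sigmoid(z) for z in affine(params["i"], x_t, h_prev)]        # Eq. (2)
    o_t = [sigmoid(z) for z in affine(params["o"], x_t, h_prev)]        # Eq. (3)
    c_tilde = [math.tanh(z) for z in affine(params["c"], x_t, h_prev)]  # Eq. (4)
    # Element-wise (Hadamard) products implement Eq. (5) and Eq. (6).
    c_t = [f * c + i * ct for f, c, i, ct in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Hypothetical example: 4 input features, hidden size 1, all-zero parameters.
zero_gate = ([[0.0] * 4], [[0.0]], [0.0])
params = {name: zero_gate for name in ("f", "i", "o", "c")}
h_t, c_t = lstm_step([0.1, 0.2, 0.3, 0.4], [0.0], [2.0], params)
```

With all-zero parameters every gate evaluates to 0.5 and the candidate state to 0, so the cell simply halves the previous memory, which makes the role of the forget gate in Equation (5) easy to check by hand.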

Traditional LSTMs consider only past context when processing sequential data, yet future context also plays a crucial role in many data processing tasks. BiLSTM incorporates two hidden layers: a forward LSTM and a backward LSTM. For an input sequence, the forward LSTM processes the data from t = 1 to t = n, while the backward LSTM processes it in reverse order from t = n to t = 1. At each time step t, the final output is jointly determined by the hidden states of the forward and backward LSTMs at time t. The BiLSTM architecture is illustrated in Figure 2. Compared to unidirectional LSTM networks, BiLSTM integrates both forward and backward sequence information, capturing the sequence context from both directions and enhancing sequence modeling capability. In GT health assessment, the state at a given time step within an input sequence is related both to the preceding history and to the subsequent health evolution trend in that sequence. Therefore, compared to unidirectional LSTMs, BiLSTMs can extract more comprehensive temporal health features, laying a solid foundation for high-precision and reliable health assessment of GTs.

2.2. Domain Adversarial Neural Networks

To address the domain shift phenomenon where the model performs well on the source domain but degrades on the target domain, DANN can eliminate the domain-related parts of features while learning task-discriminative features through adversarial training. Thus, using labeled source domain data, it improves the generalization performance of the model on the target domain with unlabeled or few labeled samples.

DANN consists of three parts: the feature extractor G_f, the domain discriminator G_d, and the predictor G_y. The feature extractor extracts high-level features from the input data; the domain discriminator distinguishes the domain to which the features belong; and the predictor outputs the predicted values for the task. During backpropagation, DANN implements gradient reversal by passing gradients from the domain discriminator through a Gradient Reversal Layer (GRL). These gradients are multiplied by a negative coefficient before being fed back to the feature extractor. This adversarial process encourages the feature extractor to learn features that are useful for both the source and target domains.

The loss of the entire model consists of prediction loss and domain discrimination loss, so the objective function can be expressed as:

\begin{aligned} L (θ_{f}, θ_{y}, θ_{d}) &= \frac{1}{n} \sum_{i = 1}^{n} L_{y} (G_{y} (G_{f} (x_{i}; θ_{f}); θ_{y}), y_{i}) - λ \left( \frac{1}{n} \sum_{i = 1}^{n} L_{d} (G_{d} (G_{f} (x_{i}; θ_{f}); θ_{d}), d_{i}) + \frac{1}{n^{'}} \sum_{i = n + 1}^{N} L_{d} (G_{d} (G_{f} (x_{i}; θ_{f}); θ_{d}), d_{i}) \right) \\ &= \underbrace{\frac{1}{n} \sum_{i = 1}^{n} L_{y}^{i} (θ_{f}, θ_{y})}_{\text{source domain}} - λ \left( \underbrace{\frac{1}{n} \sum_{i = 1}^{n} L_{d}^{i} (θ_{f}, θ_{d})}_{\text{source domain}} + \underbrace{\frac{1}{n^{'}} \sum_{i = n + 1}^{N} L_{d}^{i} (θ_{f}, θ_{d})}_{\text{target domain}} \right) \end{aligned}

(7)

where θ_f, θ_d, and θ_y represent the parameters of the feature extractor, domain discriminator, and predictor, respectively. For the GT RUL prediction problem, L_y and L_d denote the losses for RUL prediction and domain label classification, respectively. y_i is the RUL label for input x_i, and d_i is the domain label for input x_i. λ is the trade-off parameter between the prediction loss and the domain classification loss. n denotes the number of source domain samples, n′ denotes the number of target domain samples, and N = n + n′.

Then, the optimal values {\hat{θ}}_{f}, {\hat{θ}}_{d}, and {\hat{θ}}_{y} for θ_f, θ_d, and θ_y can be obtained:

({\hat{θ}}_{f}, {\hat{θ}}_{y}) = \underset{θ_{f}, θ_{y}}{\arg \min} L (θ_{f}, θ_{y}, {\hat{θ}}_{d})

(8)

{\hat{θ}}_{d} = \underset{θ_{d}}{\arg \max} L ({\hat{θ}}_{f}, {\hat{θ}}_{y}, θ_{d})

(9)

Combined with the objective function in Equation (7), taking the derivative of each parameter, we get:

θ_{f} \leftarrow θ_{f} - μ (\frac{\partial L_{y}^{i}}{\partial θ_{f}} - λ \frac{\partial L_{d}^{i}}{\partial θ_{f}})

(10)

θ_{y} \leftarrow θ_{y} - μ \frac{\partial L_{y}^{i}}{\partial θ_{y}}

(11)

θ_{d} \leftarrow θ_{d} - μ λ \frac{\partial L_{d}^{i}}{\partial θ_{d}}

(12)

where μ denotes the learning rate. These update rules closely resemble the stochastic gradient descent update used in deep learning. The most significant difference is that, in Equation (10), the gradient of the domain discriminator loss is not added to the gradient of the prediction loss but subtracted from it, weighted by λ. This implies that θ_f is updated in a direction that minimizes the prediction loss L_y while maximizing the domain discriminator loss L_d. The feature extractor is thereby encouraged to generate features that the domain discriminator cannot distinguish (i.e., domain-invariant features), which is precisely the adversarial process achieved through the GRL.
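The update rules in Equations (10)–(12) can be written out for scalar parameters as a short Python sketch. The function name `dann_update`, the dictionary keys, and the numerical values are hypothetical and serve only to make the sign reversal in Equation (10) visible; in a real network the gradients would come from backpropagation through the GRL.

```python
def dann_update(theta_f, theta_y, theta_d, grad, mu, lam):
    """Apply Equations (10)-(12) to scalar parameters for one sample.

    grad holds the partial derivatives:
      dLy_df = dL_y/dtheta_f, dLy_dy = dL_y/dtheta_y,
      dLd_df = dL_d/dtheta_f, dLd_dd = dL_d/dtheta_d.
    """
    # Eq. (10): the domain-loss gradient enters with a reversed sign.
    new_f = theta_f - mu * (grad["dLy_df"] - lam * grad["dLd_df"])
    # Eq. (11): ordinary gradient descent on the prediction loss.
    new_y = theta_y - mu * grad["dLy_dy"]
    # Eq. (12): ordinary gradient descent on the domain loss, scaled by lam.
    new_d = theta_d - mu * lam * grad["dLd_dd"]
    return new_f, new_y, new_d

# Hypothetical gradients: the prediction loss is flat in theta_f,
# so only the reversed domain gradient moves the feature extractor.
grad = {"dLy_df": 0.0, "dLy_dy": 1.0, "dLd_df": 2.0, "dLd_dd": 2.0}
new_f, new_y, new_d = dann_update(1.0, 1.0, 1.0, grad, mu=0.1, lam=0.5)
```

In this example θ_f increases although ∂L_d/∂θ_f is positive, meaning the feature extractor moves up the domain loss, while θ_d moves down it: the two components pull in opposite directions on L_d.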

2.3. BiLSTM-DANN

In this paper, a health assessment model that integrates BiLSTM and DANN was established, with its structure illustrated in Figure 3. First, data preprocessing is performed on the multivariate time-series data from the source domain C-MAPSS dataset and the target domain GT monitoring data, including manual parameter alignment and screening based on physical rules, outlier processing, and data normalization. Subsequently, the sliding window method is used to construct continuous time-series data into sequence samples that the model can process. BiLSTM is used as the feature extractor of DANN to fully learn the complex temporal features in the health evolution process. In the adversarial training architecture of DANN, the deep features f extracted by BiLSTM are simultaneously fed into the predictor and the domain discriminator for health assessment and data source distinction, respectively. Through continuous adversarial iteration, the model will gradually reduce the distribution difference between the source domain data and the target domain data, and ultimately improve the accuracy of GT health assessment.

The constructed BiLSTM-DANN model consists of three parts: a feature extractor, a health status predictor, and a domain discriminator. The BiLSTM feature extractor captures both the accumulated historical information and the subsequent trend information in the GT health evolution process, yielding high-quality temporal features. Both the predictor and the domain discriminator are implemented by fully connected layers. During training, the Adaptive Moment Estimation (Adam) optimizer is adopted. Adam combines the momentum method with root mean square propagation (RMSProp): it uses the first-moment estimate of the gradient to maintain the directional inertia of parameter updates and the second-moment estimate to adaptively adjust the learning rate for each parameter. Thus, stable and efficient convergence can be achieved when processing noisy sensor data.

The model loss includes RUL regression prediction loss and domain discrimination loss, which adopt mean square error and cross-entropy loss, respectively. Therefore, Equation (7) can be expressed as:

L (θ_{f}, θ_{y}, θ_{d}) = \underbrace{\frac{1}{n} \sum_{i = 1}^{n} {({\hat{y}}_{i} - y_{i})}^{2}}_{\text{source domain}} - λ \left( \underbrace{- \frac{1}{n} \sum_{i = 1}^{n} [d_{i} \log {\hat{d}}_{i} + (1 - d_{i}) \log (1 - {\hat{d}}_{i})]}_{\text{source domain}} \underbrace{- \frac{1}{n^{'}} \sum_{i = n + 1}^{N} [d_{i} \log {\hat{d}}_{i} + (1 - d_{i}) \log (1 - {\hat{d}}_{i})]}_{\text{target domain}} \right)

(13)

where {\hat{y}}_{i} denotes the predicted value for the RUL label y_i, {\hat{d}}_{i} denotes the predicted label for the domain label d_i, n represents the number of source-domain samples, n′ represents the number of target-domain samples, and N = n + n′.
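To make the structure of the combined objective concrete, the following Python sketch evaluates it for small lists of predictions, with L_y as the mean square error and L_d as the standard binary cross-entropy (the negative log-likelihood), matching Equation (7). The function names, the clipping constant `eps`, and the label convention (source = 0, target = 1) are assumptions for illustration; the paper does not state which domain receives which label.

```python
import math

def mse(y_hat, y):
    """Mean square error between predicted and true RUL values."""
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / len(y)

def bce(d_hat, d, eps=1e-12):
    """Mean binary cross-entropy; eps guards against log(0)."""
    total = 0.0
    for p, t in zip(d_hat, d):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(d)

def dann_objective(y_hat, y, d_hat_src, d_hat_tgt, lam):
    """Return (L, L_y, L_d) with L = L_y - lam * L_d.

    Assumed label convention: source samples d = 0, target samples d = 1.
    """
    l_y = mse(y_hat, y)
    l_d = bce(d_hat_src, [0] * len(d_hat_src)) + bce(d_hat_tgt, [1] * len(d_hat_tgt))
    return l_y - lam * l_d, l_y, l_d

# Hypothetical values: two labeled source samples, one unlabeled target sample,
# and a discriminator that outputs 0.5 everywhere (it cannot tell domains apart).
total, l_y, l_d = dann_objective(
    y_hat=[100.0, 50.0], y=[110.0, 50.0],
    d_hat_src=[0.5, 0.5], d_hat_tgt=[0.5], lam=1.0,
)
```

Only source samples contribute to the regression term because target samples carry no RUL labels, whereas both domains contribute to the domain discrimination term.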

3. BiLSTM-DANN-Based Health Assessment

3.1. Health Assessment Based on the C-MAPSS Dataset

To ensure the reliability of the source domain data, this paper selects the C-MAPSS simulation dataset released by NASA for model training and verification. This dataset is obtained through multiple simulation experiments, simulating the health process of aero-engines under different operating conditions, and is widely used in RUL prediction research.

The C-MAPSS dataset includes four datasets—FD001, FD002, FD003, and FD004—each containing a training set and a test set. The training set includes time-series data of state parameters during the complete life-cycle of each aero-engine, while the test set includes time-series data of state parameters during a certain period of the incomplete life-cycle of each aero-engine. As shown in Table 1, the number of operating conditions and failure modes corresponding to each dataset are different. Considering that the research object of this paper is a GT with simple operating conditions, FD001 and FD003 are selected for model training and verification. Among them, FD001 corresponds to a single operating condition and a single failure mode, whereas FD003 corresponds to a single operating condition and two failure modes. The reasons for selecting the FD001 and FD003 datasets for validation are as follows: on the one hand, both subsets are more consistent with the characteristics of industrial GTs, which typically operate only at ground operating points; on the other hand, they enable a more comprehensive assessment of the model under both single-component degradation and simultaneous multi-component degradation conditions. Both subsets provide full-life-cycle data for aero-engines and include 21 sensor measurements, such as pressure and temperature parameters at multiple sections (e.g., fan inlet, compressor outlet, and turbine outlet), high and low pressure rotational speeds, bypass ratio, pressure ratio, and other variables.

3.2. Data Preprocessing

During data preprocessing, parameters that remained unchanged during the health decline period were initially filtered out, retaining only 14 parameters numbered 2, 3, 4, 7, 8, 9, 11, 12, 13, 14, 15, 17, 20, and 21.

To realize effective knowledge transfer from aero-engine simulation data to industrial power generation GT operational data, this paper manually aligns the C-MAPSS source domain parameters with the GT target domain parameters based on thermodynamic principles and data characteristics. The final dataset retained parameters including high-pressure compressor outlet total temperature T₃₀, low-pressure turbine outlet total temperature T₅₀, high-pressure compressor outlet total pressure P₃₀, and low-pressure turbine rotational speed N_f. These correspond to GT compressor outlet temperature T_c, turbine exhaust temperature T_t, compressor discharge pressure P_c, and power output P_e, respectively. These parameters are closely related to GT health status because they reflect the thermodynamic state, compression performance, and power output capability of the unit. In general, increases in T_c and T_t may indicate greater efficiency loss and performance deterioration associated with fouling, whereas a decrease in P_c may imply a weakening of the compressor’s compression capability. A reduction in P_e, on the other hand, directly reflects a decline in the effective output capacity of the unit under the same operating conditions. However, it should be noted that the selection of these parameters is based primarily on their positions in the thermodynamic cycle, their functional roles, and their sensitivity to GT health. Therefore, they represent correspondences only at the level of engineering mechanisms, rather than strictly equivalent relationships in a physical sense. For example, we match P_e with N_f to create a load-reflective feature alignment for health assessment. This correlation serves as an approximation of the dynamic equilibrium between the GT’s shaft power and the propulsive thrust under varying operational conditions.

T₃₀ and P₃₀ of C-MAPSS are aligned with T_c and P_c of the GT because both pairs are located at the inlet of the combustion chamber in their respective thermodynamic cycles and characterize the compression efficiency of the compressor. The low-pressure turbine outlet total temperature T₅₀ of the aero-engine is aligned with the GT exhaust temperature T_t because both are end-of-cycle parameters: they integrate the overall efficiency information from the compressor, combustion chamber, and turbine, serving as comprehensive indicators of health. The low-pressure turbine speed N_f of the aero-engine is aligned with the electrical power output P_e of the GT because the subject of this study is a GT used for industrial power generation, which runs at a nearly constant rotational speed after grid connection. Electrical power output, which directly reflects performance degradation, is therefore chosen instead of turbine speed; both are key parameters for measuring the output load.

A similarity transformation is performed on the sensor parameters.

T_{30, c} = \frac{T_{30}}{\sqrt{T_{2}}}, T_{50, c} = \frac{T_{50}}{\sqrt{T_{2}}}, P_{30, c} = \frac{P_{30}}{P_{2}}, N_{f, c} = \frac{N_{f}}{\sqrt{T_{2}}}

(14)

where T₂ and P₂ are the total temperature and total pressure at the aero-engine inlet, and parameters with the subscript “c” represent parameters after similarity transformation.

Subsequently, data standardization is performed.

X_{z} = \frac{X - \bar{X}}{σ}

(15)

where X denotes the parameter after similarity transformation, X_z represents the normalized parameter, \bar{X} is the mean value of parameter X, and σ is the standard deviation.
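The two preprocessing steps in Equations (14) and (15) can be sketched in Python as follows. The function names and the example numbers are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator is used.

```python
import math
import statistics

def similarity_transform(t30, t50, p30, nf, t2, p2):
    """Equation (14): correct one sample by the inlet conditions T2 and P2."""
    root_t2 = math.sqrt(t2)
    return t30 / root_t2, t50 / root_t2, p30 / p2, nf / root_t2

def standardize(values):
    """Equation (15): z-score one parameter series (requires non-constant data)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / std for v in values]

# Hypothetical numbers chosen so the results are easy to verify by hand.
corrected = similarity_transform(8.0, 6.0, 10.0, 4.0, t2=4.0, p2=2.0)
z = standardize([1.0, 2.0, 3.0])
```

After standardization each parameter series has zero mean and unit standard deviation, which puts the four aligned parameters of both domains on a comparable scale.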

Taking the first engine in the FD001 dataset as an example, the trend of parameter changes after similarity transformation and normalization is shown in Figure 4.

To ensure consistent sample lengths, this study employs a sliding window method for data processing. With a window length of 30 and a step size of 1, each sample segment has dimensions of 30 × 4.
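A minimal Python sketch of this windowing step is given below. The function name `sliding_windows` and the 100-cycle example series are hypothetical; only the window length of 30, the step size of 1, and the four features per time step come from the text.

```python
def sliding_windows(series, window=30, step=1):
    """Cut a (T x F) sequence into overlapping (window x F) samples."""
    return [
        series[start:start + window]
        for start in range(0, len(series) - window + 1, step)
    ]

# Hypothetical engine record: 100 cycles, 4 features per cycle.
series = [[float(t)] * 4 for t in range(100)]
windows = sliding_windows(series)
```

A record of T cycles yields T − 30 + 1 samples, so the 100-cycle example produces 71 samples of size 30 × 4; records shorter than the window produce none.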

3.3. Establishing RUL Labels

During the full life-cycle of an aero-engine, the health of the engine can be divided into two stages. In the first stage, the early operation period, the engine health is relatively stable, and the RUL remains at the maximum value RULmax. In the second stage, the engine performance degrades, and the RUL shows a linear decreasing trend until it drops to 0. Therefore, the change trend of RUL with the operating cycle t is:

\mathrm{RUL} = \begin{cases} \mathrm{RUL}_{\max}, & 0 < t \leq t_{\max} - \mathrm{RUL}_{\max} \\ t_{\max} - t, & t > t_{\max} - \mathrm{RUL}_{\max} \end{cases}

(16)

where t_max represents the maximum operating cycle count.

As can be seen from Equation (16), a piecewise RUL labeling strategy is adopted for the C-MAPSS dataset in this study, with RULmax set to 125 [24]. This is because engines in the early stage are generally in a healthy condition; although their RUL values are relatively large, the health information contained in the sensor signals is limited. Setting an upper bound for RUL can reduce the influence of these low-information samples, improve training stability, and enhance comparability with existing studies.

Taking the first engine in the FD001 dataset as an example, the RUL change trend is illustrated in Figure 5.

As shown in Figure 5, the inflection point in the RUL curve mainly arises from the piecewise labeling strategy rather than from an abrupt change in the actual physical health decline process. Since the maximum RUL is capped at 125 in this study, the RUL remains constant during the early stage and begins to decrease linearly after reaching the corresponding threshold.
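The piecewise labeling in Equation (16) reduces to a single `min` per cycle, as the following Python sketch shows. The function name `rul_labels` and the 200-cycle example engine are hypothetical; the cap of 125 is the value used in this study.

```python
def rul_labels(t_max, rul_max=125):
    """Piecewise RUL labels for cycles t = 1..t_max, following Equation (16)."""
    return [min(rul_max, t_max - t) for t in range(1, t_max + 1)]

# Hypothetical engine that fails after 200 cycles.
labels = rul_labels(200)
```

For this example the label stays at 125 up to cycle 75 (t_max − RULmax) and then decreases by one per cycle until it reaches 0 at cycle 200, which reproduces the inflection point discussed above.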

4. Validation

4.1. Assessment Indicators

To validate the model’s prediction accuracy, the Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Relative Accuracy (RA) are adopted as evaluation indicators for this study. The expressions for each indicator are as follows:

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - {\hat{y}}_{i}|

(17)

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(18)

\mathrm{RA} = 1 - \frac{1}{n} \sum_{i = 1}^{n} \frac{|y_{i} - {\hat{y}}_{i}|}{y_{i}}

(19)

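where y_i and {\hat{y}}_{i} denote the actual and predicted RUL of the i-th sample, and n here denotes the number of evaluated samples. Smaller values of MAE and RMSE indicate smaller prediction errors, and larger values of RA indicate higher model accuracy.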
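The three indicators in Equations (17)–(19) can be computed with a few lines of Python. The function names and the example values are hypothetical; note that RA as defined in Equation (19) requires every actual value y_i to be nonzero.

```python
import math

def mae(y, y_hat):
    """Equation (17): mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Equation (18): root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y, y_hat)) / len(y))

def relative_accuracy(y, y_hat):
    """Equation (19): one minus the mean relative error (all y_i nonzero)."""
    return 1.0 - sum(abs(t - p) / t for t, p in zip(y, y_hat)) / len(y)

# Hypothetical actual and predicted RUL values for three samples.
y_true = [100.0, 50.0, 20.0]
y_pred = [90.0, 55.0, 20.0]
scores = (mae(y_true, y_pred), rmse(y_true, y_pred), relative_accuracy(y_true, y_pred))
```

Because RMSE squares the errors before averaging, it penalizes the single 10-cycle error more heavily than MAE does, which is why the two indicators are reported together.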

4.2. Configuration of the BiLSTM-DANN

The GT health assessment model established in this paper consists of 3 parts: a feature extractor, a remaining useful life (RUL) predictor, and a domain discriminator. The parameter settings of each component are shown in Table 2, Table 3 and Table 4. The hyperparameter settings during the training process are shown in Table 5.

4.3. RUL Prediction Comparisons

To verify the effectiveness of the proposed model, BiLSTM and DCNN, which are widely used in this field, were selected as baseline models, and comparative experiments were conducted on the C-MAPSS FD001 and FD003 test sets. Meanwhile, to provide a more comprehensive assessment of the effectiveness of the model, Table 6 further supplements the published results of related research models on the above datasets. Specifically, Figure 6 presents the distribution of remaining useful life (RUL) prediction results of the proposed model, BiLSTM, and DCNN on the test sets. Figure 7 further selects two representative engines and shows the comparison curves of the RUL predicted values and actual values of the above three models over time. Table 6 summarizes the quantitative results of the proposed model, the baseline models, and other related research models. Among them, the proposed model and the baseline models are evaluated using MAE, RMSE, and RA, whereas the results of other related research models, including CNN [25], ARR-Transformer [26], and Transformer-MCD-CP [27], are all taken from existing studies. Since differences may exist among different studies in terms of data preprocessing, sensor selection, RUL labeling strategy, and evaluation protocols, the unreported metrics are denoted by “—”, and the related comparison results are provided for reference only.

As shown in Figure 6 and Figure 7, all three methods (BiLSTM, DCNN, and BiLSTM-DANN) characterize the overall variation trend of engine RUL on the FD001 and FD003 test sets reasonably well; however, BiLSTM-DANN achieves the closest fit to the true RUL curves. For the overall engine fleet, Figure 6 shows that the RUL predictions of BiLSTM-DANN across different engines are closer to the ground truth, with smaller overall errors, indicating better predictive stability at the sample level. For representative individual engines, Figure 7 further shows that BiLSTM, by relying on its bidirectional recurrent structure to model global temporal dependencies, captures the overall trend of the health evolution process effectively, but may still exhibit deviations during health state transition stages and in time intervals with rapid local changes. DCNN, by contrast, places greater emphasis on extracting local temporal patterns and is therefore more sensitive to local variations; when the input parameters fluctuate substantially, it is more likely to introduce larger local errors. BiLSTM-DANN retains the temporal modeling capability of BiLSTM, while the adversarial training introduced by DANN imposes additional constraints on the feature extraction process, enabling the model to learn more discriminative and robust health features. As a result, even on source-domain test samples, it exhibits smaller local deviations and higher predictive stability. BiLSTM-DANN therefore tracks the true RUL trajectory more accurately during the RUL plateau stage, in the vicinity of inflection points, and during the later rapid decline stage, demonstrating higher prediction accuracy and stronger cross-condition generalization capability.

As Table 6 shows, BiLSTM-DANN achieves the best results on both the FD001 and FD003 test sets. On FD001, its MAE, RMSE, and RA are 4.73, 5.80, and 0.96, respectively; on FD003, these values improve further to 1.97, 2.60, and 0.98. These results outperform BiLSTM and DCNN on all three metrics, and the other related methods listed in the table on the metrics they report. The advantage is most pronounced for RMSE: relative to the better of the two baselines, RMSE falls from 7.04 to 5.80 on FD001 and from 3.57 to 2.60 on FD003, indicating that the method reduces large prediction errors more effectively.
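The size of the RMSE advantage can be made explicit with a short calculation. The sketch below is illustrative only: it copies the RMSE values of BiLSTM, DCNN, and BiLSTM-DANN from Table 6 and computes the relative reduction achieved by BiLSTM-DANN. The helper names `relative_reduction` and `rmse_reductions` are hypothetical and are not part of the original study.

```python
# RMSE values copied from Table 6 (FD001 and FD003 test sets).
rmse = {
    "BiLSTM": {"FD001": 8.26, "FD003": 3.57},
    "DCNN": {"FD001": 7.04, "FD003": 4.15},
    "BiLSTM-DANN": {"FD001": 5.80, "FD003": 2.60},
}


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` lowers the error of `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


def rmse_reductions(table: dict, proposed: str = "BiLSTM-DANN") -> dict:
    """Relative RMSE reduction of `proposed` against every other model."""
    out = {}
    for model, scores in table.items():
        if model == proposed:
            continue
        out[model] = {
            dataset: round(relative_reduction(value, table[proposed][dataset]), 1)
            for dataset, value in scores.items()
        }
    return out


reductions = rmse_reductions(rmse)
for model, per_dataset in reductions.items():
    print(model, per_dataset)
```

With the Table 6 values, the RMSE of BiLSTM-DANN is roughly 30% and 27% lower than that of BiLSTM on FD001 and FD003, and roughly 18% and 37% lower than that of DCNN.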

4.4. GT Health Assessment

The health of a PG9171E single-shaft heavy-duty industrial GT, manufactured by General Electric (GE, Boston, MA, USA), is assessed. The data originate from the operational records of a power plant and cover ten days in 2019: 30 January, 23 February, 19 March, 21 May, 18 June, 1 July, 30 August, 23 September, 14 October, and 1 November. Figure 8 shows the operational data for the four target-domain sensor parameters selected in Section 3.2, namely the compressor outlet temperature T_c, turbine exhaust temperature T_t, compressor discharge pressure P_c, and power output P_e; the x-axis represents the data series. These data serve as the target domain for the BiLSTM-DANN model and are used to assess the health of the GT. The assessment results are shown in Figure 9, where the x-axis again indicates the data series and the y-axis, labeled “E_HI”, represents the assessed health indicator.

It should be clarified that the RUL labels of the C-MAPSS source domain are not treated directly as the RUL of the target-domain GT. Industrial GTs generally have very long life-cycles and undergo frequent water-washing maintenance during operation, so their health changes accordingly over time. Therefore, the source-domain RUL labels serve as supervisory signals that drive the BiLSTM-DANN model to learn temporal features related to health evolution. Through cross-domain feature alignment, the health features learned from the source-domain life-cycle data are transferred to the GT target domain and, combined with variations in the key health parameters T_c, T_t, P_c, and P_e, are used to characterize the overall degree of GT health deterioration. For this reason, the target-domain output is defined as a comprehensive health indicator, E_HI, which provides an overall assessment of GT health and can serve as a quantitative indicator for deciding whether the GT requires water-washing maintenance.

The GT underwent water washing on 5 May, 16 June, 27 August, and 21 October; the solid red lines in Figure 9 mark these events. Because the sensor parameters are noisy, the assessment values in Figure 9 are noisy as well, but each data segment shows a relatively stable trend. The mean E_HI of each segment is listed in Table 7. The first three segments, recorded during continuous operation, show a gradual decline in performance. Following the 5 May water wash, health on 21 May improved markedly compared with 19 March, with the segment mean increasing by 3.28. After the next wash on 16 June, performance on 18 June was only marginally better than on 21 May, with the mean improving by 0.56. After the wash on 27 August, the mean on 30 August was 1.92 higher than on 1 July, and after the wash on 21 October, the mean on 1 November was 1.90 higher than on 14 October. Note that E_HI, as a comprehensive indicator of GT health, is scaled in the same way as the RUL labels of the source-domain C-MAPSS dataset, which are limited to the range 0 to 125 in this study; values greater than 100 can therefore occur. The GT investigated here has been in service for 13 years and is expected to reach a total service life of 20 to 30 years, so values exceeding 100 are considered reasonable in this context.
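The before-and-after comparisons quoted above can be reproduced directly from the segment means in Table 7. The following sketch is illustrative only: it pairs each water-washing date with the last recorded segment before it and the first recorded segment after it. The function name `recovery_per_wash` is hypothetical, and the washing dates are assumed to fall in the same year as the Table 7 dates.

```python
# Segment means of E_HI copied from Table 7 (keys are YYYYMMDD dates).
mean_ehi = {
    "20190130": 113.42, "20190223": 112.23, "20190319": 103.98,
    "20190521": 107.26, "20190618": 107.82, "20190701": 105.63,
    "20190830": 107.55, "20190923": 104.48, "20191014": 103.43,
    "20191101": 105.33,
}

# Water-washing dates stated in the text; the year 2019 is an assumption
# taken from the Table 7 date format.
wash_dates = ["20190505", "20190616", "20190827", "20191021"]


def recovery_per_wash(means: dict, washes: list) -> dict:
    """Change in mean E_HI from the last segment before each wash
    to the first segment after it."""
    days = sorted(means)  # YYYYMMDD strings sort chronologically
    result = {}
    for wash in washes:
        before = [d for d in days if d < wash]
        after = [d for d in days if d > wash]
        if before and after:
            result[wash] = round(means[after[0]] - means[before[-1]], 2)
    return result


recovery = recovery_per_wash(mean_ehi, wash_dates)
for wash, delta in recovery.items():
    print(wash, delta)
```

The four differences match the increases of 3.28, 0.56, 1.92, and 1.90 reported in the text.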

To further characterize the health features of the GT under continuous operating conditions, one segment of steady-state data was selected from each of four consecutive days of operation provided by the enterprise. The selected data are shown in Figure 10, and the resulting health assessment trajectories are presented in Figure 11, where the red vertical line indicates the time of water washing and the gray dashed lines denote the boundaries between dates.

As Figure 11 shows, during continuous operation before water washing, the health assessment values are concentrated within a relatively stable range, with only limited fluctuations and a slight downward trend. This indicates that, over a short timescale of continuous operation, the health decline of the GT accumulates slowly and its health state does not change significantly. Compared with the health differences observed between months, the performance evolution during short-term continuous operation is more stable.

After water washing, the continuous operation trajectory exhibits a markedly different pattern. The health assessment values rise from the lower level observed before washing to a higher range and largely remain at this improved level during the following days. This indicates that water-washing maintenance restores the unit condition, improving gas flow capacity and compression efficiency and thereby alleviating the accumulated health decline.

The assessment results reveal two important characteristics of the health decline and maintenance recovery process of the real GT. First, during short-term continuous operation, the health assessment values change only slightly, indicating that the health decline process is progressive and cumulative. Second, maintenance operations such as water washing effectively improve GT performance, which underlines the engineering value of research on condition-based water-washing maintenance.

Overall, the health assessment results were reasonable and consistent with actual conditions, validating the effectiveness of the research methodology presented in this paper.

5. Conclusions

To address the scarcity of labeled full-life-cycle data for GT health assessment, this paper transfers mature deep learning and domain adaptation techniques from the aero-engine field to whole-unit GT health assessment. Leveraging the commonalities in aerothermal and structural characteristics between aero-engines and GTs, the study uses the fully labeled C-MAPSS dataset as the source domain and constructs a BiLSTM-DANN transfer learning model to assess and predict the whole-machine health of GTs.

First, based on mechanism analysis and physical properties, the sensor parameters of aero-engines and GTs were physically mapped and screened, and similarity transformation and normalization were applied to eliminate differences in dimension and operating condition. A sliding window method then reconstructed samples from the preprocessed time-series data for model input. Subsequently, a two-layer BiLSTM was used as the feature extractor of the DANN to capture bidirectional dependencies within the time-series data. The adversarial training between the domain discriminator and the feature extractor drives the feature extractor to learn domain-invariant, intrinsic health characteristics. Comparative experiments on the FD001 and FD003 test sets showed that the BiLSTM-DANN model outperforms the conventional BiLSTM and DCNN models in terms of MAE, RMSE, and RA, achieving higher prediction accuracy and stronger generalization capability. When applied to operational data from the target-domain GT, it produces assessments that are consistent with physical principles.
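The sliding-window reconstruction mentioned above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the window length, the stride, and the choice to label each window with the value at its last time step are assumptions, and the function name `sliding_windows` is hypothetical.

```python
from typing import List, Sequence, Tuple


def sliding_windows(
    series: Sequence[Sequence[float]],
    labels: Sequence[float],
    window: int,
    stride: int = 1,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Cut a (time, features) series into overlapping windows.

    Each sample holds `window` consecutive time steps and is labelled
    with the label of its last time step (an assumed convention).
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(series) != len(labels):
        raise ValueError("series and labels must have the same length")
    samples, targets = [], []
    for start in range(0, len(series) - window + 1, stride):
        end = start + window
        samples.append([list(step) for step in series[start:end]])
        targets.append(labels[end - 1])
    return samples, targets


# Toy run: 6 time steps with 4 features each, mirroring the four aligned
# parameters; the numbers themselves are placeholders, not real data.
toy_series = [[float(t)] * 4 for t in range(6)]
toy_rul = [float(5 - t) for t in range(6)]
X, y = sliding_windows(toy_series, toy_rul, window=3)
print(len(X), len(X[0]), len(X[0][0]), y)
```

Each resulting sample has shape (window, features), which is the layout a recurrent feature extractor such as a BiLSTM typically consumes.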

This study provides a transfer learning solution for assessing and predicting the health of GTs in scenarios with sparse labels and presents a new approach to fully data-driven GT health assessment. However, because the number of measurement parameters currently available for GTs is limited, only four alignment parameters, selected through mechanism analysis, were used to develop the health assessment model. In future work, data-based feature importance analysis will be conducted to provide more comprehensive and reliable validation of the GT health assessment model. In addition, the current research still lacks full-life-cycle GT data, and the analysis of the real GT in this paper leads mainly to qualitative conclusions rather than strictly statistical quantitative metrics. In subsequent work, the health changes of the GT under study will be monitored continuously, and trend analysis and evolutionary pattern prediction will be carried out over its entire life-cycle.

Author Contributions

Conceptualization, Q.L. and F.L.; methodology, B.M. and F.L.; software, X.F. and D.D.; validation, X.F., D.D. and H.A.; formal analysis, B.M.; investigation, B.M.; resources, B.M. and X.F.; data curation, B.M.; writing—original draft preparation, B.M.; writing—review and editing, X.F. and Q.L.; visualization, B.M.; supervision, Q.L. and F.L.; project administration, F.L.; funding acquisition, F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Bingzhou Ma and Xueting Fu were employed by Jiangsu Huaiyin Power Generation Co., Ltd. and Jiangsu Xinhai Power Generation Co., Ltd., respectively. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Zhou, D.; Yao, Q.; Wu, H.; Ma, S.; Zhang, H. Fault diagnosis of gas turbine based on partly interpretable convolutional neural networks. Energy 2020, 200, 117467.
Song, W.Q.; Shen, D.H. Research on gas path fault diagnosis technology of gas turbine based on deep extreme learning machine. J. Beijing Univ. Chem. Technol. (Nat. Sci. Ed.) 2024, 51, 89–98.
Kurz, R.; Brun, K. Degradation Effects on Industrial Gas Turbines. J. Eng. Gas Turbines Power 2009, 131, 062401.
Wang, L.; Li, Y.G.; Ghafir, M.F.A. Rough Set Diagnostic Frameworks for Gas Turbine Fault Classification. In Proceedings of the ASME Turbo Expo 2013: Turbine Technical Conference and Exposition, San Antonio, TX, USA, 3–7 June 2013; Volume 2.
Capata, R. An artificial neural network-based diagnostic methodology for gas turbine path analysis—Part I: Introduction. Energy Ecol. Environ. 2016, 1, 343–350.
Yan, W. Detecting Gas Turbine Combustor Anomalies Using Semi-supervised Anomaly Detection with Deep Representation Learning. Cogn. Comput. 2020, 12, 398–411.
Tang, J.P.; Wang, H.J.; Zhong, J.L.; Liu, S.C.; Zhang, X.; Xu, W.F. Fault diagnosis method for gas turbine rotor based on WDCNN-SVM deep transfer learning. J. Electron. Meas. Instrum. 2021, 35, 115–123.
Sarwar, U.; Muhammad, M.; Mokhtar, A.A.; Khan, R.; Behrani, P.; Kaka, S. Hybrid intelligence for enhanced fault detection and diagnosis for industrial gas turbine engine. Results Eng. 2024, 21, 101841.
Cheng, K.; Wang, Y.; Yang, X.; Zhang, K.; Liu, F. An intelligent online fault diagnosis system for gas turbine sensors based on unsupervised learning method LOF and KELM. Sens. Actuators A. Phys. 2024, 365, 114872.
Bunyan, S.T.; Khan, Z.H.; Al-Haddad, L.A.; Dhahad, H.A.; Al-Karkhi, M.I.; Ogaili, A.A.F.; Al-Sharify, Z.T. Intelligent thermal condition monitoring for predictive maintenance of gas turbines using machine learning. Machines 2025, 13, 401.
An, S.; Han, J.; Kwon, D. Gas turbine fault diagnosis based on sliding radar chart. Int. J. Precis. Eng. Manuf.-Green Technol. 2026.
Fahmi, A.T.W.K.; Kashyzadeh, K.R.; Ghorbani, S. Advancements in gas turbine fault detection: A machine learning approach based on the temporal convolutional network-autoencoder model. Appl. Sci. 2024, 14, 4551.
Yu, B.; Cao, L.; Xie, D.; Chen, J.; Zhang, H. Fault diagnosis of gas turbine based on feature fusion cascade neural network. Energy 2025, 321, 135439.
El-Brawany, M.A.; Ibrahim, D.A.; Elminir, H.K.; Elattar, H.M.; Ramadan, E.A. Artificial intelligence-based data-driven prognostics in industry: A survey. Comput. Ind. Eng. 2023, 184, 109605.
Arias Chao, M.; Kulkarni, C.; Goebel, K.; Fink, O. Aircraft Engine Run-to-Failure Dataset under Real Flight Conditions for Prognostics and Diagnostics. Data 2021, 6, 5.
Cheng, H.; Kong, X.; Wang, Q.; Ma, H.; Yang, S.; Xu, K. Remaining useful life prediction combined dynamic model with transfer learning under insufficient degradation data. Reliab. Eng. Syst. Saf. 2023, 236, 109292.
Zhang, W.; Li, X.; Ma, H.; Luo, Z.; Li, X. Transfer learning using deep representation regularization in remaining useful life prediction across operating conditions. Reliab. Eng. Syst. Saf. 2021, 211, 107556.
Saxena, A.; Goebel, K.; Simon, D.; Eklund, N. Damage propagation modeling for aircraft engine run-to-failure simulation. In Proceedings of the IEEE International Conference on Prognostics and Health Management, Denver, CO, USA, 6–9 October 2008.
Wu, Y.; Yuan, M.; Dong, S.; Lin, L.; Liu, Y. Remaining useful life estimation of engineered systems using vanilla LSTM neural networks. Neurocomputing 2018, 284, 167–179.
Huang, C.G.; Huang, H.Z.; Li, Y.F. A Bidirectional LSTM Prognostics Method Under Multiple Operational Conditions. IEEE Trans. Ind. Electron. 2020, 67, 761–770.
Wu, K.; Li, J.; Zuo, L.; Zuo, L.; Lu, K.; Shen, H.T. Weighted adversarial domain adaptation for machine remaining useful life prediction. IEEE Trans. Instrum. Meas. 2022, 71, 3526511.
Ye, Z.; Yu, J. State-of-health estimation for lithium-ion batteries using domain adversarial transfer learning. IEEE Trans. Power Electron. 2021, 37, 3528–3543.
Costa, P.R.D.O.D.; Akçay, A.; Zhang, Y.; Kaymak, U. Remaining useful lifetime prediction via deep domain adaptation. Reliab. Eng. Syst. Saf. 2020, 195, 106682.
Xie, K.; Zhang, Q.; Yang, P.; Liu, Q. A novel similarity-based remaining useful life prediction method under multiple failure modes with variable sensors. Measurement 2026, 258, 119311.
Babu, G.S.; Zhao, P.; Li, X.L. Deep convolutional neural network based regression approach for estimation of remaining useful life. In Database Systems for Advanced Applications: DASFAA 2016, Part I; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2016; Volume 9642, pp. 214–228.
Kim, G.; Choi, J.G.; Lim, S. Using transformer and a reweighting technique to develop a remaining useful life estimation method for turbofan engines. Eng. Appl. Artif. Intell. 2024, 133, 108475.
Yang, C.L.; Meles, T.Y.; Yilma, A.A.; Teshome, M.M. Uncertainty aware predictive maintenance using a hybrid Transformer with Monte Carlo Dropout and conformal prediction. Ain Shams Eng. J. 2026, 17, 103992.

Figure 1. LSTM Neuron Structure.

Figure 2. BiLSTM Network Architecture.

Figure 3. BiLSTM-DANN Model Architecture.

Figure 4. Data after similarity transformation and normalization (Engine No. 1).

Figure 5. RUL Trend (Engine No. 1 on FD001).

Figure 6. Prediction Results of Different Models on the FD001 and FD003 Test Sets.

Figure 7. RUL prediction results of different models for selected engines on the FD001 and FD003 test sets.

Figure 8. GT sensor parameters on selected operating dates.

Figure 9. GT health assessment results on selected operating dates.

Figure 10. GT operational data under continuous operating conditions before and after water washing.

Figure 11. Health assessment of GT under continuous operating conditions before and after water washing.

Table 1. C-MAPSS Dataset.

Dataset	Number of Working Conditions	Number of Failure Modes
FD001	1	1
FD002	6	1
FD003	1	2
FD004	6	2

Table 2. Configuration of Feature Extractor.

Sub-Module	Parameter	Value
First BiLSTM Layer	Input Dimension	4
	Number of Hidden nodes	48
	Bidirectional Output Dimension	96
	Dropout Rate	0.05
Second BiLSTM Layer	Input Dimension	96
	Number of Hidden nodes	32
	Bidirectional Output Dimension	64
	Dropout Rate	0.05
Fully Connected Layer (FC1)	Input Dimension	64
	Output Dimension	16

Table 3. Configuration of RUL Predictor.

Sub-Module	Parameter	Value
First Fully Connected Layer (FC2)	Input Dimension	16
	Output Dimension	8
	Activation Function	ReLU
	Dropout Rate	0.1
Second Fully Connected Layer (FC3)	Input Dimension	8
	Output Dimension	1
	Output Result	RUL Prediction Value

Table 4. Configuration of Domain Discriminator.

Sub-Module	Parameter	Value
First Fully Connected Layer (FC4)	Input Dimension	16
	Output Dimension	12
	Activation Function	ReLU
Second Fully Connected Layer (FC5)	Input Dimension	12
	Output Dimension	2
	Classification Function	Softmax
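
The layer sizes in Tables 2 to 4 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it verifies that each layer's output dimension matches the next layer's input dimension and estimates the number of trainable parameters per sub-network. The parameter formula assumes the textbook LSTM with one bias vector per gate; frameworks that store two bias vectors per gate report slightly higher counts. All function and variable names are hypothetical.

```python
def lstm_params(input_dim: int, hidden: int, bidirectional: bool = True) -> int:
    """Parameters of one LSTM layer: 4 gates, each with input weights,
    recurrent weights, and a single bias vector (textbook convention)."""
    per_direction = 4 * (input_dim * hidden + hidden * hidden + hidden)
    return per_direction * (2 if bidirectional else 1)


def dense_params(input_dim: int, output_dim: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return input_dim * output_dim + output_dim


# (input dimension, hidden nodes) of the two BiLSTM layers, from Table 2.
BILSTM = [(4, 48), (96, 32)]
# (input dimension, output dimension) of the dense layers, from Tables 2-4.
FC = {"FC1": (64, 16), "FC2": (16, 8), "FC3": (8, 1),
      "FC4": (16, 12), "FC5": (12, 2)}


def dimensions_consistent() -> bool:
    """Check that every output dimension feeds the next input dimension."""
    chain_ok = all(2 * BILSTM[i][1] == BILSTM[i + 1][0]
                   for i in range(len(BILSTM) - 1))
    to_fc1 = 2 * BILSTM[-1][1] == FC["FC1"][0]
    heads_ok = FC["FC1"][1] == FC["FC2"][0] == FC["FC4"][0]
    inner_ok = (FC["FC2"][1] == FC["FC3"][0]
                and FC["FC4"][1] == FC["FC5"][0])
    return chain_ok and to_fc1 and heads_ok and inner_ok


counts = {
    "feature_extractor": sum(lstm_params(i, h) for i, h in BILSTM)
    + dense_params(*FC["FC1"]),
    "rul_predictor": dense_params(*FC["FC2"]) + dense_params(*FC["FC3"]),
    "domain_discriminator": dense_params(*FC["FC4"]) + dense_params(*FC["FC5"]),
}
print(dimensions_consistent(), counts)
```

Under these assumptions the tabulated dimensions are mutually consistent, and almost all parameters sit in the BiLSTM feature extractor, while the RUL predictor and domain discriminator heads are very small.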

Table 5. Training Hyperparameters.

Parameter	Value
Optimizer	Adam
Number of Training Epochs	150
Learning Rate	0.001
Prediction Loss Function	MSE
Domain Discrimination Loss Function	Cross-Entropy Loss

Table 6. Accuracy of Different Models on the FD001 and FD003 test sets.

Model	FD001 MAE	FD001 RMSE	FD001 RA	FD003 MAE	FD003 RMSE	FD003 RA
CNN	—	18.45	—	—	19.82	—
ARR-Transformer	—	11.36	—	—	11.28	—
Transformer-MCD-CP	8.11	11.71	—	7.21	10.50	—
BiLSTM	5.89	8.26	0.93	2.43	3.57	0.97
DCNN	5.10	7.04	0.94	3.28	4.15	0.96
BiLSTM-DANN	4.73	5.80	0.96	1.97	2.60	0.98

Table 7. Average E_HI for each segment.

Date	20190130	20190223	20190319	20190521	20190618	20190701	20190830	20190923	20191014	20191101
Mean	113.42	112.23	103.98	107.26	107.82	105.63	107.55	104.48	103.43	105.33
