Aircraft Engine Run-to-Failure Dataset under Real Flight Conditions for Prognostics and Diagnostics

Abstract: A key enabler of intelligent maintenance systems is the ability to predict the remaining useful lifetime (RUL) of system components, i.e., prognostics. The development of data-driven prognostics models requires datasets with run-to-failure trajectories. However, large representative run-to-failure datasets are often unavailable in real applications because failures are rare in many safety-critical systems. To foster the development of prognostics methods, we develop a new realistic dataset of run-to-failure trajectories for a fleet of aircraft engines under real flight conditions. The dataset was generated with the Commercial Modular Aero-Propulsion System Simulation (CMAPSS) model developed at NASA. The damage propagation modelling used in this dataset builds on the modelling strategy from previous work and incorporates two new levels of fidelity. First, it considers real flight conditions as recorded on board a commercial jet. Second, it extends the degradation modelling by relating the degradation process to the operation history. The dataset also provides health and fault class labels. Therefore, besides its applicability to prognostics problems, the dataset can be used for fault diagnostics.


Introduction
Failures of safety-critical systems such as aircraft engines can cause significant economic disruptions and have potentially high social costs. The prediction of a system's failure time is therefore of great importance for maintaining the functionality of safety-critical systems and of society. The problem of predicting how long a particular industrial asset is going to operate until a system failure occurs, i.e., predicting RUL, is also referred to as prognostics [1]. Deploying successful prognostic methods in real-life applications would enable the design of intelligent maintenance strategies that determine, with a sufficiently long lead time before failure, when interventions need to be performed. Such maintenance strategies have the potential to reduce costs, machine downtime, and the risk of potentially catastrophic consequences if the systems are not maintained in time.
In light of their superior learning capabilities in a wide range of application fields, Machine Learning (ML), in general, and Deep Learning (DL), in particular, are promising candidates to tackle the challenges involved in the design of intelligent maintenance approaches [2]. This idea has been reinforced by the recent availability of large volumes of condition monitoring (CM) data from critical assets. As multiple research studies have pointed out [3][4][5][6][7], the CM data provide an untapped potential to develop data-driven algorithms for various predictive maintenance applications.
Data 2021, 6, 5
The development of data-driven prognostics models requires the availability of datasets with run-to-failure trajectories. These trajectories must comprise a set of time series of CM data along with the corresponding time-to-failure labels. While CM data are often available in abundance, they typically lack the corresponding time-to-failure labels because failures are rare in safety-critical systems and because of excessive preventive maintenance. Moreover, due to the sensitive nature of failures and the potential legal implications, manufacturers and operators have been reluctant to share prognostics datasets of their assets openly. As a result, over the last decade, only a very limited number of datasets have been made available to the scientific community for the development of prognostics models. At present, most of the available datasets are synthetic datasets generated with simulators or developed in a lab environment for simple systems by governmental and academic institutions [8,9]. While the availability of even such limited datasets is one of the most relevant contributors to the considerable progress of prognostics and health management (PHM) in the last decade, these datasets lack important factors of complexity that are present in real systems. As a consequence, the developed data-driven prognostics algorithms are often not transferable to real applications.
Since its release as the PHM Challenge [10] in 2008, the CMAPSS dataset [11] has been one of the most widely used prognostics datasets. Some recent examples that are also among the best performing prognostics models applied to the CMAPSS dataset are deep learning based methods such as convolutional neural networks (CNN) [12,13], long short-term memory networks (LSTM) [14][15][16][17][18][19] or hybrid networks combining CNN and LSTM layers [20,21]. The CMAPSS dataset provides simulated run-to-failure trajectories of a fleet comprising large turbofan engines. However, the represented flight conditions are restricted to six snapshots during a standard cruise phase, and the onset of an abnormal degradation (i.e., presence of a fault signature) is not dependent on the past operating profile. Therefore, the onset of the fault cannot be predicted; only the evolution of the fault can. Consequently, there is a fidelity gap in the dataset, as the simulated degradation trajectories lack important factors of complexity that are present in real engines. Bringing higher fidelity to the degradation and the operating conditions represented in the CMAPSS dataset could improve the usability and the transferability of the developed data-driven models to real-world applications.
In this work, we introduce improvements and further developments to the original CMAPSS dataset with respect to two main aspects. First, we simulate complete flights as recorded on board a commercial jet, covering climb, cruise and descent flight conditions corresponding to different commercial flight routes [22]. Second, we increase the fidelity of degradation modelling by relating the onset of the degradation process to the operation history. To further extend the applicability of this dataset to a range of different case studies, we also include the health condition (i.e., healthy or faulty) in the dataset. We refer to the new CMAPSS dataset as N-CMAPSS. The procedure for generating this dataset is shown schematically in Figure 1 and described in detail in the Methods section.
The new prognostics dataset proposed here will help to facilitate the development of deep learning algorithms for predictive maintenance applications that are more easily transferable to real applications.

Figure 1. Generation process of the new CMAPSS dataset (i.e., N-CMAPSS) based on real flight data. First, we define the flight data as recorded on board a commercial jet. Second, the degradation of the engine components is imposed. Third, the resulting degraded flight is simulated. Fourth, the health condition is evaluated and the unit continues flying with increasing degradation until the health index of the engine has reached zero, i.e., HI = 0, which defines the end of life. Finally, sensor noise is added to the simulated engine response.

CMAPSS Model
An important requirement for the generation of realistic run-to-failure trajectories is the availability of a suitable system model that allows variations of health conditions at sub-system level and the simulation of the output sensor measurements. The CMAPSS dynamical model is a high-fidelity computer model for the simulation of a realistic large commercial turbofan engine. Figure 2 shows a schematic representation of the engine along with the corresponding station numbers as defined in the CMAPSS model documentation [23]. In addition to the engine thermodynamic model, the package includes an atmospheric model capable of operation at (i) altitudes from sea level to 40,000 ft, (ii) Mach numbers from 0 to 0.90, and (iii) sea-level temperatures from −60 to 103 °F. The package also includes a power-management system that allows the engine to be operated over a wide range of thrust levels throughout the full range of flight conditions.
The CMAPSS system model has the form of a coupled system of nonlinear equations. The inputs of the system model are divided into scenario-descriptor operating conditions w and unobservable model health parameters θ. The outputs of the system model are estimates of the measured physical properties $x_s$ and unobserved properties $x_v$ that are not part of the condition monitoring signals (i.e., virtual sensors). The nonlinear system model is denoted as:

$$[x_s, x_v] = F(w, \theta)$$

The unobservable model health parameters θ are model tuners and fall in the class referred to as quality parameters (i.e., component efficiencies, flow, input scalars, output scalars, and/or adders). These model parameters are used to simulate the deteriorated behaviour of the system. Concretely, in our work, all the rotating sub-components of the engine, i.e., fan, low pressure compressor (LPC), high pressure compressor (HPC), low pressure turbine (LPT) and high pressure turbine (HPT), can be affected by degradation in flow and efficiency.
In this work, we extended the number of sub-components that can be affected by the degradation from two to five.

Flight Data
Real flight conditions as recorded on board a commercial jet were taken as input to the CMAPSS model (i.e., w). We divided the flight conditions into three flight classes according to the flight length. Table 1 shows, as an example, the flight length range and the number of different flights in the DASHlink-Flight Data For Tail 687 [22]. It is assumed that each unit of the fleet only operates a particular flight class. Therefore, the assignment of a flight class to each unit is done only once.

Data Records
The N-CMAPSS dataset provides synthetic run-to-failure degradation trajectories of a fleet of turbofan engines with unknown initial health states subject to real flight conditions. At present, the N-CMAPSS dataset contains eight sets of data from 128 units and seven different failure modes affecting the flow (F) and/or efficiency (E) of all the rotating sub-components. Table 2 provides an overview of flight classes and failure modes for each of the sets of data provided. Each set of data is stored in a Hierarchical Data Format version 5 (HDF5) file. The dataset is publicly accessible at the repository: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/. Scripts in the form of Jupyter notebooks are also available in the data repositories to demonstrate how to load the data, reproduce the analysis of this manuscript and apply simple analyses to subvolumes of data. The online dataset will be updated when new degradation trajectories are computed.
Each data file provides two sets of data: the development dataset and the test dataset. Each of them contains six types of variables: the operative conditions w, the measured signals $x_s$, the virtual sensors $x_v$, the engine health parameters θ, the RUL label and the auxiliary data (i.e., the unit number u, the flight cycle number c, the flight class $F_c$ and the health state $h_s$). In addition, the names of the variables within w, $x_s$, $x_v$, θ and the auxiliary data are provided. Table 3 shows an overview of the 17 variables stored in the .h5 file. Descriptions and units are as given in the CMAPSS model documentation [23]. RUL is provided in units of cycles.

Engine health parameters θ (rows 6 to 10):
# | Symbol | Description | Units
6 | HPC_flow_mod | HPC flow modifier | -
7 | HPT_eff_mod | HPT efficiency modifier | -
8 | HPT_flow_mod | HPT flow modifier | -
9 | LPT_eff_mod | LPT efficiency modifier | -
10 | LPT_flow_mod | LPT flow modifier | -
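As a rough illustration of how one of these files might be read, the following Python sketch uses the third-party h5py package. The HDF5 dataset names (`W_dev`, `X_s_dev`, ..., `A_var`) are assumptions that are not stated in the text above; they are chosen so that six arrays per subset plus five arrays of variable names add up to the 17 variables mentioned, and they should be checked against the keys of the actual file.

```python
# ASSUMED dataset names (not given in the text): six arrays per subset
# (w, x_s, x_v, theta, RUL, auxiliary) plus five arrays of variable names.
ARRAYS = ("W", "X_s", "X_v", "T", "Y", "A")
NAMED = ("W", "X_s", "X_v", "T", "A")  # the RUL label has no separate name array


def expected_keys():
    """Return the 17 dataset names this sketch expects in one .h5 file."""
    keys = [f"{a}_{subset}" for subset in ("dev", "test") for a in ARRAYS]
    keys += [f"{a}_var" for a in NAMED]
    return keys


def load_subset(path, subset="dev"):
    """Read one subset ('dev' or 'test') into dictionaries of arrays and names."""
    import h5py  # third-party package, imported lazily

    with h5py.File(path, "r") as hdf:
        # Read each array fully into memory.
        data = {a: hdf[f"{a}_{subset}"][...] for a in ARRAYS}
        names = {}
        for a in NAMED:
            raw = hdf[f"{a}_var"][...]
            # Variable names may be stored as byte strings; decode if so.
            names[a] = [n.decode("utf-8") if isinstance(n, bytes) else str(n)
                        for n in raw]
    return data, names
```

Calling `load_subset(path, "dev")` with the path of a downloaded file would then return the development arrays keyed by the assumed prefixes, together with the variable names for each group.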

Methods
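The method used for generation of the N-CMAPSS dataset follows the methodology delineated in [11] and depicted in Figure 1. In brief, the method corresponds to the following process:
1. Define flight conditions. Real flight conditions as recorded on board a commercial jet (i.e., NASA DASHlink [22] data) are taken as input to an engine simulator.
2. Impose degradation. The degradation of the engine components is imposed.
3. Simulate flight. The resulting degraded flight is simulated.
4. Evaluate health condition. The health condition is evaluated, and the unit continues flying with increasing degradation until the health index of the engine has reached zero (i.e., HI = 0), which defines the end of life.
5. Add sensor noise. Sensor noise is added to the simulated data to account for the variability of real sensor readings.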
In the following, we describe the key steps of the data generation process outlined above in more detail.

Degradation Model
The degradation of each engine is modelled as the combination of three contributors: an initial degradation, a normal degradation and an abnormal degradation. The dataset generation process assumes failure modes exhibiting a continuous degradation of the main rotating engine sub-components: fan, LPC, HPC, HPT and LPT. The degradation effects are modelled by adjustments of the flow capacity and efficiency of these engine sub-components (i.e., the engine health parameters θ).
Initial degradation. Due to manufacturing and assembly tolerances, each unit of the fleet has slightly different initial wear in the engine sub-components. Degradation due to this initial wear is not considered abnormal but can make a difference in the useful operational life of a component. Following the original work, this initial wear is modelled by variations in the flow and efficiency of the various sub-components. A uniform random distribution U(0, 0.01) is assumed for each of the sub-components. The magnitude of such variations is relatively low, resulting in a health index within the range [0.9, 1.0]. We denote the initial degradation as $\delta_0$.
Normal degradation. In addition to the initial wear, the system's components also experience degradation due to wear and tear resulting from usage. This type of degradation is considered normal and is modelled as a linearly decreasing trend given by:

$$\delta_n(t) = a_n t + \delta_0$$

where $a_n = -0.001$ is the slope of the degradation, and t refers to the time in units of cycles, i.e., flights.
Transition from normal to abnormal degradation. At some time during an engine's life, its health state might transition to an abnormal state resulting from the presence of a particular failure mode. That is, at a point in time, $t_s$, the corresponding fault leads to an abnormal condition and to an eventual failure at $t_{EOL}$ (i.e., end of life). We model the onset of a fault as a stochastic process governed by past operation history. While the detailed computation of the micro-level processes leading to a degraded state was not within the scope of this analysis, we capture the macro-level degradation characteristics leading to a fault by computing the energy balances around each sub-component. Concretely, it is assumed that each sub-component can only withstand a certain excitation energy before reaching a state of abnormal degradation. We denote the maximum excitation energy of sub-component θ as $E_\theta^{max}$, which we model as a Gaussian distribution to represent the variability in the material properties of each unit. The fault onset time corresponds to the point in time at which the total amount of energy E with which a component has been excited from the initial time t = 0 to a time t exceeds $E_\theta^{max}$, i.e., $t_s = \min\{t : E(t) > E_\theta^{max}\}$. The excitation energy experienced by a sub-component in the time interval [0, t] is given by:

$$E(t) = \int_0^t P(\tau)\, d\tau$$

where P(t) is the power consumed or produced by each component.
Abnormal degradation. The evolution of the abnormal system degradation with time follows the modelling of the original work.
In brief, the abnormal degradation model assumes that the degradation of each sub-component's flow and efficiency (i.e., θ) is governed by a stochastic exponential model parameterized by a = U(0.001, 0.003) and b = U(1.4, 1.6), with process noise ξ = N(0, c), where c = 0.001 when θ corresponds to an efficiency and c = 0.002 when it corresponds to a flow capacity. Since a, b and ξ are random variables, the evolution of the abnormal degradation with time is stochastic. The degradation process follows an exponential behaviour common in multiple damage propagation models (e.g., Arrhenius, Coffin-Manson, and Eyring models). Concretely, the modelling assumes a generalized equation for wear, $w = A e^{B(t)}$, which ignores micro-level deterioration processes but retains macro-level degradation characteristics. The between-flight maintenance is not explicitly modelled but is accounted for by the process noise. This allows the engine health parameters (flow and efficiency) to improve within allowable limits at any point, and hence the loss in efficiency or flow is not locally monotonic (see step 2 in Figure 1).
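The fault-onset rule described above (energy accumulates flight by flight until a unit-specific threshold is exceeded) can be illustrated with a small Python sketch. The power profiles, the sampling interval, and the mean and standard deviation of the Gaussian threshold are hypothetical numbers chosen only for illustration; the values used to generate the dataset are not given in the text. The integral of P(t) is approximated here with the trapezoidal rule.

```python
import random
from itertools import accumulate


def flight_energy(power, dt):
    """Trapezoidal approximation of the integral of P(t) over one flight."""
    return sum(0.5 * (p0 + p1) * dt for p0, p1 in zip(power, power[1:]))


def fault_onset_cycle(flight_energies, e_max):
    """Return the first cycle (1-based) whose cumulative energy exceeds e_max.

    Returns None if the threshold is never exceeded.
    """
    for cycle, total in enumerate(accumulate(flight_energies), start=1):
        if total > e_max:
            return cycle
    return None


rng = random.Random(0)
# HYPOTHETICAL inputs: constant unit power sampled every 60 s.
short_flight = flight_energy([1.0] * 61, dt=60.0)    # 1 h of operation
long_flight = flight_energy([1.0] * 301, dt=60.0)    # 5 h of operation
# HYPOTHETICAL Gaussian threshold representing unit-to-unit variability.
e_max = rng.gauss(360_000.0, 18_000.0)

t_s_short = fault_onset_cycle([short_flight] * 200, e_max)
t_s_long = fault_onset_cycle([long_flight] * 200, e_max)
```

With the same threshold, the unit flying long flights reaches the onset after fewer cycles than the unit flying short flights, which is the qualitative dependence on the flight class that the energy-based rule is meant to capture.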

Health Condition
The modelling approach assumes an overall health index of the engine, i.e., HI(t). The health index of the engine is monitored at each flight, and the end of life is declared when the health index reaches zero (i.e., HI = 0) or the system has reached more than 100 operative cycles. The overall health index is modelled as an aggregation of four normalized remaining operative margins ($hi_\mu$) that characterize the wear/health of the engine. In particular, the operative margins considered, which we denote as µ, are the surge margins of the fan (SmFan), LPC (SmLPC) and HPC (SmHPC) and the exhaust gas temperature (T48), computed at reference conditions. Differences of these operative margins between a degraded engine and the corresponding values of a clean new engine are assumed as measures of wear, i.e., $w_\mu(t) \sim \mu(t) - \mu_{new}$. Furthermore, the degradation model assumes upper wear thresholds, $th_w$, that denote the operational limits beyond which the engine cannot be operated. Under this assumption, the evolution of the normalized remaining operative margins with time, $hi_\mu(t)$, for each of the operative margins monitored is obtained by subtracting the wear from the upper threshold $th_w$ and normalizing it with respect to the upper threshold:

$$hi_\mu(t) = \frac{th_w - w_\mu(t)}{th_w}$$
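The normalization and the end-of-life rule described above can be sketched in a few lines of Python. The wear values and thresholds are hypothetical, and the aggregation of the four margins by taking their minimum is an assumption made for illustration only, since the text states just that the margins are aggregated.

```python
def normalized_margin(wear, th_w):
    """Normalized remaining operative margin: (th_w - wear) / th_w."""
    return (th_w - wear) / th_w


def health_index(margins):
    """Aggregate the normalized margins into one health index.

    ASSUMPTION: the most critical (smallest) margin drives the index.
    """
    return min(margins.values())


def end_of_life(hi, cycle, max_cycles=100):
    """End of life when HI reaches zero or after more than max_cycles cycles."""
    return hi <= 0.0 or cycle > max_cycles


# HYPOTHETICAL wear values and upper wear thresholds for the four margins.
wear = {"SmFan": 2.0, "SmLPC": 1.0, "SmHPC": 4.5, "T48": 30.0}
th_w = {"SmFan": 10.0, "SmLPC": 10.0, "SmHPC": 5.0, "T48": 100.0}

margins = {name: normalized_margin(wear[name], th_w[name]) for name in wear}
hi = health_index(margins)  # limited by SmHPC in this toy example
```

In this toy example the HPC surge margin is the limiting one, so the health index equals its normalized margin and the unit would keep flying until that margin, or another one, is exhausted.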

Sensor Noise
Measurement noise is an important source of variability present in real systems. A typical approach to model it is to add white noise to the simulated response. In this study, since no real data were available to characterize true noise levels, we added Gaussian noise to the $x_s$ signals with a target signal-to-noise ratio (SNR) of 65 dB. With this noise intensity, the resulting noise level is in line with the measurement uncertainties reported in the literature for modern turbofan engines [24,25]. It should be noted that the flight conditions (w) already contain sensor noise since they are real flight data.
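As an illustration of adding Gaussian noise at a target SNR, the following Python sketch scales the noise standard deviation from the power of the clean signal. It assumes that the signal power is the mean square of the clean signal and that the SNR is a power ratio expressed in dB; the exact definition used to generate the dataset is not stated in the text, and the toy signal is hypothetical.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, rng):
    """Return a noisy copy of `signal` with approximately the requested SNR (dB)."""
    # ASSUMPTION: signal power is the mean square of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise_std = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR (dB) of `noisy` relative to `clean`."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(42)
# HYPOTHETICAL sensor trace standing in for one x_s signal.
clean = [1500.0 + 50.0 * math.sin(0.01 * k) for k in range(20000)]
noisy = add_gaussian_noise(clean, snr_db=65.0, rng=rng)
```

Checking `measured_snr_db(clean, noisy)` on a long trace should return a value close to the 65 dB target, which is a simple way to validate the noise injection before applying it to every signal.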

Technical Validation
Quality assurance and quality control of the provided data included the following steps performed by different teams. First, one team checked whether the flight profiles were within the flight envelope of the CMAPSS dynamical model. Second, an independent team assessed whether the generated degradation profiles from the different datasets showed the intended characteristics: random initial wear, linear normal degradation, sharp abnormal degradation and a smooth transition from normal to abnormal degradation. Finally, all the authors checked whether the outputs of the engine model follow the expected behaviour and are bounded by physically meaningful upper/lower values. In the following, we provide a closer look at some of these important aspects of the data generation process.

Examination of the Flight Profiles
All flight data were checked to ensure that only flight conditions within the validity of the simulation flight envelope of the CMAPSS model were used. Figure 4 shows the

Examination of the Degradation Trajectories
The degradation trajectories generated are designed to show three characteristics present in real systems: random initial wear, linear normal degradation, and abnormal degradation. Figure 5 shows the resulting evolution of the health index (i.e., HI) in the ten units of dataset DS01. We can observe that the initial deterioration of each unit is different and corresponds to an engine-to-engine variability equivalent to 10% of the health index. The degradation of the affected system components follows a stochastic process with a linear normal degradation followed by a steeper abnormal degradation. The transition from normal to abnormal degradation is smooth. The degradation rate of each component varies within the fleet.

Examination of the Transition Times
The transition time ($t_s$) depends on the operating conditions, i.e., the flight profile. To illustrate the impact of the operative conditions on the onset of the abnormal degradation, Figure 6 shows the traces of degradation imposed on the high pressure turbine efficiency (HPT_Eff_mod), low pressure turbine efficiency (LPT_Eff_mod) and low pressure turbine flow (LPT_flow_mod) of three units of DS02. Each of the selected units corresponds to a different flight class. Unit 11 is a long-flight unit (i.e., flight class 3), and its onset of abnormal degradation occurs the earliest, at 19 cycles. Unit 14 is a short-flight unit (i.e., flight class 1) and has an onset at 36 cycles. Finally, Unit 15 is a medium-flight unit (i.e., flight class 2) and has an onset at 24 cycles. We can observe that abnormal degradation arises later in Unit 14, which can consequently perform more flights.
In addition to the quality assurance and quality control checks, two of the sets of data provided have been satisfactorily used in previous works. Specifically, dataset DS01 has been used for the application of model-based diagnostics [26] and dataset DS02 has been used for data-driven prognostics [27].

Usage Notes
The N-CMAPSS dataset has the potential to facilitate the development of DL algorithms for predictive maintenance applications that are more easily transferable to real-world applications. The dataset can also serve as a benchmark enabling a better comparison of different algorithms and their extensions. Moreover, the N-CMAPSS dataset is a resource for the machine learning community to test new time-dependent algorithms. It should be noted that, in contrast to the original CMAPSS work, the N-CMAPSS dataset provides the degradation trajectories in the form of θ. Therefore, the N-CMAPSS dataset can also be used to develop new physics-informed machine learning algorithms [28]. We conclude by providing a brief abstract formulation of the prognostics and diagnostics problems, aiming to facilitate the understanding of both problems by a larger scientific audience.

Prognostics Problem
Multivariate time series of condition monitoring sensor readings $X_{s_i} = [x_{s_i}^{(1)}, \dots, x_{s_i}^{(m_i)}]^T$ and their corresponding RUL labels $Y_i = [y_i^{(1)}, \dots, y_i^{(m_i)}]^T$ from a fleet of N units are given, with each observation $x_{s_i}^{(t)} \in \mathbb{R}^s$. The length of the sensory signal for the i-th unit is given by $m_i$, which can, in general, differ from unit to unit. The total combined length of the available dataset is $m = \sum_{i=1}^{N} m_i$. More compactly, we denote the available dataset as $D = \{X_{s_i}, Y_i\}_{i=1}^{N}$. Given this set-up, the task is to obtain a predictive model G that provides a reliable RUL estimate ($\hat{Y}$) on a test dataset of M units $D_{T*} = \{X_{s_j*}\}_{j=1}^{M}$, where $X_{s_j*} = [x_{s_j*}^{1}, \dots, x_{s_j*}^{k_j}]$ are multivariate time series of sensor readings. The total combined length of the test dataset is $m_* = \sum_{j=1}^{M} k_j$.

Evaluation Metric
Two evaluation metrics commonly used in CMAPSS prognostics analyses are proposed to compare the prognostics results: the root-mean-square error (RMSE) and NASA's scoring function (s) [11], which are defined as:

$$RMSE = \sqrt{\frac{1}{m_*} \sum_{j=1}^{m_*} \left(\Delta^{(j)}\right)^2}$$

$$s = \sum_{j=1}^{m_*} \left( \exp\left(\alpha \left|\Delta^{(j)}\right|\right) - 1 \right)$$

where $m_*$ denotes the total number of test data samples, $\Delta^{(j)}$ is the difference between the real and the estimated RUL of the j-th sample (i.e., $y^{(j)} - \hat{y}^{(j)}$), and α is 1/13 if the RUL is under-estimated and 1/10 otherwise. The resulting s metric is not symmetric and penalizes over-estimation more than under-estimation.
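A minimal Python sketch of the two metrics follows, using the sign convention stated above (error equal to the real RUL minus the estimated RUL, so a positive error is an under-estimation). The function names and the example values are illustrative and not taken from the dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error over all test samples."""
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def nasa_score(y_true, y_pred):
    """Asymmetric scoring function: over-estimation is penalized more."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        delta = y - y_hat                            # real minus estimated RUL
        alpha = 1.0 / 13.0 if delta > 0 else 1.0 / 10.0  # under- vs over-estimate
        total += math.exp(alpha * abs(delta)) - 1.0
    return total


# Illustrative values: the same 10-cycle error in both directions.
under = nasa_score([50.0], [40.0])   # RUL under-estimated by 10 cycles
over = nasa_score([50.0], [60.0])    # RUL over-estimated by 10 cycles
```

For the same absolute error, `over` is larger than `under`, which reflects the asymmetry of the score: predicting a failure too late is treated as worse than predicting it too early.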

Diagnostics Problem
The formulation of the suggested diagnostics problem is formally introduced in the following. Multivariate time series of condition monitoring sensor readings $X_{s_u} = [x_{s_u}^{(1)}, \dots, x_{s_u}^{(m_u)}]^T$ from a fleet of N units are given, with each observation $x_{s_u}^{(t)} \in \mathbb{R}^s$. The length of the sensory signal for the u-th unit is given by $m_u$, which can, in general, differ from unit to unit. The total combined length of the available dataset is $m = \sum_{u=1}^{N} m_u$. We consider the situation where the CM data correspond to past operating conditions (i.e., $t < t_{a_u}$), where the system's health state is healthy and denoted as $H_{s_u} = [h_{s_u}^{(1)}, \dots, h_{s_u}^{(m_u)}]^T$. Therefore, in compact form, we denote the available unit-specific data as $D_u = \{(x_{s_u}^{(t)}, h_{s_u}^{(t)})\}_{t=1}^{m_u}$. The system's components experience normal degradation during the healthy state. We consider the scenario where this normal degradation turns into an abnormal condition at $t_{s_u}$, leading to an eventual failure at $t_{EOL_u}$ (i.e., end of life). The fault detection task is to detect as early as possible the onset of the abnormal degradation within an independent test dataset $D_{T_u} = \{x_{s_u*}^{(j)}\}_{j=1}^{M_u}$ of future operating conditions (i.e., $t > t_{a_u}$). This task comprises, therefore, the estimation of the true system health state on the test set. In addition, the diagnostics task involves performing a fault isolation and identifying the subsystem(s) affected by the fault.
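One simple way to quantify how early a detector flags the onset of abnormal degradation is to compare the first cycle labelled faulty in the true and in the predicted health states. The sketch below is only an illustration: the encoding of the health state (1 for healthy, 0 for faulty) is an assumption, and the detection delay is a hypothetical measure that is not part of the problem formulation above.

```python
def first_faulty_cycle(cycles, health_states):
    """Return the earliest cycle labelled faulty (0), or None if all are healthy."""
    # ASSUMPTION: health state is encoded as 1 (healthy) or 0 (faulty).
    faulty = [c for c, h in zip(cycles, health_states) if h == 0]
    return min(faulty) if faulty else None


def detection_delay(cycles, h_true, h_pred):
    """Cycles between the true onset and the first predicted faulty cycle.

    Negative values indicate an early (false) alarm; None means that either
    no fault is present or no fault was detected.
    """
    onset = first_faulty_cycle(cycles, h_true)
    alarm = first_faulty_cycle(cycles, h_pred)
    if onset is None or alarm is None:
        return None
    return alarm - onset


# Illustrative per-sample data for one unit: two samples per flight cycle.
cycles = [1, 1, 2, 2, 3, 3, 4, 4]
h_true = [1, 1, 1, 1, 0, 0, 0, 0]   # true onset at cycle 3
h_pred = [1, 1, 1, 1, 1, 1, 0, 0]   # detector raises the alarm at cycle 4
delay = detection_delay(cycles, h_true, h_pred)
```

In this toy case the delay is one cycle; averaging such delays over the units of the test set would give one possible summary of detection performance.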

Discussion
In this work, we provide a new CMAPSS dataset (N-CMAPSS) with run-to-failure degradation trajectories that incorporate two major fidelity improvements with respect to previous work. First, it considers real flight conditions as recorded on board a commercial jet. Second, it extends the degradation modelling by relating the degradation process to its operation history. The N-CMAPSS dataset also provides fault class labels of each failure mode. Therefore, besides its applicability to prognostics problems, the dataset can be used for fault diagnostics. However, despite these notable improvements, the degradation process of turbofan engines can still be modelled with higher fidelity. In particular, we have considered accelerated aging compared to typical engines, whose full operative lifespans are on the order of thousands of cycles. In addition, we have restricted the degradation modelling to certain fault types that can be represented by flow and efficiency modulation. Therefore, extending the represented fault types and the onboard sensors (e.g., accelerometers, oil debris monitoring, etc.) is a natural extension of this work.