A New Predictive Model Based on the ABC Optimized Multivariate Adaptive Regression Splines Approach for Predicting the Remaining Useful Life in Aircraft Engines

Remaining useful life (RUL) estimation is considered as one of the most central points in the prognostics and health management (PHM). The present paper describes a nonlinear hybrid ABC–MARS-based model for the prediction of the remaining useful life of aircraft engines. Indeed, it is well-known that an accurate RUL estimation allows failure prevention in a more controllable way so that the effective maintenance can be carried out in appropriate time to correct impending faults. The proposed hybrid model combines multivariate adaptive regression splines (MARS), which have been successfully adopted for regression problems, with the artificial bee colony (ABC) technique. This optimization technique involves parameter setting in the MARS training procedure, which significantly influences the regression accuracy. However, its use in reliability applications has not yet been widely explored. Bearing this in mind, remaining useful life values have been predicted here by using the hybrid ABC–MARS-based model from the remaining measured parameters (input variables) for aircraft engines with success. A correlation coefficient equal to 0.92 was obtained when this hybrid ABC–MARS-based model was applied to experimental data. The agreement of this model with experimental data confirmed its good performance. The main advantage of this predictive model is that it does not require information about the previous operation states of the aircraft engine.


Introduction
An aircraft engine is the component of the propulsion system for an aircraft that generates mechanical power.Aircraft engines are almost always either lightweight piston engines or gas turbines [1,2].A gas turbine, also called a combustion turbine, is a type of internal combustion engine.It has an upstream rotating compressor coupled to a downstream turbine, and a combustion chamber in between.The basic operation of the gas turbine is similar to that of the steam power plant except that air is used instead of water.Fresh atmospheric air flows through a compressor that brings it to higher pressure.Energy is then added by spraying fuel into the air and igniting it so the combustion generates a high-temperature flow.This high-temperature high-pressure gas enters a turbine, where it expands down to the exhaust pressure, producing a shaft work output in the process.The turbine Energies 2016, 9, 409 2 of 19 shaft work is used to drive the compressor and other devices such as an electric generator that may be coupled to the shaft [3,4].The energy that is not used for shaft work comes out in the exhaust gases, so these have either a high temperature or a high velocity.The purpose of the gas turbine determines the design so that the most desirable energy form is maximized.Gas turbines are used to aircraft engines, trains, ships, electrical generators, and even tanks [3][4][5][6].Indeed, the engine diagram in Figure 1 shows the main elements of the engine model.Additionally, Figure 2 shows the flowchart corresponding to the simulation with its modules.
Energies 2016, 9, 409 2 of 20 turbine, where it expands down to the exhaust pressure, producing a shaft work output in the process.The turbine shaft work is used to drive the compressor and other devices such as an electric generator that may be coupled to the shaft [3,4].The energy that is not used for shaft work comes out in the exhaust gases, so these have either a high temperature or a high velocity.The purpose of the gas turbine determines the design so that the most desirable energy form is maximized.Gas turbines are used to aircraft engines, trains, ships, electrical generators, and even tanks [3][4][5][6].Indeed, the engine diagram in Figure 1 shows the main elements of the engine model.Additionally, Figure 2 shows the flowchart corresponding to the simulation with its modules.Energies 2016, 9, 409 2 of 20 turbine, where it expands down to the exhaust pressure, producing a shaft work output in the process.The turbine shaft work is used to drive the compressor and other devices such as an electric generator that may be coupled to the shaft [3,4].The energy that is not used for shaft work comes out in the exhaust gases, so these have either a high temperature or a high velocity.The purpose of the gas turbine determines the design so that the most desirable energy form is maximized.Gas turbines are used to aircraft engines, trains, ships, electrical generators, and even tanks [3][4][5][6].Indeed, the engine diagram in Figure 1 shows the main elements of the engine model.Additionally, Figure 2 shows the flowchart corresponding to the simulation with its modules.A particularly important concept is the prognostics of the aircraft turbine engine components.Indeed, prognostics is defined as an engineering discipline focused on predicting the time at which a system or a component will no longer perform its intended function.This lack of performance is most often a failure beyond which the system can no longer be used to meet desired performance [7].Prognostics is a very important aspect of an equipment health management system.A reliable prognostic system provides some benefits in terms of the financial cost saving and safety.The prerequisite of the benefit from the prognostics is the correct detection of fault by an efficient diagnostic system because diagnostic decision triggers the prognostic system [8].The predicted time then becomes the remaining useful life (RUL) [9][10][11][12][13], which is an important concept in decision making for contingency mitigation.Therefore, prognostics is explicitly defined as the estimation of the remaining useful life (RUL) of any equipment.The science of prognostics is based on the analysis of failure modes, detection of early signs of wear and aging, and fault conditions.
It is to be noted that the safe and reliable operation of engineering systems such as aircraft engines is of great significance for modern energy plants, production quality, preservation of human health and life, etc. [14,15].In this way, reliability is theoretically defined as the probability of failure, the frequency of failures, or in terms of availability, a probability derived from reliability and maintainability.Specifically, the concept of remaining useful life (RUL) is defined here as the number of remaining time units that the equipment has before it reaches the limit for a safe operation [7][8][9][10][11][12][13][14][15][16].
Taking into account that the RUL of any device can be considered as a random variable [17], since it depends on some variables that are properties of each equipment such as its age, the operation environment and the operation conditions, a statistical approach of this operation variable would be a convenient method.Previous studies have tried to predict the RUL based on time series [18,19].However, nowadays it is more useful to use models based on the monitoring of the operation variables values during the operation of the equipment [20,21].Therefore, the present study uses this approach.
The objective of this study is to evaluate the application of the multivariate adaptive regression splines (MARS) approach in combination with the Artificial Bee Colony (ABC) technique for the calculation of a predictive model of the RUL for aircraft engines.The MARS technique is based on the statistical learning theory and is a new class of models that can be used to predict values in very different areas [22][23][24][25][26].This study uses multivariate adaptive regression splines (MARS), a nonlinear and non-parametric regression methodology, to build a RUL forecasting model from experimental turbines data.It is a nonparametric regression technique and can be seen as an extension of linear models that automatically model nonlinearities and complex interactions between variables.The statistical learning theory is the theoretical foundation for the learning algorithm of the MARS technique [22][23][24][25][26][27].Some motivations behind the application of the proposed method with respect to other already existing techniques are as follows: (1) MARS models are more flexible than linear regression models; (2) MARS models are simple to understand and interpret; (3) MARS can handle both continuous and categorical data; (4) MARS models tend to have a good bias-variance trade-off; and (5) MARS models give us an explicit mathematical formula of the dependent variable as a function of the independent variables through an expansion of basis functions (hinge functions and products of two or more hinge functions).This last feature is a fundamental difference compared to other alternative methods because most of them behave like a black box.In order to carry out the optimization mechanism corresponding to the optimal hyperparameters setting in the MARS training, the artificial bee colony (ABC) technique was used here with success.The artificial bee colony (ABC) technique is an optimization algorithm based on the intelligent foraging behavior of honeybee swarms [28].Similar to other evolutionary computation algorithms such as particle swarm optimization (PSO) [29][30][31] or ant colony optimization [32], ABC exploits the model of social sharing of information [28,33,34].For the above-mentioned purpose, a hybrid ABC optimized MARS (ABC-MARS) model was used as an automated learning tool, training it in order to predict remaining useful life (RUL) from other turbine's operation parameters.According to previous research, the MARS technique has been proven to be an effective tool to predict natural parameters, being successfully used in a wide range of environmental fields: medical applications [35,36], forest modeling [37], prediction of the air quality [38], water quality studies [39,40], solution of geotechnical problems, particularly those that lack a precise analytical theory or understanding of the phenomena involved [41], and so on.
In summary, the present study is structured as follows: firstly, the materials, methods and dataset used are explained; secondly, the results of the new hybrid ABC-MARS-based model trained are presented and discussed; and finally, the main conclusions of this research work are drawn.Therefore, the aim of this research work was to obtain a predictive model for the RUL as well as the contribution degree of each input operation parameter.

Experimental Dataset
The present study uses a dataset corresponding to an aircraft engine of 90,000 lbs.thrust class.Indeed, this dataset includes working conditions with heights from sea conditions to 40,000 feet and temperatures from ´51 ˝C to 39 ˝C.Specifically, the above dataset was obtained through a software application known as MAPSS (Modular Aero-Propulsion System Simulation) [42].This is a simulation environment for aeronautical turbines that allows access to a range of monitoring parameters controlling the operating state of the system through a graphical user interface.This provides the user with a simulation environment that is able to develop advanced algorithms for control and diagnosis that can be tested on a generic aircraft engine simulator.Additionally, MAPSS is able to generate linear state space models from which the user can create linear controls.Furthermore, MAPSS is also able to perform simulations of transient states.These capabilities allow the user to verify the performance of the control algorithms and its validation in a generic engine model [42].The MAPSS program has two well-known versions for both civil and military applications.As a final note, the military version is able to perform realistic simulations within the FADEC (Full Authority Digital Engine Controllers) [43] standard and it will be used for the present research.
The database of the study consists of data for a total of 100 turbines, with a number of observations that ranges from 128 to 362 for each one, being the dependent variable in the RUL.The total number of observations for all of the turbines amounts to 20,631 observations.The database contains input parameters that simulate the effects of faults and deterioration in any of the rotating components of the engine (fan, LPC, HPC, HPT, and LPT).Data belong to a fleet of similar aircrafts with the same engines.It is important to emphasize that each engine starts with a different health status but at the beginning all are inside the operational range and the dataset convers since that moment to the engine failure.
Note that each record of the engine state is formed by a set of 24 variables, three of them are operational settings and the other 21 represent values for engine performance measurements, which are contaminated with noise.A complete list of variables is presented in Table 1.The operability margins, such as stall and temperature margins, define the safe operation region for the engine [44].Six different flight conditions are contained in the database, with altitudes from 0 to 42,000 feet, speeds from 0 to 0.84 Mach and throttle resolver angle from 20 ˝to 100 ˝.For further information about how the engine run-to-failure simulation is performed, the readers can consult the reference [42].
After a first exploratory analysis, it was found that the measurements of the variables Total temperature at fan inlet, Pressure at fan inlet, Engine pressure ratio, Burner fuel-air ratio, Demanded fan speed and Demanded corrected fan speed are constant and that variable Total pressure in bypass-duct oscillates between 21.6 and 21.61 psia.Therefore, these variables will not be used in the calculations.Additionally, the absolute value of the correlation coefficients between each pair of variables is graphically represented by the correlation matrix [25,26] in Figure 3. Two additional variables, Physical core speed (C_sp) and Corrected core speed (C_csp), were discarded due to their low correlation with the independent variable RUL.Indeed, only the remaining twelve variables are used as independent variables in the modeling of the dependent variable RUL and its simulation.As an example, Figure 4 shows the 189 observations of the independent and dependent variables for the turbine 4. since it depends on the properties of the equipment such as its age, the operation environment and the operation conditions.The dataset that has been analyzed is adequate for data-driven approach since sufficient data and RUL values are available with dataset.The ABC-MARS-based model is suitable for this case in order to achieve an effective approach to nonlinearities present in this regression problem.Additionally, it is not appropriate for physics-based modeling since the health index parameters are not given, and no physics-based model found for whole engine system degradation.Accordingly, the database referred previously will be used for the training and validation of the hybrid model proposed in this paper using a random 90% of the information for training and the remaining 10% for validation.

Multivariate Adaptive Regression Splines (MARS)
Multivariate adaptive regression splines (MARS) is a multivariate nonparametric classification/regression technique [22][23][24].Its main purpose is to predict the values of a continuous dependent variable,   . The MARS model can be represented as: The RUL prognosis is a complex and highly nonlinear problem. Figure 4 shows that RUL is a multivariate function of the operating parameters of the turbofan engines.In general, there are two major approaches to prognostics: physics-based and data-driven models.In the first case one needs to build an accurate physical model of system behavior either in normal state or in faulty conditions.Comparing the data captured from the sensors to the model predictions, one can obtain the health indicator of the system.In the second case, there is no physical model but the data with relevant events are available.On the one hand, physics-based models require the presence of a mathematical representation of the physics of failure degradation and the parameters used in degradation modeling.On the other hand, data-driven models require statistically sufficient run-to-failure samples, due to the fact that RUL of turbofan engine can be considered as a random variable [17], since it depends on the properties of the equipment such as its age, the operation environment and the operation conditions.
The dataset that has been analyzed is adequate for data-driven approach since sufficient data and RUL values are available with dataset.The ABC-MARS-based model is suitable for this case in order to achieve an effective approach to nonlinearities present in this regression problem.Additionally, it is not appropriate for physics-based modeling since the health index parameters are not given, and no physics-based model found for whole engine system degradation.Accordingly, the database referred previously will be used for the training and validation of the hybrid model proposed in this paper using a random 90% of the information for training and the remaining 10% for validation.

Multivariate Adaptive Regression Splines (MARS)
Multivariate adaptive regression splines (MARS) is a multivariate nonparametric classification/regression technique [22][23][24].Its main purpose is to predict the values of a continuous dependent variable, y pn ˆ1q, from a set of independent explanatory variables, X pn ˆpq.The MARS model can be represented as: where f is a weighted sum of basis functions that depend on X and e is an error vector of dimension pn ˆ1q.MARS model does not require any a priori assumptions about the underlying functional relationship between dependent and independent variables.Instead, this relation is uncovered from a set of coefficients and piecewise polynomials of degree q (basis functions) that are entirely "driven" from the regression data pX, yq.The MARS regression model is constructed by fitting basis functions to distinct intervals of the independent variables.Generally, piecewise polynomials, also called splines, have pieces smoothly connected together.In MARS terminology, the joining points of the polynomials are called knots, nodes or breakdown points.These will be denoted by the small letter t.For a spline of degree q each segment is a polynomial function.MARS uses two-sided truncated power functions as spline basis functions, described by the following equations [22][23][24]27,[35][36][37][38][39][40]: r`px ´tqs q `" # pt ´xq q i f x ě t where q pě 0q is the power to which the splines are raised and which determines the degree of smoothness of the resultant function estimate.When q " 1, which is the case in this study, only simple linear splines are considered.A pair of splines for q " 1 at the knot t " 3.5 is presented in Figure 5.
The MARS model of a dependent variable y with M basis functions (terms) can be written as [22][23][24]27,[35][36][37][38][39][40]: where ŷ is the dependent variable predicted by the MARS model, c 0 is a constant, B m pxq is the m-th basis function, which may be a single spline basis functions, and c m is the coefficient of the m-th basis functions.
is the power to which the splines are raised and which determines the degree of smoothness of the resultant function estimate.When 1 q  , which is the case in this study, only simple linear splines are considered.A pair of splines for 1 q  at the knot 3.5 t  is presented in shown as a dashed line; and the right spline ( x t The MARS model of a dependent variable y with M basis functions (terms) can be written as [22][23][24]27,[35][36][37][38][39][40]: where ŷ is the dependent variable predicted by the MARS model, c0 is a constant,

 
B m x is the m-th basis function, which may be a single spline basis functions, and c m is the coefficient of the m-th basis functions.
Both the variables to be introduced into the model and the knot positions for each individual variable have to be optimized.For a dataset X containing n objects and p explanatory variables, there are N n p   pairs of spline basis functions, given by Equations ( 2) and (3), with knot A two-step procedure is followed to construct the final model.First, in order to select the consecutive pairs of basis functions of the model, a two-at-a-time forward stepwise procedure is implemented [22][23][24]27,[35][36][37][38][39][40].This forward stepwise selection of basis function leads to a very complex and overfitted model.Such a model, although it fits the data well, has poor predictive abilities for new objects.To improve the prediction, the redundant basis functions are removed one at a time using a backward stepwise procedure.To determine which basis functions should be included in the model, MARS utilizes the generalized cross-validation (GCV) [27,[35][36][37][38][39][40].In this way, Both the variables to be introduced into the model and the knot positions for each individual variable have to be optimized.For a dataset X containing n objects and p explanatory variables, there are N " n ˆp pairs of spline basis functions, given by Equations ( 2) and ( 3), with knot locations x ij (i " 1, 2, . . ., n; j " 1, 2, . . ., p).
A two-step procedure is followed to construct the final model.First, in order to select the consecutive pairs of basis functions of the model, a two-at-a-time forward stepwise procedure is implemented [22][23][24]27,[35][36][37][38][39][40].This forward stepwise selection of basis function leads to a very complex and overfitted model.Such a model, although it fits the data well, has poor predictive abilities for new objects.To improve the prediction, the redundant basis functions are removed one at a time using a backward stepwise procedure.To determine which basis functions should be included in the model, MARS utilizes the generalized cross-validation (GCV) [27,[35][36][37][38][39][40].In this way, the GCV is the mean squared residual error divided by a penalty dependent on the model complexity.The GCV criterion is defined in the following way [22][23][24]27,[35][36][37][38][39][40]: where C pMq is a complexity penalty that increases with the number of basis functions in the model and is defined as [22][23][24]27,[35][36][37][38][39][40]: where M is the number of basis functions in Equation ( 4), and the parameter d is a penalty for each basis function included into the model.It can be also regarded as a smoothing parameter.Large values of d lead to fewer basis functions and therefore smoother function estimates.
Once the MARS model is constructed, it is possible to evaluate the importance of the explanatory variables used to construct the basis functions.Establishing predictor importance is in general a complex problem, which in general requires the use of more than one criterion.In order to obtain Energies 2016, 9, 409 9 of 19 reliable results, it is convenient the use of the GCV parameter explained before together with the parameters Nsubsets (criterion counts the number of model subsets in which each variable is included) and the residual sum of squares RSS [22][23][24]27,[35][36][37][38][39][40].

The Artificial Bee Colony (ABC) Algorithm
The algorithm Artificial Bee Colony (ABC) is an evolutionary optimization algorithm inspired in the behavior of bees foraging food sources [28,33,34].Indeed, in the ABC technique, the colony consists of three groups of bees: employed bees, onlookers and scouts.The N food sources are the possible set of solutions and are represented by the vectors p i .It represents its position in the search space of possible solutions.The food source dimension is the number of parameters of the optimization problem.The algorithm initializes the food sources or possible solutions of the problem randomly in a plausible hypercube and the fitness of each food source is evaluated.The relation between the objective function F and the fitness of a food source is given by (see Figure 6): values of d lead to fewer basis functions and therefore smoother function estimates.
Once the MARS model is constructed, it is possible to evaluate the importance of the explanatory variables used to construct the basis functions.Establishing predictor importance is in general a complex problem, which in general requires the use of more than one criterion.In order to obtain reliable results, it is convenient the use of the GCV parameter explained before together with the parameters Nsubsets (criterion counts the number of model subsets in which each variable is included) and the residual sum of squares RSS [22][23][24]27,[35][36][37][38][39][40].

The Artificial Bee Colony (ABC) Algorithm
The algorithm Artificial Bee Colony (ABC) is an evolutionary optimization algorithm inspired in the behavior of bees foraging food sources [28,33,34].Indeed, in the ABC technique, the colony consists of three groups of bees: employed bees, onlookers and scouts.The N food sources are the possible set of solutions and are represented by the vectors i p .It represents its position in the search space of possible solutions.The food source dimension is the number of parameters of the optimization problem.The algorithm initializes the food sources or possible solutions of the problem randomly in a plausible hypercube and the fitness of each food source is evaluated.The relation between the objective function F and the fitness of a food source is given by (see Figure 6):  The lower the objective function value, the higher the fitness.As the algorithm searches for the highest fitness of a food source, it minimizes.The ABC technique has three phases [28,33,34]: The Employee Bee Phase In the first phase, the employee bees forage the food sources and tries to introduce a variation of every i food source according with the equation [28,33,34]: where j is the randomly chosen parameter we are modifying, k a randomly chosen food source different from i and R ij a number chosen randomly in [´1,1].Once v ij is calculated, its fitness is obtained.If this is higher than f itness `F `pij ˘˘, its value is changed to v ij and the trial counter set to one.If not, the value of the food source does not change and the trial counter is increased.

‚
The Onlooker Bee Phase For each food source p i , we draw a number r i in r0, 1s.If r i ă prob i , we try again to change one parameter in the food source.The quantity prob i is obtained from the fitness of this food source as follows [28,33,34]:

‚
The Scout Bee Phase If, after a determined number of trials, a food source is not improved, it is discarded and a new one is randomly chosen from the initial searching space.The food source with the highest fitness is the temporal optimum in this iteration [28,33,34].This cycle is continued until a stop condition is met.In the present case, the stop condition has been a maximum number of iterations and the repetition of the optimum for a determined number of iterations.If this occurs, it is assumed that the algorithm has already converged.

Algorithm Goodness-of-Fit Measurement
To estimate remaining useful life from other parameters measured in the aircraft engines, it is important to select the model that best fits the experimental data.The criterion considered in this research to measure the goodness-of-fit was the coefficient of determination R 2 [45][46][47].This ratio indicates the proportion of total variation in the dependent variable explained by the model (inside-bark volume in our case).A dataset takes values t i , each of which has an associated modeled value y i .The former are called the observed values and the latter are often referred to as the predicted values.Variability in the dataset is measured through different sums of squares [45][46][47]: `ti ´t˘2 : The total sum of squares, proportional to the sample variance.
`yi ´t˘2 : The regression sum of squares, also called the explained sum of squares.
pt i ´yi q 2 : The residual sum of squares.
In the previous sums, t is the mean of the n observed data: Bearing in mind the above sums, the general definition of the coefficient of determination is: A coefficient of determination value of 1.0 indicates that the regression curve fits the data perfectly.

Analysis of Results and Discussion
The total number of predicting variables used to build the hybrid ABC-MARS model was 12.The output predicted variable was the remaining useful life.

‚
Maximum number of basis functions (Maxfuncs): Maximum number of model terms before pruning, i.e., the maximum number of terms created by the forward pass.

‚ Penalty parameter (d):
The Generalized Cross Validation (GCV) penalty per knot.A value of 0 penalizes only terms, not knots.The value ´1 means no penalty.

‚
Interactions: Maximum degree of interaction between variables.
Therefore, a hybrid ABC-MARS-based model was applied to predict the RUL (output variable) from the other twelve remaining variables (input variables) studying their influence in order to optimize its calculation through the analysis of the determination coefficient R 2 with success.Figure 7 shows the flowchart of this hybrid ABC-MARS-based model developed in this research work.
The output predicted variable was the remaining useful life.


Maximum number of basis functions (Maxfuncs): Maximum number of model terms before pruning, i.e., the maximum number of terms created by the forward pass. Penalty parameter (d): The Generalized Cross Validation (GCV) penalty per knot.A value of 0 penalizes only terms, not knots.The value −1 means no penalty. Interactions: Maximum degree of interaction between variables.
Therefore, a hybrid ABC-MARS-based model was applied to predict the RUL (output variable) from the other twelve remaining variables (input variables) studying their influence in order to optimize its calculation through the analysis of the determination coefficient R 2 with success.Figure 7 shows the flowchart of this hybrid ABC-MARS-based model developed in this research work.The determination coefficient is a statistical measure of how well a regression curve approximates real data points.Furthermore, it is a descriptive measure ranging from zero to one, indicating how good one term is predicted by another one.Thus, R 2 = 1 indicates the best approximation and R 2 = 0 the worst one.
Cross validation was the standard technique used here for finding the real coefficient of determination (R 2 ) for the turbines analyzed [48,49].The dataset is randomly divided into l disjoint subsets of equal size, and each subset is used once as a validation set, whereas the other l ´1 subsets are put together to form a training set.In the simplest case, the average accuracy of the l validation sets is used as an estimator for the accuracy of the method.The combination of the hyperparameters with the best performance is chosen [37][38][39][40][45][46][47][48][49].In this way, 10-fold cross-validation was used.
In order to guarantee the prediction ability of the ABC-MARS-based model, an exhaustive cross-validation algorithm was used.The referred algorithm consists in splitting the sample into 10 parts and using 9 of them for training and the remaining one for testing.This process was performed 10 times using each of the parties of the 10 divisions for testing and calculating the average error.Therefore, all of the possible variability of ABC-MARS-based model parameters has been evaluated in order to get the optimum point, looking for those parameters that minimize the average error.With these optimal hyperparameters, the error criterion was calculated from the built model using 90% of the sample for training and validated with the remaining 10%.In this way, we are able to simulate as much as possible the real conditions under which the model would be built in order to later fit it to new observation data unrelated to the construction of the model.
The dependent variable is the RUL and twelve independent variables x k i were used in this research work.The variable number is i " 1, . . ., 12 and k " 1, . . ., 100 is the turbine number.For each variable and turbine, the independent variables have been fitted with cubic polynomials: `ak 1i t `ak 2i t 2 `ak 3i t 3 i " 1, . . ., 12, k " 1, . . ., 100, t " 1, . . ., n k (12) where t is the observation number and n k is the total number of observations for turbine k.These polynomials have been represented previously as red lines in Figure 3.The observed data are substituted by the fitted data from here and will be called xk i .It must be noted that the independent variable RUL has not been changed, only the independent data.
The regression modeling has been performed with Multivariate adaptive regression splines, using the Earth package [50] and the parameters have been optimized with the ABC technique using the ABCoptim package [51] from the R Project.The bounds (initial ranges) of the space of solutions used in ABC technique are shown in Table 2.The number of particles used has been 10.The stopping criterion has been a maximum number of iterations of 100 and the repetition of the optimum for 10 of iterations.To optimize the MARS parameters, the ABC module is used.The ABC searches for the best Maxfuncs, Penalty, and Interactions parameters by comparing the cross-validation error in every iterations.The search space is organized in three dimensions, one for each parameter.Main fitness factor is the coefficient of determination (R 2 ).
Table 3 shows the optimal hyperparameters of the best-fitted ABC-MARS-based model found with the artificial bee colony (ABC) technique.The results of the best fitted ABC-MARS-based model computed using all of the available data observations are shown in the supplementary appendix (see Table A1), with a list of 139 main basis functions for fitted ABC-MARS-based model and their coefficients, respectively.Therefore, the MARS model is a form of non-parametric regression technique and can be seen as an extension of linear models that automatically models nonlinearities and interactions as a weighted sum of basis functions called hinge functions [22][23][24]27,[35][36][37][38][39][40].
Additionally, the coefficient of determination and correlation coefficient for the ABC-MARS-based model are 0.84 and 0.92, respectively.An important goodness of fit, that is to say, a good agreement between the model and the experimental data, can be infered from these results.Additionally, a linear model was also constructed for reference pursposes and its goodness of fit with a 10-fold cross validation was R 2 = 0.61, significatively lower than the obtained for the ABC-MARS model.The significance ranking for the twelve input variables predicting the RUL (output variable) in this high nonlinear complex problem is shown in Table 4. Thus, for this hybrid MARS model the most significant variable in RUL prediction is the Total temperature at HPC outlet, followed by Corrected fan speed, Total temperature at LPC outlet, Total pressure at HPC outlet, Physical fan speed, Ratio of fuel flow to Ps30, LPT coolant bleed, Total temperature at LPT outlet, Bypass ratio, HPT coolant bleed, Bleed enthalpy and finally Static pressure at HPC outlet.This research work was able to predict the remaining useful life (RUL) for aircraft engines in agreement with the real experimental values of remaining useful life observed with a big accurateness.Indeed, Figure 8 shows the remaining useful life observed experimentally versus the remaining useful life predicted by means of the hybrid model.One may observe immediately that there is a good agreement between both values that corresponds to a high Pearson's correlation coefficient of 0.92.

Conclusions
The proposed hybrid model accurately predicts the RUL of the turbines.Furthermore, the model performed requires no knowledge of the previous status of the aircraft engine, as it only requires information of the current situation of the same data.This model provides the advantage of system robustness against possible failures in the memory registers.As a future line of research, the authors propose the performance of models taking into account the values of the input variables at earlier time points, in order to determine the remaining useful life of the aircraft engines.From the point of view of the authors, these models should provide a better fit but suffer from the need to have in memory the values at previous time points of the variables included in the model.These models would be of interest only if they could be trained with fewer variables than the model that does not require information of the previous state.Therefore, the use of an ABC-MARS-based model is necessary in order to achieve an effective approach to nonlinearities present in this regression problem.Obviously, these results coincide again with the outcome criterion of "quality of the fit" (R 2 ) so that the MARS model with the ABC technique has been the best fitting.The predictive ability of machine learning data-driven models, as the model developed here, depends on the proper training.Therefore, it is necessary to have statistically

Figure 2 .
Figure 2. Layout showing various modules and their connections as modeled in the simulation.

Figure 2 .
Figure 2. Layout showing various modules and their connections as modeled in the simulation.Figure 2. Layout showing various modules and their connections as modeled in the simulation.

Figure 2 .
Figure 2. Layout showing various modules and their connections as modeled in the simulation.Figure 2. Layout showing various modules and their connections as modeled in the simulation.

Figure 3 .
Figure 3. Correlation matrix of this problem with the two discarded independent variables: Physical core speed (C_sp) and Corrected core speed (C_csp).Figure 3. Correlation matrix of this problem with the two discarded independent variables: Physical core speed (C_sp) and Corrected core speed (C_csp).

Figure 3 .
Figure 3. Correlation matrix of this problem with the two discarded independent variables: Physical core speed (C_sp) and Corrected core speed (C_csp).Figure 3. Correlation matrix of this problem with the two discarded independent variables: Physical core speed (C_sp) and Corrected core speed (C_csp).

Figure 3 .
Figure 3. Correlation matrix of this problem with the two discarded independent variables: Physical core speed (C_sp) and Corrected core speed (C_csp).

Figure 4 .
Figure 4. Independent variables vs. dependent variable (RUL) for turbine 4.This turbine contains 189 observations for each variable (red line represents the fitted third degree polynomial or cubic polynomial).

Figure 4 .
Figure 4. Independent variables vs. dependent variable (RUL) for turbine 4.This turbine contains 189 observations for each variable (red line represents the fitted third degree polynomial or cubic polynomial).

Figure 5 .
Figure 5.A graphical representation of a spline basis function.The left spline ( x t

Figure 5 .
Figure 5.A graphical representation of a spline basis function.The left spline (x ă t, ´px ´tq) is shown as a dashed line; and the right spline (x ą t, `px ´tq) as a solid line.

Figure 6 .
Figure 6.Relation between the fitness of a food source and objective function.Figure 6. Relation between the fitness of a food source and objective function.

Figure 6 .
Figure 6.Relation between the fitness of a food source and objective function.Figure 6. Relation between the fitness of a food source and objective function.

Figure 7 .
Figure 7. Flowchart of the hybrid ABC-MARS-based model.Figure 7. Flowchart of the hybrid ABC-MARS-based model.

Figure 7 .
Figure 7. Flowchart of the hybrid ABC-MARS-based model.Figure 7. Flowchart of the hybrid ABC-MARS-based model.
of their conditions of service (run-to-failure samples) to apply these models to predict the RUL of turbofan engines.

Figure 8 .
Figure 8.Comparison between the remaining useful life observed and predicted by the hybrid ABC-MARS-based model for aircraft engines (r = 0.92) for the first 3200 observations.

Figure 8 .
Figure 8.Comparison between the remaining useful life observed and predicted by the hybrid ABC-MARS-based model for aircraft engines (r = 0.92) for the first 3200 observations.

Table 1 .
MAPSS input variables of the simulated engine to simulate the remaining useful life (RUL) with their mean and standard deviation.

Table 2 .
Initial ranges of the three hyperparameters of the ABC-MARS-based model fitted in this study.

Table 3 .
Optimal hyperparameters of the best-fitted MARS model found with the ABC technique.

Table 4 .
Significance ranking for the variables involved in the best fitted ABC-MARS-based model for the RUL prediction according to criteria Nsubsets, GCV and RSS.

Table A1 .
List of basis functions of the best fitted ABC-MARS-based model for the RUL and their coefficients c i .