Fast Evaluation of Aircraft Icing Severity Using Machine Learning Based on XGBoost

: Aircraft icing represents a serious hazard in aviation which has caused a number of fatal accidents over the years. In addition, it can lead to substantial increase in drag and weight, thus reducing the aerodynamics performance of the airplane. The process of ice accretion on a solid surface is a complex interaction of aerodynamic and environmental variables. The complex relationship makes machine learning-based methods an attractive alternative to traditional numerical simulation-based approaches. In this study, we introduce a purely data-driven approach to ﬁnd the complex pattern between different ﬂight conditions and aircraft icing severity prediction. The supervised learning algorithm Extreme Gradient Boosting (XGBoost) is applied to establish the prediction framework which makes prediction based on any set of observations. The input ﬂight conditions for the proposed prediction framework are liquid water content, droplet diameter and exposure time. The proposed approach is demonstrated in three cases: maximum ice thickness prediction, icing area prediction and icing severity level evaluation. Performance comparison studies and error analysis are also conducted to verify the effectiveness and performance of the proposed method. Results show that the proposed method has reasonable capability in evaluating aircraft icing severity.


Introduction
Ice accretion on aircraft surfaces has been the principal cause of several flight accidents in the past and represents a source of major concern in aviation. Aircraft icing often happens when aircraft encounters clouds with supercooled water droplets in flight. The droplets impacting the aircraft then accumulate as ice on an aircraft surfaces. The formation process of ice is a complex interaction of aerodynamic and environmental variables [1]. The aerodynamic factors include flight speed, attack angle and exposure time. The environmental variables include liquid water content (LWC), droplet diameter and free-stream temperature.
Aircraft icing is an area of active research for its relevance to aviation safety and aerodynamic performance of the airplane [2]. Experiments [3] were firstly carried out to study the icing formation and determine the icing severity level. With the building up of the theoretical icing models [4,5], considerable research has been done to investigate the icing phenomenon numerically. For example, the LEWICE code [6,7] based on the Messinger model [4] is developed by the NASA Glenn Research diameter and exposure time. The corresponding outputs are the icing severity level [28], area size covered by ice and maximum ice thickness. Performance comparison studies and error analysis are conducted to verify the effectiveness and performance of the proposed method.
The present study proposes a method to fast evaluate the aircraft icing severity based on the extreme gradient boosting machine learning method . Applications to three icing features demonstrate that the proposed approach can provide a suitable alternative to numerical simulation approach with reasonable accuracy and shorter computational time. Error analysis method containing various components is established to provide an accurate quantitative measure of the predictions. Additionally, the effect of different aerodynamic and meteorological factors on the icing severity results can be reported by the current method. Potential uses for this method include estimating degradation of the aircraft aerodynamic performance by coupling with other computational fluid dynamics codes and increasing the flight safety by coupling with other ice protection systems.

Computational Methods
The turbulent flow field is obtained by using rhoEnergyFoam solver [27] to solve the compressible Navier-Stokes equations. The rhoEnergyFoam solver is a newly released solver in OpenFOAM open-source community [13], which is a high-fidelity solver with capability of conserving the total kinetic energy in the inviscid limit.
The governing equations rhoEnergyFoam solves are the Navier-Stokes equations for a compressible ideal gas, integrated over an arbitrary control volume V.
where n is the outward normal, u, f i and f v i represent the conservative variables vector, the associated Eulerian and viscous fluxes, respectively. Here, ρ is the density of the air, u i is the velocity component in the i-th coordinate direction, E is the total energy per unit mass, e is the internal energy per unit mass, H is the total enthalpy, σ ij is the viscous stress tensor, q i is the heat flux vector. (2) Then, the Eulerian fluxes are split into the convective and pressure contribution: The convective and pressure fluxes both consist of a central part and a diffusive part. The central part of the convective flux is evaluated through the second-order accurate midpoint interpolation scheme which discretely preserves kinetic energy of the flow from convection, and the pressure flux is evaluated through the standard central interpolation. A shock sensor is used in rhoEnergyFoam solver to judge the smoothness of the numerical solution. When capturing shock waves, the artificial diffusion terms provided by the Advection Upstream Splitting Method (AUSM) [29] are applied to pressure and convective fluxes. Since the current work targets on conducting delayed detached eddy simulation (DDES) simulation and the mesh resolution is fine enough, so no artificial diffusion is needed and the AUSM scheme is deactivated to preserve the total kinetic energy for large-eddy simulation (LES) simulation. A third-order, four-stage Runge-Kutta scheme is used to advance in time.
The delayed detached eddy simulation (DDES) proposed by Spalart et al. [26] is adopted for solving the airflow field. The DDES method considers a full RANS model in the near wall region and combines it with a large-eddy simulation (LES) for the outer flow. The governing equations can be described as follows.
whereν is the modified eddy viscosity andν = ν t / f v1 (y + ). The f v1 is chosen such thatν in the proximity of walls. The coefficients and blending functions can be found in the original paper [30].
The length scaled is expressed as follows. where where U i,j is the velocity gradients, κ is the Karman constant and d is the distance to the wall.
The shielding function f d is introduced to detect the boundary layer and delays the switch to LES mode until outside of it. After obtaining the airflow results, the icing thermodynamic process is solved by applying the icing solver developed by Li and Paoli [12]. The thermal wall function derived by Da Silva et al. [31] has been implemented and applied to the wing surface to study the roughness effect caused by ice accretion. The roughness effect is considered by using the momentum and heat transfer analogy factor. Here, α t is the turbulent thermal diffusivity.
where µ t is the turbulent viscosity, Pr t is the turbulent Prandtl number, C f is the rough skin friction, U τ is the shear velocity, and K s is the equivalent sand-grain roughness. a, b, and C are defined by Stefanini et al. [32] as three constants as −0.45, −0.8 and 1.42, respectively.

Problem Description
Among the parameters affect aircraft icing, the cloud type (Stratiform cloud and Cumuliform cloud) determines the liquid water content and water droplet diameter distributions [28]. In the stratiform cloud, smaller droplet diameter and lower liquid water content with larger horizontal distribution area often occur. FAR-25 [15] defines the relationship among LWC, droplet diameter and ambient temperature as shown in Figure 1. The corresponding icing condition is referred as continuous maximum icing (CMI). The water droplet diameter in the cumuliform cloud is larger than that in the stratiform cloud. Therefore, the FAR-25 defines icing condition occurring in the cumuliform cloud as intermittent maximum icing (IMI) condition. The relationship among LWC, droplet diameter and ambient temperature for IMI condition is shown in Figure 2. Hence, different meteorological variables (LWC, droplet diameter) can be extracted from Figures 1 and 2 to further study how these factors affect the icing severity level. Additionally, different cloud sizes lead directly to different exposure time. The severity and extent of aircraft icing are positively correlated with the exposure time. Therefore, multiple exposure time are assigned to the extracted meteorological variables (LWC, droplet diameter) to study the effect of exposure time on aircraft icing severity.  The NACA0012 airfoil is studied in this paper. The flow conditions correspond to those of the experiment test case of Shin et al. [33], namely the angle of attack is 4 • , the free stream velocity is u ∞ = 67.05 m/s, the pressure is p ∞ = 101,300 pa. The static temperature T = 244.7 K. The selected LWC, droplet diameter and exposure time are applied for the icing severity calculation.

Data-Driven Methods
The second part of the study made use of machine learning models for icing severity prediction. In order to predict the aircraft icing severity precisely, we firstly predict the icing severity level (Table 1) based on icing thickness described by Cao et al. [28] Aviation meteorologists could issue forecasts of meteorological variables, but the pilots would have to interpret the forecasts into an icing hazard for their own situation. Instead, the pilots would have a good idea how hazardous the icing is if they can get the icing severity level in report [1]. Besides the icing severity level, the machine learning models are also trained to predict the size of the area on the airfoil covered by ice and maximum ice thickness. Four categories are introduced in Table 1 to describe the icing severity levels. It is reasonable to consider the ice maximum thickness rather than the rate of accretion in determining the icing severity levels. Indeed, the ice thickness depends on the environmental conditions as well as the time spent in such conditions. The flight performance will not be greatly affected if the airplane does not stay for long in icing state even if the icing condition is severe.

The Extreme Gradient Boosting Model
XGBoost is a decision tree based method. It is a supervised machine learning algorithm for structured or tabular datasets on classication and regression predictive modeling problems. A detailed explanation of the computational methods of XGBoost [21] is beyond the scope of the current study. A brief introduction and generalized algorithm are presented.
Boosting is an ensemble technique where new models are added to correct the errors made by existing models. Models are added sequentially until no further improvements can be made. XGBoost is an improved Gradient Boosting algorithm where new models are created that predict the residuals or errors of prior models and then added together to make the final prediction [21]. It is called gradient boosting because it uses a gradient descent algorithm to minimize the loss when adding new models.
As an improvement, XGBoost adds the regularization factor Ω(θ) to the loss function to represent the complexity of the trees, and it defines the objective function of the optimization in the training model, which is given by where the θ is the model parameter trained from given data, L is the training loss function, such as square loss or logistic loss, which represents the matching degree between the model and the given training data. Ω(θ) is the regularization term, such as L1 norm or L2 norm, which represents the complexity of the model. XGBoost uses the tree model complexity as a regularization term in the objective function to avoid overfitting [23]. A tree ensemble model predicts the output by averaging K trees.
where F is the space of regression trees. To learn the set of function in the model, we minimize the following regularized objective.
where l is a differentiable convex loss function that measures the difference between the predictionŷ i and the target y i . n is the number of predictions and the second term ∑ K k=1 Ω( f k ) penalizes the complexity of the model. For a given tree ensemble model, the complexity of the model can be defined as where the γ and λ are both regularization factors. γ is the complexity of the each leaf which is used to control tree node splitting. The node will split when the loss function is larger than the value of γ. λ is a parameter to scale the penalty, T is the number of leaves in a decision tree and w j is the weight of the leaf nodes. The XGBoost algorithm apply the second-order Taylor expansion to the loss function in the optimization process. Therefore, the objective function at step t can be derived as where g i and h i are the first and second derivative of the loss function. We can also express the derivatives by the sum for each leaf node as G j = ∑ i∈I j g i and H j = ∑ i∈I j h i where I j represents all the data samples in leaf node j. Therefore, we can express the objective function as The XGBoost algorithm involves the creation and addition of decision trees sequentially, each constructed based on the information from all previously built trees. Shallow trees often yield poor performance because they capture few details of the problem and are generally referred to as weak learners. Deeper trees might capture too many details of the problem and overfit the training dataset, limiting the ability to make good predictions on new data. Thus, it is important to tune the number and size of decision trees with XGBoost. Additionally, the hyperparameter learning rate also needs to be tuned to prevent the model from quickly fitting and then overfitting the training dataset. It scales the newly added weights to reduce the influence of each individual tree and leaves space for future trees to improve the model. In the current work, we firstly selected multiple combinations of the three parameters within reasonable ranges. For the number of trees, we evaluated a series of values from 600 to 1200 with a step size of 100 for the regression model and from 50 to 350 with a step size of 50 for the classification model. Four different values (2,4,6,8) of the size of trees are selected. The learning rate is varied on a log10 scale from 0.0001 to 0.1. Then, the cross-validation is performed on the training dataset with different parameter combinations and the one produces the minimum mean square error on the validation dataset is selected and used in the final models.
In this work, the machine learning models are trained on python 3.6.1 and the Scikit-learn 0.18.1 [34] and XGBoost 0.6 [21] are imported to support XGBoost learning models.

Data Sets
Data preparation is the first step to build the predictive model. In the current work, the models were trained, validated and tested on a data base of 1736 samples, consisting of high fidelity simulation data (DDES) and RANS results. The RANS data, obtained by applying the ice accretion modeling solver developed by the author in the previous work [12], are used to constitute the major part (around 90%) of the database. The DDES data are mainly used for comparison purpose when evaluating the model's capability in predicting the icing severity results. The DDES results are obtained by applying the author's newly developed icing solver [25] and described in the next section. 217 sets of meteorological variables (LWC, droplet diameter) are selected from their distribution profile at −22 F • shown in Figures 1 and 2 Tables 2 and 3. From the set of full generated samples, 1215 samples are selected for the training and 521 samples for the testing. Since testing and training samples are partitioned randomly from the set of generated samples following same design principle, the testing and training data sets have the same population distribution.

Error Analysis
To evaluate the performance of the developed models to predict the icing area and maximum ice thickness, multiple error analysis measures were performed. For predicting the icing severity level, the model's performance evaluation is introduced in the next section.

RMSE
The root mean squared error (RMSE) was calculated for the regression model. It presents an estimate of by how much each predicted outcome will deviate from its true value.

R 2
The R 2 or Coefficient of determination is another measure for the regression model which presents how well the model can replicate the true outcomes. Sufficient agreement is achieved when R 2 ≥ 0.85 [23].
whereȲ is the mean of the true outcomes.

Distribution of Errors
The errors are calculated as fractions given by the value of residuals divided by the true outcomes. The histograms are constructed to examine the nature of the error distributions. It is the most informative measure of prediction model performance.

Numerical Simulation Results
For the DDES computation, the baseline topology of the grid is a C-type structured grid as shown in Figure 3. The far-field boundaries are located 50 chord lengths (c) away from the airfoil to diminish the influence of the boundary condition and the spanwise domain size is 0.4c. The first layer heights is 0.8e − 5c. The grid details are presented in Table 4, in which N x , N y , and N z are the numbers of grid points along the circumference direction, the wall-normal direction, and the spanwise direction, respectively, and the N s represents the number of grid points on the airfoil surface. Since the main focus of the current work is the development of the icing severity prediction model, the instantaneous flow field will not be discussed in this paper.  Figure 3. Computational grid for NACA0012 airfoil.
To carry out the ice accretion simulation, the meteorological factors LWC and droplet diameter is set to be 1.0 g/m 3 and 20 µm, respectively. The exposure time is 360 s. For validation purposes, Figure 4 shows the ice shape comparison between the simulated results in this paper, experimental data [33], and simulated results from Cao et al. [10]. From the comparison plot, a good agreement is obtained. It can be seen that the ice height is higher in the lower region of the airfoil. This is because the airfoil is at 4 • angles of attack, the main impingement region is at the lower surface.

Aircraft Icing Severity Evaluation Based on XGBoost
In this section, the building process of the aircraft icing severity evaluation model is described, and then the prediction results are studied and error analysis are conducted to demonstrate the effectiveness of the proposed approach. Three icing conditions, that is, LWC, droplet diameter and exposure time are given to the model. We consider three icing severity features to demonstrate the applicability of the model: icing area, maximum ice thickness and icing severity level, and then the effects of the icing conditions on the three icing severity features are studied. Chang et al. [35] suggest that using both the separated-specimen method and the whole-set method to conduct the simulation to comprehensively evaluate the model's performance. The separated-specimen method is used to divide the database into the CMI condition and the IMI condition. For each feature, we explore the effect of training dataset type on the model performance by carrying out the prediction for each dataset. A schematic of the proposed workflow is shown in Figure 5.

Icing Area and Maximum Ice Thickness Evaluation
In this case, the model is used to predict the area size covered by ice and the maximum ice thickness on the airfoil . The performance of the model is summarized in Tables 5 and 6 for the three datasets: CMI dataset, IMI dataset and the whole dataset. It can be observed that the larger training set yields lower prediction errors. More importantly, the fact that the model yields the best performance on the whole dataset indicates that the model is capable of differentiating the CMI and IMI condition when evaluating the icing area and maximum ice thickness. The CPU time is recorded as the training time. The tests are performed on the 3.5 GHz Intel Core i7 processor in serial. To further demonstrate the effectiveness of the model, we present the comparison between the true results and predicted results in the test dataset in Figures 6 and 7. The scatter plot is the true icing area versus the predicted icing area and true maximum ice thickness versus the predicted maximum ice thickness, and the black line with unit slope represents a perfect prediction. They indicate that the sufficient agreement is achieved between the predicted and true result.
The Figures 8 and 9 present the histograms of the test errors to examine the nature of the error distributions. For predicting the icing area, the median error is 0.0014 in CMI condition and 0.0006 in IMI condition. For predicting maximum ice thickness, the median error is 0.0420 in CMI condition and 0.0400 in IMI condition. All of them are lower than 0.0500, which is deemed satisfactory [23]. Figure 10 presents the recreated icing area and maximum ice thickness corresponding to the LWC and droplet diameter distribution shown in Figures 1 and 2 using the predictive model. It can be observed that in both conditions, the model's results have good agreement with the DDES results. In the CMI condition, the RANS method tends to underpredict the icing area and the maximum ice thickness as well. Since the model is trained on the datasets comprises not only RANS data but DDES data as well, the model is able to learn an average result. The IMI condition represents the icing condition occurring in the cumuliform clouds which is generally unstable relative to the stratiform clouds. Furthermore, due to the complexity and unsteadiness of the ice accretion process itself, indeed, the icing results shown in Figure 10b shows more strong oscillations. However, it should be noted that the icing area and maximum ice thickness oscillates strongly but with little changes for small droplets (less than 15 µm). Those small droplets do not contribute much to icing. Overall, it can be concluded that fairly good agreement is achieved for the model.
In both conditions, the larger the water droplets, the thicker the ice layer, the greater impact on the aircraft safety. By comparing Figure 10a,b, we can notice that in general, the maximum ice thickness in IMI condition is higher than it is in CMI condition. This is because that CMI condition represents the occurrence of icing in Stratiform cloud which usually has lower liquid water content than cumuliform cloud. The LWC represents the amount of supercooled water droplets that an aircraft can hit in a given air mass, thus, the thickness of ice increases with the increase of LWC and cause greater damage to the aerodynamic performance.    Figure 11 shows the effect of exposure time on the icing area with the median value of LWC and droplet diameter. The predicted icing area and the DDES results are in good agreement. As expected, in both conditions, the icing area increases with the increase of exposure time. It should be noticed that the icing area growth ratio are increasing with time, which means that the longer a flight stays in the icing cloud, the faster the icing area increases. This observation might be attributed to the runback water effect [11]. The conductive heat transfer plays a dominant role in the ice accretion process, thus, it can be fairly assumed that in the early icing stage where very thin ice layer created, all the water droplets gets frozen immediately upon the impact on the aircraft surface and there exists no overflow water. The conductive heat loss gets weaker as the ice layer thickness increases and the overflow water first appears as the ice layer has grown to a certain extent which is referred to as critical ice thickness [11]. Since the overflow water is mainly driven by the air/water interface friction, it moves closely follows the wall air streamlines [36]. Hence, it is appropriate to consider that the overflow water might contribute to the icing area increment. Figure 12 shows that the built model can successfully capture the maximum ice thickness with exposure time with the median value of LWC and droplet diameter. Intuitively, the longer the aircraft stays in icing condition, the thicker the ice layer, which is also confirmed by Figure 12. Additionally, it is worth mentioning that thickness growth ratio is decreasing with time. Again, it can be accounted for by the runback water effect. As the ice layer grows to the critical ice thickness, only part of the water droplet gets frozen, which slows down the maximum ice thickness growing speed.
A benefit of using ensembles of decision tree methods like XGBoost is that they can provide estimates of feature importance from a trained predictive model. Generally, scores are generated to indicate how useful or valuable each feature was in the construction of the boosted decision trees within the model. In this case, we output the feature importance to rank the importance of the three input parameters to the icing severity results. The scores are summarized in the Figure 13 where D represents the droplet diameter. We can see that the droplet diameter has the highest importance and the LWC has the lowest importance with regards to icing area. In terms of maximum ice thickness, droplet diameter has the highest importance, the exposure time and LWC have a comparable level of importance.
(a) CMI condition (b) IMI condition Figure 11. Effect of exposure time on icing area.

Icing Severity Level Determination
For predicting the icing severity levels, the model is quantitatively evaluated by using several model evaluation indicators, such as precision, recall rate, F1 score and confusion matrix. The binary classification problem is taken as an example to explain the definitions of these measures. For each icing severity level prediction, the dataset are divided into true positive (TP), false positive (FP), true negative (TN) and false negative (FN) according to the combination of its real category and machine learning model prediction category.

Precision
Precision indicates the proportion of the positive prediction which was correct. It is defined as the number of true positives (TP) over the number of true positives plus the number of false positives (FP).

Recall Rate
Recall Rate indicates the proportion of actual positives was identified correctly. It is defined as the number of true positives (TP) over the number of true positives plus the number of false negatives (FN).
3. F1 Score F1 Score relates precision and recall and is defined as the harmonic mean of them.

Confusion Matrix
The confusion matrix [37] is used for evaluating the model when faced with a multi-classification problem. In the current work, it is applied to measure the performance of the model predicting the icing severity level. Each row of the matrix represents the predicted category while each column represents the actual category, which allows more detailed analysis than mere proportion of correct classifications.
The above mentioned evaluation indicators are summarized in Figure 14. It can be seen that in the confusion matrix, the diagonal values are much higher than the non-diagonal value. The precision and recall rate for the four categories are all above 95%. It shows that the XGBoost model has high accuracy when predicting icing severity level. It should be noticed that there are three extreme error cases: two severe predictions instead of light and one light prediction instead of severe. By looking into the error samples, the two severe predictions happens at 30 min exposure time. The corresponding LWC and droplet diameter indicate the icing severity level should be light though. The light prediction happens at 15 min exposure time. The feature importance shown in Figure 15 indicates that the model put much more weight on exposure time than LWC. Thus, the wrong severe prediction might be caused by the large exposure time. The feature importance is reported in Figure 15, from which we can see that the droplet diameter and the exposure time have the highest importance and the LWC has the lowest importance with regards to icing severity level prediction. The noticeable difference between Figures 13b and 15 might be attributed to model's training process. Although the maximum ice thickness and the icing severity level have the similar physical meaning, the model might adopt different weights during the training process to generate the satisfied results for continuous variable and ordinal variable.

Conclusions
In this study, we proposed an approach to provide quick evaluations of aircraft icing severity at different flight conditions based on the machine learning model XGBoost. In its application to predict icing area, maximum ice thickness and icing severity level, well-defined performance measures are carried out through the training and testing process. It was found that the new method shows promising performance in different types of icing conditions. Due to its small computational resources requirement, fast performance and reasonable accuracy, the new method can provide an attractive alternative to traditional numerical simulation approach. Additionally, its ability to generate the feature importance can help to study the effect of different aerodynamic and meteorological factors on the icing severity results. The limitation found on the use of the current method is that the range of predictions is limited to the input data range. For example, it will not be able to predict icing severity precisely for different wings.
The proposed method has the potential to be coupled with other ice protection systems to further increase the flight safety. Additionally, coupling the proposed method with other computational fluid dynamics codes to estimate the degradation of the aircraft performance is currently being explored. If successful, a short-term application of this study would be to use the trained model to obtain a more comprehensive 'off-line' mapping of ice severity that incorporates a larger data-set of meteorological data and flight conditions. A long term application would be to use the trained model in a on-board software to help the pilot determine 'on-line' the severity as function of current conditions.