Surrogate Models for Performance Prediction of Axial Compressors Using through-Flow Approach

Abstract: Two-dimensional design and analysis on the meridional surface, which is important in the preliminary design procedure of compressors, is highly dependent on the accuracy of empirical models, such as those predicting total pressure loss and flow turning angle. Most of the widely used models are derived or improved from experimental data of some specific cascades with low-loading and low-speed airfoil types. These models may work for most conventional compressors but are incapable of accurately estimating the performance for some specific cases such as transonic compressors. The errors made by these models may mislead the final design results. Therefore, surrogate models are developed in this work to reduce the errors and replace the conventional empirical models in the through-flow calculation procedure. A group of experimental data from a two-stage transonic compressor is used to generate the airfoil database for training the surrogate models. Sensitivity analysis is applied to select the most influential features. Two supervised learning approaches, support vector regression (SVR) and Gaussian process regression (GPR), are used to train the models, with a Bayesian optimization algorithm to obtain the optimal hyper parameters. The trained models are integrated into a through-flow code based on the streamline curvature method (SLC) to predict the overall performance and internal flow field of the transonic compressor on five rotational speed lines for validation. The predictions are compared with the experimental data and with the results of conventional empirical models. The comparison shows that SVR and GPR reduce the prediction error of the empirical models by 62.2% and 55.2%, respectively, for the total pressure ratio and by 48.4% and 50.1% for adiabatic efficiency on average. This suggests that the surrogate models constitute an alternative way to predict the performance of airfoils in through-flow calculation where empirical models are inadequate.


Introduction
With the development of computation technology, the 3D computational fluid dynamics (CFD) approach has become a mainstream numerical simulation tool for compressors. However, during the preliminary design period, the 2D through-flow approach still plays an important role in the design and analysis of compressors. Unlike the comprehensive but time-consuming 3D simulation, 2D approaches provide rapid and reliable estimates of compressor performance from a few geometric parameters, before the geometry of the blades is specified, and direct the subsequent steps. The popular through-flow approaches, including the streamline curvature method, the stream function matrix method, and the time marching method, solve the simplified Navier-Stokes equations on the median flow surface between adjacent blades under reasonable assumptions. The accuracy of these methods is determined by the empirical correlations [1] that close the governing equations. In general, the empirical correlations originate from qualitative boundary-layer theory and experimental data of some classical cascades, some of which are listed in [2]. They are valid only over a limited operating range, and their accuracy becomes unreliable when they are applied to off-design conditions or high Mach number cases. For this reason, many efforts have been devoted to improving the accuracy and application scope of the empirical correlations [3][4][5]. Besides the improvements to the traditional correlations, Schmitz et al. proposed using surrogate models to predict the performance of airfoils [6], which inspired this work. They generated the airfoil database using the S1 solver MISES (a system for cascade analysis and design developed by MIT [7]), built the surrogate models with Bayesian neural networks, and integrated the trained networks into an S2 solver to evaluate a transonic 4.5-stage compressor [8].
In addition to modeling airfoil characteristics, the surrogate model technique has been widely used in other works: nonlinear unsteady aerodynamic reduced-order models have been developed for aeroelastic analysis of airfoils [9]; a convolutional neural network (CNN) has been used to approximate high-fidelity CFD models and predict the properties of turbine vanes efficiently and accurately [10]; and an adaptive machine learning framework, including dynamic selection and self-tuning of surrogate models, has been developed to accelerate the optimization process of transonic compressors [11].
Surrogate models based on machine learning methods can handle nonlinear problems such as those arising in through-flow calculations. This work aims to build surrogate models from experimental data of a certain transonic axial compressor to calculate the total pressure loss and deviation angle of airfoils, instead of using conventional empirical models. The feasibility of predicting the performance and internal flow field by integrating the surrogate models with a through-flow program is also validated.
In brief, the procedure of building the surrogate models contains the following steps. A Kriging model is used to approximate the experimental data with all input parameters. Then, the most influential inputs are selected based on the sensitivity analysis results of the Kriging model. The SVR or GPR models are trained with the selected inputs while the hyper parameters of the surrogate models are tuned by the Bayesian optimization algorithm. The experimental data is split into training and validation sets via a cross validation scheme, and the score on the validation set is used as the optimization target during the optimization procedure. The final surrogate models are obtained by training the models on the whole data set with the optimal hyper parameters.

Blade Element Database
A reliable database containing the necessary parameters of blade elements is an essential prerequisite for accurate models. Empirical models are commonly derived from experimental data of 2D cascades within design conditions. Additional correction correlations are required when empirical models are applied to estimate the performance of compressors, since the internal flow field is more complex than in cascades. However, this correction process inevitably introduces errors into the target values, which may mislead the prediction of overall performance. Thus, a set of experimental data from a real compressor is used directly to build the surrogate models. The experimental data are extracted from a two-stage axial transonic compressor developed by the NASA Lewis Research Center. The design parameters are briefly listed in Table 1, and further details can be found in [12]. The experimental rotation speed and mass flow rate are corrected to the standard inlet condition at sea level by Equation (1). This compressor has supersonic inflow in the tip zones of the rotors, where channel shock waves exist, and a subsonic flow field in the other zones. The airfoils of both rotors and stators were designed by the multiple-circular-arc (MCA) rule, which represents the design concept of most other transonic compressors developed in that era. The schematic of the test rig of this compressor is shown in Figure 1, and the other configuration information of the experiment can be found in [13]. Nine probes are distributed non-uniformly along the span of each blade row from 0.05 to 0.95 of the span length, so each test case contains 36 sets of blade element data (9 spanwise stations for each of the 4 blade rows), including measured aerodynamic parameters. In total, 35 test cases were conducted with rotational speed varying from 50% to 112%, and 25 of them have detailed blade element data in the report.
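Equation (1) is not reproduced in this text. As a hedged illustration only, the conventional correction of speed and mass flow to sea-level standard inlet conditions can be sketched as below. The reference values (288.15 K, 101,325 Pa), the function names, and the argument names are assumptions for this sketch and are not taken from the test report.

```python
import math

# Assumed sea-level standard reference conditions (not quoted from the source).
T_STD = 288.15    # K
P_STD = 101325.0  # Pa


def corrected_speed(speed, t_inlet):
    """Rotational speed referred to standard inlet temperature: N / sqrt(theta)."""
    theta = t_inlet / T_STD
    return speed / math.sqrt(theta)


def corrected_mass_flow(mass_flow, t_inlet, p_inlet):
    """Mass flow referred to standard inlet conditions: m * sqrt(theta) / delta."""
    theta = t_inlet / T_STD
    delta = p_inlet / P_STD
    return mass_flow * math.sqrt(theta) / delta
```

At the reference conditions both functions return their inputs unchanged; a warmer inlet lowers the corrected speed, and a lower inlet pressure raises the corrected mass flow.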
The 900 measurements of blade element data from the 25 test cases (25 × 36) are used to construct the training samples for the next section. Based on the features of most empirical models, each sample includes 6 aerodynamic and 7 geometric parameters as input variables. The structure of the surrogate models is shown in Figure 2, in which the geometric and aerodynamic parameters are defined as follows: SPAN represents the fraction of blade height from the hub; STAGGER is the angle between the chord line and the axial direction; SOLIDITY is the ratio of chord length to pitch; T/C is the ratio of maximum thickness to chord length; A/C is the fractional distance of the maximum thickness location from the leading edge; CAM is the camber angle; LET/C is the ratio of the leading edge thickness to chord length; INC is the incidence angle between the inflow angle and the inlet metal angle; B1 is the inflow angle; M1 is the inflow Mach number; DF is the diffusion factor, which represents the loading of the cascade; MVDR is the meridional velocity-density ratio $\rho_2 V_{m2} / (\rho_1 V_{m1})$, where 1 and 2 denote the inlet and outlet, respectively; REYNOLD is the Reynolds number of the cascade. The total pressure loss $\omega$ and deviation angle $\delta$ are the outputs of the surrogate models. Figure 3 depicts the distribution of deviation angle and total pressure loss over incidence angle and Mach number for the 900 samples in the blade element database. The parameters cover a wide working range of typical transonic compressors.

Method for Building Surrogate Models
The steps for building the surrogate models for total pressure loss and deviation angle (flowchart shown in Figure 4) are as follows:
1. Pre-process the experimental data: A Kriging approach is implemented to create a response surface model based on the blade element database with all the features as input parameters. A sensitivity analysis is applied to the response surface to estimate the elementary effect of each input feature. The features with the greatest effect on each target are selected as input parameters for training either the total pressure loss model or the deviation angle model.
2. Train models and tune hyper parameters: Support vector regression and Gaussian process regression are each used to train the surrogate models with the input parameters determined in Step 1. The optimal models are obtained by tuning the hyper parameters of these models using a Bayesian optimization approach. The coefficient of determination ($R^2$) of the cross-validation results is used as the criterion to evaluate the models during the optimization process.
3. Post-process with validation: Validate the final models using the whole experimental database.

Features Selection
The two targets, total pressure loss and deviation angle, differ both in their mapping relationships to the input features and in the elementary effects of those features. Non-influential variables selected as inputs may bring noise into the models and mislead the prediction. Therefore, a sensitivity analysis is required to select the features that have important effects on each target before training the surrogate models. The elementary effects (EE) method proposed by Morris [14] is applied here to select the necessary input parameters for each of the two models. To illustrate the EE method, let the model have the form of Equation (2) with k input factors, where X represents the geometric and aerodynamic parameters and Y the total pressure loss or deviation angle:
$$Y = f(X_1, X_2, \ldots, X_k) \tag{2}$$
Morris provides two sensitivity measures for each input factor: $\mu$ to assess the importance of the input factor on the output, and $\sigma$ to describe the non-linear effects and interactions. These two measures are obtained from a design based on the construction of a series of trajectories in the space of the inputs, where inputs are randomly moved one-at-a-time (OAT). In this design, each model input is assumed to vary across p selected levels in the space of the input factors. The region of space $\Omega$ is thus a k-dimensional p-level grid. Each trajectory is composed of k + 1 points, since the input factors move one at a time by a step $\Delta$ between the levels $\{0, 1/(p-1), 2/(p-1), \ldots, 1\}$ while all others remain fixed. Along each trajectory, the elementary effect of the i-th input factor is defined as:
$$d_i(X) = \frac{Y(X_1, \ldots, X_{i-1}, X_i + \Delta, X_{i+1}, \ldots, X_k) - Y(X)}{\Delta}$$
where $X = (X_1, X_2, \ldots, X_k)$ is any selected value in $\Omega$ such that the transformed point is still in $\Omega$. Further, r elementary effects $d_i(X^{(1)}), d_i(X^{(2)}), \ldots, d_i(X^{(r)})$ are estimated for each input by randomly sampling r points $X^{(1)}, X^{(2)}, \ldots, X^{(r)}$. The two measures $\mu$ and $\sigma$ are, respectively, the mean and standard deviation of the distribution of the elementary effects of each input:
$$\mu_i = \frac{1}{r} \sum_{j=1}^{r} d_i\left(X^{(j)}\right), \qquad \sigma_i = \sqrt{\frac{1}{r-1} \sum_{j=1}^{r} \left( d_i\left(X^{(j)}\right) - \mu_i \right)^2}$$
These two measures rank the input factors in order of influence on the target. An input variable with both low $\mu_i$ and low $\sigma_i$ can be confirmed as a non-influential factor. However, a low $\mu_i$ may also occur when the model is not monotonic and elementary effects of opposite sign cancel each other out. An improved method was therefore proposed by Campolongo [15], which uses the absolute value of $d_i(X^{(j)})$ to calculate $\mu$:
$$\mu_i^{*} = \frac{1}{r} \sum_{j=1}^{r} \left| d_i\left(X^{(j)}\right) \right|$$
Energies 2020, 13, 169
Since the exact form of the mapping function is unknown, a Kriging approach [16] is implemented to create a response surface model from the blade element database, with all the geometric and aerodynamic factors as inputs. The optimal model for sensitivity analysis is obtained by tuning the hyper parameters using the optimization algorithm described in the next section.
For this case, k is 13 while r and p are set to 1000 and 4, giving 1000 trajectories with 4 levels on each. The distribution of the elementary effects of each input factor is shown in Figure 5, ranked by importance. The results indicate that the influential features are not the same for the two targets. The top-ranked factors whose elementary effects together account for more than 90% of the total are selected as inputs. The final selected inputs for the models of total pressure loss and deviation angle are listed in Table 2.
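The screening procedure described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation used in this work: inputs are scaled to [0, 1], every step is taken in the positive direction with the common choice $\Delta = p / (2(p-1))$, the 90% cut is applied to $\mu^*$ (the text does not say which measure is summed), and `toy_model` is a hypothetical stand-in for the Kriging response surface.

```python
import random
import statistics


def morris_screening(model, k, r=50, p=4, seed=0):
    """Return (mu, mu_star, sigma) of the elementary effects of k inputs in [0, 1]."""
    rng = random.Random(seed)
    delta = p / (2.0 * (p - 1))  # assumed step size
    # Base levels from which a +delta step stays inside [0, 1].
    base_levels = [i / (p - 1) for i in range(p)
                   if i / (p - 1) + delta <= 1.0 + 1e-12]
    effects = [[] for _ in range(k)]
    for _ in range(r):  # one trajectory of k + 1 points per loop
        x = [rng.choice(base_levels) for _ in range(k)]
        y = model(x)
        order = list(range(k))
        rng.shuffle(order)  # move the inputs one at a time in random order
        for i in order:
            x_next = list(x)
            x_next[i] += delta
            y_next = model(x_next)
            effects[i].append((y_next - y) / delta)
            x, y = x_next, y_next
    mu = [statistics.mean(e) for e in effects]
    mu_star = [statistics.mean([abs(d) for d in e]) for e in effects]
    sigma = [statistics.stdev(e) for e in effects]
    return mu, mu_star, sigma


def select_inputs(mu_star, share=0.9):
    """Keep the top-ranked inputs until their mu_star sum reaches `share` of the total."""
    total = sum(mu_star)
    ranked = sorted(range(len(mu_star)), key=lambda i: mu_star[i], reverse=True)
    selected, running = [], 0.0
    for i in ranked:
        selected.append(i)
        running += mu_star[i]
        if running >= share * total:
            break
    return selected


def toy_model(x):
    """Hypothetical response: linear in x[0], non-linear in x[1], independent of x[2]."""
    return 2.0 * x[0] + x[1] ** 2
```

For `toy_model`, the first input shows a constant elementary effect (high $\mu$, zero $\sigma$), the second a non-zero $\sigma$ because of its non-linearity, and the third is identified as non-influential and dropped by `select_inputs`.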

Hyper Parameters Optimization
Besides reliable samples, the performance of surrogate models is strongly related to the configuration of the internal hyper parameters, such as learning rates, kernel functions, or regularization factors. The hyper parameters must therefore be tuned for the models to perform optimally, and various search strategies have been developed to find the best values. For cases with just a few hyper parameters, experienced experts can find good values by manual search, but this depends greatly on prior knowledge. The most common automatic search strategies are grid search and random search. Grid search simply evaluates user-defined points. It may miss the optimal value if too few points are specified, or consume excessive resources if every possible combination of hyper parameters is considered, especially for complicated cases. Random search [17] was developed to overcome the drawbacks of grid search and has been used successfully in many applications. It requires fewer samples and fewer computational resources than grid search and is more efficient. Both grid and random search are uninformed search algorithms: each evaluation is independent of the others, so previous results cannot guide the next evaluation. Because no information about the target is needed, these algorithms can be applied to a variety of search problems, but at the cost of search efficiency. Therefore, an informed search algorithm using Bayesian optimization, which has higher search efficiency [18] (shown in Figure 6, where the dashed line denotes the hand-tuned result), is used to optimize the hyper parameters.

Bayesian optimization, based on the sequential model-based global optimization (SMBO) algorithm [18,19], which uses information from previous trials to guide future exploration, has been applied in modeling cases where the evaluation of the fitness function is expensive [20,21]. The true fitness function of the minimization problem, $f: \chi \to \mathbb{R}$ on the domain $\chi \subset \mathbb{R}^d$, is considered in order to find the optimal hyper parameters $x^*$. Since the true form of $f$ is unknown, SMBO approximates it with a surrogate model, usually a Gaussian Process (GP) or a Tree-structured Parzen Estimator (TPE). The schematic of the SMBO algorithm is shown in Algorithm 1.

Algorithm 1
The pseudo-code of generic sequential model-based optimization.
Typically, a small set of samples from $\chi$ initializes the probabilistic regression model (or prior function) $M$ as a cheap surrogate for the expensive fitness function $f$. Then a new point $x_i$ is obtained by optimizing the acquisition function $S$. The corresponding value $y_i$ is obtained by evaluating $f$ directly, and the pair is appended to the evaluation history $D = \{(x_1, y_1), \ldots, (x_i, y_i)\}$. The model $M$ is refitted to the updated record $D$, and these steps are repeated $T$ times. The criterion of expected improvement (EI) [22] is chosen as $S$ because it is intuitive and performs well. Expected improvement is the expectation, under the model $M$, of the amount by which $f(x)$ falls below a specified threshold $y^*$:
$$EI_{y^*}(x) = \int_{-\infty}^{\infty} \max(y^* - y, 0)\, p(y \mid x)\, dy \tag{8}$$
The improvement $\max(y^* - y, 0)$ is always non-negative: it is zero when $y$ is greater than $y^*$ and positive when $y$ is less than $y^*$, so a smaller $y$ corresponds to a larger improvement. Herein the higher-budget TPE is selected to approximate $f$. Unlike the GP approach, which models $p(y \mid x)$ directly, the TPE defines $p(x \mid y)$ using two non-parametric densities:
$$p(x \mid y) = \begin{cases} l(x), & y < y^* \\ g(x), & y \ge y^* \end{cases} \tag{9}$$
From Bayes' theorem, we get:
$$EI_{y^*}(x) = \int_{-\infty}^{y^*} (y^* - y)\, p(y \mid x)\, dy = \int_{-\infty}^{y^*} (y^* - y)\, \frac{p(x \mid y)\, p(y)}{p(x)}\, dy \tag{10}$$
Considering $\gamma = p(y < y^*)$, $p(x)$ can be expressed as follows:
$$p(x) = \int_{\mathbb{R}} p(x \mid y)\, p(y)\, dy = \gamma\, l(x) + (1 - \gamma)\, g(x) \tag{11}$$
Putting Equations (9)-(11) into (8), the EI based on the TPE algorithm is given by:
$$EI_{y^*}(x) = \frac{\gamma\, y^*\, l(x) - l(x) \int_{-\infty}^{y^*} y\, p(y)\, dy}{\gamma\, l(x) + (1 - \gamma)\, g(x)} \propto \left( \gamma + \frac{g(x)}{l(x)} (1 - \gamma) \right)^{-1}$$
This expression indicates that an $x$ with high probability under $l(x)$ and low probability under $g(x)$ yields the maximum EI. The $x$ which gives the greatest EI is taken as the next point $x^*$, and the optimal hyper parameters are the best $x^*$ in the observation history $D$. The open source optimization package Hyperopt developed by Bergstra et al. [18] is used as the optimizer for the hyper parameters.
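The SMBO loop with a TPE-style acquisition can be illustrated with the following one-dimensional sketch. It is a deliberately simplified stand-in and not Hyperopt's implementation: the densities $l(x)$ and $g(x)$ are plain Gaussian kernel density estimates with a fixed bandwidth, the threshold $y^*$ is taken as the $\gamma$-quantile of the history, and the function and parameter names (`tpe_minimize`, `n_candidates`, and so on) are assumptions made for this example.

```python
import math
import random


def kde(points, x, bandwidth):
    """Gaussian kernel density estimate of `points`, evaluated at x."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm


def tpe_minimize(f, low, high, n_init=10, n_iter=40, gamma=0.25,
                 n_candidates=50, seed=0):
    """Minimize f on [low, high]; return (best (x, y) pair, evaluation history)."""
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0  # fixed bandwidth, a simplifying assumption
    history = []
    for _ in range(n_init):  # initial random design
        x = rng.uniform(low, high)
        history.append((x, f(x)))
    for _ in range(n_iter):
        ranked = sorted(history, key=lambda pair: pair[1])
        n_good = max(1, math.ceil(gamma * len(ranked)))
        good = [x for x, _ in ranked[:n_good]]  # samples with y < y*, model l(x)
        bad = [x for x, _ in ranked[n_good:]]   # samples with y >= y*, model g(x)
        # Draw candidates from l(x): perturb a random good point, clip to bounds.
        candidates = [min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                      for _ in range(n_candidates)]
        # Maximizing l(x) / g(x) maximizes the TPE expected improvement.
        x_next = max(candidates,
                     key=lambda x: kde(good, x, bandwidth)
                     / (kde(bad, x, bandwidth) + 1e-300))
        history.append((x_next, f(x_next)))
    best = min(history, key=lambda pair: pair[1])
    return best, history
```

A call such as `tpe_minimize(lambda x: (x - 2.0) ** 2, -5.0, 5.0)` runs 10 random evaluations followed by 40 informed ones. Each new point is chosen using all previous results, which is the property that distinguishes this informed search from grid or random search.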

Modeling by Support Vector Regression
Support vector machine (SVM) [23] is a type of supervised learning method usually used for classification (SVC) [24] or regression (SVR) [25,26]. Its efficiency in high dimensional spaces and its ability to use only a subset of the samples (support vectors) in the decision function have allowed wide application of SVM to real-world problems. With the kernel trick, a kernel function implicitly maps the original sample space to a higher or infinite dimensional (Hilbert) space, so SVM can handle nonlinear sample data with a linear method (the optimal hyperplane) in that space, while avoiding the curse of dimensionality and without complicating the calculation. The performance of SVM can be adjusted flexibly by choosing a kernel function suited to the specific problem. Kernel functions such as linear, polynomial, RBF, and sigmoid are commonly used. In this work RBF is chosen as the kernel function, with the form:
$$K(x, x') = \exp\left(-\gamma \left\| x - x' \right\|^2\right)$$
The hyper parameter $\gamma$, which sets the range of influence of the RBF function, is optimized by SMBO. It acts on the Euclidean distance $\|x - x'\|$ between $x$ and $x'$. For the given training vectors $x_i \in \mathbb{R}^p$, $i = 1, \ldots, n$, and the target vector $y \in \mathbb{R}^n$, the following SVR problem needs to be solved, where $\rho$ is the intercept and $\zeta_i$, $\zeta_i^*$ are slack variables:
$$\min_{w, \rho, \zeta, \zeta^*} \ \frac{1}{2} w^T w + C \sum_{i=1}^{n} (\zeta_i + \zeta_i^*)$$
$$\text{subject to} \quad y_i - w^T \phi(x_i) - \rho \le \varepsilon + \zeta_i, \quad w^T \phi(x_i) + \rho - y_i \le \varepsilon + \zeta_i^*, \quad \zeta_i, \zeta_i^* \ge 0, \quad i = 1, \ldots, n$$
This is a convex quadratic programming problem with linear constraints. An equivalent dual formulation, which simplifies the calculation, is obtained by introducing Lagrange multipliers $\alpha$ and $\alpha^*$ based on Lagrange duality. The dual problem is:
$$\min_{\alpha, \alpha^*} \ \frac{1}{2} (\alpha - \alpha^*)^T Q (\alpha - \alpha^*) + \varepsilon e^T (\alpha + \alpha^*) - y^T (\alpha - \alpha^*)$$
$$\text{subject to} \quad e^T (\alpha - \alpha^*) = 0, \quad 0 \le \alpha_i, \alpha_i^* \le C, \quad i = 1, \ldots, n$$
where $e$ is the vector of all ones, and the penalty coefficient $C$ and the margin $\varepsilon$ are the other two hyper parameters requiring optimization. $C$ controls the regularization that guards against over-fitting, and $\varepsilon$ is the margin within which noise is tolerated. $Q$ is an $n \times n$ positive semi-definite matrix given by:
$$Q_{ij} = K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$$
where $K$ is the RBF kernel function, $\phi$ is the mapping to the high dimensional space, and $n$ is the number of samples.
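This dual optimization problem can be solved by the sequential minimal optimization (SMO) algorithm [27], whose structure is briefly shown in Algorithm 2. The decision function (the formulation of the surrogate models) is as follows, where $\alpha_i$ and $\alpha_i^*$ are the Lagrange multipliers of the dual problem and $\rho$ is the intercept of the model:
$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, K(x_i, x) + \rho$$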
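To make the decision function concrete, the following minimal sketch evaluates an RBF-kernel SVR prediction from already-solved dual coefficients. It does not solve the dual problem itself; the function names and the argument `dual_coefs` (holding the differences $\alpha_i - \alpha_i^*$ for the support vectors) are assumptions for this illustration.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svr_decision(x, support_vectors, dual_coefs, rho, gamma):
    """Evaluate sum_i (alpha_i - alpha_i^*) K(x_i, x) + rho over the support vectors."""
    weighted = sum(c * rbf_kernel(sv, x, gamma)
                   for sv, c in zip(support_vectors, dual_coefs))
    return weighted + rho
```

Only samples with a non-zero coefficient contribute to the sum, which is why the trained model needs to store just the support vectors. A larger `gamma` makes each support vector's influence decay faster with distance, so far from all support vectors the prediction tends to `rho`.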
During this modeling task, the SMBO algorithm searches the domain of hyper parameters (first 3 columns in Table 3) for 100 iteration steps. Each hyper parameter is sampled as $e^{u}$, where $u$ is drawn from a uniform distribution on $[a, b]$.
Algorithm 2
Sequential minimal optimization (only the final steps are recoverable from the source): fix the other $\alpha$ and solve the optimization problem to get $\alpha_i$ and $\alpha_j$; express $\alpha_j$ by $\alpha_i$; end while.
The coefficient of determination $R^2$ is used as the score of the fitness function $f$:
$$R^2 = 1 - \frac{\sum_{i} (y_i - f_i)^2}{\sum_{i} (y_i - \bar{y})^2}$$
where $y_i$ is the true target value, $\bar{y}$ is the mean of the $y_i$, and $f_i$ is the value predicted by SVR. The target of the optimization is to find the set of hyper parameters that maximizes $R^2$, for which a high value means a good fit.
Over-fitting usually goes undetected if the models are validated on the same data used for training. On the other hand, if the data is simply split once into a training set and a validation set, the performance of the models may deteriorate, since the training set no longer covers the information in both parts. The cross validation (CV) method [28] is therefore adopted. The blade element database is split randomly into 5 sets of similar size; in each of 5 loops, one set is held out for validation and the other four are used for training. The mean of the 5 validation $R^2$ values is taken as the criterion for evaluating the models in this work. Figure 7 shows the hyper parameter optimization history for the SVR models of the deviation angle and total pressure loss. From the history of locations in Figure 7a,b, each hyper parameter has reached its optimal value (last 2 columns in Table 3) within the search domain, corresponding to the highest $R^2$ of the validation set. The optimization history curve in Figure 7c indicates that both models approached the optimal target within 40 iterations.
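The 5-fold scoring scheme can be sketched as follows. This is an illustrative outline under assumptions: `fit` is any hypothetical training routine that returns a prediction function, and the least-squares line in `fit_line` merely stands in for the SVR or GPR model so that the example stays self-contained.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of similar size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_r2(xs, ys, fit, k=5, seed=0):
    """Mean validation R^2 over k folds; fit(train_x, train_y) returns a predictor."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(r2_score([ys[i] for i in held_out],
                               [predict(xs[i]) for i in held_out]))
    return sum(scores) / k


def fit_line(xs, ys):
    """Stand-in model (assumption): ordinary least-squares line through 1-D data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)
```

In a hyper parameter search, the value returned by `cross_val_r2` would serve as the score that the optimizer tries to maximize for each candidate setting.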
The trained models are validated on the whole blade element data set, and the results are shown in Figure 8. The abscissa denotes the experimental data and the ordinate the predicted values. The deviation angle model has an $R^2$ very close to 1, higher than that of the total pressure loss model, for which $R^2 = 0.788$. It is observed that the loss model underestimates the total pressure loss in some cases with high values. A high total pressure loss indicates that the airfoil works in inferior flow conditions, such as near-stall or choke operation, which are far from the designed working region. For conditions in the normal working range, the loss model is deemed reliable. Furthermore, considering that the deviation angle has more importance than the total pressure loss in the through-flow code, the $R^2$ of the loss model is acceptable. More work on the surrogate model should be carried out in the future to improve the loss prediction.
Figure 8. Validation of SVR models on the whole data set: (a) deviation angle; (b) total pressure loss.

Modeling by Gaussian Process Regression
Similar to SVR, Gaussian process regression (GPR) [29] is another supervised learning method for regression and probabilistic classification problems. Its advantages are that the prediction interpolates the observations, that the prediction is probabilistic (Gaussian), so that it can be refined in regions of interest based on the empirical confidence intervals, and that various kernel functions can be used, as in SVR. However, GPR loses efficiency in high dimensional spaces.
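From the function-space view, a Gaussian process (GP) describes a distribution over functions, so that Bayesian inference is carried out directly in function space. A GP is a collection of stochastic variables, any finite number of which have a joint multivariate Gaussian distribution [29]. Its properties depend completely on the mean function $m(x)$ and the covariance function $k(x, x')$:

$$m(x) = \mathbb{E}[f(x)], \qquad k(x, x') = \mathbb{E}\big[(f(x) - m(x))(f(x') - m(x'))\big]$$

where $x, x' \in \mathbb{R}^d$ denote the input variables, and the GP can be defined as:

$$f(x) \sim \mathcal{GP}\big(m(x), k(x, x')\big)$$

For simplicity, the data are usually preprocessed so that the mean function is zero. For the regression problem, the generic model is:

$$y = f(x) + \varepsilon$$

where $x$ denotes the input vector, $f$ is the function value, and $y$ is the observed target value polluted by noise $\varepsilon$. Assuming that the noise follows a zero-mean normal distribution, $\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$, the prior distribution of $y$ and the joint prior distribution of $y$ and the prediction $f_*$ are, respectively:

$$y \sim \mathcal{N}\big(0, K(X, X) + \sigma_n^2 I_n\big)$$

$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} K(X, X) + \sigma_n^2 I_n & K(X, x_*) \\ K(x_*, X) & k(x_*, x_*) \end{bmatrix}\right)$$

In these formulas, $K(X, X)$ is an $n \times n$ symmetric positive definite covariance matrix composed of the elements $k_{i,j} = k(x_i, x_j)$, which measure the correlation between $x_i$ and $x_j$. $K(X, x_*) = K(x_*, X)^T$ is the $n \times 1$ covariance matrix between the test point $x_*$ and the training inputs $X$, $k(x_*, x_*)$ is the covariance of the test point with itself, and $I_n$ is the identity matrix. The posterior of $f_*$ can be calculated as:

$$f_* \mid X, y, x_* \sim \mathcal{N}\big(\bar{f}_*, \operatorname{cov}(f_*)\big)$$

where

$$\bar{f}_* = K(x_*, X)\big[K(X, X) + \sigma_n^2 I_n\big]^{-1} y$$

$$\operatorname{cov}(f_*) = k(x_*, x_*) - K(x_*, X)\big[K(X, X) + \sigma_n^2 I_n\big]^{-1} K(X, x_*)$$

are the mean and variance of the prediction corresponding to $x_*$, respectively. GPR allows different choices of covariance function (kernel), of which the most commonly used is the squared-exponential covariance function:

$$k(x, x') = \sigma_f^2 \exp\left(-\tfrac{1}{2}(x - x')^T M^{-1} (x - x')\right)$$

where $M = \operatorname{diag}(l^2)$ and $l$ is the so-called length-scale (analogous to $\gamma$ in the RBF kernel), while $\sigma_f^2$ is the signal variance. The hyper parameter set $\theta = \{l, \sigma_f^2, \sigma_n^2\}$ can be obtained by minimizing the negative log marginal likelihood $L(\theta) = -\log p(y \mid X, \theta)$ with gradient-based quasi-Newton methods:

$$L(\theta) = \tfrac{1}{2} y^T C^{-1} y + \tfrac{1}{2} \log |C| + \tfrac{n}{2} \log 2\pi$$

$$\frac{\partial L(\theta)}{\partial \theta_i} = \tfrac{1}{2} \operatorname{tr}\left(C^{-1} \frac{\partial C}{\partial \theta_i}\right) - \tfrac{1}{2} y^T C^{-1} \frac{\partial C}{\partial \theta_i} C^{-1} y$$

where $C = K(X, X) + \sigma_n^2 I_n$. Alternatively, the SMBO algorithm is used here to optimize the hyper parameters for comparison with the SVR model.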
The search domain is listed in Table 4 (first three columns). The number of iteration steps, the score criterion, and the CV scheme are the same as for the SVR model. Figure 9a,b shows the hyper parameter tuning procedure for training the GPR models for both targets. As in the SVR cases, both models have reached the highest $R^2$ at the optimal hyper parameters (last 2 columns in Table 4) within the search domain. The optimization curve in Figure 9c shows that both models converge within 20 iterations, which is faster than SVR. From Figure 9a,b, it is observed that $\sigma_n^2$ affects $R^2$ more than the other two hyper parameters. This indicates that the prior distribution of the noise is the main factor influencing the performance of the GPR model. Figure 10 compares the predictions of the trained GPR models with the experimental data. The $R^2$ of the GPR validation results is close to that of the SVR results, which demonstrates that the two surrogate models perform nearly the same on the training data.
In the next section, both models are applied in through-flow code to predict both design and off-design operation performance of the compressor.
Figure 9. Optimization of training GPR models: (a) Hyper parameters search of deviation angle model; (b) Hyper parameters search of total pressure loss; (c) Optimization curve of models.
Figure 10. Validation of GPR models on the whole data set: (a) deviation angle; (b) total pressure loss.
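The GPR prediction step described in this section (posterior mean and variance at a test point with a squared-exponential kernel) can be sketched in plain Python. This is an illustrative sketch rather than the implementation used in this work: the function names, the hand-written Cholesky solver, the toy data, and the hyper parameter values are all assumptions, and the kernel takes one length-scale per input dimension.

```python
import math

def sq_exp_kernel(x1, x2, length, sigma_f2):
    """Squared-exponential kernel with one length-scale per input dimension."""
    d2 = sum(((a - b) / l) ** 2 for a, b, l in zip(x1, x2, length))
    return sigma_f2 * math.exp(-0.5 * d2)

def cholesky(A):
    """Lower-triangular L with A = L L^T for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) z = b by forward, then backward substitution."""
    n = len(b)
    w = [0.0] * n
    for i in range(n):
        w[i] = (b[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (w[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z

def gpr_predict(X, y, x_star, length, sigma_f2, sigma_n2):
    """Posterior mean and variance of f* at a single test point x_star."""
    n = len(X)
    # C = K(X, X) + sigma_n^2 * I_n
    C = [[sq_exp_kernel(X[i], X[j], length, sigma_f2)
          + (sigma_n2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(C)
    k_star = [sq_exp_kernel(x_star, xi, length, sigma_f2) for xi in X]
    alpha = chol_solve(L, y)        # C^{-1} y
    v = chol_solve(L, k_star)       # C^{-1} K(X, x*)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = (sq_exp_kernel(x_star, x_star, length, sigma_f2)
           - sum(k * vi for k, vi in zip(k_star, v)))
    return mean, var

# Hypothetical 1D training data and hyper parameters
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 0.8, 0.9, 0.1]
theta = dict(length=[1.0], sigma_f2=1.0, sigma_n2=1e-6)
m_train, v_train = gpr_predict(X, y, [1.0], **theta)   # at a training point
m_far, v_far = gpr_predict(X, y, [50.0], **theta)      # far from all data
print(m_train, v_train, m_far, v_far)
```

With a very small noise variance the prediction nearly interpolates the training target, while far from the data the mean returns to the zero prior and the variance returns to the signal variance, which is the behaviour that makes the confidence intervals useful for identifying sparsely sampled regions.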

Streamline Curvature Approach Based through-Flow Program
The streamline curvature (SLC) approach, the dominant through-flow method, is implemented here to simulate the flow field in the axial compressor as well as its overall performance. Based on the general through-flow theory [30,31] and actively developed since the 1960s [32], SLC has been successfully applied to the design and analysis of various types of turbomachinery. It greatly reduces the computational complexity by replacing the real 3D flow simulation in the blade channels with the solution of the Euler equations on 2D (or quasi-3D) flow surfaces (Figure 11), under assumptions including axisymmetric, adiabatic, and steady flow. The governing equations are composed of a radial equilibrium equation and a mass flow continuity equation, with the constraint that the meridional velocity should be subsonic.

The computations are performed on the mesh grid formed by streamlines and quasi-orthogonal stations. The discretized governing equations are solved by finite differences with specified boundary conditions, iterating until the locations of the streamlines no longer change. An in-house SLC-based through-flow program developed by the authors is applied in this work [33]. The structure of this program is shown in Figure 12. The entropy $S$ and swirl $rV_\theta$ in the governing equations are updated by the total pressure loss and deviation angle described below when solving the governing equations.

Comparison with Empirical Models
With the development of SLC, a variety of empirical correlations for loss and deviation angle have been proposed and improved by the many researchers and engineers who design turbomachinery with SLC in practice. Some typical and widely used correlations are applied in this program to form the loss and deviation angle for comparison with the surrogate models:

$$\delta = \delta^* + \delta_i + \delta_{avdr}$$

$$\varpi_{tot} = \varpi_{prof} + \varpi_{sh} + \varpi_{sec} \qquad (28)$$

The deviation angle $\delta$ combines the correlations for the design deviation angle $\delta^*$ [34], the off-design correction $\delta_i$ [35], and the AVDR correction $\delta_{avdr}$ [36], while the total pressure loss comprises the airfoil profile loss $\varpi_{prof}$ [37], shock loss $\varpi_{sh}$ [38], and secondary flow loss $\varpi_{sec}$ [38]. These correlations have inputs similar to those of the surrogate models.
The research object is the transonic compressor described at the beginning. The through-flow program, coupled with either the empirical models or the surrogate models, is run for 5 rotational speeds (50%, 70%, 85%, 95%, and 100% of the design rotational speed of 10,720 rpm), corresponding to the experimental cases. Figure 13 shows the meridional computation grid for this compressor, with 12 streamlines along the spanwise direction, where the first and last streamlines coincide with the hub and shroud. In the axial direction, 18 quasi-orthogonal stations are distributed, 8 of which coincide with the leading and trailing edges of the rotor and stator blades, while the others lie in the inlet and outlet channels.
Figure 14 shows the overall performance maps of total pressure ratio (Figure 14a) and adiabatic efficiency (Figure 14b) computed by the different types of models for the five rotational speeds. The experimental characteristic lines are also plotted for reference. SVR, GPR, EMP, and EXP represent the SVR models, GPR models, empirical correlations, and experimental data, respectively. In general, the SLC method has an intrinsic defect: it cannot predict performance in choke conditions very well, since the mass flow is an input to the flow continuity equation and needs to be adjusted to predict different performance. The SLC approach performs well for operating points at low rotational speed, where the flow in the blade channel is not yet choked. However, when the Mach number in the throat area of the blade channel is close to 1, which means that the blade channel has reached its maximum through-flow capacity, the mass flow of the compressor remains constant although the performance still changes. This situation usually occurs at high rotational speeds, where SLC shows a higher prediction error relative to the experiment, as shown. Some alternative methods have been developed to solve this problem, such as dedicated models for the choke region, localized choked flow treatment [39], or the time marching through-flow method [40], which are beyond the scope of this article.
For the comparison of the different models, treatment of this defect is not taken into consideration in this work.
The root mean squared errors (RMSE) of the overall performance predicted by the different models on the five characteristic lines, relative to the experimental data, are listed in Table 5. Here π and η denote the RMSE of total pressure ratio and adiabatic efficiency of each model, and ∆π and ∆η denote the percentage change of the RMSE of SVR and GPR relative to EMP. From the variation of the prediction errors of each model over the characteristic lines, the accuracy of EMP, GPR, and SVR keeps improving as the rotational speed is reduced.
Regarding EMP, GPR, and SVR, the error of π decreases from 0.6038, 0.2430, and 0.4146 to 0.0626, 0.0089, and 0.0057, respectively, while the error of η decreases from 0.1576, 0.0593, and 0.0679 to 0.0587, 0.0246, and 0.0163, respectively. The reduction of RMSE indicates that all the models perform better at low speeds than at high speeds. The reasons are similar for the surrogate models and the empirical correlations. The empirical correlations work well for low-loading airfoil types and low Mach number conditions because they are derived from data of typical subsonic airfoils such as the NACA-65 series, but they are insufficient to simulate transonic compressors at high rotational speeds. Similarly, most of the training data are low speed samples, while the small share of high speed samples is located in the rotor tip zones at high rotational speeds, which gives the surrogate models a higher accuracy for the low speed cases. The relative changes in RMSE for GPR and SVR show a notable improvement in prediction accuracy. The surrogate models both show better agreement with the experimental data than the empirical correlations over all the characteristic lines. The accuracy improvement of the surrogate models also increases as the rotational speed decreases. Compared with the empirical models, the change in the error of π goes from −41.81% and −0.73% to −85.78% and −90.83%, while the change in the error of η goes from −26.58% and 1.66% to −58.10% and −72.30%, for GPR and SVR respectively. In addition, the GPR models perform better than SVR at high rotational speeds, and vice versa at low speeds. The characteristic lines of EMP are smoother than those of the surrogate models, since empirical models usually use parabolic curves to approximate the functional relationship between total pressure loss or deviation angle and the incidence angle increment.
In contrast, the surrogate models cannot get enough information from regions where training samples are sparse, which leads to inaccurate predictions there. This is also why the characteristic lines at high rotational speeds are rougher than those at lower rotational speeds. This can be improved by adding more training samples.
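The error metrics used in this comparison can be made concrete with a short sketch. The helper names `rmse` and `relative_change` are hypothetical; the three input values are the RMSE of total pressure ratio quoted above at the lower end of the reported ranges (0.0626 for EMP, 0.0089 for GPR, 0.0057 for SVR).

```python
import math

def rmse(pred, ref):
    """Root mean squared error between predictions and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

def relative_change(rmse_model, rmse_emp):
    """Percentage change of a model's RMSE relative to the empirical (EMP) RMSE."""
    return 100.0 * (rmse_model - rmse_emp) / rmse_emp

# RMSE of total pressure ratio, as quoted in the text
emp, gpr, svr = 0.0626, 0.0089, 0.0057
d_gpr = relative_change(gpr, emp)
d_svr = relative_change(svr, emp)
print(f"GPR: {d_gpr:.2f}%  SVR: {d_svr:.2f}%")
```

The GPR value evaluates to about −85.78%, matching the figure quoted above. The SVR value evaluates to about −90.89%, slightly different from the quoted −90.83%, presumably because the RMSE values given in the text are rounded.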

Predicted Flow Field Analysis
For further comparison of the models, the operating point near the stall margin of the experimental data on the 50% rotational speed line (the left endpoint, with a mass flow of 32.3 kg/s) is chosen as the reference case, since all the models perform better at lower rpm. The total pressure ratio π, adiabatic efficiency η, and the prediction errors ERR of all models are listed in Table 6. Both GPR and SVR have lower prediction errors for both π and η than the empirical correlations. The source of the errors can be seen in the following spanwise distributions of the predictions. Figure 15 shows the distribution of deviation angle and total pressure loss along the blade span for each blade row. In Figure 15a, the measured deviation angle has a bucket-shaped distribution, with higher values near the endwalls and lower values at mid-span, although a small difference exists in the first stage due to the effects of the part-span shroud of R1 at the 61% span location [13]. The predictions of the SVR and GPR models show good agreement with the EXP data, while the empirical correlations overestimate the deviation angle in S1 and underestimate it in the second stage. The underestimated deviation angle in the rotors is the main cause of the higher total pressure ratio predicted by the empirical correlations, because for a given airfoil, a lower deviation angle means a higher flow turning angle and thus more energy input. In Figure 15b, the SVR and GPR models also perform better than the empirical correlations for total pressure loss prediction. The underestimation by the empirical correlations mainly occurs in the tip region of the first stage, where the actual total pressure loss is higher, which makes the adiabatic efficiency predicted by the empirical correlations much higher than the experimental data. In contrast, the SVR and GPR models accurately predict the loss distribution over most of the span, except in the tip region of S1.
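The link between deviation angle and energy input mentioned above can be illustrated with the Euler work equation on a simple rotor velocity triangle. The sketch below is not part of the through-flow program; it assumes constant radius and axial velocity across the rotor, takes the relative outlet flow angle as the blade outlet metal angle plus the deviation angle, and uses purely hypothetical numbers.

```python
import math

def rotor_work(U, Vx, Vtheta1, kappa2_deg, delta_deg):
    """Specific work from the Euler equation at constant radius.

    Assumes constant axial velocity Vx and blade speed U across the rotor.
    The relative outlet flow angle is the metal angle plus the deviation.
    """
    beta2 = math.radians(kappa2_deg + delta_deg)  # relative outlet flow angle
    Vtheta2 = U - Vx * math.tan(beta2)            # absolute swirl at outlet
    return U * (Vtheta2 - Vtheta1)                # J/kg

# Hypothetical blade element: values are illustrative only
U, Vx, Vtheta1, kappa2 = 300.0, 150.0, 0.0, 35.0
w_low_dev = rotor_work(U, Vx, Vtheta1, kappa2, delta_deg=4.0)
w_high_dev = rotor_work(U, Vx, Vtheta1, kappa2, delta_deg=8.0)
print(f"work with 4 deg deviation: {w_low_dev:.0f} J/kg")
print(f"work with 8 deg deviation: {w_high_dev:.0f} J/kg")
```

Under these assumptions, the smaller deviation angle gives more outlet swirl and therefore more specific work, which is why an underestimated rotor deviation angle tends to produce an overpredicted total pressure ratio.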
The prediction errors relative to the experimental data at the hub, mid-span, and tip are listed in Table 7 to compare the accuracy of the EMP, GPR, and SVR models. For intuitive comparison, the RMSE is also calculated for each blade row as the criterion. Although the EMP model produces a lower prediction error at a few local positions, the GPR and SVR models have a lower RMSE for both deviation angle and loss predictions, except for the loss predictions for R2. Furthermore, Figure 16 shows the predicted meridional contours of relative Mach number and total pressure (Pt) from the different models in comparison with the measured data. For both variables, the contours of the surrogate models show better agreement with the experimental data than do those of the empirical correlations. As shown in Figure 16a, an obvious discrepancy exists between the EMP and EXP results in the spanwise distribution of the inlet relative Mach number for the stators. Based on the velocity triangle of the blade element section, the higher predicted inlet Mach number in the tip region of the stators results from the underestimation of the deviation angle for the upstream rotor blades, which corresponds to the analysis of EMP in Figure 15a. The difference between EMP and EXP on both stators is also easy to notice in Figure 16b. For the EMP result, the higher Pt at the S1 tip inlet develops from the underestimation of the deviation angle at the R1 outlet (Figure 15a), while the high Pt region through the S1 channel is attributed to the underestimation of total pressure loss at the S1 tip (Figure 15b).
Two notable high Pt zones are located near the tip and hub endwalls after R2 due to the underestimation of the R2 deviation angle (Figure 15a). Table 8 lists the RMSE of Mach number (Ma) and Pt corresponding to Figure 16 at both the leading edge (LE) and trailing edge (TE) of each blade row. The prediction accuracy of the surrogate models for Ma and Pt improves notably, by up to 49.24% and 73.13% for GPR, and 71.62% and 87.19% for SVR.
In summary, the comparison suggests that the data-driven surrogate model provides an alternative approach for obtaining a reliable prediction of compressor performance when typical empirical correlations are inadequate.

Conclusions and Outlook
An alternative way of using surrogate models to predict the total pressure loss and deviation angle was developed in this work. Experimental data measured along the blade spans of a two-stage axial transonic compressor over a wide operation range constituted the samples database for the training models. The most influential inputs were identified from seven geometric and six aerodynamic features using a sensitivity analysis approach for both surrogate models. Two supervised learning methods, support vector regression and Gaussian process regression, were applied to train the surrogate models. The SMBO algorithm was implemented to tune the hyper parameters with cross validation scheme to avoid an over-fit situation. Both models obtained the optimal hyper parameters within 100 iteration steps. The validation showed that the SVR and GPR models have similar performance on the sample database. The optimal models were then integrated into a through-flow program developed using the streamline curvature method to predict the performance and flow field of the compressor. Some typical empirical correlations were also introduced as reference. The results of surrogate models showed good agreement with the measured data, whereas the problem of underestimation or overestimation