An Explainable Prediction Model for Aerodynamic Noise of an Engine Turbocharger Compressor Using an Ensemble Learning and Shapley Additive Explanations Approach

Abstract: In the fields of environment and transportation, aerodynamic noise emitted from heavy-duty diesel engine turbocharger compressors is harmful to the environment and human health and needs to be addressed urgently. However, for the study of compressor aerodynamic noise, particularly over the full operating range, experimental and numerical simulation methods are costly or time-consuming and do not match engineering requirements. To fill this gap, a method based on ensemble learning is proposed to predict aerodynamic noise. In this study, 10,773 sets of data were collected to establish and normalize an aerodynamic noise dataset. Four ensemble learning algorithms (random forest, extreme gradient boosting, categorical boosting (CatBoost) and light gradient boosting machine) were applied to establish the mapping functions between the total sound pressure level (SPL) of the aerodynamic noise and the speed, mass flow rate, pressure ratio and frequency of the compressor. The results showed that, among the four models, the CatBoost model had the best prediction performance, with a coefficient of determination and root mean square error of 0.984798 and 0.000628, respectively. In addition, the error between its predicted total SPL and the observed value was the smallest, at only 0.37%. Therefore, the CatBoost-based method is proposed for predicting aerodynamic noise. For different operating points of the compressor, the CatBoost model had high prediction accuracy. The noise contours in the predicted MAP from the CatBoost model were better at characterizing the variation in the total SPL. The maximum and minimum total SPLs were 122.53 dB and 115.42 dB, respectively. To further interpret the model, an analysis using the Shapley Additive Explanations algorithm showed that frequency significantly affected the SPL, while the speed, mass flow rate and pressure ratio had little effect on the SPL. Therefore, the proposed method based on the CatBoost algorithm could predict aerodynamic noise emissions from a turbocharger compressor well.


Introduction
As the problems of energy shortage and environmental pollution become more prominent, reducing fuel consumption and pollutant emissions from road vehicles is one of the most important approaches to achieving environmentally and economically sustainable development [1][2][3]. Turbochargers are widely used in the transportation field because they increase engines' specific power and reduce gas emissions [4][5][6][7]. Unfortunately, the noise generated by turbochargers has become a non-negligible part of engine noise, hindering environmentally sustainable development to some extent [8,9]. In addition, due to the increase in output power requirements for diesel engines, the turbocharger pressure ratio increases, resulting in an increase in the compressor load and higher aerodynamic noise emissions [10,11]. The existing literature indicates that aerodynamic noise is the main noise source in turbochargers [12,13]. Aerodynamic noise mainly consists of discrete noise and broadband noise, which are generated by the turbulent motion between the airflow and the compressor components [14,15]. Due to the complexity of the turbulent motion, it is difficult to quantitatively describe the flow field and the resulting sound field during the operation of a turbocharger compressor by means of a complete analytical formula. Therefore, experimental or numerical simulation methods are often relied upon to obtain a realistic picture of the compressor aerodynamic noise.
Analyzing the aerodynamic noise distribution of a compressor is the basis for achieving aerodynamic noise emission control. Researchers have conducted numerous experimental studies on compressors' aerodynamic noise. Raitor et al. [16] studied the main noise sources of centrifugal compressors. The results indicated that blade passing frequency (BPF) noise, buzzsaw noise and tip clearance noise were the main noise sources. Figurella et al. [17] showed that discrete noise could be observed at the BPF and its harmonic frequencies in the compressor. Sun et al. [18] conducted experiments to investigate the influence of foam metal casing treatment on an axial flow compressor's aerodynamic noise. The results showed that the foam metal casing treatment could reduce aerodynamic noise by 0.18 dB to 1.6 dB. Zhang et al. [19] experimentally investigated the effect of differential tip clearances on the noise emissions of an axial compressor. The results showed that when the tip clearance was small, the sound pressure level (SPL) of the compressor was lowest. Furthermore, Galindo et al. [20] carried out experiments to study the influence of inlet geometry on automotive turbocharger compressor noise. They found that the aerodynamic noise emissions and surge margin could be significantly improved using a convergent-divergent nozzle. Therefore, the experimental approach is an effective way to study the aerodynamic noise of turbocharger compressors. However, test bench operation and cost limitations make it difficult to measure the SPL of compressor aerodynamic noise under arbitrary operating conditions. This brings challenges for reducing compressor noise emissions and promoting environmentally sustainable development.
With the gradual development of computational fluid dynamics (CFD), numerical simulation techniques for compressor noise that couple CFD with computational aeroacoustics methods have been widely used [21,22]. Liu et al. [23] calculated the unsteady flow field of a compressor and used the flow field results to obtain noise source information. Khelladi et al. [24] and Laborderie et al. [25] used the Reynolds-averaged Navier-Stokes (RANS) method and the Ffowcs Williams and Hawkings (FW-H) equation to calculate centrifugal fan and axial compressor noise. Karim et al. [12] conducted a CFD simulation using the Large Eddy Simulation approach to obtain the pressure signal at the inlet and outlet of a compressor and to calculate the SPL and spectral distribution. Lu et al. [26] conducted an experimental and simulation investigation on the aerodynamic noise of an axial compressor. They found that the main sound source areas were the rotor and stator. In addition, Zhang et al. [27] used multiple calculation methods to investigate the effect of an approximately solid surface wall on fan noise propagation. However, because multi-dimensional, dense-grid numerical simulation of compressor noise consumes large computational resources and requires long computation times, it has certain disadvantages in engineering applications.
From the above literature analysis, it is clear that traditional experimental measurements are costly, time-consuming and complicated to operate. The advent of numerical simulation methods has made it possible to obtain more detailed flow structures and richer flow field information than experiments at a lower cost. However, turbulence is a nonlinear mechanical system with a large number of degrees of freedom and an extremely wide range of scales. For models with complex geometries and high Reynolds number flows, even computer-based numerical simulation requires very complicated calculations on a rapidly increasing number of grid cells, which consumes huge computational resources. Therefore, in order to save the costs of experiments or simulations and shorten their time cycles, data-driven methods are gradually becoming a focus of attention [28]. That is, by means of machine learning, key information can be extracted and "black box" models constructed based on sample data from experiments or numerical simulations.
Machine learning, as an interdisciplinary field, has received sustained attention from many scholars in recent years [29][30][31]. Ensemble learning algorithms based on decision trees, such as extreme gradient boosting (XGBoost) and random forest (RF), are widely used in the study of complex nonlinear models in the environmental field [32,33]. Furthermore, the more recently proposed categorical boosting (CatBoost) and light gradient boosting machine (LightGBM) algorithms have gained attention due to their excellent performance on small datasets and their strong resistance to overfitting [34,35]. However, these algorithms are based on decision trees and are often considered "black box" algorithms, making it difficult to understand their prediction process. In recent years, researchers have introduced a number of techniques to explain machine learning algorithms. The partial dependence plot (PDP), a classical method for revealing the mean partial relationship between one or more features and the model results, has been adopted by many researchers [36]. However, the average marginal effects calculated by PDPs may hide the variability among data. Therefore, the Shapley Additive Explanations (SHAP) method was introduced to overcome this problem. The SHAP method is a game theory-based model diagnosis method that can improve interpretability by calculating the importance value of each input feature for the prediction results [37,38]. In addition, the SHAP method offers the possibility to visualize and interpret the contribution of a feature value to the predicted results using SHAP values.
In the existing literature, experimental and numerical simulation methods are the main approaches used to study aerodynamic noise characteristics. However, they are costly and time-consuming over the full compressor operating range, which is a drawback in engineering applications. To fill this gap, based on compressor aerodynamic noise datasets, four ensemble learning methods (RF, XGBoost, CatBoost and LightGBM) and the SHAP algorithm were used to establish an interpretable compressor aerodynamic noise prediction model in this study. The CatBoost-based model, which had the best predictive performance among the four models, was selected through tenfold cross-validation to carry out the aerodynamic noise prediction, and the differences between the predicted results and the observed values were compared and analyzed. A MAP diagram of the aerodynamic noise over the full operating range is presented. Furthermore, in order to understand the prediction process of the proposed method, the SHAP algorithm was used to reveal the nonlinear relationship between the model input features and the predicted results. The interpretable prediction model proposed in this study could accurately evaluate the compressor aerodynamic noise under arbitrary operating conditions and provide data and theoretical support for controlling noise emissions and contributing to environmentally sustainable development. Figure 1 shows the research framework of this study.

Research Methodology
The process of building and analyzing the interpretable prediction model for compressor aerodynamic noise is shown in Figure 2. The compressor aerodynamic noise data were obtained from the experiments, and the data were processed to build the emission prediction model. Four ensemble machine learning methods (random forest (RF), extreme gradient boosting (XGBoost), categorical boosting (CatBoost) and light gradient boosting machine (LightGBM)) were used to construct the models, and then the interpretable Shapley Additive Explanations (SHAP) algorithm was used to analyze the extent to which the input features contributed to the output results and to provide explanations for the aerodynamic noise prediction process. The results of the study can support decision-making for compressor aerodynamic noise emission control.

Experimental System and Method
Figure 3 shows the schematic of the turbocharger compressor aerodynamic noise test rig. In the compressor noise experimental system, a PCB-SN152495 microphone was used to measure the sound pressure level (SPL) of aerodynamic noise, a PCB-HT356B21 vibration sensor was used to measure the vibration acceleration on the surface of the compressor volute (worm shell), and the SPL and vibration signals were collected and analyzed by the SIEMENS signal acquisition method. A detailed description of the turbocharger test bench can be found in the literature [39,40], and the turbocharger test bench and aerodynamic noise test instruments are shown in Table 1. As can be seen from Table 1, the turbocharger performance and noise test rig consisted of four parts: the compressor section, the turbine section, the intake and exhaust piping and components, and the noise test section. Table 2 lists the measuring ranges, accuracies and uncertainties of the aerodynamic noise test instruments [40].

Research Object
The research object of this study was the turbocharger compressor of a heavy-duty diesel engine. The compressor used splitter blades and the diffuser was vaneless; the specific parameters are shown in Table 3. A detailed description of the specifications and dimensions of the compressor can be found in the literature [40].

Dataset Creation
During the experiments, the JB/T 12332-2015 "Turbocharger Noise Test Method" standard was referenced to test the noise of the compressor [41]. To ensure the repeatability and accuracy of the aerodynamic noise experiments, the laboratory environment and instruments were measured and calibrated before the test started. The measurement methods and procedures are described in the literature [40]. In addition, the turbine, exhaust pipes and facilities of the turbocharger under test were covered and soundproofed to ensure that the compressor inlet aerodynamic noise experiments were not affected by other noise sources. In the experiments, the SPLs of aerodynamic noise corresponding to different frequencies were recorded while adjusting the speed, pressure ratio and mass flow rate of the compressor. The formula for calculating the total SPL of aerodynamic noise is shown in Equation (1) [42]:

$$L_{\mathrm{total}} = 10\log_{10}\left(\sum_{i=1}^{n} 10^{L_i/10}\right) \qquad (1)$$

where $L_{\mathrm{total}}$ and $L_i$ are the total SPL and the SPL at the $i$-th frequency point, respectively, and $n$ is the number of frequency points.
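The total SPL calculation can be illustrated with a short Python sketch. It assumes the standard energy-summation form of Equation (1), in which each per-frequency level is converted to a relative energy, summed, and converted back to decibels. The function name `total_spl` and the example levels are hypothetical and are not measurements from this study.

```python
import math


def total_spl(levels_db):
    """Combine per-frequency SPLs (dB) into a total SPL (dB) by energy summation."""
    if not levels_db:
        raise ValueError("at least one SPL value is required")
    # Convert each level to a relative energy, sum, then convert back to dB.
    energy_sum = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(energy_sum)


# Illustrative values only (not data from the experiments).
example_levels = [100.0, 100.0]
print(round(total_spl(example_levels), 2))  # 103.01: two equal levels add about 3 dB
```

Because the summation is carried out on energies rather than on decibel values, the total is dominated by the loudest frequency points.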
In this study, a total of 10,773 sets of aerodynamic noise data were obtained from the experiments. The noise test points were determined based on a MAP diagram of the compressor performance, as shown in Figure 4. The noise test points included 21 operating points. In addition, the datasets collected in the experiments were obtained from a previous study [40]. The remaining operating points were the compressor performance distribution points.
The distribution of the dataset is shown in Table 4. From the table, it can be seen that the dataset covered a total of seven speed lines ranging from 60,000 r/min to 110,000 r/min, including three operating regions of the compressor: the near-choke region, the high-efficiency region and the near-surge region.

Model Building and Performance Evaluation
In this study, one traditional ensemble learning algorithm (RF) and three gradient boosting decision tree (GBDT) algorithms (XGBoost, CatBoost and LightGBM) were used to build a compressor aerodynamic noise emissions prediction model. Compared to complex deep learning models, using four ensemble models made it easier to capture the variation in parameters and to interpret the variables within each model. Among the ensemble methods, RF is a typical bagging algorithm that accomplishes a classification task by voting and a regression task by averaging [43]. Specifically, an RF is a set of decision trees, and each tree is constructed using the best split for each node among a subset of predictors randomly chosen at that node. In the end, the predictions of the individual trees are aggregated: by a simple majority vote for classification, or by averaging for regression. GBDT is a machine learning model for regression and classification, and its effective implementations include XGBoost. However, the efficiency and scalability of XGBoost are not satisfactory when the feature dimensionality is high and the data size is large. Therefore, the CatBoost and LightGBM models were proposed, and these models have been shown to significantly outperform other models in terms of accuracy for structured and tabular data [44]. To be specific, CatBoost uses a greedy strategy to consider feature combinations when constructing new split points for the current tree, which improves accuracy. Meanwhile, LightGBM contains two novel techniques: Gradient-based One-Side Sampling and Exclusive Feature Bundling [45].
The ensemble learning models in this study were all implemented based on scikit-learn and Python libraries. To ensure the accuracy of the models, each model uniformly used 80% of the dataset as the training set and 20% of the dataset as the validation set. The optimal model was obtained by adjusting the training strategy using GridSearch and tenfold cross-validation, in which the training set was randomly divided into ten folds and the ten folds were traversed in turn, with the current fold used for testing and the remaining nine folds used for training. The performance of the prediction model was evaluated using the coefficient of determination (R²) and the root mean square error (RMSE), calculated as shown in Equations (2) and (3):

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_{o,i} - y_{p,i}\right)^2}{\sum_{i=1}^{N}\left(y_{o,i} - \bar{y}\right)^2} \qquad (2)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{p,i} - y_{o,i}\right)^2} \qquad (3)$$

where $N$ is the sample size, $y_p$ is the predicted value, $y_o$ is the test observation and $\bar{y}$ is the average of $y_o$; the subscript $i$ denotes the $i$-th sample. The setup parameters of the four models are listed in Table 5. The distributions of the predicted operating points of the prediction models are shown in Figure 5. The remaining operating points were the aerodynamic noise test points of the compressor at different speed lines.
Sustainability 2023, 15, 13405
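The data partitioning and the two evaluation metrics described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the implementation used in this study: the function names are hypothetical, the grid search over hyperparameters is omitted, and R² and RMSE are written in their standard forms.

```python
import math
import random


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error between observations and predictions."""
    squared = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def train_validation_split(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def k_fold_indices(indices, k=10):
    """Yield (train, test) index lists; each fold serves as the test set once."""
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


train_idx, val_idx = train_validation_split(10773)
print(len(train_idx), len(val_idx))  # 8618 2155
```

In a full workflow, each candidate hyperparameter setting would be fitted on the nine training folds and scored on the held-out fold, and the setting with the best mean score would then be evaluated once on the validation set.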

Model Interpretation
Ensemble learning models based on decision trees have often been considered "black box" models. However, in addition to establishing accurate prediction models, it is also necessary to explain how the prediction models work. SHAP summary plots obtained using the SHAP method have been shown to be effective in explaining the predicted results of decision tree models [38]. In a SHAP summary plot, the horizontal axis (x-axis) represents the SHAP value, whose magnitude indicates the marginal contribution of an input feature to the model output. A SHAP value of less than 0 indicates a negative contribution; equal to 0, no contribution; and greater than 0, a positive contribution. A positive contribution means that the feature value pushes the prediction above the baseline (average) output, while a negative contribution pushes it below; the importance of a feature is reflected in the magnitude of its SHAP values rather than their sign. The input features are ranked from top to bottom according to their importance, with the top features contributing more to the predicted results of the model than the bottom features. The points representing individual samples are plotted horizontally, and the color of each point, from low (blue) to high (red), represents the magnitude of the feature value for that sample [46].
In this study, two groups of parameters of interest, the compressor operating characteristics and the aerodynamic noise characteristics, were introduced to explain their effects on the aerodynamic noise SPL. The two groups are shown in Table 6. The compressor operating characteristics include speed, pressure ratio and mass flow rate. Related studies [47,48] have shown that compressor operating characteristics reflect the operating condition of the compressor and have an obvious impact on the SPL of aerodynamic noise. Aerodynamic noise characteristics refer to the frequencies corresponding to the SPL of aerodynamic noise. The SPL of aerodynamic noise varied for different frequencies. However, the coupling effect of these four characteristics (speed, pressure ratio, mass flow rate and frequency) on the SPL of the compressor has not been well investigated, especially in terms of the contribution of each characteristic to the SPL, which was one of the focuses of this study. During the experiments, the compressor operating conditions were adjusted by changing the compressor speed, pressure ratio and mass flow rate, and the aerodynamic noise was measured. As can be seen from Figure 4 and Table 4, the compressor speed ranged from 60,000 r/min to 110,000 r/min, the pressure ratio ranged from 1.3 to 4.175, the mass flow rate ranged from 0.151 kg/s to 0.542 kg/s and the frequency ranged from 0 to 25,600 Hz. Therefore, by changing the speed, pressure ratio and mass flow rate within a certain range, the compressor was operated under different operating conditions, generating aerodynamic noise.
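To make the notion of a marginal contribution concrete, the following sketch computes exact Shapley values by brute force for a four-feature toy function. It rests on several assumptions: `toy_model` is a hypothetical stand-in and not the trained model from this study, a single baseline point replaces the background dataset that the SHAP library averages over, and enumerating all feature subsets is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

FEATURES = ("speed", "pressure_ratio", "mass_flow_rate", "frequency")


def toy_model(x):
    """Hypothetical regressor on normalized inputs (illustration only)."""
    return (0.10 * x["speed"] + 0.05 * x["mass_flow_rate"]
            - 0.60 * x["frequency"]
            + 0.20 * x["speed"] * x["pressure_ratio"])


def shapley_values(model, instance, baseline, features=FEATURES):
    """Exact Shapley values of `instance` relative to a single `baseline` point."""
    n = len(features)

    def value(coalition):
        # Features in the coalition take the instance value, the rest the baseline.
        mixed = {f: instance[f] if f in coalition else baseline[f] for f in features}
        return model(mixed)

    phi = {}
    for target in features:
        others = [f for f in features if f != target]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                contribution += weight * (value(coalition | {target}) - value(coalition))
        phi[target] = contribution
    return phi


baseline = {f: 0.5 for f in FEATURES}
instance = {"speed": 1.0, "pressure_ratio": 0.8, "mass_flow_rate": 0.3, "frequency": 0.9}
phi = shapley_values(toy_model, instance, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:15s} {contribution:+.4f}")
```

The contributions sum to the difference between the model output at the instance and at the baseline, which is the additivity property that gives the method its name.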

Results and Discussion
Different frequencies corresponded to different SPLs of aerodynamic noise. The aerodynamic noise characteristics of the compressor under various operating conditions are shown in Figure 6, and the experimental data were provided by a previous study [40].
From the figure, it can be seen that under the same operating conditions, the SPL of aerodynamic noise generally tended to decrease as the frequency increased. The frequency ranged from 0 to 25,600 Hz, and the SPL of aerodynamic noise varied with frequency over this range. Therefore, speed, pressure ratio, mass flow rate and frequency were selected as the four features describing the aerodynamic noise generated during the operation of the compressor, and the output was the SPL at the corresponding frequency. In this study, a total of 10,773 sets of valid data were collected, each containing the SPL of aerodynamic noise and the four feature values affecting the SPL. To prevent differences in magnitude from influencing the model training results, all the feature values were normalized.
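The text does not state which normalization was applied. As one common possibility, assumed here purely for illustration, min-max scaling maps each feature to the interval [0, 1] so that speed (tens of thousands of r/min) and pressure ratio (single digits) enter the model on comparable scales. The function name and the intermediate speed value below are hypothetical.

```python
def min_max_normalize(column):
    """Scale a list of values linearly to the interval [0, 1]."""
    lowest, highest = min(column), max(column)
    if highest == lowest:
        # A constant feature carries no information; map it to zero.
        return [0.0 for _ in column]
    return [(value - lowest) / (highest - lowest) for value in column]


# Endpoints follow the speed range reported in the text; 85,000 is illustrative.
speeds_rpm = [60000.0, 85000.0, 110000.0]
print(min_max_normalize(speeds_rpm))  # [0.0, 0.5, 1.0]
```

Tree-based models are largely insensitive to monotonic rescaling of the inputs, so the main practical effect of normalization is on the scale of the reported errors and SHAP values.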

Importance of Input Features
The importance of the input features of the models was analyzed using the SHAP method. Figure 7 shows the results of ranking the importance of the input features for the four models. In the figure, the SHAP values of all features obtained by applying the SHAP method were within 0.18. In all four models, the features ranked in descending order of SHAP value as frequency > speed > mass flow rate > pressure ratio. Among all the features, frequency was the most important feature affecting the SPL of the aerodynamic noise, and its average SHAP value was above 0.16. This was because the SPLs of the aerodynamic noise corresponding to different frequencies were significantly different under the same compressor operating conditions (the same speed, mass flow rate and pressure ratio), which made frequency have the greatest effect on the SPL. This result is consistent with that of Xu et al. [49]. In contrast to the RF model, the SHAP values of speed in the XGBoost, CatBoost and LightGBM models were all above 0.02 and differed significantly from that of the third-ranked mass flow rate. This indicates that the influence of speed was larger in these three models. The above results show that in all four models, the frequency, speed, mass flow rate and pressure ratio had an influence on the output results of the prediction models and could be used as the input features.
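A feature ranking of this kind is typically obtained by averaging the absolute SHAP values of each feature over all samples. The sketch below assumes that convention; the function name is hypothetical and the three rows of SHAP values are made-up placeholders, not results from this study.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by the mean absolute SHAP value over all samples."""
    n_samples = len(shap_rows)
    mean_abs = {
        name: sum(abs(row[j]) for row in shap_rows) / n_samples
        for j, name in enumerate(feature_names)
    }
    # Largest mean |SHAP| first.
    return sorted(mean_abs.items(), key=lambda item: item[1], reverse=True)


names = ["speed", "pressure_ratio", "mass_flow_rate", "frequency"]
# Placeholder SHAP values: one row per sample, one column per feature.
placeholder_rows = [
    [0.020, -0.005, 0.010, -0.17],
    [-0.030, 0.004, -0.012, 0.16],
    [0.025, -0.003, 0.008, 0.15],
]
for name, score in rank_by_mean_abs_shap(placeholder_rows, names):
    print(f"{name:15s} {score:.3f}")
```

Taking the absolute value before averaging matters: a feature whose positive and negative contributions cancel on average can still be highly influential for individual predictions.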

Model Performance Comparison
Four prediction models of compressor aerodynamic noise were obtained by training and tenfold cross-validation with the 8618 sets of data in the training set. The purpose of the tenfold cross-validation was to select the optimal model parameters for each of the four models, thus improving the generalization ability of the models [43]. The R² and RMSE obtained for each run of the tenfold cross-validation are shown in Figure 8, and the average R² and average RMSE values from the tenfold cross-validation of the four models are shown in Figure 9. As can be seen from Figure 8, among the ten tests, the best-performing tests of RF, XGBoost, CatBoost and LightGBM were Test 5, Test 10, Test 6 and Test 3, respectively. In addition, from Figures 8 and 9, it can be seen that the R² and RMSE of the individual tests were close to the mean R² and mean RMSE in the tenfold cross-validation. Among the four models, the CatBoost model had the largest mean R² and the smallest mean RMSE, with values of 0.983579 and 0.000694, respectively. Therefore, for each of the four algorithms, the optimal model identified by tenfold cross-validation was selected for prediction.
Four prediction models of compressor aerodynamic noise were obtained by training and tenfold cross-validation on the 8618 data points of the training dataset. The purpose of the tenfold cross-validation was to select the optimal model parameters for each of the four models, thus improving their generalization ability [43]. The R² and RMSE obtained for each test in the tenfold cross-validation are shown in Figure 8, and the mean R² and mean RMSE values from the tenfold cross-validation of the four models are shown in Figure 9. As can be seen from Figure 8, among the ten tests, the best-performing tests for RF, XGBoost, CatBoost and LightGBM were Test 5, Test 10, Test 6 and Test 3, respectively. In addition, Figures 8 and 9 show that, for each model, the R² and RMSE of the individual tests were close to the corresponding mean values. Among the four models, the CatBoost model had the largest mean R² and the smallest mean RMSE, with values of 0.983579 and 0.000694, respectively. Therefore, for each of the four models, the optimal model identified by the tenfold cross-validation was selected for prediction.

To determine the prediction performances of the models, the best models built by the four ensemble machine learning algorithms were applied to predict the 2155 data points in the validation set. Figure 10 shows the R² and RMSE of the predicted results of the four models. It can be seen that overfitting was avoided in all four models. Among the four models, the CatBoost model had the largest R² and the smallest RMSE, with values of 0.984798 and 0.000628, respectively, which indicates that the CatBoost-based model had the best predictive performance. Therefore, in this study, frequency, speed, mass flow rate and pressure ratio were used as the model input features, and the CatBoost algorithm was applied to build the compressor aerodynamic noise emission prediction model.

Figure 11 shows the observed values and the predicted total SPL of aerodynamic noise for the four models. The total SPL predicted by the CatBoost model differed from the observed value by only 0.37%, the smallest error among the four models, indicating that the model established by applying the CatBoost algorithm had the highest prediction accuracy.
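The evaluation procedure described above (tenfold cross-validation scored by R² and RMSE, followed by selection of the best test) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, a least-squares line stands in for the ensemble regressors, and all function names are hypothetical rather than taken from the study.

```python
import math
import random


def r2_score(observed, predicted):
    """Coefficient of determination R^2."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares line; a stand-in for the ensemble regressors."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def cross_validate(xs, ys, k=10):
    """Return one (R^2, RMSE) pair per held-out fold."""
    scores = []
    for held_out in kfold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        obs = [ys[i] for i in held_out]
        pred = [slope * xs[i] + intercept for i in held_out]
        scores.append((r2_score(obs, pred), rmse(obs, pred)))
    return scores


# Synthetic toy data (not the paper's measurements): a noisy linear trend.
rng = random.Random(1)
xs = [i / 100 for i in range(200)]
ys = [0.8 - 0.3 * x + rng.gauss(0, 0.01) for x in xs]

scores = cross_validate(xs, ys, k=10)
mean_r2 = sum(s[0] for s in scores) / len(scores)
mean_rmse = sum(s[1] for s in scores) / len(scores)
best_fold = max(range(len(scores)), key=lambda i: scores[i][0])
print(f"mean R2 = {mean_r2:.4f}, mean RMSE = {mean_rmse:.4f}, best test = {best_fold + 1}")
```

Comparing the per-fold scores with their means, as done with Figures 8 and 9, is one simple way to check that no single split dominates the result.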
The predicted results of the CatBoost model on the 2155 validation data points were compared with the observed values. The comparison results for three randomly selected operating points are shown in Figure 12. The slanted straight line indicates the degree of fit between the predicted and observed values. In this study, 60,000 r/min, 90,000 r/min and 110,000 r/min were chosen to represent the low, medium and high speeds of the compressor, respectively. The CatBoost model had high prediction accuracy under all three operating conditions. Compared with the medium and high speeds, the CatBoost model had the highest prediction accuracy under the low-speed condition (60,000 r/min), with an R² and RMSE of 0.997237 and 1.290883, respectively, indicating that the CatBoost model could accurately capture and predict the nonlinear relationship between the SPL of the compressor's aerodynamic noise and the different input features.

Related studies have shown that the blade passing frequency (BPF) noise is one of the main components of a compressor's aerodynamic noise. The BPF is calculated as follows:

$$\mathrm{BPF} = \frac{nZ}{60}$$

where n and Z are the compressor speed (in r/min) and the number of blade sets, respectively. Figure 13 shows the predicted results of aerodynamic noise at the untested operating points in the MAP diagram of the compressor. The SPL of aerodynamic noise decreased with increasing frequency at the different operating points, which was consistent with the trend in the observed values. It was further observed that, for all predicted points, there was one peak at the BPF, and the peak became more pronounced as the speed increased. This indicates that the model based on the CatBoost algorithm could predict the acoustic information at specific frequencies well.
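As a small worked example, the BPF can be evaluated at the three representative speeds used in this section. The sketch assumes the usual definition BPF = nZ/60 with n in r/min; the blade count Z = 7 is a hypothetical value chosen only for illustration and is not taken from the study.

```python
def blade_passing_frequency(speed_rpm, blade_count):
    """Blade passing frequency in Hz for a rotor speed in r/min."""
    return speed_rpm * blade_count / 60.0


# Z = 7 is a hypothetical blade count, used only to illustrate the formula.
HYPOTHETICAL_BLADE_COUNT = 7

for speed in (60_000, 90_000, 110_000):
    bpf = blade_passing_frequency(speed, HYPOTHETICAL_BLADE_COUNT)
    print(f"{speed} r/min -> BPF = {bpf:.1f} Hz")
```

Because the BPF scales linearly with speed, the peak in the predicted spectrum should shift to higher frequencies as the speed increases.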
From the above analysis, it can be seen that the prediction model based on the CatBoost algorithm could predict the aerodynamic noise for any operating condition of the compressor and calculate the total SPL. Therefore, the aerodynamic noise MAP diagram could be produced alongside the prediction of the aerodynamic performance of the compressor. Figure 14 shows the noise MAP drawn directly from the observed values and the MAP drawn from the predicted results of the model. The noise MAP of the observed values consisted of the total SPLs for the 21 test conditions, whereas the noise MAP predicted by the CatBoost model included the 21 observed values and the total SPLs at 21 predicted points. As can be seen from the figure, the compressor aerodynamic noise increased with increasing compressor speed. At the same speed, the lowest total SPL of aerodynamic noise was found in the region of medium pressure ratio and medium mass flow rate. The MAP diagrams of aerodynamic noise from the CatBoost model and from the observed values are in good agreement, and the locations of the SPL contours are basically the same. In addition, compared with the experimental aerodynamic noise MAP, the noise contours in the predicted MAP are better at characterizing the changes in the total SPL. The maximum and minimum predicted total SPLs were 122.53 dB and 115.42 dB, respectively. Therefore, the comparison in Figure 14 further verifies the feasibility of the model built on the CatBoost algorithm for the prediction of compressor aerodynamic noise, which could provide an accurate and usable numerical tool for the analysis and prediction of compressor aerodynamic noise.
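For readers who want to reproduce a total SPL from a predicted spectrum, one common approach is the energetic (logarithmic) sum of the band levels. The sketch below assumes that definition, which this section does not state explicitly, and uses made-up band levels; the function names are hypothetical.

```python
import math


def total_spl(band_levels_db):
    """Energetic sum of band SPLs in dB (assumed definition of the total SPL)."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in band_levels_db))


def relative_error_percent(predicted, observed):
    """Relative error between a predicted and an observed value, in percent."""
    return abs(predicted - observed) / observed * 100.0


# Made-up band levels in dB, used only to exercise the functions.
observed_bands = [110.0, 105.0, 100.0]
predicted_bands = [109.5, 105.5, 100.5]

observed_total = total_spl(observed_bands)
predicted_total = total_spl(predicted_bands)
print(f"observed total SPL  = {observed_total:.2f} dB")
print(f"predicted total SPL = {predicted_total:.2f} dB")
print(f"relative error      = {relative_error_percent(predicted_total, observed_total):.2f} %")
```

Because the sum is dominated by the loudest bands, small errors at low-level frequencies have little influence on the total SPL.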

Interpretation of CatBoost Model Based on SHAP Method
The nonlinear relationship between the four input features of the CatBoost-based aerodynamic noise prediction model and the SPL of the aerodynamic noise was revealed by the SHAP method. The results were extracted from a Python SHAP library. Figure 15 shows the effect of changing the input features on the SHAP value of the aerodynamic noise SPL. The color trends of the four input features show that the SPL of aerodynamic noise increased with an increase in the speed, mass flow rate and pressure ratio, and decreased with an increase in frequency. Among the three operating parameters, speed had a greater effect on the SPL than the mass flow rate and pressure ratio. It was further observed that the SHAP values of the three input features other than frequency were mainly concentrated around 0. This indicates that, under similar operating conditions, the speed, mass flow rate and pressure ratio had less influence on the SPL of aerodynamic noise, while frequency could significantly affect the aerodynamic noise SPL of the compressor.
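A common way to condense per-sample SHAP values into a global feature ranking is the mean absolute SHAP value per feature. The sketch below assumes that convention; the SHAP matrix is an invented placeholder (one row per sample, one column per feature) and does not reproduce the values behind Figure 15.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by their mean absolute SHAP value, largest first."""
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


features = ["frequency", "speed", "mass flow rate", "pressure ratio"]

# Invented placeholder SHAP values, not results from the study.
toy_shap = [
    [-0.120, 0.020, 0.004, 0.001],
    [0.150, -0.015, -0.003, 0.002],
    [-0.080, 0.025, 0.006, -0.001],
]

ranking = mean_abs_shap(toy_shap, features)
for name, value in ranking:
    print(f"{name:15s} {value:.4f}")
```

A feature whose SHAP values cluster around 0 for most samples ends up with a small mean absolute value, which matches the reading of the summary plot given above.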

The SHAP method was used to further quantify the contribution of the four input features at each operating point of the compressor. The CatBoost model was applied to predict the SPL of the aerodynamic noise for one data point randomly selected from the 10,773 data points, and the SHAP method was used to calculate the contribution of each feature value. The calculation results were extracted from a Python SHAP library, as shown in Figure 16. E[f(x)] represents the average of the predicted results of all samples and f(x) represents the predicted result at that point. The red color indicates that the feature led to an increase in the SPL, and the blue color indicates that the feature led to a decrease in the SPL. As can be seen from the figure, the frequency, speed and pressure ratio played a role in reducing the SPL of aerodynamic noise. At the same frequency, changing the compressor mass flow rate and speed had a greater effect on the SPL. Therefore, the SHAP method could effectively evaluate and quantify the influences of all features on the SPL during the operation of the compressor, which further increased the credibility of the prediction model for compressor aerodynamic noise based on the CatBoost algorithm.
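The relationship between E[f(x)], f(x) and the per-feature contributions can be illustrated with a brute-force Shapley computation on a toy model. This is a sketch under stated assumptions: the model, the background samples and the query point are invented, and the exhaustive enumeration shown here is not the algorithm used by the SHAP library for tree models. It only demonstrates the additive property that the contributions sum to f(x) minus E[f(x)].

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, background):
    """Return (base value, Shapley values) of `model` at `x` by full enumeration."""
    n = len(x)

    def value(subset):
        # Mean model output when features in `subset` are fixed to x and the
        # remaining features take their values from each background row.
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                chosen = set(subset)
                contribution += weight * (value(chosen | {i}) - value(chosen))
        phi.append(contribution)
    return value(set()), phi


def toy_model(features):
    """Invented stand-in for the noise model (inputs are normalized)."""
    frequency, speed, mass_flow, pressure_ratio = features
    return (1.0 - 0.6 * frequency + 0.2 * speed
            + 0.05 * mass_flow * speed + 0.02 * pressure_ratio)


# Invented background samples and query point.
background = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.5, 0.5, 0.5],
    [0.9, 0.8, 0.6, 0.7],
]
x = [0.8, 0.9, 0.4, 0.6]

base_value, phi = exact_shapley(toy_model, x, background)
print("E[f(x)] =", round(base_value, 6))
print("f(x)    =", round(toy_model(x), 6))
print("phi     =", [round(p, 6) for p in phi])
```

In a plot such as Figure 16, positive contributions correspond to the red bars and negative contributions to the blue bars.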

Comparative Analysis of the Results with the Existing Research Findings
In order to further emphasize the novelty of this study, a detailed comparison was conducted between this study and existing research findings, as shown in Table 7.

| Study | Research object | Research method | Main findings |
| --- | --- | --- | --- |
| Broatch et al. [50] | Centrifugal compressor | Numerical simulation and experiments | A numerical model of a centrifugal compressor was presented to predict a presented peak pressure point. |
| | | Numerical simulation and experiments | A radiated noise prediction model of a rotary vane compressor was established. |
| Zhao et al. [52] | A commercial vehicle turbocharger compressor | Numerical simulation | A new one-dimensional prediction model was proposed to predict intake system noise. |
| Soulat et al. [53] | A single stage compressor | Numerical simulation | The effects of wake modelling on the prediction of broadband noise generated by the impingement of turbulent wakes on a stationary blade row were studied. |
| Sharma et al. [54] | A turbocharger compressor with ported shroud design | Numerical simulation and experiments | (1) Spectral signatures using statistical and scale-resolving turbulence modelling methods were obtained. (2) Rotating structures through the slot were found to potentially impact the acoustic and vibrational response. |
| | | Numerical simulation and experiments | Three operating points at nominal compressor speeds were simulated, ranging from a best efficiency point to near-surge conditions. |
| This study | | | (1) A prediction method of compressor aerodynamic noise was proposed using the CatBoost algorithm. (2) During the prediction process, the nonlinear relationships between the input features (speed, mass flow rate, pressure ratio and frequency) and the SPL were elaborated upon. (3) The predicted noise MAP was better at characterizing the variation in the total SPL for the aerodynamic noise. |

As can be seen from Table 7, the existing literature focused on the analysis of compressor noise characteristics, and the research methods used in these studies included experimentation, numerical simulation, and a combination of the two. The findings mainly concerned the aerodynamic noise characteristics of compressors at specific operating points. However, there were few studies on the aerodynamic noise prediction of centrifugal compressors for engine turbochargers over the entire operating range. In addition, the coupling effect between the influencing parameters (speed, mass flow rate, pressure ratio and frequency) and the total SPL of the compressor was not sufficiently analyzed, especially the contribution of each feature toward the SPL. Therefore, the innovation of this study was to propose a method based on ensemble learning that could accurately predict the aerodynamic noise of a turbocharger compressor under arbitrary working conditions. In addition, the SHAP algorithm was used to analyze the aerodynamic noise prediction process, which illustrated that the speed, mass flow rate and pressure ratio had little effect on the SPL of the aerodynamic noise, while frequency could significantly affect the SPL. The results of this study could provide a theoretical basis for reducing the aerodynamic noise emissions of compressors and offer guidance for engineering practice.

Conclusions
Environmentally sustainable development plays an important role in human health and social development. The analysis of the aerodynamic noise of turbocharger compressors is significant for reducing noise emissions. In order to accurately evaluate the aerodynamic noise emissions of a heavy-duty diesel engine turbocharger compressor under arbitrary operating conditions, aerodynamic noise experiments were conducted on a turbocharger compressor and datasets were established in this study. Four ensemble machine learning algorithms (random forest (RF), extreme gradient boosting (XGBoost), categorical boosting (CatBoost) and light gradient boosting machine (LightGBM)) were introduced to establish a compressor aerodynamic noise emission prediction model, and the SHAP algorithm was used to analyze the contribution of the input features toward the model results. The main findings were as follows:

• In the compressor aerodynamic noise prediction model, the speed, pressure ratio, mass flow rate and frequency were the important input features. The importance of the input features calculated with the SHAP algorithm was, in descending order, frequency > speed > mass flow rate > pressure ratio. Compared with RF, the SHAP values of speed were above 0.02 in all three of the XGBoost, CatBoost and LightGBM models, indicating that speed had some influence on the output results of the prediction models.

• The compressor aerodynamic noise model based on the CatBoost algorithm had the best prediction performance, with the largest R² and the smallest RMSE (0.984798 and 0.000628, respectively). In addition, among the four models, the CatBoost model had the smallest error between the predicted total SPL of aerodynamic noise and the observed value, which was only 0.37%.

• The CatBoost model had high prediction accuracy at different operating points of the compressor. The predicted aerodynamic noise MAP from the CatBoost model and the experimental noise MAP were in good agreement, and the SPL contour locations were basically the same. In addition, compared with the experimental noise MAP, the predicted noise MAP was better at characterizing the variation in the total SPL of the aerodynamic noise.

• The analysis of the input features of the prediction model based on the SHAP algorithm showed that the frequency and the SPL were negatively correlated, whereas the speed, mass flow rate and pressure ratio were positively correlated with the SPL. In addition, the effects of the speed, mass flow rate and pressure ratio on the SPL were small, while frequency could significantly affect the SPL of the compressor.

• The prediction model of compressor aerodynamic noise established by applying the CatBoost algorithm could accurately evaluate aerodynamic noise under arbitrary operating conditions and provide data and theoretical support for controlling aerodynamic noise emissions, contributing to environmentally sustainable development.

Figure 2. Explainable prediction model for compressor aerodynamic noise using Shapley Additive Explanations approach.

Figure 5. Predicted operating points of aerodynamic noise.

Figure 6. Aerodynamic noise characteristics of compressor under various operating conditions.

Figure 7. Analysis of the importance of the input features in different models.

Figure 8. R² and RMSE values from the tenfold cross-validation of four models.

Figure 9. Comparison of mean R² and mean RMSE values of predicted results from the tenfold cross-validation of four models.

Figure 10. Comparison of R² and RMSE values of predicted results for four models.

Figure 11. Comparison of total SPL of predicted and observed values for four models.

Figure 12. The modeling results of the CatBoost model on the validation dataset.

Figure 14. Comparison of experimental and CatBoost model predicted total SPL emission clouds for aerodynamic noise.

Figure 15. Relationship between the SHAP value and the values of different input features.

Figure 16. Interpretation of the features contribution of the compressor aerodynamic noise prediction model.

Table 1. Distributions of turbocharger test bench and aerodynamic noise test instruments.

| Compressor Section | Turbine Section | Intake and Exhaust Piping and Components | Noise Test Section |
The turbocharger performance and noise test rig consisted of four parts: the compressor section, the turbine section, the intake and exhaust piping and components, and the noise test section. Table 2 lists the measuring ranges, accuracies and uncertainties of the aerodynamic noise test instruments [40].

Table 4. Dataset distributions of noise test points.

Table 5. The setup parameters of the four models.

Table 6. Specifications of four types of interest parameters.

Table 7. Comparison of the investigation and a survey of the other existing literature.