Prediction of Neutralization Depth of R.C. Bridges Using Machine Learning Methods

Abstract: Machine learning techniques have become a popular solution to prediction problems. These approaches show excellent performance without being explicitly programmed. In this paper, 448 sets of data were collected to predict the neutralization depth of concrete bridges in China. Random forest was used for parameter selection, and four machine learning methods, namely support vector machine (SVM), k-nearest neighbor (KNN), AdaBoost and XGBoost, were adopted to develop models. The results show that the machine learning models achieve a high accuracy (>80%) and an acceptable macro recall rate (>80%) even with only four parameters. For the SVM models, the radial basis function performs better than the other kernel functions. The radial basis kernel SVM method has the highest verification accuracy (91%) and the highest macro recall rate (86%). Besides this, the preferences of the different methods are revealed in this study.


Introduction
The neutralization of concrete is a major factor that influences the service life of R.C. bridges. The alkaline environment around steel bars will be impaired by carbon dioxide and other acid materials, such as acid rain [1,2]. Subsequently, steel bars are likely to be oxidized, especially with the effect of chloride ions and moisture in the concrete. Once the steel bars are corroded, the bearing capacity of bridges will be impaired [3][4][5].
Currently, there are nearly one million bridges in China, and many have been in service for more than ten years. It is therefore necessary to provide a solution for predicting the neutralization depth of existing bridges. In real engineering, however, the influencing factors are coupled, which makes it difficult to estimate the neutralization status of concrete. For existing bridges, another difficulty is the loss of the original construction information, some of which (e.g., water-cement ratio, maximum nominal aggregate size, and cement content) is often considered necessary for predicting neutralization depth. Moreover, predicting the neutralization of the concrete in inland river bridges is interesting because of the special service environment of these components: the influences of the river, wind, traffic load and some unknown factors are significant, yet accurately quantifying these effects with formulas is difficult.
Machine learning (ML) is proving to be an efficient approach to solving the above problems. ML refers to the capability of computers to obtain knowledge from datasets without being explicitly programmed [6]. It includes many powerful methods, such as support vector machine (SVM), decision tree, k-means, AdaBoost and k-nearest neighbor (KNN). One ML method mainly consists of two parts: the decision function and objective function. For a new data point, the decision function is used to predict its category. The decision function contains some pending parameters that must be determined by optimizing the objective function. The objective function at least contains a loss function and a regularization item. The loss function depicts the gap between true values and prediction values; the regularization item is used to avoid model overfitting. ML methods have been widely used in civil engineering. The first application of ML was to promote structural safety [7]. Nowadays, ML is used in structural health monitoring [8][9][10][11], reliability analysis [12,13], and earthquake engineering [14][15][16].
In addition, machine learning techniques show great potential in the concrete industry. The complexity of concrete makes it difficult to develop prediction models; however, models developed by ML methods consistently achieve a high accuracy [17][18][19][20]. Topçu et al. [21] proposed an artificial neural network (ANN) model to evaluate the effect of fly ash on the compressive strength of concrete. The results show that the root-mean-squared error (RMSE) of the ANN model is less than 3.0. Bilim et al. [22] constructed an ANN model to predict the compressive strength of ground granulated blast furnace slag (GGBFS) concrete. Sarıdemir et al. [23] used ANN and a fuzzy logic method to predict the long-term effects of GGBFS on the compressive strength of concrete. Their results show that the fuzzy logic model has a low RMSE (3.379); however, the ANN model's RMSE (2.511) is lower. Golafshani et al. [24] used a grey wolf optimizer to improve the performance of an ANN model and an adaptive neuro-fuzzy inference system model in predicting the compressive strength of concrete. Kandiri et al. [25] developed ANN models with a salp swarm algorithm to estimate the compressive strength of concrete; the results show that this algorithm can reduce the RMSE of ANN models. Machine learning methods can be used for classification problems, regression problems, feature selection and data mining. Compared with conventional models, machine learning models are good at extracting information from data.
Machine learning can select a few effective parameters for developing models. Han et al. [26] measured the importance of parameters based on the random forest method, and then used this approach to establish prediction models. Their results show that the performance of models can be obviously improved by parameter selection. Random forest is an effective feature selection method [27]. It is widely used in bioscience [28,29], computer science [30], environmental sciences [31], and many other fields. Zhang et al. [32] used random forest to select important features from a building energy consumption dataset. Yuan et al. [33] employed random forest to rank the features of house coal consumption.
In this paper, random forest was adopted for parameter selection. SVM, KNN, AdaBoost and XGBoost were used to develop prediction models. These ML methods have been successfully used in many fields [34][35][36][37]. A comparison among these ML models was also conducted to reveal the preferences of the different methods in the prediction of neutralization depth.

Dataset Description and Analysis
The dataset, which focuses on the neutralization depth of R.C. bridges in China, includes 448 samples. Parameters such as service time, concrete strength, bridge load class and environmental conditions were considered in this study. Figure 1 shows the distribution of these bridges. The full information of the dataset is given in Appendix A. The dataset was collected from the references, and the meteorological data of the cities were collected from the environmental meteorological data center of China. These references are included in two professional Chinese document databases: CNKI and WANFANG DATA. All the samples included in Appendix A are detection data from existing bridges.
The neutralization depth of concrete was tested with phenolphthalein, and the compressive strength was measured with a resiliometer and calculated according to the Technical Specification for Inspecting of Concrete Compressive Strength by Rebound Method (JGJ/T 23-2011). The vehicle load of bridges was divided into two levels, according to the General Specifications for Design of Highway Bridges and Culverts (JTG D60-2015). There are no missing data in the dataset; any samples containing missing values were abandoned during collection. Table 1 gives a detailed description of the dataset, including the acid rain pH classes (pH < 4.6, 4.6 ≤ pH < 5.6, pH > 5.6), the load level p (level 1, level 2), and the location of bridge components Loc (arch ring, beam, bridge pier, bridge platform). The climatic division used in Table 1 is derived from the climatic division map of the geographic atlas of China (Peking University) [69].
The reliability of the ML models relies on the quality of the dataset. Generally, ML models perform well within the scope of the training dataset, whereas predicting the neutralization depth of a new sample outside the range of the training dataset is difficult for them. Therefore, a dataset with a wide scope is necessary for the reliability of ML models. Table 1 shows the range of this dataset. The service time, compressive strength, and load level of the samples cover the main status of existing bridges, and the temperature, humidity, acid rain, and climate status cover the typical environmental conditions. Figure 1 also shows that the dataset has a large scope.
Besides this, the histograms in Table 1 show that the values of samples have good continuity. Therefore, it is believed that this dataset is effective for developing ML models.
The imbalanced distribution of temperature and RH is also revealed in the histograms. The RH of most of the samples is around 72.5-82.5%, and the temperature of most of the samples is around 15 °C. Because of this imbalance, the sparsely populated parts of the dataset will receive less attention from the ML models. However, this negative effect can be alleviated by increasing the penalty applied to the misclassification of the samples in these parts.
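The penalty-based remedy mentioned above can be implemented by weighting classes inversely to their frequency, so that misclassifying a rare class costs more during training. The snippet below is a minimal sketch; the class labels and counts are illustrative, not taken from the actual dataset:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency, so that
    misclassifying a rare class is penalized more during training."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    # weight_c = n / (k * count_c): rare classes get weights > 1
    return {c: n / (k * counts[c]) for c in counts}

# Illustrative imbalanced labels (e.g., neutralization levels)
labels = ["slight"] * 70 + ["medium"] * 25 + ["serious"] * 5
weights = balanced_class_weights(labels)
```

Weights of this form can be passed to most ML libraries, e.g., as the `class_weight` argument of scikit-learn classifiers.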

Parameter Evaluation and Selection
This study aims to develop diagnosis models for existing R.C. bridges, so parameter selection is important: it alleviates the difficulty of obtaining parameters in real engineering. Random forest is widely used in feature selection [27]. It is a supervised learning method, and it does not require the dataset to follow a normal distribution [33]. Obviously, the dataset used in this study does not follow a normal distribution.

Random Forest for Parameter Evaluation
Random forest is a combination of decision trees. First, n samples are selected from the dataset as a training set through sampling with replacement (bootstrap sampling), and a decision tree is generated from these n samples. Then, d features are randomly selected and considered for splitting at each node of the decision tree. The above process is repeated k times (k is the number of decision trees in the random forest), and finally a random forest model is generated.
In the process of generating decision trees, the Gini coefficient is usually used to split nodes. Random forest evaluates the importance of parameters by calculating the average change in the Gini coefficient of feature f_i (i = 1, 2, . . . , d) during the splitting process of the nodes. Assuming that the probability of a sample belonging to class m is p_m, and there are M classes, the Gini coefficient is defined as:

Gini(p) = Σ_{m=1}^{M} p_m (1 − p_m) = 1 − Σ_{m=1}^{M} p_m²  (1)

For dataset D, the Gini coefficient is:

Gini(D) = 1 − Σ_{m=1}^{M} (|C_m| / |D|)²  (2)

where C_m is the subset of samples belonging to class m in the dataset D. On node n, feature f_i divides dataset D into two parts, D_1 and D_2, so the change in the Gini coefficient is:

ΔGini_n = Gini(D) − (|D_1| / |D|) Gini(D_1) − (|D_2| / |D|) Gini(D_2)  (3)

Therefore, the importance of parameter f_i in the kth decision tree is:

VIM_{ik} = Σ_{n ∈ N_i} ΔGini_n  (4)

where N_i is the set of nodes split by feature f_i on dataset D. The importance of feature f_i over the whole forest can then be calculated by Equation (5):

VIM_i = (Σ_{k=1}^{K} VIM_{ik}) / (Σ_{j=1}^{d} Σ_{k=1}^{K} VIM_{jk})  (5)

where d is the number of features, and K is the number of decision trees in the random forest. Figure 2 shows the results of parameter evaluation. It is noted that temperature, concrete strength f, RH and age are more important than climate, location of components, acid rain, and load level. The cumulative importance of the top four parameters reaches 0.73. Climate, which represents the rough ambient conditions of neutralization, is often considered an important feature; however, due to its high correlation with the other environment parameters, the results show that it is not so important. The random forest approach tends to place some of the highly correlated features at the top, but place the others at the end.
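The Gini computations above can be sketched in a few lines of code. This is a minimal illustration of evaluating a single split, not the full forest procedure, and the toy labels are invented for the example:

```python
from collections import Counter

def gini(labels):
    """Gini coefficient of a label set: 1 - sum(p_m^2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Change in Gini when a feature splits `parent` into two parts:
    Gini(D) - |D1|/|D| * Gini(D1) - |D2|/|D| * Gini(D2)."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))

# Toy example: a split that perfectly separates two classes
parent = ["A", "A", "B", "B"]
decrease = gini_decrease(parent, ["A", "A"], ["B", "B"])  # 0.5 - 0 - 0 = 0.5
```

Summing such decreases over the nodes where a feature is used, and averaging over the trees, yields the importance score plotted in Figure 2.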


Further, Climate and Loc are nominal parameters. Generally, a nominal parameter cannot be used directly to establish models. A common approach for preprocessing nominal parameters is one-hot encoding, which creates a new binary parameter for each unique value of the nominal parameter. Therefore, the parameter Climate would generate six new parameters, and Loc would generate four new parameters. Adding so many new parameters is unnecessary, because of their low importance. Therefore, Climate, Loc, pH and p were omitted in the subsequent study.
In addition, it is important to discuss the limitations of the ML models used in this study. For ML models, their validity scope depends on the range of the dataset. These models are actually empirical models. In this study, the ML models were based on age, RH, f and t, so the validity scope of the models is a four-dimensional space determined by the dataset. The search algorithm can be used to determine the valid scope of the models. For example, when a new sample is obtained and one wants to know if the sample is in the valid scope, one can search the dataset and find the new sample's neighboring points. Then, the neighboring points can be used for judging if the new sample is in the valid range.
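The neighbor-search idea described above can be sketched as follows. The distance threshold, the neighbor count, and the toy points are assumptions for illustration only; in practice, the search would run over the normalized four-dimensional (age, RH, f, t) dataset:

```python
import math

def k_nearest(dataset, x, k):
    """Return the k points in `dataset` closest to `x` (Euclidean)."""
    return sorted(dataset, key=lambda p: math.dist(p, x))[:k]

def in_valid_scope(dataset, x, k=3, threshold=1.0):
    """Judge whether a new sample lies inside the populated region of the
    dataset: all of its k nearest neighbors must be within `threshold`."""
    neighbors = k_nearest(dataset, x, k)
    return all(math.dist(p, x) <= threshold for p in neighbors)

# Toy 2-D training points (the real models use a 4-D space)
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
inside = in_valid_scope(data, (0.15, 0.15))   # near the cluster
outside = in_valid_scope(data, (5.0, 5.0))    # far from every sample
```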

Machine Learning Models
The current prediction models require a large number of input parameters, and output a mean value of neutralization depth. However, the dispersion of concrete's neutralization depth is great. Figure 3 illustrates the histogram of the neutralization depth data of the Nanjing Yangtze river bridge's concrete components. All components in Figure 3 have the same service time and concrete mix proportion. It is noted that the discreteness of these components is obvious. Thus, this paper predicts the level of neutralization depth rather than its exact value. Table 2 shows the classification of the neutralization depth of the concrete of bridges. In this study, 6 mm was chosen as the boundary between the slight level and the medium level, because the relationship between the neutralization depth and concrete's compressive strength becomes uncertain in the appraisal of old buildings when the neutralization depth of concrete is greater than 6 mm. According to the Technical Specification for Inspecting of Concrete Compressive Strength by Rebound Method (JGJ/T 23-2011), when the neutralization depth is greater than 6 mm, the test results cannot reflect the actual strength of the concrete. In addition, 25 mm was selected as the boundary between the medium level and the serious level, since the thickness of the protective layer of bridge components in China is often between 20 and 30 mm. When the neutralization depth reaches 25 mm, the neutralized area is likely to reach the surface of the steel bars.
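The three-level labeling described above can be expressed directly in code, using the 6 mm and 25 mm boundaries; the level names follow this section, but the treatment of the exact boundary values is an assumption, since Table 2 is not reproduced here:

```python
def depth_level(depth_mm):
    """Map a measured neutralization depth (mm) to its class label.
    Boundaries: 6 mm (slight/medium) and 25 mm (medium/serious)."""
    if depth_mm < 6.0:
        return "slight"
    if depth_mm < 25.0:
        return "medium"
    return "serious"
```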

Support Vector Machine
SVM is a binary classification model; its purpose is to find a hyperplane in order to classify samples into two classes [70]. SVM finds the hyperplane by maximizing the margin between the two classes. The margin refers to the shortest distance between the closest data points and the hyperplane. Therefore, only a few points, which are called support vectors, can influence the hyperplane. Because the hyperplane is determined by these few points rather than by the majority of the samples, SVM is one of the most robust and accurate of the well-known modeling methods when the dataset is not huge [37]. Considering the size of the dataset used in this study, SVM is obviously attractive. Figure 4 shows an illustration of SVM.

Developing an SVM model requires solving the following dual optimization problem [70]:

min_α (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j K(x_i, x_j) − Σ_{i=1}^{N} α_i
s.t. Σ_{i=1}^{N} α_i y_i = 0, 0 ≤ α_i ≤ C, i = 1, 2, . . . , N  (6)

K(x, z) is the kernel function. Through the kernel function, data points in a low-dimensional space can be transformed into data points in a high-dimensional space [70]; therefore, a nonlinear problem can be turned into a linear one. Figure 5 depicts the effects of the kernel function. Common kernel functions include the polynomial kernel, the radial basis kernel and the hyperbolic tangent kernel. (x_i, y_i) is a sample point, and N is the number of samples in the dataset. C indicates the penalty for misclassification. α* = (α*_1, α*_2, . . . , α*_N)^T can be obtained by solving Equation (6). Then, the decision function can finally be obtained:

f(x) = sign(Σ_{i=1}^{N} α*_i y_i K(x, x_i) + b*)  (7)
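The kernels listed above are easy to state in code. This sketch implements the radial basis kernel, K(u, v) = exp(−γ|u − v|²); the γ value is arbitrary, chosen only for illustration:

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Radial basis kernel: K(u, v) = exp(-gamma * ||u - v||^2).
    It equals an inner product in a transformed (high-dimensional)
    feature space, which is what lets SVM separate nonlinear data."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical points -> 1.0
k_far = rbf_kernel([0.0, 0.0], [10.0, 10.0])  # distant points -> near 0
```

The kernel value decays smoothly from 1 (identical points) toward 0 (distant points), so each support vector influences the decision function only locally.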

K-Nearest Neighbor

KNN is one of the most concise classification algorithms, and it is also recognized as one of the top ten data mining algorithms [37]. For a new sample, KNN will find k samples closest to this sample in the dataset. The classification of this new sample depends on the voting results of those k samples. Figure 6 shows an illustration of KNN. In Figure 6, we suppose k = 4, and the four closest data points to the new sample are marked with numbers. Points 1, 2, and 3 belong to class C, and only point 4 belongs to class A. Therefore, this new sample should be classified into class C. Compared with other ML methods, KNN is simpler, but is also effective [71]. KNN is often used for comparison with other ML methods in some studies [71,72], as well as this study.
The decision function of KNN can be written as follows:

y = argmax_{c_j} Σ_{x_i ∈ N_k(x)} I(y_i = c_j)

where c_j represents the jth class, N_k(x) is the neighborhood covering the k samples nearest to x, and I(y_i = c_j) is an indicator function: if y_i = c_j, then I(y_i = c_j) = 1; otherwise, I(y_i = c_j) = 0 (i = 1, 2, . . . , N). The purpose of tuning KNN is to find the optimal number k of nearest neighbors.
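The KNN voting rule described above can be written in a dozen lines. The toy points are invented, but they reproduce the Figure 6 narrative: with k = 4, three of the four nearest neighbors belong to class C, so the new sample is classified as C:

```python
import math
from collections import Counter

def knn_predict(train, x, k):
    """Classify `x` by majority vote among its k nearest neighbors.
    `train` is a list of (point, label) pairs."""
    neighbors = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy set mirroring Figure 6: three nearby class-C points, one class-A
train = [((0.0, 0.0), "C"), ((0.1, 0.1), "C"), ((0.0, 0.2), "C"),
         ((0.3, 0.0), "A"), ((5.0, 5.0), "A"), ((5.1, 5.0), "B")]
pred = knn_predict(train, (0.05, 0.05), k=4)  # -> "C"
```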

AdaBoost
AdaBoost is one of the most representative methods in machine learning [37]. It is a famous ensemble learning algorithm: the method develops many weak classifiers and finally combines them into a strong classifier. Therefore, the decision function can be written as follows [37]:

f(x) = sign(Σ_{m=1}^{M} α_m G_m(x))

G_m(x) is the decision function of the mth weak classifier (m = 1, 2, . . . , M). α_m is the weight of G_m(x), and this coefficient is calculated from the accuracy of G_m(x). In this study, decision tree models were used as the weak classifiers. AdaBoost first generates a weak decision tree model, obtains its decision function G_1(x), and updates the weights of the samples according to the performance of G_1(x). If a data point is misclassified by G_1(x), it is assigned a greater weight in the next round. In general, the sample weights updated in the (m − 1)th round are used to fit G_m(x) in the mth round, and α_m, the weight of G_m(x), is calculated from the accuracy of G_m(x).
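The weight-update logic described above follows the standard AdaBoost.M1 formulas; this is a sketch of a single boosting round, not the full training loop, and the error value and labels are illustrative:

```python
import math

def classifier_weight(error):
    """Weight alpha_m of a weak classifier from its weighted error rate:
    alpha_m = 0.5 * ln((1 - e_m) / e_m). Accurate classifiers (small
    error) receive large weights in the final vote."""
    return 0.5 * math.log((1.0 - error) / error)

def update_sample_weights(weights, labels, preds, alpha):
    """Increase the weight of misclassified samples for the next round:
    w_i <- w_i * exp(-alpha * y_i * G(x_i)), then renormalize.
    Labels and predictions are +1/-1."""
    new = [w * math.exp(-alpha * y * p)
           for w, y, p in zip(weights, labels, preds)]
    total = sum(new)
    return [w / total for w in new]

alpha = classifier_weight(0.25)  # error < 0.5 -> alpha > 0
# Sample 4 is misclassified (true -1, predicted +1) and gains weight
w = update_sample_weights([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1], alpha)
```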

XGBoost
XGBoost (extreme gradient boosting) was proposed in 2016, and soon became a popular method owing to its excellent performance in Kaggle competitions [73]. XGBoost is one of the most popular emerging ML approaches. However, its application in civil engineering is not yet as common as that of conventional ML methods such as SVM, KNN, and ANN; most applications of XGBoost in civil engineering have appeared in the last two years. Hu et al. [74] used XGBoost to predict the wind pressure coefficients of buildings. Pei et al. [75] developed a pavement aggregate shape classifier based on XGBoost. In this study, XGBoost was selected as a representative of the newer ML methods for comparison with the more established ones.
The XGBoost method first develops a weak classifier. Then, the next weak classifier is designed to reduce the gap between the true values and the predictions of the classifiers already built. After M rounds of training, the decision function can be written as follows:

f(x) = Σ_{m=1}^{M} α_m F_m(x)

where α_m represents the weight of weak classifier F_m(x). When the mean-squared error is chosen as the loss function of the models, the objective function to be optimized when generating a new weak classifier can be written as follows:

Obj = Σ_{i=1}^{N} (y_i − ŷ_i)² + Σ_j Ω(f_j)

where Ω(f_j) is a regularization item, ŷ_i is the prediction of the current ensemble for sample i, and N is the number of samples. In this study, the tree model was selected as the weak classifier; the tree model is the most common weak classifier in applications of XGBoost.
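The stage-wise idea, in which each new weak learner reduces the remaining error of the ensemble, can be illustrated with the simplest possible weak learner, a one-split regression stump. This is a generic gradient-boosting sketch under squared loss, not the actual XGBoost algorithm (which adds regularization and second-order information), and the toy data are invented:

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals: choose the
    threshold minimizing squared error, predicting the mean on each side."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left) if left else 0.0
        rm = sum(right) / len(right) if right else 0.0
        err = sum((r - (lm if x <= t else rm)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=5, lr=0.5):
    """Stage-wise boosting under squared loss: each stump is fitted to
    the current residuals y_i - f(x_i), then added with weight `lr`."""
    stumps = []
    preds = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]   # a step function plus noise
model = boost(xs, ys)
mse = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

After a few rounds the residuals shrink geometrically, which is the mechanism the XGBoost objective above formalizes.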

Multi-Class Problem
Some machine learning methods (e.g., SVM) are designed for binary classification problems, but a multi-class problem was studied here. Therefore, a one-vs-one (OVO) strategy is considered. OVO is a common approach for multi-class problems [76][77][78]. OVO methods generate a hyperplane between every pair of categories, producing N(N − 1)/2 hyperplanes for an N-class problem. For a new sample, all models are utilized, and the final result depends on the vote among the ML models.
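The OVO vote described above can be sketched as follows. The pairwise decision functions here are stand-ins (simple threshold rules on a scalar) for trained binary classifiers, and the thresholds are invented for illustration:

```python
from collections import Counter
from itertools import combinations

def ovo_predict(classes, pairwise, x):
    """One-vs-one voting: one binary classifier per class pair,
    N*(N-1)/2 in total; the class with the most wins is returned.
    `pairwise[(a, b)]` returns the winning class (a or b) for `x`."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    return votes.most_common(1)[0][0]

# Three classes -> 3 pairwise classifiers (toy 1-D decision rules)
classes = ["slight", "medium", "serious"]
pairwise = {
    ("slight", "medium"): lambda x: "slight" if x < 6 else "medium",
    ("slight", "serious"): lambda x: "slight" if x < 15 else "serious",
    ("medium", "serious"): lambda x: "medium" if x < 25 else "serious",
}
pred = ovo_predict(classes, pairwise, 10)  # "medium" wins 2 of 3 votes
```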

Results and Discussions
Z-score normalization is used for the normalization of the dataset. Normalization can alleviate the influence of the parameters' different scales. Besides this, in order to improve the reliability of the ML models, it is necessary to divide all samples into two parts: T1 and T2. T1, the training dataset, is used for training the ML models, and T2, the testing dataset, is used for testing their performance. In this study, 70% of the original dataset was used as the training dataset.
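The preprocessing described above can be sketched as follows; the array shapes mirror the paper's 448 samples and four parameters, but the values themselves are random placeholders. Fitting the scaler on T1 only, then applying it to T2, avoids leaking test statistics into training.

```python
# Z-score normalization and a 70/30 train/test split (illustrative data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.rand(448, 4)                # 448 samples, 4 selected parameters
y = rng.randint(0, 3, 448)          # 3 neutralization depth levels

# stratify=y keeps the class distribution the same in T1 and T2.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=0)

# Fit the z-score scaler on T1 only, then transform both parts.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
print(X_train_s.mean(axis=0).round(6))  # approximately zero per feature
```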
Training accuracy, verification accuracy and macro recall rate were used as the indicators for the optimization of the parameters of the models. The mesh search (grid search) tuning approach was used to find the optimal values of the parameters of the models. Underfitting and overfitting can be avoided by comparing a model's training accuracy and verification accuracy. Besides this, in order to improve the reliability of the results, the dataset was randomly divided 10 times, and each child dataset has the same distribution. The final results were based on the performances of these 10 models. Table 3 shows the results of mesh search tuning.
Notes on Table 3: 1 "rbf" represents the radial basis function kernel [70], K(u, v) = e^(−γ|u − v|²). 2 "tanh" represents the hyperbolic tangent kernel [79], K(u, v) = tanh(γ·u·v + c). 3 "poly" represents the polynomial kernel [79], K(u, v) = (γ·u·v + c)^deg. 4 Parameters C, k and M in this column are explained in Sections 4.1-4.4.
Table 3 shows that the training accuracy of the ML models is very close to their verification accuracy, which illustrates that overfitting is avoided. Both accuracy and macro recall rate were adopted for evaluating the ML models. In fact, macro recall rate is more important than accuracy when the imbalance of the dataset is considered. Macro recall rate depicts the ratio of samples correctly classified by the classifier to the samples that should be correctly classified, averaged over the classes. Obviously, the verification accuracy (91%) and the macro recall rate (86%) of the radial basis kernel SVM model are higher than those of the other models. Besides this, the gap between the verification accuracy and the training accuracy of the radial basis kernel SVM model is 2%, so there is no obvious evidence of overfitting. Furthermore, the performances of the radial basis kernel and the polynomial kernel are better than that of the hyperbolic tangent kernel; the radial basis kernel is the best kernel function in this study. Compared with the other methods, KNN also seems attractive.
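The mesh (grid) search described above can be sketched with scikit-learn's `GridSearchCV`, scoring each candidate by both accuracy and macro recall. The parameter grid and dataset are illustrative, not the values from Table 3.

```python
# Grid search over RBF-kernel SVM parameters, tracking both accuracy
# and macro recall (illustrative grid and synthetic data).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)

param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid,
                      scoring={"acc": "accuracy", "rec": "recall_macro"},
                      refit="acc", cv=5)
search.fit(X, y)
print(search.best_params_)
```

Per-candidate training vs. validation scores live in `search.cv_results_`, which supports the over/underfitting comparison the text describes.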
Besides this, the maximum gap between the models in terms of verification accuracy is 22%. However, for macro recall rate, the maximum gap can reach 40%. This may be due to the influence of the uneven distribution of the dataset: the macro recall rate is sensitive to imbalanced data. As an evaluation index, the macro recall rate is therefore more representative. Table 4 shows the confusion matrices of the models. All models were established with scikit-learn. For Li and Lj (i, j = 1, 2, 3) in Table 4, the value in row Li and column Lj represents the number of samples that actually belong to class Li but are predicted to be class Lj by the models. The green area in Table 4 shows the number of test samples that are correctly classified, and the yellow area shows the number of test samples that are wrongly classified. Based on Table 4, the accuracy of the models in terms of the neutralization level of concrete can be obtained (Figure 7).
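The confusion matrix and macro recall computations follow the same row/column convention just described; the labels below are illustrative, not the paper's test results.

```python
# Confusion matrix (row i, column j: class-i samples predicted as j)
# and macro recall for an illustrative 3-class prediction.
from sklearn.metrics import confusion_matrix, recall_score

y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 0, 2]

print(confusion_matrix(y_true, y_pred))
# Macro recall averages per-class recall with equal class weight,
# so it penalizes errors on rare classes more than plain accuracy does.
print(recall_score(y_true, y_pred, average="macro"))
```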
Even though the radial basis kernel SVM model has the highest verification accuracy and the highest macro recall rate, Table 4 and Figure 7 show that the KNN model is better at classifying the slight-level samples than the other methods (accuracy > 97%). However, compared with the other models, the KNN model only achieves a moderate performance in the prediction of medium-level samples (accuracy = 81%). The AdaBoost model is the best classifier in predicting the neutralization depth of medium-level samples (accuracy > 93%).
Besides this, Figures 8 and 9 show that most ML models reach a high accuracy in predicting the neutralization depth of concrete components with a longer service life (20-39 years) and a higher compressive strength (40-59 MPa). Figures 10 and 11 show that most ML models achieve a high accuracy in predicting the neutralization depth of concrete components in a lower-temperature (13-16 °C) and lower-humidity (71-75%) environment. It is believed that when the neutralization depth of concrete reaches the boundary of two levels, the accuracy will decline. Therefore, a rough warning range for the neutralization depth of concrete in terms of each parameter can be obtained. For instance, for service time, the range is 10-19 years. The warning range implies that the neutralization depth of concrete is likely to reach the next level, and more attention should be paid to these bridges.

Figure 11. The accuracy of models in terms of the relative humidity of exposure situations.

Conclusions
In this paper, four-parameter ML models for predicting the neutralization depth levels of the concrete components of existing bridges were established. Four representative ML methods were used in this study. The following conclusions can be drawn:

1. This study used SVM, KNN, AdaBoost and XGBoost to predict the neutralization depth level of the concrete of existing bridges, and the results show that the radial basis kernel SVM model has the highest validation accuracy (91%) and the highest macro recall rate (86%), with only four parameters. The radial basis kernel function is the best kernel function in this study. Compared with the other models, the radial basis kernel SVM model and the KNN model achieve a better performance;
2. The results reveal the preferences of the ML methods. KNN is good at classifying slight-level samples (accuracy > 97%), and AdaBoost is the best method for the prediction of medium-level samples (accuracy > 93%). Machine learning shows great potential in predicting the neutralization depth of concrete with very few parameters, and in evaluating the durability level of existing bridges;
3. Random forest was used for parameter selection. The results show that temperature, concrete strength, RH and service time are more important than climate, acid rain, location of components, and load level. The cumulative importance of these top four parameters reaches 73%. The performance of the models shows that random forest is an effective approach for parameter selection.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A