A New Approach to Machine Learning Model Development for Prediction of Concrete Fatigue Life under Uniaxial Compression

Abstract: The goal of this work is to show how machine learning models, such as the random forest, neural network, gradient boosting, and AdaBoost models, can be used to forecast the fatigue life (N) of plain concrete under uniaxial compression. Here, we developed our final machine learning model by generating the following three data files from the original data used in the work of Zhang et al.: (a) grouped data with the same input variable values but different output variable (logN) values; (b) data excluding outliers, i.e., points flagged by three or more outlier detection methods; and (c) average data excluding outliers, created by averaging the grouped data after the outliers were removed. Excluding the sustained strength of the concrete variable, originally treated as the seventh input variable in the work of Zhang et al., improved the determination coefficient (R 2) values. Moreover, the gradient boosting model showed a high R 2 value of 0.753, indicating high accuracy in predicting outcomes. Further analysis using the data excluding outliers increased the R 2 value to 0.803, and the average data excluding outliers provided the best R 2 value, at 0.915. Finally, a permutation feature importance (PFI) analysis was carried out to determine the strength of the relationship between each feature and the target value for the gradient boosting model. The results showed that the maximum stress level (S max) and loading frequency (f) were the most significant input variables, followed by compressive strength (f'c) and the minimum to maximum stress ratio (R). Shape and height to width ratio (h/w) had a non-significant influence on the model. This trend was previously confirmed by Pearson and Spearman correlation analyses.


Introduction
Concrete structures are subjected to repeated loading (N) from many sources, such as dead and live loads in buildings, traffic loads in civil structures, or environmental loads, such as temperature and humidity changes. It is commonly known that concrete strength under repeated loading is lower than that under static loading [1,2]. Concrete structures subjected to many repeated loadings experience increased deflections and crack widths, eventually leading to reduced durability and fatigue failure [3].
A classic fatigue relationship for plain concrete is typically represented by an S-N diagram, where the stress level (S), defined as a percentage of the static strength, is plotted against the logarithm of N. Most previous research results on fatigue have been analyzed with a simple linear equation. However, it is well known that a single S-N curve (known as a Wöhler curve) is inappropriate for describing fatigue behavior [1], as it is affected by other factors.
In addition to S, concrete fatigue is affected by various factors, such as concrete compressive strength, concrete mix proportions, and loading parameters [1]. As stated in [1], although concrete fatigue is relatively insensitive to the details of the mix design and compressive strength, it is highly sensitive to fatigue loading parameters, such as the maximum stress level (S max), the minimum to maximum stress ratio (R), the frequency (f), and the fatigue loading history [1].
Appl. Sci. 2022, 12, 9766

Moreover, high-strength concrete yields a different fatigue pattern, while various mix designs proportioned with different water-binder ratios, including the use of fibers, also produce different fatigue patterns [1]. Recently, incorporating supplementary cementitious materials (SCMs), such as slag, fly ash, metakaolin, and silica fume, in the concrete mix has become widely regarded as the most economical means of improving durability and reducing CO2 emissions [34,35]. Thus, in the near future, it will be essential to understand the fatigue behavior of waste materials as well as SCMs. However, the fatigue behavior of innovative concrete materials combining the above-mentioned mixture constituents is difficult to estimate. In addition, concrete structures are exposed to diverse fatigue loading parameters, such as different stress levels and frequencies, as mentioned before. Therefore, the traditional statistical treatment for accurately predicting concrete fatigue behavior has reached its limit, due to its inability to consider the complicated combined effects of these influential parameters.
In 2019, an ANN-based concrete fatigue strength model was proposed by Abambres and Lantsoght [63]. They used 203 data points gathered from the literature. Predicted values from the ANN model were compared to existing code expressions. Their ANN model includes the compressive strength of concrete, maximum stress level, and minimum stress level. In 2021, a strength degradation model of concrete under fatigue loading was proposed by Zhang et al. [4] using several ML algorithms, such as the random forest, support vector machine, and artificial neural network models. About 1000 experimental data points were collected from various independent experiments. Seven independent variables were chosen in their study, including the compressive strength of concrete, sustained strength of concrete, height to width ratio and shape of the test specimens, maximum stress level, minimum to maximum stress ratio, and loading frequency. The analysis results revealed that the random forest model produced the highest correlation coefficient, at 0.85.
Due to the nature of the fatigue strength test, outliers occur remarkably often in this test compared with tests of other material strength properties. In statistics, an outlier is a data point that differs significantly from other observations [64,65]. An outlier may be due to variability in the measurement, or it may indicate experimental error; the latter is sometimes excluded from the data set. There are various methods of outlier detection, such as Grubbs's test [64], Chauvenet's criterion [66], Peirce's criterion [67], Dixon's Q-test [68], the generalized extreme studentized deviation test [69], the Thompson-Tau test [70], and the IQR test [71,72].
In this study, 1300 samples of experimental data from concrete fatigue tests originally compiled by Zhang et al. [4] were treated using four kinds of machine learning models (artificial neural network, random forest, gradient boosting, and AdaBoost). Unlike previous studies, this research adopts six independent variables, excluding only the sustained strength of the concrete variable used in the work of Zhang et al. [4]. For our approach, three data files were generated to compare the actual fatigue life values (logN) against the predicted values. The first data file uses the entire original dataset treated by Zhang et al. [4]. However, unlike Zhang et al. [4], our research adds a second data file with the grouped data and a third data file that excludes outliers. In this work, Chauvenet's criterion, Peirce's criterion, the Thompson-Tau criterion, and the IQR method were adopted to remove outliers. Finally, a permutation feature importance (PFI) analysis was carried out to determine which input variables are the most critical or minor in the fatigue life model. Our novel approach allows better fatigue life prediction than Zhang et al.'s [4] approach.

Input and Output DATA (Independent and Dependent Variables)
Six basic input features (variables) that influence the fatigue life of plain concrete under a uniaxial compressive fatigue test were chosen, as shown in Table 1. The single output variable is the logarithm of the maximum number of cycles at failure, representing the fatigue life of the specimen. The first group of key input variables, which relate to the material and dimensional properties of the test specimens, comprises the compressive strength of concrete (f'c), the height to width ratio (h/w), and the shape of the test specimens. The other three variables, which reflect the loading conditions of the fatigue test, are the maximum stress level (S max), the minimum to maximum stress ratio (R), and the loading frequency (f). This study covers low-strength hydraulic concrete (10~30 MPa), ordinary concrete (30~60 MPa), and high-strength concrete (60~120 MPa). The h/w of the test specimens ranged from 1.0 to 3.0, and the specimen shapes include the cube, prism, and cylinder. The loading conditions were also highly diverse, with S max ranging from 0.457 to 0.95, R covering 0 to about 0.67, and the loading frequency ranging from 0.0625 to 150 Hz. The variables used in this study are summarized below:
f'c: compressive strength of concrete, in MPa;
h/w: height to width ratio of the tested specimens;
Shape: shape of the test specimens;
S max: maximum stress level;
R: minimum to maximum stress ratio;
f (Hz): loading frequency, in Hz;
LogN: logarithm of the number of cycles to failure of the specimen.
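To make the variable roles concrete, the sketch below builds a small pandas DataFrame with the seven columns listed above and separates the six input features from the logN target. The column names are our own shorthand and the two rows are illustrative values, not rows from the actual dataset.

```python
import pandas as pd

# Hypothetical column names mirroring Table 1; the full spreadsheet is
# provided in the paper's Supplementary Materials.
columns = ["fc", "h_w", "shape", "S_max", "R", "f", "logN"]

# Two illustrative rows (values chosen for the example only).
data = [
    [56.0, 1.0, 1, 0.85, 0.3, 4.0, 3.12],
    [56.0, 1.0, 1, 0.85, 0.3, 1.0, 2.87],
]
df = pd.DataFrame(data, columns=columns)

X = df[columns[:-1]]  # six input features
y = df["logN"]        # one output: logarithm of cycles to failure
```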

DATA Preparation for the Developed Model
Three data files were generated and used to develop the final ML model. Each data file is described below.
1. ORIGINAL DATA. These are the data used in Zhang's paper, directly collected by the authors from the source papers. The full-data spreadsheet is available in the Supplementary Materials. These are used as the reference data for this study. A total of 1298 data points were collected, and statistical features such as the mean, median, dispersion, minimum, and maximum values of the independent and dependent variables are summarized in Table 1. The ORIGINAL DATA were grouped by identical input variable values.
2. DATA Excluding OUTLIERS. These are the data created after removing any outliers found within each group. They are used as the basis for determining the average values after removing the outliers. A total of 1252 data points were retained. Statistical features such as the mean, median, dispersion, minimum, and maximum values of the independent and dependent variables are summarized in Table 2.
3. AVERAGE DATA Excluding OUTLIERS. These are the data created by averaging the grouped data after excluding the outliers. In this process, the total number of data points was reduced to 310. Statistical features such as the mean, median, dispersion, minimum, and maximum values of the independent and dependent variables are summarized in Table 3.
Tables 1-3 present the statistical analysis of the variables, showing numerous mathematical descriptions of the input and output values for each data set. Tables 4-6 describe the data process, in which parts of the data from reference [5] are used as an example to illustrate the process more clearly. Table 4 represents a part of the grouped data, in which data sets with the same input variable values but different output variable values are grouped together. Table 4 consists of two groups. Group 1 is a data set with an f'c value of 56 MPa, h/w value of 1, shape value of 1, S max value of 0.85, R value of 0.3, and f value of 4 Hz, but with different output values N.
Group 2 is a data set with an f'c value of 56 MPa, h/w value of 1, shape value of 1, S max value of 0.85, R value of 0.3, and f value of 1 Hz, but with different output values N.
To determine whether there are outlier data in each group, four commonly used outlier detection methods [70,71] were applied. If a point was flagged as an outlier by three or more of them, it was excluded from the data. The four methodologies are as follows:
1. Outlier detection using Chauvenet's criterion;
2. Outlier detection using Peirce's criterion;
3. Outlier detection using the Thompson-Tau criterion;
4. Outlier detection using the IQR (interquartile range) criterion.
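Two of the four criteria, the IQR test and Chauvenet's criterion, can be sketched in a few lines of Python, assuming their simple textbook formulations. The group of N values below is illustrative, with one run surviving far longer than its replicates; it is not the paper's actual Group 1 data.

```python
import math
import numpy as np

def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (method 4 above)."""
    q1, q3 = np.percentile(values, [25, 75])
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [bool(v < lo or v > hi) for v in values]

def chauvenet_outliers(values):
    """Flag points for which the expected number of equally deviant
    observations in a sample of size n is below 0.5 (method 1 above)."""
    v = np.asarray(values, dtype=float)
    n, mean, std = len(v), v.mean(), v.std(ddof=1)
    z = np.abs(v - mean) / std
    # two-sided tail probability of each standardized deviation
    prob = np.array([math.erfc(zi / math.sqrt(2)) for zi in z])
    return [bool(flag) for flag in n * prob < 0.5]

cycles = [1200, 1300, 1350, 1400, 1450, 1500, 1550, 22570]
print(iqr_outliers(cycles))        # only the last value is flagged
print(chauvenet_outliers(cycles))  # likewise
```

Both criteria agree on this example; in the study, a point had to be flagged by at least three of the four methods before removal.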
We applied all four of these methodologies to each group of data to determine which values were detected as outliers. All four detected the N value of 22,570 (see Table 4) as an outlier in the Group 1 data. On the other hand, for the data in Group 2, the N value of 1571 (see Table 4) was detected as an outlier only by the Thompson-Tau methodology and not by the other three. Table 5 represents the grouped data with the data set with an N value of 22,570 removed from Group 1. Even after removing outliers, different output values are recorded as experimental values for the same input variable values. With such data, it is difficult to build an accurate prediction model as long as the current input variables are maintained. Suppose, for example, that the user wants to predict the function y = sin(x). If several different experimental y values are matched to x = 30, it will be difficult to create an ML model that predicts the sin(x) function. Therefore, for grouped data having the same input variable values and different output variable values, this situation is eliminated by averaging the output variable values: one average value is used as the output for each specific combination of input variable values. This should provide more reasonable data for creating predictive ML models. Table 6 represents the averaged grouped data of Table 5.
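The grouping-and-averaging step can be sketched with pandas. The toy frame below mimics the structure of Tables 4-6, with two groups of identical inputs that differ only in loading frequency; the logN values are illustrative.

```python
import pandas as pd

# Toy grouped data in the style of Tables 4-6 (values illustrative).
df = pd.DataFrame({
    "fc": [56, 56, 56, 56], "h_w": [1, 1, 1, 1], "shape": [1, 1, 1, 1],
    "S_max": [0.85, 0.85, 0.85, 0.85], "R": [0.3, 0.3, 0.3, 0.3],
    "f": [4, 4, 1, 1],
    "logN": [3.1, 3.3, 2.8, 3.0],
})

features = ["fc", "h_w", "shape", "S_max", "R", "f"]
# After outlier removal, collapse each group of identical inputs to the
# mean of its logN values: one row per unique input combination.
averaged = df.groupby(features, as_index=False)["logN"].mean()
print(averaged)  # two rows, one per frequency group
```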
Figure 1 depicts the relative frequency distributions of the six input variables and one output variable. The shape variable is not only a numerical variable but also a categorical variable. In the model, shape = 1 represents a cube, shape = 2 a prism, and shape = 3 a cylinder. Since the numbers are meaningful in determining the category, (d) in Figure 1 can be changed to (e), which is closer to a normal distribution. The f variable appears unsuitable for a normal distribution, since some high-frequency values of 10 Hz exist in the data. If these high-frequency data are removed, the rest of the data fit a normal distribution much better, as shown in Figure 1i.
The relationships between the various independent variables and logN are plotted in Figure 2. Although not strong, one linear relationship is identified in Figure 2a (logN vs. S max). All other plots show non-linear behavior.
The most commonly used methods in correlation analysis are the Pearson and Spearman correlation analyses. The Pearson correlation evaluates the linear relationship and its direction between two variables using the raw values of the variables. The Spearman correlation evaluates a monotonic relationship between two variables; in a monotonic relationship, the two variables tend to change together, but not necessarily at a constant rate. The Spearman correlation coefficient is based on the ranked values of each variable, not on the raw data.
Table 7 summarizes the Pearson and Spearman correlation coefficients of the data used for our ML model. According to the Pearson correlation coefficient, S max and logN have a strong negative linear relationship, while f has a moderate positive and R a moderate negative linear relationship with logN; f'c, shape, and h/w have non-significant linear relationships with logN. According to the Spearman correlation coefficient, S max has a significant negative and f a significant positive monotonic relationship with logN; f'c has a moderate negative one; and R, shape, and h/w have negligible monotonic relationships with logN. Therefore, a complex relationship rather than a linear mapping is critical for capturing the variation and interaction. This is why it is necessary to create predictive systems using ML methods.
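Both coefficients can be computed directly with pandas. The frame below uses toy values chosen so that S max and logN move in opposite directions, echoing the qualitative trend of Table 7; with the real dataset, the same two calls reproduce the table.

```python
import pandas as pd

# Toy values (not the study's data): S_max falls while logN rises.
df = pd.DataFrame({
    "S_max": [0.95, 0.90, 0.85, 0.80, 0.75],
    "f":     [1.0, 5.0, 10.0, 1.0, 5.0],
    "logN":  [2.1, 2.8, 3.5, 4.0, 4.9],
})

pearson = df.corr(method="pearson")["logN"]
spearman = df.corr(method="spearman")["logN"]
print(pearson["S_max"], spearman["S_max"])  # both strongly negative
```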

Methodology
Four types of predictive regression models were developed in this study using a neural network model, a random forest model, a gradient boosting model, and an AdaBoost model.

Neural Network
Artificial neural networks (ANNs) are an efficient learning tool inspired by biological neural networks. They are composed of the following three types of layers: input, hidden, and output. Training data are fed to the input layer, and the predicted value is calculated at the output layer via the hidden layers. Using the backpropagation algorithm, the weights connecting the input, hidden, and output layers are updated so as to minimize the error between the calculated value and the measured value [73,74]. Figure 3 shows the general structure of ANNs.


Random Forest
Random forest is an ensemble model. It forms multiple decision trees, passes new data through each tree, votes based on the result of each tree, and then selects the result with the most votes as the final prediction (see Figure 4). A random forest model can be viewed as a forest composed of random trees. Some trees in the random forest may be overfitted; however, since many other trees make up the forest, this has no significant impact on the model [4,75].


Boosting Model
Boosting is an ensemble method that combines several weak learners to create a strong learner. Each successive learner improves performance by reducing the errors of the previous learner. There are several types of boosting methods, of which AdaBoost and gradient boosting are representative [75].


Gradient Boosting Method
Gradient boosting uses gradient descent to minimize the loss function of the model by successively adding weak learners (see Figure 5). By training each new learner on the model's residuals, it gives more importance to poorly predicted observations. The contribution of each weak learner to the final prediction is determined by a gradient optimization process that minimizes the overall error of the strong learner [75,76].
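The residual-fitting idea described above can be sketched from scratch for the squared-error loss, where the negative gradient is simply the residual. This is a minimal illustration using shallow scikit-learn trees as the weak learners, not the configuration used in the study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_trees=50, lr=0.2, max_depth=2):
    """Squared-error gradient boosting: each tree fits the current residuals."""
    base = y.mean()                      # initial constant learner
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residual = y - pred              # negative gradient of squared error
        t = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred += lr * t.predict(X)        # shrunken corrective step
        trees.append(t)
    return base, trees

def predict(base, trees, X, lr=0.2):
    return base + lr * sum(t.predict(X) for t in trees)

# Tiny illustration: learn y = x^2 on [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 1))
y = X[:, 0] ** 2
base, trees = fit_gradient_boosting(X, y)
print(np.abs(predict(base, trees, X) - y).mean())  # small training error
```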

AdaBoost Method
AdaBoost, or adaptive boosting, is a type of boosting algorithm that generates a final strong classifier by collecting weighted weak classifiers (see Figure 6) [75,77].

Model Development
The models for fatigue prediction were developed using Orange software, a popular open-source machine learning platform for statistical computing and data mining [78,79]. All data analysis in this research was carried out using Orange software (version 3.32.0, developed at the Bioinformatics Laboratory, Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia, together with the open-source community), which provides the most prevalent supervised ML algorithms. These algorithms were used to develop our novel ML model. Information regarding the input parameters and implementation of each machine learning algorithm is summarized in the documentation at https://orangedatamining.com/widget-catalog/ (accessed on 4 April 2022). Orange provides a platform for developing predictive models with big data. The schematic model developed using Orange is presented in Figure 7, and the specific parameters of each proposed model are shown in Figures 8-11. Unfortunately, the Orange 3 software used for this study does not have an optimizer function that automatically finds the hyper-parameters of the model. Thus, starting with the default parameters provided by Orange 3, the authors manually adjusted the parameters to generate feasible output for each ML model.

To develop the ANN model, the user has to set several important parameters, which are as follows. The number of hidden layers is set to two, with seven and eight neurons in the respective hidden layers, as shown in Figure 8. The rectified linear unit function is selected as the activation function for the hidden layers. As the solver for weight optimization, a stochastic gradient-based optimizer called Adam is used. As the regularization parameter, commonly called alpha, 0.0004 is used. Replicable training is allowed.

To develop the random forest model, the user has to set several important parameters, which are as follows. As shown in Figure 9, 50 decision trees are included in the forest. Four attributes are arbitrarily drawn for consideration at each node. Replicable training was permitted, while balancing the class distribution was not. The depth of the individual trees is not limited, and five is selected as the smallest subset that can be split.

To develop the gradient boosting model, the user has to set several important parameters, which are as follows. As shown in Figure 10, 150 gradient boosted trees are specified; a larger number usually results in better performance. The learning (boosting) rate is set to 0.2. Replicable training is allowed. The maximum depth of each individual tree is set to 4, and three is selected as the smallest subset that can be split. The fraction of training instances used for fitting each individual tree is set to 1.

To develop the AdaBoost model, the user has to set several important parameters, which are as follows. The number of estimators is set to 50, as shown in Figure 11. The learning rate is set to 1; it determines to what extent newly acquired information overrides old information, and a value of 1 means that the agent considers only the most recent information. The number 3 is set as a fixed seed to enable reproduction of the results. We decided to use SAMME as the classification algorithm, which updates the base estimator's weights with the classification results. Among the regression loss function options, the linear option is selected.
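Because Orange's model widgets wrap scikit-learn estimators, the settings reported above can be sketched approximately in scikit-learn. The mapping is our assumption rather than a guaranteed equivalence: "replicable training" is read as fixing random_state, and since SAMME applies only to AdaBoost classification, the regression sketch uses the linear loss option instead.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import (RandomForestRegressor,
                              GradientBoostingRegressor, AdaBoostRegressor)

models = {
    # Two hidden layers with 7 and 8 neurons, ReLU, Adam, alpha = 0.0004.
    "neural_network": MLPRegressor(hidden_layer_sizes=(7, 8),
                                   activation="relu", solver="adam",
                                   alpha=0.0004, random_state=0),
    # 50 trees, 4 attributes per split, min 5 samples to split, no depth limit.
    "random_forest": RandomForestRegressor(n_estimators=50, max_features=4,
                                           min_samples_split=5,
                                           random_state=0),
    # 150 trees, learning rate 0.2, depth 4, min 3 samples to split,
    # training-instance fraction 1.0.
    "gradient_boosting": GradientBoostingRegressor(n_estimators=150,
                                                   learning_rate=0.2,
                                                   max_depth=4,
                                                   min_samples_split=3,
                                                   subsample=1.0,
                                                   random_state=0),
    # 50 estimators, learning rate 1, linear loss, fixed seed 3.
    "adaboost": AdaBoostRegressor(n_estimators=50, learning_rate=1.0,
                                  loss="linear", random_state=3),
}
```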


Model Developed with Original Data
For our novel ML model, about 1300 fatigue test results from the 29 papers used by Zhang et al. [4] were collected and organized. For training and testing of the model, 90% of the total data was used for training and 10% for testing.

• Total data sets: 1298;
• Training data sets: 1169;
• Test data sets: 129.
The four ML models (random forest, neural network, gradient boosting, and AdaBoost) were run, and the results of training and testing for each model are shown in Table 8a,b below. Using the same data sets, Zhang et al. [4] reported that the MSE and correlation coefficient (r) from the random forest model are 0.44 and 0.85, respectively; the determination coefficient (R 2) in that case was about 0.723. In this study, excluding the sustained strength of the concrete variable, which was originally treated as the seventh input variable in the work of Zhang et al. [4], resulted in improved MSE and R 2 values. Moreover, Table 8a,b shows that the gradient boosting model, with the minimum error and a high R 2 value, indicates high accuracy in predicting outcomes. Additionally, Zhang et al. [4] reported that the MSE and r values from the typical traditional regression fatigue formulae (represented as S-N-T-R) in terms of R, S max, and rate of loading (T) were 1.46 and 0.50, respectively.
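The 90/10 split and evaluation workflow described above can be sketched as follows; the synthetic data and the default gradient boosting hyperparameters are stand-ins, not the study's actual data or settings:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Stand-in for the six input variables (f'c, Smax, R, f, shape, h/w) and logN.
X, y = make_regression(n_samples=1298, n_features=6, noise=5.0, random_state=0)

# 90% training / 10% testing, as used in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.10, random_state=0)

gb = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
pred = gb.predict(X_te)
print(f"MSE = {mean_squared_error(y_te, pred):.3f}  R2 = {r2_score(y_te, pred):.3f}")
```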

Model Developed with Data Excluding Outliers
The data used in this model are the data from Section 6.1 with outliers excluded. A total of 46 data sets, approximately 3.5% of the original total, are treated as outliers. For training and testing of the model, 90% of the total data was used for training and 10% for testing. The ratios 90-10, 85-15, and 80-20 are the most commonly used training and testing splits; since developing an ML model using average data reduces the number of data, a ratio of 90-10 was used in order to secure as much of the training data as possible.

• Total data sets: 1252 data sets;
• Test data sets: 125 data sets.
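The paper's specific outlier detectors are not restated in this section; as an illustration of the "flagged by three or more outlier detection methods" rule, the following sketch votes across four common detectors (z-score, IQR fences, Isolation Forest, and Local Outlier Factor, all of which are assumptions) on toy one-dimensional data:

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Toy logN-like values with three gross outliers appended at the end.
x = np.concatenate([rng.normal(0, 1, 200), [8.0, -9.0, 10.0]])

votes = np.zeros(len(x), dtype=int)

# Method 1: z-score beyond 3 standard deviations.
votes += np.abs(stats.zscore(x)) > 3

# Method 2: IQR fences (1.5 * IQR beyond the quartiles).
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
votes += (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)

# Method 3: Isolation Forest (-1 marks an outlier).
votes += IsolationForest(random_state=0).fit_predict(x.reshape(-1, 1)) == -1

# Method 4: Local Outlier Factor (-1 marks an outlier).
votes += LocalOutlierFactor().fit_predict(x.reshape(-1, 1)) == -1

# Keep only points flagged by fewer than three methods.
clean = x[votes < 3]
print(len(x) - len(clean), "points excluded")
```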
The four machine learning models (random forest, neural network, gradient boosting, and AdaBoost) were run, and the results of training and testing for each model are shown in Table 9a,b below. As shown in Table 9a, the gradient boosting model with training data provides the highest determination coefficient, R 2 = 0.809, followed by R 2 = 0.805 from the AdaBoost model, and then 0.795 from the random forest model. The neural network gave the lowest R 2 value at 0.726. As shown in Table 9b, the gradient boosting model provides the highest determination coefficient, R 2 = 0.803, followed by R 2 = 0.794 from the AdaBoost model, and then 0.791 from the random forest model. The neural network gave the lowest R 2 value at 0.726.

Model Developed with Average Data Excluding Outliers
In the data used in Section 6.2, data sets with the same input variable values can have different output variable values. If there are many such cases, it may be difficult to train the ML model. To eliminate this, each possible input data set value should be matched to a single output data set value; for this purpose, average data are used. For training and testing of the model, 90% of the total data was used for training and 10% was used for testing.

• Total data sets: 310 data sets;
• Training data sets: 279 data sets;
• Test data sets: 31 data sets.
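The averaging step, matching one output value to each unique input combination, can be sketched with a pandas group-by (the column names and values here are illustrative, not the study's data):

```python
import pandas as pd

# Toy grouped data: identical inputs (Smax, f) with different logN outputs.
df = pd.DataFrame({
    "Smax": [0.85, 0.85, 0.85, 0.75, 0.75],
    "f":    [10.0, 10.0, 10.0, 1.0, 1.0],
    "logN": [3.1, 3.4, 3.7, 5.0, 5.2],
})

# One output value per unique input combination: average logN within each group.
avg = df.groupby(["Smax", "f"], as_index=False)["logN"].mean()
print(avg)
```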
The four machine learning models (random forest, neural network, gradient boosting, and AdaBoost) were run, and the results of training and testing for each model are shown in Table 10a,b below. As tabulated in Table 10a, the gradient boosting model with training data provides the highest determination coefficient, R 2 = 0.982, followed by R 2 = 0.973 from AdaBoost, then 0.887 from the random forest model. The neural network model showed the lowest R 2 value at 0.679. As tabulated in Table 10b, the gradient boosting model provides the highest determination coefficient, R 2 = 0.915, followed by R 2 = 0.893 from the random forest model, then 0.876 from the AdaBoost model. The neural network model showed the lowest R 2 value at 0.730. Three sets of data were used to develop the ML models in this study. The MSE, RMSE, MAE, and R 2 calculated with the average data excluding outliers were compared to those calculated with both the original data and the grouped data excluding outliers. Comparing the values in Tables 8-10 shows that the ML model developed with average data excluding outliers most closely matched the predicted values to the observed values.
Figure 12 depicts the actual values against the predicted values of logN for the machine learning models developed with the average data excluding outliers. The results of the gradient boosting model fit a straight line better than those of the other ML models, which indicates that the gradient boosting model is more accurate for predicting logN. The scattered data of the gradient boosting model are closer to the linear regression line than those of the other models. Compared to the other models, the scatter plot of the neural network model does not fit well: its predictions are slightly off, with a larger dispersion of scatter points. Among the four ML models developed with the average data excluding outliers, the gradient boosting model most closely fits the observed data. The gradient boosting model often achieves state-of-the-art results on tabular data [80]. It is one of the most powerful ensemble algorithms and often has the highest predictive accuracy [81-83], and the results of this study are no exception; the gradient boosting model outperformed all the other ML models tested here.
Finally, the results of the developed models using the training average data and testing average data are shown in Figure 13. The gradient boosting model has the highest value of R 2 with both the training dataset and the testing dataset.

Sensitivity Analysis of ML Models
Sensitivity analysis was performed to find a better ML model with various training and testing ratios. The results of the sensitivity analysis are summarized in Table 11 and Figure 14. All ML models show the highest R 2 value when the training and testing ratio is 90:10. At this ratio, the R 2 value of the GB model is 0.915, which is the best value among the sensitivity analysis results.
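A sketch of such a sensitivity analysis over the 90:10, 85:15, and 80:20 ratios (the synthetic data and default gradient boosting hyperparameters are assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Stand-in for the 310 average data sets excluding outliers.
X, y = make_regression(n_samples=310, n_features=6, noise=5.0, random_state=0)

# Evaluate each commonly used training/testing ratio.
results = {}
for test_size in (0.10, 0.15, 0.20):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
    results[test_size] = model.score(X_te, y_te)  # R^2 on the test split

for ratio, r2 in results.items():
    print(f"{int((1 - ratio) * 100)}:{int(ratio * 100)} -> R2 = {r2:.3f}")
```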

Comprehensive Evaluation of ML Models
In addition to the classic model performance evaluation indices, such as R 2, MSE, and MAE, new indices, such as VAF, PI, and A 10−index, were proposed by Menemaran et al. [84] to assess the efficiency of the developed models. It was noted that smaller RMSE, MAE, and PI values indicate more trustworthy statistical impressions [84]. PI and A 10−index are represented by Equations (1) and (2) [84], where t is the mean of the observed values, M represents the sample number, and m 10 is the number of data with a ratio of the measured to predicted value between 0.9 and 1.1 [84].
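From the definitions above (M samples, of which m 10 have a measured-to-predicted ratio between 0.9 and 1.1), the A 10−index can be computed as m 10 / M. A minimal sketch with illustrative numbers (the PI formula from [84] is not reproduced here):

```python
import numpy as np

def a10_index(observed, predicted):
    """Fraction of samples whose measured/predicted ratio lies in [0.9, 1.1]."""
    ratio = np.asarray(observed) / np.asarray(predicted)
    m10 = np.sum((ratio >= 0.9) & (ratio <= 1.1))
    return m10 / len(ratio)

obs = np.array([3.0, 4.0, 5.0, 6.0])
pred = np.array([3.1, 4.1, 6.5, 6.0])
print(a10_index(obs, pred))  # 3 of the 4 ratios fall inside the band -> 0.75
```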
In this study, five model performance indices (RMSE, MAE, R 2, A 10−index, and PI) were assessed in order to carry out a comprehensive comparison. The models were scored from 1 to 4 based on each of the five indices; the scores were then summed to assign a total score for each model. The results for this comparison are listed in Table 12, which shows that the gradient boosting model has the best performance, while the neural network model has the lowest accuracy for the testing data. Furthermore, the Taylor diagram of the four developed ML models is presented in Figure 15. It can be observed from the graph that the gradient boosting model has the best performance, while the neural network model has the worst performance with the average data excluding outliers.
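The score-and-sum procedure can be sketched as follows; the index values below are illustrative rather than the paper's Table 12 figures, and only four of the five indices are shown:

```python
import pandas as pd

# Illustrative index values for the four models (not the paper's figures).
metrics = pd.DataFrame(
    {"RMSE": [0.30, 0.45, 0.35, 0.60],
     "MAE":  [0.22, 0.35, 0.26, 0.48],
     "R2":   [0.92, 0.80, 0.88, 0.70],
     "A10":  [0.85, 0.70, 0.80, 0.55]},
    index=["GB", "RF", "AB", "NN"],
)

# Score 1-4 per index, with 4 for the best model: lower is better for the
# error metrics, higher is better for R2 and A10.
scores = pd.DataFrame(index=metrics.index)
for col in metrics:
    ascending = col in ("R2", "A10")  # rank so the best value receives 4
    scores[col] = metrics[col].rank(ascending=ascending).astype(int)

scores["total"] = scores.sum(axis=1)
print(scores.sort_values("total", ascending=False))
```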

Permutation Feature Importance
The correlation used to explain the model is, in fact, a methodology to explain the relationship between each input variable and the output variable before model development; however, it is insufficient to comprehensively explain the influence of a specific input variable on the prediction of the ML model [85,86]. Permutation feature importance (PFI) is used as a method to comprehensively determine the importance of variables in a model. To determine the strength of the relationship between a feature and the target value, the increase in the model's prediction error is measured after the feature's values are randomly shuffled. If the model error increases when shuffling one feature, it is a "significant" feature, because the model depends on that feature when making predictions; conversely, if there is no difference in error, the feature is "non-significant" [87].
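A sketch of PFI with scikit-learn's permutation_importance (toy data; the informative/uninformative split of the features is an assumption mimicking the study's findings):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

# Toy stand-in: 6 features, only a few of which drive the target,
# mimicking the study's strong (Smax, f) vs. weak (shape, h/w) split.
X, y = make_regression(n_samples=300, n_features=6, n_informative=2, random_state=0)

gb = GradientBoostingRegressor(random_state=0).fit(X, y)

# Shuffle each feature column and measure the drop in score (R^2 by default).
result = permutation_importance(gb, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance = {imp:.3f}")
```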
Figure 16 shows that S max and f are very important input variables in the gradient boosting model. It also shows that f 'c and R are the next most important features, while shape and h/w are features with very weak influence on the gradient boosting model.

Conclusions
The goal of this work was to show how ML models can be used to forecast the fatigue life (N) of plain concrete under uniaxial compression. The fatigue life was forecasted using random forest, neural network, gradient boosting, and AdaBoost models. The models were developed sequentially using three data sets: the first was developed with the original data, the second with outliers removed, and the last with the average value of data having different outputs for the same inputs. For training and testing of the models, a training-to-testing ratio of 90:10 was used in order to secure as much of the training data as possible. From this, we were able to draw the following conclusions.
1. Three data files were generated from the original data used in the work of Zhang et al. [4]. These files were used to develop the final ML model and were as follows: (a) grouped data with the same input variable value and different output variable logN values; (b) data excluding outliers selected by three or more outlier detection methods; (c) average data excluding outliers, created by averaging the grouped data after excluding outliers from the grouped data.
2. From the Pearson and Spearman correlation analysis, it was observed that the maximum stress level S max had a strong negative relationship with logN, and the loading frequency f had a strong positive relationship with logN. Meanwhile, the height to width ratio (h/w) and shape of the tested specimens had weak relationships with logN.
3. Excluding the sustained strength of the concrete variable, originally treated as the seventh input variable in the work of Zhang et al. [4], resulted in improved MSE and determination coefficient R 2 values. Moreover, the gradient boosting model showed a high R 2 value of 0.753 with the original data, indicating a high accuracy in predicting outcomes.


Figure 1. Distribution of frequency of the variables used to run the models.
The relationships between various independent variables and logN are plotted in Figure 2. Although not strong, one linear relationship is identified in Figure 2a (logN vs. S max). All other plots show non-linear behavior.

Figure 2. Scatter plot between independent and dependent variables.


Figure 3. Structure of the neural network model.


Figure 4. Structure of the random forest model.


Figure 5. Structure of the gradient boosting model.

Figure 6. Structure of the AdaBoost model.


Figure 12. Predicted vs. observed data for the developed models.


Figure 14. Results of sensitivity analysis with various training and testing ratios.

Figure 15. Taylor diagram of the developed ML models.

Figure 16. Permutation feature importance of the gradient boosting model.


Table 1. Statistical features of original data.
(1) Since shape is a categorical variable, the statistical features expressed in the table may not be meaningful.

Table 2. Statistical features of data excluding outliers.

Table 3. Statistical features of average data excluding outliers.

Table 4. Outliers identified in grouped data.

Table 6. Average grouped data excluding outliers.

Table 7. Pearson and Spearman correlation coefficients.


Table 8. (a) Results of ML models with training original data. (b) Results of ML models with testing original data.
MSE: mean squared error, RMSE: root mean squared error, MAE: mean absolute error; R 2 : coefficient of determination.

Table 9. (a) Results of ML models with training data excluding outliers. (b) Results of ML models with testing data excluding outliers.

Table 10. (a) Results of ML models with training average data excluding outliers. (b) Results of ML models with testing average data excluding outliers.

Table 11. Sensitivity analysis of ML models with different training and testing ratios.


Table 12. Comprehensive evaluation of ML models.
