A Comparative Analysis of Machine Learning Models in Prediction of Mortar Compressive Strength

Abstract: Predicting the mechanical properties of cement-based mortars is essential in understanding the life and functioning of structures. Machine learning (ML) algorithms in this regard can be especially useful in prediction scenarios. In this paper, a comprehensive comparison of nine ML algorithms, i.e., linear regression (LR), random forest regression (RFR), support vector regression (SVR), AdaBoost regression (ABR), multi-layer perceptron (MLP), gradient boosting regression (GBR), decision tree regression (DT), hist gradient boosting regression (hGBR) and XGBoost regression (XGB), is carried out. A multi-attribute decision-making method called TOPSIS (technique for order of preference by similarity to ideal solution) is used to select the best ML metamodel. A large dataset on cement-based mortars consisting of 424 sample points is used. The compressive strength of cement-based mortars is predicted based on six input parameters, i.e., the age of specimen (AS), the cement grade (CG), the metakaolin-to-total-binder ratio (MK/B), the water-to-binder ratio (W/B), the superplasticizer-to-binder ratio (SP) and the binder-to-sand ratio (B/S). XGBoost regression is found to be the best ML metamodel, while simple metamodels like linear regression (LR) are found to be insufficient in handling the non-linearity in the process. This mapping of the compressive strength of mortars using ML techniques will be helpful for practitioners and researchers in identifying suitable mortar mixes.


Introduction
Rapid urbanization throughout the globe has increased the demand for construction materials. Concrete is perhaps the most widely used artificial material in the construction industry. However, the recent impetus of the world towards finding sustainable and eco-friendly means of construction has led to a lot of research on improving concretes and trying to reduce their adverse impact on the earth. Many researchers in this regard suggest the use of metakaolin (MK) as a partial replacement for Portland cement [1]. MK is obtained by high-temperature (700–900 °C) calcination of silica and alumina. Khatib et al. [2] suggest that MK can be used to replace some amount of cement in mortars, resulting in significant improvement in the mechanical performance of concretes. It has been reported by other researchers as well that the addition of MK can help in the improvement of the compressive strength (CS) of mortars [3]. Some analytical formulas have been derived by researchers for estimating the CS of mortars [4]. However, these analytical relations are difficult to derive when MK is present in the mortars. Significant non-linearity has been reported by researchers in such cases, making simple statistical models like response surface methodology insufficient. Machine learning (ML) metamodels are viable alternatives in such cases. Among various ML methods, artificial neural networks (ANN) have received a lot of attention in this field. Onal and Ozturk [5] tried to derive a cause-effect synergy between the microstructural characteristics and CS of cement mortars. They relied on ANN metamodels and found that CS and microstructural characteristics share a strong correlation. Asteris et al.
[6] employed ANN to estimate the mortar compressive strength based on its mix components. Eskandari-Naddaf and Kazemi [7] used ANN to predict the compressive strength of cement mortar. They carried out experimental verification for their ANN metamodels and concluded that the cement strength class is also a significant input parameter and thus should be included in data-driven methods. Sharifi and Hosseinpour [8] developed a new formula to express the CS as a function of input parameters by using ANN metamodels.
Some researchers have also relied on metaheuristic algorithms to train and enhance the ANNs. For example, Asteris et al. [9] in a recent study used metaheuristic algorithms like biogeography-based optimization (BBO) and invasive weed optimization algorithms to train ANN for predicting the CS of mortars. Ly et al. [10] used a particle swarm optimization (PSO) algorithm to train ANNs in the prediction of CS for foamed concrete. They showed the PSO-trained ANN to be significantly better than vanilla ANN. Zhao et al. [11] used BBO and a multi-tracker optimization algorithm to enhance the performance of ANN in prediction tasks of the CS of manufactured-sand concrete. A similar attempt was made by Sun et al. [12] by incorporating an artificial bee colony algorithm to tune the hyperparameters of ANN.
The majority of the literature on the estimation of the CS of mortars is seen to be focused on the use of ANNs. In some cases, researchers have compared the performance of ANNs with other ML techniques. For example, Armaghani and Asteris [13] compared ANN metamodels with ANFIS metamodels for the prediction of the CS of mortars and reported that ANFIS metamodels were prone to overfitting in some cases. Sevim et al. [14] too compared the utility of ANFIS and ANN in such applications. Asteris et al. [15] used ANN and genetic programming (GP) metamodels for estimating the 28-day CS of cement–metakaolin mortars. They found ANNs to be superior to the GP metamodels. Dao et al. [16] compared several Gaussian process regression (GPR) metamodels with ANN metamodels and reported that the GPR metamodel with a Matern32 kernel function outperforms the others. Mohammed et al. [17] carried out a comparison of ANN, M5P trees and non-linear regression methods. They found curing time to be the most important input parameter in estimating the CS of mortars. Similarly, Abdalla and Salih [18] compared M5P trees, GP and ANN metamodels.
Comparison studies on the application of ML algorithms other than ANNs for the estimation of the CS of cement mortars are relatively fewer. Asteris et al. [19] in a recent study compared the performance of k-nearest neighbors (kNN), decision tree regression (DT), support vector regression (SVR), random forest regression (RFR) and AdaBoost regression (ABR) algorithms. They reported the RFR and ABR algorithms to be the most apt for the task. Çalışkan et al. [20] compared group methods of data handling, SVR and extreme learning machine (ELM) and found ELM to outperform the other two metamodels. Ozcan et al. [21] used SVR, RFR, ABR and a Bayes classifier to develop estimation metamodels for CS based on four input parameters.
From the above literature survey, it is observed that researchers have mostly relied on ANNs for the prediction modelling of the CS of mortar mixes. Very few comparative studies of ML techniques have been carried out, and even these have generally compared only two to three ML techniques. Thus, there is a literature gap in terms of a comprehensive comparison of ML techniques. In this paper, nine popular ML methods (namely, linear regression (LR), RFR, SVR, ABR, multi-layer perceptron (MLP), gradient boosting regression (GBR), DT, hist gradient boosting regression (hGBR) and XGBoost regression (XGB)) have been compared for developing metamodels for the CS of mortar mixes. The hyperparameters of the metamodels are tuned for unbiased comparison.

Linear Regression
Linear regression (LR) is a statistical method widely used for analyzing the relationship between input (independent) and output (dependent) variables. Recently, it has been widely implemented in various machine learning libraries and has thus become extremely popular as a machine learning algorithm [22].
LR is capable of deriving linear models, i.e., models that establish the dependent variable (y) as a linear function of the independent variables (x). In simple LR, only one independent variable (x) is present, whereas in multiple LR, two or more independent variables are present. In simple LR, the model is of the following form:

y = b0 + b1x    (1)

Equation (1) represents the equation of a straight line, where b0 and b1 are the y-intercept and the slope, respectively.
In multiple LR, instead of a line, a plane or a hyperplane is used. The model takes the following form:

y = b0 + b1x1 + b2x2 + … + bnxn    (2)

In Equation (2), there are n predictors (x1, x2, …, xn).
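As a minimal illustration of fitting a multiple LR model of the form in Equation (2), the sketch below uses scikit-learn with synthetic data (the data and coefficient values are assumptions for illustration, not the paper's dataset):

```python
# Illustrative sketch: multiple linear regression y = b0 + b1*x1 + ... + bn*xn.
# The data here is synthetic; the paper's dataset is not reproduced.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 3))  # three predictors x1..x3
# Exact (noise-free) linear relation, so OLS should recover the coefficients
y = 2.0 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + 3.0 * X[:, 2]

model = LinearRegression().fit(X, y)
intercept = model.intercept_   # estimate of b0
coeffs = model.coef_           # estimates of b1..b3
```

Because the synthetic relation is exactly linear, the fitted intercept and coefficients match the generating values up to numerical precision.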

Random Forest Regression
The random forest algorithm is an ensemble technique. It can be employed to perform both regression and classification. The random forest algorithm uses simple decision trees as the base learners [23] and employs many decision trees to generate a model for the problem. It trains multiple trees simultaneously by using the technique of bootstrap aggregating, or bagging. From a given set of training data (X), the bagging technique repeatedly (B times) selects random samples (with replacement) of the training data and trains a decision tree on each sample. These decision trees do not interact with each other during the building phase and are generated in parallel. Once trained, these trees can be used to predict unknown samples (x′) by using the following equation:

ŷ = (1/B) ∑ f_b(x′), summed over b = 1, …, B    (3)

where f_b is the b-th trained decision tree. In general, individual decision trees have high variance. However, when they are combined in parallel to form the random forest, the overall variance becomes low. During classification tasks, the random forest algorithm uses the mode of the classes predicted by the trees, whereas during regression, it uses the mean of the decision trees' predictions.
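The averaging behaviour of bagging can be verified directly in scikit-learn: the forest's regression output equals the mean of its individual trees' predictions. The sketch below uses synthetic data purely for illustration:

```python
# Illustrative sketch: a random forest's prediction is the mean of the
# predictions of its B bagged decision trees (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(100, 2))
y = X[:, 0] ** 2 + X[:, 1]

forest = RandomForestRegressor(n_estimators=10, random_state=1).fit(X, y)
x_new = np.array([[0.5, 0.5]])

# Predict with each of the B = 10 trees individually, then with the ensemble
per_tree = np.array([t.predict(x_new)[0] for t in forest.estimators_])
ensemble = forest.predict(x_new)[0]
```

The ensemble prediction agrees with the average of the ten per-tree predictions, matching the bagging formulation.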

Support Vector Regression
Support vector machines (SVM) can be employed for both regression and classification tasks [24]. The prime objective of SVM is to generate the best decision boundary that can separate an n-dimensional space into distinct classes. In SVM, the best decision boundary is referred to as a hyperplane. The algorithm chooses support vectors which then aid in the creation of the hyperplane. SVM can be linear or non-linear. If the data is linearly separable, linear SVM is used; in such cases, the data can be segregated into two classes by a straight line. When the data cannot be segregated by using a single straight line, non-linear SVM is used.
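For regression, the same machinery gives support vector regression. A minimal non-linear SVR sketch with an RBF kernel is shown below; the data, kernel and hyperparameter choices are illustrative assumptions, not the paper's settings:

```python
# Illustrative sketch: non-linear SVR with an RBF kernel on a sine target.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(80, 1))
y = np.sin(3 * X[:, 0])  # clearly non-linear target

svr = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X, y)
n_support = len(svr.support_)  # support vectors selected by the algorithm
r2_train = svr.score(X, y)     # training R^2
```

Only a subset of the training points become support vectors; these alone define the fitted regression surface.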

AdaBoost Regression
AdaBoost, or adaptive boosting, is an ensemble technique. AdaBoost works by conjugating several weak learners into a single strong learner. Generally, AdaBoost uses single-split decision trees, called decision stumps, as the weak learners [25]. AdaBoost allocates weights to the classification instances depending on their difficulty level, i.e., instances that are difficult to classify are allocated higher weights than instances that are easy to classify. Adaptive boosting can be employed for both regression and classification tasks.
Switching from LR to AdaBoost allows many more non-linear relationships to be mapped, which results in better estimation and, thus, higher accuracy. As a general principle, AdaBoost builds ensembles by sequentially adding members which have been trained on those instances of the data which are proving the most difficult to predict correctly. Each new predictor is given a training set where the difficult examples are increasingly represented; this is achieved either through weighting or resampling.
An ABR is a meta-estimator which begins by training a regressor on the training dataset and then trains further replicas of the regressor on the same data. However, in the training of each subsequent regressor, the weights of the instances are adjusted depending on the error of the current regressor. Thus, successive regressors attain higher accuracy by focusing more on the difficult cases.
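A minimal AdaBoost regression sketch with depth-1 trees (decision stumps) as the weak learners is given below; the data and hyperparameters are illustrative assumptions:

```python
# Illustrative sketch: AdaBoost regression with decision stumps as weak learners.
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(size=(120, 2))
y = X[:, 0] * X[:, 1]  # non-linear interaction target

# Depth-1 trees are the classic "decision stump" weak learner
abr = AdaBoostRegressor(
    DecisionTreeRegressor(max_depth=1),
    n_estimators=50,
    random_state=3,
).fit(X, y)
r2 = abr.score(X, y)  # training R^2 of the boosted ensemble
```

Each boosting round reweights the training instances so that the next stump concentrates on the currently worst-predicted samples.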

Multi-Layer Perceptron
Multi-layer perceptron (MLP) is a machine-learning and deep-learning method that defines a complex architecture for artificial neural networks [26]. The method is generally used in the supervised learning setting. MLP is typically a feed-forward neural network which produces a set of outputs from a set of inputs. It is broadly built from several layers of perceptrons, with the nodes of the layers linked as in a directed graph. In MLP's single-layer predecessors, the perceptron rule and the adaline rule are used for training: the weights are updated based on a unit step function in the perceptron rule or on a linear activation function in the adaline rule.
An MLP is a fully connected neural network. It consists of at least three layers, including a hidden layer. When there is more than one hidden layer, it is referred to as a deep artificial neural network (ANN). The number of neurons and the number of layers are the hyperparameters that require tuning, and cross-validation approaches are employed to find their ideal values. Backpropagation is used to adjust the weights during training.
The MLP approach involves three steps:
• Forward propagation: data propagation begins at the input layer and proceeds to the output layer;
• Error calculation: the error (the variation between the estimated and known outcomes) is computed based on the output;
• Error backpropagation: the derivative with respect to each weight in the network is determined, and the model is then updated.
The steps are repeated across multiple epochs to establish the ideal weights; for classification tasks, the output is then passed through a threshold function to obtain the estimated class labels.

Gradient Boosting Regression
Gradient boosting is an ensemble technique. It is primarily used for regression tasks. The method works on the boosting principle, where several weak learners are conjugated to form a strong learner. Generally, decision trees are used as the weak learners. First, a base tree with a single node is constructed. Subsequent trees are constructed depending on the errors of the previous trees. The trees are scaled by using the learning rate, which controls the contribution of each tree to the overall prediction. The subsequent trees are joined with the preceding trees to predict the response. The process is repeated until the maximum number of trees is reached or the predictions no longer improve [27].
Gradient boosting regression can help in expressing dependent variables as functions of independent variables. GBR is used for the prediction of numeric outputs; thus, the response variable should be numeric.
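The sequential error-correction described above can be observed through staged predictions: training error shrinks as each learning-rate-scaled tree is added. The data and hyperparameters below are illustrative assumptions:

```python
# Illustrative sketch: gradient boosting adds trees sequentially; staged
# predictions show the training error falling as trees accumulate.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(5)
X = rng.uniform(size=(150, 2))
y = X[:, 0] ** 2 + np.sin(X[:, 1])

gbr = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, random_state=5
).fit(X, y)

# Training MSE after each successive tree is added to the ensemble
errors = [np.mean((y - pred) ** 2) for pred in gbr.staged_predict(X)]
```

Inspecting `errors` shows the contribution of each added tree; the learning rate of 0.1 damps each tree's correction.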

Decision Tree Regression
Decision tree regression uses the characteristics of an object to train a model within a tree structure, which can then estimate future data and generate meaningful continuous outputs. It involves interrogating the data through a series of questions, with each question narrowing the possible values until a confident model is developed. Decision tree regression decomposes a given dataset into smaller datasets. The tree consists of decision nodes that split into two or more branches, each representing a value of the attribute being examined. The node at the top is the most suitable predictor and is referred to as the root node. The decision tree regression technique uses a top-down tactic. Splits are made based on the standard deviation: when a sample is entirely homogeneous, its standard deviation is 0, whereas a greater standard deviation implies a lower degree of homogeneity [28].
Decision tree regression generally uses the mean squared error (MSE) to decide on splitting a node into two or more sub-nodes. In decision tree regression, the way to find the best split is to try each variable and every potential value of that variable and identify the variable and value that give the split with the best score.

Hist Gradient Boosting Regression
Hist gradient boosting regression is a method for faster training of the decision trees employed in gradient boosting. Binning, or discretizing, the input variables can dramatically speed up the process of training the trees that are added to an ensemble, and the hist gradient boosting method applies this binning to the inputs before building each tree. Each tree that is added to the ensemble tries to correct the prediction errors of the models that already exist in the ensemble.
The hGBR technique is available in common ML libraries. For example, scikit-learn provides an implementation of gradient boosting that underpins the histogram approach; in particular, it provides the HistGradientBoostingRegressor and HistGradientBoostingClassifier classes. According to the scikit-learn documentation, the hGBR implementation is orders of magnitude faster than the default GBR implementation offered by the library.

XGBoost Regression
XGBoost, or extreme gradient boosting, is a popular and powerful ML technique for building supervised regression models [29]. XGB is highly efficient and computationally effective. It has remarkably higher accuracy than decision trees but lacks the interpretability of decision trees. Base learners are needed in XGB; the algorithm trains base learners and keeps adding them to form an ensemble which can then perform the prediction. The objective function of XGB contains a regularization term and a loss function. The objective function quantifies the variation between the target values and the predicted values, i.e., the variation between the model predictions and the actual values.
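The structure of XGBoost's objective (training loss plus a regularization term) can be sketched numerically. The function below is a simplified, hypothetical illustration for a single tree with squared-error loss; gamma penalizes the number of leaves T, and lambda penalizes the leaf weights, following XGBoost's standard regularizer:

```python
# Hypothetical sketch of XGBoost's regularized objective for one tree:
# obj = sum of squared-error losses + gamma * T + 0.5 * lambda * sum(w^2),
# where T is the number of leaves and w the leaf weights (notation assumed).
import numpy as np

def xgb_objective(y_true, y_pred, leaf_weights, gamma=1.0, lam=1.0):
    loss = np.sum((y_true - y_pred) ** 2)      # training loss term
    T = len(leaf_weights)                      # number of leaves in the tree
    penalty = gamma * T + 0.5 * lam * np.sum(np.square(leaf_weights))
    return loss + penalty

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 2.0])             # one residual of 1.0
obj = xgb_objective(y_true, y_pred, leaf_weights=np.array([2.0]))
```

The penalty term is what distinguishes XGB's objective from plain gradient boosting: a tree that fits marginally better but uses more leaves or larger weights can still score worse.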

Problem Description
The compressive strength of cement-based mortars is the most important property to be considered while selecting the appropriate mixture for construction. A large experimental dataset on cement-based mortars is collected from the literature [19]. This dataset has 424 data points generated experimentally by 20 different research studies [1,[30][31][32][33][34][35][36][37][38][39][40][41][42][43][44][45][46][47][48]. However, all these studies have followed the same testing standards. The dataset contains information about the CS of mortar mixes with and without MK. In addition, superplasticizers (SP) are present in some of the experiments. The dataset has six different input parameters, namely the age of specimen (AS), the cement grade (CG), the metakaolin-to-total-binder ratio (MK/B), the water-to-binder ratio (W/B), the superplasticizer-to-binder ratio (SP) and the binder-to-sand ratio (B/S). The AS is measured in days, while CG and CS are measured in MPa. The MK/B, W/B, SP and B/S are measured as w/w. It should be noted that one of the most prominent features of the dataset is the inclusion of CG, which, despite being an important input parameter, has often been missed by previous computational modelling studies [19]. The objective of this computational modelling study is to use the ML algorithms in developing metamodels to express the CS as a function of the six input parameters.

Characteristics of the Cement-Based Mortars Database
The collated database is analyzed to reveal the statistical characteristics (Table 1) of all the input and output parameters. The range of the CS of the tested mortar mixes is found to be between 4.1 MPa and 115.25 MPa, which indicates the diversity of the test samples considered in the dataset. Figure 1 contains pair plots that show the level-wise spread of the data points. It is observed that the dataset contains CS values for only five possible binder-to-sand ratios (B/S). The B/S in the dataset varies from 0.33 to 0.51, indicating that researchers agree that a binder-to-sand ratio of about 1:3 to 1:2 is typically desirable. Similarly, from Figure 1, it is observed that the experimental dataset contains CS values for only seven possible cement grades (CG). A close investigation of the CG pair plot indicates that the researchers prefer CG in three distinct zones, i.e., low (~32 MPa), mid (~42 MPa) and high (~53 MPa). To check for multicollinearity, the Pearson correlation coefficients are computed for the input and output parameters and presented as a heatmap in Figure 2. The strongest correlation coefficient between any two inputs is seen for the water-to-binder ratio (W/B) and the superplasticizer-to-binder ratio (SP) at −0.44, indicating a weak negative correlation. The other inputs share weak to negligible correlations amongst themselves. The strongest relation between any input and CS is seen for the water-to-binder ratio (W/B) at −0.58.
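The multicollinearity check amounts to computing a Pearson correlation matrix over the six inputs and CS. The sketch below uses random placeholder columns in place of the actual dataset, which is not reproduced here:

```python
# Illustrative sketch of the multicollinearity check: a Pearson correlation
# matrix over 424 samples x (6 inputs + CS). Placeholder data, not the paper's.
import numpy as np

rng = np.random.default_rng(9)
data = rng.normal(size=(424, 7))        # columns stand in for AS, CG, MK/B,
                                        # W/B, SP, B/S and CS
corr = np.corrcoef(data, rowvar=False)  # 7 x 7 Pearson correlation matrix
```

Plotting `corr` as a heatmap (e.g., with matplotlib's `imshow`) reproduces the style of Figure 2; off-diagonal entries near ±1 would flag multicollinear input pairs.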

Tuning of ML Metamodels
The ML metamodels are initially tuned to optimize the most prominent hyperparameters. This is done by varying the number of estimators and training the ML metamodels. The trained ML metamodels are then tested on the test dataset, and their MAE, MSE and R² are recorded and analyzed. Figure 3 shows the effect of the number of estimators in RFR on the accuracy of the metamodel. The effect is seen to be quite non-linear, which underlines the fact that the number of estimators in RFR cannot be set arbitrarily without carrying out a pilot study. The MAE of RFR is found to be lowest at 300 estimators, while the MSE is lowest at 700 estimators. The R² of RFR is found to be highest at 700 estimators, and thus this value is selected for the study. In Figure 4, the effect of a change in the number of estimators on the AdaBoost accuracy is analyzed, and 200 estimators are found to be sufficient. Similarly, in Figure 5 for MLP, the improvement in performance is found to be directly related to the number of perceptrons used. However, beyond a threshold, the rate of improvement becomes negligible. Thus, for further analysis in the study with MLP, the number of perceptrons is fixed at 2750. In the case of GBR, as shown in Figure 6, the metamodel's accuracy is seen to be significantly affected by the number of estimators. For GBR, 1400 estimators are found to be the most optimized value for maintaining the high accuracy of the metamodel.
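The tuning loop described above can be sketched as a simple sweep over candidate estimator counts, recording MAE, MSE and R² on held-out data. The synthetic data and grid values here are assumptions for illustration, not the paper's exact settings:

```python
# Illustrative sketch of the tuning procedure: vary n_estimators for a random
# forest and record MAE, MSE and R^2 on a held-out test set (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
X = rng.uniform(size=(300, 6))  # six inputs, as in the paper's dataset
y = X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.normal(size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=8)

results = {}
for n in (100, 300, 700):  # candidate estimator counts (assumed grid)
    model = RandomForestRegressor(n_estimators=n, random_state=8).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    results[n] = (mean_absolute_error(y_te, pred),
                  mean_squared_error(y_te, pred),
                  r2_score(y_te, pred))

best_n = max(results, key=lambda n: results[n][2])  # pick the highest test R^2
```

Selecting on test R² mirrors the paper's choice of 700 estimators for RFR; in practice the three metrics need not agree on a single best value, as Figure 3 shows.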

Prediction of Cement-Based Mortars' Compressive Strength
The tuned ML metamodels are then deployed on the collated dataset to predict the CS of mortar mixes. The scatter plots of the actual values versus the ML predictions are shown in Figure 7 for both training and testing data. The prediction scatter of LR in Figure 7a shows that these simple linear metamodels are insufficient for capturing all the variance in the training data and, thus, are naturally bound to record a poorer performance on testing data. The predictions of the LR metamodel are seen to differ greatly from the true values. The performances of the metamodels are also assessed by using statistical error metrics like R², MSE, MAE and maximum error. These are reported in Table 2 for training data and in Table 3 for testing data. The R² of the LR metamodel is found to be below 50% on both training and testing datasets, thereby confirming its inability to handle the non-linearity in the association of the inputs with the CS of mortar mixes. Figure 7b presents the performance of RFR, which is seen to be near ideal for training data. Almost all the data points are found to be within the ±20% error bound. The RFR metamodel is seen to be more accurate for higher CS values as compared to CS values below 40 MPa. The R² of the RFR on training and testing is seen to be 99% and 97%, respectively. The SVR metamodel's performance in Figure 7c is found to be somewhat poorer than that of RFR. Here too, at low CS values, the prediction errors are significantly higher. The R² of the SVR on training and testing is realized to be 95% and 93%, respectively. In Figure 7d, the ABR is found to perform very poorly for CS values less than 60 MPa. Even in higher CS regions, the ABR is seen to go beyond the ±20% error bound. The R² of the ABR is around 80% for both training and testing, indicating that the metamodel is able to explain only 80% of the variance in the dataset. The MLP metamodel's performance is also seen to be similar to that of ABR. However, its performance on testing data is found to be
relatively better than its performance on training data (Figure 7e). The R² of the MLP on training and testing is found to be 84% and 90%, respectively. GBR metamodels show remarkable accuracy in predicting the CS of mortar mixes (Figure 7f). The prediction scatter of the DT, hGBR and XGB metamodels is shown in Figure 7g–i, respectively. Both DT and XGB are found to have excellent prediction performances.
The computational experiments are carried out on a Windows platform with an Intel(R) Core(TM) i7 CPU @ 3.40 GHz and 24 GB RAM. All the ML algorithms are programmed and realized using Python in Jupyter Notebook 6.4.5. The approximate computational time for the deployment of each ML algorithm is presented in Figure 8. LR takes the least time to deploy, while MLP takes the most time. Overall, LR, XGB, hGBR and SVR are found to be remarkably faster than GBR, ABR, RFR and MLP.

TOPSIS-Based Selection of ML Metamodel
As evidenced in the previous section, some of the developed ML metamodels have conflicting characteristics. For example, the LR metamodel has the lowest computational requirement but is not able to achieve high accuracy. Similarly, the RFR metamodel is very accurate but has a high computational time requirement. Thus, it is essential to select an appropriate metamodel which presents a balanced solution. However, the selection of the best metamodel cannot be done arbitrarily, and thus a multi-attribute decision-making (MADM) method called TOPSIS (technique for order of preference by similarity to ideal solution) is used. The description of TOPSIS is beyond the scope of this article and can be found elsewhere [49,50]. The initial decision matrix for TOPSIS is shown in Table 4. It has nine alternatives and nine criteria: the training R², MSE, MAE and maximum error, the testing R², MSE, MAE and maximum error, and the computational time. Since, for ML-based approaches, more importance should be given to the performance of metamodels on test data than on training data, in this study a weight of 0.05 is allocated to each of the training R², MSE, MAE and maximum error, while a weight of 0.15 is allocated to each of the testing R², MSE, MAE and maximum error. The computational time is accorded a weight of 0.2. Using the TOPSIS methodology, the weighted normalized decision matrix is calculated and shown in Table 5. The Euclidean distance of each metamodel from the hypothetical ideal solution derived by TOPSIS is shown in Figure 9. For the best metamodel, the Si+ distance should be low while Si− should be high, indicating that the metamodel is close to the positive ideal solution (PIS) but away from the negative ideal solution (NIS). As per TOPSIS, XGB and hGBR are found to be the best two metamodels.
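Although a full TOPSIS description is outside the paper's scope, the core steps (vector normalization, weighting, distances to the positive and negative ideal solutions, closeness coefficient) can be sketched in a few lines. The toy decision matrix below is hypothetical and much smaller than the paper's 9×9 matrix:

```python
# Minimal TOPSIS sketch (simplified; toy matrix, not the paper's Table 4).
import numpy as np

def topsis(matrix, weights, benefit):
    """matrix: alternatives x criteria; benefit[j]=True if larger is better."""
    norm = matrix / np.sqrt((matrix ** 2).sum(axis=0))      # vector normalization
    v = norm * weights                                      # weighted matrix
    pis = np.where(benefit, v.max(axis=0), v.min(axis=0))   # positive ideal
    nis = np.where(benefit, v.min(axis=0), v.max(axis=0))   # negative ideal
    s_plus = np.sqrt(((v - pis) ** 2).sum(axis=1))          # distance to PIS
    s_minus = np.sqrt(((v - nis) ** 2).sum(axis=1))         # distance to NIS
    return s_minus / (s_plus + s_minus)                     # closeness in [0, 1]

# Toy example: 3 metamodels, criteria = (test R^2 up, test MAE down, time down)
m = np.array([[0.97, 2.1, 5.0],
              [0.93, 3.0, 1.0],
              [0.48, 8.5, 0.2]])
scores = topsis(m, np.array([0.5, 0.3, 0.2]), np.array([True, False, False]))
best = int(np.argmax(scores))
```

The highest closeness coefficient identifies the compromise solution: here the second alternative wins because it pairs near-best accuracy with low computational time, which is the same trade-off logic that ranks XGB and hGBR first in the paper.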

Conclusions
In this paper, a comprehensive comparison of nine different ML algorithms (namely LR, RFR, SVR, ABR, MLP, GBR, DT, hGBR and XGB) is carried out for predicting the CS of mortar mixes. A large dataset on cement-based mortars consisting of 424 samples was used in the study. Six input parameters, namely the age of specimen (AS), the cement grade (CG), the metakaolin-to-total-binder ratio (MK/B), the water-to-binder ratio (W/B), the superplasticizer-to-binder ratio (SP) and the binder-to-sand ratio (B/S), were studied to assess their influence on the CS of mortar mixes. Based on the study, the following conclusions can be drawn:

• In terms of computational time, the metamodels can be arranged from most expensive to least expensive as MLP > RFR > ABR > GBR > DT > SVR > hGBR > XGB > LR;
• Using a MADM method called TOPSIS, the metamodels can be ranked from best to worst based on a compromise solution as XGB > hGBR > SVR > GBR > DT > RFR > MLP > ABR > LR.

Figure 1. Pair plots for the input parameters and observed responses.

Figure 3. Effect of number of estimators on random forest regression's performance in terms of (a) MAE, (b) MSE and (c) R².

Figure 4. Effect of number of estimators on AdaBoost regression's performance in terms of (a) MAE, (b) MSE and (c) R².
Figure 5. Effect of number of perceptrons on MLP's performance in terms of (a) MAE, (b) MSE and (c) R².

Figure 6. Effect of number of estimators on gradient boosting regression's performance in terms of (a) MAE, (b) MSE and (c) R².

Figure 8. Computational time of each ML metamodel.

Figure 9. Distance of each alternative (ML metamodel) from the hypothetical ideal solution.

Table 1. Statistical summary of the dataset.

Table 2. Performance of the ML metamodels on training data.

Table 3. Performance of the ML metamodels on testing data.

Table 4. Initial decision matrix for TOPSIS.