Ensemble Machine Learning-Based Approach for Predicting of FRP–Concrete Interfacial Bonding

Abstract: Developments in fiber-reinforced polymer (FRP) composite materials have had a major impact on civil engineering practice. The bonding properties of FRP have led to its wide use with concrete structures, and FRP materials show great promise for rehabilitating existing infrastructure by strengthening concrete members. Existing machine learning-based models for predicting the FRP–concrete bond strength have not yet attained the best achievable performance. This paper presents an ensemble machine learning approach for predicting the FRP–concrete interfacial bond strength. A dataset of 855 single-lap shear tests on FRP–concrete interfacial bonds, extracted from the literature, is used to build the bond strength prediction model. The test results cover the material properties and geometrical parameters that influence the FRP–concrete interfacial bond. The study employs the CatBoost algorithm, an improved ensemble machine learning approach, to predict the bond strength of the FRP–concrete interface. Its performance is compared with that of other ensemble methods (i.e., the histogram gradient boosting algorithm, the extreme gradient boosting algorithm, and random forest). The CatBoost algorithm outperforms the other ensemble methods on several performance metrics (i.e., lower root mean square error (2.310), lower covariance (21.8%), lower integral absolute error (8.8%), and higher R-square (96.1%)). A comparative study is also performed between the proposed model and the best-performing bond strength prediction models in the literature. The results show that FRP–concrete interfacial bonding can be effectively predicted using the proposed ensemble method.

Author contributions: conceptualization, Y.N.; writing and review, Y.N. and B.K.; administration, B.K.


Introduction
The fiber-reinforced polymer (FRP) bonding technique is used for strengthening and rehabilitating civil structures. FRPs are widely used because they are lightweight and have high corrosion resistance, strength, and elastic modulus [1]. In recent years, externally bonded FRP sheets have been used to strengthen reinforced concrete (RC) structures and in repair, retrofitting, and rebuilding work in structural engineering [2]. FRP-concrete composite structures also help reduce the risk of steel rebar corrosion by preventing destructive chloride (Cl⁻) ions from penetrating from the external environment into the concrete [3]. Although FRPs are used to improve bonding with concrete, premature failures, such [...] handling (GMDH) networks was investigated to measure the bond strength and to compare the results with multiple linear and nonlinear regression models [21]. The GMDH model outperformed the other models and improved accuracy [22]. An artificial neural network (ANN)-based model was built to predict the bond strength of concrete [23]. Experiments on integrating ANN with fuzzy logic (FL) to calculate the bond strength of steel bars with concrete were conducted, and the results showed the ANN-based model to be more capable of accurately predicting the ultimate bond strength than the FL-based model [24]. ANN models were also investigated for forecasting the shear capacity of FRP-RC beams without shear reinforcement. A parametric analysis was attempted to establish the relationship between the variables influencing the shear capacity, but it produced a few inconsistencies. Many ANN studies on different civil structures were performed, and comparisons with earlier neural network-based models revealed accurate predictions by ANN models in spite of their complexity and fluctuating nature [25]. Models built using ANNs and genetic algorithms (GAs) adapt to the training data and learn the relationship between variables from it.
A simple model based on the gene expression programming (GEP) approach was employed to assess the concrete shear capacity of FRP-RC slender beams without stirrups [26]. A novel prediction model applying GEP to estimate the bond strength between concrete and fiber-reinforced polymer was also proposed, and it produced improved accuracy compared to earlier models [27]. Backpropagation neural networks (BPNNs) were implemented to forecast the shear resistance of retrofitted FRP-RC beams. A BPNN model for predicting the bond strength was built from a published experimental database, and its performance was compared with those of common existing analytical models [28]. An intelligent technique based on a fuzzy logic inference system was established to measure the shear capacity of FRP-RC beams [29]. The BPNN model proved to be an efficient alternative to existing analytical models when compared with experimental results. Neural network and neurofuzzy approaches for predicting the bonding strength of FRP composites were also implemented; however, these models were found to be difficult to use as empirical formulas [30]. ANN- and GA-based models were employed to calculate the ultimate concrete bond strength of glass-based FRP bars, and they produced better predictions than linear regression models [31]. The effectiveness of ANN and support vector machine algorithms was examined for analyzing the bond behavior of FRP systems [32]. ANN- and fuzzy logic-based inference systems with an adaptive network were developed to measure the strength of GPC samples. A biogeography-based program was introduced to predict the shear strength of FRP-RC beams and was reported to perform better than those presented in earlier investigations [33]. The model was more accurate and robust than other guideline models. Meanwhile, a regression analysis model was recommended for predicting the cohesion strength between concrete and FRP bars [34].
A multi-gene genetic programming prediction prototype was established to estimate the bonding strength between concrete and FRP bars. The approach also used genetic algorithm and regression models to predict the FRP-concrete bond strength. However, the prototype was very dynamic in nonlinear modeling, under the hypothesis that its input parameters are not always reliable [35]. Compared with individual machine learning algorithms, ensemble algorithms yield better performance because they combine several models. Boosting algorithms help reduce bias, which improves performance: adding new learners in series corrects the prediction errors made by the previous learners. Advances in the use of machine learning algorithms across different domains allow researchers to provide artificial intelligence-based solutions for various real-world problems [36][37][38]. An experimental study evaluated a variety of machine learning models for bond strength prediction and found that the recommended hybridization of machine learning models is a suitable substitute for empirical models [39]. This study aims to build an ensemble method-based bond strength predictive model trained on a large dataset of single-lap shear bond tests on FRP-concrete specimens collected from the literature.

Bond Behavior between FRP and Concrete
The deterioration of concrete structures due to increased loads or defective designs threatens structural safety. FRPs are used along with concrete to strengthen structures. The most common challenge in using FRP with concrete is the premature debonding or peeling of the FRP from the concrete. Hence, the essential aspects of the behavior of FRP-to-concrete interfaces must be analyzed to enhance the interfacial bonding strength. Existing theoretical and experimental studies have listed six essential parameters that control the bond strength of FRP-concrete reinforcing members [40]: the bond length, the concrete strength, the FRP stiffness, the FRP-concrete width ratio, and the stiffness and strength of the adhesive. The effectiveness of externally bonded reinforcing members depends on the cohesion between the FRP and the concrete material. The bond strength is influenced by the surface preparation and the general concrete quality [41]. In addition, the concrete to be bonded should be free from loose layers and detached particles [42]. The bond strength is also influenced by composite action, interface stresses, and the stress-strain distribution. Among these factors, the composite behavior plays a vital role in determining the bond strength.

Bond Strength Influenced by the Composite Behavior
Along with conventional civil engineering materials, FRPs are primarily used as strengthening materials for the rehabilitation of concrete infrastructure. Advancements in new types of reinforcement nanoparticles have enabled the extensive use of FRP composites for strengthening civil structures. Perfect bonding between concrete and steel reinforcement must be ensured to extend the durability of RC. The FRP thickness influences the stress that leads to bonding failure, which limits the width-to-thickness ratio used in the FRP [43]. The strain compatibility derived from the bonding plays a vital role in many design and analysis methods. Initially, the bond strength was believed to be greatly affected by the surface preparation and the quality of the concrete composite, so the greatest importance was given to the surface preparation methods and the composite materials used [44]. Later studies reported that strain compatibility does not affect the bonding failure [45]. The failure mode changes as the composite behavior increases. The major factors affecting the bond strength are identified as the bond length, the axial stiffness of the laminate, the adhesive compressive strength, and the concrete compressive strength. During the bonding process, strain is transferred to the FRP, and slip occurs in the adhesive. The degree to which the strain is transferred determines the overall resistance of the structure. The composite behavior of externally strengthened concrete beams must be maintained at all stages up to failure [46]. New methods are developed on the basis of failure theories formulated for a homogenized description of composite materials. The major pitfall in FRP strengthening is the occurrence of brittle failure modes. Thus, the behavior of the bond between FRP and concrete surfaces is considerably subject to the composite behavior.
Various studies have been conducted to justify attaching external reinforcement to the concrete surface by an adhesive bond. The literature has shown that strain compatibility is influenced by the section depth, which raises the question of the degree to which the composite behavior is affected. The external anchorage properties remain significant in maintaining the composite behavior [47]. In an RC beam with CFRP bands and without supplementary anchorage, the composite action is lost at 85% of the initial beam load [48]. The failure mode changes as the composite behavior increases. The properties of the adhesive are a significant factor in developing composite action. An adhesive distributes stress depending on its bond to the concrete and the laminates, the stresses between layers, and the toughness, elasticity, and viscosity of the material. Low creep is also a required feature. The lack of any of these properties can be harmful to the composite behavior.

Data Preparation from the Single-Lap Shear Bond Tests
Various studies have employed pull-off tests to assess the bond strength of the FRP-concrete interface. The pull-off tests (e.g., single-lap shear, double-lap shear, and bending tests) were performed on adhesive joints pulled at the end of the FRP-concrete specimen, and the debonding that occurred at the joints was then measured. Although the double-lap shear and bending tests have advantages, simulating the bonding strength between the FRP and concrete with them has remained challenging. In the literature, a large number of studies have proposed FRP-concrete bond strength predictive models using the single-lap shear test because it is easier to perform. The simplicity and the low cost of specimen manufacture have contributed to the widespread use of the single-lap shear method for producing data on adhesively bonded joints. Figure 1 shows the single-lap shear test setup for the FRP-concrete interface. The single-lap shear test consists of rectangular adherends of uniform size bonded together with an overlap length of 12% to 25% of the adherend length. The test aims to assess the bond strength; thus, the yield point of the adherend in tension must not be exceeded. The maximum permissible length of the adherend, denoted as L, is a function of the thickness. The adherend stiffness is estimated through Equation (1), where $\sigma_Y$ is the adherend yield stress; $t$ is the adherend thickness; and $\tau$ is the expected average shear strength of the adhesive.
Mathematics 2022, 10, x FOR PEER REVIEW 5 of 22

The compressive and tensile strengths of concrete are standardized to eliminate sample-size inconsistencies among data from different international sources. The tensile strength of concrete, $F_t$, is converted according to the design requirement in the Chinese concrete code GB 50010-2010, where $F_t = 0.395\,(F_c)^{0.55}$. Table 2 lists the compressive strength conversion rules for concrete cubes and cylinders [49].
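The tensile strength conversion quoted above can be evaluated directly. The short Python sketch below is only illustrative: the function name is hypothetical, and the assumption that both strengths are expressed in MPa is not stated in the source.

```python
def tensile_strength_from_compressive(f_c: float) -> float:
    """Return F_t = 0.395 * F_c**0.55, the relation quoted from GB 50010-2010.

    Assumption (not from the source): f_c and the result are both in MPa.
    """
    if f_c <= 0:
        raise ValueError("compressive strength must be positive")
    return 0.395 * f_c ** 0.55


if __name__ == "__main__":
    # Hypothetical compressive strengths, used only to show the trend.
    for f_c in (20.0, 30.0, 40.0):
        f_t = tensile_strength_from_compressive(f_c)
        print(f"F_c = {f_c:5.1f} -> F_t = {f_t:.3f}")
```

Because the exponent is below 1, the converted tensile strength grows more slowly than the compressive strength.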

Existing Bond Strength Models
Many predictive models have been proposed for measuring the bond strength between concrete and FRP laminates on the basis of empirical relations calibrated with experimental data. Other models are based on fracture mechanics theories and adopt simple assumptions calibrated against experimental data. All models require a pull test on a bonded FRP specimen [13]. In the literature, a large number of studies proposed FRP-concrete bond strength predictive models using the single-lap shear test, which produces data on adhesively bonded joints. The simplicity and the low cost of specimen manufacture have contributed to the widespread use of this method for assessing FRP-concrete adhesive bonding. Different variants of the bond strength models are available, and the evaluated test results are quoted in data sheets. Most bond strength estimation models fall into three categories: average bond shear stress-based models, effective bond length-based models, and fracture mechanics-based models. The parameters influencing the bond strength are based on physical considerations of the concrete properties and the failure pattern observed during a shear bond test. The primary physical parameters considered are the compressive strength of concrete, $F_c$, and the width of the concrete substrate, $b_c$. In addition, the FRP sheet properties, namely stiffness, $K_f$; width, $b_f$; elastic modulus, $E_f$; sheet thickness, $t_f$; and bond length, $L_f$, are considered. Researchers have implemented a backpropagation neural network-based ANN model to predict the bond strength of the FRP-concrete interface.
This ANN-based bond strength prediction model derived the width correction coefficient $b_f/b_c$ and the stiffness $K_f$ of the FRP. These derived values are used as additional input parameters when building a predictive model [49][50][51][52]. The ANN is a data-driven method for identifying hidden patterns or fitting the nonlinear interdependencies among complex variables. Although the ANN has proven powerful in many prediction tasks, it requires a large volume of data for model training [51]. In an ANN-based network, a large amount of data is usually fed into the network to avoid overfitting [49]. However, conducting and labeling that many single-lap shear tests is impractical in a bond strength prediction project. Recent studies have indicated that, for a small-scale dataset, ANNs struggle to outperform ensemble-based approaches, including random forest (RF), eXtreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM) [53]. Boosting algorithms combine weak base learners into a stronger, more robust learner and attain state-of-the-art results on numerous machine learning tasks. They allow many simple learners to model small datasets, avoiding the overfitting produced by complex models. This study presents a set of ensemble methods for enhancing the accuracy of FRP-concrete interfacial bond prediction. The proposed ensemble methods are evaluated on data prepared from single-lap shear bond tests on FRP-concrete specimens presented in the literature, and they are then validated against experimental data and existing models.
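The two derived inputs mentioned above can be computed from the raw specimen parameters. The following Python sketch is a hedged illustration: the `Specimen` type and function name are hypothetical, and the stiffness definition $K_f = E_f t_f$ is an assumption, since the source does not give the formula.

```python
from typing import NamedTuple


class Specimen(NamedTuple):
    """Raw geometric and material inputs for one single-lap shear test."""
    b_f: float  # FRP sheet width
    b_c: float  # concrete substrate width
    e_f: float  # FRP elastic modulus
    t_f: float  # FRP sheet thickness


def derived_inputs(s: Specimen) -> dict:
    """Return the width ratio b_f / b_c and the FRP stiffness K_f.

    Assumption: K_f is taken as E_f * t_f; the source does not define it.
    """
    if s.b_c <= 0:
        raise ValueError("concrete width must be positive")
    return {"width_ratio": s.b_f / s.b_c, "k_f": s.e_f * s.t_f}


if __name__ == "__main__":
    # Hypothetical specimen values, not taken from the dataset.
    print(derived_inputs(Specimen(b_f=50.0, b_c=100.0, e_f=230.0, t_f=0.165)))
```

Units are deliberately left open here; they should follow whatever convention the dataset uses.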

Proposed Methodology
This section introduces the proposed method, in which the bond strength estimation model takes the training data as its initial input and generates random subsets of the n training instances. Let the bond strength test database be represented by $D = \{(X_i, Y_i),\ i = 1, 2, \ldots, N\}$. The independent variable $X_i$ represents the parameters influencing the bond strength (Table 1) and is given as $X_i = (x_{i1}, x_{i2}, x_{i3}, x_{i4}, x_{i5}, x_{i6})$. The bond strength $P_u$ is the dependent variable ($Y_i$) evaluated from the single-lap shear test, where $Y_i \in [2.4, 56.5]$ (Table 1). In this study, CatBoost is applied as the ensemble method for building the bond strength prediction model. The CatBoost algorithm generates a set of regression trees from the random subsets and splits the internal nodes of each regression tree. The results are then aggregated to build the model. The proposed CatBoost bond strength prediction model is validated on the test data. To further assess its performance relative to other ensemble methods, a comparative study is performed with RF, the extreme gradient boosting (XGBoost) algorithm, and the histogram gradient boosting (HGBoost) algorithm. Section 4 compares CatBoost with the existing ANN model. Figure 2 illustrates the ensemble-based bond strength prediction model.
Classic regression models like logistic regression, polynomial regression, and ANN have historically been used to predict a continuous value. These algorithms perform poorly with smaller datasets, as there is a higher possibility of underfitting when few datapoints are available during training. In this research, boosting algorithms are implemented, which efficiently apply concepts such as optimization and regularization to tree-based ensemble models. Further, the CatBoost algorithm is used with a sampling approach called Minimal Variance Sampling (MVS) to regularize the boosting models. With the MVS technique, the number of examples needed for each boosting iteration decreases, and the quality of the model improves significantly compared to other gradient boosting models. The examples for each boosting tree are sampled to maximize the accuracy of split scoring. A modified training cycle is implemented with CatBoost to take account of cross-validation datapoints. Furthermore, model checkpoints are saved so that training can continue when new training datapoints become available, which makes the CatBoost algorithm more adaptable for training. The internal workings of the ensemble methods used to build the prediction model are presented in the subsequent sections.

CatBoost
CatBoost is an unbiased gradient boosting algorithm with native support for categorical features. Its two key elements are the handling of categorical features and a novel ordered boosting scheme that avoids prediction shift. It offers different solutions for different kinds of categorical features, and the processing is optimized and applied during tree splitting instead of in a pre-processing stage. For features with a small number of classes, the classifier applies one-hot encoding, which converts the categorical features into numeric features. For composite features, the classes are replaced with the average target value. To avoid overfitting, the average for sample $x_{\sigma_i,k}$ is calculated from the target values of the examples that precede it in a random permutation $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n)$ of the dataset, as given in Equation (2).
where the indicator $[x_{\sigma_j,k} = x_{\sigma_i,k}]$ takes the value 1 when the condition is fulfilled and 0 otherwise; $p$ denotes the prior value; and $a$ represents the weight of the prior value. For a regression task, the average target over the entire dataset is used as the prior. This feature transformation loses information about the interactions among categorical features. Hence, CatBoost combines the feature combinations already used in the current tree with the remaining categorical features. To prevent overfitting, CatBoost also uses a configurable boosting scheme based on the same ordering principle that is applied to the categorical features. It works with oblivious trees, which use the same splitting criterion across an entire level of the tree. These trees are balanced, less prone to overfitting, and fast at prediction time. In this work, CatBoost was applied with the ordered boosting mode for efficient tree construction.
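Equation (2) itself does not appear in this text. In the standard CatBoost formulation, which matches the definitions given above, the encoded value reads

$$\hat{x}_{\sigma_i,k} = \frac{\sum_{j=1}^{i-1} [x_{\sigma_j,k} = x_{\sigma_i,k}]\, Y_{\sigma_j} + a\,p}{\sum_{j=1}^{i-1} [x_{\sigma_j,k} = x_{\sigma_i,k}] + a}$$

The minimal Python sketch below illustrates this ordered target statistic. It assumes the data are already arranged in the order of the permutation; the function and argument names are hypothetical and it is not the library's implementation.

```python
def ordered_target_statistic(categories, targets, prior, prior_weight=1.0):
    """Encode each categorical value using only the examples that precede it.

    categories and targets are assumed to be listed in permutation order.
    prior corresponds to p and prior_weight to a in the formula above.
    """
    running_sum = {}    # sum of targets seen so far, per category
    running_count = {}  # number of examples seen so far, per category
    encoded = []
    for category, target in zip(categories, targets):
        s = running_sum.get(category, 0.0)
        c = running_count.get(category, 0)
        encoded.append((s + prior_weight * prior) / (c + prior_weight))
        # Update the statistics only after encoding, so no example sees its own target.
        running_sum[category] = s + target
        running_count[category] = c + 1
    return encoded


if __name__ == "__main__":
    # Hypothetical toy data; the prior is the mean target, as used for regression.
    cats = ["a", "b", "a", "a"]
    ys = [1.0, 2.0, 3.0, 5.0]
    print(ordered_target_statistic(cats, ys, prior=sum(ys) / len(ys)))
```

Because each example is encoded before its own target is added to the running statistics, the encoding never leaks the label of the example being encoded.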
In the tree construction process using ordered boosting, for one random permutation $\sigma$ of the training data, n different trees $T_1, \ldots, T_n$ are constructed such that tree $T_i$ is built using only the first i examples in the permutation. The tree $T_{j-1}$ is used to obtain the residual for the jth sample of the training data. The tree constructed at each permutation of the training data serves as a model for prediction. During tree construction in the ordered boosting mode, CatBoost initially generates $p + 1$ independent random permutations of the training data. The permutations $\sigma_1, \sigma_2, \ldots, \sigma_p$ are used to evaluate the splits at the internal nodes of the tree, and the permutation $\sigma_0$ is used to choose the leaf values $l_j$ of the constructed tree. During training, CatBoost maintains the supporting trees $T_{r,j}$, where $T_{r,j}(i)$ is the current prediction for the ith instance based on the first j instances in the permutation $\sigma_r$. A tree is then constructed on this basis. Algorithm 1 explains how CatBoost works.
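The key idea above is that the residual of the jth sample comes from a model that has never seen that sample. The Python sketch below illustrates only this ordering principle. As a loudly stated simplification, a running mean stands in for the tree $T_{j-1}$, so this is not CatBoost's actual tree-building procedure; the function name is hypothetical.

```python
def ordered_residuals(targets, permutation):
    """Compute residuals where sample j is scored by a model fit on earlier samples only.

    Simplification: the "model" trained on the first j - 1 examples is just
    their mean target, used here as a stand-in for the tree T_{j-1}.
    """
    residuals = [0.0] * len(targets)
    running_sum = 0.0
    for position, index in enumerate(permutation):
        # Prediction from the examples that precede this one in the permutation.
        prediction = running_sum / position if position > 0 else 0.0
        residuals[index] = targets[index] - prediction
        running_sum += targets[index]
    return residuals


if __name__ == "__main__":
    # Hypothetical targets and permutation.
    print(ordered_residuals([2.0, 4.0, 6.0], permutation=[2, 0, 1]))
```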
The CatBoost algorithm trains a boosted ensemble of decision trees. It offers a sampling scheme for training called Minimal Variance Sampling (MVS), a weighted variant of stochastic sampling that is used for regularization. The algorithm has parameters that control the construction of each decision tree and parameters that configure the tree ensemble as a whole. It also requires specific hyper-parameters for the boosting procedure that trains the model. During training, the CatBoost algorithm takes these hyper-parameters and optimizes the boosting model. The trained model is validated, and the applied parameters are saved. The saved parameters help define the threshold parameters used to construct the tree ensemble. The hyper-parameters of the CatBoost algorithm are saved and further optimized as more datapoints become available.
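To make the configuration described above more concrete, the sketch below collects a possible set of training options as plain Python dictionaries. The option names (`boosting_type`, `bootstrap_type`, `save_snapshot`, and so on) follow the CatBoost Python package, but every value shown is an illustrative assumption; the source does not report the settings that were actually used.

```python
# Hypothetical settings; none of the values below are taken from the source.
catboost_params = {
    "loss_function": "RMSE",     # regression objective
    "boosting_type": "Ordered",  # ordered boosting mode described in the text
    "bootstrap_type": "MVS",     # Minimal Variance Sampling
    "subsample": 0.8,            # assumed fraction of examples sampled per iteration
    "iterations": 1000,          # assumed number of trees
    "learning_rate": 0.05,       # assumed shrinkage
    "depth": 6,                  # assumed depth of each oblivious tree
    "random_seed": 42,           # assumed seed for reproducibility
}

# Options for fit() so that training can resume from a saved checkpoint.
fit_options = {
    "save_snapshot": True,
    "snapshot_file": "bond_strength.snapshot",  # hypothetical file name
    "snapshot_interval": 600,                   # seconds between snapshots
}

# With the catboost package installed, these could be used roughly as follows:
#   from catboost import CatBoostRegressor
#   model = CatBoostRegressor(**catboost_params)
#   model.fit(X_train, y_train, eval_set=(X_valid, y_valid), **fit_options)
```

Keeping the options in dictionaries makes it easy to store them next to the trained model and reuse them when more data are added.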

XGBoost
eXtreme gradient boosting (XGBoost) is a widely used and effective gradient boosting method. The XGBoost system is designed as a scalable and accurate tree boosting technique. Its main characteristics are a reformulation of the objective function to include a regularization term, parallel tree learning with a cache-aware column block structure, approximate split finding based on a weighted quantile sketch, and sparsity-aware split finding. XGBoost makes efficient use of computing and memory resources to speed up the learning process. Even though XGBoost includes several adaptations to reduce overfitting, its main feature against overfitting is a regularized model formulation. It also includes other regularization techniques, such as shrinkage and instance subsampling. The objective function of XGBoost comprises a regularization term $\Omega$, which controls the model complexity. This permits learning a simple and predictive model and finding a good bias-variance tradeoff. The objective function of XGBoost is presented in Equations (3) and (4), where $T$ denotes the leaf count of the tree $f$; the computed score of the jth leaf of tree $f$ is denoted as $w_j$; and $f(x)$ is a function such that $f(x) = w_{q(x)}$, where $q$ is the tree structure that maps sample $x$ to the corresponding leaf. $\lambda$ represents the parameter for ridge (L2) regularization, which makes the prediction less sensitive to the training data by decreasing the variance. Parameter $\gamma$ is the threshold on the score function for splitting the tree. Equation (5) represents the XGBoost training step, which minimizes the objective function when a new tree is added.
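Equations (3) to (5) are not reproduced in this text. In the standard XGBoost formulation, which is consistent with the symbols defined above, the objective is $\mathcal{L} = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k)$ with $\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$, and adding the tree $f_t$ minimizes $\mathcal{L}^{(t)} = \sum_i l\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t)$. The Python sketch below evaluates these quantities for a squared-error loss; the function names and the choice of loss are assumptions made for illustration.

```python
def regularization(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lam * sum of squared leaf weights."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(y_true, y_pred, trees_leaf_weights, gamma, lam):
    """Squared-error training loss plus the regularization of every tree.

    trees_leaf_weights holds one list of leaf weights per tree in the ensemble.
    """
    loss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    penalty = sum(regularization(w, gamma, lam) for w in trees_leaf_weights)
    return loss + penalty


if __name__ == "__main__":
    # Hypothetical two-leaf tree and two predictions.
    print(objective([1.0, 2.0], [1.0, 3.0], [[1.0, -2.0]], gamma=0.5, lam=2.0))
```

A larger `gamma` penalizes trees with many leaves, while a larger `lam` shrinks the leaf weights toward zero.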
XGBoost supports custom loss functions by optimizing a second-order Taylor approximation of the loss. After the constant terms are removed, the objective function depends only on the scores $w$ of the leaf nodes. Equation (6) presents the resulting XGBoost objective.
where $g_i$ and $h_i$ are the first- and second-order gradient statistics of the loss function. The samples allotted to leaf $j$ are represented as $I_j$. The optimal score $w_j^*$ of leaf $j$ for a tree $q$ is obtained using Equation (7). To make split finding efficient, the data are stored in memory blocks in a compressed column format. The split value is calculated by linearly scanning each column. The gradient statistics are collected for all leaves during a single scan, which leads to a parallel algorithm for split finding. Because the split-finding step accesses the gradient values in non-contiguous memory, cache misses can occur for many instances. XGBoost solves this issue by prefetching the required values into a buffer before processing them.
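Equations (6) and (7) are not reproduced in this text. In the standard XGBoost derivation, the optimal leaf score is $w_j^* = -\dfrac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$. The Python sketch below computes this score, together with the split gain from the same derivation, which shows where the threshold $\gamma$ enters. The function names are hypothetical, and the gain formula comes from the standard formulation rather than from the source.

```python
def optimal_leaf_weight(gradients, hessians, lam):
    """w* = -sum(g_i) / (sum(h_i) + lam) for the samples assigned to one leaf."""
    return -sum(gradients) / (sum(hessians) + lam)


def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Reduction in the regularized objective obtained by splitting one leaf in two."""
    def score(g, h):
        return g * g / (h + lam)

    gl, hl = sum(g_left), sum(h_left)
    gr, hr = sum(g_right), sum(h_right)
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr)) - gamma


if __name__ == "__main__":
    # For the loss 0.5 * (y - y_hat)^2, g_i = y_hat_i - y_i and h_i = 1.
    g = [-1.0, -2.0, -3.0]
    h = [1.0, 1.0, 1.0]
    print(optimal_leaf_weight(g, h, lam=1.0))
```

A split is kept only when its gain is positive, which is how `gamma` acts as a threshold on tree growth.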

Histogram Gradient Boosting
Gradient boosting is a family of machine learning algorithms that build an ensemble model for prediction. At every iteration, the loss function is minimized by steepest descent, so the predictive function is built through a sequence of optimization steps in function space. It can use either decision trees or linear regression as its base learner. It starts with an initial estimate $F_0(x)$ computed from all output samples. Gradient boosting then keeps fitting trees until it reaches the maximum number of estimators. The sum of the scaled outputs of all trees in the ensemble, added to the initial guess, gives the predicted output for a new input. With a differentiable loss function, histogram gradient boosting (HGB) is adaptable to any regression or classification problem. Binary classification uses the negative binomial log-likelihood, and multi-class classification uses the multinomial loss function. In multi-class classification problems, the HGB estimates the additive function $F_l(x)$ for each class $l$ using the loss function in Equation (8), where $L$ denotes the number of classes; $y_l$ takes the value of 1 or 0 based on the class of sample $x$; and $p_l(x)$ denotes the probability that $x$ belongs to class $l$. The probability $p_l(x)$ is calculated using Equation (9).
The regression tree is trained on the pseudo-residuals of the probabilities. The output calculation for the leaves is presented in Equation (10).
Equation (9) predicts the probabilities of all classes for a new observation. In boosting-based ensemble methods, the regression trees generated at each layer incorporate the strength of the regression subtrees generated to handle the different value types of the predictor variables.
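Equation (9) is not reproduced in this text. In the standard multi-class gradient boosting formulation, the class probability is the softmax of the additive functions, $p_l(x) = \exp(F_l(x)) / \sum_{m=1}^{L} \exp(F_m(x))$, and the pseudo-residual for class $l$ is $y_l - p_l(x)$. The Python sketch below computes both under that assumption; the function names are hypothetical.

```python
import math


def class_probabilities(scores):
    """Softmax of the per-class additive scores F_l(x)."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_residuals(one_hot_label, scores):
    """Residuals y_l - p_l(x) on which the next regression trees are trained."""
    probs = class_probabilities(scores)
    return [y - p for y, p in zip(one_hot_label, probs)]


if __name__ == "__main__":
    # Hypothetical scores for a three-class problem.
    print(class_probabilities([1.0, 2.0, 3.0]))
    print(pseudo_residuals([0, 0, 1], [1.0, 2.0, 3.0]))
```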

Random Forest
Random forest (RF) is an ensemble method used for classification and regression. Due to its effectiveness and simplicity, it has become one of the most popular ensemble methods according to the literature and surveys. It is a flagship method that extends bagging of decision trees. The RF method reduces the correlation between the generated subtrees by incorporating randomized feature selection. Through bootstrap sampling, RF provides a variety of trees in the forest: each decision tree is trained on a different set of training data drawn from the original dataset. RF adds further diversity during the development of each tree, because only a subset of the total feature set $Q$ of the training data $D$ is used to select the best cut for tree branching. A tree $T_n$ is generated for every bootstrap sample derived. During tree construction, $m$ random variables are selected, and the best splitting criterion among the $m$ is identified. The subtrees are generated based on the splitting criterion, and the tree construction steps are repeated until the minimum node size $n_{min}$ is reached at each terminal tree node. The regression formula for predicting a new datapoint $x$ is presented as Equation (11).
where the output of the ensemble of trees is denoted by $\{T_n\}$. While generating the subtrees, RF searches for the best split only among the $m$ selected features. The trees are grown to full depth without pruning. For classification, the final prediction is obtained by majority voting among the decisions of the individual trees; for regression, the outputs of the trees are averaged. By averaging or combining the results of different trees, RF mitigates the overfitting problem. It also maintains good accuracy even when a large portion of the data is missing.
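The construction steps above (bootstrap sampling, random feature selection, and averaging of tree outputs) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the implementation used in this study: each tree is reduced to a single split (a depth-1 regression tree), so the $m$ random features are drawn once per tree, and the toy data and all function names are hypothetical.

```python
import random

def fit_stump(X, y, feature_ids):
    """Fit a depth-1 regression tree using only the given candidate features."""
    best = None
    for j in feature_ids:
        values = sorted(set(row[j] for row in X))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [t for row, t in zip(X, y) if row[j] <= threshold]
            right = [t for row, t in zip(X, y) if row[j] > threshold]
            mean_l = sum(left) / len(left)
            mean_r = sum(right) / len(right)
            sse = (sum((t - mean_l) ** 2 for t in left)
                   + sum((t - mean_r) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, mean_l, mean_r)
    if best is None:  # every candidate feature is constant in this sample
        mean = sum(y) / len(y)
        return (feature_ids[0], float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, row):
    j, threshold, mean_l, mean_r = stump
    return mean_l if row[j] <= threshold else mean_r

def fit_forest(X, y, n_trees=25, m=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and m random features."""
    rng = random.Random(seed)
    n, q = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feats = rng.sample(range(q), m)             # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Regression output: the average of the individual tree predictions."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)

# Hypothetical toy data with two features.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 7.0], [6.0, 2.0]]
y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1]
forest = fit_forest(X, y)
print(predict_forest(forest, [3.5, 4.0]))
```

In a full RF, the random feature subset is redrawn at every split of every tree, and the trees are grown until the minimum node size is reached.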

Datasets and Performance Metrics for Model Evaluation
To evaluate the bond strength of the FRP-concrete interface, a large number of single-lap shear test results reported in the literature were used to build an ensemble method-based bond strength prediction model. A total of 855 single-lap shear test data instances, each with seven attributes, were collected from the literature. The seven attributes were the strength of the concrete cylinder/cube, elastic modulus, thickness, width, and length of the FRP sheet, thickness of the concrete material, and bond strength. Among these attributes, the bond strength served as the target attribute for model building. For the experimentation, an 80-20 train-test scheme was used to allow a fair comparison between the proposed method and those applied to the same dataset in the literature. The training dataset contained 685 instances of single-lap shear test data, while the validation dataset contained the remaining 170 instances. The overall performance analysis of the proposed ensemble methods was performed using four performance metrics, namely root mean square error (RMSE), R-square, covariance (COV), and integral absolute error (IAE).
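The train-test scheme can be illustrated with a short sketch. The counts (855 instances, 170 held out) come from the text above; the random shuffling, the seed, and the function name are assumptions, since the exact splitting procedure is not specified.

```python
import random

def train_test_split_indices(n_samples, n_test, seed=42):
    """Shuffle the row indices and hold out n_test of them for validation."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(855, 170)
print(len(train_idx), len(test_idx))  # 685 170
```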

Root Mean Square Error
The RMSE is the standard deviation of the residuals and measures the average magnitude of the error. It is calculated from the predicted and actual observation values using Equation (12):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{12}$$
where $n$ indicates the number of instances; $y_i$ represents the actual value; and $\hat{y}_i$ indicates the predicted output value.
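As a minimal sketch of this metric, the RMSE can be computed as follows. The bond strength values are hypothetical and serve only to illustrate the calculation.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / n)

actual = [10.0, 12.0, 15.0, 20.0]     # hypothetical measured values
predicted = [11.0, 11.0, 16.0, 18.0]  # hypothetical model outputs
print(rmse(actual, predicted))
```

Because the errors are squared before averaging, a single large error raises the RMSE more than several small errors of the same total size.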

R-Squared Measure
R-square is a statistical measure of the proportion of the variance in the dependent variable that is explained by the independent variables. It evaluates the scatter of the observations around the fitted regression line and is also known as the coefficient of determination. Equation (13) represents the formula for calculating the R-square:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{13}$$
where $y_i$ represents the actual recorded value; $\hat{y}_i$ is the predicted value of $y_i$; and $\bar{y}$ is the average of the $y$ values.
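A short sketch of the coefficient of determination follows, using the usual definition as one minus the ratio of the residual sum of squares to the total sum of squares. The data values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

actual = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 11.0, 16.0, 18.0]
print(r_squared(actual, predicted))
```

A value of 1 corresponds to a perfect fit, while a model that always predicts the mean of the actual values scores 0.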

Covariance
Covariance measures how the actual and predicted values vary together about their respective means. It directly measures the relationship between the two sets of values. A positive covariance implies that the actual and predicted values tend to move in the same direction. A negative covariance shows that the actual and predicted values move in opposite directions, in which case the model does not provide a good fit. The covariance (cov) between the two variables is calculated using Equation (14), where $X_i$ denotes the predicted values; $y_i$ denotes the actual values; $\bar{X}$ denotes the mean of the predicted values; $\bar{y}$ denotes the mean of the observed values; and $n$ denotes the number of observations.
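The covariance between predicted and actual values can be sketched as below. Dividing by $n$ (rather than $n - 1$) is an assumption made for this illustration, and the data values are hypothetical.

```python
def covariance(predicted, actual):
    """Covariance of predicted and actual values (divisor n is assumed)."""
    n = len(actual)
    mean_pred = sum(predicted) / n
    mean_act = sum(actual) / n
    return sum((x - mean_pred) * (y - mean_act)
               for x, y in zip(predicted, actual)) / n

actual = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 11.0, 16.0, 18.0]
print(covariance(predicted, actual))  # positive: the two series move together
```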

Integral Absolute Error
The IAE integrates the absolute error between the predicted and actual observations to assess the model performance. The IAE represented in Equation (15) sums the errors that occur above and below the actual value and penalizes all errors equally, irrespective of their direction.
where $e(t)$ represents the error value.
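For a finite set of test instances, the integral reduces to a sum of absolute errors, as sketched below. Because the IAE is reported as a percentage in this work, the sketch also offers a normalized form; normalizing by the sum of the actual values is an assumption, as are the function name and the data values.

```python
def integral_absolute_error(y_true, y_pred, normalize=False):
    """Discrete IAE: the sum of absolute errors, optionally as a fraction
    of the summed actual values (the normalization is an assumption)."""
    total = sum(abs(a - p) for a, p in zip(y_true, y_pred))
    if normalize:
        return total / sum(abs(a) for a in y_true)
    return total

actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(integral_absolute_error(actual, predicted))
print(integral_absolute_error(actual, predicted, normalize=True))
```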

Explained Variance Score
Explained variance score (EVS) measures the proportion of the variance in the actual data that is accounted for by the model. A higher EVS indicates a stronger association between the predicted and actual values and, therefore, better prediction accuracy. EVS is calculated using Equation (16):
$$\mathrm{EVS} = 1 - \frac{\mathrm{var}\left(y - \hat{y}\right)}{\mathrm{var}\left(y\right)} \tag{16}$$
where $y$ represents the actual recorded value; $\hat{y}$ is the predicted value of $y$; and var represents the variance of the values.
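A minimal sketch of the EVS is given below, using population variances. The data values are hypothetical.

```python
import statistics

def explained_variance_score(y_true, y_pred):
    """EVS = 1 - var(y - y_hat) / var(y), with population variances."""
    residuals = [a - p for a, p in zip(y_true, y_pred)]
    return 1.0 - statistics.pvariance(residuals) / statistics.pvariance(y_true)

actual = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 11.0, 16.0, 18.0]
print(explained_variance_score(actual, predicted))
```

The EVS coincides with R-square when the mean residual is zero. A constant offset in the predictions lowers R-square but leaves the EVS unchanged, so reporting both helps reveal systematic bias.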

Mean Squared Error
The mean squared error (MSE) measures the average squared error of the proposed ensemble models. MSE is calculated from the predicted and actual observation values using Equation (17):
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{17}$$
where $n$ indicates the number of instances; $y_i$ represents the actual value; and $\hat{y}_i$ indicates the predicted output value.

Mean Absolute Error
The mean absolute error (MAE) is the average of the absolute errors over the data points. The absolute error is the magnitude of the difference between the forecasted value and the actual value. MAE measures the accuracy for continuous variables and quantifies the error expected from the forecast on average. The MAE is calculated using Equation (18):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{18}$$
where $n$ indicates the number of instances; $y_i$ represents the actual value; and $\hat{y}_i$ indicates the predicted output value.
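The MSE and MAE can be compared side by side with a short sketch. The data values are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)

actual = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 11.0, 16.0, 18.0]
print(mean_squared_error(actual, predicted))   # 1.75
print(mean_absolute_error(actual, predicted))  # 1.25
```

The MAE is expressed in the same unit as the bond strength, whereas the MSE is in squared units and weights large errors more heavily.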

Residual Error
Residual standard error (RSE), also referred to as the model sigma, is a variant of the RMSE adjusted for the number of predictors in the model. In RSE, the residual error is denoted as $e$. The residual error represents the difference between the actual observed value and the value predicted by the model:
$$e = y - \hat{y} \tag{19}$$
where $e$ represents the residual error; $y$ indicates the actual observed value; and $\hat{y}$ represents the predicted output value.
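The residuals and the RSE can be sketched as follows. The use of $n - p - 1$ degrees of freedom, where $p$ is the number of predictors, follows the common linear regression convention and is an assumption here; the data values and the choice of two predictors are hypothetical.

```python
import math

def residuals(y_true, y_pred):
    """Residual errors e = y - y_hat for every instance."""
    return [a - p for a, p in zip(y_true, y_pred)]

def residual_standard_error(y_true, y_pred, n_predictors):
    """RSE with n - p - 1 degrees of freedom (assumed convention)."""
    e = residuals(y_true, y_pred)
    dof = len(e) - n_predictors - 1
    if dof <= 0:
        raise ValueError("not enough instances for the number of predictors")
    return math.sqrt(sum(r * r for r in e) / dof)

actual = [10.0, 12.0, 15.0, 20.0, 22.0, 25.0]
predicted = [11.0, 11.0, 16.0, 18.0, 23.0, 24.0]
print(residuals(actual, predicted))
print(residual_standard_error(actual, predicted, n_predictors=2))
```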

Results and Discussion
This section presents two comparative studies of model performance to evaluate the effectiveness of the proposed work. The proposed bond strength prediction model was developed using the single-lap shear test data collected from the literature. In this work, 80% of the available data was used for model building. The remaining 20% was later used to validate the performance of the proposed model. All the proposed ensemble models were implemented in Python using Python machine learning libraries. The computational resource used for model building was an Intel Core i7 processor (3.60 GHz) with an RTX 2080 GPU and 16 GB of RAM. Unlike deep learning models, ensemble methods require only modest computational power and do not depend on GPU servers for computation. The two comparative studies performed to validate the performance of the proposed CatBoost model involved (i) a comparison of CatBoost with other ensemble approaches and (ii) a comparison of CatBoost with the ANN algorithm.

Comparative Study of CatBoost with Other Ensemble Approaches
A comparative study was conducted with other well-known ensemble methods to validate the performance of the CatBoost method. The methods used for the comparison were XGBoost, HGBoost, and RF. Table 3 summarizes the test results of the various ensemble methods over the different performance metrics on the single-lap shear test dataset. Table 3 shows that the proposed CatBoost approach achieved the best performance in predicting the bond strength on the single-lap shear test dataset, with the maximum R-square value of 0.961 and minimum RMSE, COV, and IAE values of 2.310, 0.218, and 0.088, respectively. In addition, the CatBoost algorithm performed better than all the other models in terms of the other performance metrics, including EVS, MSE, MAE, and RE. The results also illustrate that, after CatBoost, XGBoost performed best in predicting the bond strength values, followed by HGBoost and RF. Figures 3 and 4 present a visualization of the performance comparison of the proposed CatBoost with the other ensemble methods on different performance metrics. Figure 5 depicts the performances of the ensemble methods by plotting the predicted bond strength values against a sequence of bond strength tests. The actual and predicted bond strength results of CatBoost overlapped each other more closely than those of the other ensemble methods. Figure 6 shows the estimator result analysis of the ensemble methods, which demonstrates that the estimator results of CatBoost and its true test results are less scattered than those of the other ensemble methods. The boosting schemes used in CatBoost helped reduce overfitting and improved the model quality. Furthermore, CatBoost uses the symmetric trees method for tree construction; hence, its inference can be faster than that of the other ensemble methods.
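The inference advantage of symmetric trees can be illustrated with a short sketch. In a symmetric (oblivious) tree, every node at the same depth applies the same feature and threshold, so the leaf is found by concatenating one comparison bit per level rather than by following branches. The splits, leaf values, and feature meanings below are hypothetical and are not taken from the trained model.

```python
def oblivious_tree_predict(splits, leaf_values, row):
    """Predict with a symmetric tree.

    splits: one (feature_index, threshold) pair per tree level.
    leaf_values: 2 ** len(splits) outputs, indexed by the comparison bits.
    """
    index = 0
    for feature, threshold in splits:
        bit = 1 if row[feature] > threshold else 0
        index = (index << 1) | bit  # append this level's decision
    return leaf_values[index]

# Hypothetical depth-2 tree: feature 0 could stand for concrete strength,
# feature 1 for FRP thickness.
splits = [(0, 30.0), (1, 0.2)]
leaves = [5.0, 8.0, 9.0, 14.0]
print(oblivious_tree_predict(splits, leaves, [35.0, 0.1]))  # 9.0
```

Because the same comparisons are evaluated for every sample, the leaf lookup is a fixed sequence of operations, which is one reason this tree structure lends itself to fast prediction.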

Comparative Study of the Ensemble Approach with ANN
In the literature, an ANN-based model [1] was developed using the single-lap shear test data reported in 34 reference papers. The ANN-based model produced satisfactory accuracy when compared to the other existing models in the literature. In the preceding subsection, a comparative study was conducted between the proposed CatBoost method and the ANN-based model. The result showed that the proposed CatBoost method performs better than the ANN-based model in predicting the bond strength of FRP-concrete. A comparative study on performance was also conducted between the proposed ensemble methods and the existing ANN-based model on single-lap shear test data (Figure 3). Figure 7 depicts the consolidated performance analysis of CatBoost and the other ensemble methods, showing that the former provides the best fit.
CatBoost produced the highest accuracy among the ensemble methods in terms of the bond strength prediction; hence, a separate comparison was conducted. CatBoost delivered better performance in both training and evaluation when modelling the bond strength dataset. CatBoost takes advantage of categorical features and leverages them for model building, on the assumption that categorical features are more powerful and best suited to datasets with a more definite feature set. CatBoost is well suited to the bond strength dataset with mixed data types, and it also performs well when training a model on a relatively small dataset. The CatBoost algorithm offers various tunable hyperparameters and custom callback functions. Table 4 presents the performance comparison of the CatBoost and ANN models.
The proposed CatBoost ensemble model was compared with the ANN model using all statistical measures (i.e., RMSE, R-square, COV, and IAE). A comparison with the models in the literature [14], including the ANN model (IAE = 23.04%, RMSE = 5.56, and COV = 28.6%) [14], was performed to validate the proposed model. Figure 9 presents the comparison of the error evaluation of the proposed models with the models used in the literature. The error evaluation comparison presented in Figure 9 shows that the ensemble methods used herein produced higher accuracy than the other models presented in the literature. The ANN-based model performed better than the other traditional algorithms in the literature, maintaining the lowest error rate among them. The ensemble methods helped convert a set of weak learners into robust learners by combining diversified learners.
This enabled the ensemble methods to reduce variance and bias compared to the other machine learning algorithms.
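The variance reduction obtained by combining learners can be illustrated with a small simulation. It rests on a strong simplifying assumption: each learner is modelled as the true value plus independent Gaussian noise, which real trees trained on overlapping data do not satisfy, so the reduction observed in practice is smaller. The sketch addresses variance only, not bias, and all names and values are hypothetical.

```python
import random
import statistics

def noisy_learner(rng, truth=10.0, noise=2.0):
    """One weak learner: the true value plus independent Gaussian noise."""
    return truth + rng.gauss(0.0, noise)

def ensemble_prediction(rng, k):
    """Average the outputs of k independent weak learners."""
    return sum(noisy_learner(rng) for _ in range(k)) / k

rng = random.Random(0)
single = [noisy_learner(rng) for _ in range(2000)]
averaged = [ensemble_prediction(rng, 25) for _ in range(2000)]
var_single = statistics.pvariance(single)
var_averaged = statistics.pvariance(averaged)
print(var_single, var_averaged)
```

Under the independence assumption, averaging 25 learners is expected to shrink the variance by roughly a factor of 25.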

Conclusions
This work employed an ensemble machine learning approach for FRP-concrete bond strength prediction. To build the model, a large number of single-lap shear test results on FRP-concrete bond strength were collected from the reference papers. The collected data were pre-processed to eliminate instances with missing data, yielding a total of 855 complete test result datapoints. The complete dataset included the values of seven important attributes, which are strength of the concrete cylinder/cube, elastic modulus, thickness, width, and length of the FRP sheet, thickness of the concrete material, and bond strength value. These seven attributes from 685 instances were used to build the ensemble model for the bond strength prediction. The remaining 170 instances were used for the model validation. Further, the performance of the ensemble was validated with additional performance metrics and showed an improvement over the other machine learning models used in the literature. Among the proposed ensemble models, the CatBoost algorithm achieved the best performance.
Furthermore, the hyperparameters of the CatBoost algorithm were fine-tuned in this research work for a relatively small dataset with mixed data types. Despite the strong performance of the CatBoost algorithm, fine-tuning of the hyperparameters remains challenging and requires proper experimentation on larger datasets. Further research can be conducted by considering additional parameters beyond the existing single-lap shear test data available in the literature. This would enable future researchers to determine the sensitivity of the FRP-concrete interfacial bonding to external conditions.