Hybrid Ensemble Model for Predicting the Strength of FRP Laminates Bonded to Concrete

The goal of this work was to use a hybrid ensemble machine learning approach to estimate the interfacial bond strength (IFB) of fibre-reinforced polymer laminates (FRPL) bonded to concrete using the results of a single shear-lap test. A database comprising 136 data points was used to train and validate six standalone machine learning models, namely, artificial neural network (ANN), extreme learning machine (ELM), the group method of data handling (GMDH), multivariate adaptive regression splines (MARS), least squares support vector machine (LSSVM), and Gaussian process regression (GPR). A hybrid ensemble (HENS) model was subsequently built by aggregating the trained outputs of the ANN, ELM, GMDH, MARS, LSSVM, and GPR models. In comparison with the standalone models employed in the current investigation, the suggested HENS model generated superior predictive accuracy, with R2 (training = 0.9783, testing = 0.9287), VAF (training = 97.83, testing = 92.87), RMSE (training = 0.0300, testing = 0.0613), and MAE (training = 0.0212, testing = 0.0443). When the training and testing datasets were used to assess the predictive performance of all models for IFB prediction, the HENS model had the greatest predictive accuracy across both stages, with an R2 of 0.9663. According to the experimental findings, the newly developed HENS model holds considerable promise as a fresh approach for dealing with the overfitting problems of conventional machine learning (CML) models and may thus be utilised to forecast the IFB of FRPL.


Introduction
Reinforced concrete structures are composite materials of concrete and reinforcement which are subjected to wear and tear throughout the structure's life span [1]. This emphasises the need for proper maintenance and sustainability of RC structures. To address the self-strengthening factors in a structure's construction process, many new technologies have been introduced and implemented [2]. Fibre-reinforced polymer (FRP) laminates bonded to a concrete prism comprise one of the retrofitting methods which facilitates an increase in the structural capacity of beams [3][4][5], columns [6,7], and beam-column joints [8][9][10][11]. FRP plates exhibit outstanding resistance to corrosion [12], fatigue [13,14], creep, and hygrothermal deformations [15] and lead to lightweight, high-strength structures that can substitute for the use of structural reinforcement. Different types of FRPs, namely CFRP, BFRP, and GFRP, are used in structural applications owing to their costs and benefits. Although CFRP is expensive, it has very good mechanical characteristics and corrosion, creep, and fatigue resistance. Contrarily, GFRP and BFRP have relatively weak mechanical characteristics and corrosion resistance, especially when exposed to alkaline environments. In earlier predictive work on FRP laminates bonded to a concrete prism, Su et al. [80] reported accuracies of R2 = 0.81 and 0.91 for the training and validation data, and such models can be applied within the limits of their accuracy. Gene expression programming (GEP), a robust technique that forms a compact relationship between input and output attributes in the form of a simple mathematical equation, can also be used.
However, ANN models provide no explicit information about the basis of the mathematical relationship between the input attributes and the output [81][82][83]; hence, an overall parametric analysis should be conducted to investigate the effects of the input attributes on the IFB, as these attributes govern the efficiency and viability of the strengthening technique.
Taking these considerations into account, a hybrid ensemble approach was used in this study to forecast the IFB of FRP laminates externally bonded in grooves on a concrete prism. Using data from a custom database, we presented, analysed, and discussed six widely used conventional soft computing techniques: an artificial neural network (ANN) [84], extreme learning machine (ELM) [85], the group method of data handling (GMDH) [86], multivariate adaptive regression splines (MARS) [87], the least squares support vector machine (LSSVM) [81], and Gaussian process regression (GPR) [88]. A hybrid ensemble technique was then proposed in which these conventional machine learning (CML) models were combined through an ANN model. For "unstable" models, an ensemble works at a higher level to improve their performance, and it is worth noting that in many situations an ensemble comprising multiple machine learning (ML) models outperforms a single model. The hybrid ensemble (HENS) model [89,90] presented in this paper comprised six CML models and one ANN. The IFB of FRP laminates externally bonded to the concrete prism was predicted from single shear-lap test (SST) results using both the CML models and the HENS model, the latter aggregating the CML models' outputs. Eight performance indices were used to evaluate the prediction performance of the models used, including ANN, ELM, GMDH, MARS, LSSVM, GPR, and HENS. The suggested HENS model's benefits and applicability were confirmed by comparisons with the CML models based on experimental data. This research explores the potential of the HENS model in predicting the IFB of FRP laminates externally bonded in grooves on a concrete prism using SST results (with the anchorage made on one end of the FRP to the concrete prism, as shown in Figure 1). For better representation of the results, the study also includes a visual interpretation of the results using an accuracy matrix and a Taylor diagram.

Research Significance
In order to overcome the challenges associated with using empirical formulas to forecast the IFB of FRP, an alternative approach was clearly required. For this purpose, soft computing approaches, with their proficiency in non-linear modelling, have emerged as a significant class of prediction tools, providing answers to the ever-increasing complexity of optimisation problems. In this research, an enhanced application of CML algorithms was presented for the estimation of the IFB of FRP using a hybrid ensemble technique. The traditional feed-forward ANN was chosen for the construction of the HENS model because of its straightforward network topology, its capacity to deal with varying degrees of complexity, its ease of application, and its capability for highly non-linear modelling. The findings of this research will provide engineers with tools to predict the IFB of FRP and will make it easier to design FRP strengthening schemes that are less complex and more durable, while also improving the accuracy with which such designs can be made.

ANN
Artificial neurons [84] are conceptually derived from biological neurons; they consist of elements that operate in parallel and are organised in ways that resemble biological neural networks. Each artificial neuron receives inputs and generates a single output that can be transmitted to a number of other neurons. An ANN has three essential layers: the input layer, the hidden layer, and the output layer. The output of a neuron is obtained by calculating the weighted sum of all its inputs, in which each input is multiplied by the weight of the connection between that input and the neuron; a bias term is then added to this sum. The weighted sum is processed through a (typically non-linear) activation function to create the output, and the network is trained so that it generates results close to the target values. For this study, a single hidden layer was used to develop the model.
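The forward pass described above (weighted sums, bias terms, and a non-linear activation in a single hidden layer) can be sketched as follows. The layer sizes, tanh activation, and random weights are illustrative assumptions, not the study's actual trained configuration:

```python
import numpy as np

def ann_forward(x, W1, b1, W2, b2):
    """Single-hidden-layer feed-forward pass: weighted sum of inputs,
    plus bias, through a non-linear (tanh) activation, then a linear output."""
    h = np.tanh(W1 @ x + b1)   # hidden-layer outputs
    return W2 @ h + b2         # output neuron

# Tiny example: 5 inputs (as in this study), 4 hidden neurons, 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 5)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
y = ann_forward(np.ones(5), W1, b1, W2, b2)
print(y.shape)  # → (1,)
```

In practice the weights and biases would be fitted by back-propagation rather than drawn at random.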

ELM
The ELM is a soft computing approach that aims to bridge the gap between machine learning and biological learning mechanisms [85]. In this context, "extreme" refers to a powerful approach with learning capabilities comparable to the human brain. Unlike ANNs, which require hidden-neuron tuning during the learning phase, an ELM, whether a single- or multi-hidden-layer feed-forward network, does not require any tuning. ELMs have been effectively used for feature learning, clustering, regression, and classification applications in recent years [85,91]. The ELM was created as a fast-learning single-layer feed-forward network (SLFN) with high generalisation capacity, making it easier to build than other soft computing approaches.
In an ELM, the input data are mapped to an M-dimensional ELM random feature space, and the network output is given by

f(x) = Σ_{i=1}^{M} β_i h_i(x) = h(x)β,

where h_i(x) is the output of the i-th hidden node for input x; M is the dimensionality of the ELM random feature space; and β = [β_1, ..., β_M]^T is the output weight matrix between the hidden nodes and the output nodes.
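The random-feature mapping and closed-form output weights can be sketched as below. The toy data, the feature dimension M, and the sigmoid hidden activation are illustrative assumptions (the study tuned the number of hidden neurons separately):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(50, 5))            # 50 samples, 5 inputs
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])    # toy target

M = 20                                           # random-feature dimension
W = rng.normal(size=(5, M))                      # random input weights (never tuned)
b = rng.normal(size=M)                           # random hidden biases
H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # sigmoid hidden outputs h_i(x)

beta, *_ = np.linalg.lstsq(H, y, rcond=None)     # output weights in closed form
y_hat = H @ beta
print(np.corrcoef(y, y_hat)[0, 1])               # near-perfect fit on this toy set
```

The key contrast with the ANN is visible here: only β is learned, by a single least-squares solve, while the hidden layer stays random.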

GMDH
The group method of data handling (GMDH) is a set of techniques for computer-based structure identification and mathematical modelling [86]. The majority of GMDH algorithms employ an inductive self-organising method to generate a multi-parametric model. Inductive GMDH algorithms allow for the automatic discovery of data interrelationships, the selection of an ideal model or network architecture, and the improvement of the accuracy of existing algorithms. The group method of data handling is a collection of methods for solving various problems; it includes parametric, clustering, analogue-complexing, rebinarisation, and probabilistic methods. This inductive technique is focused on sorting through progressively more complex models and selecting the best solution by minimising an external criterion. Basic models include not just polynomials but also non-linear and probabilistic functions as well as clusterings. A GMDH model with several inputs and one output that is a subset of the base function's components is written as

Y(x_1, ..., x_n) = a_0 + Σ_{i=1}^{m} a_i f_i,

where f_i denotes elementary functions that depend on different sets of inputs, a_i are coefficients, and m represents the number of base-function components. GMDH methods use multiple-component subsets of the base function, termed partial models, in order to obtain the optimal answer. The least squares approach is used to estimate the coefficients of these models. The number of partial-model components in GMDH algorithms is continuously increased until the minimum value of an external criterion indicates a model structure of optimal complexity. This is referred to as model self-organisation.
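A minimal sketch of one GMDH layer follows, assuming quadratic (Ivakhnenko-style) partial models in pairs of inputs, least-squares coefficients, and a hold-out set as the external criterion. The toy data are invented for illustration:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
X = rng.uniform(size=(60, 4))
y = 3 * X[:, 0] * X[:, 1] + 0.2 * X[:, 2]        # toy target
tr, va = slice(0, 40), slice(40, 60)             # fit vs external criterion split

def partial_model(xi, xj):
    # Quadratic partial model in two inputs: 1, xi, xj, xi*xj, xi^2, xj^2.
    return np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])

best = None
for i, j in combinations(range(4), 2):
    A = partial_model(X[tr, i], X[tr, j])
    a, *_ = np.linalg.lstsq(A, y[tr], rcond=None)   # least-squares coefficients
    Av = partial_model(X[va, i], X[va, j])
    err = np.mean((Av @ a - y[va]) ** 2)            # external (validation) criterion
    if best is None or err < best[0]:
        best = (err, (i, j))
print(best[1])  # expected winner: the (0, 1) input pair, which carries the product term
```

A full GMDH would feed the surviving partial models into further layers until the external criterion stops improving; this sketch shows only the selection step.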

MARS
Jerome H. Friedman [87] proposed multivariate adaptive regression splines (MARS) as a type of regression model in 1991. It is a non-parametric regression approach, which may be seen as an extension of linear models, that automatically incorporates non-linearities and interactions between variables. MARS may be thought of as an ensemble of linear functions linked together by one or more hinge functions. The result of combining linear hinge functions may be seen in Figure 2, where the black dots represent observations and the red line is the MARS model forecast. The algorithm is divided into two stages: forward and backward. In the forward stage, it generates a large number of candidate basis functions in pairs, and a pair of functions is incorporated into the model only if it lowers the overall model error. The maximum number of functions generated in this first stage may be adjusted via a hyper-parameter. The pruning stage, also known as the backward stage, then goes over each function one by one and deletes those that do not add any value to the model. This is accomplished through a generalised cross-validation (GCV) score, which is only a rough estimate of the true cross-validation score and is intended to penalise model complexity. The model may be put forward as the following simple equation:

f(x) = Σ_{i=1}^{k} α_i c_i(x),

where f(x) is the output, the weighted sum runs over the k basis functions c_i(x), and each α_i is a constant coefficient.
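The hinge functions at the heart of MARS can be illustrated with a hand-built toy model; the knot location and coefficients below are invented for illustration, not fitted by the forward/backward procedure:

```python
import numpy as np

def hinge(x, knot, sign=+1):
    """Linear hinge (basis) function max(0, ±(x - knot))."""
    return np.maximum(0.0, sign * (x - knot))

# A hand-built MARS-style model: constant term plus a mirrored pair of
# hinges at the knot 0.5, giving a piecewise-linear fit with a kink there.
x = np.linspace(0, 1, 101)
f = 1.0 + 2.0 * hinge(x, 0.5) - 1.5 * hinge(x, 0.5, sign=-1)
print(f[0], f[50], f[-1])  # → 0.25 1.0 2.0
```

Fitting would choose the knots and coefficients in the forward stage and prune them by GCV in the backward stage; the piecewise-linear form stays the same.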

LSSVM
A least squares support vector machine (LSSVM) is a supervised machine learning method that may be used for both regression and classification. It is based on statistical learning theory and on the support vector machine (SVM) introduced by Vapnik in 1995 [92]. SVM techniques project data into a high-dimensional feature space and employ kernels to classify non-linearly separable datasets [93,94]. In multidimensional space, an SVM model is essentially a representation of various classes by a hyperplane, which the SVM generates iteratively in order to reduce the error. The SVM's objective is to split datasets into classes such that a maximum marginal hyperplane (MMH) may be found. The data points closest to the hyperplane, i.e., the points of a data set that, if deleted, would change the location of the dividing hyperplane, are called support vectors; they may therefore be regarded as the critical elements of the data set. In general, the accuracy of an SVR model is determined by the kernel used and its parameters. The radial basis function (RBF) has been shown to perform well as a kernel function for SVM in several forecasting experiments [95].
For a data set in which x ∈ R^d is an input vector and y ∈ R is an output in a one-dimensional vector space, SVM regression can estimate the relationship between x and y. In the SVM approach, the risk function is minimised by minimising both the empirical risk and ||ω||^2, where ω is the regression weight vector and the loss l_ε measures the difference between y_i (the real output) and the model prediction f(x_i). The resulting regression function takes the form

f(x) = Σ_{i=1}^{n} α_i K(x_i, x) + b,

where K(x_i, x) is the kernel function and b is a bias term.
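An LSSVM replaces the ε-insensitive loss with squared errors, so training reduces to one linear system in the dual variables. The sketch below assumes an RBF kernel and invented toy data and hyper-parameter values (not the study's tuned sigma and gamma):

```python
import numpy as np

def rbf(X1, X2, sigma):
    """RBF (Gaussian) kernel matrix between two sets of points."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(40, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1]                # toy regression target

sigma, gamma = 0.8, 100.0                        # kernel width, regularisation
K = rbf(X, X, sigma)
n = len(y)
# LSSVM dual system: [[0, 1^T], [1, K + I/gamma]] @ [b, alpha] = [0, y]
A = np.zeros((n + 1, n + 1))
A[0, 1:] = 1.0
A[1:, 0] = 1.0
A[1:, 1:] = K + np.eye(n) / gamma
sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
b, alpha = sol[0], sol[1:]

y_hat = rbf(X, X, sigma) @ alpha + b             # f(x) = sum_i alpha_i K(x_i, x) + b
rmse = np.sqrt(np.mean((y - y_hat) ** 2))
print(rmse)                                      # small training error on this toy set
```

Note that, unlike the standard SVM, every training point receives a non-zero α_i here, which is the price of the least-squares simplification.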

GPR
Supervised learning has two key problem classes: classification and regression. Gaussian process regression is one of the most appealing non-parametric supervised learning techniques for prediction [96]. Consider a data set where x ∈ R^d is a d-dimensional input vector and y ∈ R is an output in a one-dimensional vector space; GPR can then estimate the relationship between x and y. The conditional distribution of the outputs given the inputs is important for understanding the link between inputs and outputs in the regression technique. In a Gaussian process, any finite collection of the random variables follows a joint Gaussian distribution. The Gaussian process f(x) is specified by its mean and covariance functions:

m(x) = E[f(x)],
k(x, x') = E[(f(x) - m(x))(f(x') - m(x'))],

so that f(x) ~ GP(m(x), k(x, x')). For x ∈ R^d, evaluating K_ij = k(x_i, x_j) for i, j = 1, ..., n over all pairs of inputs gives the covariance matrix K(X, X).
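A minimal GPR posterior-mean sketch with an RBF covariance follows; the 1-D data are invented, and the kernel width s and noise ε mirror the notation used later for the GPR model, although their values here are purely illustrative:

```python
import numpy as np

def rbf(X1, X2, s):
    """RBF covariance k(x, x') = exp(-(x - x')^2 / (2 s^2)) for 1-D inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-d2 / (2 * s**2))

rng = np.random.default_rng(4)
X = np.linspace(0, 1, 25)
y = np.sin(2 * np.pi * X) + 0.01 * rng.normal(size=25)   # noisy toy observations

s, eps = 0.2, 0.01                        # kernel width and noise variance
K = rbf(X, X, s) + eps * np.eye(25)       # covariance matrix K(X, X) + noise
Xs = np.array([0.25, 0.75])               # test inputs
mu = rbf(Xs, X, s) @ np.linalg.solve(K, y)   # GP posterior mean at Xs
print(mu)                                 # close to sin(2*pi*Xs) = [1, -1]
```

The posterior mean is all the IFB prediction needs; the same kernel algebra would also give posterior variances if uncertainty estimates were wanted.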

Descriptive Statistics and Statistical Analysis
Previous research yielded 136 experimental findings for a single lap-shear test [80], all of which were factored into the development of a HENS model using CML models. The elastic modulus of the FRP multiplied by the thickness of the fibre (Ef tf, GPa-mm), also known as the axial stiffness, the width of the FRP (bf, mm), the concrete compressive strength (fc, MPa), the width of the groove (bg, mm), and the depth of the groove (hg, mm) were all utilised as input variables, while the ultimate capacity (P, kN) was the target variable. Table 1 presents the descriptive statistics of the input and output parameters. From this table, one can see that the Ef tf parameter ranged from 12.90 to 78.20 with a skewness of 0.58, the bf parameter ranged from 60 to 6270, the bg and hg parameters ranged from 10 to 1405, and the fc parameter ranged from 48.20 to 4585.40. The output value P ranged from 4.76 to 25.49 with a skewness of 0.80. Since this descriptive analysis revealed that the collected database covered a wide range of experimental data, a statistical analysis was then carried out to measure the degree of correlation (DOC) between the aforementioned parameters and draw the appropriate conclusions. According to the Pearson correlations presented in Figure 3, the DOC between P and the other parameters (with the exception of Ef tf and bf) was comparatively small, whereas the DOC for Ef tf and bf was substantially higher.

Sensitivity Analysis
In general, sensitivity analysis (SA) is a technique used to determine how changes in the input parameters affect the response of the proposed models. This helps identify the input parameters according to their influence on the result. The Cosine Amplitude Method [51] was used in this work to calculate the degree of influence of the inputs on the response, i.e., the IFB of FRP. The data pairings in this study were represented in a data array X, as in Equation (7):

X = {x_1, x_2, x_3, ..., x_n},        (7)

and each variable x_i in X is a vector of length m, as in Equation (8):

x_i = {x_i1, x_i2, x_i3, ..., x_im}.        (8)

The strength of the relation (R_ij) between datasets x_i and x_j is given by Equation (9):

R_ij = (Σ_{k=1}^{m} x_ik x_jk) / sqrt(Σ_{k=1}^{m} x_ik^2  Σ_{k=1}^{m} x_jk^2).        (9)

The graphical representation of R_ij shows the relation between the IFB of FRP and the input parameters, as shown in Figure 4.
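Equation (9) can be sketched directly; the two series below are invented, and since one is exactly proportional to the other, R_ij evaluates to 1:

```python
import numpy as np

def cosine_amplitude(xi, xj):
    """Strength of relation R_ij between two data series (Equation (9))."""
    return np.abs(np.sum(xi * xj)) / np.sqrt(np.sum(xi**2) * np.sum(xj**2))

rng = np.random.default_rng(5)
inp = rng.uniform(1, 2, size=100)        # positive-valued input series
out = 3 * inp                            # perfectly proportional output
print(cosine_amplitude(inp, out))        # → 1.0 for proportional series
```

In the sensitivity analysis, each input column of the database would play the role of `inp` and the IFB column the role of `out`, with larger R_ij indicating greater influence.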

Performance Parameters
To evaluate the effectiveness of the developed models, eight distinct performance indices (Equations (10)-(17)) were calculated. These included the determination coefficient (R2), performance index (PI), variance account factor (VAF), Willmott's index of agreement (WI), root mean square error (RMSE), mean absolute error (MAE), RMSE-to-observation standard deviation ratio (RSR), and weighted mean absolute percentage error (WMAPE) [59,86,89]. The closer the values of these indices are to their ideal values, which are provided in Table 2, the more error-free the prediction model. Note that the generalisation ability of any predictive model is evaluated by determining various metrics, such as the degree of correlation, the associated error, and the amount of variation, so that the model is assessed from these various aspects. In the equations, y and ŷ are the actual and estimated outputs, n is the total number of observations, and y_mean is the average of the actual values.
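Four of the indices above (R2, VAF, RMSE, and MAE) can be computed as in the following sketch, using their standard definitions; the toy target values are illustrative:

```python
import numpy as np

def metrics(y, y_hat):
    """R2, VAF (%), RMSE, and MAE between actual y and estimated y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1 - ss_res / ss_tot                       # determination coefficient
    vaf = (1 - np.var(y - y_hat) / np.var(y)) * 100  # variance account factor
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))      # root mean square error
    mae = np.mean(np.abs(y - y_hat))               # mean absolute error
    return r2, vaf, rmse, mae

y = np.array([4.76, 10.0, 18.0, 25.49])            # toy "actual" capacities (kN)
r2, vaf, rmse, mae = metrics(y, y)                 # a perfect prediction
print(r2, vaf, rmse, mae)                          # → 1.0 100.0 0.0 0.0
```

The printed values match the ideal targets of Table 2 (R2 = 1, VAF = 100%, RMSE = MAE = 0), which is the yardstick the accuracy matrix later uses.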

Simulation of Soft Computing Models
By analysing the 136 tests, we aimed to forecast the interfacial bond strength of FRP. The goal of this research was to determine how the five input parameters affect the IFB using the most efficient CML models. After that, the outputs of the created CML models were aggregated using the ANN, and the HENS model was built. To train the models, the training dataset of 110 instances was used, while the remaining instances were used for testing. The prediction success was evaluated using eight performance parameters. With these indices, we could assess how well our models worked by comparing the actual and predicted values and measuring correlation coefficients, variances, and related errors. The created models and their outputs are presented in the following subsections for a more in-depth comparison. The flowchart of the developed models is shown in Figure 5.
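The hold-out split can be sketched as a random permutation. This assumes 110 training instances drawn from the 136-record database, with the remaining records forming the test set (the exact shuffling procedure used by the authors is not stated):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 136                                    # database size in this study
idx = rng.permutation(n)                   # random shuffle of row indices
n_train = 110                              # training instances reported
train_idx, test_idx = idx[:n_train], idx[n_train:]
print(len(train_idx), len(test_idx))
```

Fixing the seed makes the split reproducible, so all seven models can be trained and scored on identical subsets.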

ANN Model
A single hidden layer with five hidden neurons was found to be sufficient, and the best model was identified via trial and error. Table 3 provides a breakdown of the model's performance parameters. During training, the ANN attained an accuracy of over 91% (R2 = 0.9159), whereas during testing its accuracy increased to 92% (R2 = 0.9290). With MAE (training = 0.0448, testing = 0.0514) and RMSE (training = 0.0594, testing = 0.0620), the best feed-forward ANN structure with five inputs was adopted. The experimental and predicted values of the ANN model for the training and testing datasets are displayed in Figures 6 and 7.

ELM Model
To determine the structure of the ELM, the number of hidden neurons was varied from 2 to 30. In this research, the sigmoid activation function was used, and the most effective ELM model was chosen by trial and error, with the optimal number of hidden neurons determined to be six. The model performance is tabulated in Tables 3 and 4. It was observed that the ELM model attained an R2 value of 0.7881 in training, which improved slightly to 0.8647 in the testing phase (Tables 3 and 4).

GMDH
The best-performing GMDH model was determined by trial and error with a maximum of five layers of neurons. The value of alpha was set to 0.6 to achieve the best result. During training, the GMDH attained an accuracy of over 91% (R2 = 0.9154), whereas during testing its accuracy increased to 93% (R2 = 0.9359). Tables 3 and 4 provide a breakdown of the model's performance parameters.

MARS Modelling
Initial parameters, such as the maximum number of BFs, the GCV penalty per knot, and the maximum number of interactions, need to be tuned properly in MARS modelling in order to produce the best predictive model, as explained in the methodology section (refer to Section 3.4). Table 5 contains the final values of the effective parameters found by following this process, together with the basis functions of the MARS model.

Basis Function Models
For the purpose of constructing the MARS model used for forecasting the desired output, piecewise linear combinations of BFs were employed. In order to arrive at an accurate calculation of the IFB of FRP, each of the five variables was first analysed under a variety of scenarios, and the 'states' were then formulated. The MARS procedure first builds a number of linear combinations (linear BFs) and then produces the final model to predict the desired output (y). The specifics of the BFs that make up the MARS model are provided in Table 5, and the final model is represented by Equation (17) below.
The expression presented in Equation (17) can be utilised as a ready-made solution for problems that require a more practical approach. The IFB of FRP may be calculated with an accuracy of R2 (training = 0.8941, testing = 0.8816), RMSE (training = 0.0675, testing = 0.0722), and MAE (training = 0.0499, testing = 0.0541). Table 6 presents the other performance characteristics to facilitate a detailed evaluation of the model.

LSSVM Model
The performance of the LSSVM depends on two parameters, i.e., sigma and gamma. For optimum modelling, the values of these parameters were found by trial and error to be 60.38 and 6656.09, respectively. The elapsed time for the execution of the model was 0.0016 s. The best-performing LSSVM yielded an R2 of 0.9346 in training with a WMAPE of 0.1107, whereas in testing the corresponding values were 0.9226 and 0.1523, respectively.

GPR Model
Two critical parameters that can be found through an initial round of trial and error are S (the width of the RBF kernel) and ε (the Gaussian noise), both of which are discussed in the methodology section (refer to Section 3.6). S = 0.50 and ε = 0.01 were used as the design settings for the GPR's parameters. IFB values were predicted almost exactly by the GPR model during training, but a substantial variance was detected during testing. Figures 6 and 7 show scatter-plot comparisons of the actual and predicted values for the training and testing datasets. The GPR model's prediction accuracy (R2 = 0.9775 and RMSE = 0.0306 in training; R2 = 0.8404 and RMSE = 0.0944 in testing) demonstrates its generalisation capability across both stages. Tables 3 and 4 provide further information on the various performance metrics.

HENS Modelling
The results of the six separate CML models were reported in the preceding sections. As indicated in Table 3, all of the generated models were able to estimate the IFB of FRP with reasonable accuracy; the comparatively strong R2 (ranging from 0.7881 to 0.9775) and low RMSE (ranging from 0.0306 to 0.0938) prediction values over the training dataset (110 instances) show this. However, a close examination of the data shows that GPR was the best standalone model, whereas ELM was the poorest. Other performance indices, such as the WMAPE and PI, corroborate the R2 and RMSE values in confirming GPR's prediction performance at all levels. Furthermore, it is important to remember that a model that performs well in testing with a high level of accuracy is generally considered robust; improving the validation performance of the standalone models may therefore make it possible to construct an even more robust model. Accordingly, an ensemble technique was used in this work without incorporating any additional ML models. With limited data, individual computer models produce a wide range of outputs with variable degrees of accuracy, and when multiple models are combined, there is a lower chance of relying on an erroneous one. In this study, a HENS model was therefore utilised to combine the six CML models used to forecast the IFB of FRP. A standard feed-forward ANN was used to build the suggested HENS model, which takes the outputs of all six CML models as inputs and outputs the actual IFB value, as mentioned in the methodology section (refer to Section 3). Figure 5 depicts the HENS model's implementation approach in the form of a flowchart, which makes it easy to see how the amalgamation works. Tables 3-6 show the HENS model's ability to accurately predict outcomes.
When compared with the standalone CML models, the hybrid ensemble model yielded the best predictions. Comparing the CML models in Table 3, the HENS model achieved substantial results in all phases: R2 (training = 0.9783, testing = 0.9287), VAF (training = 97.83, testing = 92.87), RMSE (training = 0.0300, testing = 0.0613), and MAE (training = 0.0212, testing = 0.0443). The predictability of the constructed models, including the HENS model, was assessed using R2, VAF, RMSE, MAE, and the other performance criteria. Scatter plots comparing the predicted and actual values for the training and testing datasets are given in Figures 6 and 7, respectively. The suggested HENS model is thoroughly evaluated in the following sections to confirm its robustness and generalisability.
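The aggregation idea can be sketched as follows. For brevity, a linear least-squares combiner stands in for the feed-forward ANN meta-model, and the six base-model outputs are simulated rather than taken from the actual CML models:

```python
import numpy as np

rng = np.random.default_rng(7)
y = np.sin(np.linspace(0, 3, 80))                 # toy "actual IFB" values
# Stand-ins for the six CML model outputs: target plus model-specific error.
preds = np.column_stack([y + 0.1 * rng.normal(size=80) for _ in range(6)])

# Meta-learner: a linear least-squares combiner (a simplification of the
# feed-forward ANN aggregator) learns weights over the base-model outputs.
A = np.column_stack([np.ones(80), preds])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
y_ens = A @ w

rmse_base = np.sqrt(np.mean((preds[:, 0] - y) ** 2))
rmse_ens = np.sqrt(np.mean((y_ens - y) ** 2))
print(rmse_ens < rmse_base)                        # → True: ensemble beats a single model
```

Because the combiner averages out the base models' independent errors, the ensemble's error cannot exceed that of any single base model on the fitting data, which mirrors the improvement the HENS model shows over the standalone CML models.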

Taylor Diagram
As shown in Figures 8 and 9, a Taylor diagram was utilised to analyse the performance of the developed models for the testing and training datasets, respectively. This diagram establishes whether or not a model is able to accurately forecast the intended result. In order to obtain a relative measure of how well the models compare with one another, three distinct statistical criteria were examined (RMSE, correlation coefficients, and standard deviation ratios). The centred RMSE, defined as the distance from the measured point, serves as the reference. In the reference model, both the standard deviation ratio and the correlation coefficient were set to a value of 1. It is clear from the diagram that the standard deviation ratios and correlation coefficients of all seven models during the training phase were quite near to 1. It can also be seen that the ELM model had the lowest correlation in training, whereas the GPR and HENS models delivered the best performance during the training phase. Among all the models, the HENS model performed best for the testing dataset. As a result, it is possible to conclude that the HENS model has the best overall performance, because it produced satisfactory results for both datasets.

Accuracy Matrix
To better demonstrate the values of the performance indices, a recently proposed heat-map-shaped graphical assessment, called the accuracy matrix, was used to visualise model efficiency [120]. This matrix displays multiple statistical parameters to measure a model's predictive performance for the training and testing datasets. Figure 10 displays the accuracy matrix for the performance indices determined in this study. It indicates the accuracy of the performance parameters (in percentage) by comparing them with their corresponding ideal values. For example, the ideal value of MAE is 0, and in the present work the MAE for the training subset was determined to be 0.0448 for the ANN model (see Table 3); thus, it can be estimated that the ANN model attained 96% ((1 − 0.0448) × 100%) accuracy in terms of MAE. Similarly, the values of R2 and PI were obtained as 0.9421 and 1.7957, respectively, in the testing phase for HENS (see Table 4), which shows that HENS attained 94% ((0.9421/1) × 100%) and 90% ((1.7957/2) × 100%) accuracy in terms of R2 and PI, respectively. A similar procedure was followed for the other parameters. Note, however, that parameters such as VAF, which are expressed as percentages, should be converted to their decimal form before applying the above procedure.

Conclusions
An accurate and trustworthy estimate of the IFB of FRP laminates bonded to concrete can aid cost/performance optimisation while also saving time. The estimation of the IFB of FRPL was reported in the current work. Six standalone machine learning models (ANN, ELM, GMDH, MARS, LSSVM, and GPR) and a HENS model were introduced. A trustworthy experimental database made up of 136 FRP test results was employed for this purpose. To train and validate the models, the dataset was split into training and testing subsets. Several statistical techniques were used to compare and analyse the accuracy of all the models. Through the study, the following conclusions were drawn:

2. It was observed from the experimental data that the suggested HENS model achieved the maximum prediction accuracy by minimising the particular flaws of the CML models. According to Table 6, the current HENS model (R2 = 0.9663, VAF = 96.60, RMSE = 0.0383, and RSR = 0.1847) was the best-performing model; it was able to handle the overfitting problem of the GPR model and exhibited all the desired trends in the parametric study of FRP, confirming the superiority of the suggested method at all levels.

3. The HENS model showed high potential to forecast the intended IFB of FRPL bonded to concrete, as shown by the parametric analysis, and was extremely easy to implement. It also has a very low computational cost (only 10 s), is representative, and performed better than the standalone models.
In conclusion, this study successfully used an ensemble ML model to predict the IFB of FRPL bonded to concrete, and the approach appears to be precise yet computationally efficient. A thorough analysis of the outcomes suggests that the proposed strategy could be further enhanced in the future, possibly including a thorough evaluation of the HENS approach for complicated problems in many technical and scientific domains. As far as the authors are aware, this study is the first to apply a hybrid ensemble of CML models to forecast the IFB of FRPL bonded to concrete.