Torsional Capacity Prediction of Reinforced Concrete Beams Using Machine Learning Techniques Based on Ensembles of Trees

Abstract: For the design or assessment of framed concrete structures under high eccentric loadings, the accurate prediction of the torsional capacity of reinforced concrete (RC) beams can be critical. Unfortunately, traditional semi-empirical equations still fail to accurately estimate the torsional capacity of RC beams, namely for over-reinforced and high-strength RC beams. This drawback can be solved by developing accurate Machine Learning (ML) based models as an alternative to other more complex and computationally demanding models. This goal has been herein addressed by employing several ML techniques and by validating their predictions. The novelty of the present article lies in the successful implementation of ML methods based on Ensembles of Trees (ET) for the prediction of the torsional capacity of RC beams. A dataset incorporating 202 reference RC beams with varying design attributes was divided into testing and training sets. Only three input features were considered, namely the concrete area (area enclosed within the outer perimeter of the cross-section), the concrete compressive strength and the reinforcement factor (which accounts for the ratio between the yielding forces of both the longitudinal and transverse reinforcements). The predictions from the used models were statistically compared to the experimental data to evaluate their performances. The results showed that ET reach higher accuracies than a simple Decision Tree (DT). In particular, the Bagging Meta-Estimator (BME), the Forests of Randomized Trees (FRT), the AdaBoost (AB) and the Gradient Tree Boosting (GTB) reached good performances. For instance, they reached values of R² (coefficient of determination) in the range between 0.982 and 0.990, and values of cvRMSE (coefficient of variation of the root mean squared error) in the range between 10.04% and 13.92%.
From the obtained results, it is shown that these ML techniques provide a high capability for the prediction of the torsional capacity of RC beams, at the same level as other more complicated ML techniques and with much fewer input features.


Introduction
Reinforced concrete (RC) structures often incorporate linear members for which the torsional effects can be critical for designing or for assessing the actual cross-sectional resistance. This situation is very common in bridges and building structures which incorporate beams and columns under high eccentric loadings, or in any structural system whose equilibrium depends on reinforced concrete elements resisting torsion. For the assessment of the capacity of such members, an accurate estimation of the torsional capacity in their critical cross-sections is usually required. To do so, structural designers often apply the provisions from codes of practice, which incorporate simplified models and semi-empirical equations, allowing the resistance to internal forces to be estimated for a given RC cross-section. As far as torsion is concerned, and unlike what is observed for the bending capacity, recent studies have reported that current codes of practice can still fail to estimate the torsional capacity of RC beams. This conclusion arises from comparing the estimates from codes and the results from experiments [1][2][3]. For RC beams with somewhat high reinforcement ratios, unsafe estimates for the torsional capacity were also reported. This can explain why structural failures related to primary torsional effects continue to be reported in the literature [4].
The aforementioned problem has been well known by the technical community for a long time. For this reason, efforts have been made in the past decades to propose alternative and more accurate models for RC members under torsion. One such model is the skew-bending theory, proposed by Hsu in 1968 [5] from the observation of the experimental torsional failure pattern in tested RC beams, and further developed by other authors [6]. For common and small RC rectangular cross-sections, simple equations were derived from this model to compute the torsional capacity, which were incorporated in some reference codes of practice, namely the American building code prior to 1995 [7] and the current Eurasian code [8].
Nowadays, most current reference codes of practice for RC structures (such as the American, Canadian and European codes [9][10][11][12]) base their provision rules for torsion on the space truss analogy. This more rational model was proposed by Rausch in 1929 [13] and further refined by other authors, mainly in the second half of the last century [14][15][16][17], based on the results of extensive experimental programs. When compared with the skew-bending theory, the space truss analogy provides simpler equations for the design of RC beams under torsion and for members with a much wider range of cross-sectional geometries. With the aim of obtaining simple design rules to be incorporated in the provisions of codes of practice, the original equations from both the skew-bending theory and the space truss analogy have been simplified by using some empirical hypotheses. As a result, codes of practice incorporate many semi-empirical equations, which can still fail to always provide accurate and safe estimates for the torsional capacity of RC beams, as referred to before. Hence, more refined or alternative models for RC beams under torsion are still needed and have been proposed in the past two decades.
Among the refined models proposed in the literature, some are still based on the space truss analogy. Some of these refined models predict very well the torsional capacity of RC beams with a wide range of design attributes and torsional loadings [18][19][20][21][22][23]. Alternative and more advanced reliable analytical or numerical models, including models based on the finite element method, have also been proposed [24][25][26]. However, many of these models require advanced calculation procedures and demanding computational efforts just to compute the actual torsional capacity of RC beams. Hence, they are not easy to use for most practitioners.
In recent years, reliable models based on several Machine Learning (ML) techniques have been proposed in the literature to solve diverse engineering problems, as alternatives to other, more complex analytical or numerical models. These models are able to provide predictions through a self-learning process based on collected or existing databases. In parallel, during the last decades, a huge amount of data has been accumulated in the field of civil engineering due to advances in experimental programs, monitoring, data acquisition, and processing. For these reasons, ML-based models have been successfully used in complex civil engineering problems, including geotechnical, materials and structural problems [27][28][29][30][31][32][33][34][35][36][37][38][39], and some of them already focus on the torsional capacity of RC beams, including externally strengthened and combined RC beams [40][41][42][43][44][45][46][47][48]. The referred studies on the torsional capacity of RC beams are still very limited and they generally apply different ML techniques. This is due to the fact that several ML techniques have been proposed in the literature which can be used to solve problems in the field of civil engineering, including structural concrete [46]. Despite the successes achieved with the application of ML approaches in several previous studies, some key challenges still exist which prevent these models from being widely used to solve engineering problems. Perhaps the main challenge is related to the selection of the most appropriate and effective ML models to solve a particular civil engineering problem. For instance, some studies indicated that the best performances were reached with extreme gradient boosting (XGBoost) for problems involving the shear strength of RC beams [49,50], whereas others have mentioned other ML models to be more performant for the same types of problems [41][42][43][44][45][46][47][48][49][50][51][52][53][54][55], namely Artificial Neural Networks (ANN). This includes the analysis of RC beams under torsion, where the shear effect is predominant [40,41,[43][44][45][46]48]. Ensembles of Trees (ET) have also been used in such problems, although to a lesser extent and with limited success [44]. Explicit ML techniques, such as the M5P model tree [47], were also successfully used. The problem of selecting the most appropriate and effective ML model for RC beams under torsion requires more studies comparing the performance of different ML models. Particularizing for RC beams under torsion, another key challenge is related to the number of features to be considered as input variables for developing the ML models. The larger the number of selected input variables, the higher the prediction accuracy. However, the more complex the algorithm and the larger the computational cost. Hence, a compromise must be reached in practice.
Decision Trees (DT) are among the most commonly used ML models in very different problems, and they show several distinct advantages. Many algorithms require data normalization before model building. Such variable transformations are not required with DT, since the tree structure remains the same despite the transformation. DT also implicitly perform feature selection and are not sensitive to outliers. In addition, DT usually require fewer input variables than other ML models, such as ANN, to achieve similar performances. For these reasons, DT were used as the ML-based models for this study.
This study aims to contribute to comparing and finding the best ML techniques to be applied to RC beams under pure torsion. Although ET can be considered some of the most effective and popular ML algorithms, an extensive examination of the literature indicates that studies have yet to successfully use this approach to predict the torsional capacity of RC beams. Therefore, an attempt has been made to investigate this potentiality, and five ET models were developed to study the feasibility of applying these techniques for the quick estimation of the torsional capacity of RC beams. To the best of the authors' knowledge, only one previous study used two such techniques, but applied to externally bonded FRP (Fiber Reinforced Polymer) RC beams under torsion [44]. A dataset including 202 RC beams tested under pure torsion and found in the literature was used for both the training and testing stages. Three input variables were used: the cross-section area (area enclosed within the outer perimeter of the cross-section), the concrete compressive strength, and the reinforcement factor (a factor accounting for the ratio between the yielding forces of both the longitudinal and transverse reinforcements). The predictions from the five ML models are statistically compared to each other, to the experimental data, and to other ML models, to evaluate their performance and reliability.
This article is structured in the following manner. Section 2 gives an overview of the used methodology, the dataset, the input and output variables, the splitting of the dataset into training and testing sets, the used ML models, the cross-validation and hyperparameter tuning procedures, the metrics used to evaluate the performance of the models, and the programming languages and software used to implement the models. Section 3 presents the results of the models and Section 4 discusses the comparison of these results with the ones obtained in previous studies. In Section 5, a sensitivity analysis is performed to check the importance of each input variable in the prediction of the torsional capacity of RC beams. Finally, Section 6 summarizes the main conclusions.

Research Methodology
The dataset was randomly divided into two sets: one for training and another for testing. Each model was trained using the training set and tested using the testing set. The appropriate hyperparameters of each model were determined by using a five-fold cross-validation procedure on the training set. After the hyperparameters were optimized, the testing set was used to verify the performance of each model. If the performance of the model is satisfactory, it is deemed a final predictive model. The steps used in this study are depicted in the flowchart illustrated in Figure 1 (adapted from [30]).
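The two performance metrics reported in this article, R² and cvRMSE, can be computed as in the following minimal Python sketch using scikit-learn. This is illustrative only (not the authors' code), and it assumes the common definition of cvRMSE as the RMSE divided by the mean of the observed values, expressed in percent; the numeric values are made up.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

def cv_rmse(y_true, y_pred):
    """cvRMSE: RMSE normalized by the mean of the observed values, in %."""
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    return 100.0 * rmse / np.mean(y_true)

# Toy torsional capacities (kNm) vs. hypothetical model predictions
y_true = np.array([20.0, 50.0, 80.0, 120.0])
y_pred = np.array([22.0, 48.0, 83.0, 118.0])

r2 = r2_score(y_true, y_pred)       # coefficient of determination
cvrmse = cv_rmse(y_true, y_pred)    # coefficient of variation of the RMSE
```

Values of R² close to 1 and low cvRMSE percentages indicate good agreement between predictions and experiments, as in the ranges reported in the abstract.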



Dataset
For this study, the dataset compiled by Bernardo et al. [3] and also used by Henedy et al. [47] was considered. In the former study, a literature review was performed to compile the main features and torsional capacity of RC rectangular beams tested under pure torsion until failure [5,6,17,[56][57][58][59][60][61][62][63][64][65][66][67][68]. In total, 202 RC beams were compiled to build the dataset. Detailed information about the criteria used to discard some of the found tested beams from the dataset can be found in [3].
Table A1 in Appendix A summarizes the main geometrical and mechanical properties of the reference RC beams. The meaning of each parameter can be found in the nomenclature. Figure 2 presents the histograms with the distribution of the considered main key parameters in this study for the 202 RC beams from the dataset. From the analysis of the data presented in Table A1 and from a visual analysis of Figure 2, it can be stated that 142 and 60 beams were built with normal-strength (up to 50 MPa) and high-strength concrete (over 50 MPa, according to [11]), respectively. The average concrete compressive strength (f_c) ranges between 14 MPa and 110 MPa. The total reinforcement ratio, which represents the sum of the longitudinal and the transverse reinforcement ratios (ρ_tot = ρ_l + ρ_t), ranges between a minimum of 0.37% and a maximum of 6.36%. For most beams, ρ_tot ranges between 1% and 2%. The yielding stress of the longitudinal reinforcement (f_ly) ranges between 308.8 MPa and 723.9 MPa. The yielding stress of the transverse reinforcement (f_ty) ranges between 285 MPa and 714.8 MPa. For most beams, the yielding stress ranges between 300 MPa and 500 MPa.

Input and Output Variables
Based on results from previous studies [3,47], the following three input variables were used to characterize each RC beam and predict the torsional capacity T_R,Exp: the cross-section area A_c (area enclosed within the outer perimeter of the cross-section), the average concrete compressive strength f_c, and the reinforcement factor A_l f_ly A_t f_ty / s (a factor accounting for the ratio between the yielding forces of both the longitudinal and transverse reinforcements). To compute the last input variable, the units of each parameter are the same as the ones given in Table A1 in Appendix A. The reinforcement factor combines both the ratio of the longitudinal to the transverse torsional reinforcement, as well as the ratio of their yielding stresses. Hence, it can be used to characterize beams with balanced (which is a usual design criterion for pure torsion) or unbalanced (for which the reinforcement factor is different from unity) yielding forces of the longitudinal and transverse reinforcements. For this reason, the authors considered that there is no need to consider separately the effects of the longitudinal and transverse reinforcement in developing the ML models in this study, as was carried out in previous studies using ANN ML models (for instance, [40,41,43,48]), which usually require more input variables. The reinforcement factor was first proposed by Rahal [1] as an input variable and further used successfully in recent studies from the authors [3,47] using nonlinear regression and M5P models to predict the torsional capacity of RC beams under pure torsion.
Figure 3 presents the histograms of the input variables (the concrete compressive strength is shown again), as well as the output variable (torsional capacity), for the 202 reference RC beams from the database. Figure 3 shows that, for most of the beams, the reinforcement factor ranges between 1.1 × 10⁶ and 2 × 10⁸, and the torsional capacity ranges between 9.0 kNm and 200 kNm.
As can be noticed from Figures 2 and 3, some of the predictor and outcome variables do not follow a normal distribution. However, DT and ET methods do not require feature scaling, as they are not sensitive to the variance in the data [55], as previously mentioned in Section 1. For this reason, data normalization was not performed in this study.
Following a previous study by Wakjira et al. [32], the correlation between each pair of parameters is also important to analyze. Figure 4 and Table 1 show the scatter plot and the Pearson correlation coefficient between pairs of the input and output variables, respectively. Both Figure 4 and Table 1 show that there is a strong correlation between some of the input variables, namely the cross-section area and the reinforcement factor, and the output, whereas there is a moderate correlation between the concrete compressive strength and the output. In general, low degrees of correlation exist between pairs of input variables, indicating that these will not cause multicollinearity problems in the models [69]. A high absolute value of the Pearson correlation coefficient between pairs of inputs could affect the accuracy of the model and the interpretation of the effects of the inputs on the output [33].
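The pairwise Pearson-correlation check described above can be reproduced with pandas, as in the sketch below. The data and column names here are synthetic stand-ins for the real dataset (the output is deliberately constructed to be dominated by the cross-section area), not the actual Table 1 values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({
    "Ac": rng.uniform(4e4, 2e5, n),          # cross-section area (mm^2)
    "fc": rng.uniform(14, 110, n),           # compressive strength (MPa)
    "reinf_factor": rng.uniform(1e6, 2e8, n) # reinforcement factor
})
# Synthetic output loosely driven by the inputs (Ac term dominates)
df["T_exp"] = 1e-3 * df["Ac"] + 0.2 * df["fc"] + 1e-7 * df["reinf_factor"]

# Pearson correlation matrix between all pairs of variables
corr = df.corr(method="pearson")
```

Inspecting `corr` (or a scatter-plot matrix, as in Figure 4) reveals both input-output correlations and potential multicollinearity between inputs.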

Data-Splitting Procedure
To implement the ML models, the database was divided into a training set and a testing set. Each model was developed using the training set and evaluated using the testing set. As much as possible, a statistically significant association was ensured between the variables of the training and testing sets when dividing the database into subsets. Of the 202 samples in the database, 80% (161 samples) were used for training and 20% (41 samples) were used for testing. This division was made based on the data-splitting ratio commonly used in several previous related studies. In particular, in [70] the authors justified the use of 70/30 or 80/20 splitting procedures. The statistical measures of the torsional capacity and input variables are given in Tables 2 and 3 for the training and testing sets, respectively.
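The 80/20 random split described above can be sketched with scikit-learn as follows. The feature and target arrays are synthetic placeholders for the real 202-beam dataset; only the split proportions match the article.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.random((202, 3))    # placeholders for Ac, fc, reinforcement factor
y = rng.random(202) * 200   # placeholder torsional capacities (kNm)

# 80% for training, 20% for testing; fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

With 202 samples, this yields 161 training and 41 testing samples, matching the counts reported in the text.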

Development of Models
In this article, five ML models were implemented using the same database and inputs. The first model implemented was a simple Decision Tree (DT) regressor. A DT can be unstable, since small variations in the data might result in a completely different generated tree. A single tree usually exhibits high variance and tends to overfit, not generalizing well to new data [71]. This problem is attenuated by using several trees within an ensemble. Ensemble methods seek to combine the predictions of several base estimators to improve generalizability, accuracy and robustness over a single estimator. The other four models implemented in this study involve Ensembles of Trees (ET) methodologies. For the convenience of the readers, the following subsections briefly present an overview of each implemented method. Training data exist in the form of a training set {(x_i, y_i)}, i = 1, ..., n, in which x_i ∈ ℝ^p represents the input features and y_i ∈ ℝ represents the torsional capacity, for p inputs and n samples.

Decision Trees
Decision Trees (DT) predict the value of a variable by learning decision rules inferred from the features. The deeper the tree, the more complex the decision rules and the fitter the model. DT require little data preparation when compared to other techniques that require data normalization and other transformations. DT also use a white-box model, allowing an easy explanation of the conditions by Boolean logic. However, DT can create over-complex trees that overfit and do not generalize well to the data [71]. This problem was avoided by setting a maximum depth for the trees, as referred in [71]. The input data is recursively partitioned into two smaller regions called nodes, based on a series of decision rules, until a stopping criterion is reached (in this case, the maximum depth of the tree). Samples with similar target values are grouped together, and then the response is modelled by the mean of the y variable in each region. A tree can be seen as a piecewise constant approximation. The variable and split point are chosen in order to achieve the best fit. Let the data at node m be represented by R_m with n_m samples. For each candidate split θ = (j, t_m), consisting of a feature j and a threshold t_m, the data is partitioned into

R_left(θ) = {(x, y) | x_j ≤ t_m} and R_right(θ) = R_m \ R_left(θ).

The quality of a candidate split of node m is then computed using a loss function H:

G(R_m, θ) = (n_left / n_m) H(R_left(θ)) + (n_right / n_m) H(R_right(θ)).

The parameters are selected in order to minimize the loss function:

θ* = argmin_θ G(R_m, θ).

This is carried out recursively for the two subsets until the maximum depth is reached.
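A minimal DT regressor with a capped depth, as the text describes, can be sketched with scikit-learn; the data below is synthetic and only illustrates the stopping criterion, not the article's tuned model.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = 50 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 1, 100)

# max_depth acts as the stopping criterion that limits overfitting
tree = DecisionTreeRegressor(max_depth=4, random_state=0)
tree.fit(X, y)

depth = tree.get_depth()      # never exceeds the imposed maximum
pred = tree.predict(X[:5])    # piecewise-constant predictions
```

Each leaf predicts the mean of the target values of the training samples that fall into it, which is why the fitted function is piecewise constant.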

Bagging Meta-Estimator
A Bagging Meta-Estimator (BME) belongs to the set of Ensemble Methods, specifically to the family of averaging methods, in which several estimators are built independently and then their predictions are averaged to form a final prediction. Averaging methods are used to improve stability and reduce the variance of the base estimator by introducing randomization into its construction procedure. As they provide a way to reduce overfitting, these methods work best with strong and complex models, in this case, fully developed trees. BME uses a bootstrapping technique to build each individual estimator on random subsets of samples that are repeatedly drawn from the original training dataset with replacement [71]. A model is fitted for each bootstrap sample, giving a prediction f_b(x) for b = 1, ..., B. The final estimate is then given by:

f̂(x) = (1/B) Σ_{b=1}^{B} f_b(x).
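The averaging scheme above maps to scikit-learn's `BaggingRegressor`, whose default base estimator is a fully developed decision tree; this is an illustrative sketch on synthetic data, not the article's tuned configuration.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(0)
X = rng.random((120, 3))
y = 50 * X[:, 0] + 10 * X[:, 1]

# B = 50 trees, each fitted on a bootstrap sample drawn with replacement;
# the ensemble prediction is the average over the 50 trees.
bme = BaggingRegressor(n_estimators=50, bootstrap=True, random_state=0)
bme.fit(X, y)
score = bme.score(X, y)   # R^2 on the training data
```

Averaging the fully grown trees reduces the variance of any single tree while keeping its low bias.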

Forests of Randomized Trees
Forests of Randomized Trees (FRT) are also an averaging method. The prediction of the ensemble is given as the averaged prediction of the individual regressors. Each tree is built from a sample drawn with replacement from the training set. The sources of randomness decrease the variance of the estimator by reducing the correlation between trees. In this algorithm, only a subset of features is randomly selected out of the total, and the best split feature from this subset is used to split each node in a tree, unlike in Bagging, where all features are considered for splitting a node [71].
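In scikit-learn terms, this corresponds to `RandomForestRegressor`, where `max_features` controls the random subset of features considered at each split; the sketch below uses synthetic data and illustrative parameter values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((120, 3))
y = 50 * X[:, 0] + 10 * X[:, 1]

# Each tree sees a bootstrap sample and, at every split, only a random
# subset of max_features features, which decorrelates the trees.
frt = RandomForestRegressor(n_estimators=100, max_features=2, random_state=0)
frt.fit(X, y)
pred = frt.predict(X[:3])   # average of the 100 trees' predictions
```

Setting `max_features` below the total number of features is precisely what distinguishes a random forest from plain bagging of trees.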

Adaptive Boosting
Adaptive Boosting or AdaBoost (AB) belongs to the set of Ensemble Methods, specifically to the family of boosting methods, in which the base estimators are built sequentially, and the goal is to reduce the bias of the combined estimator. The motivation is to combine several weak models to generate a powerful ensemble. In contrast with averaging methods, boosting methods usually work best with weak models, namely shallow trees in this case. A sequence of weak learners is fitted on repeatedly modified versions of the data. The predictions from all of them are then combined through a weighted sum to produce the final prediction. The data modifications at each boosting step consist of applying weights ω_1, ω_2, ..., ω_N to each of the training samples (x_i, y_i) [71]. Initially, all the weights are set to ω_i = 1/N, so that the first step simply trains a weak learner in the usual manner. For each successive iteration, the sample weights are individually modified, and the algorithm is reapplied to the weighted data. The training examples that were incorrectly predicted by the model induced at the previous step have their weights increased, whereas the weights are decreased for the ones predicted correctly. As iterations proceed, examples that are difficult to predict receive an increasing influence. Each subsequent weak learner is thereby forced to concentrate on the examples that were missed by the previous ones in the sequence.
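A minimal sketch with scikit-learn's `AdaBoostRegressor`, whose default weak learner is a shallow tree (max depth 3); data and parameters here are illustrative, not the article's.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

rng = np.random.default_rng(0)
X = rng.random((120, 3))
y = 50 * X[:, 0] + 10 * X[:, 1]

# 50 boosting iterations: sample weights are re-adjusted at every step,
# focusing later learners on the hardest-to-predict samples.
ab = AdaBoostRegressor(n_estimators=50, random_state=0)
ab.fit(X, y)
score = ab.score(X, y)   # R^2 on the training data
```

The final prediction is a weighted combination of the 50 shallow trees, trading the low bias of one deep tree for the reduced bias of the boosted sequence.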

Gradient Tree Boosting
Gradient Tree Boosting (GTB) is also a boosting method. It builds an additive model in a forward stage-wise fashion, allowing the optimization of a differentiable loss function. The existing residuals are used to build new trees sequentially. In each stage, a regression tree is added and fitted on the negative gradient of the loss function, reducing its value and improving the prediction. The prediction ŷ_i for a given input x_i is of the form:

ŷ_i = F_M(x_i) = Σ_{m=1}^{M} h_m(x_i),

where the h_m are the weak learners. The constant M corresponds to the number of estimators. GTB is built as follows:

F_m(x) = F_{m−1}(x) + h_m(x),

where the added tree h_m is fitted in order to minimize a sum of losses L_m, given the previous ensemble F_{m−1}:

h_m = argmin_h L_m = argmin_h Σ_{i=1}^{n} l(y_i, F_{m−1}(x_i) + h(x_i)),

where l(y_i, F(x_i)) is the loss function measuring how much the predicted value F(x_i) differs from the true value y_i. Trees are built, and each iteration must satisfy the above equation. The initial model F_0 is chosen as the constant that minimizes the loss. Using a first-order Taylor approximation, the value of l can be approximated as:

l(y_i, F_{m−1}(x_i) + h(x_i)) ≈ l(y_i, F_{m−1}(x_i)) + h(x_i) g_i, with g_i = [∂l(y_i, F(x_i)) / ∂F(x_i)]_{F = F_{m−1}}.

Removing the constant terms,

h_m ≈ argmin_h Σ_{i=1}^{n} h(x_i) g_i.

This is minimized if h(x_i) is fitted to predict a value that is proportional to the negative gradient. Therefore, at each iteration, the estimator h_m is fitted to predict the negative gradients of the samples. The gradients are updated at each iteration. This can be considered as a form of gradient descent in a functional space.
Regularization techniques are usually applied during training to reduce overfitting and improve the generalization of the model. A regularization strategy that scales the contribution of each weak learner by a constant factor ν is the following [72]:

F_m(x) = F_{m−1}(x) + ν h_m(x).

The parameter ν is called the learning rate because it scales the step length of the gradient descent procedure.
Stochastic Gradient Boosting combines gradient boosting with bootstrap averaging (Bagging) [73]. At each iteration, the base estimator is trained on a fraction (subsample) of the available training data. The subsample is drawn without replacement.
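Assuming a scikit-learn implementation (the description above matches its `GradientBoostingRegressor`), the shrinkage factor ν and the stochastic subsample fraction map to the `learning_rate` and `subsample` parameters; data and parameter values below are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.random((150, 3))
y = 50 * X[:, 0] + 10 * X[:, 1]

# M = 200 stages; learning_rate is the shrinkage factor nu;
# subsample < 1 gives the stochastic variant (each tree is fitted
# on a random fraction of the training data, drawn without replacement).
gtb = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.1, subsample=0.8, random_state=0
)
gtb.fit(X, y)
score = gtb.score(X, y)   # R^2 on the training data
```

Smaller learning rates usually need more stages but generalize better, which is why ν and M are typically tuned jointly.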

Cross-Validation
A standard approach for evaluating the performance of ML models with hyperparameter tuning is to divide the dataset into three subsets: training, validation, and testing. The training set is used for the learning process, and the evaluation of the performance of the model is done on the validation set. After the best parameters are found, the final evaluation is performed on the test set, with samples that the model has never seen before. Note that the validation score is biased, and to obtain a proper estimate of the generalization, the score needs to be computed on the test set. However, dividing the data into three subsets reduces the number of samples that can be used for learning, which might result in an inadequately trained model. Cross-validation (CV) [71] is a widely used strategy for avoiding over-reduction of the training set, particularly for small datasets. A test set should still be held out for final evaluation, but the validation set is no longer needed. In this research, k-fold CV was used. This consists of splitting the training set into k smaller sets. Then, for each of the k folds, the model is trained using k − 1 of these folds and then validated on the remaining fold. The final performance measure is the average of the k performance values computed in this loop. Figure 5 (adapted from [74]) depicts the five-fold CV used in this research for training and also for the hyperparameter selection.
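The five-fold CV with hyperparameter selection described above can be sketched with scikit-learn's `GridSearchCV`; the grid values and synthetic data below are illustrative, not the grids actually tuned in this study.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.random((161, 3))   # same size as the article's training set
y = 50 * X[:, 0] + 10 * X[:, 1]

# Five-fold CV: each candidate is trained on 4 folds and validated on
# the remaining one; the reported score is the average over the 5 folds.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 4, 6, 8]},   # illustrative grid
    cv=cv,
    scoring="r2",
)
search.fit(X, y)
best_depth = search.best_params_["max_depth"]
```

After the search, the best estimator (refit on the whole training set by default) is evaluated once on the held-out test set, as in the workflow of Figure 1.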

Hyperparameter Tuning
The tuning of hyperparameters is essential to select the optimal values that improve the performance of the model. One method of automated hyperparameter selection is the grid search technique, which investigates all possible hyperparameter values in a pre-defined domain [75]. The hyperparameters are optimized by a cross-validated grid search over a parameter grid, considering all possible combinations, and the set of hyperparameters with the maximum score on the validation set is selected. At last, a final evaluation of the model is performed on the test set, allowing an understanding of how the model performs on unseen data.
Grid search is the proper way of choosing multiple hyperparameters of an estimator. However, grid search can come at the cost of long computing times when there are several parameters with dozens or even hundreds of values each. It is sometimes helpful to plot the influence of a single hyperparameter on the training and validation scores to find out whether the estimator is overfitting or underfitting for some hyperparameter values.

Model Performance and Uncertainty Metrics
Various statistical measures were used to evaluate the performance of the ML models. The mean absolute error (MAE) is the averaged absolute difference between the actual and the predicted values:

MAE = (1/n) ∑_{i=1}^{n} |y_i − ŷ_i|

where y_i and ŷ_i are the actual and the predicted value of the i-th sample, respectively, for a total of n samples. The root-mean-squared error (RMSE) is calculated as the square root of the average squared error:

RMSE = sqrt( (1/n) ∑_{i=1}^{n} (y_i − ŷ_i)² )

The coefficient of variation of the root-mean-squared error (cvRMSE), or scatter index (SI), is the ratio of the RMSE to the mean of the actual values, ȳ = (1/n) ∑_{i=1}^{n} y_i. It represents the percentage of RMSE with respect to the mean of the observations and gives the expected error:

cvRMSE = 100% × RMSE / ȳ

The coefficient of determination (R²) represents the proportion of variation in the dependent variable that is predicted by the independent variables in the model. It is an indicator of goodness of fit and a measure of how well the model predicts the outcome; when R² equals 1, the predicted and the true values are perfectly aligned:

R² = 1 − ∑_{i=1}^{n} (y_i − ŷ_i)² / ∑_{i=1}^{n} (y_i − ȳ)²

The mean absolute percentage error (MAPE) is an evaluation metric sensitive to relative errors; the smaller its value, the better:

MAPE = (100%/n) ∑_{i=1}^{n} |(y_i − ŷ_i)/y_i|

The root-relative-squared error (RRSE) is the square root of the sum of squared errors of a predictive model normalized by the sum of squared errors of a simple model that always predicts the mean; the lower its value, the better the model:

RRSE = sqrt( ∑_{i=1}^{n} (y_i − ŷ_i)² / ∑_{i=1}^{n} (y_i − ȳ)² )

Another performance metric used in this study is the variance accounted for (VAF), with a best possible score of 1:

VAF = 1 − Var(y − ŷ) / Var(y)
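These metrics can be written compactly with NumPy; the capacity values below are illustrative numbers, not results from the study:

```python
import numpy as np

def mae(y, yp):
    return np.mean(np.abs(y - yp))

def rmse(y, yp):
    return np.sqrt(np.mean((y - yp) ** 2))

def cv_rmse(y, yp):
    # RMSE as a percentage of the mean observed value
    return 100.0 * rmse(y, yp) / np.mean(y)

def r2(y, yp):
    return 1.0 - np.sum((y - yp) ** 2) / np.sum((y - np.mean(y)) ** 2)

def mape(y, yp):
    return 100.0 * np.mean(np.abs((y - yp) / y))

def rrse(y, yp):
    # errors normalized by those of the mean-only "simple" model
    return np.sqrt(np.sum((y - yp) ** 2) / np.sum((y - np.mean(y)) ** 2))

def vaf(y, yp):
    return 1.0 - np.var(y - yp) / np.var(y)

y_true = np.array([100.0, 120.0, 80.0, 150.0])
y_pred = np.array([ 98.0, 125.0, 78.0, 149.0])
print(round(r2(y_true, y_pred), 4))  # → 0.9873
```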
The model uncertainty is defined as the inability of the model to effectively express the torsional capacity. In this research, the predictive model uncertainty related to beam i is the ratio between the experimental and the predicted torsional capacity:

M_i = T_exp,i / T_pred,i

where M_i is the model uncertainty of the i-th beam sample, and T_exp,i and T_pred,i are its experimental and predicted torsional capacities. The mean and standard deviation of the model uncertainty are represented by µ_M and σ_M, respectively. It is desirable to obtain a model with a mean µ_M close to 1 and a standard deviation σ_M close to 0.
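A minimal sketch of this computation, with illustrative capacities (in kNm) that are not taken from the dataset:

```python
import numpy as np

# Model uncertainty M_i = T_exp,i / T_pred,i for each beam (illustrative values)
t_exp  = np.array([100.0, 120.0, 80.0, 150.0])   # experimental capacities, kNm
t_pred = np.array([ 98.0, 125.0, 78.0, 149.0])   # model predictions, kNm

M = t_exp / t_pred
mu_M, sigma_M = M.mean(), M.std()
print(round(mu_M, 4), round(sigma_M, 4))  # ideally mu_M ~ 1 and sigma_M ~ 0
```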

Programming Languages and Software
The Python programming language combined with the Scikit-learn library was used to build the models for the torsional capacity prediction. Scikit-learn [76] is an open-source ML library in Python containing many algorithms and methods, such as classification, clustering, and regression, in addition to being used in data processing and model evaluation.

Parameters Description of the Regression Functions
For the Decision Tree (DT), the module sklearn.tree provides the function DecisionTreeRegressor. This function supports four different criteria to measure the quality of a split via the parameter criterion: the MSE, the MSE with improvement score by Friedman, the MAE and the Poisson deviance. The size of the regression tree can be controlled by specifying the parameter max_depth, which limits the depth of the tree. The maximum number of features to consider when looking for the best split is given by the parameter max_features.
For the Bagging Meta-Estimator (BME), the module sklearn.ensemble provides the function BaggingRegressor, taking as input an estimator along with parameters specifying the strategy to draw random subsets. The base estimator used was a DT regressor. The parameter n_estimators is the number of base estimators in the ensemble. The parameters max_samples and max_features control the size of the subsets in terms of samples and features to train each base estimator, while bootstrap and bootstrap_features control whether samples and features are drawn with or without replacement.
For the Forests of Randomized Trees (FRT), the module sklearn.ensemble provides the function RandomForestRegressor. The parameter n_estimators is the number of trees in the forest. This function also supports the same four criteria to measure the quality of a split via the parameter criterion as the DecisionTreeRegressor. When building trees, the best node split can be found either from all input features or from a random subset of size max_features. Bootstrap samples are used by default, and the fraction of samples to train each base estimator is given by the parameter max_samples.
For the AdaBoost (AB), the module sklearn.ensemble provides the function AdaBoostRegressor, taking as input a user-specified estimator from which the ensemble is built, along with other parameters. The base estimator used was a DT regressor. The learning rate can be set via the parameter learning_rate, which controls the contribution of each tree in the final combination and strongly interacts with the parameter n_estimators. Empirical evidence suggests that small values of learning_rate favor better test error and lead to better model generalization [71]. Smaller values of learning_rate require larger numbers of weak learners to maintain a constant training error, which comes at the cost of greater computing time. It is therefore recommended to set the learning rate to a small constant and choose n_estimators by early stopping [71]. This function also supports three different loss functions for updating the weights after each boosting iteration via the parameter loss: the linear, the square and the exponential.
Finally, for the Gradient Tree Boosting (GTB), the module sklearn.ensemble provides the function GradientBoostingRegressor, which uses gradient boosted trees. It supports four different loss functions to be optimized via the parameter loss: the squared error, the absolute error, the Huber and the quantile losses. The Huber loss is a combination of the first two: it applies the squared error loss for small deviations from the actual value and the absolute error loss for large deviations, with a parameter alpha dictating the threshold between them. This function also supports two different criteria to measure the quality of a split via the parameter criterion: the MSE and the MSE with improvement score by Friedman. The size of the individual trees can be controlled by the parameter max_depth. The subsample parameter represents the fraction of samples to be used for fitting the individual base learners. The learning rate can be set via learning_rate, the number of estimators via n_estimators, and the number of features to consider when looking for the best split via max_features.
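The five regression functions above could be instantiated along the following lines. All parameter values here are illustrative placeholders, not the tuned hyperparameters from Table 4, and the default base estimator (a decision tree) is relied upon for the BME and AB ensembles:

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import (
    BaggingRegressor, RandomForestRegressor,
    AdaBoostRegressor, GradientBoostingRegressor,
)

dt = DecisionTreeRegressor(criterion="squared_error", max_depth=8)

# BME: the base estimator defaults to a decision tree; samples are drawn
# with replacement (bootstrap=True) and features without (bootstrap_features=False)
bme = BaggingRegressor(n_estimators=50, max_samples=0.8,
                       bootstrap=True, bootstrap_features=False)

# FRT: bootstrap samples by default; max_samples sets the fraction per tree
frt = RandomForestRegressor(n_estimators=50, max_samples=0.8)

# AB and GTB: a small learning rate paired with many estimators
ab = AdaBoostRegressor(n_estimators=500, learning_rate=0.01, loss="linear")
gtb = GradientBoostingRegressor(loss="squared_error", criterion="friedman_mse",
                                max_depth=8, n_estimators=500,
                                learning_rate=0.01, subsample=0.8)
```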

Hyperparameter Optimization
The hyperparameters of each model were optimized using a grid search process, and a five-fold cross-validation was performed on the training set. For all methods, an exhaustive search over specified parameter values was performed using the function GridSearchCV from the module sklearn.model_selection. For the BME model, the parameters that control whether samples and features are drawn with or without replacement were kept at their default settings (samples drawn with replacement and features drawn without replacement). Since averaging methods work best with strong and complex models, the trees for these methods were not given a maximum depth. For the AB and the GTB models, the learning rate was set to 0.01.
For each model, the tuned values for each hyperparameter are shown in Table 4. The coefficient of determination (R²) was used as the statistical measure to obtain hyperparameters with maximum accuracy while minimizing overfitting. Since there are only three inputs, the maximum number of features when splitting a node was kept at the default value for all the models, which means all of them are considered. The maximum depth for the DT, AB and GTB models was searched in the range between 3 and 15. The number of estimators was searched in the range between 10 and 100 for the BME and FRT models, and in the range between 100 and 1500 for the AB and GTB models, since for the latter a low value for the learning rate was chosen. The fractions of the maximum number of samples, the alpha and the subsample were searched in the range 0.1-1.0. A total of 161 data samples was used to train all the models, and 41 samples were used to test them. The five-fold cross-validation results and statistical breakdown are shown in Table 5, where the coefficient of variation (COV) based on the average R² and the standard deviation (STD) are presented for each model. There is no noticeable fluctuation in the results of the five folds, and the overall accuracy remains very good for all the models. In particular, the FRT model showed to be excellent, with the smallest COV value of 0.6857%. The validation curve of the coefficient of determination as a function of the maximum depth of the tree, for the chosen hyperparameters, is represented in Figure 6 for the DT, AB and GTB models. For this, validation_curve from the module sklearn.model_selection was used. As referred to earlier, for the BME and FRT models the trees were not given a maximum depth, since they are averaging methods. Figure 6 shows that, for all models, the R² quickly reaches a plateau, and the results stop significantly improving beyond a critical maximum depth. It can also be observed that the models are not overfitting, since the training and validation scores are both high. This way, the used values of max_depth were deemed satisfactory.
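The grid-search procedure described above can be sketched as follows. The data and the single-parameter grid are illustrative stand-ins (the study searched several hyperparameters jointly):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the 161 training samples with three inputs
rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(161, 3))
y = 15 * X[:, 0] + 6 * X[:, 1] + rng.normal(0, 0.3, size=161)

# Search max_depth over the range used in the study (3 to 15), scored by R^2
# with five-fold cross-validation on the training set
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": list(range(3, 16))},
    scoring="r2", cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
```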
A validation curve for the number of estimators was also calculated to check the performance of the models (except for the DT, since it only has one estimator). The validation curve of the R² as a function of the number of estimators, for the chosen hyperparameters, is represented in Figure 7. As can be observed, the R² quickly reaches a plateau, and the results stop significantly improving beyond a critical number of trees; larger values are better, but also take longer to compute. This way, the used values of n_estimators were deemed satisfactory. It can also be observed from the plot that the models are not overfitting, since the training and validation scores are both high.
A learning curve shows the validation and training scores of an estimator for several numbers of training samples. It is used to find how much the model benefits from adding more training data and whether the estimator suffers more from a variance or a bias error. The learning curves for all the models, with the respective chosen hyperparameters, are represented in Figure 8. The function learning_curve from the module sklearn.model_selection was used. Overall, for small amounts of data, the training score is much greater than the validation score, and adding more training samples increases generalization.
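A sketch of how curves like those in Figures 6-8 can be generated with scikit-learn, again on illustrative synthetic data:

```python
import numpy as np
from sklearn.model_selection import validation_curve, learning_curve
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(1)
X = rng.uniform(0, 1, size=(161, 3))
y = 15 * X[:, 0] + 6 * X[:, 1] + rng.normal(0, 0.3, size=161)

# Training and validation R^2 as a function of max_depth (cf. Figure 6)
train_sc, val_sc = validation_curve(
    DecisionTreeRegressor(random_state=1), X, y,
    param_name="max_depth", param_range=range(3, 16), scoring="r2", cv=5,
)

# Training and validation R^2 as a function of the training-set size (cf. Figure 8)
sizes, lc_train, lc_val = learning_curve(
    DecisionTreeRegressor(random_state=1), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), scoring="r2", cv=5,
)
print(train_sc.shape, val_sc.shape)  # one row per depth, one column per fold
```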

Performance on Testing Set
The prediction performance of the proposed methods can be tested after the best hyperparameters have been identified. The performance metrics of the models are reported in Table 6. The prediction findings for the testing set are shown in Figure 9, with the x and y axes representing the experimental and predicted torsional capacity, respectively.

Overall, the difference between the predicted and actual torsional capacity (MAE) is low, with an average between 6.9 and 10.7 kNm. The low values obtained for the RMSE, cvRMSE, MAPE and RRSE, and the high values obtained for the R² and VAF, are very acceptable and show that the used models based on Ensembles of Trees can be considered effective in estimating the torsional capacity of RC beams. From Table 6, the best models in terms of the cvRMSE are the BME, FRT and GTB models, with values of 11.41%, 10.83% and 10.04%, respectively.
Figure 10 presents the histograms of the predicted torsional capacity TR_Pred from all the models, compared with the actual torsional capacity TR_Exp. It also presents the corresponding normal distribution, for which the mean value and standard deviation are presented in Table 7. The normal distribution shows that the error is dispersed randomly.

Comparison with the Results from Previous Studies
When compared with the results from previous studies using ML models, referred to in Section 1 and related mainly to RC beams under pure torsion, the following can be stated.
The research article from Deifalla and Salem [44] was the only one in which the authors applied two models based on ET, namely a Boosted and a Bagged model, to predict the torsional gain capacity of externally bonded FRP-RC beams, and not the full torsional capacity of common RC beams. They used additional inputs incorporating data about the strengthening system and a dataset with 157 beams. Since the predicted variable was different, only the R² parameter is discussed here. With their models, the referred authors achieved R² values of 0.71 and 0.47 for the Boosted and Bagged models, respectively. These values are much lower than the ones achieved in this study. Reference [44] also used other models, namely four models based on Gaussian Process Regression and five models based on Neural Networks. For all these models, the R² ranged between 0.56 and 0.93, which still remains lower than the values from this study.
Henedy et al. [47] used the same database and the same inputs as in the present research, and applied an M5P model tree and a non-linear regression model to predict the torsional capacity of the RC beams. For the first one and for the testing set, they achieved the following performance metrics: MAE = 8.224, RMSE = 13.432 and R² = 0.981. When compared with the performance metrics in Table 6 (also for the testing set), it can be stated that the models used in this study achieved similar results, although three of the models (BME, FRT and GTB) showed to be slightly better.
Cevik et al. [42] used a Genetic-programming-based model to predict the torsional capacity of RC beams using a dataset with only 76 beams. The authors used 5 variables among 12 available as inputs. For the testing set, the following performance metrics were achieved: RMSE = 12.9 and R² = 0.95. Based on the results from Table 6 (also for the testing set), the same three previously referred models (BME, FRT and GTB) were shown to be slightly better in terms of RMSE. Furthermore, all models showed R² values higher than 0.95.
Finally, [40,41,43,48] used different artificial neural network (ANN) or convolutional neural network (CNN) models to predict the torsional capacity of RC beams, with a much higher number of inputs (11 or 12) and different datasets (with 76 beams for [41,43], 240 beams for [40] and 268 beams for [48]). Regarding the same performance metrics used in this study, for reference [40] and for four different ANN models (BP and GA-BP neural networks, optimized BP and GA-BP neural networks), the metrics ranged between the following values: MAE = 6.742 to 11.548, RMSE = 10.154 to 17.758 and R² = 0.846 to 0.950. For reference [41], with only one ANN model (hybrid neural network: GA-MLP), the performance metrics were as follows: MAE = 6.94 and R² = 0.980. For reference [43], where eleven ANN algorithms were used, the author achieved the following range of values: R² = 0.9496 to 0.9876. Finally, in [48], the authors proposed an improved bird swarm algorithm optimized 2D CNN, for which the following performance metric was achieved: MAE = 2.9875. When compared with the results from Table 6, it can be stated that the ET models in this study achieved very similar performance metrics to the ones from the ANN models, but with far fewer inputs.
Based on the previous comparisons, it can be stated that ET models, such as the ones used in this study, can be considered predictive models as good as the best models from the previously referred studies to predict the torsional capacity of RC beams.

Sensitivity Analysis
The importance of each input variable in the prediction of the torsional capacity of RC beams was determined by performing a sensitivity analysis. This analysis was carried out by removing each predictor from the database, one at a time, and training and testing the proposed models using the resulting dataset. The MAE, RMSE, cvRMSE and R² performance measures resulting from testing were used to analyze the importance of each input variable. These results are presented in Table 8, in which it can be noticed that the cross-section area A_c is the most important variable of the three inputs when predicting the torsional capacity of the RC beams. The concrete compressive strength f_c, in its turn, is the variable with the least impact on the performance of the models. These results confirm the ones from a previous study by the authors using different models to predict the torsional capacity of RC beams for the same dataset [47].
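The drop-one-predictor procedure can be sketched as follows, using synthetic stand-ins for the three inputs and a Forests of Randomized Trees model. This is a hypothetical illustration, not the study's exact pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(7)
# Illustrative stand-in for [A_c, f_c, reinforcement factor] and torsional capacity
X = rng.uniform(0, 1, size=(202, 3))
y = 30 * X[:, 0] + 3 * X[:, 1] + 15 * X[:, 2] + rng.normal(0, 0.5, size=202)
names = ["A_c", "f_c", "reinf_factor"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=41, random_state=7)

# Drop one predictor at a time, retrain, and compare the test-set R^2:
# the larger the drop in R^2, the more important the removed variable
for i, name in enumerate(names):
    cols = [j for j in range(3) if j != i]
    model = RandomForestRegressor(random_state=7).fit(X_tr[:, cols], y_tr)
    print(name, "removed -> R^2 =", round(model.score(X_te[:, cols], y_te), 3))
```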

Conclusions
An investigation on the use of ML methods based on Ensembles of Trees (ET) for the prediction of the torsional capacity of RC beams was conducted in this study. A database with 202 RC beam samples was created and randomly divided into training and testing sets. Using a five-fold cross-validation procedure paired with a grid search strategy, optimal hyperparameters for all the models were found based on the training dataset. The testing dataset was used to validate the performance of the built models. From the obtained results, the following main conclusions can be drawn:

• A simple Decision Tree (DT) regressor predicts the torsional capacity of RC beams with satisfactory accuracy. The model has an R² value of 0.973 and a cvRMSE value of 16.78% for the testing set;
• It was shown that ET reach higher accuracies than a simple DT. The Bagging Meta-Estimator (BME), the Forests of Randomized Trees (FRT), the AdaBoost (AB) and the Gradient Tree Boosting (GTB) reached values of R² in the range between 0.982 and 0.990, and values of cvRMSE in the range between 10.04% and 13.92%. These results indicate that the prediction capability of these four models can be trusted with high confidence, in particular for the GTB and FRT models;
• The cross-section area A_c was shown to be the most important variable of the three inputs when predicting the torsional capacity of the RC beams, while the concrete compressive strength f_c was shown to be the variable with the least impact;
• With low error measurements and mean values µ_M near unity, the results showed that these four ET methods (BME, FRT, AB and GTB) can be considered predictive models as good as the best models from the referred previous studies to predict the torsional capacity of RC beams.
With only 202 samples, the dataset used to develop the prediction models can be considered somewhat small, which represents a limitation of this investigation.The accuracy of the models can be improved in the future by using a larger dataset.
Finally, the Python scripts developed in this article can be made available upon request to the corresponding author, in particular to researchers and practitioners.

Figure 1. Flowchart with the research methodology.


Figure 2. Histograms of key parameters for the RC beams: (a) total reinforcement ratio, (b) concrete compressive strength, (c) yielding stress of longitudinal reinforcement, (d) yielding stress of transverse reinforcement.


Figure 4. Scatter plot between pairs of variables.


Figure 6. Validation curve for the maximum depth of the trees: (a) DT, (b) AB and (c) GTB.


Figure 7. Validation curve for the number of estimators: (a) BME, (b) FRT, (c) AB and (d) GTB.

Table 2. Statistical measures of the variables for the training set.

Table 3. Statistical measures of the variables for the testing set.

Table 4. Hyperparameters for each model.

Table 6. Performance metrics for the testing results.


Table 7. Mean and standard deviation for testing results.

Table 8. Effect of input variables on the performance of the ML models.

Table A1. Geometric and mechanical properties of the reference RC beams.