Accurate Prediction of Punching Shear Strength of Steel Fiber-Reinforced Concrete Slabs: A Machine Learning Approach with Data Augmentation and Explainability

Abstract: Reinforced concrete slabs are widely used in building structures due to their economic, durable, and aesthetic advantages. Their ultimate strength often hinges on punching shear strength. At present, methods such as closed hoops, steel bending, and fiber reinforcement are employed to enhance punching shear strength, with fiber reinforcement gaining popularity due to its ease of implementation and efficacy in improving concrete durability. This study employs six machine learning algorithms rooted in decision trees and decision tree-based ensemble learning to predict the punching shear strength of steel fiber-reinforced concrete slabs. To overcome experimental data limitations, a data augmentation approach based on the Gaussian mixture model is employed. The data augmentation is validated through "synthetic training-real testing" and "real training-real testing". Additionally, the best machine learning model is analyzed for explainability using SHapley Additive exPlanations (SHAP). Results demonstrate that the proposed data augmentation method effectively captures the original data distribution, enhancing the robustness and accuracy of the machine learning models. Moreover, SHAP provides better insights into the features influencing punching shear strength. Thus, the proposed data augmentation method offers a reliable approach for modeling small experimental datasets in structural engineering.


Introduction
Nowadays, reinforced concrete and its combined structures are widely used [1-3]. Reinforced concrete slabs are widely employed in construction for their superior structural strength and excellent durability [4,5]. Compared to traditional construction materials and methods, reinforced concrete flat slabs accelerate construction timelines, diminish uncertainties, and lower risks during the construction process [6,7]. In addition, these slabs offer greater flexibility in building design, empowering designers to fulfill diverse innovative and functional requirements [8,9]. Studies indicate that the ultimate strength of reinforced concrete flat slabs typically hinges on the punching shear strength at the slab-column joint [10]. Following punching, the residual strength of the slab decreases significantly compared to the punching load, potentially leading to progressive building collapse if one column shears, causing adjacent columns to rapidly overload and fail in punching shear [11]. Currently, several methods exist to enhance shear resistance, including closed hoops, steel bending, shear studs, or post-shear reinforcement. Recently, researchers have explored fiber-reinforced concrete (FRC) as a means of increasing punching shear resistance. Numerous studies affirm that FRC slabs exhibit improved strength and ductility in punching shear. Among various fibers, steel fibers stand out for their widespread application in reinforced concrete slabs, owing to their superior strength, toughness, and punching shear resistance [12,13].
Current codes for slab-column connections, such as ACI 318-11, JSCE, and fib Model Code 2010, were developed for plain concrete structures [14-17]. However, the cracking behavior and punching shear strength of steel fiber-reinforced concrete (SFRC) structures diverge significantly from those of conventional ones. Therefore, there is an urgent need for a punching shear strength prediction model tailored to SFRC structures. Narayanan and Darwish proposed a design equation that evaluates punching shear strength from the strength of the compression zone above the inclined crack, the pull-out shear on the fibers along the inclined crack, and the shear forces carried by dowel and membrane action [18]. Harajli et al. introduced a best-fit linear regression model for SFRC slab-column connections, incorporating empirical design equations for punching shear strength based on the coupled contributions of concrete and fibers [19]. Choi et al. conducted a theoretical study and proposed a design equation based on the FRC failure criteria for thin slabs with large spans and thicknesses. The equation considers the contribution of the compressive and tensile zones in the critical section and assumes that the punching shear strength of these two zones is controlled by tensile cracking rather than compressive crushing [20]. Higashiyama et al. proposed a design equation based on the JSCE to evaluate the punching shear strength of plain concrete slab-column connections, considering factors such as fiber pull-out strength and a critical section perimeter based on fiber properties [21]. In addition, Maya et al. proposed a design equation for SFRC punching shear strength based on the critical shear crack theory and verified, through experimental data analysis, its superiority over the existing models of Narayanan and Darwish, Harajli et al., and Higashiyama et al. [22]. While these models advanced the study of SFRC punching shear strength, some issues persist. For example, the empirical models by Narayanan and Darwish, Harajli et al., and Higashiyama et al. are not consistent with the methodology adopted in the current codes, and the model of Maya et al. risks overestimating SFRC punching shear capacity, leaving room for improved accuracy. Recently, Hoang developed a shear capacity prediction model using multiple linear regression and artificial neural networks based on experimental data [23], showcasing the potential of machine learning in SFRC shear strength prediction. However, the model's generalization and explainability warrant improvement. Given these challenges, there is a crucial need for a highly accurate, generalizable, and explainable model for SFRC punching shear strength assessment.
In recent years, ensemble learning combined with SHapley Additive exPlanations (SHAP) has been widely used in structural engineering due to its high accuracy and explainability. Wang et al. used four standalone learning models and two ensemble learning models to predict the bond strength between steel sections and concrete. The results show that the ensemble learning models perform much better than the standalone models [24]. Cakiroglu et al. used Extreme Gradient Boosting, Light Gradient Boosting Machine, Random Forest, and Categorical Boosting to predict the splitting tensile strength of concrete reinforced with basalt fibers [25]. Feng et al. predicted the creep behavior of recycled aggregate concrete using ensemble learning combined with SHAP and performed feature importance analysis [26]. Nguyen et al. predicted the compressive strength of cement-based mortar containing metakaolin using Categorical Gradient Boosting and investigated the features using SHAP [27].
The above studies demonstrate the power of ensemble learning and SHAP in structural engineering. The aim of this study is to develop a model that accurately predicts the punching shear strength of SFRC slabs while ensuring generalizability and explainability. To achieve this objective, data are sourced from the published literature and augmented using the Gaussian mixture model (GMM). Subsequently, SFRC punching shear strength prediction models are developed employing six machine learning algorithms rooted in decision trees and decision tree-based ensemble learning. The efficacy of the augmented data in enhancing the robustness and accuracy of the models is evaluated through the "synthetic training-real testing" and "real training-real testing" methodologies. Finally, the SHAP technique is employed to examine the explainability of the top-performing ensemble learning algorithm. This research not only aims to deliver precise predictions of SFRC punching shear strength but also underscores the potential of data augmentation techniques, particularly GMM, in machine learning modeling using small experimental datasets in structural engineering.

Workflow
The workflow for this study, as depicted in Figure 1, consists of the following four main components:
Data collection: A total of 140 instances were gathered, each comprising the following features: slab depth (h), effective depth of the slab (d), length or radius of the loading pad or column (b_c), concrete strength (f'_c), reinforcement ratio (ρ), fiber volume (ρ_f), and punching shear strength (V).
Data augmentation: The GMM is used to generate 500 sets of synthetic data. The distribution of the generated data is compared with that of the original data using probability density curves to verify that it captures the original distribution.
Model development and evaluation: Six machine learning algorithms are employed to develop models for the punching shear strength of steel fiber-reinforced concrete slabs. The models are evaluated using metrics such as goodness of fit.
Model explainability: SHapley Additive exPlanations (SHAP) is employed to provide global and local explanations.

Gaussian Mixture Model
The Gaussian mixture model (GMM) assumes that the data are produced by several multivariate normal distributions (generators), each of which generates a data point with a certain probability (its weight), and that these weights sum to 1. Based on this data generation process and the observed sample set, a likelihood equation can be formulated. The unknown parameters in this equation are the mean vector and covariance matrix of each multivariate normal distribution and the weight associated with each generator. Once the model parameters have been estimated, it becomes possible to infer from which multivariate normal distribution each sample was most likely generated [28].
Let the GMM contain M multivariate normal generators. The probability that this GMM generates a sample x is:
$$p(x \mid \theta) = \sum_{m=1}^{M} \alpha_m \, \phi(x \mid \theta_m)$$
where α_m is the probability that the mth multivariate normal distribution generates a sample, and φ(x|θ_m) is the probability density function of the mth multivariate normal distribution with θ_m = (µ_m, Σ_m), where µ_m denotes the mean vector and Σ_m the covariance matrix of the mth multivariate normal distribution.
To determine from which multivariate normal distribution a given sample originates, the parameters of the GMM must be estimated from the dataset so that the model fits the training set as closely as possible. The GMM is inherently a probabilistic model, and its parameters are typically obtained by maximizing the likelihood function. For a data set with m samples (here m denotes the number of samples, not the component index used above), the likelihood function of a Gaussian mixture model is:
$$L(\theta) = \prod_{i=1}^{m} p(x_i \mid \theta)$$
where x_1, ..., x_m are the m data points in the sample, p(x_i|θ) is the probability that the model generates a given sample, and θ denotes all the parameters of the model. Because the likelihood equation is complex, solving directly for the optimal parameters is difficult, and the problem is typically addressed with the expectation-maximization method.
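The fitting and sampling steps described above can be illustrated with a minimal sketch. It is deliberately univariate and written in plain Python for brevity, whereas the study uses a multivariate GMM; the initialization scheme, the number of components, and the toy data are assumptions made for illustration only. In practice a multivariate implementation such as scikit-learn's `GaussianMixture` would be a natural choice, although the source does not state which implementation was used.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Log of the GMM likelihood: sum_i log sum_m alpha_m * phi(x_i | theta_m)."""
    return sum(
        math.log(sum(w * normal_pdf(x, mu, var)
                     for w, mu, var in zip(weights, means, variances)))
        for x in data
    )


def fit_gmm_1d(data, n_components=2, n_iter=50, min_var=1e-6):
    """Fit a univariate GMM with expectation-maximization (EM)."""
    ordered = sorted(data)
    n = len(ordered)
    # Assumed initialization: spread the initial means over the data quantiles.
    means = [ordered[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, min_var)] * n_components
    weights = [1.0 / n_components] * n_components
    history = []  # log-likelihood after each EM iteration
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, mu, var)
                     for w, mu, var in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and variances.
        for k in range(n_components):
            r_k = sum(r[k] for r in resp)
            weights[k] = r_k / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / r_k
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / r_k
            variances[k] = max(spread, min_var)
        history.append(log_likelihood(data, weights, means, variances))
    return weights, means, variances, history


def sample_gmm_1d(weights, means, variances, n_samples, rng):
    """Draw samples: pick a component by its weight, then sample from its normal."""
    samples = []
    for _ in range(n_samples):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        samples.append(rng.gauss(means[k], math.sqrt(variances[k])))
    return samples


rng = random.Random(0)
# Toy stand-in for one measured feature (NOT the paper's data): two clusters.
real = ([rng.gauss(100.0, 5.0) for _ in range(70)]
        + [rng.gauss(160.0, 8.0) for _ in range(70)])
weights, means, variances, history = fit_gmm_1d(real)
synthetic = sample_gmm_1d(weights, means, variances, 500, rng)
```

Each EM iteration should not decrease the log-likelihood stored in `history`, which gives a simple convergence check before the fitted mixture is used to generate synthetic samples.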

Ensemble Learning
The study employed ensemble learning techniques, using decision trees (DTs) as the base model. In this field, two main approaches are prominent: bagging and boosting. Bagging, short for Bootstrap Aggregating, entails training multiple DTs on different subsets of the training data obtained by bootstrap sampling, in which some instances may be selected multiple times while others may not be chosen at all. The predictions from the individual models are then combined, typically through averaging for regression tasks or voting for classification tasks, to reduce variance and mitigate overfitting, which is particularly beneficial for complex models like decision trees. Boosting, on the other hand, trains decision trees sequentially, with each subsequent model focusing on correcting the errors made by its predecessors. Initially, each data instance is assigned equal weight, but misclassified instances receive higher weights in subsequent iterations, allowing later models to prioritize them. By iteratively refining the model's fit to the data, boosting aims to reduce bias and improve overall predictive performance [29]. In this study, DT and DT-based ensemble learning methods are used, including Random Forest (RF) from bagging, and GBDT, XGBoost, LightGBM, and CatBoost from boosting [27,29-33]. These methods have been adopted for solving complex civil engineering problems [34,35].
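The difference between the two strategies can be made concrete with a small sketch. It uses depth-1 regression trees (stumps) on a single feature, written in plain Python; the data, the number of trees, and the learning rate are illustrative assumptions. Note that the boosting variant shown here fits each new stump to the current residuals, as in GBDT-style methods, rather than re-weighting instances as in the description above.

```python
import random


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one threshold, one mean per side."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x identical: predict the mean everywhere
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagging(xs, ys, n_trees, rng):
    """Bagging: each stump sees a bootstrap resample; predictions are averaged."""
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagging(stumps, x):
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)


def fit_boosting(xs, ys, n_trees, learning_rate=0.5):
    """Boosting for squared error: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_boosting(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Toy regression problem (not the paper's data): y = x^2 on [0, 5).
xs = [i / 10.0 for i in range(50)]
ys = [x ** 2 for x in xs]


def mse(predict):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


single = fit_stump(xs, ys)
bagged = fit_bagging(xs, ys, n_trees=25, rng=random.Random(0))
boosted = fit_boosting(xs, ys, n_trees=50)
mse_single = mse(lambda x: predict_stump(single, x))
mse_bagged = mse(lambda x: predict_bagging(bagged, x))
mse_boosted = mse(lambda x: predict_boosting(boosted, x))
```

On this toy problem the boosted ensemble reaches a lower training error than a single stump because each round removes part of the remaining residual, while bagging mainly smooths the single-stump prediction by averaging over resamples.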

SHAP
SHAP (SHapley Additive exPlanations) is one of the most popular model-agnostic methods available for enhancing the explainability of machine learning models [36]. Grounded in cooperative game theory, SHAP assigns feature importance using Shapley values. The Shapley value of feature j, φ_j(val), is computed as the weighted sum of its marginal contributions across all possible feature subsets, as shown in the equation below:
$$\phi_j(val) = \sum_{S \subseteq \{1,\dots,p\} \setminus \{j\}} \frac{|S|!\,(p-|S|-1)!}{p!} \left( val_x(S \cup \{j\}) - val_x(S) \right)$$
where S is a feature subset, x is the feature vector, and p is the number of features. val_x(S) represents the prediction for the feature values in set S, marginalized over the features not included in set S:
$$val_x(S) = \int \hat{f}(x_1,\dots,x_p)\, d\mathbb{P}_{x \notin S} - E_X\left(\hat{f}(X)\right)$$
where f̂ denotes the prediction model. Averaging the absolute Shapley values across the n instances, as in the equation below, yields a more dependable measure of feature importance (I_j):
$$I_j = \frac{1}{n} \sum_{i=1}^{n} \left| \phi_j^{(i)} \right|$$
This approach offers a thorough assessment of each feature's impact on the model's predictions, with higher mean absolute Shapley values marking features that are more influential in the prediction process.
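As a rough illustration of the Shapley value definition, the sketch below enumerates every feature subset for a small model and approximates the marginalization over absent features by averaging over a background sample. The model `toy_model`, the background rows, and the explained instance are hypothetical; exhaustive enumeration costs on the order of 2^p model evaluations and is only practical for a handful of features, which is why SHAP libraries rely on faster tree-specific or sampling-based algorithms.

```python
from itertools import combinations
from math import factorial


def value(model, x, subset, background):
    """val_x(S): mean prediction with features in S fixed to x and the other
    features taken from background rows, minus the mean background prediction."""
    mixed = [
        model([x[j] if j in subset else row[j] for j in range(len(x))])
        for row in background
    ]
    baseline = [model(row) for row in background]
    return sum(mixed) / len(mixed) - sum(baseline) / len(baseline)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every feature subset."""
    p = len(x)
    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        total = 0.0
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(model, x, s | {j}, background)
                                   - value(model, x, s, background))
        phi.append(total)
    return phi


def mean_abs_importance(model, instances, background):
    """I_j: mean absolute Shapley value of feature j over the given instances."""
    all_phi = [shapley_values(model, x, background) for x in instances]
    return [sum(abs(phi[j]) for phi in all_phi) / len(all_phi)
            for j in range(len(instances[0]))]


# Hypothetical toy model with three features (not the paper's XGBoost model).
def toy_model(features):
    a, b, c = features
    return 2.0 * a + b * c


background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0], [3.0, 1.0, 1.0]]
x = [2.0, 3.0, 1.0]
phi = shapley_values(toy_model, x, background)
mean_prediction = sum(toy_model(row) for row in background) / len(background)
importance = mean_abs_importance(toy_model, background, background)
```

A useful sanity check is the efficiency property: the Shapley values of an instance sum to the difference between its prediction and the mean background prediction, which is the additive decomposition that SHAP plots visualize.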

Parameter Selection and Database Construction

Data Collection and Analysis
A total of 140 sets of experimental data were collected from seven studies [37-43], encompassing the following features: slab depth (h), effective depth of the slab (d), length or radius of the loading pad or column (b_c), concrete strength (f'_c), reinforcement ratio (ρ), fiber volume (ρ_f), and punching shear strength (V). The punching shear strength (V) is designated as the target feature, while the other features serve as input features. The distribution of each input feature is shown in Table 1, and the correlation coefficients between the parameters are shown in Figure 2. The Pearson correlation coefficient measures the degree of linear correlation between continuous variables, and the Spearman correlation coefficient measures the degree of monotonic correlation between two variables. Figure 2 indicates that the correlation between the input features and the target feature is generally weak, with the exception of h and d, which exhibit relatively strong correlations with the target variable. Despite their strong correlations, h and d are retained as significant parameters influencing V.
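The distinction between the two coefficients can be seen in a short sketch: Pearson is computed on the raw values, while Spearman is the Pearson coefficient of the ranks. The numbers below are hypothetical and are not taken from the collected database.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical values: the response grows monotonically but nonlinearly.
depth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
strength = [d ** 3 for d in depth]
r_pearson = pearson(depth, strength)
r_spearman = spearman(depth, strength)
```

For a monotonic but nonlinear relationship such as this one, the Spearman coefficient equals 1 while the Pearson coefficient stays below 1, which is why reporting both gives a fuller picture of feature-target dependence.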

Data Augmentation
The distribution of each parameter before and after augmentation is shown in Figure 3. The statistical characteristics of the generated data are shown in Table 2. It is evident from Figure 3 that the GMM has learned the distribution of the original parameters well, with the distribution of the augmented data closely resembling that of the original data.

Model Construction
Both the original data and the augmented data were used for modeling, following two distinct approaches: M1 and M2. In M1 ("real training-real testing"), 80% of the original real values were used for training and 20% were allocated for testing. In M2 ("synthetic training-real testing"), the generated values were used for training and real values for testing. The six models rooted in decision trees and decision tree-based ensemble learning introduced in Section 3.2 were trained, with the optimal hyperparameters of each algorithm determined through grid search with five-fold cross-validation.
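The two protocols and the cross-validation folds can be sketched as follows. The placeholder rows, the random seed, and in particular the choice to score M2 on the same held-out real rows as M1 are assumptions, since the text does not state which real values form the M2 test set. A grid search would loop over candidate hyperparameter combinations and average a validation score over these folds.

```python
import random


def split_m1(real_rows, test_fraction, rng):
    """M1 ("real training-real testing"): shuffle real rows, hold out a fraction."""
    rows = list(real_rows)
    rng.shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (train, test)


def split_m2(synthetic_rows, real_test_rows):
    """M2 ("synthetic training-real testing"): train on generated rows only."""
    return list(synthetic_rows), list(real_test_rows)


def k_fold_indices(n_rows, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, val_idx
        start += size


rng = random.Random(42)
real = [("real", i) for i in range(140)]            # placeholder rows
synthetic = [("synthetic", i) for i in range(500)]  # placeholder rows

m1_train, m1_test = split_m1(real, 0.2, rng)
# Assumption: M2 is scored on the same held-out real rows as M1.
m2_train, m2_test = split_m2(synthetic, m1_test)
folds = list(k_fold_indices(len(m1_train), 5))
```

Keeping the test rows strictly real in both protocols is what allows the M1 and M2 scores to be compared: any gain under M2 then reflects the training data rather than an easier test set.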

Data Augmentation Validation
The training and test performance of all the models under M1 and M2 are depicted in Figures 4 and 5, respectively. Figure 4 shows a large gap between the performance of the models on the training set and on the test set. Conversely, Figure 5 shows a significant improvement in the R² of each machine learning model on the test set under M2. Figure 6 presents the distribution of the deviations of each algorithm under M1 and M2, offering insights into their robustness. In general, a deviation distribution that is centered at 0 and approximately normal indicates a robust model. Figure 6A shows that under M1, the DT, GBDT, and XGBoost models are robust on both the training and test sets, with a more uniform deviation distribution. RF, LightGBM, and CatBoost show better robustness on the training set, but their deviation distributions are less stable on the test set, indicating poorer robustness. Conversely, Figure 6B shows that under M2 the six machine learning models are robust on both the training and test sets, with deviation distributions approximating normality. In addition, DT and GBDT outperform the other models significantly on the test set. This figure underscores the improvement in model robustness with data augmentation (M2).
To further evaluate the model accuracy, Figure 7 examines the models under M1 and M2 using the standard deviation and the coefficient of variation. In Figure 7a, for the training set, the standard deviation of each model under M2 is lower than under M1, except for DT. Similarly, for the test set, the standard deviation of all machine learning models under M2 is lower than under M1. In Figure 7b, for the training set, the coefficients of variation of all machine learning models under M2 are lower than under M1, except for LightGBM. In addition, for the test set, the coefficients of variation of all models under M2 are lower than under M1. In conclusion, the data augmentation method proposed in this work enhances the robustness and accuracy of the machine learning models.

Model Performance Evaluation
The robustness and accuracy of the data-augmented models were verified in Section 5.2. The performance of the machine learning models is further evaluated in this section to identify the most suitable algorithm for this research. The performance of each machine learning model is assessed using the standard deviation (SD), root mean square deviation (RMSD), and goodness of fit (R²), visualized through a Taylor diagram, as seen in Figure 8. The radial axis indicates the standard deviation of the model. The angle indicates the correlation or agreement between the model predictions and the observations. A smaller angle means that the model predictions are closer to the observations. In addition, a bluer color indicates a smaller root mean square deviation.
As depicted in Figure 8a, for the training set, XGBoost, GBDT, and CatBoost perform significantly better than LightGBM, RF, and DT, with XGBoost having the smallest SD and RMSD and the largest R². In Figure 8b, on the test set, LightGBM demonstrates the best performance, followed by CatBoost and XGBoost, while DT performs the worst. Considering the performance of the models on both the training and test sets, XGBoost emerges as the most suitable model for this study.
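The quantities summarized by a Taylor diagram can be computed as in the sketch below. It assumes that the RMSD is the centered root mean square difference conventionally used in Taylor diagrams, which the text does not state explicitly, and the strength values are hypothetical rather than taken from the study.

```python
import math


def taylor_statistics(observed, predicted):
    """Standard deviations, correlation, centered RMSD and R^2 of a prediction."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    covariance = sum((o - mean_o) * (p - mean_p)
                     for o, p in zip(observed, predicted)) / n
    correlation = covariance / (sd_o * sd_p)
    # Centered RMSD: both series have their own mean removed first.
    centered_rmsd = math.sqrt(
        sum(((p - mean_p) - (o - mean_o)) ** 2
            for o, p in zip(observed, predicted)) / n
    )
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return {
        "sd_observed": sd_o,
        "sd_predicted": sd_p,
        "correlation": correlation,
        "centered_rmsd": centered_rmsd,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical punching shear strengths in kN (not the paper's data).
observed = [210.0, 305.0, 402.0, 150.0, 275.0]
predicted = [220.0, 290.0, 399.0, 160.0, 280.0]
stats = taylor_statistics(observed, predicted)

# Law-of-cosines relation that lets one diagram show all three statistics.
reconstructed = (stats["sd_observed"] ** 2 + stats["sd_predicted"] ** 2
                 - 2.0 * stats["sd_observed"] * stats["sd_predicted"]
                 * stats["correlation"])
```

The squared centered RMSD equals `reconstructed`, so the standard deviation (radius), the correlation (angle), and the RMSD (distance to the reference point) can be read off a single plot.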

Model Explainability
As observed in Section 5.3, XGBoost demonstrates the most balanced performance on both the training and test sets. Therefore, SHAP is employed to explain the XGBoost model. The SHAP plot in Figure 9 combines feature importance with feature effects for each instance. Each point on the plot represents the SHAP value of one feature for one instance. The y-axis denotes the feature, while the x-axis represents the SHAP value. The color of the points corresponds to the feature value, ranging from low to high. As observed in Figure 9, the order of importance of the features for the punching shear strength of steel fiber-reinforced concrete slabs is as follows: h, d, b_c, f'_c, ρ_f, and ρ. Additionally, for all features except d and b_c, higher feature values result in positive SHAP values, indicating a positive impact on the predicted punching shear strength of steel fiber-reinforced concrete slabs. Conversely, lower values of these features adversely affect the prediction.
The global interpretation in Figure 9 reveals the overall impact of the features on punching shear strength, yet individual feature effects can vary across samples. For instance, for the fifth specimen, which has a measured punching shear strength of 402 kN, the SHAP waterfall plot in Figure 10 shows that the SHAP values of concrete strength (f'_c), slab depth (h), effective depth of the slab (d), fiber volume (ρ_f), and reinforcement ratio (ρ) are all positive (shown in red), indicating that they have a positive effect on the punching shear strength. Among them, the SHAP value of concrete strength is the largest, indicating that it has the greatest effect for the fifth specimen, while the length or radius of the loading pad or column (b_c) has a negative impact (shown in blue). Notably, the minimal value of b_c for this specimen, as seen in Figure 10, correlates with more negative SHAP values, indicating its adverse effect on the prediction. Furthermore, the XGBoost model's prediction of 399.412 kN for the fifth specimen aligns closely with its true value of 402 kN, showcasing high prediction accuracy.

Conclusions
This study introduces a data augmentation method employing the Gaussian mixture model to expand small experimental datasets, with the goal of enhancing the performance of machine learning models. Subsequently, SFRC punching shear strength prediction models are developed using six algorithms rooted in decision trees and decision tree-based ensemble learning. The SHAP technique is then applied to elucidate the significance and dependencies of the features within the best-performing model. The following conclusions were reached:
(1) The adopted Gaussian mixture model effectively captures the distribution of features in the dataset, with the probability density function curves of the generated data closely aligning with those of the original data.
(2) Under the "synthetic training-real testing" condition, the machine learning models demonstrate significantly enhanced accuracy and robustness compared to the "real training-real testing" scenario. Notably, XGBoost exhibits the

Figure 1. Workflow of the study.

Figure 3. Data distribution before and after augmentation.

Buildings 2024, 14, x FOR PEER REVIEW

Figure 6. Distribution of deviations in machine learning models. Panel (A): distribution of deviations under M1.

Figure 7. Standard deviation and coefficient of variation of the model.

Figure 8. Performance of each machine learning model.

Table 1. Statistical distribution of parameters.

Table 2. Statistical distribution of the augmented parameters.