Interpretable Machine Learning Reveals Synergy-Gain Windows and Dual-Objective Mix-Proportion Boundaries for Compressive Strength and Peak Strain in Hybrid Steel–PVA Fiber-Reinforced Concrete

Liu, Maojun; Chen, Junwen; Zhou, Shengkai

doi:10.3390/buildings16101927


by Maojun Liu *, Junwen Chen and Shengkai Zhou

College of Civil Engineering and Geomatics, Nanning Campus, Guilin University of Technology, Nanning 530001, China

* Author to whom correspondence should be addressed.

Buildings 2026, 16(10), 1927; https://doi.org/10.3390/buildings16101927

Submission received: 10 April 2026 / Revised: 5 May 2026 / Accepted: 9 May 2026 / Published: 12 May 2026

(This article belongs to the Section Building Structures)


Abstract

Hybrid steel–PVA fiber-reinforced concrete offers promise for enhancing both load-bearing capacity and deformation capacity. However, the coupled effects of fiber parameters and volume-fraction combinations on compressive strength (σc) and peak strain (εc) are still not fully understood, and a unified, interpretable, and engineering-oriented quantitative framework is lacking. This study compiled experimental data from 26 published sources, building a multi-source database of 397 samples for σc and 203 samples for εc. Based on this database, a comprehensive analytical framework was proposed, including model prediction, SHAP-based interpretation, Monte Carlo marginalization, synergy-gain window determination, and dual-objective mix-proportion optimization. For σc prediction, LightGBM achieved the highest test-set R² (0.9783), whereas CatBoost showed more robust error control (MAE = 2.7409 MPa). CatBoost was therefore selected as the base model for the subsequent interpretation analysis. For εc prediction, Bayesian-optimized CatBoost achieved the best test performance (R² = 0.9659, MAE = 0.0218, RMSE = 0.0358), while the transfer-learning model reached a comparable accuracy level (R² = 0.9650). SHAP analysis revealed that σc is mainly governed by matrix mix-proportion factors and steel fiber volume fraction, whereas εc is more sensitive to S/B and PVA-related variables. The mean synergy-gain maps generated via Monte Carlo marginalization and two-dimensional grid evaluation further showed clear differences between the two targets. Positive synergy in σc was highly localized. Its maximum mean synergy gain was 4.7949 MPa at (Steel, PVA) = (1.875%, 2.000%). By contrast, εc exhibited a wider positive-synergy region, with a peak value of 0.0141629 at (0.38%, 1.62%). Therefore, the engineering output of this study is not a single optimal mix point. Instead, it is a set of candidate windows for different performance targets, together with boundary-risk identification and priorities for experimental validation.

Keywords:

hybrid steel–PVA fiber-reinforced concrete; compressive strength; peak strain; interpretable machine learning; dual-objective mix proportion

1. Introduction

Steel and PVA fibers play complementary roles in toughening concrete. Steel fibers mainly improve post-cracking load resistance and energy absorption, whereas PVA fibers are more effective in controlling microcracks and enhancing ductility. When the two fibers are properly combined in terms of properties and dosage, they may work together to improve both strength and deformation capacity. However, hybrid fiber systems involve the interaction of many factors, such as matrix composition, fiber geometry, mechanical properties, and fiber content. Because of this complexity, conventional experiments alone are often not sufficient to reveal the underlying interaction patterns in a systematic way.

In recent years, experimental studies on the mechanical performance of hybrid steel–PVA fiber-reinforced concrete have steadily increased. Zhou et al. [1] systematically investigated its uniaxial compressive constitutive behavior through an orthogonal experimental design and confirmed that the combined use of steel and PVA fibers can significantly improve failure behavior and energy dissipation. Liu et al. [2] reported that the hybrid-fiber effect is jointly governed by matrix mix proportion and fiber volume fraction. Abbas et al. [3] developed a compressive stress–strain constitutive model for hybrid steel–PVA fiber-reinforced concrete and quantified the effects of fiber parameters on peak stress and peak strain. Wu et al. [4] further demonstrated, from the perspective of flexural behavior, the synergistic advantages of hybrid fibers in post-cracking toughness and deformation capacity. These studies provide an important basis for understanding the reinforcing mechanisms of hybrid fibers. However, because of the limited experimental scope and sample size, it is still difficult to fully capture the nonlinear interactions arising from multi-factor coupling.

With the wider use of data-driven methods in building materials research, machine learning has become a common tool for predicting concrete performance and exploring hidden patterns in the data. Kang et al. [5] showed that tree-based models can capture the nonlinear behavior of fiber-reinforced concrete with good accuracy. Al-Shamasneh et al. [6] reported that ensemble learning performs robustly in predicting the compressive strength of steel fiber-reinforced concrete. Sofos et al. [7] and Cui et al. [8] further demonstrated that machine learning is applicable to complex material–structure problems involving FRP-confined concrete and related members.

Although machine learning has shown strong predictive ability in concrete research, most existing studies still treat the model mainly as a black-box predictor. Much less attention has been given to questions that matter more in engineering practice, such as whether synergistic enhancement really exists, in which volume-fraction ranges it occurs, and whether the observed pattern is supported by sufficient data.

To address this limitation, machine learning-driven multi-objective optimization has emerged as a key approach to overcoming the dimensional bottlenecks of experimental research. Zhang et al. [9,10,11] applied Pareto fronts and metaheuristic algorithms to examine trade-offs among strength, economic performance, and other indicators in normal, silica-fume, and recycled-aggregate concrete. However, these studies have yet to address the synergistic optimization of key mechanical properties in hybrid steel–PVA fiber-reinforced concrete.

For engineering design, it is essential to clarify how fiber reinforcement works. In this study, synergy gain is defined as the super-additive effect of hybrid steel–PVA fibers relative to single-fiber reinforcement. Thus, the key question is not only whether the prediction is accurate. It also includes whether synergistic fiber enhancement exists, in which volume-fraction ranges it appears, and under what data-support conditions it can be interpreted with reasonable confidence.

Therefore, this study does not treat machine learning simply as a black-box predictor. Instead, it combines machine learning with interpretable analysis, marginal-response modeling, and synergy-gain quantification to build a knowledge-extraction framework for engineering-oriented screening. Using compressive strength (σc) and peak strain (εc) as two target indicators of load-bearing capacity and deformation capacity, respectively, this study further proposes an overlay strategy for dual-objective synergy windows. This strategy provides support for preliminary mix screening and for setting priorities in experimental validation.

The main contributions of this study are as follows: (1) an interpretable machine learning framework was established for the dual objectives of σc and εc; (2) the mean synergy-gain surface of steel and PVA fibers was defined and quantified based on Monte Carlo marginalization; and (3) the synergy boundaries of σc and εc were identified, and a dual-objective mix-proportion screening logic was constructed.

Compared with most existing studies, which focus mainly on accuracy comparison, single-indicator interpretation, or single-performance prediction, the main novelty of this study lies in the identification of dual-objective synergy windows and in the design-oriented conclusions provided for material selection and mix-proportion optimization in engineering practice.

2. Materials and Methods

2.1. Dataset Construction and Definition of the Feature System

2.1.1. Data Sources and Sample Composition

The database used in this study was compiled from uniaxial compression test data on hybrid steel–PVA fiber-reinforced concrete collected from published literature and academic theses [1,3,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]. After data cleaning, data standardization, and specimen-size normalization, the compressive-strength dataset (σc) contained 397 samples, whereas the peak-strain dataset (εc) contained 203 samples. The two datasets cover a range of conditions, including plain matrix mixtures, single-fiber mixtures, and hybrid steel–PVA fiber mixtures. This broad parameter coverage provides a sound basis for the subsequent modeling of nonlinear multi-factor relationships.

Table 1 summarizes the data sources and sample-count distribution of the core references for σc, so as to illustrate the source composition and sample coverage of the database. The εc samples were mainly collected from 12 studies, including Refs. [1,3,12,13,14,15,16,17,18,19,20,21], and are therefore not listed separately here.

2.1.2. Feature Classification, Coding, and Target Variables

The input variables included four categories: matrix mix-proportion parameters, mineral admixture and chemical admixture indicators, steel-fiber parameters, and PVA-fiber parameters. In addition to continuous variables, several binary indicator variables were introduced to distinguish the presence or absence of fly ash, silica fume, superplasticizer, and the two fiber types, thereby improving the model’s ability to represent the mixed feature space. The target variables were compressive strength, σc (MPa), and peak strain, εc (%). The definitions of all variables are given in Table 2.

2.2. Data Preprocessing, Internal-Validation Setting, and Analysis Boundaries

2.2.1. Data Cleaning, Size Normalization, and Statistical Characteristics

After extraction, the raw data collected from the literature were sequentially subjected to unit unification, outlier checking, duplicate verification, and missing-value screening. For compressive test results obtained from specimens of different shapes and sizes, predefined size-normalization rules were applied to convert them to a common reference basis. This procedure was used to reduce the influence of specimen-size differences on model training and synergy analysis. Table 3 presents the conversion coefficients used to normalize the mechanical properties of non-standard specimens to the standard size according to Eurocode 2 (BS EN 1992) [36].

In terms of statistical characteristics, the key variables in the database span a broad range (see Table 4). Most parameters fall within the ranges commonly used in engineering practice and exhibit reasonable variations in their values. This indicates that the database captures substantial differences in material performance under different mix proportions and fiber-parameter combinations. It also provides the necessary data basis for the subsequent identification of nonlinear effects and interactions.

In this study, data complexity was controlled through preprocessing procedures, including feature selection and input-variable standardization. These procedures removed redundant features and unified variable scales, thereby reducing the Kolmogorov complexity of the dataset [38,39]. This provided an efficient and stable data basis for the subsequent model training and synergy-effect analysis.

2.2.2. Dataset Splitting and Validation Strategy

To ensure that the modeling procedure was reproducible and that the results were reliable, the compiled dataset was randomly split into training and test sets at a ratio of 8:2. A Kolmogorov–Smirnov (KS) test was then performed to check whether the two sets remained consistent in terms of the target-variable distribution. This helped confirm that the data split was reasonable. Figure 1 compares the CDFs of σc for the training and test sets and reports the corresponding KS test results.
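The split-consistency check described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names `ks_statistic` and `split_80_20`, the seed, and the uniformly sampled placeholder strengths are all assumptions introduced here. The two-sample KS statistic is the largest vertical gap between the two empirical CDFs.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:  # the maximum gap is attained at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def split_80_20(values, seed=0):
    """Random 8:2 split of a list of target values (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    cut = round(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Placeholder strengths in MPa, NOT the values from the compiled database.
rng = random.Random(42)
sigma_c = [rng.uniform(20.0, 120.0) for _ in range(397)]

train, test = split_80_20(sigma_c)
print(len(train), len(test), round(ks_statistic(train, test), 3))
```

A small statistic indicates that the training and test targets follow similar distributions. In practice, `scipy.stats.ks_2samp` computes the same statistic and also returns a p-value.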

Figure 1 shows only a small difference in the σc distribution between the training and test sets, which satisfies the requirement of distributional balance for the subsequent internal validation and interpretation analysis. The εc dataset was split using the same sampling strategy as that used for σc. Its KS statistic and distribution-consistency indicators were comparable to those of σc, indicating a high degree of distributional agreement between the training and test sets for both target variables. Therefore, the distribution-validation figure for εc is not presented separately.

2.2.3. Model Training and Synergy-Gain Calculation Framework

For the σc task, multiple regression models were compared, and the model with the best overall performance was selected as the base model for the subsequent interpretation analysis. For the εc task, transfer learning and hyperparameter optimization were introduced to improve modeling stability because of the limited sample size. The optimal models were then further analyzed through global SHAP importance and SHAP dependence plots.

To quantify the synergy boundary of the two fiber volume fractions, this study calculated the synergy gain Δ(s, p) on a predefined 17 × 17 discrete grid. Here, s denotes the steel-fiber volume fraction, and p denotes the PVA-fiber volume fraction. The boundary was identified using a fixed procedure. First, the Δ(s, p) = 0 contour was used as the critical boundary of synergy. All grid points with Δ(s, p) > 0 were defined as the positive-synergy window. Its coverage was calculated as the proportion of positive-synergy points among all grid points. Then, a convex hull, denoted as H, was constructed from the measured sample coordinates of (s, p). The intersection between the positive-synergy window and H was taken as the data-supported candidate mix-proportion range. If the shortest distance from a candidate point to the boundary of H was smaller than the predefined grid step of 0.125%, this point was classified as a boundary-risk point rather than a robust recommended region. By combining standardized algorithmic steps with the measured data distribution, this procedure determined the data-supported robust boundary for the steel–PVA fiber mix proportions.
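The boundary-identification steps above (positive-synergy window, coverage, convex hull H, and boundary-risk classification) can be sketched as below. This is an illustrative sketch under stated assumptions: the helper names, the toy gain surface `s * p - 3.0`, and the four toy sample coordinates are hypothetical, and the sketch assumes at least three non-collinear measured points.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside(hull, q, eps=1e-12):
    """True if q lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    for i in range(n):
        a, b = hull[i], hull[(i + 1) % n]
        if (b[0] - a[0]) * (q[1] - a[1]) - (b[1] - a[1]) * (q[0] - a[0]) < -eps:
            return False
    return True


def dist_to_boundary(hull, q):
    """Shortest distance from q to any edge of the hull."""
    best = float("inf")
    n = len(hull)
    for i in range(n):
        (ax, ay), (bx, by) = hull[i], hull[(i + 1) % n]
        dx, dy = bx - ax, by - ay
        t = ((q[0] - ax) * dx + (q[1] - ay) * dy) / (dx * dx + dy * dy)
        t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
        best = min(best, math.hypot(q[0] - (ax + t * dx), q[1] - (ay + t * dy)))
    return best


def classify(delta, samples, step=0.125):
    """delta maps grid nodes (s, p) to synergy gain; samples are measured (s, p)."""
    hull = convex_hull(samples)
    positive = [g for g, d in delta.items() if d > 0]
    coverage = len(positive) / len(delta)
    supported = [g for g in positive if inside(hull, g)]
    risk = [g for g in supported if dist_to_boundary(hull, g) < step]
    robust = [g for g in supported if g not in risk]
    return coverage, robust, risk


# Toy 17 x 17 grid (0 to 2% in 0.125% steps) with a made-up gain surface.
grid = [i * 0.125 for i in range(17)]
toy_delta = {(s, p): s * p - 3.0 for s in grid for p in grid}
toy_samples = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
coverage, robust, risk = classify(toy_delta, toy_samples)
print(f"coverage={coverage:.3f}, robust={len(robust)}, boundary-risk={len(risk)}")
```

Separating `robust` from `risk` mirrors the distinction drawn in the text: a node may show positive synergy and still be too close to the edge of the measured data to be recommended.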

To build synergy maps that can support engineering screening, this study used a Monte Carlo marginalization strategy. The procedure was implemented as follows. The analysis focused on the two core variables, namely the steel-fiber and PVA-fiber volume fractions. The remaining mix-proportion and material parameters were randomly sampled, and the model outputs were then averaged. This reduced the influence of non-core variables and yielded the marginal response of the two core variables. For a given steel-fiber volume fraction s and PVA-fiber volume fraction p, the other input variables were randomly sampled, and the model outputs were averaged to obtain the marginal mean response. The synergy gain was defined as Δ(s,p) = f(s,p) − f(s,0) − f(0,p) + f(0,0), and its marginal mean was written as $\bar{\Delta}(s,p)$. When $\bar{\Delta}(s,p) > 0$, positive synergy is considered to exist. (The stability of the model basis used to identify the synergy-gain window was further examined using 10 repeated random splits. The coefficient of variation of R² was only 1.01%; see Section 3.2.3 and Appendix A.1.)

To quantify this synergy gain, the steel-fiber volume fraction s and the PVA-fiber volume fraction p were fixed first. For each fixed (s, p) combination, B samples were independently drawn from the empirical distributions of the remaining mix-proportion and material parameters. The trained machine learning model was then used to predict the target performance for each sampled case. The arithmetic mean of these predictions was taken as the model output for the corresponding (s, p) combination.
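A minimal sketch of this marginalization is given below. It is not the authors' implementation: `toy_model` is a hypothetical stand-in for the trained regressor, the background rows contain a single made-up W/B column, and reusing the same B draws for all four terms of the synergy gain is an assumption made here to reduce sampling noise.

```python
import random


def mean_synergy_gain(model, s, p, background, B=100, seed=0):
    """Monte Carlo estimate of the mean synergy gain at (s, p).

    model(s, p, z) returns a prediction; z is one row of the remaining inputs.
    background is a list of observed rows z (the empirical distribution).
    """
    rng = random.Random(seed)
    # Assumption: one set of B draws is shared by all four terms.
    draws = [rng.choice(background) for _ in range(B)]

    def f_bar(a, b):
        # Marginal mean response with the fiber fractions fixed at (a, b).
        return sum(model(a, b, z) for z in draws) / B

    return f_bar(s, p) - f_bar(s, 0.0) - f_bar(0.0, p) + f_bar(0.0, 0.0)


def toy_model(s, p, z):
    """Hypothetical response: additive terms plus an interaction term 2*s*p."""
    return 40.0 + 5.0 * s + 1.0 * p + 2.0 * s * p - 30.0 * (z["wb"] - 0.4)


# Placeholder background rows, NOT the compiled database.
toy_background = [{"wb": wb} for wb in (0.30, 0.35, 0.40, 0.45, 0.50)]

gain = mean_synergy_gain(toy_model, 1.5, 2.0, toy_background)
print(round(gain, 6))
```

Because the additive terms cancel in the four-term difference, the sketch recovers only the interaction term of the toy model (2 · 1.5 · 2.0 = 6, up to floating-point error). This is the sense in which the synergy gain isolates super-additive behavior.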

In the dual-objective screening, the compressive strength and peak strain of a specimen are denoted as σc(s,p) and εc(s,p), respectively. The optimization problem is therefore formulated as maximizing the objective vector [σc(s,p), εc(s,p)]. Under this maximization setting, candidate i is regarded as Pareto-optimal if no other mix-proportion combination j satisfies both σc(sj,pj) ≥ σc(si,pi) and εc(sj,pj) ≥ εc(si,pi) with at least one of the two inequalities being strict. To avoid relying on a single extreme solution, this study further identifies strength-oriented, ductility-oriented, and compromise candidates along the Pareto front. Their relative stability is then examined through test-set re-evaluation.
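The Pareto screening can be illustrated with the short sketch below. The dominance test uses the standard convention for maximization (no worse in both objectives and strictly better in at least one). The rule for the compromise candidate, namely the front member closest to the ideal point after min-max scaling, is an assumption introduced here because the text does not specify one, and the candidate values are placeholders.

```python
import math


def dominates(a, b):
    """a dominates b: no worse in both objectives, strictly better in one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(candidates):
    """candidates: list of (sigma_c, eps_c) pairs, both to be maximized."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


def pick_representatives(front):
    """Strength-oriented, ductility-oriented, and compromise candidates."""
    strength = max(front, key=lambda c: c[0])
    ductility = max(front, key=lambda c: c[1])
    lo = [min(c[k] for c in front) for k in (0, 1)]
    hi = [max(c[k] for c in front) for k in (0, 1)]
    span = [(hi[k] - lo[k]) or 1.0 for k in (0, 1)]  # avoid division by zero

    def gap_to_ideal(c):
        # Distance to the ideal point (1, 1) in min-max scaled objective space.
        return math.hypot(*(1.0 - (c[k] - lo[k]) / span[k] for k in (0, 1)))

    compromise = min(front, key=gap_to_ideal)
    return strength, ductility, compromise


# Placeholder (sigma_c in MPa, eps_c in %) predictions, NOT reported results.
toy = [(60, 0.20), (70, 0.15), (65, 0.18), (55, 0.19), (70, 0.10)]
front = pareto_front(toy)
print(front)
print(pick_representatives(front))
```

Other compromise rules, such as a weighted sum or knee-point detection, would fit the same structure by replacing `gap_to_ideal`.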

Because statistical interpretation cannot be directly equated with material mechanisms, the SHAP results were further checked using three engineering-consistent criteria, with empirical support from Appendix A.5 and Appendix A.8. First, the influence directions of the key variables should be consistent with the basic mechanical behavior of fiber-reinforced concrete. Second, the SHAP dependence regions should be supported by sufficient sample density and controlled discrete-level distributions. Third, in the two σc and εc tasks, the dominant roles of steel and PVA fibers should match the engineering mechanisms of load-bearing capacity and deformation capacity, respectively.

2.2.4. Five-Stage Framework and Technical Roadmap for the Quantitative Identification of Fiber Synergy

This study adopted a data-driven five-stage analytical framework to quantitatively identify fiber synergy. The overall workflow is shown in Figure 2.

The framework begins by integrating and cleaning the literature-based data (397 samples for σc and 203 samples for εc) to build a high-quality sample database. It then combines CatBoost modeling with SHAP-based interpretation to achieve accurate and interpretable prediction. Finally, by linking Monte Carlo marginalization with synergy-window overlay, it makes it possible to identify and visualize fiber synergy in a quantitative way. In this way, the framework offers an interpretable route for optimizing the mix design of fiber-reinforced concrete.

3. Results and Analysis

3.1. Model Performance Comparison and Base-Model Selection

3.1.1. Comparison of σc Models and Selection of the Base Model

Table 5 shows that, for the σc task, tree-based models performed markedly better than the linear model overall. This suggests that the relationship between compressive strength and the multidimensional input variables in hybrid steel–PVA fiber-reinforced concrete is strongly nonlinear. In the internal-validation results, LightGBM achieved the highest test-set R² (0.9783), whereas CatBoost obtained the lowest MAE (2.7409 MPa). CatBoost also showed better robustness under limited-sample conditions and was more suitable for handling categorical features and supporting the subsequent interpretation analysis. (See Table A2 in Appendix A.1 for details.)
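The observation that one model can lead on R² while another leads on MAE follows from how the metrics weight errors: R² and RMSE are based on squared errors and penalize a few large misses heavily, whereas MAE weights all errors linearly. The sketch below illustrates this with made-up numbers. The two prediction vectors are hypothetical and do not represent LightGBM or CatBoost outputs.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, and RMSE for paired observations and predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


# Toy strengths in MPa; both prediction sets are illustrative only.
y_true = [50.0, 60.0, 70.0, 80.0]
pred_a = [52.0, 62.0, 72.0, 82.0]  # uniform moderate errors
pred_b = [50.0, 60.0, 70.0, 85.0]  # mostly exact, one large miss

metrics_a = regression_metrics(y_true, pred_a)
metrics_b = regression_metrics(y_true, pred_b)
print(metrics_a)
print(metrics_b)
```

In this toy case, the first set has the higher R² and the second has the lower MAE, which is the same kind of split ranking as the one reported in Table 5.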

The goal of the subsequent analysis is not simply to identify the model with the highest test score, but to support SHAP interpretation, single-variable main-effect analysis, and two-dimensional synergy-gain calculation. Moreover, CatBoost is naturally suited to handling categorical features, which makes it more compatible with the mixed feature structure of the dataset used in this study. CatBoost was therefore chosen as the base model for the σc analysis. This choice preserves strong predictive performance while avoiding over-reliance on a single evaluation metric in model selection. To test its robustness, the dataset was subjected to 10 repeated random train–test splits. The results show that the performance variation of CatBoost remained within an acceptable range. Detailed metrics and variation analysis are given in Appendix A.1.
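The variation across repeated splits can be summarized with a coefficient of variation (CV), the ratio of the standard deviation to the mean. The sketch below uses the sample standard deviation (n − 1 denominator), which is an assumption since the text does not state which estimator was used. The ten R² values are placeholders, not the results in Appendix A.1.

```python
import statistics


def coefficient_of_variation(scores):
    """CV in percent: sample standard deviation divided by the mean."""
    return statistics.stdev(scores) / statistics.mean(scores) * 100.0


# Placeholder test-set R2 values from ten hypothetical random splits.
r2_by_split = [0.961, 0.972, 0.958, 0.975, 0.969,
               0.980, 0.955, 0.971, 0.966, 0.963]

cv = coefficient_of_variation(r2_by_split)
print(f"CV of R2 across splits: {cv:.2f}%")
```

A CV of about 1% means the split-to-split scatter in R² is roughly one hundredth of its mean value.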

3.1.2. εc Model Development and Small-Sample Modeling Strategy

Compared with σc, the εc dataset is smaller and is therefore more sensitive to parameter selection and variations in data splitting. Based on this characteristic, three modeling strategies were compared in this study: baseline CatBoost, a transfer-learning model, and Bayesian-optimized CatBoost (see Table 6). The results show that all three methods achieved good predictive performance. Among them, Bayesian-optimized CatBoost performed best overall, while the transfer-learning model reached a comparable level of accuracy. The parameter settings of the transfer-learning model (Table A3) and the hyperparameter-optimization results (Table A4) are presented in Appendix A.2 and Appendix A.3, respectively.

This result suggests that cross-task transfer can be practically useful when the sample sizes of different performance indicators are unbalanced. Even so, the later analysis of εc, including its interpretation and synergy-map construction, is still based mainly on the Bayesian-optimized CatBoost model. The transfer-learning results are treated as supportive evidence, rather than as the sole basis for the core conclusions.

3.2. Analysis of the σc Model Results and Synergy Mechanisms

3.2.1. Feature-Importance Ranking and Strength-Control Variables for the σc Model

Figure 3 presents the global feature importance ranking for the σc model. The results show that compressive strength is primarily driven by matrix mix proportions and fiber volume fractions. Among all variables, W/B and V_STF contribute the most, indicating that matrix densification and steel fiber content are the key factors governing compressive performance. The importance rankings are broadly consistent between the training and test sets, suggesting that the main findings are relatively stable across the dataset. Detailed information on feature fluctuations and validation results is provided in Appendix A.4.

Figure 4 further focuses on fiber-related variables. In addition to volume fraction, steel fiber properties including tensile strength and length, and PVA-related geometric parameters also rank relatively high in importance, although they generally appear after the dominant matrix-related factors and fiber volume fractions. This result suggests that the effect of hybrid steel–PVA fibers on compressive strength is governed first by fiber addition and its volume fraction, whereas geometric and mechanical parameters mainly exert a secondary regulating effect within specific ranges (see Appendix A.5).

3.2.2. SHAP Dependence Plots and Single-Fiber Main-Effect Curves

Figure 5 further reveals the nonlinear influence patterns of key variables on σc. Generally, increasing the steel fiber volume fraction leads to greater positive contributions to σc, though this enhancement exhibits diminishing marginal returns at higher volume fractions. Meanwhile, PVA fiber-related variables exhibit more erratic contributions to σc at low volume fractions, with their effects gradually stabilizing after the intermediate volume-fraction range. This indicates that fiber reinforcement does not operate at a constant efficiency; instead, its effectiveness is collectively governed by the uniformity of fiber dispersion, the quality of fiber–matrix interfacial bonding, and the compatibility between the fiber and matrix materials.

From the perspective of physical mechanisms, this trend is consistent with the role of steel fibers in the later stage of compression. They help restrain lateral deformation, bridge macrocracks, and limit crack propagation. The diminishing marginal gain at higher fiber contents may be attributed to reduced workability, fiber agglomeration, and more defects in the interfacial transition zone, which can offset the reinforcing effect. Therefore, the V_STF- and V_PVA-related SHAP patterns are interpreted in this study as statistical evidence consistent with the combined action of crack bridging and microcrack restraint. They should not be regarded as standalone causal proof. The empirical support for these SHAP-dependence regions is summarized in Table A7.

The single-fiber main-effect curves in Figure 6 show an overall trend consistent with the SHAP dependence plots. Steel-fiber addition alone is more likely to improve compressive strength, whereas the gain in σc from PVA-fiber addition alone is relatively limited. When the two fiber types coexist, some volume-fraction combinations show a more pronounced enhancement trend than single-fiber addition. However, this enhancement does not hold across the entire volume-fraction domain. In this study, B = 100 was adopted for the visualization and quantitative analysis of the synergy-gain surface, and its convergence analysis is provided in Appendix A.6.

3.2.3. Mean Synergy-Gain Surface, Synergy Boundary, and Data-Support-Domain Constraint for σc

To provide a direct view of the synergistic interaction between the steel-fiber and PVA-fiber volume fractions, a bivariate partial dependence plot (Figure 7) was constructed based on the standard analytical framework of partial dependence plots [40]. Building on this, the boundary and coverage domain of the synergy-gain window were delineated.

Figure 7 provides a clear view of the synergistic response of compressive strength to the interaction between the steel-fiber and PVA-fiber volume fractions. The strongest response appears when the steel-fiber volume fraction is within 1.5–2.0% and the PVA-fiber volume fraction is within 1.8–2.0%. In this region, the predicted strength gradient is continuous and relatively stable. This result supports the robustness of the identified synergy-gain window and the rationality of the feature range covered by the dataset.

Based on the synergy-gain calculation framework and the Monte Carlo marginalization strategy established in Section 2.2.3, this section provides a quantitative analysis of the synergy boundary for compressive strength. Figure 8 shows the mean synergy-gain surface of σc obtained from a 17 × 17 two-dimensional grid and Monte Carlo marginalization. The results indicate that the positive synergy between steel fibers and PVA fibers in compressive strength is clearly localized, rather than being universally present across the entire steel–PVA volume-fraction plane. The positive-synergy region is mainly distributed near combinations with high steel-fiber and high PVA-fiber contents. This suggests that, at relatively high volume fractions, the two fiber types may enhance compressive performance through the combined effects of macro-scale bridging and microcrack restraint.

Figure 9 shows that the maximum mean synergy gain for σc over the whole domain is 4.794912 MPa, corresponding to (Steel, PVA) = (1.875%, 2.000%). However, the positive-synergy region occupies only about 1.7% of the domain. This result suggests that hybridization does not necessarily lead to a strength benefit. Its super-additive effect appears only within a limited range of fiber combinations.

Combining the optimal-gain results in Table 7 with the sample distribution in Figure 9 shows that the maximum mean synergy gain of σc lies near the boundary of the sample-supported domain. (Its repeatability was assessed using 10 repeated random splits, as reported in Appendix A.1.) This points to the research value of high-steel-fiber and high-PVA-fiber combinations, but it also means that the finding should be interpreted carefully in real engineering applications, with close attention to workability, fiber dispersion, and the need for further experimental validation.

From an engineering perspective, the high-steel-fiber/high-PVA-fiber region is better treated as a potential testing zone for high-strength mixes, rather than a fixed point ready for direct practical recommendation. Priority should be given to combinations that fall within the positive-synergy region, remain reasonably far from the convex-hull boundary, and reside within the data-support domain.

To validate the reliability and robustness of this synergy-gain surface, we conducted grid resolution tests and robustness checks, which confirmed that fluctuations were kept within 2% and the core morphological features showed statistical consistency. The full validation process is detailed in Appendix A.7.

3.3. Analysis of the εc Model Results and Synergy Mechanisms

3.3.1. Global Feature Importance and Ranking of Fiber-Related Variables

As shown in Figure 10 and Figure 11, compared with the σc task, the importance ranking of the εc model exhibits a more differentiated pattern. In addition to some matrix mix-proportion parameters, PVA-related variables become markedly more important in the εc task, with S/B, V_PVA, D_PVA, and f_PVA usually ranking among the more influential features. This indicates that peak strain is more sensitive to the microcrack-control capacity of PVA fibers, rather than being governed solely by the macro-scale bridging effect of steel fibers.

The importance distributions in the training and test sets are broadly consistent, which supports the interpretation analysis of εc within the current data-support domain. However, because the εc sample size is relatively small, the ranking results should emphasize the relative positions of the main controlling factors, rather than over-interpreting subtle differences between adjacent variables (see Appendix A.8).

3.3.2. SHAP Dependence Plots, Discrete-Level Support, and the εc Synergy-Gain Map

This section adopts the same definition of the synergy-gain window, quantification formula, and stability-assessment framework as those used for the compressive-strength analysis in Section 3.2.3. The detailed logic and supporting data are provided in Section 3.2.3 and Appendix A.1.

The SHAP dependence plots of the key variables for εc (Figure 12) show that the volume fraction of PVA fibers and related parameters make a more pronounced positive contribution to peak strain, whereas steel fibers mainly play a supporting bridging role during the later stages of crack development. This result is consistent with the underlying material mechanism. PVA fibers are more effective in suppressing the initiation and propagation of microcracks, and are therefore more critical to deformation capacity near the peak point. By contrast, steel fibers improve bridging capacity at the macrocrack stage and thus provide a complementary contribution to ductility enhancement. In addition, PVA fibers may also restrain microcracks during curing shrinkage. When PVA fibers deform compatibly with the early-age matrix, they can share local shrinkage-induced tensile stress and bridge initial microcracks. This may reduce the risk of shrinkage-crack formation and propagation, thereby indirectly improving the peak strain of the material.

This explanation is also consistent with the scale-dependent roles of hybrid fibers. PVA fibers have smaller diameters and a higher number density. They are therefore more effective in controlling microcracks and improving deformation compatibility near the peak point. Steel fibers have higher stiffness and strength, making them more suitable for bridging and restraining larger cracks. Thus, the increased importance of PVA-related variables in the εc model is not merely a statistical pattern. It is also consistent with the physical origin of peak strain.

It should be noted that some variables in the εc dataset have relatively few discrete levels and high shares of dominant levels (see Table 8). This may cause the SHAP dependence plots to exhibit step-like or locally fluctuating patterns in certain intervals. Therefore, when interpreting the key variables, this study considers the number of discrete levels, the shares of dominant levels, and the distribution of tail samples together, so as to enhance interpretive transparency (see Table A10 for details of the empirical support).

Based on the above single-variable SHAP dependence analysis, the mean synergy-gain map constructed for εc shows that its positive-synergy region is clearly wider than that of σc. The peak is mainly located near combinations with low-to-moderate steel-fiber content and moderate-to-high PVA-fiber content. Quantitative results show that the global maximum mean synergy gain of εc is 0.0141629, located at (Steel, PVA) ≈ (0.38%, 1.62%), and that the positive-synergy region covers about 18% of the whole domain.

Combined with the window distribution, this suggests that ductility synergy is more likely to arise from the coordinated action of a moderate amount of steel fibers and a relatively high amount of PVA fibers at different stages of crack development. Compared with the σc window, the εc window is more suitable as a basis for ductility-oriented design (see Figure 13 and Table 9).

3.4. Implications of Dual-Objective Synergy: Trade-Offs Between Strength and Ductility and Mix-Proportion Boundaries

As shown in Figure 14, a comparison of the synergy windows of σc and εc on the steel–PVA volume-fraction plane indicates that their positive-synergy regions do not fully overlap. This means that engineering design does not have a single optimal point that simultaneously satisfies all objectives. A more reasonable approach is to screen candidate regions according to performance constraints and the level of data support.

From an engineering perspective, if the primary objective is to improve load-bearing capacity, the candidate mixes are more likely to fall in the high-fiber-content region. However, greater attention should also be paid to reduced workability, difficulties in fiber dispersion, and the risk of boundary extrapolation. If ductility is the main concern, screening can instead focus on combinations with low-to-moderate steel-fiber content and moderate-to-high PVA-fiber content. The value of the overlay map of dual-objective synergy windows lies not in replacing experiments, but in providing a data-supported quantitative basis for preliminary mix screening and for setting priorities in experimental validation.

Based on the above differences, preliminary engineering mix screening can be carried out in three steps. First, determine whether the primary objective is load-bearing capacity, ductility, or a balance between the two. Second, screen candidate ranges within the corresponding positive-synergy window. Third, exclude combinations located near the boundary of the data-support domain, and subject the remaining combinations to experimental verification.

To further show the trade-off between the two objectives, Figure 15 presents the training-set Pareto front obtained from the candidate mix-proportion points, together with the corresponding test-set re-evaluation results. As the solution moves from left to right along the Pareto front, the increase in σc is accompanied by a decrease in εc. This indicates a clear trade-off between strength and peak strain. The Max εc candidate, Max σc candidate, and balanced candidate represent three types of candidate schemes: ductility-oriented, strength-oriented, and dual-objective compromise schemes, respectively. The test-set re-evaluation curve is generally lower than the training-set Pareto front. This suggests that the Pareto results are more suitable for ranking candidate mix proportions and for setting experimental priorities. They should not be directly treated as final engineering recommendations.
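The non-dominated filtering behind such a Pareto front can be illustrated with a minimal sketch. It assumes that both objectives are maximized and that each candidate mix is represented only by its predicted (σc, εc) pair; the function name `pareto_front` and the five candidate values are hypothetical and are not taken from the study.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (sigma_c, eps_c), both to be maximized


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by increasing sigma_c."""
    front = []
    for i, (s_i, e_i) in enumerate(points):
        # A point is dominated if another point is at least as good in both
        # objectives and strictly better in at least one of them.
        dominated = any(
            (s_j >= s_i and e_j >= e_i) and (s_j > s_i or e_j > e_i)
            for j, (s_j, e_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((s_i, e_i))
    return sorted(set(front))


# Hypothetical predicted (sigma_c, eps_c) pairs for five candidate mixes.
candidates = [(60.0, 0.30), (70.0, 0.28), (80.0, 0.22), (65.0, 0.25), (75.0, 0.20)]
front = pareto_front(candidates)
max_eps = max(front, key=lambda pt: pt[1])    # ductility-oriented candidate
max_sigma = max(front, key=lambda pt: pt[0])  # strength-oriented candidate
```

Along the sorted front, σc increases while εc decreases, which is the trade-off pattern described above. A balanced candidate would require an additional selection rule, which this sketch deliberately leaves open.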

4. Discussion

4.1. Model Performance and Analytical Positioning

As shown in Section 3, the random cross-validation results based on the available data indicate that both the σc and εc tasks achieved high fitting and predictive accuracy. This suggests that the data-driven models established from mix-proportion parameters, fiber geometric parameters, and fiber mechanical parameters can effectively capture the main nonlinear relationships in the compressive behavior of hybrid steel–PVA fiber-reinforced concrete. More importantly, the core value of this study lies not only in its high predictive accuracy, but also in transforming the prediction results into interpretable, screenable, and practically useful information that can directly support engineering applications, mix-proportion optimization, and experimental design.

Building on this analytical framework, the following subsections examine how steel–PVA hybrid fibers modulate the compressive performance of concrete. To improve sample utilization under small-sample conditions, all 397 samples were combined for the final visualization analysis. The stability assessment of the synergy-gain window is provided in Appendix A.1.

4.2. Interpretability of Key Variables and the Differentiated Roles of Steel and PVA Fibers

SHAP analysis shows that although σc and εc are both indicators of compressive behavior, they are governed by different dominant factors. In the σc task, matrix mix-proportion factors and steel fiber volume fraction are more important. In the εc task, S/B and several PVA-related variables carry greater influence. This difference indicates that the load-bearing capacity and deformation capacity of hybrid fiber-reinforced concrete are not controlled by the same set of variables in the same manner. The former depends more on the load-bearing skeleton of the matrix and the bridging capacity across macrocracks. The latter is more sensitive to microcrack control, fiber–matrix interfacial interaction, and the effect of matrix volumetric proportioning on deformation compatibility.

From the perspective of material mechanisms, the higher elastic modulus and tensile strength of steel fibers make them more effective in post-cracking bridging and in delaying the propagation of macrocracks. This is why they exert a more direct strengthening effect in the σc model. By contrast, PVA fibers are more advantageous in suppressing microcrack initiation, improving the continuity of crack propagation, and enhancing deformation accommodation near the peak point. This is broadly consistent with the findings of previous experimental studies [1,2,3,4]. In other words, steel fibers and PVA fibers do not merely offer redundant reinforcement. Instead, they participate in the compressive failure process at different scales. This distinction, rather than simple superposition, forms the physical basis of hybridization. Furthermore, the role of PVA fibers should not be understood only as crack bridging during loading. It also includes early restraint of shrinkage-induced microcracks during curing. Such control of initial defects may provide an important basis for the later increase in peak strain and the delayed propagation of cracks.

Furthermore, both the SHAP dependence plots and the single-fiber main-effect curves indicate that fiber effects are strongly nonlinear. As the volume fraction increases, the strengthening effect does not continue to grow at a constant rate. Instead, it often shows diminishing marginal returns, plateauing, or even local fluctuations. This suggests that, in a multi-source literature-based dataset, the potential performance gains associated with higher fiber content may be simultaneously limited by factors such as fiber dispersion, workability, interfacial bonding, and matrix compatibility. Therefore, fiber optimization cannot be achieved simply by increasing fiber dosage. More importantly, it calls for pinpointing the parameter ranges in which the positive effects of fibers can be stably observed under the support of the available data.

Therefore, the physical validity of the interpretable results was assessed using a cautious “mechanistic consistency + data support” strategy. A SHAP trend was used as a basis for engineering interpretation only when three conditions were met: it was consistent with material mechanisms, it was supported by sufficient sample density or discrete-level coverage, and it showed a similar direction in the single-fiber main-effect curves and synergy-gain maps. Local interactions with sparse samples or those close to the data boundary were treated as optimization directions that require further engineering validation.

4.3. Engineering Implications of Synergy-Gain Windows and Dual-Objective Trade-Offs

From the synergy-gain heatmaps generated via Monte Carlo marginalization and two-dimensional grid evaluation, we observe that the synergistic enhancement between steel fibers and PVA fibers does not hold across the entire volume-fraction domain, but instead shows clear regional characteristics. This insight carries important implications for engineering practice: hybrid fiber mixtures do not inherently outperform single-fiber systems or produce simple additive effects, and only specific fiber combinations can yield true super-additive benefits. Therefore, rather than adhering to the empirical assumption that combining steel and PVA fibers will necessarily improve performance, this study advocates for a window-based and condition-dependent design strategy.

For σc, the positive-synergy region is concentrated near combinations with high steel-fiber and high PVA-fiber contents, and its area share is very small, indicating that strength synergy is strongly localized. This means that if the primary engineering objective is to achieve higher load-bearing capacity, the formulations under consideration are likely to fall in the high-volume-fraction region. However, such regions are also often closer to the data boundary and more likely to be accompanied by reduced constructability, difficulties in fiber dispersion, and increased construction risk. Therefore, the interpretation of the strength-synergy window must consider both potential benefits and application risks, instead of fixating solely on the peak value. In particular, the high-dosage synergy region close to the data boundary should be regarded as a high-potential direction for further experiments at this stage. Targeted validation tests are still needed before engineering application.

In comparison, the positive-synergy region for εc is wider, and its peak is located near combinations with low-to-moderate steel-fiber content and moderate-to-high PVA-fiber content. This indicates that ductility synergy does not rely on an extremely high steel-fiber volume fraction. Instead, it is more likely to arise from the coordinated action of a moderate amount of steel fibers and a substantial volume of PVA fibers at different stages of crack development: the former provides the necessary macro-scale bridging capacity, whereas the latter improves microcrack control and deformation compatibility. This finding suggests that, in scenarios where ductility, energy dissipation, or peak-strain enhancement is the primary objective, a moderate rather than extreme steel-fiber content is more likely to deliver stable benefits.

More importantly, the synergy windows of σc and εc do not fully overlap, which means that engineering design must inevitably address a trade-off between strength and ductility. A more reasonable strategy is to first define the minimum requirements for load-bearing capacity and deformation capacity according to the structural objective. Candidate points should then be prioritized within the dual-objective synergy region, while also remaining inside the data-support domain and relatively far from the convex-hull boundary. Combinations located near the boundary should be confirmed through additional experiments. In this way, the role of the synergy-window map is not to replace experiments, but to help guide them in a more targeted and efficient manner.

Therefore, this study recommends window-guided selection rather than point-based selection. When a candidate combination lies well within the synergy region and remains distant from the convex-hull boundary, it should be prioritized in laboratory mixing trials. By contrast, if a combination shows favorable mechanical performance only at its peak value and lies close to the boundary, it should be treated as a direction for further validation rather than directly recommended as a target mix proportion.

Therefore, under the screening logic combining the Pareto front and the synergy window, combinations located within the synergy region and far from the convex-hull boundary should be prioritized as starting points for trial mixing in engineering applications. If a combination is favorable only near the boundary, then it should be treated as a validation target rather than as a directly recommended engineering mix without further testing.
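The screening logic described above can be sketched in a few lines. The sketch assumes that the data-support domain is the two-dimensional convex hull of the observed (steel, PVA) volume fractions and that "far from the boundary" can be expressed as a minimum margin in volume-fraction units. The function names, the 0.25 threshold, and the observed points are hypothetical illustrations and are not the settings used in this study.

```python
from math import hypot
from typing import List, Tuple

Pt = Tuple[float, float]  # (steel %, PVA %)


def convex_hull(points: List[Pt]) -> List[Pt]:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Pt, a: Pt, b: Pt) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Pt] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Pt] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def boundary_margin(hull: List[Pt], q: Pt) -> float:
    """Smallest signed distance from q to the hull's edge lines (negative outside)."""
    margins = []
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        # For a counter-clockwise hull, interior points lie to the left of each edge.
        edge_len = hypot(x2 - x1, y2 - y1)
        margins.append(((x2 - x1) * (q[1] - y1) - (y2 - y1) * (q[0] - x1)) / edge_len)
    return min(margins)


def classify(q: Pt, synergy_gain: float, hull: List[Pt], min_margin: float = 0.25) -> str:
    """Hypothetical three-way screening rule; min_margin is an assumed threshold."""
    if synergy_gain <= 0:
        return "exclude"
    if boundary_margin(hull, q) >= min_margin:
        return "trial-mix priority"
    return "validation target"


# Hypothetical observed mixes spanning a 0-2% x 0-2% support domain.
observed = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0), (0.5, 1.5)]
hull = convex_hull(observed)
interior_label = classify((1.0, 1.0), 0.5, hull)
boundary_label = classify((1.875, 2.0), 4.79, hull)
```

In this toy setting, a point in the middle of the domain with a positive gain is labeled as a trial-mix priority, whereas a point on the upper edge of the domain is labeled as a validation target regardless of how large its gain is.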

4.4. Scope of Applicability, Limitations, and Future Work

Although this study established a relatively complete integrated workflow for prediction, interpretation, and synergy identification, its scope of applicability still needs to be clearly defined. First, the data were compiled from multiple published studies and academic theses. Although data standardization and specimen-size normalization were performed, differences in metadata may still exist across studies, including raw-material sources, curing conditions, loading rates, specimen preparation procedures, and testing equipment. These factors were not fully structured and incorporated into the model. Therefore, the patterns learned by the model should be understood, to some extent, as empirical regularities averaged over a multi-source database, rather than as an exact reproduction of a single experimental system.

Second, to ensure the reliability of the conclusions, all findings in this study are restricted to the current data-support range, and the scope of applicability will be further expanded through targeted experiments. At the same time, convex-hull coverage was used to identify high-confidence regions within the supported data domain. The conclusions on synergy effects in these regions can directly inform engineering design, and fiber-mix schemes located in them can be directly applied in the production of concrete members, which may help reduce trial-mix costs to some extent. By contrast, conclusions for samples near the convex-hull boundary should be regarded only as a basis for preliminary design and still require further experimental validation.

To further extend the current application boundary of this study, future work will proceed in three directions. First, additional experiments will be carried out in boundary regions and sparse mix-proportion intervals where data coverage is insufficient. The focus will be on verifying the mechanical stability of systems with high steel-fiber and high PVA-fiber contents, so as to provide more refined mix guidance for the production of concrete members. Second, mix-validation experiments using different batches of raw materials will be conducted to clarify the extent to which raw-material variability affects mix performance, thereby providing a quantitative basis for material substitution in engineering practice. Third, long-term performance data under different curing regimes (such as the evolution of shrinkage-induced stress and cracking) will be added to establish performance-prediction models that better reflect field conditions and further enhance the engineering applicability of the present study. In addition, the pre-peak strain energy Uc is an important indicator for evaluating the seismic energy-dissipation potential of fiber-reinforced concrete. Previous studies have shown [41] that Uc can be approximately estimated using empirical relationships related to σc and εc. Therefore, extending the prediction–interpretation–synergy-identification framework developed in this study to a three-objective analysis framework for σc, εc, and Uc would further improve the engineering applicability of the present findings.

From the perspective of engineering implementation, the most effective path for future improvement is not to develop more complex prediction models. Instead, it is to improve the transferability and practical usefulness of the conclusions in concrete-member production by supplementing validation experiments for boundary mix proportions and by introducing multidimensional constraints such as workability, durability, and cost.

5. Conclusions

Based on the multi-source experimental database, interpretable machine learning, and synergy-gain map analysis, this study systematically investigated the compressive strength and peak strain of hybrid steel–PVA fiber-reinforced concrete. The main conclusions are as follows:

(1): This study developed an interpretable machine learning framework to analyze the compressive strength (σc) and peak strain (εc) of hybrid steel–PVA fiber-reinforced concrete. The framework integrates performance prediction, mechanistic interpretation, and synergy-window identification into a unified analytical workflow, thereby providing a data-driven, interpretable technical approach for optimizing the mix design of such fiber-reinforced concretes.
(2): Using a random train–test split on our multi-source dataset, tree-based models consistently outperformed linear models. For the compressive-strength prediction task, LightGBM achieved the highest R-squared at 0.9783, while CatBoost delivered the lowest mean absolute error of 2.7409 MPa. After comprehensively evaluating error control, prediction stability, and post-hoc interpretability, we selected CatBoost as the foundational model for subsequent compressive-strength analysis.
(3): For the εc task, Bayesian-optimized CatBoost achieved the best test performance (R² = 0.9659, MAE = 0.0218, RMSE = 0.0358). The transfer-learning model reached a comparable accuracy level (R² = 0.9650), indicating that cross-task feature transfer can provide effective prior support for modeling performance indicators with limited sample sizes.
(4): SHAP analysis showed that σc is mainly governed by matrix mix-proportion factors and steel fiber volume fraction, whereas εc is more sensitive to S/B and PVA-related variables. This difference reflects the distinct fiber-action mechanisms underlying load-bearing capacity and deformation capacity.
(5): The mean synergy-gain maps derived from Monte Carlo marginalization show that the positive-synergy region for σc is strongly localized and mainly concentrated near combinations with high steel-fiber and high PVA-fiber contents, with a global maximum mean synergy gain of 4.794912 MPa. By contrast, the positive-synergy region for εc is wider and is mainly distributed in the range of low-to-moderate steel-fiber and moderate-to-high PVA-fiber combinations, with a peak value of 0.0141629. These results indicate that the effects of the two fiber types are not simply linearly additive, but show clear regionality and target dependence.
(6): The dual-objective synergy windows of σc and εc do not fully overlap. Therefore, engineering mix design is better guided by a hierarchical screening logic of performance target–synergy window–data-support domain. The core value of this study lies in providing an interpretable and visual quantitative tool for candidate-mix screening and experimental-priority setting within the current data-support range, rather than directly offering a single universally applicable mix proportion. In particular, high-potential synergy regions close to the data-support boundary should be further validated before engineering application.
(7): From a practical engineering perspective, this study is better regarded as a candidate-window map plus validation-priority tool, rather than as a single-point mix recommender. Priority should be given to combinations located within the positive-synergy region and relatively far from the boundary of the data-support domain, so as to improve the reliability of trial mixing and validation.

Author Contributions

Conceptualization, M.L.; Methodology, M.L.; Software, M.L.; Validation, J.C.; Formal analysis, J.C.; Investigation, S.Z.; Resources, S.Z.; Data curation, J.C.; Writing—original draft, M.L.; Writing—review and editing, S.Z.; Visualization, J.C.; Supervision, M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guilin University of Technology, Nanning Branch, grant number [2024] No. 22.

Data Availability Statement

The data presented in this study are available in [Web of Science and CNKI] at [1].

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Model Stability and Synergy-Window Reproducibility

Appendix A.1.1. Performance Variation Under Repeated Random Splits

As shown in Table A1, the results of 10 repeated random splits indicate that the CatBoost model exhibits good stability. The σc model is more stable, whereas the εc model is more sensitive to random splitting. However, the overall conclusions remain consistent. The variation trajectories of the performance metrics under different random splits, together with their 95% confidence intervals, are presented in Figure A1.

Table A1. Performance stability of CatBoost models under repeated random train–test splits.

Property	Metric	Mean	SD	CV (%)
Compressive strength (σc)	R²	0.9572	0.0097	1.01
Compressive strength (σc)	MAE	3.3136	0.2804	8.46
Compressive strength (σc)	RMSE	4.8106	0.7167	14.90
Peak strain (εc)	R²	0.9124	0.0314	3.44
Peak strain (εc)	MAE	0.0421	0.0092	21.73
Peak strain (εc)	RMSE	0.0698	0.0162	23.17

Note: 1. Mean, standard deviation (SD), and coefficient of variation (CV) of R², MAE, and RMSE across 10 repeated 8:2 random splits. 2. Abbreviations: R², coefficient of determination; MAE, mean absolute error; RMSE, root mean square error; SD, standard deviation; CV, coefficient of variation.
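The relationship between the columns of Table A1 can be checked with a short sketch, assuming the usual definition CV = 100 × SD / mean. The helper names are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the estimator is not stated in the table note.

```python
from statistics import mean, stdev


def cv_percent(mean_value: float, sd_value: float) -> float:
    """Coefficient of variation in percent."""
    return 100.0 * sd_value / abs(mean_value)


def stability_summary(values):
    """Mean, SD, and CV of one metric across repeated random splits."""
    m = mean(values)
    sd = stdev(values)  # sample SD (n - 1); assumed, not stated in the source
    return {"mean": m, "sd": sd, "cv_percent": cv_percent(m, sd)}


# Reported sigma_c rows of Table A1 (mean, SD).
cv_r2 = cv_percent(0.9572, 0.0097)
cv_mae = cv_percent(3.3136, 0.2804)
```

Because the tabulated means and SDs are themselves rounded, recomputing CV from them may reproduce the reported values only approximately for some rows.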

Table A2. Performance stability comparison between CatBoost and LightGBM models for compressive-strength prediction.

Model	CV of R² (%)
LightGBM	1.03
CatBoost	1.01

Note: Both models were trained and validated using the same dataset and the same splitting strategy.

Figure A1. Performance variation and confidence intervals of the peak-strain and compressive-strength models under 10 repeated random splits.

Appendix A.1.2. Logic for Identifying the Synergy-Gain Window

Based on the model stability shown in Figure A1, the core logic for identifying the synergy-gain window is as follows:

(1) Load the trained CatBoost-based prediction models for compressive strength and peak strain.
(2) Traverse all combinations of steel-fiber volume fraction s ∈ [0,2%] and PVA-fiber volume fraction p ∈ [0,1.5%]. For each combination, calculate the hybrid-fiber response f(s,p), the single-fiber responses f(s,0) and f(0,p), and the fiber-free baseline response f(0,0).
(3) Substitute these responses into the synergy-effect formula, Δ(s,p) = f(s,p) − f(s,0) − f(0,p) + f(0,0), and retain the combinations with Δ(s,p) > 0.
(4) Apply density-based clustering to the retained combinations to determine a boundary interval of the synergy-gain window.
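The grid traversal and the synergy-effect formula above can be sketched as follows. The sketch assumes that the trained model can be wrapped as a plain function of a feature dictionary and that each response f(s,p) is obtained by averaging predictions over resampled background rows (Monte Carlo marginalization). The names `marginal_response` and `synergy_map`, the toy model, and the toy data are hypothetical; the density-based clustering step is not reproduced here.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
Model = Callable[[Row], float]


def marginal_response(model: Model, background: List[Row], s: float, p: float) -> float:
    """Mean prediction with (V_STF, V_PVA) forced to (s, p) over the background rows."""
    total = 0.0
    for row in background:
        x = dict(row)
        x["V_STF"], x["V_PVA"] = s, p
        total += model(x)
    return total / len(background)


def synergy_map(model: Model, data: List[Row], s_grid: Sequence[float],
                p_grid: Sequence[float], n_samples: int = 100,
                seed: int = 0) -> Dict[Tuple[float, float], float]:
    """Delta(s, p) = f(s, p) - f(s, 0) - f(0, p) + f(0, 0) on a 2-D grid."""
    rng = random.Random(seed)
    background = [rng.choice(data) for _ in range(n_samples)]
    f00 = marginal_response(model, background, 0.0, 0.0)
    f_s0 = {s: marginal_response(model, background, s, 0.0) for s in s_grid}
    f_0p = {p: marginal_response(model, background, 0.0, p) for p in p_grid}
    gains = {}
    for s in s_grid:
        for p in p_grid:
            f_sp = marginal_response(model, background, s, p)
            gains[(s, p)] = f_sp - f_s0[s] - f_0p[p] + f00
    return gains


def toy_model(x: Row) -> float:
    # Hypothetical stand-in for a trained regressor; its interaction term is 1.5 * s * p.
    return (40.0 + 5.0 * x["V_STF"] + 2.0 * x["V_PVA"]
            + 1.5 * x["V_STF"] * x["V_PVA"] - 10.0 * x["W/B"])


data = [
    {"V_STF": 1.0, "V_PVA": 0.5, "W/B": 0.30},
    {"V_STF": 0.0, "V_PVA": 1.0, "W/B": 0.45},
    {"V_STF": 0.5, "V_PVA": 0.0, "W/B": 0.38},
]
s_grid = [0.125 * i for i in range(17)]  # 0 to 2 %
p_grid = [0.125 * i for i in range(13)]  # 0 to 1.5 %
gains = synergy_map(toy_model, data, s_grid, p_grid, n_samples=100)
tol = 1e-9  # guards against floating-point noise around zero
window = sorted(sp for sp, g in gains.items() if g > tol)
```

For the toy model, the additive terms cancel in Δ(s,p) and only the interaction term remains, so the retained window consists of all grid points with both s > 0 and p > 0. With a fitted tree-based model, the sign of Δ(s,p) would vary across the grid.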

Appendix A.2. Parameter Settings of the Transfer-Learning Model

Table A3. Parameter settings of the transfer-learning model for εc prediction.

Parameter Category	Parameter	Value
Pre-trained model parameters	Number of fixed layers	726
	Input feature dimension	20
	Leaf-feature dimension	726
	Source of pre-trained weights	None
Training configuration	Regularization coefficient (alpha)	0.001
	Batch size	Full-batch training
	Maximum iterations (max_iter)	20,000
Validation strategy	Early stopping	Not applicable
	Validation split ratio	0.2

Appendix A.3. Hyperparameter Settings and Optimization Results

Table A4. Search space and optimal hyperparameters of the Bayesian-optimized CatBoost model.

Hyperparameter	Search Range	Optimal Value
learning_rate	(0.01, 0.05)	0.0452
depth	(4, 6)	6
iterations	(3000, 4500)	4018
l2_leaf_reg	(10, 20)	15
min_data_in_leaf	(10, 16)	14
random_strength	(0.2, 0.6)	0.5616
subsample	(0.8, 1)	0.9955
colsample_bylevel	(0.7, 0.9)	0.898

Note: This table reports the final hyperparameter combination selected for the peak-strain task using five-fold cross-validation, while taking advantage of CatBoost’s suitability for handling categorical features. This information supports the reproducibility of model training.
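For reproducibility, the contents of Table A4 can be transcribed into a small configuration fragment with a consistency check. This is only a sketch: the dictionary names are hypothetical, and passing the values to a CatBoost regressor as keyword arguments is an assumption about how they were used, not a statement from the source.

```python
# Search ranges and selected values transcribed from Table A4.
search_space = {
    "learning_rate": (0.01, 0.05),
    "depth": (4, 6),
    "iterations": (3000, 4500),
    "l2_leaf_reg": (10, 20),
    "min_data_in_leaf": (10, 16),
    "random_strength": (0.2, 0.6),
    "subsample": (0.8, 1),
    "colsample_bylevel": (0.7, 0.9),
}
best_params = {
    "learning_rate": 0.0452,
    "depth": 6,
    "iterations": 4018,
    "l2_leaf_reg": 15,
    "min_data_in_leaf": 14,
    "random_strength": 0.5616,
    "subsample": 0.9955,
    "colsample_bylevel": 0.898,
}

# Each selected value should lie inside its search range.
out_of_range = [
    name for name, (low, high) in search_space.items()
    if not low <= best_params[name] <= high
]

# Assumed usage (not executed here; requires the catboost package):
#   from catboost import CatBoostRegressor
#   model = CatBoostRegressor(**best_params)
```

Several selected values (for example, depth and subsample) sit at or near the upper end of their ranges, which may be worth keeping in mind if the search is repeated with wider bounds.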

Appendix A.4. Stability of SHAP Importance Rankings

To examine the sensitivity of feature-interpretation results to random data splitting, this study repeated the 8:2 data split, model training, and SHAP ranking analysis under 10 different random seeds. The results show that, in both tasks, the rankings of the main features remain generally stable, and the top-ranked features exhibit only small fluctuations. The Kendall coefficient of concordance further indicates a high degree of consistency in the feature importance rankings across the 10 repeated experiments, suggesting that the corresponding interpretation results are robust (see Table A5 and Table A6, Figure A2 and Figure A3).

Table A5. Feature importance rank consistency analysis across 10 repeated random splits (σc).

Feature	Mean Rank	SD Rank	Best Rank	Worst Rank	Mean |SHAP|
W/B	1.00	0.00	1.0	1.0	6.2785
SP	2.10	0.32	2.0	3.0	3.4681
V_STF	2.90	0.32	2.0	3.0	2.9420
SF	4.20	0.42	4.0	5.0	2.2659
FA	5.80	1.14	4.0	7.0	1.6938
SF_zero	6.10	1.45	5.0	8.0	1.7252
D_PVA	7.00	1.25	5.0	9.0	1.5527
f_PVA	7.40	1.26	6.0	9.0	1.4286
S/B	9.20	1.03	7.0	10.0	1.2116
E_STF	10.20	1.81	8.0	14.0	1.0532
V_PVA	11.00	0.82	10.0	12.0	0.9666
E_PVA	11.60	1.26	9.0	13.0	0.8899
f_STF	12.60	0.70	11.0	13.0	0.7050
L_STF	14.50	0.71	14.0	16.0	0.3510
D_STF	14.70	0.95	13.0	16.0	0.3769
L_PVA	16.50	1.27	15.0	19.0	0.2018
FA_zero	17.00	1.15	15.0	19.0	0.1475
STF_zero	17.70	1.16	16.0	20.0	0.1230
SP_zero	19.10	0.74	18.0	20.0	0.0334
PVA_zero	19.40	0.84	18.0	20.0	0.0368

Table A6. Feature importance rank consistency analysis across 10 repeated random splits (εc).

Feature	Mean Rank	SD Rank	Best Rank	Worst Rank	Mean |SHAP|
S/B	1.00	0.00	1.0	1.0	0.0851
V_PVA	2.10	0.32	2.0	3.0	0.0320
FA	3.50	0.97	3.0	6.0	0.0225
V_STF	4.00	1.15	2.0	6.0	0.0190
f_PVA	6.00	2.00	4.0	10.0	0.0147
D_PVA	6.50	1.84	5.0	10.0	0.0136
SP	6.60	1.17	4.0	8.0	0.0139
SF	7.30	1.83	5.0	11.0	0.0126
W/B	9.70	2.00	7.0	13.0	0.0094
SF_zero	9.90	1.20	8.0	12.0	0.0099
D_STF	12.30	2.98	8.0	16.0	0.0069
FA_zero	12.50	2.32	9.0	16.0	0.0071
E_STF	12.70	1.57	10.0	15.0	0.0072
L_STF	14.10	1.79	11.0	17.0	0.0055
E_PVA	14.10	2.02	11.0	16.0	0.0055
f_STF	14.20	1.87	11.0	17.0	0.0058
STF_zero	16.80	1.40	14.0	19.0	0.0031
PVA_zero	18.10	0.88	17.0	20.0	0.0020
L_PVA	19.10	0.74	18.0	20.0	0.0010
SP_zero	19.50	0.71	18.0	20.0	0.0009

Note: Kendall’s W = 0.9716; chi-square = 184.61; p < 0.001.
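The concordance statistic reported in the note can be illustrated with a minimal sketch, assuming the standard tie-free definition W = 12S / (m²(n³ − n)) with the large-sample approximation χ² = m(n − 1)W, where m is the number of repetitions and n the number of features. The function name and the toy rankings are hypothetical.

```python
from typing import List, Tuple


def kendalls_w(rankings: List[List[float]]) -> Tuple[float, float]:
    """rankings[i][j] is the rank of feature j in repetition i (no ties assumed)."""
    m = len(rankings)     # number of repetitions
    n = len(rankings[0])  # number of ranked features
    totals = [sum(r[j] for r in rankings) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    chi2 = m * (n - 1) * w
    return w, chi2


# Toy case: three repetitions that rank four features identically.
w_perfect, chi2_perfect = kendalls_w([[1, 2, 3, 4]] * 3)

# Consistency check against the reported values, using m = 10 splits and n = 20 features.
chi2_from_reported_w = 10 * (20 - 1) * 0.9716
```

With the rounded W of 0.9716, the approximation gives a chi-square of about 184.6, in line with the reported 184.61 up to rounding.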

Figure A2. Feature importance ranking heatmap for the compressive-strength (σc) model.

Figure A3. Feature importance ranking heatmap for the peak-strain (εc) model.

Appendix A.5. Empirical Support for SHAP Dependence Regions (Density, Discrete Levels, and Tail Coverage) for Compressive Strength (σc) Using Train + Test Combined Data

Table A7. Empirical support for SHAP dependence regions for compressive strength (σc) using the combined train + test data.

Feature (unit)	K (Rounded Levels)	P5/P50/P95	Tail n (<P5/>P95)	Top-3 Levels (Share%)
V_STF (%)	24	0/0.8/1.7	0/18	0 (21.4%); 1 (19.9%); 0.5 (11.6%)
f_PVA (MPa)	8	1300/1560/1850	16/0	1560 (38.8%); 1600 (33.2%); 1620 (9.8%)
D_PVA (mm)	7	0.02/0.04/0.04	2/10	0.04 (69.0%); 0.039 (9.8%); 0.02 (9.6%)
E_PVA (GPa)	11	30/40/42.8	13/18	41 (32.0%); 40 (28.5%); 42.8 (9.8%)
V_PVA (%)	25	0/0.5/2	0/1	0 (22.2%); 1 (19.1%); 0.5 (9.6%)
f_STF (MPa)	13	600/2000/2850	0/8	2800 (22.2%); 2000 (16.1%); 2850 (11.6%)

Note: All features have n = 397; P5, P50, and P95 denote the 5th, 50th (median), and 95th percentiles. Tail n reports the number of observations below P5 and above P95. Top-3 levels show the most frequent rounded values and their sample shares.
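The support statistics in such a table can be computed for a single feature with a short sketch. It assumes linear-interpolation percentiles, rounding to three decimals before counting levels, and strict inequalities for the tail counts; these conventions, the function names, and the toy values are assumptions rather than the settings used in this study.

```python
from collections import Counter
from typing import Dict, List


def percentile(sorted_vals: List[float], q: float) -> float:
    """Linear-interpolation percentile for q in [0, 100] (assumed convention)."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def support_summary(values: List[float], decimals: int = 3) -> Dict[str, object]:
    """Discrete-level count, percentiles, tail counts, and top-3 level shares."""
    rounded = [round(v, decimals) for v in values]
    vals = sorted(rounded)
    p5, p50, p95 = (percentile(vals, q) for q in (5, 50, 95))
    counts = Counter(rounded)
    return {
        "K": len(counts),
        "percentiles": (p5, p50, p95),
        "tail_n": (sum(v < p5 for v in vals), sum(v > p95 for v in vals)),
        "top3": [(level, 100.0 * c / len(vals)) for level, c in counts.most_common(3)],
    }


# Toy volume-fraction column with a dominant zero level.
toy = [0.0] * 4 + [0.5] * 3 + [1.0] * 2 + [2.0]
summary = support_summary(toy)
```

When the lowest level is also the 5th percentile, as in this toy column, the lower tail count is zero under strict inequality, which matches the pattern of entries such as 0/18 in the table.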

Appendix A.6. Sensitivity to the Number of Monte Carlo Samples

This study conducted a convergence analysis on the number of Monte Carlo samples, and the results are presented in Figure A4. The results show that when B ≥ 80, the fluctuation of the single-fiber main-effect curves remains below 2%, which satisfies the requirement for statistical stability. To strike a balance between computational efficiency and result reliability, B = 100 was ultimately selected for the subsequent analysis.
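A convergence check of this kind can be sketched as follows. The sketch assumes that the "fluctuation" of a main-effect curve is measured as the maximum relative deviation from a reference curve computed over the full dataset; this definition, the toy model, the toy data, and all function names are hypothetical and may differ from the procedure behind Figure A4.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def mean_curve(model: Callable[[Row], float], rows: List[Row],
               feature: str, grid: Sequence[float]) -> List[float]:
    """Main-effect curve: mean prediction with `feature` forced to each grid value."""
    return [sum(model({**row, feature: v}) for row in rows) / len(rows) for v in grid]


def max_relative_fluctuation(curve: List[float], reference: List[float]) -> float:
    """Largest pointwise relative deviation of `curve` from `reference`."""
    return max(abs(a - b) / abs(b) for a, b in zip(curve, reference))


def toy_model(x: Row) -> float:
    # Hypothetical stand-in for a trained regressor.
    return 40.0 + 5.0 * x["V_STF"] - 20.0 * (x["W/B"] - 0.4)


rng = random.Random(0)
data = [{"V_STF": rng.choice([0.0, 0.5, 1.0]), "W/B": rng.uniform(0.3, 0.5)}
        for _ in range(397)]
grid = [0.25 * i for i in range(9)]  # steel-fiber volume fraction, 0 to 2 %
reference = mean_curve(toy_model, data, "V_STF", grid)

fluctuation = {}
for b in (20, 40, 80, 100, 120):
    sample = [rng.choice(data) for _ in range(b)]  # B background rows, with replacement
    curve = mean_curve(toy_model, sample, "V_STF", grid)
    fluctuation[b] = max_relative_fluctuation(curve, reference)
```

Plotting `fluctuation` against B gives a convergence curve of the same kind as the one summarized above. In practice, the check would be repeated over several seeds before fixing B.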

Figure A4. Convergence analysis of the number of Monte Carlo samples.

Appendix A.7. Robustness Analysis of the Synergy-Gain Surface

A further local shape-robustness analysis was conducted for the σc synergy-gain surface using B = 100, as adopted in the main text. With the trained model, the combined train+test dataset, the two-dimensional grid range (0–2% × 0–2%), and the grid resolution (Δs = Δp = 0.125%) kept unchanged, pairwise comparisons were performed among the synergy-gain surfaces obtained with B = 80, 100, and 120.

The results show that the overall shapes of the synergy-gain surfaces across varying B values are highly consistent (see Table A8). The Pearson correlation coefficients are all above 0.9989, and the Spearman rank correlation coefficients are all above 0.9965. This indicates that the synergy pattern identified in the main text is not sensitive to small variations in B around 100.

Combined with the convex-hull-based data-support domain shown in Figure 9, the adopted 0–2% grid is broadly consistent with the main supported region of the current σc dataset. Therefore, using B = 100 in the main text provides a good balance between computational efficiency and map robustness. Further results on the shape robustness of the synergy-gain surface under different Monte Carlo sample sizes are presented in Table A9.

Table A8. Local shape-robustness check of the σc synergy-gain surface around the adopted Monte Carlo sample size (B = 100).

Pair of B Values	Surface Correlation, Pearson r	Spearman Rank Correlation, ρ
80 vs. 100	0.9995 ± 0.0004	0.9974 ± 0.0010
100 vs. 120	0.9996 ± 0.0004	0.9980 ± 0.0008
80 vs. 120	0.9989 ± 0.0010	0.9965 ± 0.0018

Note: The comparison was performed under the same trained model, combined train+test dataset, grid range (0–2% × 0–2%), and grid resolution (17 × 17, Δs = Δp = 0.125%). Only the Monte Carlo sample size B was varied.

Table A9. Shape robustness of the σc synergy-gain surfaces around the adopted Monte Carlo sample size (B = 100).

Pair of B Values	Surface Correlation, Pearson r	Spearman Rank Correlation, ρ	Positive-Window IoU
80 vs. 100	0.9995 ± 0.0004	0.9974 ± 0.0010	0.3842 ± 0.0550
100 vs. 120	0.9996 ± 0.0004	0.9980 ± 0.0008	0.3703 ± 0.0571
80 vs. 120	0.9989 ± 0.0010	0.9965 ± 0.0018	0.3721 ± 0.0646

Note: The comparison was conducted using the same trained model, combined train+test dataset (N=397), grid range (0–2% × 0–2%), and grid resolution (17 × 17, Δs = Δp = 0.125%). Only the Monte Carlo sample size B was changed. In the present σc dataset, the convex-hull-based support domain coincided with the adopted full grid; therefore, the full-grid and support-domain statistics were numerically identical.
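The three agreement measures in Table A9 can be illustrated with a minimal sketch that compares two flattened synergy-gain surfaces. It assumes that the positive window is the set of grid cells with Δ > 0 and that IoU is the ratio of shared to combined positive cells; the function names and the two toy surfaces are hypothetical.

```python
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def positive_window_iou(x: List[float], y: List[float], threshold: float = 0.0) -> float:
    """Intersection over union of the cells whose gain exceeds the threshold."""
    a = {i for i, v in enumerate(x) if v > threshold}
    b = {i for i, v in enumerate(y) if v > threshold}
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy flattened surfaces: two near-zero cells change sign between them.
surface_a = [-1.0, -0.2, 0.1, 0.5, 2.0]
surface_b = [-0.9, 0.05, -0.1, 0.6, 1.9]
r = pearson(surface_a, surface_b)
rho = spearman(surface_a, surface_b)
iou = positive_window_iou(surface_a, surface_b)
```

In the toy case, the two surfaces remain strongly correlated while the IoU drops to 0.5, because IoU depends only on the sign of each cell and is therefore sensitive to small changes in cells whose gain is close to zero.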

Appendix A.8. Empirical Support for SHAP Dependence Regions (Density, Discrete Levels, and Tail Coverage)

Table A10. Empirical support for SHAP dependence regions for peak strain (εc) using the combined train + test data.

Feature (unit)	K (Rounded Levels)	P5/P50/P95	Tail n (<P5/>P95)	Top-3 Levels (Share%)
V_STF (%)	19	0/0.8/1.5	0/8	1 (18.2%); 0 (18.2%); 0.5 (12.8%)
f_PVA (MPa)	4	1300/1560/1620	0/0	1560 (40.4%); 1600 (31.5%); 1300 (14.3%)
D_PVA (mm)	3	0.03/0.04/0.04	10/0	0.04 (83.3%); 0.03 (11.8%); 0.02 (4.9%)
V_PVA (%)	18	0/0.5/1.7	0/6	1 (20.7%); 0 (19.2%); 1.7 (11.8%)
D_STF (mm)	6	0.2/0.2/0.75	0/9	0.2 (53.7%); 0.6 (20.2%); 0.75 (8.9%)
L_STF (mm)	8	13/13/50	0/9	13 (53.7%); 36 (12.3%); 50 (8.9%)

Note: All features have n = 203. P5, P50, and P95 denote the 5th, 50th (median), and 95th percentiles. Tail n reports the number of observations below P5 and above P95. Top-3 levels show the most frequent rounded values and their sample shares.
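
The columns of Table A10 can be derived for a single feature as in the following sketch. It assumes linear-interpolation percentiles and rounding to two decimals when counting levels; the source does not state which percentile rule or rounding precision was used, and `support_summary` and the sample values are hypothetical.

```python
from collections import Counter


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def support_summary(values, decimals=2, top=3):
    """Levels, P5/P50/P95, tail counts, and top-level shares for one feature."""
    vals = sorted(values)
    p5, p50, p95 = (percentile(vals, q) for q in (5, 50, 95))
    levels = Counter(round(v, decimals) for v in vals)
    n = len(vals)
    return {
        "K": len(levels),
        "P5/P50/P95": (p5, p50, p95),
        # Strict inequalities: values equal to P5 or P95 are not in the tail.
        "tail_n": (sum(v < p5 for v in vals), sum(v > p95 for v in vals)),
        "top_levels": [(lvl, 100 * c / n) for lvl, c in levels.most_common(top)],
    }


# Hypothetical fiber-diameter records in mm (toy data, not the paper's database).
d_pva = [0.04] * 16 + [0.03] * 3 + [0.02]
summary = support_summary(d_pva)
```

When a single level dominates, P95 coincides with the largest level and the upper tail count is zero, which is the same behavior seen for the discrete fiber-geometry variables in the table.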

References

1. Zhou, Y.; Xiao, Y.; Gu, A.; Zhong, G.; Feng, S. Orthogonal experimental investigation of steel-PVA fiber-reinforced concrete and its uniaxial constitutive model. Constr. Build. Mater. 2019, 197, 615–625.
2. Liu, F.; Ding, W.; Qiao, Y. Experimental investigation on the tensile behavior of hybrid steel-PVA fiber reinforced concrete containing fly ash and slag powder. Constr. Build. Mater. 2020, 241, 118000.
3. Abbas, Y.M.; Hussain, L.A.; Khan, M.I. Constitutive Compressive Stress-Strain Behavior of Hybrid Steel-PVA High-Performance Fiber-Reinforced Concrete. J. Mater. Civ. Eng. 2022, 34, 04021401.
4. Wu, J.; Zhang, W.; Han, J.; Liu, Z.; Liu, J.; Huang, Y. Experimental Study on the Flexural Performance of Steel–Polyvinyl Alcohol Hybrid Fiber-Reinforced Concrete. Materials 2024, 17, 3099.
5. Kang, M.C.; Yoo, D.Y.; Gupta, R. Machine learning-based prediction for compressive and flexural strengths of steel fiber-reinforced concrete. Constr. Build. Mater. 2021, 266, 121117.
6. Al-Shamasneh, A.R.; Mahmoodzadeh, A.; Karim, F.K.; Saidani, T.; Alghamdi, A.; Alnahas, J.; Sulaiman, M. Application of machine learning techniques to predict the compressive strength of steel fiber reinforced concrete. Sci. Rep. 2026, 16, 1901.
7. Sofos, F.; Papakonstantinou, C.G.; Valasaki, M.; Karakasidis, T.E. Fiber-reinforced polymer confined concrete: Data-driven predictions of compressive strength utilizing machine learning techniques. Appl. Sci. 2022, 13, 567.
8. Cui, R.; Yang, H.; Li, J.; Xiao, Y.; Yao, G.; Yu, Y. Machine learning-based prediction of compressive strength in circular FRP-confined concrete columns. Front. Mater. 2024, 11, 1408670.
9. Zhang, J.; Huang, Y.; Wang, Y.; Ma, G. Multi-objective optimization of concrete mixture proportions using machine learning and metaheuristic algorithms. Constr. Build. Mater. 2020, 253, 119208.
10. Zhang, J.; Huang, Y.; Ma, G.; Nener, B. Mixture optimization for environmental, economical and mechanical objectives in silica fume concrete: A novel framework based on machine learning and a new meta-heuristic algorithm. Resour. Conserv. Recycl. 2021, 167, 105395.
11. Fan, M.; Li, Y.; Shen, J.; Jin, K.; Shi, J. Multi-objective optimization design of recycled aggregate concrete mixture proportions based on machine learning and NSGA-II algorithm. Adv. Eng. Softw. 2024, 192, 103631.
12. Li, W. Study on Mechanical Properties of Steel–PVA Hybrid Fiber Reinforced Cementitious Composites. Master’s Thesis, Guangxi University, Nanning, China, 2024. (In Chinese)
13. Wang, Z. Studies on Mechanical Performance of Polyvinyl Alcohol-Steel Hybrid Fiber Reinforced Cementitious Composites. Ph.D. Thesis, Tsinghua University, Beijing, China, 2016. (In Chinese)
14. Wang, Z.; Zhang, J.; Wang, Q. Mechanical properties and crack width control of hybrid fiber reinforced ductile cementitious composites. J. Build. Mater. 2018, 21, 216–221+227. (In Chinese)
15. Sun, L.; Hao, Q.; Zhao, J.; Wu, D.; Yang, F. Stress strain behavior of hybrid steel-PVA fiber reinforced cementitious composites under uniaxial compression. Constr. Build. Mater. 2018, 188, 349–360.
16. Liu, W.; Han, J. Experimental Investigation on Compressive Toughness of the PVA-Steel Hybrid Fiber Reinforced Cementitious Composites. Front. Mater. 2019, 6, 108.
17. Liu, W.; Xu, A.; Han, J. Experimental study on the compressive behavior of PVA–steel hybrid fiber reinforced cementitious composites. J. Heilongjiang Univ. Technol. (Compr. Ed.) 2024, 24, 121–128. (In Chinese)
18. Hao, Q. Research on the Constitutive Model of Steel–PVA Hybrid Fiber Reinforced Cementitious Composites. Master’s Thesis, Wenzhou University, Wenzhou, China, 2017. (In Chinese)
19. Zhong, G.; Zhou, Y.; Xiao, Y. Study on the uniaxial stress–strain curve of steel–polyvinyl alcohol hybrid fiber concrete. Eng. Mech. 2020, 37, 111–120. (In Chinese)
20. Liu, Y.N.; Li, H.; Li, H.W. Experimental study and constitutive modeling of fine steel fiber/PVA hybrid cement-based composites under uniaxial compression. Chin. Q. Mech. 2021, 42, 317–325. (In Chinese)
21. Kuang, W.; Tan, Z.; Li, Y.; Li, X.; Liu, F. Study on the compressive behavior of steel–PVA fiber high-strength manufactured-sand concrete. Guangzhou Archit. 2025, 53, 71–77. (In Chinese)
22. Hu, J. Study on the Mechanical Properties of Steel–Polyvinyl Alcohol Hybrid Fiber Reinforced Cementitious Composites. Master’s Thesis, Kunming University of Science and Technology, Kunming, China, 2023. (In Chinese)
23. Zhao, X. Study on the Mechanical Properties of PVA–Steel Fiber Reinforced Cement-Based Materials. Master’s Thesis, Harbin Institute of Technology, Harbin, China, 2020. (In Chinese)
24. Gao, C. Experimental Study on Mix Proportion and Material Properties of PVA–Steel Hybrid Fiber Reinforced Cementitious Composites. Master’s Thesis, Lanzhou University of Technology, Lanzhou, China, 2022. (In Chinese)
25. Sree, K.S.S.; Koniki, S. Mechanical Properties of PVA & Steel Hybrid Fiber Reinforced Concrete. E3S Web Conf. 2021, 309, 01174.
26. Ju, Y.; Zhu, M.; Zhang, X.; Wang, D. Influence of steel fiber and polyvinyl alcohol fiber on properties of high performance concrete. Struct. Concr. 2022, 23, 1687–1703.
27. Zhang, X.; Wang, B.; Ju, Y.; Wang, D.; Zhu, M. Experimental Study and New Model for Flexural Parameters of Steel–PVA High-Performance Fiber–Reinforced Concrete. J. Mater. Civ. Eng. 2023, 35, 04023016.
28. Sanchayan, S.; Foster, S.J. High temperature behaviour of hybrid steel–PVA fibre reinforced reactive powder concrete. Mater. Struct. 2016, 49, 769–782.
29. Xu, Q.; Jiang, X.; Zhang, Z.; Xu, C.; Zhang, J.; Zhou, B.; Hang, W.; Zheng, Z. Experimental study on residual mechanical properties of steel-PVA hybrid fiber high performance concrete after high temperature. Constr. Build. Mater. 2025, 458, 139735.
30. Wang, J. Experimental Study on the Effects of PVA Fiber and Steel Fiber on the Fracture Properties of High-Performance Fiber-Reinforced Cementitious Composites. Master’s Thesis, Beijing Jiaotong University, Beijing, China, 2011.
31. Zhang, P.; Deng, R.; Hu, J.; Wu, L.; Tao, Z. Flexural performance of steel–PVA hybrid fiber engineered cementitious composites. Bull. Chin. Ceram. Soc. 2023, 42, 3125–3134. (In Chinese)
32. Ding, Y. The Shock Compression Dynamic Performance Experimental Study of Steel and PVA Hybrid Fiber Reinforced Cement Matrix Composites. Master’s Thesis, South China University of Technology, Guangzhou, China, 2014. (In Chinese)
33. Chen, G.; Lv, M.; Zhu, H.; Zhang, J.; Zhang, L. Towards compressive and tensile strengths of hybrid steel and PVA fibre-reinforced cementitious composites: Experimental and analytical. Case Stud. Constr. Mater. 2025, 22, e04301.
34. Li, S.; Ding, D.; He, S.; Lu, J.; Xiong, Z.; Wu, N. Research on fracture performance of steel–PVA hybrid fiber high-strength manufactured-sand concrete. Build. Struct. 2025, 55, 47–54. (In Chinese)
35. Sun, J.; Zhao, Y.; Li, L.; Tian, L. Research on the influence of steel–PVA fiber volume fraction on the mechanical properties of concrete. Concrete 2025, 96–103. (In Chinese)
36. BS EN 1992-1-1:2004; Eurocode 2: Design of Concrete Structures—Part 1-1: General Rules and Rules for Buildings. British Standards Institution (BSI): London, UK, 2004.
37. Chen, P.; Liu, C.; Wang, Y. Size effect on peak axial strain and stress-strain behavior of concrete subjected to axial compression. Constr. Build. Mater. 2018, 188, 645–655.
38. Bolón-Canedo, V.; Remeseiro, B. Feature selection in image analysis: A survey. Artif. Intell. Rev. 2020, 53, 2905–2931.
39. Kabir, H.; Garg, N. Machine learning enabled orthogonal camera goniometry for accurate and robust contact angle measurements. Sci. Rep. 2023, 13, 1497.
40. Kazemi, F.; Özyüksel Çiftçioğlu, A.; Shafighfard, T.; Asgarkhani, N.; Jankowski, R. RAGN-R: A multi-subject ensemble machine-learning method for estimating mechanical properties of advanced structural materials. Comput. Struct. 2025, 308, 107657.
41. Xiao, S.; Yang, J.; Liu, Z.; Yang, W.; He, J. Effects of steel fiber content on compressive properties and constitutive relation of ultra-high performance shotcrete (UHPSC). Buildings 2024, 14, 1503.

Figure 1. CDF comparison of compressive strength (σc) for train and test sets.

Figure 2. Technical roadmap of the five-stage machine learning framework for analyzing fiber-reinforced concrete performance.

Figure 3. Global feature importance ranking for σc prediction (train vs. test).

Figure 4. Fiber-related feature importance ranking for σc prediction.

Figure 5. SHAP dependence plots and marginal histograms of key steel-fiber and PVA-fiber variables for the σc task.

Figure 6. Single-fiber main-effect curves for σc under Monte Carlo marginalization (B = 100).

Figure 7. Bivariate partial dependence plot showing the interaction of steel-fiber and PVA-fiber volume fractions in the prediction of compressive strength.

Figure 8. Mean synergy-gain surface $\bar{Δ}$(s, p) for σc with the $\bar{Δ}$ = 0 boundary and the maximum point marked.

Figure 9. Overlay of the σc synergy boundary and the convex-hull-based data-support domain.

Figure 10. Global feature importance ranking for εc prediction (train vs. test).

Figure 11. Fiber-related feature importance ranking for εc prediction.

Figure 12. SHAP dependence plots of key variables for εc.

Figure 13. Mean synergy-gain surface and data-support overlay for εc: (a) mean synergy-gain heatmap; (b) overlay of the $\bar{Δ}$ = 0 boundary and the convex-hull-based data-support domain.

Figure 14. Overlay of the σc and εc synergy windows with dual-objective contours.

Figure 15. Pareto-front-based trade-off analysis between predicted compressive strength (σc) and peak strain (εc), including the training-set Pareto front, test-set re-evaluation, and representative candidate mixtures.

Table 1. Core literature sources and sample counts of the σc dataset.

No.	Literature Sources	Number of Specimens	Proportion of Dataset
1	Zhou et al. (2019) [1]	17	4.28%
2	Abbas et al. (2022) [3]	19	4.79%
3	Li (2024) [12]	18	4.53%
4	Wang (2016) [13]	20	5.04%
5	Sun et al. (2018) [15]	24	6.05%
6	Liu et al. (2019) [16]	19	4.79%
7	Liu et al. (2024) [17]	22	5.54%
8	Hao (2017) [18]	27	6.80%
9	Zhong et al. (2020) [19]	17	4.28%
10	Zhao (2020) [23]	16	4.03%
11	Gao (2022) [24]	36	9.07%
12	Ju et al. (2022) [26]	17	4.28%
13	Zhang et al. (2023) [27]	16	4.03%
14	Wang (2011) [30]	24	6.05%
15	Chen et al. (2025) [33]	14	3.53%
16	Sun et al. (2025) [35]	25	6.30%

Note: The full database contains 26 literature sources, with those contributing less than 3.5% not listed.

Table 2. Definition of input and target variables.

Feature Category	Abbreviation	Physical Meaning (Unit)
Cementitious material	FA	Fly Ash Content (%)
Binary indicator (0/1)	FA_zero	Fly Ash Addition Marker
Cementitious material	SF	Silica Fume Content (%)
Binary indicator (0/1)	SF_zero	Silica Fume Addition Marker
Mix-proportion parameter	W/B	Water to Binder Ratio (-)
Mix-proportion parameter	S/B	Sand to Binder Ratio (-)
Chemical admixture	SP	Superplasticizer Content (%)
Binary indicator (0/1)	SP_zero	Superplasticizer Addition Marker
Steel-fiber parameter	D_STF	Steel Fiber Diameter (mm)
Steel-fiber parameter	L_STF	Steel Fiber Length (mm)
Steel-fiber parameter	f_STF	Steel Fiber Tensile Strength (MPa)
Steel-fiber parameter	E_STF	Steel Fiber Elastic Modulus (GPa)
Steel-fiber parameter	V_STF	Steel Fiber Volume Fraction (%)
Binary indicator (0/1)	STF_zero	Steel Fiber Addition Marker
PVA-fiber parameter	D_PVA	PVA Fiber Diameter (mm)
PVA-fiber parameter	L_PVA	PVA Fiber Length (mm)
PVA-fiber parameter	f_PVA	PVA Fiber Tensile Strength (MPa)
PVA-fiber parameter	E_PVA	PVA Fiber Elastic Modulus (GPa)
PVA-fiber parameter	V_PVA	PVA Fiber Volume Fraction (%)
Binary indicator (0/1)	PVA_zero	PVA Fiber Addition Marker
Target variable	σc	Compressive Strength (MPa)
Target variable	εc	Peak Strain (%)

Note: 1. The percentages of FA and SF are calculated based on the mass of cement; 2. The volume fractions of V_STF and V_PVA are determined by the total volume of concrete; 3. σc denotes the target compressive strength; 4. εc denotes the target peak strain.

Table 3. Conversion coefficients for specimen-size/shape normalization.

No.	Specimen Type	Specimen Dimensions	Conversion Coefficient
1	Cube	70.7 mm	0.95
2	Cube	100 mm	0.97
3	Cube	150 mm	1.00
4	Cylinder	100 (d) × 200 (h) mm	1.00
5	Cylinder	150 (d) × 300 (h) mm	1.05

Note: Since no explicit size-effect conversion standard is available for peak strain, this study used the same conversion coefficients as those listed for compressive strength in this table to approximately normalize the peak-strain records [37].
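
Applying the coefficients in Table 3 amounts to a lookup followed by a multiplication. The sketch below assumes that the normalized value is the measured value multiplied by the coefficient, which the table does not state explicitly; the `normalize` helper, the dictionary keys, and the 60 MPa example are hypothetical.

```python
# Conversion coefficients copied from Table 3; keys are (shape, size label in mm).
COEFF = {
    ("cube", "70.7"): 0.95,
    ("cube", "100"): 0.97,
    ("cube", "150"): 1.00,
    ("cylinder", "100x200"): 1.00,
    ("cylinder", "150x300"): 1.05,
}


def normalize(value, shape, size):
    """Assumed rule: normalized value = measured value * coefficient."""
    return value * COEFF[(shape, size)]


# Hypothetical record: 60 MPa measured on a 100 mm cube.
sigma_std = normalize(60.0, "cube", "100")
```

Under this assumption the 60 MPa reading is reduced to about 58.2 MPa. Per the table note, the same call would be used for a peak-strain record.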

Table 4. Statistical description of key variables in the database.

Parameter	Sample Size	Maximum	Minimum	Mean	Median
FA (%)	397	0.700	0.000	0.276	0.200
SF (%)	397	0.200	0.000	0.044	0.000
W/B	397	0.550	0.176	0.342	0.315
S/B	397	2.240	0.200	0.961	1.000
SP (%)	397	0.050	0.000	0.010	0.008
D_STF (mm)	397	0.820	0.075	0.326	0.200
L_STF (mm)	397	58.000	13.000	20.713	13.000
f_STF (MPa)	397	3100.000	600.000	2055.466	2000.000
E_STF (GPa)	397	220.000	180.000	204.345	200.000
V_STF (%)	397	2.000	0.000	0.739	0.800
D_PVA (mm)	397	0.060	0.015	0.037	0.040
L_PVA (mm)	397	12.000	8.000	11.567	12.000
f_PVA (MPa)	397	1850.000	800.000	1552.217	1560.000
E_PVA (GPa)	397	43.000	29.000	39.247	40.000
V_PVA (%)	397	2.000	0.000	0.756	0.500
σc (MPa)	397	173.084	14.457	54.473	49.305
εc (%)	203	1.234	0.169	0.441	0.356

Note: Sample size is reported as number of records. All specimens were subjected to standard curing for 28 days.

Table 5. Performance comparison of candidate models for σc prediction under the internal-validation scenario.

Model	R²	MAE (MPa)	RMSE (MPa)
Multiple Linear Regression	0.9121	5.9924	7.8404
Random Forest	0.9703	2.9755	4.5608
Extra Trees	0.9676	3.1663	4.7587
XGBoost	0.9748	2.8136	4.2024
LightGBM	0.9783	2.8633	3.8940
CatBoost	0.9737	2.7409	4.2857

Table 6. Test-set performance comparison of candidate models for εc prediction.

Modeling Strategy (Test Set)	R²	MAE	RMSE
Baseline CatBoost	0.9575	0.0265	0.0399
Transfer-learning model	0.9650	0.0291	0.0363
Bayesian-optimized CatBoost	0.9659	0.0218	0.0358
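
The three metrics reported in Tables 5 and 6 can be computed from paired observations and predictions as sketched below. The sketch assumes the usual definitions of R², MAE, and RMSE, which are not restated here; `regression_metrics` and the four sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Coefficient of determination, mean absolute error, and RMSE."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "R2": 1 - ss_res / ss_tot,
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


# Hypothetical strengths in MPa (toy values, not the paper's test set).
y_true = [40.0, 50.0, 60.0, 70.0]
y_pred = [42.0, 49.0, 61.0, 66.0]
m = regression_metrics(y_true, y_pred)
```

Because RMSE squares the residuals, the single 4 MPa error in this example raises RMSE more than MAE. This is one way a model can lead on R² and RMSE while another leads on MAE, as LightGBM and CatBoost do in Table 5.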

Table 7. Quantitative summary of the σc mean synergy-gain surface.

Metric	Symbol/Setting	Value
Grid range (Steel × PVA)	s, p	[0.0, 2.0]% × [0.0, 2.0]%
Grid resolution	N × N, Δ	17 × 17, Δs = Δp = 0.125%
Monte Carlo samples	B	100
Global maximum mean synergy gain	max $\bar{Δ}$	4.794912 MPa
Location of max mean synergy gain	(s,p)	(1.875%, 2.000%)
Positive-synergy coverage (area share)	P( $\bar{Δ}$ > 0)	1.7%
Mean synergy gain within positive region	E[ $\bar{Δ}$ | $\bar{Δ}$ > 0]	3.271014 MPa
Global mean synergy gain	E[ $\bar{Δ}$ ]	−2.117415 MPa

Note: Δ(s,p) = f(s,p) − f(s,0) − f(0,p) + f(0,0). $\bar{Δ}$(s,p) is the Monte Carlo average over B = 100 samples drawn from the combined train + test dataset (n = 397), while varying only the fiber volume fractions on the grid.
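
The definition in the note above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: `toy_model` is a hypothetical stand-in for the trained regressor, the background rows are synthetic, and sampling without replacement via `random.sample` is assumed because the sampling scheme is not specified.

```python
import random


def toy_model(row):
    """Stand-in for a trained regressor (hypothetical; NOT the paper's CatBoost)."""
    s, p = row["V_STF"], row["V_PVA"]
    return 60.0 - 40.0 * row["W_B"] + 3.0 * s + 1.0 * p + 0.5 * s * p


def predict_at(model, row, s, p):
    """Predict with the fiber volume fractions overridden, other features kept."""
    return model({**row, "V_STF": s, "V_PVA": p})


def mean_synergy_gain(model, background, s, p):
    """Average of f(s,p) - f(s,0) - f(0,p) + f(0,0) over the background rows."""
    total = 0.0
    for row in background:
        total += (predict_at(model, row, s, p) - predict_at(model, row, s, 0.0)
                  - predict_at(model, row, 0.0, p) + predict_at(model, row, 0.0, 0.0))
    return total / len(background)


random.seed(0)
# Synthetic dataset of 397 rows; only W/B varies here to keep the sketch short.
data = [{"W_B": random.uniform(0.2, 0.5), "V_STF": 0.0, "V_PVA": 0.0}
        for _ in range(397)]
B = 100
background = random.sample(data, B)  # B Monte Carlo background samples

steps = [i * 0.125 for i in range(17)]  # 0-2% grid, 17 x 17
surface = [[mean_synergy_gain(toy_model, background, s, p) for p in steps]
           for s in steps]
```

For this toy model the additive terms cancel and only the interaction term survives, so the surface equals 0.5·s·p. With a real tree-based model the cancellation is not exact for every background row, which is why the gain is averaged over B samples.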

Table 8. Discrete levels and dominant-level shares of key fiber-related variables in the εc interpretation (test set, n = 41).

Feature	Range (Test)	Levels (Test)	Top-1 Share	Top-2 Share
V_PVA (%)	0.00–1.70	11	22.0%	44.0%
D_PVA (mm)	0.020–0.040	4	80.5%	90.2%
V_STF (%)	0.00–2.00	10	24.4%	46.3%
f_PVA (MPa)	1300–1620	4	53.7%	85.4%
D_STF (mm)	0.200–0.820	6	56.1%	78.0%
L_STF (mm)	13–58	8	56.1%	70.7%

Note: Shares are reported for the test set to match the SHAP dependence plots (computed on X_test).

Table 9. Quantitative summary of the εc mean synergy-gain surface.

Statistic	Value
Monte Carlo samples (B)	100
Grid resolution	17 × 17 (0–2% × 0–2%)
Max mean synergy gain, max $\bar{Δ}$	0.0141629 (εc units)
Location of max $\bar{Δ}$	Steel = 0.38%, PVA = 1.62%
Area fraction with $\bar{Δ}$ > 0	17.99%
Mean $\bar{Δ}$ over $\bar{Δ}$ > 0 region	0.00412705
Mean $\bar{Δ}$ over all grid points	−0.0332134
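
The summary statistics in Tables 7 and 9 can be read off a gridded surface as sketched below. The sketch assumes that the positive-area share is the fraction of grid points with a positive mean gain; under that assumption, 52 of 289 points would give 17.99% and 5 of 289 would give 1.7%, consistent with the reported values. The function name and the 3 × 3 surface are hypothetical.

```python
def summarize_surface(surface, steps):
    """Max, argmax, positive-area share, and means of a gridded surface."""
    cells = [(v, steps[i], steps[j])
             for i, row in enumerate(surface) for j, v in enumerate(row)]
    values = [v for v, _, _ in cells]
    positive = [v for v in values if v > 0]
    vmax, s_max, p_max = max(cells)
    return {
        "max": vmax,
        "argmax": (s_max, p_max),
        "positive_share": len(positive) / len(values),
        "mean_positive": sum(positive) / len(positive) if positive else None,
        "mean_all": sum(values) / len(values),
    }


# Hypothetical 3 x 3 surface (rows: steel level, columns: PVA level).
steps = [0.0, 1.0, 2.0]
surface = [[0.0, -1.0, -2.0],
           [-1.0, 0.5, 1.5],
           [-2.0, -0.5, 3.0]]
stats = summarize_surface(surface, steps)
```

The toy surface has a positive maximum but a negative overall mean, the same qualitative pattern reported for both targets: positive synergy is confined to a window while most of the grid shows a negative mean gain.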

Share and Cite

MDPI and ACS Style

Liu, M.; Chen, J.; Zhou, S. Interpretable Machine Learning Reveals Synergy-Gain Windows and Dual-Objective Mix-Proportion Boundaries for Compressive Strength and Peak Strain in Hybrid Steel–PVA Fiber-Reinforced Concrete. Buildings 2026, 16, 1927. https://doi.org/10.3390/buildings16101927
