4.3. Gradient Boosted Regression Tree Ensemble Model
To model the complex nonlinear dependency between the material descriptors and the sound reduction index, a Gradient Boosted Regression Tree (GBRT) ensemble was employed. Specifically, the LSBoost (least-squares boosting) algorithm was adopted, which minimizes the mean squared error (MSE) by sequentially adding regression trees that approximate the residuals of the current ensemble. In this additive framework, each boosting iteration refines the predictor by correcting the errors remaining after the previous trees, thereby enabling the model to capture high-order nonlinearities and interaction effects without explicit feature engineering.
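The residual-fitting loop described above can be illustrated with a minimal sketch. This is not the implementation used in the study: it assumes depth-1 trees (a single split per tree), toy data, and hypothetical function names (`fit_stump`, `lsboost_fit`, `lsboost_predict`), and it is meant only to show how each cycle fits the current residuals and adds a shrunken correction.

```python
def fit_stump(X, r):
    """Least-squares stump: one split (feature, threshold) fitted to residuals r."""
    n, d = len(X), len(X[0])
    best = None
    for j in range(d):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = 0.5 * (lo + hi)
            left = [r[i] for i in range(n) if X[i][j] <= thr]
            right = [r[i] for i in range(n) if X[i][j] > thr]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, ml, mr)
    if best is None:  # no valid split: fall back to a constant leaf
        m = sum(r) / n
        return (0, float("inf"), m, m)
    return best[1:]


def stump_predict(stump, row):
    j, thr, ml, mr = stump
    return ml if row[j] <= thr else mr


def lsboost_fit(X, y, n_cycles=200, learn_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_cycles):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # Shrinkage: only a fraction learn_rate of the correction is added.
        pred = [pi + learn_rate * stump_predict(stump, row) for pi, row in zip(pred, X)]
    return f0, learn_rate, stumps


def lsboost_predict(model, X):
    f0, lr, stumps = model
    return [f0 + lr * sum(stump_predict(s, row) for s in stumps) for row in X]


def mse(y, p):
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)


# Toy data (hypothetical): two descriptors in [0, 1] and a nonlinear response.
X = [[i / 19, ((7 * i) % 20) / 19] for i in range(20)]
y = [30 + 10 * a * a + 5 * a * b for a, b in X]

model = lsboost_fit(X, y, n_cycles=200, learn_rate=0.1)
mse_const = mse(y, [sum(y) / len(y)] * len(y))
mse_boost = mse(y, lsboost_predict(model, X))
```

With a learning rate in (0, 1], each cycle cannot increase the training MSE, so `mse_boost` ends below the constant-predictor baseline `mse_const`. Deeper trees would be needed to represent interaction terms exactly.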
The predictive performance of LSBoost is strongly governed by its hyperparameters, which jointly control the bias–variance trade-off and the effective complexity of the ensemble. Therefore, a multi-dimensional grid search was performed over three key parameters:
Learning rate. The learning rate scales the contribution of each newly added tree (shrinkage). Smaller values generally improve stability and generalization but require more boosting iterations to reach optimal performance.
Maximum number of splits. This parameter controls the complexity of each regression tree (weak learner), with larger values permitting deeper, more expressive trees capable of capturing intricate nonlinear patterns.
Number of learning cycles. This is the total number of sequential trees in the ensemble, i.e., the number of boosting iterations.
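A grid search of this kind can be sketched as an exhaustive loop over all parameter combinations, recording RMSE, MAE, and MAPE for each. The example below is a sketch under stated assumptions: the grid values are hypothetical (not the ones used in the study), the helper names are illustrative, and the "model" is a deliberately trivial boosted constant predictor started from zero, chosen only because it makes the interplay between learning rate and number of cycles visible.

```python
import itertools
import math


def rmse(y, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def mape(y, p):
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, p)) / len(y)


def grid_search(grid, fit_predict, y_test):
    """Evaluate every combination in `grid`; return one record per combination."""
    names = list(grid)
    results = []
    for combo in itertools.product(*(grid[k] for k in names)):
        params = dict(zip(names, combo))
        pred = fit_predict(**params)
        results.append({**params,
                        "rmse": rmse(y_test, pred),
                        "mae": mae(y_test, pred),
                        "mape": mape(y_test, pred)})
    return results


def best_by(results, metric):
    """Configuration with the lowest value of the chosen error metric."""
    return min(results, key=lambda rec: rec[metric])


# Hypothetical toy data and toy learner (assumptions for illustration only).
y_train = [28.0, 30.0, 32.0, 29.0, 31.0]
y_test = [29.0, 31.0, 30.0]


def fit_predict(learn_rate, n_cycles):
    # Boosting with a constant weak learner, starting from 0:
    # each cycle adds learn_rate times the mean residual.
    target = sum(y_train) / len(y_train)
    f = 0.0
    for _ in range(n_cycles):
        f += learn_rate * (target - f)
    return [f] * len(y_test)


grid = {"learn_rate": [0.001, 0.01, 0.1], "n_cycles": [50, 500]}  # hypothetical values
results = grid_search(grid, fit_predict, y_test)
worst = max(results, key=lambda rec: rec["rmse"])
best_rmse = best_by(results, "rmse")
```

In this toy setting the smallest learning rate combined with the fewest cycles gives the worst RMSE, because the ensemble has not yet converged. Selecting by `"mae"` or `"mape"` instead of `"rmse"` uses the same records with a different key.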
The grid search revealed marked interactions between the learning rate and tree complexity. Performance trends were visualized using 3D surface plots over the plane spanned by the learning rate and the maximum number of splits, at the tree count corresponding to the globally optimal RMSE.
Learning-rate sensitivity: Very small learning rates produced severely degraded test performance, indicating underfitting: the shrinkage was too strong for the ensemble to converge to an adequate solution within the predefined number of learning cycles.
Effect of tree complexity: Increasing the maximum number of splits consistently improved predictive performance within the effective learning-rate regime, implying that deeper base learners were necessary to capture the nonlinear interactions embedded in the dataset.
The best-performing configurations are summarized below (Table 10).
The RMSE-optimal model follows the classical boosting principle of a small learning rate with many iterations, which typically yields stable generalization and reduces the impact of occasional large residuals (Figure 10).
In contrast, MAE and MAPE favored a more aggressive learning rate with shallower trees, suggesting that this setting better reduces the typical (average absolute/relative) deviation, even if it is not optimal for penalizing larger errors (Figure 10).
Table 11 compares the performance of the LSBoost models on the training, augmented training, test, and combined datasets.
The MAE-optimized LSBoost model yields smaller average absolute deviations across datasets, whereas the RMSE-optimized model generalizes better on the test set and reduces large-error risk, as reflected by a lower RMSE and higher $R^2$ (Table 11).
Overall, both selected LSBoost models demonstrate substantially improved predictive capability relative to traditional linear regression formulations, highlighting the importance of accounting for nonlinearities and interaction effects when modeling acoustic insulation indicators from material–property descriptors.
4.4. Symbolic Regression Modeling via Genetic Programming (GPTIPS)
To obtain an explicit analytical model for predicting the sound reduction index from the available material descriptors, multigene symbolic regression was performed using GPTIPS 2 [24,25,26]. Symbolic regression was selected because it can simultaneously capture strongly nonlinear relationships and variable interactions and return an interpretable closed-form expression, which is suitable for reporting and re-use in engineering practice.
Prior to model training, the input variables were rescaled by min–max normalization to map each predictor into the interval [0, 1]. Importantly, the normalization parameters were computed exclusively from the augmented training set, which served as the reference set, in order to avoid any information leakage from the test set. The column-wise bounds were computed, and the normalization was applied as (9):
$$\tilde{x}_{ij} = \frac{x_{ij} - x_j^{\min}}{x_j^{\max} - x_j^{\min} + \varepsilon}$$
where $x_j^{\min}$ and $x_j^{\max}$ denote the minimum and maximum values of the j-th predictor column, respectively, and the small constant $\varepsilon$ was added to the denominator to ensure numerical stability in the unlikely event that the range equals zero. This normalization step is particularly important in symbolic regression when exponential, logarithmic, and power operators are allowed because it reduces the risk of extreme intermediate values and improves the stability of the evolutionary search. This design ensures that the genetic programming process optimizes models on the augmented training data, while model performance is verified on the same independent test set applied to all models to quantify generalization.
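The leakage-free normalization described above can be sketched as two steps: fit the column bounds on the reference (augmented training) set only, then apply them unchanged to any other set. The value of the stabilizing constant and all data below are assumptions for illustration; the function names are hypothetical.

```python
EPS = 1e-12  # assumed value of the small stabilizing constant


def fit_minmax(X_ref):
    """Column-wise minima and maxima computed from the reference set only."""
    cols = list(zip(*X_ref))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_minmax(X, mins, maxs, eps=EPS):
    """Rescale each column with bounds fitted elsewhere (no refitting)."""
    return [[(v - lo) / (hi - lo + eps) for v, lo, hi in zip(row, mins, maxs)]
            for row in X]


# Hypothetical raw descriptors; the third column is constant on purpose
# to exercise the zero-range case that eps protects against.
X_train = [[10.0, 900.0, 0.5], [40.0, 1200.0, 0.5], [25.0, 1000.0, 0.5]]
X_test = [[50.0, 950.0, 0.5]]

mins, maxs = fit_minmax(X_train)
Xn_train = apply_minmax(X_train, mins, maxs)
Xn_test = apply_minmax(X_test, mins, maxs)
```

Because the bounds come from the training data, test values outside the training range map outside [0, 1] (here the first test descriptor exceeds 1). This is expected behavior rather than an error, although it means the model is being evaluated in extrapolation for that sample.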
All symbolic regression runs were conducted with reproducible stochastic behavior by fixing the random stream using rng('default'). The evolutionary search used a population size of 100 individuals and proceeded for 100 generations, repeated over 10 independent runs. Multiple runs were employed to reduce sensitivity to random initialization and to increase the likelihood of locating high-quality models in the highly multimodal search space typical of symbolic regression.
A multigene representation was adopted with a maximum of five genes. In this framework, each gene corresponds to a symbolic expression tree, and the final predictor is formed as a weighted combination of these gene outputs. This structure increases modeling flexibility while retaining interpretability, as each gene may represent a distinct physical effect or interaction term.
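The weighted combination of gene outputs can be illustrated as follows. The sketch assumes that the gene weights and a bias term are obtained by ordinary least squares on the training data, which is the usual approach in multigene regression. The two genes, the data, and the helper names are hypothetical and are not taken from the evolved models.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_gene_weights(genes, X, y):
    """Least-squares weights [bias, w1, ..., wG] via the normal equations."""
    G = [[1.0] + [g(row) for g in genes] for row in X]
    m = len(G[0])
    GtG = [[sum(r[a] * r[b] for r in G) for b in range(m)] for a in range(m)]
    Gty = [sum(r[a] * yi for r, yi in zip(G, y)) for a in range(m)]
    return solve(GtG, Gty)


def predict(genes, w, X):
    return [w[0] + sum(wi * g(row) for wi, g in zip(w[1:], genes)) for row in X]


# Two hypothetical genes (expression trees written as plain functions).
genes = [lambda x: x[0] ** 2, lambda x: x[0] * x[1]]

# Toy data generated from known weights so the fit can be checked.
X = [[a / 4, b / 4] for a in range(5) for b in range(5)]
y = [20.0 + 8.0 * x[0] ** 2 + 3.0 * x[0] * x[1] for x in X]

w = fit_gene_weights(genes, X, y)
y_hat = predict(genes, w, X)
```

Here the recovered weights are close to the generating values (20, 8, 3). In an evolutionary run the gene structures change from generation to generation, while the weights are refitted for each candidate.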
To limit model complexity and mitigate overfitting, the maximum tree depth was constrained to 3. This restriction acts as an explicit structural regularization mechanism, ensuring that evolved expressions remain compact and scientifically interpretable.
Selection was performed using tournament selection with tournament size 2, providing moderate selection pressure and preserving diversity. Additionally, Pareto-based selection pressure was introduced through (gp.selection.tournament.p_pareto = 0.7), which promotes a trade-off between model accuracy and complexity and supports the discovery of parsimonious solutions. Elitism was enabled by retaining the best 5% of individuals to prevent the loss of top solutions between generations.
The function set was selected to enable both physically plausible transformations and nonlinear interactions commonly observed in material–acoustic phenomena. The allowed operators included: times, minus, plus, rdivide (restricted division), square, add3 (sum of three terms), exp, log, mult3 (multiplicative combination of three terms), sqrt, cube, and power.
The grid search over the number of genes g and the maximum tree depth d shows strong underfitting on the test set in the simplest setting (g = 1, d = 1), where $R^2$ = 0.44 and RMSE = 6.37. Increasing the depth and/or the number of genes generally improves accuracy by capturing stronger nonlinearities (e.g., for g = 3, d = 3: $R^2$ = 0.91, RMSE = 2.54).
The best overall result is achieved at g = 4, d = 3 ($R^2$ = 0.993, RMSE = 0.69, MAE = 0.55, MAPE = 3.68%), indicating sufficient model capacity to represent the underlying relationship. Moving to g = 5 does not further improve the optimum, suggesting diminishing returns beyond g = 4 at depth 3 (Figure 11).
The Pareto front shows the trade-off between model complexity (x-axis) and training error $1 - R^2$ (Figure 12). Each point represents one GP-derived equation; the Pareto-optimal solutions forming the front are highlighted in green, meaning no other model is simultaneously simpler and more accurate on the training set. The red point marks the selected optimal training set solution. Models near the “knee” of the green front provide the best accuracy–interpretability balance and were therefore considered primary candidates for final selection and subsequent test set validation.
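Extracting such a front amounts to a non-dominance filter over (complexity, error) pairs. The sketch below uses the standard dominance rule (no worse in both criteria and strictly better in at least one), which is an assumption about how ties are handled; the model labels and values are hypothetical.

```python
def pareto_front(models):
    """Return the non-dominated models, sorted by complexity.

    models: list of (model_id, complexity, error) with error = 1 - R^2.
    A model is dominated if another one is no worse in both criteria
    and strictly better in at least one.
    """
    front = []
    for mid, c, e in models:
        dominated = any(
            (c2 <= c and e2 <= e) and (c2 < c or e2 < e)
            for _, c2, e2 in models
        )
        if not dominated:
            front.append((mid, c, e))
    return sorted(front, key=lambda m: m[1])


# Hypothetical candidates: (label, expression complexity, training error).
candidates = [
    ("A", 5, 0.30),
    ("B", 10, 0.10),
    ("C", 12, 0.12),  # dominated by B (more complex and less accurate)
    ("D", 20, 0.02),
    ("E", 25, 0.02),  # dominated by D (same error, more complex)
]

front = pareto_front(candidates)
front_ids = [mid for mid, _, _ in front]
```

Candidates near the knee of the resulting front (here "B") trade a small loss in accuracy for a large reduction in complexity.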
Figure 13 lists representative Pareto-optimal symbolic regression models obtained by GPTIPS on the training set, reporting each model’s ID, goodness-of-fit ($R^2$), and expression complexity together with the explicit analytical form. Subsequently, only seven models exhibiting comparable predictive accuracy (highlighted in red) were subjected to further analysis.
Among the high-$R^2$ Pareto candidates, ID = 103 and ID = 166 provide the strongest evidence of robust generalization because they simultaneously achieve a very high test accuracy ($R^2$ ≈ 0.993–0.994) and the lowest test errors (RMSE ≈ 0.65–0.69, MAPE ≈ 3.5–3.7%). In contrast, models such as ID = 567/815/820/925/928 show a clear train → test performance drop (higher RMSE/MAE/MAPE on test), indicating reduced transferability to unseen samples.
Robustness is defined as the consistency of $R^2$, RMSE, MAE, and MAPE values across different datasets (Figure 14).
Large discrepancies (e.g., excellent training performance but degraded test performance) indicate dataset sensitivity, suggesting potential overfitting and/or dependence on the underlying data distribution. Conversely, similar values across datasets indicate a stable performance and consistent generalization, i.e., higher robustness (Figure 14).
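For each candidate model i, four values of the same performance metric are available (10), evaluated on different datasets (Train–real, Train–SMOTE, Test, and All), e.g.,
$$R^2_{i,\mathrm{Train\text{-}real}},\quad R^2_{i,\mathrm{Train\text{-}SMOTE}},\quad R^2_{i,\mathrm{Test}},\quad R^2_{i,\mathrm{All}}$$
and analogously for RMSE, MAE, and MAPE.
For each metric, robustness is operationalized as the standard deviation across the four datasets $d \in \{\text{Train–real}, \text{Train–SMOTE}, \text{Test}, \text{All}\}$. For RMSE (analogously for MAE and MAPE), for model i (11),
$$\sigma_{\mathrm{RMSE},i} = \operatorname{std}_{d}\left(\mathrm{RMSE}_{i,d}\right)$$
A smaller $\sigma_{\mathrm{RMSE},i}$ indicates that RMSE remains similar across datasets, implying a more stable model. Likewise, for (12) and (13),
$$\sigma_{\mathrm{MAE},i} = \operatorname{std}_{d}\left(\mathrm{MAE}_{i,d}\right)$$
$$\sigma_{\mathrm{MAPE},i} = \operatorname{std}_{d}\left(\mathrm{MAPE}_{i,d}\right)$$
RMSE, MAE, and MAPE are error measures (lower is better), whereas $R^2$ is a goodness-of-fit measure (higher is better). To treat all criteria consistently in a “lower-is-better” form, $R^2$ is transformed into an error-like quantity (14):
$$E_{i,d} = 1 - R^2_{i,d}$$
Robustness with respect to $R^2$ can then be quantified as (15):
$$\sigma_{R^2,i} = \operatorname{std}_{d}\left(E_{i,d}\right)$$
Because the metrics have different numerical scales, the standard deviations for each metric $m \in \{R^2, \mathrm{RMSE}, \mathrm{MAE}, \mathrm{MAPE}\}$ of model i are normalized by the mean dispersion across all N analyzed models (16):
$$\tilde{\sigma}_{m,i} = \frac{\sigma_{m,i}}{\frac{1}{N}\sum_{k=1}^{N}\sigma_{m,k}}$$
A single robustness score is then defined as the sum (17):
$$S_i = \sum_{m} \tilde{\sigma}_{m,i}$$
Lower $S_i$ values indicate that all metrics remain more consistent across datasets, i.e., a more robust model. The most robust model (Figure 14) is selected as (18):
$$i^{*} = \arg\min_{i} S_i$$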
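The robustness score can be computed in a few lines once the four dataset-wise values of each metric are available. The sketch below follows the steps described above under two assumptions that the text does not fix: the sample standard deviation is used, and a metric with zero mean dispersion contributes nothing to the score. The model labels and metric values are hypothetical.

```python
import statistics

# Order of the four values per metric: Train-real, Train-SMOTE, Test, All.
METRICS = ("r2", "rmse", "mae", "mape")


def dispersions(model_metrics):
    """Standard deviation of each metric across the four datasets."""
    out = {}
    for m in METRICS:
        vals = model_metrics[m]
        if m == "r2":
            vals = [1.0 - v for v in vals]  # error-like form, lower is better
        out[m] = statistics.stdev(vals)  # sample std (assumption)
    return out


def robustness_scores(models):
    """Sum over metrics of dispersion divided by the mean dispersion across models."""
    sigma = {mid: dispersions(mm) for mid, mm in models.items()}
    scores = {mid: 0.0 for mid in models}
    for m in METRICS:
        mean_sigma = sum(sigma[mid][m] for mid in models) / len(models)
        for mid in models:
            scores[mid] += sigma[mid][m] / mean_sigma if mean_sigma > 0 else 0.0
    return scores


# Hypothetical metric values for two hypothetical models.
models = {
    "M1": {"r2": [0.990, 0.992, 0.993, 0.991], "rmse": [0.9, 0.8, 0.7, 0.8],
           "mae": [0.70, 0.60, 0.55, 0.62], "mape": [4.0, 3.8, 3.7, 3.8]},
    "M2": {"r2": [0.995, 0.994, 0.950, 0.980], "rmse": [0.6, 0.7, 1.9, 1.1],
           "mae": [0.50, 0.55, 1.40, 0.80], "mape": [3.0, 3.2, 8.0, 4.7]},
}

scores = robustness_scores(models)
most_robust = min(scores, key=scores.get)
```

Because each metric is divided by its mean dispersion across models, the normalized values of one metric sum to the number of models, so the scores are relative to the analyzed candidate set rather than absolute.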
Figure 15 illustrates the robustness of Pareto-optimal MGGP models by quantifying performance variability across the training, augmented training, and independent test datasets. Models with minimal cross-dataset error variation are identified as robust, indicating stable generalization under limited experimental data. This analysis complements accuracy-based Pareto selection and supports the identification of MGGP models that are both accurate and reliable for the practical prediction of airborne sound insulation.
A mean-based Shapley value analysis [34,35] was applied to assess the global importance of input variables in the optimal (Model ID 166) MGGP symbolic regression model. The analysis was performed on the complete normalized dataset using the mean of all inputs as the baseline reference. Shapley values were computed by averaging the marginal contribution of each variable across all possible combinations of predictors, ensuring a fair attribution of both individual and interaction effects (Figure 16).
The results show (Figure 16) that SG is the most influential variable, accounting for approximately 55% of the total absolute Shapley importance, followed by PU_ADH with about 27%, while GC exhibits the smallest contribution (approximately 18%). Overall, the analysis confirms that the model predictions are primarily driven by SG, with secondary influence from PU_ADH and a weaker but positive contribution from GC.
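With only three inputs, exact Shapley values can be obtained by enumerating all feature subsets. The sketch below assumes that features outside a subset are replaced by their dataset mean (the mean baseline) and that global importance is the mean absolute Shapley value expressed as a percentage. The model `f` is a hypothetical stand-in, not the Model ID 166 equation, and the data are illustrative.

```python
import itertools
import math


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a baseline input."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual value, the rest the baseline.
        z = [x[k] if k in subset else baseline[k] for k in range(n)]
        return f(z)

    phi = [0.0] * n
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            w = (math.factorial(size) * math.factorial(n - size - 1)
                 / math.factorial(n))
            for combo in itertools.combinations(others, size):
                s = set(combo)
                phi[j] += w * (value(s | {j}) - value(s))
    return phi


def global_importance(f, X):
    """Mean absolute Shapley value per feature, as a percentage of the total."""
    n = len(X[0])
    baseline = [sum(row[k] for row in X) / len(X) for k in range(n)]
    mean_abs = [0.0] * n
    for x in X:
        for k, p in enumerate(shapley_values(f, x, baseline)):
            mean_abs[k] += abs(p) / len(X)
    total = sum(mean_abs)
    return [100.0 * m / total for m in mean_abs], baseline


def f(z):
    # Hypothetical model with one interaction term (illustration only).
    return 10.0 + 6.0 * z[1] + 3.0 * z[2] + 2.0 * z[0] * z[1]


X = [[0.1, 0.2, 0.9], [0.5, 0.5, 0.5], [0.9, 0.8, 0.1], [0.3, 0.7, 0.6]]
importance, baseline = global_importance(f, X)
phi0 = shapley_values(f, X[0], baseline)
```

A useful check is the efficiency property: for any sample, the Shapley values sum to the difference between the prediction at that sample and the prediction at the baseline.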
Partial dependence (PDP) and individual conditional expectation (ICE) plots (Figure 17) were produced to interpret the MGGP symbolic model using the entire normalized dataset (inputs scaled to [0, 1]).
For each feature (GC, SG, PU_ADH), the feature was varied over a uniform grid in [0, 1] while all other inputs were kept at their observed values, and predictions were recomputed. ICE curves show the sample-wise response to the feature change, while the PDP (bold line) is the mean of the ICE curves, representing the average marginal effect. The shaded band corresponds to the 10–90% prediction range, summarizing response heterogeneity and interaction effects.
GC shows a relatively weak average influence (small PDP change). SG and PU_ADH show stronger positive trends, with increasing ICE spread at higher values, indicating that their impact on the predicted sound reduction index is nonlinear and interaction-dependent (the effect varies across samples).
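The PDP/ICE procedure described above can be sketched directly: one ICE curve per sample, the PDP as their pointwise mean, and a 10–90% band from pointwise quantiles. The model `f`, the data, and the linear-interpolation quantile rule are assumptions made for illustration.

```python
import math


def ice_curves(f, X, feature, grid):
    """One curve per sample: vary `feature` over `grid`, keep other inputs fixed."""
    curves = []
    for row in X:
        curve = []
        for g in grid:
            z = list(row)
            z[feature] = g
            curve.append(f(z))
        curves.append(curve)
    return curves


def pdp(curves):
    """Partial dependence: pointwise mean of the ICE curves."""
    return [sum(col) / len(col) for col in zip(*curves)]


def quantile(values, q):
    """Quantile by linear interpolation between order statistics (assumed rule)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def band(curves, q_lo=0.10, q_hi=0.90):
    cols = list(zip(*curves))
    return [quantile(c, q_lo) for c in cols], [quantile(c, q_hi) for c in cols]


def f(z):
    # Hypothetical model: feature 0 is additive, features 1 and 2 interact.
    return 10.0 + 2.0 * z[0] + 5.0 * z[1] * z[2]


X = [[0.1, 0.2, 0.9], [0.5, 0.5, 0.5], [0.9, 0.8, 0.1], [0.3, 0.7, 0.6]]
grid = [k / 10 for k in range(11)]

curves0 = ice_curves(f, X, 0, grid)   # additive feature: parallel ICE curves
curves1 = ice_curves(f, X, 1, grid)   # interacting feature: slopes differ
pd0 = pdp(curves0)
lower1, upper1 = band(curves1)
width_start = upper1[0] - lower1[0]
width_end = upper1[-1] - lower1[-1]
```

For the additive feature the ICE curves are parallel and the PDP rises by exactly the feature's coefficient over [0, 1]. For the interacting feature the band widens along the grid, which is the pattern interpreted as interaction dependence.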
The final equation of the optimal refined model (Model ID 166), expressed in a simplified form, is given as follows (19):
where Equations (20)–(22) define the normalized input variables as
Here, GC, SG, and PU_ADH denote the original (non-normalized) input variables. Specifically, GC represents the granulometric composition expressed as the percentage of the 0–2 mm fraction, SG denotes the material density (kg/m³), and PU_ADH is the polyurethane adhesive dosage (g/mm²). A small constant was added to avoid a zero denominator. The structure of the MGGP model is presented in Figure 18.
The symbolic regression equation was then evaluated using these normalized variables, which allows the direct use of laboratory-scale measurements while preserving the model structure learned on the normalized domain.
The comparative performance summary presented in Table 12 clearly highlights substantial differences in predictive accuracy among the investigated modeling approaches. Classical regression formulations (linear, interaction, logarithmic, and exponential models) exhibit a limited predictive capability, reflected by relatively low coefficients of determination and high error levels. Although the quadratic regression model improves accuracy by introducing curvature effects, its performance remains constrained by the fixed functional structure and limited ability to capture complex interactions among material parameters.
Tree-based ensemble methods (Random Forest and Gradient Boosted Trees) achieve notably better predictive accuracy by effectively modeling localized nonlinearities and variable interactions. However, despite their improved performance relative to regression-based models, their prediction errors remain higher than those of the symbolic regression approach, and their black-box nature limits interpretability and direct engineering applicability.
The multigene symbolic regression model (GPTIPS, ID 166) clearly outperforms all alternative models, achieving the highest coefficient of determination and the lowest RMSE, MAE, and MAPE values (Table 12). This result confirms that the symbolic model provides the most accurate and stable representation of the underlying relationship between material descriptors and the sound reduction index. Importantly, this superior accuracy is achieved while retaining an explicit analytical formulation, offering a unique combination of predictive performance and interpretability that is not attainable with conventional regression or ensemble learning techniques.
The residual statistics (Table 13) indicate stable and unbiased predictive behavior across all datasets. The mean residuals for the training, augmented training, and test sets are close to zero (−0.051, −3.3 × 10⁻⁵, and 0.050, respectively), confirming the absence of systematic overestimation or underestimation of the weighted airborne sound reduction index (Figure 19). A clear reduction in residual dispersion is observed from the original training set to the augmented training and test sets. The RMSE decreases from 1.001 for the training data to 0.764 for the augmented training data and further to 0.646 for the independent test data, indicating improved numerical stability and generalization following regression-oriented data augmentation, without evidence of overfitting. The Kolmogorov–Smirnov test does not reject residual normality for any dataset (p = 0.2597, 0.5935, and 0.8829 for training, augmented training, and test sets, respectively), suggesting that the prediction errors are predominantly random. Overall, the residual analysis demonstrates that the proposed framework yields statistically consistent and robust predictions under limited experimental data conditions.
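Residual diagnostics of this kind can be sketched as a mean, an RMSE, and a one-sample Kolmogorov–Smirnov comparison against a normal distribution. The sketch below is not the procedure used in the study: it assumes the normal parameters are estimated from the residuals themselves (which makes the plain KS p-value conservative) and uses an asymptotic series approximation for the p-value. The residuals are synthetic normal quantiles, not study data.

```python
import math
from statistics import NormalDist, mean, stdev


def residual_summary(residuals):
    """Mean residual (bias) and RMSE (dispersion)."""
    bias = mean(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return bias, rmse


def ks_normal(residuals):
    """KS statistic against N(mean, std) with an asymptotic p-value."""
    x = sorted(residuals)
    n = len(x)
    dist = NormalDist(mean(x), stdev(x))
    d = 0.0
    for i, v in enumerate(x):
        cdf = dist.cdf(v)
        # Largest gap between the empirical and theoretical CDFs.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Synthetic residuals: evenly spaced standard-normal quantiles (illustration only).
residuals = [NormalDist().inv_cdf((i + 0.5) / 40) for i in range(40)]

bias, rmse = residual_summary(residuals)
d_stat, p_value = ks_normal(residuals)
```

A large p-value means that normality is not rejected, which is weaker than a confirmation of normality, particularly for small samples.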
To support practical deployment, a MATLAB-based graphical user interface (GUI) was implemented to enable the direct input of laboratory-measured parameters and real-time evaluation of the proposed MGGP model (Figure 20).