1. Introduction
The vadose zone functions as a central regulator of global water, energy, and solute exchange, controlling infiltration, evaporation, groundwater recharge, plant water uptake, and contaminant transport [
1,
2,
3]. These processes are fundamentally governed by soil hydraulic properties that determine the movement and retention of water under unsaturated conditions. Among them, the soil water retention curve (SWRC), which describes the relationship between volumetric water content (
θ) and soil water matric potential or pressure head (
h), constitutes a cornerstone of vadose-zone hydrology [
2,
4,
5]. The SWRC not only characterizes moisture storage but also underpins predictive models of water flow and solute dynamics in soils.
Accurate representation of the SWRC is indispensable for simulations based on the Richards equation, which forms the theoretical backbone of most vadose-zone models [
6]. Because unsaturated hydraulic conductivity is typically derived from the SWRC through nonlinear constitutive relationships, small biases in
θ(
h) estimation can propagate nonlinearly, and often exponentially, into conductivity predictions [
4,
7]. These propagated uncertainties directly affect simulated infiltration rates, drainage behavior, solute transport, and the timing and magnitude of vadose-zone fluxes [
6,
8]. Consequently, uncertainty in the SWRC is widely recognized as a dominant source of modeling error at plot, catchment, and regional scales [
9,
10]. Improving the reliability and physical realism of SWRC estimation, therefore, remains a critical challenge in hydrological science.
Direct measurement of the SWRC is labor-intensive, time-consuming, and costly, particularly when extensive spatial coverage is required. Conventional laboratory methods require undisturbed soil cores and specialized equipment, yet often fail to capture field-scale heterogeneity and dynamic structural evolution [
9,
11]. As a result, pedotransfer functions (PTFs) have emerged as practical tools for estimating hydraulic properties from more readily available soil attributes such as texture, bulk density (
ρb), and organic matter (OM) content [
12,
13]. By linking basic soil properties to hydraulic parameters, PTFs make it possible to supply hydrological and land-surface models with soil hydraulic properties without exhaustive measurements. PTF development has progressed from empirical regression methods toward increasingly data-driven techniques. Early PTFs established empirical relationships between easily measured soil properties and either points on the water retention curve or the parameters of analytical models such as the van Genuchten or Brooks–Corey functions [
4,
6]. Although computationally efficient and convenient to implement, parametric PTFs have limited predictive power because they assume a fixed functional form and their input variables represent soil structure poorly [
4,
8].
Many natural soils have complex pore-size distributions and heterogeneous structural patterns associated with aggregation, root channel formation, biological activity, and other factors. High-resolution image analysis shows that pore connectivity and 3D structure play an important role in hydraulic behavior [
5,
14,
15]. These features, however, are poorly represented by simple texture-based predictors. Classical parametric PTFs therefore tend to show systematic errors when extrapolated or applied to structurally complex soils [
8,
16].
A major limitation of traditional PTFs is their poor ability to capture the dynamics of the soil structural state. The vadose zone is far from static: tillage, compaction by machinery, grazing pressure, shrink–swell cycles, and other processes continually modify
ρb, porosity (
n), and pore connectivity, thus changing the water retention and unsaturated hydraulic conductivity functions [
14,
17,
18,
19]. Several widely used PTFs ignore structural indicators or do not adjust the SWRC shape in response to variation in
ρb and
n. Tian et al. [
7] showed that many PTFs returned nearly identical retention functions regardless of compaction level, even though the corresponding experiments showed marked SWRC flattening with increasing
ρb. Such insensitivity to dynamic structural changes limits the usefulness of classical PTFs in agricultural applications and makes them less reliable in simulating water dynamics.
Recently, machine learning (ML) techniques have become increasingly popular for constructing PTFs because they can detect nonlinear dependencies without a predefined functional form. Approaches such as artificial neural networks (ANN), support vector machines (SVM), random forests (RF), and extreme gradient boosting (XGB) have shown higher predictive ability than traditional regression techniques [
9,
13]. Large databases with information about soil hydraulics, e.g., UNSODA 2.0, allow for the training and evaluation of PTFs based on ML algorithms. For instance, Rastgou et al. [
20] achieved high predictability in SWRC estimation using optimized deep neural networks, whereas Pham et al. [
9] reported that XGB modeled complex retention behavior successfully.
Recent work extends beyond single-model approaches. Ensemble learning, hybrid metaheuristic, and geographically informed ML models have yielded more robust results [
16]. For instance, Sun et al. [
21] used stacked generalization to improve SWRC prediction accuracy. Taherdangkoo et al. [
22] combined PSO–GA optimization with an XGBoost model to predict the SWRC of compacted clay soils over an extremely wide suction range. Moreover, Niu et al. [
23] considered geospatial heterogeneity while using machine learning (ML) algorithms for improving regional mapping of hydraulic properties. These innovations indicate a clear tendency towards developing flexible and geographically adaptive solutions.
However, two issues still hamper the wide adoption of ML-based PTFs in hydrology. The first is high sensitivity to hyperparameter configuration: most existing studies rely on manual tuning or grid search, which may yield suboptimal configurations [
12,
13]. Comparisons of hyperparameter optimization frameworks have shown large differences in search speed and solution quality [
18]; however, advanced methods such as Bayesian optimization, as implemented in the Optuna framework, are seldom used for SWRC models. The second issue is interpretability. Demonstrating that a model is physically consistent requires understanding how specific soil properties influence
θ(
h) predictions. Emerging explainable ML techniques, including permutation importance and Shapley Additive Explanations (SHAP), provide quantitative insight into feature contributions, but systematic application of these tools across multiple input configurations remains limited.
Against this background, this study proposes an accurate yet interpretable XGB-based methodology for estimating SWRCs using the UNSODA 2.0 dataset. In contrast to traditional parametric methods such as the van Genuchten–Mualem model, which assume a specific continuous function, the methodology predicts θ values at designated matric potential points. This approach may be termed high-density pointwise estimation of SWRCs.
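To make the contrast between a parametric curve and pointwise estimates concrete, the following minimal sketch evaluates the van Genuchten retention function at a few suctions. The parameter values are illustrative placeholders, not values fitted to UNSODA, and the function and variable names are chosen here for illustration only.

```python
def van_genuchten_theta(h, theta_r, theta_s, alpha, n):
    """Volumetric water content from the van Genuchten retention function.

    h is suction in cm (taken as positive), alpha is in 1/cm, and n > 1.
    """
    m = 1.0 - 1.0 / n  # Mualem constraint linking m to n
    effective_saturation = (1.0 + (alpha * abs(h)) ** n) ** (-m)
    return theta_r + (theta_s - theta_r) * effective_saturation


# Illustrative (not fitted) parameters for a loam-like soil.
params = dict(theta_r=0.08, theta_s=0.43, alpha=0.036, n=1.56)

# A parametric model returns theta for any h once four parameters are known.
suctions_cm = [0, 10, 100, 1000, 15000]
curve = [van_genuchten_theta(h, **params) for h in suctions_cm]

# A pointwise representation is simply a table of (h, theta) pairs at the
# designated suctions; it is filled from the curve here only for illustration.
pointwise = dict(zip(suctions_cm, curve))
```

In the parametric case the curve shape is fixed by the functional form, whereas a pointwise estimator is free to place each θ value independently.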
Three model configurations are compared: (i) a baseline XGB model with default settings, (ii) a Hyperopt-tuned model employing Bayesian optimization, and (iii) an Optuna-optimized model using an efficient tree-structured search strategy. Eleven input scenarios are evaluated, ranging from texture-only predictors to extended feature sets incorporating structural and compositional indicators such as ρb, n, OM, and particle density (ρp). Model performance is assessed using root mean square error (RMSE), coefficient of determination (R2), and Kling–Gupta efficiency (KGE), providing complementary perspectives on predictive accuracy and hydrological reliability.
To enhance physical transparency, permutation importance and SHAP analyses are conducted to quantify the contribution of individual soil properties across suction levels and input scenarios. By integrating structural soil indicators, advanced hyperparameter optimization, and explainable ML within a unified framework, this study aims to advance the accuracy, robustness, and interpretability of ML-based PTFs. In doing so, it addresses key methodological gaps in current SWRC modeling and provides a scalable pathway for improving vadose-zone simulations in both research and applied hydrological contexts.
2. Materials and Methods
2.1. UNSODA 2.0
All pedotransfer models were developed and evaluated using version 2.0 of the UNSODA database, a publicly available global compilation of measured soil hydraulic properties and associated pedological information. UNSODA 2.0 contains laboratory-measured SWRCs, hydraulic conductivity, and diffusivity data, together with supporting particle-size distribution and basic soil descriptors for 790 soil samples collected worldwide. Each sample is indexed by a unique soil identifier linking tables describing texture,
ρb,
ρp,
n, OM content, and hydraulic measurements [
17].
In this study, the drying branch of the SWRC was extracted for all soils with multi-point measurements. To avoid geographic or textural bias, all available samples meeting this criterion were included, yielding a diverse dataset spanning nine USDA textural classes. The resulting distribution across the soil texture triangle is shown in
Figure 1.
Records with missing values in any variable required by a given scenario were dropped, which resulted in different sample sizes per scenario (see
Table 1). Training and validation for every scenario used the same 80/20 split, generated with a fixed random seed so that the partition was consistent. The same test set was used for all scenarios to facilitate direct comparison of model performance, and no test records were involved in the hyperparameter tuning process.
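The filtering and splitting steps can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the field names, and the seed value of 42 are hypothetical, and the sketch does not show how the identical test set was shared across scenarios, which the text does not specify.

```python
import random


def filter_complete(records, required):
    """Keep records that have a non-missing value for every required variable."""
    return [r for r in records if all(r.get(k) is not None for k in required)]


def split_80_20(records, seed=42):
    """Shuffle with a fixed seed and split 80/20 into train and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records; None marks a missing bulk density in every fifth row.
records = [
    {"id": i, "sand": 40.0, "silt": 40.0, "clay": 20.0,
     "bulk_density": 1.4 if i % 5 else None}
    for i in range(50)
]

texture_only = filter_complete(records, ["sand", "silt", "clay"])
with_density = filter_complete(records, ["sand", "silt", "clay", "bulk_density"])
train, test = split_80_20(with_density)
```

The two filtered lists differ in length, which mirrors how scenarios that require more variables end up with fewer usable records.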
2.2. Input Variables and Feature Scenarios
The objective of this study was to identify a sufficient set of input variables that enables high-fidelity SWRC estimation while preserving physical interpretability and practical applicability in hydrological modeling. Accordingly, eleven input scenarios were designed to incorporate textural, structural, and compositional soil variables commonly available from soil surveys and experimental studies.
All scenarios included h, which governs water retention, together with texture fractions (Fsand, Fsilt, Fclay) that control pore-size distribution. Structural and compositional variables, ρb, n, OM, and ρp, were added incrementally to assess their marginal contribution to estimation accuracy and to determine which combinations yield the greatest improvement relative to data requirements.
The scenarios expand the soil-variable set progressively. Scenario 1 is the texture-only baseline (
Fsand,
Fsilt,
Fclay); subsequent scenarios incrementally incorporate structural and compositional variables, culminating in the full-feature configuration in Scenario 11 (
Table 2). This structured design enables comparison of model performance across scenarios and supports identification of the smallest set of input variables capable of delivering accurate SWRC estimates.
The statistical properties of the dataset are outlined in
Table 3 to establish the statistical envelope and dimensionality of the input space on which the eleven scenarios are based. These values matter for interpreting scenario performance: the wide ranges and large coefficients of variation (CV) of texture and OM provide contrasting data that help the XGB algorithm learn the mapping between
θ and h. In contrast, the low variability of
ρp (CV = 0.03) means that this variable carries little information, which contributes to its low predictive importance relative to the other features. As such,
Table 3 describes the variability in the physical variables, while the scenario sample size (
Table 1) and selected input variables (
Table 2) control how this variability is split between scenarios.
2.3. Modeling Framework
The modeling objective was to estimate θ as a function of h and soil properties, thereby reconstructing the θ–h relationship from data. Each measured θ–h pair, together with its associated soil descriptors, was treated as an independent training sample. This formulation avoids reliance on predefined parametric retention models and allows the learning algorithm to infer nonlinear relationships from measurements.
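The sample construction described above can be sketched as a simple expansion from one record per soil to one row per measured θ–h pair. The record layout, the field names (`code`, `retention`), and the numeric values are hypothetical and serve only to illustrate the idea.

```python
def build_samples(soils, feature_names):
    """Expand per-soil retention data into one row per measured (h, theta) pair."""
    X, y, groups = [], [], []
    for soil in soils:
        descriptors = [soil[name] for name in feature_names]
        for h, theta in soil["retention"]:
            X.append([h] + descriptors)   # suction first, then soil descriptors
            y.append(theta)               # target: measured water content
            groups.append(soil["code"])   # soil code, kept for grouped validation
    return X, y, groups


# Two hypothetical soils with made-up values.
soils = [
    {"code": "A", "sand": 85.0, "silt": 10.0, "clay": 5.0,
     "retention": [(10, 0.35), (100, 0.12), (1000, 0.06)]},
    {"code": "B", "sand": 20.0, "silt": 35.0, "clay": 45.0,
     "retention": [(10, 0.48), (1000, 0.36)]},
]

X, y, groups = build_samples(soils, ["sand", "silt", "clay"])
```

Keeping the soil code next to each row matters later, because rows from the same soil are not independent of one another.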
XGB was selected due to its strong performance on tabular data, ability to capture nonlinear interactions, and built-in regularization. All models were implemented in Python using the XGBoost library (version 1.7.6). The regression objective was set to reg:squarederror, and the histogram-based tree method (tree_method = "hist") was used to improve computational efficiency. A fixed random seed (random_state = 42) ensured reproducibility.
For each scenario, three model configurations were evaluated:
(i) Baseline XGB, trained using default hyperparameters;
(ii) Hyperopt-optimized XGB, tuned via Bayesian optimization;
(iii) Optuna-optimized XGB, tuned using an alternative Bayesian search strategy.
Baseline models were trained with 600 boosting rounds, selected based on preliminary experiments that balanced accuracy and training time. To keep the comparison consistent across configurations, no early stopping was applied.
The XGB models were trained to predict θ at predetermined matric potential values across the full suction range covered by the UNSODA database. Predictions were produced individually for each suction value and were not subsequently fitted to a parametric curve. The reconstructed SWRC of each soil sample therefore consists of many model-generated points rather than one continuous hydraulic curve, although the predicted points can be connected to visualize the retention curve.
2.4. Hyperparameter Optimization
Hyperparameter tuning was conducted using Hyperopt and Optuna to evaluate the impact of advanced Bayesian optimization frameworks on XGB performance. Both approaches employ tree-structured Parzen estimators but differ in sampling strategy and internal optimization mechanics.
The following hyperparameters were optimized in both frameworks: n_estimators (100–1000), max_depth (3–12), learning rate η (0.005–0.3, log-uniform), subsample (0.6–1.0), colsample_bytree (0.6–1.0), reg_alpha (0–10), and reg_lambda (0.1–10). These bounds reflect commonly recommended ranges for regression tasks and encompass values reported in soil-physics literature.
Each optimization was conducted for 50 trials per scenario. For Hyperopt, the TPE algorithm (tpe.suggest) was used with a fixed random seed (42). For Optuna, a separate study was created for each scenario with direction = "minimize". No pruning or early stopping was applied in either framework. The optimization objective was the minimization of RMSE on the fixed test set. After optimization, the best hyperparameter configuration was used to retrain the model on the training set. The resulting configurations are summarized in
Table 4.
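The bounds listed for the seven hyperparameters can be written down as a small search-space table. The sketch below draws configurations by plain random sampling purely to illustrate the ranges and the log-uniform treatment of the learning rate; it is not the TPE algorithm used by Hyperopt or Optuna, and the dictionary layout and function name are assumptions made for this example.

```python
import math
import random

# Bounds as listed in the text; "log" marks the log-uniform learning rate.
SEARCH_SPACE = {
    "n_estimators":     ("int",   100, 1000),
    "max_depth":        ("int",   3, 12),
    "learning_rate":    ("log",   0.005, 0.3),
    "subsample":        ("float", 0.6, 1.0),
    "colsample_bytree": ("float", 0.6, 1.0),
    "reg_alpha":        ("float", 0.0, 10.0),
    "reg_lambda":       ("float", 0.1, 10.0),
}


def draw_config(rng):
    """Draw one configuration uniformly (or log-uniformly) within the bounds."""
    config = {}
    for name, (kind, low, high) in SEARCH_SPACE.items():
        if kind == "int":
            config[name] = rng.randint(low, high)  # inclusive on both ends
        elif kind == "log":
            # Uniform in log space, so small learning rates are sampled as
            # densely as large ones.
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            config[name] = rng.uniform(low, high)
    return config


rng = random.Random(42)
trials = [draw_config(rng) for _ in range(50)]  # 50 trials, as in the text
```

A model-based optimizer would replace `draw_config` with proposals informed by earlier trial scores, while the bounds themselves stay the same.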
2.5. Model Evaluation
Model performance was evaluated using the RMSE, mean absolute error (MAE), R2, Willmott’s index of agreement (WI), and KGE. These complementary metrics collectively quantify accuracy, bias, variability, and overall agreement between measured and estimated θ. RMSE and MAE are expressed in cm3 cm−3, whereas R2, WI, and KGE are dimensionless. Metric definitions are provided in Equations (1)–(5).
All metrics were computed on the test dataset to ensure unbiased comparison across input scenarios and optimization strategies.
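Written with $\hat{\theta}_i$ for estimated and $\theta_i$ for measured water content, the five metrics take the following standard forms. This is a sketch assuming the conventional definitions, with R2 taken as the coefficient of determination and KGE in its original three-component form; neither choice is confirmed by the text.

```latex
\begin{align}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{\theta}_i-\theta_i\right)^{2}} \tag{1}\\
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|\hat{\theta}_i-\theta_i\right| \tag{2}\\
R^{2} &= 1-\frac{\sum_{i=1}^{n}\left(\hat{\theta}_i-\theta_i\right)^{2}}{\sum_{i=1}^{n}\left(\theta_i-\bar{\theta}\right)^{2}} \tag{3}\\
\mathrm{WI} &= 1-\frac{\sum_{i=1}^{n}\left(\hat{\theta}_i-\theta_i\right)^{2}}{\sum_{i=1}^{n}\left(\left|\hat{\theta}_i-\bar{\theta}\right|+\left|\theta_i-\bar{\theta}\right|\right)^{2}} \tag{4}\\
\mathrm{KGE} &= 1-\sqrt{(r-1)^{2}+(\alpha-1)^{2}+(\beta-1)^{2}} \tag{5}
\end{align}
```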
where $\bar{\theta}$ is the mean of the measured $\theta$; $r$ is the Pearson correlation coefficient between $\hat{\theta}$ and $\theta$; $\alpha$ is the ratio of estimated to measured standard deviations; $\beta$ is the ratio of estimated to measured means; $\hat{\theta}_i$ and $\theta_i$ denote the estimated and measured volumetric water contents for the $i$-th sample, respectively; and $n$ is the total number of test samples.
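For readers who want to reproduce the metrics, a minimal pure-Python sketch is given below. It assumes the conventional definitions (R2 as the coefficient of determination, Willmott's original index, and the three-component KGE), which may differ in detail from the implementation used in the study; the function name and the example values are hypothetical.

```python
import math


def swrc_metrics(measured, estimated):
    """Return RMSE, MAE, R2, WI, and KGE for paired water-content values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    pairs = list(zip(measured, estimated))

    sq_err = sum((e - m) ** 2 for m, e in pairs)
    ss_tot = sum((m - mean_m) ** 2 for m in measured)

    rmse = math.sqrt(sq_err / n)
    mae = sum(abs(e - m) for m, e in pairs) / n
    r2 = 1.0 - sq_err / ss_tot
    wi = 1.0 - sq_err / sum(
        (abs(e - mean_m) + abs(m - mean_m)) ** 2 for m, e in pairs
    )

    sd_m = math.sqrt(ss_tot / n)
    sd_e = math.sqrt(sum((e - mean_e) ** 2 for e in estimated) / n)
    cov = sum((m - mean_m) * (e - mean_e) for m, e in pairs) / n
    r = cov / (sd_m * sd_e)   # Pearson correlation
    alpha = sd_e / sd_m       # variability ratio
    beta = mean_e / mean_m    # bias ratio
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

    return {"RMSE": rmse, "MAE": mae, "R2": r2, "WI": wi, "KGE": kge}


# Hypothetical values: a perfect estimate and one with a constant +0.02 offset.
measured = [0.10, 0.20, 0.30, 0.40]
perfect = swrc_metrics(measured, measured)
offset = swrc_metrics(measured, [0.12, 0.22, 0.32, 0.42])
```

In the offset case the correlation and variability ratio stay at one, so the KGE penalty comes entirely from the bias term, which illustrates why the metrics are complementary.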
2.6. Model Validation Using Nested Grouped Cross-Validation (GCV)
Because the UNSODA 2.0 database contains multiple θ–h observations from each soil sample, a conventional random train–test split violates the assumption that samples are independent and can therefore overestimate model performance. In the data compiled for this analysis, 1955 θ–h observations came from 175 unique soil sample codes, so each soil sample contributed multiple observations. Under the conventional random split, 170 soil sample codes appeared in both the training and testing partitions, which leaks information between the two.
To address the potential bias arising from the hierarchical structure of the UNSODA 2.0 database (several observations per soil sample) and to obtain an unbiased estimate of model performance, a nested grouped cross-validation (GCV) procedure was adopted. The soil sample code was used as the grouping variable so that all observations from a given sample stayed within the same fold. In the outer loop, out-of-sample generalization was estimated with 5-fold GroupKFold cross-validation. In the inner loop, hyperparameters were tuned within each outer fold using 3-fold GroupKFold cross-validation. The outer validation partition remained strictly independent throughout tuning: hyperparameters were selected exclusively by minimizing the mean RMSE across the inner folds.
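The nesting of the two grouped loops can be sketched as follows. The splitter below is a simplified stand-in for scikit-learn's GroupKFold (it assigns groups to folds round-robin rather than balancing fold sizes), and the soil codes are hypothetical; model fitting and tuning are left out so that only the partitioning logic is shown.

```python
def group_kfold(groups, n_splits):
    """Yield (train_idx, val_idx) pairs in which no group spans both sides."""
    unique = sorted(set(groups))
    fold_of = {g: i % n_splits for i, g in enumerate(unique)}
    for k in range(n_splits):
        val = [i for i, g in enumerate(groups) if fold_of[g] == k]
        train = [i for i, g in enumerate(groups) if fold_of[g] != k]
        yield train, val


# Hypothetical data: 15 soil codes with 4 observations each.
groups = [f"soil_{s:02d}" for s in range(15) for _ in range(4)]

leaks = 0
inner_counts = []
for outer_train, outer_val in group_kfold(groups, 5):
    outer_groups = [groups[i] for i in outer_train]
    held_out = {groups[i] for i in outer_val}
    leaks += len(set(outer_groups) & held_out)

    # Inner loop: candidate hyperparameters would be scored on these folds
    # only, so the outer validation soils never influence tuning.
    inner = list(group_kfold(outer_groups, 3))
    inner_counts.append(len(inner))
    for inner_train, inner_val in inner:
        train_soils = {outer_groups[i] for i in inner_train}
        val_soils = {outer_groups[i] for i in inner_val}
        leaks += len(train_soils & val_soils) + len(val_soils & held_out)
```

Counting shared soil codes, as done with `leaks`, is a quick check that the grouping is respected at both levels.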
2.7. Model Interpretability
To assess physical consistency and improve transparency, permutation importance and SHAP analyses were applied to the trained models. Permutation importance quantifies the reduction in R2 resulting from random shuffling of each input variable (Equation (6)), thereby identifying variables essential for estimation performance. Each permutation was repeated ten times and averaged to reduce sampling variability.
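A common way to write this quantity is shown below. The notation is chosen here for illustration: $\mathrm{PI}_j$ is the importance of the $j$-th variable, $\tilde{D}_j^{(k)}$ is the $k$-th shuffled copy of the test data, and $K = 10$ is the number of repeats.

```latex
\begin{equation}
\mathrm{PI}_j = M\left(f, D\right) - \frac{1}{K}\sum_{k=1}^{K} M\left(f, \tilde{D}_j^{(k)}\right) \tag{6}
\end{equation}
```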
where $f$ denotes the trained model, $D$ is the original test dataset, $\tilde{D}_j$ represents the same dataset with the $j$-th input variable randomly permuted, and $M(\cdot)$ denotes the performance metric.
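The permutation procedure can be sketched in a few lines of plain Python. The toy `predict` function stands in for a trained model and depends only on the first feature, so the expected outcome is easy to reason about; all names and values are hypothetical, and R2 is taken as the coefficient of determination.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=42):
    """Mean drop in R2 when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    base = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - r2_score(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def predict(row):
    """Toy stand-in for a trained model: uses feature 0 and ignores feature 1."""
    return 0.5 - 0.1 * row[0]


X = [[float(i % 7), float(i % 3)] for i in range(42)]
y = [predict(row) for row in X]
importances = permutation_importance(predict, X, y)
```

Shuffling the ignored feature leaves the score unchanged, whereas shuffling the informative one lowers it, which is the contrast the importance ranking relies on.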
SHAP values were computed using TreeSHAP, which provides exact Shapley values for tree-based ensembles. Importance rankings were derived from mean absolute SHAP values, while explanations were visualized using waterfall plots for representative samples. These analyses clarify how texture, structure, and compositional variables influence θ predictions across the range of h.
Because some predictors are highly collinear (for example, r = −0.94 between n and ρb), SHAP values are interpreted as marginal effects within the trained model rather than as the independent influence of each predictor. The attribution analysis is therefore based mainly on physically meaningful groups of predictors.
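The additive attribution behind SHAP can be illustrated with a brute-force Shapley computation on a toy model. This sketch is not TreeSHAP: it enumerates every feature coalition and replaces absent features with a single baseline value, which is only feasible for a handful of features. The model, its coefficients, and the input values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating all feature coalitions.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        row = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(row)

    phi = []
    for j in range(n):
        others = [i for i in range(n) if i != j]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return phi


def predict(row):
    """Toy model with a clay-porosity interaction (coefficients are made up)."""
    sand, clay, porosity = row
    return 0.05 - 0.2 * sand + 0.3 * clay + 0.4 * porosity + 0.5 * clay * porosity


x = [0.2, 0.5, 0.5]          # sample to explain
baseline = [0.4, 0.2, 0.4]   # reference input
phi = shapley_values(predict, x, baseline)
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is shared between clay and porosity, which is one reason attributions for correlated or interacting predictors are best read as a group.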
The overall workflow of data preprocessing, model training, optimization, evaluation, and interpretability analysis is summarized in
Figure 2.
3. Results
Pearson correlation coefficients (
r) between input variables and measured
θ are illustrated in
Figure 3.
θ exhibited positive correlations with
Fclay (
r ≈ 0.42),
n (
r ≈ 0.38), and OM (
r ≈ 0.31), reflecting increased water retention associated with finer textures, greater pore volume, and enhanced aggregation. In contrast,
θ was negatively correlated with
ρb (
r ≈ −0.38),
Fsand (
r ≈ −0.37), and
ρp (
r ≈ −0.20), consistent with reduced storage capacity in coarser or more compacted soils.
Strong collinearity was evident among texture fractions, particularly between Fsand and Fsilt (r ≈ −0.88) and Fsand and Fclay (r ≈ −0.67). Structural variables also showed strong dependence: n and ρb were highly negatively correlated (r ≈ −0.94), indicating that they conveyed overlapping but not identical information regarding pore space. OM was moderately correlated with n (r ≈ 0.56) and negatively correlated with ρb (r ≈ −0.64). These relationships highlight the importance of regularization and feature selection in multivariate modeling.
The estimation performance of baseline, Hyperopt-optimized, and Optuna-optimized XGB models is summarized for the train and test datasets in
Table 5 and
Table 6, respectively. Across all scenarios, baseline XGB models provided reasonable accuracy, with a mean test RMSE of 0.0455 and a mean
R2 of 0.8837. The best baseline performance was obtained for Scenario 11, which yielded a test
RMSE of 0.0356,
R2 = 0.9299, and
KGE = 0.9267.
Bayesian hyperparameter optimization improved model performance. Averaged across scenarios, Hyperopt-XGB reduced the mean test RMSE to 0.0243 and increased the mean R2 to 0.9658, while Optuna-XGB further reduced the mean RMSE to 0.0235 and increased R2 to 0.9679. Relative to the baseline, these improvements correspond to average RMSE reductions of 46.9% (Hyperopt) and 48.8% (Optuna), with concurrent increases in R2 of 0.082 and 0.084, respectively.
Train–test comparisons indicate that the tuned models exhibited slightly larger performance gaps than the baseline, reflecting increased model flexibility. However, the absolute differences remained small (mean RMSE gaps of −0.0142 for Hyperopt and −0.0144 for Optuna; mean R2 gaps ≈ 0.026), indicating good generalization and no evidence of severe overfitting.
Baseline model performance varied systematically across input scenarios (
Table 6). When only texture fractions were used (Scenario 1), the test
RMSE was 0.0612, and
R2 was 0.7937. Adding
ρb (Scenario 2) produced modest improvement (
RMSE = 0.0492;
R2 = 0.8666), whereas adding
n (Scenario 3) led to a more pronounced gain (
RMSE = 0.0446;
R2 = 0.8902). Scenarios including only OM or
ρp provided limited improvement relative to texture-only models.
Hyperparameter tuning reduced performance disparities across scenarios while preserving consistent trends. For Scenario 1, Hyperopt-XGB and Optuna-XGB reduced RMSE to 0.0400 and 0.0395, respectively, corresponding to error reductions exceeding 34%. In Scenario 3, tuned models achieved RMSE values near 0.021, representing reductions of more than 50% relative to the baseline. Scenarios incorporating n (Scenarios 3, 6, 9–11) yielded the highest accuracy, whereas scenarios relying on ρp alone remained weaker.
The most comprehensive scenario (Scenario 11) produced the best overall performance. Optuna-XGB achieved a test RMSE of 0.0183, R2 = 0.9815, WI = 0.9953, and KGE = 0.9825. Relative to the baseline in the same scenario, this represents a 48.6% reduction in RMSE and an increase in R2 of 0.0516. Compared with the simplest baseline configuration (Scenario 1), RMSE was reduced by 70.1%.
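The percentages quoted for Scenario 11 follow directly from the reported RMSE and R2 values, as the short check below shows. Only numbers stated in the text are used; the helper name is chosen here for illustration.

```python
def relative_reduction(reference, improved):
    """Percentage reduction of `improved` relative to `reference`."""
    return 100.0 * (reference - improved) / reference


rmse_s1_baseline = 0.0612   # Scenario 1, baseline XGB
rmse_s11_baseline = 0.0356  # Scenario 11, baseline XGB
rmse_s11_optuna = 0.0183    # Scenario 11, Optuna-XGB

vs_same_scenario = relative_reduction(rmse_s11_baseline, rmse_s11_optuna)
vs_simplest = relative_reduction(rmse_s1_baseline, rmse_s11_optuna)
r2_gain = 0.9815 - 0.9299

print(f"{vs_same_scenario:.1f}% {vs_simplest:.1f}% {r2_gain:.4f}")
# prints: 48.6% 70.1% 0.0516
```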
To complement the quantitative evaluation,
Figure 4 visually compares measured and estimated
θ–
h relationships for the best-performing Optuna-XGB-11 model across the test set. The estimated
θ values generally overlapped the measured observations over the logarithmic h range, indicating that the model reproduced the expected decline in water content with increasing suction. This agreement was evident across contrasting textures, including the rapid drainage behavior of sand and loamy sand, the intermediate retention patterns of silty loam and sandy loam, and the more gradual water release of clay loam and clay. Minor local deviations occurred mainly near the wet and dry extremes, where measured hydraulic data are typically more variable. Overall,
Figure 4 supports the numerical results by showing that the optimized model preserved physically plausible, texture-dependent SWRC behavior.
Figure 4 can also be read across three hydrologically relevant suction ranges: low suction (h < 100 cm), representing near-saturated conditions; intermediate suction (100 ≤ h < 1000 cm), corresponding to the main drainage transition; and high suction (h ≥ 1000 cm), where retention is increasingly controlled by finer pore domains. In each range, estimated θ values generally followed the measured observations, so the agreement described above holds across suction conditions as well as across textures.
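A numerical counterpart to this visual check is to compute the error separately within each suction range. The sketch below uses the three ranges named in the text; the data values and function names are hypothetical.

```python
import math


def suction_range(h_cm):
    """Classify suction (cm) into the three ranges used in the text."""
    if h_cm < 100:
        return "low"
    if h_cm < 1000:
        return "intermediate"
    return "high"


def rmse_by_range(h_values, measured, estimated):
    """RMSE of estimated vs measured water content within each suction range."""
    squared = {}
    for h, m, e in zip(h_values, measured, estimated):
        squared.setdefault(suction_range(h), []).append((e - m) ** 2)
    return {name: math.sqrt(sum(v) / len(v)) for name, v in squared.items()}


# Hypothetical observations spanning the three ranges.
h = [10, 50, 100, 500, 1000, 15000]
measured = [0.40, 0.35, 0.30, 0.22, 0.15, 0.08]
estimated = [0.41, 0.34, 0.30, 0.24, 0.15, 0.09]
result = rmse_by_range(h, measured, estimated)
```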
Texture-specific comparisons confirmed that the Optuna-XGB-11 model provided strong agreement between measured and estimated
θ across all soil texture classes represented in the test set (
Figure 5). The estimated values were closely distributed around the 1:1 agreement line, with high
R2 values ranging from 0.9656 for sandy loam to 0.9860 for clay, indicating stable predictive performance across contrasting textures. Loamy sand showed nearly unbiased behavior, with an intercept close to zero and a slope close to unity, while sand, silty loam, and sandy loam also exhibited low intercepts and slopes near one, demonstrating reliable prediction in coarse- and medium-textured soils. For clay loam and clay, the model retained high agreement across the broader
θ range associated with finer-textured soils, although the positive intercepts and slopes slightly below unity indicate minor compression of predictions toward the central
θ range. Overall, these results show that the optimized XGB model maintained robust texture-specific performance, with only limited class-dependent bias.
Residual distributions for the most accurate models are shown in
Figure 6. The baseline model exhibits systematic bias, with negative residuals at high
θ and positive residuals at intermediate values, indicating underestimation in wetter soils. Hyperopt-XGB and Optuna-XGB reduce both bias and variance, producing near-Gaussian residual distributions with improved symmetry.
Model accuracy is further summarized using a Taylor diagram (
Figure 7), which jointly compares standard deviation, correlation coefficient, and centered
RMSE for XGB-11, Hyperopt-XGB-11, and Optuna-XGB-11. The baseline model deviates from the reference point due to lower correlation and higher normalized
RMSE. Hyperopt-XGB-11 moves closer to the reference, indicating improved correlation and reduced error. Optuna-XGB-11 lies closest to the reference point, reflecting the highest correlation and lowest centered
RMSE among the evaluated models. This visualization confirms the superior overall performance of the Optuna-optimized configuration.
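The three quantities shown on a Taylor diagram are linked by a law-of-cosines identity, which is why they can share one plot. The sketch below computes them for hypothetical data and evaluates the identity; population (divide-by-n) statistics are assumed throughout.

```python
import math


def taylor_statistics(measured, estimated):
    """Return (sd_measured, sd_estimated, correlation, centered RMSE)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    dev_m = [m - mean_m for m in measured]
    dev_e = [e - mean_e for e in estimated]
    sd_m = math.sqrt(sum(d ** 2 for d in dev_m) / n)
    sd_e = math.sqrt(sum(d ** 2 for d in dev_e) / n)
    r = sum(a * b for a, b in zip(dev_m, dev_e)) / (n * sd_m * sd_e)
    # Centered RMSE: error after removing the mean bias from both series.
    crmse = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_m, dev_e)) / n)
    return sd_m, sd_e, r, crmse


# Hypothetical measured and estimated water contents.
measured = [0.40, 0.35, 0.30, 0.22, 0.15, 0.08]
estimated = [0.41, 0.34, 0.30, 0.24, 0.15, 0.09]

sd_m, sd_e, r, crmse = taylor_statistics(measured, estimated)

# Identity underlying the diagram geometry.
law_of_cosines = math.sqrt(sd_e ** 2 + sd_m ** 2 - 2.0 * sd_e * sd_m * r)
```

A model plotted closer to the reference point therefore has both a standard deviation near the measured one and a correlation near one.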
Compared with random splits, GCV provided a more rigorous assessment of model generalization. The results in
Table 7 and
Figure 8 show that the random splits produced optimistic performance estimates because observations from the same soil samples appeared in both the training and testing sets.
Under GCV, the RMSE of the baseline XGB model rose from 0.0356 (random split) to 0.0557. The deterioration was even more pronounced for the optimized models: the Hyperopt XGB RMSE rose from 0.0227 to 0.0530, and the Optuna XGB RMSE from 0.0216 to 0.0543. Correspondingly, R2 fell from 0.9717 to 0.8374 for Hyperopt XGB and from 0.9742 to 0.8288 for Optuna XGB.
Even under this more conservative evaluation, both optimized models retained predictive power, with R2 greater than 0.82, indicating that they capture generalizable relations between soil characteristics and moisture content. Among all models, the Hyperopt XGB model performed best under GCV.
SHAP waterfall plots for representative samples are shown in
Figure 9. Negative contributions from
Fsand reflect rapid drainage associated with coarse textures, whereas positive contributions from
Fclay,
n, and OM reflect enhanced storage in finer and more aggregated pore networks. Tuned models exhibit larger absolute contributions for structurally meaningful variables, indicating that optimization enhanced sensitivity to physically relevant soil properties.
Permutation importance results (
Figure 10) corroborate these patterns. Across all models,
h is the most influential variable, followed by
Fsand. Structural descriptors, particularly
n, exhibit higher importance in tuned models than in the baseline. OM shows moderate influence, while
ρp ranks lowest. Mean absolute SHAP values (
Figure 11) confirm the same hierarchy and indicate reduced variance in feature effects for optimized models, suggesting improved stability. Given the near-deterministic relationship between
n and
ρb, their relative SHAP rankings are interpreted as reflecting a shared structural control on SWRC behavior rather than isolated variable importance.
4. Discussion
The present study demonstrates that Optuna-optimized XGB-based PTFs provide accurate, robust, and physically interpretable estimates of SWRCs across a wide range of soil textures and matric suctions. By integrating Bayesian hyperparameter optimization with XGB and interpretable diagnostics, this work addresses two persistent challenges in data-driven SWRC modeling: sensitivity to model configuration and limited physical transparency.
The results clearly indicate that Bayesian hyperparameter optimization enhanced XGB performance relative to untuned baseline models. On average, Optuna-optimized models reduced test RMSE by approximately 50% and increased
R2 by about 0.08 across the eleven evaluated input scenarios. These improvements are consistent with previous ML studies showing that default or heuristically selected hyperparameters rarely yield optimal performance for complex environmental datasets [
12,
13]. The marginally superior performance of Optuna compared with Hyperopt under the random split may reflect more efficient exploration of the hyperparameter space and adaptive trial selection, as reported in other hydrological and environmental applications.
From a hydrological perspective, such reductions in SWRC estimation error represent more than statistical improvement. Because unsaturated hydraulic conductivity is derived from the SWRC through nonlinear constitutive relationships, even modest reductions in
θ(
h) error can translate into disproportionately large improvements in simulated infiltration, drainage, and soil water storage [
4,
7]. The magnitude of
RMSE reduction achieved here therefore constitutes a meaningful advance in the reliability of vadose-zone simulations, particularly under near-saturated conditions where model sensitivity is greatest [
6,
8].
Residual analyses further show that hyperparameter optimization not only reduced random error but also mitigated systematic bias, particularly the underestimation of
θ near saturation observed in untuned models. Accurate representation of the wet end of the SWRC is critical for simulating rainfall partitioning between infiltration and runoff and for capturing surface–subsurface exchange processes [
2,
6]. Inadequate representation of this region can lead to persistent underestimation of infiltration capacity and overprediction of runoff during high-intensity precipitation events, limitations that are reduced by the optimized models.
The difference between random splitting and GCV demonstrates the need to account for the hierarchical structure of soil databases. Datasets such as UNSODA include multiple measurements made on the same soil sample, which therefore share common physical characteristics. Under conventional random splitting, measurements from the same sample can fall in both the training and test sets, which leads to overly optimistic accuracy estimates. Grouping measurements by soil sample code guarantees that the test data comprise only soil samples absent from the training set. Consequently, the higher RMSE and lower R2 obtained under GCV provide a more realistic estimate of model generalization. Even under this stricter evaluation, predictive performance remains high (GCV R2 > 0.82).
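The grouping logic can be sketched in a few lines of standard-library Python. This is an illustrative implementation, not the code used in the study; the sample codes and the helper name grouped_kfold are hypothetical. Library routines such as scikit-learn's GroupKFold offer comparable behavior.

```python
import random
from collections import defaultdict

def grouped_kfold(group_ids, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) so that no group appears on both sides."""
    by_group = defaultdict(list)
    for idx, g in enumerate(group_ids):
        by_group[g].append(idx)
    groups = sorted(by_group)
    random.Random(seed).shuffle(groups)
    for k in range(n_splits):
        # Every n_splits-th group (starting at k) forms the test fold.
        test_groups = set(groups[k::n_splits])
        test_idx = [i for g in test_groups for i in by_group[g]]
        train_idx = [i for g in groups if g not in test_groups
                     for i in by_group[g]]
        yield sorted(train_idx), sorted(test_idx)

# Hypothetical soil sample codes: each code contributes several (h, theta) points.
sample_codes = (["S01"] * 4 + ["S02"] * 3 + ["S03"] * 5 +
                ["S04"] * 4 + ["S05"] * 3 + ["S06"] * 4)

folds = list(grouped_kfold(sample_codes, n_splits=3))
for train_idx, test_idx in folds:
    train_codes = {sample_codes[i] for i in train_idx}
    test_codes = {sample_codes[i] for i in test_idx}
    # No soil sample contributes points to both training and test sets.
    assert train_codes.isdisjoint(test_codes)
```

A plain random split over the same indices would place points from one retention curve on both sides of the split, which is the leakage that grouping avoids.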
The evaluation of eleven input scenarios clarifies the relative roles of textural and structural soil properties in controlling SWRC predictions. Texture fractions dominate estimation accuracy, reflecting the strong influence of particle size distribution on soil pore size distribution and, consequently, on water retention behavior. However, the inclusion of structural descriptors such as
n and
ρb further improves model performance beyond texture-only scenarios by accounting for variations in pore volume and soil structure [
2,
4].
Model results under the different scenarios should be interpreted with the understanding that training-set size varies because of scenario-dependent filtering. This may slightly influence estimation accuracy, since larger samples normally yield better estimates. To limit such effects, the test sample and methodology were identical for all scenarios, so discrepancies between scenarios mainly reflect the predictive power of the additional predictors. Nevertheless, improvements obtained with more complex input sets reflect both the quantity and the quality of the available data.
The variable
n exerts a strong influence by constraining the upper limit of
θ and shaping the wet end of the SWRC. The marked improvement observed when n was added (Scenario 3) relative to texture-only models confirms that descriptors of pore space provide information not captured by texture alone [
19,
24]. Although
ρb is strongly correlated with
n (
r ≈ −0.94), it captures complementary information related to soil packing, aggregation, and mechanical disturbance, which influence pore connectivity and macroporosity beyond total pore volume [
17,
18,
19]. The joint inclusion of
n and
ρb therefore enables sensitivity to structural changes induced by compaction, tillage, or land use, a capability that many traditional PTFs lack [
24].
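One plausible reason for the strong correlation between the two structural variables is the standard relation n = 1 − ρb/ρp. The sketch below assumes porosity is derived from this relation and uses synthetic, hypothetical densities (uniform ρb, narrowly distributed ρp); it is not based on the study's data, and the resulting coefficient need not match the reported value.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

rng = random.Random(42)
# Synthetic densities in g/cm3 (assumed ranges, for illustration only).
rho_b = [rng.uniform(1.0, 1.8) for _ in range(500)]   # bulk density
rho_p = [rng.gauss(2.65, 0.05) for _ in range(500)]   # particle density

# Porosity from the standard relation n = 1 - rho_b / rho_p.
porosity = [1.0 - b / p for b, p in zip(rho_b, rho_p)]

r = pearson_r(porosity, rho_b)
print(f"Pearson r between n and rho_b: {r:.2f}")
```

When ρp varies little, as is typical of mineral soils, n becomes nearly a linear function of ρb, so the two variables are strongly and negatively correlated.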
OM contributed modest improvements when combined with structural variables. This behavior aligns with the established understanding that OM enhances aggregation and microporosity, thereby increasing water retention, especially in finer-textured soils [
19,
20]. In contrast,
ρp exhibited negligible importance. Its limited variability within mineral soils constrains its explanatory power, and scenarios that added only
ρp to the texture fractions showed only marginal gains over texture-only models. This finding corroborates earlier studies indicating that
ρp provides little additional information for SWRC estimation in mineral soil datasets [
12,
16].
Negative SHAP contributions associated with high
Fsand reflect rapid drainage from macropores, while positive contributions from
Fclay,
n, and OM capture enhanced retention in finer pores and greater total pore volume [
19,
20]. The strong negative correlation (
r ≈ −0.94) between
n and
ρb suggests that the two variables carry largely the same structural information about pore space. This complicates attribution analysis. Tree-based models allocate split frequency and gain somewhat arbitrarily among highly correlated predictors, and SHAP values quantify contributions conditional on the fitted model structure rather than each variable's unique physical effect. The larger SHAP importance score assigned to n compared to
ρb does not necessarily indicate a greater intrinsic importance of
n; it may simply reflect the model's use of
n as its proxy for pore space.
Crucially, collinearity does not affect predictive performance, but it does limit the interpretability of feature attributions. On physical grounds, the two features should be considered jointly as structural controls on water retention. Although other techniques, e.g., conditional or grouped feature importance, could be applied to further separate their contributions, both variables were retained to preserve physical meaning and comparability with previous pedotransfer studies. SHAP importance is therefore interpreted in terms of structural versus textural features rather than individual variables.
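A group-level reading of SHAP values can be sketched as follows. Because SHAP values are additive, the contributions of the features in a group can be summed per sample before averaging their magnitude. The SHAP matrix, the feature names, and the helper grouped_mean_abs_shap are hypothetical and do not reproduce results from this study.

```python
# Hypothetical per-sample SHAP values (rows = samples, columns = features).
feature_names = ["F_sand", "F_silt", "F_clay", "n", "rho_b"]
shap_values = [
    [-0.060,  0.004,  0.012,  0.025, -0.006],
    [-0.012,  0.002,  0.030, -0.018,  0.005],
    [ 0.045, -0.003, -0.010,  0.030, -0.008],
    [-0.052,  0.001,  0.018, -0.027,  0.007],
]
groups = {
    "texture": ["F_sand", "F_silt", "F_clay"],
    "structure": ["n", "rho_b"],
}

def grouped_mean_abs_shap(shap_values, feature_names, groups):
    """Mean absolute value of the summed SHAP contributions of each group."""
    col = {name: j for j, name in enumerate(feature_names)}
    importance = {}
    for group_name, members in groups.items():
        # Sum member contributions within each sample, then average magnitudes.
        per_sample = [sum(row[col[m]] for m in members) for row in shap_values]
        importance[group_name] = sum(abs(v) for v in per_sample) / len(per_sample)
    return importance

group_importance = grouped_mean_abs_shap(shap_values, feature_names, groups)
for name, value in group_importance.items():
    print(f"{name}: {value:.5f}")
```

The grouped score for the structural features does not depend on how the model divided credit between n and ρb, which is the ambiguity introduced by their collinearity.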
Hyperparameter optimization amplified the influence of structurally meaningful variables and reduced residual variance, indicating that tuning improved not only predictive accuracy but also the clarity with which the model distinguished the roles of texture and structure. Such interpretability is essential for the acceptance of ML-based PTFs in hydrological modeling workflows, where transparency and physical plausibility are critical.
The estimation accuracy achieved in this study is competitive with that reported for both conventional and ML-based PTFs in the literature. SVM models developed by Cisty and Povazanova [
11] achieved an
RMSE value of 0.018 for the wetting branch of the SWRC, outperforming classical parametric formulations such as the Mualem and Kool–Parker models. The Optuna-optimized XGB models presented here achieved comparable
RMSE values (0.0183–0.0236) while explicitly modeling the entire drying branch across a broader range of soil types and using an extended set of input variables.
Boosted regression trees tuned via differential evolution have been shown to reduce RMSE relative to untuned models, yet their final accuracies (
RMSE ≈ 0.11–0.17;
R2 ≈ 0.58–0.79) remain lower than those reported here [
13]. ANN-based PTFs have also demonstrated strong performance; for example, Totola et al. [
19] reported
RMSE values near 0.045 for Brazilian soils, while Rastgou et al. [
20] achieved
R2 values approaching 0.98 using deep learning architectures. The present results indicate that optimized XGB models can match or exceed the accuracy of more complex ANN approaches while offering superior interpretability through SHAP-based explanations.
Recent physics-informed neural network approaches that embed the Richards equation have reported improved performance at the dry end of the SWRC [
10]. Incorporating analogous physical constraints into XGB-based or hybrid ensemble frameworks represents a logical extension of the present work and may contribute to reduced bias near saturation while preserving computational efficiency.
From an applied perspective, the results provide clear guidance for selecting input variables based on data availability and application requirements. Texture-only scenarios yield reasonable estimates but are associated with higher uncertainty. Including n leads to substantial performance gains and represents an effective compromise between data requirements and predictive accuracy. The inclusion of ρb further improves model accuracy for disturbed or compacted soils, and is therefore advisable when such information is available. The full feature set delivers the highest accuracy and is most appropriate for applications requiring precise SWRC estimates, including detailed vadose-zone simulations, agronomic planning, and geotechnical assessments.
The modeling strategy adopted in this study enables estimation of the
θ–
h relationship without imposing predefined functional forms, thereby addressing a long-recognized source of uncertainty in hydrological modeling [
9,
10]. The improved representation of both wet- and dry-end retention behavior supports increased confidence in downstream simulations of infiltration, evapotranspiration, drainage, and solute transport.
Several limitations merit consideration. Treating each h–property pair as an independent sample, an assumption commonly adopted in PTF development, neglects potential autocorrelation within individual SWRCs. Future investigations could evaluate hierarchical or soil-specific modeling approaches to explicitly account for within-profile dependence.
In addition, reliance on complete-case analysis reduced the effective sample size for extended feature scenarios and may have led to the underrepresentation of highly organic or strongly structured soils. Expanding available datasets and incorporating additional descriptors of soil structure would likely improve model generalizability across a wider range of soil conditions.
Finally, although the proposed models exhibit strong physical consistency, they remain purely data-driven and do not explicitly enforce physical constraints such as the monotonic decrease of θ with increasing h. Integrating optimized XGB with physics-informed constraints therefore constitutes a well-motivated direction for future research.
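As a simple illustration of what enforcing monotonicity could look like, the sketch below applies a post hoc least-squares projection (the pool-adjacent-violators algorithm) to predictions ordered by increasing suction. This is a swapped-in post-processing step, not part of the present study, and the predicted values are hypothetical. A constraint imposed during training, for instance through the monotone_constraints parameter of XGBoost, would be an alternative route.

```python
def project_nonincreasing(values):
    """Least-squares projection onto non-increasing sequences (equal weights),
    computed with the pool-adjacent-violators algorithm."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge blocks while an earlier block mean is below the later one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    projected = []
    for s, c in blocks:
        projected.extend([s / c] * c)
    return projected

# Hypothetical predicted water contents for one soil, ordered by increasing
# suction; two pairs violate the expected monotonic decrease.
theta_pred = [0.44, 0.41, 0.42, 0.30, 0.31, 0.12]
adjusted = project_nonincreasing(theta_pred)
print(adjusted)
```

The projection replaces each violating pair by its mean, so the adjusted curve is non-increasing while the mean predicted water content is preserved.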