1. Introduction
Soil moisture is one of the key variables in the water cycle and crop production processes within agricultural ecosystems [1,2]. In such systems, the water storage within the 0–200 cm soil profile not only determines the spatial and temporal distribution of available water in the root zone, but also directly affects crop transpiration intensity and duration at different growth stages, drought risk levels, and the scientific basis for irrigation scheduling [3]. Deep soil water content (i.e., >80 cm), especially in rainfed or limited irrigation farmlands, serves as a critical buffer when surface soil dries out, sustaining plant transpiration and growth. This buffering function is essential for stabilizing crop yield and enhancing drought resistance [4,5]. Therefore, accurately capturing the temporal dynamics of deep soil moisture is of vital importance for drought assessment, water resource management, and efficient irrigation scheduling in farmlands [6,7].
However, in agricultural production practice, continuous in situ monitoring of deep soil moisture is often very difficult [8,9]. Commonly used soil moisture probes are usually installed to depths of only 40–80 cm. Because of the cost of sensor network equipment, the difficulty of deploying sensors in deep agricultural soils, and the long-term maintenance burden of monitoring systems, continuous soil moisture data for deeper layers are rarely available [8]. Although manual sampling can relatively conveniently obtain moisture information for the 0–200 cm soil profile, it is destructive, costly, and has low temporal resolution, which makes dynamic, continuous monitoring of profile moisture changes difficult [10,11]. As a result, agricultural water management often relies on shallow information alone, so that irrigation timing and soil moisture in the crop production layer are difficult to match with the water actually available, thereby causing over-irrigation, under-irrigation, and long-term depletion of deep water storage [12,13].
Traditional physics-based soil hydraulic models, such as the Richards equation and its framework in the SPAC (Soil–Plant–Atmosphere Continuum) system, can theoretically describe rainfall or irrigation infiltration, water redistribution, and root water uptake processes [14,15]. However, these mechanistic models require precise soil hydraulic parameters, soil structure information, and initial and boundary conditions, all of which are difficult to obtain accurately and set reasonably at the actual field scale [16,17]. Although remote sensing inversion methods perform well for surface moisture monitoring at regional scales, microwave and thermal infrared signals have limited penetration depth and cannot directly reflect the moisture status of the 0–200 cm soil profile, so they often need to be coupled with models or ground data to compensate for the lack of deep soil moisture information [18,19]. Therefore, balancing low cost and high accuracy to achieve near-real-time monitoring of 0–200 cm soil profile moisture remains a core challenge in agricultural water research and management [8].
In recent years, with the development of wireless sensor networks and machine learning technologies, data-driven methods have shown new potential in soil moisture inversion and prediction [20,21,22]. Recent studies have further advanced this field by developing physics-aware machine learning models. For instance, diffusion processes have been integrated into probabilistic modeling to estimate subsurface soil moisture from surface observations across different climate settings using global datasets like ISMN [23]. Existing research indicates that there is a significant dynamic correlation between shallow and deep soil moisture, which can be used to infer deep soil moisture changes, but prediction accuracy and stability usually decrease rapidly with increasing depth. At the same time, soil moisture at different depths responds differently to meteorological factors: shallow soil is sensitive to changes in precipitation and evaporation conditions, while deep soil moisture is governed more by the combined balance of infiltration and evaporation over multiple days [24,25]. Therefore, jointly considering shallow moisture status and multi-day meteorological accumulation features in model construction is expected to characterize the spatiotemporal evolution of soil moisture more accurately.
Based on this, this study proposes a framework for predicting deep soil moisture that integrates shallow soil moisture and multi-day accumulated meteorological features. By constructing meteorological feature sets under different cumulative time windows and combining various interpretable machine learning models for training and selection, the lagged response relationship between shallow and deep soil moisture is systematically characterized, and interpretability analysis of input features is conducted using SHapley Additive exPlanations (SHAP) and Partial Dependence Plots (PDP). This method can realize layer-by-layer near real-time reconstruction of 0–200 cm soil moisture without deploying deep probes. The research results can provide a low cost, scalable technical pathway for dynamic monitoring of deep soil moisture, and provide more data-driven and intelligent support for drought monitoring and agricultural irrigation management.
2. Materials and Methods
This study aims to predict the moisture content of the 0–200 cm soil profile in agricultural fields by using real-time monitoring of shallow soil moisture and multi-day accumulated meteorological features as inputs for machine learning prediction models. The technical approach includes data collection (layered sampling, sensor monitoring, and meteorological data acquisition), model evaluation and optimization, model interpretability analysis, and near-real-time monitoring application of 0–200 cm soil profile moisture based on sensor networks. Through three consecutive years of multi-source data integration and machine learning modeling, a framework for dynamic and continuous monitoring of 0–200 cm agricultural soil moisture with both accuracy and deployability was established (Figure 1).
2.1. Study Area
The study was conducted at a single site, the Lifang Organic Dryland Farming Experimental Base in Yuci District, Jinzhong City, Shanxi Province (37°51′ N, 112°45′ E). The region is located on the eastern edge of the Loess Plateau and belongs to a temperate continental semi-arid climate zone, with an annual average temperature of approximately 9.5–10.8 °C and a multi-year average precipitation of 400–500 mm, of which 70–80% is concentrated in June–September. The annual sunshine hours are approximately 2000–3000 h, annual evaporation is 1500–2300 mm, and the frost-free period is approximately 180 days. The experimental area is at an elevation of approximately 850 m, representing a typical rainfed dryland agricultural system (non-irrigated) in the Loess Plateau. The soil type in the study area is cinnamon soil, with loam predominating in the 0–100 cm profile, organic matter content of approximately 1.4–1.6%, soil bulk density of 1.28–1.47 g·cm−3, loose soil structure, and moderate water retention capacity. The main crops grown in the agricultural fields are maize, millet, and sorghum.
2.2. Data Collection
This study aims to reconstruct the 0–200 cm agricultural soil moisture profile through the integration of shallow monitoring and multi-day accumulated meteorological features. The overall technical approach includes four components: data collection (layered sampling, sensor monitoring, and meteorological data acquisition), model construction and performance evaluation, model interpretability analysis, and near-real-time application based on sensor networks (Figure 1). Through three consecutive years of multi-source data integration and machine learning modeling, a framework for dynamic reconstruction of agricultural soil moisture with both accuracy and deployability was established.
2.2.1. Layered Sampling of the Soil Profile
To obtain the ground truth of deep soil moisture distribution, systematic multi-temporal manual profile sampling was conducted over three consecutive years from 2023 to 2025. The sampling campaigns were strictly synchronized with the critical phenological stages of the rotation crops [26]. A detailed schedule of the sampling campaigns, including specific dates and crop growth stages, is provided in Appendix A (Table A1). For instance, sampling dates that capture distinct seasonal dynamics include 21 July and 11 August in 2023, and 9 June and 25 June in 2024. In total, approximately 280 soil profile datasets were collected with a manual soil auger at 20 cm intervals (0–20, 20–40, 40–60, 60–80, 80–100, 100–120, 120–140, 140–160, 160–180, and 180–200 cm).
To ensure comparability and quantitative analysis of soil moisture at different depths, the volumetric water content (θ, m³·m⁻³) of each layer was converted according to layer thickness to cumulative soil water content (Equation (1)), giving the integrated profile water content S0−D for the depth range 0–D, in mm:
$$S_{0-D} = 10 \sum_{i=1}^{n} \theta(i)\,\Delta z, \qquad n = D/\Delta z \tag{1}$$
where Δz is the thickness of each monitored soil layer, in cm; the coefficient 10 converts volumetric water content to water layer thickness (mm); θ(i) is the volumetric water content of the i-th layer, in m³·m⁻³; and n is the number of layers between the surface and depth D.
S0−D reflects the total soil water content from the surface to depth D, facilitating cross-layer and cross-period comparisons. This study set nine target depths: D ∈ {40, 60, 80, 100, 120, 140, 160, 180, 200} cm. The total soil water content of each corresponding depth interval can thus be calculated and used for cross-referencing and model validation among the multi-layer results.
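The conversion from layered volumetric water content to cumulative storage can be illustrated with a minimal sketch. The function name `cumulative_storage_mm` and the ten-layer profile values are hypothetical and are used only to show the arithmetic (layer water content × thickness in cm × 10, summed from the surface downward); they are not measurements from this study.

```python
import numpy as np

def cumulative_storage_mm(theta, layer_thickness_cm=20.0):
    """Cumulative water storage (mm) at the bottom of each layer.

    theta: volumetric water content (m3/m3) per layer, ordered top to bottom.
    """
    theta = np.asarray(theta, dtype=float)
    # theta * thickness (cm) * 10 is the water depth held by one layer, in mm.
    layer_water_mm = theta * layer_thickness_cm * 10.0
    return np.cumsum(layer_water_mm)

# Hypothetical profile: ten 20 cm layers from 0-20 cm down to 180-200 cm.
theta_profile = [0.18, 0.20, 0.22, 0.24, 0.25, 0.25, 0.26, 0.26, 0.27, 0.27]
storage = cumulative_storage_mm(theta_profile)
depths_cm = np.arange(20, 201, 20)

# Keep only the nine target depths D = 40, 60, ..., 200 cm.
targets = {int(d): float(s) for d, s in zip(depths_cm, storage) if d >= 40}
```

In this example the 0–40 cm storage is the sum of the first two layers, and each deeper target simply adds the water held by one more 20 cm layer.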
2.2.2. Shallow Soil Moisture Sensor Monitoring
To enable continuous observation of shallow soil moisture and to provide high-temporal-resolution input data for the predictive model, a multi-layer (0–80 cm) soil moisture monitoring sensor network was deployed in the experimental area in July 2025. Six representative sampling sites were selected, and sensors were installed at 20 cm intervals within the 0–80 cm soil profile (i.e., 0–20, 20–40, 40–60, and 60–80 cm layers). This monitoring depth was strategically chosen to capture the active dynamics of the infiltration front and root-zone water uptake while minimizing installation costs. The reconstruction of the deep layers (100–200 cm) is achieved not by simple spatial extrapolation, but by modeling the time-lagged response patterns between these shallow state variables and cumulative meteorological driving forces [27]. The volumetric water content at each depth was continuously monitored using high-sensitivity dielectric soil sensors (Model: RS-ECTH-N01-TR-1, Shandong Renke Control Technology Co., Ltd., Jinan, China). The sensors operated at a minimum sampling interval of 10 min, were equipped with automatic temperature compensation, and maintained stable measurement accuracy across a wide range of soil moisture conditions.
2.2.3. Meteorological Data and Potential Evapotranspiration Calculation
Daily meteorological data were obtained from the single on-site automatic meteorological observation station at the experimental base. Observation elements include precipitation (P, mm·d−1), average air temperature (T, °C), relative humidity (RH, %), 10 m height wind speed (U, m·s−1), and total solar radiation (R, MJ·m−2·d−1). All meteorological data were automatically collected by instruments and manually verified, with abnormal values and missing records removed to ensure data completeness.
Potential evapotranspiration (ET0, mm·d−1) was calculated using the FAO-56 recommended Penman–Monteith equation, with all required parameters derived from the above daily-scale meteorological records [28]. To facilitate subsequent feature construction, this study further processed meteorological data with sliding time window accumulation to form accumulated meteorological elements for 5, 10, 15, and 25 days.
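As an illustration of the ET0 step, the sketch below evaluates the standard FAO-56 Penman–Monteith form for one day. It is a simplified example rather than the processing chain used in this study: net radiation `rn` is assumed to be already derived from the measured solar radiation, saturation vapour pressure is taken at the daily mean temperature (FAO-56 prefers the mean of the values at Tmax and Tmin), soil heat flux is set to zero at the daily scale, and the function names and input values are hypothetical.

```python
import math

def wind_speed_2m(u_z, z_m=10.0):
    """Convert wind speed measured at height z_m (m) to the 2 m reference height."""
    return u_z * 4.87 / math.log(67.8 * z_m - 5.42)

def et0_penman_monteith(t_mean, rh_mean, u10, rn, elevation_m, g=0.0):
    """Daily reference evapotranspiration (mm/d), FAO-56 Penman-Monteith form.

    t_mean: mean air temperature (deg C); rh_mean: relative humidity (%);
    u10: wind speed at 10 m (m/s); rn: net radiation (MJ m-2 d-1, assumed given);
    g: soil heat flux (MJ m-2 d-1), commonly taken as 0 for daily steps.
    """
    u2 = wind_speed_2m(u10, 10.0)
    es = 0.6108 * math.exp(17.27 * t_mean / (t_mean + 237.3))  # saturation vapour pressure, kPa
    ea = es * rh_mean / 100.0                                  # actual vapour pressure, kPa
    delta = 4098.0 * es / (t_mean + 237.3) ** 2                # slope of the vapour pressure curve
    pressure = 101.3 * ((293.0 - 0.0065 * elevation_m) / 293.0) ** 5.26
    gamma = 0.000665 * pressure                                # psychrometric constant, kPa/degC
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))

# Hypothetical summer day at roughly the elevation of the study site (850 m).
et0_example = et0_penman_monteith(t_mean=22.0, rh_mean=60.0, u10=2.5, rn=12.0, elevation_m=850.0)
et0_humid = et0_penman_monteith(t_mean=22.0, rh_mean=100.0, u10=2.5, rn=12.0, elevation_m=850.0)
```

The wind conversion matters here because the station records wind speed at 10 m, whereas the equation expects the 2 m value.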
2.3. Analytical Scenarios
To systematically evaluate the role of shallow soil moisture and multi-day accumulated meteorological features in reconstructing soil moisture profiles, and to address any ambiguity in depth terminology, this study explicitly defines the depth categories aligned with the sensor configuration as follows: ‘Shallow/Surface layers’ refer to the model input horizons (0–40 cm); ‘Middle layers’ refer to the monitored transition zone (40–80 cm); ‘Deep layers’ refer to the unmonitored depths beyond the sensor range (>80 cm); and ‘Profile’ refers to the entire 0–200 cm soil column. Based on this classification, four types of input combination scenarios (A–D) were established (Table 1), corresponding to different observation conditions and degrees of information fusion. By comparing prediction results under different scenarios, the relative contributions of shallow monitoring data and meteorological driving factors in inferring moisture dynamics across the middle and deep layers (40–200 cm) can be analyzed.
In scenarios B and D that include meteorological factors, six meteorological elements were selected: daily precipitation, potential evapotranspiration, average temperature, relative humidity, 10 m wind speed, and solar radiation (R), and sliding accumulations were calculated for time windows of 5, 10, 15, and 25 days to characterize the response of soil moisture to changes in meteorological elements. It should be noted that while the accumulations of precipitation and ET0 represent physical water mass fluxes (Input/Output), the accumulations of state variables (temperature, humidity, wind speed, radiation) serve as statistical proxies for the duration and intensity of atmospheric forcing. For tree-based machine learning models (e.g., RF, GBDT), the accumulated sum is mathematically equivalent to a scaled moving average, effectively functioning as a low-pass filter to capture the persistence of meteorological conditions (e.g., sustained drying winds or heatwaves) rather than high-frequency daily fluctuations.
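The sliding accumulation, and its equivalence to a scaled moving average, can be sketched with pandas. The helper name `add_accumulated_features`, the column names `P` and `T`, and the synthetic daily series are assumptions made for illustration; the window lengths follow those listed above.

```python
import numpy as np
import pandas as pd

def add_accumulated_features(daily, windows=(5, 10, 15, 25)):
    """Append trailing-window sums of every column in `daily` (one row per day)."""
    out = daily.copy()
    for w in windows:
        # min_periods=w leaves NaN until a full window of past days is available.
        rolled = daily.rolling(window=w, min_periods=w).sum()
        rolled.columns = [f"{col}_sum{w}d" for col in daily.columns]
        out = out.join(rolled)
    return out

# Synthetic daily record: precipitation (mm/d) and mean temperature (deg C).
rng = np.random.default_rng(1)
dates = pd.date_range("2025-07-01", periods=40, freq="D")
daily = pd.DataFrame(
    {"P": rng.gamma(0.5, 6.0, 40), "T": rng.normal(24.0, 3.0, 40)}, index=dates
)
features = add_accumulated_features(daily)

# For a fixed window, the sum is the moving average multiplied by the window length.
mean_5d = daily["T"].rolling(5).mean()
same = np.allclose(features["T_sum5d"].dropna(), 5 * mean_5d.dropna())
```

Because the window is trailing, each feature on a given day uses only that day and the preceding days, which keeps the inputs usable in a near-real-time setting.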
For convenience in the following sections, the feature variables are labeled as follows: soil moisture in the 0–20 cm surface layer is denoted as X1, soil moisture in the 20–40 cm layer as X2, and the accumulated values of the six meteorological elements (daily precipitation, potential evapotranspiration, average temperature, relative humidity, 10 m wind speed, and solar radiation) as X3, X4, X5, X6, X7, and X8, respectively. Profile soil moisture for 0–40 cm is denoted as Y1, for 0–60 cm as Y2, and so on, with 0–200 cm denoted as Y9.
On this basis, a near-real-time application scenario E based on sensor networks was further constructed. This scenario utilizes layered soil moisture sensor observation data in the 0–80 cm range (every 20 cm layer), combined with daily meteorological records, to generate model input features in real-time to predict changes in 0–200 cm profile soil moisture.
2.4. Machine Learning Models and Selection Criteria
To quantitatively characterize the nonlinear relationship between shallow and deep soil moisture, this study selected several representative algorithms including Ridge Regression (RR), Random Forest (RF), Gradient Boosting Decision Tree (GBDT, including XGBoost and LightGBM implementations), Support Vector Regression (SVR), and Multilayer Perceptron (MLP) for comparative modeling and selection [29,30,31,32]. These models have different characteristics in terms of structural complexity, nonlinear fitting capability, and interpretability, enabling systematic evaluation of prediction performance and stability under different modeling assumptions.
Model input features correspond to the four analytical scenarios set in Section 2.3 (Table 1), with the output variable being the soil water content S0−D at different target depths. In linear models and SVR, input features were standardized to avoid dimensional bias [33]; tree-based models (RF, GBDT) directly used original dimensional data [34]. To ensure reproducibility, hyperparameters were tuned using grid search with five-fold cross-validation to achieve a balance between bias and variance. The search ranges were set as follows: for RF and GBDT, the number of trees (n_estimators) ranged from 100 to 1000, and maximum depth (max_depth) from 3 to 10; for SVR, the Radial Basis Function (RBF) kernel was selected, with the regularization parameter C ranging from 0.1 to 100 and gamma from 0.01 to 1. To mitigate temporal leakage during model evaluation, strict temporal segregation was applied: data from 2023 to 2024 served as the training/validation set, while the continuous monitoring data from 15 August to 15 September 2025 (Scenario E) were reserved exclusively as an independent temporal test set.
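A minimal scikit-learn sketch of this tuning procedure is given below. It runs on synthetic stand-in data, uses coarse subsets of the reported search ranges so that it finishes quickly, and assumes shuffled folds and RMSE as the selection score; none of these details are stated in the text, and the variable names are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(42)
# Synthetic stand-in data: columns mimic X1, X2 (shallow moisture) and X3 (accumulated rain).
X = rng.uniform([0.10, 0.12, 0.0], [0.35, 0.35, 80.0], size=(100, 3))
y = 400.0 * X[:, 0] + 600.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0.0, 5.0, 100)

# Rows are assumed to be in time order: early rows train, the last block is held out.
X_train, y_train, X_test, y_test = X[:80], y[:80], X[80:], y[80:]
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # shuffling is an assumption

# Tree-based model on raw features; grid is a coarse subset of the reported ranges.
rf_search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 200], "max_depth": [3, 10]},
    scoring="neg_root_mean_squared_error",
    cv=cv,
)
rf_search.fit(X_train, y_train)

# SVR with an RBF kernel; standardization is fitted inside each fold by the pipeline.
svr_search = GridSearchCV(
    make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    param_grid={"svr__C": [0.1, 1, 10, 100], "svr__gamma": [0.01, 0.1, 1]},
    scoring="neg_root_mean_squared_error",
    cv=cv,
)
svr_search.fit(X_train, y_train)

# Predictions on the held-out later block, never seen during tuning.
rf_pred = rf_search.predict(X_test)
svr_pred = svr_search.predict(X_test)
```

Placing the scaler inside the pipeline means its mean and variance are estimated from the training folds only, which avoids leaking validation-fold statistics into the standardized inputs.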
The training set consisted of the collected sampling data, whereas the Scenario E data from 15 August to 15 September 2025 came from continuous monitoring, which keeps the evaluation forward in time and avoids information leakage. For each input scenario and each meteorological accumulation time window (5–30 days), models were constructed and evaluated separately. By comparing the coefficient of determination (R2), root mean square error (RMSE, mm), and mean absolute percentage error (MAPE, %) of models on independent validation sets, the optimal input feature combination and meteorological accumulation time scale were determined [35,36]. It is explicitly noted that for deep layers (>80 cm) where continuous sensors were unavailable, the calculation of these accuracy metrics was strictly based on the ground truth provided by the ~280 groups of manual profile sampling data collected over the three-year experimental period, ensuring a systematic evaluation of model skill at each depth.
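For reference, the three accuracy metrics can be computed as in the following sketch, which uses their standard definitions. The function name `evaluate` and the observed and predicted storage values are hypothetical.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return R2, RMSE (same unit as y) and MAPE (%) using the standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = np.sum(residual ** 2)                    # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residual ** 2))
    mape = 100.0 * np.mean(np.abs(residual / y_true))  # assumes no zero observations
    return {"R2": r2, "RMSE": rmse, "MAPE": mape}

# Hypothetical observed vs. predicted profile storage (mm).
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
scores = evaluate(observed, predicted)
```

Here every prediction is off by 10 mm, so RMSE is 10 mm while MAPE weights the same absolute error more heavily for the smaller observed values.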
2.5. Model Interpretability Analysis
To investigate how shallow soil moisture and multi-day accumulated meteorological features shape the formation and prediction of deep moisture dynamics, this study conducted interpretability analysis based on the optimal model. First, SHAP was used to calculate the average contribution of each input variable to prediction results at different target depths [37]. Based on game theory principles, this method quantifies the marginal role of each feature in model decision-making and evaluates the direction and intensity of its influence on prediction results. By comparing SHAP value distributions at different depths and different meteorological accumulation time windows, key driving factors dominating deep moisture changes can be identified, revealing how the importance of shallow moisture and meteorological variables evolves with depth or with the length of the time window.
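The game-theoretic idea behind SHAP can be illustrated with a brute-force Shapley computation for a tiny model. This is not the algorithm used by the SHAP library (which relies on faster model-specific approximations) and not the model from this study: `toy_model`, the three stand-in features, and the random background sample are assumptions chosen so that all feature subsets can be enumerated exactly.

```python
from itertools import combinations
from math import factorial

import numpy as np

def shapley_values(predict, x, background):
    """Exact Shapley values for one sample x.

    Features outside a coalition are filled with rows of `background`,
    and the coalition value is the mean prediction over those rows.
    """
    n = x.shape[0]

    def value(subset):
        data = background.copy()
        idx = list(subset)
        if idx:
            data[:, idx] = x[idx]  # fix coalition features at the sample's values
        return predict(data).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Marginal contribution of feature i when added to this coalition.
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi

def toy_model(data):
    # Hypothetical response: additive in feature 0, interaction between features 1 and 2.
    return 2.0 * data[:, 0] + data[:, 1] * data[:, 2]

rng = np.random.default_rng(0)
background = rng.uniform(0.0, 1.0, size=(50, 3))
x = np.array([0.9, 0.2, 0.7])
phi = shapley_values(toy_model, x, background)

# Global importance in the SHAP sense: mean absolute attribution over several samples.
samples = rng.uniform(0.0, 1.0, size=(10, 3))
mean_abs = np.mean([np.abs(shapley_values(toy_model, s, background)) for s in samples], axis=0)
```

The attributions sum to the difference between the prediction for `x` and the average prediction over the background data, which is the additivity property that lets per-feature contributions be compared across depths and windows.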
Furthermore, to intuitively display the nonlinear relationship and threshold effects between main input features and model output, Partial Dependence Plot (PDP) analysis was used for response curve analysis of key variables. By comparing response curves of different features (such as surface moisture and multi-day precipitation, potential evapotranspiration accumulation, etc.), the lagged-cumulative response pattern of deep moisture to shallow dynamics and meteorological drivers can be intuitively reflected.
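A partial dependence curve can be computed directly from its definition, as in the sketch below: one feature is fixed at each grid value for every sample and the predictions are averaged. The `toy_model` with a 30 mm rainfall threshold, the feature layout, and the grid are hypothetical and serve only to show how a threshold effect appears in the curve.

```python
import numpy as np

def partial_dependence_curve(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each value in `grid`."""
    curve = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v  # overwrite the feature for every sample
        curve.append(predict(X_mod).mean())
    return np.array(curve)

def toy_model(data):
    # Hypothetical response: storage rises with surface moisture (column 0) and
    # reacts to accumulated rainfall (column 1) only above a 30 mm threshold.
    return 50.0 + 100.0 * data[:, 0] + 0.5 * np.maximum(data[:, 1] - 30.0, 0.0)

rng = np.random.default_rng(3)
X = np.column_stack([rng.uniform(0.10, 0.35, 200), rng.uniform(0.0, 60.0, 200)])
rain_grid = np.array([0.0, 15.0, 30.0, 45.0, 60.0])
pdp_rain = partial_dependence_curve(toy_model, X, feature=1, grid=rain_grid)
```

The resulting curve is flat up to 30 mm and rises linearly afterwards, which is the kind of threshold pattern the PDP analysis is meant to expose.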
Combining the comprehensive results of SHAP and PDP, this study verified the model’s interpretability from both statistical significance and physical logic aspects: shallow moisture dominates the rapid response of the upper profile, while multi-day accumulated meteorological features show significant buffering effects and time lag characteristics in deeper layers. This analysis not only reveals the internal laws of shallow-deep moisture transmission but also provides a theoretical basis for the model’s transferability across different climate conditions and crop systems.
3. Results
3.1. Prediction of Soil Moisture at Different Depths Using 0–20 cm Surface Layer as Input
In Scenario A, 0–20 cm soil moisture was used as the sole input variable to predict soil moisture at different depths from 0 to 200 cm; the results are shown in Figure 2. Goodness of fit (R2) declined and prediction errors grew with increasing depth, indicating that shallow moisture information can effectively characterize the dynamic changes in the upper profile, but has limited explanatory power for deep moisture. Prediction accuracy was high within 0–60 cm, with R2 values of 0.927 and 0.872, RMSE below 10 mm, and MAPE less than 8.5%, indicating that shallow moisture fluctuations have a significant linear relationship with near-surface water storage changes. As prediction depth increased (≥100 cm), model performance gradually weakened, with R2 decreasing from 0.777 to 0.582 and RMSE increasing from 18.6 mm to 38.9 mm, mainly because deep moisture is strongly affected by lagged recharge and soil structure, making it difficult for shallow information to fully capture its change process. Overall, SVR performed best in shallow layers (≤80 cm), while Random Forest and MLP had relatively more stable generalization performance in deep layers. This result indicates that although relying solely on surface sensor information can predict upper moisture status relatively well, it is necessary to combine multi-layer monitoring or meteorological accumulation factors to improve deep moisture estimation accuracy.
3.2. Prediction of Deep Soil Moisture Using Surface 0–20 cm and Multi-Day Meteorological Elements
In Scenario B, surface (0–20 cm) soil moisture and accumulated features of multi-day meteorological elements (precipitation, temperature, evapotranspiration, etc.) were used together as input variables to predict soil moisture at different depths from 0 to 200 cm. Model evaluation results are shown in Table 2. Considering R2, RMSE, and MAPE together, the overall prediction performance of the model improved significantly compared to Scenario A, indicating that the combined features of shallow moisture and short-term meteorological processes can effectively enhance the predictability of deep moisture.
From the overall trend, R2 ranged from 0.725 to 0.943, RMSE ranged from 4.9 to 32.3 mm, and MAPE remained in a relatively stable range of 6.5–9.3%. Compared with single surface moisture input, the introduction of meteorological elements significantly weakened the performance degradation caused by depth increase. Especially within 0–100 cm, R2 was higher than 0.85 and RMSE was controlled below 15 mm, indicating that meteorological driving signals have a strong response relationship to moisture changes in the middle and upper layers; while in the 120–200 cm deep layer, although R2 gradually decreased, it still remained between 0.71 and 0.84, an average increase of 0.05–0.08 compared to Scenario A, showing that meteorological accumulation features have a certain explanatory effect on the water recharge process of the lag layer.
Comparison of different accumulation time windows showed that 5–20 day accumulations could effectively improve model performance, but the optimal window did not monotonically change with time extension. Five-day and 10-day accumulations performed best in shallow layers (≤80 cm), with R2 reaching 0.937–0.905, indicating that short-term rainfall and evapotranspiration conditions have a direct driving effect on dynamic changes in upper soil moisture. The 15-day accumulation window performed relatively stable in middle-deep layers (100–160 cm), with R2 of 0.861–0.784 and RMSE between 16.3 and 23.5 mm, reflecting the balance between the cumulative effect of meteorological conditions and shallow lag conduction. The 20-day accumulation was slightly better in ultra-deep layer (>160 cm) models, indicating that longer time scale water infiltration and capillary recharge may have a more significant impact on deep moisture status.
3.3. Evaluation of Prediction Performance by Introducing 20–40 cm Soil Layer Moisture on the Basis of 0–20 cm Input
In Scenario C, on the basis of surface 0–20 cm moisture input, 20–40 cm soil moisture was introduced as composite input to predict profile moisture at depths from 0 to 200 cm. The results are shown in Figure 3. Compared with Scenario A that relied only on 0–20 cm moisture, the overall model fitting accuracy improved significantly, especially showing more stable performance in middle-deep layers (100–160 cm).
From statistical indicators, R2 ranged from 0.712 to 0.980, RMSE was between 3.92 and 32.28 mm, and MAPE was controlled at 3.5–8.9%. Among them, prediction performance was best for 0–60 cm and 0–80 cm layers, with R2 of 0.980 and 0.948, respectively, and RMSE both below 8 mm, indicating that there is a high linear correlation between shallow and sub-shallow moisture. As prediction depth increased, model accuracy slightly decreased, but within 0–140 cm it still maintained R2 > 0.80 and MAPE < 8%, showing that combined input of surface and sub-surface layers can effectively capture middle-layer moisture dynamics. In the 160–200 cm deep layer, R2 decreased to 0.71–0.73, but was still about 0.05–0.08 higher than Scenario A at the same layer, indicating that dual-layer input significantly improved the predictability of deep moisture.
Compared with single-layer input models, introducing 20–40 cm moisture significantly enhanced the model’s sensitivity to vertical moisture gradients, reflecting the compound influence of shallow downward infiltration and lagged recharge. This result indicates that 0–20 cm and 20–40 cm combined input can more completely characterize the surface moisture redistribution process in physical sense, thereby improving the estimation accuracy of deep moisture. Overall, the dual-layer input approach significantly improved model robustness while maintaining low errors, providing a more practical solution for profile moisture inversion under shallow sensor conditions.
3.4. Prediction of Deep Soil Moisture Using Shallow Dual-Layer Soil Moisture Features and Multi-Day Meteorological Elements
In Scenario D, surface (0–20 cm) and sub-surface (20–40 cm) soil moisture features were combined with multi-day accumulated meteorological elements (precipitation, temperature, evapotranspiration, etc.) as input to predict soil moisture at different depths of the 0–200 cm profile. Model evaluation results are shown in Table 3. Dual-layer moisture input superimposed with meteorological drivers significantly improved the prediction ability of deep soil moisture, and the performance was superior to Scenarios B and C, indicating that the coupling of shallow moisture dynamics and short-term meteorological processes can effectively compensate for the lack of deep monitoring data.
Overall, model R2 ranged from 0.759 to 0.981, RMSE was 3.9–29.6 mm, and MAPE was controlled at 3.3–8.7%. Prediction performance was optimal for shallow layers (≤100 cm), with R2 all higher than 0.92 and RMSE less than 11 mm, indicating that shallow and middle-layer moisture changes respond sensitively to meteorological signals. The 0–60 cm and 0–80 cm models had the highest accuracy (R2 = 0.980–0.962, RMSE < 7 mm), indicating a strong coupling relationship between shallow dual-layer moisture and the short-term accumulation process of precipitation and evapotranspiration. As depth increased, prediction accuracy showed a gradual declining trend, with R2 of 0.89 and 0.81 for 0–120 cm and 0–160 cm layers, respectively, but still improved by about 0.05–0.07 compared to Scenario B (single-layer input), indicating that the introduction of sub-surface moisture enhanced the model’s ability to characterize water storage and infiltration processes.
Different accumulation time windows also showed differences in model performance. Five-day and 10-day accumulated meteorological features had the best effect in shallow layer prediction, accurately reflecting the direct influence of short-term rainfall and evapotranspiration on moisture fluctuations; the 15-day window had the most stable comprehensive performance in the 100–160 cm range, with R2 of 0.857–0.793 and RMSE maintained at 17–25 mm, indicating that medium-term accumulation processes help characterize infiltration lag effects; 20-day accumulation showed slight improvement in deeper layer (≥180 cm) models, indicating that the cumulative effect of long-term meteorological processes has a certain compensating effect on deep moisture changes.
Compared with the previous scenarios, Scenario D showed stronger physical rationality and prediction stability. Shallow dual-layer input strengthened the characterization of surface moisture redistribution and vertical infiltration, while the introduction of meteorological factors supplemented dynamic information on external energy and moisture flux, thereby achieving refined inference of deep soil moisture. Results indicate that hybrid modeling combining shallow multi-dimensional sensor information and meteorological accumulation features is an effective approach to achieving near-real-time estimation of deep soil moisture in semi-arid regions, providing a feasible data foundation for precision irrigation, regional moisture balance assessment, and soil-crop system models.
3.5. Continuous Prediction and Dynamic Validation of Profile Soil Moisture Based on Scenario D
Based on the optimal model configuration of Scenario D, the near-real-time application (Scenario E) was constructed to integrate multi-layer IoT observation data (0–80 cm, one layer every 20 cm) and daily meteorological data. The model adopts a continuously updated operating mode, enabling daily-scale simulation and prediction of moisture at multiple profile layers including 0–60 cm, 0–80 cm, 0–100 cm, 0–120 cm, 0–140 cm, 0–160 cm, 0–180 cm, and 0–200 cm. The system effectively extended shallow observation information to deeper layers, achieving near-real-time estimation of 0–200 cm profile soil moisture (Figures 4 and 5).
As shown in Figure 4, at the 0–60 cm and 0–80 cm layers, predicted values and measured values showed extremely high consistency throughout the one-month monitoring period (15 August to 15 September 2025). The model could accurately track the wet-dry cycle processes caused by rainfall and evaporation, and the response to abrupt fluctuations was also relatively sensitive. The overall trend of predicted curves closely matched measured changes, with only slight deviations during individual strong infiltration stages. Statistical results showed that the model’s R2 value consistently remained above 0.90, and RMSE was controlled within 10 mm, indicating that this rolling prediction method exhibited excellent performance in terms of temporal continuity and stability, with good dynamic tracking capability and adaptability.
Figure 5 shows the spatiotemporal dynamic characteristics of predicted deep-layer (60–200 cm) soil moisture. Profile moisture showed obvious gradient distribution along depth, and the response to surface wetting events gradually lagged with increasing depth. Shallow layers (≤80 cm) responded quickly to rainfall, while middle-deep layers showed a gradually wetting infiltration diffusion trend, reflecting the dynamic process of moisture infiltration and redistribution. The model well reproduced the penetration and retention patterns of soil moisture from top to bottom in semi-arid environments. Although deep layers (>80 cm) lacked direct observation data, the temporal change trend of prediction results was highly consistent with shallow measured responses, indicating that the model has good physical rationality and credibility in characterizing deep moisture dynamics.
This continuous prediction system shifts profile monitoring from discrete sampling to dynamic simulation. By integrating high-frequency IoT monitoring data with accumulated meteorological drivers, the system can update and visualize profile soil moisture in near real time. The results indicate that a hybrid modeling method combining shallow dual-layer moisture features with short-term meteorological accumulation factors is an effective approach to near-real-time estimation of deep soil moisture in semi-arid regions and has good potential for wider application.
3.6. Importance and Interpretability Analysis of Input Feature Variables for Scenario D
It is important to clarify that the following SHAP and PDP analyses are intended solely to reveal the internal decision-making logic of the trained model and to evaluate the physical consistency of the learned feature influences. These methods provide model interpretability rather than evidence of empirical accuracy. The empirical validation of the model’s predictive performance against actual field observations is presented in the preceding
Section 3.1,
Section 3.2,
Section 3.3 and
Section 3.4 (
Table 2,
Table 3,
Figure 2 and
Figure 3).
3.6.1. Importance Distribution and Depth Differences in Input Feature Variables (SHAP Analysis)
Global and layered analysis based on the SHAP (SHapley Additive exPlanations) method revealed the contribution characteristics of different input variables in the Scenario D model (
Figure 6). From the global feature importance perspective (
Figure 6a), shallow dual-layer soil moisture variables X2 (20–40 cm) and X1 (0–20 cm) occupy a dominant position in overall explanatory power, with their average |SHAP| values significantly higher than other variables, indicating that profile moisture prediction mainly depends on the immediate moisture status of shallow and sub-shallow layers. Meteorological accumulation features (X3–X8) have relatively low overall contributions, but they have a sustained background influence on deep target variables, reflecting the lag effect of meteorological driving processes in moisture dynamics.
Depth distribution characteristics (
Figure 6b) further indicate that shallow (0–60 cm) and middle-layer (0–120 cm) predictions are mainly driven jointly by X1 and X2, with the sensitivity of both to soil moisture changes being strongest in upper layers; while in deeper layers (≥140 cm), the relative importance of X7 (one of the multi-day meteorological accumulation factors) significantly increased, indicating that deep soil moisture has a more obvious response to long-term accumulation of meteorological processes. In contrast, contributions of other meteorological variables (such as X6, X8) are relatively limited, indicating that their effects are mostly reflected indirectly through X7.
Overall, SHAP analysis revealed a hierarchical structure of profile moisture drivers: shallow dual-layer moisture (X1, X2) provides direct dynamic signals and dominates prediction accuracy, while meteorological accumulation features (especially X7) provide auxiliary explanatory power in middle-deep layers, characterizing lagged recharge and buffering processes. This result is consistent with the significant improvement in R² at different depths in the Scenario D model, indicating that the input structure of ‘dual-layer moisture combined with meteorological accumulation’ is reasonable and complementary at both physical and statistical levels.
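To make the mean |SHAP| ranking concrete, the sketch below computes exact Shapley values by enumerating feature coalitions and then averages their absolute values over samples. It is an illustration only. It does not use the shap library, and it treats "absent" features by substituting a single baseline vector, which is one of several possible conventions. The three-feature `toy_model`, its coefficients, and the sample values are hypothetical stand-ins, not the Scenario D model or its data.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x relative to a baseline vector.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Classic Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def mean_abs_shap(predict, samples, baseline):
    """Global importance: mean absolute Shapley value per feature."""
    totals = [0.0] * len(baseline)
    for x in samples:
        for i, value in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(value)
    return [t / len(samples) for t in totals]


def toy_model(features):
    """Hypothetical linear stand-in for a trained model (invented coefficients)."""
    x1, x2, x7 = features
    return 2.0 * x1 + 3.0 * x2 + 0.5 * x7


# Invented samples of (X1, X2, X7); the baseline is their column mean.
samples = [[30.0, 34.0, 10.0], [33.0, 37.0, 25.0], [28.0, 31.0, 5.0]]
baseline = [sum(col) / len(samples) for col in zip(*samples)]

importance = mean_abs_shap(toy_model, samples, baseline)
for name, value in zip(["X1", "X2", "X7"], importance):
    print(name, round(value, 3))
```

Exact enumeration scales exponentially with the number of features, so it is only practical for small inputs such as the eight variables (X1–X8) discussed here; tree-specific explainers exist for larger models.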
3.6.2. Profile Response Patterns and Variable Marginal Effects (PDP Analysis)
To further elucidate the response characteristics of different input variables to profile moisture prediction, marginal effects of shallow, middle, and deep layer targets were analyzed based on the PDP (Partial Dependence Plot) method (
Figure 7,
Figure 8 and
Figure 9).
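As a reminder of what a partial dependence curve measures, the sketch below forces one feature to each value on a grid, leaves the other features at their observed values, and averages the model output over all rows. The `toy_model` with a hinge at 34 is a hypothetical stand-in chosen only to show how a threshold appears in such a curve. Its coefficients and the sample rows are invented and do not come from the study.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction with one feature fixed at each grid value.

    All other features keep the values observed in each row.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(features):
    """Hypothetical stand-in model with a threshold response in X1."""
    x1, x2 = features
    recharge = max(0.0, x1 - 34.0)  # 34.0 is an invented threshold
    return 100.0 + 1.0 * x2 + 5.0 * recharge


# Invented (X1, X2) rows and an evaluation grid for X1.
rows = [[30.0, 33.0], [34.0, 36.0], [37.0, 39.0]]
grid = [30.0, 32.0, 34.0, 36.0, 38.0]

curve = partial_dependence(toy_model, rows, 0, grid)
for value, response in zip(grid, curve):
    print(value, response)
```

The curve stays flat below the hinge and rises above it, which is the shape a threshold response would take in a plot of this kind.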
Shallow layer targets (0–60 cm) showed approximately linear positive response relationships to X1 and X2 (
Figure 7), indicating that shallow water storage is mainly determined jointly by the immediate moisture status of surface and sub-surface layers, and shallow soil moisture changes are highly synchronized with local precipitation infiltration and evapotranspiration processes.
Middle-layer targets (0–120 cm) showed significant thresholds near X1 ≈ 33–35 and X2 ≈ 35–38 (
Figure 8), with predicted values rising rapidly after exceeding these thresholds, reflecting the beginning of shallow infiltration downward transmission and effective recharge, indicating that moisture in this layer is mainly driven by infiltration-accumulation processes. In addition, X7 (meteorological accumulation) showed a linear positive effect in the middle-layer range, showing that the superposition of medium-term meteorological conditions has a sustained influence on moisture recharge.
The response pattern of deep layer targets (0–200 cm) was obviously different (
Figure 9). In the higher value range of X7, the marginal effect gradually weakened or even turned negative, indicating that deep moisture has limited sensitivity to short-term meteorological changes and is mainly dominated by buffering and dissipation mechanisms. This trend also suggests that deep water storage systems have strong regulation and lag characteristics, and their dynamic changes more reflect long-term infiltration accumulation and capillary recharge processes.
Taken together, the results for the three layers show that shallow variables (X1, X2) maintain high explanatory power at different depths and are the main driving force of the model, while X7 reflects the integral effect of deep layers on meteorological processes and plays a supplementary explanatory role. The complementarity of the two in time scale and response mechanism enables the Scenario D model to capture both short-term dynamics and medium-to-long-term water storage trends in profile prediction. This pattern is consistent with the SHAP heatmap results (
Figure 6b), verifying the consistency between model structure and physical processes.
4. Discussion
4.1. Optimization and Selection of Input Feature Variables
Systematic comparison of different input feature combinations (Scenarios A–D) in this study showed that the integration of shallow dual-layer soil moisture (0–20 cm, 20–40 cm) and multi-day meteorological accumulation can achieve near-real-time reconstruction of 0–200 cm profile soil moisture without deploying deep probes, with superior prediction accuracy and stability compared to single information source input (
Table 2 and
Table 3). This combination of ‘shallow moisture information and meteorological process driving’ is complementary in terms of model interpretability and physical rationality, significantly alleviating the decline in deep soil moisture prediction accuracy with increasing depth (
Figure 4 and
Figure 5).
In terms of input feature optimization, the results further revealed the hierarchical effects of shallow and meteorological variables. Shallow 0–20 cm soil moisture reflects short-term moisture dynamics such as surface infiltration and evapotranspiration, while the 20–40 cm layer shows both shallow redistribution and the transitional characteristics of downward recharge. Combined input of both can effectively capture vertical gradient changes between shallow and middle layers [
38,
39]. This is consistent with observation results in northern dryland areas by Meng et al. [
40], who pointed out that shallow dual-layer information has significant explanatory power for middle-layer (≤100 cm) moisture changes, especially being able to reflect infiltration depth and lagged recharge processes after short-term rainfall events.
On the other hand, the introduction of multi-day meteorological accumulation features showed obvious gains in middle-deep layers. Research found that prediction accuracy of different accumulation time windows did not monotonically improve with time length, but showed layered optimal characteristics: 5–10 day accumulation is most suitable for surface rapid response processes, 10–15 days perform best for middle-layer infiltration and moisture redistribution, while ≥20 day accumulation can better characterize deep buffering and long-term storage response. It is important to emphasize that these specific optimal time windows are intrinsically linked to the loamy soil texture and semi-arid climate of the study area. Soil texture determines the saturated hydraulic conductivity (Ks) and water retention capacity, which govern the vertical velocity of the wetting front. In our study (loam soil), the 20-day accumulation effectively captured the lag in deep-layer recharge. However, for coarse-textured soils (e.g., sandy soil), where Ks is higher, the infiltration rate is faster, likely shortening the optimal lag window (e.g., <15 days). Conversely, for fine-textured soils (e.g., clay), the lower permeability would enhance the buffering effect, potentially extending the optimal accumulation window beyond 25 days. Therefore, while the hierarchical pattern of time windows (increasing with depth) is generalizable, the specific duration values require calibration based on local soil physical properties. This result is consistent with findings based on data-driven models in semi-arid regions abroad. For example, Pal et al. [
41] pointed out that soil profile moisture has depth dependence on the time integration of meteorological processes, with surface response mainly dominated by immediate rainfall and evapotranspiration, while deep moisture regulation reflects cumulative infiltration. The ‘non-monotonic optimal time window’ conclusion of this study further supports this view.
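The multi-day accumulation features discussed above can be illustrated with a trailing-window sum over a daily series. The sketch below assumes the window includes the current day and yields no value until a full window is available. Both choices are assumptions, since this section does not restate the exact definition, and the window lengths and rainfall values are illustrative only.

```python
def trailing_sum(daily_values, window_days):
    """Sum of the current day and the preceding window_days - 1 days.

    Returns None for days before a full window is available.
    """
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    result = []
    running = 0.0
    for day, value in enumerate(daily_values):
        running += value
        if day >= window_days:
            # Drop the value that just left the window.
            running -= daily_values[day - window_days]
        result.append(running if day >= window_days - 1 else None)
    return result


def accumulation_features(daily_values, windows=(5, 10, 15, 20)):
    """One trailing-sum series per window length (window set is illustrative)."""
    return {w: trailing_sum(daily_values, w) for w in windows}


# Invented daily precipitation (mm) for a short period.
rain = [0.0, 12.0, 3.0, 0.0, 0.0, 8.0, 0.0, 0.0, 1.0, 0.0, 0.0, 20.0]

features = accumulation_features(rain, windows=(5, 10))
print(features[5])
print(features[10])
```

Fluxes such as precipitation are summed this way; for state-like variables such as temperature, a trailing mean may be the more natural aggregate, depending on how the feature is defined.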
Overall, the input structure of ‘dual-layer moisture and meteorological accumulation’ in this study balanced physically based interpretability with data-driven flexibility: shallow moisture provides immediate state information, meteorological accumulation factors reflect external driving forces and lag effects, and together they determine the prediction performance for profile dynamics. This input combination achieved a good balance among accuracy, stability, and deployment feasibility [
42,
43], laying a data and methodological foundation for future real-time monitoring and regional promotion under sensor network conditions.
4.2. Mechanism Analysis of Monitoring Methods and Scenario Applications
SHAP and PDP interpretability results further revealed the mechanism of shallow and meteorological features at different depths. Global SHAP results (
Figure 6) showed that X2 (20–40 cm) and X1 (0–20 cm) ranked in the top two in overall prediction contribution, indicating that profile moisture dynamics are mainly controlled by shallow immediate status; meanwhile, the contribution of meteorological accumulation feature X7 significantly increased at depths ≥140 cm, reflecting the lagged response of deep moisture to meteorological processes.
PDP analysis results (
Figure 7,
Figure 8 and
Figure 9) revealed typical nonlinear characteristics of profile layering from the perspective of response curves. Shallow layer targets (0–60 cm) showed approximately linear positive responses to X1 and X2, indicating that moisture in this layer is mainly controlled by surface redistribution and evapotranspiration consumption, with rapid and direct responses to meteorological disturbances (rainfall, evaporation). Middle-layer targets (0–120 cm) showed obvious thresholds near X1 ≈ 33–35 and X2 ≈ 35–38, with predicted values rising sharply after exceeding the thresholds, indicating that when shallow moisture exceeds a certain level, infiltration and recharge processes begin to dominate moisture changes [
3,
22]. This characteristic corresponds to the nonlinear seepage mechanism in the soil moisture infiltration-redistribution process, consistent with measured patterns based on SPAC models, that is, the ‘effective infiltration point’ determines the initiation time of lower layer recharge.
In deep layers (0–200 cm), the marginal effect of X7 gradually weakened or even turned negative in the higher value range, reflecting that deep layers have low sensitivity to short-term meteorological changes and are mainly controlled by cumulative infiltration and buffering mechanisms. This indicates that deep soil layers mainly function as moisture reservoirs, with their dynamic changes showing obvious buffering and lag effects over time. This pattern has been confirmed in existing studies in semi-arid regions; for example, Granata et al. [
44] pointed out that moisture changes in deep layers (>150 cm) have weak immediate responses to single rainfall events, but the cumulative effect is significantly enhanced after consecutive precipitation.
Regarding the physical interpretation of meteorological features, distinct mechanisms are observed. Accumulated precipitation and ET0 directly influence the soil water balance as mass fluxes. In contrast, features like ‘accumulated wind speed’ or ‘temperature’ quantify the integrated aerodynamic and thermal energy supply over time. For instance, a high value of accumulated wind speed does not imply a physical storage of wind, but rather signifies a prolonged period of high aerodynamic conductance, which accelerates surface evaporation and soil drying. This ‘integrated forcing’ effectively explains why deep soil moisture, which is insensitive to daily wind gusts, responds significantly to multi-day accumulated atmospheric anomalies.
From the overall mechanism perspective, the performance of this study’s model is consistent with the moisture coupling process of the soil-atmosphere-crop system. Shallow moisture controls surface water and heat fluxes and short-term dynamics, middle-layer moisture reflects lagged infiltration and root zone regulation, while deep layers embody long-term water storage and energy buffering processes. The cross-validation of SHAP and PDP indicates that this model not only possesses statistical interpretability but is also consistent with the logic of physical processes. These results demonstrate that the integration of shallow sensing and meteorological driving information does not merely rely on model fitting, but rather captures data patterns consistent with moisture transport mechanisms.
4.3. Algorithm Selection and Comparison Under Different Scenarios
In the systematic comparison of four input scenarios (A–D), Scenario D (dual-layer moisture and meteorological accumulation) performed best, with prediction R² above 0.98 for the 0–80 cm range, between 0.85 and 0.90 for the 0–140 cm range, and still above 0.76 for the 0–200 cm profile (
Table 3). This indicates that, across depths, the combination of shallow moisture status and meteorological accumulation signals can both capture rapid dynamics and reflect buffering trends. In comparison, Scenario A relies only on 0–20 cm moisture input; although it can reconstruct shallow changes relatively well, its response to deep layers ≥100 cm is clearly insufficient. Scenario B (adding meteorological factors) and Scenario C (adding 20–40 cm layer moisture) both showed improvements to varying degrees, but their overall performance remained slightly below that of the dual-source integration approach.
At the algorithm level, the applicability of various models shows obvious differences at different depths. Support Vector Regression (SVR) performs prominently in shallow layer (≤80 cm) fitting, mainly because its kernel function is sensitive to local linear relationships and can well characterize the direct response of shallow moisture to meteorological factors; while tree-based ensemble models (RF, GBDT) have better robustness in deep layers and cross-year validation [
29,
30], capable of capturing complex nonlinear and feature interaction relationships. XGBoost has the most balanced performance in middle-deep layers (100–160 cm), and LightGBM has higher computational efficiency when data volume is large, suitable for embedding in real-time prediction frameworks. In comparison, linear models (such as Ridge), although highly interpretable, have difficulty accurately reflecting infiltration thresholds and lag effects.
In terms of model generalization, this study used 2023–2024 data for training and 2025 data for independent validation. Results showed that Random Forest and GBDT have better inter-annual stability than SVR, indicating that tree-based ensemble methods have stronger robustness to variations across different climate years [
29]. In addition, model ensemble strategies have further potential in the future: a hierarchical model structure can be adopted, using SVR or linear models for shallow layers and GBDT or hybrid models for middle-deep layers, achieving the combined advantage of ‘shallow local precision—deep statistical smoothing’. Similar ideas have also been proven effective in recent hydrological machine learning research [
45,
46].
4.4. Limitations of This Study and Future Improvement Directions
Although the shallow monitoring and meteorological accumulation integration framework proposed in this study performed well in prediction accuracy and physical consistency, there are still several limitations and room for improvement.
First, the spatiotemporal representativeness is limited. The study was based on single-station data from the typical dryland area of Yuci on the Loess Plateau, characterized by loamy (cinnamon) soil. Since the established relationships—particularly the specific time lag windows for meteorological accumulation—are sensitive to soil hydraulic conductivity, the specific parameters (e.g., the 20-day threshold for deep layers) may not generalize to different soil textures (e.g., sandy or clay soils) or climatic zones without re-calibration [
47]. Future research should conduct parallel verification across a gradient of soil textures to establish a quantitative relationship between soil physical parameters (e.g., sand/clay content) and optimal meteorological accumulation windows, thereby enhancing the model’s transferability.
Second, the explicit integration of soil physical properties and crop parameters remains a challenge. To maintain the low-cost and high-deployability advantages of the proposed method, this study did not include difficult-to-acquire in situ parameters such as saturated hydraulic conductivity (Ks), saturated water content (θs), or dynamic root depth distributions as model inputs. While the machine learning models captured the implicit patterns of infiltration and root water uptake through time-lagged meteorological features—as evidenced by the threshold behaviors observed in the PDP analysis—the lack of explicit physical constraints may limit the model’s interpretability under extreme conditions. Future research should explore Physics-Informed Machine Learning (PIML) frameworks that incorporate these parameters as static boundary conditions or regularization terms to ensure that predictions strictly adhere to soil water dynamics laws (e.g., Richards equation) [
48,
49].
Third, deep layer verification data are insufficient. Continuous sensor monitoring was limited to the 0–80 cm depth, while validation for depths > 100 cm relied on manual drilling samples. The lower temporal resolution of manual sampling compared to the daily model outputs creates a temporal mismatch, which may mask potential short-term prediction errors or rapid fluctuations in deep layers. However, given that deep soil moisture dynamics are physically characterized by significant damping and stability compared to surface layers, we assume high-frequency fluctuations are minimal. Future studies should ideally employ deep-profile sensors (e.g., tube-TDR or neutron probes) to rigorously validate the model’s performance on a daily time scale.
Fourth, the lack of uncertainty quantification is a limitation. The current study reports deterministic metrics (e.g., RMSE, R²), which do not account for sensor noise or model parameter variability. Because soil moisture is critical for decision-making, reporting only point predictions is inadequate for risk assessment. Future research should employ probabilistic methods, such as bootstrapped ensembles, Monte Carlo dropout, or Bayesian regression, to generate confidence intervals and rigorously quantify the reliability of deep-layer predictions [
50].
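One of the options named above, a bootstrapped ensemble, can be sketched as follows: the training pairs are resampled with replacement, a model is refitted on each resample, and percentiles of the resulting predictions give an interval. This sketch makes several assumptions. A one-predictor least-squares line stands in for the study's machine learning models, and all names and numbers are hypothetical. The interval also reflects only the variability of the fitted model, not sensor noise in a new observation.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor is constant")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_interval(xs, ys, x_new, n_models=500, alpha=0.10, seed=0):
    """Percentile interval of predictions from models fitted on resamples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    while len(preds) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample with a constant predictor; redraw
        by = [ys[i] for i in idx]
        slope, intercept = fit_line(bx, by)
        preds.append(slope * x_new + intercept)
    preds.sort()
    lower = preds[int((alpha / 2) * (n_models - 1))]
    upper = preds[int((1 - alpha / 2) * (n_models - 1))]
    median = preds[n_models // 2]
    return lower, median, upper


# Invented pairs: shallow storage (mm) vs. deep-profile storage (mm).
shallow = [95.0, 102.0, 110.0, 118.0, 125.0, 131.0, 140.0]
deep = [402.0, 415.0, 421.0, 440.0, 446.0, 460.0, 471.0]

low, mid, high = bootstrap_interval(shallow, deep, x_new=120.0)
print(f"90% interval: {low:.1f} to {high:.1f} mm (median {mid:.1f})")
```

A fixed seed keeps the interval reproducible between runs; a wider interval at a given depth would indicate that the prediction there is less well constrained by the training data.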
Fifth, the comparison with physical baselines is absent. This study focused on a data-driven reconstruction approach and did not benchmark against physical models (e.g., HYDRUS-1D or Richards equation approximations). This decision was based on the lack of high-resolution in situ soil hydraulic parameters (e.g., retention curves, saturated conductivity) required to parameterize physical models accurately at the field scale. Without precise parameterization, physical models often suffer from high uncertainty, making a fair comparison difficult. Future research can explore Physics-informed ML or Hybrid ML models, ensuring consistency between prediction processes and SPAC mechanisms while maintaining flexibility [
51,
52].
In addition, the significance at management and application levels still needs to be explored further. The shallow monitoring and near-real-time reconstruction framework can convert shallow sensor data into root zone available water estimates, serving drought monitoring and precision irrigation. Especially when middle-layer moisture approaches PDP thresholds (X1 ≈ 33–35, X2 ≈ 35–38), it can serve as a trigger signal for irrigation scheduling; during deep layer buffering stages, irrigation can be delayed to fully utilize water storage. In the future, hierarchical time windows (5–10 days for shallow layers, 10–15 days for middle layers, ≥20 days for deep layers) can be embedded in intelligent farmland control systems as operational indicators to achieve data-driven precision moisture scheduling [
22].
This study’s framework provided a feasible approach and method for near-real-time estimation of deep soil moisture in semi-arid regions, but continued deepening is still needed in multi-region verification, physical constraints, and uncertainty analysis.
5. Conclusions
This study established and verified a prediction framework that combines shallow monitoring data with multi-day accumulated meteorological features for near-real-time estimation of profile (0–200 cm) agricultural soil moisture in semi-arid regions. Through three consecutive years of field observations and multi-scenario modeling comparisons, the differing roles of input variables, meteorological time windows, and machine learning algorithms were systematically evaluated, and the dominant mechanisms of profile moisture change with depth were revealed through interpretability analysis. The results provide a new technical pathway for low-cost, deployable reconstruction of profile moisture dynamics without deep sensors, and a theoretical basis for drought monitoring and precision irrigation.
This study proposed and verified a near-real-time reconstruction method for 0–200 cm profiles based on shallow dual-layer soil moisture (0–20 cm and 20–40 cm) and multi-day meteorological accumulation driving. Rolling validation results showed that predictions in shallow layers (0–60 cm and 0–80 cm) were highly consistent with observations (R² > 0.90, RMSE < 10 mm), and deep layer predictions could well reproduce lagged responses with increasing depth, with clear physical rationality (
Figure 4 and
Figure 5).
Comparison of the input scenarios showed that, compared with Scenario A using only single-layer moisture, Scenario B (adding meteorological factors) and Scenario C (adding 20–40 cm layer moisture) both significantly improved middle-deep layer prediction accuracy, and Scenario D, combining dual-layer soil moisture and multi-day meteorological features, performed best (
Table 2 and
Table 3). For shallow and middle layers (≤100 cm), R² generally exceeded 0.92, with RMSE below 11 mm. Meteorological accumulation time windows showed layered optimal patterns: 5–10 days optimal for shallow layers, 10–15 days most stable for middle layers, and ≥20 days better for deep layers.
Interpretability results revealed hierarchical driving mechanisms of profile moisture changes: surface and sub-surface moisture (X1, X2) dominated rapid responses of upper profiles, while meteorological accumulation factors (X7) showed significantly increased contributions below 140 cm, reflecting the cumulative and buffering effects of deep moisture on meteorological processes. Threshold ranges (X1 ≈ 33–35, X2 ≈ 35–38) identified in PDP analysis corresponded to the initiation of effective infiltration and downward recharge processes, confirming the physical rationality of model results.
The constructed method has characteristics of low cost, scalability, and embeddability, enabling continuous estimation of deep moisture under shallow IoT monitoring conditions. This framework can serve as an effective tool for agricultural moisture dynamic monitoring and drought scheduling, providing technical support for precision irrigation, moisture balance assessment, and regional application of soil moisture models in semi-arid regions.
The shallow monitoring-meteorological accumulation integration framework of this study theoretically revealed the lagged conduction relationship of shallow moisture to deep dynamics, methodologically achieved near-real-time profile reconstruction, and practically provided feasible alternative solutions for regions lacking deep observations. Future work can further verify the model’s generalization capability across multiple regions and crop systems, and combine with physical mechanistic models or remote sensing inversion methods to construct a cross-scale, integrated agricultural soil moisture monitoring system.