1. Introduction
Soil moisture is a critical component of Earth’s terrestrial system, influencing the exchange of water and energy between the land surface and the atmosphere. It plays a vital role in various natural processes, including plant growth, microbial activity, nutrient transport, and soil health [1,2]. Beyond its importance in agriculture, soil moisture serves as a key variable in hydrological, ecological, and climatic systems [3,4]. As such, it is a crucial indicator for climate change analysis and environmental monitoring, providing insights into water availability, land degradation, and ecosystem health [5,6]. Soil moisture dynamics are governed by a complex interaction of meteorological factors, such as precipitation, temperature, and evapotranspiration. These dynamics are influenced by soil properties, vegetation cover, and topography, making soil moisture highly variable across space and time [6]. In agroecosystems, soil moisture directly affects crop water use efficiency, evapotranspiration rates, and agricultural productivity [7]. Moreover, it plays a pivotal role in runoff generation, groundwater recharge, and soil erosion processes, highlighting its significance in water resource management [8].
The accurate measurement and monitoring of soil moisture are essential for a range of applications, including irrigation management, drought and flood forecasting, and natural hazard mitigation [9,10]. Traditional methods such as the gravimetric technique, tensiometers, and time-domain reflectometry provide reliable point-based data. However, these techniques are labor-intensive, spatially limited, and unsuitable for large-scale assessments. This limitation has prompted the use of remote sensing technologies, which offer a non-invasive and scalable approach to monitoring soil moisture over large areas. Microwave sensors, in particular, have emerged as powerful tools for soil moisture estimation. Satellite platforms such as Soil Moisture Active Passive (SMAP) and Sentinel-1 use microwave signals to measure surface soil moisture, providing valuable data for monitoring soil moisture dynamics across different climates and regions [11]. However, these sensors are typically limited to measuring moisture in the upper layers of the soil, often requiring additional modeling and integration with ground-based data to estimate moisture content at greater depths. Although remote sensing products such as SMAP and Sentinel-1 were not directly utilized in this study, they are discussed to highlight the recent advancements and limitations in large-scale surface soil moisture monitoring.
Recent advancements in geographic information systems (GISs), remote sensing, and machine learning (ML) have greatly enhanced the potential for accurate soil moisture monitoring and prediction [12,13]. GISs enable the integration of spatial datasets, while ML algorithms can efficiently process large and complex datasets to capture non-linear relationships in soil moisture variation [14,15]. Techniques such as decision tree (DT), random forest (RF), and deep learning models have been successfully applied to estimate soil moisture using remotely sensed and environmental data [16,17,18]. RF, in particular, improves prediction accuracy by reducing overfitting through ensemble learning [19]. These models are capable of handling diverse inputs, including soil properties, weather variables, and vegetation indices, allowing for detailed analysis across different soil types and climatic conditions [20]. Logistic regression (LR) has also been effectively used for classifying soil moisture conditions based on threshold values [20]. In addition to classification methods, unsupervised learning techniques like clustering are increasingly used to identify natural groupings in soil moisture data, supporting region-specific analysis and management. These approaches provide valuable insights into depth-wise soil moisture distribution, contributing to more efficient irrigation planning and soil conservation practices [21,22]. Together, these technologies form a comprehensive framework for understanding and managing soil moisture in the context of precision agriculture.
Previous studies have primarily focused on surface soil moisture estimation using remote sensing or machine learning approaches, with limited attention to field-based multi-depth moisture variability under different irrigation and land-use conditions. In addition, few studies have integrated geospatial interpolation, supervised classification, and clustering techniques within a unified framework for subsurface moisture characterization. To address these research gaps, the present study proposes an integrated geospatial and machine learning framework for detailed soil moisture assessment. The study combines field-based multi-depth measurements with GIS and ML techniques to improve spatial interpretation, predictive classification, and identification of natural soil moisture regimes. The present study was conducted at Punjab Agricultural University, located in an intensively cultivated and groundwater-stressed agricultural region of Punjab, India. The study primarily contributes through depth-specific field-based soil moisture assessment integrated with geospatial and machine learning approaches for exploratory irrigation-management zoning at campus scale.
This study aims to integrate gravimetric soil moisture measurements with geospatial and ML techniques to generate depth-wise spatial soil moisture maps for improved land and water resource management. The key objectives are: (i) to assess soil moisture at four depth intervals (0–15 cm, 15–30 cm, 30–45 cm, and 45–60 cm) using the gravimetric method, (ii) to map spatial variation in depth-wise soil moisture using geospatial techniques, (iii) to evaluate machine learning models (RF, DT, and LR) for predicting soil moisture patterns across varying soil types and agro-climatic conditions, and (iv) to classify spatial moisture regimes using clustering techniques, such as K-means and hierarchical clustering, based on depth, vegetation, and climate.
The novelty of this research lies in its depth-specific approach, moving beyond surface moisture estimation to provide detailed subsurface moisture profiling. The integration of field-based measurements with data-driven modeling and spatial analysis improves prediction accuracy and enhances understanding of moisture dynamics. This framework supports precision agriculture by enabling optimized irrigation scheduling and improved water use efficiency.
2. Materials and Methods
2.1. Description of the Study Area
The study was conducted at the experimental area of Punjab Agricultural University, Ludhiana, Punjab (India), during 2024–2025 (Figure 1). It is situated between 30°53′ and 30°54′ north latitude and 75°46′ and 75°49′ east longitude at an average elevation of 267 m above mean sea level. The predominant land uses in the area include agricultural lands, primarily used for cultivating rice, wheat, maize, and potato, and built-up areas. The study area experiences a subtropical climate, with temperatures ranging from 7.1 °C in winter to 42.5 °C in summer. The region receives an average annual precipitation of approximately 882 mm, most of which occurs during the monsoon season (July to September). Notably, no precipitation was recorded during the soil sample collection period, ensuring minimal external influence on soil moisture content. The soils of the study area are predominantly sandy loam to loam in texture with moderate drainage characteristics. These soils generally exhibit moderate infiltration and water-holding capacity, which significantly influence soil moisture retention and vertical water movement.
2.2. Land Use, Agricultural Practices, and Irrigation Management
The total geographical area of Punjab Agricultural University (PAU) is 494.7 hectares, of which approximately 65% is devoted to agricultural use, while the remaining area consists of built-up infrastructure, including academic buildings, research facilities, and residential quarters. Agricultural activities on the campus are predominantly irrigation-dependent, with limited reliance on monsoon rainfall due to the region’s variable precipitation patterns.
Table 1 provides detailed information on vegetation types, temperature conditions, and irrigation practices across 30 field locations at PAU. The sites exhibited diverse land use, including wheat, Gobhi Sarson (Brassica napus), strawberry, black chickpeas, orchards, and uncultivated plots (both plowed and unplowed). Most wheat-growing locations (e.g., L2, L9, L13, L14, L23, L25, and L27) received up to 7 irrigations during the 2024–2025 season, typically spaced between November and March, with an irrigation depth of 70 mm per event. Gobhi Sarson fields (L22 and L30) received only two irrigations, while black chickpeas at L12 received none. Uncropped locations, including L3–L8, L10, L15, L16, L20, L21, L24, L26, L28, and L29, had no recorded irrigation, allowing for the study of natural moisture dynamics. Orchard sites (L17, L18, and L19) were not irrigated during the study period. This comprehensive dataset was essential for analyzing soil moisture variability under varying vegetation and irrigation regimes.
2.3. Selection of Soil Sampling Locations and Collection of Depth-Wise Soil Samples
Soil sampling locations were systematically selected to ensure spatial representation across the study area. Locations included agricultural fields and areas with varying vegetation types. Each sampling point was geo-referenced using a handheld Global Positioning System (GPS) device (manufactured by Garmin Ltd. and sourced from the Punjab Remote Sensing Centre, Ludhiana, India) to enable accurate spatial identification and facilitate integration with geospatial platforms. The sampling design considered uniform distribution, field accessibility, and heterogeneity in field conditions to capture the variability in soil moisture content effectively. Soil sampling was conducted during the crop-growing season under relatively stable weather conditions without rainfall events. Sampling was performed within a comparable time window following irrigation to minimize temporal variability in soil moisture across locations.
Soil samples were collected from multiple locations across the campus at four depth intervals (0–15 cm, 15–30 cm, 30–45 cm, and 45–60 cm), with the sites representing varied cropping systems and irrigation regimes. A screw-type auger was used to ensure the integrity of depth-wise sampling. Thirty representative sites were identified across cultivated land using ArcGIS Pro’s Grid Application, taking into account variability in soil type and land cover. Built-up areas were excluded. GPS coordinates were recorded at each sampling point. Samples were sealed in airtight, labeled moisture boxes to prevent evaporation and cross-contamination before laboratory analysis.
The selected 30 sampling locations were distributed across cultivated areas to adequately represent variations in land use, irrigation practices, and vegetation conditions within the study area. Replicate observations and careful laboratory handling procedures were adopted to minimize measurement errors. All the samples were analyzed under standardized oven-drying conditions to ensure consistency and reliability. Immediately after collection, the fresh weight of each soil sample was recorded. Samples were then transferred to a laboratory oven and dried at a constant temperature of 105 ± 5 °C for 48 h or until a constant weight was achieved. After drying, the final (dry) weight was recorded. This gravimetric procedure was uniformly applied to all depth-wise samples to accurately determine soil moisture content. The gravimetric soil moisture content was calculated using the following standard formula:
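In its usual dry-weight form, and writing $W_f$ for the fresh weight and $W_d$ for the oven-dry weight of a sample (notation assumed here, since the paper's own symbols are not shown), the relation is:

```latex
\theta_g\,(\%) = \frac{W_f - W_d}{W_d} \times 100
```

For example, a hypothetical sample weighing 120 g fresh and 100 g after drying would give $(120 - 100)/100 \times 100 = 20\%$. If the weights include the moisture box, its tare weight would need to be subtracted from both terms first.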
2.4. Geospatial and Machine Learning Applications
A range of GIS applications, including ArcGIS Pro, QGIS 3.34, Google Earth Pro, and USGS Earth Explorer, were used to manage and analyze spatial data. ArcGIS Pro (v3.2) was the primary tool for mapping and spatial analysis due to its advanced visualization capabilities. A spatial grid was created over the PAU campus to guide the selection of sampling points, ensuring representation across cultivated areas while excluding built-up zones. GPS-tagged sample locations were linked to corresponding pixels in satellite imagery, facilitating spatial correlation between ground-truth data and remote sensing outputs. Processed soil moisture data and GPS coordinates were imported into QGIS for spatial data management and analysis. The attribute table for each sampling point included depth-wise moisture content, which served as input for interpolation and thematic map generation.
The inverse distance weighting (IDW) interpolation technique was used to generate continuous soil moisture surfaces from discrete sampling points. IDW estimates unsampled values based on the assumption that values closer in space are more similar than those farther apart. This technique allowed for the creation of depth-wise soil moisture maps that visually represented spatial variation across the study area. IDW interpolation was selected due to its simplicity and effectiveness in representing local spatial variation under relatively limited and sparsely distributed sampling conditions, making it suitable for exploratory spatial analysis.
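The weighting idea behind inverse distance weighting (IDW) can be illustrated with a minimal sketch. This is not the GIS implementation used in the study: the function name `idw_estimate`, the power of 2, the coordinates, and the moisture values are all assumptions chosen for illustration.

```python
import math


def idw_estimate(x, y, samples, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples: list of (xi, yi, value) tuples.
    power: distance exponent (2.0 is a common default, assumed here;
    the study does not state the value it used).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, value in samples:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            # Exactly on a sample point: return the measurement itself.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical moisture values (%) at three sample points, coordinates in metres.
samples = [(0.0, 0.0, 10.0), (100.0, 0.0, 20.0), (0.0, 100.0, 30.0)]
midpoint = idw_estimate(50.0, 0.0, samples)
```

Here the estimate halfway between the first two points is about 16.4%. The two equidistant neighbours dominate, and the more distant third point pulls the value slightly above their mean of 15%. Because the result is always a weighted average, IDW never predicts values outside the range of the measured samples.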
Prior to model implementation, the dataset was checked for missing values and normalized where required. The dataset was divided into training (70%) and testing (30%) subsets. Machine learning analyses were performed using Python libraries, including Scikit-learn and Pandas. Hyperparameters were selected using iterative testing to improve model performance. Model implementation included standard preprocessing procedures and consistent train–test partitioning for comparative evaluation of classification performance across depth intervals.
Machine learning and clustering techniques were applied to classify and analyze soil moisture data. Supervised classification algorithms, including DT, RF, and LR, were implemented primarily using depth-wise soil moisture observations and associated spatial field information. RF was employed to address overfitting and improve classification robustness. Multinomial logistic regression was used to classify soil moisture into low, medium, and high categories. Soil moisture observations were grouped into low, medium, and high categories based on the relative distribution of measured moisture values within the dataset. The same classification approach was consistently applied across all depth intervals to maintain comparability among layers.
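The classification workflow described above can be sketched with Scikit-learn as follows. The data are synthetic, the two features (irrigation count and a vegetation flag) are hypothetical stand-ins for the field information, and the use of tertiles to define the low, medium, and high classes is an assumption. The study states only that classes were based on the relative distribution of measured values.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for one depth interval: 30 sites, two hypothetical
# features and a moisture value in percent (not the study's data).
irrigations = rng.integers(0, 8, size=30)
vegetated = (irrigations > 0).astype(int)
moisture = 8.0 + 1.5 * irrigations + rng.normal(0.0, 1.0, size=30)
X = np.column_stack([irrigations, vegetated])

# Assumed rule: tertile cut points split the observed distribution into
# low / medium / high classes, encoded as 0 / 1 / 2.
low_cut, high_cut = np.quantile(moisture, [1 / 3, 2 / 3])
y = np.digitize(moisture, [low_cut, high_cut])

# 70/30 split as described in the text; stratification is an added assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
}
# Fit each model on the training subset and score it on the held-out subset.
accuracy = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
```

With 30 sites and a 30% hold-out, the test subset contains only 9 samples, so a single misclassification shifts accuracy by about 11 percentage points. If each depth-wise model was evaluated on one observation per site, accuracies such as 88.9% and 77.8% would correspond to 8 and 7 correct predictions out of 9.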
For clustering analysis, K-means and agglomerative hierarchical clustering were utilized. The optimal number of clusters for K-means was determined using the elbow method based on within-cluster sum of squares (WCSS). Agglomerative hierarchical clustering was performed to identify nested groupings within the data without predefining cluster count. All the models were applied to depth-wise soil moisture data to classify moisture levels and identify spatial and temporal patterns relevant to precision irrigation and soil management.
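A minimal sketch of the two clustering steps, using Scikit-learn on synthetic depth-wise profiles, is given below. The three moisture regimes, the noise level, the range of k, and the Ward linkage are assumptions made for illustration. The study does not report its linkage criterion.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in: 30 sites x 4 depth intervals, drawn around three
# hypothetical moisture regimes (dry, intermediate, wet), values in percent.
centres = np.array(
    [[6, 7, 8, 9], [11, 12, 13, 14], [16, 17, 18, 19]], dtype=float
)
profiles = np.vstack([c + rng.normal(0.0, 0.5, size=(10, 4)) for c in centres])

# Elbow method: within-cluster sum of squares (KMeans.inertia_) for k = 1..6.
# The "elbow" is the k beyond which WCSS stops dropping sharply.
wcss = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(profiles).inertia_
    for k in range(1, 7)
}

# Agglomerative clustering with Ward linkage (Euclidean distance), an assumed
# choice; the merge tree is cut at three clusters for this example.
hier_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(
    profiles
)
```

In this synthetic case WCSS falls steeply up to k = 3 and flattens afterwards, which is the pattern the elbow method looks for. With real field data the bend is often less distinct, so the chosen k is partly a judgment call.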
2.5. Evaluation Metrics
To ensure the reliability and robustness of the classification models and spatial interpolation results, comprehensive validation and evaluation procedures were implemented. For classification tasks, where the goal is to predict discrete target variables, the dataset was split into training and testing subsets to facilitate performance assessment. In addition to train–test splitting, k-fold cross-validation was performed to reduce bias and improve the robustness of classification accuracy assessment. Key evaluation metrics include accuracy, precision, recall, F1-score, support, and confusion matrix, each providing insights into different aspects of model effectiveness. Due to the relatively limited sample size, the reported classification accuracies should be interpreted cautiously and considered exploratory rather than generalized predictive performance estimates. In parallel, spatial interpolation performance was assessed by comparing predicted values with observed measurements at control points using error metrics such as root mean square error (RMSE) and mean absolute error (MAE). These validation steps were critical in establishing the credibility of the study’s findings.
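The metrics named above follow from simple definitions, sketched here in plain Python. The observed and predicted values and the confusion matrix are hypothetical, and the helper names (`rmse`, `mae`, `per_class_scores`) are introduced only for this illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between paired observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def per_class_scores(confusion, label):
    """Precision, recall, and F1 for one class of a square confusion matrix
    (rows = true class, columns = predicted class)."""
    tp = confusion[label][label]
    predicted_total = sum(row[label] for row in confusion)
    true_total = sum(confusion[label])  # this row sum is the class "support"
    precision = tp / predicted_total if predicted_total else 0.0
    recall = tp / true_total if true_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical interpolation check at four control points (moisture, %).
observed = [12.0, 15.0, 9.0, 18.0]
predicted = [11.0, 17.0, 9.0, 16.0]

# Hypothetical 9-sample test set; classes 0 = low, 1 = medium, 2 = high.
confusion = [
    [3, 0, 0],
    [0, 2, 1],  # one "medium" sample predicted as "high"
    [0, 0, 3],
]
accuracy = sum(confusion[i][i] for i in range(3)) / sum(map(sum, confusion))
```

For these numbers, RMSE is 1.5 and MAE is 1.25 percentage points of moisture, and overall accuracy is 8/9. The single medium-to-high error lowers recall for the medium class to 2/3 and precision for the high class to 0.75. Accuracy alone would not show that the error is concentrated in those two classes.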
4. Discussion
Overall, the integration of geospatial and machine learning techniques has demonstrated effectiveness in characterizing soil moisture variability, supporting precision agriculture practices, and contributing to improved water resource management. Vertical profiling revealed that moisture stability improved with depth, particularly under actively cultivated and irrigated fields. Locations like L9 and L11 maintained higher subsurface moisture, reflecting enhanced infiltration and retention influenced by vegetative cover, organic matter, and consistent irrigation. These observations align with Bhatt and Kukal [23], who highlighted improved surface moisture in rice–wheat systems managed under residue retention practices. In contrast, non-cropped or fallow areas such as L24 and L26 experienced lower moisture, supporting the findings of Venkatesh et al. [24] that bare soils reduce infiltration and promote evaporative losses. These spatial contrasts underscore the crucial role of land cover and agronomic management in regulating moisture regimes across depths.
The regional variability observed across eastern, central, and western zones can be attributed to differences in land use, topographic gradients, and underlying soil textures. These findings correspond with Giri et al. [25], who reported variable moisture extraction in wheat fields under drip irrigation, governed by plant growth stage and spatial heterogeneity. The interpolation outputs should be interpreted as indicative spatial patterns rather than precise continuous predictions due to the relatively sparse sampling density.
The classification models exhibited depth-dependent performance, with predictive accuracy increasing notably at mid to deeper layers. The DT and RF models achieved optimal accuracy at 30 to 45 cm, recording 88.9% and 77.8%, respectively. These results are consistent with Balas et al. [26], who found stronger correlations between microwave backscatter and subsurface moisture due to lower temporal variability at depth. Higher classification accuracy at deeper layers may be attributed to reduced temporal fluctuations and lower influence of evaporation and short-term atmospheric interactions. Deeper layers generally exhibit more stable moisture conditions governed by soil texture, infiltration, and water-retention processes, thereby improving model discrimination capability [27]. At shallow depths, the DT model’s accuracy was constrained by rapid moisture fluctuations driven by surface evaporation, irrigation timing, and plant uptake. The confusion matrices revealed persistent misclassification between “medium” and “high” moisture classes, particularly at 0 to 15 cm. This challenge is also reported in Bhatt and Kukal [23], who documented high surface variability influenced by tillage practices and atmospheric factors. The overlap between moisture categories reflects a broader issue in machine learning classification of continuous biophysical variables, especially where boundary definitions are narrow or dynamic [26]. However, the reported classification accuracies should be interpreted cautiously due to the relatively limited dataset size and uneven distribution of moisture classes across the sampling locations. Accordingly, the obtained classification performances should be considered exploratory rather than generalized predictive performance estimates. The relatively small dataset may increase the possibility of overfitting, particularly for depth-wise classification models with limited class representation.
LR exhibited its highest predictive accuracy at 15 to 30 cm, likely due to more stable yet still discriminable moisture gradients at this layer. However, its reduced performance at both shallower and deeper layers illustrates sensitivity to noise and subtle class transitions. The results align with Nurdiawan et al. [28], who reported limited performance of decision-based models when classification boundaries were not clearly defined due to feature overlap.
Clustering approaches proved effective in identifying latent groupings within the soil moisture dataset. Hierarchical clustering using Euclidean distance revealed three principal clusters aligned with land management regimes. Fields with intensive irrigation and crop cover clustered distinctly from dry fallow areas, echoing the findings by Venkatesh et al. [24] regarding land-use effects on evapotranspiration and subsurface moisture behavior.
K-means clustering further segmented the sites into four distinct clusters, with the optimal value of k determined by the elbow method. The clustering revealed natural groupings aligned with irrigation frequency, land-use intensity, and moisture retention characteristics. Although clustering revealed meaningful spatial groupings associated with irrigation and land use, the cluster boundaries should be considered interpretative due to moderate overlap among clusters and the absence of sharp elbow separation. Cluster 2 encompassed the highest number of locations, potentially representing a transition zone with moderately irrigated soils and balanced textural composition. The model’s ability to delineate these clusters supports its application in reducing sampling redundancy and improving spatial characterization, consistent with observations by Van Arkel and Kaleita [29].
From a practical standpoint, the integration of soil moisture profiling with machine learning and geospatial analysis provides a foundation for precision agriculture. The enhanced classification performance at mid-depths, particularly around 30 to 45 cm, suggests that this zone is most reliable for monitoring moisture status for irrigation scheduling. Giri et al. [25] emphasized the importance of aligning drip irrigation strategies with root-zone moisture variability to optimize water use across crop stages. The identified moisture zones may support preliminary site-specific irrigation planning and prioritization of water management interventions within the campus-scale study area. However, broader regional application would require additional validation using larger and multi-season datasets.
Furthermore, the evidence supports the agronomic value of conservation tillage and residue retention practices, particularly for enhancing moisture conservation in the upper soil layers, as noted by Bhatt and Kukal [23]. By incorporating depth-specific soil moisture data into irrigation scheduling models, significant improvements in water use efficiency can be achieved. This is particularly relevant for regions like Punjab, where declining groundwater levels necessitate the adoption of data-driven irrigation strategies [27].
This study demonstrates the synergistic potential of geospatial mapping and machine learning techniques for quantifying and interpreting soil moisture variability. The ability to characterize depth-specific moisture dynamics and segment spatial patterns through clustering highlights the value of these approaches for adaptive water resource management and site-specific crop planning. These findings contribute to the broader goal of developing resilient agricultural systems under conditions of limited water availability.
The study findings are subject to uncertainties associated with interpolation accuracy, limited sample density, classification variability, and clustering stability. Since all the analyses were based on the same field dataset, the generalizability of the developed models to other agro-climatic conditions should be interpreted cautiously. Future studies should include larger datasets, multi-season observations, and external validation for improving robustness and transferability.
5. Conclusions
This study comprehensively assessed the spatial and vertical variability of soil moisture across 30 field locations at Punjab Agricultural University using field measurements and geospatial and machine learning approaches. Soil moisture data across four depth intervals (0–15 cm, 15–30 cm, 30–45 cm, and 45–60 cm) revealed that surface layers exhibited the highest variability, while deeper layers (30–60 cm) showed more stable and consistent moisture retention. Vegetated and irrigated sites, particularly L11, L22, and L25, consistently maintained higher subsoil moisture levels, underlining the critical role of crop cover and irrigation frequency in enhancing soil water storage. In contrast, fallow and non-irrigated fields (e.g., L3, L4, and L26) exhibited low moisture, especially in deeper layers, pointing to reduced infiltration and poor retention. Machine learning models provided valuable insights into depth-specific moisture classification. The DT model achieved the highest accuracy (88.9%) at 30–45 cm, while RF was most reliable at 45–60 cm, and LR peaked at 15–30 cm. Misclassification patterns at shallower depths highlighted the challenges of distinguishing medium and high moisture classes due to dynamic surface processes. These results emphasize that mid to deep soil layers offer more reliable inputs for machine learning-based soil moisture classification. Unsupervised clustering techniques (hierarchical and K-means) further segmented the landscape into distinct moisture regimes aligned with land-use intensity, crop management, and irrigation history. These groupings provide a valuable basis for optimizing irrigation planning and resource allocation across spatially heterogeneous fields. The identified moisture zones may support preliminary site-specific irrigation planning and prioritization of water management interventions within the campus-scale study area. However, broader regional application would require additional validation using larger and multi-season datasets.
Overall, the integration of field observations, spatial interpolation, and machine learning techniques offered an integrated framework for characterizing depth-wise soil moisture variability. The study supports the importance of site-specific depth-informed irrigation strategies, particularly in water-stressed regions like Punjab. These findings support the adoption of precision agriculture practices, improve irrigation scheduling, and enhance sustainable water resource management for resilient farming systems.
Despite the usefulness of the developed framework, the study was limited by relatively sparse sampling density and single-season observations. Future research should focus on multi-season monitoring, inclusion of additional soil physical properties, integration of remote sensing datasets, and validation under diverse agro-climatic conditions to improve model robustness and applicability.