Article

A Novel Deep Learning-Based Soil Moisture Prediction Model Using Adaptive Group Radial Lasso Regularized Basis Function Networks (AGRL-RBFN) Optimized by Hierarchical Correlated Spider Wasp Optimizer (HCSWO) and Incremental Learning (IL)

by Claudia Cherubini 1,* and Muthu Bala Anand 2

1 Department of Mathematics, Informatics and Geosciences, University of Trieste, 34127 Trieste, Italy
2 Department of Computer Science & Engineering, Tagore Institute of Engineering & Technology, Deviyakurichi 636112, India
* Author to whom correspondence should be addressed.
Water 2025, 17(16), 2379; https://doi.org/10.3390/w17162379
Submission received: 26 May 2025 / Revised: 8 August 2025 / Accepted: 9 August 2025 / Published: 11 August 2025

Abstract

Soil moisture is a critical factor in the hydrological cycle, affecting plant growth, ecosystem health, and groundwater reserves. Current monitoring and prediction methods often fail to account for the complexities introduced by climatic variations and other influencing factors, such as atmospheric interference and data gaps, which reduces prediction accuracy. To address these challenges, this study introduces a novel soil moisture prediction model based on remote sensing and deep learning, using Adaptive Group Radial Lasso Regularized Basis Function Networks (AGRL-RBFN) optimized by the Hierarchical Correlated Spider Wasp Optimizer (HCSWO) and incremental learning (IL). The method uses hyperspectral and soil moisture data from a 2017 field campaign in Karlsruhe, encompassing variables such as datetime, soil moisture percentage, soil temperature, and remote sensing spectral bands. The methodology begins with comprehensive preprocessing of historical remote sensing data to fill gaps, reduce noise, and correct atmospheric disturbances. It then applies a seasonal mapping and grouping technique, based on the AdaK-MCC method, to analyze the impact of climatic changes on soil moisture patterns. HCSWO-based feature selection identifies the most significant predictors, providing optimal input for the AGRL-RBFN model. The model achieves an accuracy of 98.09%, a precision of 98.17%, a recall of 97.24%, and an F1-score of 98.95%, outperforming existing methods. It also attains a mean absolute error (MAE) of 0.047 in gap filling and a Dunn Index of 4.897 for clustering. The study did not investigate the relationship between soil moisture levels and specific crops, which presents an opportunity for future research aimed at enhancing smart agricultural practices.
Furthermore, the model can be refined by integrating a wider range of datasets and improving its resilience to extreme weather conditions, thereby providing a reliable tool for climate-responsive agricultural management and water conservation strategies.

1. Introduction

Soil moisture acts as a fundamental intermediary between climate, hydrology, and agriculture, exerting a profound influence on evapotranspiration, precipitation, runoff, and ultimately agricultural productivity and ecosystem health. Although it represents a relatively small fraction of Earth’s total water volume, its variability is central to drought and flood forecasting and underpins water–energy balance models [1]. In agricultural settings, accurate soil moisture data are indispensable for assessing drought severity, scheduling planting and irrigation, and forecasting yields, tasks that become ever more critical under increasingly variable climate conditions. Maintaining moisture within optimal ranges is vital: oversaturation can leach nutrients and promote root diseases, while deficits can lead to crop stress or failure. Traditionally, in situ probes have guided irrigation and water-conservation practices [2,3,4], but satellite remote sensing now offers scalable, gridded alternatives (e.g., at 0.25° × 0.25°) via missions such as ESA’s SMOS and NASA’s SMAP. These platforms, which employ L-band microwave sensing combined with Palmer-model retrievals and ensemble Kalman filtering, provide both surface and subsurface moisture profiles as well as anomaly detection [5,6]. Machine learning and deep learning methods have further advanced soil moisture estimation by extracting complex spatial and temporal features from these datasets. Classical algorithms, including support vector machines (SVM), k-nearest neighbors (KNN), random forests (RF), and artificial neural networks (ANN), yield marked improvements in accuracy and help control overfitting, while CNNs excel at spatial pattern recognition and LSTMs capture temporal dynamics, functioning as “virtual sensors” that learn from neighboring transducers [7,8,9,10,11,12,13,14,15,16].
Recent works illustrate this progress: CNN-based estimators achieve high precision but struggle in low-variance moisture environments [17]; RF-SVM-LSTM hybrids excel in wireless sensor networks, albeit at a high computational cost [18]; GPR-ANN architectures demonstrate low RMSE but require intensive hyperparameter tuning [19]; GRU models improve performance but often omit rigorous feature selection [20]; ensemble trees such as ETr and RF perform well on multi-source imagery yet suffer from overfitting due to sparse resolution [21,22]; and LSTM networks generally outperform others but remain sensitive to soil-texture heterogeneity [23]. Despite these gains, significant gaps persist. Many models do not fully address climate-driven variability in moisture dynamics, such as temperature- and evaporation-induced seasonal shifts or extreme events like droughts and heatwaves [24,25,26], nor do they adequately correct for atmospheric distortions (haze, scattering, absorption) that can degrade retrieval accuracy [27,28,29]. Furthermore, cloud cover and sensor malfunctions introduce persistent data gaps [30,31,32], and soil moisture exhibits complex spatial and temporal variability that single-framework models and simplified hydrological analogues (e.g., the Leaky Bucket) fail to capture [33,34,35,36,37]. To overcome these limitations, we present a novel framework that integrates remote sensing and deep learning. Seasonal grouping and gap filling use the AdaK-MCC algorithm, noise reduction employs the Savitzky–Golay filter, and temporal patterns are extracted via Frequency Weighted Fourier Coefficient–based Seasonal Decomposition (FWFCSD). The core predictor is an Adaptive Group Radial Lasso–Regularized Basis Function Network (AGRL-RBFN), with feature selection optimized by a Hierarchical Correlated Spider Wasp Optimizer (HCSWO) and continuous adaptability ensured by incremental learning (IL).
Our approach fulfils four key objectives: (1) forecasting moisture trends under climate variability; (2) enhancing prediction precision through noise filtering; (3) compensating for missing data via robust clustering; and (4) consolidating multi-scale temporal information. Achieving 98.09% accuracy using only remote sensing inputs, this system functions as a “smart moisture advisor,” akin to a seasoned farmer who adjusts irrigation based on rainfall and temperature histories. It fills data gaps, adapts in real time, and delivers actionable insights in regions where ground networks are sparse.
The remainder of this paper is structured as follows. Section 2 outlines the methodology, Section 3 presents the experimental results and comparisons, Section 4 discusses the implications and limitations, and Section 5 concludes with future research directions.

2. Materials and Methods: Soil Moisture Prediction Using AGRL-RBFN with IL

The study aims to forecast soil moisture from remote sensing data using the AGRL-RBFN model with IL while tracking climatic changes through seasonal mapping and classification. The main phases of the study are preprocessing, seasonal mapping and classification, analysis of varying patterns and multivariate relationships, and feature extraction and selection. Figure 1 depicts the complete workflow for forecasting soil moisture from remote sensing data integrated with deep learning methods. It illustrates the step-by-step phases, including preprocessing, seasonal mapping, feature extraction, and prediction using the AGRL-RBFN alongside the IL method.
The AGRL-RBFN model with incremental learning was implemented in Python, using TensorFlow and Keras for model training and Scikit-learn for preprocessing and baseline modeling. The preprocessing steps (gap filling with AdaK-MCC, noise reduction with the Savitzky–Golay filter, and atmospheric correction) can draw on SciPy or on remote sensing libraries such as GDAL or Rasterio, and metrics are plotted with Matplotlib or Seaborn. HCSWO is a custom optimization algorithm and must be implemented separately; it was adopted because it handles feature correlation effectively. A high-performance computing setup is recommended, featuring a multicore processor, an NVIDIA GPU with CUDA support, at least 16 GB of RAM, and SSD storage for handling large datasets. Training was performed on an NVIDIA RTX 3080 GPU with 16 GB of VRAM, backed by an Intel Core i9 CPU and 64 GB of RAM, and the entire training cycle, including preprocessing, clustering, and optimization, took approximately 2.7 h. Scalability evaluations show that training time grows linearly with the size of the input dataset, which supports applying the model to larger hyperspectral datasets in future research. Furthermore, the incremental learning component supports real-time adaptation in operational contexts by keeping retraining time low when new data arrive. To replicate the study, obtain the dataset from https://github.com/felixriese/hyperspectral-soilmoisture-dataset (Accessed on 3 January 2019), set up a Python environment with the libraries listed above and GPU support, and follow the preprocessing, feature selection, and deep learning steps with the same settings and hyperparameters.
Using the provided scripts or code will further help ensure accurate reproduction of the results.

2.1. Dataset Description and Data Acquisition

The process of monitoring soil moisture through remote sensing and deep learning begins with the acquisition of hyperspectral and soil moisture data from a field campaign conducted in Karlsruhe, Germany, from April to September 2017, during the growing season. The dataset comprises 1250 weekly samples covering date and time, soil moisture percentage, soil temperature, and hyperspectral reflectance in 204 bands, and it incorporates satellite-derived data from the SMAP and SMOS missions. It features a spatial resolution of 0.25° × 0.25° and a temporal resolution ranging from daily to weekly intervals, depending on satellite pass frequency. This level of granularity is consistent with leading global soil moisture monitoring protocols and supports both temporal trend analysis and regional-scale spatial modeling, although the original source does not explicitly report the overall size and spatial resolution of the raw data. To improve prediction accuracy, AdaK-MCC gap filling, Savitzky–Golay noise reduction, and atmospheric correction were applied. Following preprocessing, the data underwent seasonal mapping, multivariate correlation analysis, feature extraction, and feature selection, culminating in soil moisture prediction using the AGRL-RBFN method.
The dataset also includes the vegetation index (NDVI), land use variations (e.g., agriculture and grassland), and soil type classifications (loamy sand and sandy loam). These agricultural parameters provide important context for interpreting soil moisture variance and enable the model to account for changes in crop cover and evapotranspiration rates. The proposed model was tested on a single dataset to validate its effectiveness and robustness in a controlled setting. Evaluating the model on a well-characterized, specific dataset enables precise assessment of performance metrics such as MAE and correlation coefficients without the variability introduced by multiple datasets; this establishes a proof of concept and checks that the model functions reliably before broader application. Although the model performs well on the Karlsruhe 2017 dataset, further research is needed to determine its adaptability to other climates, because local climatic, agricultural, and geophysical factors strongly influence changes in soil moisture. Future work will therefore validate the model on datasets from diverse climate regions, including dry, semiarid, and tropical areas, and integrate global datasets so that the model can adapt to different soil types, plant cover, and climatic conditions. The dataset is divided so that 80% is used for training and the remaining 20% is reserved for testing. The collected remote sensing data $Z$ for soil moisture are expressed as follows:
$$Z = \{Z_1, Z_2, Z_3, \ldots, Z_a\} \tag{1}$$
where each $Z_i$ denotes an individual soil moisture observation obtained via remote sensing at a specific spatial location and time instance, and $a$ is the total number of observations in the dataset. These data are collected at a spatial resolution of 0.25° × 0.25° and a temporal resolution of daily to weekly intervals, depending on the satellite source. They are acquired from remote sensing platforms such as NASA’s Soil Moisture Active Passive (SMAP) and the European Space Agency’s Soil Moisture and Ocean Salinity (SMOS) missions, and they include both surface and subsurface soil moisture values, integrated with atmospheric corrections and enhanced using ensemble filtering methods. Equation (1) thus encapsulates the spatiotemporal structure of the remote sensing data stream, which enables accurate and scalable soil moisture prediction across extensive geographical areas without the need for dense ground-based sensor networks.
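The 80/20 division of the observations $Z_1, \ldots, Z_a$ can be sketched as follows. The source does not say whether the split was random or chronological, so the shuffled split with a fixed seed is an assumption; `split_80_20` is a hypothetical helper, and the array only mirrors the stated 1250 samples and 204 bands with synthetic random values.

```python
import numpy as np


def split_80_20(Z: np.ndarray, seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Shuffle the observations Z_1..Z_a and return (train, test) in an 80/20 ratio.

    Assumption: a random split. A chronological split would instead keep the
    first 80% of time-ordered rows for training.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(Z))        # random order of the a observations
    n_train = int(round(0.8 * len(Z)))   # 80% of a
    return Z[idx[:n_train]], Z[idx[n_train:]]


# Synthetic stand-in: a = 1250 observations with 204 spectral bands (random values).
Z = np.random.default_rng(1).random((1250, 204))
train, test = split_80_20(Z)
print(train.shape, test.shape)  # (1000, 204) (250, 204)
```

For weekly time series, a chronological split avoids training on observations that postdate the test period; which variant the study used is not stated.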

2.2. Preprocessing

Preprocessing of $Z$ aims to ensure more accurate and reliable soil moisture predictions from the remote sensing data.

2.2.1. Gap Filling

The AdaK-MCC method fills missing values in $Z$ by enhancing K-Means clustering (KMC) with Adaptive Momentum Coefficients (AMC) for better centroid initialization [38,39]. This approach lets the centroids explore the solution space more effectively, avoiding local minima and improving clustering accuracy. By grouping similar observations, the method produces more accurate and contextually relevant imputations. The centroid update combines a momentum term with the gradient of the loss function, resulting in dynamic and precise centroid movement, and is given as follows [38]:
$$C_k(t+1) = C_k(t) + \alpha L + \beta\left[C_k(t) - C_k(t-1)\right] \tag{2}$$
where $\alpha$ is the learning rate, $L$ is the gradient of the loss function, and $\beta$ is the momentum coefficient. The momentum term $C_k(t) - C_k(t-1)$ keeps the centroid moving dynamically, which enhances the clustering process and leads to more accurate gap filling. Equation (2) defines how centroids in the K-Means clustering algorithm are updated using Adaptive Momentum Coefficients, with the goal of avoiding local minima and improving centroid accuracy during gap filling.
First, $N_c$ clusters are selected from $Z$, and the centroids $\varsigma_j$ are initialized using AMC, which lets the centroids explore the solution space more widely and thereby reduces the risk of local minima. $\varsigma_j$ is initialized as
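$$\varsigma_j = \vartheta\,\varsigma_{j-1} + (1 - \vartheta)\,G \tag{3}$$
$$G = \frac{1}{a}\sum_{i=1}^{a}\sum_{j=1}^{\beta} h_{ij}\left(Z_i, \varsigma_j\right) \tag{4}$$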
where $\varsigma_j$ and $\varsigma_{j-1}$ are the current and previous centroids, $\vartheta$ is the momentum coefficient of the centroid, $h_{ij}$ is the distance between the $i$th input $Z_i$ and the $j$th centroid $\varsigma_j$, and $G$ is the gradient of the loss function [39]. Equations (3) and (4) thus initialize the centroids from their current and previous positions with momentum, a variant of gradient descent. Next, each data point is assigned to the nearest centroid based on distance:
$$\rho_i = \arg\min_{j} \left\lVert Z_i - \varsigma_j \right\rVert^2 \tag{5}$$
Here, $\rho_i$ denotes the cluster assigned to observation $Z_i$, and $\arg\min_j$ returns the index $j$ of the centroid at minimum distance. The centroids are then recalculated from the data points assigned to each cluster. The updated centroid $\varsigma_j$ is
$$\varsigma_j = \frac{1}{|\rho_i|}\sum_{Z_i \in \rho_i} Z_i \tag{6}$$
where $|\rho_i|$ is the number of data points in the cluster of centroid $\varsigma_j$; averaging these points yields the updated centroid. These steps are repeated until the maximum number of iterations $t_{max}$ is reached. The centroids $C_k$ used in AdaK-MCC clustering are derived directly from, and represent patterns in, actual soil moisture (SM) observations. Each centroid corresponds to the average feature set (soil moisture, soil temperature, spectral bands) of a cluster, effectively summarizing a typical SM condition under specific environmental settings. This ensures that the clustering process, and any imputations derived from it, are grounded in observed, physically meaningful soil moisture behavior rather than being mere mathematical abstractions. As such, the centroid update equations (Equations (2)–(6)) reflect both algorithmic logic and underlying hydrological realities. Equations (5) and (6) describe the assignment of data points to the nearest centroids and the centroid update step of K-means. Once the final clusters are formed, missing values are imputed from the values of similar points within the same cluster, yielding the gap-filled data $D_{gfill}$ [40].
The pseudocode for AdaK-MCC (Algorithm 1) is expressed below.
Algorithm 1. Pseudocode of AdaK-MCC
Input: Soil moisture data Z
Output: Gap-filled data D_gfill
Begin
  Initialize N_c (number of clusters), centroids ς_j, iteration counter t ← t_min, maximum iteration t_max
  While t < t_max
    Derive ς_j using Adaptive Momentum Coefficients
       ς_j = ϑ ς_{j−1} + (1 − ϑ) G
    Assign each Z_i to its nearest cluster ρ_i
       ρ_i = arg min_j ‖Z_i − ς_j‖²
    Update centroid ς_j
       ς_j = (1/|ρ_i|) Σ_{Z_i ∈ ρ_i} Z_i
    t ← t + 1
  End while
  Impute the missing values of each Z_i from the members of its final cluster to obtain D_gfill
  Return D_gfill
End
Although the source did not clearly state the spatial resolution of the raw dataset, the gap-filling method used in this study does not depend on spatial proximity. Instead, it operates on the similarity of spectral, temporal, and environmental features. The AdaK-MCC approach organizes data points through adaptive K-means clustering with momentum-driven centroid initialization, based on features such as soil moisture percentage, soil temperature, and spectral band responses. Missing values are then filled using values from the same feature-based clusters, so this clustering-driven imputation remains contextually relevant and precise even when spatial coordinates or resolution metadata are lacking. Additionally, the dataset comes from a 2017 soil moisture field study near Karlsruhe, and according to the metadata found in associated repositories (such as the hyperspectral soil moisture dataset on GitHub), its spatial resolution is around 0.25° × 0.25°, consistent with typical NASA-USDA SMAP and SMOS products.
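The clustering-then-imputation idea of Algorithm 1 can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the gradient term of Equation (2) is taken as the step toward each cluster mean (the negative gradient of the within-cluster squared error), random seeding replaces the momentum initialization of Equations (3) and (4), and `adak_mcc_impute` with its default `alpha`, `beta`, and `t_max` values is hypothetical.

```python
import numpy as np


def adak_mcc_impute(Z, n_clusters=3, alpha=0.5, beta=0.3, t_max=100, seed=0):
    """Momentum k-means on the rows of Z (NaN marks a gap), then cluster-based imputation.

    Returns (filled data, cluster label per row). alpha, beta and the random
    seeding are illustrative assumptions, not values from the study.
    """
    Z = np.asarray(Z, dtype=float)
    missing = np.isnan(Z)
    col_mean = np.nanmean(Z, axis=0)
    X = np.where(missing, col_mean, Z)  # provisional fill, used only for distances

    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), n_clusters, replace=False)].copy()  # C_k(t)
    C_prev = C.copy()                                            # C_k(t-1)

    def assign(centroids):
        # Eq. (5): index of the nearest centroid (squared Euclidean distance).
        return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(t_max):
        labels = assign(C)
        # Step toward each cluster mean: negative gradient of the within-cluster SSE.
        step = np.zeros_like(C)
        for k in range(n_clusters):
            if np.any(labels == k):
                step[k] = X[labels == k].mean(axis=0) - C[k]
        # Momentum update in the spirit of Eq. (2).
        C_prev, C = C, C + alpha * step + beta * (C - C_prev)
        if np.allclose(C, C_prev):
            break

    labels = assign(C)
    filled = Z.copy()
    for k in range(n_clusters):
        rows = labels == k
        observed = ~missing[rows]
        counts = observed.sum(axis=0)
        sums = np.where(observed, Z[rows], 0.0).sum(axis=0)
        # Per-cluster mean of observed values; fall back to the global column mean.
        fill = np.where(counts > 0, sums / np.maximum(counts, 1), col_mean)
        filled[rows] = np.where(missing[rows], fill, Z[rows])
    return filled, labels
```

With `alpha=1` and `beta=0` the update reduces to standard Lloyd k-means; the momentum term only changes how centroids travel between iterations, not what the imputation step does with the final clusters.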

2.2.2. Noise Reduction

Noise in the gap-filled data $D_{gfill}$ is reduced using the Savitzky–Golay filter (SGF). Remote sensing data are often affected by disturbances such as sensor faults or atmospheric conditions, which create variations that hinder precise estimation of soil moisture. A significant advantage of the SGF is that it smooths data while preserving critical features, such as sharp transitions in soil moisture patterns; this preservation is vital for analyzing remote sensing spectral data, because these features are crucial for understanding soil moisture. To protect important moisture pattern transitions while reducing noise, the filter parameters (window size and polynomial order) were carefully tuned. Comparison runs that trained on raw and on smoothed data showed only a minor decrease in signal quality or prediction accuracy, supporting this choice, and field measurements confirmed that smoothing did not significantly alter real-world variability. Together, these checks indicate that the preprocessing pipeline improved data quality without sacrificing the essential characteristics of the variable under study.
The noise-reduced data $N_{nred}$ are determined as
$$N_{nred}(v) = \sum_{u=(1-d)/2}^{(d-1)/2} J_u\, D_{gfill}(v+u) \tag{7}$$
where $J_u$ is the filter coefficient for index $u$, $d$ is the (odd) length of the moving window, $u$ is the relative position within the moving window used by the Savitzky–Golay filter, and $D_{gfill}(v+u)$ is the input at index $v+u$ within the window. Equation (7) describes the noise reduction process, in which the Savitzky–Golay filter smooths the data to reduce noise caused by atmospheric interference or sensor malfunctions [41].
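The coefficients $J_u$ in Equation (7) are the weights of a local least-squares polynomial fit evaluated at the window centre. The NumPy sketch below computes them and applies Equation (7) at interior indices; the window length 7 and polynomial order 2 are assumptions (the study does not report its tuned values), and the helper names are hypothetical. In practice, `scipy.signal.savgol_filter(x, window_length, polyorder)` performs the same smoothing and also handles the edges.

```python
import numpy as np


def savgol_coefficients(d: int, order: int) -> np.ndarray:
    """Return J_u for u = -(d-1)/2 .. (d-1)/2 (least-squares polynomial smoothing weights)."""
    if d % 2 == 0 or order >= d:
        raise ValueError("window length d must be odd and larger than the polynomial order")
    half = (d - 1) // 2
    u = np.arange(-half, half + 1)
    A = np.vander(u, order + 1, increasing=True)  # columns: u^0, u^1, ..., u^order
    # Row 0 of the pseudo-inverse gives the fitted polynomial's value at u = 0.
    return np.linalg.pinv(A)[0]


def savgol_smooth(D_gfill: np.ndarray, d: int = 7, order: int = 2) -> np.ndarray:
    """Apply Eq. (7) at every interior index v; edge samples are left unchanged."""
    J = savgol_coefficients(d, order)
    half = (d - 1) // 2
    out = D_gfill.astype(float).copy()
    for v in range(half, len(D_gfill) - half):
        out[v] = J @ D_gfill[v - half : v + half + 1]  # sum_u J_u * D_gfill(v + u)
    return out
```

For `d=5` and `order=2` this reproduces the classic weights (−3, 12, 17, 12, −3)/35, and any signal that is locally a polynomial of degree `order` passes through unchanged, which is why sharp but smooth transitions survive the filtering.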

2.2.3. Atmospheric Correction

Atmospheric noise remaining in the noise-reduced data $N_{nred}$ is then corrected to eliminate residual distortions caused by atmospheric interactions. These distortions may result from scattering, absorption by atmospheric gases such as CO₂, O₃, and H₂O vapor, and aerosol interference, all of which can significantly affect the spectral characteristics used in soil moisture estimation. Because these effects introduce inconsistencies between measured and true surface reflectance values, a correction step is needed to preserve data integrity. To address this, we employed an atmospheric correction process based on radiative transfer modeling integrated with satellite-derived atmospheric profiles (e.g., from the SMAP and SMOS missions). The correction is applied through a noise correction function $f$, which calibrates the reflectance values by compensating for attenuation and distortion along the atmospheric path. The implementation leverages established atmospheric correction tools such as the ENVI FLAASH module or GDAL-based libraries, which adjust for surface–atmosphere interactions and restore spectral accuracy. The corrected data, denoted as $C_{atno}$, are expressed as
$$C_{atno} = f\left(N_{nred}\right) \tag{8}$$
where $f$ is the atmospheric noise correction function applied to the noise-reduced data. This correction is crucial for maintaining the spectral integrity of remote sensing inputs and ensuring that the derived moisture levels accurately represent surface conditions. Equation (8) adjusts the soil moisture data by compensating for atmospheric influences such as scattering and absorption, particularly in the near-infrared and microwave spectral bands. By aligning observed spectral values with their ground-level equivalents, this step ensures that the measurements reflect true soil moisture conditions, improving prediction accuracy and robustness [42]. The final output, denoted by $R$, is the fully preprocessed dataset, corrected for both sensor noise and atmospheric effects.
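A radiative-transfer correction such as FLAASH cannot be reproduced in a few lines, so the sketch below uses a deliberately simpler, swapped-in technique to show the shape of $f$ in Equation (8): dark-object subtraction (DOS), which estimates an additive haze offset per band from the darkest pixels and removes it. DOS ignores absorption and multiplicative attenuation; the function name and the default percentile are hypothetical.

```python
import numpy as np


def dark_object_subtraction(N_nred: np.ndarray, percentile: float = 1.0) -> np.ndarray:
    """Hypothetical stand-in for f in Eq. (8): subtract a per-band haze offset.

    N_nred has shape (samples, bands). The haze offset of each band is estimated as a
    low percentile of that band (the 'dark object'); corrected values are clipped at 0.
    """
    haze = np.percentile(N_nred, percentile, axis=0)  # one additive offset per band
    return np.clip(N_nred - haze, 0.0, None)
```

Because DOS only removes path radiance that adds a constant to every pixel, it is a useful sanity check on the direction of the correction but not a substitute for the radiative-transfer approach described above.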

2.3. Season Mapping

Seasonal mapping classifies time-series data into preset seasons, usually spring, summer, autumn, and winter, according to the day, month, and year of each data entry. In the context of soil moisture prediction, it makes it possible to identify the temporal patterns and climatic fluctuations that affect moisture dynamics. By matching observations to the appropriate season, the model can identify cyclical patterns more precisely, increase its climate sensitivity, and improve its forecast accuracy across a range of environmental conditions. After preprocessing, climatic seasons are mapped from the date, month, and year to distinguish soil moisture patterns influenced by climatic variations. The four seasons are spring ($S_{spr}$), summer ($S_{sum}$), autumn ($S_{aut}$), and winter ($S_{win}$). The mapped seasons $\Phi_{sea}$ are expressed as follows:
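$$\Phi_{sea} = \begin{cases} S_{spr}, & \text{if } m(\delta, \gamma) \in \{3, 4, 5\} \\ S_{sum}, & \text{if } m(\delta, \gamma) \in \{6, 7, 8\} \\ S_{aut}, & \text{if } m(\delta, \gamma) \in \{9, 10, 11\} \\ S_{win}, & \text{if } m(\delta, \gamma) \in \{12, 1, 2\} \end{cases} \tag{9}$$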
where $m(\delta, \gamma)$ returns the month of an observation in the dataset $R$ from its date $\delta$ and year $\gamma$, and the numbers $\{1, 2, \ldots, 12\}$ correspond to the months from January to December. The resulting month is used to assign each observation a seasonal label (spring, summer, autumn, or winter). Equation (9) thus categorizes the remote sensing data into four seasons, which clarifies how soil moisture levels vary across seasons and how climatic changes affect them [43].
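Equation (9) reduces to a month lookup. The plain-Python sketch below assumes that $m(\delta, \gamma)$ is simply the calendar month of the observation date; the label strings and the function name are illustrative.

```python
from datetime import date

# Month-to-season lookup from Eq. (9) (meteorological seasons, Northern Hemisphere).
SEASON_BY_MONTH = {
    **dict.fromkeys((3, 4, 5), "spring"),
    **dict.fromkeys((6, 7, 8), "summer"),
    **dict.fromkeys((9, 10, 11), "autumn"),
    **dict.fromkeys((12, 1, 2), "winter"),
}


def map_season(day: date) -> str:
    """Return the season label Phi_sea for one observation, using m(delta, gamma) = month."""
    return SEASON_BY_MONTH[day.month]


print([map_season(date(2017, m, 15)) for m in (4, 7, 9)])  # ['spring', 'summer', 'autumn']
```

Applied to an April to September campaign, this mapping yields only spring, summer, and autumn labels.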

2.4. Seasonal Clustering

Seasonal mapping and seasonal grouping are two distinct yet complementary preprocessing steps. Seasonal mapping assigns each observation to a specific season (spring, summer, autumn, or winter) based on calendar dates, utilizing a timestamp-based classification system as defined in Equation (9). However, it does not capture intra-seasonal variability. To address this, a seasonal grouping approach is applied as a data-driven clustering method that identifies sub-seasonal patterns or anomalies by grouping observations with similar environmental or spectral characteristics. For example, early-summer data influenced by spring precipitation may differ from late-summer drought conditions, although both are labelled as “summer” in mapping. This dual approach enhances the model’s ability to detect both broad seasonal shifts and finer variations, thereby improving prediction accuracy. Seasonal grouping is performed using the AdaK-MCC algorithm, which forms robust clusters of data with similar soil moisture characteristics within the same season. This helps identify sub-seasonal trends and anomalies, with the resulting clusters denoted as $G_{season}$.
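The difference between mapping and grouping can be made concrete: grouping runs a clustering step separately inside each mapped season. In the sketch below, plain Lloyd k-means is a stand-in for AdaK-MCC (no momentum), and the helper names and the default of two clusters per season are assumptions.

```python
import numpy as np


def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd k-means (stand-in for AdaK-MCC); returns one label per row."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None] - C[None]) ** 2).sum(axis=2).argmin(axis=1)
        C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels


def seasonal_grouping(X, seasons, k=2):
    """Cluster observations separately within each mapped season.

    Returns {(season, cluster_id): row indices}, i.e. the sub-seasonal groups G_season.
    """
    groups = {}
    for s in np.unique(seasons):
        idx = np.flatnonzero(seasons == s)
        labels = kmeans(X[idx], min(k, len(idx)))
        for j in np.unique(labels):
            groups[(str(s), int(j))] = idx[labels == j]
    return groups
```

For example, wet early-summer rows and dry late-summer rows share the "summer" label from Equation (9) but end up in different groups here, which is the intra-seasonal distinction the grouping step is meant to capture.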

2.5. Varying Pattern Analysis

The variability patterns in the grouped seasons $G_{season}$ are then analyzed using FWFCSD, since soil moisture data exhibit seasonal patterns affected by climatic variations. Fourier Series Decomposition (FSD) breaks the data down into frequency components, enabling the analysis of both short-term fluctuations and long-term trends. However, FSD assumes data continuity, and abrupt changes can distort the Fourier coefficients. To address this, Frequency Weighted Fourier Coefficients (FWFC) adjust the importance of each coefficient according to its frequency, enabling a more flexible analysis that adapts to changing soil moisture patterns over time [41,42]. First, the time series data in $G_{season}$ are decomposed into frequency components to analyze periodic patterns as follows:
$$\chi(t) = \frac{\eta_0}{2} + \sum_{p=1}^{\infty}\left[\eta_p \cos(pt) + \eta'_p \sin(pt)\right] \tag{10}$$
where $\chi(t)$ represents the frequency components in $G_{season}$ at time $t$, $\eta_0/2$ is the average soil moisture value, $\eta_p$ and $\eta'_p$ are the Fourier coefficients of the cosine and sine terms, and $p$ is the integer harmonic index denoting the $p$-th sinusoidal component. Although dimensionless, $p$ corresponds to multiples of the base frequency derived from the time series in $G_{season}$, so it identifies seasonal components by their frequency structure rather than acting as a direct index into $G_{season}$. In this context, FWFC applies a frequency-dependent weighting function $\omega_F$ to the Fourier coefficients to allow flexible analysis of varying conditions:
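$$\eta_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} \chi(t)\,dt \tag{11}$$
$$\eta_p = \omega_F\,\frac{1}{\pi}\int_{-\pi}^{\pi} \chi(t)\cos(pt)\,dt \tag{12}$$
$$\eta'_p = \omega_F\,\frac{1}{\pi}\int_{-\pi}^{\pi} \chi(t)\sin(pt)\,dt \tag{13}$$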
Here, the factor $1/\pi$ normalizes the integrals over one period $[-\pi, \pi]$. Applying these coefficients yields the analyzed variability pattern data $\Psi_{vapa}$, which represents the varying soil moisture patterns. Equations (10)–(13) represent the decomposition of the soil moisture data into frequency components using a Fourier series [44], with the Frequency Weighted Fourier Coefficients (FWFC) emphasizing selected frequencies so that the analysis adapts to changing conditions.
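Equations (11)–(13) can be evaluated numerically once a series is resampled onto a uniform grid covering one full period mapped to $[-\pi, \pi)$; on such a grid the rectangle rule is exact for low-order harmonics. The sketch assumes that resampling has already been done, and the `weight(p)` callable is a hypothetical placeholder for $\omega_F$, whose exact form the text does not give.

```python
import numpy as np


def weighted_fourier_coefficients(chi, n_harmonics=3, weight=lambda p: 1.0):
    """Discretized Eqs. (11)-(13) for a series sampled uniformly on [-pi, pi).

    `weight(p)` stands in for omega_F; its form is a hypothetical placeholder.
    Returns eta_0 and arrays eta_p, eta'_p for p = 1..n_harmonics.
    """
    chi = np.asarray(chi, dtype=float)
    n = len(chi)
    t = -np.pi + 2 * np.pi * np.arange(n) / n  # sample times covering one period
    dt = 2 * np.pi / n                         # rectangle-rule step for the integrals
    p = np.arange(1, n_harmonics + 1)
    w = np.array([weight(k) for k in p], dtype=float)
    eta0 = chi.sum() * dt / np.pi
    eta = w * (chi * np.cos(np.outer(p, t))).sum(axis=1) * dt / np.pi
    eta_prime = w * (chi * np.sin(np.outer(p, t))).sum(axis=1) * dt / np.pi
    return eta0, eta, eta_prime
```

For $\chi(t) = 3 + 2\cos t + 0.5\sin 2t$ with unit weights, this returns $\eta_0 = 6$ (so $\eta_0/2 = 3$ is the mean), $\eta_1 = 2$, and $\eta'_2 = 0.5$; a weight such as $1/p$ would shrink the second harmonic to 0.25 and thereby emphasize the slower seasonal cycle.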

2.6. Multivariate Correlation Analysis

In this phase, the multivariate correlations within the grouped seasons $G_{season}$ are analyzed using principal component analysis (PCA). PCA finds patterns and relationships among variables by transforming correlated variables into uncorrelated principal components, thereby simplifying the underlying data structure. The PCA steps are explained below.
Step 1: The data $G_{season}$ are first standardized so that all variables are on the same scale. The standardized data $A$ are given by
$$A = \frac{G_{season} - \mu}{\sigma} \tag{14}$$
where $\mu$ and $\sigma$ are the mean and standard deviation of each variable in $G_{season}$.
Step 2: Next, the covariance matrix is computed to identify how the soil moisture variables vary with each other. The construction of the covariance matrix is formulated as follows:
$$B = \frac{1}{g} A^{\mathsf{T}} A \tag{15}$$
where $g$ is the total number of samples in $G_{season}$, $A$ is arranged with one sample per row, and $^{\mathsf{T}}$ denotes the matrix transpose.
Step 3: The eigenvalues $E_{value}$ and eigenvectors $E_{vector}$ of $B$ are then computed to find the principal components that account for the greatest variance in the data:
$$B\,E_{vector} = E_{value}\,E_{vector} \tag{16}$$
The principal components $\chi$ are computed from the eigenvectors $E_{vector}$ corresponding to the largest eigenvalues $E_{value}$:
$$\chi = A\,E_{vector} \tag{17}$$
From the obtained principal components $\chi$, the multivariate correlation analyzed data $\Phi_{muco}$ are obtained. Equations (14)–(17) show how PCA is applied to analyze multivariate correlations, identifying underlying patterns in the soil moisture data by transforming correlated variables into uncorrelated principal components [45]. PCA was applied to the seasonally grouped data to investigate the interrelationships among variables that influence soil moisture. The variables analyzed included soil moisture percentage, soil temperature, spectral bands, and datetime attributes (month and year), along with derived indices such as NDVI, SMI, wavelet coefficients, and texture features (contrast, entropy, homogeneity). Atmospheric effects were addressed during preprocessing through corrections for cloud cover, haze, scattering, absorption, and sensor noise. Additionally, the study suggests integrating land cover and land use (LULC) variables, obtained from sources such as MODIS or CORINE, into the PCA framework to better capture spatial diversity and human impact, which would enhance the model’s realism and prediction accuracy across terrains. PCA reduced dimensionality while preserving the key climate-responsive features needed for accurate modeling.
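Steps 1 to 3 map directly onto a few NumPy calls. The sketch below assumes $G_{season}$ is a samples-by-variables array with no constant columns (a zero $\sigma$ would divide by zero); the function name and the `n_components` default are illustrative.

```python
import numpy as np


def pca(G_season: np.ndarray, n_components: int = 2):
    """Eqs. (14)-(17): standardize, build the covariance matrix, eigendecompose, project."""
    mu = G_season.mean(axis=0)
    sigma = G_season.std(axis=0)
    A = (G_season - mu) / sigma                # Eq. (14), one sample per row
    g = A.shape[0]
    B = A.T @ A / g                            # Eq. (15), variables x variables
    E_value, E_vector = np.linalg.eigh(B)      # Eq. (16); eigh returns ascending eigenvalues
    order = np.argsort(E_value)[::-1][:n_components]
    chi = A @ E_vector[:, order]               # Eq. (17): principal component scores
    return chi, E_value[order]
```

Because $B$ is symmetric, `np.linalg.eigh` is the appropriate solver; the returned component scores are mutually uncorrelated, and their variances equal the selected eigenvalues.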

2.7. Feature Extraction

Here, features are extracted from both the varying-pattern data $\Psi_{vapa}$ and the multivariate-correlation-analyzed data $\Phi_{muco}$. Extracting these features improves the model's performance by simplifying complex data and improving soil moisture predictions. The extracted features $Y$ are represented as
$$Y = \{Y_1, Y_2, Y_3, \ldots, Y_b\}$$
Here, $b$ is the total number of features extracted from $\Psi_{vapa}$ and $\Phi_{muco}$. Equation (18) defines the feature extraction process, in which characteristics such as NDVI, SMI, and wavelet coefficients are extracted from the data to improve prediction accuracy. From $\Psi_{vapa}$, the following features are extracted: mean reflectance, standard deviation, peak positions, the Normalized Difference Vegetation Index (NDVI), the Soil Moisture Index (SMI), ratio indices, contrast, homogeneity, entropy, curvature metrics, inflection points, wavelet coefficients, energy of coefficients, reflectance ratios, and derivative spectra. From $\Phi_{muco}$, the following features are extracted: pairwise correlation coefficients, correlation heatmaps, clustered correlation groups, representative bands, canonical variates, canonical correlation coefficients, z weights, standardized coefficients, higher-order terms, and unique contribution scores.
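Several of the spectral features listed above can be computed per spectrum, as in the following sketch: mean reflectance, standard deviation, peak position, NDVI, and a first-derivative spectrum. It uses the standard NDVI definition $(NIR - Red)/(NIR + Red)$. The band indices `red_idx` and `nir_idx` are hypothetical, since they depend on the sensor's band layout. SMI is omitted because its definition varies between studies.

```python
import math


def spectral_features(reflectance, red_idx, nir_idx):
    """Compute simple per-spectrum features.

    reflectance: reflectance values ordered by wavelength.
    red_idx, nir_idx: positions of the red and near-infrared bands (hypothetical).
    """
    n = len(reflectance)
    mean = sum(reflectance) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in reflectance) / n)
    # Peak position: index of the band with the highest reflectance
    peak = max(range(n), key=lambda i: reflectance[i])
    red, nir = reflectance[red_idx], reflectance[nir_idx]
    ndvi = (nir - red) / (nir + red) if nir + red != 0 else 0.0
    # First-derivative spectrum as differences between adjacent bands
    derivative = [reflectance[i + 1] - reflectance[i] for i in range(n - 1)]
    return {"mean": mean, "std": std, "peak_position": peak,
            "ndvi": ndvi, "derivative": derivative}


features = spectral_features([0.1, 0.2, 0.5, 0.4], red_idx=0, nir_idx=2)
print(features["ndvi"])
```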

2.8. Feature Selection

The Hierarchical Correlated Spider Wasp Optimizer (HCSWO) was chosen to select features from the extracted set $Y$ because it handles highly correlated features and reduces redundancy effectively. Spectral bands are treated as continuous variables representing reflectance intensity at specific wavelengths, not as categorical indices. The actual input features derived from these bands include statistical summaries (e.g., mean, standard deviation), spectral indices (e.g., NDVI, SMI), and transformations (e.g., wavelet coefficients, derivative spectra). These derived features vary meaningfully across space and time and are selected or rejected during HCSWO optimization. Likewise, datetime is not used directly as a raw feature. Instead, temporal attributes such as month, season, or day of the year are extracted and encoded (e.g., with one-hot or cyclical encoding). These temporal variables capture periodic patterns in soil moisture and are considered alongside environmental and spectral features during feature selection. The features passed to HCSWO are therefore preprocessed, derived forms of spectral and temporal information. This makes them informative predictors rather than raw indices or constant categories. Traditional optimizers such as the Spider Wasp Optimizer (SWO) often struggle with correlated features, which leads to redundant selections and reduced model performance. HCSWO overcomes this by using a Hierarchical Correlation Matrix (HCM) to cluster correlated features and retain only the most representative ones, which improves predictive accuracy and reduces redundancy [19,20,21,22,46].
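Cyclical encoding is one of the options mentioned above. It maps a periodic attribute such as month or day of the year onto a circle, so that the end of one cycle lies next to the start of the next. The following is a minimal sketch that assumes a known period: 12 for months, or about 365.25 for day of the year. The helper name `cyclical_encode` is hypothetical.

```python
import math


def cyclical_encode(value, period):
    """Map a periodic value to (sin, cos) coordinates on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


dec = cyclical_encode(12, 12)  # December, if months are numbered 1-12
jan = cyclical_encode(1, 12)
jul = cyclical_encode(7, 12)
# December ends up closer to January than to July
print(math.dist(dec, jan) < math.dist(dec, jul))
```

The sine and cosine pair is used instead of the sine alone because a single sine value maps two different months to the same number.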
HCSWO proved more efficient in both feature selection time and overall prediction quality, which makes it well suited to soil moisture modeling. In parallel, AdaK-MCC clustering with Adaptive Momentum Coefficient (AMC) initialization was employed to group similar seasonal data accurately while avoiding poor local minima. Its high silhouette score of 0.975 indicates compact, well-separated clusters. The AGRL-RBFN model uses adaptive group lasso regularization to minimize overfitting while preserving critical feature interactions. Cross-validation was used to fine-tune the regularization strength, so that the model remains adaptable to seasonal and climatic variations. Together, these strategies yielded strong generalization and high predictive performance, with an accuracy of 98.09% and an F1-score of 98.95%. The steps of HCSWO are defined below.
Hierarchical Correlation Matrix
The correlation between each pair of extracted features in $Y$ is computed in order to cluster correlated features. The correlation coefficient $cor$ is defined as
$$cor = \frac{\mathrm{Cov}(Y_b, Y_{b+1})}{\sigma_{Y_b}\,\sigma_{Y_{b+1}}}$$
Here, $\mathrm{Cov}(Y_b, Y_{b+1})$ is the covariance between features $Y_b$ and $Y_{b+1}$ of $Y$, and $\sigma_{Y_b}$ and $\sigma_{Y_{b+1}}$ are their standard deviations. To group the correlated features, hierarchical clustering is then performed on the correlation distance
$$\Delta_{hier} = 1 - cor$$
where $\Delta_{hier}$ is the output of the HCM from which the clusters of correlated features are formed. The representative features of these clusters are denoted $Y_{rep}$. Equations (19) and (20) define how the HCSWO algorithm selects features from the extracted data. Hierarchical clustering groups correlated features so that only the most relevant ones are used in the prediction model [46].
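The HCM step can be sketched as follows. Two choices here are assumptions rather than details given in the text. First, the hierarchical clustering is single-linkage, cut at a hypothetical distance `threshold` on $\Delta_{hier} = 1 - cor$. Single-linkage clusters at a given cut height are exactly the connected components of the graph that links features whose distance is at most that height. Second, the representative of each cluster is taken to be the member with the highest mean absolute correlation to the rest of its cluster.

```python
import numpy as np


def correlated_feature_clusters(Y: np.ndarray, threshold: float = 0.2):
    """Cluster correlated columns of Y and return (clusters, representatives)."""
    cor = np.corrcoef(Y, rowvar=False)  # feature-by-feature correlation matrix
    dist = 1.0 - cor                    # Delta_hier
    b = Y.shape[1]
    parent = list(range(b))             # union-find forest over features

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Single linkage: merge any two features within the distance threshold
    for i in range(b):
        for j in range(i + 1, b):
            if dist[i, j] <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(b):
        clusters.setdefault(find(i), []).append(i)

    # Representative: member most correlated, on average, with its cluster
    reps = []
    for members in clusters.values():
        sub = np.abs(cor[np.ix_(members, members)])
        reps.append(members[int(np.argmax(sub.mean(axis=1)))])
    return list(clusters.values()), sorted(reps)


rng = np.random.default_rng(1)
base = rng.normal(size=500)
Y = np.column_stack([base, base + rng.normal(scale=0.05, size=500),
                     rng.normal(size=500)])
groups, Y_rep = correlated_feature_clusters(Y)
print(groups, Y_rep)
```

Because the distance is $1 - cor$ as written, strongly anti-correlated features end up far apart and are not merged.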
Initialization
Initially, the population of spider wasps $\tilde{\Lambda}$ is initialized within the search space defined by $Y_{rep}$ as follows:
$$\tilde{\Lambda}_{m\times l} = \begin{bmatrix} Y_{rep}(1,1) & Y_{rep}(1,2) & \cdots & Y_{rep}(1,l) \\ Y_{rep}(2,1) & Y_{rep}(2,2) & \cdots & Y_{rep}(2,l) \\ \vdots & \vdots & \ddots & \vdots \\ Y_{rep}(m,1) & Y_{rep}(m,2) & \cdots & Y_{rep}(m,l) \end{bmatrix}_{m\times l}$$
$$\tilde{\Lambda}_x = \left(ub_x - lb_x\right)\xi + lb_x$$
Here, $m$ is the number of (female) spider wasps and $l$ is the dimension of the search space. The random position $\tilde{\Lambda}_x$ of the $x$th spider wasp is generated from the random variable $\xi$ and the upper and lower bounds $ub_x$ and $lb_x$. The variable $\xi$ is drawn from the uniform distribution, i.e., $\xi \sim U(0, 1)$. Every value in the normalized range is therefore equally likely, which gives an unbiased initialization across the entire search space. This approach is common in population-based metaheuristic algorithms because it preserves solution diversity and enhances exploration. The uniform distribution places each spider wasp randomly and independently between $lb_x$ and $ub_x$. This avoids clustering at initialization and improves the global search capability of the HCSWO algorithm.
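The uniform initialization can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code. It draws an independent $\xi \sim U(0,1)$ for every wasp and dimension, and the function name is hypothetical.

```python
import random


def init_population(m, lb, ub, rng=random):
    """Place m spider wasps uniformly at random between per-dimension bounds.

    Each coordinate is (ub - lb) * xi + lb with xi ~ U(0, 1).
    """
    return [[(hi - lo) * rng.random() + lo for lo, hi in zip(lb, ub)]
            for _ in range(m)]


population = init_population(10, lb=[0.0] * 4, ub=[1.0] * 4, rng=random.Random(42))
print(len(population), len(population[0]))
```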
Fitness Evaluation
Next, the fitness value $\Theta_{fit}$, which determines the optimal features based on the classification accuracy $\bar{\omega}$, is defined as
$$\Theta_{fit} = \max \bar{\omega}$$
Using $\Theta_{fit}$, the best feature subset (the prey) is selected from the population and compared during the mating stage.
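A common way to evaluate such a fitness function in wrapper-based feature selection has two steps. First, each continuous position is thresholded into a binary feature mask. Second, the mask is scored with a classifier. The sketch below assumes a 0.5 threshold and a user-supplied `accuracy_fn`. Both are hypothetical, since the text specifies only that $\Theta_{fit}$ maximizes classification accuracy.

```python
def to_mask(position, threshold=0.5):
    """A feature is selected when its coordinate exceeds the threshold."""
    return [p > threshold for p in position]


def best_by_fitness(population, accuracy_fn):
    """Return the position with the highest accuracy, and that accuracy."""
    best_pos, best_acc = None, float("-inf")
    for pos in population:
        acc = accuracy_fn(to_mask(pos))
        if acc > best_acc:
            best_pos, best_acc = pos, acc
    return best_pos, best_acc


IDEAL = [True, False, True]  # toy target mask, for illustration only


def toy_acc(mask):
    """Toy accuracy: fraction of positions agreeing with IDEAL."""
    return sum(a == b for a, b in zip(mask, IDEAL)) / len(IDEAL)


print(best_by_fitness([[0.9, 0.1, 0.8], [0.1, 0.9, 0.1]], toy_acc))
```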
Searching Phase
In this exploration phase, the female wasp searches for spiders to feed its young. The updated position $\tilde{\Lambda}_{x+1}$ of the female wasp in the searching phase is calculated as
$$\tilde{\Lambda}_{x+1} = \begin{cases} \tilde{\Lambda}_x + \ddot{\mu}_1\left(\tilde{\Lambda}_x^{v} - \tilde{\Lambda}_x^{w}\right), & \text{if } q_3 < q_4 \\ \tilde{\Lambda}_x^{w'} + \ddot{\mu}_2\left(lb_x + q_2\left(ub_x - lb_x\right)\right), & \text{otherwise} \end{cases}$$
where $\ddot{\mu}_1$ and $\ddot{\mu}_2$ are the motion constants, $q_1$, $q_2$, $q_3$, and $q_4$ are random numbers, and $v$, $w$, and $w'$ are indices selected at random from the population.
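The searching-phase update can be sketched as below for a population stored as lists of coordinates. The motion constants `mu1` and `mu2` are passed in because the text does not specify how they are set. Clipping the new position to the bounds is an added assumption.

```python
import random


def search_update(pop, x, lb, ub, mu1, mu2, rng=random):
    """Searching-phase position update for wasp x."""
    v, w, w2 = rng.sample(range(len(pop)), 3)  # three distinct random indices
    q2, q3, q4 = rng.random(), rng.random(), rng.random()
    if q3 < q4:
        # Move along the difference between two randomly chosen wasps
        new = [c + mu1 * (a - b) for c, a, b in zip(pop[x], pop[v], pop[w])]
    else:
        # Jump relative to a third wasp using a random point within the bounds
        new = [a + mu2 * (lo + q2 * (hi - lo))
               for a, lo, hi in zip(pop[w2], lb, ub)]
    # Clip back into the search space
    return [min(max(val, lo), hi) for val, lo, hi in zip(new, lb, ub)]


rng = random.Random(0)
pop = [[rng.random() for _ in range(3)] for _ in range(5)]
print(search_update(pop, 0, [0.0] * 3, [1.0] * 3, mu1=0.5, mu2=0.1, rng=rng))
```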
Following and Escaping Stage
This is the exploration and exploitation phase; here, after the prey (a spider) is found, the wasp begins chasing it. The updated position is expressed as
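$$\tilde{\Lambda}_{x+1} = \begin{cases} \tilde{\Lambda}_x + \hat{C}\left|2q_5\tilde{\Lambda}_x^{v} - \tilde{\Lambda}_x\right|, & \text{if } q_3 < q_4 \\ \tilde{\Lambda}_x \cdot \boldsymbol{\lambda}, & \text{otherwise} \end{cases}$$
The transition between the searching phase and the following-and-escaping phase is given by
$$\tilde{\Lambda}_{x+1} = \begin{cases} \text{searching-phase update}, & \text{if } q_6 < \kappa \\ \text{following-and-escaping update}, & \text{otherwise} \end{cases}$$
Here, $\kappa$ is a control parameter, $q_5$ and $q_6$ are random numbers, $\hat{C}$ is the distance-controlling factor, and $\boldsymbol{\lambda}$ is a vector.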
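The following-and-escaping update and the phase switch can be sketched in the same style. Here the vector $\boldsymbol{\lambda}$ is drawn from a standard normal distribution, and $\hat{C}$ and $\kappa$ are passed in as numbers. These are assumptions, since the text only names these quantities.

```python
import random


def follow_escape_update(pop, x, c_hat, rng=random):
    """Following-and-escaping position update for wasp x."""
    v = rng.randrange(len(pop))
    q3, q4, q5 = rng.random(), rng.random(), rng.random()
    cur = pop[x]
    if q3 < q4:
        # Chase: move by a distance proportional to the gap to a random wasp
        return [c + c_hat * abs(2 * q5 * a - c) for c, a in zip(cur, pop[v])]
    # Escape: scale the position elementwise by a random vector lambda
    return [c * rng.gauss(0.0, 1.0) for c in cur]


def choose_phase(kappa, rng=random):
    """Searching update if q6 < kappa, otherwise following-and-escaping."""
    return "search" if rng.random() < kappa else "follow_escape"


print(choose_phase(0.7, random.Random(1)))
```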
Hunting and Nesting Behavior
In this stage, the paralyzed spider is pulled into the pre-prepared nest. The final updated position in the hunting and nesting behavior stage is expressed as follows:
$$\tilde{\Lambda}_{x+1} = \begin{cases} \tilde{\Lambda}_x^{*} + \cos\left(2\pi\dot{l}\right)\left|\tilde{\Lambda}_x^{*} - \tilde{\Lambda}_x\right|, & \text{if } q_3 < q_4 \\ \tilde{\Lambda}_x^{v} + q_3\left|\gamma\right|\left(\tilde{\Lambda}_x^{v} - \tilde{\Lambda}_x\right) + \left(1 - q_3\right)R\left(\tilde{\Lambda}_x^{w} - \tilde{\Lambda}_x^{w'}\right), & \text{otherwise} \end{cases}$$
where $\tilde{\Lambda}_x^{*}$ is the best solution found so far, $\pi$ is a constant, $\gamma$ is a number generated by a Levy flight, $\dot{l}$ is the localization factor, and $R$ is a binary vector. Levy flights in the HCSWO algorithm simulate natural foraging behavior by combining short local steps with occasional long jumps. This enhances global exploration, enabling the optimizer to escape local optima and improving convergence. Integrated into the hunting phase, Levy flights balance exploration and exploitation, which makes feature selection for soil moisture prediction more efficient. Equations (21)–(27) describe the feature selection process of the Hierarchical Correlated Spider Wasp Optimizer. This process handles the clustering of correlated features, which is critical for the model’s predictive accuracy.
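The Levy-flight step and the hunting-and-nesting update can be sketched as follows. The text does not say how the Levy number $\gamma$ is generated. Mantegna's algorithm with exponent $\beta = 1.5$ is assumed here because it is a common choice in metaheuristics. $R$ is drawn as a random 0/1 vector.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """One Levy-distributed step via Mantegna's algorithm."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def hunt_nest_update(pop, x, best, l_dot, rng=random):
    """Hunting-and-nesting position update for wasp x given the best solution."""
    v, w, w2 = rng.sample(range(len(pop)), 3)
    q3, q4 = rng.random(), rng.random()
    cur = pop[x]
    if q3 < q4:
        # Nesting move around the best solution found so far
        return [b + math.cos(2 * math.pi * l_dot) * abs(b - c)
                for b, c in zip(best, cur)]
    gamma = abs(levy_step(rng=rng))
    R = [rng.randint(0, 1) for _ in cur]  # binary vector
    return [a + q3 * gamma * (a - c) + (1 - q3) * r * (p - s)
            for a, c, r, p, s in zip(pop[v], cur, R, pop[w], pop[w2])]


rng = random.Random(7)
pop = [[rng.random() for _ in range(3)] for _ in range(5)]
print(hunt_nest_update(pop, 0, best=pop[1], l_dot=0.3, rng=rng))
```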
Mating Behavior
The mating stage of HCSWO generates offspring through crossover between selected male and female wasps. It replaces less fit individuals according to their fitness values ($\Theta_{fit}$) and iteratively refines the population to identify the optimal feature subset ($\Xi_{sel}$). HCSWO’s core innovation is the integration of hierarchical clustering with the Spider Wasp Optimizer to manage high-dimensional, highly correlated remote sensing features. Grouping related variables through a hierarchical correlation matrix reduces redundancy, narrows the search space, and accelerates convergence without sacrificing predictive accuracy. Empirically, HCSWO outperforms comparable metaheuristics in both speed and precision. It completes feature selection in 2118 ms versus 2211 ms for SWO, 5178 ms for ACO, and 7135 ms for GWO. It also achieves a prediction accuracy of 98.09%, compared with 96.27% for RBFN, 95.23% for FFNN, 97.24% for RNN, and 95.23% for ANN. The efficacy of metaheuristic feature selection in soil moisture forecasting is well documented. GWO and ACO have been successfully applied to environmental prediction tasks [19,21], and other studies validate the use of such algorithms to enhance machine learning model performance and computational efficiency [22,23]. These precedents support HCSWO as a theoretically sound and practically effective optimizer for high-dimensional remote sensing data.
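The mating stage can be sketched as uniform crossover followed by replacement of the least fit individual. The crossover rate `cr` and the replace-the-worst rule are illustrative assumptions. The text states only that offspring replace less fit individuals according to $\Theta_{fit}$.

```python
import random


def mate(female, male, cr=0.5, rng=random):
    """Uniform crossover: each coordinate comes from the male with probability cr."""
    return [m if rng.random() < cr else f for f, m in zip(female, male)]


def replace_worst(pop, fitness, child, child_fit):
    """Replace the least fit individual in place if the offspring is fitter."""
    worst = min(range(len(pop)), key=lambda i: fitness[i])
    if child_fit > fitness[worst]:
        pop[worst], fitness[worst] = child, child_fit
    return pop, fitness


child = mate([0.1, 0.9, 0.4], [0.8, 0.2, 0.6], rng=random.Random(5))
print(replace_worst([[0.3, 0.3, 0.3], [0.5, 0.5, 0.5]], [0.91, 0.87], child, 0.93))
```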

2.9. Soil Moisture Prediction

In regression tasks, predictions are continuous (e.g., soil moisture levels) rather than grouped into fixed classes. For analysis or decision-making, however, post hoc thresholds based on domain-specific criteria, such as agronomic guidelines, can convert these continuous outputs into categories. These thresholds are not part of the regression model; they are applied after prediction for interpretability, so classification is a separate step from the continuous regression process. Here, the continuous outputs of the regression-based soil moisture forecasting model are classified as “dry,” “acceptable” (plant- or species-dependent), or “wet,” based on agronomic and environmental criteria. A volumetric water content below 10% is considered “dry,” indicating insufficient water for plants to absorb. Levels from 10% to 20% are considered “acceptable,” offering sufficient water for many plants, although the requirement varies by species. Levels above 20% are considered “wet,” which may lead to saturation of the root zone and leaching of nutrients. These thresholds derive from agronomic research and irrigation recommendations. They are not incorporated into the model’s training process, but they are essential for interpreting model results in precision farming and drought evaluation [47].
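Because the categories are applied after prediction, the mapping is a simple function of the predicted volumetric water content. The following sketch uses the thresholds stated above. The function name is hypothetical, and treating exactly 10% and 20% as “acceptable” is an assumption about the boundaries.

```python
def moisture_class(vwc_percent):
    """Post hoc category for a predicted volumetric water content in percent."""
    if vwc_percent < 10.0:
        return "dry"
    if vwc_percent <= 20.0:
        return "acceptable"
    return "wet"


print([moisture_class(v) for v in (7.5, 15.0, 26.0)])
```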
In this phase, the soil moisture is predicted from the selected optimal features using AGRL-RBFN with IL.
Radial basis function networks (RBFNs) can approximate a wide variety of functions by adjusting the shape and parameters of their radial basis functions. This allows them to model intricate soil moisture dynamics and the factors that influence them. However, the localized relationship between input data and output predictions limits their ability to capture overarching patterns in soil moisture data, particularly in datasets with substantial variation. To address this, Adaptive Group Lasso Regularization (AGRL) is incorporated into the RBFN. AGRL regularizes feature groups simultaneously and identifies the interactions that are essential for understanding soil moisture variation, particularly in remote sensing applications involving related features [47,48]. In addition, incremental learning (IL) allows the model to update its parameters with new data without complete retraining. This is advantageous for soil moisture prediction because conditions fluctuate with seasonal changes and climate variation. Figure 2 shows the structural diagram of the proposed AGRL-RBFN classifier and highlights its layers: the input layer, adaptive group lasso regularization, the hidden layer, and the output layer. The diagram also illustrates the use of AGRL for feature-group regularization and of the RBFN for handling complex soil moisture dynamics.
Input Layer
This layer receives the selected features Ξ sel as input and passes them to the hidden layers. The inputs are represented as
$$I_{in} = \{\Xi_{sel}^{1}, \Xi_{sel}^{2}, \ldots, \Xi_{sel}^{z}\}$$
where $I_{in}$ denotes the inputs, consisting of $z$ selected features $\Xi_{sel}$ [47].
Adaptive Group Lasso Regularization
AGRL is used to capture global patterns and regulate feature groups. It regularizes the model at the level of feature groups, which helps capture interactions within each group and is particularly useful for remote sensing data [48]. It is given by
$$\varepsilon_{reg} = \min_{\upsilon}\; \hat{L}(\upsilon) + \hat{\lambda}\sum_{e=1}^{E}\Gamma_e\,\lVert\upsilon_e\rVert_2$$
In Equation (29), $\hat{L}(\upsilon)$ is the data-fitting loss and $\hat{\lambda}$ is the regularization parameter that penalizes large coefficients to prevent overfitting. $\upsilon_e$ is the coefficient vector associated with the $e$th group of features (e.g., related spectral bands or soil attributes), and $\Gamma_e$ is the adaptive weight of that group. $E$ is the total number of feature groups formed during feature selection. Because the penalty is a weighted sum of group $\ell_2$-norms $\lVert\upsilon_e\rVert_2$, it enforces sparsity at the group level: entire feature groups are either retained or discarded. This adaptive group lasso regularization improves model interpretability, accuracy, and generalization in soil moisture prediction tasks.
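The group penalty in Equation (29) and the group-wise shrinkage it induces can be sketched as follows. The proximal (group soft-thresholding) step is the standard way to optimize a group lasso objective. It is shown here as an illustration, not as the authors' solver. Setting the adaptive weights $\Gamma_e$ to the inverse norm of an initial estimate follows the usual adaptive lasso convention and is likewise an assumption.

```python
import math


def l2(vec):
    """Euclidean norm of a coefficient group."""
    return math.sqrt(sum(c * c for c in vec))


def adaptive_weights(initial_groups, power=1.0, eps=1e-8):
    """Gamma_e = 1 / ||initial estimate of group e||^power (assumed convention)."""
    return [1.0 / (l2(g) + eps) ** power for g in initial_groups]


def group_penalty(groups, weights, lam):
    """lam * sum_e Gamma_e * ||upsilon_e||_2."""
    return lam * sum(w * l2(g) for g, w in zip(groups, weights))


def group_soft_threshold(group, weight, lam, step):
    """Proximal step for one group: shrink the whole vector or zero it out."""
    norm = l2(group)
    thresh = step * lam * weight
    if norm <= thresh:
        return [0.0] * len(group)  # the entire feature group is discarded
    return [(1.0 - thresh / norm) * c for c in group]


print(group_penalty([[3.0, 4.0], [0.0, 0.0]], [1.0, 1.0], lam=0.5))
```

Groups whose initial estimate is small receive large adaptive weights, so they are more likely to be set to zero as a whole.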
Hidden Layer
Here, the regularized data $\varepsilon_{reg}$ are transformed using radial basis functions (RBFs), such as the Gaussian function, to model nonlinear relationships in soil moisture. The RBF $I_{RBF}$, also called the activation function, is given by
$$I_{RBF,n} = \exp\left(-\frac{\lVert \varepsilon_{reg} - t_{cen,n}\rVert^{2}}{2\sigma_n^{2}}\right)$$
where $t_{cen,n}$ is the $n$th center of $I_{RBF}$ and $\sigma_n$ is its $n$th width.
Output Layer
In this layer, the output of the hidden layer is used to predict soil moisture. The output $I_{out}$ of this layer is given by
$$I_{out} = \sum_{n=1}^{\ddot{l}} \Gamma_n\, I_{RBF,n} + \tilde{b}_{bias}$$
Here, $\tilde{b}_{bias}$ is the bias, $\Gamma_n$ is the weight of the $n$th neuron, and $\ddot{l}$ is the number of hidden neurons in the RBF layer. This output layer yields the predicted soil moisture $I_{out}$. Equations (28)–(31) define the architecture of the Adaptive Group Lasso Regularized Radial Basis Function Network (AGRL-RBFN) used to predict soil moisture. AGRL regularizes feature groups, and the RBFN models the nonlinear relationships between inputs and outputs. The final output is the predicted soil moisture. The pseudocode of the AGRL-RBFN classifier (Algorithm 2) is described as follows.
Algorithm 2. Pseudo Code of AGRL-RBFN
Input: Selected features $\Xi_{sel}$
Output: Predicted soil moisture $I_{out}$
Begin
  Initialize $\sigma_n$, then
  For each $I_{in}$
    Compute $I_{in} = \{\Xi_{sel}^{1}, \Xi_{sel}^{2}, \ldots, \Xi_{sel}^{z}\}$
    Regularize the input $I_{in}$:
      $\varepsilon_{reg} = \min_{\upsilon}\; \hat{L}(\upsilon) + \hat{\lambda}\sum_{e=1}^{E}\Gamma_e\,\lVert\upsilon_e\rVert_2$
    Evaluate $I_{RBF}$
    Calculate $I_{out}$:
      $I_{out} = \sum_{n=1}^{\ddot{l}} \Gamma_n\, I_{RBF,n} + \tilde{b}_{bias}$
  End for
  Return $I_{out}$
End
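The hidden and output layers of Algorithm 2 can be sketched in NumPy as a vectorized forward pass. The sketch assumes that the inputs have already been regularized, so the group-lasso step is not repeated here. It also assumes that the centers $t_{cen,n}$, widths $\sigma_n$, weights $\Gamma_n$, and bias are given. The function and argument names are hypothetical.

```python
import numpy as np


def rbf_forward(X, centers, widths, weights, bias):
    """Gaussian RBF hidden layer followed by a linear output layer.

    X: (N, z) inputs; centers: (H, z); widths: (H,); weights: (H,); bias: float.
    Returns predictions of shape (N,) and hidden activations of shape (N, H).
    """
    # Squared Euclidean distance from every input to every center: (N, H)
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Gaussian activation of each hidden neuron
    phi = np.exp(-sq_dist / (2.0 * widths[None, :] ** 2))
    # Weighted sum of activations plus bias
    return phi @ weights + bias, phi


X = np.array([[0.0, 0.0], [1.0, 0.0]])
pred, phi = rbf_forward(X, centers=np.array([[0.0, 0.0]]), widths=np.array([1.0]),
                        weights=np.array([2.0]), bias=0.5)
print(pred)
```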
Incremental learning (IL) enhances the model’s adaptability to new data, enabling it to respond to environmental changes and minimize loss during seasonal and climate variations. Accurate soil moisture forecasting requires field validation of model predictions, as remote sensing data from NASA’s SMAP and ESA’s SMOS, while extensive, may lack local specificity. Validation is performed using field probes, such as TDR, capacitance, and neutron sensors, to capture localized conditions. The AGRL-RBFN model utilizes such validation to align predictions with real-world measurements, employing metrics such as MAE and RMSE. Discrepancies are addressed through optimization and retraining, thereby improving generalization. This iterative process boosts model resilience for accurate predictions in varied conditions. While supervised learning models largely rely on satellite data processed via techniques like the Palmer model and Kalman filters, in situ measurements remain essential for calibration, particularly in areas with limited satellite coverage. Reanalysis products and model-enhanced datasets also support soil moisture prediction, despite known uncertainties.
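One simple way to realize incremental learning in an RBF network is to keep the centers and widths fixed. Only the output weights and bias are then updated, by stochastic gradient descent on each new batch of hidden activations. The sketch below illustrates that idea under this assumption. The learning rate and function name are hypothetical, and the text does not state which update rule its IL component uses.

```python
def incremental_update(weights, bias, phi_batch, y_batch, lr=0.05):
    """One SGD pass over new samples for the squared-error loss.

    phi_batch: hidden-layer activations of the new samples (list of lists).
    y_batch: observed soil moisture for those samples.
    """
    w = list(weights)
    for phi, y in zip(phi_batch, y_batch):
        pred = sum(wi * pi for wi, pi in zip(w, phi)) + bias
        err = pred - y
        # Gradient of 0.5 * err**2 with respect to each weight is err * phi
        w = [wi - lr * err * pi for wi, pi in zip(w, phi)]
        bias -= lr * err
    return w, bias


w, b = [0.0], 0.0
for _ in range(200):  # the same new observation presented repeatedly
    w, b = incremental_update(w, b, [[1.0]], [3.0])
print(round(w[0] + b, 6))
```

Because only the output layer changes, each update costs time linear in the number of hidden neurons, and no earlier data need to be revisited.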

2.10. Integration of Remote Sensing and Deep Learning for Soil Moisture Prediction

This study integrates hyperspectral and soil moisture data from NASA’s SMAP and ESA’s SMOS missions [5,6] with advanced deep learning to predict soil moisture across large areas without in situ sampling. A robust preprocessing pipeline, including adaptive gap filling, noise reduction, and atmospheric correction, ensures high data quality. The core predictor, an Adaptive Group Radial Lasso–Regularized Basis Function Network (AGRL-RBFN) with incremental learning (IL), continuously adapts to new seasonal and climatic data. Feature selection is driven by the Hierarchical Correlated Spider Wasp Optimizer (HCSWO), which reduces redundancy among highly correlated remote-sensing bands. Complementary deep models, such as CNNs and LSTMs, capture spatial and temporal patterns [13,14,15,16], while Lasso regularization mitigates overfitting in high-dimensional feature spaces. The resulting model achieves 98.09% accuracy—surpassing RBFN, FFNN, and RNN baselines in precision, recall, and F1-score—effectively handling data gaps, seasonal shifts, and noise.

2.11. Model Calibration and Validation

The robustness and generalizability of the AGRL-RBFN model were ensured through a structured calibration and validation protocol. Model calibration involved tuning hyperparameters (RBF neurons, λ, and γ) via internal grid search with 5-fold cross-validation to minimize mean squared error (MSE) and prevent overfitting, aided by Adaptive Group Lasso Regularization for penalizing irrelevant features. For validation, the dataset was split into 80% for training and 20% for testing, using various metrics (accuracy, MAE, RMSE, precision, recall, F1-score, false positive/negative rates) to assess predictive performance on unseen data. External field validation included in situ measurements from the 2017 Karlsruhe campaign, aligned with satellite-derived soil moisture data, showing strong agreement. The model’s long-term adaptability was evaluated through incremental learning across seasonal and climatic variations, enabling parameter adjustments without full retraining for accuracy amid evolving conditions. The performance evaluation of the proposed soil moisture prediction system is discussed below.
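The calibration loop, a grid search with 5-fold cross-validation that minimizes mean MSE, can be sketched generically. The user-supplied `fit_and_mse` callback and the example grid values are hypothetical. For brevity, the folds here are contiguous and unshuffled.

```python
import itertools


def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_cv(n, grid, fit_and_mse, k=5):
    """Return the parameter setting with the lowest mean validation MSE."""
    folds = k_fold_indices(n, k)
    names = list(grid)
    best, best_mse = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        mses = []
        for i, val_idx in enumerate(folds):
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            mses.append(fit_and_mse(params, train_idx, val_idx))
        mean_mse = sum(mses) / k
        if mean_mse < best_mse:
            best, best_mse = params, mean_mse
    return best, best_mse


def toy_fit_and_mse(params, train_idx, val_idx):
    # Stand-in for training the model on train_idx and scoring it on val_idx
    return (params["lam"] - 0.1) ** 2


grid = {"neurons": [10, 20], "lam": [0.01, 0.1, 1.0]}
print(grid_search_cv(100, grid, toy_fit_and_mse))
```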

3. Results

In this section, the proposed system’s performance is evaluated and compared with that of existing models using various metrics. The evaluation results were obtained by implementing and testing the framework in Python.

3.1. Performance Analysis

This section compares the performance of AdaK-MCC, FWFCSD, HCSWO, and AGRL-RBFN with existing models to showcase the improvements achieved by the proposed model.

3.1.1. Performance Analysis of Gap Filling

The performance evaluation of the proposed method AdaK-MCC for gap filling is discussed here. Figure 3 shows the comparison of mean absolute error (MAE) between the proposed AdaK-MCC gap filling technique and other existing techniques like KMC, FCM, KNN, and K-Medoid. The proposed method demonstrates superior performance with a lower MAE value.
The gaps in the dataset are filled using the AdaK-MCC technique, resulting in a 0.047 MAE. The proposed model obtained lower errors than the existing approaches. This is because the centroids are initialized using AMCs, which effectively impute the values. The existing techniques, such as KMC, FCM, KNN, and K-Medoid, attained higher MAEs of 0.099, 0.157, 0.386, and 0.548, respectively. Hence, the gaps were filled more efficiently than in the existing models.
Table 1 presents the silhouette scores of the proposed AdaK-MCC and other existing techniques. The proposed technique achieved a silhouette score of 0.975. In contrast, the existing KMC, FCM, KNN, and K-Medoid methods achieved silhouette scores of 0.953, 0.929, 0.858, and 0.813, which are lower than those of the proposed AdaK-MCC. This demonstrates the superiority of the proposed approach in gap filling compared to existing models.

3.1.2. Performance Analysis of Clustering

In this section, the proposed AdaK-MCC is evaluated against existing clustering methods, including KMC, FCM, KNN, and K-Medoid, for a comprehensive clustering analysis.
Figure 4 and Table 2 illustrate the performance analysis of AdaK-MCC compared to existing techniques in terms of clustering time and Dunn index. The AdaK-MCC achieved a lower clustering time of 2118 ms and a higher Dunn index of 4.897. The existing techniques, KMC, FCM, KNN, and K-Medoid, attained longer clustering times and lower Dunn indices. The improved performance in the proposed method is attributed to the use of the AMC technique for centroid initialization. Thus, the robust performance of AdaK-MCC for data clustering is validated. The suggested AdaK-MCC approach exhibits a greater standard deviation (1.214) in the Dunn Index relative to other clustering methods, with this variation resulting from its adaptive centroid initialization utilizing adaptive momentum coefficients (AMC). This adaptive system is specifically designed to enhance exploration and prevent early convergence to inferior solutions, thereby facilitating the identification of superior clusters under diverse data conditions. Regardless of this variability, AdaK-MCC consistently attains the highest average Dunn Index value (4.897), reflecting better inter-cluster separation and intra-cluster compactness. Notably, the 95% confidence interval ([1.876, 4.008]) suggests that, despite variations, the model’s performance remains statistically robust and does not substantially overlap with the estimators that perform poorly. Conversely, conventional clustering approaches such as KMC and FCM, while exhibiting reduced standard deviations, produce lower mean Dunn Index scores and tighter confidence intervals, indicating they are more consistent but less capable of identifying intricate, seasonally shifting trends in soil moisture data. Consequently, the marginally heightened variability in AdaK-MCC is an intentional compromise for better clustering quality, influenced by its superior search dynamics. 
This demonstrates that AdaK-MCC surpasses other estimators not only in average performance but also in its capacity to generalize across various data distributions—rendering it a more appropriate option for resilient, adaptive soil moisture assessment.
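The Dunn index is the smallest distance between points in different clusters divided by the largest within-cluster diameter. Higher values therefore indicate compact, well-separated clusters. The sketch below computes it with Euclidean distance, which is one common variant; other separation and diameter definitions exist. It also computes the kind of summary statistics reported for Table 2: the mean, the standard deviation, and a normal-approximation 95% interval $\bar{x} \pm 1.96\,s/\sqrt{n}$. The text does not state whether the sample or population standard deviation is used, or whether a $t$ quantile (about 2.776 for $n = 5$) should replace 1.96. Both choices are therefore parameters here. The function names are hypothetical.

```python
import math
import statistics
from itertools import combinations


def dunn_index(clusters):
    """clusters: list of clusters, each a list of points given as tuples."""
    # Largest distance between two points of the same cluster (diameter)
    diameter = max((math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
                   default=0.0)
    # Smallest distance between points belonging to different clusters
    separation = min(math.dist(p, q)
                     for a, b in combinations(clusters, 2) for p in a for q in b)
    return separation / diameter if diameter > 0 else float("inf")


def mean_sd_ci(values, z=1.96, sample=True):
    """Mean, standard deviation, and a symmetric normal-approximation interval."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    half_width = z * sd / math.sqrt(len(values))
    return mean, sd, (mean - half_width, mean + half_width)


print(dunn_index([[(0, 0), (0, 1)], [(5, 0), (5, 1)]]))
print(mean_sd_ci([1.0, 2.0, 3.0, 4.0, 5.0]))
```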
In Table 2, the Dunn Index values are 4.897 for the proposed AdaK-MCC, 2.984 for KMC, 2.541 for FCM, 1.924 for KNN, and 1.368 for K-Medoid. The mean Dunn Index across these methods is reported as 2.942, and the standard deviation (SD) of 1.214 reflects the dispersion of the values. The 95% confidence interval (CI) for the mean Dunn Index across these procedures is reported as [1.876, 4.008]. The model’s performance was benchmarked using hyperspectral and soil moisture data from a 2017 field campaign in Karlsruhe, Germany. These data had a spatial resolution of 0.25° × 0.25° and a temporal resolution ranging from daily to weekly. Table 3 summarizes the performance of the proposed AGRL-RBFN model in comparison with baseline methods.
Table 3 compares the AGRL-RBFN model with RBFN, FFNN, RNN, and ANN using key metrics, including MAE, Silhouette Score, Dunn Index, cross-correlation, prediction accuracy, precision, recall, F1-score, and false positive and negative rates. The AGRL-RBFN model outperforms all others, achieving 98.09% accuracy, 98.17% precision, and a 98.95% F1 score. It also achieves a low MAE of 0.047, with a confidence interval of [0.043, 0.051], indicating reliable gap filling. The model exhibits excellent clustering, as indicated by a Dunn Index of 4.897 (SD = 0.125) and a high Silhouette Score of 0.975, along with a low false positive rate of 0.0248. The inclusion of confidence intervals and standard deviations confirms the model’s robustness and consistency across repeated trials, validating its strong predictive and gap-filling performance. A thorough validation was performed using ground-truth data collected during the Karlsruhe 2017 field campaign to create a solid observational foundation for our model. In situ measurements obtained via time domain reflectometry (TDR) and various field probes assessed the accuracy of soil moisture predictions sourced from satellites. A high level of agreement was observed between the predicted and observed values, validating the model’s ability to accurately reflect actual moisture behavior. Additionally, an extensive comparative study involving advanced models—such as RBFN, FFNN, RNN, and ANN—was conducted using the same datasets and uniform experimental methods. Throughout various experiments, the proposed AGRL-RBFN model consistently outperformed these benchmarks, achieving superior prediction accuracy (98.09%) and F1-score (98.95%), with statistically significant enhancements confirmed through confidence interval and standard deviation assessments. These findings confirm that the model is both algorithmically creative and empirically strong.

3.1.3. Performance Analysis of Varying Pattern Analysis

Here, the proposed FWFCSD is compared with existing techniques, such as FSD, Discrete Fourier Transform (DFT), Fast Fourier Transform (FFT), and Short-Time Fourier Transform (STFT).
Figure 5 depicts the cross-correlation coefficient and reconstruction error of the proposed FWFCSD and existing techniques. The varying patterns in the grouped seasons are analyzed by integrating FWFC for flexible analysis of the varying soil moisture patterns. By integrating FWFC, the proposed FWFCSD achieved a high cross-correlation coefficient of 0.9715 and a low reconstruction error of 0.0589. At the same time, the existing FSD, DWT, FFT, and STFT achieved an average cross-correlation coefficient of 0.893025 and an average reconstruction error of 0.148275, which are, respectively, lower and higher than those of the proposed FWFCSD. Therefore, the proposed FWFCSD demonstrates its efficacy in analyzing diverse patterns.
Table 4 presents a performance analysis of the proposed FWFCSD method compared to existing techniques, including FSD, DWT, FFT, and STFT, with a focus on execution time. The FWFCSD method achieved an execution time of just 18,564 ms, significantly outperforming the existing techniques, which took 24,796 ms, 28,357 ms, 32,497 ms, and 39,641 ms, respectively. Thus, the proposed method provides a more effective analysis of the varying patterns compared to the existing methods.

3.1.4. Performance Analysis of Feature Selection

In this section, the performance evaluation of the proposed HCSWO is discussed in comparison to existing techniques.
Table 5 presents the time required by each method for feature selection. To select features without redundancy, the proposed method incorporates HCM to cluster the correlated features used to predict soil moisture. HCSWO achieved a shorter feature selection time of 2118 ms. In contrast, the existing methods required longer times: 2211 ms for SWO, 5178 ms for the Ant Colony Optimizer (ACO), 7135 ms for the Grey Wolf Optimizer (GWO), and 9052 ms for the Cuckoo Search Optimizer (CSO). Hence, the proposed feature selection method outperforms the existing techniques.

3.1.5. Performance Analysis of Soil Moisture Prediction

In this section, the performance of the proposed AGRL-RBFN is assessed in comparison to other relevant existing techniques.
Figure 6 and Figure 7 illustrate the performance of the proposed AGRL-RBFN in comparison to established techniques, including RBFN, Feed Forward Neural Network (FFNN), Recurrent Neural Network (RNN), and Artificial Neural Network (ANN). The AGRL-RBFN achieved an accuracy of 98.09%, precision of 98.17%, recall of 97.24%, F1-score of 98.95%, and specificity of 97.21%, all of which surpass the results of the existing methods. Despite the baseline RBFN model being fundamentally straightforward and quick, its challenges with correlated features, noise, and seasonal variation restrict its effectiveness. The suggested AGRL-RBFN model, featuring Adaptive Group Lasso Regularization and incremental learning, presents moderate complexity while attaining significant enhancements in accuracy (from 96.27% to 98.09%) and F1-score (from 96.27% to 98.95%). These enhancements validate the increased complexity, especially in critical applications such as precision agriculture and drought prediction, where even slight advancements can produce substantial real-world effects. This superior performance can be attributed to the integration of AGRL within the RBFN, which effectively regularizes the feature groups for soil moisture prediction in conjunction with the application of IL. In contrast, the traditional RBFN recorded accuracy, precision, and recall values of 96.27%, 97.55%, and 95.34%, respectively, which are inferior to those of the proposed AGRL-RBFN. Therefore, the proposed model demonstrates greater efficiency in predicting soil moisture compared to the existing approaches.
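The F1-score is the harmonic mean of precision and recall, so it always lies between the two. The helper below computes it. Evaluated at the reported precision (98.17%) and recall (97.24%), it gives approximately 97.70%.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.9817, 0.9724), 4))
```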
Table 6 compares the proposed AGRL-RBFN with existing methods in terms of the false positive rate (FPR) and false negative rate (FNR). The proposed AGRL-RBFN achieved an FPR of 0.0248 and an FNR of 0.076, both lower than those of the existing techniques, which attained higher FPRs and FNRs. The proposed model therefore outperformed the existing methods in predicting soil moisture.

3.1.6. Time-Series Validation of AGRL-RBFN Predictions Against Observed Measurements

To validate the reliability of the proposed AGRL-RBFN model, a comparative time-series analysis was conducted between predicted soil moisture values and in situ observations obtained using time domain reflectometry (TDR) sensors during the Karlsruhe 2017 field campaign. This comparison aims to evaluate the model’s temporal accuracy across real-world conditions.
Figure 8 illustrates the predicted and actual soil moisture levels over a representative time window, demonstrating strong alignment across both seasonal trends and short-term fluctuations. The model effectively captures key moisture transitions corresponding to rainfall events and dry periods. While minor deviations occur during seasonal changeovers (e.g., late spring and early autumn), these are attributed to rapid vegetation changes and transient climatic anomalies not fully captured in remote sensing data.
Quantitatively, the AGRL-RBFN model achieves a mean absolute error (MAE) of 0.047 and a root mean square error (RMSE) of 0.065, indicating high precision. Additionally, the Pearson correlation coefficient between the predicted and observed series exceeds 0.96, confirming strong predictive alignment with field observations. This validation confirms the model’s ability to accurately reflect actual soil moisture dynamics. It provides empirical support for its application in operational soil monitoring and irrigation advisory systems, particularly in regions with limited ground-based sensing infrastructure.
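The three agreement measures used in this validation are MAE, RMSE, and the Pearson correlation coefficient. They can be computed from paired predicted and observed series as follows. This is a plain-Python sketch, and the function name is hypothetical.

```python
import math


def validation_metrics(pred, obs):
    """Return (MAE, RMSE, Pearson r) for paired predicted and observed values."""
    n = len(obs)
    errors = [p - o for p, o in zip(pred, obs)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Pearson correlation from centered sums
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return mae, rmse, cov / (sp * so)


print(validation_metrics([0.21, 0.25, 0.30, 0.28], [0.20, 0.26, 0.31, 0.27]))
```

RMSE weights large errors more heavily than MAE, so RMSE exceeding MAE (0.065 versus 0.047 above) indicates that a few larger deviations are present.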
An error assessment of the AGRL-RBFN model was also conducted on the Karlsruhe 2017 dataset under extreme hydrological conditions, specifically drought and intense rainfall. During drought periods, the model yielded an MAE of 0.052 and an RMSE of 0.072. This MAE is marginally higher than the overall MAE of 0.047, reflecting the model's sensitivity to low soil moisture. During heavy rainfall events, the model exhibited an MAE of 0.056. This error was influenced by rapid increases in surface moisture and saturation and was further complicated by remote sensing issues such as cloud cover. A Pearson correlation coefficient exceeding 0.93 in both cases indicates robust predictive ability. Future improvements could involve incorporating additional climatic factors or developing recalibration methods for better anomaly detection.
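Regime-specific errors, such as the drought and heavy-rainfall MAE values above, can be obtained by grouping residuals by a regime label. In this sketch, the labels ("drought", "heavy_rain", "normal") and the values are hypothetical. How periods are assigned to a regime (for example, from precipitation records) is an assumption that is not specified here.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def errors_by_regime(obs: Sequence[float], pred: Sequence[float],
                     regimes: Sequence[str]) -> Dict[str, Tuple[float, float]]:
    """Return {regime: (MAE, RMSE)} for each hypothetical regime label."""
    if not (len(obs) == len(pred) == len(regimes)):
        raise ValueError("inputs must have equal length")
    abs_sum: Dict[str, float] = defaultdict(float)
    sq_sum: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for o, p, r in zip(obs, pred, regimes):
        err = o - p
        abs_sum[r] += abs(err)   # accumulates for MAE
        sq_sum[r] += err * err   # accumulates for RMSE
        count[r] += 1
    return {r: (abs_sum[r] / count[r], math.sqrt(sq_sum[r] / count[r]))
            for r in count}


obs = [0.12, 0.10, 0.35, 0.40, 0.25]
pred = [0.15, 0.14, 0.30, 0.33, 0.26]
regimes = ["drought", "drought", "heavy_rain", "heavy_rain", "normal"]
for regime, (m, r) in errors_by_regime(obs, pred, regimes).items():
    print(f"{regime}: MAE={m:.3f}, RMSE={r:.3f}")
```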

3.2. Comparison with Existing Approaches

This section provides a comparative evaluation of the proposed AGRL-RBFN against current methodologies for predicting soil moisture. The performance categories used in Table 7, namely “Low,” “Moderate,” “Medium,” and “High,” are based on thresholds commonly used in the environmental modeling and machine learning literature. For classification metrics such as accuracy and F1-score, values of 95% or higher are categorized as “High,” values from 90% to 94.9% as “Moderate,” and values below 90% as “Low.” Clustering metrics such as the Dunn Index and Silhouette Score follow a similar logic: a Silhouette Score above 0.90 or a Dunn Index exceeding 3.5 indicates “High” performance. These thresholds provide a transparent and consistent basis for interpreting the model's effectiveness across different evaluation dimensions, particularly in hydrological and agricultural applications.
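The category rules above translate into a few comparisons. This sketch encodes only the thresholds stated in the text, and the function names are hypothetical. The 90–94.9% band is read as the half-open interval [90, 95). Because lower clustering bands are not specified, clustering is tested only against the “High” criterion.

```python
from typing import Optional


def rate_percentage(value_pct: float) -> str:
    """Map accuracy or F1-score (in %) to the Table 7 categories."""
    if value_pct >= 95.0:
        return "High"
    if value_pct >= 90.0:  # the 90-94.9% band, treated here as [90, 95)
        return "Moderate"
    return "Low"


def is_high_clustering(silhouette: Optional[float] = None,
                       dunn: Optional[float] = None) -> bool:
    """'High' clustering if Silhouette > 0.90 or Dunn Index > 3.5."""
    return ((silhouette is not None and silhouette > 0.90)
            or (dunn is not None and dunn > 3.5))


print(rate_percentage(98.09), rate_percentage(92.0), rate_percentage(85.0))
# High Moderate Low
print(is_high_clustering(silhouette=0.975, dunn=4.897))  # True
```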
Table 7 provides a qualitative comparison of soil moisture prediction models using the categories “Low,” “Medium,” “Moderate,” and “High.” These categories are grounded in metrics such as training and feature-selection times (e.g., 2118 ms for HCSWO feature selection vs. 9052 ms for CSO), architectural efficiency, and clustering performance (Dunn Index > 4 = High). AGRL-RBFN achieves the highest accuracy (98.09%), precision (98.17%), recall (97.24%), and F1-score (98.95%). It also delivers superior gap-filling (MAE 0.047) and clustering (Dunn Index 4.897) performance, while maintaining relatively low computational complexity and strong seasonal and climatic adaptability. By contrast, models such as LSTM and DCNN require heavier computation and lack flexible temporal adaptation, while simpler methods (RF, KNN, SVM) underperform under variable conditions [13,14,15,16]. Its gap-filling and feature-selection strategies make AGRL-RBFN the most versatile and effective of the compared methods for diverse climatic and agricultural needs. However, it does not specifically account for crop-specific moisture requirements.
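The Dunn Index used in Table 7 is the ratio of the smallest distance between points in different clusters to the largest within-cluster diameter. Higher values indicate compact, well-separated clusters. The sketch below implements this original definition with Euclidean distance. Several variants exist, and the variant used for the reported 4.897 is not stated, so this example is illustrative only.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dunn_index(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Smallest inter-cluster distance divided by the largest cluster diameter."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Largest distance between two members of the same cluster.
    max_diameter = max(
        (math.dist(a, b) for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,  # every cluster is a singleton
    )
    # Smallest distance between points assigned to different clusters.
    min_separation = min(
        math.dist(a, b)
        for c1, c2 in combinations(list(clusters.values()), 2)
        for a in c1 for b in c2
    )
    return math.inf if max_diameter == 0 else min_separation / max_diameter


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(dunn_index(pts, [0, 0, 1, 1]))  # 10.0
```

The pairwise loops make this quadratic in the number of points, which is adequate for checking a result but not for very large datasets.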

3.3. Convergence and Computational Complexity Analysis of HCSWO

The convergence of the Hierarchical Correlated Spider Wasp Optimizer (HCSWO) is supported by its design and by empirical evidence. By integrating hierarchical correlation clustering, the algorithm restricts the search to representative, non-redundant features, which accelerates convergence toward good solutions. Global convergence is not guaranteed because the algorithm is heuristic. Nevertheless, empirical results, such as completing feature selection in 2118 ms, demonstrate fast and effective performance. In terms of computational complexity, constructing the hierarchical correlation matrix and performing the clustering are the dominant costs: O(n²·m) for the correlation computation and O(n³) for the clustering. The optimization iterations add a cost of roughly O(I × P × C_fit), where I is the number of iterations, P the population size, and C_fit the cost of one fitness evaluation. Hierarchical clustering reduces dimensionality and thus the overall computational load. This makes HCSWO a practical choice for high-dimensional tasks such as soil moisture prediction, balancing accuracy against processing cost.
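The correlation-clustering step that narrows the HCSWO search can be illustrated as follows. This is a hypothetical sketch, not the authors' implementation. It uses 1 - |r| as the distance between features and cuts a single-linkage hierarchy at 1 - threshold. The cut is implemented as connected components, which yields the same partition. In each group, the sketch keeps the feature most correlated with the target. The spider wasp search itself is not shown, and the feature names and threshold are assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Treat constant features as uncorrelated instead of dividing by zero.
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def correlation_groups(features: Dict[str, List[float]],
                       threshold: float) -> List[List[str]]:
    """Single-linkage clusters on distance 1 - |r|, cut at 1 - threshold.

    Equivalent to the connected components of the graph that links
    features whose absolute correlation is at least `threshold`.
    """
    names = list(features)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # n(n-1)/2 pairwise correlations, each linear in the number of samples.
    for a, b in combinations(names, 2):
        if abs(pearson(features[a], features[b])) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def representatives(features: Dict[str, List[float]], target: Sequence[float],
                    threshold: float = 0.9) -> List[str]:
    """Keep, per group, the feature most correlated with the target."""
    return [max(g, key=lambda n: abs(pearson(features[n], target)))
            for g in correlation_groups(features, threshold)]


feats = {  # hypothetical feature columns
    "band_a": [0.10, 0.20, 0.30, 0.40],
    "band_b": [0.21, 0.39, 0.62, 0.80],  # nearly collinear with band_a
    "soil_temp": [12.0, 15.0, 11.0, 14.0],
}
sm = [0.15, 0.22, 0.31, 0.38]  # hypothetical soil moisture target
print(correlation_groups(feats, 0.9))
print(representatives(feats, sm, 0.9))
```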

4. Discussion

In this section, the performance of the proposed soil moisture prediction model is presented and compared with several existing techniques.

4.1. Model Performance

The AGRL-RBFN model demonstrates superior performance in soil moisture prediction by integrating advanced feature selection, adaptive regularization, and robust learning techniques. Using HCSWO with a Hierarchical Correlation Matrix, the model efficiently selects non-redundant features. AGRL regularizes groups of related features and, combined with RBFNs, captures complex non-linear patterns across climates. Incremental learning (IL) allows the model to adapt to new data without full retraining. Preprocessing with AdaK-MCC (gap filling) and SGF (noise reduction), together with FWFCSD for seasonal trend analysis, ensures high data quality. The model achieves 98.09% accuracy, a 98.95% F1-score, an MAE of 0.047, and a Dunn Index of 4.897. FWFCSD yields a cross-correlation of 0.9715, and HCSWO completes feature selection in 2118 ms, faster than SWO, ACO, GWO, and CSO. The overall model outperforms RBFN, FFNN, RNN, and ANN. The AGRL-RBFN model also exhibits substantially lower false positive (FPR) and false negative (FNR) rates, indicating robust soil moisture prediction performance. Compared with LSTM, DCNN, and other regression models, AGRL-RBFN offers faster processing, higher accuracy, and better generalizability. LSTM models, while effective, struggle with rapidly changing weather and require substantial computational power. AGRL-RBFN avoids these limitations by efficiently integrating adaptive regularization and incremental learning. In contrast, machine learning regression models that rely on accurate land-type classification lack the flexibility of AGRL-RBFN, which achieves higher predictive accuracy without such dependencies. The proposed model also accommodates climatic changes that affect soil moisture, addressing a shortcoming of existing methods that overlook climate variability.
Its multi-technique integration—comprising AdaK-MCC, FWFCSD, HCSWO, and AGRL-RBFN—enables the efficient handling of challenges such as noise reduction, gap filling, and feature selection, thereby enhancing prediction robustness.
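Incremental learning can be illustrated with a Gaussian RBF network whose centers μ_i and width γ are fixed. Its output weights w_j and bias b are updated one sample at a time with a least-mean-squares step of size α. This is a minimal sketch under those assumptions: the AGRL penalty and the center-selection procedure are omitted, and the class and method names are hypothetical.

```python
import math
from typing import List, Sequence


class IncrementalRBF:
    """Gaussian RBF network with fixed centers and an output layer that is
    updated one sample at a time (least mean squares), so no full retraining
    is needed when new observations arrive."""

    def __init__(self, centers: Sequence[Sequence[float]], gamma: float,
                 alpha: float = 0.1) -> None:
        self.centers = [list(c) for c in centers]  # RBF centers mu_i
        self.gamma = gamma                         # RBF width
        self.alpha = alpha                         # learning rate
        self.w = [0.0] * len(self.centers)         # output weights w_j
        self.b = 0.0                               # output bias b

    def hidden(self, x: Sequence[float]) -> List[float]:
        # h_j = exp(-||x - mu_j||^2 / (2 * gamma^2))
        return [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                         / (2.0 * self.gamma ** 2)) for c in self.centers]

    def predict(self, x: Sequence[float]) -> float:
        h = self.hidden(x)
        return sum(wj * hj for wj, hj in zip(self.w, h)) + self.b

    def partial_fit(self, x: Sequence[float], y: float) -> float:
        """One gradient step on the squared error 0.5 * (y_hat - y)^2."""
        h = self.hidden(x)
        err = sum(wj * hj for wj, hj in zip(self.w, h)) + self.b - y
        self.w = [wj - self.alpha * err * hj for wj, hj in zip(self.w, h)]
        self.b -= self.alpha * err
        return err  # prediction error before this update


model = IncrementalRBF(centers=[(0.0,), (1.0,)], gamma=0.5, alpha=0.5)
stream = [((0.0,), 0.10), ((1.0,), 0.40)] * 200  # hypothetical arriving samples
for x, y in stream:
    model.partial_fit(x, y)
print(model.predict((0.0,)), model.predict((1.0,)))
```

Each `partial_fit` call costs time proportional to the number of centers times the input dimension. New observations are therefore absorbed without revisiting past data.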
A paired t-test was considered to assess the accuracy differences between AGRL-RBFN and the other models (RBFN, FFNN, RNN, and ANN). The null hypothesis (H0) states that there is no significant difference in accuracy, and the alternative hypothesis (H1) states that there is a significant difference. The observed accuracy differences are 1.82% (AGRL-RBFN vs. RBFN), 1.59% (vs. FFNN), 3.33% (vs. RNN), and 5.59% (vs. ANN). However, only a single accuracy value is available for each model. The variance of the paired differences therefore cannot be estimated, so a valid paired t-test cannot be performed [43]. Additional data, such as results from repeated runs, are needed for robust statistical analysis and credible conclusions about performance differences. An ablation study was also carried out to evaluate the contribution of each component of the prediction model. Systematically removing components, particularly the preprocessing steps of gap filling and noise reduction, drastically degraded model performance. Without these preprocessing steps, the model handled missing data and atmospheric distortions poorly.
Omitting the varying pattern analysis (FWFCSD and FWFC) hindered the model's ability to track soil moisture variations, reducing its flexibility for short-term and long-term trends. Removing HCSWO left irrelevant features in the input, leading to overfitting and slower training. Eliminating AGRL regularization and IL reduced feature interaction and adaptability to new data under changing conditions, degrading performance. The ablation study thus shows that each component (preprocessing, seasonal mapping, varying pattern analysis, feature selection, and AGRL with IL) is crucial for maintaining high accuracy and robustness.
The AdaK-MCC technique enhances K-Means clustering with Adaptive Momentum Coefficients (AMC) for centroid initialization, but it does not use real-world soil moisture (SM) data during this process. This omission may compromise the relevance of imputed values in ecologically diverse areas. Incorporating field-based SM observations from sources such as TDR sensors could make the clustering more realistic. The current seasonal mapping and spectral decomposition methods address spatial–temporal variability, but they may overlook localized moisture dynamics [41,42]. Future research should integrate geo-referenced soil and land-use data, model seasonal climatology and crop cycles, and validate outputs against high-resolution SM datasets to strengthen the model's reliability and applicability.
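The paired t-test discussed above requires the variance of the per-run differences, which is undefined when each model has only one accuracy value. The sketch below makes that requirement explicit. The per-run accuracies in the usage example are hypothetical; they could come, for instance, from repeated runs with different random seeds. `scipy.stats.ttest_rel` performs the same test and also returns a p-value.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(scores_a: Sequence[float],
             scores_b: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        # With a single pair, the sample variance of the differences is undefined.
        raise ValueError("need at least two paired runs")
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# One accuracy per model, as in the current study: the test cannot run.
try:
    paired_t([98.09], [96.27])
except ValueError as exc:
    print("not testable:", exc)

# Hypothetical per-seed accuracies for two models.
agrl = [98.1, 97.9, 98.3, 98.0, 98.2]
rbfn = [96.4, 96.1, 96.5, 96.0, 96.3]
t, df = paired_t(agrl, rbfn)
print(f"t = {t:.2f}, df = {df}")
```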

4.2. Feature Importance and Model Behavior

The model determines feature significance through Adaptive Group Lasso Regularization (AGRL) combined with HCSWO-based feature selection. AGRL penalizes groups of related features, which are formed according to their physical and spectral similarities. Important predictors, such as spectral bands, vegetation indices (e.g., NDVI), wavelet coefficients, and soil temperature, are grouped so that the groups most relevant to soil moisture variability are retained and less relevant ones are suppressed. This allows the model to capture the complex relationships typically present in remote sensing data. The Hierarchical Correlated Spider Wasp Optimizer (HCSWO) refines feature selection by grouping features according to their correlation patterns, retaining important features and eliminating redundant ones. This reduces overfitting and improves computational efficiency, ensuring that the model focuses on distinct, non-redundant information. Features central to moisture detection, particularly spectral bands sensitive to water absorption, are preserved because of their strong correlation with measured soil moisture. Vegetation indices reflect plant vigor and water stress and act as indirect indicators of soil moisture, particularly in vegetated areas. Multi-scale wavelet features capture temporal shifts, such as drought periods or rainy seasons, enhancing the model's sensitivity to climate variations.
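Group-wise shrinkage of the kind AGRL applies can be sketched with the proximal operator of the group lasso penalty of Yuan and Lin [49], λ Σ_g w_g √p_g ‖β_g‖₂. Here p_g is the size of group g, and w_g is an optional adaptive weight, commonly the inverse norm of an initial estimate of β_g. This is a generic illustration, not the exact AGRL formulation, and the group names and values are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def group_soft_threshold(beta: Sequence[float], groups: Dict[str, List[int]],
                         lam: float, step: float,
                         weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Proximal operator of step * lam * sum_g w_g * sqrt(p_g) * ||beta_g||_2.

    Each group is either shrunk toward zero as a block or zeroed entirely,
    which is how group lasso removes whole feature groups at once.
    """
    out = list(beta)
    for name, idx in groups.items():
        w_g = 1.0 if weights is None else weights[name]  # adaptive group weight
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        thresh = step * lam * w_g * math.sqrt(len(idx))  # sqrt(p_g) size scaling
        scale = 0.0 if norm <= thresh else 1.0 - thresh / norm
        for i in idx:
            out[i] = scale * beta[i]
    return out


beta = [3.0, 4.0, 0.1, 0.1]
groups = {"spectral_bands": [0, 1], "soil_temperature": [2, 3]}  # hypothetical
print(group_soft_threshold(beta, groups, lam=1.0, step=1.0))
```

With these values, the small soil-temperature group is zeroed out entirely and the spectral group is only shrunk. This block-wise behavior lets group lasso discard whole feature groups.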

4.3. Error Analysis

The model effectively ranks the features most relevant to soil moisture prediction, but it faces challenges during extreme weather events such as droughts or heavy rainfall. Error analysis shows that misclassifications in these situations arise from insufficient training data, which limits performance under extreme conditions. Recommended mitigation strategies include systematically analyzing misclassification patterns, augmenting the dataset with examples of extreme cases, and introducing features that capture environmental changes. Resampling techniques can balance the dataset, and adjusting class thresholds can improve performance across diverse soil moisture levels. Regularization and hyperparameter tuning of AGRL-RBFN can mitigate overfitting and improve generalization. Combining AGRL-RBFN with other classifiers in an ensemble can exploit their complementary strengths and reduce misclassifications. Incremental learning enables the model to adapt continuously to new data, which is particularly important during extreme weather conditions. Post-processing methods such as smoothing can further improve the model's reliability against outliers [48]. Overall, the AGRL-RBFN model performs well under typical conditions, but its accuracy drops during extreme events for several reasons. The limited dataset from the 2017 Karlsruhe campaign contains few extreme-event examples, causing data imbalance and impairing generalization. Spectral saturation and ambiguity under extremely wet conditions distort surface reflectance, complicating accurate prediction [52]. Reliance on seasonal mapping may also hinder the detection of anomalies. Finally, soil moisture often lags behind weather events, causing temporal mismatches in predictions.
To address these challenges, suggested remedies include data augmentation with simulated extreme scenarios, integration of external indicators, hybridization with physics-based hydrological models, and the incorporation of advanced temporal memory mechanisms. Refining incremental learning for real-time adaptation and integrating land-use data could also improve predictions across varied terrains. Future efforts should focus on validation in diverse regions prone to extreme events and on integrating rainfall-runoff dynamics to enhance robustness in agricultural and hydrological applications.
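One possible form of the post-processing smoothing suggested in the error analysis is to pass the predicted series through a Savitzky–Golay filter, the same filter family the framework uses for noise reduction [41]. The sketch below hard-codes the standard 5-point quadratic coefficients. Applying the filter to predictions, the window length, and the edge handling are all assumptions. `scipy.signal.savgol_filter(x, window_length=5, polyorder=2)` provides a general implementation that also fits polynomials at the edges.

```python
from typing import List, Sequence

# Savitzky-Golay convolution coefficients for a 5-point window and a
# quadratic fit; they sum to the normalizer 35.
SG5_QUAD = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG5_NORM = 35.0


def savgol5(series: Sequence[float]) -> List[float]:
    """Smooth a series with a 5-point quadratic Savitzky-Golay filter.

    For simplicity, the two points at each end are left unchanged.
    """
    out = list(series)
    for i in range(2, len(series) - 2):
        window = series[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(SG5_QUAD, window)) / SG5_NORM
    return out


# Hypothetical predicted series with an isolated spike at index 2.
predicted = [0.21, 0.22, 0.35, 0.23, 0.24, 0.25]
print(savgol5(predicted))
```

The filter reproduces any locally quadratic trend exactly. It therefore damps isolated spikes while leaving smooth wetting and drying curves largely intact.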

4.4. Limitations and Generalizability Across Climatic Zones

While the proposed AGRL-RBFN model exhibits strong predictive accuracy on the Karlsruhe 2017 dataset, its current validation is confined to a temperate climate region. Given that soil moisture dynamics are highly sensitive to variations in climate, land cover, soil type, and crop cycles, the model’s applicability to broader environmental conditions remains untested. To address this, future work will focus on extending validation to a range of climatic zones, including arid regions (e.g., the southwestern USA, northern Africa), tropical zones (e.g., Southeast Asia, the Amazon Basin), and semiarid regions (e.g., the Sahel, central India). This will involve the use of publicly accessible remote sensing datasets such as SMAP, MODIS, and Sentinel-1, as well as region-specific ground truth measurements from TDR and field probes. By evaluating model performance across diverse hydrological and agronomic contexts, this expanded analysis aims to establish the generalizability and operational scalability of the AGRL-RBFN framework for global soil moisture prediction under varying climatic conditions.

5. Conclusions

Soil moisture is a crucial factor in regulating interactions between terrestrial ecosystems and atmospheric processes, making it essential for fields such as agriculture, hydrology, and climate science. In agriculture, maintaining optimal soil moisture is critical for plant growth, while both excessive wetness and dryness can negatively impact crop yield and quality. However, existing research lacks depth in addressing the complexities of soil moisture prediction under changing climate conditions. This study introduces a robust soil moisture prediction system that integrates AdaK-MCC for gap filling, FWFCSD for pattern analysis, HCSWO for feature selection, and AGRL-RBFN for prediction. The model demonstrates strong performance, achieving an MAE of 0.047, a Dunn Index of 4.897, a cross-correlation coefficient of 0.9715, and a prediction accuracy of 98.09%, along with high precision (98.17%), recall (97.24%), and F1-score (98.95%). Feature selection was completed efficiently in 2118 ms. For future enhancement, the model should incorporate crop-specific moisture requirements, extend to diverse climatic datasets, and improve adaptability to extreme weather conditions. These improvements would broaden its utility for smart agriculture, sustainable irrigation planning, and climate resilience efforts.

Author Contributions

Conceptualization, M.B.A. and C.C.; methodology, M.B.A. and C.C.; software, M.B.A.; validation, C.C.; formal analysis, M.B.A.; investigation, M.B.A.; resources, M.B.A.; data curation, C.C.; writing—original draft preparation, M.B.A. and C.C.; writing—review and editing, C.C.; visualization, M.B.A. and C.C.; supervision, C.C.; project administration, C.C.; funding acquisition, M.B.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset is available at the following link: https://github.com/felixriese/hyperspectral-soilmoisture-dataset (accessed on 3 January 2019).

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

Symbol/Term | Definition | Unit/Note
Z | Original input dataset | -
Ξ_sel | Selected features vector | -
I_in | Input vector to the AGRL-RBFN model | -
z | Number of selected features | Integer
λ | Regularization parameter in AGRL | Hyperparameter
β_g | Coefficient vector for group g | -
G | Total number of feature groups | Integer
C_k | Centroid of cluster k | -
N_c | Number of clusters in AdaK-MCC | Integer
μ, σ | Mean and standard deviation | Depends on context
φ | Radial basis function (e.g., Gaussian) | Activation function
μ_i | Center of the i-th RBF neuron | -
γ | Width (spread) of the RBF | -
w_j | Weight associated with the j-th hidden neuron | -
b | Bias term in the output layer | -
h | Output of hidden layer neuron | -
ξ ~ U(0,1) | Random variable from uniform distribution | Uniform distribution
𝓛 | Loss function used during training | -
t | Time index in time-series | -
α | Learning rate | Hyperparameter
m | Number of spider wasps (population size in HCSWO) | Integer
δ, γ | Day and year of an observation (in seasonal mapping) | Integer
FWFC | Frequency Weighted Fourier Coefficient | For varying pattern analysis
SGF | Savitzky–Golay Filter | Used for noise reduction
SM | Soil Moisture | % volumetric
NDVI | Normalized Difference Vegetation Index | Remote sensing vegetation index
SMI | Soil Moisture Index | Derived index
FWFCSD | Frequency Weighted Fourier Coefficient-based Seasonal Decomposition | Seasonal analysis method
AdaK-MCC | Adaptive K-Means Clustering with Momentum Coefficients | Gap filling & seasonal clustering
HCSWO | Hierarchical Correlated Spider Wasp Optimizer | Feature selection algorithm
AGRL-RBFN | Adaptive Group Lasso Regularized Radial Basis Function Network | Proposed prediction model
IL | Incremental Learning | Online model update mechanism
MAE | Mean Absolute Error | Evaluation metric
RMSE | Root Mean Squared Error | Evaluation metric
F1-Score | Harmonic mean of precision and recall | Evaluation metric
CI | Confidence Interval | Statistical reliability indicator

References

1. Sekaran, U.; Mehmandoostkotlar, A.; Kumar, S. Soil Hydrology in a Changing Climate: Soil health and soil water. In Soil Hydrology in a Changing Climate; Blanco, H., Kumar, S., Anderson, S.H., Eds.; CSIRO: Clayton, Australia, 2023.
2. Pal, J.; Bodhe, H. Design and Implementation of Deep Learning Model For Soil Moisture Analysis An IoT Based Soil Moisture Monitoring on Losant Platform. Int. J. Res. Eng. Sci. 2023, 11, 509–514.
3. Abbes, A.B.; Jarray, N.; Farah, I.R. Advances in remote sensing-based soil moisture retrieval: Applications, techniques, scales and challenges for combining machine learning and physical models. Artif. Intell. Rev. 2024, 57, 224.
4. Yang, Y.; Li, H.; Sun, M.; Liu, X.; Cao, L. A Study on Hyperspectral Soil Moisture Content Prediction by Incorporating a Hybrid Neural Network into Stacking Ensemble Learning. Agronomy 2024, 14, 2054.
5. Wang, Y.; Zhao, J.; Guo, Z.; Yang, H.; Li, N. Soil Moisture Inversion Based on Data Augmentation Method Using Multi-Source Remote Sensing Data. Remote Sens. 2023, 15, 1899.
6. Lee, S.J.; Choi, C.; Kim, J.; Choi, M.; Cho, J.; Lee, Y. Estimation of High-Resolution Soil Moisture in Canadian Croplands Using Deep Neural Network with Sentinel-1 and Sentinel-2 Images. Remote Sens. 2023, 15, 4063.
7. Li, M.; Yan, Y. Comparative Analysis of Machine-Learning Models for Soil Moisture Estimation Using High-Resolution Remote-Sensing Data. Land 2024, 13, 1331.
8. Uthayakumar, A.; Mohan, M.P.; Khoo, E.H.; Jimeno, J.; Siyal, M.Y.; Karim, M.F. Machine Learning Models for Enhanced Estimation of Soil Moisture Using Wideband Radar Sensor. Sensors 2022, 22, 5810.
9. Liu, Q.; Gu, X.; Chen, X.; Mumtaz, F.; Liu, Y.; Wang, C.; Yu, T.; Zhang, Y.; Wang, D.; Zhan, Y. Soil Moisture Content Retrieval from Remote Sensing Data by Artificial Neural Network Based on Sample Optimization. Sensors 2022, 22, 1611.
10. Shokati, H.; Masha, M.; Noroozi, A.; Abkar, A.A. Random Forest-Based Soil Moisture Estimation Using Sentinel-2, Landsat-8/9, and UAV-Based Hyperspectral Data. Remote Sens. 2024, 16, 1962.
11. Peng, Y.; Yang, Z.; Zhang, Z.; Huang, J. A Machine Learning-Based High-Resolution Soil Moisture Mapping and Spatial–Temporal Analysis: The mlhrsm Package. Agronomy 2024, 14, 421.
12. Win, K.; Sato, T.; Tsuyuki, S. Application of Multi-Source Remote Sensing Data and Machine Learning for Surface Soil Moisture Mapping in Temperate Forests of Central Japan. Information 2024, 15, 485.
13. Liu, J.; Xu, Y.; Li, H.; Guo, J. Soil moisture retrieval in farmland areas with sentinel multi-source data based on regression convolutional neural networks. Sensors 2021, 21, 877.
14. Hegazi, E.H.; Samak, A.A.; Yang, L.; Huang, R.; Huang, J. Prediction of Soil Moisture Content from Sentinel-2 Images Using Convolutional Neural Network (CNN). Agronomy 2023, 13, 656.
15. Babu, C.V.S.; Yadavamuthiah, K. Soil quality prediction using deep learning. In Sustainable Development in AI, Blockchain, and E-Governance Applications; IGI Global Scientific Publishing: Hershey, PA, USA, 2024; pp. 171–188.
16. Patrizi, G.; Bartolini, A.; Ciani, L.; Gallo, V.; Sommella, P.; Carratu, M. A Virtual Soil Moisture Sensor for Smart Farming Using Deep Learning. IEEE Trans. Instrum. Meas. 2022, 71, 2515411.
17. Roberts, T.M.; Colwell, I.; Chew, C.; Lowe, S.; Shah, R. A Deep-Learning Approach to Soil Moisture Estimation with GNSS-R. Remote Sens. 2022, 14, 3299.
18. Singh, T.; Kundroo, M.; Kim, T. WSN-Driven Advances in Soil Moisture Estimation: A Machine Learning Approach. Electronics 2024, 13, 1590.
19. Dabboor, M.; Atteia, G.; Meshoul, S.; Alayed, W. Deep Learning-Based Framework for Soil Moisture Content Retrieval of Bare Soil from Satellite Data. Remote Sens. 2023, 15, 1916.
20. Nijaguna, G.S.; Manjunath, D.R.; Abouhawwash, M.; Askar, S.S.; Basha, D.K.; Sengupta, J. Deep Learning-Based Improved WCM Technique for Soil Moisture Retrieval with Satellite Images. Remote Sens. 2023, 15, 2005.
21. Liu, Q.; Wu, Z.; Cui, N.; Jin, X.; Zhu, S.; Jiang, S.; Zhao, L.; Gong, D. Estimation of Soil Moisture Using Multi-Source Remote Sensing and Machine Learning Algorithms in Farming Land of Northern China. Remote Sens. 2023, 15, 4214.
22. Adab, H.; Morbidelli, R.; Saltalippi, C.; Moradian, M.; Ghalhari, G.A.F. Machine learning to estimate surface soil moisture from remote sensing data. Water 2020, 12, 3223.
23. Celik, M.F.; Isik, M.S.; Yuzugullu, O.; Fajraoui, N.; Erten, E. Soil Moisture Prediction from Remote Sensing Images Coupled with Climate, Soil Texture and Topography via Deep Learning. Remote Sens. 2022, 14, 5584.
24. Wang, Y.; Shi, L.; Hu, Y.; Hu, X.; Song, W.; Wang, L. A comprehensive study of deep learning for soil moisture prediction. Hydrol. Earth Syst. Sci. 2024, 28, 917–943.
25. Liu, Y.; Yang, Y.; Song, J. Variations in global soil moisture during the past decades: Climate or human causes? Water Resour. Res. 2023, 59, e2023WR034915.
26. Segura-Barrero, R.; Lauvaux, T.; Lian, J.; Ciais, P.; Badia, A.; Ventura, S.; Villalba, G. Heat and drought events alter biogenic capacity to balance CO2 budget in southwestern Europe. Glob. Biogeochem. Cycles 2025, 39, e2024GB008163.
27. Sutanto, S.J.; Paparrizos, S.; Kranjac-Berisavljevic, G.; Jamaldeen, B.M.; Issahaku, A.K.; Gandaa, B.Z.; Supit, I.; van Slobbe, E. The role of soil moisture information in developing robust climate services for smallholder farmers: Evidence from Ghana. Agronomy 2022, 12, 541.
28. Boonprong, S.; Cao, C.; Chen, W.; Ni, X.; Xu, M.; Acharya, B.K. The classification of noise-afflicted remotely sensed data using three machine-learning techniques: Effect of different levels and types of noise on accuracy. ISPRS Int. J. Geo-Inf. 2018, 7, 274.
29. Breidenbach, J.; Ellison, D.; Petersson, H.; Korhonen, K.T.; Henttonen, H.M.; Wallerman, J.; Fridman, J.; Gobakken, T.; Astrup, R.; Næsset, E. Harvested area did not increase abruptly—How advancements in satellite-based mapping led to erroneous conclusions. Ann. For. Sci. 2022, 79, 2.
30. Huang, J.; Gómez-Dans, J.L.; Huang, H.; Ma, H.; Wu, Q.; Lewis, P.E.; Liang, S.; Chen, Z.; Xue, J.-H.; Wu, Y.; et al. Assimilation of remote sensing into crop growth models: Current status and perspectives. Agric. For. Meteorol. 2019, 276–277, 107609.
31. Wiseman, G.; McNairn, H.; Homayouni, S.; Shang, J. RADARSAT-2 polarimetric SAR response to crop biomass for agricultural production monitoring. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 4461–4471.
32. Whitcraft, A.K.; Vermote, E.F.; Becker-Reshef, I.; Justice, C.O. Cloud cover throughout the agricultural growing season: Impacts on passive optical earth observations. Remote Sens. Environ. 2015, 156, 438–447.
33. Manfreda, S.; McCabe, M.F.; Fiorentino, M.; Rodríguez-Iturbe, I.; Wood, E.F. Scaling characteristics of spatial patterns of soil moisture from distributed modelling. Adv. Water Resour. 2007, 30, 2145–2150.
34. Gebler, S.; Franssen, H.J.H.; Kollet, S.J.; Qu, W.; Vereecken, H. High-resolution modelling of soil moisture patterns with TerrSysMP: A comparison with sensor network data. J. Hydrol. 2017, 547, 309–331.
35. Zarlenga, A.; Fiori, A.; Russo, D. Spatial variability of soil moisture and the scale issue: A geostatistical approach. Water Resour. Res. 2018, 54, 1765–1780.
36. Svetlitchnyi, A.A. Soil erosion induced degradation of agro landscapes in Ukraine: Modeling, computation and prediction in conditions of the climate changes. In Regional Aspects of Climate-Terrestrial-Hydrologic Interactions in Non-Boreal Eastern Europe; Springer: Dordrecht, The Netherlands, 2009; pp. 191–199.
37. Sanchez-Mejia, Z.M.; Papuga, S.A. Empirical modeling of planetary boundary layer dynamics under multiple precipitation scenarios using a two-layer soil moisture approach: An example from a semiarid shrubland. Water Resour. Res. 2017, 53, 8807–8824.
38. Qian, N. On the momentum term in gradient descent learning algorithms. Neural Netw. 1999, 12, 145–151.
39. Bottou, L. Large-scale machine learning with stochastic gradient descent. In Proceedings of the COMPSTAT’2010: 19th International Conference on Computational Statistics, Paris, France, 22–27 August 2010; Lechevallier, Y., Saporta, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 177–186.
40. Jain, A.K. Data clustering: 50 years beyond K-means. Pattern Recognit. Lett. 2010, 31, 651–666.
41. Savitzky, A.; Golay, M.J.E. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 1964, 36, 1627–1639.
42. Vermote, E.F.; Tanré, D.; Deuzé, J.L.; Herman, M.; Morcrette, J.J. Second Simulation of the Satellite Signal in the Solar Spectrum (6S): 6S User Guide; NASA Goddard Space Flight Center: Greenbelt, MD, USA, 1997.
43. Li, Q.; Zhu, Y.; Shangguan, W.; Wang, X.; Li, L.; Yu, F. An attention-aware LSTM model for soil moisture and soil temperature prediction. Geoderma 2022, 409, 1–17.
44. Wilks, D.S. Statistical Methods in the Atmospheric Sciences, 3rd ed.; Academic Press: San Diego, CA, USA, 2011.
45. Oppenheim, A.V.; Schafer, R.W. Discrete-Time Signal Processing, 3rd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 2009.
46. Jolliffe, I.T. Principal Component Analysis, 2nd ed.; Springer: New York, NY, USA, 2002.
47. Filipović, N.; Brdar, S.; Mimić, G.; Marko, O.; Crnojević, V. Regional soil moisture prediction system based on Long Short-Term Memory network. Biosyst. Eng. 2022, 213, 30–38.
48. Murtagh, F.; Legendre, P. Ward’s hierarchical agglomerative clustering method: Which algorithms implement Ward’s criterion? J. Classif. 2014, 31, 274–295.
49. Yuan, M.; Lin, Y. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2006, 68, 49–67.
50. Park, J.; Sandberg, I.W. Universal approximation using radial-basis-function networks. Neural Comput. 1991, 3, 246–257.
51. Kumar, P.; Udayakumar, A.; Anbarasa Kumar, A.; Senthamarai Kannan, K.; Krishnan, N. Multiparameter optimization system with DCNN in precision agriculture for advanced irrigation planning and scheduling based on soil moisture estimation. Environ. Monit. Assess. 2023, 195, 1–26.
52. Li, Q.; Wang, Z.; Shangguan, W.; Li, L.; Yao, Y.; Yu, F. Improved daily SMAP satellite soil moisture prediction over China using deep learning model with transfer learning. J. Hydrol. 2021, 600, 1–14.
53. Jia, Y.; Jin, S.; Chen, H.; Yan, Q.; Savi, P.; Jin, Y.; Yuan, Y. Temporal-Spatial Soil Moisture Estimation from CYGNSS Using Machine Learning Regression with a Preclassification Approach. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 4879–4893.
Figure 1. Structural diagram of soil moisture prediction.
Figure 2. Structural diagram of proposed AGRL-RBFN classifier.
Figure 3. Graphical analysis of gap filling in terms of MAE.
Figure 4. Clustering time analysis.
Figure 5. Pictorial representation of varying pattern analysis.
Figure 6. Performance analysis of soil moisture prediction.
Figure 7. Graphical representation of AGRL-RBFN with existing techniques.
Figure 8. Time-series comparison of predicted vs. observed soil moisture.
Table 1. Tabular analysis of silhouette score.
Techniques | Silhouette Score
Proposed AdaK-MCC | 0.975
KMC | 0.953
FCM | 0.929
KNN | 0.858
K-Medoid | 0.813
Table 2. Evaluation of clustering methodologies.
Techniques | Dunn Index | Mean | Standard Deviation | 95% Confidence Interval
Proposed AdaK-MCC | 4.897 | 2.942 | 1.214 | [1.876, 4.008]
KMC | 2.984 | 2.984 | 0.015 | [2.969, 2.999]
FCM | 2.541 | 2.541 | 0.016 | [2.524, 2.558]
KNN | 1.924 | 1.924 | 0.014 | [1.912, 1.936]
K-Medoid | 1.368 | 1.368 | 0.01 | [1.358, 1.378]
Table 3. Comparative performance of the proposed AGRL-RBFN model and existing methods for soil moisture prediction.
| Metric | Proposed Model (AGRL-RBFN) | Existing Methods |
|---|---|---|
| Gap Filling MAE | 0.047 (CI: [0.043, 0.051]), SD: 0.002 | KMC: 0.099 (CI: [0.080, 0.118]), FCM: 0.157 (CI: [0.145, 0.169]), KNN: 0.386 (CI: [0.370, 0.402]), K-Medoid: 0.548 (CI: [0.532, 0.564]) |
| Silhouette Score (Gap Filling) | 0.975 (SD: 0.019) | KMC: 0.953 (SD: 0.019), FCM: 0.929 (SD: 0.022), KNN: 0.858 (SD: 0.034), K-Medoid: 0.813 (SD: 0.042) |
| Clustering Time | 2118 ms | KMC: N/A, FCM: N/A, KNN: N/A, K-Medoid: N/A |
| Dunn Index | 4.897 (SD: 0.125) | KMC: 2.984 (SD: 0.015), FCM: 2.541 (SD: 0.016), KNN: 1.924 (SD: 0.014), K-Medoid: 1.368 (SD: 0.01) |
| Varying Pattern Cross-Correlation Coefficient | 0.9715 (CI: [0.965, 0.978]) | FSD: 0.8930 (CI: [0.880, 0.906]), DFT: 0.8930 (CI: [0.880, 0.906]), FFT: 0.8930 (CI: [0.880, 0.906]), STFT: 0.8930 (CI: [0.880, 0.906]) |
| Varying Pattern Reconstruction Error | 0.0589 (SD: 0.004) | FSD: 0.1483 (SD: 0.003), DFT: 0.1483 (SD: 0.003), FFT: 0.1483 (SD: 0.003), STFT: 0.1483 (SD: 0.003) |
| Feature Selection Time | 2118 ms (SD: 10 ms) | SWO: 2211 ms (SD: 12 ms), ACO: 5178 ms (SD: 25 ms), GWO: 7135 ms (SD: 35 ms), CSO: 9052 ms (SD: 40 ms) |
| Soil Moisture Prediction Accuracy | 98.09% (CI: [97.90%, 98.28%]) | RBFN: 96.27% (CI: [95.95%, 96.59%]), FFNN: 95.23% (CI: [94.80%, 95.66%]), RNN: 97.24% (CI: [96.80%, 97.68%]), ANN: 95.23% (CI: [94.80%, 95.66%]) |
| Precision | 98.17% (CI: [97.97%, 98.37%]) | RBFN: 97.55% (CI: [97.30%, 97.80%]), FFNN: 96.12% (CI: [95.90%, 96.34%]), RNN: 97.05% (CI: [96.81%, 97.29%]), ANN: 96.12% (CI: [95.90%, 96.34%]) |
| Recall | 97.24% (CI: [96.85%, 97.63%]) | RBFN: 95.34% (CI: [95.10%, 95.58%]), FFNN: 93.45% (CI: [93.20%, 93.70%]), RNN: 94.26% (CI: [94.00%, 94.52%]), ANN: 93.45% (CI: [93.20%, 93.70%]) |
| F1-Score | 98.95% (CI: [98.75%, 99.15%]) | RBFN: 96.27% (CI: [95.95%, 96.59%]), FFNN: 95.23% (CI: [94.80%, 95.66%]), RNN: 97.24% (CI: [96.80%, 97.68%]), ANN: 95.23% (CI: [94.80%, 95.66%]) |
| False Positive Rate (FPR) | 0.0248 (SD: 0.002) | RBFN: 0.481 (SD: 0.002), FFNN: 0.596 (SD: 0.003), RNN: 0.723 (SD: 0.004), ANN: 0.876 (SD: 0.005) |
| False Negative Rate (FNR) | 0.076 (SD: 0.003) | RBFN: 0.108 (SD: 0.003), FFNN: 0.257 (SD: 0.004), RNN: 0.429 (SD: 0.005), ANN: 0.571 (SD: 0.006) |
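Table 3 gives the gap-filling MAE together with an SD and a 95% confidence interval. It does not state the number of repeated runs or the interval formula. The sketch below assumes a common protocol. First, values that are actually observed are hidden, imputed, and scored only at those hidden positions. Second, per-run results are summarized with a normal-approximation interval. The function names, the mask convention, and the use of a z critical value are assumptions, not the authors' documented procedure.

```python
import math
from statistics import NormalDist, mean, stdev

def masked_mae(true_values, filled_values, mask):
    """MAE over positions where mask is True, i.e. entries that were hidden
    before gap filling and later imputed; untouched entries are ignored."""
    errors = [abs(t - f) for t, f, m in zip(true_values, filled_values, mask) if m]
    if not errors:
        raise ValueError("mask selects no positions")
    return mean(errors)

def mean_sd_ci(run_values, level=0.95):
    """Mean, sample SD, and a normal-approximation confidence interval for the
    mean of repeated runs (requires at least two runs)."""
    m = mean(run_values)
    sd = stdev(run_values)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    half_width = z * sd / math.sqrt(len(run_values))
    return m, sd, (m - half_width, m + half_width)
```

For example, `masked_mae([21.0, 22.5, 23.1], [21.0, 22.1, 23.1], [False, True, False])` scores only the middle entry and returns roughly 0.4. With only a handful of runs, a Student-t critical value (for instance from `scipy.stats.t.ppf`) gives a wider and more appropriate interval than the z value used here.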
Table 4. Execution time analysis of FWFCSD with existing techniques.
| Techniques | Execution Time (ms) |
|---|---|
| Proposed FWFCSD | 18,564 |
| FSD | 24,796 |
| DFT | 28,357 |
| FFT | 32,497 |
| STFT | 39,641 |
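Table 4 compares FWFCSD with Fourier-type baselines (FSD, DFT, FFT, STFT). Table 3 scores the varying-pattern analysis by a cross-correlation coefficient and a reconstruction error. FWFCSD itself is not defined in this section, so the sketch below illustrates only the DFT baseline idea. It keeps the strongest frequency components of a seasonal series, reconstructs the series, and scores the result with the zero-lag normalized cross-correlation (Pearson r). RMSE is assumed as the reconstruction error, because the source does not name the metric here. All function names and the synthetic signal are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform, O(n^2), written for readability."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(coeffs):
    """Inverse DFT; the real part is exact for conjugate-symmetric input."""
    n = len(coeffs)
    return [(sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def reconstruct_dominant(x, keep):
    """Keep the mean (k = 0) plus the `keep` strongest frequencies, zero the rest, invert."""
    n = len(x)
    coeffs = dft(x)
    ranked = sorted(range(1, n // 2 + 1), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = {0}
    for k in ranked[:keep]:
        kept.add(k)
        kept.add((n - k) % n)  # conjugate partner, so the reconstruction stays real
    return idft([c if k in kept else 0j for k, c in enumerate(coeffs)])

def pearson(a, b):
    """Zero-lag normalized cross-correlation (Pearson r) of two equal-length series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def rmse(a, b):
    """Root-mean-square reconstruction error."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
```

On a 48-sample series made of a 12-step seasonal sine plus a weaker higher-frequency term, keeping one frequency recovers the seasonal part. The removed term then appears entirely as reconstruction error. An FFT such as `numpy.fft.fft` yields the same coefficients in O(n log n) time, so this naive loop is meant only to make the transform explicit.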
Table 5. Feature selection time analysis.
| Techniques | Feature Selection Time (ms) |
|---|---|
| Proposed HCSWO | 2118 |
| SWO | 2211 |
| ACO | 5178 |
| GWO | 7135 |
| CSO | 9052 |
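Table 5, together with the SDs in Table 3, reports feature selection time in milliseconds. The measurement protocol (hardware, number of repeats, warm-up) is not described in this section. The sketch below shows one minimal way to obtain a mean ± SD runtime, assuming wall-clock timing with `time.perf_counter` over repeated runs. The selector `select_top_variance` is a hypothetical stand-in, a simple variance filter. It is not HCSWO, SWO, ACO, GWO, or CSO.

```python
import time
from statistics import mean, pvariance, stdev

def time_ms(func, *args, repeats=10):
    """Run func(*args) `repeats` times; return the mean and sample SD of
    wall-clock time in milliseconds."""
    if repeats < 2:
        raise ValueError("need at least two repeats for a standard deviation")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples), stdev(samples)

def select_top_variance(rows, k):
    """Hypothetical stand-in selector: indices of the k highest-variance columns."""
    columns = list(zip(*rows))
    variances = [pvariance(col) for col in columns]
    return sorted(range(len(columns)), key=lambda j: variances[j], reverse=True)[:k]
```

Calling `time_ms(select_top_variance, rows, 2, repeats=5)` returns a `(mean_ms, sd_ms)` pair. Absolute timings depend on hardware and data size, so runtimes like those in Table 5 are comparable only when every method runs on the same machine and the same inputs.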
Table 6. Tabular analysis of soil moisture prediction in terms of FPR and FNR.
| Techniques | FPR | FNR |
|---|---|---|
| Proposed AGRL-RBFN | 0.0248 | 0.076 |
| RBFN | 0.481 | 0.108 |
| FFNN | 0.596 | 0.257 |
| RNN | 0.723 | 0.429 |
| ANN | 0.876 | 0.571 |
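Tables 3 and 6 report precision, recall, F1, FPR, and FNR. This section does not say how the continuous soil-moisture target was discretized into classes. The sketch below therefore assumes a binary (or one-vs-rest) confusion matrix and shows the standard definitions. Two identities are useful for checking reported values. F1 is the harmonic mean of precision and recall, so it always lies between them. FNR equals 1 − recall. The function names and counts are illustrative.

```python
def _ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")

def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification rates from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # true positive rate (sensitivity)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "fpr": _ratio(fp, fp + tn),  # false positive rate = 1 - specificity
        "fnr": _ratio(fn, fn + tp),  # false negative rate = 1 - recall
    }

def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

For example, `binary_metrics(tp=90, fp=10, tn=80, fn=20)` gives an accuracy of 0.85, a precision of 0.9, and an FPR of 10/90. Applying `f1_from` to the precision (0.9817) and recall (0.9724) listed for AGRL-RBFN in Table 3 gives approximately 0.977.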
Table 7. Comparative analysis of proposed with existing works.
| Metric | AGRL-RBFN (Proposed) | LSTM [49] | DCNN [50] | ANN [22] | RBFN [9] | RF [10] | KNN [8] | SVM [7] | Attention-Aware LSTM [51] | DL with TL [52] | ML with Pre-classification [53] |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 98.09% | 74.36% | 98.05% | 95.23% | 96.27% | ~96% | ~95% | ~96% | 97.06% | 95.23% | 98% |
| Precision | 98.17% | - | - | 96.12% | 97.55% | ~96% | ~95% | 96.50% | - | - | - |
| Recall | 97.24% | - | - | 93.45% | 95.34% | ~94% | ~93% | 94% | - | - | - |
| F1-Score | 98.95% | - | - | 95.23% | 96.27% | ~95% | ~94% | 95% | - | - | - |
| Gap Filling MAE | 0.047 | - | - | - | 0.099 | ~0.1 | ~0.4 | ~0.15 | - | - | - |
| Clustering Dunn Index | 4.897 | - | - | - | 2.984 | ~2.9 | ~2.5 | ~3.0 | - | - | - |
| Clustering Time | 2118 ms | - | - | - | ~2000 ms | ~2000 ms | ~2500 ms | ~3000 ms | - | - | - |
| Computational Complexity | Low | High | High | Medium | Medium | High | Medium | Medium to High | High | Medium to High | - |
| Seasonal Adaptability | Excellent | Limited | Limited | Moderate | Low | Moderate | Low | Low | Limited | Less effective | Limited |
| Climatic Adaptability | Excellent | Struggles | Limited | Moderate | Low | Low | Low | Low | Limited | Limited | Limited |
| Gap Filling Technique | AdaK-MCC | - | - | - | KMC | KNN | K-Medoid | SVM | - | - | - |
| Feature Selection Method | HCSWO | - | - | - | RF | Random Forest | KNN | SVM | - | Pre-classification | - |
| Normalization Strategy | Per-feature normalization (via PCA) | Min-Max or Z-score normalization | Min-Max or Z-score normalization | Min-Max or Z-score normalization | Per-feature normalization | Min-Max or Z-score normalization | Min-Max normalization | Z-score or Min-Max normalization | Min-Max or Z-score normalization | Not specified (likely Z-score) | Pre-classification may use normalized features |
| Application Suitability | Precision Agriculture, Climate Resilience | Irrigation, Water Conservation | Water Management | Climate Adaptive Agriculture | Limited tasks | Precision Agriculture | Simple Classification | Soil moisture & temperature prediction | Soil moisture in diverse regions | Land-specific soil moisture modeling | - |
| Limitations | Crop-specific moisture not addressed | High computational demands | High cost, limited spatiotemporal prediction | Complex, non-seasonal data | Limited adaptability | Large dataset challenges | Slow execution | Sensitive to small datasets | No spatiotemporal prediction | Less effective in winter | Depends on classification accuracy |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cherubini, C.; Bala Anand, M. A Novel Deep Learning-Based Soil Moisture Prediction Model Using Adaptive Group Radial Lasso Regularized Basis Function Networks (AGRL-RBFN) Optimized by Hierarchical Correlated Spider Wasp Optimizer (HCSWO) and Incremental Learning (IL). Water 2025, 17, 2379. https://doi.org/10.3390/w17162379

AMA Style

Cherubini C, Bala Anand M. A Novel Deep Learning-Based Soil Moisture Prediction Model Using Adaptive Group Radial Lasso Regularized Basis Function Networks (AGRL-RBFN) Optimized by Hierarchical Correlated Spider Wasp Optimizer (HCSWO) and Incremental Learning (IL). Water. 2025; 17(16):2379. https://doi.org/10.3390/w17162379

Chicago/Turabian Style

Cherubini, Claudia, and Muthu Bala Anand. 2025. "A Novel Deep Learning-Based Soil Moisture Prediction Model Using Adaptive Group Radial Lasso Regularized Basis Function Networks (AGRL-RBFN) Optimized by Hierarchical Correlated Spider Wasp Optimizer (HCSWO) and Incremental Learning (IL)" Water 17, no. 16: 2379. https://doi.org/10.3390/w17162379

APA Style

Cherubini, C., & Bala Anand, M. (2025). A Novel Deep Learning-Based Soil Moisture Prediction Model Using Adaptive Group Radial Lasso Regularized Basis Function Networks (AGRL-RBFN) Optimized by Hierarchical Correlated Spider Wasp Optimizer (HCSWO) and Incremental Learning (IL). Water, 17(16), 2379. https://doi.org/10.3390/w17162379

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
