Article

NDMI-Derived Field-Scale Soil Moisture Prediction Using ERA5 and LSTM for Precision Agriculture

by Elham Koohikeradeh 1, Silvio Jose Gumiere 1 and Hossein Bonakdari 2,*
1 Department of Soils and Agri-Food Engineering, Laval University, Quebec, QC G1V 0A6, Canada
2 Department of Civil Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(6), 2399; https://doi.org/10.3390/su17062399
Submission received: 12 February 2025 / Revised: 6 March 2025 / Accepted: 7 March 2025 / Published: 9 March 2025

Abstract

Accurate soil moisture prediction is fundamental to precision agriculture, facilitating optimal irrigation scheduling, efficient water resource allocation, and enhanced crop productivity. This study employs a Long Short-Term Memory (LSTM) deep learning model, integrated with high-resolution ERA5 remote sensing data, to improve soil moisture estimation at the field scale. Soil moisture dynamics were analyzed across six commercial potato production sites in Quebec—Goulet, DBolduc, PBolduc, BNiquet, Lalancette, and Gou-new—over a five-year period. The model exhibited high predictive accuracy, with correlation coefficients (R) ranging from 0.991 to 0.998 and Nash–Sutcliffe efficiency (NSE) values reaching 0.996, indicating strong agreement between observed and predicted soil moisture variability. The Willmott index (WI) exceeded 0.995, reinforcing the model’s reliability. The integration of NDMI assessments further validated the predictions, demonstrating a strong correlation between NDMI values and LSTM-based soil moisture estimates. These findings confirm the effectiveness of deep learning in capturing spatiotemporal variations in soil moisture, underscoring the potential of AI-driven models for real-time soil moisture monitoring and irrigation optimization. This research study provides a scientifically robust framework for enhancing data-driven agricultural water management, promoting sustainable irrigation practices, and improving resilience to soil moisture variability in agricultural systems.

1. Introduction

Recent advancements in remote sensing (RS) and deep learning (DL) have transformed agricultural water management by enabling more accurate soil moisture predictions, which are essential to precision irrigation [1,2]. Despite the increasing availability of high-resolution remote sensing datasets, estimating soil moisture with high accuracy remains challenging due to spatiotemporal variability and the complex interactions among soil properties, vegetation, and atmospheric conditions [3].
Irrigation in Quebec, Canada, began in the late 1940s as a strategy to stabilize soil moisture for tobacco (Nicotiana tabacum L.) during dry spells [4]. The practice was later extended to other crops, such as corn, soybean, vegetables, potato (Solanum tuberosum L.), and horticultural crops [5]. The agricultural regions of Quebec receive ample annual rainfall, averaging 940 mm, with nearly two-thirds falling during the growing season [6]. While this is generally sufficient for crop growth, short drought events from May through August often disrupt soil moisture levels, particularly in the shallow root zones of crops such as vegetables, resulting in significant yield reductions [7]. According to Gallichand et al. [7], irrigation requirements for potatoes on loamy sand are 5–15, 15–30, 25–35, and 10–25 mm/week for May, June, July, and August, respectively. These fluctuations, together with the soil moisture variations introduced by irrigation itself, highlight the need for advanced predictive models that can accurately capture soil moisture dynamics. By leveraging deep learning techniques and remote sensing data, researchers can enhance soil moisture prediction, ultimately improving water use efficiency, irrigation management, and the sustainability of agricultural practices.
Numerous studies have explored deep learning models for soil moisture prediction, including CAE-DLSTM, SoMo.ml, DNNR, and PSO-LSTM [8,9,10,11]. While these models have demonstrated promising results, they often struggle in regions with irregular soil moisture variations, highlighting the need for hybrid modeling approaches or localized fine tuning to improve predictive accuracy. Additionally, their high computational demands make them less suitable for real-time monitoring in resource-constrained environments [12]. For instance, Huang et al. [13] developed a Conv-LSTM model for spatiotemporal soil moisture prediction. However, this approach did not fully exploit spatiotemporal dependencies, and it overlooked irrigation as a key human-induced factor affecting soil moisture dynamics.
Integrating deep learning models with remote sensing data has emerged as a promising strategy to enhance the ability of predictive models to handle spatiotemporal variability. Pan et al. [14] demonstrated that combining LSTM-based models with remote sensing datasets, such as SMAP and ERA5, improved spatial heterogeneity representation and enhanced irrigation precision. Additionally, Ling et al. [15] found that ERA5 reanalysis data effectively captured soil moisture variations across different temporal and spatial scales, making it a reliable dataset for soil moisture prediction.
In remote sensing, among spectral bands, shortwave infrared (SWIR) reflectance is particularly sensitive to changes in soil moisture, as drier conditions increase the fraction of absorbed radiative energy [16]. Similarly, near-infrared (NIR) reflectance is responsive to variations in soil water content [17].
The Normalized Difference Moisture Index (NDMI) is one of the lesser-studied vegetation indices in remote sensing for soil moisture estimation [18]. Recent studies have demonstrated its utility in predicting soil moisture content [19]. For example, Safi et al. [20] used NDMI values to assess water stress levels in crops such as irrigated wheat, potatoes, and grapes, integrating it into broader methodologies for evaluating plant moisture deficits.
Despite these advancements, existing studies have primarily used the NDMI or deep learning models separately for soil moisture estimation, without adequately considering the role of irrigation in influencing soil moisture dynamics. Given the reliance on irrigation to stabilize soil moisture levels in Quebec’s agricultural regions, integrating these factors into predictive models is crucial to improving estimation accuracy and informing better water management strategies. This research study aims to bridge this gap by integrating NDMI-derived moisture assessments with an LSTM-based framework, offering an improved approach to soil moisture prediction in precision agriculture. Specifically, this study seeks to address the challenges of localizing deep learning models and reducing computational demands for soil moisture prediction. To achieve this, we develop a Long Short-Term Memory (LSTM)-based framework that integrates ERA5 remote sensing data to enhance model scalability and adaptability across diverse agricultural landscapes. This approach leverages ERA5 reanalysis data to capture temporal dynamics and soil moisture variability better, ensuring robust, high-resolution predictions.
The specific objectives of this research are as follows:
  • Develop and implement an LSTM-based model for high-resolution soil moisture prediction, incorporating both temporal and spatial variability.
  • Integrate ERA5 remote sensing data to enhance the accuracy and generalizability of soil moisture predictions across heterogeneous agricultural environments.
  • Demonstrate the applicability of the model for precision irrigation, offering insights into optimal water allocation strategies to improve sustainable agricultural water management.
This study hypothesizes that integrating NDMI-derived soil moisture assessments with an LSTM-based framework will enhance soil moisture prediction accuracy by addressing the current gap in research, where the NDMI and deep learning models have been predominantly used in isolation. By leveraging ERA5 remote sensing data, this study further hypothesizes that the generalizability and spatial accuracy of deep learning-based soil moisture predictions will be significantly improved across heterogeneous agricultural environments. Given the ability of LSTM to capture sequential dependencies in time-series data, it is expected that the LSTM model will outperform traditional modeling approaches to capturing spatiotemporal soil moisture dynamics. However, the model’s performance is hypothesized to be influenced by NDMI values and local environmental conditions, where higher NDMI values (indicating sufficient soil moisture) are expected to yield better predictive performance, while drought-prone areas may exhibit higher errors. Lastly, this study posits that LSTM-based soil moisture prediction can support precision irrigation strategies by providing actionable, data-driven insights into optimal water allocation, ultimately reducing water waste and improving crop yields. By testing these hypotheses, this research study aims to contribute to advancing AI-driven soil moisture monitoring and its applications in sustainable agricultural water management.
By integrating deep learning techniques with high-resolution remote sensing data, this study contributes to ongoing research in precision irrigation by addressing the specific challenges of soil moisture variability in Quebec’s agricultural regions. Given the historical reliance on irrigation to mitigate drought effects on crops such as potatoes, this research study provides a data-driven approach to optimizing water use and improving irrigation scheduling in response to local conditions. The proposed LSTM-based framework overcomes limitations of conventional modeling approaches, providing a scalable and adaptable solution for soil moisture estimation. Beyond its scientific contributions, this research study has practical implications, including optimized irrigation scheduling, reduced water wastage, and enhanced agricultural resilience to drought. The findings can inform data-driven decision making for farmers, policymakers, and agricultural stakeholders, fostering the adoption of intelligent irrigation strategies that enhance crop productivity while ensuring efficient water use.

2. Materials and Methods

2.1. Materials

2.1.1. Study Area

In this study, agricultural parcels in Quebec, Canada, are analyzed in relation to Quebec’s agricultural drought monitoring system. The study area includes six agricultural parcels dedicated to potato cultivation, located in the cities of Dolbeau and Peribonka: Goulet, Daniel Bolduc (DB), Patrice Bolduc (PB), Bergeron et Niquet (BN), Marc-André Lalancette (LAL), and a newly identified parcel, Gou-new (Figure 1). The geographical boundaries of the study area are defined as follows: west = −72.23, east = −71.99, north = 48.82, and south = 48.74. These parcels serve as key locations for monitoring soil moisture conditions and assessing drought impact on agricultural productivity.

2.1.2. Normalized Difference Moisture Index (NDMI)

In this section, static data are utilized to assess drought conditions across the studied parcels by monitoring soil moisture levels. This analysis underscores the importance of agricultural drought monitoring within parcels. The Normalized Difference Moisture Index (NDMI) is computed from a Sentinel-2 L2A image. Python-based scripts for importing images and indices were executed within a Google Colaboratory notebook environment. The images were subsequently visualized via ArcGIS Pro (version 3.3.0).
The NDMI is a key indicator for assessing the water content of vegetation and monitoring drought conditions. Because it is derived from reflectance values, the NDMI is dimensionless; it ranges from −1 to 1 and may also be expressed as a percentage. Values approaching −1 indicate dry soil, values near zero (−0.2 to 0.4) typically reflect water stress, and high positive values (approximately 0.4 to 1) are associated with dense, healthy vegetation with no water stress.
The NDMI combines information from the NIR and SWIR bands to assess vegetation moisture. While NIR signals reflect leaf structure and dry matter, SWIR signals are sensitive to water content and mesophyll structure. This combination minimizes interference, improving the accuracy of vegetation water content estimation. SWIR reflectance, which is negatively correlated with leaf water content, is a key indicator for estimating the moisture levels of vegetation [21] (Equation (1)).
$$\mathrm{NDMI} = \frac{\mathrm{NIR} - \mathrm{SWIR}}{\mathrm{NIR} + \mathrm{SWIR}}, \tag{1}$$
where NIR represents the near-infrared band value from aerial images and SWIR denotes the shortwave infrared band value from the same image [22].
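As a minimal sketch of Equation (1), the snippet below computes the NDMI from two reflectance arrays with NumPy and attaches a coarse label based on the three ranges quoted above. The function names, the sample reflectances, and the treatment of values below −0.2 as "dry" are illustrative assumptions and are not taken from the authors' scripts.

```python
import numpy as np

def ndmi(nir, swir):
    """Return (NIR - SWIR) / (NIR + SWIR); NaN where the denominator is zero."""
    nir = np.asarray(nir, dtype=float)
    swir = np.asarray(swir, dtype=float)
    denom = nir + swir
    out = np.full(denom.shape, np.nan)
    np.divide(nir - swir, denom, out=out, where=denom != 0)
    return out

def moisture_class(value):
    """Coarse label from the ranges quoted in the text (assumed cut-offs)."""
    if np.isnan(value):
        return "no data"
    if value < -0.2:
        return "dry"
    if value < 0.4:
        return "water stress"
    return "no water stress"

# Hypothetical reflectances for three pixels (not real Sentinel-2 values)
nir = np.array([0.30, 0.45, 0.10])
swir = np.array([0.30, 0.15, 0.40])

values = ndmi(nir, swir)
labels = [moisture_class(v) for v in values]
```

Guarding the division keeps pixels with zero total reflectance (for example, masked or no-data cells) from raising warnings or producing infinities.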
The classification range of the NDMI is delineated into the following categories: very dry soil, dry soil, slightly dry soil, moderate soil moisture, adequate soil moisture, moist soil, and high soil moisture. The categories were defined specifically for the parcels studied here, inspired by the Copernicus Browser and the work in [18]. Parcel filtering according to the NDMI, as depicted in Figure 2, reveals that most of the areas are susceptible to soil dryness and low moisture levels at the pixel scale.

2.1.3. Remote Sensing-Based Datasets

Remote sensing moisture indices can be used for irrigation scheduling to monitor water stress spatiotemporally. Given the strengths and weaknesses of each dataset source, merging satellites and modeled products can reduce uncertainty and improve irrigation planning and scheduling. All datasets used in the current study were obtained from model-based products from the land component of the Fifth Generation European Reanalysis (ERA5-Land) dataset. ERA5-Land serves as an alternative way to monitor moisture status over broad spatial extents by providing volumetric soil moisture (VSM) estimates at various depths and time scales [23].
ERA5 and its land-specific component, ERA5-Land, are advanced global reanalysis datasets developed by the ECMWF [24]. While ERA5-Land mainly provides land data at a 9 km spatial scale (ERA5 provides hourly SM measurements at a spatial resolution of 0.08°), the dataset also includes 50 variables related to global land water and energy cycles at the hourly scale. This model-based product encompasses the VSM extracted at different soil depths and a schematic of the soil depths (Figure 2).
The soil depth divisions in ERA5 are based on the Hydrology Tiled ECMWF Scheme for Surface Exchanges over Land (HTESSEL), which parametrizes land surface processes, including soil moisture and temperature dynamics. The model defines three standard layers: 0–7 cm (surface layer), 7–28 cm (subsurface layer), and 28–100 cm (root-zone layer). These divisions are established to represent vertical soil moisture and thermal gradients, considering infiltration, evapotranspiration, and root water uptake [25].
The surface layer (0–7 cm) accounts for rapid interactions with atmospheric fluxes, responding dynamically to precipitation and evaporation, thus playing a critical role in short-term hydrological processes and surface energy balance. The subsurface layer (7–28 cm) acts as an intermediate zone with moderated sensitivity to meteorological forcing, influencing plant water availability and root uptake processes. The root-zone layer (28–100 cm) represents a deeper storage reservoir, characterized by a slower response to atmospheric inputs and controlling longer-term soil moisture trends. Additionally, ERA5-Land extends this framework with a fourth layer (100–289 cm), improving the representation of deep soil moisture storage.
These depth classifications are constrained by numerical modeling requirements while ensuring compatibility with land surface hydrological processes. The layered structure enables the partitioning of water fluxes and energy exchanges, optimizing soil moisture assimilation and retrieval in Earth system models [26].
The ERA5-Land hourly dataset was used to assess hourly trends over five years (2019–2023) for the months of June, July, August, and September (June and September: days 01–30; July and August: days 01–31) at 1 h intervals (00:00–23:00). The data were downloaded in NetCDF-3 (Network Common Data Form) format, which is widely used for storing and processing climate and geospatial datasets. The geographical data were clipped via subregion extraction (west = −72.23, east = −71.99, north = 48.82, and south = 48.74). As we focus on soil moisture, the only variable extracted from this dataset was VSM (v, cubic meters per cubic meter; hereafter m3 m−3) in layer 1 (0–7 cm).
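The subregion extraction and the size of the resulting hourly record can be sketched as follows. This is an illustration on a synthetic array, assuming a regular 0.1° latitude/longitude grid; the grid, the random values, and the function name clip_to_box are hypothetical and do not reproduce the authors' download or clipping workflow.

```python
import numpy as np

# Bounding box quoted in the text
WEST, EAST, SOUTH, NORTH = -72.23, -71.99, 48.74, 48.82

def clip_to_box(field, lats, lons, west=WEST, east=EAST, south=SOUTH, north=NORTH):
    """Subset field[time, lat, lon] to cells whose centres fall inside the box."""
    lat_mask = (lats >= south) & (lats <= north)
    lon_mask = (lons >= west) & (lons <= east)
    subset = field[:, lat_mask, :][:, :, lon_mask]
    return subset, lats[lat_mask], lons[lon_mask]

# Synthetic stand-in for one day of hourly layer-1 VSM on an assumed 0.1 degree grid
lats = np.linspace(49.0, 48.5, 6)     # 49.0, 48.9, ..., 48.5
lons = np.linspace(-72.5, -71.8, 8)   # -72.5, -72.4, ..., -71.8
rng = np.random.default_rng(0)
vsm = rng.uniform(0.20, 0.35, size=(24, lats.size, lons.size))

vsm_box, lats_box, lons_box = clip_to_box(vsm, lats, lons)

# Record length implied by the sampling described in the text
hours_per_season = (30 + 31 + 31 + 30) * 24   # June to September, hourly
hours_total = hours_per_season * 5            # 2019 to 2023
```

With this assumed grid only a handful of cell centres fall inside the box, which is worth keeping in mind when values are later attributed to individual parcels.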
The aggregated hourly VSM data captured from ERA5 were analyzed over five years, from 2019 to 2023, across the different parcels (Figure 3). The pair plot reveals strong linear correlations between most pairs of parcels, and the range of changes in the VSM content for each parcel was calculated based on the correlation coefficient, which can reflect small changes in the maximum and minimum VSM values.
The trends in the VSM data from ERA5 were closely monitored across the different parcels, and these data were analyzed via histogram plots (Figure 4). The analysis revealed that the data for the Goulet, PBolduc, Lalancette, and DBolduc plots exhibit approximately normal distributions with slight skewness. Notably, the Goulet and DBolduc plots display the highest variability, evident from the broader bases of their distributions. On the other hand, the histograms for the BNiquet and Gou-new parcels exhibit sharp peaks and are skewed; specifically, the BNiquet distribution displays a left skew, whereas the Gou-new distribution exhibits a right skew. These patterns suggest variations in water retention or drainage characteristics among these regions.
Interestingly, the peak VSM values ranged from approximately 0.25 to 0.3 m3/m3 in most parcels, a range that could indicate optimal soil moisture levels for potato cultivation under typical conditions. However, the skewness observed in some plots indicates periods with notably higher or lower soil moisture contents, which must be managed through careful irrigation or drainage practices. The variability and skewness in the soil moisture data suggest the potential benefit of real-time soil moisture monitoring, allowing for dynamic adjustments to irrigation schedules. This is particularly important for parcels such as BNiquet and Gou-new, where the skewness indicates occasional deviations from the norm, highlighting the necessity for more precise water management strategies in these areas.

2.2. Methods

2.2.1. Methodology

In this study, the ERA5 reanalysis model was employed to monitor the soil moisture status based on VSM to support irrigation scheduling. ERA5-Land provides global reanalysis data with hourly meteorological estimates at a 9 × 9 km spatial resolution [24]. The dataset includes soil moisture values (m3 m−3) for three soil layers: the upper soil layer (0–7 cm, referred to as “layer 1” or L1), the 7–28 cm depth (L2), and the 28–100 cm depth (L3). The ERA5-Land dataset integrates satellite observations with atmospheric forcing data [23,27].
In this study, VSM (m3 m−3) values were extracted from the topsoil layer (0–7 cm, or L1). To ensure consistent spatial coverage and facilitate product comparisons, all datasets were reprojected to the UTM coordinate system by using the WGS84 datum via ArcGIS Pro software (version 3.3.2) [28]. The images were further cropped to precisely align with the boundaries of the study area, maintaining the scope of the analysis to the designated region.
Image processing and analysis were conducted on the Google Colab platform [29], executed in Python (version 3.11.11) [30]. The VSM data were spatially confined to the study area, as defined by geographical boundaries: west = −72.23, east = −71.99, north = 48.82, and south = 48.74. Temporally, the analysis spanned five years (2019–2023), with a focus on June to September. Data were extracted at hourly intervals (00:00–23:00) for every day of June (1–30), July and August (1–31), and September (1–30).
This research study involves several key steps: First, the input data are defined both temporally (2019–2023) and spatially, using shapefiles representing all parcels. Satellite data from the ERA5 product are then imported into the environment by installing and utilizing the “geemap” package in Google Colab. Soil moisture values are extracted from the NetCDF-3 file by using the “Zonal Statistics” tool (version 0.20.0), a spatial analysis technique implemented in Python. This method computes summary statistics for raster data within specified zones defined by a vector dataset [31]. Specifically, the median statistic is applied at a 30 m scale, following the recommendation by Huang, Xia [32], for the six stations depicted in Figure 5. Finally, the ERA5-derived data are prepared for machine learning by splitting them into training (70%), validation (15%), and testing (remaining points) subsets, ready for application in the LSTM model.
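Two of the steps above, the zonal median and the 70/15/remainder split, can be illustrated with plain NumPy. The sketch assumes the parcel has already been rasterized to a boolean mask and that the split is chronological (the text does not state whether the data were shuffled); both function names are hypothetical and stand in for the tools the authors used.

```python
import numpy as np

def zonal_median(raster, zone_mask):
    """Median of the raster cells flagged True in zone_mask, ignoring NaNs."""
    return float(np.nanmedian(np.asarray(raster, dtype=float)[zone_mask]))

def chronological_split(series, train_frac=0.70, val_frac=0.15):
    """Split a time-ordered series into train, validation, and test blocks."""
    series = np.asarray(series)
    n = len(series)
    n_train = int(round(n * train_frac))
    n_val = int(round(n * val_frac))
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]          # whatever remains
    return train, val, test

# Toy raster and parcel mask (illustrative values only)
raster = np.array([[0.20, 0.22, 0.30],
                   [0.24, 0.26, 0.31]])
mask = np.array([[True, True, False],
                 [True, True, False]])
parcel_median = zonal_median(raster, mask)

# Series with the length of five June-to-September seasons of hourly data
series = np.arange(14640)
train, val, test = chronological_split(series)
```

Keeping the blocks in time order avoids leaking future values into the training set, which matters when consecutive hourly values are strongly autocorrelated.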

2.2.2. Long Short-Term Memory (LSTM) Networks

LSTM [33] is a specialized recurrent neural network (RNN) designed to handle long-term dependencies in sequential data. Unlike traditional feedforward networks, it uses feedback connections to retain information over long sequences [34]. While standard RNNs suffer from vanishing or exploding gradients [35,36,37], LSTM overcomes this issue by incorporating a memory cell and a set of gating mechanisms that control which information is retained and which is forgotten [38]. A typical LSTM unit consists of memory cells with three main gates: forget, input, and output gates (Figure 5). These gates regulate information flow, selectively retaining or discarding data to maintain efficient memory over time. Sigmoid and tanh activation functions help scale values and decide which piece of information to keep or discard.
LSTM’s effectiveness comes from its cell state and hidden state [38]. The cell state (Ct) acts as memory, retaining and selectively forgetting information over time [39], while the hidden state (ht) serves as the output, reflecting both historical and current inputs [38]. Together, they enable efficient sequence processing.
The forget gate controls how much of the previous cell state, Ct−1, is retained or discarded. It calculates the forget vector ft by using the hidden state from the previous time step, ht−1, and the current input, xt (Equation (2)):
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{2}$$
where σ is the sigmoid activation function, which normalizes the output to between 0 and 1, and Wf is the weight matrix for the forget gate. bf is the bias for the forget gate, and ht−1 and xt are concatenated and transformed, enabling the model to focus on specific parts of the input and the previous hidden state.
The input gate in LSTM updates the cell state by first selecting which values should be modified (Equation (3)), ensuring relevant new information is added:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{3}$$
It then generates new candidate values, denoted by C̃t, which are added to the cell state (Equation (4)).
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{4}$$
The output of the input gate, it, and the candidate cell state, C̃t, are combined to update the cell state. The new cell state, Ct, is established by combining forget and input gate information (Equation (5)):
$$C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t \tag{5}$$
This equation allows the cell to retain information from the previous state Ct−1 and update it with new information from C̃t, guided by the forget and input gates.
The output gate determines the hidden state, ht, which serves as the output for the current time step and influences the information passed to the next cell (Equation (6)):
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{6}$$
The final hidden state ht is computed by applying the output gate ot to the updated cell state Ct (Equation (7)):
$$h_t = o_t \cdot \tanh\left(C_t\right) \tag{7}$$
The hidden state ht is then output and used as the input for the next cell in the sequence.
In an LSTM network, the sigmoid and tanh functions enable information flow. The sigmoid function (Equation (8)) used in the forget, input, and output gates controls how much information passes through by scaling values between 0 and 1, enabling the selective forgetting, updating, and retention of relevant data. The tanh function (Equation (9)) scales state values between −1 and 1, allowing LSTM to handle a broad range of information. This balanced scaling helps capture complex data patterns and relationships, enhancing the network’s performance in modeling sequential data.
$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{8}$$
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{9}$$
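To make the gate equations concrete, the following NumPy sketch performs one LSTM time step exactly as written in Equations (2)–(7), with the sigmoid of Equation (8) and NumPy's tanh for Equation (9). The dictionary layout of the weights, the layer sizes, and the random initialization are assumptions chosen for illustration; the study itself relies on the Keras implementation.

```python
import numpy as np

def sigmoid(x):
    """Equation (8): squashes values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Equations (2)-(7)."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])         # forget gate, Eq. (2)
    i_t = sigmoid(W["i"] @ z + b["i"])         # input gate, Eq. (3)
    c_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate state, Eq. (4)
    c_t = f_t * c_prev + i_t * c_tilde         # cell state update, Eq. (5)
    o_t = sigmoid(W["o"] @ z + b["o"])         # output gate, Eq. (6)
    h_t = o_t * np.tanh(c_t)                   # hidden state, Eq. (7)
    return h_t, c_t

# Hypothetical sizes: 4 hidden units, 1 input feature (e.g., one VSM value)
n_hidden, n_input = 4, 1
rng = np.random.default_rng(42)
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_input)) for k in "fico"}
b = {k: np.zeros(n_hidden) for k in "fico"}

# Run the cell over a short illustrative sequence
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for value in [0.25, 0.27, 0.26, 0.30]:
    h, c = lstm_step(np.array([value]), h, c, W, b)
```

Because the output gate lies in (0, 1) and tanh lies in (−1, 1), every component of the hidden state stays strictly between −1 and 1 regardless of the input scale.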

2.2.3. LSTM-Based Modeling

Figure 6 shows a sequential neural network model with LSTM layers for regression tasks. Designed for time-series and sequential data, the model begins with a “sequence input layer”, which preserves the time-dependent nature of the data, ensuring effective sequence processing.
Following the input layer, the first “LSTM layer” captures initial temporal dependencies by using memory cells with gating mechanisms to retain relevant information from previous steps and discard irrelevant details. This helps the model learn both short- and long-term patterns in time-series data [37]. After the first LSTM layer, a ReLU (rectified linear unit) activation layer is applied, introducing nonlinearity by converting negative values to zero, preventing vanishing gradient, and enhancing the model’s ability to capture complex patterns [40].
A second LSTM layer is added to enhance the model’s ability to capture complex sequential dependencies. While the first LSTM layer captures basic temporal patterns, the second layer extracts more nuanced temporal relationships, enabling hierarchical learning and improving predictive accuracy for complex tasks. A second ReLU activation layer follows the second LSTM layer, adding nonlinearity and improving the model’s ability to fit complex patterns without gradient issues. Placing a ReLU after each LSTM layer enhances flexibility and performance in regression tasks. The output from the second ReLU layer is passed to a fully connected layer, transforming the learned features into suitable output for regression tasks. This layer aggregates information from previous layers, ensuring dense connections that enhance prediction quality.
A dropout layer follows the fully connected layer to prevent overfitting [41] by randomly deactivating neurons during training. This forces the network to rely on various pathways, enhancing generalization and robustness for real-world data variations. The architecture ends with a regression output layer, designed for continuous target variables. It enables precise predictions for tasks like time-series forecasting, ensuring the model is optimized for regression applications.
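The layer sequence described above could be expressed in Keras roughly as follows. This is a sketch under stated assumptions rather than the authors' configuration: the number of units, the dropout rate, the window length, and the mapping of the "regression output layer" onto a single linear unit trained with a mean-square-error loss are all hypothetical, while the Adam learning rate of 0.0001 and the RMSE metric follow the training description in the text.

```python
def build_lstm_regressor(timesteps, n_features, units=64, dense_units=32,
                         dropout_rate=0.2):
    """Sequence input -> LSTM -> ReLU -> LSTM -> ReLU -> dense -> dropout -> output."""
    # Imported lazily so that the module can be loaded without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(timesteps, n_features)),   # sequence input layer
        layers.LSTM(units, return_sequences=True),    # first LSTM keeps the sequence
        layers.ReLU(),
        layers.LSTM(units),                           # second LSTM returns last state
        layers.ReLU(),
        layers.Dense(dense_units),                    # fully connected layer
        layers.Dropout(dropout_rate),                 # active only during training
        layers.Dense(1),                              # single linear regression output
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-4),
        loss="mse",
        metrics=[keras.metrics.RootMeanSquaredError()],
    )
    return model
```

A call such as build_lstm_regressor(timesteps=24, n_features=1) would then accept windows of 24 hourly values; the first LSTM must return full sequences so that the second LSTM still receives a time dimension.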
The LSTM model is trained by using input and output pairs. The overall dataset is divided into three subsets: 60% of the data for training, 20% for validation, and 20% for testing. The training data are utilized during the training phase to adjust the network parameters. The validation data are used to monitor the training process and assess the occurrence of overfitting.
An LSTM experiment was conducted to determine the optimal parameters for the LSTM model, such as the number of blocks, the number of layers, and the dropout rate. All models were trained for 1000 epochs on the entire dataset by using TensorFlow and the Keras deep learning library in Python. We experimented with different optimizers to expedite training and ensure sufficient model performance and stability. Ultimately, the Adam optimizer [42,43] was chosen for its superior performance, with a learning rate of 0.0001. The mean square error was used as the basis of the loss function, and the root mean square error was selected as the metric for evaluating model accuracy. Adam, which stands for Adaptive Moment Estimation, combines the advantages of two other popular optimization methods: the AdaGrad algorithm (which works well with sparse gradients) and RMSProp (which performs well in online and non-stationary settings). It enhances computational efficiency and performance when working with large datasets and models with numerous parameters [44,45]. By adaptively adjusting the learning rate for each parameter, Adam helps stabilize the optimization process and reduces the need for additional regularization methods such as dropout.
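The per-parameter adaptation that Adam performs can be seen in a single update step. The sketch below uses the learning rate reported above together with the commonly used default decay rates (0.9 and 0.999) and epsilon (1e-8); those defaults and the toy parameter values are assumptions, since the text does not report them.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1.0 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2      # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                 # bias correction for early steps
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Two toy parameters whose gradients differ by a factor of 200
theta = np.array([1.0, -3.0])
grad = np.array([2.0, -0.01])
m = np.zeros_like(theta)
v = np.zeros_like(theta)

theta_new, m, v = adam_step(theta, grad, m, v, t=1)
step_sizes = np.abs(theta_new - theta)
```

On the first step both parameters move by approximately the learning rate despite the very different gradient magnitudes, which is the normalization effect that makes the step size largely independent of gradient scale.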

2.2.4. Validation Criteria and Metrics

Given the random variability inherent in hydrological factors, relying solely on one measure to assess a statistical model’s effectiveness is inadequate [46,47]. Therefore, in this study, both graphical representations and standardized statistical indices are applied to assess the reliability of the proposed LSTM model. To gauge its predictive ability, the root mean square error (RMSE), the mean absolute relative error (MARE), and the correlation coefficient (r) were selected as the primary statistical indicators.
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(SM_{i,o} - SM_{i,f}\right)^{2}} \tag{10}$$
$$\mathrm{MARE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|SM_{i,o} - SM_{i,f}\right|}{SM_{i,o}} \tag{11}$$
$$r = \frac{\sum_{i=1}^{N}\left(SM_{i,o} - \overline{SM}_{o}\right)\left(SM_{i,f} - \overline{SM}_{f}\right)}{\sqrt{\sum_{i=1}^{N}\left(SM_{i,o} - \overline{SM}_{o}\right)^{2}\,\sum_{i=1}^{N}\left(SM_{i,f} - \overline{SM}_{f}\right)^{2}}}, \quad -1 \le r \le 1 \tag{12}$$
Here, N represents the total number of data points within the testing phase (independent dataset), and SMi,o and SMi,f denote the ith values of the observed and forecasted time series, respectively.
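A minimal NumPy sketch of the three primary indicators is given below, using the definitions in the text (root of the mean squared error, mean of the absolute errors relative to the observations, and the Pearson correlation coefficient). The function names and the four sample values are illustrative assumptions, not results from the study.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error between observed and forecasted values."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def mare(obs, sim):
    """Mean absolute relative error; observations must be non-zero."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.mean(np.abs(obs - sim) / obs))

def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and forecasted values."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    d_obs, d_sim = obs - obs.mean(), sim - sim.mean()
    return float(np.sum(d_obs * d_sim) / np.sqrt(np.sum(d_obs ** 2) * np.sum(d_sim ** 2)))

# Hypothetical observed and forecasted VSM values (m3 m-3)
observed = [0.20, 0.25, 0.30, 0.35]
forecast = [0.22, 0.24, 0.31, 0.33]

scores = {
    "RMSE": rmse(observed, forecast),
    "MARE": mare(observed, forecast),
    "r": pearson_r(observed, forecast),
}
```

Since the MARE divides by the observed value, it is well defined here because volumetric soil moisture is strictly positive.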
The RMSE and the MARE are used to assess the fit of the model without considering the difference in direction between the forecasted and observed values. However, the RMSE is ideally suited for cases in which the forecasting error is normally distributed [48]. In practice, not all models meet this criterion; therefore, in this study, the MARE was employed, as it considers all relative deviations between the forecasted and observed data equally, regardless of the sign [49]. The correlation coefficient r, constrained within [−1, 1], quantifies the strength of the linear relationship between the observed and forecasted values (its square gives the proportion of variance explained), and it therefore presumes linearity. Additionally, r, RMSE, and MARE may lack sensitivity to outliers and additive or proportional discrepancies between model predictions and actual observations [50]. Relying solely on r could lead to misleading results, especially if there is a systematic bias in the forecasted values.
Additional insights into the accuracy of the LSTM model were obtained by calculating Nash–Sutcliffe efficiency (NSE), the refined Willmott index (WI), and the Legates and McCabe index (LMI) as standardized measures of model fit [51].
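$$
NSE = 1 - \frac{\sum_{i=1}^{N}\left(SM_{i,o} - SM_{i,f}\right)^{2}}{\sum_{i=1}^{N}\left(SM_{i,o} - \overline{SM}_{o}\right)^{2}}, \quad NSE \le 1 \tag{13}
$$

$$
WI = \begin{cases}
1 - \dfrac{\sum_{i=1}^{N}\left|SM_{i,f} - SM_{i,o}\right|}{C\sum_{i=1}^{N}\left|SM_{i,o} - \overline{SM}_{o}\right|}, & \text{when } \sum_{i=1}^{N}\left|SM_{i,f} - SM_{i,o}\right| \le C\sum_{i=1}^{N}\left|SM_{i,o} - \overline{SM}_{o}\right| \\[2ex]
\dfrac{C\sum_{i=1}^{N}\left|SM_{i,o} - \overline{SM}_{o}\right|}{\sum_{i=1}^{N}\left|SM_{i,f} - SM_{i,o}\right|} - 1, & \text{when } \sum_{i=1}^{N}\left|SM_{i,f} - SM_{i,o}\right| > C\sum_{i=1}^{N}\left|SM_{i,o} - \overline{SM}_{o}\right|
\end{cases}, \quad 0 \le WI \le 1 \tag{14}
$$

$$
LMI = 1 - \frac{\sum_{i=1}^{N}\left|SM_{i,o} - SM_{i,f}\right|}{\sum_{i=1}^{N}\left|SM_{i,o} - \overline{SM}_{o}\right|}, \quad 0 \le LMI \le 1 \tag{15}
$$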
In Equation (14), the parameter C is generally set to 2 [52]. The NSE is used to assess a model’s ability to predict data that diverge from the mean [51]; however, because it is built on squared differences, this measure is overly sensitive to large discrepancies between observed and predicted values [50]. To further evaluate the predictive capacity of the model, the WI was used alongside the RMSE, r, and NSE. In its original formulation, the WI is the ratio of the mean square error to the potential error, multiplied by the number of observations and subtracted from 1, and it was proposed to address the insensitivity of the NSE and correlation-based measures to differences between observed and predicted means and variances [53,54]; the refined form in Equation (14) is based on absolute differences instead [52]. The Legates and McCabe index (0 ≤ LMI ≤ 1.0) was also used because of its advantage over the WI, which can return high values even if the model fit is poor [54]. Because the LMI relies on absolute rather than squared differences, it applies a balanced weighting to model errors and reduces the influence of extreme values or outliers, which makes it particularly effective in scenarios where significant discrepancies are expected [50,53].
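The three goodness-of-fit indices can be sketched in the same way. The functions below are a minimal, standard-library illustration of the NSE, the refined WI with C = 2, and the LMI as defined above; the function names and sample values are hypothetical and do not reproduce the study's implementation or results.

```python
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def nse(obs: Sequence[float], fc: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    mean_o = _mean(obs)
    num = sum((o - f) ** 2 for o, f in zip(obs, fc))
    den = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - num / den


def refined_wi(obs: Sequence[float], fc: Sequence[float], c: float = 2.0) -> float:
    """Refined Willmott index with the two-branch definition of Equation (14)."""
    mean_o = _mean(obs)
    abs_err = sum(abs(f - o) for o, f in zip(obs, fc))
    scaled_dev = c * sum(abs(o - mean_o) for o in obs)
    if abs_err <= scaled_dev:
        return 1.0 - abs_err / scaled_dev
    return scaled_dev / abs_err - 1.0


def lmi(obs: Sequence[float], fc: Sequence[float]) -> float:
    """Legates and McCabe index based on absolute differences."""
    mean_o = _mean(obs)
    abs_err = sum(abs(o - f) for o, f in zip(obs, fc))
    abs_dev = sum(abs(o - mean_o) for o in obs)
    return 1.0 - abs_err / abs_dev


# Hypothetical volumetric soil moisture values (illustration only).
observed = [0.20, 0.25, 0.30, 0.35]
forecast = [0.21, 0.24, 0.31, 0.34]

print(nse(observed, forecast), refined_wi(observed, forecast), lmi(observed, forecast))
```

Keeping the observed mean and the absolute error sums as separate intermediate values makes it easy to see that the WI and the LMI share the same numerator and differ only in how the observed deviation is scaled.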

3. Results

3.1. LSTM Analysis Across Six Parcels

The performance of the LSTM model was evaluated across six agricultural parcels, Goulet, Gou-new, BNiquet, DBolduc, PBolduc, and Lalancette, by using scatter plots (Figure 7) and multiple statistical indices (Figure 8). Figure 7 demonstrates strong agreement between observed and predicted soil moisture values, with most data points closely clustered around the 1:1 line and falling within the ±5% deviation range. This pattern indicates that the model effectively captures soil moisture dynamics across all locations, with minimal prediction errors.
Quantitative metrics in Figure 8 further validate the model’s accuracy. The coefficient of determination (R²) ranged from 0.991 to 0.998, with the highest values observed for PBolduc (0.997) and BNiquet (0.998). Similarly, the Nash–Sutcliffe efficiency (NSE) values were highest for BNiquet (0.996), PBolduc (0.994), and DBolduc (0.994), highlighting robust performance in these parcels. The Willmott index (WI) values exceeded 0.995 for all parcels, peaking at 0.999 for DBolduc, PBolduc, and BNiquet, indicating near-perfect agreement between predictions and observations.
Despite the overall high performance, some variability was observed. Lalancette and BNiquet exhibited the highest mean absolute relative error (MARE) and root mean square error (RMSE), reflecting relatively higher prediction errors in these parcels. The lower Legates and McCabe index (LMI) values for Lalancette (0.927) and Goulet (0.932) suggest that unique environmental or soil characteristics could influence model accuracy in these areas.
Figure 9 further explores the model’s error dynamics, showing that more than 90% of predictions for all parcels have a relative absolute error (RAE) of less than 1%. Goulet, Gou-new, and PBolduc demonstrated the steepest cumulative frequency curves, indicating the highest predictive accuracy, while Lalancette and BNiquet exhibited more gradual curves, reflecting larger error margins.
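A cumulative frequency curve of this kind can be sketched as follows, assuming that the RAE of each prediction is the absolute error divided by the observed value, expressed in percent. That definition, the function names, the thresholds, and the sample values are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def relative_absolute_errors(obs: Sequence[float], fc: Sequence[float]) -> List[float]:
    """Per-sample relative absolute error in percent (assumed definition)."""
    return [abs(o - f) / abs(o) * 100.0 for o, f in zip(obs, fc)]


def fraction_below(errors: Sequence[float], threshold: float) -> float:
    """Share of samples whose error is strictly below the threshold."""
    return sum(e < threshold for e in errors) / len(errors)


def cumulative_frequency(
    errors: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float]]:
    """Pairs of (threshold, cumulative share) describing the error curve."""
    return [(t, fraction_below(errors, t)) for t in thresholds]


# Hypothetical values: errors of roughly 0.1%, 0.33%, 0.5%, and 2%.
observed = [0.30, 0.30, 0.30, 0.30]
forecast = [0.3003, 0.299, 0.3015, 0.306]

errors = relative_absolute_errors(observed, forecast)
print(cumulative_frequency(errors, [0.25, 1.0, 5.0]))
```

A curve that reaches a share close to 1 at small thresholds corresponds to the steep curves described for the best-performing parcels.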
Finally, Figure 10 compares the discrepancy ratios across parcels. Goulet, Gou-new, and BNiquet showed the closest agreement to the ideal ratio of 1.00, while Lalancette displayed greater variability, particularly at higher soil moisture levels. These findings highlight the need for targeted improvements, such as incorporating localized soil and meteorological data, to enhance model accuracy in parcels with higher variability.
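The discrepancy ratio is not defined explicitly in this section. The sketch below assumes it is the forecasted value divided by the observed value, so that a perfect prediction gives 1.00, and it also reports the share of ratios inside a tolerance band around 1. The function names, the 5% tolerance, and the sample values are hypothetical.

```python
from typing import List, Sequence


def discrepancy_ratios(obs: Sequence[float], fc: Sequence[float]) -> List[float]:
    """Forecast-to-observation ratio per sample (assumed definition)."""
    return [f / o for o, f in zip(obs, fc)]


def share_within(ratios: Sequence[float], tolerance: float = 0.05) -> float:
    """Share of ratios inside the band [1 - tolerance, 1 + tolerance]."""
    low, high = 1.0 - tolerance, 1.0 + tolerance
    return sum(low <= r <= high for r in ratios) / len(ratios)


# Hypothetical volumetric soil moisture values (illustration only).
observed = [0.20, 0.25, 0.30, 0.40]
forecast = [0.202, 0.245, 0.33, 0.40]

ratios = discrepancy_ratios(observed, forecast)
print(ratios, share_within(ratios))
```

Plotting the ratios against the observed values would show whether departures from 1.00 grow at higher soil moisture levels, as reported for Lalancette.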
In summary, the LSTM model demonstrates robust performance across all parcels, with Goulet, Gou-new, and PBolduc showing the best results based on high accuracy and low error rates. Conversely, Lalancette and BNiquet exhibit slightly higher errors, suggesting that localized soil and environmental factors influence model performance. These results indicate that while LSTM effectively captures soil moisture dynamics across the study area, its predictive accuracy varies depending on site-specific conditions.

3.2. Correlation Between LSTM Model Performance and NDMI-Based VSM Assessment

The performance of the LSTM model aligns closely with the VSM conditions revealed by the NDMI, providing a comprehensive understanding of the dynamics across the six parcels (Figure 11). PBolduc and Goulet demonstrated the best results in both analyses. Their high LSTM predictive accuracy is supported by the NDMI maps, which show extensive regions of higher NDMI values (green tones), indicating adequate soil moisture and healthy vegetation. The histograms for these parcels further confirm this, with values concentrated near the higher end of the water stress range (−0.1 to 0.4). These findings suggest that effective irrigation or favorable environmental conditions have contributed to consistent moisture levels, enabling the LSTM model to perform reliably.
In contrast, Gou-new and DBolduc displayed moderate performance in both LSTM- and NDMI-based evaluations. The NDMI maps (Figure 11) and histograms (Figure 12) for these parcels reveal a mix of dry and moist areas, reflecting spatial variability in soil moisture, likely caused by differences in soil drainage or retention capacity. This variability presents challenges for maintaining consistent soil moisture conditions and impacts the LSTM model’s predictive reliability. Targeted interventions, such as localized irrigation strategies, could help address these issues and enhance both soil moisture balance and model performance.
The most drought-prone parcels, Lalancette and BNiquet, exhibited the lowest NDMI values and the highest LSTM prediction errors. The NDMI maps show extensive areas of low values (orange and yellow tones), while histograms indicate concentrations near the lower range (−0.2 to −0.1), reflecting severe water stress. The LSTM model’s higher errors for these parcels, as evidenced by elevated MARE and RMSE values, highlight the difficulty of accurately predicting soil moisture under such extreme drought conditions. Lalancette shows significant variability in both NDMI values and LSTM performance, suggesting that unique environmental factors, such as poor soil retention or inconsistent irrigation, exacerbate these challenges.
The correlation between the NDMI and LSTM results underscores the utility of the NDMI in diagnosing soil moisture variability and guiding model refinements. Higher NDMI values correspond to better LSTM performance, while parcels with a low NDMI require targeted water management strategies to mitigate drought stress and improve model reliability. These findings suggest that integrating NDMI-based insights with localized soil and meteorological data can enhance both moisture assessments and predictive modeling, ensuring sustainable water management and precision agriculture practices.
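For readers who want to reproduce this kind of NDMI summary, the following minimal sketch computes the index per pixel with the standard normalized difference of near-infrared and shortwave infrared reflectance, and then reports the share of pixels falling in the two value ranges quoted above. The function names and reflectance values are hypothetical, and the study's own processing chain may differ.

```python
from typing import List, Optional, Sequence, Tuple


def ndmi(nir: float, swir: float) -> Optional[float]:
    """Normalized Difference Moisture Index: (NIR - SWIR) / (NIR + SWIR)."""
    denom = nir + swir
    if denom == 0:
        return None  # undefined for pixels with no signal in either band
    return (nir - swir) / denom


def share_in_range(values: Sequence[Optional[float]], low: float, high: float) -> float:
    """Share of valid pixels with low <= NDMI < high."""
    valid = [v for v in values if v is not None]
    return sum(low <= v < high for v in valid) / len(valid)


# Hypothetical (NIR, SWIR) surface reflectance pairs for four pixels.
pixels: List[Tuple[float, float]] = [(0.40, 0.20), (0.35, 0.25), (0.30, 0.40), (0.30, 0.36)]
values = [ndmi(nir, swir) for nir, swir in pixels]

print(values)
print(share_in_range(values, -0.2, -0.1))  # range associated with severe water stress
print(share_in_range(values, -0.1, 0.4))   # range reported for the remaining parcels
```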

4. Discussion

4.1. Predictive Performance of LSTM Model

The results of this study demonstrate the effectiveness of LSTM in predicting soil moisture dynamics, particularly in well-irrigated agricultural parcels, where its predictive accuracy closely aligns with the soil moisture variability captured by the NDMI. The model achieved strong performance in PBolduc and Goulet, where higher NDMI values indicated adequate moisture conditions and healthy vegetation. These findings are consistent with those obtained by Han et al. [8] and Qi et al. [55], who reported that deep learning models, particularly CAEDLSTM, enhance soil moisture predictions by capturing temporal dependencies. While CAEDLSTM achieved a 5.01% increase in R² and a 12.89% reduction in RMSE compared with standard LSTM, its computational complexity limits real-time applications. In contrast, the NDMI-integrated LSTM approach presented in this study provides a computationally efficient alternative, demonstrating high predictive accuracy under stable moisture conditions.
Despite these promising results, moderate prediction variability was observed in Gou-new and DBolduc, where the NDMI maps showed a mix of dry and moist areas, suggesting that soil drainage capacity and retention influence soil moisture variability. These discrepancies are similar to the findings obtained by Cai et al. [10], who noted that deep learning models such as DNNR exhibited higher errors in regions with irregular soil moisture patterns. Their study emphasized the importance of multi-feature processing and regional calibration, which aligns with the need to refine LSTM predictions through additional environmental parameters, such as precipitation and evapotranspiration.
The largest prediction errors were observed in Lalancette and BNiquet, the most drought-prone parcels, where the NDMI values were the lowest. The LSTM model showed higher MARE and RMSE in these areas, likely due to extreme soil moisture variability and limited representation of such conditions in the training data. These results are consistent with those obtained by Pan et al. [14], who developed a Graph Convolutional LSTM (GCCL) model to address soil moisture heterogeneity. Their graph-based framework improved prediction accuracy in complex topographical regions but required high-resolution data and increased computational power, a limitation that LSTM mitigates by offering a more practical alternative for precision agriculture.

4.2. Role of NDMI in Improving Soil Moisture Prediction

The observed correlation between NDMI values and LSTM performance reinforces the importance of integrating remote sensing indices with deep learning for soil moisture estimation. Higher NDMI values consistently corresponded to lower prediction errors, while lower NDMI values were associated with greater discrepancies. These findings align with those obtained by Wu et al. [11], who found that PSO-LSTM, optimized with Sentinel-1 VV and VH polarization data, improved soil moisture prediction accuracy (R² = 0.96). However, their study was limited to citrus orchards, whereas this research study applies NDMI-derived soil moisture estimates across multiple agricultural parcels, broadening its applicability.
Several studies have suggested that LSTM models struggle with extreme soil moisture conditions, particularly under drought stress. Ling et al. [15] and Ojha et al. [56] emphasized that model-based soil moisture products, such as ERA5 and GLDAS, fail to capture irrigation effects due to the absence of direct irrigation signals. In contrast, our study incorporates the NDMI as an indirect measure of soil moisture, enhancing prediction accuracy in precision irrigation applications. Future improvements could integrate multi-sensor data fusion techniques—a recommendation also proposed by Ling et al. [15], who found that ESA CCI SM provided better soil moisture estimates but suffered from temporal variability issues.

4.3. Comparing LSTM with Hybrid and Physics-Based Models

Although the LSTM model demonstrated strong predictive capabilities, it has inherent limitations in capturing extreme moisture variability, like other deep learning models. Several studies have explored hybrid approaches to addressing these challenges. Park et al. [57] reported that RNN-LSTM models were highly effective in predicting soil moisture at shallow depths (10–20 cm), but performance declined at 30 cm, where internal soil properties rather than meteorological factors influenced soil moisture retention. Our findings mirror these results, as the predictive accuracy was higher in surface layers, where soil moisture is more dynamic.
Similarly, Huang et al. [13] demonstrated that Conv-LSTM models incorporating static and dynamic variables (land cover, soil type, elevation, etc.) improved prediction accuracy (R² = 0.92 compared with R² = 0.84 without static variables). While our study did not incorporate land surface characteristics as explicit input variables, the NDMI-based assessment provides indirect information on vegetation health and moisture levels, supporting the argument that integrating physical and remote sensing data enhances soil moisture prediction accuracy.
Another promising direction is the fusion of machine learning with physics-based hydrological models. Li et al. [58] highlighted that hybrid models (e.g., LSTM + CNN and LSTM + Random Forest) outperform standard deep learning models by improving spatial generalization. The observed discrepancies in this study suggest that integrating ensemble learning or graph-based models could refine predictions in regions with complex moisture dynamics.

4.4. Implications for Precision Agriculture and Future Research

While the NDMI-integrated LSTM model presents a promising tool for soil moisture prediction and irrigation management, several limitations must be addressed to enhance its applicability. One of the key challenges is the reliance on remote sensing data, such as ERA5-derived soil moisture estimates, which may have limited spatial resolution. Moreover, LSTM models require large datasets for training, making them less feasible for regions with sparse ground observations. Similar concerns were raised by Ling et al. [15], who suggested improving bias correction techniques for soil moisture estimation. Future research should focus on integrating higher-resolution meteorological and soil data to reduce uncertainty and develop adaptive models that can be dynamically fine-tuned for specific locations.
Incorporating multi-source remote sensing products (e.g., Sentinel-1, Sentinel-2, and MODIS) could further enhance spatial and temporal resolution, as suggested by Wu et al. [11] and Pan et al. [14].
Additionally, hybrid modeling techniques such as combining LSTM with physics-based hydrological models could improve interpretability and predictive robustness, ensuring better alignment with ground-truth soil moisture measurements.
Overall, this study confirms the potential of LSTM models for precision irrigation, particularly when integrated with NDMI-based assessments. By refining model calibration with additional environmental variables and improving training datasets, LSTM models can become a valuable tool for optimizing irrigation strategies and supporting sustainable agricultural practices under increasing climate variability.

5. Conclusions

This study confirms the effectiveness of LSTM models in predicting soil moisture across six agricultural parcels, with a strong alignment between LSTM predictions and NDMI-based soil moisture assessments. The model achieved high accuracy in PBolduc, Goulet, and Gou-new, where R² values exceeded 0.995, NSE values were above 0.994, and WI values peaked at 0.999, indicating near-perfect agreement with observed soil moisture.
Conversely, Lalancette and BNiquet exhibited the highest mean absolute relative error (MARE) and root mean square error (RMSE), indicating greater prediction uncertainty and highlighting challenges in predicting soil moisture under high variability. The lowest Legates and McCabe index (LMI) values were recorded for Lalancette (0.927) and Goulet (0.932).
Additionally, the correlation between NDMI values and LSTM performance revealed that higher NDMI values corresponded to better model accuracy, while lower NDMI values aligned with increased prediction errors, underscoring the importance of integrating remote sensing data in soil moisture assessment.
These findings suggest that while LSTM models are highly effective in capturing soil moisture dynamics, their accuracy is influenced by spatial heterogeneity and extreme drought conditions. The model’s predictive capability makes it a valuable tool for smart irrigation applications, as it can provide real-time soil moisture predictions that help optimize irrigation scheduling. By incorporating NDMI-based assessments and additional environmental variables (e.g., precipitation, evapotranspiration, and soil texture), the model can support data-driven decision making for precision irrigation, ensuring efficient water use and improved crop health.

Author Contributions

Conceptualization, E.K.; methodology, E.K. and S.J.G.; software, E.K.; validation, E.K.; formal analysis, E.K.; investigation, E.K.; resources, E.K. and S.J.G.; data curation, E.K.; writing—original draft preparation, E.K.; writing—review and editing, E.K., S.J.G. and H.B.; visualization, E.K.; supervision, S.J.G. and H.B.; project administration, S.J.G.; funding acquisition, S.J.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research study was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) under grant number CRDPJ 514551-17.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are unavailable due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
BN: Farm BN (study site)
DB: Farm DB (study site)
GOU: Farm GOU (study site)
LAL: Farm LAL (study site)
PB: Farm PB (study site)
SM: soil moisture
VSM: volumetric soil moisture
DL: deep learning
RS: remote sensing
ERA5: Fifth-Generation ECMWF Reanalysis Dataset
ECMWF: European Centre for Medium-Range Weather Forecasts
GLDAS: Global Land Data Assimilation System
LSTM: Long Short-Term Memory
MARE: mean absolute relative error
MODIS: Moderate-Resolution Imaging Spectroradiometer
NDMI: Normalized Difference Moisture Index
NDVI: Normalized Difference Vegetation Index
NIR: near-infrared
SWIR: shortwave infrared
NSE: Nash–Sutcliffe efficiency
WI: Willmott index
R: correlation coefficient
RAE: relative absolute error
RNN: recurrent neural network
RMSE: root mean square error

References

  1. Rasheed, M.W.; Tang, J.; Sarwar, A.; Shah, S.; Saddique, N.; Khan, M.U.; Khan, M.I.; Nawaz, S.; Shamshiri, R.R.; Aziz, M.; et al. Soil moisture measuring techniques and factors affecting the moisture dynamics: A comprehensive review. Sustainability 2022, 14, 11538.
  2. Celik, M.F.; Isik, M.S.; Yuzugullu, O.; Fajraoui, N.; Erten, E. Soil moisture prediction from remote sensing images coupled with climate, soil texture and topography via deep learning. Remote Sens. 2022, 14, 5584.
  3. Bwambale, E.; Abagale, F.K.; Anornu, G.K. Smart irrigation monitoring and control strategies for improving water use efficiency in precision agriculture: A review. Agric. Water Manag. 2022, 260, 107324.
  4. Shady, A.M. Irrigation Drainage and Flood Control in Canada; Canadian International Development Agency: Ottawa, ON, Canada, 1989.
  5. Tichoux, H. Model Comparison of Three Irrigation Systems for Potato Production in Quebec. Master’s Thesis, McGill University, Montréal, QC, Canada, 1999.
  6. Nand, V.; Qi, Z. Potential of implementing irrigation in rainfed agriculture in Quebec: A review of climate change-induced challenges and adaptation strategies. Irrig. Drain. 2023, 72, 1165–1187.
  7. Gallichand, J.; Broughton, R.S.; Boisvert, J.; Rochette, P. Simulation of irrigation requirements for major crops in South Western Quebec. Can. Agric. Eng. 1991, 33, 1–9.
  8. Han, J.; Hong, J.; Chen, X.; Wang, J.; Zhu, J.; Li, X.; Yan, Y.; Li, Q. Integrating Convolutional Attention and Encoder–Decoder Long Short-Term Memory for Enhanced Soil Moisture Prediction. Water 2024, 16, 3481.
  9. Orth, R. Global soil moisture data derived through machine learning trained with in-situ measurements. Sci. Data 2021, 8, 170.
  10. Cai, Y.; Zheng, W.; Zhang, X.; Zhangzhong, L.; Xue, X. Research on soil moisture prediction model based on deep learning. PLoS ONE 2019, 14, e0214508.
  11. Wu, Z.; Cui, N.; Zhang, W.; Liu, C.; Jin, X.; Gong, D.; Xing, L.; Zhao, L.; Wen, S.; Yang, Y. Estimating soil moisture content in citrus orchards using multi-temporal sentinel-1A data-based LSTM and PSO-LSTM models. J. Hydrol. 2024, 637, 131336.
  12. Tursun, A.; Xie, X.; Wang, Y.; Liu, Y.; Peng, D.; Rusuli, Y.; Zheng, B. Reconstruction of missing streamflow series in human-regulated catchments using a data integration LSTM model. J. Hydrol. Reg. Stud. 2024, 52, 101744.
  13. Huang, F.; Zhang, Y.; Zhang, Y.; Shangguan, W.; Li, Q.; Li, L.; Jiang, S. Interpreting Conv-LSTM for spatio-temporal soil moisture prediction in China. Agriculture 2023, 13, 971.
  14. Pan, Z.; Xu, L.; Chen, N. Combining graph neural network and convolutional LSTM network for multistep soil moisture spatiotemporal prediction. J. Hydrol. 2025, 651, 132572.
  15. Ling, X.; Huang, Y.; Guo, W.; Wang, Y.; Chen, C.; Qiu, B.; Ge, J.; Qin, K.; Xue, Y.; Peng, J. Comprehensive evaluation of satellite-based and reanalysis soil moisture products using in situ observations over China. Hydrol. Earth Syst. Sci. 2021, 25, 4209–4229.
  16. Lobell, D.B.; Asner, G.P. Moisture effects on soil reflectance. Soil Sci. Soc. Am. J. 2002, 66, 722–727.
  17. Hegazi, E.H.; Samak, A.A.; Yang, L.; Huang, R.; Huang, J. Prediction of soil moisture content from sentinel-2 images using convolutional neural network (CNN). Agronomy 2023, 13, 656.
  18. Lykhovyd, P.V.; Sharii, V.O. Normalised difference moisture index in water stress assessment of maize crops. Agrology 2024, 7, 21–26.
  19. Hosseini Chamani, F.; AFirouzi, F.; Amerykhah, H. Pedotransfer Function (PTF) for Estimation Soil moisture using NDVI, land surface temperature (LST) and normalized moisture (NDMI) indices. J. Water Soil Conserv. 2019, 26, 239–254.
  20. Safi, A.R.; Karimi, P.; Mul, M.; Chukalla, A.; de Fraiture, C. Translating open-source remote sensing data to crop water productivity improvement actions. Agric. Water Manag. 2022, 261, 107373.
  21. Gao, B.-C. NDWI—A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sens. Environ. 1996, 58, 257–266.
  22. Taloor, A.K.; Manhas, D.S.; Kothyari, G.C. Retrieval of land surface temperature, normalized difference moisture index, normalized difference water index of the Ravi basin using Landsat data. Appl. Comput. Geosci. 2021, 9, 100051.
  23. Muñoz-Sabater, J.; Dutra, E.; Agustí-Panareda, A.; Albergel, C.; Arduini, G.; Balsamo, G.; Boussetta, S.; Choulga, M.; Harrigan, S.; Hersbach, H.; et al. ERA5-Land: A state-of-the-art global reanalysis dataset for land applications. Earth Syst. Sci. Data 2021, 13, 4349–4383.
  24. Hersbach, H.; Bell, B.; Berrisford, P.; Hirahara, S.; Horanyi, A.; Munoz-Sabater, J.; Nicolas, J.; Peubey, C.; Radu, R.; Schepers, D.; et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 2020, 146, 1999–2049.
  25. Alaminie, A.A.; Annys, S.; Nyssen, J.; Jury, M.R.; Amarnath, G.; Mekonnen, M.A.; Tilahun, S.A. A comprehensive evaluation of satellite-based and reanalysis soil moisture products over the upper Blue Nile Basin, Ethiopia. Sci. Remote Sens. 2024, 10, 100173.
  26. Mihalevich, B.A.; Neilson, B.T.; Buahin, C.A. Evaluation of the ERA5-land reanalysis data set for process-based river temperature modeling over data sparse and topographically complex regions. Water Resour. Res. 2022, 58, e2021WR031294.
  27. Schönauer, M.; Ågren, A.M.; Katzensteiner, K.; Hartsch, F.; Arp, P.; Drollinger, S.; Jaeger, D. Soil moisture modeling with ERA5-Land retrievals, topographic indices, and in situ measurements and its use for predicting ruts. Hydrol. Earth Syst. Sci. 2024, 28, 2617–2633.
  28. Wilber, A.L.; Czarnecki, J.M.; McCurdy, J.D. An ArcGIS Pro workflow to extract vegetation indices from aerial imagery of small-plot turfgrass research. Crop Sci. 2022, 62, 503–511.
  29. Bisong, E. Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners; Apress: Berkeley, CA, USA, 2019; pp. 59–64.
  30. Yao, J.; Wu, J.; Xiao, C.; Zhang, Z.; Li, J. The classification method study of crops remote sensing with deep learning, machine learning, and Google Earth engine. Remote Sens. 2022, 14, 2758.
  31. Saldanha, R.; Akbarinia, R.; Pedroso, M.; Ribeiro, V.; Cardoso, C.; Pena, E.H.M.; Valduriez, P.; Porto, F. Zonal statistics datasets of climate indicators for Brazilian municipalities. Environ. Data Sci. 2024, 3, e2.
  32. Huang, F.; Xia, X.; Huang, Y.; Lv, S.; Chen, Q.; Pan, Y.; Zhu, X. Comparison of Winter Wheat Extraction Methods Based on Different Time Series of Vegetation Indices in the Northeastern Margin of the Qinghai–Tibet Plateau: A Case Study of Minhe, China. Remote Sens. 2022, 14, 343.
  33. Hochreiter, S. Long Short-Term Memory; Neural Computation MIT-Press: La Jolla, CA, USA, 1997.
  34. Gopnarayan, A.; Deshpande, S. Survey of prediction using recurrent neural network with long short-term memory. Int. J. Sci. Res. 2019, 8, 9–11.
  35. Fei, H.; Tan, F. Bidirectional grid long short-term memory (bigridlstm): A method to address context-sensitivity and vanishing gradient. Algorithms 2018, 11, 172.
  36. Yifan, Z.; Fengchen, Q.; Fei, X. GS-RNN: A novel RNN optimization method based on vanishing gradient mitigation for HRRP sequence estimation and recognition. In Proceedings of the 2020 IEEE 3rd International Conference on Electronics Technology (ICET), Chengdu, China, 8–12 May 2020.
  37. Ebtehaj, I.; Bonakdari, H. CNN vs. LSTM: A Comparative Study of Hourly Precipitation Intensity Prediction as a Key Factor in Flood Forecasting Frameworks. Atmosphere 2024, 15, 1082.
  38. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270.
  39. Okut, H. Deep learning for subtyping and prediction of diseases: Long-short term memory. In Deep Learning Applications; IntechOpen: London, UK, 2021.
  40. Chen, Z.; Ho, P.-H. Global-connected network with generalized ReLU activation. Pattern Recognit. 2019, 96, 106961.
  41. Gonzalez, J.; Yu, W. Non-linear system modeling using LSTM neural networks. IFAC-PapersOnLine 2018, 51, 485–489.
  42. Brownlee, J. Deep Learning for Time Series Forecasting: Predict the Future with MLPs, CNNs and LSTMs in Python; Machine Learning Mastery: San Juan, Puerto Rico, 2018.
  43. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
  44. Ogundokun, R.O.; Maskeliunas, R.; Misra, S.; Damaševičius, R. Improved CNN based on batch normalization and adam optimizer. In International Conference on Computational Science and Its Applications; Springer: Berlin/Heidelberg, Germany, 2022.
  45. Şen, S.Y.; Özkurt, N. Convolutional neural network hyperparameter tuning with adam optimizer for ECG classification. In Proceedings of the 2020 Innovations in Intelligent Systems and Applications Conference (ASYU), Istanbul, Turkey, 15–17 October 2020.
  46. Krause, P.; Boyle, D.P.; Bäse, F. Comparison of different efficiency criteria for hydrological model assessment. Adv. Geosci. 2005, 5, 89–97.
  47. Catherine, D. A Practical Guide to Research Methods, A User-Friendly Manual for Mastering Research Techniques and Projects; How to Books: Oxford, UK, 2007.
  48. Chai, T.; Draxler, R.R. Root mean square error (RMSE) or mean absolute error (MAE)?—Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 2014, 7, 1247–1250.
  49. Yaseen, Z.M.; Ebtehaj, I.; Bonakdari, H.; Deo, R.C.; Mehr, A.D.; Mohtar, W.H.M.W.; Diop, L.; El-Shafie, A.; Singh, V.P. Novel approach for streamflow forecasting using a hybrid ANFIS-FFA model. J. Hydrol. 2017, 554, 263–276.
  50. Legates, D.R.; McCabe, G.J., Jr. Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation. Water Resour. Res. 1999, 35, 233–241.
  51. Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol. 1970, 10, 282–290.
  52. Willmott, C.J.; Robeson, S.M.; Matsuura, K. A refined index of model performance. Int. J. Climatol. 2012, 32, 2088–2094.
  53. Willmott, C.J. On the validation of models. Phys. Geogr. 1981, 2, 184–194.
  54. Willmott, C.J. On the evaluation of model performance in physical geography. In Spatial Statistics and Models; Springer: Dordrecht, The Netherlands, 1984; pp. 443–460.
  55. Qi, J.; Zhang, X.; Mccarty, G.W.; Sadeghi, A.M.; Cosh, M.H.; Zeng, X.; Gao, F.; Daughtry, C.S.; Huang, C.; Lang, M.W.; et al. Assessing the performance of a physically-based soil moisture module integrated within the Soil and Water Assessment Tool. Environ. Model. Softw. 2018, 109, 329–341.
  56. Ojha, N.; Mahmoodi, A.; Mialon, A.; Richaume, P.; Ferrant, S.; Kerr, Y. Assessment of SMOS Root Zone Soil Moisture: A comparative study using SMAP, ERA5, and GLDAS. IEEE Access 2024, 12, 76121–76132.
  57. Park, S.-H.; Lee, B.-Y.; Kim, M.-J.; Sang, W.; Seo, M.C.; Baek, J.-K.; E Yang, J.; Mo, C. Development of a soil moisture prediction model based on recurrent neural network long short-term memory (RNN-LSTM) in soybean cultivation. Sensors 2023, 23, 1976.
  58. Li, Q.; Zhu, Y.; Shangguan, W.; Wang, X.; Li, L.; Yu, F. An attention-aware LSTM model for soil moisture and soil temperature prediction. Geoderma 2022, 409, 115651.
Figure 1. Multi-scale representation of the study area within Canada. The top-left inset highlights Canada with Quebec outlined in green. The central map zooms into Quebec, with the study area marked by a purple rectangle. The bottom-left inset details the study area, showing the agricultural parcels analyzed in this study: PB (farm PB), DB (farm DB), LAL (farm LAL), BN (farm BN), and GOU (farm GOU). North arrows indicate orientation, and coordinate labels provide spatial reference.
Figure 2. Representation of ERA5 soil layer depths used in climate and hydrological modeling. Soil is divided into three distinct layers: layer 1 (0–7 cm), layer 2 (7–28 cm), and layer 3 (28–100 cm). These depths correspond to soil moisture and temperature estimates derived from ERA5 reanalysis data. The illustration depicts surface vegetation, soil stratification, and the permafrost or bedrock layer below. The pink brackets indicate the depth ranges associated with each layer.
Figure 3. Pairwise scatter and distribution plots of ERA5-derived volumetric soil moisture (VSM) across different agricultural parcels. The scatter plots (yellow points) show the relationships between pairs of parcels, while the histograms (red) illustrate the frequency distribution of VSM within each parcel.
Figure 4. Distribution of ERA5-derived volumetric soil moisture (VSM) across different agricultural parcels.
Figure 5. Workflow for soil moisture time-series generation and prediction using LSTM-based model.
Figure 6. Layered architecture of LSTM-based neural network for time-series prediction.
Figure 7. Scatter plots comparing LSTM-predicted and actual soil moisture across different agricultural parcels. The solid line represents 1:1 agreement, and the dashed lines indicate ±5% deviation margins.
Figure 8. Performance evaluation of the LSTM model for soil moisture prediction across different agricultural parcels using multiple statistical metrics.
Figure 9. Cumulative frequency distribution of relative absolute error thresholds for LSTM soil moisture predictions across different agricultural parcels.
Figure 10. Scatter plots of discrepancy ratios between LSTM-predicted and actual soil moisture across different agricultural parcels.
Figure 11. Spatial distribution of NDMI across agricultural parcels, ranging from very dry to high moisture levels.
Figure 12. Distribution of NDMI-based soil moisture estimates across agricultural parcels.
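The agreement measures behind Figures 7–9 (deviation margins around the 1:1 line, Nash–Sutcliffe efficiency, and the Willmott index) can be illustrated with a minimal sketch. This is not the authors' code. The function names and the sample values are hypothetical, and the Willmott index is assumed here to take its original 1981 form, since the captions do not state which variant was used.

```python
# Minimal sketch of agreement metrics between observed and predicted
# soil moisture. All names and sample values are illustrative assumptions.

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread

def willmott_index(obs, pred):
    """Index of agreement d, assuming the original Willmott (1981) definition."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((p - o) ** 2 for o, p in zip(obs, pred))
    potential = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                    for o, p in zip(obs, pred))
    return 1.0 - sse / potential

def fraction_within(obs, pred, tol=0.05):
    """Share of predictions whose relative absolute error |p - o| / |o| is at most tol."""
    hits = sum(1 for o, p in zip(obs, pred) if abs(p - o) <= tol * abs(o))
    return hits / len(obs)

if __name__ == "__main__":
    # Hypothetical volumetric soil moisture values (m3/m3), not data from the study.
    observed = [0.30, 0.32, 0.35, 0.31]
    predicted = [0.30, 0.33, 0.34, 0.31]
    print("NSE:", nse(observed, predicted))
    print("WI:", willmott_index(observed, predicted))
    print("Within ±5%:", fraction_within(observed, predicted, tol=0.05))
```

Evaluating `fraction_within` over a range of `tol` values gives a cumulative frequency curve of the kind described for Figure 9, under the assumption that the relative absolute error is normalized by the observed value.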
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Koohikeradeh, E.; Jose Gumiere, S.; Bonakdari, H. NDMI-Derived Field-Scale Soil Moisture Prediction Using ERA5 and LSTM for Precision Agriculture. Sustainability 2025, 17, 2399. https://doi.org/10.3390/su17062399