Article

Hybrid LSTM Method for Multistep Soil Moisture Prediction Using Historical Soil Moisture and Weather Data

1 College of Engineering, University of Georgia, Tifton, GA 31793, USA
2 Abraham Baldwin Agricultural College, Tifton, GA 31793, USA
3 Department of Crop and Soil Sciences, University of Georgia, Tifton, GA 31793, USA
4 Department of Entomology, University of Georgia, Tifton, GA 31793, USA
* Author to whom correspondence should be addressed.
AgriEngineering 2025, 7(8), 260; https://doi.org/10.3390/agriengineering7080260
Submission received: 24 June 2025 / Revised: 31 July 2025 / Accepted: 4 August 2025 / Published: 12 August 2025
(This article belongs to the Topic Digital Agriculture, Smart Farming and Crop Monitoring)

Abstract

Soil moisture is a key parameter for effective irrigation scheduling and water use efficiency, which makes its accurate prediction important. However, accurate long-term prediction remains challenging: most existing models perform well at short- to medium-term horizons but struggle to capture the complex temporal dependencies and non-linear interactions of soil moisture variables over extended horizons. This study proposes a hybrid soil moisture prediction method that integrates a long short-term memory (LSTM) network with an extreme gradient boosting (XGBoost) model for multistep soil moisture prediction at 24 h, 72 h, and 168 h horizons. The LSTM captures temporal dependencies and extracts high-level features from the dataset, which XGBoost then uses for the final predictions. The study uses real-world data from Watermark soil moisture sensors and a weather station installed on the D.A.T.A. (Demonstrating Applied Technology in Agriculture) research farm at ABAC (Abraham Baldwin Agricultural College), Tifton, GA, USA. Results show that the proposed method outperforms other hybrid models, achieving R² values of 98.67%, 98.54%, and 98.56% for 24, 72, and 168 h predictions, respectively. These findings indicate that LSTM-XGBoost provides precise long-term soil moisture prediction, making it a practical tool for real-time irrigation scheduling and for improving water use efficiency in precision agriculture.

1. Introduction

1.1. Background and Motivation

Soil moisture is a critical parameter in agricultural practice, significantly influencing crop growth, irrigation management, and climate modeling. Soil moisture can be measured using in-place probes [1] or remote sensing techniques [2], and its variability arises from complex interactions among precipitation, evaporation, transpiration, irrigation, topography, and soil characteristics. These interactions are well captured by the Food and Agriculture Organization (FAO) water balance framework [3], which positions soil moisture as the central element linking atmospheric inputs to plant-available water. Within this framework, evapotranspiration, the combined water loss through soil evaporation and plant transpiration, plays a dominant role in soil moisture fluctuations, especially under varying climatic conditions [3]. Accurate modeling of these hydrological processes is essential not only for optimizing irrigation but also for advancing climate resilience, sustainable water resource management, and yield forecasting in data-driven agriculture.
Traditional soil moisture prediction methods generally fall into three major categories: (1) statistical models such as the autoregressive integrated moving average (ARIMA) [4,5,6,7], which struggle with non-linear relationships and long-term dependencies; (2) process-based models built on the hydrological balance, which require extensive site-specific parameters and can be computationally demanding; and (3) machine learning (ML) and deep learning (DL) techniques, which have been used extensively for both short- and long-term prediction. ML methods including random forests (RF) [7,8,9,10,11,12], support vector regression (SVR) [4,13,14,15], multivariate adaptive regression splines (MARS) [16], and a modified flower pollination algorithm (MFPA) combined with artificial neural networks (ANNs) [17] have been widely used for short-term soil moisture prediction (up to 1 day). Neural network-based ML models such as deep neural network regression (DNNR) [18] and the multi-layer perceptron (MLP) [16] have also been used to predict soil moisture levels. While these models often perform well for short-term prediction, they tend to underperform at long horizons because of their limited ability to capture sequential dependencies in time-series data over extended periods. In contrast, DL models, particularly long short-term memory (LSTM) networks, have emerged as powerful tools for time-series prediction [19,20,21]. Their ability to learn temporal dependencies and sequential patterns makes them well suited to long-term multistep soil moisture forecasting [22].
Recent studies have applied variant LSTM architectures to long-term soil moisture prediction. One study [23] presented a multi-headed LSTM to enhance soil moisture prediction over different time frames, achieving an R² of 95.04%. Similarly, [24] presented an attention-aware LSTM for 7-day multistep soil moisture and temperature prediction, achieving an R² of 95.80% and an RMSE of 1.033, while [25] applied LSTM to short-term (24 h) soil moisture prediction, achieving an R² of 98% for 1 h prediction. Furthermore, [7] presented an LSTM method that predicts soil moisture 72 h ahead with a mean absolute scaled error (MASE) of 3.99, while [8] showed that an LSTM model (RMSE = 0.72) outperformed a bi-directional LSTM (Bi-LSTM) model (RMSE = 0.76) for short-term prediction. Moreover, [26] presented a Bi-LSTM model that achieved an R² of 97.7%, outperforming its benchmark, multi-layer neural networks (MLNNs), which achieved an R² of 86.5%, for soil moisture prediction in citrus orchards.
However, despite the strong performance of standalone LSTM models, studies show that results can be improved significantly by combining an LSTM with another ML or DL model to form a robust hybrid. In such hybrids, the LSTM focuses on extracting features and passes them to a final estimator. One study [9] proposed a hybrid Conv-LSTM model that integrates convolutional neural networks (CNNs) with LSTM layers to enhance spatial-temporal soil moisture prediction in regions with pronounced seasonal effects. The Conv-LSTM model achieved an R² of 92%, demonstrating its potential for spatial-temporal prediction in agricultural applications. Another study presented a novel encoder–decoder LSTM model with residual learning (EDT-LSTM) [27] to improve soil moisture prediction accuracy at FLUXNET sites for up to 10 days. It improved R² by 7.95% for 1-day predictions up to 19.71% for 10-day predictions, with average R² values of 89.6% at 1 day and 58.3% at 10 days, highlighting the potential of encoder–decoder models for soil moisture prediction in diverse agricultural settings. Furthermore, [28] proposed a hybrid particle swarm optimization and LSTM model (PSO-LSTM) to predict soil moisture at varying water levels using Sentinel-1A satellite data. Tested with five input combinations based on vertical (VV) and cross (VH) polarization data, the model outperformed a standalone LSTM, achieving a much lower normalized root mean square error (NRMSE) of 4.568–11.023% compared with 18.056–30.156% for the standalone LSTM. Moreover, [10] explored hybrid LSTM models, including feature attention-based LSTM (FA-LSTM) and generative adversarial network-based LSTM (GAN-LSTM). LSTM combined with feature-attention mechanisms outperformed CNN and transformer models, achieving R² values of 94.3% for 1-day and 81.6% for 7-day predictions.
In summary, these studies emphasize the promising potential of hybrid LSTM approaches over regular standalone LSTM models for improving soil moisture predictions across diverse soil types, depths, and prediction horizons.
Although many existing studies report high R² scores, most models excel at short-term prediction (up to 24 h) but struggle at long-term prediction, with accuracy decreasing as the prediction horizon extends. This highlights the ongoing challenge of improving forecasting performance at long horizons, especially in multistep prediction, where multiple horizons are predicted simultaneously. To address this, our study proposes a hybrid LSTM and extreme gradient boosting (XGBoost) framework that combines LSTM’s ability to extract temporal features with XGBoost’s strength in modeling complex non-linear interactions to predict soil moisture at 24 h, 72 h, and 168 h simultaneously. The proposed model is benchmarked against a wide range of standalone ML, DL, and other hybrid models, with the goal of developing a scalable, data-driven soil moisture prediction tool to support irrigation scheduling and water use efficiency.

1.2. Objectives of the Study

The main objective of this study is to develop an accurate multistep soil moisture prediction framework for 24 h, 72 h, and 168 h simultaneously using IoT farm sensors and weather data inputs. Specifically, the study aims to do the following:
  • Develop a hybrid LSTM-XGBoost model to predict soil moisture at 24 h, 72 h, and 168 h intervals using historical soil moisture and weather data.
  • Evaluate the performance of the proposed hybrid model against standalone ML, DL, and other hybrid models.
  • Assess the ability of LSTM to capture long-term temporal dependencies in time-series soil moisture and meteorological data for improved feature representation.
  • Investigate the role of XGBoost in refining predictions by leveraging non-linear relationships within LSTM-extracted features.

1.3. Contributions of the Study

This study proposes a novel hybrid approach combining LSTM with XGBoost for multistep-ahead soil moisture prediction, targeting the next 24 h, 72 h, and 168 h. The LSTM captures long-term temporal dependencies in the data, while XGBoost handles complex, non-linear relationships, resulting in a more accurate and reliable prediction model. By leveraging historical soil moisture and weather data, this approach contributes to optimizing irrigation scheduling and improving water use efficiency. In addition, this research offers an in-depth performance comparison of the proposed hybrid model against several standalone and hybrid ML and DL models, providing insights into the strengths and limitations of different approaches for multistep-ahead soil moisture prediction. Overall, this study establishes a benchmark for soil moisture prediction, particularly using lagged soil moisture and weather data, contributing to more efficient irrigation practices and sustainable agricultural management.

1.4. Hypotheses

  • The LSTM-XGBoost hybrid model will outperform all evaluated standalone ML, DL, and other hybrid models in terms of R² and error metrics across 24, 72, and 168 h prediction horizons.
  • LSTM will significantly enhance temporal feature representation by learning long-term dependencies in soil moisture and weather variables.
  • XGBoost, when applied to LSTM-derived features, will reduce prediction errors by capturing complex non-linearities in the data.
  • Combining lag soil moisture values with historical weather data will improve the accuracy of future soil moisture predictions.

1.5. Research Questions

  • Can a hybrid model combining LSTM and XGBoost improve the accuracy of multistep soil moisture prediction (24, 72, and 168 h ahead) compared to standalone and other hybrid models?
  • How effectively does the LSTM component capture long-term temporal dependencies in time-series soil moisture and meteorological data?
  • What role does XGBoost play in refining soil moisture predictions by modeling non-linear relationships in LSTM-extracted features?
  • Can historical weather and soil moisture data be used to predict future soil moisture values?

2. Materials and Methods

2.1. Dataset Description

The dataset used for this research was collected from 11 March 2024 to 31 December 2024 as part of the University of Georgia’s 4D farm (Digital and Data-Driven Demonstration Farm) on the Abraham Baldwin Agricultural College (ABAC) D.A.T.A. (Demonstrating Applied Technology in Agriculture) farm, located in Tifton, Georgia, USA (GPS coordinates: 31.485043, −83.543575). IoT sensors were installed on the farm to measure soil moisture content and weather conditions in the fields, as shown in Table 1.
The farm is divided into four fields (Front Field, North Pivot, South Pivot, and West Field) with a total of fourteen Watermark soil moisture sensors (Realm5 Inc., Lincoln, NE, USA), named 4D 01 to 4D 14. The Front Field contains two sensors (4D 01 and 4D 02), while each of the other fields contains four: North Pivot (4D 03–4D 06), South Pivot (4D 07–4D 10), and West Field (4D 11–4D 14).
This study focuses exclusively on the South Pivot field. For each timestamp, the soil moisture value used in the analysis was calculated as the average of the four active sensors in that field. Each soil moisture device records soil moisture and soil temperature at three depths (8, 16, and 24 inches), measured in kilopascals (kPa) and degrees Celsius (°C), respectively. A single on-site weather station records the following atmospheric parameters: air temperature, dew point, humidity, solar radiation, rainfall, wind speed, wind direction, wind gust, and air pressure.
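As a minimal sketch of how the per-timestamp field value described above could be formed, the snippet below averages every sensor and depth reading that shares a timestamp (the three-depth average is the target described in Section 2.2). The record layout `(timestamp, sensor_id, depth_in, kpa)` and the function name `field_average` are hypothetical; the actual API schema is not described in this article.

```python
from collections import defaultdict
from statistics import mean


def field_average(readings):
    """Average soil water tension (kPa) per timestamp over all sensors and depths.

    readings: iterable of (timestamp, sensor_id, depth_in, kpa) tuples
              (hypothetical layout, not the actual API schema).
    Returns {timestamp: mean_kpa}, ordered by timestamp.
    """
    by_time = defaultdict(list)
    for ts, _sensor, _depth, kpa in readings:
        by_time[ts].append(kpa)
    return {ts: mean(values) for ts, values in sorted(by_time.items())}


readings = [
    ("2024-06-01T00:00", "4D 07", 8, 10.0),
    ("2024-06-01T00:00", "4D 07", 16, 12.0),
    ("2024-06-01T00:00", "4D 08", 8, 14.0),
    ("2024-06-01T01:00", "4D 07", 8, 20.0),
]
print(field_average(readings))
# {'2024-06-01T00:00': 12.0, '2024-06-01T01:00': 20.0}
```

Pooling all readings gives the same result as averaging per-sensor depth means only when every sensor reports all three depths; if a depth is missing, the two orderings differ.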

2.2. Data Acquisition and Preprocessing

The IoT sensors communicate with the base station using long-range wide area network (LoRaWAN) radios, and the base station uses a cellular modem to transmit the data to the internet. We retrieve the soil moisture and weather measurements from two separate web API endpoints using the Python requests library. Because the two streams are recorded at different intervals (weather every 15 min, soil moisture every hour), there are four weather readings for each soil moisture reading; the two streams were therefore merged on their timestamps into a single CSV datasheet. Figure 1 shows the data acquisition and preprocessing pipeline that generates the final dataset for model training and testing.
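The timestamp merge can be sketched as an inner join, assuming that each hourly soil moisture reading is paired with the weather reading carrying exactly the same timestamp and that the three intermediate 15 min weather readings are discarded. The article does not state which of the four readings is kept or whether they are aggregated, so this pairing rule and the helper `merge_on_timestamp` are assumptions.

```python
from datetime import datetime


def merge_on_timestamp(soil, weather):
    """Inner-join hourly soil rows with 15 min weather rows on identical timestamps.

    soil, weather: lists of dicts that each carry a 'timestamp' datetime.
    Weather rows at :15, :30 and :45 have no hourly partner and are dropped
    (an assumed pairing rule).
    """
    weather_by_ts = {row["timestamp"]: row for row in weather}
    merged = []
    for row in soil:
        w = weather_by_ts.get(row["timestamp"])
        if w is not None:  # keep only timestamps present in both streams
            merged.append({**w, **row})
    return merged


soil = [{"timestamp": datetime(2024, 6, 1, h), "soil_kpa": 10.0 + h} for h in range(2)]
weather = [
    {"timestamp": datetime(2024, 6, 1, 0, m), "air_temp": 20.0 + m / 15}
    for m in (0, 15, 30, 45)
]
weather.append({"timestamp": datetime(2024, 6, 1, 1, 0), "air_temp": 25.0})

rows = merge_on_timestamp(soil, weather)
for r in rows:
    print(r["timestamp"].isoformat(), r["soil_kpa"], r["air_temp"])
# 2024-06-01T00:00:00 10.0 20.0
# 2024-06-01T01:00:00 11.0 25.0
```

Averaging the four 15 min readings within each hour would be an equally plausible alternative. In pandas, `pd.merge(soil_df, weather_df, on="timestamp", how="inner")` performs the same exact-match join.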
The timestamp column was converted into a datetime object to ensure proper handling of time-dependent data. To capture temporal dependencies in soil moisture, lagged features of the average moisture reading were created at lags of 1, 6, 12, 24, 48, and 96 h (the LSTM window). These lagged features help the model learn the influence of past moisture measurements on current and future soil moisture levels.
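A minimal sketch of the lag construction, assuming a gap-free hourly series; the helper `add_lag_features` and the column names are hypothetical.

```python
LAGS_H = (1, 6, 12, 24, 48, 96)  # lag set listed in Section 2.2


def add_lag_features(series, lags=LAGS_H):
    """Return one row per hour t holding the value at t and its lagged values.

    series: hourly average soil moisture, time-ordered with no gaps (assumed).
    Hours whose longest lag would reach before the start of the series are
    skipped, which matches dropping rows with missing lag values.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        row = {"t": t, "moisture": series[t]}
        for lag in lags:
            row[f"moisture_lag_{lag}h"] = series[t - lag]
        rows.append(row)
    return rows


series = [float(h) for h in range(200)]  # toy series: value equals hour index
rows = add_lag_features(series)
print(len(rows), rows[0]["moisture_lag_96h"], rows[0]["moisture_lag_1h"])
# 104 0.0 95.0
```

Because the lags are taken by position, they should be computed on a complete hourly index before incomplete rows are dropped; otherwise a 24-step lag would no longer correspond to 24 h.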
Missing values were addressed by removing any rows with incomplete data, so that only valid, clean data was used for model training. This choice was made because the missing values resulted from sensors being temporarily removed during the harvesting season, to prevent damage from machinery, and reinstalled before the next growing season. The data was then normalized using the MinMaxScaler routine (scikit-learn), which scales each feature to the range [0, 1], improving the convergence speed of the ML models and ensuring that no feature dominates because of its scale. To preserve the temporal order of the data, we used a time-ordered split with 60% for training, 20% for validation, and 20% for testing.
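The three preprocessing steps above (dropping incomplete rows, splitting in time order, and min-max scaling) can be sketched in plain Python. Fitting the scaler on the training block only is an assumption, since the article does not say which portion the scaler was fitted on; the helper names are hypothetical.

```python
def chronological_split(rows, train=0.6, val=0.2):
    """Split time-ordered rows into train/validation/test blocks without shuffling."""
    n = len(rows)
    n_train, n_val = int(n * train), int(n * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


class MinMax:
    """Per-column scaling x' = (x - min) / (max - min), with min/max learned in fit()."""

    def fit(self, rows):
        cols = list(zip(*rows))
        self.lo = [min(c) for c in cols]
        self.hi = [max(c) for c in cols]
        return self

    def transform(self, rows):
        return [
            [(x - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0
             for x, lo, hi in zip(r, self.lo, self.hi)]
            for r in rows
        ]


# Toy data: 10 time-ordered rows of two features; one row has a missing value.
raw = [[float(i), float(2 * i)] for i in range(10)]
raw[3][1] = None
clean = [r for r in raw if None not in r]      # 1) drop incomplete rows
train, val, test = chronological_split(clean)  # 2) 60/20/20 in time order
scaler = MinMax().fit(train)                   # 3) fit on training rows only (assumed)
train_s, val_s, test_s = (scaler.transform(part) for part in (train, val, test))
print(len(train), len(val), len(test), train_s[-1], test_s[0])
# 5 1 3 [1.0, 1.0] [1.4, 1.4]
```

Scaled test values can fall outside [0, 1] when the test period contains values beyond the training range, as the toy data shows. scikit-learn's `MinMaxScaler().fit(X_train).transform(X_test)` applies the same per-column formula.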
Figure 2 presents the feature frequencies over the roughly ten months of data (11 March 2024 to 31 December 2024) for the South Pivot field. Soil moisture is recorded at three depths (8, 16, and 24 inches), and the average of these readings is used as the target variable for prediction. Soil temperature, measured by the IoT sensor device, is another input parameter. The remaining features are meteorological, measured by the weather station sensors: air temperature, air pressure (Pa), solar radiation (W/m²), wind direction (degrees), wind gust (kph), wind speed (kph), humidity (%), dew point temperature (°C), and rainfall (in.).

2.3. LSTM Architecture

An LSTM is a neural network derived from recurrent neural networks (RNNs), designed to overcome the vanishing and exploding gradient problems that limit standard RNNs on long time series. It contains memory cells that retain long-term dependencies, making it particularly effective for soil moisture prediction, where past moisture conditions influence future values. A standard LSTM unit consists of a forget gate, which decides what information from the previous cell state should be discarded; an input gate, which updates the cell state with new information; and an output gate, which determines the output based on the cell state. Figure 3 shows a general LSTM architecture.
The standard LSTM block is presented by Equations (1)–(6):
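$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{1}$$
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{2}$$
$$O_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{3}$$
$$\hat{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{4}$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \hat{C}_t \tag{5}$$
$$h_t = O_t \odot \tanh(C_t) \tag{6}$$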
The input gate $i_t$ determines which new information from the current input $x_t$ and the previous hidden state $h_{t-1}$ should be added to the cell state. Its activation is computed by applying the sigmoid function $\sigma$ to a weighted sum of the concatenated vector $[h_{t-1}, x_t]$, using its weight matrix $W_i$ and bias $b_i$. Similarly, the forget gate $f_t$ decides what information to discard from the cell state, using its own weight matrix $W_f$, bias $b_f$, and sigmoid activation. The output gate $O_t$ controls what information from the current cell state is emitted as the hidden state $h_t$; it also applies $\sigma$ to a weighted combination of $[h_{t-1}, x_t]$, with weight matrix $W_o$ and bias $b_o$. The candidate cell state $\hat{C}_t$ is computed in the same way from $W_c$ and $b_c$ using the hyperbolic tangent, and the new cell state $C_t$ combines the retained old state with the gated candidate (Equation (5)). The symbol $\odot$ denotes element-wise multiplication, the operation the LSTM uses to selectively filter and combine information. The hyperbolic tangent, $\tanh$, produces outputs in the range (−1, 1); it is used to compute the candidate values (Equation (4)) and is applied to the cell state when computing the final hidden state (Equation (6)).
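To make Equations (1)–(6) concrete, the sketch below computes one LSTM time step in plain Python, with each gate's weight matrix acting on the concatenation $[h_{t-1}, x_t]$. The helper names (`lstm_step`, `params`) are hypothetical, and the all-zero weights exist only to give a hand-checkable result; a trained network would learn these matrices.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Equations (1)-(6).

    params maps 'f', 'i', 'o', 'c' to (W, b); each W has shape
    hidden x (hidden + inputs) and multiplies the concatenation [h_{t-1}, x_t].
    """
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]

    def gate(name, act):
        W, b = params[name]
        return [act(s + bj) for s, bj in zip(matvec(W, z), b)]

    f = gate("f", sigmoid)        # Eq. (1): forget gate
    i = gate("i", sigmoid)        # Eq. (2): input gate
    o = gate("o", sigmoid)        # Eq. (3): output gate
    c_hat = gate("c", math.tanh)  # Eq. (4): candidate cell state
    c = [fk * ck + ik * chk for fk, ck, ik, chk in zip(f, c_prev, i, c_hat)]  # Eq. (5)
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                          # Eq. (6)
    return h, c


# With all-zero weights and biases every sigmoid gate is 0.5 and c_hat is 0,
# so c_t = 0.5 * c_{t-1} and h_t = 0.5 * tanh(c_t).
hidden, inputs = 2, 3
zero_gate = ([[0.0] * (hidden + inputs) for _ in range(hidden)], [0.0] * hidden)
params = {g: zero_gate for g in "fioc"}
h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], params)
print(c)  # [0.5, -1.0]
```

Running `lstm_step` over a sequence, feeding each returned `h` and `c` into the next call, is what a single LSTM layer does across the time steps of one input window.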

2.4. Proposed LSTM Architecture

The proposed LSTM model architecture consists of three sequential LSTM layers followed by a dropout layer and two fully connected (dense) layers, as summarized in Table 2. Each input sample is a 24 h time series (24 time steps), and samples are processed in batches of 32, as shown in Table 2. The first LSTM layer outputs a sequence of 256 features for each time step, which the second LSTM layer reduces to 128 features per time step. The third LSTM layer outputs only its final hidden state, of size 64, capturing a summarized temporal representation. A dropout layer is then applied for regularization. The output is passed through a dense layer with 256 neurons and a final dense layer with 168 neurons, which predicts soil moisture values for each of the next 168 h (a 7-day forecast).
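The input and target layout described above can be sketched as sliding windows over the hourly data. This assumes that each sample pairs the previous 24 hourly feature rows with the next 168 hourly soil moisture values; the article does not spell out its windowing code, so `make_windows` and `batches` are hypothetical helpers.

```python
def make_windows(features, target, n_in=24, n_out=168):
    """Pair each 24 h input sequence with the next 168 hourly target values.

    features: time-ordered list of per-hour feature vectors
    target:   time-ordered list of per-hour soil moisture values (same length)
    Sample t uses inputs from hours [t - n_in, t) and targets from [t, t + n_out).
    """
    X, Y = [], []
    for t in range(n_in, len(target) - n_out + 1):
        X.append(features[t - n_in:t])
        Y.append(target[t:t + n_out])
    return X, Y


def batches(X, Y, batch_size=32):
    """Yield consecutive mini-batches (the last one may be smaller)."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], Y[start:start + batch_size]


hours = 500
features = [[float(h), float(h % 24)] for h in range(hours)]  # two toy features
target = [float(h) for h in range(hours)]
X, Y = make_windows(features, target)
xb, yb = next(batches(X, Y))
print(len(X), len(xb), len(xb[0]), len(xb[0][0]), len(yb[0]))
# 309 32 24 2 168
```

A batch therefore has shape (32, 24, number of features), and each target row holds 168 values, matching the 168-neuron output layer.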

2.5. XGBoost Architecture

XGBoost is an ensemble learning method derived from standard gradient boosting. It uses decision trees as base learners and builds them sequentially, with each new tree fitted to correct the errors of the trees before it, a process called boosting. This makes it well suited to the tabular lag and LSTM-derived features used for time-series prediction in this study. Figure 4 presents the general XGBoost workflow.
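The boosting idea can be illustrated with a minimal sketch of plain gradient boosting for squared error, using one-split regression stumps as base learners: each new stump is fitted to the residuals left by the ensemble so far. This is not XGBoost itself, which additionally uses second-order gradient information, regularizes leaf weights and tree size, and grows deeper trees. The function names are hypothetical, and `n_rounds` and `learning_rate` only loosely correspond to XGBoost's `n_estimators` and `learning_rate`.

```python
def fit_stump(x, r):
    """Fit a one-split regression stump to residuals r by minimizing squared error."""
    best = None
    for thr in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    if best is None:  # fewer than two distinct x values: predict the mean residual
        m = sum(r) / len(r)
        return lambda xi: m
    _, thr, lv, rv = best
    return lambda xi: lv if xi <= thr else rv


def gradient_boost(x, y, n_rounds=100, learning_rate=0.05):
    """Plain gradient boosting for squared loss with stump base learners."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of 0.5 * (y - p)^2
        stump = fit_stump(x, residuals)                   # new learner targets current errors
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


def mse(y, p):
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)


x = [float(i) for i in range(20)]
y = [0.0 if xi < 10 else 5.0 for xi in x]  # toy step function
few = gradient_boost(x, y, n_rounds=5)
many = gradient_boost(x, y, n_rounds=100)
print(mse(y, [few(xi) for xi in x]) > mse(y, [many(xi) for xi in x]))  # True
```

On this toy step function a single stump fits the residuals exactly, so each round shrinks the remaining error by a factor of (1 − learning_rate); a small learning rate trades slower fitting for gentler, less overfitting-prone updates, which is why it is paired with many rounds.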

2.6. Proposed XGBoost Architecture

The proposed XGBoost model is configured with key hyperparameters chosen for stability and performance. It uses 500 boosting rounds (n_estimators = 500) to iteratively improve model accuracy, with a learning rate of 0.05 to shrink the contribution of each tree and reduce the risk of overfitting. The maximum tree depth is set to 5 (max_depth = 5) to balance model complexity and generalization. A fixed random seed (random_state = 42) ensures reproducibility of results across runs.

2.7. Experiment Setup

This study’s experiments evaluated standalone ML, DL, and hybrid ML-DL models for predicting soil moisture at 24 h, 72 h, and 168 h horizons. First, the ML models tested were XGBoost, RF, extra trees (ET), gradient boosting (GB), SVR, LightGBM, linear regression (LR), ElasticNet, CatBoost, decision tree (DT), MLP, AdaBoost, and k-nearest neighbors (KNN). Second, five DL architectures were tested: gated recurrent unit (GRU), CNN, RNN, Bi-LSTM, and LSTM. The LSTM model comprised three stacked LSTM layers followed by a dropout layer and a dense layer, allowing it to model long-term temporal dependencies effectively.
The GRU model followed the same structure but used GRU units, which have fewer gates and parameters than LSTM units and are therefore computationally cheaper. The CNN model consisted of a sequence of one-dimensional convolutional (Conv1D) and max-pooling layers to extract localized temporal features, followed by a flattening operation and a dense layer for prediction. The RNN model used three recurrent layers and a dense output layer, providing a baseline for evaluating short-term dependency modeling. The Bi-LSTM model stacked three bi-directional LSTM layers, enabling the network to process input sequences in both forward and backward directions and capture richer contextual information.
Third, beyond standalone models, we developed hybrid architectures that combined LSTM-based temporal feature extraction with ML regressors for final prediction. These hybrid models paired LSTM with RF, LR, XGBoost, KNN, MLP, DT, and CNN. In all of these except CNN-LSTM, the LSTM extracted features that were then passed to a regressor for final prediction; in the CNN-LSTM model, CNN layers performed the initial feature extraction before passing the sequence to LSTM layers for further temporal modeling and prediction.
Finally, we proposed a hybrid model combining LSTM for feature extraction with XGBoost for the final prediction task. This approach capitalizes on the temporal learning capabilities of LSTM and the robust, non-linear decision-making strength of XGBoost. All models were evaluated across the same multistep horizons (24, 72, and 168 h) to ensure consistent comparison.

2.8. Proposed Method Architecture

The proposed LSTM-XGBoost hybrid method consists of three sequentially stacked LSTM layers for temporal feature extraction, followed by dropout and dense layers for regularization and feature encoding, and finally an XGBoost regressor for the soil moisture prediction. The LSTM stack begins with a 256-unit layer, followed by 128-unit and 64-unit LSTM layers. A dropout layer with a rate of 0.2 is applied after the final LSTM layer to mitigate overfitting by randomly deactivating neurons during training. The extracted temporal features are then passed through a dense layer with 64 units, which serves as a compact encoded representation of the sequence. This feature vector is the input to the XGBoost model, which performs the final multistep soil moisture prediction. The LSTM model is trained with the Adam optimizer, ReLU activation, and MSE loss for 100 epochs, using an 80/20 train–test split with 20% of the training data reserved for validation. The workflow of the proposed framework is shown in Figure 5.
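To make the encoder dimensions concrete, the following sketch traces output shapes and trainable-parameter counts through the LSTM part of the hybrid, from the 24-step input to the 64-value feature vector handed to XGBoost. The number of input features is not given in this section, so `n_features` is a free argument (17 in the usage line is purely illustrative); the counts follow the single-bias convention used by Keras, and the helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    """Trainable weights of one LSTM layer (Keras convention).

    Four gates, each with a kernel (input_dim x units), a recurrent kernel
    (units x units) and one bias vector (units). PyTorch's nn.LSTM keeps two
    bias vectors per gate and therefore reports 4 * units more parameters.
    """
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    return input_dim * units + units


def trace_encoder(n_features, seq_len=24, batch=32):
    """(layer, output shape, parameter count) for the hybrid's LSTM encoder."""
    return [
        ("LSTM 256 (sequence)", (batch, seq_len, 256), lstm_params(n_features, 256)),
        ("LSTM 128 (sequence)", (batch, seq_len, 128), lstm_params(256, 128)),
        ("LSTM 64 (last state)", (batch, 64), lstm_params(128, 64)),
        ("Dropout 0.2", (batch, 64), 0),
        ("Dense 64 -> XGBoost input", (batch, 64), dense_params(64, 64)),
    ]


layers = trace_encoder(n_features=17)  # 17 is an illustrative value only
for name, shape, n_params in layers:
    print(f"{name:28s} {str(shape):16s} {n_params:>8,d}")
print("total trainable:", sum(p for _, _, p in layers))  # total trainable: 531264
```

Only the first layer's count depends on the number of input features; the 64-value output per sample is the feature vector that the XGBoost stage consumes.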

2.9. Model Performance Evaluation Metrics

We evaluated model performance using three common metrics: the coefficient of determination (R²), mean squared error (MSE), and mean absolute error (MAE). R² measures the proportion of variance in the dependent variable explained by the model; it is at most 1 and typically lies between 0 and 1, and the closer it is to 1, the better the model fits the data (it becomes negative when a model predicts worse than the mean of the actual values). MSE measures the average squared difference between predicted and actual values, with lower values indicating better accuracy. MAE calculates the average magnitude of the errors without considering their direction; it is less sensitive to outliers than MSE, making it suitable when large errors should not dominate the evaluation. Equations (7)–(9) define R², MAE, and MSE, respectively:
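$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{7}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \tag{8}$$
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{9}$$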
where $n$ is the number of samples, $y_i$ is the actual value for the $i$-th sample, $\hat{y}_i$ is the predicted value for the $i$-th sample, and $\bar{y}$ is the mean of the actual values.
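Equations (7)–(9) translate directly into code. This minimal plain-Python sketch (function names are hypothetical) can be checked by hand on the four-sample example: $\bar{y} = 2.5$, the total sum of squares is 5, and the residual sum of squares is 1, giving $R^2 = 0.8$.

```python
def r2_score(y, y_hat):
    """Equation (7): 1 - residual sum of squares / total sum of squares."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def mae(y, y_hat):
    """Equation (8): mean absolute error."""
    return sum(abs(yi - pi) for yi, pi in zip(y, y_hat)) / len(y)


def mse(y, y_hat):
    """Equation (9): mean squared error."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat)) / len(y)


y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.0, 2.0, 3.0, 5.0]
print(r2_score(y, y_hat), mae(y, y_hat), mse(y, y_hat))  # 0.8 0.25 0.25
```

Predicting the mean for every sample gives R² = 0, and a worse predictor drives it negative. In `sklearn.metrics`, `r2_score`, `mean_absolute_error`, and `mean_squared_error` compute the same quantities.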

3. Results

3.1. Learning Curves of Proposed Models

All DL and hybrid models were trained with a batch size of 32 for 100 epochs and showed good generalization and convergence, with a sharp decrease in MSE loss and an increase in R². Figures 6 and 7 display the training and validation MSE loss curves for the proposed LSTM-XGBoost model and for LSTM-RF, which recorded the second-best results and therefore serves as a useful benchmark for comparison. All models learned effectively, with the error decreasing from approximately 0.0175 to near 0.0001. LSTM-XGBoost showed a smooth learning curve with minimal fluctuations, while LSTM-RF exhibited some loss variation but eventually converged as expected. The training histories of all hybrid models are summarized in Figure 8.

3.2. Standalone ML Models Result

Table 3 summarizes the results of the standalone ML models for multistep soil moisture prediction at 24, 72, and 168 h. The ET model achieved the best performance, with R² values of 96.73%, 96.05%, and 95.60% for 24 h, 72 h, and 168 h ahead, respectively. It was followed by RF, with R² scores of 94.74%, 94.07%, and 91.73%, and by XGBoost in third place, with 91.86%, 91.84%, and 90.83% at the same horizons. These models also recorded the lowest errors, led by ET, with an average MAE of 0.4538 and MSE of 1.6417, while RF recorded an average MAE of 0.6793 and MSE of 2.1862. In contrast, models such as SVR, MLP, and LR performed the worst, with errors reaching MSE = 27.437 and MAE = 3.07 (SVR), MSE = 26.781 and MAE = 3.5319 (MLP), and MSE = 25.253 and MAE = 3.3593 (LR). Overall, most standalone ML models struggle with multistep prediction, unlike single-step (next-hour) prediction, because the dynamic and complex relationships between features become harder to capture as the horizon lengthens.

3.3. Standalone DL Models Result

Table 4 presents the performance of the DL models across the 24 h, 72 h, and 168 h prediction horizons. LSTM consistently achieved the best performance, with the highest R² scores at all prediction horizons, demonstrating its strong ability to capture temporal dependencies. The CNN model also performed competitively, closely matching LSTM in accuracy, particularly for short- and mid-range forecasts. Although LSTM showed slightly higher error metrics than CNN, it maintained strong predictive capability. In contrast, the GRU and RNN models exhibited lower accuracy, especially at longer forecast horizons, highlighting their limitations in modeling complex temporal patterns in soil moisture dynamics.

3.4. Hybrid LSTM Models Result

Table 5 summarizes the performance of the hybrid LSTM models across all prediction horizons. The LSTM + XGBoost combination consistently outperformed the other hybrid approaches, achieving the highest R² scores (98.67%, 98.56%, and 98.54%) for 24 h, 72 h, and 168 h, respectively. It was closely followed by LSTM + RF and LSTM + KNN, both of which achieved R² values above 98% at all prediction horizons. LSTM + MLP performed the weakest but still achieved an average R² above 92% across the three prediction horizons, outperforming most ML and some DL models and highlighting the strength of hybrid models. The hybrid models also recorded the lowest errors, with LSTM + XGBoost achieving an MSE of 0.0002 and an MAE of 0.0074 for the 168 h prediction.
In general, the test results show that hybrid LSTM models achieve the highest R² scores at all horizons compared with standalone ML and DL models. In particular, the results of the proposed LSTM + XGBoost model demonstrate its effectiveness in capturing non-linear relationships from LSTM-extracted temporal features. Overall, the results highlight the advantage of integrating LSTM with robust ML regressors, particularly tree-based models such as XGBoost and RF, as well as KNN, for improved multistep soil moisture prediction.

3.5. Model Testing (Actual vs. Prediction Results) Visualization

We evaluated model performance visually using the same number of normalized actual soil moisture values from the test set for every model, focusing on the 168 h prediction. Figure 9 shows the top-performing standalone ML models (ET, RF, and XGBoost), selected for their higher R² scores; because many ML models were tested, only the best are shown to maintain clarity and readability. Figures 10 and 11 show the DL and hybrid models, respectively; all of these are presented because there are fewer of them.

4. Discussion

This study evaluated the effectiveness of hybrid models for multistep soil moisture prediction using historical weather and soil moisture data across three prediction horizons: 24, 72, and 168 h. The proposed hybrid approach, LSTM–XGBoost, demonstrated superior performance across all horizons, outperforming both standalone ML/DL and other hybrid models in terms of accuracy and generalization.
Ensemble-based ML models such as ET and RF performed reasonably well at 24 and 72 h but declined at 168 h, with R² dropping by about 3 percentage points for RF and 1.13 percentage points for ET relative to 24 h. Their MAE also rose at 168 h (RF: MAE = 0.8953, MSE = 2.6979; ET: MAE = 0.5450, MSE = 1.4349), reflecting their limitations in capturing long-range temporal dependencies. In contrast, DL models such as LSTM were more stable across all horizons, averaging an R² of 97% and an MAE of 0.0158, reflecting their strength in modeling sequential patterns in time-series data.
However, hybrid models, particularly LSTM–RF, LSTM–KNN, and LSTM–XGBoost, outperformed all standalone models, with the proposed LSTM–XGBoost achieving the best overall results: an average R² of 0.9859, MAE of 0.0077, and MSE of 0.00026 across all prediction horizons. Similar hybrid methods reported in prior studies, such as Conv-LSTM [9], PSO-LSTM [28], FA-LSTM and GAN-LSTM [10], and EDT-LSTM [27], often excel at 24 h prediction but tend to degrade beyond 72 h; by comparison, our proposed method maintained high accuracy and minimal error propagation at all prediction horizons. Table 6 summarizes the comparison between these studies and the proposed method.
The strong performance of the LSTM-XGBoost model can be attributed to the strengths of its individual components and the informative nature of the input features. LSTM captures temporal dependencies in soil and weather sequences, while XGBoost enhances generalization by modeling complex non-linear residual patterns. This helps reduce cumulative error typically associated with long-term prediction.
From a hydrological perspective, the ability to accurately predict soil moisture over multiple days is crucial for efficient irrigation scheduling and water-use efficiency. Our findings contribute to the literature by offering a robust, multistep prediction method validated against a wide range of standalone and other hybrid benchmark models. Limitations and directions for future work are discussed in Section 5.

5. Limitations and Future Work

While the proposed model demonstrates strong performance, several limitations should be noted. The study was conducted on the South Pivot field of the 4D-Farm, which was planted with cotton at the time of data collection. The dataset also covers less than one year (11 March to 31 December 2024). This introduces potential spatial and temporal bias, because the model was trained and tested under a limited range of environmental conditions, and its generalizability across crops, soil types, and climates remains to be validated. Furthermore, the model does not explicitly account for abrupt changes in soil moisture caused by extreme rainfall or other sudden weather anomalies. The resulting sharp spikes may fall outside the learned patterns and remain difficult for the model to predict accurately.
In future work, we plan to address these limitations by collecting additional data from diverse fields to capture variability in crops, soil types, and environmental conditions. We also aim to evaluate the model’s performance using real-time soil moisture measurements and weather forecast inputs, and to quantify prediction uncertainty. Finally, we plan to deploy the model within a decision support system for real-time irrigation scheduling.

6. Conclusions

This study proposed an LSTM-XGBoost method and demonstrated the effectiveness of hybrid models for predicting soil moisture content across multiple time horizons. The approach showed clear improvements over traditional ML models, standalone DL models, and other hybrid models. The better-performing standalone ML models (e.g., ET and RF) are effective for short-term predictions (up to 24 h) but lose accuracy over longer multistep horizons, because they cannot learn the complex and dynamic temporal relationships between features. DL models capture temporal dependencies more effectively than their ML counterparts, yet hybrid models outperform both.
The proposed LSTM-XGBoost hybrid method achieved the highest R² of all evaluated models, with scores of 98.67%, 98.56%, and 98.54% for 24, 72, and 168 h, respectively. The best hybrid models also recorded the lowest MSE and MAE across all prediction horizons, highlighting their robustness in handling non-linear relationships and temporal dependencies between features.
The results underscore the potential of hybrid approaches for soil moisture prediction, and of the proposed LSTM-XGBoost model in particular. They offer enhanced accuracy for optimizing irrigation scheduling and improving water-use efficiency in precision agriculture.

Author Contributions

D.F.K. conceptualized the study, developed the methodology, implemented the models, collected and curated the data, performed the formal analysis, and drafted the original manuscript. G.C.R., W.M.P., E.P. and A.M. provided technical guidance on methodology development and domain-specific input throughout the research process. D.O.K. and A.P.T. contributed to reviewing and improving the manuscript through proofreading and editorial suggestions. G.C.R. supervised the project, supported its administration, and secured the research funding. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the U.S. Department of Agriculture, National Institute of Food and Agriculture (USDA-NIFA), grant number 2023-68016-39403.

Data Availability Statement

All data generated and analyzed to support the results reported in this study will be made publicly available through an online repository on the project website: https://4dfarm.org/data/ (accessed on 15 February 2025).

Acknowledgments

The authors sincerely appreciate the support of the U.S. Department of Agriculture, National Institute of Food and Agriculture (USDA-NIFA), for funding this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
XGBoost: Extreme Gradient Boosting
RF: Random Forest
GB: Gradient Boosting
SVR: Support Vector Regressor
LightGBM: Light Gradient Boosting Machine
LR: Linear Regression
DT: Decision Tree
MLP: Multi-Layer Perceptron
KNN: K-Nearest Neighbors
Bi-LSTM: Bidirectional Long Short-Term Memory
LSTM: Long Short-Term Memory
GRU: Gated Recurrent Unit
RNN: Recurrent Neural Network
DL: Deep Learning
ML: Machine Learning
ARIMA: Autoregressive Integrated Moving Average
ET: Extra Trees
RR: Ridge Regression
CNN: Convolutional Neural Network
ABAC: Abraham Baldwin Agricultural College
ISMN: International Soil Moisture Network
FAO: Food and Agriculture Organization

References

1. Walker, J.P.; Willgoose, G.R.; Kalma, J.D. In situ measurement of soil moisture: A comparison of techniques. J. Hydrol. 2004, 293, 85–99.
2. Ahmad, S.; Kalra, A.; Stephen, H. Estimating soil moisture using remote sensing data: A machine learning approach. Adv. Water Resour. 2010, 33, 69–80.
3. Allen, R.G.; Pereira, L.S.; Raes, D.; Smith, M. Crop Evapotranspiration: Guidelines for Computing Crop Water Requirements; FAO: Rome, Italy, 1998.
4. Granata, F.; Di Nunno, F.; Najafzadeh, M.; Demir, I. A Stacked Machine Learning Algorithm for Multi-Step Ahead Prediction of Soil Moisture. Hydrology 2022, 10, 1.
5. Fu, R.; Xie, L.; Liu, T.; Zheng, B.; Zhang, Y.; Hu, S. A Soil Moisture Prediction Model, Based on Depth and Water Balance Equation: A Case Study of the Xilingol League Grassland. Int. J. Environ. Res. Public Health 2023, 20, 1374.
6. Hu, L.; Xu, H.; Zhang, J.; Luo, Q. Soil Moisture Prediction Based on the ARIMA Time-Series Model. In Proceedings of the 35th Chinese Control and Decision Conference, CCDC 2023, Yichang, China, 20–22 May 2023; Institute of Electrical and Electronics Engineers Inc.: Yichang, China, 2023.
7. Filipović, N.; Brdar, S.; Mimić, G.; Marko, O.; Crnojević, V. Regional soil moisture prediction system based on Long Short-Term Memory network. Biosyst. Eng. 2022, 213, 30–38.
8. Suebsombut, P.; Sekhari, A.; Sureephong, P.; Belhi, A.; Bouras, A. Field data forecasting using LSTM and Bi-LSTM approaches. Appl. Sci. 2021, 11, 11820.
9. Huang, F.; Zhang, Y.; Zhang, Y.; Shangguan, W.; Li, Q.; Li, L.; Jiang, S. Interpreting Conv-LSTM for Spatio-Temporal Soil Moisture Prediction in China. Agriculture 2023, 13, 971.
10. Wang, Y.; Shi, L.; Hu, Y.; Hu, X.; Song, W.; Wang, L. A comprehensive study of deep learning for soil moisture prediction. Hydrol. Earth Syst. Sci. 2024, 28, 917–943.
11. Li, Q.; Zhang, C.; Shangguan, W.; Li, L.; Dai, Y. A novel local-global dependency deep learning model for soil mapping. Geoderma 2023, 438, 116649.
12. Khosravi, K.; Golkarian, A.; Barzegar, R.; Aalami, M.T.; Heddam, S.; Omidvar, E.; Keesstra, S.D.; López-Vicente, M. Multi-step ahead soil temperature forecasting at different depths based on meteorological data: Integrating resampling algorithms and machine learning models. Pedosphere 2023, 33, 479–495.
13. Rani, A.; Kumar, N.; Kumar, J.; Sinha, N.K. Chapter 6: Machine learning for soil moisture assessment. In Deep Learning for Sustainable Agriculture; Poonia, R.C., Singh, V., Nayak, S.R., Eds.; Academic Press: Amsterdam, The Netherlands, 2022; pp. 143–168.
14. Acharya, U.; Daigh, A.L.M.; Oduor, P.G. Machine learning for predicting field soil moisture using soil, crop, and nearby weather station data in the Red River Valley of the North. Soil Syst. 2021, 5, 57.
15. Togneri, R.; dos Santos, D.F.; Camponogara, G.; Nagano, H.; Custódio, G.; Prati, R.; Fernandes, S.; Kamienski, C. Soil moisture forecast for smart irrigation: The primetime for machine learning. Expert Syst. Appl. 2022, 207, 117653.
16. Heddam, S. Chapter 3: New formulation for predicting soil moisture content using only soil temperature as predictor: Multivariate adaptive regression splines versus random forest, multilayer perceptron neural network, M5Tree, and multiple linear regression. In Water Engineering Modeling and Mathematic Tools; Samui, P., Bonakdari, H., Deo, R., Eds.; Elsevier: Amsterdam, The Netherlands, 2021; pp. 45–62.
17. Chatterjee, S.; Dey, N.; Sen, S. Soil moisture quantity prediction using optimized neural supported model for sustainable agricultural applications. Sustain. Comput. Inform. Syst. 2020, 28, 100279.
18. Cai, Y.; Zheng, W.; Zhang, X.; Zhangzhong, L.; Xue, X. Research on soil moisture prediction model based on deep learning. PLoS ONE 2019, 14, e0214508.
19. Dolaptsis, K.; Pantazi, X.E.; Paraskevas, C.; Arslan, S.; Tekin, Y.; Bantchina, B.B.; Ulusoy, Y.; Gündoğdu, K.S.; Qaswar, M.; Bustan, D.; et al. A Hybrid LSTM Approach for Irrigation Scheduling in Maize Crop. Agriculture 2024, 14, 210.
20. Kouadri, S.; Pande, C.B.; Panneerselvam, B.; Moharir, K.N.; Elbeltagi, A. Prediction of irrigation groundwater quality parameters using ANN, LSTM, and MLR models. Environ. Sci. Pollut. Res. 2022, 29, 21067–21091.
21. Kandamali, D.F.; Cao, X.; Tian, M.; Jin, Z.; Dong, H.; Yu, K. Machine learning methods for identification and classification of events in ϕ-OTDR systems: A review. Appl. Opt. 2022, 61, 2975–2997.
22. Prasad, R.; Deo, R.C.; Li, Y.; Maraseni, T. Soil moisture forecasting by a hybrid machine learning technique: ELM integrated with ensemble empirical mode decomposition. Geoderma 2018, 330, 136–161.
23. Datta, P.; Faroughi, S.A. A multihead LSTM technique for prognostic prediction of soil moisture. Geoderma 2023, 433, 116452.
24. Li, Q.; Zhu, Y.; Shangguan, W.; Wang, X.; Li, L.; Yu, F. An attention-aware LSTM model for soil moisture and soil temperature prediction. Geoderma 2022, 409, 115651.
25. Jimenez, A.-F.; Ortiz, B.V.; Bondesan, L.; Morata, G.; Damianidis, D. Long Short-Term Memory Neural Network for irrigation management: A case study from Southern Alabama, USA. Precis. Agric. 2021, 22, 475–492.
26. Gao, P.; Xie, J.; Yang, M.; Zhou, P.; Chen, W.; Liang, G.; Chen, Y.; Han, X.; Wang, W. Improved soil moisture and electrical conductivity prediction of citrus orchards based on IoT using deep bidirectional LSTM. Agriculture 2021, 11, 635.
27. Li, Q.; Li, Z.; Shangguan, W.; Wang, X.; Li, L.; Yu, F. Improving soil moisture prediction using a novel encoder-decoder model with residual learning. Comput. Electron. Agric. 2022, 195, 106816.
28. Wu, Z.; Cui, N.; Zhang, W.; Liu, C.; Jin, X.; Gong, D.; Xing, L.; Zhao, L.; Wen, S.; Yang, Y. Estimating soil moisture content in citrus orchards using multi-temporal Sentinel-1A data-based LSTM and PSO-LSTM models. J. Hydrol. 2024, 637, 131336.
29. Guo, R.; Zhao, Z.; Wang, T.; Liu, G.; Zhao, J.; Gao, D. Degradation state recognition of piston pump based on ICEEMDAN and XGBoost. Appl. Sci. 2020, 10, 6593.
Figure 1. Data acquisition and pre-processing workflow.
Figure 2. Distributions of the model input features collected from the South Pivot field of the D.A.T.A farm between 11 March and 31 December 2024.
Figure 3. LSTM architecture with input gate (i_t), forget gate (f_t), and output gate (o_t).
Figure 4. XGBoost architecture [29].
Figure 5. Proposed method architecture showing stacked LSTM layers for feature extraction and an XGBoost regressor for final prediction.
Figure 6. MSE loss curve for LSTM + XGBoost.
Figure 7. MSE loss curve for LSTM + RF.
Figure 8. MSE loss curves for all hybrid models.
Figure 9. Actual vs. predicted soil moisture values for 168 h using standalone ML models.
Figure 10. Actual vs. predicted soil moisture values for 168 h using standalone DL models.
Figure 11. Actual vs. predicted soil moisture values for 168 h using hybrid models.
Table 1. Locations of the Watermark soil moisture sensors (4D01–4D14) and the weather station on the D.A.T.A farm.
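| Field Name | Device Name | Latitude (°) | Longitude (°) |
|---|---|---|---|
| Front Field | 4D01 | 31.48449 | −83.5452 |
| Front Field | 4D02 | 31.4845 | −83.544 |
| North Pivot | 4D03 | 31.4837728 | −83.5477376 |
| North Pivot | 4D04 | 31.4841216 | −83.5486016 |
| North Pivot | 4D05 | 31.4839808 | −83.5469888 |
| North Pivot | 4D06 | 31.4844096 | −83.5479488 |
| South Pivot | 4D07 | 31.4826464 | −83.5490432 |
| South Pivot | 4D08 | 31.4819584 | −83.5483456 |
| South Pivot | 4D09 | 31.4828736 | −83.54864 |
| South Pivot | 4D10 | 31.4822336 | −83.5472512 |
| West Field | 4D11 | 31.4814976 | −83.5514752 |
| West Field | 4D12 | 31.4814336 | −83.5508224 |
| West Field | 4D13 | 31.4826112 | −83.5502144 |
| West Field | 4D14 | 31.4826048 | −83.5510784 |
| | Weather Station | 31.48509752 | −83.54801174 |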
Table 2. The proposed LSTM architecture.
| Layer | Output Shape | Parameters |
|---|---|---|
| LSTM | (32, 24, 256) | 271,360 |
| LSTM | (32, 24, 128) | 197,120 |
| LSTM | (32, 64) | 49,408 |
| Dropout | (32, 64) | 0 |
| Dense | (32, 256) | 16,640 |
| Dense | (32, 168) | 43,296 |
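The parameter counts in Table 2 follow the standard formulas for LSTM and dense layers and can be used to sanity-check the architecture. The sketch below checks the LSTM rows and the first Dense row under stated assumptions. It assumes a Keras-style LSTM with one bias vector per gate; PyTorch's `nn.LSTM` stores two bias vectors, so its counts would differ. It also assumes 8 input features per time step. Table 2 does not state that number; it is inferred here as the only value that reproduces 271,360 for the first layer. The leading dimensions (32, 24) are read as a batch of 32 sequences of 24 time steps.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Trainable parameters of one LSTM layer with a single bias vector per gate:
    4 gates x (input weights + recurrent weights + bias)."""
    return 4 * units * (input_dim + units + 1)


def dense_params(input_dim: int, units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return units * (input_dim + 1)


N_FEATURES = 8  # assumption: inferred from the first LSTM row of Table 2

# (layer, computed parameters, value listed in Table 2)
LAYERS = [
    ("LSTM(256)", lstm_params(N_FEATURES, 256), 271_360),
    ("LSTM(128)", lstm_params(256, 128), 197_120),
    ("LSTM(64)", lstm_params(128, 64), 49_408),
    ("Dense(256)", dense_params(64, 256), 16_640),
]

if __name__ == "__main__":
    for name, computed, listed in LAYERS:
        print(f"{name:<11} computed={computed:>7,} listed={listed:>7,}")
```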
Table 3. Multistep soil moisture prediction results using standalone ML models.
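| Model | MSE (24 h) | MAE (24 h) | R² (24 h) | MSE (72 h) | MAE (72 h) | R² (72 h) | MSE (168 h) | MAE (168 h) | R² (168 h) |
|---|---|---|---|---|---|---|---|---|---|
| RF | 2.1862 | 0.6793 | 0.9474 | 2.3530 | 0.8439 | 0.9407 | 2.6979 | 0.8953 | 0.9173 |
| SVR | 26.389 | 3.0212 | 0.3469 | 26.629 | 3.0228 | 0.3295 | 27.437 | 3.0700 | 0.3119 |
| DT | 5.7204 | 0.8667 | 0.8624 | 6.8018 | 0.9684 | 0.8286 | 6.9688 | 0.9640 | 0.7864 |
| GB | 3.9939 | 1.0681 | 0.9044 | 6.2844 | 1.6528 | 0.8380 | 6.6368 | 1.7598 | 0.7829 |
| MLP | 22.674 | 3.2572 | 0.4545 | 26.483 | 3.5384 | 0.3327 | 26.781 | 3.5319 | 0.1792 |
| LR | 11.671 | 1.7389 | 0.7192 | 23.604 | 3.0455 | 0.4052 | 25.253 | 3.3593 | 0.2260 |
| AdaBoost | 11.301 | 2.8148 | 0.7290 | 17.925 | 3.6179 | 0.5358 | 15.839 | 3.3744 | 0.4949 |
| ET | 1.6417 | 0.4538 | 0.9673 | 1.2966 | 0.5183 | 0.9605 | 1.4349 | 0.5450 | 0.9560 |
| RR | 11.667 | 1.7385 | 0.7208 | 23.598 | 3.0451 | 0.3902 | 25.247 | 3.3589 | 0.1779 |
| XGBoost | 3.3839 | 0.7440 | 0.9186 | 3.2386 | 0.9679 | 0.9184 | 3.3175 | 1.0118 | 0.9083 |
| ElasticNet | 11.198 | 1.6929 | 0.7306 | 22.536 | 2.9726 | 0.4322 | 23.895 | 3.2742 | 0.2677 |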
Table 4. Result summary for DL models in multistep soil moisture prediction.
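| Model | MSE (24 h) | MAE (24 h) | R² (24 h) | MSE (72 h) | MAE (72 h) | R² (72 h) | MSE (168 h) | MAE (168 h) | R² (168 h) |
|---|---|---|---|---|---|---|---|---|---|
| LSTM | 0.0007 | 0.0156 | 0.9781 | 0.0007 | 0.0158 | 0.9771 | 0.0006 | 0.0146 | 0.9738 |
| Bi-LSTM | 0.0005 | 0.0128 | 0.9662 | 0.0005 | 0.0126 | 0.9652 | 0.0005 | 0.0122 | 0.9627 |
| CNN | 0.0005 | 0.0136 | 0.9748 | 0.0005 | 0.0135 | 0.9741 | 0.0005 | 0.0131 | 0.9710 |
| GRU | 0.0014 | 0.0234 | 0.9311 | 0.0015 | 0.0228 | 0.9294 | 0.0015 | 0.0240 | 0.9098 |
| RNN | 0.0020 | 0.0269 | 0.9063 | 0.0023 | 0.0299 | 0.8878 | 0.0023 | 0.0301 | 0.8660 |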
Table 5. Result summary for hybrid LSTM models for multistep soil moisture prediction.
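| Model | MSE (24 h) | MAE (24 h) | R² (24 h) | MSE (72 h) | MAE (72 h) | R² (72 h) | MSE (168 h) | MAE (168 h) | R² (168 h) |
|---|---|---|---|---|---|---|---|---|---|
| LSTM + RF | 0.0003 | 0.0086 | 0.9834 | 0.0004 | 0.0093 | 0.9827 | 0.0003 | 0.0082 | 0.9822 |
| LSTM + LR | 0.0006 | 0.0148 | 0.9719 | 0.0006 | 0.0153 | 0.9693 | 0.0006 | 0.0152 | 0.9652 |
| LSTM + XGBoost | 0.0003 | 0.0078 | 0.9867 | 0.0003 | 0.0080 | 0.9856 | 0.0002 | 0.0074 | 0.9854 |
| LSTM + KNN | 0.0003 | 0.0072 | 0.9838 | 0.0004 | 0.0078 | 0.9824 | 0.0003 | 0.0068 | 0.9821 |
| LSTM + MLP | 0.0015 | 0.0234 | 0.9271 | 0.0017 | 0.0239 | 0.9219 | 0.0014 | 0.0230 | 0.9193 |
| LSTM + DT | 0.0009 | 0.0108 | 0.9590 | 0.0009 | 0.0114 | 0.9579 | 0.0007 | 0.0102 | 0.9567 |
| CNN + LSTM | 0.0007 | 0.0151 | 0.9685 | 0.0006 | 0.0147 | 0.9699 | 0.0006 | 0.0150 | 0.9659 |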
Table 6. Comparative summary of hybrid LSTM models from related studies and the proposed method.
| Study | Year | Model | Dataset | Prediction Horizon | Results (R²) | Description |
|---|---|---|---|---|---|---|
| [9] | 2023 | Conv-LSTM | ERA5-Land | Unspecified | 92% | No multistep prediction; focused on spatial features. |
| [27] | 2022 | EDT-LSTM | FLUXNET 2015 | 1–10 days | 89.6% (1 day); 58.3% (10 days) | Good short-term prediction accuracy; performance declines with longer horizons. |
| [28] | 2024 | PSO-LSTM | Sentinel-1A satellite data | Unspecified | 86.8–96.8% | No multistep prediction. |
| [10] | 2024 | FA-LSTM | International Soil Moisture Network (ISMN) | 1–7 days | 94.3% (1 day); 81.7% (7 days) | Good short-term prediction accuracy; performance declines with longer horizons. |
| [10] | 2024 | GAN-LSTM | International Soil Moisture Network (ISMN) | 1–7 days | 94.4% (1 day); 81.7% (7 days) | Good short-term prediction accuracy; performance declines with longer horizons. |
| This study | 2025 | LSTM-XGBoost | D.A.T.A farm; historical soil moisture and weather data | 1–7 days | 98.67% (1 day); 98.54% (7 days) | Improved short- and long-term prediction accuracy. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kandamali, D.F.; Porter, E.; Porter, W.M.; McLemore, A.; Kiobia, D.O.; Tavandashti, A.P.; Rains, G.C. Hybrid LSTM Method for Multistep Soil Moisture Prediction Using Historical Soil Moisture and Weather Data. AgriEngineering 2025, 7, 260. https://doi.org/10.3390/agriengineering7080260

AMA Style

Kandamali DF, Porter E, Porter WM, McLemore A, Kiobia DO, Tavandashti AP, Rains GC. Hybrid LSTM Method for Multistep Soil Moisture Prediction Using Historical Soil Moisture and Weather Data. AgriEngineering. 2025; 7(8):260. https://doi.org/10.3390/agriengineering7080260

Chicago/Turabian Style

Kandamali, Deus F., Erin Porter, Wesley M. Porter, Alex McLemore, Denis O. Kiobia, Ali P. Tavandashti, and Glen C. Rains. 2025. "Hybrid LSTM Method for Multistep Soil Moisture Prediction Using Historical Soil Moisture and Weather Data" AgriEngineering 7, no. 8: 260. https://doi.org/10.3390/agriengineering7080260

APA Style

Kandamali, D. F., Porter, E., Porter, W. M., McLemore, A., Kiobia, D. O., Tavandashti, A. P., & Rains, G. C. (2025). Hybrid LSTM Method for Multistep Soil Moisture Prediction Using Historical Soil Moisture and Weather Data. AgriEngineering, 7(8), 260. https://doi.org/10.3390/agriengineering7080260
