Enhancing Subsurface Soil Moisture Forecasting: A Long Short-Term Memory Network Model Using Weather Data

: Subsurface soil moisture is a primary determinant for root development and nutrient transportation in the soil and affects the tractability of agricultural vehicles. A statistical forecasting model, Vector AutoRegression (VAR), and a Long Short-Term Memory network (LSTM) were developed to forecast the subsurface soil moisture at a 20 cm depth using 9 years of historical weather data and subsurface soil moisture data from Fort Wayne, Indiana, USA. A time series analysis showed that the weather data and soil moisture have a stationary seasonal tendency and demonstrated that soil moisture can be forecasted from weather data. The VAR model estimates volumetric soil moisture of one-day ahead with an R 2 , MAE (m 3 m − 3 ), MSE (m 6 m − 6 ), and RMSE (m 3 m − 3 ) of 0.698, 0.0561, 0.0046, and 0.0382 for 2021 corn cropping season, whereas the LSTM model using inputs of previous seven days yielded R 2 , MAE (m 3 m − 3 ), MSE (m 6 m − 6 ), and RMSE (m 3 m − 3 ) of 0.998, 0.00237, 0.00002, and 0.00382, respectively as tested for cropping season of 2020 and 0.973, 0.00368, 0.00003 and 0.00577 as tested for the cropping season of 2021. The LSTM model presents a viable data-driven alternative to traditional statistical models for forecasting subsurface soil moisture.


Introduction
The subsurface soil layer (depths of 15-30 cm) plays a crucial role in crop root growth and nutrient absorption, making it a vital zone for optimal plant development [1].Additionally, this depth serves as a primary tillage depth for many field crops.However, it is susceptible to compaction and the formation of a compaction layer due to factors such as wetness and surface soil traffic, which can alter the soil's physical properties and hinder water flow to deeper layers [2].
Soil moisture (SM) or soil water content is a crucial hydrological variable that profoundly impacts water availability for crops and the interaction between land surface and atmospheric processes [3].Its influence extends to various aspects encompassing both engineering applications and plant life [4].Soil water plays a pivotal role in regulating plant growth, nutrient flow, and microbial processes within the soil while also significantly affecting the tractability of agricultural machinery [5][6][7].Since the moisture content of the soil directly affects its trafficability [8] and extent of compaction [6,7], there is a need to predict soil moisture status throughout the cropping season to improve logistics considering timeliness-especially during planting.Precise soil moisture measurement and predictive modeling also provide important insights into anticipated processes, such as infiltration and runoff generation following precipitation events, and inform optimal agricultural water management [9].
There are several physics-based models for estimating soil moisture.Saxton et al. (1974) developed the Soil-Plant-Atmosphere-Water (SPAW) model [10].The USDAHL model by the U.S. Department of Agriculture Hydrograph Laboratory [11] and the Sacramento Soil Moisture Accounting (SAC-SMA) Model [12] used by the National Weather Service River Forecast System (NWSRFS) are some examples of soil moisture estimation models.Recently, some advanced models are also introduced as Soil Water Infiltration and Movement (SWIM3) of the Agricultural Production Systems Simulator (APSIM) developed by the Agricultural Production Systems Research Unit in Australia [13] and the Integrated Farming System Model (IFSM [14]), developed by the United States Department of Agriculture, Soil Temperature and Moisture model (STM2 [9]) developed by the Agricultural Research Service (USDA ARS), etc. Ascertaining the necessary physical parameters can be a challenge during the implementation of these models [9].
Machine learning approaches to estimating and forecasting soil moisture are becoming popular as agriculture and climatology adopt artificial intelligence tools and techniques.Several research projects regarding soil moisture forecasting have also been undertaken using Artificial Neural Networks (ANN) [15,16], Extreme Learning Machine (ELM) [17], multivariate relevance vector machine [18], and random forest [19].Sequential analysis of historic soil moisture data is also being used to forecast soil moisture.Recurrent Neural Networks (RNN), Long-Short Term Memory (LSTM) networks, and Gated Recurrent Unit (GRU) models are popular for predicting soil moisture from the historic regional soil moisture record.Prakash et al. (2018) compared several machine learning techniques, including multiple linear regression, support vector machines, and RNNs, and found multiple linear regression superior in forecasting surface soil moisture one, two, and seven days into the future.They also found that RNNs are effective in sequential follow-ups for forecasting soil moisture [20].
Statistical models, such as Autoregressive Integrated Moving Average (ARIMA), seasonal autoregressive integrated moving average (SARIMA), Vector Autoregression (VAR), and advanced RNNs like LSTM networks are also used for regional soil moisture forecasting depending on time series data.Wang et al. (2023) studied ARIMA and Back Propagation neural network model and found that a combination of the two gives superior forecasting accuracy than individual models [21].Singh et al. (2020) used LSTM for regional soil moisture forecasting based on previous history for 5-25 cm soil depth [22].A hybrid CNN-GRU model, a combination of CNN (Convolutional Neural Network) and GRU, was developed by Yu et al. (2021) to predict soil moisture in the corn root zone.They also used the historic soil moisture data for regional soil moisture forecasting [23].Jiang et al. (2022) developed an LSTM model and PCA (Principal Component Analysis) model to forecast surface soil moisture from historical soil moisture data and weather variables on a regional basis and found LSTM to perform better with an absolute percentage error of 0.27%.Machine learning techniques can be a replacement for physical forecasting models [24].These models were univariate and used historic soil moisture to forecast.The Soil and Water Assessment Tool (SWAT) model predicts root zone soil moisture based on past soil moisture data, rainfall, evapotranspiration, percolation, bypass flow, and return flow [25].While this model is multivariate, it was and still is primarily used for soil moisture estimation rather than forecasting [25].
Subsurface soil moisture is governed by a combination of topography, soil physical properties, and weather conditions.While topography and soil physical properties remain constant for a specific location, weather factors, such as precipitation, solar radiation, atmosphere temperature, wind velocity, and relative humidity, vary over time.As surface soil moisture is influenced by weather conditions, it, in turn, impacts the subsurface soil moisture dynamics [26].Therefore, understanding the interplay between weather parameters and subsurface soil moisture is crucial for accurate predictions.Several recent studies have demonstrated the potential of using machine learning models for subsurface soil moisture forecasting, often incorporating weather variables as key variables.Carranza et al. (2021) achieved a high level of precision (R 2 = 0.7) in forecasting root zone soil water content using a Random Forest model [27].Basak et al. (2023) introduced innovative approaches, Naive Accumulative Representation (NAR) and Additive Exponential Accumulative Rep-resentation (AEAR), in models forecasting soil moisture across different soil depths.They compared these approaches with LSTM models for short-term soil moisture predictions, using rainfall and soil moisture as primary inputs [28].A et al. [29] and Santos et al. [30] also developed machine learning models for subsurface soil moisture forecasts, and they used rainfall and historical soil moisture data as inputs.While these studies primarily utilized rainfall as a weather variable, they collectively underscore the potential of developing robust subsurface soil moisture models leveraging a wider array of weather-related variables, such as temperature, humidity, and more.
In contemporary agricultural and environmental research, statistical models and the integration of machine learning methodologies have become increasingly prevalent, particularly for forecasting applications.Hou et al. (2023) developed VAR models to forecast evapotranspiration [31].Abdallah et al. (2020) used VAR for short-term weather forecasts and achieved 96.7% precision [32].Bahari et al. (2023) studied artificial intelligence techniques for sea level forecasts.They mentioned that the Relevance Vector Machine (RVM) is an efficient standalone algorithm for sea level forecasting.CNN, RNN, and LSTM were compared, and LSTM models were most efficient in pattern-following forecasts [33].Hybrid models are also used in forecasting gridded time series data (for example, spatialtemporal forecast), and these combined models perform better than individual ML and DL algorithms in grid forecasting [33].Wai et al. (2022) reviewed several types of series forecasting deep learning models, including RNN, LSTM, CNN, GRU, and Temporal Convolutional Network (TCN).In their use cases, LSTM was effective in following long patterns [34].Ng et al. (2023) showed that blending deep learning with physics-based models increased accuracy.They found that deep learning models are subject to data availability and characteristics of available data.In forecasting image-based data, they recommended to study attention-based LSTM models [35].Fan et al. (2020) found that LSTM models for 1 day runoff forecasting were more accurate than ANN [36].Although these studies are not specific to soil moisture, they provide evidence that LSTM models are effective in pattern following with non-gridded series data.
Given the interdependence between surface and subsurface soil moisture, influenced by dynamic weather conditions, the goal of this study was to explore their relationship and forecasting potential.Weather factors such as rainfall, relative humidity, wind speed, air temperature, and solar radiation are key focus areas in this investigation.Moreover, recognizing subsurface soil moisture's pivotal role in field trafficability, especially in the context of root zone soil compaction and farm management, the objectives were: i. to establish the forecastability of subsurface soil moisture (volumetric water content) depending on weather conditions and ii. to build a VAR and an LSTM model for subsurface soil moisture forecasting and compare their prediction accuracy.

Study Area and Data Used
The dataset utilized in this research originates from a weather station located in Fort Wayne, Indiana, USA.The data span from 22 September 2011 to 9 September 2021, comprising a total of 87,362 samples.The selected data elements for this study include key meteorological variables such as ambient temperature ( • C), relative humidity (%), solar radiation (Wm −2 ), wind speed (ms −1 ), total rainfall (mm), and the volumetric water content (VWC) (m 3 m −3 ) of subsurface soil at a depth of 30 cm.The dataset was obtained from the Indiana Geological and Water Survey, Indiana University [37].
The dataset consists of daily records (daily average of hourly measurements) providing comprehensive insights into the temporal variation of these variables.There is a gap in the soil moisture time series data, specifically from 23 March 2015, 11:10 a.m., to 21 April 2015, 1:10 p.m., where the information is missing.
Figure 1 illustrates the geographical position of the study area, while Table 1 presents a succinct overview of the soil conditions at the site, providing valuable context for this research.
Figure 1 illustrates the geographical position of the study area, while Table 1 presents a succinct overview of the soil conditions at the site, providing valuable context for this research.The data were cleaned for model development after conducting time series forecastability and interdependency analysis by removing the missing data points (rows containing cells with no data or 'NaN') from the dataset.The data were cleaned for model development after conducting time series forecastability and interdependency analysis by removing the missing data points (rows containing cells with no data or 'NaN') from the dataset.

Time Series Analysis
To develop forecasting models for a time series variable based on other time series variables, it is necessary to test their forecastability and examine the relationship between the dependent variable and the independent input variables (predictors).Subsurface soil water content was the dependent variable, which was subjected to forecasting depending on input variables, i.e., rainfall, air/ambient temperature, wind speed, relative humidity, and solar radiation.Due to the time series analysis result, the soil water content of the previous days was also considered as an input variable.
Agriculture 2024, 14, 333 5 of 24 Some statistical tests were carried out to analyze the time series of variables to validate that the series is autocorrelated, stationary, cointegrated to soil moisture series, and interdependent.Table 2 shows the tests used for this purpose.To be a pattern-following time series, variables need to be autocorrelated.The Durbin-Watson test shows whether the variables were autocorrelated or not [45].Also, an additive seasonal decompose of soil moisture time series was conducted to check whether the trend of subsurface soil moisture increased or decreased throughout the recorded time and the seasonal pattern of soil moisture to support the forecastability of soil moisture.Both inputs and dependent variables need to be stationary to be forecasted, i.e., they do not depend on the observation time within the series [46,47].Johansen's cointegration test was conducted to evaluate the cointegration of the variables.This test refers to the relation between two or more time series variables.Another test for the interdependency of input variables and the target variable was determined by the Granger causality test.Seasonal decomposition was conducted to check if the moisture content data have any trend or seasonal changing pattern.It also gives insight into stationarity.

Model Development for Subsurface Soil Moisture Forecasting
Based on the results of the time series analysis, it was discerned that strong autocorrelation and cointegration exist within the subsurface soil moisture data.Consequently, a decision was made to employ two distinct algorithms: a statistical model, VAR [48], and a machine learning model, the LSTM network.VAR is recognized for its aptitude in managing cointegrated time series data, rendering it an appropriate choice for this particular context.Conversely, LSTM networks are distinguished for their proficiency in capturing intricate temporal patterns, thus enabling the effective utilization of multiple time series inputs for enhanced predictive capabilities.The steps of developing the models are briefly described in Figure 2. The dataset of historical weather and VWC was cleaned, the missing data rows were deleted, and the dataset was split into two parts: training and testing.The models were developed using the training dataset, and their forecasting performance was evaluated using the testing dataset.

Development of a VAR Model
The VAR model requires the temporal input and output data to be stationary and autocorrelated.The method of testing stationarity, autocorrelation, and forecastability was described and demonstrated in the previous section, and depending on the statistical test results, a multivariate time series forecasting model was attempted to develop using the VAR [48] algorithm.Three steps were followed to develop a VAR model to forecast the subsurface soil moisture.The determination of the order of VAR using AIC (Akaike Information Criteria) [49], Running the VAR algorithm with selected order to train the VAR model with 10 years of historical weather and subsurface soil moisture data and validation of the model by forecasting one day throughout the next one year.The model was developed using the 'Statmodels' tool in Python language.Figure 3

Development of a VAR Model
The VAR model requires the temporal input and output data to be stationary and autocorrelated.The method of testing stationarity, autocorrelation, and forecastability was described and demonstrated in the previous section, and depending on the statistical test results, a multivariate time series forecasting model was attempted to develop using the VAR [48] algorithm.Three steps were followed to develop a VAR model to forecast the subsurface soil moisture.The determination of the order of VAR using AIC (Akaike Information Criteria) [49], Running the VAR algorithm with selected order to train the VAR model with 10 years of historical weather and subsurface soil moisture data and validation of the model by forecasting one day throughout the next one year.The model was developed using the 'Statmodels' tool in Python language.

Selection of Lag Order
The selection of the lag order or length for the VAR model was conducted by assessing the AIC (Akaike Information Criteria) value [49][50][51][52].The computed AIC values are provided in Table 3.

Selection of Lag Order
The selection of the lag order or length for the VAR model was conducted by assessing the AIC (Akaike Information Criteria) value [49][50][51][52].The computed AIC values are provided in Table 3.As the AIC value shows a minimum at the 9th lag, so 9 was selected as the order of VAR.

Construction of Model Equation
The VAR model was built using the selected lag order 9.The equation (Equation ( 1)) developed for the VAR model using 9 as lag is as follows [53]: where Y t = output vector at time t (here subsurface moisture content), A 0 = a constant value (vector intercept), A v,l = coefficient matrix for each variable and lag combination, Y (t−l) = vector of exogenous variables, e t = residual vector at time t.

VAR Model Evaluation
The VAR model was trained with historical weather data and subsurface soil water data from 2011 to 2020.The model accuracy was tested using the following year's forecasts of corn cropping season data (March-September 2021).The evaluation was performed by measuring the mean squared error (MSE) [54], mean absolute error (MAE), and root mean squared error (RMSE) [55].The goodness-of-fit parameter (R 2 ) was also calculated.The Durbin-Watson test for the residuals was conducted to check if there was any autocorrelation among the residuals.

Development of an LSTM Model
A simple LSTM network model was also developed for the purpose of forecasting subsurface soil moisture.This model leverages input variables, including rainfall, air temperature, relative humidity, solar radiation, wind speed, and the historical soil water content data from the preceding n days.The normalized (min-max scaled) training dataset encompasses a daily record spanning nine years (22 September 2011 to 10 September 2019) and tested with the following two years (11 September 2019 to 9 September 2021) daily basis data.

Architecture of the LSTM Model
The LSTM network model was implemented using the Python programming language within the Jupiter Notebook environment of the Anaconda Navigator Software (Anaconda3, developer: Continuum Analytics).This model was designed to accommodate six input variables, resulting in an input layer with six neurons.The output layer was configured with a single neuron, as the sole objective was to forecast subsurface soil water content.The model architecture incorporated five hidden layers and employed the 'Adam' optimizer [56] to estimate the error or loss for training and optimization purposes, which were set by trial and error according to the model's best fit.A succinct overview of the model's structure is presented in Table 4.In this model, three LSTM layers and two dense layers were used.LSTM layers forward the trend of data and the LSTM cell value to the next layer, and the dense layer forwards the dense output of the dense layer to the next layer [57].The three LSTM layers used a tanh activation function that scales the output of those layers between −1 and 1, where ReLU makes the output maximized and passes only the positive values to the next layer [58][59][60].The output layer was given a linear activation so that it may pass the dense output as it was generated.The architecture of the developed LSTM model is shown in Figure 4.

Training the LSTM Model
The model was trained to attain minimal training and validation losses, with their estimation being refined through a systematic trial-and-error approach.The Mean Absolute Error (MAE) served as the loss function and was optimized to minimize deviations.Additionally, trial-and-error was conducted to estimate the optimal number of hidden layers, aiming to identify the configuration that yielded the lowest loss values.
The constructed LSTM network model was trained using 2883 inputs from a ground truth dataset of subsurface soil water content and weather variables, where 20% of the series data was used for validation.The training and validation set was split into blocks containing n rows of data as a single input and the following one of soil water content as output.The sliding window algorithm was used to make the batch input for training the model so that the model should be able to forecast the soil moisture of the (n + 1)th day as an output of feeding the previous n days' input data.The model was optimized by minimizing validation errors.The model training and validation phase were programmed to iterate over 90 epochs, aiming to identify the most appropriate parameter combination that resulted in minimal training and validation errors while achieving optimal accuracy.

Overfitting and Underfitting Test of LSTM Model
The LSTM model involves a trial-and-error process to determine optimal hyperparameters and the number of hidden layers, which can lead to overfitting or underfitting issues.Underfitting occurs when the model inadequately fits the training data, resulting

Training the LSTM Model
The model was trained to attain minimal training and validation losses, with their estimation being refined through a systematic trial-and-error approach.The Mean Absolute Error (MAE) served as the loss function and was optimized to minimize deviations.Additionally, trial-and-error was conducted to estimate the optimal number of hidden layers, aiming to identify the configuration that yielded the lowest loss values.
The constructed LSTM network model was trained using 2883 inputs from a ground truth dataset of subsurface soil water content and weather variables, where 20% of the series data was used for validation.The training and validation set was split into blocks containing n rows of data as a single input and the following one of soil water content as output.The sliding window algorithm was used to make the batch input for training the model so that the model should be able to forecast the soil moisture of the (n + 1)th day as an output of feeding the previous n days' input data.The model was optimized by minimizing validation errors.The model training and validation phase were programmed to iterate over 90 epochs, aiming to identify the most appropriate parameter combination that resulted in minimal training and validation errors while achieving optimal accuracy.

Overfitting and Underfitting Test of LSTM Model
The LSTM model involves a trial-and-error process to determine optimal hyperparameters and the number of hidden layers, which can lead to overfitting or underfitting issues.Underfitting occurs when the model inadequately fits the training data, resulting in poor performance during validation; this is typically indicated by a much lower validation R 2 value compared to the training R 2 .Conversely, achieving high R 2 values in both training and validation does not guarantee a well-fitted model, as there could be overfitting, wherein the model excessively memorizes the training data.In these instances, the model lacks the ability to generalize.Overfitting can arise due to model complexity, including unnecessary hidden layers.To combat overfitting and underfitting in the developed LSTM model, an early stopping callback function was utilized during LSTM training to halt the process at the optimal epoch.Furthermore, in assessing overfitting and underfitting, two parameters were measured: Cost Delta (the difference between training and validation MAE) and the Overfitting Ratio (OR) [61]: (2) The model was optimized to obtain a minimum cost delta, ensuring minimal training and validation loss.An Overfitting Ratio (OR) close to 1 indicates balanced model performance-achieving a level of equilibrium between the model's fitting to the training data and its generalization to unseen data [62].

Testing the LSTM Model
The model's performance was evaluated by comparing its anticipated subsurface soil moisture values with actual data from a separate validation dataset that had not been employed during the training and validation phases.The testing was performed using two years of data, including the cropping season of corn (1 March 2020 to 25 October 2020 and 2 March 2021 to 9 September 2021).
The model's performance was assessed by employing statistical error measures that quantify the disparities between anticipated and actual values.Statistical goodness-of-fit measures and error parameters such as the coefficient of determination (R 2 ) [63], Mean Squared Error (MSE) [54], Root Mean Squared Error (RMSE) [55], and Overall Performance Index (OI) [64] were determined to evaluate the forecasting accuracy.A residual analysis of the normality of residual distribution and assumption of 'zero' mean of error was tested to evaluate the suitability of the constructed model.

Subsurface Soil Water Content in Response to Weather Variables
Figure 5 illustrates the relationship between VWC and the weather variables over time.These variables consistently exhibit discernible patterns that demonstrate a temporal correlation with subsurface soil moisture.These weather variables have individual effects on the rate of change in soil moisture based on physics; they also are interrelated due to seasonal patterns (i.e., their time series cross-correlation [65]).Rainfall has an obvious positive effect on subsurface soil water content.The soil water content directly increases with additional rainfall.In the case of relative humidity, it shows a slightly different scenario.Following the peak values of relative humidity, the change in soil moisture is noticed a few days later.When ambient temperature increases, soil moisture decreases (evapotranspiration) with a lag of a few days.Wind effect on soil moisture also depends on temperature and solar radiation.They are positively correlated, yet the graph shows a positive relation between wind speed and subsurface soil moisture.Solar radiation has a direct opposite relationship with soil moisture, albeit delayed, as seen in the figure.So, the variables are not instantaneously correlated, but soil moisture is related to the history of those variables.As expected, there is a seasonal pattern (cycle) in the time series of weather variables.spiration) with a lag of a few days.Wind effect on soil moisture also depends on temperature and solar radiation.They are positively correlated, yet the graph shows a positive relation between wind speed and subsurface soil moisture.Solar radiation has a direct opposite relationship with soil moisture, albeit delayed, as seen in the Figure .So, the variables are not instantaneously correlated, but soil moisture is related to the history of those variables.As expected, there is a seasonal pattern (cycle) in the time series of weather variables.

Analysis of Historical Subsurface Soil Moisture Data
The subsurface soil moisture data from 2011 to 2021 shows an average moisture content of 35.3%. Figure 6 depicts the distribution of subsurface soil moisture in months of a year.In the months of December to May, the subsurface soil moisture is higher than the average.It decreases from April to September due to warmer temperatures, lower rainfall, and moisture uptake due to vegetation during this part of the year.The soil moisture increases from October to April because of precipitation and low temperatures.

Analysis of Historical Subsurface Soil Moisture Data
The subsurface soil moisture data from 2011 to 2021 shows an average moisture content of 35.3%. Figure 6 depicts the distribution of subsurface soil moisture in months of a year.In the months of December to May, the subsurface soil moisture is higher than the average.It decreases from April to September due to warmer temperatures, lower rainfall, and moisture uptake due to vegetation during this part of the year.The soil moisture increases from October to April because of precipitation and low temperatures.

Analysis of Historical Subsurface Soil Moisture Data
The subsurface soil moisture data from 2011 to 2021 shows an average moisture content of 35.3%. Figure 6 depicts the distribution of subsurface soil moisture in months of a year.In the months of December to May, the subsurface soil moisture is higher than the average.It decreases from April to September due to warmer temperatures, lower rainfall, and moisture uptake due to vegetation during this part of the year.The soil moisture increases from October to April because of precipitation and low temperatures.

Seasonality of Moisture Content
The seasonal decomposition illustrated in Figure 7 demonstrates that there is a consistency among yearly cycles in the subsurface soil moisture level (seasonality, Figure 7b).This result verifies the result shown in Figure 5 that there are seasonal patterns in subsurface soil water content.Since there is no trend throughout the recorded time (across seasons), the data are stationary (Figure 7c).
The seasonal decomposition illustrated in Figure 7 demonstrates that there is a consistency among yearly cycles in the subsurface soil moisture level (seasonality, Figure 7b).This result verifies the result shown in Figure 5 that there are seasonal patterns in subsurface soil water content.Since there is no trend throughout the recorded time (across seasons), the data are stationary (Figure 7c).

Autocorrelation Test of Variables
Figure 8 shows the autocorrelation in subsurface soil moisture data.Figure 8a shows that as the autocorrelation factor decreases rapidly, the dataset comprises non-random and stationary data.Also, it depicts that the soil moisture of previous days ((n − l)th days) is positively correlated to the present value (nth day).The partial autocorrelation plot (Figure 8b) shows the significance of correlations of lags with the first lag [66].Here, the partial autocorrelation factor is significant for the 1st and second lag (near 1), and correlation remains significant (outside the shaded area of confidence) through the 6th lag (days) (Figure 8b).It shows a pattern where the subsequent lags exhibit a positive-negative-positive sequence in the partial autocorrelation.This pattern implies a cyclical or oscillatory behavior in the relationship between the variable and its past values, possibly indicating a recurring pattern or seasonality in the soil moisture data, which validates Figure 7.

Autocorrelation Test of Variables
Figure 8 shows the autocorrelation in subsurface soil moisture data.Figure 8a shows that as the autocorrelation factor decreases rapidly, the dataset comprises non-random and stationary data.Also, it depicts that the soil moisture of previous days ((n − l)th days) is positively correlated to the present value (nth day).The partial autocorrelation plot (Figure 8b) shows the significance of correlations of lags with the first lag [66].Here, the partial autocorrelation factor is significant for the 1st and second lag (near 1), and correlation remains significant (outside the shaded area of confidence) through the 6th lag (days) (Figure 8b).It shows a pattern where the subsequent lags exhibit a positive-negativepositive sequence in the partial autocorrelation.This pattern implies a cyclical or oscillatory behavior in the relationship between the variable and its past values, possibly indicating a recurring pattern or seasonality in the soil moisture data, which validates Figure 7.
The seasonal decomposition illustrated in Figure 7 demonstrates that there is a consistency among yearly cycles in the subsurface soil moisture level (seasonality, Figure 7b).This result verifies the result shown in Figure 5 that there are seasonal patterns in subsurface soil water content.Since there is no trend throughout the recorded time (across seasons), the data are stationary (Figure 7c).

Autocorrelation Test of Variables
Figure 8 shows the autocorrelation in subsurface soil moisture data.Figure 8a shows that as the autocorrelation factor decreases rapidly, the dataset comprises non-random and stationary data.Also, it depicts that the soil moisture of previous days ((n − l)th days) is positively correlated to the present value (nth day).The partial autocorrelation plot (Figure 8b) shows the significance of correlations of lags with the first lag [66].Here, the partial autocorrelation factor is significant for the 1st and second lag (near 1), and correlation remains significant (outside the shaded area of confidence) through the 6th lag (days) (Figure 8b).It shows a pattern where the subsequent lags exhibit a positive-negative-positive sequence in the partial autocorrelation.This pattern implies a cyclical or oscillatory behavior in the relationship between the variable and its past values, possibly indicating a recurring pattern or seasonality in the soil moisture data, which validates Figure 7.The Durbin-Watson test gives an interpretation of existing autocorrelation inside the time series variables.This test also establishes a series as an autocorrelated time series.[39,67].The test result shown in Table 5 shows that variables are positively correlated.The Augmented Dicky-Fuller (ADF) test is statistical evidence of stationarity.The ADF test result for all considered variables (inputs and target) is shown in Table 6.ADF statistics for all the variables are negative.The more negative the ADF statistics, the more the variables are likely to be stationary.ADF statistics value less than the critical value for a certain confidence interval (1%, 5%, or 10%) refers to a stationary pattern of data [40].Here, the ADF statistics value for all the variables except the air temperature is less than the critical value at a 1% confidence interval.The air temperature shows ADF statistics less than the critical value at a 5% confidence interval.Still, all the variables are significantly stationary.The p-values displayed in the table also prove the time series of each variable as stationary.The p-values are less than 0.01 for all the variables and less than 0.05 for air temperature.

Cointegration Test
Johansen's cointegration test gives the scenario if three or more time series are related to each other.The test result aimed to determine the statistical relationship between subsurface soil water content and the input weather variables is presented in Table 7. Johansen's cointegration test shows that the test statistics for the input variables are larger than the critical value at a 95% confidence level.So, it is an indicator for the variables to be cointegrated with soil water content [42,68].

Forecastability Test
The Granger causality test shown in Table 8 demonstrates that all the input variables may be useful in forecasting subsurface soil water content.It verifies the forecastability of time series in water content with weather inputs.The p-values below 0.01 indicate that the time series of water content is significantly dependent on the time series of weather inputs (data concerning moisture content depends on the past data concerning weather time series) [69].The statistical tests and analysis show that the subsurface soil moisture can be defined by the weather conditions, and it can be forecasted from historical weather data and previous data on subsurface soil moisture content.

Statistical Model: Vector Autoregression (VAR)
The VAR model (Equation ( 1)) constant and coefficients (A 0 and A n ) for all the input variables are shown in Table 9.Following Table 9, to forecast the subsurface water content at time t from weather and water content inputs, Equation (3) can be formed.

Evaluation of the VAR Model
The VAR model was evaluated by forecasting the subsurface moisture content of the following one-year (2020-2021) corn cropping season.Figure 9 shows the forecasted and recorded values of subsurface soil water content from 15 March 2021 to 9 September 2021.The model evaluation parameters are shown in Table 10.
and water content inputs, Equation (3) can be formed.

Evaluation of the VAR Model
The VAR model was evaluated by forecasting the subsurface moisture content of the following one-year (2020-2021) corn cropping season.Figure 9 shows the forecasted and recorded values of subsurface soil water content from 15 March 2021 to 9 September 2021.The model evaluation parameters are shown in Table 10.Despite error values (MAE, MSE, and RMSE) being small numerically and 69.8% of the variation being explained, VAR does not capture day-to-day variations (Figure 9).A ttest shows that the forecasted values of soil moisture were significantly different from the actual (p < 0.05).Despite error values (MAE, MSE, and RMSE) being small numerically and 69.8% of the variation being explained, VAR does not capture day-to-day variations (Figure 9).A t-test shows that the forecasted values of soil moisture were significantly different from the actual (p < 0.05).

The LSTM Network Model
The LSTM model was trained using the training dataset with an n = 7 days sliding window, and the eighth-day soil moisture was forecasted.The model was iterated for a total of 90 epochs, and the highest accuracy while simultaneously minimizing training and validation errors was achieved during the 51st iteration.The iteration process ended once the training loss reached a minimum of 3.12%, and the validation loss matched at 1.29%.The model's statistical metrics from the 51st epoch are detailed in Table 11.
Figure 10 portrays the loss minimization process for the entire model over the course of 51 epochs.The graph illustrates the declining trend of both training and validation losses, measured in terms of Mean Absolute Error (MAE), with each successive epoch.From Table 11, the high accuracy of the model at the 51st epoch is evidence of no underfitting.The cost delta, representing a 5.37% difference between validation and training errors, indicates a minimal divergence, thus suggesting the absence of overfitting.Typically, overfitting would lead to poor validation results.In this scenario, the validation R 2 (0.981) closely approximates the training R 2 (0.995), with an OR of 0.93, which indicates the model is balanced.The OR value exceeding unity is evidence of overfitting [62].In Figure 10, it is shown that the training and validation loss moves downward in the 51st epoch, and their difference also decreases at that point, and the fluctuation is lower.However, as the graph shows a little noise in the training and validation loss curve in the last six epochs, there can be a chance of being slightly overfitted [70], which was tested later in the model evaluation section.[61] 0.93

Evaluation of the LSTM Network Model
Figure 11 shows the forecasting accuracy of the LSTM network model for the corngrowing seasons of 2020 (Figure 11a) and 2021 (Figure 11b).The forecasts sometimes

Evaluation of the LSTM Network Model
Figure 11 shows the forecasting accuracy of the LSTM network model for the corngrowing seasons of 2020 (Figure 11a) and 2021 (Figure 11b).The forecasts sometimes overshoot VWC; curves of actual VWC are smoother than forecasts, but overall, the difference between actual and forecasted moisture content was small (MAPE = 0.91%, p > 0.05 for the 2020 crop season and MAPE = 0.97%, p > 0.05 for the 2021 crop season).
overshoot VWC; curves of actual VWC are smoother than forecasts, but overall ference between actual and forecasted moisture content was small (MAPE = 0. 0.05 for the 2020 crop season and MAPE = 0.97%, p > 0.05 for the 2021 crop season Figure 12 shows the model fit for the testing series.In both cases, the interce different from 0.0 (p < 0.001), and the slope is not different from 1.0 (p < 0.001).Th ness of model fit is also described in values in Table 12. Figure 12 shows the model fit for the testing series.In both cases, the intercept is not different from 0.0 (p < 0.001), and the slope is not different from 1.0 (p < 0.001).The goodness of model fit is also described in values in Table 12.    Figure 12 shows that the LSTM forecasts of soil moisture average slightly lower than the actual value.The model fit is R 2 = 0.998 and R 2 = 0.973 (Table 12) for the cropping seasons of 2020 and 2021, respectively.The combination of low MSE and RMSE, along with a high R-squared (R 2 ) value, indicates that this model can be an effective alternative to traditional empirical equations for soil moisture forecasting [71].The acceptable threshold for RMSE value is 10% [72], where the model forecasted soil moisture with RMSE of 0.3% and 0.5% for two consecutive years.The overall performance index of the model also shows a satisfactory result with OI = 0.992 and 0.966.As the model test result shows a good performance in terms of goodness-of-fit parameters, it can be said that the overfitting issue is negligible.
Figure 13 shows the residual distribution of the forecasted series.The distribution curve shows a little skew in the error distribution.Also, the maximum density of errors gathers near zero, with the mean of error being −0.001 and −0.0006 for the cropping seasons of 2020 and 2021, respectively, which are close to zero.So, it can be said that the model satisfies the 'zero' mean of error assumption to be a good forecasting model [73].

Parameters
Test Figure 12 shows that the LSTM forecasts of soil moisture average slightly low the actual value.The model fit is R 2 = 0.998 and R 2 = 0.973 (Table 12) for the c seasons of 2020 and 2021, respectively.The combination of low MSE and RMSE with a high R-squared (R 2 ) value, indicates that this model can be an effective alt to traditional empirical equations for soil moisture forecasting [71].The acceptable old for RMSE value is 10% [72], where the model forecasted soil moisture with R 0.3% and 0.5% for two consecutive years.The overall performance index of the mo shows a satisfactory result with OI = 0.992 and 0.966.As the model test result s good performance in terms of goodness-of-fit parameters, it can be said that the ov issue is negligible.
Figure 13 shows the residual distribution of the forecasted series.The dist curve shows a little skew in the error distribution.Also, the maximum density o gathers near zero, with the mean of error being −0.001 and −0.0006 for the cropp sons of 2020 and 2021, respectively, which are close to zero.So, it can be said model satisfies the 'zero' mean of error assumption to be a good forecasting mod (a) discusses that the LSTM model outperforms the multilayer perceptron network with an R 2 value of 0.80 to 0.98.That study was conducted only using soil water c of different soil depths [74].A comparison between the seasonal autoregressive inte moving average (SARIMA) and the LSTM model to forecast soil moisture profile conducted by Kumar and Rao (2023).They also found the LSTM model superior in of accuracy (R 2 = 0.99).These models were also univariate models [75].
The LSTM model in this study has a high R 2 value similar to the aforemen studies.Moreover, the inclusion of atmospheric variables alongside subsurface V essential due to the high correlation in the time series analysis and forecastability The LSTM model's capacity for pattern-following further enhances accuracy and re ity in forecasting.Given the dependencies and pattern-following characteristics, it ited that a multiple-variable LSTM model serves as a superior alternative to sta models such as SARIMA and VAR.

Conclusions
Subsurface soil moisture forecasting plays a pivotal role in enhancing farming agement, particularly concerning issues of soil compaction and vehicle trafficabi fields.Weather variables were significantly correlated and cointegrated with subs soil moisture, establishing a direct time series dependency.Furthermore, the time analysis found a dependency of subsurface soil moisture value on its previous v hence establishing the feasibility of forecasting soil water content from historical soil and weather variables.Considering the dependency of subsurface soil moistu weather variables, two multivariate time series forecast models (VAR and LSTM  (2021) discusses that the LSTM model outperforms the multilayer perceptron network model with an R 2 value of 0.80 to 0.98.That study was conducted only using soil water content of different soil depths [74].A comparison between the seasonal autoregressive integrated moving average (SARIMA) and the LSTM model to forecast soil moisture profiles was conducted by Kumar and Rao (2023).They also found the LSTM model superior in terms of accuracy (R 2 = 0.99).These models were also univariate models [75].
The LSTM model in this study has a high R 2 value similar to the aforementioned studies.Moreover, the inclusion of atmospheric variables alongside subsurface VWC is essential due to the high correlation in the time series analysis and forecastability tests.The LSTM model's capacity for pattern-following further enhances accuracy and reliability in forecasting.Given the dependencies and pattern-following characteristics, it is posited that a multiple-variable LSTM model serves as a superior alternative to statistical models such as SARIMA and VAR.

Conclusions
Subsurface soil moisture forecasting plays a pivotal role in enhancing farming management, particularly concerning issues of soil compaction and vehicle trafficability in fields.Weather variables were significantly correlated and cointegrated with subsurface soil moisture, establishing a direct time series dependency.Furthermore, the time series analysis found a dependency of subsurface soil moisture value on its previous values, hence establishing the feasibility of forecasting soil water content from historical soil water and weather variables.Considering the dependency of subsurface soil moisture on weather variables, two multivariate time series forecast models (VAR and LSTM) were developed and evaluated by error and goodness-of-fit parameters.Both the VAR model and LSTM network model were statistically fitted models for forecasting soil water.However, the LSTM model has superior accuracy and trend tracking compared to the statistical model.The VAR model fits for forecasting subsurface soil moisture considering 9 days of past input data with MAE = 0.0561 m 3 m −3 ; MSE = 0.0046 (m 3 m −3 ) 2 ; RMSE = 0.0382 m 3 m −3 and R 2 = 0.6982 propagating for 1 year.The LSTM model was propagated considering 7 days of past input data for a cropping season of corn for the years 2020 and 2021 where R 2 , MAE (m 3 m −3 ), MSE ((m 3 m −3 ) 2 ) and RMSE (m 3 m −3 ) were found to be 0.998, 0.00237, 0.000015 and 0.00382 for 2020 and 0.973, 0.00368, 0.000033 and 0.00577 for 2021.The LSTM model was found to be a good alternative to physical and statistical models in terms of both accuracy and pattern following.Nonetheless, it is imperative to acknowledge certain limitations inherent in these models.They are tailored to forecast only one day ahead, and the study was primarily focused on data from a single weather station.Additionally, the challenge of missing data within the continuous time series data stream poses potential hurdles, resulting in erroneous model outputs.As such, future research should focus on extending the forecasting timeline, incorporating richer data, and creating models that accommodate spatial variations alongside temporal inputs from multiple locations.
briefly describes the development stages of the VAR model.

Figure 2 .
Figure 2. Flow diagram of VWC forecast model development and evaluation.

Figure 3
briefly describes the development stages of the VAR model.

Figure 2 .
Figure 2. Flow diagram of VWC forecast model development and evaluation.Agriculture 2024, 14, x FOR PEER REVIEW 7 of 25

Figure 3 .
Figure 3. Flow diagram of VAR model development.

Figure 3 .
Figure 3. Flow diagram of VAR model development.

Figure 4 .
Figure 4. Architecture of the developed LSTM model.

Figure 4 .
Figure 4. Architecture of the developed LSTM model.

Figure 5 .
Figure 5. Dynamic relationship of subsurface soil water content and (a) daily total rainfall, (b) relative humidity, (c) ambient temperature, (d) wind speed, and (e) solar radiation vs. date.

Figure 5 .
Figure 5. Dynamic relationship of subsurface soil water content and (a) daily total rainfall, (b) relative humidity, (c) ambient temperature, (d) wind speed, and (e) solar radiation vs. date.

Figure 8 .
Figure 8.(a) Autocorrelation in subsurface soil moisture time series.(b) Partial autocorrelation in subsurface soil moisture time series (inset: significant autocorrelated lags).

Figure 8 .
Figure 8.(a) Autocorrelation in subsurface soil moisture time series.(b) Partial autocorrelation in subsurface soil moisture time series (inset: significant autocorrelated lags).Figure 8. (a) Autocorrelation in subsurface soil moisture time series.(b) Partial autocorrelation in subsurface soil moisture time series (inset: significant autocorrelated lags).

Figure 8 .
Figure 8.(a) Autocorrelation in subsurface soil moisture time series.(b) Partial autocorrelation in subsurface soil moisture time series (inset: significant autocorrelated lags).Figure 8. (a) Autocorrelation in subsurface soil moisture time series.(b) Partial autocorrelation in subsurface soil moisture time series (inset: significant autocorrelated lags).

Figure 10
Figure 10 portrays the loss minimization process for the entire model over the course of 51 epochs.The graph illustrates the declining trend of both training and validation losses, measured in terms of Mean Absolute Error (MAE), with each successive epoch From Table11, the high accuracy of the model at the 51st epoch is evidence of no underfitting.The cost delta, representing a 5.37% difference between validation and training errors, indicates a minimal divergence, thus suggesting the absence of overfitting.Typically, overfitting would lead to poor validation results.In this scenario, the validation R 2 (0.981) closely approximates the training R 2 (0.995), with an OR of 0.93, which indicates the model is balanced.The OR value exceeding unity is evidence of overfitting[62].In Figure10, it is shown that the training and validation loss moves downward in the 51st epoch, and their difference also decreases at that point, and the fluctuation is lower.However, as the graph shows a little noise in the training and validation loss curve in the last six epochs, there can be a chance of being slightly overfitted [70], which was tested later in the model evaluation section.

Figure 10 .
Figure 10.Training and validation loss minimization in the LSTM network model.

Figure 10 .
Figure 10.Training and validation loss minimization in the LSTM network model.

Figure 12 .
Figure 12.LSTM model fit for soil volumetric water content (m 3 m −3 ) for the corn cropping seasons of 2020 (a) and 2021 (b).

Figure 12 .
Figure 12.LSTM model fit for soil volumetric water content (m 3 m −3 ) for the corn cropping seasons of 2020 (a) and 2021 (b).

Figure 13 .
Figure 13.Residual distribution of LSTM forecasted soil moisture for (a) 2020 and (b) 2021 corn cropping seasons.The forecasted VWC by the LSTM model was no different than actual values (p > 0.05), whereas the VAR model forecasted values differ significantly (p < 0.05).The LSTM model forecasted subsurface soil moisture more accurately than the VAR model in terms of goodness-of-fit and forecasting error parameters.A similar study by Han et al.(2021) discusses that the LSTM model outperforms the multilayer perceptron network model with an R 2 value of 0.80 to 0.98.That study was conducted only using soil water content of different soil depths[74].A comparison between the seasonal autoregressive integrated moving average (SARIMA) and the LSTM model to forecast soil moisture profiles was conducted byKumar and Rao (2023).They also found the LSTM model superior in terms of accuracy (R 2 = 0.99).These models were also univariate models[75].The LSTM model in this study has a high R 2 value similar to the aforementioned studies.Moreover, the inclusion of atmospheric variables alongside subsurface VWC is essential due to the high correlation in the time series analysis and forecastability tests.The LSTM model's capacity for pattern-following further enhances accuracy and reliability in forecasting.Given the dependencies and pattern-following characteristics, it is posited that a multiple-variable LSTM model serves as a superior alternative to statistical models such as SARIMA and VAR.

Table 2 .
Statistical tests for time series analysis.

Table 3 .
Selection of order of Vector Auto Regression (* represents the minimum).

Table 3 .
Selection of order of Vector Auto Regression (* represents the minimum).

Table 4 .
Structure of the LSTM network model.

Table 5
shows the Durbin-Watson test for detecting autocorrelation.
The Durbin-Watson statistics near 2 indicate no autocorrelation.The possible values of Durbin-Watson statistics range from 0 to 4. Statistics test results of less than 2 indicate positive correlation, and greater than 2 indicate negative autocorrelation

Table 6 .
Augmented Dickey-Fuller test result for stationarity checks on soil volumetric water content.

Table 7 .
Summary of Johansen's cointegration test (cointegration with water content).

Table 9 .
Constant and coefficients for Vector Autoregression model equation.

Table 11 .
Statistics of the developed Artificial Neural Network model for soil volumetric water content (m 3 m −3 , after 51st Epoch).

Table 12 .
Goodness-of-fit for LSTM network model to soil moisture content (m 3 m −3 ).