Soil Moisture Prediction from Remote Sensing Images Coupled with Climate, Soil Texture and Topography via Deep Learning

: Soil moisture (SM) is an important biophysical parameter by which to evaluate water resource potential, especially for agricultural activities under the pressure of global warming. The recent advancements in different types of satellite imagery coupled with deep learning-based frameworks have opened the door for large-scale SM estimation. In this research, high spatial resolution Sentinel-1 (S1) backscatter data and high temporal resolution soil moisture active passive (SMAP) SM data were combined to create short-term SM predictions that can accommodate agricultural activities in the ﬁeld scale. We created a deep learning model to forecast the daily SM values by using time series of climate and radar satellite data along with the soil type and topographic data. The model was trained with static and dynamic features that inﬂuence SM retrieval. Although the topography and soil texture data were taken as stationary, SMAP SM data and Sentinel-1 (S1) backscatter coefﬁcients, including their ratios, and climate data were fed to the model as dynamic features. As a target data to train the model, we used in situ measurements acquired from the International Soil Moisture Network (ISMN). We employed a deep learning framework based on long short-term memory (LSTM) architecture with two hidden layers that have 32 unit sizes and a fully connected layer. The accuracy of the optimized LSTM model was found to be effective for SM prediction with the coefﬁcient of determination ( R 2 ) of 0.87, root mean square error (RMSE) of 0.046, unbiased root mean square error (ubRMSE) of 0.045, and mean absolute error (MAE) of 0.033. The model’s performance was also evaluated concerning above-ground biomass, land cover classes, soil texture variations, and climate classes. The model prediction ability was lower in areas with high normalized difference vegetation index (NDVI) values. Moreover, the model can better predict in dry climate areas, such as arid and semi-arid climates, where precipitation is relatively low. The daily prediction of SM values based on microwave remote sensing data and geophysical features was successfully achieved by using an LSTM framework to assist various studies, such as hydrology and agriculture.


Introduction
Freshwater resources are being depleted daily due to climate change and the increasing world population.Hence, the effective use of available water is of the utmost importance, which makes its monitoring vital for water savings, mitigation, and adaptation to climate change.In the last decade, soil moisture (SM) monitoring has been investigated with its different aspects, covering drought monitoring [1,2], flood prediction [3], and agricultural applications [4,5].In particular, in agriculture, SM significantly impacts planning, seeding, fertilization, and irrigation activities.In addition, its close relationship with crop productivity makes SM monitoring an essential factor for optimizing the use of available water resources [6,7].
The dynamics of SM are influenced by the physical properties of topography and soil as well as temporal changes in atmospheric conditions.The impact of these parameters on the variability of SM has been studied in depth concerning topographic data [8][9][10][11], soil texture [11][12][13], and climate variables [14 -16].In general, the prediction of SM in local studies (e.g., station-based SM forecasting) does not require static parameters, such as topography and soil texture, because these data vary insignificantly.However, the variability of SM in time depends on climate data in both local, regional, or global scale studies.
In the literature, researchers focused on minimizing the prediction uncertainties to estimate SM by using in situ measurements [17][18][19][20][21]. Including the meteorological parameters in estimating SM enhances the prediction accuracy significantly.The study conducted in [18] predicted the SM values of five stations located in Shandong Province of China by using varying depth measurements of SM together with meteorological variables.A similar study was performed in [19], extending the spatial distribution of stations worldwide, to forecast the SM values.In this study, however, the time series of each station were trained and validated separately.Another study carried out by [20] used the SM values of globally distributed stations of the International Soil Moisture Network (ISMN) coupled with climate, topography, and soil texture data to create a model for the daily prediction of SM in different depth layers.By spatially interpolating SM values of stations to form 0.25 • grid cells, the trained model can predict SM in a quasi-global way.Although the sensor measurements provide more reliable estimations of SM values, the dependency of the model on SM sensors limits the use of the model within specific regions where in situ measurements exist.The lack of measurements in high latitudes resulted in poorer forecasts of SM values, specifically in arid regions.
Even though in situ measurements play a crucial role in understanding SM, their spatial coverage and network-related problems make them limited in global studies.Recent developments in satellite-based remote sensing allowed continuous monitoring of the Earth's surface.In order to overcome the problems encountered in SM predictions when using in situ measurements, satellite data from microwave remote sensing has been used excessively [22,23].In this context, satellite images are the key to breaking free from the dependency of SM prediction from in situ sensors.The data from the NASA soil moisture active passive (SMAP) [24] and ESA soil moisture and ocean salinity (SMOS) [25] missions are a valuable asset for the global SM monitoring with their 2-3 days temporal resolution.In 2020, ref. [26] expanded the near real-time SM predictions by integrating time series data from SMAP and SMOS missions by using a statistical approach to overcome the inconsistencies between the different SM retrieval algorithms.
Although SMAP and SMOS SM products enable the monitoring of Earth's surface moisture in high temporal resolution, their applications are constrained due to their coarse spatial resolution.To overcome this limitation, researchers [27,28] used downscaling methods by merging higher-resolution satellite images with lower-resolution SMAP/SMOS data to achieve improved spatial resolution SM predictions.Even though these downscaling efforts are applicable in predicting SM, the generated maps still have an insufficient spatial resolution (∼5.6 km) for applications such as agricultural monitoring.In this regard, the launch of the Sentinel SAR satellites by ESA under the Copernicus Programme paved the way for accurate SM retrieval in smaller scale by acquiring higher spatial resolution microwave remote sensing images [29][30][31][32][33].
SM retrieval from remote sensing images has been improved by the state-of-art machine learning-based regression techniques owing to their ability to learn the relationship between predictors and SM from data [34][35][36][37].An extensive review on the use of machine learning algorithms for predicting SM can be found in [38].As computers have improved in performance, deep learning (DL) algorithms have become increasingly popular, as they can handle nonlinear and complex relationships between input and output [39].The SM forecasting studies that use remote sensing images exploited the ability of DL models to capture the spatial and temporal dynamics of SM at the expense of large datasets and high computational costs [5,[40][41][42][43][44][45].
Among the different DL methods, artificial neural networks (ANNs) have been proposed to estimate SM from microwave remote sensing images integrated with some auxiliary data [46].For example, although [47] coupled S1 images with soil texture information, ref. [44] used soil texture and soil temperature data to improve the prediction accuracy of SM retrieval.As an alternative to soil texture data, ref. [48] include climate and topography data to the ANN model.Furthermore, in [42], the combination of soil texture, topography, and climate data was utilized to improve the artificial neural network (ANN) model's performance.
The recurrent neural network (RNN) is a DL technique that considers the sequential relationships between input data and their effects on the output data.Therefore, such DL models are more appropriate when the sequence modeling tasks are needed, such as SM prediction.However, RNN struggles to learn interdependency between input and output data when the sequence span gets longer [49].In order to overcome the limitation of this DL technique, a special kind of RNN, long short-term memory (LSTM) is proposed by [50].With the LSTM, information from a sequence can be carried along the consecutive sequences, and the model can learn the relationship between sequential data and output data.
The study conducted by [51] applied LSTM architecture for the first time in SM studies by using the SMAP L3_SM_P product with climate and soil texture data to improve the design accuracy of SMAP SM data.In 2018, ref. [52] presented a model for the long-term SM forecast on both surface and different depths over the continental US, aiming to exploit the SMAP data together with the land surface models.The model can predict long-term SM values in the same region by using the SMAP SM time series data.In [53], the LSTM model trained with the same data classes used in [51] to nowcast the SM data, when the SMAP L3_SM_P product became available.Another study [54] downscaled the SMAP SM data in (∼1 km) with the help of climate, soil texture, and topography data by implementing LSTM.
This research aims to improve short-term SM prediction by combining the high temporal resolution SMAP SM product and high spatial resolution S1 backscatter coefficients integrated with the auxiliary data to assist the agricultural activities in the field scale.In this context, we used the SM data of the ground stations from ISMN, distributed around the world, to train an LSTM model with two microwave radar data products (SMAP and S1) together with soil texture, climate, and topographical data that are considered as the predictors of SM.The short-term forecast of SM on a field scale was successfully achieved by utilizing an approach dependent on microwave remote sensing, satellite-based observations.The model used in this study predict accurate SM values of the next day with high spatial resolution in regions with different geophysical properties and climate classes.
The manuscript is structured as follows: Section 2 explains the materials and methods.Section 4 describes the experimental research with data processing, model optimization, and our findings by focusing on the accuracy assessments of utilized methods.Section 5 presents the interpretation of the results and focuses on the effects of land cover, especially in the presence of vegetation, soil texture, and climate, on SM estimation.We finalized the paper by highlighting the important outcomes of this study in Section 6.

Materials
In this research, we aim to predict SM by combining the satellite-based data (S1 and SMAP) with soil texture percentages (clay, silt, and sand), topography (elevation, slope, aspect, and hillshade), and climate (temperature, evapotranspiration, and precipitation).By using the features presented in Table 1, we modeled the SM in time by using an LSTM framework.The statistics of these features were presented in Table 2.  ISMN is a data-hosting facility developed and still maintained by several universities [55][56][57].It is supported by the European Space Agency's (ESA) Earth Observation program.The ISMN stations include soil texture properties and SM values in time, freely available at https://ismn.geo.tuwien.ac.at/ (accessed on 17 August 2022).When we started the algorithm development, the total number of available stations was 1611 after 2017, when S1 data became available.The locations of the stations cover different climates and ecoregions.However, ∼70% of the available stations were located in the USA (see Figure 1).In addition to the station locations, in Figure 2, we present the ternary distribution of the soil data.Ternary distribution depicts the data in a 3D space, making it simpler to understand relations.Figure 2 shows that most soil samples are located in the loam class, followed by sandy loam, clay loam, and silty loam.Along with the soil texture and SM data, the metadata of each station includes land cover based on the ESA CCI land cover product [58] and Köppen-Geiger climate classes [59].It should be noted that these data were used only for the evaluation of the model performance w.r.t.varying land cover and climate class of the stations, not for training the model.

Satellite Data
In this research, we accessed all satellite data via the Google Earth Engine (GEE) Python application programming interface (API) [60].From the GEE, we downloaded the S1 data-one of the missions of ESA's Copernicus initiative-together with NASA's SMAP data on the location of the SM stations.Their ensured continuity for the future and sensitivity to changes in vegetation and soil properties makes both satellites a viable option for SM monitoring [5,[61][62][63][64].

Sentinel-1 (S1)
S1 is a synthetic aperture radar (SAR) satellite mission with a C-band (5.6 cm) sensor.The advantage of S1 lies in its sensitivity to SM content [65].There are two identical satellites in the S1 mission, S1a, and S1b.Each satellite has a temporal resolution of 12 days, resulting in an average of a six-day repeat cycle.Unfortunately, in December 2021, S1b failed data dissemination and became space junk.Since then, S1a has been providing data alone, and its temporal resolution depends on the area, with a minimum orbit repeat cycle of six days in Europe and 12 days in other areas.ESA is planning to launch S1c in the first half of 2023 to continue the dual satellite constellation.
This research used the ground range detected (GRD) 10-meter spatial sampled data processed by ESA.The data we have selected has vertical transmission-vertical received (VV) and vertical transmission-horizontal received (VH) polarizations.
In this study, all S1 passes between 31 December 2017 and 01 January 2021 were included for each station of ISMN.In the data processing step, 50 m × 50 m region of interest was defined around each station to calculate the mean value of S1 GRD backscatter signals.The mean backscatter signals were converted from logarithmic scale to linear scale.Additionally, the VH/VV ratio was added as a feature to the dataset.

Sentinel-2 (S2)
S2 is a multi-spectral instrument (MSI) satellite mission with spectral sensitivity to the visible-near-infrared region of the electromagnetic spectrum.In this mission, like S1, there are two identical satellites (a and b).Both satellites have a temporal resolution of 12 days, also resulting in an average of a six-day repeat cycle.
In our research, we used the Level-2a surface reflectance product processed by ESA.The data has 13 bands ranging from 10-to 60-m spatial resolution.We only used red and near-infrared bands to derive the vegetation indices.As in the case of S1, pixels within the 50 m × 50 m region of interest around the stations were extracted to calculate the mean NDVI values.However, this feature was only used to evaluate the model performance in the presence of vegetation and was not included in the feature set to train the model.

Soil Moisture Active Passive (SMAP)
In 2015, NASA launched the SMAP satellite to monitor the SM content by using Lband SAR (active) and radiometer (passive) instruments.SMAP has a temporal resolution of 2-3 days globally.In this research, we used Level-3 data of SMAP SM, which has 10-km spatial resolution [66].

Topography
The topography of the surface also influences the variation in the SM.With the GEE platform, topographic parameters, such as elevation, slope, aspect, and hill shade are obtained from the ALOS DSM Global 30 m dataset [67].

Climate Data
As an integral part of the water cycle, the dynamics of SM are closely associated with climate data, such as precipitation, temperature, and evapotranspiration.In this research, we gathered the precipitation (P), air temperature (T), and evapotranspiration (ET) data on the location of the SM stations by using the Meteomatics API [68].The available meteorological data have a spatial resolution ranging from 1 km to 5 km.Under the assumption of lower spatial variability, we used the reported data without changing the processing pipeline.The usage of the API was made possible within the service provided to AgriCircle AG by Meteomatics.

Data Preprocessing
For SM modeling, we created a dataset that combines static and dynamic features, as previously shown in Table 1.The static features are soil texture and topography; the dynamic features are climate and satellite-derived time-series data.In addition, we added a time variable as a dynamic feature.Because the LSTM framework requires time-series data, we repeated the static features as the sequence length before feeding it to the LSTM framework.
For dynamic features, we prepared a three-year dataset that includes in situ observations acquired from ISMN stations from 31 December 2017 to 1 January 2021.In this dataset, we applied data cleaning to reduce the data-originated uncertainty and eliminate the inconsistency within the measurements.Data cleaning involves a two-step elimination criteria.The first criterion is related to the record length.The record length condition requires that those stations be discarded if more than 10% of the measurements were missing in any station.The second criterion is developed to ensure sequential dependence in the observations.The SM stations with more than 60 consecutive days of missing measurements are also excluded from the analysis because a solution like interpolation was unrealistic considering the complex nature of the problem.According to these criteria, we found 103 stations, shown by red dots in Figure 1, out of 1611 with time series of SM measurements suitable for the analysis.Because dynamic features are gathered from various sources with different temporal resolutions, we upsampled all data into daily sampling by using the linear interpolation method for temporal matching.The ground measurements are resampled into daily SM values to ensure the matching temporal resolution.
For the training of the LSTM model, we formed five different scenarios to determine the contribution of feature groups.As previously shown in Table 1, in SM monitoring, climate data, soil texture, and topographical data are the main drivers of SM.Beginning with the climate data (Case I), we consecutively included soil texture (Case II), topographical data (Case III), and satellite data (Case IV and Case V) and listed them below.

Case-I Climate data
Case-II Climate data, soil texture Case-III Climate data, soil texture, topographical data Case-IV Climate data, soil texture, topographical data, satellite data (SMAP) Case-V Climate data, soil texture, topographical data, satellite data (SMAP, S1) In each case, time variables (sine and cosine of time) are kept within the features set because they are independent variables that represent the positional encoding of input features in a time series.

Methods
We employed the satellite data, soil texture, climate, and topography features mentioned above to forecast the SM by using the following process chart shown in Figure 3.The process starts with the first row and ends with the accuracy assessment and prediction of SM.

Long Short-Term Memory
As a descendent of RNN, [50] proposed an approach called long short-term memory (LSTM) to overcome the vanishing gradient problem in RNN.In LSTM, the ordinary unit cell repeats the input-output sequence; in RNN, this is replaced by a memory cell.LSTM contains three gates: the input gate i t , forget gate f t , and output gate o t .In addition to these gates, there are two different parts: cell state c t , which keeps information from previous states and transfers it to the next, and the hidden state h t , which is the output of the LSTM cell.The equation of input gate, forget gate, and output gate is defined as where w i , w f , and w o are the weight matrix, x t is input, h t−1 is the hidden state from previous time step, b i , b f and b o are bias vector and σ is the sigmoid activation function for the gates.The activation functions introduce nonlinearity by transforming inputs to targeted outputs with a nonlinear regression procedure, making the model capable of learning and performing more complex tasks.After the calculation of gates, the cell state and hidden state can be defined as where w c is the weight matrix, c t−1 is the cell state from the previous time step, b c is the bias vector, tanh is the hyperbolic tangent activation function and is the element-wise multiplication.The size of the weight matrix is determined according to the unit size and hidden layer size of the LSTM model, feature vector dimension, and feature sequence length.It should be noted here that the weight matrix of LSTM does not change through timesteps.For detailed information please refer to [69].

Accuracy Assessment
Four accuracy metrics, namely, coefficient of determination (R 2 ), root mean square error (RMSE), unbiased root mean square error (ubRMSE), and mean absolute error (MAE) were used to evaluate the performance of the implemented model for the SM prediction.We have In the above equations, y i , ŷi , and ȳi indicates actual SM, predicted SM, and mean value of the actual SM, at ith time step, respectively.Out of these four metrics, we use R 2 , RMSE, and ubRMSE to evaluate the performance and MAE for station-based assessments of the trained model.

Implementation of the LSTM Framework
The SM value at time t (Y t ) was predicted by using n number of input features with previous w sequential days (window size) as [X n t−1 . . .X n t−w ].After preparing the dataset, we divided it temporally into 60% for training, 10% for validation, and 30% for testing purposes.The temporal split corresponds to 658 days used to train the model starting from 31 December 2017 until 20 October 2019, 109 days used to validate the model training between 21 October 2019 and 6 February 2020, and 330 days used to evaluate the trained model from 7 February 2020 until 1 January 2021.Whereas the LSTM model was built with training data, the hyperparameter tuning was carried out by using a validation dataset.After the optimum hyperpamater set was determined, independent evaluation of the model was conducted based on testing data.
Before starting the training, we normalized all the input features via the MinMaxScaler function of the sklearn Python package to ensure numerical stability.For the normalization, we followed different strategies for static and dynamic features.By their nature, the static features have global minimum and maximum values; therefore, we normalized them together.On the other hand, dynamic features have local variations that change each station's minimum and maximum values, leading to a station-based normalization.
One of the primary flexibility features involved in the use of time series data is the varying length of past data to make future predictions.In such a structure, the number of previous timesteps is called the window size.The window size parameter must be selected carefully because it impacts forecast accuracy.For its determination in the SM forecast, we reformed the original dataset according to different window sizes: last one day, five days, ten days, and thirty days.
The LSTM networks were created by using TensorFlow back-end with GPU processing integration in the conda environment.We used the the gridSearchCV function of the sklearn Python library, to determine the LSTM model's hyperparameters.In addition, in the LSTM architecture, all models started with an LSTM layer, followed by a one-dimensional dense layer as an output.

Results
The results of the SM prediction framework were presented in this section, starting with data preparation followed by model training, model parameter optimization, and finally the assessment of feature effects.

Model Parameter Optimization
The grid search algorithm was applied by using various hidden layers and unit sizes, learning rates, loss functions, and optimization functions for hyperparameter optimization.The number of hidden layers for LSTM was tested by gradually increasing from a single layer to three stacked layers.The unit size of these stacked layers was tested for 32, 64, and 128.The tested learning rates were 10 −2 , 10 −3 , and 10 −4 .For the optimization function, we tested Adam, Adamax, and SGD [70].For epoch number, the test was for values between 1000 and 1500 with 100 steps.Lastly, the dropout rate was between 0 and 0.5 with 0.05 increments.
The performances of the trained models with setups having different window sizes are presented in Table 3.We can see that the window size of five days is performing better than other window sizes, with the overall MAE reduced to ∼0.03 for both training and testing.Out of these four different window sizes, the one-day window size showed the worst prediction results with R 2 values of ∼0.70 for both training and testing.Following the window size of five days, 10, and 30 days gave comparable results.Focusing on the window size of the last five days, which performed better than the other tested cases, we found that LSTM with two hidden layers and 32 unit sizes followed by a one-dimensional dense layer having a learning rate of 10 −3 , an epoch number of 1000, and a dropout rate of 0.25, and Adamax as the activation function gave the best accuracy for SM prediction.The summary of the grid search is given in Table 4.

Effect of the Different Features on the Model Performance
After the optimum window size and hyperparameters were assessed, we investigated the effect of a different group of features on the model's prediction capability by designing five different cases.Table 5 summarizes the statistics of these cases for their corresponding feature combinations where the model hyperparameters are based on the best performing LSTM model with a window size of five days (see Table 4).We found that the optimum solution for SM prediction was achieved when all feature groups were combined, i.e., Case V, for training the LSTM model.Figure 5 shows the outcomes of the training (left side) and testing (right side) SM predictions for all stations.The scatter plots between measured and estimated values for the training and testing datasets show a similar pattern when compared.The main population of the points is along the 1-1 line.The model can make good predictions with MAE of less than 0.035.In the second row, violin plots show the measurement and prediction distributions.The left side of the violin corresponds to actual values, while the right side stands for the predictions.In an ideal case, we should see a mirror-like shape, which is also the case for our predictions with small differences due to the error previously mentioned in the scatter plots.

Discussion
The LSTM-based SM forecast model relies on satellite-driven data, soil texture, topography, and climate.Therefore, as the predictions are conducted for different conditions, we investigated the prediction performances for land cover classes, biomass variations based on the NDVI calculated from the Sentinel-2 satellite, climate classes, and soil texture.

Relationship between Model Performance and Land Cover
The physical characteristics of the land cover affect the prediction accuracy of the developed LSTM model.This effect originates from the physical heterogeneity of the observed area.
In the ISMN, every station is provided with its land cover type.The corresponding land covers are based on the ESA CCI land cover product [58].In a total of 103 stations, 34 croplands, 20 grasslands, 18 shrublands, 23 trees/forests, and 6 mosaics (mixture of trees, shrubs, herbaceous, and cropland), and two urban sites exist.However, we did not investigate the urban sites due to the insufficient number of samples.
Figure 6 presents the model's prediction capability for different land covers.The smallest MAE (∼0.02) was achieved for shrubland class.The model shows similar performance for cropland, grassland, and tree covers with a mean MAE of approximately ∼0.03.However, the variance of MAE for the cropland cover is higher than the others.The worst MAE, (∼0.05), is obtained for the mosaic cover due to the complexity of the surface.This can be explained by the scattering mechanism of SAR imagery in the presence of vegetation and forest.Because the shrubland land cover class is sparsely vegetated area, radar signals can interact with the soil more than vegetation or forest canopy.

Relationship between Model Performance and NDVI
The presence of biomass over soil may affect the model's prediction capability because the satellite data also carries information regarding the vegetation.To see the effect of the biomass, we calculated the NDVI from the S2 surface reflectance image during the testing periods and compared it with the MAE values of the model for the prediction dates.
Figure 7a visualizes the distribution of MAE values for all available stations together with the NDVI mean and NDVI max values.The figure shows the correlation between the mean NDVI mean and MAE values.MAE values tend to increase with increasing NDVI mean values.
The violin plot given in Figure 7b shows the distribution of the actual vs. predicted SM values at stations whose MAE values are lower (Station ID: 1569, 1541, 1577) with low soil moisture and higher (Station ID: 1527, 816, 1481) with high soil moisture.Here, we focused on finding out the origins of the variations in MAE values among these stations.For this purpose, the variation of the NDVI values were used.This analysis showed that the NDVI variation is one of the reasons for the deterioration of the SM prediction.
The backscattered signals obtained from SAR data were strongly affected by high biomass due to the interaction between electromagnetic radiation, plants, and soil.Therefore, these findings show that the model's estimation performance is prone to uncertainties from the existing biomass.Similar findings also exist in previous studies [35,[71][72][73].These studies found that the SM content in bare or low-density vegetation areas is more predictable than in high-density vegetation areas.Another investigation that we conducted on the impact of NDVI variation was using station-based time series.For this purpose, we focused on some stations that show a variation in NDVI over the years.We see that the growth cycle of NDVI values before seeding and after harvest is lower than crops' vegetative and reproductive phases.We believe that the prediction capability of the model thoughout the growth cycle is an important detail that needs to be investigated.Hence, we prepared the Figure 8a to show the model's performance in time.According to Figure 8a, the model's performance on the SM forecasting dropped approximately between May 2020 to October 2020 due to very low SM values.During this period, we can see an increase in the NDVI values from ∼0.2 to ∼0.9.We observed a similar situation in the other stations as well.In the time series of stations 827 and 1572, given in Figure 8b,c, the station has higher NDVI values from June to the end of December and from mid-April to the beginning of November, respectively.These three stations and the others with similar behavior have MAE values less than 0.075.

Relationship between Model Performance and Soil Texture
The variation in the soil texture is a driving factor for the spatial and temporal changes in the SM.Soils with high clay or silt fraction are associated with high water-holding capacity, resulting in a generally higher SM value.On the other hand, such soils lose their moisture slower than the others.From an agricultural point of view, clay soils have the highest soil moisture content in general; however, silty soils are more favorable for plants.
We provide a ternary plot in Figure 9 to show the MAE values of stations, which are scattered based on their soil texture contents.In the same figure, we also included each station's NDVI mean values in a color map.The combination of soil texture and NDVI mean allows us to observe the relationship between the amount of silt and clay in the soil and vegetation activity.
The size of each circle, representing a station, is proportional to its MAE value.We observe that the smaller circles generally accumulate in areas where the sand fraction is high.Among all the stations, 61% have sandy soil with an average MAE of 0.03, and 38% of them are silty soils with 0.04 average MAE.
As we focus on particular stations for an in-depth investigation, it was observed that the silt content of the stations, having cropland cover, given in Figure 8 are 52%, 61%, and 42% for stations 816, 827, and 1572, respectively.In the corresponding stations, we have similar findings that justify the performance of the model w.r.t. the change in the NDVI values.In addition to silt and clay-dominated soils, the soil types in which the sand proportion is higher generally have a lower trend in SM values because the sandy soil has low waterholding capacity.This property makes them less suitable for agricultural applications.In order to investigate the sand effect, we present the time series of SM predictions at stations 815, 1541, and 1569 in Figure 10.The typical features of these stations are the high percentage of sand fraction in soil content (81%, 52%, and 52% for stations 815, 1541, and 1569) and lower NDVI values along the time series.The mean NDVI value for these stations is 0.15, 0.19, and 0.11, respectively.Unlike the findings from Figure 8, we saw that in Figure 10a, the higher sand fraction leads to lower and less fluctuated SM values.Thus, the highest accuracy was obtained at stations with sandy soils having low NDVI values.

Relationship between Model Performance and Climate Classes
Lastly, we investigated the effect of climate classes.To this aim, we used [59], which defines four classes in total: tropical (A), dry (B), temperate (C), and continental (D).Our selected stations are distributed as 23% in class B and 75% in class C. The remaining 2% belongs to classes A and D, with one station for each.
In Figure 11, we present the model's prediction performance under different climate conditions as a boxplot.The stations in class B shows lower MAE values compared to those in class C (see Figure 11a).Considering the climate class properties, the rapid changes in the moisture affect the dielectric properties of the target [32,74]; at the same time, precipitation is a significant factor that negatively impacts the SM prediction due to the change in the interaction between SAR signals and land surface.
We obtained better soil moisture predictions in arid climates (Bw) than those in semiarid climates (Bs) regions due to less precipitation and more evapotranspiration.We also observed a similar behavior between no-dry-season climate (Cf) and dry summer (Cs) temperate climate classes (see Figure 11b).The no-dry-season climate, as inferred by its name, has a high precipitation rate compared to a dry summer climate, which makes the stations located in this climate region challenging for SM prediction.

Conclusions
In this study, we investigated the short-term SM prediction based on satellite-derived data with LSTM.For this purpose, the static and dynamic features were combined to create sequential input data and used in situ SM measurements of 103 stations from ISMN as an output to train an LSTM model.Our approach uses soil texture and topographical data as static features and satellite (S1 and SMAP) and climate data as dynamic features.As SM monitoring is crucial for water resource management, we employed the SAR data due to its lower sensitivity to atmospheric conditions than optical data.To optimize the LSTM models' hyperparameters, we used the gridSearchCV algorithm.After the optimization, the overall testing accuracy of the model was calculated as R 2 = 0.87, RMSE = 0.046, and MAE = 0.033.The values obtained from different stations are summarized in Appendix A, including the station ID, network and station name, soil texture, NDVI mean and max values, climate, land cover classes, and the corresponding MAE values.
During our investigations, it was observed that the model's prediction performance is affected by the soil texture, vegetation status, and climate conditions.Variations in soil texture change the soil water-holding capacity.In the case in which the amount of sand was dominant, the SM values were easier to model than in the case of silt and clay dominance due to the low SM values and fewer fluctuations in sandy soils.We also observed that vegetation affects the interaction between the SAR signal and the soil.Thus, the model's prediction ability was lowered in vegetated areas with high NDVI values.Moreover, the model can predict better under dry climate conditions, such as arid and semi-arid climates in relatively low precipitation.
This study used satellite-based products to create a model to forecast SM values.For operational purposes, we know that obtaining soil texture data on the pixel level is challenging.However, we can overcome this by conducting an intensive sampling campaign for soil texture, or existing models can be used [75], which employs S1 and S2 multi-temporal data.
In the future, we plan to combine the LSTM model with the attention mechanism to study the contribution of each variable to SM prediction.The LSTM model combined with the attention mechanism can determine the importance of each feature and its temporal relationship with SM phenomena.Thus, we can increase the accuracy of the model predictions and explain the physical behavior of the black-box model.

Figure 1 .
Figure 1.The spatial distribution of ISMN sites.Red dots display the distribution of 103 stations with reliable data.

Figure 2 .
Figure 2. Ternary plot of the soil class distribution of ISMN sites.

Figure 3 .
Figure 3.The overall process chart of the study, starting from data sources and ending with the final-user output.

Figure 4
Figure 4 presents the training progress of the best performing LSTM model, the optimum hyperparameters of which are given in Table 4.The figure shows the change in the loss value, R 2 , and RMSE w.r.t.epoch as the model continues its training with a constant learning rate of 10 −3 .The loss value, R 2 , and RMSE for training and validation datasets converge around epoch number 1000, and the model tends to overfit beyond 1000 epochs.

Figure 4 .
Figure 4. Accuracy of the best-performing LSTM model according to epoch.The upper figure shows the training progress of the model w.r.t.loss value per epoch, and the lower figure shows the change in accuracy w.r.t.R 2 and RMSE.

Figure 5 .
Figure 5.The scatter plot (top left and right) and distribution graph (bottom left and right) of (a) training and (b) testing data of windows size 5.

Figure 6 .
Figure 6.Overall MAE for land cover classes.

Figure 7 .
Model performance w.r.t.NDVI variation.(a) Scatter plot shows the distribution of MAE vs. NDVI relationship for each station.(b) Violin plots representing the statistical distribution of actual and predicted temporal SM data at the ISMN stations with their minimum and maximum NDVI values.

Figure 9 .
Figure 9. Soil texture ternary plot w.r.t.MAE of each station.The circles are scaled based on their MAE value and are colored based on NDVI mean .

Figure 10 .
Time series of SM predictions during the testing period for stations 815, 1541, and 1569.

Table 1 .
Data used in this research provided with its descriptions, spatial, and temporal resolutions.

Table 2 .
The statistics of features used in the study.

Table 3 .
Accuracy of LSTM models with different window size.

Table 4 .
Hyperparameter ranges of LSTM model and selected values for the last five days window size.

Table 5 .
Accuracy analysis of LSTM with different features set.
CC-I: Climate Class-I, CC-II: Climate Class-II, LCC: Land Cover Classification.