Multiple-Depth Soil Moisture Estimates Using Artificial Neural Network and Long Short-Term Memory Models

Abstract: Accurate prediction of soil moisture is important yet challenging in various disciplines, such as agricultural systems, hydrology studies, and ecosystem studies. However, many data-driven models simulate and predict soil moisture at only a single depth. To predict soil moisture at depths of 100, 200, 500, and 1000 mm from the surface, based on weather and soil characteristic data, this study designed two data-driven models: an artificial neural network model and a long short-term memory model. The developed models were applied to predict daily soil moisture up to 6 days ahead at four depths in the Eagle Lake Observatory in California, USA. The overall results showed that the long short-term memory model provides better predictive performance than the artificial neural network model for all depths. The root mean square error of the predicted soil moisture from both models is lower than 2.0, and the correlation coefficient is 0.80–0.97 for the artificial neural network model and 0.90–0.98 for the long short-term memory model. In addition, monthly based evaluation results showed that soil moisture predicted from the data-driven models is highly useful for analyzing the effects on the water cycle during the wet season as well as the dry season. The prediction results can be used as basic data for numerous fields, such as hydrological, agricultural, and environmental studies.


Introduction
Soil moisture is a component of the natural hydrological cycle that is influenced by rainfall, evapotranspiration, runoff, and fluctuations in the groundwater level, and it is an important element that links climate, soil, and vegetation in this cycle. It is defined as the moisture present in soil voids (space between soil particles) and controls the water-energy exchange between the soil and the atmosphere, accounting for approximately 0.0001% of the surface water [1]. Due to the high variability of soil moisture, it is very important to understand changes in its spatial and temporal distribution. This information can be used in many areas, such as weather forecasting, drought monitoring, runoff forecasting, flood control, yield estimation, and reservoir management [2][3][4][5][6][7]. In certain fields, soil moisture must be understood according to the soil depth. For example, for efficient yield management, it is important to obtain the soil moisture information at relatively varied depths because the root depth varies according to the crop type [8]. As the effect of soil moisture, layer-by-layer, on flooding and drought varies in terms of hydrology, it is essential to analyze the soil moisture at various depths rather than at specific depths.
There are two types of methodologies for measuring soil moisture content: direct and indirect methods. The gravimetric method, which derives the moisture content from the weight difference of the collected soil samples before and after drying, is a typical direct method. The indirect methods include those using a neutron probe, time-domain reflectometry, and psychrometry [9,10]. However, to understand the spatial variability of soil moisture throughout a region, soil moisture should be measured on a site-by-site or point-by-point basis, using gravitational, nuclear, electromagnetic, tension, and humidity techniques over a wide geographical area, which requires considerable time and expensive equipment. For this reason, various physical models capable of predicting the amount of soil moisture, such as the National Water Model (NWM), soil-plant-atmosphere-water model (SPAW), the U.S. Department of Agriculture Hydrograph Laboratory model (US-DAHL), and Sacramento-soil moisture accounting model (SAC-SMA) have been developed and used [11][12][13].
To improve the quality of soil moisture prediction, techniques for predicting soil moisture using a remote sensing system have been developed, which can provide a wide range of practical information [14][15][16][17]. However, when predicting soil moisture using a remote sensing system, a downscaling technique is required owing to the low spatial resolution of the remote sensing system. Parinussa et al. [18] combined images of high- and low-resolution measurement data and applied downscaling by using the smoothing filter-based intensity modulation method. Chauhan et al. [19] downscaled the soil moisture data using the special sensor microwave/imager and the advanced very high-resolution radiometer data, with NDVI, land-surface temperature, and surface albedo used as the parameters. Using this method, Ray et al. [20] downscaled the 25-km soil moisture data obtained by the Advanced Microwave Scanning Radiometer on the Earth Observing System to 1 km and compared them with the soil moisture data derived from the VIC-3L model, which is a physical/dynamic model. However, conventional measuring methods have limitations in that the prediction reliability is reduced due to problems such as the increased observation period, obsolescence of observation equipment, and missing points, as well as the requirement of considerable time, manpower, and money. In addition, the soil moisture data derived from remote sensing have limitations in terms of the grid size and observation depth and need to be calibrated, as they are significantly influenced by factors such as vegetation cover, soil temperature, and terrain. To overcome these limitations, various data-driven models have recently been studied to estimate soil moisture.
With the recent development in computer technology, various models for estimating soil characteristics, such as soil temperature and soil moisture, are being developed by applying data-driven models such as artificial neural network (ANN), support vector machine (SVM), and long short-term memory (LSTM) models. The main concept of data-driven models, such as machine learning, is to determine the relationship between input and output variables in the absence of a clear understanding of the physical process of a certain system. These methods can be more effective than physical or dynamic models for solving complex and nonlinear problems [21]. James et al. [22] applied convolutional neural networks (CNN) for water segmentation using satellite imagery. The results showed that CNN is suitable for contributing to the wider use of satellite imagery for water management. Furthermore, Kim et al. [23] applied a multilayer perceptron (MLP) and an adaptive neuro-fuzzy inference system to predict the daily soil temperature at two observation points (Champaign and Springfield stations) located in Illinois, USA. A comparison of the simulation results with the observed data confirmed that both models appropriately predicted the soil temperature. Feng et al. [24] confirmed the applicability of various machine learning methods (extreme learning machine (ELM), generalized regression neural networks, backpropagation neural networks, and random forest (RF)) using meteorological factors to predict soil temperature according to the soil depth. All models led to statistically significant results, and the ELM model in particular was excellent in terms of performance and computational speed. In addition, Sutskever et al. [25] presented improved sequence-based data analysis results using a sequence-to-sequence structure that can consider temporal dependence on the LSTM model (LSTM-s2s).
Water 2021, 13, 2584
Various studies have been conducted to develop prediction models for soil moisture and soil temperature based on data-driven models [26][27][28]. Gill et al. [29] developed two models to predict soil moisture by applying SVM and ANN and compared their performances. Although both models performed well, SVM showed a better performance. Prakash et al. [30] predicted soil moisture using machine learning techniques, multiple linear regression, support vector regression (SVR), and recurrent neural networks; evaluated the predictive power using mean square error (MSE) and R2; and confirmed the applicability of various machine learning models for soil moisture prediction. Achieng [31] predicted and evaluated soil moisture using machine learning techniques such as the radial basis function (RBF), single-layer ANN, and deep neural network, among which RBF was found to be outstanding. Adeyemi et al. [32] predicted soil moisture through dynamic neural network modeling. The model was trained to generate a one-day-ahead prediction of the volumetric soil moisture content based on previous soil moisture, precipitation, and climatic measurements. In their study, the field data obtained from three sites were used for the prediction, and an R2 value of above 0.94 was obtained in all sites through the model evaluation. Other studies have been conducted for the prediction and evaluation of soil moisture using a machine learning technique and comparison with existing methods [33][34][35][36].
However, in previous studies, since soil moisture at a single depth was simulated and predicted, it is difficult to recognize the performance of the data-driven models for soil moisture prediction at various depths from surface to deep layers. To address these limitations, this study aims to develop prediction models to estimate soil moisture at multiple depths by considering machine learning techniques (i.e., ANN) and deep learning techniques (i.e., LSTM).

Study Area and Data
The Eagle Lake catchment (40°37′ N latitude, 120°43′ W longitude), located in California, USA, was designated as the study area (Figure 1). The hydrological data in this area are in high demand due to the annual flood and drought damage caused by various topographic and climate phenomena. Since flooding and drought are highly related to soil moisture [37], it is important to observe and manage the soil moisture data to reduce the damage. The average annual precipitation in the region is 550 mm, where over 90% of the precipitation occurs from November to March. In addition, large amounts of rainfall are generated in the region from extratropical cyclones or jet streams from the Pacific Ocean [38], and heavy rains are generated from atmospheric rivers during the rainy season [39,40].
In this study, five variables provided by the Soil Climate Analysis Network (SCAN), namely air temperature, precipitation, vapor pressure, soil temperature, and relative humidity, were used as input data for each model to predict soil moisture at four depths. These data were collected on a daily time scale from November 2014 to February 2020 at an observation station located in Eagle Lake (Figure 1). In the SCAN monitoring system, a dielectric constant measuring device was used for measuring soil moisture at multiple depths. The soil temperature and soil moisture data were collected from four layers at the monitoring site, with depths of 100, 200, 500, and 1000 mm from the surface, for predicting the soil moisture at various depths. Approximately 70% (November 2014 to June 2018) of the total data were used for training the model, and the remaining 30% (July 2018 to February 2020) were used for testing.
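The chronological train/test split described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `split_by_date`, the placeholder records, and the exact first and last days of the record (the text gives months only) are hypothetical and not taken from the study.

```python
from datetime import date, timedelta


def split_by_date(records, first_test_day):
    """Split (day, value) records chronologically, without shuffling."""
    train = [r for r in records if r[0] < first_test_day]
    test = [r for r in records if r[0] >= first_test_day]
    return train, test


# Assumed record span: the text states November 2014 to February 2020;
# the exact start and end days used here are placeholders.
start, end = date(2014, 11, 1), date(2020, 2, 29)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
records = [(d, None) for d in days]  # None stands in for the daily observations

train, test = split_by_date(records, date(2018, 7, 1))
train_share = len(train) / len(records)
print(f"training share: {train_share:.2f}")
```

Splitting by date rather than at random keeps every test day later than every training day, which matters for time-series models that learn from preceding days.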

Long Short-Term Memory Model (LSTM)
Long short-term memory (LSTM), introduced by Hochreiter and Schmidhuber (1997) [41], is a deep learning model based on a recurrent neural network (RNN). It was developed to solve the vanishing and exploding gradient problems that arise in the RNN model when analyzing long-term data. The LSTM model is used for learning sequential data, mainly for purposes such as language translation and speech pattern recognition. In the field of hydrology, it is used for prediction through learning hydrological time-series data, such as runoff prediction [42][43][44] and water-level prediction [45]. Figure 2 shows the structure and conceptual diagram of LSTM.
LSTM is composed of several blocks, each of which comprises cells that can maintain their state with time and three nonlinear gates that control the data flow (Figure 2). The three gates are the forget gate ($f_t$; Equation (6)), input gate ($i_t$; Equation (7)), and output gate ($o_t$; Equation (8)). The forget gate determines how much of the information from the previous block should be retained. The input gate determines which of the new information is stored in the cell. The output gate determines the final output value among the information stored in the cell. The LSTM algorithm operates from an input sequence $X_t$ to the final outcome $O_t$ by looping through Equations (1)-(6) with initial values of $C_0 = 0$ and $h_0 = 0$ [32].
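For orientation, the six LSTM update equations are commonly written in the following textbook form, using the symbols of this section. This is a sketch rather than the equations as originally printed: the ordering, the bias $b_c$, the cell-state symbol $C_t$, the element-wise product $\odot$, and the use of $\tanh$ in the cell update are assumptions.

$$
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
$$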
where $\sigma$ is the nonlinear activation function; $W_f$, $W_i$, $W_o$, and $W_c$ are the weight values of the forget gate, input gate, output gate, and memory cells, respectively; $h_{t-1}$ denotes the output data from the previous cell; $x_t$ is the current input data; and $b_f$, $b_i$, and $b_o$ are the bias vectors of each gate. In addition, $\tilde{C}_t$ is the cell state generated from the activation function.
In this study, Rectified Linear Unit (ReLU) functions were used as activation functions.
As the calculation process of LSTM is based on various parameters, it is somewhat more complicated and time-consuming than the other models but presents a high-performance result. In addition, unlike other models, it is very useful for learning the relation of long-term data because it uses the concept of a cell to store and update information selectively according to the previous state and current input [46,47]. The LSTM model is available as a standard package in various software programs, and the Keras framework in Python 3.4 was used to operate the models in this study.
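To make the gate mechanics concrete, the following NumPy sketch runs one LSTM cell over a 12-day input sequence, starting from zero cell and hidden states. It is a minimal illustration of the textbook LSTM update, not the Keras configuration used in the study: the function names, the weight layout, the feature count, the hidden size, and the use of sigmoid gates with a tanh cell update are all assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM update; W and b are dicts keyed by 'f', 'i', 'o', 'c'."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate: share of old state kept
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate: share of new information stored
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate: share of state exposed
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # updated cell state
    h_t = o_t * np.tanh(c_t)                 # output of the block
    return h_t, c_t


def lstm_forward(X, W, b, n_hidden):
    """Loop over a sequence X of shape (time, features), starting from C_0 = h_0 = 0."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x_t in X:
        h, c = lstm_step(x_t, h, c, W, b)
    return h, c


# Hypothetical sizes: 12 daily time steps, 5 input features, 8 hidden units.
rng = np.random.default_rng(0)
n_features, n_hidden = 5, 8
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_features)) for k in "fioc"}
b = {k: np.zeros(n_hidden) for k in "fioc"}
X = rng.random((12, n_features))
h_last, c_last = lstm_forward(X, W, b, n_hidden)
```

In practice a dense output layer would map `h_last` to the soil moisture values, and the weights would be learned by backpropagation rather than drawn at random.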

Artificial Neural Network (ANN)
McCulloch and Pitts (1943) introduced the ANN model, which is a supervised machine learning algorithm. Generally, the ANN model is applied to solve problems for the classification and prediction of specific variables that have undefined mathematical relationships. The ANN model is described as a mathematical structure capable of representing the complex and nonlinear process correlating the input and output of the system [48]. The ANN model has shown desirable performance for the analysis of nonlinear relationships between independent and dependent variables in a given data set [49]. Figure 3 represents the conceptual diagram of the ANN model.
The initial ANN model is a single-layer perceptron containing one input and one output layer. It is known as an effective method for linear separation, but it has the limitation that it cannot readily solve nonlinear problems [50]. To overcome this limitation, the Multi-Layer Perceptron (MLP), one of the most common neural network models, was implemented. The MLP is a class of ANN model and is a complex network that consists of three different types of layers: input, hidden, and output layers (Figure 3). Since an ANN with multiple layers was used, it can be called an ANN, MLP, or ANN-MLP model (in this study, the term ANN is used). These three layers contain sets of neurons that are fully connected with neurons in the following layer, and each layer has different weight values. The ANN model was designed to reduce the difference between estimated and targeted values by adjusting the parameters of the model. The ANN model can be mathematically formulated as Equation (7).
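A common way to write an MLP with one hidden layer, using the symbols of this section, is sketched below. The output symbol $\hat{y}$, the index names $i$ and $j$, and the single hidden layer are assumptions, and the form may differ from the equation as originally printed.

$$
\hat{y} = f\left( \sum_{j} w_{j} \, f\left( \sum_{i} w_{ij} X_i + b_j \right) + B \right)
$$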
where f denotes the activation function in the layers, and X and w represent the input value and the weight values between layers, respectively. B and b indicate the biases in the output and hidden layers, respectively. In the model algorithm, X is multiplied by the weight value (w), and the resulting value is then transformed by the activation function (f). The representative activation functions used in the ANN model include the sigmoid, hyperbolic tangent (tanh), and ReLU functions. In this study, the ReLU function was used as the activation function for the ANN model.
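The forward pass described here (multiply by the weights, add the bias, apply the activation) can be sketched in a few lines of NumPy. The layer sizes, random weights, and function names below are hypothetical, and applying ReLU at the output layer as well as the hidden layer is an assumption based on the statement that f is used in the layers.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def mlp_forward(X, w_hidden, b_hidden, w_out, B_out):
    """Forward pass of an MLP with one hidden layer and ReLU activations."""
    hidden = relu(X @ w_hidden + b_hidden)   # input layer -> hidden layer
    return relu(hidden @ w_out + B_out)      # hidden layer -> output layer


# Hypothetical sizes: 3 samples, 60 inputs, 16 hidden neurons, 4 outputs (one per layer).
rng = np.random.default_rng(1)
X = rng.random((3, 60))
w_hidden = rng.normal(scale=0.1, size=(60, 16))
b_hidden = np.zeros(16)
w_out = rng.normal(scale=0.1, size=(16, 4))
B_out = np.zeros(4)
y_hat = mlp_forward(X, w_hidden, b_hidden, w_out, B_out)
```

Training would then adjust the weights and biases to reduce the difference between `y_hat` and the observed values.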

Model Development
To predict soil moisture for each layer at time t + n, the historical observation data from time t − m to time t were used as input for each model. Soil moisture at four layers was predicted using Equation (8), where S is the soil moisture value, I indicates the input variables, k is the number of input variables, l denotes the four layers, m is the number of previous time steps of input data, and n is the prediction lead time. In this study, the observed meteorological and soil moisture data from the previous 12 days were used as input data to predict soil moisture from 1 to 6 days ahead.
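The input/output pairing described above can be sketched as a sliding window over the daily record. This is a minimal illustration under assumptions: the function name `make_windows` is hypothetical, the window is taken as 12 rows ending at day t, and all six lead times are stacked into one target array, whereas the study may have trained one model per lead time.

```python
import numpy as np


def make_windows(features, targets, lookback=12, max_lead=6):
    """Pair each `lookback`-day input window with the next `max_lead` days of targets.

    features: array of shape (T, k) with the k input variables per day.
    targets:  array of shape (T, l) with soil moisture at the l layers.
    Returns X of shape (N, lookback, k) and Y of shape (N, max_lead, l).
    """
    X, Y = [], []
    n_days = len(features)
    for t in range(lookback - 1, n_days - max_lead):
        X.append(features[t - lookback + 1 : t + 1])   # days t-11 .. t
        Y.append(targets[t + 1 : t + 1 + max_lead])    # days t+1 .. t+6
    return np.asarray(X), np.asarray(Y)


# Hypothetical example: 40 days, 5 input variables, 4 soil layers.
rng = np.random.default_rng(2)
features = rng.random((40, 5))
targets = rng.random((40, 4))
X, Y = make_windows(features, targets)
print(X.shape, Y.shape)
```

The LSTM can consume `X` as is, while the ANN would typically take each window flattened to a single vector, for example with `X.reshape(len(X), -1)`.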
The collected input data required two pre-processing steps. The first step is to fill in the values missing from the observations, to enhance the data continuity. This step is essential for data-driven models, as the temporal continuity of the data is very important. In this study, a missing value was substituted with the average of the soil moisture values immediately before and after the time step. The second pre-processing step involved data normalization. As the unit and range of each data set differ, the function values are very likely to diverge, thus degrading the simulation performance. Therefore, in this study, all input data were converted to values between 0 and 1 through the normalization process (Equation (9)), as follows:

$$
Z_i = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \tag{9}
$$

where $Z_i$ is the normalized variable, $X_i$ is the actual variable, and $X_{\max}$ and $X_{\min}$ are the maximum and minimum values of the variable, respectively.
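Both pre-processing steps can be sketched in NumPy as follows. The function names are hypothetical, and two details are assumptions not stated in the text: only isolated gaps with valid neighbours on both sides are filled, and the minimum and maximum may be passed in so that values taken from the training period can be reused for the test period.

```python
import numpy as np


def fill_gaps_with_neighbor_mean(series):
    """Replace each isolated NaN with the mean of the values just before and after it."""
    values = np.asarray(series, dtype=float)
    filled = values.copy()
    for i in np.flatnonzero(np.isnan(values)):
        if 0 < i < len(values) - 1:
            before, after = values[i - 1], values[i + 1]
            if not (np.isnan(before) or np.isnan(after)):
                filled[i] = 0.5 * (before + after)
    return filled


def min_max_normalize(x, x_min=None, x_max=None):
    """Scale values to [0, 1]; assumes x_max differs from x_min."""
    x = np.asarray(x, dtype=float)
    x_min = np.nanmin(x) if x_min is None else x_min
    x_max = np.nanmax(x) if x_max is None else x_max
    return (x - x_min) / (x_max - x_min)


raw = [10.0, float("nan"), 14.0, 20.0]
filled = fill_gaps_with_neighbor_mean(raw)   # the gap becomes (10 + 14) / 2 = 12
scaled = min_max_normalize(filled)
```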

Evaluation Methods
In this study, the correlation coefficient (CC), root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), and relative error (RE) were used to evaluate the predictive performance of the models applied for soil moisture prediction. CC is an index indicating the degree of the linear relation between the actual and predicted values. A CC value close to one indicates that the two variables have a very strong positive linear relation. RMSE is the standard deviation of the prediction error, which is the difference between the predicted results and the observations. The closer the RMSE is to zero, the more accurate the prediction. In addition, an NSE value of one indicates that the model perfectly simulates the observed value, and a value less than zero means that the average observed value is a better predictor than the simulated results. RE denotes the ratio of the difference between simulated and observed values to the observation. An RE value of zero indicates the best performance of the simulated results. Equations (10)-(13) give the formulas for CC, RMSE, NSE, and RE, respectively.
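In their standard form, and consistent with the descriptions above, the four metrics can be written as below. The sample index $i$ is an assumption, as is the sign convention of RE, which is taken here as predicted minus observed and expressed in percent.

$$
\begin{aligned}
CC &= \frac{\sum_{i=1}^{n} \left(Y_{obs,i} - \bar{Y}_{obs}\right)\left(Y_{pre,i} - \bar{Y}_{pre}\right)}{\sqrt{\sum_{i=1}^{n} \left(Y_{obs,i} - \bar{Y}_{obs}\right)^2 \sum_{i=1}^{n} \left(Y_{pre,i} - \bar{Y}_{pre}\right)^2}} && (10) \\
RMSE &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(Y_{pre,i} - Y_{obs,i}\right)^2} && (11) \\
NSE &= 1 - \frac{\sum_{i=1}^{n} \left(Y_{obs,i} - Y_{pre,i}\right)^2}{\sum_{i=1}^{n} \left(Y_{obs,i} - \bar{Y}_{obs}\right)^2} && (12) \\
RE &= \frac{Y_{pre,i} - Y_{obs,i}}{Y_{obs,i}} \times 100 && (13)
\end{aligned}
$$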
where $Y_{obs}$ is the actual value, $Y_{pre}$ is the predicted value derived from the model, $\bar{Y}_{obs}$ and $\bar{Y}_{pre}$ are the averaged values of $Y_{obs}$ and $Y_{pre}$, respectively, and n is the number of data points.
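The four metrics can be computed with a few NumPy helpers, sketched below under the standard definitions. The function names and sample values are hypothetical, and the sign of RE (predicted minus observed, in percent) is an assumption, since the text describes only a ratio of the difference to the observation.

```python
import numpy as np


def _as_arrays(obs, pre):
    return np.asarray(obs, dtype=float), np.asarray(pre, dtype=float)


def correlation_coefficient(obs, pre):
    obs, pre = _as_arrays(obs, pre)
    d_obs, d_pre = obs - obs.mean(), pre - pre.mean()
    return float(np.sum(d_obs * d_pre) / np.sqrt(np.sum(d_obs**2) * np.sum(d_pre**2)))


def rmse(obs, pre):
    obs, pre = _as_arrays(obs, pre)
    return float(np.sqrt(np.mean((pre - obs) ** 2)))


def nse(obs, pre):
    obs, pre = _as_arrays(obs, pre)
    return float(1.0 - np.sum((obs - pre) ** 2) / np.sum((obs - obs.mean()) ** 2))


def relative_error_percent(obs, pre):
    """Per-sample relative error in percent; undefined where the observation is zero."""
    obs, pre = _as_arrays(obs, pre)
    return (pre - obs) / obs * 100.0


# Hypothetical observed and predicted soil moisture values.
obs = [10.0, 20.0, 30.0, 40.0]
pre = [12.0, 18.0, 33.0, 39.0]
cc_value = correlation_coefficient(obs, pre)
rmse_value = rmse(obs, pre)
nse_value = nse(obs, pre)
re_values = relative_error_percent(obs, pre)
```

Because RE is computed per sample, it yields a distribution of errors for each layer and lead time rather than a single score.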

Comparison with Observations at Four Layers
In this study, the quality of daily soil moisture at four layers, predicted from the LSTM and ANN models, was evaluated through a comparison with the soil moisture observed at the soil moisture station in Eagle Lake. The evaluation was conducted on the daily soil moisture data observed between July 2018 and February 2020. Figure 4 shows scatter plots of the observed soil moisture and predicted results from the two models (i.e., ANN and LSTM) for the four layers. Both data-driven models predicted soil moisture values acceptably, compared with the observed values. The comparison results show that the LSTM model provides relatively better performance than the ANN model for all depths. In addition, the predictive performance of the ANN model seems to decrease as the lead time increases from 1 to 6 days. For example, for a lead time of 6 days, the points representing soil moisture predicted from the ANN model lie relatively far from the X = Y line. Moreover, the simulation performance for the soil moisture of the surface layer was relatively worse than that of the deep layer. It can be inferred that the larger temporal variability of soil moisture in the surface layer than in the deep layer affected the simulation results.
In this study, three statistical factors (i.e., CC, RMSE, and NSE) were used for the statistical evaluation of the performance of the two models for soil moisture prediction. With these factors, the prediction performance of the models, based on the predicted amount and tendency of soil moisture, was compared with the observed data. Table 1 shows the statistical metrics for the soil moisture predicted from the two data-driven models, LSTM and ANN. The prediction performance of the two models was found to be generally acceptable based on the statistical factors. The statistical metrics showed that the LSTM model had relatively better predictive performance than the ANN model in all layers. The CC values ranged from 0.80 to 0.97 for the ANN model and from 0.90 to 0.98 for the LSTM model, the RMSE values were lower than 2.0, and the NSE values ranged from 0.62 to 0.94 for the ANN model and from 0.74 to 0.96 for the LSTM model in all layers. However, as the lead time increased, the difference in predictive performance between the two models became evident. In the case of the ANN model, the predictive performance decreased as the lead time increased, whereas the LSTM model showed no significant differences.

Monthly Based Evaluation
It is very important to understand the prediction performance on a monthly basis, as soil moisture significantly influences flooding and drought. Figures 5 and 6 show the monthly based predictive performance of data-driven models for soil moisture prediction with a lead time of 1 and 6 days for layers 1  In this study, three statistical factors (e.g., CC, RMSE, and NSE) were used for the statistical evaluation of the performance of the two models for soil moisture prediction. With these factors, the prediction performance of the models, based on the predicted amount and tendency of soil moisture, was compared with the observed data. Table 1 shows statistical metrics for the soil moisture predicted from the two data-driven models, LSTM and ANN models. The prediction performance of the two models was found to be generally acceptable based on the statistical factors. The statistical metrics showed that the LSTM model showed relatively better predictive performance than the ANN model in all layers. Considering that the CC values were ranged from 0.80 to 0.97 for ANN and from 0.90 to 0.98 for the LSTM model, the RMSE value was lower than 2.0, and the NSE values were ranged from 0.62 to 0.94 for ANN and from 0.74 to 0.96 for the LSTM model in all layers. However, as the lead time increased, the difference in predictive performance between the two models was obviously indicated. In the case of the ANN model, the predictive performance decreased as the lead time increased, whereas the LSTM model showed no significant differences.  for wet and dry seasons for various soil layers, but it was recommended that the ANN model is suitable for predicting soil moisture for only surface layer than a deep layer, and the LSTM model can provide better soil moisture predictions for both surface and deep layers.

In addition, these models showed sufficient potential for soil moisture prediction during the dry season (April to October) as well as the wet season. Both models provided suitable prediction performance, with average CC values of 0.91 and 0.89, RMSE values of 0.51 and 0.59, and NSE values of 0.81 and 0.78 for a lead time of 1 day (Figure 5). As shown in Figure 6, both models provided moderate prediction results for a lead time of 6 days, although the ANN model performed worse. From the monthly evaluation results, it was concluded that the data-driven models are sufficient for soil moisture prediction in both wet and dry seasons for various soil layers. However, the ANN model is better suited to predicting soil moisture in the surface layer than in deep layers, whereas the LSTM model can provide better soil moisture predictions for both surface and deep layers.
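As an illustration of the monthly evaluation described above, the following minimal Python sketch computes CC, RMSE, and NSE separately for each calendar month. It assumes the standard definitions of these metrics (Pearson correlation, root mean square error, and Nash-Sutcliffe efficiency). The function names, the synthetic series, and the crude 30-day month labels are hypothetical and are not taken from this study.

```python
import numpy as np


def cc(obs, sim):
    """Pearson correlation coefficient between observed and simulated values."""
    return float(np.corrcoef(obs, sim)[0, 1])


def rmse(obs, sim):
    """Root mean square error, in the units of the input series."""
    return float(np.sqrt(np.mean((sim - obs) ** 2)))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def monthly_scores(months, obs, sim):
    """Return {month: (CC, RMSE, NSE)} for every month label that occurs."""
    months = np.asarray(months)
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    scores = {}
    for m in np.unique(months):
        mask = months == m
        scores[int(m)] = (cc(obs[mask], sim[mask]),
                          rmse(obs[mask], sim[mask]),
                          nse(obs[mask], sim[mask]))
    return scores


# Synthetic two-year daily series (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
days = np.arange(730)
months = (days // 30) % 12 + 1  # crude 30-day "months" labelled 1..12
obs = 20.0 + 5.0 * np.sin(2.0 * np.pi * days / 365.0) + rng.normal(0.0, 1.0, days.size)
sim = obs + rng.normal(0.0, 0.5, days.size)

scores = monthly_scores(months, obs, sim)
for month, (c, r, n) in sorted(scores.items()):
    print(f"month {month:2d}: CC={c:.2f} RMSE={r:.2f} NSE={n:.2f}")
```

Grouping by month before scoring, rather than scoring the whole series at once, is what allows wet-season and dry-season performance to be compared directly.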

Errors in Predicted Soil Moisture
This study investigated how much error is inherent in the prediction results, because the choice of predictive model should take into account the characteristics of possible errors for each layer and each lead time. For this purpose, another evaluation metric (i.e., RE (%)) was used to compare the errors in the soil moisture predicted by the ANN and LSTM models. Figure 7 shows box plots of the errors in the soil moisture predicted by both models for each layer and for lead times from 1 to 6 days. As shown in Figure 7, the range of RE values became smaller as the layer became deeper. For example, in layer 1, the maximum RE range was −400% to 100% (lead time of t + 6), whereas in layer 4, it was −100% to 75% (lead times of t + 3 and t + 4). This confirms that the ANN and LSTM models provide lower prediction errors relative to the observations for deep layers, where the temporal variability of soil moisture is relatively small. Moreover, this result shows that the data-driven models have sufficient predictive power for soil moisture at various depths, from the surface to deep layers. However, there was a significant difference in prediction performance between the two models, and the LSTM model clearly demonstrated better prediction results than the ANN model for most lead times. The ANN model, in contrast, showed better performance for the surface layer and short-term prediction. For example, the ANN model had better predictive performance for a lead time of 1 day in layer 1 (CC = 0.97, RMSE = 0.76, and NSE = 0.94) than the LSTM model (CC = 0.96, RMSE = 0.91, and NSE = 0.91). In addition, the RE values of the ANN model ranged from −100% to 80%, whereas those of the LSTM model ranged from −150% to 80%. Therefore, this study suggests that it is important to select an appropriate model for soil moisture prediction according to depth and lead time.
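The error ranges discussed above can be reproduced in outline with the short Python sketch below. It assumes that RE denotes the relative error and that it is computed as 100 × (observed − predicted) / observed. This sign convention is an assumption and is not stated in this section, although it is consistent with reported upper limits that do not exceed 100%. The helper names and the sample values are hypothetical.

```python
import numpy as np


def relative_error(obs, sim):
    """RE (%) under the assumed convention 100 * (obs - sim) / obs."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 100.0 * (obs - sim) / obs


def box_stats(values):
    """Five-number summary of the kind drawn in a box plot (whiskers at min/max)."""
    q = np.percentile(values, [0, 25, 50, 75, 100])
    return dict(zip(("min", "q1", "median", "q3", "max"), map(float, q)))


# Hypothetical observed soil moisture and predictions at two lead times.
obs = np.array([10.0, 12.0, 8.0, 5.0, 20.0])
pred = {
    "t+1": np.array([11.0, 12.0, 7.0, 6.0, 19.0]),
    "t+6": np.array([15.0, 9.0, 12.0, 10.0, 16.0]),
}

# One summary per lead time, mirroring one box per lead time in a box plot.
summary = {lead: box_stats(relative_error(obs, sim)) for lead, sim in pred.items()}
for lead, stats in summary.items():
    print(lead, stats)
```

Under this convention, a non-negative prediction can never give an RE above 100%, whereas overprediction of a small observed value gives a large negative RE. This may be one reason why the negative side of the range is much wider, but that reading is only as reliable as the assumed definition.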

Discussions and Conclusions
In this study, soil moisture at multiple layers was predicted from meteorological variables using two data-driven models (i.e., ANN and LSTM). The novelty of this study is that it provides soil moisture predictions for multiple layers rather than for only a single layer, as in other studies. In addition, the prediction results indicated that both models have sufficient potential for soil moisture analysis, both as an alternative to physical-based methods and as support for improving the prediction performance of physical-based models. Both models showed acceptable prediction results, but the LSTM model showed better predictive performance than the ANN model. More specifically, the LSTM model provided highly accurate predictions with lead times of 1 to 6 days for all four layers. However, the ANN model performed better than the LSTM model for short lead times and the surface layer.

Limitations of the Data-Driven Models
Although both models showed highly accurate soil moisture predictions at multiple depths, the prediction of soil moisture using data-driven models has some limitations. First, the predicted soil moisture during specific periods agreed less well with the observations. Second, the predictions for the third layer were poorer than those for the other three layers.
The main reason for these errors is uncertainty in the training process of the models. The quality and quantity of the training data sets affect model performance because data-driven models predict time series using the information learned from those data sets [43]. This study used five input data sets to predict the soil moisture for each layer, and uncertainty in even one of them can affect the output quality. For example, missing values in the observation data affect the training process and the model parameters, and this can propagate as uncertainty into the validation results. Therefore, it is essential to use quality-assured data sets for model training to avoid malfunction of the data-driven models.
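One simple way to keep missing values out of training, sketched below in Python, is to build the training samples as sliding windows and to discard any window that touches a missing value. This is only an illustration under stated assumptions and is not the preprocessing used in this study: the function name, the three-day input window, and the random input data are hypothetical, whereas the five input series and the six-day maximum lead time follow the description in the text.

```python
import numpy as np


def make_samples(features, target, n_lags=3, max_lead=6):
    """Build (X, Y) training pairs from daily series.

    X[i] holds the last n_lags days of all inputs up to day t,
    Y[i] holds the target on days t+1 .. t+max_lead.
    Any sample whose inputs or targets contain NaN is dropped.
    """
    features = np.asarray(features, dtype=float)  # shape (n_days, n_inputs)
    target = np.asarray(target, dtype=float)      # shape (n_days,)
    X, Y = [], []
    for t in range(n_lags - 1, len(target) - max_lead):
        x = features[t - n_lags + 1 : t + 1]
        y = target[t + 1 : t + 1 + max_lead]
        if np.isnan(x).any() or np.isnan(y).any():
            continue  # skip windows affected by a missing observation
        X.append(x)
        Y.append(y)
    return np.array(X), np.array(Y)


# Hypothetical data: 20 days, five input series, one soil moisture series.
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 5))
target = np.arange(20.0)

X_all, Y_all = make_samples(features, target)

# Introduce a single missing input value on day 10 and rebuild the samples.
features_gap = features.copy()
features_gap[10, 0] = np.nan
X, Y = make_samples(features_gap, target)

print(X_all.shape, X.shape, Y.shape)
```

In this example, one missing value removes three of the twelve samples, because it falls inside three overlapping input windows. A single gap in one input series therefore costs more training data than one day.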
Another reason for the errors in the predicted soil moisture is uncertainty in the process of running the models. Data-driven models are often called black-box models because it is difficult for users to trace the uncertainty generated while the models operate. Their performance is strongly influenced by the parameters and model structure used for training. Inappropriate parameter selection can cause overfitting or incorrect learning, which lowers prediction accuracy. Therefore, it is essential to find optimal values for settings such as the dropout rate and other model parameters, and to use proper equations, before training. Although this study tried to find optimal values of some key parameters and kept them constant after the initial setting, some errors remained in the predicted soil moisture. The effect of the parameters on model performance is beyond the scope of this study and will be an important task for future studies.
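A common way to choose such settings is to compare candidate values on a held-out, later part of the time series. The Python sketch below illustrates only that selection procedure. To stay self-contained, it substitutes a closed-form ridge regression for the ANN and LSTM models and a ridge penalty for the dropout rate, so the model, the candidate values, and the synthetic data are all stand-ins and do not represent the configuration of this study.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression weights (stand-in for a trainable model)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def rmse(y_true, y_pred):
    """Root mean square error between two arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def select_by_validation(X, y, candidates, train_fraction=0.7):
    """Fit on the earlier part of the series and score each candidate on the rest."""
    split = int(len(y) * train_fraction)
    X_tr, y_tr = X[:split], y[:split]
    X_va, y_va = X[split:], y[split:]
    scores = {}
    for alpha in candidates:
        weights = fit_ridge(X_tr, y_tr, alpha)
        scores[alpha] = rmse(y_va, X_va @ weights)
    best = min(scores, key=scores.get)  # candidate with the lowest validation RMSE
    return best, scores


# Synthetic data (hypothetical): five inputs, a linear signal plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -0.5, 0.0, 0.0, 2.0]) + rng.normal(0.0, 0.3, 200)

best_alpha, val_scores = select_by_validation(X, y, [0.01, 1.0, 100.0, 10000.0])
print(best_alpha, val_scores)
```

The split is chronological rather than random so that the validation period lies after the training period, as it would in an operational forecast.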

Implications for Hydrological Analysis Using Soil Moisture
In this study, two types of data-driven models were applied to predict soil moisture at multiple depths at the Eagle Lake Observatory as a case study. The proposed models showed excellent performance, and they can be effective alternatives or complements to physical-based models for soil moisture prediction. The data-driven models can be effective approaches for soil moisture analysis in areas where it is difficult to observe soil moisture directly at various depths because of physical limitations. In addition, it is noteworthy that the data-driven models can serve agricultural, hydrological, and environmental fields, which use soil moisture from different layers for different purposes. The data-driven models are expected to become more valuable as the quality of forcing data improves, and complicated data-driven models will become more convenient to apply as computing technology advances.
This study suggests that the data-driven models are an effective alternative to layer-by-layer soil moisture observation, which is expensive and has temporal and spatial constraints. Moreover, data-driven models whose reliability in soil moisture prediction has been verified can be used as a reference for improving the quality of physical models based on complex and diverse equations and methodologies. In this study, a method for predicting soil moisture up to six days ahead was proposed using meteorological and soil characteristic data. To improve the usability of the predicted results, future studies will develop a method for predicting soil moisture at longer lead times. In addition, based on the results of this study, we intend to develop a complementary method that offsets the weaknesses of both the data-driven models and the physical models.