Modeling Hourly Soil Temperature Using Deep BiLSTM Neural Network

: Soil temperature (ST) plays a key role in the processes and functions of almost all ecosystems, and is also an essential parameter for various applications such as agricultural production, geothermal development, and their utilization. Although numerous machine learning models have been used in the prediction of ST, and good results have been obtained, most of the current studies have focused on daily or monthly ST predictions, while hourly ST predictions are scarce. This paper presents a novel scheme for forecasting the hourly ST using weather forecast data. The method considers the hourly ST prediction to be the superposition of two parts, namely, the daily average ST prediction and the ST amplitude (the di ﬀ erence between the hourly ST and the daily average ST) prediction. According to the results of correlation analysis, we selected nine meteorological parameters and combined two temporal parameters as the input vectors for predicting the daily average ST. For the task of predicting the ST amplitude, seven meteorological parameters and one temporal parameter were selected as the inputs. Two submodels were constructed using a deep bidirectional long short-term memory network (BiLSTM). For the task of hourly ST prediction at ﬁve di ﬀ erent soil depths at 30 sites, which are located in 5 common climates in the United States, the results showed the method proposed in this paper performs best at all depths for 30 stations (100% of all) for the root mean square error (RMSE), 27 stations (90% of all) for the mean absolute error (MAE), and 30 stations (100% of all) for the coe ﬃ cient of determination (R 2 ), respectively. Moreover, the method adopted in this study displays a stronger ST prediction ability than the traditional methods under all climate types involved in the experiment, the hourly ST produced by it can be used as a driving parameter for high-resolution biogeochemical models, land surface models and hydrological models and can provide ideas for an analysis of other time series data. the monthly ST using a multi-layer perceptron (MLP), radial basis neural network (RBNN), generated regression network (GRNN), and multiple linear regression (MLR) at di ﬀ erent depths. The results show that the RBNN algorithm achieves the highest prediction accuracy at depths of 5 and 10 cm, and an MLR and a GRNN are better at ST prediction at depths of 50 and 100 cm.


Introduction
Among the many soil factors, soil temperature (ST) is the most important, affecting the processes and functions of almost all ecosystems [1], such as soil infiltration rates [2], soil respiration [3], soil organic matter accumulation and degradation [4], soil chemical and physical reactions [5], land surface hydrological processes, and land atmosphere interactions [6]. ST is extremely important to the growth of crops [7], affecting their germination, root growth, and the absorption of nutrients [8]. Dang [9] reported that an increase in ST will reduce the growth cycle of crops and significantly increase the yield. The ST is also an essential important parameter in a geothermal exploration [10], ground

104
The data-collection times and elevations of these stations are given in Table 1, and are classified 105 based on the Koppen climate classification method. This method involves a representation based on 106 three letters indicating the climatic conditions. The first letter indicates the major climate type, which 107 is one of the following: (A) equatorial, (B) arid, (C) warm temperature, (D) snowfall, and (E) polar 108 climates. The second and third letters indicate the precipitation conditions and temperature, 109 respectively. For example, BSk indicates that the region is located in an arid, cold temperature with  The data-collection times and elevations of these stations are given in Table 1, and are classified based on the Koppen climate classification method. This method involves a representation based on three letters indicating the climatic conditions. The first letter indicates the major climate type, which is one of the following: (A) equatorial, (B) arid, (C) warm temperature, (D) snowfall, and (E) polar climates. The second and third letters indicate the precipitation conditions and temperature, respectively. For example, BSk indicates that the region is located in an arid, cold temperature with a stepped climate. For a detailed description of the climate classification, please refer to [35]. There will be missing values in the data set, which are caused by sensor failure, memory damage, and other reasons. In order to avoid destroying the continuity of the time series data, we use MissForest algorithm [36] to interpolate missing values. This method is a nonparametric interpolation method based on random forest, which is suitable for both discrete variables and continuous variables, and can be well applied to nonlinear data. After processing missing values, the valid data of 30 stations totaled 1,577,714 groups, including the hourly weather conditions such as the temperature, dew point, relative humidity, wind speed, solar radiation, and soil temperature at 5, 10, 20, 50, and 100 cm, respectively. Based on the existing data, we further calculated the month, day of the month, hour of the day, and ST amplitude as supplementary data. The ST amplitude is a parameter defined in this paper, which is used to describe the fluctuation of hourly ST relative to the average ST of the day, it is obtained by subtracting daily average ST from each hourly ST.
We first determined the inputs of the model with a Pearson correlation coefficient [16], which can reflect the degree of correlation between the two variables x and y, based on Equation (1). It is generally believed that the absolute value of r is 0.0-0.2 indicates an extremely weak or no correlation between x and y, 0.2-0.4 indicates a weak correlation, 0.4-0.6 indicates a moderate correlation, 0.6-0.8 indicates a strong correlation, and 0.8-1.0 indicates an extremely strong correlation.
(1) Figure 2 shows the correlation coefficient between the ST amplitude and meteorological parameters for the different depths, in which the correlation coefficient greater than 0 indicates a positive correlation between x and y, and less than 0 indicates a negative correlation. The parameters appearing between -0.2 and 0.2 are month, day of the month, air temperature observed, wind direction average, wind speed maximum, relative humidity, and dew point temperature, indicating that these parameters have no correlation or have extremely weak correlation with ST amplitude for most depths.

123
We first determined the inputs of the model with a Pearson correlation coefficient [16], which (1)

145
Through the analysis of Figure 2 and Figure Table 2，in which meteorological parameters 149 are collected by sensors at intervals of one hour.  Through the analysis of Figures 2 and 3, we delete the parameter in which the absolute value of correlation coefficient is less than 0.2, and choose eight parameters as the input vector for predicting ST amplitude. For the forecasting task of daily average ST, eleven parameters are selected as input. Details of the input parameters are shown in Table 2, in which meteorological parameters are collected by sensors at intervals of one hour.

BiLSTM Networks
The structure of BiLSTM is shown in Figure 4a, and the structure of long short-term memory network (LSTM) is shown in Figure 4b. In the LSTM structure, the output result at time t is only related to time t − 1, whereas in the BiLSTM structure, the output result at time t is related to both t − 1 and t + 1, and this structure can effectively use all information of the past and future for training [37]. In [38] and [39], BiLSTM was used for time-series data processing, and it was found that BiLSTM is faster and more accurate than LSTM and standard recurrent neural networks.

166
We proposed a novel method to predict the hourly ST, which includes two sub-models: The  Figure 5, and the input 171 parameters of the two submodels are given in Table 2. Equations (2)-(5) describe the calculation process of BiLSTM, where h t is the output result of the forward layer at time t, h t is the output result of the backward layer at time t,ŷ t is the network output that combines h t and h t , f and g are nonlinear activation functions, and the goal of network training is to minimize the mean square error (MSE) function s based on the target value y t .

Integrated BiLSTM Model
We proposed a novel method to predict the hourly ST, which includes two sub-models: The daily average ST prediction model and the ST amplitude prediction model. The hourly ST estimation is the summation of the daily average ST prediction results and ST amplitude prediction results, because in the Section 2, we have defined that ST amplitude was obtained by subtracting the daily average ST from the hourly ST. A flow chart of the model is shown in Figure 5, and the input parameters of the two submodels are given in Table 2.

174
The topology of each sub-model is set to six layers, namely, an input layer with 12 features and The topology of each sub-model is set to six layers, namely, an input layer with 12 features and 1 timestep; 4 hidden layers, in which the numbers of neurons of each layer are 14, 14, 14, and 6, respectively; and an output layer with 1 neuron, the loss function of which is the MSE. In addition, an adaptive moment estimation algorithm (Adam) [40] is selected as the optimizer.

Benchmark Models
We chose six benchmark algorithms to prove the relative advantages of the proposed method, namely, three deep learning methods, i.e., LSTM, BiLSTM and deep nerual network (DNN), and three traditional machine learning methods, random forest (RF), support vector regression (SVR), and linear regression (LR). A brief introduction and the parameter selection of each algorithm are as follows.
The LSTM network is a special type of deep neural network proposed by Hochreiter and Schmidhuber [41] in 1997. Because LSTM can solve tasks with a long time-lag well, it is widely used in time-series data processing [42][43][44][45][46]. The topology of the LSTM is consistent with the subnetwork described in Section 3.2.
The topology of BiLSTM, introduced in Section 3.1, is consistent with the subnetwork described in Section 3.2.
DNN is a neural network with multiple hidden layers, which has strong nonlinear fitting ability and has been successfully applied in prediction and classification tasks [47][48][49][50]. In order to ensure fair comparison, we set the number of layers of DNN and the number of neurons in each layer to be consistent with the subnetwork described in Section 3.2, and choose rectified linear units (RELU) [51] as the activation function.
RF is an excellent algorithm used in machine learning. It has the advantages of a fast learning speed, insensitivity to noise, less overfitting, and can achieve good results without numerous parametric adjustments. Through a grid search of the search variable space, the number of decision trees was set to 10, and the tree depth was set to 20.
An SVR is one of the most important algorithms in machine learning. It can deal with high-dimensional, heterogeneous, and scarcely labeled datasets extremely efficiently, and can also successfully adapt to specific applications [52]. In our experiment, the kernel function and penalty factor are set to a radial basis function (RBF) and 1, respectively, which are determined through a grid search of the variable search space.
LR is a regression method widely used in practical applications and does not require a setting of the parameters.
The RF, SVR, and LR were built using scikit-learn, whereas the LSTM, BiLSTM, and DNN networks, based on Theano, were developed using Keras.
In the experiment, the data from each site were divided into two parts, with the first 80% of the data serving as the training set and the second 20% as the test set. When executing the ST amplitude prediction model, all input data need to be normalized to [−1, 1], and, when running the daily average ST prediction model, all input data need to be normalized to [0, 1]. The month, hour of the day, air temperature maximum, air temperature minimum, solar radiation, relative humidity minimum, relative humidity maximum, and vapor pressure are used as the inputs of the benchmark algorithm, and the ST temperature of different depths is the output of the model. When executing the benchmark algorithm, the data need to be normalized to [0, 1]. Because the performances of the integrated BiLSTM, BiLSTM, and LSTM are significantly affected by the initialization weight matrix, we ran these algorithms ten times on the same data and computed the mean of the results to ensure a fair comparison with traditional machine learning methods.

Results
In this paper, a novel integrated BiLSTM model is proposed that combines the daily average ST prediction and the ST amplitude prediction to obtain the hourly ST. Compared with the traditional machine learning model, the integrated BiLSTM exhibits obvious advantages in predicting the hourly ST for various climates.
Our results are presented in four aspects. First, the proposed algorithm was used to predict the ST under different observations, and the results were compared with those of other traditional algorithms. Second, the ST prediction performances of each algorithm under different soil depths were compared. Third, the ST prediction performances of each algorithm under different climate types were compared. Finally, the algorithm proposed in this paper was compared with the algorithms in other literatures.

Model Comparisons
The prediction performance of different models at each observation station are shown in Figures 6-8. As presented in Figure 6, the best performance is obtained by the integrated BiLSTM model, the root mean squared error (RMSE) of which at all 30 sites is lower than that of the benchmark algorithm, whereas the worst performance is obtained by the LR algorithm, the RMSE of which is the highest at all sites. According to the statistics, based on the RMSE index, the integrated BiLSTM is 4.8-18. The mean absolute error (MAE) of every model for each observation station is given in Figure 7. As can be seen, integrated BiLSTM performs better than the benchmark models at 27 observation sites (90% of the total number of stations), and based on the MAE index is 2.6-18.7%, 5.7-21.5%, 20.3-25.7%, 9.8-22.1%, 17.5-44.1%, and 42.2-65.1% more accurate than LSTM, BiLSTM, DNN, RF, SVR, and LR, respectively. At station 2215, the BiLSTM algorithm obtains the best MAE value of 1.06 • C, whereas the MAE value of the integrated BiLSTM algorithm is 1.07 • C, which is only 0.01 • C higher than that of the BiLSTM. By contrast, LSTM obtained the lowest MAE of 1.14 • C and 1.12 • C at the two stations 2031 and 2147, which are 0.06 • C and 0.02 • C lower than that of the integrated BiLSTM, respectively. In other words, the difference between the integrated BiLSTM and the best algorithm is not obvious at these three stations.   Figure 8 shows the R 2 generated by every model at each observation, and as the figure indicates, the integrated BiLSTM was the best model with the highest R 2 value at each observation station. According to statistics, based on the R 2 index, the integrated BiLSTM is 1.3-4.8%, 2.4-8.0%, 5.1-8.3%, 3.2-9.3%, 4.4-45.0%, and 23-57.4% more accurate than LSTM, BiLSTM, DNN, RF, SVR, and LR, respectively. The performance of the LSTM model is slightly better than that of the BiLSTM, and the performance of the LR model is still not ideal. We also noted that the R 2 value of the LR is less than zero at the two sites labeled 2218 and 2147, which indicates that LR model is unsuitable for processing data with a nonlinear correlation.
Although the RF model is not as good as the integrated BiLSTM, LSTM, and BiLSTM with respect to the three indicators, RMSE, MAE, and R 2 , it can be clearly seen from Figures 6-8 that favorable agreements exist between the results of the RF model and these three deep learning models, and the prediction results at multiple sites are better than DNN, which confirms the potential of using the RF model for an estimation of the hourly ST.

269
The best and worst statistical results for each model, and the identification number of the 270 observation sites that produced these results, are given in Tables 3 and 4 The best and worst statistical results for each model, and the identification number of the observation sites that produced these results, are given in Tables 3 and 4, respectively. The RMSE, MAE, and determination coefficient (R 2 ) are used as the evaluation criteria. From Tables 3 and 4 Among the deep learning technology, DNN algorithm is not as good as integrated BiLSTM, LSTM and BiLSTM for predicting hourly ST. As mentioned above, the results of DNN are sometimes even worse than the RF, as can be seen from Table 4, the worst R 2 obtained by DNN is 0.613, while the worst R 2 of RF is 0.656. The statistic performance of each model for an estimation of the hourly ST is shown in Table 5, which are the average performance of each model after 10 runs. According to Table 5, the maximum R 2 , and the minimum RMSE and MAE (the best results), with values of 0.923, 1.53 • C, and 1.22 • C, respectively, were obtained using the integrated BiLSTM. By contrast, the minimum R 2 , and the maximum RMSE and MAE values, were found to be 0.518, 3.43 • C, and 2.76 • C when using the LR. The integrated BiLSTM, BiLSTM, and LSTM perform better than the DNN, RF, SVR, and LR. DNN is inferior to LSTM-based method in processing time series data. Among the deep learning methods, the integrated BiLSTM method achieves the best prediction results for the hourly ST, whereas within the range of traditional machine learning, the random forest model achieves the best performance.  Figure 9 shows the performance of each model in estimating the hourly ST at different soil depths. With respect to RMSE, MAE, and R 2 , the integrated BiLSTM models generally show the best performance at all soil depths, and the LSTM model is the second-best prediction algorithm. Except for RF, the accuracy (based on the RMSE and MAE) of the other models increases with an increase in depth. Taking the integrated BiLSTM model as an example, the highest RMSE and MAE values (the worst results) are obtained at 5 cm, whereas the lowest values (the best results) are obtained at 100 cm. With respect to R 2 , the accuracy of the integrated BiLSTM model increases from 5 to 20 cm, and decreases from 50 to 100 cm, whereas the R 2 values of the other models decrease with an increase in depth. This result differs from the conclusions in [21] and [53], the experimental results of which show that the accuracy of the machine learning algorithm will gradually decrease at a soil depth of 10 to 100 cm. This may be related to the different models used and the different climatic conditions.

Model Performance at Different Depths
We noticed that the lowest RMSE value obtained by the integrated BiLSTM is 2.04 • C at 5 cm, which is 0.1 • C lower than that obtained by the second-best model. The difference gradually increases to 0.31 • C at 50 cm, and then decreases to 0.23 • C at 100 cm. In other words, from 5 to 50 cm, the difference between the integrated BiLSTM model and the second-best model widens, indicating that the integrated BiLSTM has potential for deeper ST prediction tasks.

Model Performance at Different Climates
The statistic performance of each algorithm for predicting the hourly ST under different climates is shown in Tables 6-8. As can be seen from these tables, the best performances were generally obtained using the integrated BiLSTM. This produced the best results for four climates (80% of the total number) for RMSE, five climates (100% of the total number) for MAE, and five climates (100% of the total number) for R 2 , respectively. The LR was the worst model, producing the minimum R 2 , and the maximum RMSE and MAE, under all climate types involved in the experiment. With respect to the RMSE, the BiLSTM performed slightly better than the integrated BiLSTM under the BWh climate.
For the five models, namely, integrated BiLSTM, LSTM, BiLSTM, DNN, and RF, the lowest RMSE and lowest MAE were obtained under the Dfa climate type, whereas the highest RMSE and MAE were generated under the BWh or Csa climate, which does not mean that these models perform better under the Dfa climate. Based on the statistics, the average soil temperature measured in the Dfa climate is 12.21 • C, whereas the average soil temperatures of Bwh and Csa are 21.68 • C and 21.72 • C, respectively. If the relative error (the ratio of the error to the true value) is calculated, we can see that the relative error of each algorithm is the highest in the Dfa climate. In other words, the performance of each model under a Dfa climate (snow areas) is not as good as that in warm or dry areas (BSk, BWh, Cfa, and Csa). It is also clear from Table 8 that the R 2 of these five models is the lowest under the Dfa climate type.

Compared with the Other Literatures
Feng [15] uses four machine learning methods to predict the half hourly ST of a maize field in northern China. The results show that the four methods can provide ideal ST prediction at all depths, and the extreme learning machine (ELM) performs best. We compared the prediction results of our proposed algorithm at 5 cm, 10 cm, and 20 cm (only these three depths are involved in both studies) with Feng's optimal prediction results. The results are shown in Table 9. Our proposed method is less accurate than Feng's when the soil depth is 5 cm, but when the soil depth is 10 cm and 20 cm, it is more accurate than the optimal method mentioned in his article.

Conclusions
Although numerous machine learning models have been used and have achieved good results for predicting the monthly or daily ST, hourly ST predictions have been scarce. This paper presented a novel scheme for forecasting the hourly ST at five different soil depths in 30 sites, located under five common climates in the United States. Differing from other studies that consider meteorological parameters as the input of the model and the ST as the output, our proposed method includes two submodels for the daily average ST prediction and ST amplitude prediction. The hourly ST was obtained by superposing the prediction results of the two submodels. Through an input-output correlation analysis, we selected nine meteorological parameters and two temporal parameters as the inputs of the daily average ST prediction model, and seven meteorological parameters and one temporal parameter as the inputs of the ST amplitude prediction model. The experimental results show that the proposed model produced the best results for 30 stations (100%) for RMSE, 27 stations (90%) for MAE, and 30 stations (100%) for R 2 , respectively. We compared the prediction results of each algorithm for the ST at different depths, and demonstrated that the integrated BiLSTM developed in this study is better than the benchmark algorithm in terms of ST prediction at the five soil depths involved in the experiment, and has greater potential than the benchmark algorithm in terms of a deeper ST prediction. By comparing the prediction results according to the climate type, it was found that the algorithm proposed in this study performs better than the benchmark algorithm under all climate types involved in the experiment, especially in warm or dry areas.
Although we verified the effectiveness of the model with hourly ST predictions, our method is also applicable to the sequence learning of other data, such as the wind speed, humidity, or temperature.
In future work, we will improve the prediction accuracy of the model for ST by enriching data sets, and adjusting the structure of neural network.