A Novel Hybrid Data-Driven Model for Daily Land Surface Temperature Forecasting Using Long Short-Term Memory Neural Network Based on Ensemble Empirical Mode Decomposition

Daily land surface temperature (LST) forecasting is of great significance for applications in climate-related, agricultural, eco-environmental and industrial studies. Hybrid data-driven prediction models that couple Ensemble Empirical Mode Decomposition (EEMD) with Machine Learning (ML) algorithms are useful for these purposes because they reduce the difficulty of modeling, require less historical data, are easy to develop and are less complex than physical models. In this article, a computationally simple, less data-intensive, fast and efficient novel hybrid data-driven model, the EEMD Long Short-Term Memory (LSTM) neural network, namely EEMD-LSTM, is proposed to reduce the difficulty of modeling and to improve prediction accuracy. The daily LST data series from the Mapoling and Zhijiang stations in the Dongting Lake basin, central south China, from 1 January 2014 to 31 December 2016 are used as a case study. The EEMD is first employed to decompose the original daily LST data series into several Intrinsic Mode Functions (IMFs) and a single residue item. Then, the Partial Autocorrelation Function (PACF) is used to obtain the number of input data sample points for the LSTM models. Next, LSTM models are constructed to predict the decompositions, and all their predicted results are aggregated as the final daily LST. Finally, the prediction performance of the hybrid EEMD-LSTM model is assessed in terms of the Mean Square Error (MSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Pearson Correlation Coefficient (CC) and Nash-Sutcliffe Coefficient of Efficiency (NSCE).
To validate the hybrid data-driven model, the hybrid EEMD-LSTM model is compared with the Recurrent Neural Network (RNN), the LSTM, and the Empirical Mode Decomposition (EMD)-based EMD-RNN, EMD-LSTM and EEMD-RNN models. The comparison results demonstrate that the hybrid EEMD-LSTM model performs better than the other five models, and the scatterplots of the six models' predicted results versus the original daily LST data series confirm its superiority. It is concluded that the proposed hybrid EEMD-LSTM model is a suitable tool for temperature forecasting.

Decomposition (EEMD), which is based on the development of Empirical Mode Decomposition (EMD) [42,43]. Many hybrid methods that combine EEMD with other algorithms have been applied successfully in several fields. For example, Wang et al. [44] utilized the EEMD coupled with the ARIMA for annual runoff time series forecasting. Zhang et al. [45] proposed a two-stage method that combined the EEMD with a multidimensional k-nearest neighbor model for financial time series forecasting. Niu et al. [46] applied the EEMD and the Least Square Support Vector Machine (LSSVM) based on Phase Space Reconstruction (PSR) for day-ahead PM2.5 concentration predictions. Wang et al. [30] proposed a hybrid model that coupled the EEMD with an ANN for long-term runoff forecasting. Zhang et al. [31] adopted the EEMD coupled with the Elman Neural Network (ENN) for annual runoff time series forecasting. Their results demonstrated that coupling the EEMD with other popular methods can significantly improve time series forecasting precision.
In this paper, a hybrid data-driven model, EEMD coupled with Long Short-Term Memory (LSTM), namely the EEMD-LSTM, is proposed for daily LST data series forecasting. First, the EEMD is employed to decompose the daily LST data series into several relatively stable Intrinsic Mode Functions (IMFs) and one residue item. Then, the PACF algorithm is used to determine the number of inputs for the LSTM models. Next, the decomposed results (IMFs and the residue item) are modeled and forecasted using different LSTM models, and the final predicted results are obtained by aggregating all the forecasted results of the LSTM models. Finally, six statistical evaluation metrics (i.e., MSE, MAE, MAPE, RMSE, CC and NSCE) are used to measure the performance of the hybrid EEMD-LSTM against the hybrid EMD-RNN, EMD-LSTM and EEMD-RNN models and the single RNN and LSTM models. To test this hybrid data-driven model, the daily LST data series from the Mapoling and Zhijiang stations in the Dongting Lake basin, central south China, from 1 January 2014 to 31 December 2016 are used as a case study.
The remainder of this paper is organized as follows: Section 2 describes the EMD, EEMD, LSTM and the proposed hybrid EEMD-LSTM model in detail. Section 3 provides a detailed case study. Section 4 presents the conclusions of this paper.

Empirical Mode Decomposition (EMD)
Empirical Mode Decomposition (EMD) is a self-adaptive decomposition method developed for nonstationary and nonlinear signal processing [43]. Unlike Singular Spectrum Analysis (SSA), the Fourier Transform (FT) and the Wavelet Transform (WT), EMD does not require predetermined basis functions and can decompose the original signal, in a self-adaptive way, into a finite number of oscillatory components at different time scales, called IMFs, and a residual component [47]. Each IMF carries information on a different scale of the original signal data series and must meet the following two rules: (1) over the whole signal data series, the number of extrema must be equal to the number of zero crossings or differ from it by one at most; (2) at any point, the mean value of the envelopes defined by the local maxima and the local minima must be zero.
Given an original signal data series x(t) (t = 1, 2, ..., n), the procedure of EMD can be described as follows:
1. Identify all the local maxima and minima of the original signal data series x(t).
2. Use cubic spline interpolation to construct the upper envelope e_up(t) and the lower envelope e_low(t) of the original signal data series.
3. Calculate the mean value m(t) of the upper and lower envelopes:
m(t) = (e_up(t) + e_low(t)) / 2 (1)
4. Calculate the difference d(t) between the original signal series x(t) and the mean value m(t):
d(t) = x(t) − m(t) (2)
5. Check d(t): (a) if d(t) meets the two IMF rules, d(t) is defined as the ith IMF, denoted c_i(t), and x(t) is replaced by the residue item r(t) = x(t) − d(t); (b) if d(t) does not meet the two rules, d(t) is not an IMF, and x(t) is replaced by d(t).
6. Repeat steps 1 to 5 until the residue item r(t) becomes a monotonic function or has at most one extremum, so that no more IMFs can be extracted. r(t) indicates the tendency of the original signal data series.
Finally, the original signal data series can be reconstructed from all the decomposed IMFs c_i(t) and the residue r(t):
x(t) = Σ_{i=1}^{m} c_i(t) + r(t) (3)
The EMD method decomposes the original signal data series step-by-step, from high frequency to low frequency, into a set of IMFs and a trend item in a self-adaptive, direct, complete, effective and approximately orthogonal way, without changing the information and physical characteristics of the original signal data series. An original signal data series of length N can be decomposed into at most log2(N) IMFs.
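The sifting procedure above can be sketched in a few lines of code. The following is a minimal illustration of a single sifting iteration (steps 1-4) on a toy two-tone signal, using NumPy and SciPy's cubic spline; a full EMD implementation would also need endpoint handling and the stopping criterion of steps 5-6:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x):
    """One EMD sifting iteration: subtract the envelope mean from the signal.

    Minimal sketch only -- real EMD iterates this until d(t) satisfies
    the two IMF rules, and handles the series endpoints carefully.
    """
    t = np.arange(len(x))
    maxima = argrelextrema(x, np.greater)[0]   # step 1: local maxima
    minima = argrelextrema(x, np.less)[0]      # step 1: local minima
    e_up = CubicSpline(maxima, x[maxima])(t)   # step 2: upper envelope
    e_low = CubicSpline(minima, x[minima])(t)  # step 2: lower envelope
    m = (e_up + e_low) / 2.0                   # step 3: envelope mean m(t)
    return x - m                               # step 4: candidate IMF d(t)

# toy signal: a fast oscillation riding on a slow one
t = np.linspace(0, 1, 500)
x = np.sin(2 * np.pi * 20 * t) + np.sin(2 * np.pi * 3 * t)
d = sift_once(x)  # first candidate IMF, dominated by the fast component
```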

Ensemble EMD (EEMD)
Although the EMD method has many apparent advantages in processing nonstationary and nonlinear signal data, it also has some unavoidable defects [42]. The main problems are (1) endpoint effects and (2) mode-mixing. Endpoint effects means that different ways of handling the series endpoints in the EMD decomposition process produce different results; because the whole process depends on extrema points, whether the endpoint is an extremum matters, and when the data series is relatively short the problem becomes even more pronounced. Mode-mixing refers to the fact that the same IMF contains components of different frequencies, or that components of the same or similar scale are distributed across different IMFs. Mode-mixing not only mixes vibration modes of various scales but can even deprive an individual IMF of its physical meaning. To solve these problems of the EMD algorithm, a Noise-Assisted Data Analysis (NADA) method was developed, namely Ensemble EMD (EEMD) [42]. The main procedure of the EEMD method is as follows:
1. Add white noise w_i(t) to the original signal data series x(t) to form a new data series:
x_i(t) = x(t) + w_i(t) (4)
2. Decompose the new data series into IMFs using the EMD algorithm.
3. Repeat steps 1 and 2, adding a different realization of white noise to the original signal data series each time.
4. Take the mean of the corresponding IMFs across the ensemble of decompositions as the final result.
For the EEMD method, the first important step is to determine the ensemble size and the amplitude of the added noise. If the amplitude of the added white noise is too small, it will probably not play a significant role in the EMD decomposition; if it is too large, it will cause more interference and affect the final decomposition results. How to select the best ensemble size and noise amplitude is still an open question. Wu and Huang [42] suggest setting the amplitude of the added noise to 0.2 after comparing results of actual signal analyses. The effect of the added white noise should obey the following statistical rule:
ε_n = ε / √N (5)
where N is the number of ensemble members, ε represents the amplitude of the added noise and ε_n is the final standard deviation of error, which is the difference between the original signal data series and the corresponding IMFs.
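As a quick numeric check of the statistical rule above (ε_n = ε/√N): with the noise amplitude ε = 0.2 suggested by Wu and Huang and an ensemble of N = 1000 members (the settings adopted in the case study later in this paper), the residual error left in the ensemble mean is small. A minimal sketch:

```python
import math

def residual_noise_std(epsilon, n_ensemble):
    """Rule eps_n = eps / sqrt(N): the standard deviation of the error
    left in the ensemble mean decays with the square root of the
    ensemble size."""
    return epsilon / math.sqrt(n_ensemble)

# noise amplitude 0.2 and 1000 ensemble members
eps_n = residual_noise_std(0.2, 1000)  # about 0.0063
```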

Long Short-Term Memory (LSTM) Neural Network
Recurrent Neural Networks (RNNs) are improved multilayer perceptron networks and differ somewhat from traditional ANNs [48]. They have internal connections that pass the processed signals at the current moment to the next moment. In an RNN model, each neural network unit is connected to the hidden layers at other time steps, passing previous information to the current moment and combining it with the input to form the output. Through loops in the hidden layer, information can thus be passed from one step to the next in the network (Figure 1). Because of these advantages, RNNs have achieved many remarkable successes in the past few years, for example in speech recognition, language modeling, translation, image captioning and time series prediction [49][50][51].

Obviously, RNNs are able to process complex long-term dependency problems in a simple way. However, RNNs tend to be severely affected by the vanishing gradient problem, which can grow without bound and eventually lead to network collapse [52]. Thus, simple RNNs may not be ideal for predicting long-term dependencies. To avoid this problem, Hochreiter and Schmidhuber [53] proposed a special type of RNN, namely the Long Short-Term Memory (LSTM) recurrent neural network, which was later refined and popularized by many scholars. The architecture of the LSTM is shown in Figure 2. As can be seen from Figure 2, the major advantage of the LSTM is that it replaces the traditional neuron unit in the hidden layer of RNNs with a memory block, which has one or more memory cells and three adaptive multiplicative gates, known as the input gate, forget gate and output gate, controlling the information flow through the cell and the neural network. These features allow the LSTM to effectively alleviate the vanishing gradient problem and make it suitable for processing complex problems with long-term dependencies.
Figure 2 shows how the LSTM neural network works. The first step in the LSTM is to determine whether information from the cell state is forgotten or remembered. This determination is made by a sigmoid layer called the forget gate layer, whose output lies between 0 (completely expunged) and 1 (completely retained):
f_t = σ(W_f · [h_{t−1}, x_t] + b_f) (6)
The second step is to determine what new information needs to be stored in the cell state. This step consists of two parts. First, a sigmoid layer called the "input gate layer" determines which values are to be updated, and then a tanh layer generates a new candidate value C̃_t, which could be added to the cell state. Finally, these two are combined to create an update to the state:
i_t = σ(W_i · [h_{t−1}, x_t] + b_i) (7)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C) (8)
The third step is to update the old cell state C_{t−1}.
First, we multiply the old cell state C_{t−1} by f_t to remove the information that is no longer needed, and then add i_t * C̃_t, the new candidate value scaled by how much we decide to update each state value:
C_t = f_t * C_{t−1} + i_t * C̃_t (9)
The final step is to determine the output of the model. First, a sigmoid layer determines what parts of the cell state are to be output, and then the cell state is passed through a tanh function and multiplied by the output of the sigmoid gate:
o_t = σ(W_o · [h_{t−1}, x_t] + b_o) (10)
h_t = o_t * tanh(C_t) (11)
where, in Equations (6)-(11), x_t is the input at time t; h_{t−1} and h_t are the outputs of the hidden layer at times t − 1 and t, respectively; C_{t−1} and C_t are the cell states at times t − 1 and t, respectively; C̃_t is the cell input (candidate) state at time t; f_t, i_t and o_t are the outputs of the forget gate, input gate and output gate at time t, respectively; W_f, W_i, W_o and W_C are the weights connecting h_{t−1} and x_t to the forget gate, input gate, output gate and cell input, respectively; and b_f, b_i, b_o and b_C are their corresponding bias terms. σ denotes the sigmoid function 1/(1 + exp(−x)) and tanh the hyperbolic tangent function.
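The gate computations of Equations (6)-(11) can be made concrete with a short NumPy sketch of a single LSTM step; the weights here are random placeholders, not a trained model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step implementing Equations (6)-(11).

    W and b hold the four weight matrices and bias vectors (forget,
    input, cell, output), each acting on the concatenation [h_{t-1}, x_t].
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate, Eq. (6)
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate, Eq. (7)
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate state, Eq. (8)
    c_t = f_t * c_prev + i_t * c_tilde       # cell state update, Eq. (9)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate, Eq. (10)
    h_t = o_t * np.tanh(c_t)                 # hidden output, Eq. (11)
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 5
W = {k: 0.1 * rng.normal(size=(n_hidden, n_hidden + n_in)) for k in "fico"}
b = {k: np.zeros(n_hidden) for k in "fico"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hidden), np.zeros(n_hidden), W, b)
```

Because o_t lies in (0, 1) and tanh in (−1, 1), every component of the hidden output h is strictly inside (−1, 1), which is part of what keeps the gradient flow stable.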


The Novel Hybrid EEMD-LSTM Data-Driven Model
Meteorological data series often contain components at different frequencies and can be nonstationary and nonlinear. Therefore, it is difficult to model and forecast them accurately with a single simple model. Thus, a hybrid model based on the EEMD method and LSTM neural networks, namely EEMD-LSTM, is proposed to improve the prediction accuracy of long-term-dependency forecasting of the daily LST. The EEMD method is first used to decompose the daily LST data series into several relatively stable IMFs and a residue item, reducing the difficulty of modeling. Then, each decomposed component is forecasted using an LSTM neural network. Finally, the forecasting results of all the decompositions are accumulated as the final predicted results. The workflow of the proposed hybrid EEMD-LSTM model is shown in Figure 3. The main procedures of the EEMD-LSTM are as follows.
1. Daily LST data series decomposing. The original daily LST data series is decomposed into many IMFs and a residue item using the EEMD method.
2. Number of inputs determining. The PACF algorithm is used to gain the number of inputs of all the LSTM models.
3. IMFs and residue item modeling and forecasting. All the decomposition results are divided into two parts: the training data set and the testing data set. The training data set is used for LSTM modeling, and the testing data set is input into the trained LSTM models to predict all the IMFs and the residue item, yielding predicted results for each component.
4. Final predicted results reconstructing. All the predicted results are accumulated as the final predicted results of the daily LST.
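The four steps above can be sketched end-to-end. The sketch below substitutes simple least-squares autoregressive models for the LSTMs and hand-made components for the EEMD output (both are stand-ins, purely to show the decompose-predict-aggregate flow of steps 1-4):

```python
import numpy as np

def fit_ar(series, n_lags):
    """Least-squares AR model -- a stand-in for the per-component LSTMs."""
    X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
    y = series[n_lags:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def forecast_next(series, coef):
    """One-step-ahead forecast from the last n_lags values."""
    return series[-len(coef):] @ coef

# stand-in "decomposition": in the real pipeline these come from EEMD (step 1)
t = np.arange(400)
components = [np.sin(2 * np.pi * t / 7), np.sin(2 * np.pi * t / 90), 0.01 * t]
lags = [6, 6, 1]  # per-component input counts (step 2, chosen by PACF in the paper)

# step 3: model each component on its own history and forecast one step ahead;
# step 4: sum the component forecasts to reconstruct the final prediction
history = [c[:-1] for c in components]
prediction = sum(forecast_next(h, fit_ar(h, k)) for h, k in zip(history, lags))
truth = sum(c[-1] for c in components)  # value the pipeline is approximating
```

Because the components are individually much simpler than their sum, each per-component model has an easy job, which is exactly the rationale behind decomposing the LST series before forecasting.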

Study Area
The Dongting Lake basin is situated in the middle and lower reaches of the Yangtze River basin in central south China, approximately between longitudes 107°16′ E and 114°15′ E and latitudes 24°38′ N and 30°24′ N (Figure 4) [54]. As can be clearly seen from Figure 4b, the Dongting Lake basin contains four main rivers, the Xiangjiang, Zishui, Yuanshui and Lishui rivers, which flow through the six provinces of Guangdong, Guangxi, Guizhou, Jiangxi, Hubei and Hunan and discharge into the Yangtze River through the Chenglingji outlet [55]. The Dongting Lake basin has a total drainage area of 26.3 × 10⁴ km², accounting for 14.6% of the total drainage area of the Yangtze River basin [31]. As Figure 4c shows, the topography of the basin is dominated by mountains and hills, varying from mountainous and hilly areas in the south, west, southwest and east to alluvial plains in the central, northern and northeastern parts. The basin lies in a subtropical monsoon climate zone, with high temperatures and abundant rainfall in summer and low temperatures and less rain in winter. Annual precipitation ranges from approximately 1300 mm to 1800 mm, and the annual average temperature ranges from 16 °C to 18 °C [31].

Data Collection
In this study, daily LST data from the Mapoling and Zhijiang stations were obtained from the China Meteorological Data Sharing Service System (http://data.cma.cn) for 1 January 2014 to 31 December 2016. All the daily LST data are the daily averages of four measuring times (2:00, 8:00, 14:00 and 20:00) and have undergone a series of quality-control checks by the China Meteorological Administration (CMA), including extreme-value checks and internal consistency checks. The accuracy rate of the daily LST data was generally more than 99%. The obtained daily LST data series are used to construct the hybrid EEMD-LSTM model and evaluate its performance. The Mapoling station is located on the lower reaches of the Xiangjiang river, in Changsha city, near Dongting Lake, while the Zhijiang station is located in the mountainous upper reaches of the Yuanshui river, in Zhijiang county. A total of 1096 daily LST observation sample points are included in this study. The daily LST data series, shown in Figure 5, displays clear fluctuation characteristics. The whole data set is separated into a training data set and a testing data set: the training data set, covering 1 January 2014 to 30 June 2016, is used for constructing the models, while the testing data set, covering 1 July to 31 December 2016, is used for assessing the prediction performance of the models.
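The split sizes implied by these dates can be checked directly (a small sketch; note that the 2016 leap day is included in the counts):

```python
from datetime import date

# daily samples from 1 January 2014 to 31 December 2016, inclusive
n_total = (date(2016, 12, 31) - date(2014, 1, 1)).days + 1  # 1096 points
# training period ends 30 June 2016; the remainder is the testing period
n_train = (date(2016, 6, 30) - date(2014, 1, 1)).days + 1   # 912 points
n_test = n_total - n_train                                  # 184 points
```

A series stored oldest-first is then split chronologically as `train = series[:n_train]` and `test = series[n_train:]`.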

Model Performance Evaluation

The MSE measures the average squared difference between the predicted and original data (Equation (12)). The MAE measures the average absolute difference between the predicted and original data (Equation (13)). The MAPE assesses the percentage deviation between the predicted and original data (Equation (14)). The RMSE, one of the most widely applied metrics, measures the difference between the values predicted by a model and those actually observed (Equation (15)); the smaller the RMSE value, the closer the predicted data are to the original data. The CC is a frequently used indicator of how well the predicted data correspond to the original data (Equation (16)); a CC of 0 indicates no or weak linear correlation, while a CC closer to −1 or 1 indicates negative or positive linear correlation, respectively. The NSCE, proposed by Nash and Sutcliffe (1970), is one of the most powerful and popular indicators for assessing hydro-climate models (Equation (17)). The NSCE ranges from negative infinity to 1: an NSCE of 1 corresponds to a perfect match between the model's predictions and the original data, an NSCE of 0 indicates the model predictions are only as accurate as the mean of the original data, and an NSCE less than 0 indicates the model is not trustworthy. Essentially, the closer the NSCE is to 1, the more accurate the model.
MSE = (1/n) Σ_{i=1}^{n} (T_o,i − T_p,i)² (12)
MAE = (1/n) Σ_{i=1}^{n} |T_o,i − T_p,i| (13)
MAPE = (100%/n) Σ_{i=1}^{n} |(T_o,i − T_p,i)/T_o,i| (14)
RMSE = √[(1/n) Σ_{i=1}^{n} (T_o,i − T_p,i)²] (15)
CC = Σ_{i=1}^{n} (T_o,i − T̄_o)(T_p,i − T̄_p) / √[Σ_{i=1}^{n} (T_o,i − T̄_o)² · Σ_{i=1}^{n} (T_p,i − T̄_p)²] (16)
NSCE = 1 − Σ_{i=1}^{n} (T_o,i − T_p,i)² / Σ_{i=1}^{n} (T_o,i − T̄_o)² (17)
where, in Equations (12)-(17), T_o,i and T_p,i are the original and predicted daily LST values at time i, respectively; T̄_o and T̄_p are the mean values of the original and predicted daily LST data series, respectively; and n represents the number of data sample points.
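The six metrics are straightforward to compute; a minimal NumPy sketch (note that the MAPE is undefined when the original series contains zeros):

```python
import numpy as np

def evaluation_metrics(t_o, t_p):
    """MSE, MAE, MAPE, RMSE, CC and NSCE for original (t_o) and
    predicted (t_p) series."""
    t_o, t_p = np.asarray(t_o, float), np.asarray(t_p, float)
    err = t_o - t_p
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err / t_o))  # undefined if t_o has zeros
    rmse = np.sqrt(mse)
    do, dp = t_o - t_o.mean(), t_p - t_p.mean()
    cc = np.sum(do * dp) / np.sqrt(np.sum(do ** 2) * np.sum(dp ** 2))
    nsce = 1.0 - np.sum(err ** 2) / np.sum(do ** 2)
    return {"MSE": mse, "MAE": mae, "MAPE": mape, "RMSE": rmse, "CC": cc, "NSCE": nsce}

# toy temperatures: every prediction is off by 0.5 degrees
m = evaluation_metrics([20.0, 22.0, 25.0, 24.0], [20.5, 21.5, 25.5, 23.5])
```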

Daily LST Data Series Decomposition by EEMD
EEMD is an excellent and powerful method for nonstationary and nonlinear signal analysis, decomposing the original data series into several relatively stable IMFs and one residue item. In the current study, the ensemble number is set to 1000 and the amplitude of the added noise is set to 0.2 times the standard deviation of the corresponding data when decomposing the daily LST data series of the two stations. Nine independent IMFs and one residue item are obtained for each station (Figures 6 and A1). As can be seen from Figures 6 and A1, the IMFs present oscillation characteristics ordered from high frequency to low frequency at various time scales, and the last item is the overall trend of the original daily LST data series. Tables 1 and A1 give the statistics of the original daily LST data series and the decomposition results of the two stations. The variances and standard deviations of the original daily LST data series from the Mapoling and Zhijiang stations are 67.8232 and 8.2355, and 63.8016 and 7.9876, respectively. In contrast, the variances and standard deviations of all the decomposition results (each IMF and the residue item) are much smaller than those of the original series, indicating that the decomposition results are less volatile and closer to their mean values. The skewness values of the original series and the decomposition results are close to zero, indicating approximately symmetric distributions, while most of the kurtosis values are small, indicating few extreme values. Therefore, the EEMD is a powerful method for decomposing a nonstationary and nonlinear daily LST data series into relatively stable IMFs to improve prediction accuracy.

Forecasting IMFs
To improve the prediction accuracy, a four-layer LSTM is built to predict the daily LST data series and the decompositions (IMF1 to IMF9 and one residue item) in this study. However, how to determine the appropriate number of inputs remains a key issue. In general, the number of inputs is identified empirically. In this study, the Partial Autocorrelation Function (PACF) is used to analyze the original data and the decomposition results [56], because the PACF can effectively identify the correlation between the current value and previous values. A PACF value beyond the 95% confidence level indicates a strong correlation; otherwise, the correlation is weak. Therefore, the number of lags beyond the 95% confidence level is taken as the number of inputs. The PACF graphs of the original daily LST data series and their decomposition results for the Mapoling and Zhijiang stations are shown in Figures 7 and A2, respectively. The numbers of inputs of the LSTM models for the original daily LST data series and the decomposition results of the Mapoling station are 4, 6, 5, 5, 6, 6, 1, 1, 1, 1 and 1, respectively, while those for the Zhijiang station are 4, 7, 8, 5, 6, 7, 1, 1, 7, 1 and 1. Clearly, the number of inputs differs between the LSTM models: since the first several IMFs have high frequencies, the current value is related to many previous values; as the frequency decreases, the IMFs become more and more stable and the current value is related only to the immediately preceding one.
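The PACF-based selection can be sketched as follows. This is an illustrative implementation only (Durbin-Levinson recursion for the PACF, ±1.96/√n as the 95% confidence band, and the largest significant lag taken as the number of inputs, which is one reading of the rule above); libraries such as statsmodels provide equivalent routines:

```python
import numpy as np

def pacf(series, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    x = np.asarray(series, float)
    x = x - x.mean()
    n = len(x)
    acf = np.array([np.dot(x[:n - k], x[k:]) / np.dot(x, x) for k in range(max_lag + 1)])
    phi = np.zeros((max_lag + 1, max_lag + 1))
    pac = np.zeros(max_lag + 1)
    pac[0] = 1.0
    phi[1, 1] = pac[1] = acf[1]
    for k in range(2, max_lag + 1):
        num = acf[k] - phi[k - 1, 1:k] @ acf[k - 1:0:-1]
        den = 1.0 - phi[k - 1, 1:k] @ acf[1:k]
        phi[k, k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
        pac[k] = phi[k, k]
    return pac

def n_inputs_from_pacf(series, max_lag=20):
    """Largest lag whose PACF exceeds the 95% confidence band."""
    pac = pacf(series, max_lag)
    band = 1.96 / np.sqrt(len(series))
    significant = np.nonzero(np.abs(pac[1:]) > band)[0] + 1
    return int(significant.max()) if significant.size else 1

# toy AR(2) series: the PACF should be significant at lags 1 and 2
rng = np.random.default_rng(1)
x = np.zeros(2000)
for t in range(2, 2000):
    x[t] = 0.6 * x[t - 1] + 0.3 * x[t - 2] + rng.normal()
k = n_inputs_from_pacf(x, max_lag=10)
```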
question of how to determine the appropriate number of inputs is still a key issue. In general, a common method of identifying the number of inputs is empirical. In this study, the Partial Autocorrelation Function (PACF) is used to analyze the original data and the decomposition results [56]. This is because the PACF can effectively identify the correlation between the current value and the previous values. A PACF value beyond the 95% confidence level indicates a strong correlation degree; otherwise there is a weak correlation degree. Therefore, the number of lags beyond the 95% confidence level is considered as the number of inputs. The PACF graphs of the original daily LST data series and their decomposition results of Mapoling and Zhijiang stations is shown in Figure 7 and Figure A2, respectively. Evidently the number of inputs of LSTM models for the original daily LST data series and their decomposition results of the Mapoling station are shown as 4, 6, 5, 5, 6, 6, 1, 1, 1, 1 and 1, respectively. While 4, 7, 8, 5, 6, 7, 1, 1, 7, 1, 1 are shown for Zhijiang station. Obviously, the number of inputs of each LSTM model is different. Since, the first several IMFs have high frequencies, the current value is related to many previous values. As the frequency decreases, the IMFs become more and more stable, and the current value is related only to its former one. After the determination of the number of inputs of LSTM models, one-step-ahead is used to predict the results. That is, several previous data sample points are used to predict the current data point. The LSTM model consists of one input layer with several inputs which is determined by PACF before, for example, up to several previous (x t−1 , x t−2 , . . . 
, x t−n ) sample points of the original daily LST data series and the decomposition results are set as the model inputs; two hidden layers including 32 neurons each; and one output layer having one output, for example, x t is the current value of predicted results. Next, the LSTM model is implemented with TensorFlow which is an opensource and widely used neural network framework developed by Google [57]. In addition, the epoch for training is set to 4000 and in each training period, the MSE is employed as the loss function for determining the optimum performance results. Furthermore, the predicted results of the decomposition IMFs are obtained. Finally, all the predicted results are aggregated as the final prediction results of the daily LST data.
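The PACF-based input selection described above can be sketched as follows. This is a minimal reading of the procedure rather than the authors' code: `pacf_ols` estimates the PACF at lag k as the k-th coefficient of an OLS-fitted AR(k) model, and the counting rule (leading lags whose PACF exceeds the ±1.96/√N bound) is our assumption about how "number of lags beyond the 95% confidence level" is operationalized.

```python
import numpy as np

def pacf_ols(x, max_lag):
    """Estimate the partial autocorrelation function: the PACF at lag k
    equals the coefficient of the k-th lag in an AR(k) model fitted by
    ordinary least squares."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    vals = []
    for k in range(1, max_lag + 1):
        # Design matrix of lagged values x_{t-1} ... x_{t-k}
        X = np.column_stack(
            [x[k - j - 1:len(x) - j - 1] for j in range(k)]
        )
        y = x[k:]
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        vals.append(beta[k - 1])
    return np.array(vals)

def n_inputs_from_pacf(x, max_lag=30, level=1.96):
    """Count the leading lags whose PACF exceeds the approximate 95%
    confidence bound +/- 1.96/sqrt(N); use that count (at least 1) as
    the LSTM input size."""
    bound = level / np.sqrt(len(x))
    n = 0
    for v in pacf_ols(x, max_lag):
        if abs(v) > bound:
            n += 1
        else:
            break
    return max(n, 1)
```

For a high-frequency IMF many leading lags are significant, so the count is large; for the smooth low-frequency IMFs and the residue, only the first lag is significant, matching the runs of 1s reported above.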
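The network layout described above (an input layer of PACF-selected lags, two hidden layers of 32 neurons, and a single-output layer, trained with the MSE loss) could be sketched in TensorFlow/Keras as below. The layer stacking, the `adam` optimizer, and the windowing helper `make_supervised` are illustrative assumptions; the paper specifies only the layer sizes, the epoch count (4000) and the loss.

```python
import numpy as np
import tensorflow as tf

def build_lstm(n_inputs):
    """Sketch of the four-layer network: input of n_inputs lagged values,
    two LSTM hidden layers of 32 units each, and one dense output."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_inputs, 1)),
        tf.keras.layers.LSTM(32, return_sequences=True),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")  # MSE loss as in the paper
    return model

def make_supervised(series, n_inputs):
    """Slide a window of n_inputs previous points over the series to
    build (X, y) pairs for one-step-ahead prediction."""
    X = np.array([series[i:i + n_inputs]
                  for i in range(len(series) - n_inputs)])
    y = np.array(series[n_inputs:])
    return X[..., None], y  # add a feature axis for the LSTM
```

Each IMF (and the residue) gets its own model built with its own PACF-derived `n_inputs`; after training for 4000 epochs, the per-component predictions are summed to give the final daily LST forecast.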

Performance Comparison Analysis
To understand the performance of the hybrid EEMD-LSTM model, its predicted results are compared with those of five other models: RNN, LSTM, EMD-RNN, EMD-LSTM and EEMD-RNN. The predicted results of the six models are illustrated in Figure 8. Obviously, the six models give different forecasts of the daily LST data series of the Mapoling and Zhijiang stations, but the hybrid EMD-RNN, EMD-LSTM, EEMD-RNN and EEMD-LSTM models perform better than the single RNN and LSTM models for the two stations. Furthermore, the hybrid EEMD-LSTM model has a more powerful forecasting capacity, particularly when there are sudden changes in the data series. The reason is that the original daily LST data series are nonstationary and nonlinear, with many sudden changes, which the single RNN and LSTM models can hardly capture. The EMD decomposition results, meanwhile, exhibit the drawbacks of edge effects and mode mixing, whereas EEMD overcomes these drawbacks. Therefore, the hybrid EEMD-LSTM model achieves the highest accuracy for one-step-ahead forecasting compared with the other models. The scatterplots of the predicted results of the six models versus the original daily LST data series are shown in Figure 9. In general, the fitted lines (red lines) of the predicted results of the six models are close to the 1:1 line (dotted black line), which indicates that all six models achieve high prediction accuracy. Evidently, the RNN has the worst prediction results for the daily LST, while the LSTM obtains slightly better results than the RNN. However, the hybrid models (i.e., the EMD-RNN, EMD-LSTM, EEMD-RNN and EEMD-LSTM models) perform better than the single RNN and LSTM models. The EEMD-LSTM model outperforms the other hybrid models with the highest coefficient of determination (R²) value for the two sites. To demonstrate the prediction capability of the EEMD-LSTM model, residual analysis is applied in this study.
We calculate the residuals and normalized residuals of the two stations for the original data vs. the EEMD-LSTM predictions (Figures 10 and A3). Evidently, most of the residuals lie between −1 and 1, and most of the normalized residuals lie within the 95% confidence level, although a few fall outside these bounds. The potential reason for this is the sudden changes in the original daily LST data series. Moreover, the prediction results close to the training data set have smaller residuals and normalized residuals, while those far from the training data set have larger residuals and normalized residuals. In order to obtain high prediction accuracy, therefore, we suggest that the time span of daily LST prediction should not exceed three months; otherwise, it is recommended to retrain the EEMD-LSTM model. Furthermore, compared with Figure 8, the daily LST data series are more stationary, the residuals are smaller, and the prediction results are more accurate and trustworthy.
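The residual analysis above can be sketched as follows. Standardizing the residuals by their sample mean and standard deviation, and flagging points beyond ±1.96 as outside the 95% confidence band, is our assumption, since the paper does not give its exact normalization formula.

```python
import numpy as np

def residual_check(obs, pred, level=1.96):
    """Residuals, standardized residuals, and indices of points whose
    standardized residual falls outside the approximate 95% confidence
    band (+/- 1.96)."""
    res = np.asarray(obs, float) - np.asarray(pred, float)
    norm = (res - res.mean()) / res.std(ddof=1)  # standardize residuals
    outside = np.flatnonzero(np.abs(norm) > level)
    return res, norm, outside
```

A sudden change in the observed series shows up as a large residual at that point, which this check flags even when the rest of the forecast tracks the data closely.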
To further assess the prediction performance of the hybrid EEMD-LSTM model, six statistical evaluation metrics (i.e., MSE, MAE, MAPE, RMSE, CC and NSCE) are utilized. The statistical evaluation results of the performance comparison of the six models for the daily LST data series are shown in Figure 11. According to the comparison of the RNN, LSTM, EMD-RNN, EMD-LSTM, EEMD-RNN and EEMD-LSTM models for the Mapoling and Zhijiang stations, all six models clearly show high prediction accuracy, with CC values greater than 0.97. Meanwhile, the CC values of the six models are significant at the 0.01 significance level. This means that the prediction results of the six models significantly correlate with the original daily LST data series and have the potential to predict the daily LST. Among the six models, it is evident that the RNN model has the worst performance results, and the LSTM model performs only slightly better than the RNN model. The reason for the poorer performance of the RNN and LSTM models is the nonstationary and nonlinear nature of the original daily LST data series. However, the hybrid EEMD-LSTM model outperforms the other models with the smallest MSE, MAE, MAPE and RMSE, as well as the largest CC and NSCE, for daily LST forecasting. Furthermore, the NSCE values of the six models are close to 1, indicating that the predicted results of all six models closely match the original daily LST data series and that the six models are trustworthy. The hybrid EEMD-LSTM model has the largest NSCE value, which indicates that it is superior to the RNN, LSTM, EMD-RNN, EMD-LSTM and EEMD-RNN models and is the most suitable model for daily LST forecasting.
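The six evaluation metrics can be computed directly; the paper does not restate their equations here, so the conventional definitions are used below (MAPE assumes nonzero observations).

```python
import numpy as np

def evaluate(obs, pred):
    """Compute the six metrics used in the comparison with their
    standard formulas: MSE, MAE, MAPE (%), RMSE, Pearson CC, and the
    Nash-Sutcliffe Coefficient of Efficiency (NSCE)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    err = pred - obs
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / obs)) * 100.0    # requires obs != 0
    rmse = np.sqrt(mse)
    cc = np.corrcoef(obs, pred)[0, 1]            # Pearson correlation
    nsce = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    return {"MSE": mse, "MAE": mae, "MAPE": mape,
            "RMSE": rmse, "CC": cc, "NSCE": nsce}
```

NSCE equals 1 for a perfect forecast and drops toward (and below) 0 as the forecast degrades to no better than the observed mean, which is why values near 1 across all six models indicate trustworthy predictions.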
Based on the comparison of the aforementioned six models, we can conclude that using the EEMD method to decompose the original daily LST data series into many relatively stable IMFs and one residue item as the inputs for the LSTM models can, to a large extent, improve the prediction accuracy. Thus, the proposed EEMD-LSTM model outperforms the RNN, LSTM, EMD-RNN, EMD-LSTM and EEMD-RNN models and achieves better prediction results, with a significant improvement on the basis of the six statistical evaluation metrics for daily LST forecasting.

Conclusions
In this study, we proposed a hybrid data-driven model based on EEMD and four-layer LSTM models to predict the daily LST data series. The daily LST data series from the Mapoling station, located on the lower reaches of the Xiangjiang river, and the Zhijiang station, located on the upper reaches of the Yuanjiang river, in the Dongting Lake basin, central south China, from 1 January 2014 to 31 December 2016, are used as a case study. The main conclusions of this study are as follows: (1) The original daily LST data series are decomposed into nine relatively stable IMFs and one residue item using the EEMD method to reduce the difficulty of modeling and to improve the prediction accuracy. Then, all the decomposition results are divided into a training data set and a testing data set. Next, the PACF algorithm is employed to choose the best number of inputs. After the best number of inputs is determined, the training data set is used to construct the LSTM models, and the testing data set is used for predictions and performance comparisons. Finally, the predicted results of the decompositions are obtained and aggregated as the final prediction of the daily LST data. (2) Six statistical evaluation metrics (MSE, MAE, MAPE, RMSE, CC and NSCE) are adopted to assess the performance of the RNN, LSTM, EMD-RNN, EMD-LSTM, EEMD-RNN and EEMD-LSTM models. The performance comparison in this study shows that all six models have high prediction accuracy, but the hybrid EEMD-LSTM model performs better than the RNN, LSTM, EMD-RNN, EMD-LSTM and EEMD-RNN models. Although the hybrid EEMD-LSTM model obtained excellent prediction results for daily LST data series forecasting, it needs additional future studies in other regions of mainland China. In brief, developing a hybrid data-driven forecasting model using the LSTM coupled with the EEMD algorithm may significantly improve the prediction accuracy.
Author Contributions: X.Z., Q.Z. and G.Z. conceived and designed the study; X.Z., Z.N. and Z.G. collected and preprocessed the data; X.Z. and H.Q. implemented the models and performed the experiments; X.Z., Q.Z. and G.Z. analyzed the results; X.Z. wrote the manuscript; Q.Z. and G.Z. reviewed the draft manuscript.