Combined Forecasting of Streamflow Based on Cross Entropy

In this study, we developed a combined streamflow forecasting model based on cross entropy to address the complexity of streamflow and the randomness of hydrological processes. First, we analyzed the streamflow data obtained from Wudaogou station on the Huifa River, which is the largest tributary of the Second Songhua River, and found that the streamflow was characterized by fluctuations and periodicity, and that it was closely related to rainfall. The proposed method involves selecting similar years based on the grey correlation degree. The forecasting results obtained by the time series model (autoregressive integrated moving average), improved grey forecasting model, and artificial neural network model (a radial basis function) were used as the single forecasting models, and from the viewpoint of the probability density, the method for determining weights was improved by using the cross entropy model. The numerical results showed that, compared with the single forecasting models, the combined forecasting model improved the stability of the forecasts, and its prediction accuracy was better than that of conventional combined forecasting models.


Introduction
Streamflow prediction is essential for the rational development and utilization of water resources. However, due to the effects of rainfall, the geographical environment, and human activities, streamflow exhibits high volatility and randomness, and thus its prediction is very complex and difficult. Previous studies of streamflow prediction have employed approaches such as time series methods [1,2], grey modeling [3], regression analysis [4], wavelet analysis [5], Markov chain modeling [6], and neural networks [7][8][9]. Single methods are simple, but abnormal data or fluctuations over time may cause significant errors, so their reliability is not high. In recent years, prediction methods have been improved in many studies. In [10], multivariate ensemble streamflow prediction was employed, where a variety of meteorological factors were used as input variables for training to obtain the prediction results. This method considers the diversity of historical data. In [11], functional linear models were introduced and applied to hydrological forecasting to forecast the whole flow curve instead of points (daily or hourly). This method is an improvement of the traditional regression model, and it is more concerned with the stability of the overall prediction process. In [12], Sobol's variance-based sensitivity analysis was used to rank the model parameters based on their influence on the model results, thereby determining the optimal hydrological model parameters, from which the predictions were then obtained. Thus, the improved methods mainly involve data mining, algorithm improvement, and parameter optimization. However, forecasting methods based on historical data are highly dependent on the samples, and when the data are unstable, the forecasting method cannot be guaranteed to perform accurately at all points.
Entropy 2016, 18, 336
In 1969, Bates and Granger proposed a combined forecasting method based on weights [13], which can combine different methods and data features to improve the accuracy of forecasting and reduce the risk of the model failing. At present, there are two main types of combined forecasting methods. In the one-step method, two or more models are integrated to obtain an improved forecasting method, such as a combined random and grey integration method, or the linking of fuzzy and neural network analysis to produce a fuzzy neural network [14], although this is not pure combined forecasting. The other type comprises parallel methods, where the forecasting results obtained by multiple single methods are weighted according to certain rules. In this study, we consider the second type of method. In [15,16], the weights were determined by minimizing the sum of the squares of the forecast errors, but the time characteristics of the data were not considered adequately. In [17], based on the selection of a similar day, a neural network method was used for dynamic weighting, but the neural network algorithm was not adequate and it was readily trapped by a local optimal solution, while the number of calculations was high.
The concept of entropy, which was proposed by the German physicist Clausius in 1865, is a function of the state of the system, where a reference value and the variation in entropy are often analyzed and compared. Cross entropy (CE) is a type of entropy that reflects the similarity between variables from the perspective of probability. In recent years, some studies have applied CE to the field of hydrology. Entropy theory was introduced in [18] and applied to environmental and water engineering, including water resource evaluation, variable correlation, and reliability evaluation. A new entropy-based approach was also developed for deriving a 2D velocity distribution in an open-channel flow to investigate a rectangular geometric domain and a reliable estimation [19,20]. In addition, CE was introduced into combined forecasting [21,22], where a new method was proposed for determining weights to improve the stability of the prediction results. However, the probability density function used in [21] is not suitable for predicting streamflow. In addition, a wind power load forecasting method based on the normal distribution was proposed by [22], but the time characteristics of the historical data were not considered and the solution process was complex.
In this study, we use the autoregressive integrated moving average (ARIMA), improved grey forecasting model (GM), and an artificial neural network (a radial basis function, RBF) as the single forecasting models. Based on the time effectiveness of historical data, we develop a combined forecasting method based on CE. The Lagrange function method is used to solve the problem, and the weights are adjusted based on the prediction error of each single method, so the predicted results are more consistent with the actual situation, thereby improving the accuracy and stability of predictions.

Analysis of Streamflow Characteristics
The Huifa River arises in the Longgang Mountains of Qingyuan Manchu Autonomous County, Fushun City, Liaoning Province, Northeast China, and it flows for 33.70 km into Jilin Meihekou City, Huinan County, and Panshi City. The Huifa River is the largest tributary of the Second Songhua River. Its watershed area is 14,896 km² and it is located mainly in the territory of Jilin. According to data obtained at the Wudaogou hydrology station in Huadian City (catchment area of 12,391 km²), the amplitude of the change in its water level is 7.69 m, the average annual streamflow is 2.64 billion m³, the average flow is 83.70 m³/s, the maximum peak discharge is 3010 m³/s (1975), the minimum flow rate is 0.44 m³/s (1979), the mean annual sediment volume is 0.48 kg/s, and the annual transportation of sediment is 121 million metric tons.
Figure 1 shows the changes in the streamflow at Wudaogou hydrological station during 1965-2010. It can be seen that the monthly streamflow usually increases from January and generally reaches a peak in July or August, before exhibiting a declining trend until the end of the year. Apart from the extreme streamflow during the flood season in some individual years (e.g., 938 m³ in 1995 and 1080 m³ in 2010), the trend and the magnitude of the monthly streamflow are relatively consistent from year to year. Therefore, the selection of reasonably similar years can improve the accuracy of streamflow predictions.
Previous studies (e.g., [23]) indicate that rainfall is the main source of streamflow and, thus, the rainfall and flow should have a strong correlation. In order to verify this, we show the annual streamflow and annual rainfall during 1965-2010, as well as the ratio of the two curves in Figure 2. Figure 2 shows that the trends in the annual streamflow and annual rainfall variation are highly similar, except in some individual years (1986, 1995, and 2010). We conducted a regression analysis of the data and used SPSS software (v19.0, IBM Corp., Armonk, NY, USA) to calculate R², which was significant at R² = 0.921, thereby demonstrating that the streamflow and rainfall had a strong correlation. Thus, the historical rainfall data can be used as an input variable [9] to improve the accuracy of forecasting. The forecast rainfall value can also be used for streamflow forecasting in similar years, thereby making the use of historical data more scientific.
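The regression check between annual rainfall and annual streamflow can be illustrated with a short sketch. It assumes a simple least-squares line (the study itself used SPSS), the function name `r_squared` is hypothetical, and the numbers are made-up placeholders rather than Wudaogou data.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Placeholder values for illustration only (not measured data).
rainfall = [620.0, 710.0, 540.0, 830.0, 690.0]
streamflow = [21.0, 26.5, 17.0, 33.0, 25.0]
r2 = r_squared(rainfall, streamflow)
```

A value of R² close to 1 indicates that a linear relationship with rainfall explains most of the variation in streamflow.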

Data Preprocessing
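Data preprocessing involves the processing of missing data and data normalization. In the proposed method, we use linear interpolation to fill in any missing data. If the streamflow value is $L_t$ at time $t$ and $L_{t+a\Delta t}$ at time $t + a\Delta t$, and the streamflow data are missing at the intermediate time $t + b\Delta t$, then the missing value can be expressed as follows:

$$L_{t+b\Delta t} = L_t + \frac{b}{a}\left(L_{t+a\Delta t} - L_t\right) \tag{1}$$

The dimensions of the various types of data are different, so it is necessary to normalize the historical data, and the normalized processing of the streamflow data is conducted as follows:

$$x' = \frac{x}{x_{\max}} \tag{2}$$

where $x'$ is the normalized value, $x$ is the value at a certain time, and $x_{\max}$ is the maximum value in the sample.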
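A minimal sketch of the two preprocessing steps is given below, assuming linear interpolation between the two known neighbours and division by the sample maximum. The function names and the sample values are hypothetical.

```python
def interpolate_missing(l_start, l_end, a, b):
    """Linear interpolation of the value at t + b*dt, given the values
    l_start at t and l_end at t + a*dt, with 0 < b < a."""
    return l_start + (l_end - l_start) * b / a


def normalize_by_max(values):
    """Scale every value by the maximum of the sample."""
    x_max = max(values)
    return [x / x_max for x in values]


# Illustrative values only.
filled = interpolate_missing(10.0, 16.0, a=3, b=1)
normalized = normalize_by_max([2.0, 4.0, 8.0])
```

Dividing by the maximum keeps the normalized streamflow within (0, 1] for positive data, which also makes it straightforward to map predictions back to physical units.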

Selecting Similar Years
Selecting reasonably similar years can improve the accuracy of predictions. Many methods can be employed for this purpose, including evidence theory, clustering analysis, the trend similarity method, and the grey correlation method. We use the grey correlation degree to select similar years.
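The factor considered in this study is the rainfall value, so that the training samples and the forecast years have highly similar characteristics. We define the historical data sequence:

$$X_m = [x_m(1), x_m(2), \ldots, x_m(g)] \tag{3}$$

and the predicted data sequence:

$$X_0 = [x_0(1), x_0(2), \ldots, x_0(g)] \tag{4}$$

where $g$ is the number of factors included. First, we obtain the maximum difference value and the minimum difference value, as follows:

$$\Delta_{\max} = \max_m \max_k \left| x_0(k) - x_m(k) \right| \tag{5}$$

$$\Delta_{\min} = \min_m \min_k \left| x_0(k) - x_m(k) \right| \tag{6}$$

We then calculate the grey correlation degree between $X_0$ and $X_m$:

$$r_m = \frac{1}{g} \sum_{k=1}^{g} \frac{\Delta_{\min} + \rho \Delta_{\max}}{\left| x_0(k) - x_m(k) \right| + \rho \Delta_{\max}} \tag{7}$$

where $\rho$ is the resolution coefficient, which is set to 0.5 in this study. If we set a threshold for $r_m$, then we can obtain the similar years based on the threshold, before using these data for modeling and calculations.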
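The selection of similar years can be sketched as follows. This assumes the standard (Deng) form of the grey correlation degree with resolution coefficient 0.5, which may differ in detail from the formulation used in the study. The function names, the threshold of 0.8, and the normalized rainfall sequences are hypothetical.

```python
def grey_relational_degrees(reference, candidates, rho=0.5):
    """Grey correlation degree of each candidate sequence to the reference."""
    diffs = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    d_max = max(max(row) for row in diffs)
    d_min = min(min(row) for row in diffs)
    if d_max == 0:
        # Every candidate is identical to the reference sequence.
        return [1.0] * len(candidates)
    degrees = []
    for row in diffs:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees


def select_similar_years(reference, history, threshold, rho=0.5):
    """Return the years whose correlation degree reaches the threshold."""
    years = list(history)
    degrees = grey_relational_degrees(reference, [history[y] for y in years], rho)
    return [y for y, r in zip(years, degrees) if r >= threshold]


# Illustrative normalized rainfall sequences (placeholders, not real data).
forecast_year = [0.5, 0.8, 0.3]
history = {
    1990: [0.5, 0.8, 0.3],
    1991: [0.1, 0.2, 0.9],
    1992: [0.45, 0.75, 0.35],
}
similar = select_similar_years(forecast_year, history, threshold=0.8)
```

Because the maximum and minimum differences are taken over all candidate years, adding or removing a candidate can change the degrees of the others, so the threshold is best chosen after the candidate set is fixed.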

Combined Forecasting Model
The combined forecasting model comprises m single forecasting models, and the relative effectiveness of each single forecasting model is determined by the historical data. If the combined forecast value at time t is $y_t$, $\omega_{it}$ is the weight of the ith model at time t, and $\hat{y}_{it}$ is the predicted value of the ith model at time t, then the problem of combined forecasting is described as follows:

$$y_t = \sum_{i=1}^{m} \omega_{it}\, \hat{y}_{it}, \quad \text{s.t.} \quad \sum_{i=1}^{m} \omega_{it} = 1 \tag{8}$$

From Equation (8), we can see that two factors influence the final results of combined forecasting: the single models and the weights of the single forecasting models. In this study, we focus on the latter. There are no uniform rules for selecting a single method; instead, we must consider the actual problem and the needs of the model. The factors considered in this study include independence, diversity, and the accuracy of the algorithm. The single forecasting methods that we use are the ARIMA time series model, GM, and the RBF. Due to limitations on the length of this report, we give no detailed introduction, but readers may refer to previous studies [12,24,25].
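As a small illustration of the weighted combination at a single time t, the sketch below checks that the weights sum to one before combining. The function name and the numbers are hypothetical.

```python
def combine_forecasts(predictions, weights, tol=1e-9):
    """Weighted sum of single-model predictions at one time step.
    The weights must sum to one."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per single model")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    return sum(w * p for w, p in zip(weights, predictions))


# Illustrative normalized predictions from three single models.
y_t = combine_forecasts([0.42, 0.50, 0.46], [0.2, 0.3, 0.5])
```

With weights that sum to one, the combined value always lies between the smallest and largest single prediction when the weights are non-negative, which is one reason the combination tends to damp the worst single-model errors.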
The ARIMA model parameters (p, q, d) are searched starting from the lowest-order ARIMA (1, 1, 1) model, and the minimum Akaike's information criterion is used to find the optimal parameters, where p = 2, q = 3, and d = 2 are used in the prediction model. The GM prediction model is based on the selection of similar years (see Section 3.2). In the RBF prediction model, the input variables comprise the historical streamflow and rainfall data predicted for a five-year period by network training, where the output variable predicts the annual streamflow.

The CE Model
According to the definition of entropy, the CE is defined as a method for calculating the difference in information between two random vectors. The CE model can determine the extent of the mutual support degree by assessing the degree of intersection between different information sources. In addition, the mutual support degree can be used to determine the weights of the information sources, where a greater weight represents higher mutual support [26]. The CE is also called the Kullback-Leibler (K-L) distance. The CE of two probability distributions is expressed as D(f || g).
For the discrete case:

$$D(f \,\|\, g) = \sum_{i} f_i \ln \frac{f_i}{g_i} \tag{9}$$

and for the continuous case:

$$D(f \,\|\, g) = \int f(x) \ln \frac{f(x)}{g(x)} \, dx \tag{10}$$

where f and g denote the probability vector in the discrete case and the probability density function in the continuous case, respectively. The CE model quantifies the "distance" between the amounts of information. However, the K-L distance is not a true metric distance; instead, it measures the difference between two probability distributions. The CE value is smallest (zero) when the two probability density functions are identical. For the combined forecasting model based on CE, the CE model represents the support for combined forecasting. Therefore, the objective is to assign weights to the different individual methods so that the total predictive function is as similar as possible to the true value.
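The K-L distance can be illustrated numerically. The sketch below evaluates the discrete form directly and, for two normal densities, uses the well-known closed form of the continuous K-L distance. The function names and the example distributions are hypothetical.

```python
import math


def kl_discrete(f, g):
    """Discrete K-L distance D(f || g). Requires g_i > 0 wherever f_i > 0;
    terms with f_i == 0 contribute nothing."""
    return sum(fi * math.log(fi / gi) for fi, gi in zip(f, g) if fi > 0)


def kl_normal(mu_f, sigma_f, mu_g, sigma_g):
    """Closed-form K-L distance between N(mu_f, sigma_f^2) and N(mu_g, sigma_g^2)."""
    return (math.log(sigma_g / sigma_f)
            + (sigma_f ** 2 + (mu_f - mu_g) ** 2) / (2 * sigma_g ** 2)
            - 0.5)


p = [0.5, 0.5]
q = [0.9, 0.1]
d_pq = kl_discrete(p, q)
d_qp = kl_discrete(q, p)
d_same = kl_discrete(p, p)
d_shift = kl_normal(0.0, 1.0, 1.0, 1.0)
```

The two directions `d_pq` and `d_qp` differ, which shows why the K-L distance is not a true metric, while `d_same` is zero because the two distributions are identical.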
Using the CE model requires solving two major problems: establishing the probability density function and generating the CE objective function, and solving for the weight coefficients by iteration.
The streamflow is treated as a sequence of discrete random variables in the forecast period. For a certain point in the sequence, the value of the streamflow at a certain prediction time is continuous, so it can be regarded as a continuous random variable. Therefore, streamflow prediction can be treated as a sequence that is discrete in time but continuous in value.
The probability density function for predicting streamflow f(x) can be regarded as the sum of the probability density functions f_i(x) of the single forecasting methods, each multiplied by the corresponding weight. According to the central limit theorem [22], if a variable is influenced by many small independent random factors, then we can treat the variable as following a normal distribution and, thus, the streamflow value at a certain time can be considered as satisfying a normal distribution.
The minimum CE is used to determine the probability distribution of the different forecasting methods, so the combined probability distribution of the streamflow is obtained.
The probability density function for method i is (i = 1, 2, ..., m):

$$f_i(x) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left( -\frac{(x - \mu_i)^2}{2\sigma_i^2} \right) \tag{11}$$

where $\mu_i$ is the mean value and $\sigma_i^2$ is the variance. Thus, the combined probability density function of the predicted streamflow can be obtained based on the probability density functions of the single prediction methods:

$$f(x) = \sum_{i=1}^{m} \omega_{it}\, f_i(x) \tag{12}$$

and therefore Equation (13) is obtained. From (13), the objective function F of the minimum CE optimization problem is set as Equation (14). Selecting the appropriate weight vector to obtain the minimum F involves determining the support for the different algorithms.
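To make the normal densities and their weighted combination concrete, the sketch below builds the combined density and numerically estimates the K-L distance from one single-method density to the combination. This is an illustration only: it is not the objective function F of the study, the trapezoidal integration is a swapped-in numerical technique, and all names and parameter values are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def mixture_pdf(x, weights, params):
    """Weighted sum of the single-method densities; params holds (mu, sigma) pairs."""
    return sum(w * normal_pdf(x, mu, s) for w, (mu, s) in zip(weights, params))


def kl_to_mixture(i, weights, params, lo, hi, steps=4000):
    """Trapezoidal estimate of D(f_i || f) on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        fi = normal_pdf(x, *params[i])
        f = mixture_pdf(x, weights, params)
        # Skip points where a density underflows to zero.
        term = fi * math.log(fi / f) if fi > 0 and f > 0 else 0.0
        total += term * (0.5 if k in (0, steps) else 1.0)
    return total * h


# Hypothetical (mean, standard deviation) of three single forecasts, normalized units.
params = [(0.40, 0.05), (0.46, 0.04), (0.50, 0.06)]
weights = [0.3, 0.4, 0.3]
d0 = kl_to_mixture(0, weights, params, 0.0, 1.0)
```

Because the combined density is at least `weights[i]` times the i-th density everywhere, the distance for method i cannot exceed `-ln(weights[i])`, so a larger weight forces the combination to stay closer to that method.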
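The weight coefficient is derived based on the Lagrange function method. Minimizing the K-L distance between a sampling density $g^*(x)$ and $f(x; \omega_{it})$ is equivalent to minimizing $-\int g^*(x) \ln f(x; \omega_{it})\,dx$, because the remaining term $\int g^*(x) \ln g^*(x)\,dx$ does not depend on $\omega_{it}$. This is in turn equivalent to the maximum value problem:

$$\max_{\omega_{it}} \int g^*(x) \ln f(x; \omega_{it}) \, dx \tag{15}$$

where:

$$g^*(x) = \frac{I_{\{S(x) > \gamma\}}\, f(x; \omega_0)}{L} \tag{16}$$

and $I_{\{S(x) > \gamma\}}$ is called the indicator function:

$$I_{\{S(x) > \gamma\}} = \begin{cases} 1, & S(x) > \gamma \\ 0, & S(x) \le \gamma \end{cases} \tag{17}$$

where $S(x)$ is also $f(x; \omega_{it})$, $\omega_0$ is the initial weight, $\gamma$ is the target estimation parameter, and $L$ represents the estimated target value of a low probability event.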
Based on the idea of CE, a low probability sampling method (see [27]) is used to convert the optimization problem into the CE problem of Equation (18), where N is the number of random samples.
where λ is the Lagrange multiplier.Note that: By taking the partial derivative to ω it and λ to zero, we can obtain: By substituting this into m ∑ i=1 ω it = 1, we can obtain: The expression for the weight coefficient is obtained as follows: Iterative process: A. Set t = 1; B. Set w it = w 0 , set iteration number z = 1; C. Generate sample sequence X = {X 1 , X 2 , . . . ,X N } by f (x; ω it ), and sort it from small to large, calculate S(x k ) = f (X k , ω it ) and, thus, the estimated value γ is: D. Calculate Equation ( 23) and obtain the z-th iteration result ω it (z).Set z = z + 1; E. Return to Step B to obtain γ(z), and calculate |γ(z) − γ(z − 1)|.If the results is less than a certain error ε, return to F; otherwise, return to C; F. Stop the iterations, where ω it (z) is the optimal weight and the streamflow prediction value is: G. Set t = t + 1. Assess whether t is less than or equal to T. If yes, return to step 2 to calculate some combined forecast values at other times; if not, finish the computation.
The overall forecasting process is shown in Figure 3.

Results and Analysis
In this study, using the streamflow data from 1965 to 2010, we employed the data from 1965 to 2005 as training samples and the data from 2006 to 2010 as test samples, and then predicted the annual streamflow over 12 months. The performance of the single forecasting models and the combined forecasting model was characterized by the root mean squared error (RMSE) and maximum relative percentage error (MRPE).
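The two error measures can be computed as in the sketch below. The RMSE follows its usual definition. For the MRPE, the text gives no formula, so the sketch assumes it is the largest absolute relative error expressed as a percentage. The function names and values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mrpe(actual, predicted):
    """Maximum relative percentage error (assumed definition:
    largest |actual - predicted| / |actual|, times 100)."""
    return max(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) * 100.0


# Illustrative normalized monthly values (placeholders, not real data).
actual = [0.5, 0.4, 0.8]
predicted = [0.45, 0.5, 0.8]
error_rmse = rmse(actual, predicted)
error_mrpe = mrpe(actual, predicted)
```

The RMSE summarizes the overall accuracy, whereas the MRPE is driven by the single worst point, so it reflects the risk of the model failing at an individual month.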

Comparison of the Results Obtained with a Single Method
The annual forecasting results for 2009-2010 are shown in Figure 4 (the streamflow value has been normalized; see Equation (2)) and the error analysis for these results is shown in Table 1. Based on Table 1 and Figure 4, we draw the following conclusions: (1) in terms of the RMSE, the combined forecasting method was not optimal at every individual prediction point, but its overall error was lower than those of GM, ARIMA, and RBF, with reductions of 1.14%, 1.12%, and 0.3%, respectively. Thus, the combined forecasting method had higher accuracy; (2) the MRPE index was very high with a single method, e.g., in August 2010, and there was a risk of the model failing. The combined forecasting method greatly reduced the MRPE and its predictions had greater stability; and (3) when the prediction error was relatively large using the single methods (e.g., in 2010), the error of the combined forecasts was also relatively large, so the accuracy of the single forecasting models affected the accuracy of the combined model.

Comparison with Other Combined Forecasting Models
In order to demonstrate the improved accuracy of the predictions obtained by the CE combined forecasting model, we performed comparisons with two other methods for combined forecasting, i.e., equal weight combined forecasting (EW) and the regression model (RM). The objective function of the RM model is to minimize the sum of the squared forecast errors over n years, where we set n as 5. The analysis of the predicted results is shown in Table 2. The comparison of the predicted results showed that the CE model performed better than the EW and RM models and that it exhibited greater stability. This is because the CE model weights the single methods according to their probability density functions, so each weight reflects the support for that single method rather than a simple combination. Thus, the predicted results were more suitable for combined forecasting.
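The two baseline weighting schemes can be sketched as follows. The EW weights are simply 1/m. For the RM baseline, the sketch assumes an unconstrained least-squares fit of the observed values on the single-model forecasts, solved through the normal equations; whether the study also enforced that the weights sum to one is not stated, so that constraint is not imposed here. All names and numbers are hypothetical.

```python
def equal_weights(m):
    """Equal weight (EW) combination: every model gets 1/m."""
    return [1.0 / m] * m


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system.
    Fails with ZeroDivisionError if the system is singular."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def regression_weights(forecasts, actual):
    """Least-squares weights minimizing the sum of squared forecast errors.
    forecasts[i][t] is the forecast of model i at time t."""
    m = len(forecasts)
    gram = [[sum(fi * fj for fi, fj in zip(forecasts[i], forecasts[j]))
             for j in range(m)] for i in range(m)]
    rhs = [sum(fi * y for fi, y in zip(forecasts[i], actual)) for i in range(m)]
    return solve_linear(gram, rhs)


# Illustrative forecasts of three models over n = 5 periods (placeholders).
forecasts = [[1.0, 2.0, 3.0, 4.0, 5.0],
             [2.0, 1.0, 4.0, 3.0, 6.0],
             [1.0, 3.0, 2.0, 5.0, 4.0]]
true_w = [0.2, 0.3, 0.5]
actual = [sum(w * f[t] for w, f in zip(true_w, forecasts)) for t in range(5)]
w_rm = regression_weights(forecasts, actual)
w_ew = equal_weights(3)
```

In this constructed example the observations are an exact combination of the forecasts, so the least-squares fit recovers the generating weights. With real data, least-squares weights can become negative or unstable when the single forecasts are strongly correlated.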

Influence of the Historical Data Length on the Prediction Results
In order to compare the stability of the prediction methods, we reduced the length of the historical data, as follows. Case 1: delete the data from 1965-1975; Case 2: delete the data from 1975-1985; Case 3: delete the data from 1995-2005. The results (averages over 2006-2010) obtained for CE, RM, and RBF are compared in Table 3. The comparison of the predicted results in Table 3 shows the following: (1) the historical data have a great effect on the prediction results, and this effect is stronger for data closer to the forecasting date. The forecasting accuracy is significantly reduced when the data are missing close to the forecasting date (Case 3); and (2) in all cases, the prediction accuracy of the CE model is highest, which indicates that the anti-disturbance ability of the model is better and its predictions are more stable.

Conclusions
Due to the combined effects of climate, natural geography, social development, and human activities, the changes in the streamflow are complex in a basin, with random, grey, and nonlinear characteristics. In this study, we proposed a combined forecasting model based on the CE, where we analyzed the variation in streamflow and the reasonable selection of similar years. The results of several single methods were used as the basic data in the combined forecasting model, where the weights of the combined methods were determined using the Lagrange function. The results showed that, compared with single forecasting models and other combined forecasting models, the forecasting model based on the CE could improve the accuracy and reliability of streamflow forecasting while preserving the accuracy of the single methods. The improved combined forecasting method was very useful for describing the variation in streamflow and it improved the stability of the predictions. These predictions can help agriculture and water conservancy departments to develop reasonable plans for the management of water resources.
In the future, we plan to make the following improvements: (1) we will improve the prediction accuracy of single forecasting methods by considering the time characteristics using data mining as well as the validity of the assessment data; (2) we will identify the factors related to streamflow in a more scientific manner; and (3) the probability density function method will be improved so the relationship between the combined forecasting model function and the single model function is more accurate, with more accurate and reliable weights.

Figure 2. Annual streamflow, annual rainfall, and the ratio of the two curves at Wudaogou station during 1965-2010.
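Equation (1). If the streamflow value is $L_t$ at time $t$ and $L_{t+a\Delta t}$ at time $t + a\Delta t$, and the streamflow data are missing at the intermediate time $t + b\Delta t$, then the missing value is $L_{t+b\Delta t} = L_t + \frac{b}{a}\left(L_{t+a\Delta t} - L_t\right)$.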

Table 1. Error analysis of the results.

Table 2. Error analysis of the results.

Table 3. Comparison of the CE, RM, and RBF results.