Combined Forecasting of Rainfall Based on Fuzzy Clustering and Cross Entropy
Entropy 2017, 19, 694

Rainfall is an essential index for measuring drought, and it depends on various parameters including geographical environment, air temperature and pressure. The nonlinear nature of climatic variables leads to problems such as poor accuracy and instability in traditional forecasting methods. In this paper, a combined forecasting method based on data mining technology and cross entropy is proposed to forecast rainfall with full consideration of the time-effectiveness of historical data. Because the fuzzy clustering method easily falls into local optimal solutions and runs slowly, the ant colony algorithm is adopted to overcome these shortcomings and refine the model. The method for determining weights is also improved by using the cross entropy. The forecast is conducted on the weighted average rainfall of the Beijing–Tianjin–Hebei region, obtained with the Thiessen polygon method. The prediction errors show that the improved ant colony fuzzy clustering can effectively select historical data and enhance the accuracy of prediction, which can help lessen the damage caused by extreme weather events such as droughts and floods.


Introduction
Rainfall forecasts play an important role in agricultural production, urban industry and daily life. The accurate prediction of rainfall has significant economic and social value. It can provide data support for the relevant departments, help detect droughts and floods, and reduce the degree of harm. However, affected by complex factors such as geographical environment, ocean currents, air pressure and temperature [1], rainfall exhibits strong randomness and nonlinear characteristics that often hamper its forecasting.
The rainfall prediction methods discussed in this paper are based on mathematical models and algorithms, which establish the forecasting model through the full mining of historical data. At present, a variety of approaches have been applied to predict rainfall at home and abroad, and basically they can be classified into five categories: (1) Numerical prediction models. These are based on a physical model of the process. Their advantage lies in fast speed and easy procedure, but because of the memory they require they can only reasonably be used for monthly forecasts and are difficult to use for longer periods (such as an annual forecast) [2,3]; (2) Time series models. Exponential smoothing, the moving average method and the Autoregressive Integrated Moving Average Model (ARIMA) [4] belong to this type; they can describe linear change processes well, yet they are sometimes not suitable for non-stationary random processes; (3) Probabilistic simulation methods. These include the grey model (GM) [5] and the Monte Carlo method [6]. The grey model predicts exponential trends more accurately, but it is only suitable for short and medium term forecasting, and the longer the forecast horizon, the larger the errors that may occur. The Monte Carlo method is well suited to describing random processes, but it also requires the support of data; (4) Artificial intelligence methods, such as the radial basis function (RBF), genetic algorithm (GA) and wavelet analysis (WA) [7][8][9][10], can better simulate nonlinear processes with higher prediction accuracy, but may fail due to issues like local optima, overlearning and weak generalization ability; (5) in addition, there are some other methods, like Numerical Weather Prediction (NWP) [11,12], R/S analysis [13], trend analysis [14], etc., that can predict rainfall from different angles.
However, two problems should not be ignored. First, in long-term forecasting rainfall is a random process, hence no single forecasting method can ensure stability throughout the prediction; serious errors may occur at certain times, resulting in the failure of the prediction. Second, the information obtained with a single method is one-sided and overlooks factors visible from other perspectives.
For the prediction of rainfall, a large amount of historical data is necessary to ensure the accuracy of the forecast. Nonetheless, historical data inevitably contain some errors or abnormal information, and this affects the accuracy of the forecast because rainfall is associated with many factors such as temperature, climate, and human activities. Therefore, forecasting based on fuzzy clustering has been applied in recent years [15,16]. However, the traditional fuzzy clustering algorithm easily falls into local optimal solutions, and from a time performance point of view it has difficulty handling large amounts of high-dimensional data [17]. In this paper, the ant colony algorithm is proposed to improve the fuzzy clustering. With this method, the reliability and computational efficiency of data filtering and processing are greatly increased.
Bates and Granger established the combined forecasting method based on weights in 1969 [18]. The approach combines different methods and the features of the data to improve the accuracy of forecasting and reduce the risk of failure. The combined forecasting method has been widely used in various fields, including electric power load forecasting, economics, logistics, etc., and has proven effective in practice [19][20][21][22]. Nonetheless, as a simple combination of several single methods, the previous method neglects the bias in the selection of the single methods. Furthermore, there is no detailed analysis of the time characteristics of historical data. At present, some scholars are paying attention to combined forecasting in the field of rainfall forecasting [23][24][25]. In Cui's study [23], the wavelet analysis method is used to determine the weight reconstruction of the rainfall forecast, yet the time distribution of historical data is not considered. In Xiong's study [24] and Lu's study [25], real-time river flow or flood forecasting methods have been studied, but they are not suitable for medium and long-term prediction.
The concept of entropy, propounded by the German physicist Clausius in 1865, is a function of the state of the system; in practice it is usually the variation of entropy relative to a reference value that is analyzed and compared. Cross Entropy (CE) is a kind of entropy that reflects the similarity between variables from the perspective of probability. The applications of entropy theory in hydrology mainly include the derivation of the distributions and estimation of the corresponding parameters for hydrometeorological variables [26][27][28], dependence analysis [29] and runoff forecasting [30][31][32][33][34]. The cross entropy was introduced into combination forecasting by Li et al. [24,25]. Their research put forward a new method of determining weights, which improves the stability of the prediction results. However, the probability density function in [24] is not suitable for the prediction of runoff. The wind power load forecasting method based on the normal distribution was proposed by Chen et al. [25]. The time characteristic of historical data is not considered in this method, and the solution is too complex to be implemented.
The key to a prediction method based on historical data is not only the prediction model but also the validity of the historical data, which falls within the scope of data mining. The choice of historical data is fundamentally a clustering process, so the clustering method is very important. To address the weaknesses of the fuzzy clustering method, the ant colony algorithm is used to improve the model. Meanwhile, the method for determining weights is also improved by using the cross entropy (CE).

Research Data
The Beijing-Tianjin-Hebei region is located on the east coast of Eurasia, in the mid-latitude transition zone between the coast and the inland. Influenced by a temperate climate with alternating moist and dry seasons, the annual rainfall in this area ranges from 400 to 800 mm. The study analyses the rainfall from 1969 to 2010, and the forecast is conducted on rainfall data from the Thiessen polygon stations in the Beijing-Tianjin-Hebei region. The results show that improved ant colony fuzzy clustering can effectively select historical data and improve the accuracy of prediction.
On the basis of the rainfall data from 26 stations, the weight of each station is determined by the Thiessen polygon method, and the weighted average rainfall data sequence is obtained. The Beijing-Tianjin-Hebei administrative divisions and the change of monthly rainfall from 1960 to 2013 in this area are shown in Figure 1.
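The Thiessen weighting step can be illustrated with a minimal sketch. It assumes that the polygon area of each station is already known; the function name, the three stations and all numbers are hypothetical and are not taken from the study data.

```python
import numpy as np

def thiessen_weighted_rainfall(station_rain, polygon_areas):
    """Area-weighted regional rainfall.

    station_rain  : array of shape (n_months, n_stations)
    polygon_areas : array of shape (n_stations,), Thiessen polygon areas
    """
    rain = np.asarray(station_rain, dtype=float)
    areas = np.asarray(polygon_areas, dtype=float)
    weights = areas / areas.sum()      # each station's share of the region
    return rain @ weights              # one weighted average per month

# Hypothetical example: 3 stations, 2 months (values in mm)
areas = [2.0, 1.0, 1.0]
rain = [[10.0, 20.0, 30.0],
        [0.0, 4.0, 8.0]]
regional = thiessen_weighted_rainfall(rain, areas)
print(regional)
```

Normalizing the areas inside the function keeps the weights summing to one regardless of the unit in which the polygon areas are given.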
There are four obvious characteristic time scales, namely 3a, 9a, 14a and 24a (where a denotes years). The characteristic time scale of 3a is present throughout 1960-2006 and its periodic oscillation is stable. The oscillation at the 24a time scale also persists throughout the study period and is relatively stable. Since the mid-1960s, the Beijing-Tianjin-Hebei area has experienced four alternating wet and dry phases: from the mid-1960s to the late 1970s, the precipitation was abundant; in the 1980s, the precipitation was relatively low; in the 1990s, the precipitation again entered an abundant period; and since the beginning of the 21st century, the precipitation has decreased. The characteristics of the two feature scales 9a and 14a are similar. Before the mid-1970s, their periodic oscillation was more obvious, and after the abundant period of 1970 to 1980 the feature scales increased slightly, to about 10a and about 15a respectively.
The analysis reveals that the periodic variation of rainfall is obvious. Therefore, it is important to predict future rainfall by grasping the key information of the rainfall in historical years and by using data mining technology to classify the rainfall-related data reasonably.

An Introduction of Ant Colony Algorithm
We take the Travelling Salesman Problem (TSP) as an example to illustrate the Ant Colony (AC) algorithm. Suppose there are $n$ cities and $d_{ij}$ is the distance between city $i$ and city $j$. $\tau_{ij}(t)$ is the amount of information on the path between city $i$ and city $j$ at time $t$; it simulates the pheromone of real ants. With a total of $m$ ants, the term $p_{ij}^{k}(t)$ represents the probability of the $k$-th ant moving from city $i$ to city $j$ at time $t$:

$$p_{ij}^{k}(t) = \begin{cases} \dfrac{\tau_{ij}^{a}(t)\,\eta_{ij}^{b}(t)}{\sum_{s \in S} \tau_{is}^{a}(t)\,\eta_{is}^{b}(t)}, & j \in S \\ 0, & \text{otherwise} \end{cases}$$

where $U$ is the part of the path that the ant has already searched, $S$ is the set of cities that ant $k$ is allowed to visit in the next step (those not in $U$), $a$ indicates the importance of the amount of information on the path for the choice made by the ants, $b$ indicates the importance of the heuristic term, and $\eta_{ij}$ indicates the degree of transfer expectation between city $i$ and city $j$. When $a = 0$, the algorithm is the traditional greedy algorithm; when $b = 0$, it becomes a pure positive feedback heuristic algorithm. After $n$ moments, the ants have visited all the cities and completed a cycle. The amount of information on each path is then updated according to the following formula:

$$\tau_{ij}(t+n) = (1-\rho)\,\tau_{ij}(t) + \Delta\tau_{ij}$$

where $\rho \in (0, 1)$ represents the amount of information that fades with time. The information increment is expressed as:

$$\Delta\tau_{ij} = \sum_{k=1}^{m} \Delta\tau_{ij}^{k}$$

where $\Delta\tau_{ij}^{k}$ is the amount of information left by ant $k$ between city $i$ and city $j$. It can be expressed as:

$$\Delta\tau_{ij}^{k} = \begin{cases} Q / L_{k}, & \text{if ant } k \text{ travels between cities } i \text{ and } j \text{ in this cycle} \\ 0, & \text{otherwise} \end{cases}$$

where $Q$ is a constant and $L_{k}$ is the length of the path traveled by ant $k$ in this cycle. After several cycles, the calculation can be terminated according to an appropriate stop condition.
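The procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the common choice $\eta_{ij} = 1/d_{ij}$, an evaporation-style update in which a fraction $\rho$ of the pheromone fades each cycle, and a symmetric TSP; the function names, parameter defaults and the four-city example are hypothetical.

```python
import numpy as np

def transition_probs(tau, eta, current, allowed, a=1.0, b=1.0):
    """Probability of moving from `current` to each city in `allowed`."""
    w = (tau[current, allowed] ** a) * (eta[current, allowed] ** b)
    return w / w.sum()

def ant_system(dist, n_ants=10, n_cycles=50, rho=0.5, a=1.0, b=2.0, Q=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n = len(dist)
    eta = 1.0 / (dist + np.eye(n))      # assumed heuristic 1/d; eye avoids 0-division
    tau = np.ones((n, n))               # uniform initial pheromone
    best_tour, best_len = None, np.inf
    for _ in range(n_cycles):
        delta = np.zeros((n, n))
        for _ in range(n_ants):
            tour = [int(rng.integers(n))]
            while len(tour) < n:
                # S: cities not yet visited by this ant
                allowed = np.array([c for c in range(n) if c not in tour])
                p = transition_probs(tau, eta, tour[-1], allowed, a, b)
                tour.append(int(rng.choice(allowed, p=p)))
            length = sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))
            for k in range(n):          # each ant deposits Q / L_k on its edges
                i, j = tour[k], tour[(k + 1) % n]
                delta[i, j] += Q / length
                delta[j, i] += Q / length
            if length < best_len:
                best_tour, best_len = tour, length
        tau = (1.0 - rho) * tau + delta  # pheromone fades, then is reinforced
    return best_tour, best_len

# Hypothetical example: four cities on the corners of a unit square
coords = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]])
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
best_tour, best_len = ant_system(dist)
print(best_tour, best_len)
```

Setting `a=0` reduces the choice to a distance-only greedy rule and `b=0` to pure pheromone feedback, which mirrors the two limiting cases mentioned in the text.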

Basic Principles of Fuzzy Clustering
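Among the many fuzzy clustering algorithms, the most widely used and successful is Fuzzy C-means (FCM). The FCM algorithm divides $n$ vectors $x_{j}$ ($j = 1, 2, \ldots, n$) into $m$ fuzzy clusters and obtains the clustering center of each cluster so that the objective function is minimized. The objective function is defined as:

$$J = \sum_{i=1}^{m} \sum_{j=1}^{n} \mu_{ij}^{h}\, \lVert x_{j} - c_{i} \rVert^{2}$$

where $\mu_{ij}$ is the membership function (the degree to which $x_{j}$ belongs to cluster $i$), $c_{i}$ is the $i$-th clustering center, and $h$ is the fuzzy weight index. $\mu_{ij} \in (0, 1)$ and:

$$\sum_{i=1}^{m} \mu_{ij} = 1, \quad j = 1, 2, \ldots, n$$

In order to minimize the objective function, the cluster center and the membership function are updated as follows:

$$c_{i} = \frac{\sum_{j=1}^{n} \mu_{ij}^{h}\, x_{j}}{\sum_{j=1}^{n} \mu_{ij}^{h}} \tag{8}$$

$$\mu_{ij} = \frac{1}{\sum_{k=1}^{m} \left( \dfrac{\lVert x_{j} - c_{i} \rVert}{\lVert x_{j} - c_{k} \rVert} \right)^{2/(h-1)}} \tag{9}$$

Since the solution of a multi-constrained optimization problem is complex, the commonly used method is to fix one set of parameters while optimizing the other, and to alternate until the difference between two consecutive values of the objective function is less than a very small value (the precision requirement).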
The flaw of this algorithm is that it must be run repeatedly for multiple values of the cluster number $c$, that the result is usually a local optimal solution, and that the computation time is large: because the time required for a matrix multiplication is $O(n^{3})$, the time complexity of the first step of the algorithm reaches $O(n^{4} \log n)$.
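A compact sketch of the alternating FCM iteration is given below, assuming Euclidean distance and the standard center and membership updates. The function signature, the optional initial membership matrix `U0`, the tolerance and the two-blob test data are hypothetical choices made for illustration.

```python
import numpy as np

def fcm(X, m, h=2.0, tol=1e-6, max_iter=300, U0=None, seed=0):
    """Fuzzy C-means. X: (n, d) data. Returns centers (m, d), memberships (m, n), iterations."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((m, n)) if U0 is None else np.array(U0, dtype=float)
    U /= U.sum(axis=0, keepdims=True)            # memberships of each point sum to 1
    prev_J = np.inf
    for it in range(1, max_iter + 1):
        W = U ** h
        C = (W @ X) / W.sum(axis=1, keepdims=True)             # center update
        d2 = ((X[None, :, :] - C[:, None, :]) ** 2).sum(-1)    # (m, n) squared distances
        d2 = np.maximum(d2, 1e-12)                             # guard against zero distance
        J = float((W * d2).sum())                              # objective value
        inv = d2 ** (-1.0 / (h - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)               # membership update
        if abs(prev_J - J) < tol:                              # precision requirement
            break
        prev_J = J
    return C, U, it

# Hypothetical example: two well-separated blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
centers, U, n_iter = fcm(X, m=2)
print(np.round(centers, 2), n_iter)
```

The `U0` argument is where a better starting membership matrix could be supplied instead of random numbers, which is the point at which an ant colony initialization would plug in.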

Improvement of Fuzzy Clustering by Ant Colony Algorithm
One of the keys to improving the speed of fuzzy clustering is the selection of the initial point of the membership function. If an approximate membership degree of each parameter point to each cluster can be obtained in advance, the fuzzy clustering algorithm is accelerated, and the ant colony algorithm can provide this approximation. The basic idea is to treat the data as ants with different attributes. The clustering center is regarded as the "food source" that the ants are looking for, so data clustering is seen as the process of ants looking for food sources. The specific process can be described as follows: each ant starts from a cluster center, searches for the next sample point in the entire solution space, and then starts again from the cluster center and searches for another sample point in the entire solution space. When the number of sample points reached equals the total number of original sample points of the cluster, the ant is considered to have completed the search of one path. So that an ant does not visit the same sample point twice on the same path, a tabu list tabu(N) is set for each ant. If the entry of tabu(N) for node $j$ is 1, the ant may select node $j$ as the next sample point; once the ant has selected node $j$, the entry is set to 0 and the ant can no longer choose that node.
Assume $X = \{X_{i} \mid X_{i} = (x_{i1}, x_{i2}, \ldots, x_{im}),\ i = 1, 2, \ldots, n\}$ is the collection of data to be clustered, and $\tau_{ij}(t)$ is the amount of information between $X_{i}$ and $X_{j}$ at time $t$. When all the ants have completed a path search, the algorithm is said to have carried out one search cycle. In search period $t$, the path selection probability can be expressed as:

$$p_{ij}(t) = \frac{\tau_{ij}^{a}(t)\,\eta_{ij}^{b}(t)}{\sum_{s \in S} \tau_{sj}^{a}(t)\,\eta_{sj}^{b}(t)}$$

where $S = \{X_{s} \mid d_{sj} \le r_{j},\ s = 1, 2, \ldots, N\}$ is the set of data within radius $r_{j}$ of $X_{j}$, and the other parameters are consistent with the above.
When $i$ is fixed, let $j$ run from 1 to $m$ and search for the maximum $p_{ij}(t)$; $X_{i}$ is then merged into the neighborhood of $X_{j}$. Let $C_{j} = \{X_{i} \mid d_{ij} < r_{j},\ i = 1, 2, \ldots, k\}$, so that $C_{j}$ represents the set of all data merged into $X_{j}$, and the cluster center is found as:

$$c_{j} = \frac{1}{k} \sum_{X_{i} \in C_{j}} X_{i}$$

When the ant colony completes a search period, the probability that each parameter point belongs to a cluster is obtained from $p_{ij}(t)$. This gives an approximate initial value of the fuzzy clustering membership matrix, and $c_{j}$ is used as the initial fuzzy clustering center.
As the ant colony algorithm itself has a certain computational complexity, running it several times in every fuzzy clustering cycle would cause over-optimization. We therefore adopt the following strategy: in the first few cycles (4 cycles in this paper), we use the ant colony algorithm to determine the initial values $p_{ij}(t)$ and $c_{j}$, and then iterate according to Equations (8) and (9). When the optimization process slows down, we apply the ant colony algorithm once or twice more, until the accuracy requirement is reached.
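One simplified way to read this initialization is sketched below. It is not the paper's exact rule: the probabilities here are normalized over the candidate centers (so that they can serve directly as an initial membership matrix), the heuristic is assumed to be $\eta = 1/d$, and the pheromone matrix defaults to uniform values. All function names and the toy data are hypothetical.

```python
import numpy as np

def seed_memberships(X, centers, tau=None, a=1.0, b=1.0):
    """Pheromone-and-distance weights p[i, j] of point j towards candidate center i."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(centers, dtype=float)
    d = np.linalg.norm(X[None, :, :] - C[:, None, :], axis=-1)   # (m, n) distances
    eta = 1.0 / np.maximum(d, 1e-12)          # assumed heuristic: inverse distance
    tau = np.ones_like(d) if tau is None else np.asarray(tau, dtype=float)
    w = (tau ** a) * (eta ** b)
    return w / w.sum(axis=0, keepdims=True)   # each column sums to 1

def merge_and_recenter(X, p):
    """Assign each point to its most probable center and recompute the centers as means.

    Assumes every center receives at least one point.
    """
    X = np.asarray(X, dtype=float)
    labels = p.argmax(axis=0)
    centers = np.array([X[labels == j].mean(axis=0) for j in range(p.shape[0])])
    return labels, centers

# Hypothetical toy data: two pairs of points and two rough starting centers
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
p = seed_memberships(X, centers=[[0.0, 0.2], [5.0, 5.2]])
labels, new_centers = merge_and_recenter(X, p)
print(labels, new_centers)
```

The matrix `p` could then be passed to an FCM routine as its starting membership matrix in place of random numbers, with `new_centers` as the starting centers.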

Method Validation
In terms of monthly rainfall, the relevant factors include historical monthly mean temperature, mean air pressure, mean humidity, season, etc. Because the forecast concerns the total rainfall of the region, we do not take into account the effects of terrain and ocean currents.
In this way, a five-dimensional vector is formed, consisting of monthly rainfall, monthly average temperature, monthly mean humidity, monthly mean pressure and seasonal type. Through fuzzy clustering, the historical data are grouped to form a database, so that when future weather data are predicted, the historical database can be searched for data useful to the rainfall forecast. Seasonal data must be mapped to numerical values, as shown in Table 1.
To test the performance of the algorithm, the membership degree matrix is simulated on a computer with an i7 processor and 4 GB of memory. Two initializations are compared: the first generates random numbers in (0, 1), and the second uses ant colony optimization (that is, $p_{ij}(t)$). When calculating $p_{ij}(t)$, we set $\rho = 0.7$, $a = 1$, $b = 1$, $\eta = 1$, $\tau_{ij}(0) = 0$. The results are shown in Figure 3.
Figure 3a shows that the computation time of fuzzy clustering with random numbers as the initial value of the membership matrix increases rapidly with the data volume, while the computation time of the fuzzy clustering algorithm initialized with $p_{ij}(t)$ grows much more slowly as the number of samples increases. Figure 3b shows that when the allowed error is large, the computation times of fuzzy clustering with random numbers and with $p_{ij}(t)$ are not very different; when the allowed error becomes smaller, the computation time of fuzzy clustering with $p_{ij}(t)$ hardly changes, whereas that of fuzzy clustering with random numbers rises rapidly. Therefore, the clustering method adopted in this paper is more efficient and effective.

Rainfall Forecasting Model Based on CE
The historical data set is huge; it contains useful data but also some abnormal data, so specific methods must be chosen according to the actual situation in order to ensure the accuracy of the forecast. It is important to predict future rainfall by grasping the key information of the rainfall in historical years and by using data mining technology to classify the rainfall-related data reasonably. Therefore, fuzzy clustering improved by the ant colony algorithm is used to cluster and classify the historical data.

Combined Forecasting Model
The combined forecasting model comprises $m$ single forecasting models, and the relative effectiveness of each single forecasting model is determined by the historical data. If the combined forecast value at time $t$ is $y_{t}$, $\omega_{it}$ is the weight of the $i$-th model at time $t$, and $\hat{y}_{it}$ is the predicted value of the $i$-th model at time $t$, then the problem of combined forecasting is described as follows:

$$y_{t} = \sum_{i=1}^{m} \omega_{it}\, \hat{y}_{it}, \qquad \sum_{i=1}^{m} \omega_{it} = 1$$

Here two factors influence the final results of combined forecasting: the choice of single models and the weight of each single forecasting model. In this study, we focus on the latter.
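As a small numerical illustration of the weighted combination, the sketch below combines three single-model forecasts at two time steps. The function name, the forecasts and the weights are hypothetical values chosen only to show the arithmetic.

```python
import numpy as np

def combine_forecasts(single_forecasts, weights):
    """Weighted combination y_t = sum_i w_it * yhat_it.

    single_forecasts, weights : arrays of shape (m_models, T_steps)
    """
    F = np.asarray(single_forecasts, dtype=float)
    W = np.asarray(weights, dtype=float)
    if not np.allclose(W.sum(axis=0), 1.0):
        raise ValueError("weights must sum to 1 at every time step")
    return (W * F).sum(axis=0)

# Hypothetical forecasts (mm) from three models at two time steps
forecasts = [[10.0, 20.0],
             [14.0, 10.0],
             [12.0, 30.0]]
weights = [[0.50, 0.2],
           [0.25, 0.3],
           [0.25, 0.5]]
combined = combine_forecasts(forecasts, weights)
print(combined)
```

Because the weights carry a time index, each forecast time can favor a different single model.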
There are no uniform rules for selecting the single methods; instead, the actual problem and the needs of the model must be considered. The factors considered in this study include independence, diversity, and the accuracy of the algorithm. The single forecasting methods used are the ARIMA time series model, GM, and RBF.

The CE Model
According to the definition of entropy, the CE is defined as a measure of the difference in information between two random vectors. It is also called the Kullback-Leibler (K-L) distance. The CE model determines the degree of mutual support by assessing the degree of intersection between different information sources. Moreover, the mutual support degree can be used to determine the weights of the information sources, where a greater weight represents higher mutual support. The CE of two probability distributions is expressed as $D(f \| g)$ [26,27].
For the discrete case:

$$D(f \| g) = \sum_{i} f_{i} \ln \frac{f_{i}}{g_{i}}$$

and for the continuous case:

$$D(f \| g) = \int f(x) \ln \frac{f(x)}{g(x)}\, dx$$

where $D(f \| g)$ represents the distance from $f$ to $g$, and $f$ and $g$ denote the probability vectors in the discrete case and the probability density functions in the continuous case, respectively. The CE model quantifies the "distance" between the amounts of information. However, the K-L distance is not a real length but the difference between two probability distributions. In this paper, $g$ is the combined forecast function and $f$ is the single method. The CE value is smallest (zero) when the two probability density functions are identical. For the combined forecasting model based on CE, the CE represents the support of each single method for the combined forecast. Therefore, the objective is to assign weights to the different single methods so that the combined predictive function is as similar as possible to the true value.
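The discrete form can be checked numerically with a short sketch. It assumes natural logarithms and skips terms with zero probability in the first argument; the function name and the two example distributions are hypothetical.

```python
import numpy as np

def kl_discrete(f, g):
    """K-L distance D(f || g) = sum_i f_i * ln(f_i / g_i) for probability vectors."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    mask = f > 0                      # terms with f_i = 0 contribute nothing
    return float(np.sum(f[mask] * np.log(f[mask] / g[mask])))

f = [0.5, 0.5]
g = [0.9, 0.1]
d_fg = kl_discrete(f, g)   # distance from f to g
d_gf = kl_discrete(g, f)   # distance from g to f (generally different)
d_ff = kl_discrete(f, f)   # identical distributions
print(d_fg, d_gf, d_ff)
```

The two directions give different values, which is why the K-L distance is not a true metric, while identical distributions give zero.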
To use the CE model, two major problems should be solved: establishing the probability density function, and generating the CE objective function and solving for the weight coefficients by iteration.
The rainfall is treated as a sequence of discrete random variables over the forecast period. For a certain point in the sequence, the value of the rainfall at that prediction time is continuous, so it can be regarded as a continuous random variable. Therefore, rainfall prediction can be treated as a sequence that is discrete in time but continuous in value.
The probability density function $f(x)$ for predicting rainfall can be regarded as the sum of the probability density functions $f_{i}(x)$ of the single forecasting methods, each multiplied by the corresponding weight. According to the central limit theorem, if a variable is the sum of many independent random factors, we can treat the variable as following a normal distribution, and thus the rainfall value at a certain time can be considered as satisfying a normal distribution. The minimum CE is used to determine the weights of the probability distributions of the different forecasting methods, so that the combined probability distribution of the rainfall is obtained.
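The probability density function for method $i$ ($i = 1, 2, \ldots, m$) is:

$$f_{i}(x) = \frac{1}{\sqrt{2\pi}\,\sigma_{i}} \exp\!\left( -\frac{(x - \mu_{i})^{2}}{2\sigma_{i}^{2}} \right)$$

where $\mu_{i}$ is the mean value and $\sigma_{i}^{2}$ is the variance. Thus, the combined probability density function of the predicted rainfall can be obtained from the probability density functions of the single prediction methods:

$$f(x) = \sum_{i=1}^{m} \omega_{it}\, f_{i}(x)$$

where $\omega_{it}$ is the weight of the $i$-th single method at time $t$.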
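The weighted combination of normal densities can be evaluated directly, as in the following sketch. The means, standard deviations and weights are hypothetical numbers, and the integral is approximated by a simple Riemann sum on a fine grid.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def combined_pdf(x, mus, sigmas, weights):
    """f(x) = sum_i w_i * f_i(x) for the single-method densities f_i."""
    return sum(w * normal_pdf(x, mu, s) for w, mu, s in zip(weights, mus, sigmas))

# Hypothetical single-method forecasts (mean, spread in mm) and weights
mus, sigmas, weights = [40.0, 50.0, 45.0], [5.0, 8.0, 6.0], [0.3, 0.3, 0.4]
x = np.linspace(-50.0, 150.0, 20001)
fx = combined_pdf(x, mus, sigmas, weights)
area = float(fx.sum() * (x[1] - x[0]))        # should be close to 1
mean = float((x * fx).sum() * (x[1] - x[0]))  # weighted mean of the single means
print(area, mean)
```

Since the weights sum to one, the combined function is itself a valid density, and its mean equals the weighted sum of the single-method means.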
Therefore:

From (17), the objective function of the minimum CE optimization problem is set as:

Selecting the appropriate weight vector to obtain the minimum $F$ involves determining the support for different algorithms.
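The weight coefficient is derived based on the Lagrange function method. The K-L distance between a sampling function $g^{*}(x)$ and $f(x; \omega_{it})$ is minimized when $-\int g^{*}(x) \ln f(x; \omega_{it})\, dx$ reaches its minimum value, which is equivalent to the maximum value problem:

$$\max_{\omega_{it}} \int g^{*}(x) \ln f(x; \omega_{it})\, dx$$

where:

$$g^{*}(x) = \frac{I_{\{S(x) > \gamma\}}\, f(x; \omega_{0})}{L}$$

and $I_{\{S(x) > \gamma\}}$ is called the indicator function:

$$I_{\{S(x) > \gamma\}} = \begin{cases} 1, & S(x) > \gamma \\ 0, & S(x) \le \gamma \end{cases}$$

where $S(x)$ is also $f(x; \omega_{it})$, $\omega_{0}$ is the initial weight, $\gamma$ is the target estimation parameter, and $L$ represents the estimated target value of a low probability event.
Based on the idea of CE, a low probability sampling method is used to convert the optimization problem into the following CE problem:

$$\max_{\omega_{it}} \frac{1}{N} \sum_{k=1}^{N} I_{\{S(X_{k}) > \gamma\}} \ln f(X_{k}; \omega_{it})$$

where $N$ is the number of random samples.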
Note that $\sum_{i=1}^{m} \omega_{it} = 1$, and thus we can construct a Lagrange function:

where $\lambda$ is the Lagrange multiplier. Note that:

By setting the partial derivatives with respect to $\omega_{it}$ and $\lambda$ to zero, we can obtain:

By substituting this into $\sum_{i=1}^{m} \omega_{it} = 1$, we can obtain:

The expression for the weight coefficient is obtained as follows:

Iterative process:
A. Set $t = 1$;
B. Set $\omega_{it} = \omega_{0}$ and set the iteration number $z = 1$;
C. Generate the sample sequence $X = \{X_{1}, X_{2}, \ldots, X_{N}\}$ from $f(x; \omega_{it})$, calculate $S(X_{k}) = f(X_{k}; \omega_{it})$ and sort the values from small to large; the estimated value $\gamma$ is:
D. Calculate (27) and obtain the $z$-th iteration result $\omega_{it}(z)$. Set $z = z + 1$;
E. Return to Step C to obtain $\gamma(z)$, and calculate $|\gamma(z) - \gamma(z - 1)|$. If the result is less than a certain error $\varepsilon$, go to Step F; otherwise, return to Step C;
F. Stop the iterations, where $\omega_{it}(z)$ is the optimal weight and the rainfall prediction value is obtained from the combined forecasting model with these weights;
G. Set $t = t + 1$. Assess whether $t$ is less than or equal to $T$. If yes, return to Step B to calculate the combined forecast values at other times; if not, finish the computation.
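For reference, the Lagrangian step can be written out under the standard cross-entropy-method formulation. This is a hedged sketch: the sample-average objective, the abbreviation $I_{k}$ for the indicator evaluated at $X_{k}$, the sign convention for $\lambda$ and the fixed-point form of the final update are assumptions of this sketch, not formulas confirmed by the source.

```latex
% Assumptions: I_k = I_{\{S(X_k) > \gamma\}}, \quad
%              f(x;\omega) = \sum_{i=1}^{m} \omega_{it} f_i(x)
\begin{align*}
\Lambda(\omega,\lambda)
  &= \frac{1}{N}\sum_{k=1}^{N} I_k \ln f(X_k;\omega)
     + \lambda\Bigl(\sum_{i=1}^{m}\omega_{it} - 1\Bigr) \\
% stationarity with respect to each weight
\frac{\partial \Lambda}{\partial \omega_{it}}
  &= \frac{1}{N}\sum_{k=1}^{N} I_k\,\frac{f_i(X_k)}{f(X_k;\omega)} + \lambda = 0 \\
% multiply by \omega_{it}, sum over i, and use \sum_i \omega_{it} = 1
\lambda &= -\frac{1}{N}\sum_{k=1}^{N} I_k \\
% substitute \lambda back and multiply by \omega_{it}: a fixed-point update
\omega_{it}
  &= \frac{\sum_{k=1}^{N} I_k\,\omega_{it}\, f_i(X_k) / f(X_k;\omega)}
          {\sum_{k=1}^{N} I_k}
\end{align*}
```

Under these assumptions the updated weights automatically sum to one, because $\sum_{i} \omega_{it} f_{i}(X_{k}) = f(X_{k}; \omega)$ for every sample.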
The overall forecasting process is shown in Figure 4.
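For a single forecast time, the iterative weight search of Steps B to F can be sketched as follows. The sketch rests on several assumptions that the text does not confirm: normal single-method densities, $\gamma$ taken as a sample quantile controlled by a hypothetical parameter `elite_frac`, and a fixed-point weight update of the form $\omega_{i} \leftarrow \sum_{k} I_{k}\, \omega_{i} f_{i}(X_{k}) / f(X_{k}; \omega) \big/ \sum_{k} I_{k}$. All names and numbers are illustrative.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def ce_weights(mus, sigmas, w0=None, n_samples=2000, elite_frac=0.5,
               eps=1e-4, max_iter=100, seed=0):
    """Iterative weight search for one forecast time (sketch of Steps B-F)."""
    rng = np.random.default_rng(seed)
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    m = len(mus)
    w = np.full(m, 1.0 / m) if w0 is None else np.asarray(w0, dtype=float)  # Step B
    gamma_prev = None
    for z in range(1, max_iter + 1):
        # Step C: sample from the current combined density f(x; w)
        comp = rng.choice(m, size=n_samples, p=w)
        X = rng.normal(mus[comp], sigmas[comp])
        Fi = normal_pdf(X[None, :], mus[:, None], sigmas[:, None])  # (m, N)
        S = w @ Fi                                   # S(X_k) = f(X_k; w)
        gamma = float(np.quantile(S, 1.0 - elite_frac))  # assumed quantile rule
        I = S >= gamma                               # indicator of retained samples
        # Step D: assumed fixed-point update; the new weights sum to 1
        w = (w[:, None] * Fi[:, I] / S[I]).sum(axis=1) / I.sum()
        # Step E: stop when gamma stabilizes
        if gamma_prev is not None and abs(gamma - gamma_prev) < eps:
            break
        gamma_prev = gamma
    return w, gamma, z

# Hypothetical single-method forecasts (mean, spread in mm)
w, gamma, z = ce_weights([40.0, 50.0, 45.0], [5.0, 8.0, 6.0])
print(np.round(w, 3), z)
```

Because $\gamma$ is estimated from random samples, the stopping test is noisy; the `max_iter` cap guarantees termination, and the resulting weights would then be used in the weighted combination of the single forecasts.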

Results and Analysis
This study selects the monthly and annual rainfall data from 1960-2010 as training samples and the rainfall data from 2011-2013 as test samples. A monthly rainfall forecast is then carried out. The selected models include the single models ARIMA, GM and RBF, and the combined CE model. The performance of the single forecasting models and the combined forecasting model is characterized by the root mean squared error (RMSE) and the maximum relative percentage error (MRPE).
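The two error indices can be computed as in the sketch below. It assumes the usual definitions: RMSE as the square root of the mean squared error, and MRPE as the largest absolute relative error expressed in percent. The exact normalization used for the tabulated values is not stated here, so these definitions and the sample numbers are assumptions.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mrpe(y_true, y_pred):
    """Maximum relative percentage error; assumes no true value is zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.max(np.abs(y_pred - y_true) / np.abs(y_true)) * 100.0)

# Hypothetical observed and predicted monthly rainfall (mm)
observed = [100.0, 50.0, 80.0]
predicted = [110.0, 45.0, 80.0]
err_rmse = rmse(observed, predicted)
err_mrpe = mrpe(observed, predicted)
print(err_rmse, err_mrpe)
```

MRPE is sensitive to months with very small observed rainfall, so a single dry month can dominate the index.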

Predictive Stability Comparison Results
After the single algorithms are determined, the rainfall from 2011 to 2013 is predicted and the absolute error is calculated. To compare the stability of the various algorithms, the true values, the predicted values and the absolute error trend curves (a total of 36 prediction points) are shown in Figure 5.

From Figure 5, it is clear that for a single algorithm the absolute prediction error fluctuates between large and small values, and stable results cannot be maintained over a long time scale, so the reliability is not high. In combined prediction, the absolute error curve is relatively flat, which greatly improves the stability of prediction and makes the results more reliable. The error analysis of the results for 2011-2013 is shown in Table 2.
From Table 2 and Figure 5 it can be seen that: (1) at an individual prediction point the combined forecast may not be the best, but its overall RMSE is smaller than that of GM, ARIMA and RBF. The combined forecasting method has a higher accuracy: in the average MRPE index, compared to ARIMA, GM and RBF, CE reduces the error by 2.11%, 1.73% and 0.1% respectively; in the average RMSE index, compared to ARIMA, GM and RBF, CE reduces the error by 1.74%, 1.08% and 0.43% respectively; (2) the MRPE index of a single method can be very high (MRPE of GM in 2012), so there is a risk of failure of the model. The combined forecasting method greatly reduces the maximum error and has better prediction stability (MRPE of CE in 2012); (3) when the prediction errors of the single methods are large, the error of the combined forecast is also relatively large (for example in 2012), so the accuracy of the single forecasting models has a certain effect on the accuracy of the combined model.

The Influence of Clustering Method on Prediction Results
To illustrate the effect of clustering on the accuracy of prediction, two scenarios are created:
Scenario 1 (S1): the traditional c-means clustering method is used.
Scenario 2 (S2): the historical data are not clustered.
The error results are shown in Table 3 and Figure 6.

The Influence of Clustering Method on Prediction Results
To illustrate the effect of clustering on the accuracy of prediction, two scenarios are created: Scenario 1: Traditional c-means clustering method Scenario 2: We do not cluster historical data.
The error results are shown in Table 3 and Figure 6. From the comparison between Tables 2 and 3, it can be seen that the accuracy of the clustering method used in this paper is consistent with that of the traditional clustering method and meets the forecasting requirements. In the average MRPE index, the clustering method used in this paper reduces the error by 0.01% and 3% relative to S1 and S2, respectively; in the average RMSE index, compared to ARIMA, GM and RBF, CE is reduced by 0.05%, 1.08% and 3.52%. However, Figure 3 shows that the method has a great advantage in calculation speed when the data volume is large. In addition, the comparison with S2 shows a large deviation in the prediction results when no clustering is applied, which indicates that the selection of historical data has an important influence on prediction precision, and that it is necessary to fully mine and classify the historical information to achieve reasonable results.
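To make the role of clustering in selecting historical data more concrete, the following is a minimal sketch of standard fuzzy c-means on one-dimensional data. It corresponds to the traditional method of Scenario 1, not to the ant colony variant used in this paper. The function names, the fuzzifier m = 2, the membership threshold and the sample values are all assumptions made for illustration.

```python
def memberships_for(data, centers, m=2.0):
    """Fuzzy membership of every data point in every cluster (rows sum to 1)."""
    rows = []
    for x in data:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: assign it fully to that center.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
        else:
            row = [d ** (-2.0 / (m - 1.0)) for d in dists]
        total = sum(row)
        rows.append([v / total for v in row])
    return rows

def fuzzy_c_means(data, centers, m=2.0, tol=1e-6, max_iter=100):
    """Alternate membership and center updates until the centers stop moving."""
    centers = list(centers)
    for _ in range(max_iter):
        u = memberships_for(data, centers, m)
        new_centers = []
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]
            new_centers.append(sum(wk * x for wk, x in zip(w, data)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships_for(data, centers, m)

def select_similar(data, u, cluster, threshold=0.5):
    """Keep the historical samples whose membership in `cluster` exceeds the threshold."""
    return [x for x, row in zip(data, u) if row[cluster] > threshold]

# Hypothetical monthly rainfall samples (mm): three dry months, three wet months.
rain = [5.0, 7.0, 6.0, 90.0, 110.0, 100.0]
centers, u = fuzzy_c_means(rain, [0.0, 50.0])
similar = select_similar(rain, u, cluster=1)
print(centers, similar)
```

Only the samples assigned to the cluster that resembles the target period would then be passed to the single forecasting models, which is the sense in which clustering selects historical data.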

Conclusions
Because meteorological conditions are stochastic, rainfall variation is a non-stationary time series, and the accuracy and reliability of traditional single forecasting methods cannot be ensured. A combined forecasting model based on CE is proposed to solve these problems. The forecasting results of several single methods are used as the input variables for training, and the output weights are predicted by analyzing the total rainfall. The simulation results show that the combined forecasting model enhances the accuracy and reliability of rainfall forecasting. Additionally, the clustering method is modeled and analyzed so as to improve the accuracy of prediction, and the results demonstrate that the accuracy of combined forecasting can be improved by using fuzzy clustering. The chosen method characterizes the variation patterns of rainfall well and improves the stability of the prediction. The prediction results can help agricultural and water conservancy departments improve their capacity for drought and flood prevention and control. In the future, further improvements can be made: (1) more accurate and more suitable single forecasting methods should be chosen, and the rules for selecting single methods within the combined forecasting method should be explored, to further improve the accuracy of the combined model; (2) data from similar years should be collected for prediction and historical data should be utilized more effectively, so that the results are more scientific and convincing.

Figure 1. The Beijing-Tianjin-Hebei administrative divisions and change of monthly rainfall from 1960 to 2013. (a) Beijing-Tianjin-Hebei administrative divisions; (b) monthly rainfall from 1960 to 2013.

Figure 2 gives the results of the wavelet analysis of the data from 1960 to 2006. Wavelet analysis is a localized analysis in time (space) and frequency. It refines the signal (function) step by step at multiple scales through dilation and translation operations, achieving fine time resolution at high frequencies and fine frequency resolution at low frequencies, so that it adapts automatically to the data being analyzed.
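The multiscale idea behind wavelet analysis can be illustrated with a minimal sketch using the Haar wavelet, the simplest discrete wavelet. This is an assumption made for illustration only: this section does not state which wavelet was used for Figure 2, and the sample series below is hypothetical.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform (input length must be even)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Repeatedly split the approximation into a coarser approximation and details.

    Returns the final approximation and a list of detail coefficients,
    ordered from the finest scale (highest frequency) to the coarsest.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

# Hypothetical series of 8 rainfall values (length must be divisible by 2**levels).
series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_decompose(series, levels=3)
print(approx, details)
```

The finest level keeps one detail coefficient per pair of samples, giving good time localization for fast changes, while the coarser levels summarize longer spans and so capture slow, low-frequency variation.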

Figure 3. Clustering calculation time. (a) Calculation time when error rate is constant; (b) calculation time when data amount is constant.

E. Return to Step B to obtain the updated value at iteration z, and calculate the absolute change between successive iterations. If the result is less than a given error tolerance, go to F; otherwise, return to C; F. Stop the iterations.

Table 2. Error analysis of the results.

Table 3. Influence of clustering method on prediction results.
