4.1. Combined Forecasting Model
The combined forecasting model comprises m single forecasting models, and the relative effectiveness of each single model is determined from the historical data. If the combined forecast value at time t is y_t, w_{it} is the weight of the ith model at time t, and \hat{y}_{it} is the predicted value of the ith model at time t, then the combined forecasting problem is described as follows:

y_t = \sum_{i=1}^{m} w_{it} \hat{y}_{it}, \quad t = 1, 2, \ldots, T, (8)

where \sum_{i=1}^{m} w_{it} = 1 and w_{it} \ge 0. From Equation (8), we see that two factors influence the final results of combined forecasting: the single models themselves and the weights assigned to them. In this study, we focus on the latter.
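As a minimal numerical sketch of the weighted combination in Equation (8) (all values below are hypothetical placeholders, not data from this study):

```python
# Combined forecast: y_t = sum_i w_it * yhat_it, with the weights forming
# a convex combination. The three predictions are hypothetical values.
predictions = [105.2, 98.7, 101.4]   # yhat_it from m = 3 single models
weights = [0.5, 0.3, 0.2]            # w_it: non-negative, summing to 1

assert abs(sum(weights) - 1.0) < 1e-12 and all(w >= 0 for w in weights)

y_t = sum(w * y for w, y in zip(weights, predictions))
print(round(y_t, 2))  # -> 102.49
```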
There are no uniform rules for selecting the single methods; instead, we must consider the actual problem and the needs of the model. The factors considered in this study are independence, diversity, and the accuracy of the algorithm. The single forecasting methods used are the ARIMA time series model, the GM, and the RBF. Due to limitations on the length of this report, we give no detailed introduction, but readers may refer to previous studies [12,24,25].
The ARIMA model parameters (p, q, d) are searched starting from the lowest-order ARIMA(1, 1, 1) model, and the minimum Akaike information criterion is used to find the optimal parameters; p = 2, q = 3, and d = 2 are used in the prediction model. The GM prediction model is based on the selection of similar years (see Section 3.2). In the RBF prediction model, the input variables comprise five years of historical streamflow and rainfall data used for network training, and the output variable is the predicted annual streamflow.
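Order selection by minimum Akaike information criterion can be illustrated with a small pure-Python sketch. The synthetic series and the AR-only search below are illustrative simplifications of the full ARIMA(p, q, d) search, not the procedure used in this study:

```python
import math
import random

random.seed(0)
# Synthetic AR(2) series: an illustrative stand-in for a streamflow record.
x = [0.0, 0.0]
for _ in range(200):
    x.append(0.6 * x[-1] - 0.3 * x[-2] + random.gauss(0, 1))

def fit_ar(series, p):
    """Least-squares AR(p) fit with intercept; returns (RSS, n)."""
    rows = [[series[t - j] for j in range(1, p + 1)] + [1.0]
            for t in range(p, len(series))]
    y = series[p:]
    k = p + 1
    # Normal equations A b = c, solved by Gauss-Jordan elimination
    # (the Gram matrix is positive definite, so pivots are nonzero).
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for i in range(k):
        piv = A[i][i]
        A[i] = [a / piv for a in A[i]]
        c[i] /= piv
        for r2 in range(k):
            if r2 != i:
                f = A[r2][i]
                A[r2] = [a - f * b for a, b in zip(A[r2], A[i])]
                c[r2] -= f * c[i]
    rss = sum((yi - sum(bj * rj for bj, rj in zip(c, r))) ** 2
              for r, yi in zip(rows, y))
    return rss, len(y)

def aic(p):
    """AIC = n * ln(RSS / n) + 2k for a Gaussian least-squares fit."""
    rss, n = fit_ar(x, p)
    return n * math.log(rss / n) + 2 * (p + 1)

best = min(range(1, 4), key=aic)   # pick the order with minimum AIC
print("selected AR order:", best)
```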
4.2. The CE Model
According to the definition of entropy, the CE is defined as a measure of the difference in information between two random vectors. The CE model can determine the mutual support degree by assessing the degree of intersection between different information sources. In addition, the mutual support degree can be used to determine the weights of the information sources, where a greater weight represents higher mutual support [26]. The CE is also called the Kullback–Leibler (K-L) distance. The CE between two probability distributions f and g is expressed as D(f || g).
For the discrete case:

D(f \| g) = \sum_{x} f(x) \ln \frac{f(x)}{g(x)}, (9)

and for the continuous case:

D(f \| g) = \int f(x) \ln \frac{f(x)}{g(x)} \, dx, (10)

where f and g denote the probability vectors in the discrete case and the probability density functions in the continuous case, respectively.
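For the discrete case, the K-L distance can be computed directly. A small sketch (with made-up probability vectors) also checks the property, noted below, that the distance vanishes for identical distributions:

```python
import math

def kl_divergence(f, g):
    """Discrete K-L distance D(f || g) = sum_x f(x) * ln(f(x) / g(x))."""
    return sum(fi * math.log(fi / gi) for fi, gi in zip(f, g) if fi > 0)

f = [0.5, 0.3, 0.2]   # hypothetical probability vectors
g = [0.4, 0.4, 0.2]

d = kl_divergence(f, g)
print(round(d, 4))                 # small positive value
assert kl_divergence(f, f) == 0.0  # identical distributions -> zero
assert d > 0                       # otherwise strictly positive
```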
The CE model quantifies the “distance” between amounts of information. However, the K-L distance is not a true metric distance; rather, it measures the difference between two probability distributions, and its value is smallest (zero) when the two probability density functions are identical. For the combined forecasting model based on CE, the CE model represents the support for combined forecasting. The objective is therefore to assign weights to the different individual methods so that the combined predictive function is as close as possible to the true value.
Two major problems must be solved when using the CE model: establishing the probability density function and generating the CE objective function, and solving for the weight coefficients by iteration.
The streamflow is treated as a sequence of discrete random variables over the forecast period. At a given prediction time, the streamflow value is continuous, so it can be regarded as a continuous random variable. Therefore, streamflow prediction can be treated as a sequence that is discrete in time but continuous in value.
The probability density function f(x) of the predicted streamflow can be regarded as the probability density functions of the single forecasting methods multiplied by the corresponding weights. According to the central limit theorem [22], if a variable is influenced by many small independent random factors, then we can treat the variable as following a normal distribution; thus, the streamflow value at a certain time can be considered to satisfy a normal distribution. The minimum CE is used to determine the probability distribution of the different forecasting methods, so the combined probability distribution of the streamflow is obtained.
The probability density function for method i (i = 1, 2, \ldots, m) is:

f_i(x) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left( -\frac{(x - \mu_i)^2}{2\sigma_i^2} \right), (11)

where \mu_i is the mean value and \sigma_i^2 is the variance.
Thus, the combined probability density function of the predicted streamflow can be obtained from the probability density functions of the single prediction methods:

f(x) = \sum_{i=1}^{m} w_i f_i(x), (12)

and therefore:

f(x) = \sum_{i=1}^{m} w_i \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left( -\frac{(x - \mu_i)^2}{2\sigma_i^2} \right). (13)
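The combined density above is a weighted mixture of the per-method normal densities. A sketch with hypothetical means, standard deviations, and weights for m = 3 methods (none of these numbers come from the paper), including a numerical check that the mixture integrates to one:

```python
import math

def normal_pdf(x, mu, sigma):
    """Single-method density f_i(x): normal with mean mu and std sigma."""
    return (math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
            / (math.sqrt(2 * math.pi) * sigma))

# Hypothetical per-method parameters and weights.
mus = [100.0, 95.0, 105.0]
sigmas = [8.0, 10.0, 6.0]
weights = [0.4, 0.35, 0.25]

def combined_pdf(x):
    """Combined density f(x) = sum_i w_i * f_i(x)."""
    return sum(w * normal_pdf(x, m, s)
               for w, m, s in zip(weights, mus, sigmas))

# Riemann-sum check on a wide grid: the mixture integrates to ~1.
approx = sum(combined_pdf(50 + 0.1 * k) * 0.1 for k in range(1000))
print(round(approx, 3))
```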
From (13), the objective function of the minimum CE optimization problem is set as:

\min_{w} F(w) = D(f \| g) = \int f(x) \ln \frac{f(x)}{g(x)} \, dx,

s.t. \sum_{i=1}^{m} w_i = 1, \quad w_i \ge 0,

where g is the probability density function of the observed streamflow. Selecting the appropriate weight vector to obtain the minimum F amounts to determining the support for the different algorithms.
The weight coefficients are derived based on the Lagrange function method. The K-L distance can be transformed into a sampling problem: ensuring that D(f \| g) reaches its minimum value is equivalent to the maximum value problem:

\max_{v} \; E_{w^{(0)}} \left[ I_{\{S(x) \ge L\}} \ln f(x; v) \right],

where S(x) is the sample performance function and I_{\{S(x) \ge L\}} is called the indicator function:

I_{\{S(x) \ge L\}} = \begin{cases} 1, & S(x) \ge L, \\ 0, & S(x) < L, \end{cases}

the samples x are drawn from f(x; w^{(0)}), w^{(0)} is the initial weight, v is the target estimation parameter, and L represents the estimated target value of a low-probability event.
Based on the idea of CE, a low-probability sampling method (see [27]) is used to convert the optimization problem into the following CE problem:

\max_{w} \; \frac{1}{N} \sum_{k=1}^{N} I_{\{S(x_k) \ge L\}} \ln f(x_k; w),

where N is the number of random samples.
Note that \sum_{i=1}^{m} w_i = 1, and thus we can construct a Lagrange function:

L(w, \lambda) = \frac{1}{N} \sum_{k=1}^{N} I_{\{S(x_k) \ge L\}} \ln f(x_k; w) + \lambda \left( \sum_{i=1}^{m} w_i - 1 \right),

where \lambda is the Lagrange multiplier.
By setting the partial derivatives of L(w, \lambda) with respect to w_i and \lambda to zero, we can obtain:

\frac{1}{N} \sum_{k=1}^{N} I_{\{S(x_k) \ge L\}} \frac{f_i(x_k)}{f(x_k; w)} + \lambda = 0, \qquad \sum_{i=1}^{m} w_i = 1.

Multiplying the first condition by w_i, summing over i, and substituting into the constraint, we can obtain:

\lambda = -\frac{1}{N} \sum_{k=1}^{N} I_{\{S(x_k) \ge L\}}.

The expression for the weight coefficient is obtained as follows:

w_i = \frac{\sum_{k=1}^{N} I_{\{S(x_k) \ge L\}} \, w_i f_i(x_k) / f(x_k; w)}{\sum_{k=1}^{N} I_{\{S(x_k) \ge L\}}}, (23)

which is evaluated as a fixed-point iteration, with the weights on the right-hand side taken from the previous iteration.
Iterative process:
A. Set t = 1;
B. Set w_{it} = w^{(0)} and set the iteration number z = 1;
C. Generate the sample sequence x_1, x_2, \ldots, x_N from f(x; w^{(z)}), sort the performances S(x_k) from small to large, and obtain the estimated target value \hat{L};
D. Calculate Equation (23) and obtain the z-th iteration result w^{(z)}. Set z = z + 1;
E. Calculate the difference between successive weight vectors; if it is less than a given error tolerance \varepsilon, go to Step F; otherwise, return to Step C;
F. Stop the iterations; the current weight vector w^{*} is the optimal weight, and the streamflow prediction value is:

\hat{y}_t = \sum_{i=1}^{m} w_i^{*} \hat{y}_{it};

G. Set t = t + 1. If t \le T, return to Step B to calculate the combined forecast values at the other times; otherwise, finish the computation.
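The inner loop (Steps B–F) for a single time t can be sketched as follows. All numerical settings here, including the per-method means and standard deviations, the performance function S, the sample size, the quantile used to set the threshold L, and the tolerance, are illustrative assumptions rather than values from the paper:

```python
import math
import random

random.seed(1)

def normal_pdf(x, mu, sigma):
    return (math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
            / (math.sqrt(2 * math.pi) * sigma))

# Hypothetical single-method densities f_i.
mus = [100.0, 95.0, 105.0]
sigmas = [8.0, 10.0, 6.0]
m = len(mus)

def sample_mixture(w):
    i = random.choices(range(m), weights=w)[0]
    return random.gauss(mus[i], sigmas[i])

def mixture_pdf(x, w):
    return sum(wi * normal_pdf(x, mu, s) for wi, mu, s in zip(w, mus, sigmas))

def performance(x):
    # Illustrative performance S(x): closeness to an assumed target of 102.
    return -abs(x - 102.0)

w = [1.0 / m] * m               # Step B: initial weights w0
N, rho, eps = 2000, 0.1, 1e-4   # assumed sample size, quantile, tolerance
for z in range(100):
    xs = [sample_mixture(w) for _ in range(N)]      # Step C: sample from f(.; w)
    xs_sorted = sorted(xs, key=performance)
    L = performance(xs_sorted[int((1 - rho) * N)])  # threshold L (top quantile)
    hits = [x for x in xs if performance(x) >= L]   # samples with indicator = 1
    # Step D: fixed-point weight update in the style of the weight-coefficient
    # expression: average over the elite samples of w_i f_i(x) / f(x; w).
    new_w = [sum(w[i] * normal_pdf(x, mus[i], sigmas[i]) / mixture_pdf(x, w)
                 for x in hits) / len(hits)
             for i in range(m)]
    diff = max(abs(a - b) for a, b in zip(new_w, w))
    w = new_w
    if diff < eps:                                  # Step E: convergence check
        break

# Step F: the final w is the optimal weight vector (a valid convex combination).
assert abs(sum(w) - 1.0) < 1e-9 and all(wi >= 0 for wi in w)
print([round(wi, 3) for wi in w])
```

The update preserves the simplex constraint automatically, since the responsibilities w_i f_i(x)/f(x; w) sum to one for every sample, so no explicit renormalization step is needed.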
The overall forecasting process is shown in
Figure 3.