A Robust Weighted Combination Forecasting Method Based on Forecast Model Filtering and Adaptive Variable Weight Determination

Medium-and-long-term load forecasting plays an important role in energy policy implementation and electric department investment decision. Aiming to improve the robustness and accuracy of annual electric load forecasting, a robust weighted combination load forecasting method based on forecast model filtering and adaptive variable weight determination is proposed. Similar years of selection is carried out based on the similarity between the history year and the forecast year. The forecast models are filtered to select the better ones according to their comprehensive validity degrees. To determine the adaptive variable weight of the selected forecast models, the disturbance variable is introduced into Immune Algorithm-Particle Swarm Optimization (IA-PSO) and the adaptive adjustable strategy of particle search speed is established. Based on the forecast model weight determined by improved IA-PSO, the weighted combination forecast of annual electric load is obtained. The given case study illustrates the correctness and feasibility of the proposed method.


Introduction
Nowadays, the strong smart grid (SSG) is vigorously being constructed and the renewable distributed electricity generation capacity is steadily increasing.As an important basis to ensure the security and stable operation of the electric system, electric load forecasting is playing a more and more important role in the implementations of energy policies and the investment decision-making of the electric department under this background [1][2][3][4][5][6][7][8][9][10].However, medium-and-long-term load forecasting has non-linear characteristics caused by the influence of various factors (e.g., national policy, economic and social factors) [7][8][9].It makes the medium-and-long-term load forecasting much complex and uncertain.Thus, how to improve the robustness and accuracy of annual electric load forecasting is very worthy of study.
On the one hand, the forecast accuracy has reached a high level under the given sample and condition in the existing research [1][2][3][4][5][6][7][8][9][10][11][12][13], but the forecast method's robustness currently becomes the bottleneck.The main reason lies in the status that the existing methods are mostly based on the error theory.By the error theory, the fact of the unknown amount of the forecast year's true load is neglected and the accurate forecast error is difficult to obtain.Chen [14,15] pointed out that the validity degree of the forecast model can be expressed by its full and average precision.In the mathematical sense, any forecast model has its inherent attributes which can be measured by its validity rather than the result error reported by Sun et al. [16], Chen et al. [17] and Jin et al. [18].At the same time, the single forecast models should be filtered so that the better ones will be selected and the worse ones will be eliminated.Therefore, filtering the forecast models to select the better ones based on their comprehensive validity degrees can improve the robustness of the forecast method.
On the other hand, the general load forecast method mainly includes artificial neural network reported by Hernandez et al. [19] and Gofman et al. [20], regression analysis reported by Li et al. [21] and time series analysis reported by Paparoditis et al. [22].The exponential smoothing method reported by Weron et al. [23], the gray forecast method reported by Li et al. [24] based on time trend extrapolation reported by Ismail et al. [25], the clustering forecast method reported by Kodogiannis et al. [26] and multiple regression analysis method reported by Hong et al. [27] based on the load related factor analysis cannot ensure the satisfactory result in any case.In order to make full use of the advantages and the contained information of each single forecast model, combination forecast [28][29][30][31][32][33] is an effective method.The question of how to determine the weight assignment of single forecast method is a difficult point in combination forecasting.The constant weight and the variable weight are two common weight determination ways, and the variable weight has a better adaptability.Focusing on this point, a large number of research has been carried out such as mathematical programming method reported by Ma et al. [34], genetic algorithm reported by Chaturvedi et al. [2], Bayesian method reported by Niu et al. [35] and neural network method reported by Hernandez et al. [19] and Gofman et al. [20].These methods mostly have the stabile performance and the accuracy meeting the application requirements, but there are still several problems such as much complexity, slow convergence or strong status dependence.
In this paper, we propose a robust weighted combination forecasting method based on forecast model filtering and adaptive variable weight determination.Firstly, the similar years are selected from the sample history years according to the similarity between the history year and the forecast year.Secondly, the forecast models are filtered based on the comprehensive validity degree which is composed of the fitted validity degree in the history year interval and the estimated forecast validity degree in the forecast year interval.Thirdly, the improved Immune Algorithm-Particle Swarm Optimization (IA-PSO), in which the disturbance variable is introduced and the adaptive adjustable strategy of particle search speed is established, is used to determine the adaptive variable weight of the selected forecast models.Lastly, the weighted combination forecast is carried out.The flowchart of the proposed method is shown in Figure 1.

Similar Years Selection
There are a series of factors influencing the annual load.For example, in order to reflect the influence of each factor on the load forecasting result, Zhu et al. [36] have investigated an Artificial Neural Network-based approach for medium-and-long-term load forecasting.In the proposed three-layer back propagation network, seven factors are selected as inputs which include Gross Domestic Product (GDP), heavy industry production, light industry production, agriculture production, primary industry, secondary industry and tertiary industry; Wang et al. [37] have pointed out that there are mainly eight factors affecting the annual load: area GDP, primary industry GDP ratio, secondary industry GDP ratio, tertiary-industry GDP ratio, power consumption per unit of GDP, electricity price, urban per-capita income and rural per-capita income; Lei et al. [38] have analyzed the variation characteristics of the annual maximum load, annual minimum load and typical daily load based on the recent historical load data and meteorological data of Chongqing region.Then, they have studied the interrelationship between load characteristics and major influencing factors.The results show that the temperature, rainfall, holidays and festivals have a significant influence on the region power load; Liao et al. [39] have researched the current load characteristics of Changde region and main factors influencing load variation.Influencing extents of main influencing factors on regional load, respective proportions of these factors in the influences and the time periods influenced by these factors are analyzed, and the quantization analysis on the relation between load and influencing factors is performed.In conclusion, the factors considered by Wang et al. [37] are more than Zhu et al. [36], and the factors considered by Lei et al. [38] and Liao et al. [39] relate to the characteristics in short-term load forecasting.After comparison and summary, we choose the eight factors pointed out by Wang et al. [37] which are more comprehensive and reasonable than others to select the similar years in medium-and-long-term load forecasting.

3/21
The results show that the temperature, rainfall, holidays and festivals have a significant influence on the region power load; Liao et al. [39] have researched the current load characteristics of Changde region and main factors influencing load variation.Influencing extents of main influencing factors on regional load, respective proportions of these factors in the influences and the time periods influenced by these factors are analyzed, and the quantization analysis on the relation between load and influencing factors is performed.In conclusion, the factors considered by Wang et al. [37] are more than Zhu et al. [36], and the factors considered by Lei et al. [38] and Liao et al. [39] relate to the characteristics in short-term load forecasting.After comparison and summary, we choose the eight factors pointed out by Wang et al. [37] which are more comprehensive and reasonable than others to select the similar years in medium-and-long-term load forecasting.It is assumed that there are nhis history years {1, 2, …, nhis} and nfo forecast years {nhis + 1, nhis + 2, …, nhis + nfo}.The year characteristic is defined as the factor affecting the annual load and the year characteristic quantity is defined as the value of year characteristic in a history year.
If the year characteristic CHa is efficiency type, the year characteristic quantity CHQa,b which means the value of the year characteristic CHa of the history year HYb is standardized as follows: where a = 1, 2, …, nch, b = 1, 2, …, nhis.
If the year characteristic CHa is cost type, the year characteristic quantity CHQa,b which means the value of the year characteristic CHa of the history year HYb is standardized as follows: where For two individuals whose characteristics have the same dimension number, distance and similarity are usually used to measure the difference between them.The distance measures the absolute distance between two individuals in space which is directly related to the position coordinates of each individual (i.e., the numerical value of each characteristic dimension), but the similarity measures the angle between two individual vectors and reflects more difference in direction that in It is assumed that there are n his history years {1, 2, . . ., n his } and n f o forecast years {n his + 1, n his + 2, . . ., n his + n f o }.The year characteristic is defined as the factor affecting the annual load and the year characteristic quantity is defined as the value of year characteristic in a history year.
If the year characteristic CH a is efficiency type, the year characteristic quantity CHQ a,b which means the value of the year characteristic CH a of the history year HY b is standardized as follows: where a = 1, 2, . . ., n ch , b = 1, 2, . . ., n his .If the year characteristic CH a is cost type, the year characteristic quantity CHQ a,b which means the value of the year characteristic CH a of the history year HY b is standardized as follows: where a = 1, 2, . . ., n ch , b = 1, 2, . . ., n his .For two individuals whose characteristics have the same dimension number, distance and similarity are usually used to measure the difference between them.The distance measures the absolute distance between two individuals in space which is directly related to the position coordinates of each individual (i.e., the numerical value of each characteristic dimension), but the similarity measures the angle between two individual vectors and reflects more difference in direction that in difference in position [40][41][42].Therefore, similarity is more suitable to measure the difference between a history year and a forecast year.
Here, the most common cosine similarity is chosen [41,42].The cosine similarity between the history year HY b and the forecast year FY c is as follows: where b = 1, 2, . . ., n his , c = n his + 1, n his + 2, . . ., n his + n f o .Lastly, n shis (n shis < n his ) history years with the highest similarity are selected as the similar years of the forecast year FY c .

Forecast Model Validity Degree
As shown in Figure 2, the load sequence of similar years is  Energies 2016, 9, 20 4/21 difference in position [40][41][42].Therefore, similarity is more suitable to measure the difference between a history year and a forecast year.
Here, the most common cosine similarity is chosen [41,42].The cosine similarity between the history year HYb and the forecast year FYc is as follows: where b = 1, 2, …, nhis, c = nhis + 1, nhis + 2, …, nhis + nfo.Lastly, nshis (nshis < nhis) history years with the highest similarity are selected as the similar years of the forecast year FYc.

Forecast Model Validity Degree
As shown in Figure 2, the load sequence of similar years is  The fitted value relative error of FMd for the similar year e is as follows: , , e ed e d e y y RE y The forecast value relative error of FMd for the forecast year c is as follows: , , Then, the fitted precision of FMd for the similar year e is as follows: The forecast value relative error of FM d for the forecast year c is as follows: Then, the fitted precision of FM d for the similar year e is as follows: The forecast precision of FM d for the forecast year c is as follows: Lastly, the fitted validity degree of FM d is as follows: where EXPpP where EXPpP c,d q " are the expectation and the standard deviation of the forecast precision of FM d for the forecast year c, respectively.

Forecast Model Precision Estimation
The validity degree of a forecast model is defined in Equation ( 9) which can depict its credibility and is a reflection of its inherent property [16][17][18].Obviously, true value has not yet occurred and the forecast error cannot be obtained in the future forecast interval.We can only estimate the precision and the validity of a forecast model based on its inherent property.Then, the suitable forecast models are selected and the combination forecast model is put forward.

Markov Chain-Based Precision Range Estimation
As an inherent property, the forecast model precision is shown in the form of the fitted precision which is obtained through the virtual forecast for the multi-time load.Using the forecast model FM d to forecast the load of the similar year e (e = 1, 2, . . ., n shis ), we can obtain the fitted precision sequence ) . In this sequence, the expectation and the standard deviation of the fitted precision of each similar year show the property of the forecast model.As is known, randomness and discreteness appear in the fitted precision sequence.Therefore, we can use the Markov chain transition matrix [43,44] to represent the transition rule as follows: (1) The fitted precision distribution interval of FM d for the similar year e is divided into n si (n si ď n shis ) sub-intervals with equal distance as S 1 d , S 2 d , ..., S n si d , where Each fitted precision sub-interval can be considered as a fitted precision state.
(2) According to the fitted precision of FM d for the similar year e, the occurrence number of the fitted precision state S g d is OC g (OC g < n shis ).That means there are OC g times which belong to the fitted precision state S d g .The transition number from the fitted precision state S h d to S g d is TRN h,g .Thus, the transition probability of FM d from the fitted precision state S h d to S g d can be obtained as follows: According to Equation ( 11), the one-step state transition matrix of FM d is as follows: The q-step state transition matrix is as follows: (3) We construct an initial vector IV d whose elements are the occurrence numbers of each fitted precision state of FM d .Though multiplying the initial vector IV d with the q-step state transition matrix TM pqq d , the new state matrix of FM d is obtained as follows: (4) We calculate the sum of each column vector of SM d .If the column vector CVE i (i = 1, 2, . . ., n si ) has the maximum sum, the forecast precision will belong to the precision state S i d .

Cloud Model-Based Precision Estimation
Due to the various factors, the fitted precision sequence P 1 d " ) of the forecast model FM d in n shis similar history years clearly has the property of random variables.As a result, the forecast precision is uncertain in its precision range.Therefore, we can describe this uncertainty by the expectation, entropy and hyper-entropy in the precision range and make the quantitative precision estimation based on the normal cloud model [45,46] as follows: (1) Firstly, we construct a backward cloud generator.We can map the fitted precision sequence The empirical coefficient α (0 ď α ď 1) is determined by the forecast staff based on their experiences.The bigger α is, the more important FIV d is.Here α = 0.5, which means that FIV d and FOV d are equally important.
We use the mean comprehensive validity degree of n f m forecast models as the forecast model filtering threshold: If CV d ě CV, the forecast model FM d will be selected for the combination forecast, else it will be eliminated.In the applications, the filtering threshold can be adjusted according to the actual situation and forecast decision-makers' experiences.

Mathematical Description
After the forecast models filtering, n s f m (n s f m < n f m ) better forecast models are selected from n f m forecast models for combination forecast.
We assume that the weight of the selected forecast model SFM j (j = 1, 2, . . ., n s f m ) is ω j ( n s f m ř j"1 ω j " 1, 0 ď ω j ď 1).For the similar history year e (e = 1, 2, . . ., n shis ), the actual load is y 1 e and the forecast load by SFM j is y 3 e,j , so the combination forecast load of n s f m selected forecast models is as follows: The target is to achieve the minimum square sum of the combination forecast error, so the mathematical description is described as follows: , s.t.

Particle Swarm Optimization (PSO)
As a kind of stochastic optimization algorithm, particle swarm optimization (PSO) [47] is developed based on the simulation of bird-group foraging behavior.To search for the optimal solution, the individuals have to cooperate and compete with each other in PSO.
There is an m-dimensional space and an initial swarm PO composed of n pa particles as follows: For the particle PA l (l = 1, 2, . . ., n pa ), its position x l and its speed v l are expressed by two m-dimensional vectors as follows: x l " px l ,1 , x l ,1 , . . ., x l ,m q T (24) Each particle is moving in the solution space, and its direction is determined by its speed.The speed and position of the particle is continuously updated as follows: where v

‚
The momentum part w ¨vpkq l represents the trust in its current motion state where w is the inertia coefficient used to control the influence of the speed v pkq l on the speed v pk`1q l .This part provides a necessary momentum which enables the particle to carry on the inertia motion based on its speed.

‚
The individual cognitive part LF 1 ¨rand pkq 1 ¨ppbest pkq l ´xpkq l q represents the particle self-thinking behavior.This part encourages the particle to fly to the best position found by itself.

‚
The social cognitive part LF 2 ¨rand In early evolution, PSO has the advantage of fast convergence speed and simple operation, so it can be used for solving the nonlinear, non-differentiable and multi-peak complex optimization problems.But in late evolution, PSO has a significantly slower convergence speed and reaches a poor accuracy, so it is easy to fall into the local optimum.

‚
Immune memory: The immune system keeps the antibodies opposing against the invading antigen as memory cells.If the same antigen invades again, the memory cells will be activated and produce a large number of antibodies.In IA-PSO, this idea is used to preserve the excellent particle.The best position pbest pkq l searched by each particle up to now is considered as a memory cell.If the new born particles are detected not to meet the requirements, they will be replaced by the memory cells.

‚
Immune regulation: In IA-PSO, immune regulation is used for particle selection.If a particle has a strong affinity or a low concentration, it will be promoted.Otherwise, it will be demoted.Therefore, the particle diversification can be always kept.The selected probability [48,49] of the particle PA l is as follows: In Equation ( 28), PRO l1 " AF l { all ř u"1 AF u represents the selected probability determined by the affinity where AF l is the affinity of the particle PA l , PRO l2 " CON ´1 l { all ř u"1 CON ´1 u represents the selected probability determined by the concentration where CON l is the concentration of the particle PA l .χ represents the weight of PRO l 1 and 1 ´χ represents the weight of PRO l 2 (0 ď χ ď 1).

‚
Immune selection: In the immune system, vaccinating means to change several components of the antibody according to the vaccination.In IA-PSO, the group best position Gbest pkq i up to the iteration iter k can be considered as the closest one to the optimal solution.Thus, we use several components of Gbest pkq i as the vaccination to vaccinate the particles and calculate the particle fitness value for immune selection.If the particle fitness value after the vaccination is lower than its parent, the vaccination will be abolished.Otherwise, the particle will be retained.
IA-PSO, which inherits the global optimization ability of PSO and the immune information processing mechanism of IA, can improve the algorithm accuracy.But at the same time, the algorithm complexity is increased because of the introduction of the immune system.

Improved IA-PSO Based on Disturbance Variable
By introducing a disturbance variable and establishing the searching speed adaptive mechanism, we improve IA-PSO in this paper.Through this improvement, not only the diversity of particles can be ensured to avoid the local optimum, but also the accuracy and convergence speed can be increased.
To solve the mathematical problem described in Equation ( 22), the search space is set as n s f m -dimensional, the particles number is n pa , and the maximum iteration number is iter Max .The position of each particle is an n s f m -dimensional vector in which each dimensional represents the weight of a selected forecast model.After the iteration iter k (iter k = 1, 2, . . ., iter Max ), the position and speed of the particle PA l (l = 1, 2, . . ., n pa ) are as follows: pbest pkq l , gbest pkq , Gbest pkq are used to represent the best position searched by the particle PA l in the iteration iter k , the best position searched by the particle swarm in the iteration iter k , the best position searched by the particle swarm up to the iteration iter k , respectively.

Introducing Disturbance Variable into IA-PSO
In the production process of a new particle swarm, the particle position is updated according to Equation (27), and the step length is represented by Equation (26).The coefficients of the three parts in Equation ( 26) are randomly changed, but only the rules of the particle to follow the order are changed.It means that the step length only reflects the rules of the search behavior and the diversification is lacking.Group optimization should be a balance between the order following and random irrational behaving, so we introduce the disturbance term [50,51] to Equation (26).In each iteration, the particle position is updated as follows: where 0 ď rand pkq 3 ď 1 is a random number subject to uniform distribution, β pkq l > 0 is the disturbance variable of the particle PA l in the iteration iter k .The disturbance term prand pkq 3 ´0.5q¨βpkq l reflects an unpredictable random behavior.
The disturbance variable β pkq l controls the random decision-making behavior strength of the particle PA l in the iteration iter k .If β pkq l is too big, the awareness of order following will be submerged.If β pkq l is too small, the population diversity and global search ability will be reduced.Therefore, the disturbance variable of each particle should be adjusted in the algorithm operation according to its evolution speed.The adjustment can make the particle swarm with a good generalization ability and convergence speed in the evolution process.Thus β pkq l is defined as follows: is the best position found by the particle PA l in the iteration iter k´1 , iter k´1´θ , iter k´1´ρ .When the evolution begins, β l is bigger which means that the particle PA l has a step length with strong randomness.After several iterations, β l tend to β min which means that the step length randomness of the particle PA l becomes weak.
Through the introduction of the disturbance term, Equation ( 31) reflects the positive and negative sides of the particle updating decision.In Equation ( 31), the first part x pk´1q l is the original position, the second part v pkq l reflects the step length of order following, and the third part prand pkq 3 ´0.5q¨βpkq l reflects the step length of random irrational behaving.Due to the disturbance variable, the particle position updating can be ensured and a strong search desire can be kept even if the local optimum appears when compared with Equation ( 26).As a result, the premature convergence problem can be overcome, the local best solution can be prevented and the algorithm accuracy can be improved.

Establishing the Adaptive Adjustable Strategy of Particle Searching Speed
In the particle searching process, the searching speed should be adaptively adjusted to accelerate the convergence based on the diversity of particles.For the excellent particles, their searching speeds should be decreased to make them quickly be close to the global best solution, and then the convergence can be accelerated.For the poor particles, their searching speed should be adjusted according to the convergence degree of the particle swarm: if the swarm individuals tend to be dispersed, the searching speed should be reduced and the swarm development ability should be enhanced to strengthen the local optimization; if the swarm individuals tend to be converged (the algorithm falls into local optimum), the searching speed should be increased and the swarm detection ability should be enhanced to effectively jump out of the local optimum and achieve the accelerated convergence [42,43].
In iteration iter k , the fitness of the particle PA l is PF pkq l , the fitness of the best particle is PF If the particle is more excellent, it will have a lower searching speed.As a result, the local optimum ability is strengthened and the convergence is accelerated.
where η 1 , η 2 > η 1 , η 2 > 0 and η 1 is used to control the upper limit of v pkq new,l .If η 1 is bigger, the upper limit of v pkq new,l will be bigger.Here we choose η 1 = η 2 = 4. ∆ pkq ě 0, so v pkq new,l P r0.5 ¨vpLq i , 1.3 ¨vpLq i s.In the searching process, v pkq new,l of these particles is dynamically and adaptively adjusted according to the value of ∆ pkq : if the individuals tend to be dispersed (∆ pkq becomes bigger), v pkq new,l will be reduced and the swarm development ability will be enhanced to strengthen the local optimization; if the individuals tend to be converged (∆ pkq becomes smaller), v pkq new,l will be increased and the swarm detection ability will be enhanced to effectively jump out of the local optimum.

Implementation Steps of Forecast Model Weight Determination Based on Improved IA-PSO
The flowchart of forecast model weight determination (FMWD) based on improved IA-PSO is shown in Figure 3. (2) As the general individuals of the swarm, both the local optimum ability and the global optimum ability of these particles are good.Therefore, we don't adjust their searching speeds. (3) These particles are the relatively poor individuals in the swarm.The searching speed is adjusted as follows: where 1 2 , ,   > 0 and 1  is used to control the upper limit of ( ) new , k l v will be bigger.Here we choose In the searching process,

Implementation Steps of Forecast Model Weight Determination Based on Improved IA-PSO
The flowchart of forecast model weight determination (FMWD) based on improved IA-PSO is shown in Figure 3. Step 2 Step 3 Select n pa particles again based on the affinity and concentration of antibody and antigen Step 4 Step 5 Step 6 Step 7 Step 8 The implementation steps are as follows: Step 1: Initialization.It is assumed that the elements of the particle position vector x all belong to the interval [0, 1], the elements of the particle speed vector v all belong to the interval [´v max , v max ], the maximum iteration number is iter max and the initial value of the iteration number k is k = 0. n pa particles are randomly generated.The particle PA l (l = 1, 2, . . ., n pa ) has the position x p0q l and the flying speed v p0q l .
Step 2: Calculate pbest pkq l , gbest pkq l , Gbest pkq l and the fitness of the particle PA l .Here, the fitness of the particle PA l can be represented by the target function in Equation ( 22) as follows: Step 3: Obtain the new generation of particles.The particle speed is adjusted based on the adaptive adjustable mechanism and the new position x pk`1q l and new speed v pk`1q l can be obtained according to Equations ( 26) and (31).The elements of the new speed v pk`1q l must belong to the interval [´v max , v max ].
Step 4: Check the new generation of particles.If the position of a new particle is an infeasible solution (one or more elements of the new position x pk`1q l don't belong to the interval [0, 1]), this particle will be replaced with the memory particle pbest pkq l .In addition, another n apa particles meeting the requirements will be randomly added.According to Equation (28), n pa particles are selected from the n pa + n apa particles based on the affinity and concentration of antibody and antigen.
Step 5: Vaccinate.One particle is randomly selected from n pa new particles.Then, one element is randomly selected from the front t ´1 elements of Gbest pkq l to exchange with the selected particle at the corresponding element.The t th element of the selected particle is calculated as follows: Hence the vaccination is finished.
Step 6: Immune selection.The particle fitness after the vaccination is calculated.If the particle fitness value after the vaccination is lower than its parent, the vaccination will be abolished.Else, the particle will be retained.
Step 7: Looping execution of Steps 4 and 5 for r times (r times vaccination).A new generation of n pa particles is obtained.
Step 8: Judge whether the algorithm should be stopped.The stopping of algorithm is usually determined by the maximum iteration number or the precision.If the algorithm meets its stopping condition, the optimization will be stopped.Else, k = k + 1 and go back to Step 2 to continue.

Weighted Combination Forecast
Based on the improved IA-PSO, the weight ω j of the selected forecast model SFM j (j = 1, 2, . . ., n s f m ) is obtained where For the selected forecast model SFM j , y 2 c,j is the forecast value of the forecast year c (c = n his + 1, n his + 2, . . ., n his + n f o ).So the weighted combination forecast (WCF) value of the forecast year c is as follows:

Case Study
We use the proposed method to forecast a Chinese province's load of 2005 based on its loads of the history years from 1998 to 2004.The year characteristic quantity data from 1998 to 2005 is shown in Table 1.
The cosine similarity between the history years 1998-2004 and the forecast year 2005 is shown in Table 2.According to forecasting decision-makers' experiences, we choose the threshold value of similar years selection as 96%.Therefore, the five history years (1998 and 2001-2004) with highest similarity are selected as the similar years of the forecast year 2005.
The power consumption of the province in 1998 and 2001-2005 is shown in Table 3.The forecast values by eleven forecast models are shown in Table 4.By the validity degree calculation method based on Markov chain and cloud model, the comprehensive validity degrees of the eleven forecast models are obtained as shown in Table 5.
The forecast model filtering threshold of the eleven forecast models is cvd " 81.46%.Therefore, the forecast models FM 4 , FM 5 , FM 8 , FM 9 , FM 11 are selected for the combination forecast and the others are eliminated.Respectively, we use PSO, IA-PSO and improved IA-PSO for the forecast model weight determination.There are five selected forecast models SFM 1 (FM 4 : Power function model), SFM 2 (FM 5 : S-curve model), SFM 3 (FM 8 : Cubic curve model), SFM 4 (FM 9 : Artificial neural network method) and SFM 5 (FM 11 : Grey system method).The parameters are set as follows: n pa = 100, n s f m = 5, v max = 1, iter max = 1000, w = 0.6, LF 1 = LF 2 = 2, n apa = 30, r = 25, β min = 0.001, η 1 = η 2 = 4.The results of forecast model weights determination (FMWD) by PSO, IA-PSO and improved IA-PSO, which are abbreviated as FMWD-PSO, FMWD-IA-PSO, FMWD-improved-IA-PSO, are shown in Table 6.Using the forecast model weights shown in Table 6, the results of weighted combination forecast (WCF) based on FMWD-PSO, FMWD-IA-PSO and FMWD-improved-IA-PSO, which are abbreviated as WCF-FMWD-PSO, WCF-FMWD-IA-PSO and WCF-FMWD-improved-IA-PSO, are shown in Table 7. Four synthesized forecast methods reported by Kang et al. [55] are as follows: (1) Equal weight method: the weights of forecast models are equal, so the combination forecast value of the forecast year c is as follows: where c = n his + 1, n his + 2, . . ., n his + n f o , d = 1, 2, . . ., n f m .It is a simple combination forecast method, and both the precision of single forecast model and the relationship between different forecast models are considered.
(2) Variance analysis method: the combination forecast value of the forecast year c is as follows: where c = n his + 1, n his + 2, . . ., n his + n f o , d = 1, 2, . . ., n f m .All forecast models are independent of each other, so the variance of combination forecast can be expressed as follows: Var " where c = n his + 1, n his + 2, . . ., n his + n f o , d = 1, 2, . . ., n f m , δ dd is the variance of the forecast model FM d .To obtain the minimum value of Varon ω d , the Lagrange multiplier method is used and the weight of FM d is defined as follows: (3) Optimum fitting method The forecast model weight determination of the optimum fitting method is based on the deviations of all single forecast models and the complementarily between different forecast models.The deviations of the forecast model FM d can be expressed as follows: Therefore, the weight of FM d is defined as follows: (4) Optimum forecast method In this method, n f m forecast models are used to carry out the forecasting respectively, and then these models are compared according to standard deviations, fitting goodness, correlation degree or relative error et al.Lastly the best one is chosen as the final forecast model.
Percentage error (PE) and mean absolute percentage error (MAPE) are used as evaluating indicators to compare the proposed method (WCF-FMWD-improved-IA-PSO) to WCF-FMWD-PSO, WCF-FMWD-IA-PSO, the single forecast models (SFM 1 -SFM 5 ) and the four synthesized forecast methods [55] (equal weight method, variance analysis method, optimum fitting method and optimum forecast method).They are shown in Table 8.From the comparison of PE and MAPE of the proposed method (WCF-FMWD-improved-IA-PSO) and others shown in Figures 4 and 5 and Table 8, we can see the follows:

‚
The maximum and minimum PE of the single forecast models (SFM 1 -SFM 5 ) are ´14.51% and 0 respectively, the maximum and minimum MAPE of the single forecast models (SFM 1 -SFM 5 ) are 6.31% and 0.40% respectively.

‚
The maximum and minimum PE of the four synthesized forecast methods in Reference [55] are 5.56% and 0.02% respectively, the maximum and minimum MAPE of the four synthesized forecast methods in Reference [55] are 2.78% and 0.98% respectively.

‚
The maximum and minimum PE of WCF-FMWD-PSO are ´2.36% and 0.12% respectively, the MAPE of WCF-FMWD-PSO is 0.93% and the iteration number is 623.

‚
The maximum and minimum PE of WCF-FMWD-IA-PSO are 1.05% and ´0.11% respectively, the MAPE of WCF-FMWD-IA-PSO is 0.53% and the iteration number is 490.

‚
The maximum and minimum PE of WCF-FMWD-improved-IA-PSO are 0.91% and ´0.14% respectively, the MAPE of WCF-FMWD-improved-IA-PSO is 0.35% and the iteration number is 193.
Energies 2016, 9, 20 From the comparison of PE and MAPE of the proposed method (WCF-FMWD-improved-IA-PSO) and others shown in Figures 4 and 5 and Table 8, we can see the follows:


The maximum and minimum PE of the single forecast models (SFM1-SFM5) are −14.51% and 0 respectively, the maximum and minimum MAPE of the single forecast models (SFM1-SFM5) are 6.31% and 0.40% respectively.


The maximum and minimum PE of the four synthesized forecast methods in Reference [55] are 5.56% and 0.02% respectively, the maximum and minimum MAPE of the four synthesized forecast methods in Reference [55] are 2.78% and 0.98% respectively.


The maximum and minimum PE of WCF-FMWD-PSO are −2.36% and 0.12% respectively, the MAPE of WCF-FMWD-PSO is 0.93% and the iteration number is 623.


The maximum and minimum PE of WCF-FMWD-IA-PSO are 1.05% and −0.11% respectively, the MAPE of WCF-FMWD-IA-PSO is 0.53% and the iteration number is 490.


The maximum and minimum PE of WCF-FMWD-improved-IA-PSO are 0.91% and −0.14% respectively, the MAPE of WCF-FMWD-improved-IA-PSO is 0.35% and the iteration number is 193.Through the above analysis and the comparison shown in Figures 4 and 5 the proposed method WCF-FMWD-improved-IA-PSO results in an improved accuracy overall (MAPE is 0.35, maximum and minimum PE are 0.91% and ´0.14%) when compared against other described methods over the chosen time period and year characteristics.In addition, the proposed method WCF-FMWD-improved-IA-PSO has the fastest convergence rate in forecast model weight determination when compared against WCF-FMWD-PSO and WCF-FMWD-IA-PSO.Therefore, using the proposed method to forecast the medium-and-long-term load is better than using other methods.The correctness and feasibility of the proposed method are proven.

Conclusions
In this paper, we have proposed a robust weighted combination load forecasting method WCF-FMWD-improved-IA-PSO based on forecast model filtering and adaptive variable weight determination to forecast the annual electric load.The contribution and novelty are mainly as follows: (1) Due to the fact that the forecast year's true load is unknown, the comprehensive validity degree of forecast model is defined by the integration of fitted value relative error and forecast value relative error, and then forecast models are filtered based on their comprehensive validity degrees.

‚
The definition of validity degree can effectively overcome the inherent shortcomings of error theory.Entirely investigating the fitting level and the validity of forecast model, the comprehensive validity degree definition and the forecast model filtering method can improve the robustness of combination forecasting.

‚
Revealing the transition pattern between the natural precision and validity degree, the forecast precision estimation method based on Markov chain and cloud model can provide an important basis for the subsequent weighted combination forecasting.In the forecast models' filtering, the better ones will be selected and the worse ones will be eliminated.It can also improve the robustness of combination forecasting. (2) The improved IA-PSO is used to determine the forecast model weight in combination forecasting.
Based on the uniting of immune system's specific information processing mechanism and PSO's global convergence ability, disturbance variable and particle searching speed's adaptive adjustable strategy are introduced to improve the algorithm performance.The particles' diversity is ensured while the convergence speed is increased.It can avoid the local optimal and improve the accuracy.
As can be seen from the case study, the maximum and minimum of percentage error by the proposed method WCF-FMWD-improved-IA-PSO are 0.91%, ´0.14% and the mean absolute percentage error is 0.35%, which are smaller than those by the single forecast models (SFM 1 -SFM 5 ), the four synthesized forecast methods [55], WCF-FMWD-PSO and WCF-FMWD-IA-PSO.These indicate that the proposed method has significant superiority over other methods in the terms of annual electric load forecasting accuracy.The iteration number of FMWD-improved-IA-PSO in the proposed method (193) is far less than iteration numbers of FMWD-PSO and FMWD-IA-PSO (623 and 490), so its advantage that the global optimal solution is reached faster than PSO and IA-PSO is confirmed.In conclusion, the proposed method WCF-FMWD-improved-IA-PSO has a higher robustness and better accuracy, and it can meet the requirements of the annual electric load forecast and can also be applied in the forecast of related fields.In the forecast with analogous features, the proposed method WCF-FMWD-improved-IA-PSO can also be applied.

Figure 1 .
Figure 1.The flowchart of the proposed robust weighted combination forecasting method.

Figure 1 .
Figure 1.The flowchart of the proposed robust weighted combination forecasting method.

)
. We assume that there are n f m forecast models.By the forecast model FM d (d = 1, 2, . . ., n f m ), y 1 e,d is the fitted value of the similar year e (e = 1, 2, . . ., n shis ), and y c,d is the forecast value of the forecast year c (c = n his + 1, n his + 2, . . ., n his + n f o ).

Figure 2 .
Figure 2. The history years and the forecast years.

Figure 2 .
Figure 2. The history years and the forecast years.

pkq l and x pkq l
are the speed and position of the particle PA l in the iteration iter k , LF 1 and LF 2 are the learning factors, and 0 ď rand pkq 1 , rand pkq 2 ď 1 are two random numbers.

1 ¨ppbest pkq l ´xpkq l q and the social cognitive part LF 2 ¨rand pkq 2 ¨pGbest pkq l ´xpkq l q represent
the information sharing and cooperation of different particles.This part guides the particle to fly to the best position of the group.Therefore, the momentum part w ¨vpkq l represents the particles' diversification; the individual cognitive part LF 1 ¨rand pkq the particles' centralization.The main performance of PSO is determined by the balance of the three parts.
of the swarm is PF pkq avg , and the average fitness of the particles whose fitness are bigger than PF pkq avg is PF pkq AVG .We use ∆ pkq " ˇˇPF pkq 0 ´PF pkq AVG ˇˇto evaluate the swarm convergence degree in the iteration L. According to the particle fitness, the swarm is divided into three sub-swarms: the different sub-swarms, we take different adjustment operations to their searching speeds as follows: individuals in the swarm, these particles have been relatively close to the global optimum.Their searching speeds should be reduced to prevent from jumping out of the global optimum.So the searching speed is adjusted as follows: individuals of the swarm, both the local optimum ability and the global optimum ability of these particles are good.Therefore, we don't adjust their searching speeds.the relatively poor individuals in the swarm.The searching speed is adjusted as follows: is dynamically and adaptively adjusted according to the value of ( ) k  : if the individuals tend to be dispersed ( ( ) and the swarm development ability will be enhanced to strengthen the local optimization; if the individuals tend to be converged ( detection ability will be enhanced to effectively jump out of the local optimum.

Figure 3 .
Figure 3.The flowchart of forecast model weight determination based on improved IA-PSO.

Figure 3 .
Figure 3.The flowchart of forecast model weight determination based on improved IA-PSO.

Figure 4 .
Figure 4.The comparison of PE of the proposed method (WCF-FMWD-improved-IA-PSO) and others.

Figure 4 .
Figure 4.The comparison of PE of the proposed method (WCF-FMWD-improved-IA-PSO) and others.

Figure 4 .
Figure 4.The comparison of PE of the proposed method (WCF-FMWD-improved-IA-PSO) and others.

Figure 5 .
Figure 5.The comparison of MAPE of the proposed method (WCF-FMWD-improved-IA-PSO) and others.Figure 5.The comparison of MAPE of the proposed method (WCF-FMWD-improved-IA-PSO) and others.

Figure 5 .
Figure 5.The comparison of MAPE of the proposed method (WCF-FMWD-improved-IA-PSO) and others.Figure 5.The comparison of MAPE of the proposed method (WCF-FMWD-improved-IA-PSO) and others.
qq 2 are the expectation and the standard deviation of the fitted precision of FM d for the similar year e, respectively.The forecast validity degree of FM d is as follows: NORMpEn d , He d q is a normal random number with the expectation En d and the variance He d , P c,d is the estimated forecast precision with the expectation Ex d and the variance En 1 d in the precision range S d g .3.3.Forecast Model Filtering Based on Comprehensive Validity DegreeThe comprehensive validity degree of the forecast model FM d in the whole interval [1, n shis ] Y [n his + 1, n his + n f o ] is as follows:CV d " α ¨FIV d `p1 ´αq ¨FOV d(19)where FIV d is the fitted validity degree of FM d in the similar history years interval[1, n shis ], FOV d is the estimated forecast validity degree of FM d in the forecast years interval [n his + 1, n his + n f o ].
1,d , P 1 2,d , ..., P 1 n shis ,d ) into the normal cloud model.In this normal cloud model, the input is P 1 d and the output is the expectation Ex d , the entropy En d and the hyper-entropy He d .The algorithm of this backward cloud generator is as follows: Ex d " EXP `P1 d ˘(15)

Table 2 .
The cosine similarity between the history years 1998-2004 and the forecast year 2005.

Table 4 .
The forecast power consumption by the eleven forecast models (10 8 kWh).

Table 5 .
The comprehensive validity degree of the eleven forecast models based on the real and forecast power consumptions in 1998 and 2001-2005.

Table 7 .
The results of weighted combination forecast methods (10 8 kWh).

Table 8 .
The percentage error of the proposed method (WCF-FMWD-improved-IA-PSO) and others.