New Hybrid Algorithms for Prediction of Daily Load of Power Network

Two new hybrid algorithms are proposed to improve the performance of two meta-heuristic optimization algorithms, the Grey Wolf Optimizer (GWO) and the Shuffled Frog Leaping Algorithm (SFLA). First, the hierarchy and position-updating parts of the mathematical model of GWO are advanced, and the SGWO algorithm is then proposed based on the advantages of SFLA and GWO. It not only improves the local search ability but also speeds up global convergence. Second, the SGWOD algorithm is built on SGWO by adding the benefit of a differential evolution strategy. In experiments on 29 benchmark functions, comprising unimodal, multimodal, fixed-dimension and composite multimodal functions, the new algorithms perform better than GWO, SFLA and GWO-DE, and they achieve a much better balance between exploration and exploitation. The proposed SGWO and SGWOD algorithms are also applied to a prediction model based on a neural network. Experimental results show their usefulness for forecasting the daily load of a power network.


Introduction
Global optimization problems are common in engineering, economics and many sciences; their general formulation is given by the equations below.
optimize f_j(x), j = 1, 2, ..., m, subject to l ≤ x ≤ u, (1)

where l = (l_1, l_2, ..., l_n) and u = (u_1, u_2, ..., u_n). x is a decision vector, and n is the dimension of x. The area covered by the decision vectors is called the search range; u and l are the upper and lower bounds of the search range, and they also have n dimensions. f_j(x) is called the cost or fitness function. If m equals 1, f_1(x) is a single fitness function. This paper considers only a single objective, so f_1(x) is written as f(x).
The meta-heuristic algorithm is an improvement of the heuristic algorithm; it combines local search with stochastic methods and provides acceptably good solutions to optimization problems. A meta-heuristic is an iterative process: by combining different concepts, it performs exploration and exploitation in the search range, and a learning strategy is used to acquire and exploit information for finding an approximately optimal solution during the iteration. GWO models the encircling behaviour of grey wolves as

D = |B · X_p(t) − X_i(t)|, X_i(t + 1) = X_p(t) − A · D,

where X_i and X_p are the position vectors of wolf i and the prey, respectively; t represents the current iteration; A and B are coefficient vectors, which are calculated as

A = 2a · r_1 − a, B = 2 · r_2.

As the iterations proceed, a decreases linearly from 2 to 0; r_1 and r_2 are random vectors in [0, 1].
During hunting, the alpha, beta and delta guide the wolf pack. A wolf first computes its distances to them according to Equations (8) and (9), and then updates its position by Equation (10):

D_α = |B_1 · X_α − X|, D_β = |B_2 · X_β − X|, D_δ = |B_3 · X_δ − X|, (8)
X_1 = X_α − A_1 · D_α, X_2 = X_β − A_2 · D_β, X_3 = X_δ − A_3 · D_δ, (9)
X(t + 1) = (X_1 + X_2 + X_3) / 3, (10)

where X_α, X_β and X_δ respectively represent the positions of α, β and δ, and D_α, D_β and D_δ respectively represent the distances between α, β, δ and wolf i. Figure 1 shows the complete flow chart of GWO. It randomly initializes the wolf pack in a limited space and calculates the fitness of each wolf; it then selects the top three wolves and updates the positions of the pack according to Equations (8) to (10), and finally outputs the optimum.
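The position update above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code; the function and argument names are our own, and minimization of the fitness is assumed.

```python
import numpy as np

def gwo_step(positions, x_alpha, x_beta, x_delta, t, max_iter, rng):
    """One GWO position update over the whole pack (Equations (8)-(10)):
    each wolf computes its distance to the three leaders and moves to the
    mean of the three guided positions."""
    a = 2.0 - 2.0 * t / max_iter                  # a decreases linearly from 2 to 0
    new_positions = np.empty_like(positions)
    for i, x in enumerate(positions):
        guided = []
        for leader in (x_alpha, x_beta, x_delta):
            A = 2.0 * a * rng.random(x.shape) - a  # A = 2a*r1 - a
            B = 2.0 * rng.random(x.shape)          # B = 2*r2
            D = np.abs(B * leader - x)             # distance to the leader
            guided.append(leader - A * D)
        new_positions[i] = np.mean(guided, axis=0) # Eq. (10): equal average
    return new_positions
```

Note that Equation (10) weights the three leaders equally; the new position-updating model proposed later in the paper changes exactly this point.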

Differential Evolution
Like other evolutionary algorithms, DE operates on the candidate solutions of a population [9], but its population reproduction is different from others. The evolutionary process of DE contains three operations, mutation, crossover and selection, which makes it very similar to GA. It preserves the individual optimum and shares information with the population; that is, the optimization problem is solved through cooperation and competition among individuals.
A new individual is produced by adding the difference between two individuals to a third, which is called mutation. The new one is then mixed with an individual of the current population, which is called crossover; if the fitness of the result is better, the old individual is replaced by it in the next generation, otherwise the old one is preserved, which is called selection. The optimum of each generation is evaluated during the evolution process. However, while solving problems, DE may decrease the diversity of the population and cause the algorithm to stagnate.
Mutation randomly selects two different individuals in the population, amplifies their difference and adds it to a third one. For each target x_i(t), Equation (11) generates a mutant:

v_i(t + 1) = x_{r1}(t) + λ · (x_{r2}(t) − x_{r3}(t)), (11)

where x_{r1}(t) denotes individual r_1 in the t-th generation; r_1, r_2 and r_3 are distinct individuals randomly selected from the population. λ is a constant factor in [0, 2] used to control the amplification of the differential variation (x_{r2}(t) − x_{r3}(t)).
A new trial individual is generated by the crossover operation, which increases the diversity of the population. It is produced as follows:

c_{i,j}(t + 1) = v_{i,j}(t + 1) if rand(j) ≤ pCR or j = randi(i), otherwise x_{i,j}(t), (12)

where rand(j) is a random number in [0, 1], pCR is a crossover constant in [0, 1], randi(i) is a randomly chosen integer in [1, n], and j indexes the dimensions of an individual. According to Equation (13), if c_i(t + 1) is better than x_i(t), it replaces x_i in the (t + 1)-th generation. Figure 2 shows the complete flow chart of DE. It randomly initializes the population and calculates the fitness of each individual; it then generates a new individual by selecting three random ones for mutation and crossover, decides by Equation (13) whether to replace the original with the new one, and finally outputs the optimum.
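One DE generation, combining mutation, crossover and selection as described above, can be sketched as follows. This is an illustrative sketch with our own names (`lam` and `p_cr` stand for λ and pCR); minimization is assumed.

```python
import numpy as np

def de_generation(pop, fitness_fn, lam=0.5, p_cr=0.9, rng=None):
    """One DE generation: mutation (Eq. (11)), binomial crossover (Eq. (12))
    and greedy selection (Eq. (13)), applied to every individual."""
    rng = rng or np.random.default_rng()
    N, n = pop.shape
    new_pop = pop.copy()
    for i in range(N):
        # pick three distinct individuals, all different from i
        r1, r2, r3 = rng.choice([k for k in range(N) if k != i],
                                size=3, replace=False)
        mutant = pop[r1] + lam * (pop[r2] - pop[r3])   # mutation
        j_rand = rng.integers(n)                       # guarantee one mutant gene
        mask = rng.random(n) < p_cr
        mask[j_rand] = True
        trial = np.where(mask, mutant, pop[i])         # crossover
        if fitness_fn(trial) < fitness_fn(pop[i]):     # selection (keep improvements)
            new_pop[i] = trial
    return new_pop
```

Because selection only ever keeps improvements, the total fitness of the population never worsens from one generation to the next, which is the property the selection step in Equation (13) guarantees.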

Shuffled Frog Leaping Algorithm
SFLA utilizes the shuffled complex evolution strategy, a meta-heuristic computation [35]. A group of frogs is divided into several subgroups. Different subgroups are considered collections of frogs with different ideas, and each subgroup is allowed to evolve independently. After several evolutions, all subgroups are reunited. SFLA thus has the functions of global information exchange and local search, which achieves a balance between the two.
First, the frogs are sorted in descending order of fitness. Suppose the whole population consists of m memeplexes, each containing n frogs, so that N = m × n. The first, second and m-th frogs are assigned to the first, second and m-th memeplexes respectively; the (m + 1)-th frog is then reassigned to the first memeplex, and so on. Figure 3 shows the memeplex partitioning, where the 1st, (m + 1)-th, ..., ((n − 1) · m + 1)-th frogs go into the 1st memeplex and the m-th, 2m-th, ..., (n · m)-th frogs go into the m-th memeplex.

X_a is the best frog in the population, while X_z and X_p respectively represent the worst and best frogs of each memeplex. Each memeplex performs a local search, and the position of the worst frog is updated as follows:

S = rand · (X_p − X_z), (14)
X_z' = X_z + S, with −X_m ≤ S ≤ X_m, (15)
S = rand · (X_a − X_z), (16)

where rand is a random factor in [0, 1] and X_m indicates the maximum change of position that a frog is allowed. If X_z' is better than X_z, the latter is replaced by the former. Otherwise, X_a replaces X_p and X_z' is recomputed by Equations (15) and (16). If there is still no improvement, X_z is replaced by a random position. When the local search is completed, the frogs of all memeplexes are combined and sorted, and then divided into memeplexes again to continue the local search.

Figures 4 and 5 show the whole flow chart of SFLA. It randomly initializes the frog group, calculates the fitness of each frog and sorts the frogs by fitness. The group is divided into meme groups that execute sub-processes; after the sub-processes finish, all frogs are merged and sorted again, and the sub-processes are re-executed until the algorithm ends and outputs the optimum. During a sub-process, the frog in the worst position is updated by Equations (14) to (16). Figure 5 shows the flow chart of each memeplex.
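The round-robin partitioning of Figure 3 and the leap of the worst frog can be sketched as follows. The function names and the clipped-step form of the leap are our reading of the update, not code from the paper; minimization is assumed.

```python
import numpy as np

def partition_memeplexes(frogs, fitness, m):
    """Sort the frogs best-first and deal them round-robin into m memeplexes,
    as in the Figure 3 partitioning."""
    sorted_frogs = frogs[np.argsort(fitness)]   # best fitness first (minimization)
    return [sorted_frogs[k::m] for k in range(m)]

def leap_worst(x_worst, x_best, x_m, rng):
    """Leap of a memeplex's worst frog toward its best frog: an assumed reading
    of Equations (14)-(15), with the step clipped to the maximum change X_m."""
    step = rng.random() * (x_best - x_worst)
    step = np.clip(step, -x_m, x_m)
    return x_worst + step
```

The round-robin deal guarantees that each memeplex receives frogs from every fitness stratum, which is what lets the later merge step exchange global information.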

New Hybrid Algorithms Based on GWO, SFLA and DE
GWO has great convergence performance, but it easily falls into the trap of local optima. SFLA, in contrast, is outstanding in global search, but its convergence speed is unsatisfactory. In this section, a new hybrid algorithm, SGWO, based on GWO and SFLA, is proposed to overcome both defects. In a biological community, the species with the worst adaptability tend to be eliminated, or they must gain greater viability through variation. SGWOD draws its learning strategies from SGWO and DE to achieve better performance.

Advanced the Model of GWO
Before designing the hybrid algorithms, we first modify the mathematical model of GWO. When the fitness of the alpha is not as good as that of a new wolf, the former is replaced by the latter. But the old alpha may still carry important information, so a new hierarchy model is needed to utilize the experience of the old one. From Equation (10) we know that the different importance of the alpha, beta and delta in guiding the attack is not considered, so a new position-updating model is also introduced.

A New Hierarchy Model
The alpha is responsible for directing the wolf pack during hunting. Although it has great power, it is also responsible for the safety and livelihood of the wolves, and it is under the supervision of the pack. The position of the alpha is not inherited; the alpha must accept challenges from other wolves, and if it is defeated, the winner becomes the new leader (the new alpha). The beta and delta are usually experienced members of the pack, which assist the alpha in completing the hunt. Once the alpha can no longer lead the pack to capture prey well, they replace it.
The new mathematical model is established based on the above description. If the alpha does not lead the wolf pack to catch prey well, it degenerates into the beta instead of turning directly into an omega. The updating equations of the alpha are defined as follows:

β(t) = α(t − 1), if f(X_i(t)) < f(α(t − 1)), (17)
α(t) = X_i(t), if f(X_i(t)) < f(α(t − 1)), (18)

where i represents a candidate solution, t indicates the current iteration, f is the fitness function, and α(t) is the new alpha. If the beta hunts better than i, it is not replaced by i; but if it is not good at assisting the alpha, it becomes the delta rather than an omega. The beta is updated as follows:

δ(t) = β(t − 1), if f(α(t)) ≤ f(X_i(t)) < f(β(t − 1)), (19)
β(t) = X_i(t), if f(α(t)) ≤ f(X_i(t)) < f(β(t − 1)), (20)

where β(t) represents the new beta. If the delta hunts better than i, it is not replaced by i; but if it performs badly in hunting, it becomes an omega. The updating equations of the delta are defined as follows:

δ(t) = X_i(t), if f(β(t)) ≤ f(X_i(t)) < f(δ(t − 1)), (21)
ω = δ(t − 1), (22)

where δ(t) represents the new delta.
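The demotion chain described above can be sketched compactly. This is our reading of Equations (17)-(22), not the paper's code: a defeated alpha degrades to beta, a defeated beta to delta, and a defeated delta to omega, so the old leaders' experience stays in the pack.

```python
def update_hierarchy(x_i, f_i, leaders):
    """Sketch of the demotion hierarchy (an assumed reading of Eqs. (17)-(22)).
    `leaders` maps 'alpha'/'beta'/'delta' to (position, fitness) pairs;
    lower fitness is better."""
    if f_i < leaders['alpha'][1]:
        leaders['delta'] = leaders['beta']    # old beta degrades to delta
        leaders['beta'] = leaders['alpha']    # old alpha degrades to beta
        leaders['alpha'] = (x_i, f_i)
    elif f_i < leaders['beta'][1]:
        leaders['delta'] = leaders['beta']    # old beta degrades to delta
        leaders['beta'] = (x_i, f_i)
    elif f_i < leaders['delta'][1]:
        leaders['delta'] = (x_i, f_i)         # old delta becomes an omega
    return leaders
```

Compared with plain GWO, where a defeated leader simply vanishes from the top three, this keeps two generations of leading experience available for the position update.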

A New Position Updating Model
Suppose an integer sequence R = {1, 2, ..., i, ..., n}, where P_i represents the probability of selecting the i-th element. If the selection follows a triangular discrete distribution, the i-th element of the sequence has the following probability:

P_i = 2(n − i + 1) / (n(n + 1)). (23)
The wolves cooperate in hunting: the alpha leads the pack to attack the prey, and the beta and delta assist it in completing the attack. Therefore, the alpha plays a decisive role in the position updating of the pack, while the beta and delta play auxiliary roles. However, Equation (10) does not consider these different roles when the wolves update their positions. The fitness values are sorted from good to bad, and a new position equation is proposed as follows:

X(t + 1) = Σ_i P_i · X_i, (25)

where P_i represents the importance of X_i, and i is selected from the alpha, beta and delta.
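The rank weights and the weighted update can be sketched as follows. The exact probability formula is our assumed reading of the triangular discrete distribution above; for n = 3 it gives the alpha half of the total weight.

```python
import numpy as np

def triangular_weights(n=3):
    """P_i = 2(n - i + 1) / (n(n + 1)) for rank i (1 = best): a triangular
    discrete distribution over the n ranked leaders."""
    i = np.arange(1, n + 1)
    return 2.0 * (n - i + 1) / (n * (n + 1))

def weighted_position(x_candidates, weights):
    """Eq. (25)-style update: weight the alpha/beta/delta guided positions by
    their rank probabilities instead of averaging them equally as in Eq. (10)."""
    return np.average(x_candidates, axis=0, weights=weights)
```

With n = 3 the weights are 1/2, 1/3 and 1/6, so the alpha's guidance dominates the update while the beta and delta still contribute, which is exactly the decisive-versus-auxiliary split argued above.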

Hybrid Algorithm SGWO
From the position-updating equations we know that GWO makes all members converge quickly towards the optimum, while SFLA only moves the worst members toward the optimum. But GWO may fall into a local optimum because of its fast convergence, whereas SFLA achieves global information exchange by recombining memeplexes, which avoids premature convergence. By combining the two methods, the hybrid algorithm effectively mitigates both problems. The process of SGWO is described as follows: Step 1. Sort the population by fitness.
Step 2. Divide the population into different meme groups.
Step 3. Perform the following local search for each meme group.
Step 3.1. Find the alpha, beta and delta.
Step 3.2. Access each wolf and update the alpha, beta and delta by Equations (17)-(22).
Step 3.3. Update the position of each wolf by Equation (25).
Step 3.4. Repeatedly do 3.1-3.3 until it meets the ending conditions of local search.
Step 4. Combine the meme groups and repeatedly do 1-3 until it meets the ending conditions of the algorithm.
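Steps 1-4 can be outlined as follows. This is a sketch with our own names; the inner search is simplified to the plain GWO update of Equations (8)-(10), and the hierarchy update of Step 3.2 is omitted for brevity.

```python
import numpy as np

def sgwo(fitness_fn, pop, m, outer_iters, local_iters, rng):
    """Skeleton of SGWO Steps 1-4 (minimization assumed)."""
    for _ in range(outer_iters):
        fit = np.array([fitness_fn(x) for x in pop])
        pop = pop[np.argsort(fit)]                       # Step 1: sort by fitness
        groups = [pop[k::m].copy() for k in range(m)]    # Step 2: meme groups
        groups = [gwo_local_search(fitness_fn, g, local_iters, rng)
                  for g in groups]                       # Step 3: local search
        pop = np.vstack(groups)                          # Step 4: merge and repeat
    return min(pop, key=fitness_fn)

def gwo_local_search(fitness_fn, group, iters, rng):
    """Plain GWO inner loop standing in for Steps 3.1-3.4."""
    dim = group.shape[1]
    for t in range(iters):
        a = 2.0 - 2.0 * t / max(iters, 1)
        order = np.argsort([fitness_fn(x) for x in group])
        leaders = group[order[:3]].copy()                # alpha, beta, delta
        for i in range(len(group)):
            moves = [lead - (2 * a * rng.random(dim) - a)
                     * np.abs(2 * rng.random(dim) * lead - group[i])
                     for lead in leaders]
            group[i] = np.mean(moves, axis=0)
    return group
```

The key structural point is that GWO's fast convergence only ever acts inside one meme group at a time, while the merge in Step 4 provides SFLA's global information exchange.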
The pseudo code of SGWO is described in Algorithm 1. The pseudo code of RunSGWO function is described in Algorithm 2.

Algorithm 2 RunSGWO
Find the alpha, beta and delta.
for i = 1 : Max_iteration do
    for s = 1 : subPopSize do
        flag = 1
        fitness = fobj(pop(s).Position)
        if fitness < the fitness of alpha then
            beta = alpha
            alpha = pop(s)
        end if
        Update the position of s by Equation (25)
        Calculate the fitness of s
    end for
end for

Hybrid Algorithm SGWOD
Gene mutation occurs when an organism breeds its next generation. If a mutation is beneficial to the organism, the variant survives environmental selection and the mutant genes are preserved in the offspring. After each iteration, the wolves of a memeplex are updated by DE; that is, the wolves with the worst fitness are eliminated, and new ones are generated to replace them. The process of SGWOD is described as follows: Step 1. Sort the population by fitness.
Step 2. Divide the population into different meme groups.
Step 3. Perform the following local search for each meme group.
Step 3.1. Find the alpha, beta and delta.
Step 3.2. Access each wolf and update the alpha, beta and delta by Equations (17)-(22).
Step 3.3. Update the position of each wolf by Equation (25).
Step 3.4. Access each wolf and randomly mutate and cross; if the new mutant is better than the old wolf, the old one is replaced by the new one.
Step 3.5. Repeatedly do 3.1-3.4 until it meets the ending conditions of the local search.
Step 4. Combine the meme groups and repeatedly do 1-3 until it meets the ending conditions of the algorithm.
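The elimination idea in Step 3.4 can be sketched as follows. This is a hedged illustration: the fraction of wolves refreshed (`frac`) is our illustrative parameter, not from the paper, and `lam` stands for DE's λ.

```python
import numpy as np

def refresh_worst(memeplex, fitness_fn, frac=0.25, lam=0.5, rng=None):
    """Sketch of SGWOD's elimination step: the worst fraction of a memeplex is
    rebuilt by DE-style mutation around three random survivors; a trial only
    replaces a wolf when it improves its fitness (minimization assumed)."""
    rng = rng or np.random.default_rng()
    n = len(memeplex)
    order = np.argsort([fitness_fn(x) for x in memeplex])  # best first
    n_bad = max(1, int(frac * n))
    survivors = order[:n - n_bad]
    for i in order[n - n_bad:]:
        r1, r2, r3 = rng.choice(survivors, size=3, replace=False)
        trial = memeplex[r1] + lam * (memeplex[r2] - memeplex[r3])
        if fitness_fn(trial) < fitness_fn(memeplex[i]):    # keep only improvements
            memeplex[i] = trial
    return memeplex
```

Since replacements are accepted only on improvement, this step can restore diversity inside a memeplex without ever degrading its overall fitness.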

Experiments and Results
In this section, we use 29 benchmark functions for testing. These classical functions, listed in Tables 1-4, are used by many researchers [56,57]. Space is the boundary of the search range, Dim represents the dimension of the function, and f_min indicates its optimum.

Table 4. Composite multimodal benchmark functions (columns: Function, Space, Dim, f_min).

Experimental Results
For verifying the results, SGWO and SGWOD are compared with GWO, SFLA and GWO-DE, an improved GWO with differential evolution [55]. Each algorithm runs 100 times on each benchmark function, and the testing parameters are listed in Table 5. Table 6 shows the statistical results of the algorithms, including the average (AVG) and standard deviation (STD). Table 5 gives the parameter settings of each algorithm.

Figures 6-9 demonstrate the solution quality and speed on the benchmark functions: the horizontal axis gives the iteration number and the vertical axis the corresponding fitness value. They are, respectively, the convergence curves on the unimodal, multimodal, fixed-dimension multimodal and composite multimodal benchmark functions. From Figures 6-9, it can be seen that the proposed algorithms converge quickly and eventually reach a very low level; they successfully avoid the trap of local optima.

Unimodal functions have only a global optimum and no local optima, so they benchmark the exploitation ability of algorithms. SGWO performs better than GWO-DE except on f_6, and SGWOD is better than all the compared algorithms. On f_1, f_2, f_4 and f_5, the convergence curves of SGWO and SGWOD are almost identical, but SGWOD performs better than SGWO on f_3, f_6 and f_7. From Figure 6, we find that they search faster on functions with only a global optimum.

Experimental Analysis
Multimodal functions have exponentially many local optima, so they test whether an algorithm can avoid local optima. The experimental results show that the proposed algorithms perform better than the other algorithms on most multimodal functions. The convergence curves of SGWO and SGWOD are almost identical on f_8, f_9 and f_10; SGWO is superior to SGWOD on f_11, while SGWOD is superior to SGWO on f_12 and f_13. Both perform better than GWO, GWO-DE and SFLA. They not only converge quickly but also avoid local optima and finally find the global optimum, which shows that they exchange global information effectively. On the fixed-dimension multimodal functions, they perform better than the other compared algorithms, except that the final convergence on f_14 is not as good as SFLA's.

Composite multimodal functions have extremely complex structures with a randomly located global optimum and several randomly located deep local optima. Although the convergence curves on the composite functions are less favourable, Figure 9 shows that the proposed algorithms converge very fast and are better than the other compared algorithms, except that the final convergence on f_24 and f_27 is not as good as GWO-DE's. GWO-DE shows large fluctuations, while SGWO and SGWOD do not; their convergence curves are relatively smooth, which means that each iteration improves on the previous one.
From Table 6 and Figures 6-9, we conclude that the proposed algorithms improve the ability of global search and converge quickly during the iteration process. Therefore, they perform better in terms of both convergence speed and accuracy.

Combined Prediction Model Based on Hybrid Algorithms and Its Application
In this section, we apply the algorithms of Section 4 to the prediction of daily power load with a neural network. The prediction plays an important part in the planning, scheduling and security of power systems, and it is a useful tool for thermal power planning, hydro-thermal coordination, and the economic combination of units. Traditional statistical methods include regression analysis, state-space models and so on. However, since the power load changes under various complex factors, it is difficult to establish an effective mathematical model by traditional methods, which leads to low prediction accuracy.

The Structure of Neural Network Prediction Model
Time series analysis is an important method of mathematical statistics, which extracts useful knowledge from sequential information. Its essence is to find the relationship between earlier and later data and build an association model; it then predicts the future from the historical data and the established model.
The neural network is a method that uses the sum of squared errors as the fitness function and finds the optimum by gradient descent. But it has some inherent defects, such as slow learning, low precision, and easily falling into local minima. The meta-heuristic algorithm is a global optimization process; it has been widely used in training the parameters of neural networks because it can find global solutions in a multi-dimensional search space. The parameters of the neural network are first optimized by the meta-heuristic, and the network is then used to further refine the acquired parameters. So we use the proposed methods to train the network and perform the prediction.
We adopt a three-layer network structure with multiple inputs and one output for predicting the daily load. The structure of the neural network is shown in Figure 10. Since the network uses a three-layer architecture, the selected hyper-parameters are the input/output weights and the input/output biases. So if the network has 4 hidden neurons, it has 4n input weights, 4 input biases, 4 output weights and 1 output bias, where n is the dimension of the input vector. The parameters are obtained by SGWO and SGWOD; the network then uses them to predict the data and reports the prediction results back to the meta-heuristics to guide their evolution.
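The parameter layout described above can be sketched as a forward pass over a flattened parameter vector, which is the form a meta-heuristic needs. The tanh hidden activation is our assumption; the parameter count matches the text (4n + 4 + 4 + 1 for 4 hidden neurons).

```python
import numpy as np

def unpack_and_predict(theta, X, n_in, n_hidden=4):
    """Forward pass of the three-layer network of Figure 10, with every weight
    and bias flattened into one vector `theta` so SGWO/SGWOD can search over it."""
    k = n_hidden * n_in
    W1 = theta[:k].reshape(n_hidden, n_in)       # input weights (n_hidden * n)
    b1 = theta[k:k + n_hidden]                   # input biases
    w2 = theta[k + n_hidden:k + 2 * n_hidden]    # output weights
    b2 = theta[-1]                               # output bias
    hidden = np.tanh(X @ W1.T + b1)              # hidden layer (tanh assumed)
    return hidden @ w2 + b2                      # single linear output
```

Wrapping this in a fitness function (e.g. the mean absolute error of Equation (27)) turns network training into exactly the kind of box-constrained minimization problem the benchmark experiments used.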

Processing of Input Data
In order to accurately predict the daily load, the various factors affecting it should be taken into account and appropriate features selected. Therefore, the prediction model includes relevant factors such as the date classification and the daily average temperature. Weather changes have a great impact on the power load: in summer, for example, air conditioning and related equipment are used more often than in spring and autumn because of the high temperature and the demand for heatstroke prevention and cooling, which inevitably increases the power load. On rest days, large consumers such as factories and schools use less electricity, while shopping malls and households use more; working days are the opposite. On major festivals especially, most enterprises are on holiday and carry very little load; only domestic electricity and some tertiary industries consume power, so their electricity consumption is relatively low.

We use data from January 1st to December 30th of a city in China, including daily power load, weather data and date types, to predict its daily load from January 2nd to December 31st. Because the data differ greatly in range and magnitude, the input data must be quantified and normalized to avoid distorting the model. At the input layer, the daily load is converted to a value in [-1, 1] by the following equation:
x' = 2(x − x_min) / (x_max − x_min) − 1, (26)

where x is the current value, x' is the converted result, and x_max and x_min respectively represent the maximum and minimum values of the daily load. In order to improve the accuracy of daily load forecasting, the fitness is redefined as follows:

fitness = (Σ |y − ŷ|) / n, (27)

where n is the number of prediction data points, y indicates the actual result and ŷ its corresponding prediction. When the temperature is within a certain range, its influence on the electric load is small, but when it rises or falls past a certain degree, its impact on the load is large. Therefore, the temperature is segmented and quantified, as shown in Table 7. There are also three categories of date types, which are weekdays (Monday-Friday), rest days (Saturday-Sunday), and holidays; 0.4, 0.7 and 1 are the values corresponding to weekdays, rest days and holidays, respectively.
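The input preprocessing and the fitness can be sketched as follows. The min-max form of the load normalization is the reading assumed for Equation (26); the date coding follows the text directly (the temperature segmentation of Table 7 is not reproduced here).

```python
import numpy as np

def normalize_load(x, x_min, x_max):
    """Map a daily-load value into [-1, 1] (min-max form assumed for Eq. (26))."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def encode_date(day_type):
    """Date-type coding from the text: weekday 0.4, rest day 0.7, holiday 1.0."""
    return {"weekday": 0.4, "rest": 0.7, "holiday": 1.0}[day_type]

def mae_fitness(y_true, y_pred):
    """Eq. (27): mean absolute prediction error, minimized by the meta-heuristic."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```

Normalizing all inputs to comparable ranges keeps no single feature (raw load in MW versus a 0.4-1.0 date code) from dominating the network's hidden-layer activations.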

Prediction Results
The prediction results of the algorithms are shown in Table 8, where LS is the classical least squares method and NN is the fixed-structure neural network. According to the statistics of the prediction results, shown in Figure 11, the 124th day has the largest prediction error for SGWO and SGWOD. From the quantified data of the 124th and 125th days, it can be seen that the power load changed drastically during these two days and was strongly influenced by other external factors. This also shows, from another angle, that the power load is closely related to temperature and date type. The efficiency of the network can be further improved by adapting the optimization methods [58-62]. Figure 11 shows the prediction error of the algorithms.

Conclusions
In this study, we improve the model of GWO. Based on the improved model, SGWO combines learning strategies from GWO and SFLA, and SGWOD advances SGWO with DE. We test the algorithms on 29 classical benchmark functions. The experiments show that SGWO and SGWOD perform better in exploration and exploitation, but they require much more processing time because every meme group must run iteratively. Therefore, future work will further improve the efficiency of the optimization algorithms.
Finally, the algorithms are used to train the parameters of the neural network for predicting the daily power load. They find an appropriate network structure and derive its initial parameters, and prediction is then carried out with the acquired network and parameters. They overcome the blindness of neural-network parameter selection and obtain excellent parameters, and so they achieve the purpose of improving the convergence performance of the network.