Comparative Study of Hybrid Models Based on a Series of Optimization Algorithms and Their Application in Energy System Forecasting

Big data mining, analysis and forecasting play vital roles in modern economic and industrial fields, especially in the energy system. Inaccurate forecasting may waste scarce energy or cause electricity shortages. However, forecasting in the energy system has proven to be challenging due to various unstable factors, such as high fluctuation, autocorrelation and stochastic volatility. Forecasting time series data with hybrid models is a feasible alternative to conventional single-model forecasting approaches. This paper develops a group of hybrid models that address these problems by eliminating the noise in the original data sequence and optimizing the parameters of a back propagation neural network. One of the contributions of this paper is the integration of existing algorithms and models, which jointly show advances over the present state of the art. The results of comparative studies demonstrate that the proposed hybrid models not only approximate the actual values satisfactorily but can also be an effective tool in the planning and dispatching of smart grids.


Introduction
The energy system is a complex system that achieves the simultaneous generation, transmission, distribution and consumption of electrical energy, playing a pivotal role in every field of social production. It is essential that the electrical power system have sufficient capacity to cope with dynamic changes, which could otherwise affect the quality of the power supply and even endanger the safety and stability of the electrical system. The control of electrical systems helps to plan electricity management, arrange reasonable operation modes, save energy, reduce the cost of generating electricity and enhance both economic and social benefits [1]. Three indicators in the electrical system are crucial for modern, scientific power grid management: short-term wind speed, electrical load and electricity price, because they are connected to generation, distribution and consumption, respectively.
First, the short-term wind speed has a great influence on electricity generation. Faced with resource shortages, environmental pollution and ecosystem degradation, developing and utilizing clean energy efficiently has become an important topic. Wind is a clean and inexhaustible type of energy and one of the most promising energy resources [1]. Wind energy has developed rapidly worldwide; as shown in Figure 1, the newly installed wind power capacity has reached 51,477 MW. The power available in the wind is proportional to the cube of the wind speed; therefore, the wind speed determines the magnitude of the wind power. Compared with long- and middle-term wind speed, the randomness, fluctuation and intermittency of short-term wind speed make it more difficult to control wind turbines and ensure the normal operation of the power grid [2].
Energies 2016, 9, 640
Second, with the continuous increase in installed capacity and electricity consumption, forecasting the electrical load is becoming increasingly important [3]. Electrical load forecasting estimates future electricity demand by analysing historical data and extracting the internal relationships in the data, taking into account known economic and social development, the demands of the electrical system, and factors such as politics, economy and climate. In recent years, excessive electrical load has caused power outages over large areas, resulting in great economic losses [4]. Thus, scientific control of the electrical load is vital.
Finally, the electricity price is related to consumption and is adjusted by changes in supply and demand until it reaches a reasonable level. Forecasting the electricity price is crucial because it has become one of the core issues of electricity marketization [5]. On the one hand, the electricity price can balance the economic interests of market participants. On the other hand, the market also faces large electricity price risks owing to the fluctuation of the wind [6].
Based on the discussion above, forecasting in the electrical power system with high accuracy and reliability is widely recognized as difficult, yet it is of great significance. According to their computational mechanisms, forecasting methods can be divided into four types: statistical methods, physical methods, intelligent methods and hybrid methods.

Statistical Forecasting Methods
Statistical methods construct mathematical and statistical models to perform time series forecasting and offer good real-time performance [7]. Statistical forecasting methods achieve small forecasting errors when the input variables are under normal conditions [8]. The autoregressive integrated moving average (ARIMA) model is a typical statistical technique that is widely used in time series forecasting. Kavasseri and Seetharaman [9] examined the use of a fractional-ARIMA model to forecast wind speeds on day-ahead and two-day-ahead horizons. The wind speed forecast errors were analysed and compared with those of the persistence model, and the results indicated significant improvements in forecasting accuracy. Wang et al. [10] proposed residual modification models to improve the precision of seasonal ARIMA for electricity demand forecasting. They applied a seasonal ARIMA approach and an optimal Fourier method optimized by particle swarm optimization (PSO), and combined the PSO optimal Fourier method with seasonal ARIMA to correct the forecasts of electrical power in northwest China. The final results showed that the forecasting accuracy was higher than that of the seasonal ARIMA model alone and that the combined model was the most satisfactory. Shukur and Lee [11] stated that the non-linearity in wind speed data was the reason for inaccurate wind speed forecasting with a linear ARIMA model, and that the inaccurate forecasting of the ARIMA model reflected the uncertainty of the modelling process. Babu and Reddy [12] explored both linear ARIMA and non-linear artificial neural network (ANN) models to devise a new hybrid ARIMA-ANN model for electricity price forecasting. Cadenas and Rivera [13] also combined ARIMA and ANN for wind speed forecasting, using hourly wind speed time series measured at different sites over one month. The final results demonstrated the effectiveness of the proposed model.

Physical Forecasting Methods
Physical forecasting methods use physical variables and take a series of meteorological parameters into account; therefore, they can produce accurate forecasts [7]. However, they require more complicated computations and considerable computing time. Numerical weather prediction (NWP) is a widely used physical forecasting method. An NWP model is a computer programme that solves the equations describing atmospheric processes and how the atmosphere changes with time [14]. Zhang et al. [15] compared three deterministic and probabilistic NWP-based wind resource assessment methodologies for evaluating the distribution of wind speed, and the results showed that NWP could achieve reliable probabilistic assessments and provide accurate deterministic estimates. Giorgi et al. [16] integrated a neural network with NWP to evaluate wind speed and wind power, and the combined method improved performance, especially at longer time horizons. Felice et al. [17] studied the influence of temperature on daily load forecasting for Italy. Using weather data from NWP models, they evaluated how much the available weather forecasts actually contribute to the prediction of electricity loads, and the results demonstrated that the NWP weather data led to performance improvements. Sile et al. [18] argued that NWP models were a reliable source of meteorological forecasts and could also be used in wind resource assessment. They also analysed the influence of wind speed and wind direction on model errors.

Intelligent Forecasting Methods
Intelligent forecasting methods mainly include artificial neural networks and evolutionary algorithms. ANNs have been shown to perform much better than the techniques discussed above because they can handle complex relationships, adaptive control, decision-making under uncertainty, and pattern prediction [19]. Liu et al. [20] applied multilayer perceptron (MLP) neural networks based on the mind evolutionary algorithm (MEA) and the genetic algorithm (GA) to forecast wind speed. Lou and Dong [21] constructed an electric load forecasting model based on random fuzzy variables (RFVs) and further presented a novel integrated technique, the random fuzzy neural network (RFNN), for load forecasting. Real operational data collected from the Macau electric utility were used to test the effectiveness of the model, which showed a much higher variability. Coelho and Santos [22] proposed a non-linear forecasting model based on radial basis function (RBF) neural networks to conduct multi-step-ahead and direction-of-change forecasting of Spanish electricity pool prices, and showed that it outperformed other methods. Keles et al. [23] presented an ANN-based methodology for forecasting electricity prices, which was applied in in-sample and out-of-sample analyses, and the results showed that the overall methodology produced well-fitted electricity price forecasts. Anbazhagan and Kumarappan [24] proposed a day-ahead electricity price classification that could be implemented using a three-layered feed-forward neural network (FFNN) and a cascade-forward neural network (CFNN). This method was important because it could help improve the forecasting accuracy and thus provide robust and accurate forecasts. Wang and Liu [25] designed two key techniques for forecasting: clustering and axiomatic fuzzy set (AFS) classification. The main novelty was that the proposed model could both predict the value and capture the prevailing trend of the electricity price time series with good interpretability and accuracy.
Researchers have developed a series of evolutionary algorithms [26], such as GA [27], simulated annealing (SA) [28], PSO [29] and the ant colony algorithm (ACA) [30]. GA and PSO are the most commonly used evolutionary algorithms, and PSO has been shown to perform better than GA on smaller network structures [31]. PSO is an evolutionary optimization algorithm that searches for the optimum solution within an n-dimensional search region. It is simple to understand and can solve both continuous and discrete problems, because PSO requires only function evaluations rather than specific initial values. It can also escape from local optima [32]. Aghaei et al. [33] developed a modified PSO algorithm for multi-objective optimization, in which a new mutation method improved the global search ability and restrained premature convergence to local minima, achieving higher accuracy in electrical demand forecasting. Carneiro et al. [34] applied PSO to estimate the Weibull parameters of wind speed and demonstrated that PSO is a valuable technique for characterizing particular wind conditions. Bahrami et al. [35] used PSO to improve the generation coefficient of the grey model, which effectively improved the accuracy of short-term electric load forecasting. Liu et al. [36] applied a wavelet-particle swarm optimization-multilayer perceptron model to predict non-stationary wind speeds; however, they found that the contribution of PSO was smaller than that of the wavelet component.

Hybrid Forecasting Methods
Hybridizing a GA with existing algorithms can always produce a better algorithm than either the GA or the existing algorithms alone [37]; therefore, later researchers have employed hybrid or combined models to achieve good performance. Liu et al. [38] applied wavelets and wavelet packets to preprocess the original wind speed data and concluded that the wavelet packet-ANN performed best compared with other traditional models. Ghasemi et al. [39] proposed a novel hybrid algorithm for electricity price and load forecasting that combines the flexible wavelet packet transform (FWPT), conditional mutual information (CMI), the artificial bee colony (ABC) algorithm, a support vector machine (SVM) and ARIMA. The results showed that the proposed hybrid algorithm achieved high accuracy in simultaneous price and load forecasting. Ahmad et al. [40] reviewed the development of electrical energy forecasting using artificial intelligence methods, including SVM and ANN, and concluded that hybrid methods were more applicable to electrical energy consumption forecasting. Hu et al. [41] used ensemble empirical mode decomposition (EEMD) and SVM to improve the quality of wind speed forecasting, and the proposed hybrid method achieved an observable improvement in forecasting validity. These results show great promise for forecasting intricate time series that are both volatile and irregular. Shi et al. [42] applied hybrid forecasting methods to handle both linear and non-linear components. The results showed that the hybrid approaches were viable options for forecasting both wind speed and wind power generation time series, but they did not always produce superior forecasting performance for all the forecasting horizons investigated.
Table 1 summarizes the reviewed forecasting methods. Based on the review above, the drawbacks of traditional forecasting methods can be summarized as follows. Traditional regression methods place high requirements on the original data and accept only limited forms of data. They are better suited to data with linear trends and become less effective for data with high fluctuation and noise. However, time series data in the electrical power system typically contain a large amount of non-stationary data with seasonality or other tendencies. Moreover, if environmental or sociological variables change suddenly, the forecasting errors become large, which is the major drawback of statistical methods [43]. In addition, one-step forecasts are accurate in most cases, whereas multi-step forecasting tends to produce less accurate and less reliable results. Single forecasting models, such as the back propagation neural network (BPNN), easily fall into local optima and converge slowly.

Contribution
To overcome the disadvantages discussed above, this paper develops a series of hybrid forecasting models based on different types of improved PSO algorithms to achieve accurate and reliable forecasting in the electrical power system. The hybrid models address the problems above and have the following features:
(1) Focus on a complex system. As the review above shows, most researchers focus on forecasting a single indicator, whereas this paper constructs seven hybrid models based on PSO to forecast short-term wind speed, electrical load and electricity price in the electrical power system. The effectiveness of the hybrid models is validated experimentally. The proposed models address forecasting problems in a complex system, which is of great significance and high practical value.
(2) Handling non-stationary data. One of the main features of the proposed hybrid models is the integration of existing models and algorithms, which jointly show advances over the current state of the art [44]. Because the best-performing type of neural network depends on the data source [45], the proposed models are compared with other well-known techniques on the same data sets to assess their performance effectively and efficiently. The hybrid models can handle non-stationary data well [46].
(3) High forecasting accuracy. By optimizing the threshold and weight values, the hybrid models can escape local optima and search for the global optimum. In addition, the proposed hybrid models achieve high forecasting accuracy in multi-step forecasting, as demonstrated in Experiment IV. This paper develops many types of PSO: different inner modifications of PSO are compared, and combinations of PSO with other artificial intelligence optimization algorithms are analysed. Different types of PSO should be applied in different situations.
(4) Fast computing speed. The hybrid models compute quickly, enabling efficient short-term forecasting in the electrical power system.
(5) Scientific evaluation metrics. In addition to common evaluation metrics such as the mean absolute percentage error (MAPE), mean absolute error (MAE) and mean square error (MSE), the forecasting validity degree (FVD) is introduced to evaluate model performance. This gives a more comprehensive evaluation of the developed hybrid models.
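The three common metrics listed in (5) can be computed with their standard definitions, as in the short sketch below. FVD is not defined in this section, so it is deliberately left out rather than guessed; the function names are hypothetical, and `mape` assumes that no actual value is zero.

```python
import numpy as np

def mae(actual, forecast):
    """Mean absolute error."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(a - f)))

def mse(actual, forecast):
    """Mean square error."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.mean((a - f) ** 2))

def mape(actual, forecast):
    """Mean absolute percentage error in percent (assumes no zero actual values)."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((a - f) / a)) * 100.0)
```

For actual values [10, 20, 40] and forecasts [11, 18, 40], these give MAE = 1.0, MSE = 5/3 and MAPE = 20/3 %.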
The overall structure of this paper is as follows: Sections 2 and 3 introduce time series decomposition and the optimization of the BP neural network, respectively. The three experimental simulations and their analysis results are reported in Section 4. Section 5 discusses the results, and Section 6 presents the conclusions.

Time Series Decomposition
Preprocessing the time series plays an important part in improving forecasting accuracy because it produces a smoother series. Empirical mode decomposition (EMD) stabilizes a signal by decomposing the fluctuations or trends in the real signal, generating a series of data sequences with different characteristic scales. In recent years, this approach has shown unique advantages in processing non-stationary and non-linear signals [47]. However, it suffers from the mode mixing problem. Ensemble empirical mode decomposition (EEMD), an improved algorithm, overcomes this drawback while retaining the advantages of EMD, which gives it wider applications [48]. The basic idea of EEMD is to add Gaussian white noise, whose energy is spread uniformly across scales, to the analysed signal, so that signal components at different scales are mapped onto the appropriate scales of the background white noise. Each individual trial may produce a very noisy result because white noise is added in every trial. However, the added white noise is ultimately cancelled out because it has zero mean [49], and the ensemble average is taken as the final result.
The detailed steps are as follows:
Step 1. Initialize the parameters. Set the ensemble number $N_g$ according to the standard deviation of the signal, let a denote the amplitude of the added white noise, and set m = 1.
Step 2. Decompose the noise-added signal $x_m = x + n_m$ using EMD, where $n_m$ is the added white noise with the pre-set amplitude and x is the analysed signal. The decomposed components are called intrinsic mode functions (IMFs) and are denoted $c_{i,m}$ (i = 1, 2, ...). After decomposition, the remaining non-zero signal is the residual function $r_m$.
Step 3. Repeat Step 2 with m = m + 1 until m reaches $N_g$.
Step 4. Calculate the ensemble average of the IMFs, $c_i = \frac{1}{N_g}\sum_{m=1}^{N_g} c_{i,m}$, to obtain the final result.
Remark 1. For each IMF or trend term, the white noise added in the different trials cancels out after averaging. Because the IMF or trend term in each trial retains the natural dyadic filter window, the final average also retains this property, and the mode mixing problem is solved.
The pseudo-code of EEMD, following [38], is given in Appendix A.
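The four steps above can be illustrated with a compact Python sketch. It is not the fast EEMD of [50] used in the paper: to stay short and depend only on numpy, the inner EMD uses linearly interpolated envelopes (cubic splines are the usual choice), pins both envelopes to the series end points, and stops sifting after a fixed number of passes. The names `emd`, `eemd`, `noise_amp` and `n_imfs`, and the default values, are hypothetical.

```python
import numpy as np

def _extrema(h):
    """Indices of interior local maxima and minima of a 1-D series."""
    d = np.diff(h)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def _envelope(idx, h):
    """Linear envelope through the extrema, pinned to both end points (a simplification)."""
    knots = np.concatenate(([0], idx, [len(h) - 1]))
    return np.interp(np.arange(len(h)), knots, h[knots])

def emd(x, n_imfs=4, sift_iters=10):
    """Simplified EMD: exactly n_imfs IMF rows (zeros if it stops early) plus the residue."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = np.zeros((n_imfs, residue.size))
    for k in range(n_imfs):
        maxima, minima = _extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # residue is close to monotonic: nothing left to extract
        h = residue.copy()
        for _ in range(sift_iters):  # fixed number of sifting passes as the stopping rule
            maxima, minima = _extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            h = h - 0.5 * (_envelope(maxima, h) + _envelope(minima, h))
        imfs[k] = h
        residue = residue - h
    return imfs, residue

def eemd(x, n_ensemble=50, noise_amp=0.2, n_imfs=4, seed=0):
    """Steps 1-4: average the IMFs of N_g noise-added copies x_m = x + n_m."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    a = noise_amp * x.std()                  # noise amplitude relative to the signal's std
    imf_sum = np.zeros((n_imfs, x.size))
    res_sum = np.zeros(x.size)
    for _ in range(n_ensemble):              # m = 1, ..., N_g
        x_m = x + a * rng.standard_normal(x.size)
        c_m, r_m = emd(x_m, n_imfs)
        imf_sum += c_m
        res_sum += r_m
    return imf_sum / n_ensemble, res_sum / n_ensemble  # ensemble averages

t = np.linspace(0.0, 1.0, 500)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t) + t
imfs, res = eemd(x)
print(imfs.shape)  # (4, 500)
```

In every trial the IMFs and residue add back up exactly to the noisy input, so the averaged components reconstruct x up to the mean of the added noise, which shrinks as $N_g$ grows. One simple de-noising choice, assumed here only for illustration, is to drop the first (highest-frequency) IMF, i.e. `x - imfs[0]`.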
The EEMD algorithm with the parameters presented in [50] is termed fast ensemble empirical mode decomposition, which improves the efficiency of the algorithm. It is applied to decompose each time series in this paper. Taking a short-term wind speed series as an example, Figure 2 compares the data before and after noise reduction at three observation sites. The line chart shows that the de-noised series is smoother, which can help achieve higher forecasting accuracy. Moreover, the table shows that the standard deviation after de-noising is smaller than before.

Optimization of Back Propagation Neural Network
Because the training of a back propagation (BP) neural network is unstable and sensitive to its initial weights and thresholds, this section introduces optimization algorithms to optimize the weights and thresholds of the BP network, including the standard particle swarm optimization (PSO) algorithm and seven improved PSO algorithms.
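Optimizing the BP network's weights and thresholds with PSO requires each particle's position to encode all of them. The sketch below shows one common encoding for a single-hidden-layer network, assuming a sigmoid hidden layer, a linear output layer and the mean square error on the training set as the fitness. `n_params`, `decode`, `forward` and `mse_fitness` are hypothetical helpers, and the thresholds are written as additive biases (a sign convention).

```python
import numpy as np

def n_params(n_in, n_hidden, n_out):
    """Length of a particle: input-hidden weights, hidden thresholds, hidden-output weights, output thresholds."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

def decode(particle, n_in, n_hidden, n_out):
    """Split a flat particle position into weight matrices and threshold vectors."""
    p = np.asarray(particle, dtype=float)
    i = 0
    w1 = p[i:i + n_in * n_hidden].reshape(n_in, n_hidden); i += n_in * n_hidden
    b1 = p[i:i + n_hidden]; i += n_hidden
    w2 = p[i:i + n_hidden * n_out].reshape(n_hidden, n_out); i += n_hidden * n_out
    b2 = p[i:i + n_out]
    return w1, b1, w2, b2

def forward(X, w1, b1, w2, b2):
    hidden = 1.0 / (1.0 + np.exp(-(X @ w1 + b1)))   # sigmoid hidden layer
    return hidden @ w2 + b2                          # linear output layer

def mse_fitness(particle, X, y, n_hidden):
    """Fitness to minimise: training MSE of the network encoded by `particle`."""
    n_in, n_out = X.shape[1], y.shape[1]
    pred = forward(X, *decode(particle, n_in, n_hidden, n_out))
    return float(np.mean((pred - y) ** 2))
```

A PSO variant would then minimise `mse_fitness` over vectors of length `n_params(...)`, and the best particle would initialise ordinary BP training; this division of labour is the usual PSO-BP scheme, stated here as an assumption about how the pieces fit together.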

Standard PSO Algorithm
PSO is a bio-inspired evolutionary algorithm in the swarm intelligence family, in which a population of $N_p$ particles (candidate solutions) evolves with each iteration, moving towards the optimal solution of the problem [51]. In each iteration, a new population is obtained by shifting the positions of the previous one, and the movement of each particle is affected by both its own trajectory and its neighbours' trajectories [51,52].

A standard PSO has the following features.
Feature 1. First, the solutions are highly random during the initial iterations, and this randomness decreases as the iterations increase.
Feature 2. Second, one of the advantages of PSO is its use of real-number coding, unlike the binary coding of GA.
Feature 3. Third, particles have memory: by learning from the previous generation, they can find the best solution in a short time.
Feature 4. Finally, compared with GA, the information flow is unidirectional, which means that only $g_{best}$ passes information to the other particles.
The basic steps of the standard PSO are as follows:
Step 1. Initialize the velocity and position of each particle in the population; $p_{best}$ denotes the best previous position of each particle, and $g_{best}$ denotes the global best position.
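Step 2. Calculate the objective function value (the fitness) of each particle.
Step 3. Update the velocity and position of each particle according to Equations (1) and (2):
$$v_i^{t+1} = w\,v_i^{t} + c_1 r_1 \left(p_{best,i} - x_i^{t}\right) + c_2 r_2 \left(g_{best} - x_i^{t}\right) \quad (1)$$
$$x_i^{t+1} = x_i^{t} + v_i^{t+1} \quad (2)$$
where $x_i^{t}$ and $v_i^{t}$ are the position and velocity of particle i at iteration t, w is the inertia weight, $c_1$ is a constant called the cognitive or local weight, $c_2$ is a constant called the social or global weight, and $r_1$ and $r_2$ are random numbers uniformly distributed in [0, 1].
Step 4. Calculate the fitness of each particle after the update, and update $p_{best}$ and $g_{best}$.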
The pseudo-code of the standard PSO algorithm is listed in Appendix B.
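Steps 1-4 can be condensed into the minimal Python sketch below, which minimises a generic fitness function using the velocity update v = w v + c1 r1 (p_best - x) + c2 r2 (g_best - x) and the position update x = x + v. The parameter values (w = 0.7, c1 = c2 = 1.5, 30 particles, 200 iterations), the zero initial velocities and the clipping of positions to the search box are illustrative assumptions, not settings taken from the paper; `pso` is a hypothetical name.

```python
import numpy as np

def pso(fitness, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), seed=0):
    """Minimise `fitness` with a standard global-best PSO."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # Step 1: random positions
    v = np.zeros((n_particles, dim))                    # ... and zero velocities
    f = np.array([fitness(p) for p in x])               # Step 2: fitness of each particle
    pbest, pbest_f = x.copy(), f.copy()
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Step 3: velocity update (Eq. 1), then position update (Eq. 2)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)                      # keep particles inside the search box
        # Step 4: re-evaluate and refresh pbest and gbest
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest[improved] = x[improved]
        pbest_f[improved] = f[improved]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest, float(pbest_f.min())

best, best_f = pso(lambda p: float(np.sum(p ** 2)), dim=5)
```

On the sphere function used here, the swarm should drive `best_f` close to zero; in the forecasting setting, the fitness would instead be the training error of the BP network encoded by each particle.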

Modified PSO Algorithm
This section introduces modifications of the PSO algorithm to the inertia weight, the constriction factor and the learning factors. For the inertia weight, three types of modifications are introduced: the linearly decreasing, self-adaptive and random inertia weights. Each modification changes only one part of the PSO while keeping the other parts unchanged.

Linear Decreasing Inertia Weight Particle Swarm Optimization (LDWPSO)
Definition 1. The inertia weight influences both the local and the global search of the particles. A larger inertia weight $w_{max}$ improves the global search ability, whereas a smaller inertia weight $w_{min}$ enhances the local search ability of the algorithm. The inertia weight is decreased linearly as follows:
$$w(t) = w_{max} - \frac{(w_{max} - w_{min})\,t}{t_{max}} \quad (3)$$
where t is the current iteration number, and $t_{max}$ is the maximum number of iterations.
Remark 2. According to Definition 1, we modify the inertia weight based on Equation (3). The linearly decreasing inertia weight provides a strong global search ability in the early iterations and shifts towards local search as the iterations proceed.
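The linearly decreasing schedule w(t) = w_max - (w_max - w_min) t / t_max is a one-line function. The values w_max = 0.9 and w_min = 0.4 below are commonly used defaults, assumed here rather than taken from the paper, and the function name is hypothetical.

```python
def linear_decreasing_weight(t, t_max, w_max=0.9, w_min=0.4):
    """Inertia weight that falls linearly from w_max at t = 0 to w_min at t = t_max."""
    return w_max - (w_max - w_min) * t / t_max

print([round(linear_decreasing_weight(t, 100), 2) for t in (0, 50, 100)])  # [0.9, 0.65, 0.4]
```

In a PSO loop, this value would replace the fixed w in the velocity update at iteration t.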

Self-Adaptive Inertia Weight Particle Swarm Optimization (SAPSO)
Definition 2. The self-adaptive inertia weight w is conducive to balancing the local and global search abilities and is a non-linear adjustment method. When the fitness values of the particles become uniform or the swarm approaches a local optimum, the inertia weight w increases. At the same time, a particle whose fitness is better than the average fitness receives a smaller inertia weight, so that it stays near its current position. The inertia weight is adjusted according to Equation (4), where $w_{max}$ is the maximum inertia weight, $w_{min}$ is the minimum inertia weight, $f_{min}$ is the minimum fitness, $f_{avg}$ is the average fitness, and f is the fitness of the current particle.

Remark 3. According to Definition 2, we modify the inertia weight based on Equation (4). The self-adaptive inertia weight balances local and global search better, which is beneficial when searching for the optimal particle.
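Equation (4) itself is not reproduced in this section. A widely used self-adaptive rule that matches Definition 2 for a minimisation problem is w = w_min + (w_max - w_min)(f - f_min)/(f_avg - f_min) when f <= f_avg, and w = w_max otherwise; the sketch below assumes that form, with w_max = 0.9 and w_min = 0.4 as illustrative values and a hypothetical function name.

```python
def adaptive_weights(fitness, w_max=0.9, w_min=0.4):
    """Per-particle inertia weights for a minimisation problem (assumed form of Eq. (4))."""
    f_min = min(fitness)
    f_avg = sum(fitness) / len(fitness)
    weights = []
    for f in fitness:
        if f <= f_avg and f_avg > f_min:
            # better than average: from w_min (best particle) up to w_max (average particle)
            weights.append(w_min + (w_max - w_min) * (f - f_min) / (f_avg - f_min))
        else:
            # worse than average, or a swarm whose fitness has become uniform
            weights.append(w_max)
    return weights
```

For swarm fitness values [1, 2, 3, 6] (average 3), the weights are approximately [0.4, 0.65, 0.9, 0.9]: the best particle moves least, and below-average particles keep the full inertia.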

Random Weight Particle Swarm Optimization (RWPSO)
Another way to overcome the shortcomings of the linearly decreasing inertia weight is to choose w randomly.
Definition 3. Random weight. If the best solution is found at the beginning of the evolution, a smaller inertia weight w may be generated at random, which helps the particles converge faster. If the best solution cannot be found at the beginning, the random inertia weight w can overcome the slow convergence rate. The inertia weight is changed as follows:
$$w = 0.5 + \frac{\mathrm{rand}()}{2} \quad (5)$$
where rand() is a random number uniformly distributed in [0, 1].
Remark 4. According to Definition 3, w is a random number between 0.5 and 1. This modification can bring the particle swarm closer to the optimum of the objective function, achieving higher forecasting accuracy and a faster convergence rate.
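A random inertia weight uniformly distributed between 0.5 and 1 can be drawn as w = 0.5 + rand()/2. Whether the paper draws it once per iteration or once per particle is not stated here, so the sketch simply exposes a draw function; the name `random_weight` is hypothetical.

```python
import random

def random_weight(rng=random):
    """Random inertia weight w = 0.5 + rand()/2, uniform on [0.5, 1.0)."""
    return 0.5 + rng.random() / 2.0
```

Passing a seeded `random.Random` instance makes runs reproducible; the drawn value replaces the fixed w in the velocity update.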
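For the constriction factor modification, the velocity is updated as
$$v_i^{t+1} = \varphi\left[v_i^{t} + c_1 r_1 \left(p_{best,i} - x_i^{t}\right) + c_2 r_2 \left(g_{best} - x_i^{t}\right)\right] \quad (6)$$
where φ is the constriction factor, $\varphi = \frac{2}{\left|2 - C - \sqrt{C^2 - 4C}\right|}$ with $C = c_1 + c_2 > 4$.
Remark 5. According to Definition 4, we modify the constriction factor based on Equation (6). Introducing the constriction factor helps ensure the convergence of the particles and removes the need for a boundary constraint on the velocity.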
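Clerc's constriction factor φ = 2 / |2 - C - sqrt(C^2 - 4C)| with C = c1 + c2 > 4 scales the whole velocity bracket instead of using an inertia weight. The setting c1 = c2 = 2.05 (φ ≈ 0.7298) is the one commonly paired with this factor and is used here as an assumption; the function names are hypothetical.

```python
import math
import random

def constriction_factor(c1=2.05, c2=2.05):
    """phi = 2 / |2 - C - sqrt(C^2 - 4C)|, defined only for C = c1 + c2 > 4."""
    C = c1 + c2
    if C <= 4.0:
        raise ValueError("the constriction factor requires c1 + c2 > 4")
    return 2.0 / abs(2.0 - C - math.sqrt(C * C - 4.0 * C))

def constricted_velocity(v, x, pbest, gbest, c1=2.05, c2=2.05, rng=random):
    """Velocity update scaled by phi, with no inertia weight and no v_max clamp."""
    phi = constriction_factor(c1, c2)
    return [phi * (vi + c1 * rng.random() * (pi - xi) + c2 * rng.random() * (gi - xi))
            for vi, xi, pi, gi in zip(v, x, pbest, gbest)]

print(round(constriction_factor(), 4))  # 0.7298
```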

Learning Factor Change Particle Swarm Optimization (LNCPSO)
Definition 5. The learning factors $c_1$ and $c_2$ determine how strongly the movement of each particle is influenced by its own experience and by the experience of the other particles, and thus reflect the exchange of information between particles. A larger $c_1$ lets the particles wander within a local region, whereas a larger $c_2$ can cause premature convergence to a local optimum. The learning factors are changed according to Equation (7), where t represents the current iteration number, and $c_{max}$ and $c_{min}$ denote the maximum and minimum learning factors, respectively.
Remark 6. According to Definition 5, we modify the learning factors of PSO based on Equation (7). The modification balances $c_1$ and $c_2$, which ensures a suitable convergence rate and search ability.
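Equation (7) is not reproduced in this section. A widely used time-varying schedule consistent with Definition 5 decreases c1 linearly from c_max to c_min while increasing c2 from c_min to c_max, so that early iterations favour each particle's own experience and later ones favour the swarm's. The sketch assumes this form, a maximum iteration count t_max, and the illustrative values c_max = 2.5 and c_min = 0.5; the function name is hypothetical.

```python
def learning_factors(t, t_max, c_max=2.5, c_min=0.5):
    """Assumed Eq. (7): c1 falls and c2 rises linearly over the run; c1 + c2 stays constant."""
    frac = t / t_max
    c1 = c_max - (c_max - c_min) * frac
    c2 = c_min + (c_max - c_min) * frac
    return c1, c2
```

At t = 0 this returns (2.5, 0.5) and at t = t_max it returns (0.5, 2.5).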

Combination with Simulated Annealing Algorithm (SMAPSO)
The simulated annealing (SA) algorithm accepts not only better solutions but also worse ones, with a defined probability, during the search process, which makes SA effective at avoiding local optima. The algorithm starts from an initial solution and then randomly generates a new solution from its neighbourhood.
Step 1. Initialize the location m and velocity v of the particles randomly.
Step 2. Calculate the fitness of each particle using the fitness function, and assign the fitness value of each particle to $P_i$.
Step 3. Implement the simulated annealing.
Step 4. Calculate the fitness value of each particle at the current temperature using the temperature-dependent fitness equation.
Step 5. Update the location and velocity of each particle, determine the global optimal value $P_g$, and calculate the new fitness.
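The two ingredients that simulated annealing adds to PSO can be sketched as follows: the Metropolis rule for accepting a worse solution, and a temperature-dependent fitness used to choose the particle that guides the swarm. The Step 4 equation is not reproduced in this text, so the normalised form TF_i = exp(-(f(P_i) - f(P_g)) / T) used below is an assumption based on a common SA-PSO scheme, and all function names are hypothetical.

```python
import numpy as np


def metropolis_accept(delta, temperature, rng):
    """Metropolis rule for minimisation, with delta = f(new) - f(current)."""
    if delta <= 0:
        return True                                            # better or equal: always accept
    return bool(rng.random() < np.exp(-delta / temperature))   # worse: accept with prob exp(-delta/T)


def sa_selection_probs(pbest_vals, gbest_val, temperature):
    """Temperature-dependent fitness of each personal best, normalised to probabilities."""
    tf = np.exp(-(np.asarray(pbest_vals, float) - gbest_val) / temperature)
    return tf / tf.sum()


def pick_guide(pbest, pbest_vals, gbest_val, temperature, rng):
    """Roulette-wheel choice of the personal best that replaces P_g in the velocity update."""
    probs = sa_selection_probs(pbest_vals, gbest_val, temperature)
    return pbest[rng.choice(len(probs), p=probs)]
```

At a high temperature the selection probabilities are almost uniform, so poorer particles can still guide the swarm; as the temperature is lowered after each iteration (for example, multiplied by an assumed cooling rate such as 0.9), the choice concentrates on the best particles.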

Combination with Genetic Algorithm (GAPSO)
The genetic algorithm (GA) searches with a population of chromosomes using selection, crossover and mutation operations. GA can update particles rapidly and thereby avoid the premature convergence problem of standard particle swarm optimization. Therefore, the advantages of GA can compensate for the disadvantages of particle swarm optimization [52].
Step 1. Initialize the number of particles m of group U, and set the maximum iteration to N.
Step 2. Calculate the fitness according to Equation (9), where Fitness is the fitness function, k and b are constants, $t_i$ is the actual value, and $y_i$ is the forecasted value.
Step 3. Introduce the selection, crossover and mutation operations of the genetic algorithm. The group with the better fitness is selected for the next generation, and crossover is then applied to the locations and velocities of particles i and j.
Step 4. Update the individual optimum and the global optimum of the group. Compare the current fitness of each particle with the fitness of its individual optimum pbest; if the current fitness is better, update pbest. Then compare each pbest with the global optimum of the group, gbest; if the current pbest is better than gbest, update gbest.
Step 5. Repeat the above steps until the iteration reaches its maximum value.
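The crossover operator of Step 3 is not reproduced in this text. The sketch below assumes the arithmetic crossover often used in hybrid GA-PSO schemes: child positions are random convex combinations of the parents, and child velocities take the direction of the parents' summed velocity while keeping each parent's speed. A simple Gaussian mutation is included for completeness; all names and parameters are illustrative.

```python
import numpy as np


def pso_crossover(x_i, x_j, v_i, v_j, rng):
    """Arithmetic crossover of two particles (assumed breeding rule from hybrid GA-PSO)."""
    p = rng.random()                        # random mixing weight in [0, 1)
    child_x1 = p * x_i + (1.0 - p) * x_j
    child_x2 = p * x_j + (1.0 - p) * x_i
    v_sum = v_i + v_j
    norm = np.linalg.norm(v_sum)
    if norm == 0.0:                         # opposite velocities: keep the parents' velocities
        return child_x1, child_x2, v_i.copy(), v_j.copy()
    # Children move in the direction of the summed velocity, keeping each parent's speed
    child_v1 = v_sum / norm * np.linalg.norm(v_i)
    child_v2 = v_sum / norm * np.linalg.norm(v_j)
    return child_x1, child_x2, child_v1, child_v2


def gaussian_mutation(x, rate, sigma, rng):
    """Add N(0, sigma^2) noise to each coordinate with probability `rate`."""
    mask = rng.random(x.shape) < rate
    return np.where(mask, x + rng.normal(0.0, sigma, x.shape), x)
```

Because the children lie on the segment between their parents, crossover recombines good regions of the search space, while mutation reintroduces diversity that helps avoid premature convergence.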

Back Propagation Neural Network
A back propagation neural network is a feed-forward neural network trained with the back propagation algorithm, and it is among the most widely applied neural network models [53]. Back propagation can learn and store a large number of input-output mappings without an explicit mathematical equation describing the relation. The learning rule is the steepest descent method: the weights and thresholds are adjusted continuously through back propagation so that the sum of squared errors is minimized [54].
Definition 6. The weight and threshold are two important network parameters, and they are adjusted iteratively. In the adjustment formulas, $H_j$ is the output of hidden node j; $I_h$ is the input signal from input node h; $w_{kj}(t)$ and $w_{kj}(t+1)$ are the weights between hidden node j and output node k before and after training; $u_{jh}(t)$ and $u_{jh}(t+1)$ are the weights between hidden node j and input node h before and after training; $\theta_k$ and $\theta_j$ are the thresholds of output node k and hidden node j, respectively; α and β are learning parameters between 0.1 and 0.9; and $\delta_k$ and $\sigma_j$ are the error signals of output node k and hidden node j. These error signals are computed from $T_k$, the target output of output node k, and from $O_k$ and $H_j$, the actual outputs of output node k and hidden node j. $H_j$ is computed from the $n_i$ input nodes through the sigmoid (S-shaped) activation function $f(x) = 1/(1 + e^{-x})$, and the output layer computes a linear weighted sum over the $n_h$ hidden nodes.
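The update rules can be sketched for a single training sample. This sketch assumes a logistic sigmoid in the hidden layer, a linear output layer, thresholds subtracted from the weighted sums, and the loss E = ½ Σ_k (T_k − O_k)²; under these assumptions the output error signal is δ_k = T_k − O_k and the hidden error signal is σ_j = H_j(1 − H_j) Σ_k δ_k w_kj. The function name and array layout are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bp_train_step(I, T, u, theta_h, w, theta_o, alpha=0.5, beta=0.5):
    """One steepest-descent update for a single sample.

    I: inputs (n_i,), T: targets (n_o,), u: input-to-hidden weights (n_h, n_i),
    theta_h: hidden thresholds (n_h,), w: hidden-to-output weights (n_o, n_h),
    theta_o: output thresholds (n_o,). Returns updated parameters and the loss before the update.
    """
    H = sigmoid(u @ I - theta_h)            # hidden outputs H_j
    O = w @ H - theta_o                     # linear output layer O_k
    delta = T - O                           # output error signals delta_k
    sigma = H * (1.0 - H) * (w.T @ delta)   # hidden error signals sigma_j (uses the old w)
    w_new = w + alpha * np.outer(delta, H)  # w_kj(t+1) = w_kj(t) + alpha * delta_k * H_j
    theta_o_new = theta_o - alpha * delta
    u_new = u + beta * np.outer(sigma, I)   # u_jh(t+1) = u_jh(t) + beta * sigma_j * I_h
    theta_h_new = theta_h - beta * sigma
    loss = 0.5 * float(np.sum(delta ** 2))
    return u_new, theta_h_new, w_new, theta_o_new, loss
```

Calling `bp_train_step` repeatedly on the same sample with small α and β should drive the returned loss down, which is a quick sanity check of the gradient signs.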

The Hybrid Models
The time series data in the electrical system exhibit high fluctuation, so it is necessary to denoise them in advance. The fast ensemble empirical mode decomposition introduced above is applied to decompose the data and improve the forecasting accuracy. Artificial neural networks can learn the underlying patterns in data and can be used for time series forecasting; among them, the back propagation neural network is one of the most widely applied models. However, it still has limitations in practical applications; in particular, it is difficult to determine its weights and thresholds. Accordingly, seven improved particle swarm optimization algorithms were employed to seek the optimal weights and thresholds of back propagation, and they were compared to determine the most effective hybrid model for time series forecasting. The detailed steps are listed below and shown in Figure 3.
Step 1. Fast ensemble empirical mode decomposition is employed to denoise the original time series data of the three indicators.
Step 2. Back propagation is introduced to forecast the time series data of wind speed, electricity price and electrical load.
Step 3. The seven improved particle swarm optimization algorithms are used to optimize the weights and thresholds of back propagation.
Step 4. Four metrics, including MAPE, MAE, MSE and FVD, are used to evaluate the forecasting performance of the proposed hybrid models by comparing them with a series of traditional models, models combined with other algorithms and the single models.
Step 5. The hybrid models proposed in this paper forecast time series based on historical data, and multi-step forecasting is conducted to further verify the effectiveness of the models.
Figure 3 presents the flow chart of the hybrid models. The first part of the hybrid model is FEEMD: white noise is added to the original time series data, and EMD is conducted to obtain the intrinsic mode functions (IMFs). The preprocessed time series data are then used to forecast the wind speed, electrical load and electricity price. The second part is BPNN optimized by different types of PSO, including LDWPSO, RWPSO, SAPSO, LNCPSO, CPSO, SMAPSO and GAPSO. By optimizing and updating the weights and thresholds of BP, the optimal values are obtained for forecasting.
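For a PSO variant to optimize BP, each particle must encode every weight and threshold of the network, and the fitness must measure how well the decoded network fits the training data. One possible layout is sketched below; the flat ordering, the training-MSE fitness and the function names are assumptions for illustration, not the paper's exact encoding.

```python
import numpy as np


def particle_dim(n_i, n_h, n_o):
    """Number of BP parameters: input-hidden weights, hidden thresholds, hidden-output weights, output thresholds."""
    return n_h * n_i + n_h + n_o * n_h + n_o


def decode(particle, n_i, n_h, n_o):
    """Split a flat particle into BPNN weights and thresholds (hypothetical layout)."""
    k = 0
    u = particle[k:k + n_h * n_i].reshape(n_h, n_i)
    k += n_h * n_i
    theta_h = particle[k:k + n_h]
    k += n_h
    w = particle[k:k + n_o * n_h].reshape(n_o, n_h)
    k += n_o * n_h
    theta_o = particle[k:k + n_o]
    return u, theta_h, w, theta_o


def bp_fitness(particle, X, Y, n_h):
    """Training MSE of the BPNN encoded by `particle`; lower is better."""
    n_i, n_o = X.shape[1], Y.shape[1]
    u, theta_h, w, theta_o = decode(particle, n_i, n_h, n_o)
    H = 1.0 / (1.0 + np.exp(-(X @ u.T - theta_h)))   # sigmoid hidden layer, one row per sample
    O = H @ w.T - theta_o                              # linear output layer
    return float(np.mean((Y - O) ** 2))
```

Any PSO variant that minimises a function of a real vector can then minimise `bp_fitness` over `particle_dim(n_i, n_h, n_o)` dimensions, and the best particle is decoded into the final network.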

Experimental Simulation and Results Analysis
This section demonstrates the effectiveness of the proposed hybrid models through four experiments, after introducing the data sets, data preprocessing and evaluation metrics. The four experiments compare the hybrid models with well-known traditional forecasting models and with models using different optimization algorithms, and compare forecasts over different numbers of steps.

Data Sets
This paper selects three indicators in the electrical system for forecasting: the data sets include a short-term wind speed time series, an electrical load time series and an electricity price time series. First, the short-term wind speed data have a 10-min interval and cover 1 January to 25 January at three observation sites. Data from 20 days are used to forecast the following day, so each training set contains 2880 points (20 × 144) and each testing set contains 144 points. For example, the first training set covers 1 January to 20 January, and the corresponding testing set is 21 January. Similarly, the final training set covers 5 January to 24 January, and the corresponding testing set is 25 January. To overcome the instability of back propagation, the average over the five forecasting days is taken as the final result of the hybrid model at each observation site. The electrical load and electricity price time series, collected from New South Wales (NSW), also span 1 January to 25 January. Again, data from 20 days are used to forecast one day; the training and testing sets contain 960 and 48 points, respectively, which corresponds to a 30-min interval. The data applied in this paper are all primary data obtained from the local wind farms.
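The sliding data-selection scheme can be expressed as index ranges. The sketch below assumes each series is stored as one array in time order starting on 1 January, with 144 points per day for wind speed and 48 points per day for load and price; the function name is hypothetical.

```python
def rolling_windows(n_days, points_per_day, train_days=20, n_windows=5):
    """Index ranges (train, test) for day-ahead forecasts with a window that slides by one day."""
    total = n_days * points_per_day
    windows = []
    for s in range(n_windows):
        train_start = s * points_per_day                      # window starts s days later
        train_end = train_start + train_days * points_per_day
        test_end = train_end + points_per_day                 # the following day is forecast
        if test_end > total:
            raise ValueError("not enough data for the requested windows")
        windows.append((range(train_start, train_end), range(train_end, test_end)))
    return windows


wind = rolling_windows(n_days=25, points_per_day=144)
print(len(wind[0][0]), len(wind[0][1]))  # 2880 144
```

The last window trains on 5 to 24 January and tests on 25 January; the five test-day results are then averaged.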
Figure 4 describes the data selection scheme of three indicators in the electrical power system.Figure 4a shows how to choose the data to build the model and conduct the forecasting.Figure 4b shows the forecasting values of the time series data.Figure 4c is the structure of the BPNN, including the input, hidden and output layers.Finally, Figure 4d depicts the data selection scheme for multiple-step forecasting.

Evaluation Metrics
It is crucial to apply effective evaluation metrics to assess forecasting accuracy, and this paper introduces two types of metrics: the evaluation of multiple points and the overall performance of the model.

Evaluation of Multiple Points
In addition to evaluating a single point, it is also necessary to assess the forecasting accuracy of multiple points.Three metrics, including MAE, RMSE and MAPE, are applied for this evaluation.

MAE and RMSE measure the average magnitude of the forecasting errors, and MAPE measures the average relative error. With $y_i$ the actual value, $\hat{y}_i$ the forecasted value and N the number of forecasting points, they are defined as

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}, \qquad \mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%.$$

Energies 2016, 9, 640 15 of 34

MAPE is an effective measure of forecasting error, and smaller values indicate higher forecasting accuracy. The MAPE criteria are listed in Table 2 [55]. If MAPE is smaller than 10%, the forecasting degree is excellent; if it is between 10% and 20%, the forecasting degree is good; if it is between 20% and 50%, the forecasting degree is reasonable; and if it is larger than 50%, the forecasting degree is inaccurate, which indicates that the forecasting result is very poor.
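The three multi-point metrics and the MAPE grading can be computed as below. The treatment of exact boundary values (for example, a MAPE of exactly 20%) is an assumption, since the criteria only give ranges; the function names are illustrative.

```python
import numpy as np


def mae(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs(y - yhat)))


def rmse(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))


def mape(y, yhat):
    """Mean absolute percentage error in percent; y must not contain zeros."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs((y - yhat) / y)) * 100.0)


def mape_grade(m):
    """Qualitative grade from the MAPE criteria (boundary handling assumed)."""
    if m < 10.0:
        return "excellent"
    if m <= 20.0:
        return "good"
    if m <= 50.0:
        return "reasonable"
    return "inaccurate"


y, yhat = [100.0, 200.0, 400.0], [110.0, 190.0, 400.0]
print(mae(y, yhat), rmse(y, yhat), mape(y, yhat), mape_grade(mape(y, yhat)))
```

In this small example both absolute errors are 10, but MAPE weights the error on the smaller actual value more heavily, which is why relative and absolute metrics are reported together.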

Forecasting Validity Degree
Currently, the validity of most models is evaluated using the sum of squared errors and the sum of absolute errors; in fact, these metrics cannot reflect the validity of forecasting methods well because different sequences have different dimensions. This paper introduces the forecasting validity degree, which is built from the elements of the k-order forecasting validity degree of the forecasting relative errors [56]. The validity of a forecasting method should reflect its comprehensive, average accuracy: a method with high forecasting accuracy in certain periods may not have high forecasting validity, whereas a method whose accuracy is high in all periods does. The greater the average forecasting accuracy, the higher the forecasting validity.
Assume the observed values of the indicator sequence are $\{x_t, t = 1, 2, \ldots, N\}$, m single forecasting methods are used to forecast the sequence, and $x_{it}$ is the forecasted value at time t from the ith method, $i = 1, 2, \ldots, m$, $t = 1, 2, \ldots, N$. Some concepts are defined below.
Definition 7. $e_{it}$ is the relative error of the ith method at time t ($i = 1, 2, \ldots, m$, $t = 1, 2, \ldots, N$), and $E = (e_{it})_{m \times N}$ is the matrix of relative errors.
The ith row of E is the sequence of forecasting relative errors of the ith method over time, and the tth column of E is the sequence of forecasting relative errors of all methods at time t.

Remark 8.
The value $e_{it}$ is random owing to the influence of many factors, so $\{e_{it}, t = 1, 2, \ldots, N\}$ can be regarded as a sequence of random variables.

Definition 9.
The element of the k-order forecasting validity degree of the ith method is $m_i^k = \sum_{t=1}^{N} Q_t A_{it}^k$, where k is a positive integer, $i = 1, 2, \ldots, m$, $A_{it}$ is the forecasting accuracy defined below, and $\{Q_t, t = 1, 2, \ldots, N\}$ is the discrete probability distribution at time t, with $\sum_{t=1}^{N} Q_t = 1$.

Remark 9.
When the a priori information of the discrete probability distribution is unknown, $Q_t = 1/N$ ($t = 1, 2, \ldots, N$). In this case, $m_i^1$ is the one-order forecasting validity degree and $m_i^1\left(1 - \sqrt{m_i^2 - (m_i^1)^2}\right)$ is the two-order forecasting validity degree of the ith forecasting method.
Remark 10. Definition 11 indicates that the one-order forecasting validity index is the mathematical expectation of the forecasting accuracy series. Multiplying this expectation by the difference between one and the standard deviation of the forecasting accuracy series gives the two-order forecasting validity index [57].
According to Definition 7, the forecasting accuracy of the ith method at time t, $A_{it}$, is defined from the relative error $e_{it}$; clearly, $A_{it}$ is also a random variable.
Definition 12. The forecasting validity degree of the ith method in the forecasting interval (N + 1, N + T) is given by Equation (30), where $Q_{it}$ denotes the discrete probability distribution of the forecasting accuracy $A_{it}$ of the ith method at time t in the forecasting interval. Therefore, $m_i^f$ can be regarded as the objective function of the combination forecasting model, and optimizing it gives the model in Equation (31). Because this is a linear programming problem, the forecasting validity degree can be calculated based on Equation (31).
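A minimal sketch of the one- and two-order forecasting validity degree for a single method follows. Because the relative-error and accuracy equations are not reproduced in this text, it assumes $e_t = (x_t - \hat{x}_t)/x_t$, an accuracy $A_t = 1 - |e_t|$ truncated at zero, and the uniform distribution $Q_t = 1/N$ used when no prior information is available; the two-order value is then the expectation of $A_t$ multiplied by one minus its standard deviation, as described in Remark 10. The function name is hypothetical.

```python
import numpy as np


def forecasting_validity(actual, forecast, order=2):
    """One- or two-order forecasting validity degree of a single method (assumed definitions)."""
    x = np.asarray(actual, float)
    xh = np.asarray(forecast, float)
    e = (x - xh) / x                              # relative errors e_t (x must be non-zero)
    A = np.clip(1.0 - np.abs(e), 0.0, 1.0)        # accuracy A_t, truncated at 0 (assumption)
    Q = np.full(len(x), 1.0 / len(x))             # uniform Q_t = 1/N when no prior is available
    m1 = float(np.sum(Q * A))                     # one-order: expectation of A_t
    if order == 1:
        return m1
    m2 = float(np.sum(Q * A ** 2))                # second moment of A_t
    sd = np.sqrt(max(m2 - m1 ** 2, 0.0))          # standard deviation of A_t
    return float(m1 * (1.0 - sd))                 # two-order: E[A] * (1 - sd(A))
```

Two methods with the same average accuracy of 0.9 can therefore differ: one that is 90% accurate at every point scores 0.9, while one that alternates between 100% and 80% scores 0.81, which rewards consistent accuracy across periods.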

Diebold Mariano Test
Diebold and Mariano [58,59] proposed the original Diebold Mariano (DM) test, whose essence is as follows. The forecasting errors $e_{it}$ are defined as

$$e_{it} = y_t - \hat{y}_{it}, \quad i = 1, 2, \ldots, m, \quad (32)$$

where $y_t$ is the actual time series, $\hat{y}_{it}$ is the ith competing forecast series, and m denotes the number of forecasting models. The squared-error loss function in Equation (33) is chosen because it is symmetric around the origin and penalizes larger errors more severely:

$$L(e_{it}) = e_{it}^2. \quad (33)$$

The equal accuracy hypothesis is tested to judge the forecasting performance of each model. For two competing models, the null and alternative hypotheses are

$$H_0: E[L(e_{1t})] = E[L(e_{2t})], \qquad H_1: E[L(e_{1t})] \neq E[L(e_{2t})]. \quad (34)$$

The DM test is based on the loss differential $d_t$ and its sample mean $\bar{d}$ over T forecasts, given in Equations (35) and (36), respectively:

$$d_t = L(e_{1t}) - L(e_{2t}), \quad (35)$$

$$\bar{d} = \frac{1}{T}\sum_{t=1}^{T} d_t. \quad (36)$$

The DM test statistic is therefore

$$DM = \frac{\bar{d}}{\sqrt{2\pi \hat{f}_d(0)/T}}, \quad (37)$$

where $2\pi\hat{f}_d(0)$ is a consistent estimator of the asymptotic variance of $\sqrt{T}\bar{d}$. The DM statistic asymptotically follows a standard normal distribution, so the null hypothesis is rejected at the 5% level if |DM| > 1.96; otherwise, if |DM| ≤ 1.96, the null hypothesis cannot be rejected [60].
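The DM statistic can be computed directly from two forecast series. This sketch assumes squared-error loss, estimates the long-run variance $2\pi \hat{f}_d(0)$ from the sample autocovariances of $d_t$ up to lag h − 1 (a common choice for h-step forecasts), and uses the standard normal distribution for a two-sided p-value; the function name and the argument h are illustrative.

```python
import math

import numpy as np


def dm_test(y, f1, f2, h=1):
    """Diebold Mariano test of equal accuracy for forecasts f1 and f2 of y (squared-error loss).

    Returns (DM statistic, two-sided p-value); a negative DM means f1 has the smaller loss.
    """
    y, f1, f2 = (np.asarray(a, float) for a in (y, f1, f2))
    d = (y - f1) ** 2 - (y - f2) ** 2          # loss differential d_t
    T = len(d)
    d_bar = d.mean()                           # sample mean loss differential
    dc = d - d_bar
    lrv = np.dot(dc, dc) / T                   # lag-0 autocovariance
    for k in range(1, h):                      # add autocovariances up to lag h - 1
        lrv += 2.0 * np.dot(dc[k:], dc[:-k]) / T
    if lrv <= 0:
        raise ValueError("non-positive variance estimate; the two forecasts may be identical")
    dm = d_bar / math.sqrt(lrv / T)
    phi = 0.5 * (1.0 + math.erf(abs(dm) / math.sqrt(2.0)))  # standard normal CDF at |DM|
    return float(dm), float(2.0 * (1.0 - phi))
```

Rejecting the null hypothesis when the p-value is below 0.05 is equivalent to the |DM| > 1.96 rule stated above.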

Experimental Setup
Three experiments, Experiment I, Experiment II and Experiment III, are conducted to demonstrate the effectiveness of the proposed hybrid models. The electrical load time series is the most regular, the wind speed time series is intermediate, and the electricity price time series is the most irregular, so together the three experiments test the validity of the proposed hybrid models in the electrical power system. In each experiment, three types of comparisons are conducted to evaluate the models comprehensively.
First, the hybrid FE-NPSO-BP model is compared with PSO-BP to verify the denoising performance of FEEMD. In this comparison, the modified PSO algorithms include LDWPSO, SAPSO, RWPSO, CPSO and LNCPSO; they differ mainly in how they adjust the inertia weight, constriction factor and learning factor. The combined PSO algorithms include SMAPSO and GAPSO, and NPSO refers to both the modified and the combined PSO algorithms. Second, the hybrid models are compared with several well-known forecasting models, namely ARIMA, the first-order coefficient (FAC), the second-order coefficient (SAC), the grey model (GM), the Elman neural network (ENN) and BP, to demonstrate the advantages of the FE-NPSO-BP models proposed in this paper. ARIMA, FAC and SAC are statistical models that are more applicable to time series with a linear trend, whereas GM, ENN and BP are better able to forecast non-linear trends, tolerate errors and learn adaptively. Finally, other optimization algorithms, including the standard PSO, the artificial fish swarm algorithm (AFSA), the cuckoo algorithm (CA), SA, GA and the ant swarm algorithm (ASA), are applied to optimize the threshold and weight values of BP. The aim of this comparison is to demonstrate the effectiveness of modifying or combining the PSO; the parameters of each algorithm are set according to values reported in the literature.

Experiment I
For the other indexes, different models achieve different values. Therefore, BP optimized by other single optimization algorithms is less stable, and it is difficult to find a single such method that forecasts the electrical load time series accurately. (c) Finally, among the conventional models, GM has the best forecasting performance, with MAPE values of 2.25% and 2.96% in one-step and three-step forecasting, respectively. The MAPE of ARIMA in two-step forecasting is 2.77%. In general, machine-learning-based methods perform better than traditional statistical models.
Remark 11. In one-step forecasting, the MAPE of the electrical load forecasts is approximately 2% because this time series is more regular. The results show that the modified and combined particle swarm optimization algorithms perform similarly, and FE-NPSO-BP outperforms both the conventional models and BP optimized by other algorithms.

Experiment II
Table 4 shows the forecasting results for the wind speed time series, which is less regular than the electrical load time series; forecasting wind speed is therefore a challenging task. This section presents the forecasting results of the hybrid models put forward in this paper. The findings are listed below:
(a) For the short-term wind speed time series, the combined PSO algorithms achieve better forecasting accuracy at all forecasting steps. GAPSO has the best forecasting results, with a MAPE of 3.18% and an FVD of 0.905 in one-step forecasting. In comparison, PSO-BP, which does not use FEEMD, has the worst performance: its MAPE is 0.57% higher than that of GAPSO, because FEEMD denoises the original time series and makes the processed data smoother. The combined algorithms are therefore more effective in forecasting short-term wind speed, because GA has a stronger ability to search for the global optimum and achieves a faster rate of convergence.
(b) Among the other types of optimization algorithms, FE-CA-BP and FE-AFSA-BP have the best forecasting performance. In comparison, the proposed hybrid model FE-GAPSO-BP improves the forecasting accuracy by 0.06%, 0.18% and 0.17%. The forecasting differences among the different types of optimization algorithms are not significant.
(c) Compared with the traditional forecasting methods, BP, ENN and GM achieve the best MAPE values, at 3.31%, 4.55% and 5.31%, respectively. Although ARIMA has better MAE values in one- and two-step forecasting, its other indexes, such as MSE and FVD, are worse. BP has a better FVD, but its forecasting performance is worse than that of GAPSO because the output of a single BP is not stable and has relatively low fault tolerance.
Remark 12. The proposed FE-NPSO-BP models outperform the other traditional forecasting models, and artificial intelligence neural networks perform better than traditional statistical models. The MAPE of short-term wind speed forecasting is approximately 3.2%.

Experiment III
Experiment III was designed to verify the effectiveness of the proposed hybrid models using the electricity price time series, which is the most irregular of the three time series. If the hybrid models perform well on this series, it can be concluded that they are effective for forecasting in the electric power system. Table 5 shows the comparison results.
(a) For the electricity price time series, SMAPSO has the lowest MAPE and the highest FVD. The MAPE values of GAPSO are similar to those of SMAPSO, which indicates that the combined PSO algorithms are more effective in forecasting the electricity price. In one-step forecasting, SMAPSO improves the forecasting accuracy by 0.66%.
(b) Among the different types of algorithms, FE-SA-BP has the best MAPE in one-step forecasting (5.29%), FE-AFSA-BP in two-step forecasting (5.68%), and FE-PSO-BP in three-step forecasting (6.17%). BP optimized by the NPSOs outperforms the other algorithms, including AFSA, CA, GA, PSO, ACA and SA. Therefore, the combined algorithms can inherit the advantages of the single ones, enhancing both the ability to search for the global optimum and the convergence rate.
(c) Finally, consistent with the results for the electrical load and wind speed time series, the machine-learning-based algorithms forecast better than conventional algorithms such as ARIMA, FAC and SAC, as their MAE, MSE and FVD values are all better as well.
Remark 13. Based on this comparison, the hybrid models optimized by the improved particle swarm optimization algorithms perform better than those optimized by other types of optimization algorithms, which further demonstrates the effectiveness of the model. Moreover, the difference between one-step and three-step forecasting is small and within an acceptable range, so the proposed hybrid models are suitable for multi-step forecasting. The MAPE of electricity price forecasting is approximately 5%.
Figure 5 compares the results of the hybrid models.The bar chart represents the MAPE, and the line chart represents the FVD.The figure shows that for the electrical load time series, FE-LNCPSO-BP has the lowest MAPE and highest FVD.For the wind speed time series, FE-GAPSO-BP achieves the best MAPE and FVD.For the electricity price time series, FE-SMAPSO-BP has the lowest MAPE and the best FVD.The figure clearly shows the performance of each forecasting model.Furthermore, the table presents the forecasting MAPE for the three steps.We find that the MAPE differences between one-step and three-step forecasting for the three time series data are 0.73%, 1.83% and 1.22%, respectively, which is acceptable.

Discussion
This section aims to present a deeper discussion of the experimental results, including statistical models, artificial intelligence neural networks, each part in the hybrid model, forecasting steps and running time.

Statistical Model
Traditional statistical models include AR, ARMA, ARIMA, FAC and SAC. ARIMA extends ARMA by differencing the series, which allows it to handle non-stationary data, and it generally has higher forecasting accuracy. In this paper, only ARIMA is applied. For the short-term wind speed and electrical load time series, it achieves higher forecasting accuracy than BP, but for the electricity price time series its forecasting performance is worse. The basic form of ARIMA is ARIMA (p, q, d). Once p and q are set, the model can be fitted by least squares regression to find the parameter values that minimize the error term, and the Akaike information criterion (AIC) is applied to judge whether p and q are the best choices [55]. In our experiments, ARIMA (3,2,1) is used for both the short-term wind speed and the electricity price, and ARIMA (3,3,1) is used for the electrical load time series. The other two models are FAC and SAC. FAC continually corrects its coefficient values according to changes in the data to achieve the best forecasting results, and SAC is an improvement on FAC. The experiments above reveal that their forecasting accuracy is worse than that of ARIMA. Although traditional statistical models are widely used in electrical system forecasting, they are more applicable to linear trends; in other words, the original data should be smooth, without high fluctuation.
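The least-squares fitting and AIC-based order selection described above can be illustrated on the autoregressive part of the model. The sketch below fits AR(p) models by ordinary least squares on a common sample and picks the order with the lowest Gaussian AIC, $n \log \hat{\sigma}^2 + 2(p + 1)$. It deliberately omits differencing and moving-average terms, which full ARIMA estimation (typically done with a dedicated statistics library) would include; the function names and the maximum order are assumptions.

```python
import numpy as np


def fit_ar_ls(x, p):
    """Ordinary least-squares fit of x_t = c + a_1 x_{t-1} + ... + a_p x_{t-p} + e_t.

    Returns the coefficient vector [c, a_1, ..., a_p] and the residual variance.
    """
    x = np.asarray(x, float)
    Y = x[p:]
    lags = [x[p - k:len(x) - k] for k in range(1, p + 1)]   # column k holds x_{t-k}
    X = np.column_stack([np.ones(len(Y))] + lags)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return beta, float(np.mean(resid ** 2))


def select_ar_order(x, max_p=6):
    """Pick the AR order with the lowest AIC = n log(sigma^2) + 2(p + 1) on a common sample."""
    x = np.asarray(x, float)
    n = len(x) - max_p
    best_p, best_aic = None, np.inf
    for p in range(1, max_p + 1):
        _, s2 = fit_ar_ls(x[max_p - p:], p)   # every fit uses the same n targets x[max_p:]
        aic = n * np.log(s2) + 2 * (p + 1)
        if aic < best_aic:
            best_p, best_aic = p, aic
    return best_p
```

Fitting every candidate on the same targets keeps the AIC values comparable; without this, higher orders would be scored on fewer observations.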

Artificial Intelligence Neural Network
As a comparison, artificial intelligence neural networks are suitable for forecasting non-linear data trends, and this paper contrasts the hybrid models with GM, ENN and BPNN. In all three experiments, the proposed hybrid models achieve the best forecasting accuracy. For the electrical load time series, GM forecasts better than the other comparison models, with MAPE values of 2.34%, 2.79% and 2.96%. However, BPNN forecasts worse than traditional statistical models such as ARIMA, FAC and SAC because the BPNN network is not stable. Therefore, to improve the stability of BPNN and thus its forecasting accuracy, this paper introduces improved PSO algorithms to optimize the weights and thresholds of BPNN. The experiments above show that the optimized BPNN combines the advantages of each single model and improves the forecasting accuracy. The three data sequences in the electrical system all have irregular distributions with high fluctuation and substantial noise, so NNs are more applicable for forecasting these indicators owing to their training and testing mechanism and high error tolerance. We also explored the influence of the training-to-verification ratio on the forecasting results: in addition to ratios of 15:1 and 5:1, we set the ratio to 3:1, 6:1, 10:1 and 12:1. The results show no close relationship between the training-to-verification ratio and the forecasting accuracy. In summary, although NNs still have some limitations, they are the most suitable models for forecasting the indicators in the electrical system once the network parameters are optimized and the original data are preprocessed.

Significance of Forecasting Results
In this part, the significance of the forecasting performance of the proposed models is tested using the DM test. The pairwise comparisons of the forecasting models are summarized in Table 6. The null hypothesis is that there is no difference between the performances of two forecasting models, and the alternative hypothesis is that the observed difference is significant. For the electrical load time series, the most suitable model is FE-LNCPSO-BP, so it is compared with the other models. The differences between FE-LNCPSO-BP and FE-SAPSO-BP, FE-SMAPSO-BP, FE-AFSA-BP and FE-CA-BP are not significant, which indicates that both combined and modified PSO can forecast the electrical load time series accurately, whereas the differences between FE-LNCPSO-BP and the other compared models are significant. For the wind speed time series, FE-GAPSO-BP is the most suitable model, and the DM test shows that its differences from FE-LDWPSO-BP, FE-RWPSO-BP, FE-CPSO-BP, FE-SMAPSO-BP, FE-AFSA-BP and FE-SA-BP are not significant. Therefore, the proposed hybrid models are more effective than the other models. Finally, for electricity price forecasting, FE-GAPSO-BP is the only model with no significant difference from FE-SMAPSO-BP, and all other models differ significantly from FE-SMAPSO-BP. These results suggest that the combined models are more suitable than the modified models for less regular time series data.

Discussion of the Effectiveness of Fast Ensemble Empirical Mode Decomposition
The data in the electrical system are irregular, with high fluctuation and noise, so it is important to denoise the original data sequences before forecasting. In this paper, FEEMD is applied to denoise the original time series data. Comparing the forecasting results of PSO-BP with those of FE-PSO-BP verifies the effectiveness of FEEMD, which was found to improve the forecasting accuracy: FEEMD improves the MAPE by 0.96%, 0.11% and 0.18% for the electrical load, short-term wind speed and electricity price time series, respectively. In addition, the FVD increases by 0.038, 0.039 and 0.040, respectively. Therefore, FEEMD improves both the forecasting accuracy and the FVD. Furthermore, this paper removes the first two IMFs and uses the remaining ones for forecasting. To explore the effectiveness of FEEMD further, we also conducted experiments that remove the first three, four and five IMFs to judge whether the forecasting results are affected. The results demonstrate that the forecasting accuracy is best when the first two IMFs are removed, as in our experiments.
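Once the IMFs are available, the denoising step described above amounts to discarding the first (highest-frequency) IMFs and summing the rest with the residue. The sketch assumes the IMFs are stored as rows of an array ordered from highest to lowest frequency; the FEEMD decomposition itself is not shown, the function name is hypothetical, and the synthetic components merely stand in for real IMFs.

```python
import numpy as np


def drop_high_frequency_imfs(imfs, residue, n_drop=2):
    """Rebuild a denoised series from all IMFs except the first n_drop, plus the residue."""
    imfs = np.asarray(imfs, float)
    if not 0 <= n_drop <= imfs.shape[0]:
        raise ValueError("n_drop must be between 0 and the number of IMFs")
    return imfs[n_drop:].sum(axis=0) + np.asarray(residue, float)


# Synthetic stand-ins: two fast "noise-like" IMFs, one slow IMF and a linear trend
t = np.linspace(0.0, 10.0, 500)
imfs = np.vstack([0.3 * np.sin(40 * t), 0.5 * np.sin(15 * t), np.sin(2 * t)])
residue = 0.1 * t
denoised = drop_high_frequency_imfs(imfs, residue)   # equals sin(2t) + 0.1t here
```

Repeating the reconstruction with `n_drop` set to 3, 4 or 5 reproduces the kind of sensitivity check described above.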

Comparison of Different Types of Particle Swarm Optimization Algorithms
Improvements to the particle swarm optimization algorithm fall into two categories: one introduces advanced theory into the particle swarm optimization algorithm, and the other combines the particle swarm optimization algorithm with other intelligent optimization algorithms. In the first category, as discussed above, the earliest modification is to adjust the inertia weight. The linearly decreasing inertia weight particle swarm optimization algorithm helps to obtain the best solution; however, it still has three drawbacks. First, the linearly decreasing inertia weight reduces the convergence rate of the algorithm. Second, the algorithm is prone to falling into a local optimum, because its local search ability is weak at the beginning of the run and its global search ability is weak at the end. Third, it is difficult to determine the maximum number of iterations in advance, and because the weight schedule depends on this number, the regulatory function of the algorithm is affected.
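A minimal linearly decreasing inertia weight PSO (LDWPSO) is sketched below to make the schedule concrete. The bounds w_max = 0.9 and w_min = 0.4, the learning factors c1 = c2 = 2 and the velocity clamp are common textbook choices and are assumptions here, not the paper's settings; the function name is hypothetical. Note that the schedule needs `max_iter` up front, which is the third drawback listed above.

```python
import random


def ldw_pso(objective, dim, bounds, n_particles=20, max_iter=100,
            w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, seed=0):
    """Minimise `objective` with a linearly decreasing inertia weight PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    v_max = 0.2 * (hi - lo)  # velocity clamp (a common heuristic, assumed)
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p_best = [xi[:] for xi in x]
    p_val = [objective(xi) for xi in x]
    g_idx = min(range(n_particles), key=p_val.__getitem__)
    g_best, g_val = p_best[g_idx][:], p_val[g_idx]
    for t in range(max_iter):
        # w falls linearly from w_max (global search) to w_min (local search).
        w = w_max - (w_max - w_min) * t / max(1, max_iter - 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (p_best[i][d] - x[i][d])
                           + c2 * r2 * (g_best[d] - x[i][d]))
                v[i][d] = max(-v_max, min(v_max, v[i][d]))
                x[i][d] = max(lo, min(hi, x[i][d] + v[i][d]))
            f = objective(x[i])
            if f < p_val[i]:  # update personal and global bests
                p_best[i], p_val[i] = x[i][:], f
                if f < g_val:
                    g_best, g_val = x[i][:], f
    return g_best, g_val
```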
Thus, to balance local and global search ability, non-linear adjustments of the inertia weight are incorporated, including self-adaptive inertia weight particle swarm optimization and random weight particle swarm optimization. In the former, the inertia weight changes with the fitness value. In the latter, choosing the inertia weight randomly overcomes the disadvantages of the linearly decreasing method described above. In addition to the inertia weight, the learning factors also play a significant role in the efficiency of the particle swarm optimization algorithm. Because the learning factors affect the flying velocity of each particle, introducing a constriction factor helps to control that velocity and, compared with adjusting the inertia weight alone, enhances the local search ability of the particles.
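The non-linear weight strategies and the constriction factor can be illustrated with a few helper functions. The exact formulas used in the paper are not given in this section, so the forms below are assumptions taken from common PSO variants: a random weight w = 0.5 + rand/2, a fitness-based adaptive weight for minimisation, and the Clerc-Kennedy constriction coefficient with φ = c1 + c2 > 4. All function names are hypothetical.

```python
import math
import random


def random_inertia_weight(rng: random.Random) -> float:
    # Random weight in [0.5, 1.0): w = 0.5 + rand / 2 (assumed form).
    return 0.5 + rng.random() / 2


def adaptive_inertia_weight(f, f_min, f_avg, w_min=0.4, w_max=0.9):
    """Fitness-based self-adaptive weight for minimisation (assumed form).

    Better-than-average particles get a smaller w (local refinement);
    worse-than-average particles keep w_max (exploration).
    """
    if f > f_avg:
        return w_max
    if f_avg == f_min:
        return w_min
    return w_min + (w_max - w_min) * (f - f_min) / (f_avg - f_min)


def constriction_factor(c1=2.05, c2=2.05):
    """Clerc-Kennedy coefficient chi = 2 / |2 - phi - sqrt(phi^2 - 4 phi)|."""
    phi = c1 + c2
    if phi <= 4:
        raise ValueError("phi = c1 + c2 must exceed 4")
    return 2 / abs(2 - phi - math.sqrt(phi * phi - 4 * phi))


def cpso_velocity(v, x, p_best, g_best, c1, c2, rng):
    """v' = chi * (v + c1 r1 (p - x) + c2 r2 (g - x)), per dimension."""
    chi = constriction_factor(c1, c2)
    return [chi * (vd + c1 * rng.random() * (pd - xd) + c2 * rng.random() * (gd - xd))
            for vd, xd, pd, gd in zip(v, x, p_best, g_best)]
```

With c1 = c2 = 2.05, the coefficient is about 0.7298, which scales down every velocity update rather than only its momentum term.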
The second approach combines other algorithms with the particle swarm optimization algorithm to overcome the disadvantages of a single algorithm. The combination with simulated annealing is simple to implement and improves the ability to find the global optimum while increasing both the convergence rate and the accuracy of the algorithm. The combination of GA and particle swarm optimization likewise speeds up convergence and improves convergence accuracy.
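One common way to build a GA-PSO hybrid is to apply GA operators to the swarm after each PSO position update. The operators used in the paper are not described in this section; the sketch below assumes arithmetic crossover on randomly paired particles and Gaussian mutation clipped to the search bounds, and `ga_perturb_swarm` is a hypothetical name.

```python
import random


def ga_perturb_swarm(positions, bounds, pc=0.6, pm=0.05, sigma=0.1, rng=None):
    """Apply arithmetic crossover and Gaussian mutation to PSO positions.

    Returns a new list of positions; the input list is not modified.
    """
    if rng is None:
        rng = random.Random(0)
    lo, hi = bounds
    new = [p[:] for p in positions]
    order = list(range(len(new)))
    rng.shuffle(order)
    # Arithmetic crossover on consecutive pairs of the shuffled order.
    for a, b in zip(order[0::2], order[1::2]):
        if rng.random() < pc:
            lam = rng.random()
            pa, pb = new[a], new[b]
            new[a] = [lam * u + (1 - lam) * v for u, v in zip(pa, pb)]
            new[b] = [(1 - lam) * u + lam * v for u, v in zip(pa, pb)]
    # Gaussian mutation with step scaled to the search range, then clipping.
    for p in new:
        for d in range(len(p)):
            if rng.random() < pm:
                p[d] = min(hi, max(lo, p[d] + rng.gauss(0.0, sigma * (hi - lo))))
    return new
```

Arithmetic crossover keeps each pair's coordinate sum unchanged, so it recombines particles without shifting the swarm's centre, while mutation supplies the diversity that helps escape local optima.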
The experimental results demonstrate that combining particle swarm optimization with other algorithms is more effective when the MAPE is approximately 4%, that is, for less regular series. However, when the forecasting accuracy is high, as for the electrical load time series, there is little difference among the different types of algorithms.

Selection of the Hidden and Input Layers for Back Propagation
This paper applies the BPNN, one of the most common and effective artificial neural networks in practical applications, to forecasting in the electrical system. However, the BPNN has some drawbacks: for example, its outputs are unstable because its learning and memory are unstable, and its convergence rate is slow. Therefore, two key sets of parameters, the weights and thresholds, are optimized by the optimization algorithms in this paper to obtain a more effective hybrid model. Moreover, choosing the size of the hidden layer is a complicated problem that requires experience and repeated experiments, as there is no ideal analytical formula for it. The numbers of input nodes and hidden nodes of the BP neural network are directly related to the forecasting accuracy. When they are too small, the network cannot extract enough information to learn; when they are too large, the training time increases and the error may fail to reach its optimum. A large number of input and hidden nodes also leads to low fault tolerance, making it difficult for the network to recognize samples that were not used in training. Furthermore, overfitting cannot be ignored: as the network grows, its error on unseen data increases and its generalization ability decreases. Therefore, it is crucial to select appropriate numbers of input and hidden nodes. In our experiments, the listing (enumeration) technique is applied to choose them. Table 7 shows that with four input nodes and six hidden nodes, the BPNN has the best forecasting accuracy for the short-term wind speed time series. With three input nodes and nine hidden nodes, the BPNN achieves its highest forecasting accuracy for the electrical load time series. With three input nodes and six hidden nodes, the forecasting error of the BPNN for the electricity price is the smallest.
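The listing technique amounts to training one network per candidate size and keeping the size with the lowest held-out MAPE. The sketch below assumes a single hidden layer with sigmoid units, a linear output, plain gradient-descent BP training, min-max scaling to [0.1, 0.9] and the last `n_test` points as the hold-out set; none of these settings come from the paper, and the PSO optimisation of the initial weights and thresholds is omitted. All function names are hypothetical.

```python
import math
import random


def make_samples(series, n_in):
    """Sliding window: n_in lagged values -> next value."""
    return [(series[t - n_in:t], series[t]) for t in range(n_in, len(series))]


def train_bp(samples, n_in, n_hidden, epochs=200, lr=0.05, seed=0):
    """One-hidden-layer BP network (sigmoid hidden, linear output), SGD."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden thresholds
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0  # output threshold

    def forward(x):
        h = [1 / (1 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
             for row, b in zip(w1, b1)]
        return h, sum(w * hj for w, hj in zip(w2, h)) + b2

    for _ in range(epochs):
        for x, y in samples:
            h, out = forward(x)
            err = out - y  # derivative of 0.5 * err**2 w.r.t. the output
            for j in range(n_hidden):
                grad_h = err * w2[j] * h[j] * (1 - h[j])  # uses w2[j] before update
                w2[j] -= lr * err * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
            b2 -= lr * err
    return lambda x: forward(x)[1]


def mape(actual, pred):
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)


def select_sizes(series, input_sizes, hidden_sizes, n_test):
    """Enumerate (n_in, n_hidden) pairs; return the pair with lowest MAPE."""
    lo, hi = min(series[:-n_test]), max(series[:-n_test])

    def scale(v):
        return 0.1 + 0.8 * (v - lo) / (hi - lo)

    def unscale(s):
        return lo + (s - 0.1) * (hi - lo) / 0.8

    scaled = [scale(v) for v in series]
    results = {}
    for n_in in input_sizes:
        samples = make_samples(scaled, n_in)
        train, test = samples[:-n_test], samples[-n_test:]
        for n_hidden in hidden_sizes:
            model = train_bp(train, n_in, n_hidden)
            preds = [unscale(model(x)) for x, _ in test]
            actual = [unscale(y) for _, y in test]
            results[(n_in, n_hidden)] = mape(actual, preds)
    best = min(results, key=results.get)
    return best, results
```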

Steps of Forecasting
To verify the effectiveness of the proposed hybrid models, this paper conducts multi-step forecasting, including one-step, two-step and three-step forecasting, for the three indicators in the electrical system. Table 8 compares the multi-step forecasting accuracy. For the electrical load time series, the accuracy of one-step forecasting is higher by 0.56% and 0.73% than that of three-step and two-step forecasting, respectively. For the wind speed time series, the differences between one-step forecasting and two-step and three-step forecasting are 1.08% and 1.83%, respectively. For the electricity price time series, the corresponding differences are 0.56% and 1.22%. In other words, the difference between one-step and three-step forecasting is within 2%, which is acceptable. This indicates that the proposed hybrid models are suitable for multi-step forecasting. Optimizing the parameters of the BPNN allows the forecasting model to obtain more accurate results, with the strengths of each component compensating for the shortcomings of the others, which demonstrates the superiority of the proposed hybrid models [61].
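This section does not say how the two-step and three-step forecasts are produced. One common strategy, assumed here, is iterated (recursive) forecasting: the one-step model's prediction is appended to the input window and fed back, so errors from earlier steps propagate into later ones, which is consistent with accuracy falling as the horizon grows. The function name is hypothetical.

```python
from typing import Callable, Sequence


def recursive_forecast(model: Callable[[Sequence[float]], float],
                       history: Sequence[float], n_in: int,
                       steps: int) -> list[float]:
    """Iterated multi-step forecasting with a one-step model.

    Each prediction is appended to the input window and the oldest value
    is dropped, so step k uses the forecasts of steps 1..k-1 as inputs.
    """
    window = list(history[-n_in:])
    forecasts = []
    for _ in range(steps):
        y_hat = model(window)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]
    return forecasts
```

A direct strategy, which trains a separate model for each horizon, would avoid this feedback at the cost of training more networks.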

Running Time
Table 9 compares the running times of the experiments using different algorithms on all of the data sets, run on 64-bit Windows 8.1 with a 2.5 GHz Intel Core i5-4200U and 4 GB RAM. FE-RWPSO-BP has the shortest running time, at 71.33 s, and FE-CA-BP has the longest, at 161.49 s. In comparison, the running time of the FE-NPSO-BP models is within two minutes, which demonstrates the computational efficiency of the hybrid models. Because a run finishes well within the data sampling interval, the models are applicable to forecasting short-term wind speed at a 10 min interval and electrical load and electricity price at a 30 min interval.

Conclusions
Electrical power systems play an important part in the planning of national and regional economic development. Three key indicators in the electrical power system are forecasted here: the short-term wind speed, electrical load and electricity price. All of these indicators contain a large amount of information related to the generation, distribution and trading of electricity. However, accurate forecasting is difficult because of the high fluctuation and noise in the original data sequences. This paper proposes a series of hybrid models, called FE-NPSO-BP, to explore how to attain better forecasting performance. The electrical load time series is the most regular; therefore, it is easier to forecast, with a MAPE of approximately 2%. In comparison, the electricity price is the most irregular; thus, its forecasting accuracy is lower, with a MAPE above 4%. The wind speed time series lies between the other two. In our experiments, we found that combined PSO algorithms are more effective than modified PSO algorithms for irregular time series data. However, when the time series data tend to be regular, both combined and modified PSO algorithms are suitable for forecasting. In one-step forecasting, GAPSO, LNCPSO and SMAPSO are the most suitable algorithms for the short-term wind speed, electrical load and electricity price time series, respectively. Moreover, the presented models are designed to be easily parallelizable; thus, they can perform the learning process over a large data set in a limited amount of time. Both the forecasting accuracy and the running time of the hybrid models demonstrate their effectiveness in time series forecasting for electrical power systems.

Figure 1. Top 10 countries in cumulative installed wind power capacity in 2014.

Figure 2. The denoising process of original time series data.

3.2.4. Definition 4.
Constriction Factor Particle Swarm Optimization (CPSO). Particles can control their flying speed effectively, allowing the algorithm to reach a balance between global and local exploration. The velocity equation is described below:

The simulated annealing procedure is as follows:
(a) Set the initial temperature $T_k$ ($k = 0$), and generate the initial solution $x_0$.
(b) Repeat the following steps at temperature $T_k$ until $T_k$ reaches a balanced state: generate a new solution $x'$ in the neighborhood of $x$; calculate the objective function values $f(x')$ and $f(x)$; calculate the difference $\Delta f = f(x') - f(x)$; and accept $x'$ if $\min\{1, \exp(-\Delta f / T_k)\} > \mathrm{random}[0, 1]$.
(c) Annealing: $T_{k+1} = C T_k$, $k \leftarrow k + 1$. If the convergence condition is met, the annealing process ends; otherwise, return to (b).
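Steps (a) to (c) can be sketched for a one-dimensional objective as follows. The "balanced state" at each temperature is approximated by a fixed number of inner iterations, and the neighbourhood is a uniform step around the current solution; both are assumptions, as are the default parameter values and the function name.

```python
import math
import random


def simulated_annealing(f, x0, t0=10.0, cooling=0.95, t_min=1e-3,
                        n_inner=50, step=1.0, seed=0):
    """Minimise f following steps (a)-(c); returns the best point found."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)          # (a) initial temperature and solution
    best, f_best = x, fx
    t = t0
    while t > t_min:           # convergence condition: temperature floor
        for _ in range(n_inner):   # (b) fixed-length stand-in for equilibrium
            x_new = x + rng.uniform(-step, step)  # neighbour of x
            f_new = f(x_new)
            delta = f_new - fx
            # Metropolis rule min{1, exp(-delta/T)} > U[0, 1); the delta <= 0
            # branch always accepts and avoids overflow in exp.
            if delta <= 0 or math.exp(-delta / t) > rng.random():
                x, fx = x_new, f_new
                if fx < f_best:
                    best, f_best = x, fx
        t *= cooling           # (c) annealing: T_{k+1} = C * T_k
    return best, f_best
```

High temperatures let the search accept worse solutions and leave local optima; as T shrinks, the rule accepts almost only improvements.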

Figure 3. Flow chart of the hybrid models proposed in this paper.

Figure 4. Data selection scheme for three indicators in the electrical power system.

Figure 5. Comparison of forecasting results in three experiments.

Table 1. Summary of forecasting methods.
$t = 1, 2, \dots, N$. In effect, $m_i^k$ is the $k$-order origin moment of the forecasting accuracy sequence $\{A_{it}, t = 1, 2, \dots, N\}$ of the $i$th forecasting method.
Definition 11. When $H(x) = x$ is a one-element continuous function, $H(m_i^1) = m_i^1$ is the first-order forecasting validity of the $i$th forecasting method; when H

Table 3 shows the forecasting results for the electrical load time series obtained by the hybrid models with different improved PSO algorithms, by conventional models, and by BP optimized by other optimization algorithms. The results show the following: (a) For the FE-NPSO-BP models in one-step forecasting, LNCPSO has the best MAPE, MAE and FVD, at 2.08%, 85.468 and 0.926, respectively, while GAPSO achieves a better MSE. For three-step forecasting, RWPSO has the lowest MAPE, at 2.72%. The difference in forecasting error between combined PSO and modified PSO is small; in summary, modified PSO and combined PSO give similar results when forecasting the electrical load time series. (b) For BP optimized by different algorithms, in one-step forecasting, FE-CA-BP has the lowest MAPE, at 2.18%. FE-PSO-BP and FE-ACA-BP have the best MAPE values, at 2.69% and 2.90%, respectively.
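The four criteria reported in Table 3 can be computed as below. MAE, MSE and MAPE are standard. The FVD definition is truncated in this section, so the sketch assumes the form commonly used in the forecasting literature: an accuracy sequence $A_t = 1 - |e_t / y_t|$ (set to 0 when the relative error exceeds 1), equal weights $1/N$ for the origin moments $m^1, m^2$, and the second-order validity $m^1(1 - \sqrt{m^2 - (m^1)^2})$. The function name is hypothetical.

```python
import math


def error_metrics(actual, pred):
    """MAE, MSE, MAPE (%) and an assumed second-order FVD."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    # Forecasting accuracy sequence A_t, truncated at 0 (assumed definition).
    acc = [max(0.0, 1 - abs(e / a)) for e, a in zip(errors, actual)]
    m1 = sum(acc) / n                 # first-order origin moment
    m2 = sum(x * x for x in acc) / n  # second-order origin moment
    # Penalise the spread of A_t; max() guards against tiny negative rounding.
    fvd = m1 * (1 - math.sqrt(max(0.0, m2 - m1 * m1)))
    return {"MAE": mae, "MSE": mse, "MAPE": mape, "FVD": fvd}
```

Under this definition, a higher FVD means forecasts that are both accurate on average and consistent over time, which is why it complements MAPE.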

Table 3. Comparison of hybrid models with different optimization algorithms and conventional models for electrical load time series.

Table 4. Comparison of hybrid models with different optimization algorithms and conventional models for wind speed time series.

Table 5. Comparison of hybrid models with different optimization algorithms and conventional models for electricity price time series.

Table 6. Summary of DM test (values of DM are absolute values).

Table 7. Selection of input layers and hidden layers of BP neural network (MAPE).

Table 8. Comparison of multi-step forecasting accuracy.

Table 9. Comparison of performance times for hybrid models.