A Hybrid Double Forecasting System of Short Term Power Load Based on Swarm Intelligence and Nonlinear Integration Mechanism

Abstract: Accurate and reliable power load forecasting not only plays an important role in the management and stable operation of the smart grid but also brings environmental benefits and economic dividends. Accurate load point forecasting provides a guarantee for the daily operation of the power grid, and effective interval forecasting can further quantify the uncertainty of power load to provide dependable and precise load information. However, most previous work focuses on deterministic point prediction of power load and rarely considers interval prediction, so the resulting forecasts are not comprehensive. In this study, a new hybrid double load forecasting system comprising a point forecasting module and an interval forecasting module is developed to address the incomplete analysis in existing research. The point forecasting module adopts a nonlinear integration mechanism based on a Back Propagation (BP) network optimized by the Multi-objective Evolutionary Algorithm based on Decomposition (MOEA/D) to improve the accuracy of point prediction. A fuzzy clustering interval prediction method based on the classification of different data features is proposed, which provides an effective tool for load uncertainty analysis. The experimental results show that the system not only predicts power load accurately but can also analyze its uncertainty, making it an effective technology for power system planning.


Introduction
Power load forecasting is the foundation and a key task of power system management and control [1]. It is widely applied in energy supervision, unit commitment and load control [2]. High-precision load prediction ensures the secure and stable operation of the power system [3]. Therefore, it is essential to enhance the deliverability and prediction precision of the smart grid [4]. However, because of many uncertain factors such as climate variation, economic growth, public activities and national policies, the accuracy of power load forecasts often falls short of expectations [5]. In view of this, countries around the world are seeking effective methods to improve the accuracy of load forecasting [6].
In addition, the combination of ultra-short-term load forecasting (USTLF), short-term load forecasting (STLF), and medium- and long-term load forecasting (LTLF) is significant for the safe and economic operation of the power system [7]. USTLF and STLF are the necessary basis of power grid dispatching, and reducing the error of STLF is an effective way to strengthen the supervision of the power system [8]. Accurate prediction of power load can save considerable time in managing the power grid and avoid major changes [9]. Therefore, it is essential to establish a load prediction model with high forecasting ability. To achieve precise and stable STLF, researchers have adopted many methods, including (a) statistical methods, (b) artificial intelligence methods and (c) hybrid methods [10].
In the early stage of power load forecasting, traditional statistical methods were commonly applied, such as regression [11], exponential smoothing [12], the Autoregressive Moving Average model (ARMA) [13], the Autoregressive Integrated Moving Average model (ARIMA) [14,15], seasonal ARIMA [16] and the grey forecasting model (GM) [17]. These models can produce power load predictions, but their inherent limitations prevent them from achieving the expected forecasting accuracy. To overcome these limitations, more effective load forecasting approaches have been put forward. In recent years, prediction models based on artificial intelligence have emerged in electric load forecasting [18]. Currently, the main artificial intelligence algorithms include artificial neural networks (ANN) [19,20], support vector machines (SVM) [21], the multi-layer perceptron (MLP) and the radial basis function (RBF) network. Although better than traditional methods, these single models still cannot fit the complicated and variable characteristics of current power load well enough to achieve satisfactory accuracy [22]. For example, artificial neural networks are prone to local optima, overfitting and slow convergence [23]. This has led to the widespread development of ensemble and hybrid models, which combine several single models and can achieve better prediction performance [24].
Xiaobo Zhang et al. [25] proposed a new power load prediction model, CS-SSA-SVM, which integrates singular spectrum analysis (SSA), support vector machine (SVM) and the cuckoo search (CS) algorithm and significantly improves the effectiveness of power load forecasting. Dong, Y. et al. [26] developed a short-term load prediction model that combines a feature learning unit named Pyramid System with recurrent neural networks, which can greatly increase the stability and security of the smart grid. Wang, R. et al. [27] proposed a new power load forecasting system that combines data preprocessing, a hybrid optimization algorithm and several individual conventional prediction methods; it overcomes the shortcomings of individual conventional models and achieves higher prediction accuracy than traditional forecasting models.
Another problem in power load forecasting is that the research direction is relatively narrow. Specifically, most previous studies focus only on point prediction of load and rarely consider interval prediction in modeling and analysis. This is insufficient for engineering applications and for ensuring the reliability of the power system. Probabilistic interval prediction conveys more information, and its results can help managers implement appropriate policies. However, research on interval modeling and prediction is still lacking. At present, uncertainty quantification relies mainly on statistical methods, including quantile regression [28], the bootstrap method [29] and kernel density estimation [30]. In addition, there are interval prediction methods based on artificial neural networks, such as the lower upper bound estimation (LUBE) method [31]. Table 1 summarizes the existing point and interval prediction methods and models and evaluates their advantages and disadvantages.
For point forecasting, prediction accuracy has improved steadily from traditional statistical models to artificial neural networks and, more recently, hybrid models, but considerable room for improvement remains. Given the nonlinear characteristics of power load, this paper proposes a nonlinear combination of single prediction models and uses a swarm intelligence optimization algorithm to optimize the model parameters, further improving prediction performance. For interval forecasting, quantile regression [32], the bootstrap method [33], kernel density estimation [34] and LUBE each have advantages as well as disadvantages that are difficult to overcome. In short, there is no unified interval prediction method, and further research building on existing knowledge is needed to obtain more effective results [35]. Therefore, based on a distributional hypothesis, this study develops a new interval prediction architecture that outperforms most single-model interval prediction architectures. Based on the above review, the major contribution of this article is the design of a hybrid double prediction system with two parts, point forecasting and interval forecasting, which addresses the shortcomings of existing research. Specifically, the double prediction system includes a preprocessing module based on Improved Complete Ensemble Empirical Mode Decomposition (ICEEMDAN), a point prediction module based on a nonlinear combination model, an interval prediction module and an evaluation module. As a recent signal processing technique, ICEEMDAN decomposes and reconstructs the power load sequence to obtain a denoised time series. The nonlinear combination model is the prediction model proposed in this study.
In this model, the Extreme Learning Machine (ELM) [36], RBF, the Elman Neural Network (ENN) and ARIMA [37] are selected as the base models, and their predictions are aggregated nonlinearly by a BP [38] neural network. Because the BP network is very sensitive to its parameters, which directly affect the validity of both point and interval prediction, the Multi-objective Evolutionary Algorithm based on Decomposition (MOEA/D) is employed to find the optimal BP parameters. Finally, an interval prediction method based on fuzzy clustering is established, which does not require a distributional or model assumption. Therefore, the interval structure is robust to outliers in the interval data. In addition, to test the designed prediction architecture, we select 9 indicators to evaluate prediction accuracy and conduct a series of discussions to judge the effectiveness of the prediction system.
The main innovations of the forecasting system are summarized below: (1) A new double forecasting system for power load is established, which combines a point forecasting module and an interval forecasting module. The system aims to improve the accuracy of load point prediction and to effectively analyze the uncertainty of power load data.
(2) A nonlinear combination method based on the BP algorithm optimized by MOEA/D is proposed to improve the forecasting performance of the system. To obtain the optimal combination pattern of the models, a nonlinear aggregation mechanism based on BP is adopted to combine them and eliminate the inherent defects of individual models and of linear combination. In particular, the MOEA/D algorithm is used to search for the best parameters of BP, which further improves prediction accuracy.
(3) An interval prediction method based on fuzzy clustering is established, which provides an effective tool for load uncertainty analysis. The method determines the upper and lower bounds of the predicted load to quantify its uncertainty and provides more comprehensive reference information for decision-makers concerned with power system operation risk.
The rest of this paper is organized as follows: Section 2 presents the methods used in the proposed double forecasting architecture. Section 3 establishes the double forecasting system. Section 4 introduces the experimental data and presents the experimental results. Section 5 provides further discussion. Finally, Section 6 concludes the paper.

Knowledge and Tools of Model Preparation
In constructing our model, several methods are selected and combined to improve its performance. Here we introduce three main methods in detail: improved complete ensemble empirical mode decomposition (ICEEMDAN), the multi-objective evolutionary algorithm based on decomposition (MOEA/D) and the Fuzzy C-Means (FCM) clustering algorithm.
The first is improved complete ensemble empirical mode decomposition (ICEEMDAN).
ICEEMDAN, an effective data processing method, was proposed by Colominas, Schlotthauer and Torres [39] in 2014. It decomposes the original series into several intrinsic mode functions (IMFs), ordered from high to low frequency, and a residue. Its predecessor, CEEMDAN, was proposed to restore the completeness property of EMD (the modes sum exactly to the original signal), which the ensemble variant EEMD lacks.
Nevertheless, the mode and signal decomposition of CEEMDAN can still be improved. In ICEEMDAN, let $E_k(\cdot)$ denote the operator that extracts the $k$-th mode obtained by EMD, let $\omega^{(i)}$ be the $i$-th realization of Gaussian white noise with zero mean and unit variance, and let $M(\cdot)$ be the operator that produces the local mean of a signal, so that $E_1(z) = z - M(z)$. The notation $\langle \cdot \rangle$ denotes averaging over the $I$ realizations. The general steps of ICEEMDAN are as follows:

Step 1. Use EMD to compute the local means of the $I$ realizations $x^{(i)} = x + \beta_0 E_1(\omega^{(i)})$ $(i = 1, \ldots, I)$ and obtain the first residue:

$$r_1 = \langle M(x^{(i)}) \rangle$$

where $\beta_0 = \varepsilon_0 \, S(x) / S(E_1(\omega^{(i)}))$, $\varepsilon_0$ is the reciprocal of the desired signal-to-noise ratio between the first added noise and the analyzed signal, and $S(\cdot)$ is the operator that computes the standard deviation of a signal.

Step 2. At the first stage ($k = 1$), compute the first mode:

$$d_1 = x - r_1$$

Step 3. Compute the $k$-th residue ($k = 2, \ldots, K$):

$$r_k = \langle M(r_{k-1} + \beta_{k-1} E_k(\omega^{(i)})) \rangle$$

where $\beta_k = \varepsilon_0 \, S(r_k)$ for $k \geq 1$.

Step 4. Estimate the second residue as the average of the local means of the realizations $r_1 + \beta_1 E_2(\omega^{(i)})$ and define the second mode:

$$d_2 = r_1 - r_2 = r_1 - \langle M(r_1 + \beta_1 E_2(\omega^{(i)})) \rangle$$

Step 5. Compute the $k$-th mode:

$$d_k = r_{k-1} - r_k \tag{6}$$

Step 6. Return to Step 3 for the next $k$.
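A minimal Python sketch of Steps 1 to 6 is given below, assuming NumPy and SciPy. It is illustrative only: the helper names (envelope_mean, first_mode, local_mean, emd_modes, iceemdan) are hypothetical, EMD is simplified to a fixed number of sifting iterations with cubic-spline envelopes pinned at the end points, and eps0 = 0.2 and the number of noise realizations are assumed values, so it will not reproduce the reference ICEEMDAN implementation exactly.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def envelope_mean(x):
    """Mean of upper and lower cubic-spline envelopes, or None if x has too few extrema."""
    mid = x[1:-1]
    maxima = np.where((mid > x[:-2]) & (mid >= x[2:]))[0] + 1
    minima = np.where((mid < x[:-2]) & (mid <= x[2:]))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x))
    up = np.r_[0, maxima, len(x) - 1]       # simplification: pin envelopes to the end points
    lo = np.r_[0, minima, len(x) - 1]
    return (CubicSpline(up, x[up])(t) + CubicSpline(lo, x[lo])(t)) / 2.0

def first_mode(x, n_sift=10):
    """E_1(x): first EMD mode after a fixed number of sifting iterations (assumed stop rule)."""
    if envelope_mean(x) is None:            # trend-like signal: no oscillatory mode left
        return np.zeros_like(x)
    h = x.copy()
    for _ in range(n_sift):
        m = envelope_mean(h)
        if m is None:
            break
        h = h - m
    return h

def local_mean(x):
    """M(x) = x - E_1(x)."""
    return x - first_mode(x)

def emd_modes(x, n_modes):
    """E_1(x), ..., E_K(x) by plain EMD; used only on the white-noise realizations."""
    modes, r = [], x.copy()
    for _ in range(n_modes):
        d = first_mode(r)
        modes.append(d)
        r = r - d
    return modes

def iceemdan(x, n_modes=4, n_real=20, eps0=0.2, seed=0):
    """Return (modes d_1..d_K as a K x len(x) array, final residue r_K)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    noise = [emd_modes(rng.standard_normal(len(x)), n_modes) for _ in range(n_real)]
    # Step 1: r_1 = <M(x + beta_0 E_1(w_i))>, beta_0 = eps0 * S(x) / S(E_1(w_i))
    r = np.mean([local_mean(x + eps0 * x.std() / w[0].std() * w[0]) for w in noise], axis=0)
    modes = [x - r]                          # Step 2: d_1 = x - r_1
    for k in range(1, n_modes):              # Steps 3 to 6; with 0-based k, w[k] is E_{k+1}(w_i)
        beta = eps0 * r.std()                # beta_k = eps0 * S(r_k)
        r_new = np.mean([local_mean(r + beta * w[k]) for w in noise], axis=0)
        modes.append(r - r_new)              # d_{k+1} = r_k - r_{k+1}
        r = r_new
    return np.array(modes), r

t = np.arange(400)
x = np.sin(2 * np.pi * t / 8) + 0.5 * np.sin(2 * np.pi * t / 60) + 0.01 * t
modes, residue = iceemdan(x, n_modes=3, n_real=10)
smoothed = modes[1:].sum(axis=0) + residue   # drop IMF1 and rebuild from the rest
```

Because every mode is a difference of consecutive residues, the modes plus the final residue reconstruct the input exactly. In this sketch, the denoising used in the point forecasting module (discarding IMF1) corresponds to the `smoothed` series.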

The second is Multi-objective Evolutionary Algorithm Based on Decomposition (MOEA/D).
Recently, the multi-objective evolutionary algorithm based on decomposition (MOEA/D) proposed by Zhang Qingfu et al. [40] has attracted increasing interest from researchers because it is concise and effective, and many theoretical and practical results have emerged. The MOEA/D algorithm is introduced below.
A multi-objective optimization problem (MOP) with $m$ objectives and $n$ decision variables can be expressed as follows:

$$\text{minimize } F(x) = (f_1(x), f_2(x), \ldots, f_m(x))^T, \quad \text{subject to } x \in \Omega \tag{7}$$

where $\Omega \subseteq \mathbb{R}^n$ is the decision space and the decision vector $x = (x_1, x_2, \ldots, x_n) \in \Omega$ is a candidate solution of the MOP. The objective function $F: \Omega \to \mathbb{R}^m$ consists of $m$ conflicting, continuous real-valued objective functions $f_1(x), f_2(x), \ldots, f_m(x)$, where $\mathbb{R}^m$ is the objective space.
The Pareto dominance relation between individuals is defined as follows: for decision vectors $u$ and $v$, $u$ is said to dominate $v$ if and only if the following two conditions hold simultaneously: (i) $f_i(u) \le f_i(v)$ for every $i \in \{1, \ldots, m\}$; (ii) $f_j(u) < f_j(v)$ for at least one $j \in \{1, \ldots, m\}$. In this case, $v$ is said to be dominated by $u$, denoted $u \prec v$.
A point $x^* \in \Omega$ is Pareto optimal if there is no point $x \in \Omega$ such that $F(x)$ dominates $F(x^*)$. The Pareto optimal points form a set of best compromise solutions, called non-dominated solutions because no other solution dominates them. The set of Pareto optimal solutions in the decision space is called the Pareto set (PS), and its image in the objective space is called the Pareto front (PF).
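The dominance test and the extraction of non-dominated solutions can be checked with a short Python sketch (minimization assumed; the function names `dominates` and `non_dominated` are hypothetical):

```python
import numpy as np

def dominates(fu, fv):
    """True if objective vector fu Pareto-dominates fv under minimization."""
    fu, fv = np.asarray(fu, dtype=float), np.asarray(fv, dtype=float)
    return bool(np.all(fu <= fv) and np.any(fu < fv))

def non_dominated(F):
    """Indices of the rows of F that no other row dominates."""
    F = np.asarray(F, dtype=float)
    return [i for i in range(len(F))
            if not any(dominates(F[j], F[i]) for j in range(len(F)) if j != i)]

F = [[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]]
print(non_dominated(F))   # [0, 1, 3]; (3, 3) is dominated by (2, 2)
```

The pairwise scan costs O(N^2) comparisons, which is adequate for small sets of candidate solutions.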
MOEA/D has strong search ability for continuous optimization, combinatorial optimization and problems with a complex PS. The principle of the algorithm is as follows. Given a multi-objective optimization problem such as Equation (7) and a weight vector $\lambda = (\lambda_1, \ldots, \lambda_m)^T$ satisfying $\sum_{i=1}^{m} \lambda_i = 1$ and $\lambda_i \ge 0$ ($i = 1, 2, \ldots, m$), MOEA/D based on Tchebycheff decomposition uses such weight vectors to decompose the MOP into $N$ scalar sub-problems, the $j$-th of which is

$$\text{minimize } g^{te}(x \mid \lambda^j, z^*) = \max_{1 \le i \le m} \left\{ \lambda_i^j \left| f_i(x) - z_i^* \right| \right\}, \quad \text{subject to } x \in \Omega \tag{9}$$

where $z^* = (z_1^*, \ldots, z_m^*)^T$ is the ideal point, with $z_i^* = \min\{ f_i(x) \mid x \in \Omega \}$, and $\lambda^j = (\lambda_1^j, \ldots, \lambda_m^j)^T$. By solving the sub-problems with different weight vectors in Equation (9), a Pareto optimal solution set with good diversity can be obtained [41].

Since $g^{te}$ is continuous in $\lambda$, if $\lambda^i$ is close to $\lambda^j$, the optimal solution of $g^{te}(x \mid \lambda^i, z^*)$ should be close to that of $g^{te}(x \mid \lambda^j, z^*)$. Hence, information about $g^{te}$ with weight vectors near $\lambda^i$ is useful for optimizing $g^{te}(x \mid \lambda^i, z^*)$.
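The Tchebycheff aggregation in Equation (9) and a uniform set of weight vectors for $m = 2$ can be illustrated as follows. In this hypothetical example with three candidate objective vectors and $z^* = (0, 0)$, each weight vector selects a different compromise solution; the function names are our own.

```python
import numpy as np

def tchebycheff(f, lam, z_star):
    """g^te(x | lambda, z*) = max_i lambda_i * |f_i(x) - z*_i| for an objective vector f = F(x)."""
    return float(np.max(np.asarray(lam) * np.abs(np.asarray(f) - np.asarray(z_star))))

def uniform_weights_2d(N):
    """N weight vectors (w, 1 - w), evenly spaced, each summing to 1 (m = 2 objectives)."""
    w = np.linspace(0.0, 1.0, N)
    return np.column_stack([w, 1.0 - w])

candidates = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])   # hypothetical F-values
z_star = np.zeros(2)
lams = uniform_weights_2d(3)                                   # (0, 1), (0.5, 0.5), (1, 0)
# Each weight vector defines one scalar sub-problem; its best candidate minimizes g^te.
best = [int(np.argmin([tchebycheff(f, lam, z_star) for f in candidates])) for lam in lams]
print(best)   # [2, 1, 0]
```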
In MOEA/D, the population consists of the best solutions found so far for the sub-problems. Each sub-problem maintains a list of neighbors, namely the sub-problems whose weight vectors are closest to its own. Under the continuity assumption, two neighboring sub-problems should have similar optimal solutions, so in each generation of MOEA/D each sub-problem is optimized using only information from its neighboring sub-problems.
At each generation $t$, MOEA/D with the Tchebycheff approach maintains [42]:
(1) a population $x^1, \ldots, x^N \in \Omega$, where $x^i$ is the current solution to the $i$-th sub-problem;
(2) $FV^1, \ldots, FV^N$, where $FV^i = F(x^i)$ is the F-value of $x^i$;
(3) $z = (z_1, \ldots, z_m)^T$, where $z_i$ is the best value found so far for objective $f_i$;
(4) an external population (EP), used to store the non-dominated solutions found during the search.
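To make the decomposition mechanism concrete, the following is a minimal MOEA/D sketch with the Tchebycheff approach for two objectives. It keeps the state listed above (population, FV, z and EP) and updates each sub-problem using only its neighbors. The ZDT1 test problem, the blend-crossover-plus-Gaussian-mutation reproduction and all parameter values are assumptions made for illustration; they are not the configuration used to tune the BP network in this paper.

```python
import numpy as np

def zdt1(x):
    """ZDT1 benchmark (2 objectives, x in [0, 1]^n), used here only as a test problem."""
    f1 = x[0]
    g = 1.0 + 9.0 * np.mean(x[1:])
    return np.array([f1, g * (1.0 - np.sqrt(f1 / g))])

def dominates(a, b):
    return bool(np.all(a <= b) and np.any(a < b))

def moead(F, n_var, N=20, T=5, gens=40, seed=0):
    rng = np.random.default_rng(seed)
    w = np.linspace(0.0, 1.0, N)
    lam = np.column_stack([w, 1.0 - w])                    # weight vectors, components sum to 1
    dist = np.linalg.norm(lam[:, None, :] - lam[None, :, :], axis=2)
    B = np.argsort(dist, axis=1)[:, :T]                    # neighborhood B(i): T closest weights
    X = rng.random((N, n_var))                             # population x^1..x^N
    FV = np.array([F(x) for x in X])                       # FV^i = F(x^i)
    z = FV.min(axis=0)                                     # best value found for each objective
    EP = []                                                # external population
    for _ in range(gens):
        for i in range(N):
            k, l = rng.choice(B[i], size=2, replace=False)
            # Reproduction (assumed operators): blend crossover plus Gaussian mutation.
            y = X[k] + rng.random() * (X[l] - X[k]) + rng.normal(0.0, 0.05, n_var)
            y = np.clip(y, 0.0, 1.0)
            fy = F(y)
            z = np.minimum(z, fy)                          # update of z
            for j in B[i]:                                 # update of neighboring solutions
                if np.max(lam[j] * np.abs(fy - z)) <= np.max(lam[j] * np.abs(FV[j] - z)):
                    X[j], FV[j] = y, fy
            # Update of EP: drop members dominated by F(y); add F(y) if nothing dominates it.
            EP = [e for e in EP if not dominates(fy, e)]
            if not any(dominates(e, fy) or np.array_equal(e, fy) for e in EP):
                EP.append(fy)
    return np.array(EP)

front = moead(zdt1, n_var=5)
print(front.shape)
```

Restricting both mating and replacement to B(i) is what lets neighboring sub-problems share information, as described above.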
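The pseudo code of MOEA/D is described as follows:

Step 1: Initialization
• /* Initialize a primary internal population $x^1, \ldots, x^N$ uniformly at random and set $FV^i = F(x^i)$. */
• /* Initialize $z = (z_1, \ldots, z_m)^T$ by a problem-specific method. */
• /* Calculate the Euclidean distance between any two weight vectors and find the T closest weight vectors to each weight vector; for each i, let $B(i) = \{i_1, \ldots, i_T\}$ hold the indices of the T weight vectors closest to $\lambda^i$. */
Step 2: Update
WHILE the stopping criterion is not met
    FOR i = 1, ..., N
        • /* Reproduction: randomly select two indices k, l from B(i) and generate a new solution y from $x^k$ and $x^l$ by genetic operators. */
        • /* Update of z: for each j = 1, ..., m, if $z_j > f_j(y)$, set $z_j = f_j(y)$. */
        • /* Update of neighboring solutions. */
        FOR each index j ∈ B(i)
            if $g^{te}(y \mid \lambda^j, z) \le g^{te}(x^j \mid \lambda^j, z)$, then set $x^j = y$ and $FV^j = F(y)$.
        END FOR
        • /* Update of EP: remove from EP all the vectors dominated by F(y); add F(y) to EP if no vector in EP dominates F(y). */
    END FOR
END WHILE
RETURN EP

The third is the Fuzzy C-Means (FCM) clustering algorithm. FCM divides the sample points in the sample space $X = \{x_1, x_2, \ldots, x_n\}$ into $c$ ($c > 1$) classes, and the degree to which each sample point $x_i$ belongs to the $k$-th class ($1 \le k \le c$) is denoted $u_{ik}$. The fuzzy clustering of $X$ is represented by the fuzzy matrix $U = (u_{ik})_{n \times c}$, which satisfies the following conditions:

$$u_{ik} \in [0, 1], \quad \sum_{k=1}^{c} u_{ik} = 1 \ (1 \le i \le n), \quad 0 < \sum_{i=1}^{n} u_{ik} < n \ (1 \le k \le c) \tag{10}$$

The objective function is defined as:

$$J_m(U, V) = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m} \, d^2(x_i, v_k)$$

where $V = \{v_1, \ldots, v_k, \ldots, v_c\}$, $v_k$ is the $k$-th cluster center, $d^2(x_i, v_k)$ is the distance measure between $x_i$ and $v_k$, and $m$ is the fuzzy weighting exponent. To obtain the best fuzzy c-partition of $X$, the pair $(U, V)$ that minimizes $J_m(U, V)$ must be found. This can be achieved by the following steps [43]:

Step 1: Initialization. Input the dataset $\{x_i, i = 1, 2, \ldots, n\}$, the number of clusters $c$, the fuzzy weighting exponent $m$ ($m \in \mathbb{R}$, $m > 1$), the maximum number of iterations $T$ and the threshold $\varepsilon$. Initialize the membership matrix $U^{(t)}$ ($t = 0$, where $t$ is the iteration counter) randomly so that it satisfies Formula (10).

Step 2: Update the cluster centers and the membership matrix:

$$v_k^{(t+1)} = \frac{\sum_{i=1}^{n} \left(u_{ik}^{(t)}\right)^m x_i}{\sum_{i=1}^{n} \left(u_{ik}^{(t)}\right)^m}, \qquad u_{ik}^{(t+1)} = \left[ \sum_{j=1}^{c} \left( \frac{d(x_i, v_k^{(t+1)})}{d(x_i, v_j^{(t+1)})} \right)^{2/(m-1)} \right]^{-1}$$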
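The FCM iteration can be sketched in NumPy as follows. This is a minimal sketch, assuming Euclidean distance, $m = 2$ by default and a stopping rule that ends when no membership changes by more than eps or after max_iter iterations (our reading of the roles of $\varepsilon$ and $T$ in Step 1); the function name `fcm` is hypothetical.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=300, eps=1e-6, seed=0):
    """Fuzzy C-Means: returns membership matrix U (n x c) and centers V (c x d)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                                  # treat a 1-D load series as n points in R^1
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                   # each row sums to 1, as in Formula (10)
    for _ in range(max_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]        # center update
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                           # guard against a point lying on a center
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)                 # membership update
        converged = np.max(np.abs(U_new - U)) < eps
        U = U_new
        if converged:
            break
    return U, V

U, V = fcm([1.0, 1.2, 0.8, 10.0, 10.3, 9.7], c=2)
print(np.sort(V.ravel()))
```

Assigning each point to the cluster with the largest membership is equivalent to assigning it to the nearest center, because $u_{ik}$ decreases monotonically with $d(x_i, v_k)$.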

Construction of Power Load Double Prediction System
In this paper, a double forecasting system for power load is established. A double forecasting system is one that integrates point forecasting and interval forecasting; the relationship between the two is similar to that between point estimation and interval estimation in statistics. The system first carries out point prediction and then constructs a suitable confidence interval from the point prediction results to carry out interval prediction. The forecasting system therefore includes two modules: a point forecasting module and an interval forecasting module. The system construction process is described below, and the system structure is shown in Figure 1.

Point Forecasting Module
In this section, we propose a new nonlinear hybrid point forecasting model that uses RBF, ELM, ENN and ARIMA together with a BP network, the MOEA/D algorithm and a nonlinear combination mechanism to obtain accurate and stable load point predictions. Given the good prediction performance of the BP network, this paper uses it as the nonlinear combination method.
The point prediction module of the designed system consists of four steps, detailed as follows: Step 1: Power load data preprocessing. To remove noise and extract useful information from the power load sequence, we use ICEEMDAN to decompose the original sequence and reconstruct a smoothed time series. Specifically, the original sequence is decomposed into several IMFs, and the highest-frequency IMFs are eliminated to filter the series. Here we remove IMF1 and reconstruct the final series from the remaining IMFs.
Step 2: Single model prediction. In this study, we first use single models for point prediction to obtain a preliminary impression of each model. The models with higher prediction accuracy, namely RBF, ELM, ENN and ARIMA, are then chosen as member models of the combination model. RBF, ELM and ENN are used to handle the nonlinear features of the load, while ARIMA is effective at capturing its linear characteristics. Specifically, we divide the 1488 load data points into a training set train and a testing set test, where train includes 1152 points and test includes 336 points. A rolling forecasting strategy is employed, in which the original data of the previous five periods are used to forecast the next period. The input and output structure of train is shown in Equations (14) and (15):

$$X_{\text{train}} = \begin{bmatrix} x(1) & x(2) & \cdots & x(l) \\ x(2) & x(3) & \cdots & x(l+1) \\ \vdots & \vdots & & \vdots \\ x(n-l) & x(n-l+1) & \cdots & x(n-1) \end{bmatrix} \tag{14}$$

$$Y_{\text{train}} = \begin{bmatrix} x(l+1) & x(l+2) & \cdots & x(n) \end{bmatrix}^{T} \tag{15}$$

Using RBF, ELM and ENN trained on train to predict the load of the testing set, the prediction sequences $\mathit{predict}_1$, $\mathit{predict}_2$ and $\mathit{predict}_3$ are obtained, respectively. Similarly, ARIMA is used to obtain the prediction sequence $\mathit{predict}_4$. The sequences $\mathit{predict}_i$ ($i = 1, \ldots, 4$), each containing 336 forecasting values, are taken as the input datasets of the BP model.
In Equations (14) and (15), $n$ is the size of train, $l$ is the look-back time lag, and $x(k)$ is the power load value at time $k$. For example, $x(1)$ to $x(l)$ are taken as input and $x(l+1)$ as output; next, $x(2)$ to $x(l+1)$ are taken as input and $x(l+2)$ as output, and so on, to train RBF, ELM and ENN. Here, we set $l = 5$.
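Equations (14) and (15) amount to a sliding window over the series. A minimal NumPy sketch that builds the input matrix and target vector for lag $l = 5$ (the function name `make_windows` is hypothetical):

```python
import numpy as np

def make_windows(x, lag=5):
    """Rows of X are [x(k), ..., x(k+lag-1)]; Y holds the next value x(k+lag)."""
    x = np.asarray(x, dtype=float)
    X = np.array([x[k:k + lag] for k in range(len(x) - lag)])
    Y = x[lag:]
    return X, Y

X, Y = make_windows(np.arange(1, 9), lag=5)   # toy series x(1..8) = 1, ..., 8
print(X.shape, Y)                             # (3, 5) [6. 7. 8.]
```

With $n$ observations and lag $l$, this yields $n - l$ training pairs, matching the last row of Equation (14).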
Step 3: Nonlinear combination. To obtain the combination model, a nonlinear decision-making method based on a BP neural network optimized by MOEA/D is proposed.
Before introducing nonlinear integration, we first describe linear integration, which sums the prediction results of $n$ single prediction models with linear weights. Suppose $y_t$ is the actual value, $\mathit{predict}_{it}$ ($i = 1, 2, \ldots, n$) is the prediction of the $i$-th model at time $t$, and $n$ is the number of prediction models. Let $w_i$ denote the weight of the $i$-th prediction model, with $\sum_{i=1}^{n} w_i = 1$. The final prediction is then calculated as follows:

$$\hat{y}_t = \sum_{i=1}^{n} w_i \, \mathit{predict}_{it}$$

Considering the complexity of power load and the characteristics of the different single models, we choose nonlinear weight combination forecasting to compensate for the shortcomings of linear integration. In this study, BP, as an important application of nonlinear integration, is used to integrate the single prediction components into the final prediction. Specifically, a BP neural network is a complex nonlinear black-box model whose many internal nodes are interconnected, so it can serve as a general function approximator. We regard the trained BP neural network as the weighted integration of the single models, so these combination weights can be regarded as a nonlinear integration. However, BP parameters strongly affect the prediction results and are difficult to determine. Therefore, MOEA/D, an advanced swarm intelligence algorithm, is used to optimize the BP parameters and improve forecasting performance.
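In particular, $\mathit{predict}_1$, $\mathit{predict}_2$, $\mathit{predict}_3$ and $\mathit{predict}_4$, each containing 336 power load prediction values obtained from the RBF, ELM, ENN and ARIMA models in Step 2, form the basic data of the BP model: the first 240 values are used as the training set and the remaining 96 values as the testing set. The 96 values predicted by BP are the forecasting results of our proposed model. Because the weights and thresholds of the neurons in each layer of a BP network are difficult to determine, MOEA/D is introduced to search for the best weights and thresholds, which alleviates this problem to a certain extent. The input and output structure of BP network training is shown in Equations (17) and (18):

$$X_{BP} = \begin{bmatrix} \mathit{predict}_1(1) & \mathit{predict}_2(1) & \mathit{predict}_3(1) & \mathit{predict}_4(1) \\ \mathit{predict}_1(2) & \mathit{predict}_2(2) & \mathit{predict}_3(2) & \mathit{predict}_4(2) \\ \vdots & \vdots & \vdots & \vdots \\ \mathit{predict}_1(240) & \mathit{predict}_2(240) & \mathit{predict}_3(240) & \mathit{predict}_4(240) \end{bmatrix} \tag{17}$$

$$Y_{BP} = \begin{bmatrix} y(1) & y(2) & \cdots & y(240) \end{bmatrix}^{T} \tag{18}$$

where $\mathit{predict}_1(k)$, $\mathit{predict}_2(k)$, $\mathit{predict}_3(k)$ and $\mathit{predict}_4(k)$ are the $k$-th load values predicted by RBF, ELM, ENN and ARIMA, respectively, and $y(k)$ is the corresponding actual load value. With these inputs and outputs, the optimized BP neural network can be trained.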
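A minimal sketch of the nonlinear combination in Equations (17) and (18) follows: a one-hidden-layer BP network maps the four base forecasts to the load. Unlike the paper, which uses MOEA/D to search for the weights and thresholds, this sketch trains them by plain full-batch gradient descent (backpropagation). The hidden-layer size, learning rate and synthetic data are hypothetical, and `train_bp_combiner` is not a function from the original work.

```python
import numpy as np

def train_bp_combiner(X, y, hidden=8, lr=0.02, epochs=3000, seed=0):
    """Fit a one-hidden-layer BP network mapping base forecasts X (n x 4) to the load y (n,)."""
    rng = np.random.default_rng(seed)
    mu_x, sd_x = X.mean(axis=0), X.std(axis=0)
    mu_y, sd_y = y.mean(), y.std()
    Xs, ys = (X - mu_x) / sd_x, (y - mu_y) / sd_y          # standardize inputs and target
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden); b2 = 0.0
    n, losses = len(ys), []
    for _ in range(epochs):
        H = np.tanh(Xs @ W1 + b1)                           # hidden layer
        err = H @ W2 + b2 - ys                              # linear output minus target
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / n                               # dMSE/d(output)
        g_hid = np.outer(g_out, W2) * (1.0 - H ** 2)        # back through tanh (uses old W2)
        W2 -= lr * (H.T @ g_out); b2 -= lr * g_out.sum()
        W1 -= lr * (Xs.T @ g_hid); b1 -= lr * g_hid.sum(axis=0)

    def predict(X_new):
        H = np.tanh(((X_new - mu_x) / sd_x) @ W1 + b1)
        return (H @ W2 + b2) * sd_y + mu_y                  # back to load units
    return predict, losses

rng = np.random.default_rng(1)
t = np.arange(336)
load = 8000 + 1500 * np.sin(2 * np.pi * t / 48)             # synthetic half-hourly load
# Hypothetical stand-ins for predict_1..predict_4: biased, noisy copies of the load.
base = np.column_stack([load + b + rng.normal(0, 120, 336) for b in (-100, 50, 0, 150)])
predict, losses = train_bp_combiner(base[:240], load[:240])  # first 240 values train BP
combined = predict(base[240:])                                 # remaining 96 values are forecast
```

Replacing the gradient-descent loop with a population-based search over the flattened (W1, b1, W2, b2) vector would bring the sketch closer to the MOEA/D-based tuning described in the text.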
Step 4: Power load point prediction.
Using the established nonlinear prediction model, the rolling forecasting technique is applied to multi-step forecasting to obtain the final prediction results. The evaluation indices are then calculated from the prediction results and test to evaluate the performance of the model.
Specifically, multi-step forecasting means forecasting multiple future load values. Let the time index $t$ be the forecast origin and the positive integer $l$ be the forecast horizon. Assuming that $t$ is the current time point, the goal is to obtain the forecast $\hat{y}_{t+l}$ ($l \ge 1$), where $l = 1, 2, 3$ corresponds to 1-step, 2-step and 3-step forecasting, respectively. Figure 2 shows the data usage of multi-step forecasting.
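The text does not state whether multi-step forecasts are produced recursively or by separate models for each horizon. The sketch below assumes the recursive strategy, in which each one-step forecast is fed back into the input window, combined with a rolling forecast origin. The stand-in model `naive_trend` and both function names are hypothetical and only make the data flow explicit.

```python
from collections import deque

def recursive_forecast(history, one_step_model, horizon, lag=5):
    """Forecast y_hat(t+1), ..., y_hat(t+horizon) from the last `lag` observations at origin t."""
    window = deque(history[-lag:], maxlen=lag)
    forecasts = []
    for _ in range(horizon):
        y_hat = one_step_model(list(window))
        forecasts.append(y_hat)
        window.append(y_hat)       # the forecast stands in for the not-yet-observed value
    return forecasts

def rolling_origin(series, one_step_model, horizon, start, lag=5):
    """Slide the origin t over the test period; keep only the horizon-step-ahead forecast."""
    return [recursive_forecast(series[:t + 1], one_step_model, horizon, lag)[-1]
            for t in range(start, len(series) - horizon)]

naive_trend = lambda w: w[-1] + (w[-1] - w[0]) / (len(w) - 1)   # hypothetical one-step model
print(recursive_forecast([1, 2, 3, 4, 5], naive_trend, horizon=3))   # [6.0, 7.0, 8.0]
```

At each origin only observed data up to $t$ enter the window, so the $l$-step forecast never uses values that would be unknown at forecast time.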

Interval Forecasting Module
Interval forecasting obtains the value interval of future load by certain forecasting methods, showing the possible fluctuation range of future load. If accurate load point prediction guarantees the daily operation of the power grid, effective interval prediction can further quantify the uncertainty of power load and provide reliable and accurate load information. The interval forecasting method of this system builds on the point prediction and is based on fuzzy clustering. Its principle is to classify the point forecasting results according to different kinds of load data and to construct different, adaptive intervals for each class. The classification is based on the real load data: the method determines which category each prediction result belongs to and then constructs a specific confidence interval for that category to obtain the predicted load interval. The three steps of the interval forecasting module are as follows:
Step 1: Data classification. The training set train of the load data is clustered into several classes by the FCM clustering method, and the data in each category are assumed to follow the same normal distribution. This yields the classes $F_1, F_2, \ldots, F_k$. The mean and variance of each category are then calculated in preparation for constructing the intervals.
Step 2: Load interval estimation. The confidence level of each category interval is 95%. A confidence interval is constructed from the mean and variance of each category's data, so different categories yield prediction intervals of different widths. Constructing different adaptive intervals according to different data characteristics is one of the innovations of this model. For the point prediction results on the testing set test from the point prediction module, the category F into which each predicted value falls is identified. Then, using the confidence interval constructed for that category, the prediction interval of each predicted value is calculated from $x_i$, $s_j$ and $n_j$, where $x_i$ is the point prediction value, $j$ is the category number of $x_i$, $s_j$ is the standard deviation of category $j$, and $n_j$ is the number of data points in category $j$.
Step 3: Collating the prediction results. The final interval estimate of the power load is obtained by collecting the prediction intervals of all the points above.
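A sketch of Steps 1 to 3 follows, under stated assumptions. The exact interval formula of the paper is not reproduced here, so the rule $x_i \pm z \, s_j \sqrt{1 + 1/n_j}$ with $z = 1.96$ (a normal prediction interval at the 95% level) is an assumption chosen because it uses the quantities named in Step 2. Cluster centers are taken as given (for example, from FCM), and points are assigned to the nearest center, which matches the largest FCM membership. The function name `class_intervals` is hypothetical.

```python
import numpy as np

def class_intervals(train, centers, forecasts, z=1.96):
    """Hypothetical rule: x_i +/- z * s_j * sqrt(1 + 1/n_j), with j the class of x_i."""
    train = np.asarray(train, dtype=float)
    centers = np.asarray(centers, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)

    def nearest(v):                                     # class index of each value
        return np.argmin(np.abs(v[:, None] - centers[None, :]), axis=1)

    labels = nearest(train)                             # Step 1: classify the training data
    # Per-class sample standard deviation and size (each class needs at least 2 points).
    s = np.array([train[labels == j].std(ddof=1) for j in range(len(centers))])
    n = np.array([np.sum(labels == j) for j in range(len(centers))])
    j = nearest(forecasts)                              # Step 2: class each forecast falls into
    half = z * s[j] * np.sqrt(1.0 + 1.0 / n[j])
    return forecasts - half, forecasts + half           # Step 3: the interval for every point

lower, upper = class_intervals([1, 2, 3, 10, 11, 12], centers=[2, 11], forecasts=[2.5, 11.5])
```

Because $s_j$ and $n_j$ differ between classes, forecasts that fall into more volatile load classes receive wider intervals, which is the adaptive behavior described in Step 2.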

Experiments and Analysis
This section introduces the application of the double forecasting model and several comparison models, organized into three experimental demonstrations. The experiments were run on a 2.60 GHz CPU with 4.00 GB RAM under Windows 7 and MATLAB R2016a. To account for random factors and ensure the reliability of the final results, each experiment is repeated 20 times and the average value is reported.

Dataset Description
In this paper, power load data from July 1, 2019 to July 31, 2019 were collected from New South Wales, Queensland, South Australia and Tasmania, including four weeks of power load data. Electricity demand was recorded every 30 min, giving 48 data points per day and 1488 data points in total. The data from July 1 to July 27, 2019 (27 days, 1296 data points) were chosen as the training set of the selected model, and the data from July 28 to July 31, 2019 (4 days, 192 data points) were used as the testing set. In addition, the data of the last three days are used to determine the network structure of the BP model. The training and testing sets use the same rolling forecasting technique and output one-step, two-step and three-step prediction results. The power load data from July 1 to July 31, 2019 and their statistical indicators (minimum, maximum, mean and standard deviation) are shown in Table 2. The distribution of the areas and datasets is presented in Figure 3.

As shown in Table 2, the statistics of all samples, training sets and testing sets at each site are similar. The dataset is therefore reasonable and suitable for testing the performance of the proposed model.

System Evaluation
The evaluation indices of the designed double prediction system are introduced below: five for point prediction and four for interval prediction.

Point Forecasting Evaluation
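Generally speaking, there is no unique evaluation criterion for a prediction system. Hence, this paper uses five common evaluation standards to assess the point forecasting performance of the proposed model and the comparison models: mean absolute percentage error (MAPE), root mean square error (RMSE), mean absolute error (MAE), direction change (DC) and the index of agreement of forecasting results (IA). Smaller values of MAPE, RMSE and MAE, and larger values of DC and IA, indicate better prediction performance. See Table 3 for the definitions of the five indicators.

Table 3. Point forecasting evaluation metrics.

Indicator | Definition | Equation
MAPE | Mean absolute percentage error | $\mathrm{MAPE}=\frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i-\hat{y}_i}{y_i}\right|\times 100\%$
RMSE | Root mean square error | $\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i-\hat{y}_i)^2}$
MAE | Mean absolute error | $\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|y_i-\hat{y}_i\right|$
DC | Direction change | $\mathrm{DC}=\frac{1}{N-1}\sum_{t=1}^{N-1}a_t\times 100\%$
IA | Index of agreement | $\mathrm{IA}=1-\frac{\sum_{i=1}^{N}(\hat{y}_i-y_i)^2}{\sum_{i=1}^{N}\left(|\hat{y}_i-\bar{y}|+|y_i-\bar{y}|\right)^2}$

In these formulas, $y_i$ and $\hat{y}_i$ are the true and predicted values, $\bar{y}$ is the mean of the true values, and N is the number of samples in the testing set. In addition, $a_t$ is the directional factor, calculated as
$a_t=\begin{cases}1, & \text{if } \left(y(t+1)-y(t)\right)\left(\hat{y}(t+1)-y(t)\right)>0\\ 0, & \text{otherwise}\end{cases}$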
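A minimal Python sketch of the five point-forecasting metrics follows. It is an illustration, not the authors' implementation; it assumes at least two samples, no zero true values (MAPE divides by $y_i$), and that DC averages the directional factor over the N-1 available transitions.

```python
import math


def point_metrics(y_true, y_pred):
    """Return MAPE (%), RMSE, MAE, DC (%) and IA for two equal-length lists.

    Assumes N >= 2 and no zero true values (MAPE divides by y_true).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # directional factor a_t: 1 if the predicted move from y(t) has the same sign as the actual move
    hits = [
        1 if (y_true[t + 1] - y_true[t]) * (y_pred[t + 1] - y_true[t]) > 0 else 0
        for t in range(n - 1)
    ]
    dc = 100.0 * sum(hits) / (n - 1)  # averaged over the N-1 transitions (assumed normalisation)
    y_bar = sum(y_true) / n
    denom = sum((abs(p - y_bar) + abs(t - y_bar)) ** 2 for t, p in zip(y_true, y_pred))
    ia = 1.0 - sum(e * e for e in errors) / denom
    return {"MAPE": mape, "RMSE": rmse, "MAE": mae, "DC": dc, "IA": ia}
```

For a perfect forecast, MAPE, RMSE and MAE are 0, while DC is 100% and IA is 1, which matches the "smaller is better / larger is better" reading given above.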

Interval Forecasting Evaluation
For interval prediction, four evaluation indexes are selected: forecasting interval coverage probability (FICP), forecasting interval normalized average width (FINAW), mean width of the constructed PIs (MPI) and accumulated width deviation (AWD). Table 4 gives their definitions.

Table 4. Interval prediction evaluation metrics.

Indicator | Definition | Equation
FICP | Forecasting interval coverage probability | $\mathrm{FICP}=\frac{1}{N}\sum_{i=1}^{N}c_i\times 100\%$
FINAW | Forecasting interval normalized average width | $\mathrm{FINAW}=\frac{1}{N\,(y_{\max}-y_{\min})}\sum_{i=1}^{N}(U_i-L_i)$
MPI | Mean width of the constructed PIs | $\mathrm{MPI}=\frac{1}{N}\sum_{i=1}^{N}(U_i-L_i)$
AWD | Accumulated width deviation | $\mathrm{AWD}=\sum_{i=1}^{N}\mathrm{AWD}_i$

In these formulas, $U_i$ and $L_i$ are the upper and lower limits of the forecasting interval, respectively; $c_i$ indicates whether the true value is contained in the constructed interval ($c_i=1$ if $y_i\in[L_i,U_i]$, otherwise $c_i=0$); N is the number of samples in the testing set; $y_{\max}$ and $y_{\min}$ are the maximum and minimum of the targets over the whole prediction process; and $\mathrm{AWD}_i$ is the width deviation of the constructed interval for each sample.
Specifically, FICP, the coverage probability of the forecasting intervals over the testing set, is the main evaluation index of interval prediction: it indicates how well the constructed intervals cover the actual values. For a given confidence level $1-\alpha$, the constructed intervals are valid if $\mathrm{FICP}\geq 1-\alpha$, and invalid otherwise. FINAW is the normalized average width of the forecasting intervals over the testing set. Narrowing the intervals lowers the probability of covering the target, while raising the coverage requires wider intervals, so FICP and FINAW are inherently conflicting. MPI is the average width of the constructed intervals. AWD is the accumulated width deviation over the testing set: summing the per-sample deviations $\mathrm{AWD}_i$ measures how far the observations fall outside the intervals relative to the interval widths. See Table 4 for the formulas.
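The four interval indexes can be sketched in Python as below. This is an illustrative sketch, not the authors' code. Two points are assumptions: $y_{\max}-y_{\min}$ is taken over the supplied true values, and $\mathrm{AWD}_i$ uses the piecewise relative-deviation form common in the interval-forecasting literature (distance outside the interval divided by the interval width, zero inside), since its exact formula is not reproduced in this section.

```python
def interval_metrics(y_true, lower, upper):
    """FICP (%), FINAW, MPI and AWD for intervals [lower_i, upper_i].

    Assumptions: every interval has positive width; y_max - y_min is taken
    over y_true; AWD_i follows the common piecewise relative-deviation form.
    """
    n = len(y_true)
    covered = sum(1 for y, lo, up in zip(y_true, lower, upper) if lo <= y <= up)
    widths = [up - lo for lo, up in zip(lower, upper)]
    mpi = sum(widths) / n
    finaw = mpi / (max(y_true) - min(y_true))
    awd = 0.0
    for y, lo, up, w in zip(y_true, lower, upper, widths):
        if y < lo:          # observation below the interval
            awd += (lo - y) / w
        elif y > up:        # observation above the interval
            awd += (y - up) / w
        # observations inside the interval contribute AWD_i = 0
    return {"FICP": 100.0 * covered / n, "FINAW": finaw, "MPI": mpi, "AWD": awd}
```

Because FINAW is simply MPI divided by the target range, the two width indexes always rank models identically on the same test set; FICP and AWD carry the coverage information.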

Diebold-Mariano Test
To verify that the designed hybrid model has better forecasting ability than the comparison models, the Diebold-Mariano (DM) test proposed by Diebold and Mariano is adopted. Its theory is introduced first.
For a significance level α, the null hypothesis $H_0$ states that the predictive accuracy of the developed model and that of the comparison model are not significantly different, and the alternative hypothesis $H_1$ states the opposite:
$H_0: E\left[L(err_i^1)\right]=E\left[L(err_i^2)\right], \qquad H_1: E\left[L(err_i^1)\right]\neq E\left[L(err_i^2)\right]$
where L is the loss function of the prediction error, and $err_i^1$ and $err_i^2$ are the forecast error sequences of the two compared models. The DM statistic is defined as
$DM=\frac{\frac{1}{N}\sum_{i=1}^{N}d_i}{\sqrt{S^2/N}}$
in which $d_i=L(err_i^1)-L(err_i^2)$ and $S^2$ is an estimate of the variance of $d_i$. For a given significance level α, the DM value is compared with $z_{\alpha/2}$. If DM falls outside the interval $[-z_{\alpha/2}, z_{\alpha/2}]$, $H_0$ is rejected and $H_1$ is accepted, which means that the predictive performance of the established model and that of the comparison model are significantly different.
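A compact Python sketch of the DM statistic follows, assuming squared-error loss by default. As a simplification, $S^2$ is the plain sample variance of $d_i$; this is appropriate for one-step forecasts, whereas multi-step errors are usually handled with an autocorrelation-robust variance. The function name and the argument order (comparison-model errors first, so a positive DM means the comparison model has the larger loss) are illustrative choices, not the paper's convention.

```python
import math
from statistics import NormalDist, fmean, variance


def dm_test(err1, err2, loss=lambda e: e * e):
    """Diebold-Mariano statistic and two-sided p-value.

    d_i = L(err1_i) - L(err2_i). S^2 is the plain sample variance of d
    (a simplification suited to one-step forecasts; multi-step errors are
    usually handled with an autocorrelation-robust estimate).
    """
    d = [loss(a) - loss(b) for a, b in zip(err1, err2)]
    n = len(d)
    s2 = variance(d)  # requires n >= 2 and non-constant d
    dm = fmean(d) / math.sqrt(s2 / n)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(dm)))
    return dm, p_value
```

Swapping the two error sequences flips the sign of DM but not its magnitude, so the two-sided decision against $z_{\alpha/2}$ is unaffected by the ordering.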

Results and Analysis of Point Forecasting
To test the performance of the point prediction module, two experiments are conducted in this part: Experiment I and Experiment II. Experiment I aims to demonstrate the ability of the nonlinear combination model in point forecasting and thereby verify the superiority of the proposed model; it also demonstrates the necessity of data preprocessing. Experiment II compares the ICEEMDAN technique selected in this work with other commonly used data preprocessing techniques to demonstrate its rationality and superiority. Each experiment is analyzed in detail below.

Experiment I: Comparison with Individual Models
In this work, all four datasets are used to assess the effectiveness of the point prediction module, and three comparisons are designed. In comparison (a), the proposed model is compared with four ICEEMDAN-based models, ICEEMDAN-RBF, ICEEMDAN-ELM, ICEEMDAN-ENN and ICEEMDAN-ARIMA, to analyze the advantages of the nonlinear combination method. In comparison (b), the four ICEEMDAN-based models are compared with the single models RBF, ELM, ENN and ARIMA, respectively. In comparison (c), the effectiveness of the designed forecasting model is further tested against the traditional models SVR and Generalized Regression Neural Network (GRNN). The prediction results are displayed in Table 5 and Figure 4, and the comparisons are analyzed as follows.
(1) For comparison (a), the developed hybrid nonlinear model has the best one- to three-step load forecasting ability on all four datasets, with error indexes superior to those of the other models. For instance, in one-step forecasting, the MAPE values of the established model are about 0.6577%, 0.7373%, 2.0714% and 1.3288% for the four sites, while the other ICEEMDAN-based models are 0.1% to 2% less accurate. For two- and three-step prediction, the proposed model also outperforms the other models at all four sites.
(2) For comparison (b), comparing ICEEMDAN-ENN and ICEEMDAN-ARIMA with ENN and ARIMA without data preprocessing shows that data preprocessing is very important for improving load forecasting. For site 1, the MAPE values of ENN and ARIMA are higher than those of ICEEMDAN-ENN and ICEEMDAN-ARIMA in one-, two- and three-step prediction. The accuracy of ICEEMDAN-RBF and ICEEMDAN-ELM is also improved, although these comparisons are not shown in the table. The situation is similar for sites 2, 3 and 4, as also shown in Figure 4.
(3) For comparison (c), the five indexes MAPE, MAE, RMSE, IA and DC in Table 5 show that the proposed model is more accurate than individual models such as SVR and GRNN. Apart from the proposed hybrid model, the single models with the highest prediction accuracy are ARIMA and ENN, so ARIMA and ENN are chosen as component models of the combination; RBF is chosen for the same reason. Because ARIMA is a linear model, its good performance suggests that the power load data have certain linear characteristics, so it is reasonable to include ARIMA in the proposed model. In addition, although the prediction accuracy of BP is not shown, the experimental results indicate that it performs relatively well, so BP is selected as the nonlinear combination model.

Experiment II: Tests of Data Preprocessing Methods
This experiment compares the effectiveness of ICEEMDAN, selected in this system, with other common data preprocessing methods: EMD, EEMD, CEEMD, SSA and WD. The corresponding point forecasting models are the EMD-based, EEMD-based, CEEMD-based, SSA-based and WD-based models, which differ from the proposed model only in the decomposition method used in the data preprocessing stage. The experiment tests whether the proposed prediction model is reasonable and identifies the best method for removing noise to improve prediction effectiveness.
The results obtained by the models using different data preprocessing approaches are shown in Table 6, and Figure 5 gives a clearer and more intuitive comparison. The model based on ICEEMDAN decomposition performs much better than the other decomposition-based prediction models. Specifically, for site 2, the MAPE values of the ICEEMDAN-based proposed model are 0.7373%, 1.8561% and 1.8413% for the three steps, which are 0.1 to 4 percentage points lower than those of the EMD-based, EEMD-based, CEEMD-based, SSA-based and WD-based models. Among all the benchmark models, the EMD-based model is the worst. The MAE, RMSE, IA and DC of the developed model in one- to three-step prediction are also improved to different extents, which further shows the superiority of the data preprocessing method selected for this hybrid model.

Remark 2.
Experiments I and II demonstrate the advantages of the proposed point forecasting module, and the results show that the designed point forecasting model is a very promising power load forecasting method. They also show that combining data preprocessing, an optimization algorithm and a nonlinear combination method can effectively address the difficulties of load prediction.

Figure 5. The multi-step forecasting ability in Experiment II for site 2. The figure is divided into three parts: one- to three-step time series forecasting charts, a three-step error comparison bar chart and a one-step forecasting error point chart.

Results and Analysis of Interval Prediction (Experiment III)
Based on the point forecasts, probabilistic interval forecasting can provide more information about the power load. In this part, we develop a method based on fuzzy clustering that performs interval forecasting on the basis of point forecasting. Four datasets are used in this experiment. To verify the ability of the designed interval forecasting module, all the comparison models used in point prediction are also applied here, and multi-step prediction is used to verify the interval results. The results of the interval prediction model and the other models are shown in Table 7; due to limited space, only the results of site 2 and site 3 are displayed. The confidence level is set to 90% to assess the effectiveness of the interval prediction model.
(1) For site 2, the proposed model obtains the best values of all indexes among all models. For the proposed model, the forecasting interval coverage probability (FICP) is 98.96% in one-step, 79.06% in two-step and 78.95% in three-step forecasting. According to MPI, the average interval width is 356.9044, 333.2484 and 355.3731 for the three steps, which is relatively narrow compared with the absolute level of the power load. AWD is 0.0002, 0.0587 and 0.0554 for the three steps, showing that the deviation of the constructed intervals is small. All indexes indicate that the intervals predicted by the proposed model are qualified. In contrast, no other model achieves a higher FICP than the proposed model. Although ICEEMDAN-ELM has the same one-step FICP as the proposed model, its FICP is much lower in two- and three-step forecasting.
(2) Considering FINAW together with FICP, the proposed model keeps FINAW relatively small while its FICP is very high, which also shows the superiority of the developed model. In one-step forecasting, the AWD of most benchmark models is more than ten times that of the developed model, and the gap is even larger in two- and three-step forecasting, which reflects the smaller deviation of the developed model. Together, these four indexes show the superior forecasting ability of the developed model. The same conclusion can be drawn for site 3 in Table 7.
(3) To show the comparison intuitively, the results of the designed model and the comparison models are plotted in Figure 6. The conclusions are consistent with Table 7 and provide visual evidence of the superior ability of the proposed system in load interval forecasting. As shown in Figure 6, the proposed model produces more accurate interval forecasts than the other models: its prediction range covers most of the load values and is also the narrowest among all models, which shows that the designed model is more stable than the others. As a result, the designed model has clear advantages on the experimental datasets.
Remark 1. As in point forecasting, the same 13 competing models are compared on the four datasets in multi-step interval forecasting. The results show that the designed interval model is better than all the comparison models. Owing to the strong performance of the fuzzy-clustering-based interval prediction module, it is a very promising interval prediction method for power load.
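The fuzzy-clustering interval module is described here only at a high level. The sketch below is one hypothetical way to realize the idea and is not the authors' algorithm: point forecasts are clustered with fuzzy c-means (fuzzifier m = 2), each training sample is assigned to its highest-membership cluster, and each cluster's interval bounds are the empirical α/2 and 1−α/2 quantiles of its residuals (α = 0.1 for the 90% level). The cluster count, the residual-quantile rule and all function names are assumptions.

```python
import math


def memberships(x, centers, m=2.0, eps=1e-9):
    """Fuzzy c-means membership of scalar x in each cluster (fuzzifier m > 1)."""
    d = [abs(x - c) + eps for c in centers]  # eps avoids division by zero
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in d) for dj in d]


def fcm_1d(values, n_clusters=3, m=2.0, n_iter=100, tol=1e-9):
    """Fuzzy c-means on scalar values; returns the cluster centres."""
    srt = sorted(values)
    # deterministic start: centres spread over the sorted values
    centers = [srt[int((k + 0.5) * len(srt) / n_clusters)] for k in range(n_clusters)]
    for _ in range(n_iter):
        u = [memberships(x, centers, m) for x in values]
        new = []
        for k in range(n_clusters):
            w = [row[k] ** m for row in u]
            new.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new))
        centers = new
        if shift < tol:
            break
    return centers


def quantile(sorted_vals, q):
    """Nearest-rank empirical quantile of an ascending list."""
    idx = min(len(sorted_vals) - 1, max(0, math.ceil(q * len(sorted_vals)) - 1))
    return sorted_vals[idx]


def nearest_cluster(x, centers):
    """Index of the cluster in which x has the highest membership."""
    u = memberships(x, centers)
    return max(range(len(centers)), key=lambda k: u[k])


def fit_interval_model(y_pred, y_true, n_clusters=3, alpha=0.1):
    """Cluster point forecasts, then take residual quantiles per cluster."""
    centers = fcm_1d(y_pred, n_clusters)
    groups = {k: [] for k in range(n_clusters)}
    for p, t in zip(y_pred, y_true):
        groups[nearest_cluster(p, centers)].append(t - p)
    pooled = sorted(t - p for p, t in zip(y_pred, y_true))
    bounds = {}
    for k, res in groups.items():
        res = sorted(res) if res else pooled  # fall back to all residuals if a cluster is empty
        bounds[k] = (quantile(res, alpha / 2), quantile(res, 1 - alpha / 2))
    return centers, bounds


def predict_interval(p, centers, bounds):
    """Interval around a new point forecast p using its cluster's residual bounds."""
    lo, hi = bounds[nearest_cluster(p, centers)]
    return p + lo, p + hi
```

The design choice this illustrates is that load levels with larger typical errors (for example, peak hours) receive wider intervals than quiet periods, instead of one global error band.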

Discussions
To discuss the experimental conclusions in detail and to further reduce the error of power load forecasting, this section examines the validity of the established model, the combination mechanism of the combined model and its practical application in the power system.

DM Test
The validity of the proposed hybrid forecasting model is verified by applying the DM test between it and every comparison model. Following DM test theory, the null hypothesis is that there is no significant difference between the prediction results of the two models, and the alternative hypothesis is the opposite. Two significance levels, α = 0.1 and α = 0.05, are used as criteria, with $z_{0.05/2}=1.96$ and $z_{0.1/2}=1.645$. Table 8 displays the DM statistics and their averages for the four datasets.
It can be seen that most of the DM statistics between the developed model and the comparison models exceed the critical value at the 5% significance level. However, some results of ICEEMDAN-RBF, ICEEMDAN-ARIMA, the CEEMD-based model and the WD-based model do not differ significantly from the proposed model at the 5% level; for most of these, the null hypothesis can still be rejected at the 10% significance level. For example, the one-step DM statistic of the ICEEMDAN-ARIMA model at site 4 is 1.7554, which is below 1.96 (not significant at the 5% level) but above 1.645 (significant at the 10% level). At the 10% significance level, almost all differences between the designed model and the benchmark models are significant. For the few models whose differences from the proposed model are not significant, indicators such as MAPE still show that the proposed model performs best. Therefore, the designed hybrid double forecasting model is preferable to the other models.
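The two-level decision rule used above can be checked with Python's standard library. This small sketch (function names are illustrative) derives $z_{\alpha/2}$ from the standard normal distribution and reports the levels at which a given DM statistic is significant, reproducing the site 4 ICEEMDAN-ARIMA example.

```python
from statistics import NormalDist


def critical_value(alpha):
    """Two-sided standard normal critical value z_{alpha/2}."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def significance(dm_stat, levels=(0.05, 0.10)):
    """Return the significance levels at which H0 is rejected (|DM| > z_{alpha/2})."""
    return [a for a in levels if abs(dm_stat) > critical_value(a)]


print(round(critical_value(0.05), 3), round(critical_value(0.10), 3))  # 1.96 1.645
print(significance(1.7554))  # [0.1]
```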

Performance Testing of Optimization Algorithms
This section first introduces the parameter settings of the BP network and the MOEA/D algorithm, and then tests the convergence of the metaheuristic algorithms.

Parameter Settings
The BP neural network is used to combine the power load forecasting results of the individual models. In a BP neural network, the weights and thresholds of the input, hidden and output layers strongly influence network performance. To determine the connection weights and node thresholds effectively, the MOEA/D algorithm is used to optimize them. The parameters of BP and MOEA/D are shown in Tables 9 and 10, respectively.
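To make the optimization target concrete, the sketch below shows how a flat parameter vector, the kind of candidate solution a population-based optimizer such as MOEA/D manipulates, can be decoded into BP weights and thresholds and evaluated as a nonlinear combiner of component forecasts. It is a hypothetical illustration: the single sigmoid hidden layer with a linear output, the function names and the two objectives (MAE and the standard deviation of errors) are assumptions, since the exact network layout and objectives are given in Tables 9 and 10 and not reproduced here. Inputs are assumed to be normalized component forecasts.

```python
import math


def n_params(n_in, n_hidden):
    """Length of the flat vector: W1, hidden thresholds, W2, output threshold."""
    return n_hidden * n_in + n_hidden + n_hidden + 1


def decode(params, n_in, n_hidden):
    """Split a flat parameter vector into BP weights and thresholds (one output)."""
    if len(params) != n_params(n_in, n_hidden):
        raise ValueError("parameter vector has the wrong length")
    w1 = [params[h * n_in:(h + 1) * n_in] for h in range(n_hidden)]
    i = n_hidden * n_in
    b1 = params[i:i + n_hidden]
    w2 = params[i + n_hidden:i + 2 * n_hidden]
    b2 = params[i + 2 * n_hidden]
    return w1, b1, w2, b2


def bp_forward(x, params, n_hidden):
    """Sigmoid hidden layer, linear output: combines the component forecasts in x."""
    w1, b1, w2, b2 = decode(params, len(x), n_hidden)
    hidden = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def objectives(params, X, y, n_hidden):
    """Objectives to minimise (assumed): MAE and standard deviation of errors."""
    errs = [t - bp_forward(x, params, n_hidden) for x, t in zip(X, y)]
    n = len(errs)
    mae = sum(abs(e) for e in errs) / n
    mean_e = sum(errs) / n
    std = math.sqrt(sum((e - mean_e) ** 2 for e in errs) / n)
    return mae, std
```

An optimizer would search over `params` of length `n_params(n_in, n_hidden)` (for example, `n_in = 4` component forecasts) and keep the candidates that best trade off the two objectives.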

Convergence Testing of Optimization Algorithms
To examine the performance of the MOEA/D algorithm, different population sizes are tested on four test functions, and two multi-objective optimization algorithms, Multi-objective Grey Wolf Optimization (MOGWO) and Multi-objective Dragonfly Algorithm (MODA), are selected for comparison. Table 11 shows the details of the four test functions. The comparison of these optimization methods shows that the optimization ability of MOEA/D is better than that of the other multi-objective algorithms. Each case is run 20 times and the average value is reported. Two performance indexes, Inverted Generational Distance (IGD) and Spread, are used to evaluate the optimization algorithms, and their running times are also compared. IGD indicates the convergence of the algorithm, and its result can be used to judge the robustness and stability of the algorithm; the smaller the IGD value, the better the algorithm. Spread (SP) is usually used to evaluate the distribution of solutions in the Pareto set; SP = 0 means that all non-dominated solutions are equidistant.
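The two quality indexes can be illustrated on ZDT1, one of the test functions named in the results. In this sketch IGD is the mean Euclidean distance from each point of a sampled true Pareto front to its nearest obtained solution. For Spread, Schott's spacing metric is used because it matches the stated property that a value of 0 means equidistant solutions; the paper's exact Spread formula is not reproduced in this section, so treat that choice as an assumption.

```python
import math


def zdt1(x):
    """ZDT1 with decision variables in [0, 1]; returns (f1, f2)."""
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (len(x) - 1)
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2


def zdt1_true_front(n_points=500):
    """Sampled true Pareto front of ZDT1: f2 = 1 - sqrt(f1), f1 in [0, 1]."""
    return [(i / (n_points - 1), 1.0 - math.sqrt(i / (n_points - 1))) for i in range(n_points)]


def igd(approx, reference):
    """Mean distance from each reference point to its nearest obtained solution."""
    return sum(min(math.dist(r, a) for a in approx) for r in reference) / len(reference)


def spacing(front):
    """Schott's spacing: 0 when every solution has the same nearest-neighbour distance."""
    d = [min(sum(abs(p - q) for p, q in zip(front[i], front[j]))
             for j in range(len(front)) if j != i)
         for i in range(len(front))]
    d_bar = sum(d) / len(d)
    return math.sqrt(sum((d_bar - di) ** 2 for di in d) / (len(d) - 1))
```

A solver's final population would be passed through `zdt1`, and the resulting objective vectors scored with `igd` against `zdt1_true_front()` and with `spacing`; lower is better for both.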
The final simulation results are shown in Table 12. For all the algorithms, as the population size increases from 100 to 200, 300 and 500, the convergence generally improves. For MOGWO, however, too large a population leads to overfitting, which degrades the algorithm. Among the algorithms compared, MOEA/D performs best on ZDT1, ZDT2, ZDT3 and ZDT4. The IGD of MOEA/D is far smaller than that of the other algorithms, showing that MOEA/D has the best convergence performance, followed by MODA, while the convergence of MOGWO is much worse than that of the others. For Spread, MOEA/D achieves the best distribution performance. The elapsed time of MOEA/D is also significantly lower than that of the other two algorithms, making it the most efficient of the three.

Combination Mechanism of Combined Model
To verify the effectiveness of the designed nonlinear combination mechanism MOEA/D-BP, a simple average strategy and a linear combination mechanism are selected for comparison in this study. The simple average strategy takes the mean of the prediction results of the individual models, while the linear combination mechanism uses the multi-objective algorithm MOEA/D to determine the combination weights and obtain the final prediction results. The comparison between the developed model and the other two methods is reported in Table 13. Specifically, the simple average method computes the final predicted value with the simple average formula:
$\hat{y}=\frac{1}{n}\sum_{i=1}^{n}f_i$
where $f_i$ is the prediction result of the i-th model and n is the number of component models. The linear combination is a weighted combination of the results of the four single models, whose weights are determined by the multi-objective optimization algorithm, which makes the method more adaptive. The combination mechanisms are compared using five point forecasting metrics and four interval forecasting metrics. The results show that the nonlinear combination model is more accurate than both the simple average method and the linear combination mechanism, regardless of site and forecasting step. The linear combination mechanism is usually more effective than the simple average strategy, which performs worst. Therefore, the developed nonlinear combination mechanism MOEA/D-BP improves the forecasting effectiveness of power load.
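The two baseline combination rules can be written in a few lines. In this sketch the weights of the linear combination are passed in directly; in the study they come from MOEA/D, which is not reproduced here. The nonlinear mechanism replaces the weighted sum with a trained BP network.

```python
def simple_average(forecasts):
    """forecasts: one list of predictions per model, all the same length."""
    n_models = len(forecasts)
    return [sum(vals) / n_models for vals in zip(*forecasts)]


def linear_combination(forecasts, weights):
    """Weighted sum of component forecasts; weights are assumed to come from an optimizer."""
    if len(weights) != len(forecasts):
        raise ValueError("one weight per model is required")
    return [sum(w * v for w, v in zip(weights, vals)) for vals in zip(*forecasts)]
```

With equal weights of 1/n the linear combination reduces to the simple average, so the simple average is a special case that an optimizer can only match or improve on the training data.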

Practical Application of Load Forecasting To a Power System
Load forecasting is of great significance for improving the stability and reliability of the power grid. Accurate forecasting results play a decisive role in the safe and stable operation of the power network. Point forecasts represent the expected load in a future period, while interval forecasts reflect the possible range of the load. Load forecasting results feed directly into power grid planning, as detailed below [44].

Application of Load Point Forecasting
Calculating the power supply load requires load forecasting. From the load forecasting results, the power supply load of each voltage level can be calculated. The power supply load is the premise for calculating the power balance and the basis for determining the newly added transformer capacity. The network load is generally calculated by voltage level and refers to the load supplied by the public transformers of the same voltage level. To allocate the capacity of the distribution network reasonably, the distribution of the power supply load across the networks of each voltage level must be predicted and analyzed.
Point load forecasting can be applied to high-voltage power grid planning. After receiving power from the upper-level grid or power sources, the high-voltage grid can supply power directly to high-voltage users or to the lower-level medium-voltage grid, so it is the link between the transmission network and the medium-voltage grid. In high-voltage grid planning, load forecasting is directly related to substation capacity demand and substation siting. The substation capacity demand determines the number and capacity of main transformers from the forecast network load and the capacity-load ratio, and the substation layout is determined from the load density given by spatial load forecasting.
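As a worked illustration of the capacity-load ratio step, the required transformer capacity is the forecast peak load multiplied by the capacity-load ratio, and the number of main transformers follows from the unit rating. The peak load, ratio and unit rating below are hypothetical numbers, not planning standards, and the sketch ignores additional criteria (such as N-1 security) that real planning applies.

```python
import math


def substation_plan(peak_load_mw, capacity_load_ratio, unit_rating_mva):
    """Required transformer capacity and number of main transformers.

    capacity = forecast peak load * capacity-load ratio; all inputs here are
    hypothetical examples, and security criteria such as N-1 are ignored.
    """
    required = peak_load_mw * capacity_load_ratio
    n_units = math.ceil(required / unit_rating_mva)
    return required, n_units


print(substation_plan(120.0, 1.8, 50.0))  # (216.0, 5)
```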
Point load forecasting can also be applied to medium-voltage power grid planning, where the load forecasting results are directly related to distribution transformer planning and line scale planning. Distribution transformer planning mainly determines the capacity and locations of new distribution transformers, and medium-voltage line planning mainly uses the load forecasting results to determine the number of lines needed to meet the growing demand for power supply [45].

Application of Load Interval Forecasting
An analysis of existing load forecasting methods shows that most of them produce deterministic point forecasts. In fact, because there are various uncertain factors in the power system, decision making inevitably faces a certain degree of risk, so the uncertainty of power demand must be considered. The results of traditional deterministic forecasting methods cannot reflect this uncertainty, whereas interval forecasting can.
Interval forecasting conveys more information than point forecasting. Its result is a series of intervals, each corresponding to a given confidence level, which describe the possible range of future load. With interval forecasts, power system decision makers can better understand the fluctuation range of future load, as well as the uncertainty and risk factors that may exist, when carrying out production planning, system security analysis and other work, and can therefore make more reasonable decisions in time. The upper and lower bounds of the intervals can be used to arrange the spinning reserve capacity of the power system and thereby improve the economic benefits of system operation. Interval load forecasting can also support unit commitment, economic dispatch and optimal power flow in the dispatching department, which helps improve the utilization of generation equipment and the effectiveness of economic dispatch. Therefore, analyzing power load variation and studying interval forecasting methods help decision makers better grasp data changes in power grid planning and related tasks, leading to more scientific analysis and evaluation.
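One simple way to turn interval bounds into reserve figures is sketched below. The rule is hypothetical and is not taken from the paper: the upward reserve covers the gap between the point forecast and the upper bound, and the downward reserve covers the gap to the lower bound, for each time step.

```python
def reserve_requirements(point, lower, upper):
    """Up/down reserve implied by an interval forecast around a point forecast.

    Hypothetical rule: up-reserve = upper bound - point forecast,
    down-reserve = point forecast - lower bound (clipped at zero).
    """
    up = [max(0.0, u - p) for p, u in zip(point, upper)]
    down = [max(0.0, p - l) for p, l in zip(point, lower)]
    return up, down
```

Under this rule a narrower, well-calibrated interval (high FICP, low FINAW) directly translates into less reserve held for the same risk level, which is the economic benefit described above.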

Conclusions
Precise and dependable power load forecasting not only plays an important role in the management and operation of smart grids, but also brings environmental, economic and social benefits. However, the complicated fluctuation of power load greatly limits its further development and utilization and may even endanger the dispatching and management of the power system. Most previous work focused on deterministic point prediction of power load and seldom considered the other important aspect, interval prediction, which makes load forecasting incomplete.
To fully mine and evaluate the deterministic and uncertain characteristics of power load, this study develops a double forecasting system that makes up for the shortcomings of existing research. The system consists of two parts: a point forecasting module based on nonlinear combination and an interval forecasting module based on fuzzy clustering, which together allow a comprehensive discussion of load predictability and modeling. Unlike previous work, this paper designs a BP neural network optimized by MOEA/D as a new nonlinear combination mechanism to obtain the final prediction results, which further improves the accuracy of point prediction. Building on the improved point predictions, the load data are divided into different categories by fuzzy clustering, and different intervals are constructed according to the prediction data of each category. Because the intervals are constructed according to the different characteristics of the data, this is an effective interval prediction method. Finally, extensive experiments with quantitative indexes demonstrate the effectiveness and superiority of the system. Because of its good performance, the designed system can also be applied to other load forecasting, wind power forecasting, economic forecasting and related fields.

Conflicts of Interest:
The authors declare no conflicts of interest regarding the publication of this paper.