Short-Term Load Interval Prediction Using a Deep Belief Network

In load prediction, point-based forecasting methods have been widely applied. However, uncertainties arising in load prediction bring significant challenges for such methods. This drives the development of new methods, among which interval prediction is one of the most effective. In this study, a deep belief network-based lower–upper bound estimation (LUBE) approach is proposed, and a genetic algorithm is applied to reinforce the search ability of the LUBE method, instead of a simulated annealing algorithm. The approach is applied to short-term load prediction on realistic electricity load data. To demonstrate the effectiveness and efficiency of the proposed method, it is compared with three state-of-the-art methods. Experimental results show that the proposed approach can significantly improve the prediction accuracy.


Introduction
Load prediction plays an important role in planning and building reliable power systems. In general, there are four types of load prediction: long-term, medium-term, short-term, and ultra-short-term forecasting. Short-term load prediction (STLP) is crucially important for the daily operation and scheduling of power systems, such as economic dispatch and optimal unit commitment.
To date, a number of methods have been proposed for STLP. These methods can be loosely categorized as point prediction and interval prediction. Representative point prediction-based methods include the following: (i) methods based on statistical models, such as the state space model [1], regression analysis model [2], autoregressive integrated moving average (ARIMA) [3], Kalman filtering [4], and exponential smoothing (ES) models [5]; (ii) artificial intelligence-based methods, e.g., neural networks (NNs) [6], expert systems [7], support vector machines (SVM) [8], and deep learning [9]; and (iii) hybrid models such as neuro-fuzzy systems [10]. However, the main issue of point prediction is that it provides only a single value as output, without any indication of the accuracy or reliability of the prediction [11]. Given the increasing uncertainties in the power grid caused by self-powered users and independent microgrids (based on renewable energies) [12], point prediction-based methods now face great challenges.
Interval prediction-based methods, as the name suggests, output an interval as the prediction result, which covers the future observations with a certain confidence level and is therefore more suitable for dealing with uncertainties [13,14]. The upper and lower bounds not only cover the observed targets with high probability, but also provide a coverage probability as an indication of reliability, which conveys more quantitative information than point prediction. Table 1 lists several representative interval prediction methods. The delta method adopts a nonlinear regression technique to enhance the generalization performance of neural network (NN) models [15]. First, the method linearizes the neural network model around a set of parameters obtained by minimizing the sum-of-squared-errors cost function. Then, standard asymptotic theory is applied to the linearized model to construct prediction intervals (PIs) [16]. The main issue of this method is that the linearization, while simplifying the approach, may lose accuracy when the dataset shows strong non-linearity. Bayesian techniques are used to train neural networks, and they allow the predicted value to have a certain error range [17]. However, the need to calculate the Hessian matrix of the cost function makes this method computationally expensive. The Bootstrap method is perhaps the most widely used technique for NN-based interval prediction, due to its simplicity and ease of implementation [18]. Compared with the aforementioned methods, the Bootstrap method does not need to calculate derivatives or the Hessian matrix. However, it requires a large data set to support the training process. The mean-variance estimation-based method can enhance the ability of the NN model to estimate the distribution characteristics of conditional targets [19]. The most striking feature of this approach is that it greatly reduces the calculation cost of the training process. However, its low empirical coverage probability is its biggest drawback.

Table 1. Representative interval prediction methods.
Method | Advantage | Disadvantage
Delta method | NN is enhanced by the nonlinear regression technique | The use of linearization in NN
Bayesian method | Strong theoretical foundation of Bayesian concepts | Large computational burden required for the calculation of a Hessian matrix
Bootstrap method | Ease of implementation | The need of a large data set to support training and calculation
Mean-variance estimation-based method | The low calculation cost of the training process | The low empirical coverage probability

In addition to the above methods, an alternative interval prediction method, namely the lower upper bound estimation (LUBE), was proposed in 2011 [20]. Compared to existing NN-based interval prediction methods, the LUBE does not make any assumption about the distribution of the training data sets or the prediction errors. It also avoids the heavy computation of complex derivatives. In this study, a single-objective LUBE framework is adopted, though a multi-objective framework has also been proposed [21,22].
The NN-based LUBE approach has many demonstrated applications [23-25]; however, the traditional NN model has its own limitations, such as the long training time required for good performance and the tendency of the model to become trapped in local optima. These limitations have greatly restricted the performance of the NN-based LUBE method. The deep belief network (DBN) has attracted a great deal of attention in the last decade [26]. The DBN adopts a layer-by-layer training method, by which the whole network can be effectively trained. One notable feature of the DBN is that it can hierarchically represent multiple characteristics of the patterns in the data.
In the last decade, there has been growing interest in using DBNs to predict time series data [27]. For example, an empirical mode decomposition (EMD) algorithm was incorporated into the DBN to improve its performance [28]. The particle swarm optimization (PSO) approach was introduced to enhance the learning and extraction capability of the restricted Boltzmann machine (RBM) in the DBN [29]. In [30], an adaptive DBN learning architecture was proposed to autonomously generate or eliminate RBM neurons based on the training data patterns. A fast meta-heuristic algorithm was applied to make the parameter settings of the DBN more suitable and accurate [31]. The nearest neighbor classification algorithm was combined with the dynamic time warping (DTW) method to obtain first-class prediction performance [32].
Energies 2018, 11, 2744
In this study, a single-objective LUBE framework using the DBN is proposed to perform short-term load prediction. In addition, a genetic algorithm is applied to reinforce the search ability of the LUBE method, instead of a simulated annealing algorithm. The rest of the paper is organized as follows. Section 2 introduces the background knowledge related to the deep belief network model and the LUBE method; Section 3 elaborates the proposed interval prediction framework combining DBN and LUBE; the experimental setup, results, and discussions are presented in Section 4; Section 5 concludes the paper and identifies some future directions.

Background
This section introduces the necessary background knowledge, i.e., the evaluation metrics of interval prediction and the LUBE method.

Evaluation Metrics of Interval Prediction
As is well known, the mean absolute percentage error (MAPE) and the mean square error (MSE) are two widely used metrics in point prediction. Likewise, the PI coverage probability (PICP) and the PI-normalized average width (PINAW) are two important metrics in interval prediction. The two metrics have to be optimized simultaneously to obtain an interval that is narrow but still has good coverage (i.e., reliability).
Specifically, the PICP measures the proportion of target values that fall within the predicted interval. The larger the PICP, the better the prediction results. Mathematically, the PICP is defined as:
$$\mathrm{PICP} = \frac{1}{n}\sum_{i=1}^{n} c_i \tag{1}$$
where $n$ denotes the number of targets, and $c_i$ is a Boolean variable defined by:
$$c_i = \begin{cases} 1, & y_i \in [L_i, U_i] \\ 0, & y_i \notin [L_i, U_i] \end{cases} \tag{2}$$
The variable $c_i$ describes whether the prediction interval (PI) covers the target. If the target value $y_i$ lies between the lower bound $L_i$ and the upper bound $U_i$, then $c_i = 1$; otherwise, $c_i = 0$. The ideal case, PICP = 100%, indicates that all target values are within the prediction interval.
A sufficiently wide PI always results in PICP = 100%, but such an interval carries no useful information. Therefore, another metric has to be introduced, the PINAW, which is expected to be minimized; see Equation (3):
$$\mathrm{PINAW} = \frac{1}{nS}\sum_{i=1}^{n}\left(U_i - L_i\right) \tag{3}$$
where $S$ measures the range of the target values, i.e., the maximum target value minus the minimum. It is used to normalize the average width of the PI as a percentage. In this way, PINAW can be applied to quantitatively compare the PIs constructed by different methods.
The PICP and PINAW are in conflict with one another: a narrow interval (a small PINAW) is likely to result in a small PICP. Thus, to assess the overall performance of interval prediction methods, a comprehensive cost function is required that considers both the coverage probability and the width of the prediction interval. Moreover, as PICP is the basic feature of interval prediction methods, the cost function is designed to give more weight to the variation of PICP. The coverage width-based criterion (CWC) is defined as follows:
$$\mathrm{CWC} = \mathrm{PINAW}\left(1 + \gamma(\mathrm{PICP})\, e^{-\eta\left(\mathrm{PICP} - \mu\right)}\right) \tag{4}$$
where $\gamma(\mathrm{PICP})$ is a Boolean function:
$$\gamma(\mathrm{PICP}) = \begin{cases} 0, & \mathrm{PICP} \geq \mu \\ 1, & \mathrm{PICP} < \mu \end{cases} \tag{5}$$
where $\eta$ is used to penalize invalid PIs, while $\mu$ is determined by the confidence level of the PI.
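The three metrics can be computed in a few lines. The following is a minimal NumPy sketch, not the authors' implementation; the function names, the toy data, and the default values `mu=0.9` and `eta=50.0` are illustrative assumptions, since this section does not state the values used.

```python
import numpy as np

def picp(y, lower, upper):
    """PI coverage probability: fraction of targets inside [lower, upper]."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    covered = (y >= lower) & (y <= upper)   # Boolean coverage indicator c_i
    return float(covered.mean())

def pinaw(y, lower, upper):
    """PI-normalized average width: mean width divided by the target range S."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    target_range = y.max() - y.min()        # S in Equation (3)
    return float(np.mean(upper - lower) / target_range)

def cwc(y, lower, upper, mu=0.9, eta=50.0):
    """Coverage width-based criterion, Equation (4).

    mu and eta are assumed values chosen only for illustration.
    """
    coverage = picp(y, lower, upper)
    width = pinaw(y, lower, upper)
    gamma = 1.0 if coverage < mu else 0.0   # penalty is active only below mu
    return width * (1.0 + gamma * np.exp(-eta * (coverage - mu)))

# Toy example: four of the five targets fall inside their intervals.
y = [10.0, 12.0, 14.0, 16.0, 20.0]
lower = [9.0, 11.0, 15.0, 15.0, 18.0]
upper = [11.0, 13.0, 17.0, 17.0, 21.0]

print("PICP :", picp(y, lower, upper))
print("PINAW:", pinaw(y, lower, upper))
print("CWC (mu=0.90):", cwc(y, lower, upper, mu=0.90))
print("CWC (mu=0.75):", cwc(y, lower, upper, mu=0.75))
```

With the coverage below the assumed `mu`, the exponential term dominates and the CWC is far larger than the PINAW; once the coverage reaches `mu`, the CWC reduces to the PINAW alone.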

LUBE Approach
Different from traditional interval prediction methods, the LUBE approach directly approximates the upper and lower bounds of the PI by unsupervised learning methods. As mentioned in the last section, the LUBE aims to achieve a narrow prediction interval and a high coverage probability of the target values, which is a typical bi-objective optimization problem. By Equation (4), the bi-objective problem is transformed into a single-objective problem of minimizing the unified indicator, CWC.
The CWC is therefore used to train an NN so as to construct PIs. The NN model has two outputs, the upper bound and the lower bound. A genetic algorithm is applied to reinforce the search ability of the LUBE method. Figure 1 illustrates a typical flowchart of the NN-based LUBE approach. The main steps are described as follows.
Step 1 Population initialization: randomly initialize the population of the genetic algorithm (GA). The weights and thresholds of the NN models are generated based on the population.
Step 2 PI construction and CWC_raw calculation: an NN with two outputs is applied to construct PIs for the training data. PICP, PINAW, and CWC are then calculated and taken as the initial fitness of the genetic algorithm.
Step 3 Generation of a new population: the selection, crossover, and mutation operators are performed on the parent population to produce new offspring.
Step 4 PI construction: new PIs are constructed using the newly selected NN parameters. Accordingly, the new metric CWC_new is calculated by Equation (4).
Step 5 Individual evaluation: the CWC index is taken as the fitness in the GA optimization process. The individual with the minimum fitness is recorded as the global optimal solution; it also represents the best model parameters.
Step 6 Termination and results: two termination criteria are commonly used, i.e., the maximum number of iterations is reached, or the evaluation indicator remains unchanged for a number of iterations. If neither criterion is met, the algorithm returns to Step 3.
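The six steps above can be sketched as a small GA loop. This is a hedged illustration under stated assumptions, not the authors' code: the network size, the toy sine series, the tournament selection, uniform crossover, Gaussian mutation, elitism, and all hyperparameters (`mu`, `eta`, `pop_size`, `n_gen`, `p_mut`, `sigma`) are choices made here for the example, and only the maximum-iteration termination criterion is implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

def cwc(y, lower, upper, mu=0.9, eta=50.0):
    """Coverage width-based criterion, Equation (4); mu and eta are assumed."""
    picp = np.mean((y >= lower) & (y <= upper))
    pinaw = np.mean(upper - lower) / (y.max() - y.min())
    gamma = 1.0 if picp < mu else 0.0
    return pinaw * (1.0 + gamma * np.exp(-eta * (picp - mu)))

def predict_bounds(params, X, n_hidden):
    """Decode a flat parameter vector into a two-output NN and return bounds."""
    n_in = X.shape[1]
    i = 0
    W1 = params[i:i + n_in * n_hidden].reshape(n_in, n_hidden)
    i += n_in * n_hidden
    b1 = params[i:i + n_hidden]
    i += n_hidden
    W2 = params[i:i + n_hidden * 2].reshape(n_hidden, 2)
    i += n_hidden * 2
    b2 = params[i:i + 2]
    out = np.tanh(X @ W1 + b1) @ W2 + b2
    # Order the two outputs so that lower <= upper for every sample.
    return out.min(axis=1), out.max(axis=1)

def fitness(params, X, y, n_hidden):
    lower, upper = predict_bounds(params, X, n_hidden)
    return cwc(y, lower, upper)

def ga_lube(X, y, n_hidden=5, pop_size=40, n_gen=60, p_mut=0.1, sigma=0.3):
    n_params = X.shape[1] * n_hidden + n_hidden + n_hidden * 2 + 2
    # Step 1: random initial population; each row encodes weights and thresholds.
    pop = rng.normal(0.0, 1.0, size=(pop_size, n_params))
    # Step 2: initial PIs and initial fitness (CWC).
    fit = np.array([fitness(p, X, y, n_hidden) for p in pop])
    best, best_fit = pop[fit.argmin()].copy(), fit.min()
    history = [best_fit]
    for _ in range(n_gen):  # Step 6: stop after a fixed number of generations.
        # Step 3: tournament selection, uniform crossover, Gaussian mutation.
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((fit[a] < fit[b])[:, None], pop[a], pop[b])
        mates = parents[rng.permutation(pop_size)]
        children = np.where(rng.random(parents.shape) < 0.5, parents, mates)
        mutate = rng.random(children.shape) < p_mut
        children = children + mutate * rng.normal(0.0, sigma, children.shape)
        # Step 4: new PIs and new CWC for every offspring.
        child_fit = np.array([fitness(c, X, y, n_hidden) for c in children])
        # Step 5: record the individual with the minimum fitness.
        if child_fit.min() < best_fit:
            best, best_fit = children[child_fit.argmin()].copy(), child_fit.min()
        children[0], child_fit[0] = best, best_fit  # elitism (an added choice)
        pop, fit = children, child_fit
        history.append(best_fit)
    return best, history

# Toy data: a noisy daily-periodic series with four lagged inputs.
t = np.arange(300)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=t.size)
X = np.column_stack([series[i:i + 296] for i in range(4)])
y = series[4:300]

best, history = ga_lube(X, y)
print("initial CWC:", history[0], "final CWC:", history[-1])
```

Because the best individual is carried over unchanged, the recorded best CWC can only stay the same or decrease from one generation to the next.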

Single-Objective LUBE Framework for DBN-Based Interval Prediction
As mentioned previously, the LUBE method directly constructs a prediction interval. It has a low computational cost and is easy to implement. At present, most studies use an NN model to build prediction intervals. However, compared with the NN model, a DBN with an RBM structure can discover inherent features of the data, and is therefore more suitable for predicting time series data. This section thus elaborates the use of a DBN-based model for interval prediction.

Deep Belief Network Model
The DBN model generally consists of several stacked restricted Boltzmann machines (RBMs) and a layer of NN [33]. The training process of the DBN contains two phases: a layer-wise pre-training process and a fine-tuning process. The former provides better initial values of the network parameters, and the latter searches for the optimal parameters of the network. A typical DBN is illustrated in Figure 2.

[Figure 2. A typical DBN: input layer, hidden layers 1 to 3, and output layer.]

Pre-Training Process
The goal of the pre-training process is to generate a good set of network parameters for the DBN model. The parameter configuration is obtained through an unsupervised greedy optimization algorithm using the restricted Boltzmann machine (RBM).
The RBM, a stochastic binary structure, can learn the distribution characteristics of sample data [34,35]. This structure consists of a visible layer and a hidden layer. There are connections between the visible layer and the hidden layer, while there is no connection within a layer. These connections are bidirectional and symmetrical. Figure 3 shows the typical structure of the RBM.

The RBM is an energy-based model. The energy of the joint configuration of the visible and hidden layers can be expressed as:
$$E(v, h) = -\sum_{i}\sum_{j} w_{ij} h_i v_j - \sum_{j} a_j v_j - \sum_{i} b_i h_i \tag{6}$$
where $h_i$ represents the state of hidden layer unit $i$, and $v_j$ represents the state of visible layer unit $j$. $w_{ij}$ is the weight between the two units, and $b_i$ and $a_j$ are the thresholds of the hidden and visible units, respectively.
The energy function is used to calculate the probability assigned to each pair of visible and hidden vectors. The lower the energy, the closer the network is to the desired goal. The joint probability distribution of the visible layer and the hidden layer is defined as:
$$P(v, h) = \frac{e^{-E(v, h)}}{M} \tag{7}$$
where $M$ is a partition function that sums $e^{-E(v,h)}$ over all possible configurations and normalizes the distribution:
$$M = \sum_{v, h} e^{-E(v, h)} \tag{8}$$
Given the states of the visible layer units, the activation probability of hidden layer unit $i$ is:
$$P(h_i = 1 \mid v) = \sigma\Big(b_i + \sum_{j} w_{ij} v_j\Big) \tag{9}$$
where $\sigma$ is the logistic sigmoid function:
$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{10}$$
Accordingly, for a given hidden unit vector, the activation probability of visible layer unit $j$ can be expressed as:
$$P(v_j = 1 \mid h) = \sigma\Big(a_j + \sum_{i} w_{ij} h_i\Big) \tag{11}$$
The update process of the RBM is as follows. The number of visible layer units matches the dimension of the given training data, and Equations (9) and (10) are used to calculate the state of the corresponding hidden layer. Similarly, based on the state of the hidden layer units, the state of the visible layer units is calculated by Equation (11). After a number of such loops, the resulting units are denoted as $h_i'$ and $v_j'$. The parameters of the RBM are updated as follows:
$$\Delta w_{ij} = \eta\left(\langle v_j h_i \rangle - \langle v_j' h_i' \rangle\right) \tag{12}$$
$$\Delta a_j = \eta\left(\langle v_j \rangle - \langle v_j' \rangle\right) \tag{13}$$
$$\Delta b_i = \eta\left(\langle h_i \rangle - \langle h_i' \rangle\right) \tag{14}$$
where $\langle \cdot \rangle$ represents the expectation over the training data, and $\eta$ refers to the learning rate.
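The update loop described above corresponds to contrastive divergence. The following is a minimal sketch of one-step contrastive divergence (CD-1) for a binary RBM, under assumptions made here for illustration: the toy patterns, the layer sizes, the learning rate, and the use of a single Gibbs step are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(V, W, a, b, lr=0.1):
    """One CD-1 update on a batch V of shape (n_samples, n_visible).

    W has shape (n_hidden, n_visible) so that W[i, j] plays the role of w_ij
    (i indexes hidden units, j indexes visible units); a holds the visible
    thresholds and b the hidden thresholds.
    """
    # Hidden activation probabilities given the data, then a binary sample.
    ph = sigmoid(V @ W.T + b)
    h = (rng.random(ph.shape) < ph).astype(float)
    # Reconstruction of the visible layer given the sampled hidden states.
    pv = sigmoid(h @ W + a)
    # Hidden probabilities given the reconstruction.
    ph_recon = sigmoid(pv @ W.T + b)
    n = V.shape[0]
    # Data statistics minus reconstruction statistics, scaled by the rate.
    W = W + lr * (ph.T @ V - ph_recon.T @ pv) / n
    a = a + lr * (V - pv).mean(axis=0)
    b = b + lr * (ph - ph_recon).mean(axis=0)
    return W, a, b, float(np.mean((V - pv) ** 2))

# Toy data: two binary patterns, each repeated 50 times.
patterns = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
V = np.repeat(patterns, 50, axis=0)

n_hidden = 4
W = rng.normal(0.0, 0.1, size=(n_hidden, V.shape[1]))
a = np.zeros(V.shape[1])
b = np.zeros(n_hidden)

errors = []
for _ in range(1000):
    W, a, b, err = cd1_step(V, W, a, b)
    errors.append(err)

print("reconstruction error, first step:", errors[0])
print("reconstruction error, last 50 steps (mean):", np.mean(errors[-50:]))
```

In a DBN, the hidden activations of one trained RBM would serve as the visible data of the next, which is the greedy layer-wise scheme referred to in the pre-training description.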

Fine-Tuning Process
After the pre-training, the DBN adjusts its connection weights by the back propagation (BP) algorithm. This process is called fine-tuning, and it gives the DBN better discriminant performance. A gradient descent algorithm is adopted to adjust the network parameters based on the loss function of the network. The loss function of Equation (15), which compares the forecast point $\hat{y}$ with the actual point $y$, is applied to find the optimal parameter setting.

Model Implementation
Based on the DBN model and the LUBE method, the prediction interval can be constructed; the schematic diagram is shown in Figure 4. Moreover, Figure 5 shows the flowchart of the DBN-based LUBE method. The main steps are discussed below.
Step 1 Data processing. The power system is a typical nonlinear system, which is affected by various complex natural and social factors. In order to establish an accurate prediction model, the load forecasting method needs to quantify the effects of these factors, but such quantification is often very difficult. Since the evolution of any component of the system is determined by the other components that interact with it, the load time series contains the long-term evolution information of all variables that affect the load. Therefore, the regularity of the load can be studied, and its future trend predicted, using historical load data alone. The theoretical basis of this prediction method is the phase space reconstruction theory proposed by Packard et al. [36].
Assuming that the time series of a component of the system is observed as $\{x(k), k = 1, 2, \ldots, N\}$, a point state vector reconstructed in the phase space can be expressed as:
$$X(i) = \left[x(i),\ x(i + \tau),\ \ldots,\ x(i + (m - 1)\tau)\right], \quad i = 1, 2, \ldots, M \tag{16}$$
where $M$ is the number of phase points in the reconstructed phase space, $M = N - (m - 1) \times \tau$, and $m$ and $\tau$ respectively represent the embedding dimension and the time delay of the system. The authors in [37] demonstrated that when the embedding dimension is sufficiently large, the reconstruction is an embedding. The reconstructed phase space can preserve many characteristics of the dynamic system, and can recover the dynamic characteristics of the system in the sense of topological equivalence.
The key point of the phase space reconstruction technique is to correctly select the embedding dimension $m$ and the time delay $\tau$. A small $m$ cannot show the real structure of a complex system, while a large $m$ obscures the true structural relationship between the points, because the density of the points decreases. Therefore, it is necessary to select an appropriate embedding dimension $m$. In practical applications, due to the limited data, the choice of an appropriate $\tau$ is also critical. If $\tau$ is too small, the correlation between the coordinates is too strong, so that the information is not easily revealed; if $\tau$ is too large, the reconstructed dynamics of the power system will be distorted. In this study, the two parameters are determined by the mutual information function and the false nearest neighbor method.
Once the time delay and embedding dimension are determined, the time series can be reconstructed and then used to train the DBN model. In this case, the number of input units of the model is equal to the embedding dimension.
Step 2 Determine the primary structure of the DBN. In this study, trial and error is used to find the appropriate number of hidden units in the DBN model. The number of input units is determined by the embedding dimension.
Step 3 Parameter initialization. The parameters of the DBN model are initialized by the RBM using Equations (12)-(14).
Step 4 Generation of a new GA population. The new population is used to update the weights and thresholds of the DBN, from which a new cost function value is obtained. The individual with the smaller cost function value is retained.
Step 5 Model evaluation. First, prediction intervals are constructed by the DBN model; then the corresponding metrics, i.e., PICP and PINAW, are calculated. Finally, the CWC (a combination of PICP and PINAW) is used to evaluate the quality of the PI.
Step 6 Termination criterion. If the termination condition is met, the training is terminated. Otherwise, return to Step 3.
Step 7 Construct PI. Construct the prediction intervals using the obtained optimal DBN model.
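The phase space reconstruction in Step 1 amounts to a delay embedding of the load series. The sketch below builds the delay vectors and pairs each one with a training target; it is an illustration only. The helper names are hypothetical, and pairing each vector with the value one step after its last coordinate (`horizon=1`) is an assumption made here, not something this section specifies.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Return the M = N - (m - 1) * tau delay vectors of a 1-D series.

    Row i is [x(i), x(i + tau), ..., x(i + (m - 1) * tau)] (0-based indices).
    """
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this embedding dimension and delay")
    idx = np.arange(n_vectors)[:, None] + tau * np.arange(m)[None, :]
    return x[idx]

def make_training_pairs(x, m, tau, horizon=1):
    """Pair each delay vector with the value `horizon` steps after its last
    coordinate (an assumed target definition, used here for illustration)."""
    x = np.asarray(x, dtype=float)
    vectors = delay_embed(x, m, tau)
    last = np.arange(len(vectors)) + (m - 1) * tau  # index of last coordinate
    valid = last + horizon < len(x)                 # target must exist
    return vectors[valid], x[last[valid] + horizon]

# Toy series 0, 1, ..., 99 makes the indexing easy to inspect.
x = np.arange(100.0)
vectors = delay_embed(x, m=10, tau=6)
inputs, targets = make_training_pairs(x, m=10, tau=6)

print(vectors.shape)   # number of phase points by embedding dimension
print(vectors[0])      # first delay vector
print(inputs.shape, targets[:3])
```

Each row of `inputs` has `m` entries, which is why the number of input units of the model equals the embedding dimension.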

Experiment
In this section, we use the historical power load data of a small town in the UK as the short-term load forecasting case. To demonstrate the prediction performance of the proposed model, the proposed method is compared against three other state-of-the-art models.

Preprocessing of Data Set
The dataset consists of real-world electricity load data of a small town in the UK, recorded as 24 h daily load data from 1 January 2013 to 31 December 2013. For the LUBE method, the data set needs to be divided into two parts: the training set and the test set. In this paper, we chose nearly 75% of the data set (i.e., the first 273 days) as the training set, and the remaining data (i.e., the last 92 days) as the test set to evaluate the predictive performance of the DBN-LUBE model. As an example, the power load data for the entire month of August 2013 is shown in Figure 6.
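The chronological split can be checked with a short sketch. The helper name and the placeholder series are hypothetical, and `samples_per_day = 24` (one reading per hour) is an assumption, since the sampling interval is not stated explicitly.

```python
import numpy as np

def chronological_split(series, n_train_days, samples_per_day):
    """Split a series into leading training days and trailing test days."""
    series = np.asarray(series)
    cut = n_train_days * samples_per_day
    return series[:cut], series[cut:]

samples_per_day = 24                                   # assumed hourly readings
load = np.arange(365 * samples_per_day, dtype=float)   # placeholder for real load values
train, test = chronological_split(load, n_train_days=273,
                                  samples_per_day=samples_per_day)

train_fraction = len(train) / len(load)
print(len(train), len(test), round(train_fraction, 3))
```

The split is by time rather than by random sampling, so the test set contains only days that come after every training day.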

Parameter Settings
Before constructing the prediction interval, we needed to determine the number of input units and hidden units of the DBN model. The number of input nodes is related to the embedding dimension, which can be determined by the phase space reconstruction theory. Using the mutual information and false nearest neighbor functions in the TISEAN toolbox [38], m is calculated as 10 and τ as 6. In this case, the dimension of the reconstructed delay vectors is 10, and the number of input units of the DBN model is also 10. The selection of training samples and training objectives is shown in Table 2.

Table 2. Selection of training samples and training objectives (columns: Training Sample; Training Objective, e.g., x_6555).
The number of hidden units was obtained by trial and error. The results for the DBN model are illustrated in Figure 7. It can be observed from the figure that the MSE was lowest when the number of hidden units was 34. Therefore, the optimal structure of the DBN used in this case was 10-34-2. The diagram of the model is illustrated in Figure 8. For the DBN model, the transfer functions of the output neurons and the hidden neurons were the pure linear and tan-sigmoid functions, respectively.
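For concreteness, a forward pass through a 10-34-2 network with tan-sigmoid hidden units and linear outputs can be sketched as below. The random weights are placeholders rather than trained values, and the variable names are illustrative; the tan-sigmoid function is mathematically identical to the hyperbolic tangent, so `np.tanh` is used.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 10, 34, 2   # the 10-34-2 structure reported above

# Placeholder parameters; in the described method these would come from RBM
# pre-training followed by the GA search.
W1 = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, size=(n_hidden, n_out))
b2 = np.zeros(n_out)

def forward(X):
    """Map delay vectors of shape (n_samples, 10) to two outputs per sample."""
    hidden = np.tanh(X @ W1 + b1)   # tan-sigmoid hidden layer
    return hidden @ W2 + b2         # pure linear output layer

# Total number of weights and thresholds a search over this structure
# would have to tune.
n_params = W1.size + b1.size + W2.size + b2.size

X = rng.normal(size=(5, n_in))      # five placeholder delay vectors
bounds = forward(X)
print(bounds.shape, n_params)
```

The two output columns play the roles of the lower and upper bounds of the prediction interval.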

Results Analysis
In the experiment, the proposed DBN-based LUBE method was compared with three best-in-class prediction models, i.e., the Elman model [39], the nonlinear autoregressive exogenous (NARX) model [40], and the back propagation (BP) neural network model [41].

The comparative prediction results obtained by the four forecasting models for the entire test data are shown in Figure 9. In Figure 9, the prediction results of the DBN fell within the 1300-4800 kW load range, giving the narrowest PIs of the four models. The intervals predicted by the BP and NARX models were slightly wider than those of the DBN model. Although most of the predictions of the Elman model were good, its prediction range at the beginning was too volatile, overshooting by an order of magnitude.

The comparative prediction results obtained by the four forecasting models for the entire test data are shown in Figure 9. In Figure 9, the PIs constructed by the DBN fell within the 1300-4800 kW load range and were the narrowest among the four models. The intervals predicted by the BP and NARX models were somewhat wider than those of the DBN model. Although most of the predictions of the Elman model were good, its prediction range at the beginning of the test period was too volatile, overshooting by an order of magnitude. To show the comparison more clearly, we selected the results of the four prediction models for the same 10 continuous days, which are shown in Figure 10. In Figure 10, the PI constructed by the DBN model perfectly covers the test data. The PI of the Elman model also covers most of the test data, but it is much wider than that of the DBN. The reliability of the prediction results of the BP and NARX models was much worse than that of the DBN model. From these results, it can be concluded that the proposed method provided narrower intervals and better reliability.
In Table 3, the prediction results are compared in terms of four indicators. The CWC is the overall evaluation indicator. According to Equation (4), the smaller the CWC, the better the prediction performance. From Table 3, the CWC of the DBN-based LUBE approach was 0.4702, which was the best of the four models. Moreover, in terms of the PI coverage probability (PICP), the DBN model showed significant superiority over the Elman model. The DBN was also much better than the BP neural network model in terms of the PI normalized average width (PINAW). In addition, the DBN model had the shortest running time.
The change in the fitness of the best individual during the iterations is depicted in Figure 11 for the four models. In Figure 11, the CWC of all models decreased sharply and reached satisfactory values in the initial iterations. As the search progressed, the CWC continued to decrease and eventually converged to the optimal value. The convergence behavior shows that the network model with RBM-initialized weights had stronger optimization capabilities: both the initial and the optimal fitness of the DBN model were significantly better than those of the other models.
To demonstrate the superiority of the proposed method further, Figure 12 and Table 4 show the PIs constructed by the DBN model over the four seasons. Overall, the DBN model performed well in all four seasons. Among the four seasons, the summer results were the worst. The reason might be that the temperature in summer changes more significantly than in the other seasons.
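The indicators compared in Table 3 can be illustrated with a small sketch. The CWC form used below is the one commonly seen in the LUBE literature and is only assumed here, since Equation (4) is not reproduced in this section and may differ in detail. The nominal confidence level `mu`, the penalty strength `eta`, and the toy load values are illustrative assumptions, not values from the experiments.

```python
import math

def picp(y, lower, upper):
    """PI coverage probability: fraction of targets inside [lower, upper]."""
    covered = sum(1 for t, lo, hi in zip(y, lower, upper) if lo <= t <= hi)
    return covered / len(y)

def pinaw(y, lower, upper):
    """PI normalized average width: mean width divided by the target range."""
    target_range = max(y) - min(y)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y)
    return mean_width / target_range

def cwc(y, lower, upper, mu=0.9, eta=50.0):
    """Coverage width-based criterion (assumed form; mu and eta are assumptions)."""
    p = picp(y, lower, upper)
    w = pinaw(y, lower, upper)
    # The exponential penalty is active only when coverage is below mu.
    gamma = 0.0 if p >= mu else 1.0
    return w * (1.0 + gamma * math.exp(-eta * (p - mu)))

# Toy load values in kW (illustrative numbers, not from the paper).
y = [2000.0, 2500.0, 3000.0, 3500.0, 4000.0]
lower = [1800.0, 2300.0, 2800.0, 3300.0, 3800.0]
upper = [2200.0, 2700.0, 3200.0, 3700.0, 4200.0]

# Same intervals, except the first one is narrowed so that it misses its target.
narrow_lower = [2100.0] + lower[1:]

full_cover = cwc(y, lower, upper)        # coverage is met: CWC equals PINAW
one_miss = cwc(y, narrow_lower, upper)   # coverage falls below mu: heavy penalty
```

In this toy case the narrowed intervals have a smaller PINAW but a much larger CWC, which is the behavior the text relies on when it treats a smaller CWC as better overall.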

Conclusions
Load prediction often involves a number of uncertainties, which limits the applicability of point prediction-based methods in practice. Interval prediction, as an effective way to quantify uncertainties, has therefore attracted increasing attention. In this study, a selection of point and interval prediction methods is first briefly reviewed, and then a DBN-based lower-upper bound estimation (LUBE) method for short-term load interval forecasting is proposed. To demonstrate the superiority of the proposed method, we compare the DBN-based LUBE method with three state-of-the-art methods, i.e., the BP neural network, the Elman neural network, and the NARX neural network. Experimental results show that the DBN-based LUBE method provides the best prediction results in a relatively short period of time.

In terms of future work, more empirical tests should first be performed to further demonstrate the effectiveness of the proposed method. Second, in the single-objective LUBE method, the final objective function CWC is a simple combination of PICP and PINAW, whereas forecast accuracy and robustness are generally two conflicting objectives; thus, a multi-objective prediction method should be studied further. Lastly, future studies should also consider the potential of advanced evolutionary algorithms [42–44] to enhance the performance and efficiency of the multi-objective method.
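The conflict between the two objectives can be made concrete with a minimal sketch of Pareto filtering. It assumes each candidate model is summarized by a hypothetical `(picp, pinaw)` pair, with higher PICP and lower PINAW preferred; the candidate values are invented for illustration, and this is not a method evaluated in this study.

```python
def dominates(a, b):
    """True if candidate a is at least as good as b in both objectives
    and strictly better in at least one. Each candidate is (picp, pinaw)."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical (PICP, PINAW) pairs, for illustration only.
candidates = [(0.95, 0.30), (0.90, 0.20), (0.85, 0.25), (0.92, 0.35), (0.80, 0.15)]
front = pareto_front(candidates)
```

A multi-objective method would return such a front and leave the trade-off between coverage and width to the operator, rather than fixing it in advance through the CWC weighting.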

Figure 1. The flowchart of a typical NN-based lower-upper bound estimation (LUBE) approach.

Figure 3. The algorithm schematic of a restricted Boltzmann machine (RBM).

Figure 4. Illustration of the lower and upper bound estimations by DBN.

Figure 5. The flowchart of the DBN-based LUBE method.

Figure 6. The power load data of August 2013.

Figure 7. The trial and error results of the DBN model.

Figure 8. The schematic of the DBN structure.

Figure 9. The prediction results of the four models for the entire test data: (a) DBN-based PI; (b) BP neural network-based PI; (c) Elman neural network-based PI; (d) NARX neural network-based PI.

Figure 10. The PIs constructed for 10 continuous days by the four prediction models: (a) DBN-based PI; (b) BP neural network-based PI; (c) Elman neural network-based PI; (d) NARX neural network-based PI.

Figure 11. The comparative results for the CWC value of the best individual in the population for the four prediction models on the training data: (a) the CWC obtained by the DBN; (b) the CWC obtained by the BP neural network; (c) the CWC obtained by the Elman neural network; (d) the CWC obtained by the NARX neural network.

Figure 12. The PIs constructed by the DBN model for the four seasons: (a) the prediction results for spring; (b) the prediction results for summer; (c) the prediction results for autumn; (d) the prediction results for winter.
Step 3 Parameter initialization. The parameters of the DBN model are initialized by the RBM using Equations (12)–(14).
Step 4 Generation of a new GA population. The new population is used to update the weights and thresholds of the DBN, and a new cost function value is then obtained. The smaller cost function value is retained.
Step 5 Model evaluation. First, prediction intervals are constructed by the DBN model, then the corresponding metrics, i.e., PICP and PINAW, are calculated. Finally, CWC (a combination of PICP and PINAW) is used to evaluate the quality of the PI.
Step 6 Termination criterion. If the termination condition is met, then the training is terminated.
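Steps 4 to 6 can be sketched as a small elitist genetic loop. This is a minimal illustration under strong assumptions, not the implementation used in this study: a hypothetical three-parameter linear model stands in for the DBN, random initialization replaces the RBM pretraining of Step 3, the GA operators (truncation selection, uniform crossover, Gaussian mutation) and all hyperparameters are assumed, and the CWC form is the one commonly used in the LUBE literature.

```python
import math
import random

def cwc(y, lower, upper, mu=0.9, eta=50.0):
    """Assumed CWC form: PINAW, penalized when PICP falls below mu."""
    n = len(y)
    picp = sum(lo <= t <= hi for t, lo, hi in zip(y, lower, upper)) / n
    pinaw = sum(hi - lo for lo, hi in zip(lower, upper)) / (n * (max(y) - min(y)))
    penalty = math.exp(-eta * (picp - mu)) if picp < mu else 0.0
    return pinaw * (1.0 + penalty)

def predict_bounds(params, xs):
    """Hypothetical stand-in for the DBN: a line with a constant half-width."""
    a, b, h = params
    centers = [a * x + b for x in xs]
    return [c - abs(h) for c in centers], [c + abs(h) for c in centers]

def cost(params, xs, ys):
    # Step 5: build the PIs, then score them with the CWC.
    lower, upper = predict_bounds(params, xs)
    return cwc(ys, lower, upper)

def evolve(xs, ys, pop_size=30, generations=60, sigma=0.1, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(pop_size)]
    best = min(population, key=lambda p: cost(p, xs, ys))
    history = [cost(best, xs, ys)]
    for _ in range(generations):  # Step 6: stop after a fixed generation budget
        # Step 4: create a new population from the better half of the old one.
        ranked = sorted(population, key=lambda p: cost(p, xs, ys))
        parents = ranked[: pop_size // 2]
        children = []
        while len(children) < pop_size - 1:
            p1, p2 = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation of every gene.
            children.append([rng.choice(pair) + rng.gauss(0.0, sigma)
                             for pair in zip(p1, p2)])
        population = [best] + children  # the best individual always survives
        candidate = min(population, key=lambda p: cost(p, xs, ys))
        if cost(candidate, xs, ys) < cost(best, xs, ys):
            best = candidate  # retain the smaller cost function value
        history.append(cost(best, xs, ys))
    return best, history

# Synthetic data for illustration only: a noisy line, not load measurements.
data_rng = random.Random(1)
xs = [i / 10 for i in range(20)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0.0, 0.1) for x in xs]
best, history = evolve(xs, ys)
```

Because the best individual is carried over unchanged into every new population, the recorded best CWC never increases from one generation to the next, which is the kind of monotone convergence curve plotted in Figure 11.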

Table 2. Training sample and training objective.

Table 3. Comparative results of the four models in terms of different indicators.

Table 4. Prediction results for the four seasons.