Short-Term Load Forecasting Using a Novel Deep Learning Framework

Abstract: Short-term load forecasting is the basis of power system operation and analysis. In recent years, the use of a deep belief network (DBN) for short-term load forecasting has become increasingly popular. In this study, a novel deep-learning framework based on a restricted Boltzmann machine (RBM) and an Elman neural network is presented. This framework is used for short-term load forecasting based on the historical power load data of a town in the UK. The obtained results are compared with those of a DBN and an Elman neural network used individually. The experimental results demonstrate that our proposed model can significantly improve the prediction accuracy.


Introduction
In modern society, electrical energy has become a basic resource for national economic and social development and is widely applied in fields such as power, lighting, chemistry, textiles, communication, and broadcasting. With rising living standards and the rapid development of the electric power industry, a better-quality power supply is demanded. This means that power users need more economical and reliable electrical energy. The forms of electricity generation and consumption are changing all the time. Because electric energy cannot be stored, electricity supply and demand should be balanced as closely as possible. Electricity generation needs to change along with electricity consumption; otherwise, the stability of the power system could be endangered [1]. To keep the electrical power network balanced, a precise load prediction is essential. Based on the forecasting duration, load prediction can be categorized into four classes: ultra-short-term, short-term, medium-term, and long-term forecasting.
Short-term load forecasting is the basis of power system operation and analysis, and refers to the power load prediction for the next few hours, one day, or several days. It is beneficial for optimizing the operation of generating units, i.e., their starting and stopping times and their output. A precise load forecast helps to minimize the total consumption of the generating units [2]. Therefore, improving the accuracy of short-term load forecasting is crucial in the operation and management of the modern power system.

Regression Analysis
Advantages: When analyzing multi-factor models, regression analysis is simple and convenient. It can accurately measure the degree of correlation between factors and the degree of regression fitting.
Disadvantages: The model is mechanical and inflexible, and requires high-quality information.

ARIMA
Advantages: The model is simple and easy to master. It can dynamically determine the parameters of the model and has a fast computation speed.
Disadvantages: It can neither reflect the internal relations of things nor analyze the relationship between two factors. Furthermore, it is only suitable for short-term prediction.

ANN
Advantages: The model has a rapid calculating speed and good non-linear fitting capability. More importantly, it does not require a mathematical model to be set up.
Disadvantages: Firstly, it cannot express and analyze the relationship between the input and output of the predicted system. Secondly, it converges slowly during learning and has poor fault tolerance. Lastly, it easily falls into a local minimum.

SVM
Advantages: The model has a simple structure with few parameters. Fewer samples are needed to build the model. More importantly, it has good generalizability.
Disadvantages: It is hard to implement for large-scale training samples. It is also difficult to apply to multi-class classification problems.

Hybrid Model
Advantages: The model not only preserves the advantages of each individual model, but can also use prediction sample information to a great extent. It is more systematic and more comprehensive than a single prediction model.
Disadvantages: The model needs a variety of prediction methods, which makes it complicated and cumbersome. When analyzing real problems, it is difficult to determine whether they have some functional relationship.

Among different kinds of prediction models, a deep belief network (DBN) [14] has shown promising performance. The deep belief network has a deep architecture that can represent multiple features of input patterns hierarchically with the pre-trained restricted Boltzmann machine (RBM). It has been widely used in many fields, such as image processing [15], dimensionality reduction [16], and classification tasks [17]. Previous research has shown that a DBN performs significantly better than shallow neural networks [18]. Compared with a shallow model, a DBN can reveal the implicit characteristics of data from the bottom to the top.
In the last decade, numerous studies have been conducted using deep belief networks to perform time series data prediction [19]. For instance, Hu et al. [20] pre-trained a deep belief network using different pre-training models and investigated the difference between a DBN and a Stacked Denoising Autoencoder (SDA) when used as pre-training models. Qiu et al. [21] proposed an ensemble approach based on a DBN and the Empirical Mode Decomposition (EMD) algorithm to forecast load time series. Adachi et al. [22] used samples from a D-Wave quantum annealing machine to estimate model expectations of Restricted Boltzmann Machines. Keyvanrad et al. [23] developed a new model named nsDBN that behaves differently according to the deviation of the activation of the hidden units from a fixed value. The model also has a variance parameter that controls the degree of sparseness. Plahl et al. [24] explored a Sparse Encoding Symmetric Machine (SESM) to pre-train DBNs and applied this method to speech recognition. Ranzato et al. [25] described a novel and efficient algorithm to learn sparse representations, and compared it theoretically and experimentally with a Restricted Boltzmann Machine. Kamada et al. [26] proposed an adaptive structure learning method for a Restricted Boltzmann Machine (RBM), which can generate or annihilate neurons by a self-organized learning method according to input patterns. In addition, an adaptive DBN built in the assembly process of pre-trained RBM layers was also proposed. Papa et al. [27] applied a fast meta-heuristic approach called Harmony Search (HS) to fine-tune the parameters of a DBN. Kuremoto et al. [28] optimized the number of input (visible) and hidden neurons, as well as the RBM learning rate, by means of the Particle Swarm Optimization (PSO) method. Torres et al. [29] developed a new approach that used an Apache Spark framework to load data in memory and deep learning methods as regressors to forecast electricity consumption. Ouyang et al. [30] proposed a data-driven deep learning framework for power load forecasting. First, a Gumbel-Hougaard Copula model is used to model the tail-dependence between power load, electricity price, and temperature. Then, the tail-dependence is applied to a deep belief network for power load forecasting.
Although the DBN has been widely studied, as displayed by the aforementioned studies, to the best of our knowledge the focus is mainly on the training process of DBNs and, more specifically, the training of RBMs. Research on the network structure of a DBN is rarely reported in the literature. After the pre-training stage of a DBN, the obtained parameters can be expanded for a multi-hidden layer neural network (MLNN). Generally, neural network models have two common types: Back Propagation (BP) neural networks and Elman neural networks. Compared with BP neural networks, Elman neural networks, as recurrent neural networks, have been shown to perform better in time-series forecasting. In the last few decades, the Elman neural network model has been studied extensively for short-term electrical load forecasting [31]. For instance, the combination of an Elman network and wavelets is proposed in Reference [32] to forecast a one-day-ahead electrical power load by considering the impact of temperature. Similarly, in Reference [33], the authors investigated the short-term load forecasting problem via a hybrid quantized Elman neural network with the least number of quantized inputs, hourly historical load, hourly predicted target temperature, and time index. Despite the considerable research on Elman neural networks, these studies focus on shallow neural networks. The combination of DBN and Elman neural networks for short-term load forecasting has rarely been investigated. To fill this research gap, this study proposes a new deep learning framework for short-term load prediction based on RBM and Elman neural networks.
The rest of the paper is organized as follows. Preliminary knowledge, including introductions to deep belief networks and Elman neural networks, is described in Section 2. Section 3 introduces the proposed Elman integrated deep learning framework. Experimental results are presented in Section 4, and Section 5 concludes this paper and identifies future studies.

Deep Belief Network
The DBN is a deep neural network that consists of several layers of restricted Boltzmann machines (RBMs) and a layer of neural network (NN) [34]. The network structure of a typical DBN is shown in Figure 1. Traditional NN models adopt a gradient descent algorithm as the main training method, which is easily trapped in a local minimum. When the NN structure becomes deep, this drawback becomes more apparent because numerous network parameters need to be optimized. Initializing the network parameters as well as possible is a sensible way to mitigate the local optimum dilemma: if the network parameters are initialized close to the optimal values in the search space, the chance of finding the global optimum greatly increases. For DBNs, the training process consists of two components: a layer-wise pre-training process and a fine-tuning process. The former provides better initial values of the network parameters, and the latter searches for the optimal parameters starting from those initial states.

Pre-Training Process
The parameters of each hidden layer in a DBN can be initialized by the pre-training process, which leads to a better local optimum, or even to the region of the global optimum. This process is carried out through an unsupervised greedy optimization algorithm using the restricted Boltzmann machine (RBM).
A restricted Boltzmann machine (RBM) is a stochastic two-layer neural network that can learn a distribution from its input samples [35]. The network consists of two different layers of nodes: visible nodes and hidden nodes. There are connections between nodes in different layers, while there are no connections between nodes in the same layer. Connections between nodes are symmetric and bidirectional. RBMs have been applied to build generative models of various data types. A single RBM is shown in Figure 2.
The RBM is an energy model. The energy function of the visible layer and hidden layer is:

$$E(v, h) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i} \sum_{j} v_i w_{ij} h_j$$

where $v_i$ and $h_j$ are the states of visible node $i$ and hidden node $j$, respectively; $a_i$ and $b_j$ are the biases of the visible layer and hidden layer; and $w_{ij}$ is the connecting weight between them.
Energies 2018, 11, 1554
A lower energy indicates that the network is in a more desirable state. This energy function is used to calculate the probability that is assigned to every possible pair of visible and hidden vectors:

$$p(v, h) = \frac{e^{-E(v, h)}}{Z}$$

where $Z$ is the sum of $e^{-E(v, h)}$ over all possible configurations, and is used for normalization:

$$Z = \sum_{v, h} e^{-E(v, h)}$$

For binary state nodes $v_i, h_j \in \{0, 1\}$, the state of hidden node $h_j$ is set to 1 with probability:

$$p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_{i} v_i w_{ij}\Big)$$

where $\sigma(x)$ represents the logistic sigmoid function $1/(1 + \exp(-x))$. The state of visible node $v_i$ is set to 1 with probability:

$$p(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_{j} h_j w_{ij}\Big)$$

The training process of the RBM is described as follows. Firstly, a training sample is assigned to the visible nodes, which gives $\{v_i\}$. Then, the hidden node states $\{h_j\}$ are sampled according to the probabilities above. This process is repeated once more to update the visible and hidden nodes and produce the one-step "reconstructed" states $v_i'$ and $h_j'$. The related parameters are updated as follows:

$$\Delta w_{ij} = \eta \big( \langle v_i h_j \rangle - \langle v_i' h_j' \rangle \big)$$
$$\Delta a_i = \eta \big( \langle v_i \rangle - \langle v_i' \rangle \big)$$
$$\Delta b_j = \eta \big( \langle h_j \rangle - \langle h_j' \rangle \big)$$

where $\eta$ represents the learning rate, and $\langle \cdot \rangle$ refers to the expectation over the training data.
The above-mentioned expressions can be derived from the Contrastive Divergence (CD) algorithm.
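The training procedure described above can be sketched in a few lines of Python. This is a minimal illustration of one-step Contrastive Divergence (CD-1) on a single binary sample, not the authors' implementation. The function names, the toy data, the learning rate, and the use of hidden probabilities instead of sampled hidden states in the update statistics are all assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_bernoulli(probs, rng):
    # Draw a binary state for each node from its activation probability.
    return [1 if rng.random() < p else 0 for p in probs]


def hidden_probs(v, w, b):
    # p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * w_ij)
    return [sigmoid(b[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, w, a):
    # p(v_i = 1 | h) = sigmoid(a_i + sum_j h_j * w_ij)
    return [sigmoid(a[i] + sum(h[j] * w[i][j] for j in range(len(h))))
            for i in range(len(a))]


def cd1_update(v0, w, a, b, lr, rng):
    """Apply one CD-1 update for a single training vector v0 (in place)."""
    ph0 = hidden_probs(v0, w, b)          # hidden statistics from the data
    h0 = sample_bernoulli(ph0, rng)
    v1 = sample_bernoulli(visible_probs(h0, w, a), rng)  # reconstruction
    ph1 = hidden_probs(v1, w, b)          # hidden statistics from the reconstruction
    for i in range(len(a)):
        for j in range(len(b)):
            w[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        a[i] += lr * (v0[i] - v1[i])
    for j in range(len(b)):
        b[j] += lr * (ph0[j] - ph1[j])


# Toy usage: 4 visible nodes, 3 hidden nodes, two binary patterns.
rng = random.Random(0)
n_visible, n_hidden = 4, 3
w = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
a = [0.0] * n_visible
b = [0.0] * n_hidden
data = [[1, 1, 0, 0], [0, 0, 1, 1]]
for _ in range(200):
    for v in data:
        cd1_update(v, w, a, b, lr=0.1, rng=rng)
```

In practice the updates are usually averaged over mini-batches, which corresponds to the expectation in the update rule; the per-sample version is used here only to keep the sketch short.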

Fine-Tuning
After pre-training, each layer of the DBN is configured with initial parameters. Then the DBN starts fine-tuning the whole structure. Based on the loss function of the forecast data and the actual data, a gradient descent algorithm can be adopted to make slight adjustments to the network parameters throughout the whole network, moving the parameters toward their optimal states. In this paper, the loss function is computed from the forecast data $y'$ and the actual data $y$. More generally, the DBN is a special BP neural network in which the parameters of the hidden layers are initialized by an RBM instead of being randomly assigned.

Elman Neural Network
An Elman neural network is based on the BP neural network and adopts a connection layer to feed back the outputs of the hidden layer. It is a typical local recurrent neural network. The connection layer memorizes the output value of the previous step, which is then used as an input of the hidden layer. This can be considered as a one-step delay, which makes the Elman neural network sensitive to historical data and thus gives the network a dynamic memory function [36].
The basic Elman network is composed of an input layer, a connection layer, a hidden layer, and an output layer, as shown in Figure 3. The activation function of the hidden layer is nonlinear, e.g., the sigmoid function or the tan-sigmoid function. The activation function of the connection layer and the output layer is linear.
In light of the Elman structure, the nonlinear relation of this model can be represented by the following mathematical equations:

$$x(k) = \varphi\big(W_1 I(k-1) + W_2 x_c(k)\big)$$
$$x_c(k) = x(k-1)$$
$$z(k) = f\big(W_3 x(k)\big)$$

where $x$ and $I$ are the output of the hidden layer and the input layer, respectively; $x_c$ and $z$ represent the output of the context (connection) layer and the output layer; $\varphi(\cdot)$ is the activation function of the hidden layer, which is usually a nonlinear sigmoid function; and $f(\cdot)$ is a pure linear activation function. $W_1$, $W_2$, and $W_3$ are the connecting weights of the input layer to hidden layer, the connection layer to hidden layer, and the hidden layer to output layer, respectively; $k$ denotes the $k$th iteration.
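To make the role of the context (connection) layer concrete, the following Python sketch runs an Elman-style recurrence over a short input sequence. It is an illustration under stated assumptions rather than the authors' code: biases are omitted, the context layer starts at zero, the hidden activation is tanh, the output is linear, and the helper names `matvec` and `elman_forward` are hypothetical.

```python
import math


def matvec(m, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]


def elman_forward(inputs, w1, w2, w3):
    """Run an Elman recurrence over a sequence of input vectors.

    w1: hidden x input weights, w2: hidden x hidden (context) weights,
    w3: output x hidden weights.
    """
    context = [0.0] * len(w1)  # context layer starts at zero (assumption)
    outputs = []
    for u in inputs:
        pre = [p + q for p, q in zip(matvec(w1, u), matvec(w2, context))]
        hidden = [math.tanh(s) for s in pre]   # nonlinear hidden activation
        outputs.append(matvec(w3, hidden))     # pure linear output layer
        context = hidden                       # context stores the previous hidden output
    return outputs


# Toy usage: 2 inputs, 2 hidden nodes, 1 output.
w1 = [[0.5, -0.5], [0.25, 0.25]]
w2 = [[0.1, 0.0], [0.0, 0.1]]
w3 = [[1.0, -1.0]]
sequence = [[1.0, 0.0], [0.0, 1.0]]
outputs = elman_forward(sequence, w1, w2, w3)
```

Because the context vector carries the previous hidden output, the response to the second input depends on the first one, which is the dynamic memory effect described above.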

RBM-Elman Network
Compared with a BP neural network, which is a static mapping network, the Elman neural network adds an important feedback mechanism that makes it behave like a dynamic system. Therefore, it is more suitable for use as a time-series model. In this study, we propose a new deep learning framework based on RBM and Elman neural networks for short-term load prediction. The new deep learning framework is denoted as an RBM-Elman network.

RBM-Elman Optimization
Elman neural networks inherit some defects of BP neural networks, for example, a slow convergence rate and a tendency to become trapped in local optima. These deficiencies are partly due to the randomly initialized parameters. Therefore, we propose using an RBM to initialize the weights and thresholds of the Elman neural network, which we expect to improve its generalizability and training speed. The basic steps of our proposed method are described as follows, and Figure 4 shows the model implementation flowchart.

RBM-Elman Algorithm
The main steps of the RBM-Elman algorithm are as follows:
1. Determine the primary structure of the Elman neural network.
2. Apply RBMs to initialize the parameters of the hidden layer of the Elman neural network.
3. Train the Elman neural network using a gradient descent algorithm.
4. Forecast the load output based on the trained network.
The significance of the new model is the initialization of the connection weights and thresholds using RBMs. This is expected to improve the training speed and convergence, and thus to save network running time.
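Step 2 of the algorithm can be illustrated with a short Python sketch. It shows one plausible way to copy pre-trained RBM parameters into an Elman network: the RBM weight matrix becomes the input-to-hidden weights and the RBM hidden biases become the hidden thresholds. The text does not specify how the context and output weights are initialized, so the small uniform random initialization below, the function name `init_elman_from_rbm`, and the placeholder RBM values are assumptions.

```python
import random


def init_elman_from_rbm(rbm_w, rbm_hidden_bias, n_outputs, rng, scale=0.1):
    """Build initial Elman parameters from pre-trained RBM parameters.

    rbm_w is stored as visible x hidden, so it is transposed to the
    hidden x input layout used for the input-to-hidden weights.
    """
    n_inputs = len(rbm_w)
    n_hidden = len(rbm_hidden_bias)
    w1 = [[rbm_w[i][j] for i in range(n_inputs)] for j in range(n_hidden)]
    return {
        "w1": w1,                                  # input -> hidden, from the RBM
        "hidden_bias": list(rbm_hidden_bias),      # hidden thresholds, from the RBM
        # Context and output weights are not covered by the RBM; small
        # random values are an assumption of this sketch.
        "w2": [[rng.uniform(-scale, scale) for _ in range(n_hidden)]
               for _ in range(n_hidden)],
        "w3": [[rng.uniform(-scale, scale) for _ in range(n_hidden)]
               for _ in range(n_outputs)],
    }


# Toy usage with placeholder "pre-trained" RBM values (10 visible, 24 hidden).
rng = random.Random(42)
rbm_w = [[rng.gauss(0.0, 0.05) for _ in range(24)] for _ in range(10)]
rbm_hidden_bias = [0.0] * 24
params = init_elman_from_rbm(rbm_w, rbm_hidden_bias, n_outputs=1, rng=rng)
```

The resulting parameters would then be passed to gradient-descent training (step 3) in place of a fully random initialization.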

Case Studies
In order to validate the forecast performance of our proposed model, we describe a realistic case study of short-term load prediction in this section. First, the data set and model implementation are illustrated. Then, experimental results are demonstrated.

Data Set
The historical power load data of a town in the UK is employed to investigate the forecast performance of our proposed model. The chosen dataset consists of 24 h of load data for each day from 1 January 2014 to 31 December 2014. The whole dataset is further divided into two subsets: the training set and the testing set. In this study, about 80% of the whole dataset, i.e., the first 292 days, is chosen as the training set. The rest of the data is used to test the forecasting performance of the proposed model. Figure 5 shows an example of load data for June 2014.
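The split described above is chronological: the first 80% of the days form the training set and the remaining days form the test set. A minimal Python sketch, in which the function name and the day indices standing in for the real load records are assumptions:

```python
def chronological_split(records, train_fraction=0.8):
    """Split an ordered sequence into leading training and trailing test parts."""
    n_train = int(len(records) * train_fraction)
    return records[:n_train], records[n_train:]


# Day indices for 2014 stand in for the daily load records.
days = list(range(1, 366))
train_days, test_days = chronological_split(days)
print(len(train_days), len(test_days))  # 292 73
```

Keeping the order intact, rather than shuffling, avoids training on observations that occur after the test period.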

Parameter Settings
As indicated above, the input data of the forecast model is the historical power load data. To construct the RBM-Elman model, the raw time-series data was transformed into a more suitable form. In this paper, we employed the state space reconstruction technique with the delay embedding theorem [37] to process the raw data. The discrete time dynamic system was described as:

$$X(t+1) = F\big(X(t)\big)$$

where $F$ represents a nonlinear vector-valued function and $X(t)$ represents the system state at time step $t$. By the delay embedding theorem, it was supposed that the information of the higher-dimensional data is compressed into the one-dimensional chaotic data. Therefore, the time series data was reconstructed as follows:

$$X(t) = \big[x(t),\, x(t-\tau),\, \ldots,\, x(t-(m-1)\tau)\big]$$

where $x(t)$ is the observed load at time step $t$, $m$ represents the embedding dimension, and $\tau$ represents the time delay. Reconstructing the time series therefore amounts to finding the optimal values of the parameters $m$ and $\tau$. For a given dataset, the false nearest neighbor method and the mutual information function were applied to determine these two parameters [38]. In this study, $m$ was determined to be 10 and $\tau$ was determined to be 6, which were obtained using the utility functions false_nearest and mutual in the TISEAN toolbox [39]. Then, the reconstructed time series was generated and used to train the RBM-Elman neural network.
Because the number of input-layer nodes of the RBM-Elman model is determined by the dimension of the reconstructed delay vectors, the number of input nodes was 10.
In addition, the raw data was normalized into [0, 1] to accelerate the model training process. Next, the trial and error method was employed to determine the number of nodes in the hidden layer. The trial and error results of the Elman model are illustrated in Figure 6. From the figure, we can observe that the mean absolute percentage error (MAPE) of the model was lowest when the number of hidden nodes was 24. Hence, the optimal structure of the Elman neural network is 10-24-1, and the number of context layer nodes was also 24. Finally, the activation functions in the hidden layer and the output layer were the common tan-sigmoid and pure linear functions, respectively. The diagram of the model is illustrated in Figure 7.
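The preprocessing described above (normalization into [0, 1] followed by delay embedding with m = 10 and τ = 6) can be sketched in Python as follows. The function names, the synthetic series, and the one-step-ahead target are assumptions made for illustration; the text does not state the forecast horizon used to form the training targets.

```python
def min_max_normalize(series):
    """Scale a series linearly into [0, 1]."""
    lo, hi = min(series), max(series)
    return [(x - lo) / (hi - lo) for x in series]


def delay_embed(series, m, tau, horizon=1):
    """Build (delay vector, target) pairs from a scalar time series.

    Each vector is [x(t), x(t - tau), ..., x(t - (m - 1) * tau)] and the
    target is x(t + horizon) (one step ahead by default; an assumption).
    """
    start = (m - 1) * tau
    samples = []
    for t in range(start, len(series) - horizon):
        vector = [series[t - k * tau] for k in range(m)]
        samples.append((vector, series[t + horizon]))
    return samples


# Toy usage on a synthetic ramp standing in for the load series.
raw = [float(i) for i in range(100)]
normalized = min_max_normalize(raw)
samples = delay_embed(normalized, m=10, tau=6)
first_vector, first_target = samples[0]
```

With m = 10, each sample has 10 components, which matches the 10 input nodes of the network.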

Model Evaluation
To examine the performance of our proposed model, two metrics that are frequently used in the literature were calculated to evaluate the prediction error: the MAPE and the mean squared error (MSE). The MAPE and MSE are defined as:

$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{Y(t) - Y'(t)}{Y(t)} \right|$$
$$\mathrm{MSE} = \frac{1}{N} \sum_{t=1}^{N} \big( Y(t) - Y'(t) \big)^2$$

where $N$ denotes the number of forecast samples, $Y(t)$ represents the actual value at time instance $t$, and $Y'(t)$ is the predicted value.
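Both metrics are straightforward to compute. The following Python sketch expresses MAPE as a fraction rather than a percentage, which matches the magnitude of the values reported in this paper (e.g., 0.0346); the function names and the toy numbers are assumptions.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction."""
    return sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


# Toy usage with made-up values (not data from the paper).
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
error_mape = mape(actual, predicted)
error_mse = mse(actual, predicted)
```

Note that MAPE is undefined when an actual value is zero, so it is suited to load series that stay strictly positive.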

Experimental Results
The proposed RBM-Elman model was employed for short-term load prediction. For comparison purposes, a DBN model was designed to perform short-term load prediction on the same dataset. A typical three-layer Elman neural network was also employed to demonstrate the validity of our proposed model. The results obtained by each prediction method are presented in Table 2. Furthermore, the forecasting results of the RBM-Elman model for the test data are illustrated in Figure 8. For better visualization, the results of seven consecutive days are also illustrated in Figure 9. The forecasting results of seven consecutive days by the DBN model and the Elman model are illustrated in Figures 10 and 11, respectively. Among the models, the RBM-Elman network had the lowest MAPE, 0.0346, whereas the MAPE of the DBN was 0.0381. From Table 2, we can see that the forecast performance of our proposed RBM-Elman prediction model was better than that of the other models. Meanwhile, it had a shorter computing time. From the above figures, it is also evident that our proposed method provided a better match between the actual load and the forecasted load.
In order to further demonstrate the forecast performance of our proposed method, we decomposed the dataset by season. Each season's dataset was split into the usual 80-20% training-test structure, and our proposed method was then applied to each of them. The forecast results of the different seasons by the RBM-Elman model are given in Table 3. From the results, we can see that the RBM-Elman model achieved good forecasting precision for all four seasons. In addition, the forecast results for spring and autumn are better than those for summer and winter. This may be due to the temperature changes in the summer and winter seasons.

Conclusions
In the competitive electricity market, an accurate electricity load forecast is necessary. In this study, a deep learning framework based on RBM and Elman neural networks was presented. To verify its effectiveness, the proposed model was compared with the DBN and the Elman neural network used individually. The results of these experiments demonstrate that our proposed model achieved the best forecasting precision and had a shorter computing time.
In future studies, we would first like to examine our method on more complex datasets. Second, the hyper-parameters of the neural networks were fine-tuned by the back-propagation method, which can easily fall into local optima. Thus, we would like to apply advanced evolutionary algorithms [40][41][42][43] to fine-tune those hyper-parameters and improve the performance of the neural networks even further. Lastly, other improvements of the deep belief network for load prediction will also be considered.

Figure 1. Illustration of a typical DBN structure.

Figure 2. Illustration of a typical RBM structure.

Figure 4. The flowchart of the RBM-Elman model.

Figure 5. Illustration of the load power in June 2014.

Figure 6. The trial and error results of Elman model.

Figure 7. The structure of Elman neural network.

Figure 8. The forecast results using RBM-Elman model for test data.

Figure 9. The forecast results of seven consecutive days using RBM-Elman model.

Figure 10. The forecast results of seven consecutive days using DBN model.

Figure 11. The forecast results of seven consecutive days using Elman model.

Table 2. Forecast results by different models.

Table 3. Forecast results of different seasons by RBM-Elman model.