Deep Belief Network Based Hybrid Model for Building Energy Consumption Prediction

To enhance the prediction performance for building energy consumption, this paper presents a modified deep belief network (DBN) based hybrid model. The proposed hybrid model combines the outputs of the DBN model with the energy-consuming pattern to yield the final prediction results. In this study, the energy-consuming pattern represents the periodicity of building energy consumption and can be extracted from the observed historical energy consumption data. The residual data, generated by removing the energy-consuming pattern from the original data, are used to train the modified DBN model. The training of the modified DBN includes two steps: the first adopts the contrastive divergence (CD) algorithm to pre-train the hidden parameters, while the second determines the output weighting vector by the least squares method. The proposed hybrid model is applied to two building energy consumption data sets with different energy-consuming patterns (daily-periodicity and weekly-periodicity). To examine the advantages of the proposed model, four popular artificial intelligence methods, namely the backward propagation neural network (BPNN), the generalized radial basis function neural network (GRBFNN), the extreme learning machine (ELM), and the support vector regressor (SVR), are chosen as comparative approaches. Experimental results demonstrate that the proposed DBN based hybrid model performs best among the compared techniques. Moreover, all the predictors constructed using the energy-consuming patterns perform better than those designed using only the original data, which verifies the usefulness of incorporating the energy-consuming patterns. The proposed approach can also be extended to other prediction problems with periodicity patterns, e.g., traffic flow forecasting and electricity consumption prediction.


Introduction
With the growth of population and the development of the economy, more and more energy is consumed in residential and office buildings. Building energy conservation plays an important role in sustainable economic development. However, some ubiquitous issues, e.g., poor building management and unreasonable task scheduling, are impeding the efficiency of energy conservation policies. One way to improve building management and the task scheduling of building equipment is to provide accurate predictions of building energy consumption.
Numerous data-driven artificial intelligence approaches have been proposed for building energy consumption prediction. In [1], the random forest and the artificial neural network (ANN) were applied to the high-resolution prediction of building energy consumption, and the experimental results demonstrated that both models have comparable predictive power. In [2], a hybrid model combining different machine learning algorithms was presented for optimizing the energy consumption of residential buildings, considering both continuous and discrete energy parameters. In [3], the extreme learning machine (ELM) was used to estimate building energy consumption, and simulation results indicated that the ELM performed better than genetic programming (GP) and the ANN. In [4], the clusterwise regression method, also known as latent class regression, which integrates clustering and regression, was utilized for accurate and stable prediction of building energy consumption. In [5], the feasibility and applicability of the support vector machine (SVM) for building energy consumption prediction were examined in a tropical region. Moreover, in [6][7][8][9], a variant of the SVM, the support vector regressor (SVR), was proposed for forecasting building energy consumption and electric load. Furthermore, in [10], a novel machine learning model was constructed for estimating commercial building energy consumption.
Historical building energy consumption data exhibit high levels of uncertainty and randomness due to the influence of occupant distribution, the thermal environment, weather conditions and working hours in buildings. Thus, there is still a need to improve the prediction precision for this application. To this end, two strategies can be considered. The first is to adopt more powerful modeling methods to learn the information hidden in the historical data, while the second is to incorporate knowledge or patterns from experience or data into the prediction models.
On the one hand, deep learning provides a very powerful tool for constructing prediction models. In deep learning models, increasingly representative features can be extracted from the lowest layer to the highest layer [11,12]. To date, this technique has been widely used in various fields. In [13], a novel predictor, the stacked autoencoder Levenberg-Marquardt model, was constructed for traffic flow prediction. In [14], an extreme deep learning approach that integrates the stacked autoencoder (SAE) with the ELM was proposed for building energy consumption prediction. In [15], deep learning was employed as an ensemble technique for cancer detection. In [16], the deep convolutional neural network (CNN) was utilized for face photo-sketch recognition. In [17], a deep learning approach, the Gaussian-Bernoulli restricted Boltzmann machine (RBM), was applied to 3D shape classification using spectral graph wavelets and the bag-of-features paradigm. In [18], the deep belief network (DBN) was applied to the natural language understanding problem. Furthermore, in [19], the DBN was utilized to fuse the virtues of multiple acoustic features to improve the robustness of voice activity detection. As a popular deep learning method, the DBN has shown strong performance in machine learning and artificial intelligence. This study adopts and modifies the DBN to make it suitable for building energy consumption prediction.
On the other hand, knowledge or patterns from experience can provide additional information for designing prediction models. In [20][21][22], different kinds of prior knowledge were incorporated into SVM models. In [23], the knowledge of symmetry was encoded into a type-2 fuzzy logic model to enhance its performance. In [24,25], the knowledge of monotonicity was incorporated into fuzzy inference systems to ensure monotonic input-output mappings. In [26][27][28][29], methods for encoding knowledge into neural networks were discussed. As shown in these studies, incorporating knowledge or patterns can yield machine learning models with better performance and significantly improved generalization ability.
From the above discussion, both deep learning methods and domain knowledge help improve the performance of prediction models. Following this idea, this study presents a hybrid model that combines the DBN model with the periodicity knowledge of building energy consumption to further improve the prediction accuracy. The final prediction results of the proposed hybrid model are obtained by combining the outputs of the modified DBN model and the energy-consuming pattern model. Here, the energy-consuming pattern represents the periodicity of building energy consumption and can be extracted from the observed historical energy consumption data. In this study, the structure of the proposed hybrid model is presented first, and the extraction of the energy-consuming pattern is demonstrated. Then, the training algorithm for the modified DBN model is provided. The learning of the DBN model mainly includes two steps: it first optimizes the hidden parameters by the contrastive divergence (CD) algorithm in a pre-training manner, and then determines the output weighting vector by the least squares method. Furthermore, the proposed hybrid model is applied to predicting the energy consumption of two kinds of buildings with different energy-consuming patterns (daily-periodicity and weekly-periodicity). Additionally, to show the superiority of the proposed hybrid model, comparisons are made with four popular artificial intelligence methods, namely the backward propagation neural network (BPNN), the generalized radial basis function neural network (GRBFNN), the extreme learning machine (ELM), and the support vector regressor (SVR). The comparison results show that all the predictors (DBN, BPNN, GRBFNN, ELM and SVR) designed using both the periodicity knowledge and the residual data perform much better than those designed using only the original data. Hence, the periodicity knowledge is quite useful for improving the prediction performance in this application. The experiments also show that, among all the prediction models, the proposed DBN based hybrid model has the best performance.
The rest of this paper is organized as follows. Section 2 reviews the deep belief network. Section 3 first presents the proposed hybrid model and then the modified DBN. Section 4 reports two energy consumption prediction experiments for buildings with different energy-consuming patterns, together with the experimental and comparison results. Finally, Section 5 draws the conclusions of this paper.

Introduction of DBN
The DBN is a stack of restricted Boltzmann machines (RBMs) [11,30]. Therefore, for better understanding, we introduce the RBM before the DBN in this section.

Restricted Boltzmann Machine
The structure of a typical RBM is shown in Figure 1. The RBM is an undirected, bipartite graphical model consisting of a visible (input) layer and a hidden (output) layer. The visible and hidden layers are made up of n visible units and m hidden units, respectively, and each unit has a bias. Moreover, there are no connections within the visible layer or within the hidden layer [31].
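When a visible vector $\mathbf{v} = (v_1, \ldots, v_i, \ldots, v_n)$ is given, the activation probability of the $j$th hidden unit can be computed as

$$P(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_{i=1}^{n} v_i w_{ij}\Big), \tag{1}$$

where $j = 1, 2, \ldots, m$, $\sigma(\cdot)$ is the sigmoid function, $w_{ij}$ is the connection weight between the $i$th visible unit and the $j$th hidden unit, and $b_j$ is the bias of the $j$th hidden unit.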
Similarly, when a hidden vector $\mathbf{h} = (h_1, \ldots, h_j, \ldots, h_m)$ is known, the activation probability of the $i$th visible unit can be computed as

$$P(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_{j=1}^{m} w_{ij} h_j\Big), \tag{2}$$

where $i = 1, 2, \ldots, n$, and $a_i$ is the bias of the $i$th visible unit. Hinton et al. [33] proposed the contrastive divergence (CD) algorithm to optimize the RBM. The CD-based iterative learning procedure of the RBM for binomial units is listed as follows [32].
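Step 1: Initialize the number of visible units $n$, the number of hidden units $m$, the number of training data $N$, the weighting matrix $\mathbf{W}$, the visible bias vector $\mathbf{a}$, the hidden bias vector $\mathbf{b}$ and the learning rate $\epsilon$.
Step 2: Assign a sample $\mathbf{x}$ from the training data to be the initial state $\mathbf{v}_0$ of the visible layer.
Step 3: Compute $P(\mathbf{h}_0 = 1 \mid \mathbf{v}_0)$ by Equation (1) and sample a binary hidden state $\mathbf{h}_0$ from it.
Step 4: Compute $P(\mathbf{v}_1 = 1 \mid \mathbf{h}_0)$ by Equation (2) and sample a reconstructed visible state $\mathbf{v}_1$ from it.
Step 5: Compute $P(\mathbf{h}_1 = 1 \mid \mathbf{v}_1)$ by Equation (1).
Step 6: Update the parameters according to the following equations:

$$\mathbf{W} \leftarrow \mathbf{W} + \epsilon\big(\mathbf{v}_0\, P(\mathbf{h}_0 = 1 \mid \mathbf{v}_0)^{\mathsf{T}} - \mathbf{v}_1\, P(\mathbf{h}_1 = 1 \mid \mathbf{v}_1)^{\mathsf{T}}\big),$$
$$\mathbf{a} \leftarrow \mathbf{a} + \epsilon(\mathbf{v}_0 - \mathbf{v}_1),$$
$$\mathbf{b} \leftarrow \mathbf{b} + \epsilon\big(P(\mathbf{h}_0 = 1 \mid \mathbf{v}_0) - P(\mathbf{h}_1 = 1 \mid \mathbf{v}_1)\big).$$

Step 7: Assign another sample from the training data to be the initial state $\mathbf{v}_0$ of the visible layer, and iterate Steps 3 to 7 until all the N training data have been used.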
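The CD procedure above can be sketched in Python with NumPy. This is a minimal sketch, assuming binary units, a weight matrix `W` of shape (n, m) whose entry `W[i, j]` plays the role of $w_{ij}$, and a single Gibbs step per update (CD-1). The function names `cd1_update` and `train_rbm`, the outer epoch loop and the initialization scale are illustrative assumptions, not part of the original procedure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, a, b, lr, rng):
    """One CD-1 update of a binary RBM for a single visible vector v0 (in place)."""
    # Positive phase: P(h = 1 | v0) and a binary hidden sample h0
    ph0 = sigmoid(b + v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: sample a reconstruction v1 from P(v = 1 | h0) ...
    pv1 = sigmoid(a + W @ h0)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    # ... and compute P(h = 1 | v1)
    ph1 = sigmoid(b + v1 @ W)
    # Parameter update: data statistics minus reconstruction statistics
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a += lr * (v0 - v1)
    b += lr * (ph0 - ph1)

def train_rbm(X, m, lr=0.1, epochs=10, seed=0):
    """Train an RBM with m hidden units on the rows of X (binary visible data)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = 0.01 * rng.standard_normal((n, m))
    a = np.zeros(n)
    b = np.zeros(m)
    for _ in range(epochs):      # repeated passes are an assumption; the steps describe one pass
        for v0 in X:             # visit every training sample once per pass
            cd1_update(v0, W, a, b, lr, rng)
    return W, a, b
```

For example, `train_rbm(X, 4)` on a 20×6 binary matrix returns `W` of shape (6, 4) and bias vectors of lengths 6 and 4.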

Deep Belief Network
As mentioned above, the DBN is a deep model built as a stack of RBMs [11,30,34,35]. Figure 2 illustrates the architecture of the DBN with k hidden layers and its layer-wise pre-training process.
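The activation of the $k$th hidden layer with respect to the input sample $\mathbf{x}$ can be computed as

$$\mathbf{A}_k(\mathbf{x}) = \sigma\Big(\mathbf{b}_k + \mathbf{W}_k^{\mathsf{T}}\, \sigma\big(\cdots \sigma(\mathbf{b}_2 + \mathbf{W}_2^{\mathsf{T}}\, \sigma(\mathbf{b}_1 + \mathbf{W}_1^{\mathsf{T}} \mathbf{x}))\cdots\big)\Big), \tag{3}$$

where $\mathbf{W}_u$ and $\mathbf{b}_u$ ($u = 1, 2, \ldots, k$) are, respectively, the weighting matrix and the hidden bias vector of the $u$th RBM. Furthermore, $\sigma$ is the logistic sigmoid function $\sigma(x) = 1/(1 + e^{-x})$, applied element-wise.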
To obtain better feature representations, the DBN uses a deep architecture and adopts layer-wise pre-training to optimize the inter-layer weighting matrices [11]. The training algorithm of the DBN will be given in detail in the next section.
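The activation of the top hidden layer is a repeated application of one sigmoid layer, $\mathbf{A}_u(\mathbf{x}) = \sigma(\mathbf{W}_u^{\mathsf{T}}\mathbf{A}_{u-1}(\mathbf{x}) + \mathbf{b}_u)$ with $\mathbf{A}_0(\mathbf{x}) = \mathbf{x}$. The sketch below assumes each $\mathbf{W}_u$ is stored with shape (units below, units above), as in the RBM section, and that a batch has one sample per row; `dbn_activation` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dbn_activation(x, weights, biases):
    """Return A_k(x) for one sample (1D array) or a batch (2D array, one sample per row)."""
    A = np.asarray(x, dtype=float)         # A_0(x) = x
    for W, b in zip(weights, biases):      # u = 1, ..., k
        A = sigmoid(A @ W + b)             # A_u = sigma(W_u^T A_{u-1} + b_u)
    return A
```

Writing `A @ W` rather than `W.T @ A` lets the same function handle a single sample and a whole batch.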

The Proposed Hybrid Model
In this section, the structure of the hybrid model is presented first. Then, the extraction of the energy-consuming pattern and the generation of the residual data are described. Finally, the modified DBN (MDBN) and its training algorithm are presented.
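To begin, assume that we have collected sampling data for $M$ consecutive days, with $T$ data points collected each day. The sampled time series of energy consumption data can then be written as a series of 1D vectors:

$$\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_M], \tag{4}$$

where

$$\mathbf{y}_z = [y_z(1), y_z(2), \ldots, y_z(T)], \quad z = 1, 2, \ldots, M, \tag{5}$$

and $T$ is the sampling number per day.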

Structure of the Hybrid Model
The hybrid model combines the modified DBN (MDBN) model with the periodicity knowledge of building energy consumption to obtain better prediction accuracy. The design procedure of the proposed model is depicted in Figure 3 and is also given as follows:
Step 1: Extract the energy-consuming pattern as the periodicity knowledge from the training data.
Step 2: Remove the energy-consuming pattern from the training data to generate the residual data.
Step 3: Utilize the residual data to train the MDBN model.
Step 4: Combine the outputs from the MDBN model with the periodicity knowledge to obtain the final prediction results of the hybrid model.
It is obvious that the extraction of the energy-consuming pattern, the generation of the residual data and the construction of the MDBN model are crucial in order to build the proposed hybrid model.Consequently, we will introduce them in detail in the following subsections.

Extraction of the Energy-Consuming Patterns and Generation of the Residual Data
Obviously, various regular patterns of energy consumption (e.g., daily-periodicity, weekly-periodicity, monthly-periodicity and even yearly-periodicity) exist in different kinds of buildings.In this study, we will take the daily-periodic and the weekly-periodic energy-consuming patterns as examples to introduce the method for extracting them from the original data.

The Daily-Periodic Pattern
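The daily-periodic energy-consuming pattern can be extracted from the original time series by averaging, over the $M$ days, the data collected at the same sampling time:

$$\mathbf{P} = [p(1), p(2), \ldots, p(T)], \quad p(t) = \frac{1}{M}\sum_{z=1}^{M} y_z(t), \quad t = 1, 2, \ldots, T. \tag{6}$$

The residual time series is then generated by removing the pattern from each day:

$$\mathbf{Y}_{Res} = [\mathbf{y}_1 - \mathbf{P}, \mathbf{y}_2 - \mathbf{P}, \ldots, \mathbf{y}_M - \mathbf{P}]. \tag{7}$$

The Weekly-Periodic Pattern
For the weekly-periodic energy-consuming pattern, weekdays and weekends are treated separately:

$$\mathbf{P} = \{\mathbf{P}_1, \mathbf{P}_2\}, \tag{8}$$
$$\mathbf{P}_1 = [p_1(1), \ldots, p_1(T)], \tag{9}$$
$$\mathbf{P}_2 = [p_2(1), \ldots, p_2(T)], \tag{10}$$
$$p_1(t) = \frac{1}{|D_1|}\sum_{z \in D_1} y_z(t), \tag{11}$$
$$p_2(t) = \frac{1}{|D_2|}\sum_{z \in D_2} y_z(t), \tag{12}$$

where $D_1$ and $D_2$ are, respectively, the data sets (day indices) of weekdays and weekends, and $t = 1, 2, \ldots, T$. Then, to generate the residual time series $\mathbf{Y}_{Res}$ for the building energy consumption data set, we use the following rules:

$$\mathbf{y}_z^{Res} = \begin{cases} \mathbf{y}_z - \mathbf{P}_1, & z \in D_1, \\ \mathbf{y}_z - \mathbf{P}_2, & z \in D_2, \end{cases} \tag{13}$$

where $z = 1, 2, \ldots, M$. Subsequently, $\mathbf{Y}_{Res}$ can be written as

$$\mathbf{Y}_{Res} = [\mathbf{y}_1^{Res}, \mathbf{y}_2^{Res}, \ldots, \mathbf{y}_M^{Res}]. \tag{14}$$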
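A minimal NumPy sketch of pattern extraction and residual generation follows. It assumes the data are arranged as an M×T array (one row per day) and that each pattern value is the mean of the samples taken at the same time of day over the relevant days; the function names and the boolean `is_weekend` mask are hypothetical. The last function shows one natural way to carry out Step 4 of the hybrid procedure, assuming the combination is additive (pattern plus predicted residual), which inverts the subtraction used to form the residuals.

```python
import numpy as np

def daily_pattern(Y):
    """Daily pattern: mean over all days of the value at each sampling time (length T)."""
    return Y.mean(axis=0)

def weekly_patterns(Y, is_weekend):
    """Weekday and weekend patterns, each the per-slot mean over the matching days."""
    is_weekend = np.asarray(is_weekend, dtype=bool)
    return Y[~is_weekend].mean(axis=0), Y[is_weekend].mean(axis=0)

def daily_residual(Y, p):
    """Subtract the same daily pattern from every day (row)."""
    return Y - p

def weekly_residual(Y, is_weekend, p_weekday, p_weekend):
    """Subtract the weekday or weekend pattern depending on the day type."""
    is_weekend = np.asarray(is_weekend, dtype=bool)
    pattern = np.where(is_weekend[:, None], p_weekend, p_weekday)  # shape (M, T)
    return Y - pattern, pattern

def recombine(pattern_values, residual_predictions):
    """Hybrid output under the additive assumption: pattern + predicted residual."""
    return np.asarray(pattern_values) + np.asarray(residual_predictions)

# Usage: reshape a series of M*T readings into days, then flatten the residuals back
series = np.arange(12, dtype=float)            # toy data: M = 4 days, T = 3 slots
Y = series.reshape(4, 3)
mask = [False, False, True, True]              # hypothetical day types
Y_res, pattern = weekly_residual(Y, mask, *weekly_patterns(Y, mask))
residual_series = Y_res.ravel()                # time-ordered residual series for training
```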

Modified DBN and Its Training Algorithm
In this subsection, the structure of the MDBN is shown first. Then, the pre-training process of the DBN part is described in detail. Finally, the least squares method is employed to determine the weighting vector of the regression part.

Structure of the MDBN
In the parameter optimization of traditional DBNs, the CD algorithm is adopted to pre-train the parameters of multiple RBMs, and the BP algorithm is used to fine-tune the parameters of the whole network. In this paper, we add an extra layer to the DBN as the regression part to realize the prediction function; we therefore call the resulting model the modified DBN (MDBN). The structure of the MDBN is shown in Figure 4. In addition, we propose a training algorithm that combines the CD algorithm with the least squares method for learning the MDBN model. We divide the training process of the MDBN into two steps: the first adopts the CD algorithm to pre-train the hidden parameters, while the second determines the output weighting vector by the least squares method. The detailed description is given below.

Pre-Training of the DBN Part
Generally speaking, as the number of hidden layers increases, the BP algorithm becomes less and less effective at optimizing the parameters of a deep neural network because of vanishing gradients. Fortunately, Hinton et al. [11] proposed a fast learning algorithm for the DBN. This approach pre-trains the multiple RBMs in the DBN layer by layer in a bottom-up way, as described below:
Step 1: Initialize the number of hidden layers $k$, the number of training data $N$ and the initial sequence number of the hidden layer $u = 2$.
Step 2: Assign a sample $\mathbf{x}$ from the training data to be the input data of the DBN.
Step 3: Regard the input layer and the first hidden layer of the DBN as an RBM, and compute the activation $\mathbf{A}_1(\mathbf{x})$ by Equation (3) when the training of this RBM is finished.
Step 4: Regard the $(u-1)$th and the $u$th hidden layers as an RBM with the input $\mathbf{A}_{u-1}(\mathbf{x})$, and compute the activation $\mathbf{A}_u(\mathbf{x})$ by Equation (3) when the training of this RBM is completed.
Step 5: Set $u = u + 1$; if $u \le k$, return to Step 4.
Step 6: Use $\mathbf{A}_k(\mathbf{x})$ as the input of the regression part.
Step 7: Assign another sample from the training data as the input data of the DBN, and iterate Steps 3 to 7 until all the N training data have been assigned.
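The bottom-up pre-training can be sketched as follows. This sketch follows the common greedy form in which each RBM is trained on the whole training set before the next one is stacked on top. It assumes inputs scaled to [0, 1] and uses CD-1 with a mean-field (probability) reconstruction of the visible layer, a common variant rather than sampling the reconstruction; the helper names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(V, m, lr=0.05, epochs=5, rng=None):
    """CD-1 training of one RBM on the rows of V (values in [0, 1])."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = V.shape[1]
    W = 0.01 * rng.standard_normal((n, m))
    a, b = np.zeros(n), np.zeros(m)
    for _ in range(epochs):
        for v0 in V:
            ph0 = sigmoid(b + v0 @ W)                   # P(h = 1 | v0)
            h0 = (rng.random(m) < ph0).astype(float)    # binary hidden sample
            v1 = sigmoid(a + W @ h0)                    # mean-field reconstruction
            ph1 = sigmoid(b + v1 @ W)                   # P(h = 1 | v1)
            W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
            a += lr * (v0 - v1)
            b += lr * (ph0 - ph1)
    return W, b   # the visible bias a is not needed for the upward pass

def pretrain_dbn(X, hidden_sizes, **rbm_kwargs):
    """Greedy bottom-up pre-training; RBM u is trained on A_{u-1}(x), with A_0(x) = x."""
    layers, A = [], np.asarray(X, dtype=float)
    for m in hidden_sizes:
        W, b = train_rbm(A, m, **rbm_kwargs)
        layers.append((W, b))
        A = sigmoid(A @ W + b)        # A_u(x) for every sample, the input of the next RBM
    return layers, A                  # A holds A_k(x), the input of the regression part
```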

Least Squares Learning of the Regression Part
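Suppose that the training set is $\aleph = \{(\mathbf{x}^{(l)}, y^{(l)})\}_{l=1}^{N}$. As mentioned above, once the pre-training of the DBN part is completed, the activation of the final hidden layer of the MDBN with respect to the input $\mathbf{x}^{(l)}$ is $\mathbf{A}_k(\mathbf{x}^{(l)})$, where $l = 1, 2, \ldots, N$. Furthermore, the activations of the final hidden layer of the MDBN with respect to all the $N$ training data can be written in matrix form as

$$\mathbf{H} = \begin{bmatrix} \mathbf{A}_k(\mathbf{x}^{(1)}) \\ \mathbf{A}_k(\mathbf{x}^{(2)}) \\ \vdots \\ \mathbf{A}_k(\mathbf{x}^{(N)}) \end{bmatrix}_{N \times n_k}, \tag{15}$$

where $n_k$ is the number of neurons of the $k$th hidden layer and each $\mathbf{A}_k(\mathbf{x}^{(l)})$ is treated as a row vector. We always expect that each actual value $y^{(l)}$ with respect to $\mathbf{x}^{(l)}$ can be approximated by the output $\hat{y}^{(l)}$ of the predictor with no error. This expectation can be expressed mathematically as

$$\hat{y}^{(l)} = y^{(l)}, \quad l = 1, 2, \ldots, N, \tag{16}$$

where $\hat{y}^{(l)}$ is the output of the MDBN and can be computed as

$$\hat{y}^{(l)} = \mathbf{A}_k(\mathbf{x}^{(l)})\, \boldsymbol{\beta}, \tag{17}$$

in which $\boldsymbol{\beta}$ is the output weighting vector

$$\boldsymbol{\beta} = [\beta_1, \beta_2, \ldots, \beta_{n_k}]^{\mathsf{T}}. \tag{18}$$

Then, Equation (16) can be rewritten in matrix form as

$$\mathbf{H}\boldsymbol{\beta} = \mathbf{Y}, \tag{19}$$

where $\mathbf{Y} = [y^{(1)}, y^{(2)}, \ldots, y^{(N)}]^{\mathsf{T}}$. From Equation (19), the output weighting vector $\boldsymbol{\beta}$ can be derived by the least squares method as [36][37][38][39]

$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{+}\mathbf{Y}, \tag{20}$$

where $\mathbf{H}^{+}$ is the Moore-Penrose generalized inverse of $\mathbf{H}$, e.g., $\mathbf{H}^{+} = (\mathbf{H}^{\mathsf{T}}\mathbf{H})^{-1}\mathbf{H}^{\mathsf{T}}$ when $\mathbf{H}^{\mathsf{T}}\mathbf{H}$ is nonsingular.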
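Once the matrix of final-layer activations is available, the output weights are the least squares solution of $\mathbf{H}\boldsymbol{\beta} = \mathbf{Y}$. NumPy provides this through `np.linalg.pinv` (the Moore-Penrose pseudo-inverse) or, without forming the inverse explicitly, through `np.linalg.lstsq`. The sketch below uses synthetic stand-in activations; the helper names are hypothetical.

```python
import numpy as np

def fit_output_weights(H, y):
    """Least squares output weights: beta = H^+ y (Moore-Penrose pseudo-inverse)."""
    return np.linalg.pinv(H) @ y

def fit_output_weights_lstsq(H, y):
    """Same solution via a least squares solver, avoiding the explicit pseudo-inverse."""
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return beta

def mdbn_output(H, beta):
    """Regression part of the MDBN: y_hat^(l) = A_k(x^(l)) beta for every row of H."""
    return H @ beta

rng = np.random.default_rng(0)
H = rng.random((50, 8))             # stand-in for N = 50 activations A_k(x^(l)), n_k = 8
beta_true = rng.standard_normal(8)
y = H @ beta_true                   # noise-free targets, so beta should be recovered exactly
beta = fit_output_weights(H, y)
```

Because only this linear output layer is fitted, training the regression part is a single linear solve rather than an iterative gradient procedure.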

Experiments
In this section, first of all, four comparative artificial intelligence approaches will be introduced briefly.Next, the applied data sets and experimental setting will be discussed.Then, the proposed hybrid model will be applied to the prediction of the energy consumption in a retail store and an office building that respectively have daily-periodic and weekly-periodic energy-consuming patterns.Finally, we will give the comparisons and discussions of the experiments.

Introduction of the Comparative Approaches
To make a quantitative assessment of the proposed MDBN based hybrid model, four popular artificial intelligence approaches, the BPNN, GRBFNN, ELM, and SVR, are chosen as the comparative approaches and introduced briefly below.

Backward Propagation Neural Network
The structure of BPNN with L hidden layers is demonstrated in Figure 5.The BPNN as one popular kind of ANN adopts back propagation algorithm to obtain the optimal weighting parameters of the whole network [40][41][42].

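As shown in Figure 5, the final output of the network is obtained by propagating the input forward through the $L$ hidden layers, where each hidden unit applies $f(\cdot)$ to the weighted sum of the outputs of the previous layer [40][41][42]. Here, $w^k_{ij}$ is the connection weight between the $i$th unit of the $k$th layer and the $j$th unit of the $(k+1)$th layer, and $f(\cdot)$ is the logistic sigmoid function.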
To obtain the optimal parameters of the BPNN, the backward propagation (BP) algorithm is adopted to minimize the following cost function for each training data point:

$$E(t) = \frac{1}{2}\big(\hat{y}^{(t)} - y^{(t)}\big)^2,$$

where $\hat{y}^{(t)}$ and $y^{(t)}$ are the predicted and actual values with respect to the input $\mathbf{x}^{(t)}$. The update rule for the weight $w^k_{ij}$ can be expressed as

$$w^k_{ij} \leftarrow w^k_{ij} - \eta\, \frac{\partial E(t)}{\partial w^k_{ij}},$$

where $\eta$ is the learning rate, and $\partial E(t)/\partial w^k_{ij}$ is the gradient with respect to the parameter $w^k_{ij}$, which can be calculated by the backward propagation of the errors.
The BP algorithm has two phases: forward propagation and weight update. In the forward propagation phase, an input vector presented to the network is propagated forward through the whole network until it reaches the output layer, and the error between the network output and the desired output is computed. In the weight update phase, the error is propagated from the output layer back through the whole network until each neuron has an associated error value reflecting its contribution to the original output. These error values are then used to calculate the gradients of the loss function, which are fed to the update rule to renew the weights [40][41][42].
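The two phases can be illustrated on the smallest case, a network with one sigmoid hidden layer. This sketch assumes a linear output unit, bias terms and the per-sample cost $E = \frac{1}{2}(\hat{y} - y)^2$; the network size, initialization, learning rate and toy target are illustrative choices, not the configuration used in the experiments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_step(x, y, W1, b1, w2, b2, eta):
    """One BP update on a single pair (x, y) for E = 0.5 * (y_hat - y) ** 2."""
    # Forward propagation
    h = sigmoid(x @ W1 + b1)            # hidden outputs; W1 has shape (n_in, n_hidden)
    y_hat = h @ w2 + b2                 # linear output unit (assumption)
    # Backward propagation of the error
    e = y_hat - y                       # dE/dy_hat
    delta_h = e * w2 * h * (1.0 - h)    # dE/d(net input) of each hidden unit, using the old w2
    # Weight update: w <- w - eta * dE/dw
    W1 -= eta * np.outer(x, delta_h)
    b1 -= eta * delta_h
    w2 -= eta * e * h
    b2 -= eta * e
    return W1, b1, w2, b2

def half_mse(X, y, W1, b1, w2, b2):
    y_hat = sigmoid(X @ W1 + b1) @ w2 + b2
    return float(np.mean(0.5 * (y_hat - y) ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 2))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1]        # toy regression target
W1, b1 = 0.5 * rng.standard_normal((2, 10)), np.zeros(10)
w2, b2 = 0.5 * rng.standard_normal(10), 0.0
loss_before = half_mse(X, y, W1, b1, w2, b2)
for epoch in range(100):
    for x_t, y_t in zip(X, y):
        W1, b1, w2, b2 = bp_step(x_t, y_t, W1, b1, w2, b2, eta=0.05)
loss_after = half_mse(X, y, W1, b1, w2, b2)
```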

Generalized Radial Basis Function Neural Network
The radial basis function (RBF) NN is a feed-forward NN with only one hidden layer; its structure is shown in Figure 6. The RBFNN uses Gaussian functions as its hidden neurons. The GRBFNN is a modified RBFNN that adopts generalized Gaussian functions as its hidden neurons [43,44]. The output of the GRBFNN can be expressed as [43,44]

$$\hat{y} = \sum_{j=1}^{n_1} w_j \exp\left(-\left(\frac{\lVert \mathbf{x} - \mathbf{c}_j \rVert}{d_j}\right)^{\tau_j}\right),$$

where $n_1$ is the number of hidden neurons, $\tau_j$ is the shape parameter of the $j$th radial basis function in the hidden layer, $\mathbf{c}_j$ and $d_j$ are, respectively, the center and width of the $j$th radial basis function, and $w_j$ is the connection weight between the $j$th hidden neuron and the output.
To determine the parameters $\boldsymbol{\tau}$, $\mathbf{c}$ and $\mathbf{d}$ in the hidden layer and the connection weights $w_j$, the aforementioned BP algorithm can also be employed.
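Evaluating the GRBFNN output takes only a few lines of NumPy. This sketch assumes the generalized Gaussian basis $\exp(-(\lVert \mathbf{x} - \mathbf{c}_j \rVert / d_j)^{\tau_j})$, which reduces to an ordinary Gaussian RBF when $\tau_j = 2$; parameter learning (e.g., by BP) is not shown, and the function name is hypothetical.

```python
import numpy as np

def grbf_output(x, centers, widths, taus, w):
    """y_hat = sum_j w_j * exp(-(||x - c_j|| / d_j) ** tau_j) for a single input x."""
    dist = np.linalg.norm(centers - x, axis=1)   # ||x - c_j|| for every hidden neuron
    phi = np.exp(-(dist / widths) ** taus)       # generalized Gaussian activations
    return phi @ w
```

The shape parameter controls how fast each basis function decays: smaller $\tau_j$ gives heavier tails, larger $\tau_j$ a flatter top and sharper edges.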

Extreme Learning Machine
The ELM is also a feed-forward neural network with only one hidden layer as demonstrated in Figure 6.However, the ELM and GRBFNN have different parameter learning algorithms and different activation functions in the hidden neurons.
In the ELM, the activation functions in the hidden neurons can be the hard-limiting activation function, the Gaussian activation function, the Sigmoidal function, the Sine function, etc. [36,37].
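In addition, the learning algorithm for the ELM is listed below:
• Randomly assign the input weights or the parameters of the hidden neurons.
• Calculate the hidden layer output matrix $\mathbf{H}$, where $H_{lj} = G(\mathbf{a}_j, b_j, \mathbf{x}^{(l)})$ is the output of the $j$th hidden neuron, with randomly assigned parameters $\mathbf{a}_j$ and $b_j$, for the $l$th training input $\mathbf{x}^{(l)}$, $l = 1, 2, \ldots, N$.
• Calculate the output weights $\boldsymbol{\beta} = \mathbf{H}^{+}\mathbf{Y}$, where $\mathbf{Y} = [y^{(1)}, y^{(2)}, \ldots, y^{(N)}]^{\mathsf{T}}$ and $\mathbf{H}^{+}$ is the Moore-Penrose generalized inverse of the matrix $\mathbf{H}$.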
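The three ELM steps map directly onto NumPy. This is a minimal sketch, assuming additive hidden neurons with a sigmoid activation and input weights and biases drawn uniformly from [−1, 1]; the experiments in this paper use the hardlim activation instead, and the function names are hypothetical.

```python
import numpy as np

def elm_fit(X, y, n_hidden, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.uniform(-1.0, 1.0, n_hidden)                 # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))               # hidden layer output matrix, N x n_hidden
    beta = np.linalg.pinv(H) @ y                         # beta = H^+ Y
    return A, b, beta

def elm_predict(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return H @ beta

X = np.linspace(-2.0, 2.0, 200).reshape(-1, 1)
y = np.sin(X[:, 0])
A, b, beta = elm_fit(X, y, n_hidden=30)
train_rmse = np.sqrt(np.mean((elm_predict(X, A, b, beta) - y) ** 2))
```

Only `beta` is learned, and it is obtained in closed form, which is why ELM training is fast.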
This learning process is very fast and can lead to excellent modeling performance.Hence, the ELM has found lots of applications in different research fields.

Support Vector Regression
The SVR is a variant of SVM.It can yield improved generalization performance through minimizing the generalization error bound [45].In addition, the kernel trick is adopted to realize the nonlinear transformation of input features.
The model of the SVR can be defined by the following function:

$$f(\mathbf{x}) = \mathbf{w}^{\mathsf{T}}\varphi(\mathbf{x}) + b,$$

where $\varphi(\cdot)$ is the nonlinear mapping function.
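Using the training set $\aleph = \{(\mathbf{x}^{(l)}, y^{(l)})\}_{l=1}^{N}$, we can determine the parameters $\mathbf{w}$ and $b$, and then obtain the SVR model as

$$f(\mathbf{x}) = \sum_{l=1}^{N} (\alpha_l - \alpha_l^{*})\, K(\mathbf{x}^{(l)}, \mathbf{x}) + b,$$

where $K(\mathbf{x}^{(l)}, \mathbf{x}) = \varphi(\mathbf{x}^{(l)})^{\mathsf{T}}\varphi(\mathbf{x})$ is the kernel function, and $\alpha_l$ and $\alpha_l^{*}$ are the Lagrange multipliers, which can be determined by solving the following dual optimization problem [46]:

$$\max_{\boldsymbol{\alpha}, \boldsymbol{\alpha}^{*}} \; -\frac{1}{2}\sum_{l=1}^{N}\sum_{s=1}^{N}(\alpha_l - \alpha_l^{*})(\alpha_s - \alpha_s^{*})K(\mathbf{x}^{(l)}, \mathbf{x}^{(s)}) - \varepsilon\sum_{l=1}^{N}(\alpha_l + \alpha_l^{*}) + \sum_{l=1}^{N} y^{(l)}(\alpha_l - \alpha_l^{*})$$
$$\text{s.t.} \quad \sum_{l=1}^{N}(\alpha_l - \alpha_l^{*}) = 0, \quad 0 \le \alpha_l, \alpha_l^{*} \le C, \quad l = 1, 2, \ldots, N,$$

where $C$ is the regularization parameter and $\varepsilon$ is the error tolerance parameter.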
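Once the dual problem has been solved (typically by a dedicated solver such as LIBSVM), a prediction needs only the differences $\alpha_l - \alpha_l^{*}$, the bias $b$ and the kernel. This sketch assumes the Gaussian (RBF) kernel $K(\mathbf{x}, \mathbf{x}') = \exp(-\gamma\lVert \mathbf{x} - \mathbf{x}' \rVert^2)$ with a hypothetical width parameter `gamma`, and takes the multipliers as given; it does not solve the quadratic program.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma):
    """K[q, l] = exp(-gamma * ||X1[q] - X2[l]||^2)."""
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

def svr_predict(X, X_train, alpha, alpha_star, b, gamma):
    """f(x) = sum_l (alpha_l - alpha_l^*) K(x^(l), x) + b for every row x of X."""
    K = rbf_kernel(X, X_train, gamma)
    return K @ (np.asarray(alpha) - np.asarray(alpha_star)) + b
```

Training points with $\alpha_l = \alpha_l^{*} = 0$ drop out of the sum; only the support vectors contribute to a prediction.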

Applied Data Sets and Experimental Setting
In this subsection, first of all, the building energy consumption data sets will be described.Next, three design factors that are utilized to determine the optimal structure of the MDBN will be shown.Finally, five indices will be given to evaluate the performances of the predictive models.

Applied Data Sets
Two kinds of building energy consumption data sets were downloaded from [47].The first data set includes 34,848 samples from 2 January 2010 to 30 December 2010.The data in this data set were collected every 15 min in one retail store in Fremont, CA, USA.We then aggregated them to generate the hourly energy consumption data.The second data set contains 22,344 samples from 4 April 2009 to 21 October 2011.The data in this set were collected every 60 min in one office building in Fremont, CA, USA.Parts of the samples of the two data sets are depicted in Figure 7.
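The aggregation of the retail store data into hourly values can be sketched as summing each block of four consecutive 15 min readings. This assumes the readings are energy per interval (e.g., kWh); if they were average power, a mean would be appropriate instead. As a consistency check, 34,848 quarter-hourly samples give 8,712 hourly values, i.e., 363 days of 24 h, which matches the stated range from 2 January to 30 December 2010. The function name is hypothetical.

```python
import numpy as np

def to_hourly(readings_15min):
    """Sum each block of four consecutive 15 min readings into one hourly value."""
    x = np.asarray(readings_15min, dtype=float)
    if x.size % 4 != 0:
        raise ValueError("expected a whole number of hours (a multiple of 4 readings)")
    return x.reshape(-1, 4).sum(axis=1)

print(to_hourly([1, 2, 3, 4, 5, 6, 7, 8]))   # [10. 26.]
print(34848 // 4, 363 * 24)                  # 8712 8712
```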

Design Factors for MDBN
To determine the optimal structure of the MDBN for building energy consumption prediction, we take three design factors, the number of hidden layers, the number of hidden neurons and the number of input variables, with their corresponding levels into account. The three design factors and their corresponding levels are presented in Table 1 and discussed in detail below.
• Design Factor i: the number of hidden layers $k$. Its levels are listed in Table 1.
• Design Factor ii: the number of hidden neurons. The number of hidden units is an important factor that greatly influences the performance of the MDBN model. Here, we assume that all hidden layers have the same number of neurons, i.e., $n_1 = n_2 = \cdots = n_k$. In this paper, we set the numbers of neurons 50, 100 and 150 as Levels 1, 2 and 3, respectively.
• Design Factor iii: the number of input variables $r$. In this paper, we use the $r$ energy consumption values preceding time $t$ in the building energy consumption time series to predict the value at time $t$. In other words, we use $\mathbf{x} = [y(t-1), y(t-2), \ldots, y(t-r)]$ to predict $y = y(t)$. Here, we consider 4, 5 and 6 input variables as Levels 1, 2 and 3, respectively.
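Building the input-output pairs $\mathbf{x} = [y(t-1), \ldots, y(t-r)]$, $y = y(t)$ from a series can be sketched as follows; `make_lagged` is a hypothetical helper, and the first usable target is $y(r)$ because $r$ earlier values are needed.

```python
import numpy as np

def make_lagged(series, r):
    """Rows [y(t-1), ..., y(t-r)] with targets y(t), for t = r, ..., len(series) - 1."""
    y = np.asarray(series, dtype=float)
    # Column k-1 holds the lag-k values y(t-k) aligned with the targets y[r:]
    X = np.column_stack([y[r - k: len(y) - k] for k in range(1, r + 1)])
    return X, y[r:]

X, targets = make_lagged([0, 1, 2, 3, 4, 5], r=3)
# X = [[2, 1, 0], [3, 2, 1], [4, 3, 2]], targets = [3, 4, 5]
```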

Comparison Setting
In this study, the performances of all the predictors constructed by utilizing the energy-consuming patterns are compared with those designed by the original data.To evaluate the performances of the models, we utilize the following two kinds of indices.
We first consider the mean absolute error (MAE), the root mean square error (RMSE) and the mean relative error (MRE), calculated as

$$\mathrm{MAE} = \frac{1}{K}\sum_{l=1}^{K}\big|\hat{y}^{(l)} - y^{(l)}\big|,$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{K}\sum_{l=1}^{K}\big(\hat{y}^{(l)} - y^{(l)}\big)^2},$$
$$\mathrm{MRE} = \frac{1}{K}\sum_{l=1}^{K}\frac{\big|\hat{y}^{(l)} - y^{(l)}\big|}{\big|y^{(l)}\big|} \times 100\%,$$

where $K$ is the number of training or testing data pairs, and $\hat{y}^{(l)}$ and $y^{(l)}$ are, respectively, the predicted and actual values with respect to the input $\mathbf{x}^{(l)}$. The MAE, RMSE and MRE are common measures of forecasting error in time series analysis; each aggregates the magnitudes of the prediction errors into a single number. The MAE is the average of the absolute differences between the predicted and actual observed values, and the RMSE is the square root of the average squared difference. Because the errors are squared before averaging, large errors have a disproportionately large effect on the RMSE, which makes it sensitive to outliers. Moreover, both the MAE and the RMSE are expressed in the units of the data and therefore depend on its scale. The MRE, also known as the mean absolute percentage deviation, avoids this scale dependence: it expresses the prediction accuracy as a percentage by dividing the absolute errors by the corresponding actual values. For prediction applications, the smaller the values of MAE, RMSE and MRE, the better the forecasting performance.
To better show the validity of the models, we also consider two statistical indices: the Pearson correlation coefficient, denoted as $r$, and the coefficient of determination, denoted as $R^2$. These two indices can be calculated as

$$r = \frac{\sum_{l=1}^{K}\big(\hat{y}^{(l)} - \hat{y}_{Ave}\big)\big(y^{(l)} - y_{Ave}\big)}{\sqrt{\sum_{l=1}^{K}\big(\hat{y}^{(l)} - \hat{y}_{Ave}\big)^2}\,\sqrt{\sum_{l=1}^{K}\big(y^{(l)} - y_{Ave}\big)^2}},$$
$$R^2 = 1 - \frac{\sum_{l=1}^{K}\big(y^{(l)} - \hat{y}^{(l)}\big)^2}{\sum_{l=1}^{K}\big(y^{(l)} - y_{Ave}\big)^2},$$

where $K$ is again the number of training or testing data pairs, and $\hat{y}_{Ave}$ and $y_{Ave}$ are, respectively, the averages of the predicted and actual values. The statistic $r$ measures the linear correlation between the actual and predicted values. It ranges from −1 to 1, where −1 indicates total negative linear correlation and 1 indicates total positive linear correlation. The statistic $R^2$ measures how well the actual observed values are replicated by the predicted values [48]. It equals 1 for a perfect prediction and usually lies between 0 and 1; it can become negative when the predictions are worse than simply using the mean $y_{Ave}$. In regression applications, the larger the values of $r$ and $R^2$, the better the prediction performance.
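The five indices can be computed together as sketched below. This assumes the definition $R^2 = 1 - \sum(y - \hat{y})^2 / \sum(y - y_{Ave})^2$ (some papers instead report the squared correlation $r^2$) and an MRE in percent with nonzero actual values; the function name is hypothetical.

```python
import numpy as np

def prediction_indices(y_true, y_pred):
    """MAE, RMSE, MRE (%), Pearson r and R^2 for paired actual/predicted values."""
    y = np.asarray(y_true, dtype=float)
    yh = np.asarray(y_pred, dtype=float)
    err = yh - y
    dy, dyh = y - y.mean(), yh - yh.mean()
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MRE": 100.0 * np.mean(np.abs(err) / np.abs(y)),
        "r": np.sum(dyh * dy) / np.sqrt(np.sum(dyh ** 2) * np.sum(dy ** 2)),
        "R2": 1.0 - np.sum(err ** 2) / np.sum(dy ** 2),
    }

scores = prediction_indices([1, 2, 3, 4], [1, 2, 3, 5])
# MAE = 0.25, RMSE = 0.5, MRE = 6.25 (%), R2 = 0.8
```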

Energy Consumption Prediction for the Retail Store
In this subsection, the energy-consuming pattern of the retail store will be extracted from the retail store data set firstly.Then, the configurations of the five prediction models for predicting the retail store energy consumption will be shown in detail.At last, the experimental results will be given.

Energy-Consuming Pattern of the Retail Store
We use Equations (6) and (7) to obtain the daily-periodic energy-consuming pattern and the residual time series of the retail store.
Figure 8a shows the daily-periodic energy-consuming pattern. In addition, Figure 8b shows the residual time series of the retail store, which is used to train the MDBN.

Configurations of the Prediction Models
As mentioned above, we take three design factors, the number of hidden layers, the number of hidden neurons and the number of input variables, with their corresponding levels into account to determine the optimal structure of the MDBN model for building energy consumption prediction. Consequently, $3^3 = 27$ trials are run, and the experimental results are shown in Table 2. Trial 19 obtains the best performance. In other words, the optimal structure of the MDBN for retail store energy consumption prediction has four hidden layers, 150 hidden units in each layer and four input variables.
Furthermore, the parameter configurations of the other four comparative predictors for retail store energy consumption prediction are listed in detail as follows.
• For the BPNN, there were 110 neurons in the hidden layer, which realize the nonlinear transformation of features by the sigmoid function. Additionally, the algorithm was run for 7000 iterations to achieve the learning objective.
• For the GRBFNN, 6-fold cross-validation was adopted to determine the optimal spread of the radial basis function, with the spread chosen from 0.01 to 2 with a step length of 0.1.
• For the ELM, there were 100 neurons in the hidden layer, and the hardlim function was chosen as the activation function for mapping the original features into another space.
• For the SVR, the penalty coefficient was set to 80, and the radial basis function was chosen as the kernel function to realize the nonlinear transformation of input features.

Experimental Results
For the testing data of the retail store, part of the prediction results of the five predictors constructed using the energy-consuming pattern is illustrated in Figure 9. Furthermore, for better visualization, the prediction error histograms of the five predictors are shown in Figure 10. The more tightly the prediction errors are concentrated around zero, the better the forecasting performance of the predictor.
Then, to examine the superiority of the hybrid model for the retail store energy consumption prediction, the five prediction models are compared considering different data types (the original and residual data).The original data means that the predictors are learned using the original data series, while the residual data means that the predictors are constructed by both the energy-consuming pattern and the residual data series.Experimental results are demonstrated in detail in Table 3.

Energy Consumption Prediction for the Office Building
In this subsection, first of all, the energy-consuming pattern of the office building will be extracted from the office building data set.Then, the configurations of the five prediction models for predicting the office building energy consumption will be shown in detail.Finally, the experimental results will be given.

Energy-Consuming Pattern of the Office Building
Similar to the retail store experiment, we use Equations (8)-(14) to obtain the weekly-periodic energy-consuming pattern and the residual time series of the office building.
As mentioned previously, the weekly-periodic energy-consuming pattern includes two parts: the weekday pattern and the weekend pattern. The obtained weekday pattern is depicted in Figure 11a, while the weekend pattern is shown in Figure 11b. We can observe that the energy consumption on weekends is quite different from that on weekdays. The residual time series of the office building obtained after removing the energy-consuming pattern is shown in Figure 11c. This residual time series is used to train the MDBN in the hybrid model.

Configurations of the Prediction Models
Similarly, we run 3³ = 27 trials (three design factors, each at three levels) to determine the optimal structure of the MDBN model for office building energy consumption prediction. The experimental results are listed in Table 4. As shown in Table 4, trial 13 obtains the best performance. Consequently, the optimal MDBN in the hybrid model for the office building has three hidden layers, 100 hidden units in each layer, and four input variables.
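Three design factors at three levels each give 3³ = 27 combinations, so the trials can be read as a full factorial grid. The sketch below enumerates that grid and keeps the configuration with the lowest error. Only the hidden-layer levels (2, 3 and 4) come from Table 1; the hidden-unit and input-variable levels are hypothetical, the enumeration order need not match the trial numbering in Tables 2 and 4, and `evaluate` is a hypothetical placeholder whose toy score is chosen only so the sketch runs (it does not model real MDBN error).

```python
import itertools

# Levels 1-3 of each design factor. Hidden-layer levels follow Table 1;
# the other two lists are hypothetical values used for illustration only.
levels = {
    "hidden_layers": [2, 3, 4],
    "hidden_units": [50, 100, 150],
    "input_vars": [2, 4, 6],
}

def evaluate(hidden_layers, hidden_units, input_vars):
    """Hypothetical placeholder: in practice, train an MDBN with this structure and
    return its validation RMSE. A toy score is used here so the sketch runs."""
    return abs(hidden_layers - 3) + abs(hidden_units - 100) / 50 + abs(input_vars - 4) / 2

trials = list(itertools.product(*levels.values()))    # 3 ** 3 = 27 combinations
scores = {cfg: evaluate(*cfg) for cfg in trials}
best = min(scores, key=scores.get)                     # lowest error wins
print(len(trials), dict(zip(levels, best)))
```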

Comparisons and Discussions
As discussed previously, smaller values of the MAE, RMSE and MRE indicate better prediction results, while larger values of r and R² correspond to better performance. Note that the indices in Table 3 refer to the retail store energy consumption, whereas those in Table 5 refer to the office building energy consumption. Because the retail store consumed much more energy than the office building, some values of the MAE, RMSE and MRE in Table 3 are larger than those in Table 5. Considering all the indices in Tables 3 and 5, the predictors constructed by utilizing the energy-consuming patterns perform better than those designed only with the original data. Taking the RMSE index as an example, in the first experiment the accuracies of the MDBN, BPNN, GRBFNN, ELM and SVR based hybrid models are improved by 11.1%, 7.0%, 4.2%, 21.6% and 9.6%, respectively, while in the second experiment the corresponding improvements are 15.6%, 14.8%, 26.5%, 16.9% and 34.0%. We can therefore conclude that the periodicity knowledge helps improve the accuracy of building energy consumption prediction.
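The five indices and the relative RMSE improvement can be computed as follows. This is a sketch under assumptions, since this section does not restate the formulas: MRE is taken as the mean absolute relative error in percent, r as the Pearson correlation coefficient, R² as 1 - SS_res/SS_tot, and the improvement as (RMSE_original - RMSE_hybrid) / RMSE_original × 100. The arrays below are toy values, not the paper's data.

```python
import numpy as np

def indices(y_true, y_pred):
    """Return MAE, RMSE, MRE (%), Pearson r and R^2 (assumed definitions)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mre = 100 * np.mean(np.abs(err) / np.abs(y_true))    # assumes no zero targets
    r = np.corrcoef(y_true, y_pred)[0, 1]
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "MRE": mre, "r": r, "R2": r2}

def rmse_improvement(rmse_original, rmse_hybrid):
    """Percentage reduction in RMSE when the energy-consuming pattern is used."""
    return 100 * (rmse_original - rmse_hybrid) / rmse_original

y = [10.0, 12.0, 14.0, 16.0]
p = [11.0, 12.0, 13.0, 16.0]
m = indices(y, p)
print(m)                            # MAE = 0.5, R2 = 0.9 for these toy values
print(rmse_improvement(2.0, 1.5))   # 25.0
```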
Figures 9 and 12 show that the hybrid DBN model not only predicts the regular testing data well for both the retail store and the office building from a global perspective, but also gives the best prediction results for the noisy, irregular data, e.g., sampling points 25 to 50 in Figure 9 for the retail store experiment. Such irregular testing data reflect the uncertainties in the energy consumption time series. In other words, among the five models, the proposed hybrid DBN model best handles the uncertainty and randomness in the historical building energy consumption data.
Figures 10 and 13 show the prediction error histograms of the five models designed using the periodicity knowledge in the two experiments. In the histograms, the horizontal axis gives the values of the prediction errors, while the vertical axis gives the number of prediction errors falling in each interval. The more closely the prediction errors cluster around zero, the better the predictor performs. Both figures show that the proposed hybrid DBN model has more prediction errors near zero than the other four artificial intelligence techniques; that is, the approximation capability of the proposed hybrid DBN model is promising for both buildings studied. To further validate the accuracy of the MDBN based hybrid model, scatter plots of the actual and predicted values in the two experiments are shown in Figure 14a,b, respectively. Figure 14 shows that the values predicted by the hybrid DBN model closely follow the actual values.
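The histogram comparison can also be expressed numerically by binning the errors and measuring how many fall close to zero. A minimal sketch using NumPy's `np.histogram`; the two error arrays are synthetic stand-ins for two predictors, and the ±1 "near zero" tolerance is an arbitrary illustrative choice.

```python
import numpy as np

def error_histogram(errors, bin_edges):
    """Count prediction errors per interval, as plotted in an error histogram."""
    counts, _ = np.histogram(errors, bins=bin_edges)
    return counts

def share_near_zero(errors, tol):
    """Fraction of errors with |e| <= tol; higher means errors cluster near zero."""
    return float(np.mean(np.abs(errors) <= tol))

rng = np.random.default_rng(2)
edges = np.linspace(-10, 10, 21)          # 20 intervals of width 1
errors_a = rng.normal(0, 1.5, 1000)       # synthetic predictor A: tight around zero
errors_b = rng.normal(0, 4.0, 1000)       # synthetic predictor B: more spread out
print(error_histogram(errors_a, edges))
print(share_near_zero(errors_a, 1.0), share_near_zero(errors_b, 1.0))
```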
Among all the predictors constructed with either the original or the residual data, the proposed MDBN based hybrid model achieves the best prediction accuracy in both experiments, as shown in Tables 3 and 5. This indicates that the proposed deep learning method has strong learning and prediction abilities in time series forecasting applications. It also supports the feature extraction ability of the deep learning algorithm and the effectiveness of the modified learning strategies.
It should be noted that the data sets used in this paper are not very large (on the order of ten thousand samples). Even though the hybrid MDBN model is not trained on big data in either experiment, it still performs excellently. This is consistent with other applications in which DBNs were trained without large amounts of data. For example, in [49,50], DBNs were applied to time series prediction and wind power prediction, neither of which involved a large quantity of data, and in both applications the experimental results showed that the DBN approach performed best compared with the traditional techniques. All these applications confirm the learning ability of DBN models when the available data are not very large.

Conclusions
In this paper, a hybrid model is presented to further improve the accuracy of building energy consumption prediction. The proposed model combines the MDBN model with the periodicity knowledge to obtain the final prediction results. The theoretical contributions of this study are twofold: (1) the periodicity knowledge is extracted and encoded into the prediction model, and utilizing this prior knowledge can greatly improve the prediction accuracy; (2) a novel learning algorithm that combines the contrastive divergence algorithm and the least squares method is proposed to optimize the parameters of the MDBN. This is the first time that the DBN has been applied to building energy consumption prediction. In addition, this study applied the proposed approach to energy consumption prediction for two kinds of buildings, and the experimental and comparison results verified the effectiveness and superiority of the proposed hybrid model.
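The two-step learning strategy summarized above (contrastive divergence pre-training of the hidden parameters, then least squares for the output weighting vector) can be sketched as follows. This is not the authors' implementation and does not reproduce the MDBN structure of Figure 4: it assumes Bernoulli RBMs trained with full-batch one-step CD (CD-1), inputs scaled to [0, 1], and a linear output layer solved with `np.linalg.lstsq`. The learning rate, epoch count, layer sizes and synthetic data are arbitrary, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm_cd1(v0, n_hidden, rng, lr=0.1, epochs=50):
    """Pre-train one Bernoulli RBM on v0 (values in [0, 1]) with full-batch CD-1."""
    n, n_visible = v0.shape
    W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
    b = np.zeros(n_visible)                                # visible bias
    c = np.zeros(n_hidden)                                 # hidden bias
    for _ in range(epochs):
        ph0 = sigmoid(v0 @ W + c)                          # positive phase
        h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sampled hidden states
        pv1 = sigmoid(h0 @ W.T + b)                        # one Gibbs step: reconstruction
        ph1 = sigmoid(pv1 @ W + c)                         # negative phase
        W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        b += lr * (v0 - pv1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)
    return W, c                                            # b is only needed during training

def hidden_features(layers, X):
    """Propagate inputs through the stack of pre-trained RBMs."""
    h = X
    for W, c in layers:
        h = sigmoid(h @ W + c)
    return np.hstack([h, np.ones((len(h), 1))])            # top features plus a bias column

def fit_mdbn_sketch(X, y, hidden_sizes, rng):
    # Step 1: greedy layer-wise CD-1 pre-training of the hidden parameters.
    layers, h = [], X
    for n_hidden in hidden_sizes:
        W, c = train_rbm_cd1(h, n_hidden, rng)
        layers.append((W, c))
        h = sigmoid(h @ W + c)                             # input to the next RBM
    # Step 2: output weighting vector by least squares on the top-layer features.
    beta, *_ = np.linalg.lstsq(hidden_features(layers, X), y, rcond=None)
    return layers, beta

def predict_mdbn_sketch(layers, beta, X):
    return hidden_features(layers, X) @ beta

rng = np.random.default_rng(0)
X = rng.random((500, 4))                  # synthetic: 4 input variables in [0, 1]
y = X.sum(axis=1) + 0.05 * rng.normal(size=500)
layers, beta = fit_mdbn_sketch(X, y, hidden_sizes=[20, 20], rng=rng)
pred = predict_mdbn_sketch(layers, beta, X)
print("training RMSE:", np.sqrt(np.mean((y - pred) ** 2)))
```

Solving the output layer in closed form avoids back-propagating through the stack; in the hybrid model, X and y would be built from the residual series rather than the synthetic data used here.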
As is well known, many kinds of time series data, e.g., traffic flow and electricity consumption time series, have the periodicity property, and the hybrid model can be expected to yield better performance in predicting such time series. In the future, we will extend our approach to these applications. Moreover, our study is purely data driven: it uses the observed data to predict energy consumption without incorporating any scientific or practical knowledge of energy-related principles. Theoretically, such principles are expected to be very helpful for improving the prediction performance, and we are now exploring strategies to construct novel hybrid prediction models that combine energy-related principles with observed data to further improve the prediction accuracy.

Figure 1. The structure of a typical RBM model.

Figure 2. The architecture of the DBN with k hidden layers.

Figure 3. The structure of the hybrid model.

Figure 4. The structure of the modified DBN.

Figure 5. The structure of BPNN with L hidden layers.

Figure 6. The topological structure of the feed-forward single-hidden-layer NN.

Figure 7. Parts of the samples of the two data sets: (a) the first 500 data points of the retail store; (b) the first 500 data points of the office building.

Figure 8. Periodicity knowledge and the residual time series of the retail store data set: (a) the daily-periodic energy-consuming pattern; (b) the residual time series.

Figure 11. Periodicity knowledge and the residual time series of the office building data set: (a) the energy-consuming pattern of weekdays; (b) the energy-consuming pattern of weekends; (c) the residual time series.

Figure 14. Scatter plots of the actual and predicted values of the energy consumption in the retail store (a) and the office building (b).

Table 1. Design factors and their corresponding levels.
• Design Factor i: the number of hidden layers k. The number of hidden layers determines how many RBMs are stacked. In this study, we consider 2, 3 and 4 hidden layers as Levels 1, 2 and 3, respectively.
• Design Factor ii: the number of hidden units n_u in the uth hidden layer.

Table 2. Experimental results of the MDBN in 27 trials under the consideration of three design factors and their corresponding levels.

Table 3. The performances of the five models for the retail store energy consumption prediction.

Table 4. Experimental results of the MDBN in 27 trials under the consideration of three design factors and their corresponding levels.