A Prediction Methodology of Energy Consumption Based on Deep Extreme Learning Machine and Comparative Analysis in Residential Buildings

In this paper, we propose a methodology for energy consumption prediction in residential buildings. The proposed method consists of four different layers, namely data acquisition, preprocessing, prediction, and performance evaluation. For experimental analysis, we collected real data from four multi-storied residential buildings. The collected data are provided as input to the acquisition layer. In the pre-processing layer, several data cleaning and preprocessing schemes were deployed to remove abnormalities from the data. In the prediction layer, we used the deep extreme learning machine (DELM) for energy consumption prediction. Further, we also used the adaptive neuro-fuzzy inference system (ANFIS) and the artificial neural network (ANN) in the prediction layer. In the DELM, different numbers of hidden layers, different numbers of hidden neurons, and various types of activation functions were used to achieve the optimal structure of the DELM for energy consumption prediction. Similarly, in the ANN, we employed different combinations of hidden neurons with different types of activation functions to obtain the optimal structure of the ANN. To obtain the optimal structure of the ANFIS, we employed different numbers and types of membership functions. In the performance evaluation layer, the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) were used for the comparative analysis of the three prediction algorithms. The results indicate that the performance of the DELM is far better than that of the ANN and ANFIS for one-week and one-month hourly energy prediction on the given data.


Introduction
The energy consumption in residential buildings has significantly increased in the last decade. Energy is an essential part of our lives, and almost all things are in some way associated with electricity [1,2]. According to a report issued by the US Energy Information Administration (EIA), 28% growth in global energy demand may occur by 2040 [3]. Due to improper usage, a tremendous amount of energy is wasted annually; hence, energy wastage can be avoided by efficient utilization of energy. Smart solutions are required to certify the proper use of energy [4]. Energy consumption prediction is very significant for achieving efficient energy maintenance and reducing environmental effects [5][6][7]. However, in residential buildings it is quite challenging, as there are many types of buildings and different forms of energy. Also, many factors influence the energy behaviour of building structures, such as weather circumstances, the physical material used in the building construction, occupant behaviour, sub-level systems, i.e., lighting, heating, ventilating, and air-conditioning (HVAC) systems, and the execution and routines of the sub-level components [8].
Technologies based on the Internet of Things (IoT) are immensely significant for realizing the notion of smart homes. Numerous IoT-based solutions for energy consumption prediction in buildings can be found in the literature [9]. Energy management and efficiency is the next most crucial area for IoT applications in South Korea [9]. Since 2003, homes in this country have been getting smarter and smarter with the inclusion of remote communication devices. The energy demand in South Korea is growing day by day; in 2013, South Korea was the eighth largest energy-consuming country. The energy consumption in South Korea is distributed between the residential and commercial sectors (38%), the industrial sector (55%), the transport sector (1%), and the public sector (6%) [10], as shown in Figure 1. Many solutions based on machine learning algorithms have been developed for energy consumption prediction. These models use historical data, which reflect the behavior of the process to be modeled [11,12]. The machine learning techniques that have been widely used for prediction purposes include artificial neural networks [7], the adaptive neuro-fuzzy inference system (ANFIS) [13], the support vector machine (SVM) [14], the extreme learning machine (ELM) [15], and so forth. The ELM method has some advantages over conventional NNs: it is easy to use and apply, learns quickly, provides good generalization results, and can attain the least training inaccuracy and the minimum norm of weights [16]. Nowadays, deep learning approaches, such as deep neural networks, deep belief networks, and recurrent neural networks, have also been used in various areas for prediction purposes [17]. The term deep learning refers to the number of layers through which the data are transferred [18]. Deep learning techniques are powerful tools for obtaining better modelling and prediction performance. The datasets used in References [18][19][20] for time series prediction applications are not as large as the datasets in research areas such as image processing, speech recognition, and machine vision. However, in these applications, the deep learning methods worked more efficiently than conventional machine learning approaches due to their slightly deeper architectures and novel learning methods.
In this paper, we propose a methodology for energy prediction comprising four layers, i.e., the data acquisition layer, the pre-processing layer, the prediction layer, and the performance evaluation layer. Different operations are performed on the data in each layer of the proposed model. In the prediction layer, we used the deep extreme learning machine (DELM) approach for improved energy consumption prediction performance. The DELM takes the benefits of both extreme learning and deep learning techniques. The DELM increases the number of hidden layers in the original ELM network structure, arbitrarily initializes the input layer weights and the initial hidden layer weights along with the bias of the initial hidden layer, calculates the parameters of the remaining hidden layers (excluding the first hidden layer), and finally uses the least squares technique to calculate the output network weights. We used a trial-and-error method to set the best number of hidden layers, a suitable number of neurons in the hidden layers, and a compatible activation function. The performance of the proposed DELM model was evaluated against an adaptive neuro-fuzzy inference system (ANFIS) and an artificial neural network (ANN) for energy consumption prediction.
The rest of the paper is organized as follows. The related work is given in Section 2; a detailed explanation of the proposed model, comprising the data acquisition, preprocessing, prediction, and performance evaluation modules, is given in Section 3. Section 4 discusses the experimental results based on the prediction algorithms in detail. The discussion and comparative analysis are provided in Section 5. The conclusion and future work are discussed in Section 6.

Related Work
Energy is an extremely vital resource and its demand is growing day by day. Saving energy is not only significant for promoting a green atmosphere for future sustainability but also vital for household consumers and energy production corporations. Electricity affects users' regular expenses, and users always want to decrease their monthly expenses. Energy production companies are under intense pressure to fulfil the growing energy demand of the commercial and domestic sectors. Techniques for proficient energy consumption prediction are therefore essential for all stakeholders. Many researchers have made numerous efforts and developed several methods for energy consumption prediction.
Generally, more accurate results can be achieved by using machine learning in different real-world applications. Kalogirou [21] applied a back propagation neural network for the prediction of the required heating load in buildings. The algorithm was trained on the energy consumption data of 225 buildings, which ranged in size from small spaces to big rooms. Olofsson [22] proposed a method to forecast the yearly energy demand of a small single-family building situated in Sweden. Yokoyama [23] suggested a method for cooling demand prediction in a building based on the back propagation neural network. Kreider [24] applied a recurrent neural network to predict hourly energy consumption based on the heating and cooling energy prediction in buildings. A recurrent neural network was also used by Ben-Nakhi [25] for the cooling load prediction of three office buildings; the data used for model training and testing were collected from 1997 to 2000 for short-term energy prediction. Carpinteiro [26] used a hierarchical neural network based model for short-term energy consumption prediction. They used two self-organizing maps for load forecasting, with the Euclidean distance used to calculate the distance between two vectors, and applied the approach to a Brazilian utility dataset. Their approach performed well for both short-term and long-term forecasting. Another technique was suggested based on a regression model for short-term load forecasting [27]. Irisarri [28] proposed a method of energy load prediction based on summer and winter sessions. Ali, in Reference [29], proposed a technique comprising six stages for smart home energy consumption prediction in South Korea; they used the Kalman filter as a predictor and the Mamdani fuzzy controller to control the actuators. Wahid [30] proposed a technique for energy consumption prediction in residential buildings. They calculated the first two statistical moments, namely the mean and variance, for data consisting of hourly, daily, weekly, and monthly energy consumption, and then applied a multilayer perceptron to the data with statistical moments for energy consumption prediction. A trial-and-error method was used to find a suitable combination of input, hidden, and output layer neurons. Wahid [31] proposed another energy consumption prediction methodology for residential buildings. The introduced method consisted of five stages, namely data source, data collection, feature extraction, prediction, and performance assessment. Different machine learning algorithms, such as the multilayer perceptron, random forest, and k-nearest neighbors (KNN) algorithms, were used to obtain the predicted energy consumption. Arghira [32] presented an energy consumption forecasting method for different appliances in homes, developed to predict the day-ahead electricity demand of homes using a historical dataset of homes in France. The authors suggested a stochastic predictor and tested two other predictors; two pre-processing methods, namely segmentation and aggregation, were also proposed. Li [33] suggested an alternative method called the hybrid genetic algorithm-adaptive network-based fuzzy inference system (GA-ANFIS) for energy consumption prediction. In their approach, the GA was used as an optimizer, which assisted in developing the rule base, while the premise and consequent parameters were adjusted through ANFIS to optimize the prediction performance. Kassa, in Reference [34], proposed a model based on ANFIS for one-day-ahead energy generation prediction; the model was tested on real data of a wind power generation profile and the results provided by this method were prominent. In another paper, Ekici [35] proposed a technique using the ANFIS model to predict the energy demands of diverse buildings with different characteristics.
Nowadays, deep learning approaches have been used extensively for energy consumption prediction [18][19][20]. Due to their greater learning ability, deep learning (DL) methods have been used to improve performance in modelling, classification, and visualization problems. Collobert [36] proposed an approach based on a convolutional neural network (CNN) for natural language processing (NLP). Hinton [37] used a deep auto-encoder to reduce dimensionality, and the results indicate that the deep auto-encoder performs better than principal component analysis (PCA). Qiu [20] used a DL technique to predict small-batch time series datasets. Li [18] used a deep learning technique to predict traffic flow based on time series data. A review of all these applications indicates that the performance of deep learning techniques is better than that of the counterpart approaches.
Figure 2 illustrates the proposed conceptual model for energy consumption prediction. The proposed methodology consists of four main modules, namely data acquisition, pre-processing, prediction, and performance evaluation.

Proposed Energy Consumption Prediction Methodology
Energy consumption prediction in residential buildings is extremely important; it assists the manager in preserving energy and avoiding wastage. Due to unpredictability and noisy disorder, correct energy consumption prediction in residential buildings is a challenging task. In this paper, we propose a methodology based on a deep extreme learning machine (DELM) for energy consumption prediction in residential buildings. We have divided the proposed method into four main layers, namely data acquisition, preprocessing, prediction, and performance evaluation. In the data acquisition layer, we discuss in detail the data used in the experimental work. In the preprocessing layer, the moving average is used to remove abnormalities from the data. In the prediction layer, the deep extreme learning machine (DELM) is proposed to enhance the accuracy of the energy consumption results. In the performance evaluation layer, the MAE, RMSE, and MAPE [14] performance measures are used to measure the performance of the prediction algorithms. Figure 3 shows the detailed structure of the proposed method.


Data Acquisition Layer
Figure 4 shows the data collection phase. The datasets from four residential buildings were collected from January to December 2010 [10]. The task of data collection was completed in the data acquisition layer of the proposed work. Sensors were mostly used for the collection of contextual information such as environmental conditions, temperature, humidity, user occupancy, and so forth. For user occupancy detection, several passive infrared (PIR) sensors were used to obtain information in binary (0, 1) form, i.e., occupied or not occupied. To obtain information about user occupancy, the installation of cameras at transition positions between several regions of the building was required. The building had 33 floors (394 ft tall), and the floor-wise information was collected from smart meters, which were installed floor-wise in the chosen buildings. The dataset also indicated a direct relationship between energy utilization and user occupancy. To better explain the entire energy consumption data, a box plot was used: the x-axis represents the hours of the day (24 h) and the y-axis represents the energy consumption in kWh. Each box represents the energy consumption in a particular hour of the day over the whole year; a long box indicates high energy consumption and a short box indicates low energy consumption. As residential buildings are very busy around noon and at night, the energy consumption was higher during these times. The entire dataset of hourly energy consumption used in the proposed work is shown in Figure 5 for better observation.
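The per-hour statistics that such a box plot draws can be sketched as follows. This is only an illustration: the building dataset itself is not reproduced here, so a synthetic year of gamma-distributed hourly readings stands in for the real data, and the `hourly_summary` helper is an assumed name.

```python
import numpy as np

def hourly_summary(hourly_kwh):
    """Five-number summary per hour of the day (what a box plot draws):
    rows are hours 0-23, columns are (min, Q1, median, Q3, max)."""
    x = np.asarray(hourly_kwh, dtype=float)
    n_days = len(x) // 24
    groups = x[: n_days * 24].reshape(n_days, 24).T               # shape (24, n_days)
    return np.percentile(groups, [0, 25, 50, 75, 100], axis=1).T  # shape (24, 5)

# Synthetic stand-in for one building's hourly readings over a year
rng = np.random.default_rng(42)
year = rng.gamma(shape=2.0, scale=1.5, size=365 * 24)

summary = hourly_summary(year)  # one row of box-plot statistics per hour
```

Plotting `summary` (or passing the 24 groups directly to a box-plot routine) reproduces the hour-of-day view described above.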


Preprocessing Layer
In the pre-processing layer, we removed abnormalities from the data. The data were assumed to have noise due to the inherent nature of data recording, where several external aspects affect the readings. Similarly, many factors give rise to outliers, such as meter problems, human mistakes, measurement errors, and so forth. Different smoothing filters can be used to remove abnormalities from the data, such as the moving average, loess, lowess, rloess, rlowess, and Savitzky-Golay filters. In this study, we used the moving average method, which is an important filter widely used by various authors [14] for data smoothing. Equation (1) is the mathematical representation of the moving average filter:

y[i] = (1/M) Σ_{j=0}^{M−1} x[i + j],    (1)

where x[ ] represents the inputs, y[ ] denotes the outputs, and M indicates the number of points of the moving average. In the proposed work, M was equal to 5, which was a suitable size for data smoothing [38].
Usually, data normalization is required when the sample data are scattered and the sample span is large. Hence, the span of the data was minimized for building models and making predictions. The normalization was done, in essence, so that each of the inputs to the machine learning models has the same range of values, which can guarantee stable convergence of weights and biases. To increase prediction accuracy and improve the training process in machine learning modelling, the complete sample data were normalized into the interval [0, 1] by using Equation (2) given below:

P_i = (x_i − x_min) / (x_max − x_min),    (2)

where P represents the mapped values, x denotes the starting values, x_i is the ith input datum, and x_max and x_min indicate the maximum and minimum values of the starting data, respectively [39].
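The two preprocessing steps above, the M-point moving average of Equation (1) and the min-max scaling of Equation (2), can be sketched as follows; the function names and the toy reading sequence are illustrative assumptions, not the paper's code.

```python
import numpy as np

def moving_average(x, M=5):
    """M-point moving average (Equation (1)): each output is the mean of M inputs."""
    return np.convolve(np.asarray(x, dtype=float), np.ones(M) / M, mode="valid")

def min_max_normalize(x):
    """Min-max scaling into [0, 1] (Equation (2)): P_i = (x_i - x_min) / (x_max - x_min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Toy hourly readings with one spike standing in for a metering outlier
raw = np.array([3.0, 8.0, 4.0, 9.0, 5.0, 30.0, 6.0, 7.0, 5.0, 6.0])
smoothed = moving_average(raw, M=5)   # the spike is spread out and damped
scaled = min_max_normalize(smoothed)  # values now lie in [0, 1]
```

With M = 5, each smoothed point is the mean of five consecutive readings, so an isolated spike contributes only one fifth of its excess to any output value.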

Prediction Layer
In the prediction layer, we have used three well-known machine learning algorithms to make one-week and one-month energy consumption predictions for the residential buildings.

Deep Extreme Learning Machine (DELM)
The extreme learning machine (ELM) is a well-known technique that has been used in different fields for energy consumption prediction. Conventional artificial neural network based algorithms require more training samples, have slower learning times, and may lead to over-fitting of the learning model [40]. The idea of the ELM was first specified in Reference [41]. The ELM is used widely in various areas for classification and regression purposes because it learns very quickly and is computationally efficient. The ELM model comprises an input layer, a single hidden layer, and an output layer. The structural model of an ELM is shown in Figure 6, where p represents the input layer nodes, q represents the hidden layer nodes, and r indicates the output layer nodes.

The input and output of the network are represented as matrices in Equations (3) and (4), respectively, where a and b represent the dimensions of the input matrix and the output matrix, respectively. Next, the ELM arbitrarily sets the weights between the input layer and the hidden layer, where w_kl is the weight between the kth input layer node and the lth hidden layer node, as represented in Equation (5). Then, the ELM randomly fixes the weights between the hidden layer neurons and the output layer neurons, which can be represented by Equation (6), where γ_kl is the corresponding weight between a hidden layer node and an output layer node.
Next, the biases of the hidden layer nodes are randomly selected by the ELM, as in Equation (7). Further, the ELM selects a function g(x), which is the activation function of the network. Considering Figure 6, the resultant matrix can be represented as in Equation (8), and the column vector of the resultant matrix, T, is represented in Equation (9).
Next, by considering Equations (8) and (9), we can obtain Equation (10). The hidden layer output is expressed as H, the transposition of V is represented as V^T, and the values of the weight matrix γ [42,43] were computed using the least squares method as given in Equation (11).
The regularization term γ has been used in order to make the network more generalized and more stable [44].
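The random-initialization-plus-least-squares training just described can be sketched in a few lines. This is a toy illustration rather than the paper's implementation: the function names and the sine-fitting example are assumptions, and the output weights are obtained with the Moore-Penrose pseudo-inverse, as in Equation (11).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_elm(X, T, n_hidden=20, seed=0):
    """ELM training: random input weights and biases, then one least-squares
    solve (via the Moore-Penrose pseudo-inverse) for the output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input-to-hidden weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases
    H = sigmoid(X @ W + b)                                   # hidden layer output matrix
    gamma = np.linalg.pinv(H) @ T                            # least-squares output weights
    return W, b, gamma

def predict_elm(X, W, b, gamma):
    return sigmoid(X @ W + b) @ gamma

# Toy regression: fit y = sin(x) on [0, pi]
X = np.linspace(0.0, np.pi, 200).reshape(-1, 1)
T = np.sin(X)
W, b, gamma = train_elm(X, T)
pred = predict_elm(X, W, b, gamma)
```

Because only `gamma` is solved for and the hidden parameters stay random, training reduces to a single linear least-squares problem, which is the source of the ELM's speed noted above.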
Deep learning is emerging and is currently a very popular topic among researchers. A network having at least four layers, including the input/output layers, meets the requirement of a deep learning network. In a deep neural network, the neurons of each layer are trained on a different set of parameters using the prior layer's output, which enables deep learning networks (DLN) to handle extensive data sets. Deep learning has grasped the attention of many researchers because it is very efficient for solving real-world problems. In the proposed work, we have used the DELM to encapsulate the advantages of both the ELM and deep learning. The configuration of the DELM, consisting of one input layer with four neurons, six hidden layers of 10 neurons each, and one output layer with one neuron, is illustrated in Figure 7. The trial-and-error method was used to select the number of nodes in the hidden layers due to the unavailability of any specific mechanism for specifying the hidden layer neurons. The projected output of the second hidden layer can be achieved as in Equation (12), where γ^+ represents the generalized inverse of the matrix γ. Hence, the values of hidden layer 2 can be simply obtained by means of Equation (11) and the inverse of the activation function.
In Equation (13), the parameters W_1, B_1, H, and H_1 represent the weight matrix between the first two hidden layers, the bias of the first hidden layer neurons, the estimated output of the first hidden layer, and the estimated output of the second hidden layer, respectively.
H_E^+ represents the inverse of H_E, and the activation function g(x) is used to compute Equation (14). So, by specifying any proper activation function g(x), the desired result of the second hidden layer is updated as in Equation (15). The update of the weight matrix γ between hidden layer 2 and hidden layer 3 is carried out as in Equation (16), where H_2^+ indicates the inverse of H_2. Therefore, the estimated result of layer 3 is represented as in Equation (17).
γ_new^+ represents the inverse of the weight matrix γ_new. Then the DELM defines the matrix W_HE1 = [B_2, W_2]. The output of the third layer can be achieved by using Equations (10) and (11).
In Equation (18), H_2 signifies the desired result of hidden layer 2, the weight between hidden layer 2 and hidden layer 3 is represented by W_2, and B_2 is the bias of the hidden layer 3 neurons. H_E1^+ represents the inverse of H_E1, and g^{-1}(x) denotes the inverse of the activation function g(x). The logistic sigmoid function represented in Equation (20) has been adopted. The third hidden layer output is computed as in Equation (21).
Finally, the resultant weight matrix between hidden layer 3 and the last layer output is computed as in Equation (22). The estimated result of hidden layer 3 is represented as in Equation (23).
γ_new^+ represents the inverse of the weight matrix γ_new. Then the DELM defines the matrix W_HE2 = [B_3, W_3]. The output of the fourth layer can be achieved by using Equations (15) and (24).
In Equation (25), H_3 denotes the desired output of the third hidden layer, the weight between the third hidden layer and the fourth hidden layer is represented by W_3, and B_3 is the bias of the fourth hidden layer neurons. H_E2^+ represents the inverse of H_E2, and g^{-1}(x) denotes the inverse of the activation function g(x). The logistic sigmoid function has been adopted. The output of the fourth hidden layer is computed as in Equation (26) below. Finally, the output weight matrix between the fourth hidden layer and the output layer is computed as in Equation (27). The estimated result of the fifth layer can be denoted by Equation (28), and the desired output of the DELM network is represented by Equation (29).
So far, we have discussed the calculation process of the four hidden layers of the DELM network. Cycle theory has been applied to demonstrate the calculation process of the DELM. Equations (18)-(22) can be recalculated to obtain and record each hidden layer's parameters and, eventually, the final result of the DELM network. If the number of hidden layers increases, the same computation procedure can be reused and executed similarly. In the proposed work, we applied a trial-and-error method [30,31] to determine the optimal neural network structure. The inputs to the DELM, as shown in Figure 7, are the hour of the day (X_1), the day of the week (X_2), the day of the month (X_3), and the month (X_4), and the output is the energy consumption prediction (ECP).

Artificial Neural Network (ANN)
ANNs are based on biological information processing and have been extensively used for energy consumption prediction in residential buildings.The ANNs have been commonly used because of their robust nonlinear mapping capability.The ANN might be reflected in a regression method, which signifies the sophisticated nonlinearity between independent and dependent variables [45].In recent years, researchers have deployed ANN models for analyzing numerous types of prediction problems in a variety of circumstances.The ANN model used in the proposed work is the multilayer perceptron (MLPs).MLPs usually have three layers namely input, hidden, and the output consisting of input nodes, neurons, and synaptic connections.In MLPs backpropagation method is used to reduce the prediction residual sum of squares (RSS).The mathematical representation of RSS is given in the Equation (30).
where Y_i represents the ith target value in the training data and Ŷ_i indicates the corresponding predicted value. The strength of the input signal is represented through synapse weights, and these weights are initially allocated randomly. The sum of the products of each connection's input value and synapse weight is computed and provided as input to each neuron in the hidden layer. Three types of activation functions, namely linear, tan-sigmoid, and logarithmic sigmoid, as represented in Equations (31)-(33) respectively, are commonly used in the hidden and output layers of an MLP [30].
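A minimal sketch of the RSS from Equation (30) and the three transfer functions of Equations (31)-(33); the function names here are illustrative:

```python
import math

def rss(targets, predictions):
    """Residual sum of squares (Equation (30))."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions))

def linear(x):
    """Linear transfer function (Equation (31))."""
    return x

def tan_sigmoid(x):
    """Tan-sigmoid transfer function (Equation (32)); output in (-1, 1)."""
    return math.tanh(x)

def log_sigmoid(x):
    """Logarithmic sigmoid transfer function (Equation (33)); output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))
```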
The tan-sigmoid function is used as the activation function in the hidden layer. Selecting the best transfer function for the hidden layer is also largely a matter of trial and error [46]. In the proposed work, we have tested five transfer functions: tan-sigmoid, linear, radial basis, symmetric saturating linear, and saturating linear. In the output layer, a linear function has been used, which is the most appropriate activation function for the output neuron(s) of ANNs in regression problems. The structure diagram of the ANN used in the proposed approach is shown in Figure 8. Different training algorithms, such as Levenberg-Marquardt (LM), Bayesian regularization, and scaled conjugate gradient [47], have been used for network training. The development of an MLP with a number of pre-defined hyper-parameters affects the fitness of the model. The selection of the number of neurons in the hidden layer is likewise a trial and error process [30].

Adaptive Neuro-Fuzzy Inference System (ANFIS)
ANFIS uses a multilayer feed-forward network that employs NN learning algorithms and fuzzy reasoning to map the input space to the output space. ANFIS is used extensively in various areas for prediction [48-50]. ANFIS is a fuzzy inference system (FIS) whose implementation is carried out in an adaptive neural framework. The structure of ANFIS is shown in Figure 9 (two inputs and one output) for a first-order Sugeno fuzzy model. In this structure, two membership functions have been defined for each input.

The adaptive neuro-fuzzy system consists of five layers. Layer 1 nodes are adaptive and produce the degrees of membership of the inputs in the membership functions (MFs). Layer 2 nodes are fixed and perform simple multiplication. Layer 3 nodes are also fixed, and their role in the network is normalization. Layer 4 nodes are adaptive, and their output is the product of the normalized firing strength and a first-order Sugeno model; the parameters of this layer are called consequent parameters. Layer 5 has a single fixed node that sums all incoming signals.
Supervised learning is used to train the network. Hence, the purpose is to train the adaptive network to approximate the known functions supplied by the training data and then find the exact values of the parameters mentioned above. There is no hard and fast rule to determine a suitable number of membership functions for a variable in ANFIS. In the proposed work, we have applied a trial and error mechanism to determine the effective number of MFs for each variable. Similarly, there are many types of membership functions, such as triangular, trapezoidal, and so forth [49]. In the proposed work, we have considered the bell-shaped membership functions as illustrated in Equations (34) and (35).
The bell-shaped membership functions are the most common and effective MFs used in the ANFIS for prediction purposes [51].
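A minimal sketch of the generalized bell MF (Equation (34)) and one forward pass through the five ANFIS layers for a two-input first-order Sugeno model; all parameter values and the helper names are illustrative assumptions, not the paper's fitted model:

```python
from itertools import product

def gbell(x, a, b, c):
    """Generalized bell MF: 1 / (1 + |(x - c)/a|^(2b));
    a sets the width, b the slope, and c the centre."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def anfis_forward(x1, x2, mfs1, mfs2, consequents):
    """One pass through the five ANFIS layers for two inputs with bell MFs.
    mfs1/mfs2: list of (a, b, c) tuples per input.
    consequents: {(i, j): (p, q, r)} first-order Sugeno parameters per rule."""
    mu1 = [gbell(x1, *p) for p in mfs1]              # layer 1: membership degrees
    mu2 = [gbell(x2, *p) for p in mfs2]
    w = {(i, j): mu1[i] * mu2[j]                     # layer 2: rule firing strengths
         for i, j in product(range(len(mu1)), range(len(mu2)))}
    total = sum(w.values())
    w_norm = {k: v / total for k, v in w.items()}    # layer 3: normalization
    # layers 4-5: weighted first-order consequents, summed into one output
    return sum(w_norm[k] * (p * x1 + q * x2 + r)
               for k, (p, q, r) in consequents.items())
```

With two MFs per input this yields four rules; with the paper's four inputs and two MFs each, the same product construction gives the 16 rules mentioned later.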

Performance Evaluation Layer
Several criteria are used to evaluate the performance of different prediction algorithms. In the performance evaluation layer of the proposed model, the MAE, RMSE, and MAPE performance indices have been used to compare the target values with the predicted values. The MAE measures the average magnitude of the errors, the RMSE measures the error between the predicted power and the target power, and the MAPE evaluates the prediction difference as a percentage of the target power. The RMSE, MAE, and MAPE can be computed with Equations (36)-(38) respectively, where N indicates the total number of values, T represents the target value, and P indicates the predicted value. These metrics provide a single value to measure the accuracy of the outcomes of different algorithms, and they have been used in previous studies to analyze energy consumption prediction models [34].
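A minimal sketch of the three metrics of Equations (36)-(38); the function names are illustrative:

```python
import math

def mae(target, pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(target, pred)) / len(target)

def rmse(target, pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(target, pred)) / len(target))

def mape(target, pred):
    """Mean absolute percentage error (target values must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(target, pred)) / len(target)
```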

Model Validation of DELM
To validate the model and analyze the experiments, we have used actual data collected by meters installed in the designated multi-storied residential buildings. The data were collected for a single year, i.e., 1 January 2010 to 31 December 2010. The size of the complete input data is 365 days × 24 h per day = 8760 samples. Smart meters were installed at each floor's sub-distribution switchboard, and these meters are connected to a central server. The energy consumption for each hour is recorded for a year, measured in kilowatt-hours (kWh). The dataset contains floor-wise hourly energy consumption. An example view of two days of hourly collected data is illustrated in Figure 10 for the anonymized Building-04, which has 33 floors.
We have used four important parameters, the hour of the day (X1), day of the week (X2), day of the month (X3), and month (X4), as input to the machine learning algorithms used in the proposed work. Further, to prevent overfitting, we used k-fold cross-validation. It is a popular method because it is simple to understand and generally results in a less biased or less optimistic estimate of model skill than other methods, such as a simple train/test split [52]. In the proposed work, we have carried out energy consumption prediction for one week and one month. For one-week energy consumption prediction, the one-year hourly data are divided into 52 folds of approximately equal size. One fold is treated as the test set, and the models are fit on the remaining folds: one week (7 days × 24 h = 168 h) of data is used for testing and the remaining data (358 days × 24 h = 8592 h) for training. After recording the results for one-week energy consumption prediction, we swapped the training and testing data and selected another week for
testing and the remaining for training. This process continues for 52 iterations. Similarly, for one-month energy consumption prediction, the data have been divided into 12 (k) sets of approximately equal size. We have used one month (January) (31 days × 24 h = 744 h) of data for testing and the remaining 11 months of data (8016 h) for training. Next, we selected another month (February) of data (28 days × 24 h = 672 h) for testing and the remaining 11 months of hourly data (8088 h) for training. The process continues until the 12th month's (December) hourly data (31 days × 24 h = 744 h) are selected for testing and the remaining 11 months of data (8016 h) for training. Finally, the average of the testing results was computed.
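The fold construction above can be sketched as follows (a sketch under the assumption that folds are contiguous blocks of hours and the last fold absorbs the remainder; the paper does not state how the leftover 24 h of the year are assigned):

```python
def hour_folds(n_hours, fold_hours):
    """Split [0, n_hours) into contiguous folds of fold_hours hours each;
    the final fold absorbs any remainder so every hour is used exactly once."""
    folds, start = [], 0
    while n_hours - start >= 2 * fold_hours:
        folds.append(range(start, start + fold_hours))
        start += fold_hours
    folds.append(range(start, n_hours))
    return folds

# 52 approximately one-week folds over one year of hourly data (8760 h).
weekly = hour_folds(8760, 168)
# Leave-one-fold-out: test on one week, train on all remaining hours.
test_idx = set(weekly[0])
train_idx = [h for h in range(8760) if h not in test_idx]
```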
The optimum network configuration depends on the number of hidden layers, the number of neurons in the hidden layer(s), and the type of activation function. We have applied the trial and error method to achieve the optimum structure [46]. After applying it, we arrived at a well-suited configuration for the proposed DELM approach consisting of 6 hidden layers with 20 neurons in each hidden layer. The sigmoid activation function is used because it is the most popular activation function and has been used extensively over the last couple of years [51]. We have also tried different iteration numbers from 1000 to 3000 in increments of 100 and set the iteration number to 2000. Using this best-suited configuration model, the one-week and one-month hourly energy consumption results are recorded as shown in Figures 11 and 12, respectively. In this work, we have used the ANN and ANFIS models for comparison with the proposed DELM. The reason behind the selection of ANFIS was its ability to seek out useful features and develop the prediction model. The ANN is also a very popular technique that has been used for energy consumption prediction.
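The trial and error search over hidden layers, neuron counts, and activation functions can be sketched as a simple grid search; `evaluate` stands in for training and scoring one configuration, and the toy scoring function below is purely illustrative (it is rigged to prefer the paper's reported optimum of 6 layers, 20 neurons, and the sigmoid function):

```python
import itertools

def grid_search(evaluate, layer_counts, neuron_counts, activations):
    """Return the configuration with the lowest score (e.g. validation RMSE)."""
    best_cfg, best_score = None, float("inf")
    for cfg in itertools.product(layer_counts, neuron_counts, activations):
        score = evaluate(*cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy stand-in for "train a DELM and report its validation error".
def toy_evaluate(layers, neurons, activation):
    return abs(layers - 6) + abs(neurons - 20) / 10 + (0 if activation == "sigmoid" else 1)

best_cfg, best_score = grid_search(
    toy_evaluate, range(1, 9), range(10, 31, 5), ["sigmoid", "tanh"])
```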

Model Validation of ANFIS
The structure diagram of the ANFIS for the proposed work is shown in Figure 13 [53]. We have used a trial and error approach for the selection of the type and number of membership functions in the proposed work. For each variable, we have considered two generalized bell-shaped MFs, as shown in Figure 14. A total of 16 rules were specified; the rule viewer is shown in Figure 15.

The output predicted results for one-week and one-month energy consumption are shown in Figures 16 and 17, respectively.

Model Validation of ANN
In the proposed work, to achieve the best ANN prediction model, we tried different hidden-layer activation functions, training functions, and output-layer transfer functions. All the network models have four neurons in the input layer and a single neuron in the output layer; for the hidden layer, we varied the number of neurons from 5 to 30 in increments of five to find the best combination of input-layer, hidden-layer, and output-layer neurons. The trial and error method has been applied to determine the number of neurons in the hidden layer [30]. We have selected the model shown in Figure 18 because it provides the lowest MSE values with the tan-sigmoid function in the hidden layer, the linear function in the output layer, and the Levenberg-Marquardt algorithm for training.


Discussion and Comparative Results Analysis
In the proposed work, we have applied the deep extreme learning machine along with ANN and ANFIS to real data collected over one year to predict energy consumption in buildings for one week and one month. The data have been pre-processed to remove abnormalities and make them smooth and error free. In the DELM, different numbers of hidden layers, different numbers of hidden neurons, and different combinations of activation functions have been tried to find the best configuration model for energy consumption prediction. For a fair comparison, we also applied the trial and error approach to the ANN to find its best configuration, trying different numbers of neurons in the hidden layers and different types of activation functions. Similarly, we tested different types and numbers of membership functions to find a suitable structure of ANFIS for energy consumption prediction.
In this work, we have applied the proposed DELM, along with the optimized ANN and ANFIS approaches, to energy consumption prediction over two different time periods in order to test the efficiency of these algorithms properly. For one-week energy consumption prediction, the training set is larger than for one-month prediction. Hence, to evaluate the performance of the prediction algorithms properly, both short-term and long-term energy consumption predictions have been carried out.
We have used different statistical measures to assess the performance of the proposed DELM algorithm and its counterparts. In Tables 1 and 2, the MAE, RMSE, and MAPE values of DELM, ANN, and ANFIS for one-week and one-month energy consumption prediction have been recorded. Since the proposed work computes both one-week and one-month energy consumption predictions, the averages of the statistical measures over both periods for the prediction algorithms used have been computed in Table 3. These values indicate that the performance of DELM is far better than that of the other counterpart algorithms, and that the performance of ANFIS is better than that of the ANN. The statistical measures show that the proposed DELM outperforms the ANN and ANFIS for short-term (one-week) as well as long-term (one-month) hourly energy consumption prediction. Therefore, the proposed DELM is the best choice for both short- and long-term energy consumption prediction.

Conclusions and Future Work
Modelling energy consumption prediction in residential buildings is a challenging task because of randomness and noisy disturbances. To obtain better prediction accuracy, in this paper we have proposed a model for energy consumption prediction in residential buildings. The proposed model comprises four stages, namely the data acquisition layer, the preprocessing layer, the prediction layer, and the performance evaluation layer. In the data acquisition layer, the data were collected through smart meters in a designated building to validate the model and analyze the results. In the preprocessing layer, several pre-processing operations were carried out to remove abnormalities from the data. In the prediction layer, we have proposed the deep extreme learning machine and applied it to the pre-processed data for one-week and one-month energy consumption prediction in residential buildings. The purpose of using different machine learning algorithms on the collected data was to obtain better results in terms of accuracy for practical applications. For the optimal structure of the DELM, various numbers of hidden layers, different numbers of neurons in the hidden layer, and different activation functions have been tuned. We have also applied other well-known machine learning algorithms, such as ANN and ANFIS, to the same data for comparison with the proposed DELM, and we have used different statistical measures to evaluate the performance of these machine learning algorithms. These statistical measures indicate that the performance of the proposed DELM is far better than that of the other counterpart algorithms.

Figure 1 .
Figure 1. Annual energy consumption distribution in the different zones of South Korea [10].


Figure 2 .
Figure 2. A conceptual model of the proposed approach.

Figure 3 .
Figure 3. Detailed processing diagram for the proposed energy consumption prediction approach.


During data collection, several contextual factors were considered, such as weather conditions, temperature, humidity, and user occupancy. For user occupancy detection, several Passive Infra-Red (PIR) sensors were used to obtain information in binary (0/1) form, i.e., occupied or not occupied. To obtain information about user occupancy, cameras were installed at transition positions between several regions of the building. The data for the designated residential buildings were collected from January 2010 to December 2010. The building had 33 floors (394 ft. tall); the floor-wise information was collected from smart meters and used for this work. These meters were installed floor-wise in the chosen buildings. The dataset also indicated a direct relationship between energy utilization and user occupancy. To better explain the entire energy consumption data, a box plot was used: the x-axis represents the hours of the day (24 h) and the y-axis the energy consumption in kWh. Each box represents the energy consumption in a particular hour of the day over the whole year; a long box indicates high energy consumption and a short box low energy consumption. As residential buildings are very busy around noon and at night, the energy consumption was higher at these times. The entire dataset of hourly energy consumption used in the proposed work is shown in Figure 5 for better observation.

Figure 5 .
Figure 5. Distribution of data, on an hourly basis, for energy consumption.


Figure 6 .
Figure 6. Structural diagram of an extreme learning machine (ELM). Initially, take a training sample [A, B] = (a_k, b_k), (k = 1, 2, . . ., Z), with input feature matrix A = [a_k1 a_k2 a_k3 . . . a_kZ] and target matrix B = [b_l1 b_l2 b_l3 . . . b_lZ] consisting of the training samples; then the A and B matrices can be represented as in Equations (3) and (4) respectively, where a and b represent the dimensions of the input and output matrices respectively. Next, the ELM arbitrarily assigns the weights between the input layer and the hidden layer, where w_kl is the weight between the kth input-layer node and the lth hidden-layer node, as represented in Equation (5). Then, the ELM randomly fixes the weights between the hidden neurons and the output-layer neurons, which can be represented by Equation (6), where γ is the weight between the hidden and output layer nodes.
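As a runnable illustration of the basic ELM idea (note: in the standard formulation the output weights are solved in closed form via the pseudo-inverse rather than fixed randomly), a minimal single-hidden-layer sketch; the smoke-test data are illustrative:

```python
import numpy as np

def elm_fit(X, T, hidden=20, seed=0):
    """Fit a single-hidden-layer ELM: input weights W and biases b are drawn
    randomly and kept fixed; output weights beta are the least-squares solution."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], hidden))
    b = rng.standard_normal(hidden)
    H = np.tanh(X @ W + b)          # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T    # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Fit y = x^2 on [-1, 1] as a smoke test.
X = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
T = X[:, 0] ** 2
W, b, beta = elm_fit(X, T)
pred = elm_predict(X, W, b, beta)
```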

Figure 7 .
Figure 7. Structural diagram of the proposed energy consumption prediction based on the deep extreme learning machine (DELM) approach.


Figure 8 .
Figure 8. Structure of the artificial neural network (ANN) model used in the proposed approach.


Figure 9 .
Figure 9. Structural diagram for an adaptive neuro-fuzzy inference system.


Figure 10 .
Figure 10. Example view of two-day hourly energy consumption data collected in Building-IV.

Figure 11 .
Figure 11. Actual vs. DELM predicted results for one-week energy consumption.

Figure 12 .
Figure 12. Actual vs. DELM predicted results for one-month energy consumption.

Figure 13 .
Figure 13. Screenshot of the structure of the adaptive neuro-fuzzy inference system (ANFIS) for the proposed work [53].

Figure 14 .
Figure 14. The two generalized bell-shaped membership functions defined for each input variable.

Figure 16 .
Figure 16. Actual vs. ANFIS predicted results for one-week energy consumption.


Figure 17 .
Figure 17. Actual vs. ANFIS predicted results for one-month energy consumption.

Figure 18 .
Figure 18. ANN structure model to predict energy consumption.


Figure 19 .
Figure 19. Actual vs. ANN predicted results for one-week energy consumption.


Figure 20 .
Figure 20. Actual vs. ANN predicted results for one-month energy consumption.
The inputs were normalized, in essence, to have the same range of values for each of the inputs to the machine learning models, which helps guarantee stable convergence of the weights and biases. To increase prediction accuracy and improve the training process, the complete sample data were normalized to the interval [0, 1] using Equation (2). The input-to-hidden and hidden-to-output weight matrices of Equations (5) and (6) take the form:

W = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1p} \\ w_{21} & w_{22} & \cdots & w_{2p} \\ w_{31} & w_{32} & \cdots & w_{3p} \end{bmatrix}, \quad
\gamma = \begin{bmatrix} \gamma_{11} & \gamma_{12} & \cdots & \gamma_{1r} \\ \gamma_{21} & \gamma_{22} & \cdots & \gamma_{2r} \\ \gamma_{31} & \gamma_{32} & \cdots & \gamma_{3r} \end{bmatrix}
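The min-max scaling of Equation (2) can be sketched as follows (assuming the maximum and minimum of the sample differ):

```python
def min_max_normalize(values):
    """Min-max scaling to [0, 1] (Equation (2)): x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_normalize([10.0, 20.0, 30.0])
```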

Table 1 .
Performance evaluation of deep extreme learning machine (DELM), adaptive neuro-fuzzy inference system (ANFIS) and artificial neural network (ANN) for one-week energy consumption prediction.

Table 2 .
Performance evaluation of DELM, ANFIS and ANN for one-month energy consumption prediction.

Table 3 .
Average values of statistical measures for one-week and one-month energy consumption prediction results of DELM, ANFIS, and ANN.