Energies
  • Article
  • Open Access

29 January 2022

Forecasting Building Energy Consumption Using Ensemble Empirical Mode Decomposition, Wavelet Transformation, and Long Short-Term Memory Algorithms

1 Taiwan Building Technology Center, National Taiwan University of Science and Technology, Taipei 106, Taiwan
2 Department of Industrial Management, National Taiwan University of Science and Technology, Taipei 106, Taiwan
3 Department of Logistics Engineering, Universitas Pertamina, Jakarta 12220, Indonesia
* Author to whom correspondence should be addressed.
This article belongs to the Topic Exergy Analysis and Its Applications

Abstract

A building, as a central location of human activity, is equipped with many devices that consume considerable electricity. Predicting the energy consumption of a building is therefore essential, as it helps building management make better energy management policies. This study proposes a forecasting framework for the energy consumption of a building. The proposed framework combines a decomposition method with a forecasting algorithm. Two decomposition algorithms are applied, namely the ensemble empirical mode decomposition and the wavelet transformation, and the long short-term memory algorithm is applied to predict energy consumption. The framework is applied to 20 buildings located in different time zones and with different functionalities. The experimental results reveal that the best forecasting performance is obtained by combining the long short-term memory algorithm with the ensemble empirical mode decomposition. In addition to the proposed framework, this research also provides a recommended forecasting model for each building. The results of this study enrich the literature on building energy forecasting approaches, and the proposed framework can be applied to real cases of electricity consumption.

1. Introduction

Energy is one of the basic needs of human life. In the last few decades, energy consumption has increased significantly due to several factors, such as population growth, indoor human activities, an increasing number of buildings, and global climate change. As most human activities are conducted in buildings, buildings consume most of the electricity. In the US and the European Union, 40% of energy consumption comes from buildings [].
Buildings can be categorized into several types based on their function: industry, transportation, housing or residential, commercial, public services, agriculture, fisheries, and other sectors. Among all of these types, housing/residential buildings such as housing complexes and apartments consume up to 30% of total building energy worldwide []. In some countries, the energy consumption of commercial buildings such as offices, hospitals, shopping centers, restaurants, and warehouses is relatively high because they run many devices continuously []. The EIA [] predicts that energy consumption in buildings will grow by an average of 1.3% per year from 2018 to 2050.
Because energy consumption is predicted to increase continuously, accurate energy demand forecasting is essential. An accurate energy demand forecast provides essential information for decision making in many different fields, including scheduling, operations, and monitoring []. Forecasting plays an important role, from different perspectives, in future energy development regionally, nationally, and globally []. From the perspective of city stakeholders, it helps in understanding energy consumption and its variation across types of urban buildings and their user profiles, and energy allocation based on accurate predictions can improve building policy []. Accurate energy consumption forecasting also avoids overestimating and underestimating resource allocation []. From the building owner's or operator's point of view, energy forecasting is essential for designing the building's daily operational rules []. For the power company and government, city-scale energy consumption predictions help to optimize electricity distribution and production scenarios []. Therefore, energy management backed by accurate forecasting is needed: it allows management to allocate energy according to demand, avoid unexpected power outages, and reduce operational costs []. The primary purpose of this study is to propose a forecasting framework for energy consumption.
Forecasting can be classified based on its timespan into short-term, mid-term, and long-term []. For example, short-term forecasting over a span of hourly data for one week [] will have higher accuracy than mid-term and long-term forecasting. It helps building management in operation [] and can also improve the automation of a building's energy system [] at the management level.
Many studies address this problem to build accurate energy analysis systems, including work on forecasting electrical energy consumption in buildings. However, several factors make accurate forecasting challenging for researchers []. These include the limitation of physical methods, which require high computational complexity, while statistical methods cannot perform well on data with nonlinear hidden features.
The proposed forecasting framework combines a time series decomposition method with recurrent neural network algorithms to obtain an accurate forecasting model. Two decomposition algorithms are utilized in this study: the ensemble empirical mode decomposition (EEMD) and the wavelet transformation (WT). Some previous studies have forecast energy demand using the long short-term memory (LSTM) algorithm [,,]. Peng, Wang, Xia, and Gao [] and Somu, MR, and Ramamritham [] applied the LSTM with the WT. Gao and Ruan [] enhanced the LSTM with a feature attention process. These studies show that the LSTM algorithm is promising for energy consumption prediction, but further improvements are needed to obtain better results with high generalization. Thus, this study proposes a forecasting framework consisting of data pre-processing, data decomposition using EEMD and WT, and an aggregation process for the final prediction.
This study evaluates the proposed forecasting framework’s performance using six benchmark datasets and a real study case of electricity consumption of a university building in Taiwan. Each dataset consists of several buildings. Thus, a total of 20 buildings are evaluated in this study.

2. Literature Review

This section reviews the fundamental theories applied in this research: time series forecasting, decomposition algorithms, and recurrent neural networks.

2.1. Time Series Forecasting

Forecasting is the process of predicting future events []. Runge and Zmeureanu [] define forecasting as the process of estimating a value from current or historical data to determine the value that will occur in the future. In forecasting energy consumption, the data used are time-series data: electrical energy consumption data are observed at a fixed time interval []. Therefore, forecasting electrical energy consumption is categorized as time-series forecasting, whose goal is to predict future events based on a series of historical events [].
Forecasting problems are generally framed in terms of a time period, such as days, weeks, or months []. Based on the time horizon, forecasting is divided into the following categories []:
1. Short-term
The coverage period of this forecast is hours, days, or weeks ahead. This type of forecasting supports daily operational activities, purchase planning, and evaluation [].
2. Medium-term
The scope of the forecasting period is 1–2 years in the future []. This forecasting category is very useful for the strategic planning of demand [].
3. Long-term
The coverage period of this forecast is about 1–50 years in the future. It is commonly used in a planning system or energy plant installation [].
Many techniques have been used for forecasting data on electricity consumption in buildings. To obtain an accurate forecast, the methodology should consider data complexity, including weather conditions, lighting systems, and other systems that involve the use of electricity []. According to Deb, Zhang, Yang, Lee, and Shah [], there are two categories of forecasting techniques: the data-driven technique and the deterministic technique. Table 1 shows several advantages and disadvantages of both techniques.
Table 1. The Advantages and Disadvantages of Data-Driven and Deterministic Technique.
Bourdeau et al. [] divide building energy forecasting methods into three categories, namely physical (white box), data-driven (black box), and hybrid (gray box). The data-driven (black box) method uses machine learning and statistics to model building energy []. The physical (white box) method corresponds to the deterministic technique above. In contrast, the hybrid method combines knowledge and information from physical and data-driven methods. Most of the methods used by researchers are data-driven (black box).

2.2. Recurrent Neural Network

The recurrent neural network (RNN) is a very popular model for time-series forecasting because it can exploit the information contained in time-series data []. The RNN develops the forecasting model based on the input in the current state and inputs from the previous states []. The RNN extends the artificial neural network (ANN) by connecting neurons within the same hidden layer []; the structure is illustrated in Figure 1.
Figure 1. Structure of recurrent neural network [].
The RNN structure contains a loop that can store information from the past []. Although the RNN has good forecasting capabilities, it cannot handle long-term dependencies in the data []. The long short-term memory (LSTM) improves the RNN by overcoming the long-term dependency problem []. In particular, the LSTM solves the vanishing gradient problem that prevents the RNN from handling data with long-term dependencies []. The LSTM uses a component called a memory block []. The vanishing gradient problem is eliminated by applying a gate mechanism and memory cells that replace the ordinary nodes [], as illustrated in Figure 2.
Figure 2. Structure of long short-term memory (LSTM) [].
A memory block is a subnetwork containing recurrently connected functional modules called memory cells and gates []. The memory cells remember the state of the network, and the gates regulate the flow of information []. The LSTM has three gates, namely the input gate, the output gate, and the forget gate. The input gate controls the information entering the memory cells, the forget gate determines how much information is retained in the network, and the output gate controls the amount of information used to perform calculations [].

2.3. Ensemble Empirical Mode Decomposition

The empirical mode decomposition (EMD) is a pre-processing method for non-stationary data [], such as building energy time-series data. Non-stationary data are decomposed into several components named intrinsic mode functions (IMFs), as illustrated in Figure 3 []. The EMD method has shown its superiority in time-series forecasting []. The ensemble empirical mode decomposition (EEMD) was proposed to improve the EMD []; it was created to overcome the mode-mixing problem of EMD [,] and to extract the underlying oscillation functions []. Mode mixing arises when one IMF component contains signals of different scales, or signals of the same scale appear in different IMFs []. To avoid mode mixing, EEMD applies the EMD steps to the original time series $x(t)$, $t = 1, 2, \ldots, T$, with added Gaussian white noise, yielding a set of IMFs. The average over the noise realizations is taken as the final decomposition result [].
Figure 3. Intrinsic mode function (IMF) [].
The steps of EEMD are as follows []:
1. Add white noise to the original data $x(t)$ to obtain the new series:
$$x_i(t) = x(t) + w_i(t),$$ (1)
2. Decompose the noise-added time series into $n$ IMFs $c_{j,i}(t)$, $j = 1, 2, \ldots, n$, and an EMD residue $r_i(t)$:
$$x_i(t) = \sum_{j=1}^{n} c_{j,i}(t) + r_i(t),$$ (2)
3. Repeat steps 1 and 2 $M$ times with different white noise realizations until an appropriate decomposition is obtained.
4. Calculate the average of the IMFs over the $M$ trials as the final IMFs:
$$c_j(t) = \frac{1}{M} \sum_{i=1}^{M} c_{j,i}(t),$$ (3)
The result of EEMD can be described as a linear combination of the IMFs and a residue, as formulated in Equation (4):
$$x(t) = \sum_{j=1}^{n} c_j(t) + r(t)$$ (4)
The number of IMFs is denoted by $n$, $c_j(t)$, $t = 1, 2, \ldots, T$, is the $j$th IMF extracted in the $j$th decomposition at time $t$, and $r(t)$ is the final residue. It is assumed that the data consist of information and noise; the ensemble average over different noise realizations approximates the actual signal, so the added white noise helps capture the true IMFs and is itself cancelled by the ensemble mean. The error introduced by adding white noise can be controlled through Equation (5):
$$\varepsilon_{ne} = \frac{\varepsilon}{\sqrt{M}},$$ (5)
where:
  • $M$ — the number of ensemble members;
  • $\varepsilon$ — the amplitude of the added noise series;
  • $\varepsilon_{ne}$ — the final standard deviation of the error, also interpreted as the difference between the input signal and the corresponding IMFs [].
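For readers who wish to experiment, a minimal sketch of this procedure in Python is shown below, assuming the PyEMD package (installed as EMD-signal); the toy signal and printed diagnostics are illustrative only, not part of this study.

```python
# A minimal EEMD sketch using the PyEMD package (pip install EMD-signal).
import numpy as np
from PyEMD import EEMD

t = np.linspace(0, 1, 500)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)  # toy two-scale signal

# trials = M ensemble members; noise_width = amplitude of the added white noise.
eemd = EEMD(trials=100, noise_width=0.05)
imfs = eemd.eemd(x, t)  # rows are the ensemble-averaged IMFs (fast to slow oscillations)

# Equation (4): summing the components approximately recovers x(t); the
# reconstruction is only approximate because of the ensemble averaging.
print(imfs.shape, np.max(np.abs(imfs.sum(axis=0) - x)))
```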

2.4. Wavelet Transformation

Wavelet transformation (WT) is a time-frequency decomposition method that represents a time series in both time and frequency []. Time series are typically non-stationary data. A wavelet is a wave that oscillates up and down over time []; the WT can therefore be described as a short-wave transform that describes the frequency content of a signal over time []. It is also used as a pre-processing step for denoising time-series data by decomposing the data into several series [].
The WT identifies signal shifts and then analyzes them, with the aim of obtaining time and frequency information simultaneously []. It is considered an effective model for time-frequency analysis, following Fourier analysis in signal processing [].
WT has two types, namely the continuous wavelet transformation (CWT) and the discrete wavelet transformation (DWT) []. The CWT is modeled by Equation (6):
$$CWT_y(\alpha, \tau) = \frac{1}{\sqrt{|\alpha|}} \int y(t)\, \psi^{*}\!\left(\frac{t - \tau}{\alpha}\right) dt,$$ (6)
where $\alpha$ is the scale parameter, $\tau$ is the translation parameter, $\psi^{*}(x)$ is the complex conjugate function, and $\psi(x)$ is the mother wavelet.
The discrete wavelet transformation (DWT) is modeled in Equation (7):
$$DWT_y(m, n) = \alpha_0^{-m/2} \int y(t)\, \psi^{*}\!\left(\alpha_0^{-m} t - n\tau_0\right) dt,$$ (7)
where $m$ is the scale constant (decomposition level) and $n$ is an integer translation constant.
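As an illustration, a short sketch using the PyWavelets library is given below; the random input series and the choice of the Morlet wavelet for the CWT are assumptions made for demonstration purposes only.

```python
# A minimal sketch contrasting CWT and DWT with PyWavelets (pip install PyWavelets).
import numpy as np
import pywt

x = np.random.randn(256)  # stand-in for an hourly consumption series

# Continuous transform (Equation (6)): coefficients over a grid of scales.
coef, freqs = pywt.cwt(x, scales=np.arange(1, 31), wavelet='morl')

# Discrete transform (Equation (7)): one level splits the series into
# an approximation (low-frequency) and a detail (high-frequency) part.
cA, cD = pywt.dwt(x, wavelet='db1', mode='symmetric')
print(coef.shape, cA.shape, cD.shape)
```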

3. Methodology

This study proposes a forecasting framework for electricity consumption. The main challenge in this problem is the complexity of the data pattern. Therefore, a decomposition algorithm is applied to capture the data characteristics, and an LSTM algorithm is then applied to forecast the electricity consumption. Figure 4 illustrates the proposed forecasting framework. The main contribution of this paper is the forecasting framework itself, which combines a decomposition algorithm with the forecaster: the decomposition algorithm divides the time series into several series, each series is predicted using a neural-network-based algorithm, and the component results are then aggregated into the final prediction. The proposed forecasting framework therefore also includes this aggregation step.
Figure 4. The proposed forecasting framework.
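A high-level sketch of this decompose-forecast-aggregate flow is shown below; decompose, fit_lstm, and forecast_lstm are hypothetical helper functions standing in for the components described in the rest of this section, not part of the original framework code.

```python
# A minimal sketch of the decompose-forecast-aggregate framework in Figure 4.
# decompose() is EEMD or WT (Section 3.1.3); fit_lstm() and forecast_lstm()
# stand in for the LSTM forecaster of Section 3.2 (hypothetical helpers).
def forecast_with_decomposition(train_series, decompose, fit_lstm, forecast_lstm):
    components = decompose(train_series)        # IMFs + residue, or approximation + details
    models = [fit_lstm(c) for c in components]  # one forecaster per decomposed series
    forecasts = [forecast_lstm(m, c) for m, c in zip(models, components)]
    return sum(forecasts)                       # aggregation: sum the component forecasts
```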

3.1. Data Pre-Processing

Data pre-processing is a critical part of processing complex time-series data. Therefore, the proposed framework applies data pre-processing techniques, including normalization, descriptive statistical analysis, and decomposition.

3.1.1. Data Normalization

Data normalization aims to avoid a large variety of data values. Normalization is carried out using the standard scaler method, as shown in Equation (8).
$$x_{scaled} = \frac{x_i - \bar{x}}{\sigma},$$ (8)
where $\bar{x}$ is the mean and $\sigma$ is the standard deviation of the series.
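In practice, this corresponds to the standard scaler of scikit-learn; a minimal sketch with hypothetical readings is shown below.

```python
# A minimal sketch of the Equation (8) normalization using scikit-learn.
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[120.0], [95.0], [143.0], [110.0]])  # hypothetical hourly kWh readings
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)               # (x - mean) / standard deviation
x_restored = scaler.inverse_transform(x_scaled)  # undo the scaling after forecasting
```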

3.1.2. Statistics Descriptive Analysis

Data exploration includes analyzing the data distribution by observing the data patterns, performing descriptive statistical tests, and performing stationarity tests using the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test and the augmented Dickey-Fuller (ADF) test; the stationarity results support the choice of methods used. Furthermore, data visualization is applied to analyze the data before forecasting, based on the patterns captured in the visualization.
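Both stationarity tests are available in statsmodels; a minimal sketch on a toy series is shown below (the random-walk input is illustrative only).

```python
# A minimal sketch of the KPSS and ADF stationarity tests with statsmodels.
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

x = np.random.randn(500).cumsum()  # toy random walk (non-stationary)

adf_stat, adf_p, *_ = adfuller(x)                # H0: the series has a unit root
kpss_stat, kpss_p, *_ = kpss(x, regression='c')  # H0: the series is stationary
print(f"ADF p-value: {adf_p:.3f}, KPSS p-value: {kpss_p:.3f}")
```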

3.1.3. Data Decomposition

Decomposition is useful for highly complex and non-stationary data, improving the performance of the forecasting method that will be used. Two decomposition methods are applied: WT and EEMD.
1. Wavelet Transformation
This study uses DWT to analyze the data with a time-scale function. WT has a multi-scale resolution and has time-shifting characteristics. Moreover, scaling operations can observe signals at different scales []. Thus, this method is very suitable for handling non-stationary time series data.
Based on Figure 4, the data are divided into two parts: a training set $x_{train}(t)$, $t = 1, 2, \ldots, n_{train}$, and a testing set $x_{test}(t)$, $t = 1, 2, \ldots, n_{test}$. The training set is used to build the forecasting model after the WT step. The WT produces two components, namely the approximation series and the detail series. The approximation series captures the low-frequency features, while the detail series captures the high-frequency features of the original data. The approximation series can be further decomposed with the WT. The high-frequency noise representing fluctuations and irregularities is extracted and filtered []. The DWT is built from two basic wavelet functions, namely the father wavelet $\varphi$ and the mother wavelet $\psi$, as shown in Equations (9) and (10) [].
$$\varphi_{j,k}(t) = 2^{j/2}\, \varphi(2^{j} t - k),$$ (9)
$$\psi_{j,k}(t) = 2^{j/2}\, \psi(2^{j} t - k),$$ (10)
where $j = 1, \ldots, J$ is the scaling parameter at the $j$th decomposition level and $k$ is the translation parameter. The approximation series is obtained by applying the father wavelet to the original data, and the detail series by applying the mother wavelet. The detail series captures oscillations of length $2^{j}$ to $2^{j+1}$. The approximation and detail series are modeled in Equations (11) and (12):
$$A_{j,t} = \int y(x)\, \varphi_{j,t}(x)\, dx,$$ (11)
$$D_{j,t} = \int y(x)\, \psi_{j,t}(x)\, dx$$ (12)
Finally, the electricity consumption time series consists of the approximation and detail series, as represented in Equation (13):
$$y(x) = A_J(x) + D_J(x) + D_{J-1}(x) + \cdots + D_1(x),$$ (13)
2. Ensemble Empirical Mode Decomposition
The decomposition using EEMD is conducted as follows [,]:
(1) Add a white Gaussian noise series $\varepsilon_j(t)$ to the training set $x_{train}(t)$ to obtain the new series $x_{train}^{j}(t)$.
(2) Decompose $x_{train}^{j}(t)$ into several IMFs $c_j(t)$, $j = 1, 2, \ldots, n$, and a residue $r(t)$.
(3) Repeat steps 1 and 2 for each $j = 1, 2, \ldots, N_E$, adding a different white Gaussian noise series in every repetition, where $N_E$ is the number of repetitions.
(4) Take the average of all IMFs and the average of the residues as the final result.
(5) The time-series data after EEMD are the sum of all IMF components and the residue, as shown in Equation (14):
$$x_{train}(t) = \sum_{j=1}^{n} c_j(t) + r(t),$$ (14)

3.2. Forecasting Using LSTM Algorithm

The proposed forecasting framework applies a many-to-one RNN structure in which each hidden node uses the LSTM structure. Figure 5 illustrates the RNN framework and the LSTM features.
Figure 5. The RNN and LSTM structures.
The LSTM model consists of several stages []. It starts by determining which information can pass through the cell state. This decision is controlled by the forget gate at time $t$, $f_t$, as shown in Equation (15):
$$f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f),$$ (15)
where $x_t$ is the input at time $t$, $h_{t-1}$ is the output of the hidden node at time $t-1$, $W_{fx}$ is the weight between the input layer and the forget gate, $W_{fh}$ is the weight between the hidden layer and the forget gate, and $b_f$ is the bias of the forget gate.
The next step determines which information must be entered into the cell state, using Equation (16):
$$i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i),$$ (16)
where $i_t$ is the output of the input gate at time $t$, $W_{ix}$ is the weight between the input layer and the input gate, $W_{ih}$ is the weight between the hidden layer and the input gate, and $b_i$ is the bias of the input gate.
In addition, the candidate cell state $\tilde{C}_t$ is generated using Equation (17):
$$\tilde{C}_t = \tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c),$$ (17)
where $W_{cx}$ is the weight between the input layer and the cell, $W_{ch}$ is the weight between the hidden layer and the cell, and $b_c$ is the bias of the cell.
Unwanted information is forgotten by multiplying the old cell state $C_{t-1}$ by $f_t$, and new information is added to the cell state through $i_t \tilde{C}_t$, as shown in Equation (18):
$$C_t = f_t C_{t-1} + i_t \tilde{C}_t,$$ (18)
The final output is calculated by compressing $C_t$ with a $\tanh$ layer and multiplying by the output gate $o_t$, as shown in Equations (19) and (20):
$$o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o),$$ (19)
$$h_t = o_t \tanh(C_t)$$ (20)
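A minimal sketch of such a many-to-one LSTM forecaster in Keras is shown below; the 64 hidden units are a hypothetical value, while the 24-step lookback, Nadam optimizer, and activation choices mirror the settings reported later in Section 4.3.

```python
# A minimal Keras sketch of the many-to-one LSTM forecaster; Equations (15)-(20)
# are implemented inside the LSTM layer. The 64 hidden units are hypothetical.
import tensorflow as tf

lookback, n_features = 24, 1  # 24 h of history to predict the next hour
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(lookback, n_features)),
    tf.keras.layers.LSTM(64, activation='tanh', recurrent_activation='sigmoid'),
    tf.keras.layers.Dense(1),  # one-hour-ahead consumption
])
model.compile(optimizer='nadam', loss='mse')
model.summary()
```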

4. Experimental Result

4.1. Dataset Description

This study applies the proposed forecasting framework to six benchmark datasets and one dataset taken from an education building in Taiwan. The six benchmark datasets cover 16 buildings from different locations with different time zones and building functionalities, and the dataset from Taiwan covers four buildings, for a total of 20 buildings studied in this research. This variety gives the data different patterns and characteristics. The benchmark datasets include industrial, education, commercial, government, and residential buildings. These data are available at https://github.com/buds-lab/the-building-data-genome-project [] (accessed on 13 March 2021) and http://traces.cs.umass.edu/index.php/Smart/Smart [] (accessed on 13 March 2021). The data were initially collected using building electricity meters []. First, the data are used to analyze the building in question; the data are then extracted at regular intervals, for example for analyzing the building's energy system []. The unit is the kilowatt-hour (kWh), and the datasets are distinguished by consumption level. Table 2 shows the data specifications. The data are available at hourly resolution, and the forecasting aims to predict one hour ahead; for managing electricity consumption in a building, an hourly interval is sufficient to schedule the devices.
Table 2. Dataset specifications.

4.2. Descriptive Statistics Analysis

Data exploration using descriptive statistics is a simple yet effective way to analyze the concentration and distribution of the data. Table 3 summarizes the descriptive statistics and represents the energy consumption level of each building. The average values show that dataset 5, from the education sector, has very high consumption because it covers university residences, multipurpose buildings, laboratories, classrooms, and offices. Multipurpose and residential buildings at the university have the highest average energy consumption. In contrast, the apartment and primary/secondary classroom buildings have the lowest energy consumption.
Table 3. Descriptive statistics.
Furthermore, a data stationarity test is carried out to show whether each series is stationary or non-stationary. This study performs the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test and the augmented Dickey-Fuller (ADF) test. The KPSS test is a unit root test for the stationarity of data around a trend; it is often referred to as a trend-stationarity test []. The ADF test is a statistical test that is more popular than the KPSS test []. It has the same function as the KPSS test, but it tests whether the data are difference stationary. The difference between trend stationarity and difference stationarity is that a trend-stationary series has a deterministic mean, so shocks to the data eventually return to the trend path, whereas a difference-stationary series has a stochastic mean with permanent shocks; the latter is the weakest form of stationarity [].
The hypotheses for the KPSS test are as follows:
$H_0$ — the data are stationary (trend stationary).
$H_1$ — the data have a unit root; thus, the data are non-stationary.
Table 4 shows the results of the KPSS test, which consist of the p-value and the critical values under four different settings. If the p-value < 0.05, then we reject $H_0$, which means the data are non-stationary; otherwise, the data are stationary. Based on the KPSS test, Office_Evelyn, UnivLab_Susan, and PrimClass_Umar are stationary.
Table 4. Results of KPSS and ADF test.
On the other hand, the hypotheses for the ADF test are as follows:
$H_0$ — the data have a unit root; thus, the data are non-stationary.
$H_1$ — the data are stationary (difference stationary).
Unlike in the KPSS test, the hypotheses of the ADF test are reversed. If the p-value < 0.05, then $H_0$ is rejected, which means the data are stationary (difference stationary); otherwise, the data are non-stationary. The ADF test results in Table 4 show that only UnivDorm_Malachi is non-stationary.
To determine the level of stationarity of the data based on the KPSS and ADF tests, this study refers to the following possible outcomes:
  • Case 1: KPSS and ADF tests are non-stationary, so the series is non-stationary.
  • Case 2: KPSS and ADF tests are stationary, so the series is stationary.
  • Case 3: the KPSS test is stationary and the ADF test is non-stationary, so the series is trend-stationary. The trend must be removed to make the series stationary.
  • Case 4: the KPSS test is non-stationary and the ADF test is stationary, so the series is difference stationary. Differencing is used to make the series stationary.
The combined result of the KPSS and ADF tests in Table 4 reveals that UnivDorm_Malachi is non-stationary, while Office_Evelyn, UnivLab_Susan, and PrimClass_Umar are stationary. The other series fall into the difference-stationary category, the weakest category of stationarity.
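These four cases can be encoded directly; a small sketch is given below, assuming the conventional 0.05 significance level.

```python
# A small sketch encoding the four KPSS/ADF cases above (alpha = 0.05 assumed).
def stationarity_case(kpss_p: float, adf_p: float, alpha: float = 0.05) -> str:
    kpss_stationary = kpss_p >= alpha  # KPSS: fail to reject H0 -> trend stationary
    adf_stationary = adf_p < alpha     # ADF: reject H0 -> difference stationary
    if kpss_stationary and adf_stationary:
        return "stationary"                                 # Case 2
    if kpss_stationary:
        return "trend-stationary: remove the trend"         # Case 3
    if adf_stationary:
        return "difference-stationary: apply differencing"  # Case 4
    return "non-stationary"                                 # Case 1
```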
Data visualization in Figure 6 presents the pattern of each series. The charts cover a span of one year at hourly resolution. Figure 6 shows that several buildings have similar data patterns. For example, Apt_Moon and Apt_Phobos have similar consumption patterns, but both have very low consumption levels. In addition, UnivDorm_Malachi, the only non-stationary series, has an uneven consumption pattern. Some buildings have stable consumption patterns, such as Office_Bobbi, Office_Evelyn, UnivClass_Boyd, UnivLab_Bethany, UnivDorm_Mitch, Office_Glenn, Office_Stella, UnivClass_Seb, UnivLab_Susan, UnivMulti_Ares, UnivMulti_Zeus, and PrimClass_Ulysses. The buildings UnivDorm_Athena, UnivExhibit_Hermes, UnivMulti_Ares, and UnivMulti_Zeus have very high consumption rates, the highest among the buildings.
Figure 6. Energy consumption pattern in each building.

4.3. Parameter Setting

This study applies the proposed forecasting framework to predict the data one hour ahead based on one year of hourly data for each building. In the forecasting process, the data are divided into training, validation, and testing sets: 70% of each building's data for training, 20% for validation, and 10% for testing. The split is sequential: the first 70% of the data is used for training, the next 20% for validation, and the final 10% for testing. The validation set is used to detect overfitting, i.e., cases where the model fails to learn the character of the data [].
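A minimal sketch of this sequential split is shown below; the fractions mirror the 70/20/10 scheme described above.

```python
# A minimal sketch of the sequential 70/20/10 train/validation/test split.
def sequential_split(series, train_frac=0.7, val_frac=0.2):
    n = len(series)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = series[:n_train]               # first 70% of the year
    val = series[n_train:n_train + n_val]  # next 20%
    test = series[n_train + n_val:]        # remaining 10%
    return train, val, test
```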
The LSTM involves several parameters, including the number of hidden nodes, lookback periods, activation function, and recurrent activation. To get the best parameter setting for the forecasting model, this study evaluates several parameter settings as listed in Table 5. Each parameter set is used for all buildings, with five replications for each building.
Table 5. Trial of parameter combinations.
Table 6 shows the best parameter setting for the LSTM. This parameter setting is used for forecasting without decomposition, with EEMD-LSTM, and with WT-LSTM. According to Table 6, the forecasting uses a 24-h lookback: predicting the next hour requires the previous 24 h of historical data. The number of hidden nodes indicates the number of nodes in the hidden layers of the LSTM network. The optimizer is Nadam, a variant of adaptive moment estimation (Adam) that adds Nesterov accelerated gradient (NAG) momentum; it improves the gradient descent search for the minimum of the loss function by increasing the speed of convergence and the quality of the learned model. The number of epochs is the number of forward and backward passes over the training dataset during model development. The activation function is used in the cell state of the LSTM structure, while the recurrent activation is used for the gates of the LSTM, including the input, forget, and output gates.
Table 6. Best parameters setting for the LSTM.
The forecasting model with EEMD-LSTM involves additional parameters: the noise width (the white noise's standard deviation) and the number of ensemble members (trials). This study uses 0.05 as the noise width and 100 ensemble members []. Furthermore, the WT-LSTM is set using a single level of decomposition, the symmetric signal extension mode, and the Daubechies (db1) wavelet object.

4.4. Forecasting Results Analysis

To evaluate the robustness of the proposed forecasting framework, each forecasting method is run ten times for each data series. This study evaluates the results based on the mean absolute percentage error (MAPE), mean squared error (MSE), and mean absolute error (MAE). These parameters are calculated using Equations (21)–(23).
$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$ (21)
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$ (22)
$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$ (23)
where y i is the actual value of the period i , y ^ i is the predicted value of the period i , and n is the number of periods. To evaluate the performance of the LSTM, this study also compares the LSTM with the recurrent neural network (RNN) algorithm.
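A minimal numpy sketch of the three metrics is given below; the factor of 100 in the MAPE converts the ratio of Equation (21) to the percentages reported in the tables.

```python
# A minimal numpy sketch of Equations (21)-(23).
import numpy as np

def mape(y, y_hat):
    # Equation (21), scaled by 100 to match the percentages reported in the tables.
    return np.mean(np.abs((y - y_hat) / y)) * 100

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)  # Equation (22)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))  # Equation (23)
```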
Table 7, Table 8, Table 9 and Table 10 show the average and standard deviation of the results, and Table 11 compares the three approaches based on the MAPE value. These results reveal that the different building types, including office buildings, lecture laboratories, lecture classrooms, dormitories or residences in university areas, apartments, multipurpose buildings, and secondary classrooms, have widely varying error rates. The EEMD-LSTM method has the lowest average training MAPE compared to the LSTM and WT-LSTM, and its testing error is likewise the lowest among the methods. This can be attributed to the decomposition method itself, which helps denoise the time-series data.
Table 7. The MAPE, MSE, and MAE of forecasting using LSTM without decomposition.
Table 8. The MAPE, MSE, and MAE of forecasting using RNN without decomposition.
Table 9. The MAPE, MSE, and MAE of forecasting using EEMD-LSTM model.
Table 10. The MAPE, MSE, and MAE of forecasting using WT-LSTM model.
Table 11. Comparison of algorithms based on MAPE (%).
Figure 7 shows the comparison between the actual and predicted value of the energy consumption. It reveals that for most of the buildings, the predicted value is very close to the actual value. Furthermore, the standard deviation of the EEMD-LSTM algorithm shown in Table 9 is relatively small. It indicates that the EEMD-LSTM algorithm has a stable result.
Figure 7. Comparison of the Actual and Predicted Value.
Furthermore, this study also applies a time-series cross-validation to evaluate the forecasting result. Figure 8 shows the framework of the time-series cross-validation applied in this study. Table 12, Table 13, Table 14 and Table 15 summarize the results of the time-series cross-validation using five-fold cross-validation. In addition, Table 16 shows the comparison of all algorithms. These tables show that the EEMD-LSTM algorithm obtains the best result for most of the buildings in terms of the average of MAPE, MSE, and MAE.
Figure 8. Framework for the time-series cross-validation.
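A minimal sketch of such a five-fold time-series split is shown below, assuming that scikit-learn's expanding-window splitter approximates the scheme in Figure 8.

```python
# A minimal sketch of five-fold time-series cross-validation with scikit-learn.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

series = np.arange(8760)  # one year of hourly observations
for fold, (train_idx, test_idx) in enumerate(TimeSeriesSplit(n_splits=5).split(series), 1):
    print(f"fold {fold}: train size {len(train_idx)}, test size {len(test_idx)}")
```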
Table 12. The cross-validation result of forecasting using LSTM without decomposition.
Table 13. The cross-validation result of forecasting using RNN without decomposition.
Table 14. The cross-validation result of forecasting using EEMD-LSTM.
Table 15. The cross-validation result of forecasting using WT-LSTM.
Table 16. Comparison of algorithms based on MAPE (%) using time series cross-validation.
The EEMD decomposition is carried out to the maximum decomposition level based on the amount of data used. Meanwhile, the plain LSTM method has lower error rates than the WT-LSTM because the wavelet object selected is Daubechies 1 (db1). The db1 is the same as the oldest wavelet object, namely the Haar wavelet. This object is commonly used for its simple implementation, and db1 has only two filters: a low-pass filter and a high-pass filter. The low-pass filter decomposes the signal component close to the original data, called the approximation series; the high-pass filter captures the noise, named the detail series. A single decomposition level is used, so it produces only one approximation series and one detail series, which makes the signal decomposition less smooth and results in a fairly high error compared to the LSTM.
Based on the forecasting results, the apartment buildings, Apt_Moon and Apt_Phobos, have the highest error percentages in training, validation, and testing. This is because the raw data of the two buildings have extreme drops, falling below 1 kWh. These two buildings have the lowest consumption levels among the buildings in the same category. The uneven distribution of the raw data causes high errors during forecasting: both buildings have significantly different data ranges across periods, with relatively stable data in some periods and significant decreases in others. The forecasting model is generated from the training set, whose data are relatively stable, while the validation and testing sets have significantly different patterns; therefore, the validation and testing errors are significantly higher than the training error. The five-fold cross-validation also shows that the average MAPE of the third fold is the highest; in that fold, the training and test sets have significantly different patterns, as illustrated in Figure 9.
Figure 9. The pattern differences between training, validation, and testing set for Apt_moon dataset.
Buildings such as college classrooms and laboratories have more stable forecasting results with smaller errors than other buildings, followed by offices. Another building category with high error is the primary/secondary classroom, owing to the fairly wide dispersion of the data in that category. Stationary buildings such as UnivClass_Boyd, Office_Stella, and PrimClass_Umar tend to have small and stable errors, consistent with their stationarity. UnivClass_Boyd has the smallest error among all buildings. Likewise, UnivDorm_Malachi, the only non-stationary building, has very small and stable errors in training, validation, and testing.
Further comparison is conducted using Levene's test and the T-test to evaluate how significant the differences are between the LSTM without decomposition and the LSTM with decomposition. The statistical tests use the MAPE results as the data and are run in SPSS.
Levene's test checks whether the variances of the two samples are the same, with the following hypotheses:
$H_0$ — both samples have the same variance.
$H_1$ — both samples have different variances.
If the p-value (sig.) is less than 0.05, then $H_0$ is rejected, meaning that the results of the two methods have different variances; otherwise, they have the same variance.
The T-test compares the results based on their means, with the following hypotheses:
$H_0$ — both samples have similar means.
$H_1$ — both samples have different means.
If the p-value (sig.) is less than 0.05, then $H_0$ is rejected, meaning that the results of the two methods are significantly different; otherwise, the results are not significantly different.
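Both tests are available in scipy; the sketch below uses hypothetical MAPE samples and lets Levene's result decide between the pooled and Welch's T-test.

```python
# A minimal scipy sketch of Levene's test and the T-test on hypothetical MAPE samples.
import numpy as np
from scipy import stats

mape_lstm = np.array([15.1, 14.8, 16.2, 15.5, 15.0])  # hypothetical values
mape_eemd = np.array([7.4, 7.9, 7.2, 7.8, 7.6])       # hypothetical values

lev_stat, lev_p = stats.levene(mape_lstm, mape_eemd)
t_stat, t_p = stats.ttest_ind(mape_lstm, mape_eemd,
                              equal_var=(lev_p >= 0.05))  # Welch's test if variances differ
print(f"Levene p = {lev_p:.3f}, T-test p = {t_p:.3f}")
```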
The statistics test results summarized in Table 17 reveal that the LSTM and EEMD-LSTM methods have significantly different variance and average. However, LSTM and WT-LSTM do not significantly differ in terms of the variance and average. Furthermore, the results of EEMD-LSTM and WT-LSTM are significantly different in terms of variance and average.
Table 17. Statistics’ test results.
The test results conclude that the EEMD-LSTM method differs significantly from the LSTM and WT-LSTM methods, while the LSTM method is not significantly different from the WT-LSTM method. This conclusion is also supported by the comparison of the average test MAPE for each method: the test MAPE of the EEMD-LSTM method tends to be lower, whereas the MAPEs of the LSTM and WT-LSTM methods are not significantly different. The results also reveal that the EEMD-LSTM method obtains better results than the LSTM and WT-LSTM methods.
Based on previous studies conducted by Imani and Ghassemian [] and Fang, Gao, and Ruan [], the error of the LSTM without decomposition is between 3% and 16%. In this research, the forecasting results using the LSTM method have an average error of 15.3%. The results show that EEMD successfully improves the forecasting accuracy of the LSTM method.
Finally, this research also provides a recommended forecasting method for each building, as listed in Table 18. However, for Apt_Moon and Apt_Phobos, other algorithms should be considered because the MAPE of the EEMD-LSTM is still above 20%.
Table 18. Recommendation of the forecasting method.

5. Conclusions

Energy consumption is predicted to increase continuously. Buildings, as facilities where most human activities happen, consume energy, especially electricity. Without proper energy management, a building's energy consumption may be inefficient. On the other hand, managing energy consumption requires information on future energy demand. Therefore, this study proposes a forecasting framework to predict the energy consumption of a building.
The proposed forecasting framework is developed based on LSTM with decomposition methods to improve the performance of the LSTM. There are three methods evaluated in this study. They are the LSTM without decomposition, LSTM with EEMD as the decomposition, and the LSTM with WT as the decomposition method. The LSTM without decomposition is the baseline for both EEMD-LSTM and WT-LSTM.
This study applies the proposed forecasting methods to 20 series of energy consumption data from 20 buildings with different functions located in many different countries. The experiment shows that the EEMD-LSTM method obtains the smallest error for 85% of the series, while the remaining 15% of the series obtain the smallest error with the LSTM method. The average errors for the LSTM, EEMD-LSTM, and WT-LSTM are 15.31%, 7.57%, and 16.36%, respectively. The statistical T-test also reveals that the EEMD-LSTM method is significantly better than the other two methods.
Meanwhile, the results of the LSTM and WT-LSTM are not significantly different. The EEMD and WT methods give significantly different values because the EEMD decomposes the data adaptively based on the sample data used, whereas the WT decomposition here uses only a single decomposition level and a wavelet object with only two filters. Further studies should increase the decomposition level and the number of filters to improve the performance of the WT. Other hybrid forecasting methods should also be considered. In addition to the forecasting algorithm, further research should consider more datasets from different building types and locations to obtain a more general forecasting framework.

Author Contributions

Conceptualization, A.D. and S.-Y.C.; methodology, F.E.Z.; software, A.D. and M.F.; validation, A.D. and F.E.Z.; formal analysis, A.D. and F.E.Z.; investigation, A.D., S.-Y.C. and F.E.Z.; resources, S.-Y.C.; data curation, A.D.; writing, A.D., F.E.Z. and M.F.; visualization, M.F.; supervision, S.-Y.C.; project administration, S.-Y.C.; funding acquisition, S.-Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded in part by the Taiwan Building Technology Center, National Taiwan University of Science and Technology, from the Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education in Taiwan under the grant 110P011.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Benchmark data are available on https://github.com/buds-lab/the-building-data-genome-project [] (accessed on 13 March 2021) and http://traces.cs.umass.edu/index.php/Smart/Smart [] (accessed on 13 March 2021).

Acknowledgments

This work was supported in part by the Taiwan Building Technology Center, National Taiwan University of Science and Technology, from the Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education in Taiwan under the grant 110P011.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cao, X.; Dai, X.; Liu, J. Building energy-consumption status worldwide and the state-of-the-art technologies for zero-energy buildings during the past decade. Energy Build. 2016, 128, 198–213. [Google Scholar] [CrossRef]
  2. Saidur, R.; Masjuki, H.H.; Jamaluddin, M.Y. An application of energy and exergy analysis in residential sector of Malaysia. Energy Policy 2007, 35, 1050–1063. [Google Scholar] [CrossRef]
  3. Allouhi, A.; El Fouih, Y.; Kousksou, T.; Jamil, A.; Zeraouli, Y.; Mourad, Y. Energy consumption and efficiency in buildings: Current status and future trends. J. Clean. Prod. 2015, 109, 118–130. [Google Scholar] [CrossRef]
  4. EIA. International Energy Outlook 2019; U.S. Energy Information Administration (EIA): Washington, DC, USA, 2019.
  5. Bedi, J.; Toshniwal, D. Empirical mode decomposition based deep learning for electricity demand forecasting. IEEE Access 2018, 6, 49144–49156. [Google Scholar] [CrossRef]
  6. Ahmad, T.; Chen, H. A review on machine learning forecasting growth trends and their real-time applications in different energy systems. Sustain. Cities Soc. 2020, 54, 102010. [Google Scholar] [CrossRef]
  7. Kontokosta, C.E.; Tull, C. A data-driven predictive model of city-scale energy use in buildings. Appl. Energy 2017, 197, 303–317. [Google Scholar] [CrossRef] [Green Version]
  8. Hong, W.-C.; Dong, Y.; Zhang, W.Y.; Chen, L.-Y.; Panigrahi, B.K. Cyclic electric load forecasting by seasonal SVR with chaotic genetic algorithm. Int. J. Electr. Power Energy Syst. 2013, 44, 604–614. [Google Scholar] [CrossRef]
  9. Kneifel, J.; Webb, D. Predicting energy performance of a net-zero energy building: A statistical approach. Appl. Energy 2016, 178, 468–483. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  10. Barhmi, S.; Elfatni, O.; Belhaj, I. Forecasting of wind speed using multiple linear regression and artificial neural networks. Energy Syst. 2019, 11, 935–946. [Google Scholar] [CrossRef]
  11. Montgomery, D.; Jennings, C.; Kulahci, M. Introduction to Time Series Analysis and Forecasting; John Wiley & Sons: Hoboken, NJ, USA, 2008; p. 472. [Google Scholar]
  12. Mocanu, E.; Nguyen, P.H.; Gibescu, M.; Kling, W.L. Deep learning for estimating building energy consumption. Sustain. Energy Grids Netw. 2016, 6, 91–99. [Google Scholar] [CrossRef]
  13. Kong, W.; Dong, Z.Y.; Jia, Y.; Hill, D.J.; Xu, Y.; Zhang, Y. Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans. Smart Grid 2019, 10, 841–851. [Google Scholar] [CrossRef]
  14. Fang, C.; Gao, Y.; Ruan, Y. Improving forecasting accuracy of daily energy consumption of office building using time series analysis based on wavelet transform decomposition. IOP Conf. Ser. Earth Environ. Sci. 2019, 294, 012031. [Google Scholar] [CrossRef] [Green Version]
  15. Peng, L.; Wang, L.; Xia, D.; Gao, Q. Effective energy consumption forecasting using empirical wavelet transform and long short-term memory. Energy 2022, 238, 121756. [Google Scholar] [CrossRef]
  16. Gao, Y.; Ruan, Y. Interpretable deep learning model for building energy consumption prediction based on attention mechanism. Energy Build. 2021, 252, 111379. [Google Scholar] [CrossRef]
  17. Somu, N.; MR, G.R.; Ramamritham, K. A hybrid model for building energy consumption forecasting using long short term memory networks. Appl. Energy 2020, 261, 114131. [Google Scholar] [CrossRef]
  18. Runge, J.; Zmeureanu, R. Forecasting energy use in buildings using artificial neural networks: A review. Energies 2019, 12, 3254. [Google Scholar] [CrossRef] [Green Version]
  19. Box, G.E.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015. [Google Scholar]
  20. Koprinska, I.; Rana, M.; Agelidis, V.G. Correlation and instance based feature selection for electricity load forecasting. Knowl.-Based Syst. 2015, 82, 29–40. [Google Scholar] [CrossRef]
  21. Friedrich, L.; Afshari, A. Short-term forecasting of the Abu Dhabi electricity load using multiple weather variables. Energy Procedia 2015, 75, 3014–3026. [Google Scholar] [CrossRef] [Green Version]
  22. Deb, C.; Zhang, F.; Yang, J.; Lee, S.E.; Shah, K.W. A review on time series forecasting techniques for building energy consumption. Renew. Sustain. Energy Rev. 2017, 74, 902–924. [Google Scholar] [CrossRef]
  23. Bourdeau, M.; Zhai, X.; Nefzaoui, E.; Guo, X.; Chatellier, P. Modeling and forecasting building energy consumption: A review of data-driven techniques. Sustain. Cities Soc. 2019, 48, 101533. [Google Scholar] [CrossRef]
  24. Mandic, D.P.; Chambers, J.A. Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability; Wiley: Hoboken, NJ, USA, 2001. [Google Scholar]
  25. Kumar, D.N.; Raju, K.S.; Sathish, T. River flow forecasting using recurrent neural networks. Water Resour. Manag. 2004, 18, 143–161. [Google Scholar] [CrossRef]
  26. Hua, Y.; Zhao, Z.; Li, R.; Chen, X.; Liu, Z.; Zhang, H. Deep learning with long short-term memory for time series prediction. IEEE Commun Mag. 2019, 57, 114–119. [Google Scholar] [CrossRef] [Green Version]
  27. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  28. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  29. Aungiers, J. LSTM Neural Network for Time Series Prediction. Available online: https://www.jakob-aungiers.com/articles/a/LSTM-Neural-Network-for-Time-Series-Prediction (accessed on 13 March 2021).
  30. Wu, Y.-X.; Wu, Q.-B.; Zhu, J.-Q. Improved EEMD-based crude oil price forecasting using LSTM networks. Phys. A Stat. Mech. Appl. 2019, 516, 114–124. [Google Scholar] [CrossRef]
  31. Xu, Y.; Zhou, C.; Geng, J.; Gao, S.; Wang, P. A method for diagnosing mechanical faults of on-load tap changer based on ensemble empirical mode decomposition, volterra model and decision acyclic graph support vector machine. IEEE Access 2019, 7, 84803–84816. [Google Scholar] [CrossRef]
  32. Wu, Z.; Huang, N.E. Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal. 2009, 1, 1–41. [Google Scholar] [CrossRef]
  33. Huang, N.; Shen, Z.; Long, S. A new view of nonlinear water waves: The Hilbert spectrum. Annu. Rev. Fluid Mech. 1999, 31, 417–457. [Google Scholar] [CrossRef] [Green Version]
  34. Zhang, X.; Lai, K.K.; Wang, S.-Y. A new approach for crude oil price analysis based on empirical mode decomposition. Energy Econ. 2008, 30, 905–918. [Google Scholar] [CrossRef]
  35. Terzija, N. Robust Digital Image Watermarking Algorithms for Copyright Protection. Ph.D. Thesis, University of Duisburg-Essen, Duisburg, Germany, 2006. [Google Scholar]
  36. Sripathi, D. Efficient Implementations of Discrete Wavelet Transforms Using FPGAs. Thesis, Florida State University, Tallahassee, FL, USA, 2003. Available online: http://purl.flvc.org/fsu/fd/FSU_migr_etd-1599 (accessed on 18 July 2021).
  37. Sugiartawan, P.; Pulungan, R.; Kartika, A. Prediction by a hybrid of wavelet transform and long-short-term-memory neural networks. Int. J. Adv. Comput. Sci. Appl. 2017, 8. [Google Scholar] [CrossRef] [Green Version]
  38. Jin, J.; Kim, J. Forecasting natural gas prices using wavelets, time series, and artificial neural networks. PLoS ONE 2015, 10, e0142064. [Google Scholar] [CrossRef]
  39. Chang, Z.; Zhang, Y.; Chen, W. Electricity price prediction based on hybrid model of Adam optimized LSTM neural network and wavelet transform. Energy 2019, 187, 115804. [Google Scholar] [CrossRef]
  40. Liu, Y.; Guan, L.; Hou, C.; Han, H.; Liu, Z.; Sun, Y.; Zheng, M. Wind Power Short-Term Prediction Based on LSTM and Discrete Wavelet Transform. Appl. Sci. 2019, 9, 1108. [Google Scholar] [CrossRef] [Green Version]
  41. Wang, F.; Yu, Y.; Zhang, Z.; Li, J.; Zhen, Z.; Li, K. Wavelet decomposition and convolutional LSTM networks based improved deep learning model for solar irradiance forecasting. Appl. Sci. 2018, 8, 1286. [Google Scholar] [CrossRef] [Green Version]
  42. Qin, Q.; Lai, X.; Zou, J. Direct multistep wind speed forecasting using LSTM neural network combining EEMD and fuzzy entropy. Appl. Sci. 2019, 9, 126. [Google Scholar] [CrossRef] [Green Version]
  43. Liu, W.; Liu, W.D.; Gu, J. Forecasting oil production using ensemble empirical model decomposition based long short-term memory neural network. J. Pet. Sci. Eng. 2020, 189, 107013. [Google Scholar] [CrossRef]
  44. Miller, C.; Meggers, F. The building data genome project: An open, public data set from non-residential building electrical meters. Energy Procedia 2017, 122, 439–444. [Google Scholar] [CrossRef]
  45. Barker, S.; Mishra, A.; Irwin, D.; Cecchet, E.; Shenoy, P.; Albrecht, J. Smart*: An open data set and tools for enabling research in sustainable homes. SustKDD 2012, 111, 108. [Google Scholar]
  46. Kwiatkowski, D.; Phillips, P.C.; Schmidt, P.; Shin, Y. Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? J. Econom. 1992, 54, 159–178. [Google Scholar] [CrossRef]
  47. Singh, A. A Gentle Introduction to Handling a Non-Stationary Time Series in Python. Available online: https://www.analyticsvidhya.com/blog/2018/09/non-stationary-time-series-python/# (accessed on 18 July 2021).
  48. Mueller, J.P.; Massaron, L. Machine Learning for Dummies; John Wiley & Sons: Hoboken, NJ, USA, 2021. [Google Scholar]
  49. Imani, M.; Ghassemian, H. Lagged load wavelet decomposition and lstm networks for short-term load forecasting. In Proceedings of the 2019 4th International Conference on Pattern Recognition and Image Analysis (IPRIA), Tehran, Iran, 6–7 March 2019; pp. 6–12. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
