Article

Predicting Commercial Building Energy Consumption Using a Multivariate Multilayered Long-Short Term Memory Time-Series Model

by Tan Ngoc Dinh *, Gokul Sidarth Thirunavukkarasu, Mehdi Seyedmahmoudian *, Saad Mekhilef and Alex Stojcevski

School of Science, Computing and Engineering Technologies, Swinburne University of Technology, Hawthorn, VIC 3122, Australia

* Authors to whom correspondence should be addressed.
Appl. Sci. 2023, 13(13), 7775; https://doi.org/10.3390/app13137775
Submission received: 30 May 2023 / Revised: 18 June 2023 / Accepted: 21 June 2023 / Published: 30 June 2023
(This article belongs to the Special Issue Recent Advances in Automated Machine Learning)

Abstract: The global demand for energy has been steadily increasing due to population growth, urbanization, and industrialization. Numerous researchers worldwide are striving to create precise forecasting models for predicting energy consumption to manage supply and demand effectively. In this research, a time-series forecasting model based on multivariate multilayered long short-term memory (LSTM) is proposed for forecasting energy consumption and tested using data obtained from commercial buildings in Melbourne, Australia: the Advanced Technologies Center, Advanced Manufacturing and Design Center, and Knox Innovation, Opportunity, and Sustainability Center buildings. This research specifically identifies the best forecasting method for subtropical conditions and evaluates its performance by comparing it with the most commonly used methods at present, including LSTM, bidirectional LSTM, and linear regression. The proposed multivariate, multilayered LSTM model was assessed by comparing mean absolute error (MAE), root-mean-square error (RMSE), and mean absolute percentage error (MAPE) values with and without labeled time. Results indicate that the proposed model exhibits optimal performance with improved precision and accuracy. Specifically, the proposed LSTM model achieved a decrease in MAE of 30%, RMSE of 25%, and MAPE of 20% compared with the LSTM method. Moreover, it outperformed the bidirectional LSTM method with a reduction in MAE of 10%, RMSE of 20%, and MAPE of 18%. Furthermore, the proposed model surpassed linear regression with a decrease in MAE by 2%, RMSE by 7%, and MAPE by 10%. These findings highlight the significant performance increase achieved by the proposed multivariate multilayered LSTM model in energy consumption forecasting.

1. Introduction

Energy consumption refers to the amount of energy used over a certain period of time, typically measured in kilowatt-hours (kWh) or British thermal units (BTUs). It is a crucial metric for evaluating energy usage and efficiency and for understanding the energy requirements of a specific region or country [1]. Beyond forecasting, sustainable supply also matters: a recent review emphasizes urban mining, which involves extracting valuable minerals from waste or secondary resources, as a sustainable solution [2]. Various secondary resources, such as biomass, desalination brine, sewage sludge, phosphogypsum, and e-waste, have been evaluated for their market potential and elemental composition, showcasing their potential to partially replace mined minerals in different sectors. The review also discusses technological advancements in mineral extraction from waste, emphasizing the importance of improving these processes for large-scale implementation [2]. In the context of renewable energy, biofuels are identified as a potential alternative for the transportation industry, and research on utilizing surplus rice straw as a biofuel feedstock has been explored. While biofuels have the potential to reduce emissions, further research is needed to address concerns regarding food security, feedstock selection, and impacts on climate and human health [3].
According to the International Energy Agency (IEA) [4], global energy consumption is expected to continue increasing in the coming decades, driven by population growth, urbanization, and industrialization in developing countries. However, most current research has focused on forecasting for entire countries or regions [5,6]; only a limited number of studies specifically focus on forecasting for individual buildings. Energy consumption in buildings encompasses the precise measurement of energy used for specific purposes such as heating, cooling, lighting, and other essential functions within residential, commercial, and institutional structures. In China and India, buildings account for 37% [7] and 35% [8] of national energy consumption, respectively, making the building sector an important target for energy efficiency and sustainability efforts [9].
In this study, we propose a forecasting model for the energy consumption of various commercial buildings: the Hawthorn Campus—ATC Building, the Hawthorn Campus—AMDC Building, and the Wantirna Campus—KIOSC Building. The proposed model, together with its data preprocessing method, demonstrated superior performance compared with other popular models, both when a sufficient amount of training data is available and when training data are scarce. The results indicate that the proposed method and model can be used to accurately predict energy consumption in commercial buildings, which is crucial for energy management and conservation. In addition, the proposed method can easily be applied to other commercial buildings with similar energy consumption patterns, providing a practical solution for energy management in the commercial building sector.
The rest of this study is organized as follows. Section 2 presents background knowledge by reviewing the existing research on forecasting models and their adoption for energy consumption forecasting. Section 3 introduces the available data and the configuration of the experiment. Section 4 provides the details of the proposed model. Section 5 describes some popular benchmarking models. Section 6 presents the experiments evaluating the efficiency of the models on different datasets. Finally, the conclusion of this study is given in Section 7.

2. Background

2.1. Forecasting Models

A forecasting model is a mathematical algorithm or statistical tool used to predict future trends, values, or events based on historical data and patterns. We build a forecasting model $M$ that analyzes the historical values up to the current time $t$ and returns the predicted future value at time $t+1$ (denoted $\hat{y}_{t+1}$). The objective of the forecasting model is to minimize the discrepancy between the estimated value $\hat{y}_{t+1}$ and the actual value $y_{t+1}$ by seeking the closest approximation. To achieve this in a temporal context where data points are indexed in time order, a specific type of forecasting model is often used.
A time-series forecasting model is a predictive algorithm that utilizes historical time-series data to anticipate future trends or patterns in the data over time. Two techniques are available for building the model $M$: (i) univariate and (ii) multivariate time series (TS) [10,11,12]. In univariate TS, only a one-dimensional sequence of energy consumption values $\mathbf{y}_t = \{y_{t-(k-1)}, y_{t-(k-2)}, \ldots, y_{t-1}, y_t\}$ is utilized to produce the estimated value $\hat{y}_{t+1}$, where $k$ is the length of the look-back period ending at the current time $t$ [13,14]. By contrast, multivariate TS employs one or more other historical features in addition to the energy consumption for training the model $M$ [15]; these can be time fields or other specific sources. The input for multivariate TS is therefore a multi-dimensional sequence $\mathbf{X}_t = \{x_{t-(k-1)}, x_{t-(k-2)}, \ldots, x_{t-1}, x_t\}$, where $x_i \in \mathbb{R}^n$ is a vector of dimension $n$. In forecasting new values, the model can be enhanced if related available features are taken into account: the additional information helps the model capture dependencies or correlations between the features and the target variable, better understand the context, mitigate the impact of missing values, and make more precise predictions. For these reasons, multivariate TS has frequently been employed for building forecasting models in recent work [16].
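To make the two constructions concrete, the following Python/NumPy sketch (our illustration; the function names are not from the paper) slices a series into (window, next-value) training pairs for the univariate and multivariate cases.

```python
import numpy as np

def make_windows(series, k):
    """Univariate TS: slice a 1-D series into (input window, next value) pairs.

    series: 1-D array of energy readings ordered in time.
    k: look-back window length (number of historical points per sample).
    """
    X, y = [], []
    for t in range(k, len(series)):
        X.append(series[t - k:t])   # y_{t-(k-1)}, ..., y_t
        y.append(series[t])         # the value to predict
    return np.array(X), np.array(y)

def make_multivariate_windows(features, target, k):
    """Multivariate TS: each time step is an n-dimensional feature vector."""
    X, y = [], []
    for t in range(k, len(target)):
        X.append(features[t - k:t, :])  # window of shape (k, n)
        y.append(target[t])
    return np.array(X), np.array(y)

# Example: 96 15-min readings (24 h) used to predict the next reading.
energy = np.random.rand(1000)                 # synthetic placeholder data
X_uni, y_uni = make_windows(energy, k=96)     # X_uni has shape (904, 96)
```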

2.2. Energy Consumption Forecasting

Energy consumption forecasting refers to predicting a particular region’s future energy consumption based on historical consumption and other relevant factors [1]. Accurate energy consumption forecasting is essential for proper energy planning, pricing, and management, and it plays a significant role in the transition toward a more sustainable energy future. There have been numerous studies on energy consumption forecasting, and various models have been proposed for this purpose [17]. As shown in Figure 1, forecasting models in the literature often fall into two categories: (i) conventional models and (ii) artificial intelligence (AI) models. Table 1 summarizes different forecasting models for commercial buildings, listing the forecasting technique, location, forecast horizon, and the best reported performance evaluation indexes.
Conventional models used in energy consumption forecasting commonly include stochastic time-series (TS) models [30], regression models (RMs) [31], and grey models (GMs) [32,33]. These models typically take historical energy consumption data as input and use various statistical and mathematical techniques to predict future consumption. However, they may fail to capture complex nonlinear relationships and may require manual feature engineering, making them less efficient and scalable than AI-based models. AI-based models have become increasingly popular in the field of energy forecasting due to their ability to learn patterns and relationships in complex data [34,35,36]. The use of AI in energy forecasting has the potential to reduce energy costs, optimize energy production, and enhance energy security. Furthermore, LSTM models can capture both short-term and long-term dependencies in TS data. They can also handle nonlinear relationships between input and output variables, which is important in energy forecasting, where these relationships may be complex. Finally, LSTMs can process sequential data of varying lengths, which is useful for handling variable-length TS data in energy forecasting.

3. Data Used in This Study

3.1. Data Collection

Data collected from three buildings in different regions are employed to evaluate the models: Hawthorn Campus—ATC Building (denoted Dataset $S_1$), Hawthorn Campus—AMDC Building (denoted Dataset $S_2$), and Wantirna Campus—KIOSC Building (denoted Dataset $S_3$). Such data are incredibly valuable because they provide real-time insights into each building’s performance, energy consumption, and operational efficiency, allowing a swift response to potential issues. These real-time data not only enhance decision-making capabilities for building management, maintenance, and optimization but also provide a basis for developing more accurate forecasting models. Furthermore, they can guide strategic energy management, potentially leading to significant cost savings, improved sustainability, and increased occupant comfort over time. Dataset $S_1$ and Dataset $S_2$ contain the energy consumption from 2017 to 2019, and Dataset $S_3$ contains the energy consumption from 2018 to 2019. The prediction target is the difference in cumulative energy consumption between consecutive readings. A history of 96 data points, recorded every 15 min (i.e., 24 h), is used to predict the next value. Figure 2 indicates the location of the buildings on the Hawthorn Campus and the Wantirna Campus in the context of Melbourne, and Figure 3 and Figure 4 show the electricity accumulation every 15 min and every hour, respectively, for the Hawthorn Campus (ATC Building).

3.2. Data Setup

Dataset. In this experiment, the three datasets described in Section 3.1 are employed to evaluate the models: Hawthorn Campus—ATC Building (Dataset $S_1$), Hawthorn Campus—AMDC Building (Dataset $S_2$), and Wantirna Campus—KIOSC Building (Dataset $S_3$). As noted there, Dataset $S_1$ and Dataset $S_2$ cover 2017–2019, Dataset $S_3$ covers 2018–2019, and a history of 96 readings taken every 15 min is used to predict the next differenced value.
Configuration of the proposed model. The proposed model, $M_{\text{LSTM}}$, consists of a succession of one input layer, two LSTM layers, and one dense layer at the end. The input layer carries the two input types described in Section 4.1. The first LSTM layer wraps eight LSTM units, and the second wraps four units. The final dense layer has one unit for predicting energy consumption.
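One plausible realization of this configuration is the following Keras sketch; the framework choice, loss function, and any details not stated above are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

K = 96           # look-back window length (Section 3.1)
N_FEATURES = 2   # differenced consumption + peak/non-peak time label

model = models.Sequential([
    layers.Input(shape=(K, N_FEATURES)),
    layers.LSTM(8, return_sequences=True),  # first LSTM layer: 8 units
    layers.LSTM(4),                          # second LSTM layer: 4 units
    layers.Dense(1),                         # one unit: next consumption value
])
model.compile(optimizer="adam", loss="mae")  # loss is an assumption
model.summary()
```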
Configuration of benchmarking models. As mentioned earlier, four competitive models are used for comparison: the LSTM, Bi-LSTM, LR, and SVM models. The LSTM model consists of a single LSTM layer with one unit, followed by a dense layer for prediction. The Bi-LSTM consists of a single bidirectional LSTM layer of one unit, followed by a dense layer for prediction, as in the LSTM model. The LR model trains a single dense layer.
Training configuration. The $M_{\text{LSTM}}$, LSTM, Bi-LSTM, LR, and SVM models are all trained on the same training set and evaluated on the same test set. For Dataset $S_1$ and Dataset $S_2$, the models are trained on the data from 2017 and 2018. For Dataset $S_3$, the models are trained on the data from 2018 only, to demonstrate the models’ behavior with a lack of training data.

4. Methodology

In this section, we propose the forecasting model (multivariate multilayered LSTM), which is referred to as $M_{\text{LSTM}}$. An overview of the proposed method is illustrated in Figure 5. There are three phases: data preprocessing, model training, and evaluation.

4.1. Data Preprocessing

Two major techniques are used in the data preprocessing phase, i.e., (i) data differencing and (ii) time labeling. Additionally, there are two other techniques, i.e., (iii) concatenation and (iv) window slicing.
In regard to data differencing: because energy consumption values are large in magnitude and generally non-stationary, the difference between each data value and its previous value is used instead. By removing trends through data differencing, the resulting stationary TS can be more easily modeled and forecasted. This technique is required by many forecasting models [37].
In time labeling, the amount of energy consumed during peak time points is typically greater than during non-peak times. Consequently, the labeling convention assigns a value of 1 to peak times and 0 to non-peak times [38]. These labels serve as valuable features for training the forecasting model. In the concatenation phase, the differenced data and the time labels are concatenated; during the window-slicing phase, the result is sliced into vectors of length $k$. A sketch of the four steps is given below.
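The following pandas/NumPy sketch is our reading of the four steps; the peak-hour bounds are assumptions suggested by Figure 4, not values stated by the authors.

```python
import numpy as np
import pandas as pd

def preprocess(df, k=96, peak_start=8, peak_end=21):
    """Chain the four preprocessing steps described above.

    df: DataFrame with a DatetimeIndex and an 'energy' column of
        cumulative consumption readings taken every 15 min.
    peak_start, peak_end: assumed peak-hour bounds (cf. Figure 4).
    """
    # (i) data differencing: first difference to promote stationarity
    diff = df["energy"].diff().dropna()

    # (ii) time labeling: 1 for peak hours, 0 for non-peak hours
    label = ((diff.index.hour >= peak_start) &
             (diff.index.hour < peak_end)).astype(float)

    # (iii) concatenation of the two feature streams
    feats = np.column_stack([diff.to_numpy(), label.to_numpy()])

    # (iv) window slicing into length-k input sequences
    X = np.stack([feats[t - k:t] for t in range(k, len(feats))])
    y = feats[k:, 0]  # the next differenced value is the target
    return X, y
```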

4.2. Forecasting Model—Multivariate Multilayered LSTM

$M_{\text{LSTM}}$ is an extension of the LSTM model, a type of recurrent neural network (RNN) architecture used for sequential data processing tasks such as TS forecasting. There are $h$ sub-layers in the hidden layer. The $M_{\text{LSTM}}$ model consists of multiple LSTM layers, each having its own set of neurons that process the input variables independently. The output of each layer is fed into the next layer, allowing the model to capture more complex and abstract relationships between the input variables. Additional layers, such as dropout and normalization layers, are included in the hidden part to train the model efficiently.
As shown in Figure 6, the memory cell is responsible for storing information about the long-term dependencies and patterns in the input sequence, while the gates control the flow of information into and out of the cell. The gates are composed of sigmoid neural network layers and a point-wise multiplication operation, which allow the network to learn which information to keep or discard. In particular, the $i$th cell at layer $m$, denoted $C_i^m$, has three inputs: the hidden state $h_{i-1}^m$ and cell state $c_{i-1}^m$ of the previous cell, and the hidden state $h_i^{m-1}$ of cell $i$ at the previous layer $m-1$. The cell $C_i^m$ produces two recurrent features: the hidden state $h_i^m$ and the cell state $c_i^m$. $C_i^m$ is thus a mathematical function, Equation (1), that takes three inputs and returns two outputs. Both outputs leave the cell at time $i$ and are fed into the same cell at time $i+1$, together with the next input element $x_{i+1}$. In the first layer ($m = 1$), the hidden state $h_i^{m-1}$ is the input $x_i$.

$$(h_i^m, c_i^m) = C(h_{i-1}^m, c_{i-1}^m, h_i^{m-1}) \tag{1}$$

Inside the $i$th cell, the previous hidden state $h_{i-1}^m$ and the lower-layer hidden state $h_i^{m-1}$ are fed into three gates: the input gate $ig_i^m$, the forget gate $fg_i^m$, and the output gate $og_i^m$. Each is a sigmoid function ($\sigma$) that produces a scalar value, as described in Equations (2)–(4), respectively.

$$ig_i^m(h_{i-1}^m, h_i^{m-1}) = \sigma\left(w_{ig,m-1}\, h_i^{m-1} + w_{ig,m}\, h_{i-1}^m + b_{ig}\right) \tag{2}$$

$$fg_i^m(h_{i-1}^m, h_i^{m-1}) = \sigma\left(w_{fg,m-1}\, h_i^{m-1} + w_{fg,m}\, h_{i-1}^m + b_{fg}\right) \tag{3}$$

$$og_i^m(h_{i-1}^m, h_i^{m-1}) = \sigma\left(w_{og,m-1}\, h_i^{m-1} + w_{og,m}\, h_{i-1}^m + b_{og}\right) \tag{4}$$

where $w_{ig,m-1}, w_{fg,m-1}, w_{og,m-1} \in \mathbb{R}^n$ and $w_{ig,m}, w_{fg,m}, w_{og,m}, b_{ig}, b_{fg}, b_{og} \in \mathbb{R}$ denote the weights, i.e., the parameters to be updated during training of the cell. Another scalar function, called the update function (denoted $ug$), has a $\tanh$ activation, as described in Equation (5).

$$ug_i(h_{i-1}^m, h_i^{m-1}) = \tanh\left(w_{ug,x}\, h_i^{m-1} + w_{ug,h}\, h_{i-1}^m + b_{ug}\right) \tag{5}$$

where $w_{ug,x} \in \mathbb{R}^n$ and $w_{ug,h} \in \mathbb{R}$ are further weights to be learned. The returned cell state $c_i^m$ and hidden state $h_i^m$ are formulated in Equations (6) and (7), respectively.

$$c_i^m = fg_i^m \cdot c_{i-1}^m + ig_i^m \cdot ug_i \tag{6}$$

$$h_i^m = og_i^m \cdot \tanh\left(c_i^m\right) \tag{7}$$
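For intuition, the following NumPy sketch (our illustration, not the authors’ implementation) executes one step of the cell exactly as written in Equations (2)–(7), with one scalar state per layer as in the notation above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(h_prev, c_prev, h_below, w, b):
    """One step of the cell in Equations (1)-(7) with a scalar state.

    h_prev, c_prev : h_{i-1}^m and c_{i-1}^m from the previous time step.
    h_below        : h_i^{m-1} from the layer below (equals x_i when m = 1),
                     a vector of dimension n.
    w, b           : weights w_{g,m-1} (vectors), w_{g,m} and b_g (scalars),
                     keyed by gate name.
    """
    ig = sigmoid(np.dot(w["ig_x"], h_below) + w["ig_h"] * h_prev + b["ig"])  # Eq. (2)
    fg = sigmoid(np.dot(w["fg_x"], h_below) + w["fg_h"] * h_prev + b["fg"])  # Eq. (3)
    og = sigmoid(np.dot(w["og_x"], h_below) + w["og_h"] * h_prev + b["og"])  # Eq. (4)
    ug = np.tanh(np.dot(w["ug_x"], h_below) + w["ug_h"] * h_prev + b["ug"])  # Eq. (5)
    c = fg * c_prev + ig * ug    # Eq. (6): new cell state
    h = og * np.tanh(c)          # Eq. (7): new hidden state
    return h, c                  # the two recurrent outputs of Eq. (1)
```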
In energy forecasting, loss optimization is one of the important steps for improving the accuracy of the model [39]. A common technique is the Adam optimizer, a stochastic gradient descent variant that uses moving averages of the gradients to adapt the learning rate. Adam computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients, making it suitable for optimizing the loss function in models with a large number of parameters. By using the Adam optimizer, the model can efficiently learn and update the weights of the neurons at each time step, resulting in better prediction accuracy. In addition, the loss function minimized with the Adam optimizer in the proposed model aims to strike a balance between accuracy and robustness, taking into account the characteristics of the data and the specific requirements of the forecasting task.
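For reference, a single Adam update can be written in a few lines of NumPy; this is the standard Adam algorithm rather than anything specific to the proposed model.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a parameter vector theta.

    m, v: running first- and second-moment estimates of the gradient.
    t:    1-based step counter (used for bias correction).
    """
    m = beta1 * m + (1 - beta1) * grad       # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2  # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)             # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step
    return theta, m, v
```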

5. Benchmarking Models

In this study, we compare the proposed model with four well-known models: linear regression (LR), long short-term memory (LSTM), bidirectional long short-term memory (Bi-LSTM), and the Support Vector Machine (SVM).

5.1. Linear Regression

LR captures the relationship between the response variable (energy consumption) and the regressor variables (the other variables). As a causal technique, regression analysis predicts energy demand from one or more drivers (independent variables), which might include the day of the week, energy prices, the availability of housing, or other variables. The LR method is applied when there is a clear pattern in the historical data; owing to its simple application, it has been used in numerous works on the prediction of electricity consumption. Bianco et al. (2009) used an LR model to study the projection of Italy’s electricity consumption [40], while Saab et al. (2001) investigated various univariate modeling approaches to project Lebanon’s monthly electric energy usage [41]. Both studies reported strong results with such statistical models.
The LR model works by fitting a line to a set of data points, with the goal of minimizing the sum of the squared differences between the predicted and actual values of the dependent variable. The slope of the line represents the relationship between the dependent and independent variables, while the intercept represents the value of the dependent variable when the independent variable equals zero. The LR model describes the linear relationship between the previous values $y_i$ and the estimated future value $\hat{y}_{t+1}$, formulated as Equation (8):

$$\hat{y}_{t+1} = \sum_{i=t-(k-1)}^{t} w_i \cdot y_i \tag{8}$$
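As a sketch, the weights $w_i$ of Equation (8) can be fitted by ordinary least squares; the paper trains the LR model as a single dense layer, which is equivalent up to a bias term.

```python
import numpy as np

def fit_linear_ar(y, k=96):
    """Least-squares fit of Equation (8): learn weights w so that the
    previous k values linearly predict the next one (no intercept)."""
    X = np.stack([y[t - k:t] for t in range(k, len(y))])  # lagged windows
    targets = y[k:]                                       # next values
    w, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return w

y = np.cumsum(np.random.rand(500))   # synthetic toy consumption series
w = fit_linear_ar(y, k=96)
y_hat_next = y[-96:] @ w             # forecast the next value
```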

5.2. LSTM

The LSTM technique is a type of Recurrent Neural Network (RNN). In contrast to standard neural networks, RNNs [42] can process data sequences, i.e., data that must be read together in a precise order to have meaning. This ability comes from the RNN architecture, which lets each unit receive, in addition to the input at each instant of time, the activation value from the previous instant. Because they preserve information from earlier steps, these earlier temporal instants provide a certain amount of “memory”. Consequently, they possess a memory cell, which maintains state over time [43]. Figure 7 illustrates an overview of the simple LSTM model.
As noted in Section 4, the LSTM model [44] can remove or add information to the cell state, deciding which information flows through the network. Different from the $M_{\text{LSTM}}$ model, the LSTM model has just one LSTM layer, with input sequence $\mathbf{x}$. Therefore, the hidden state $h$ and cell state $c$ of the $i$th LSTM cell are calculated as in Equation (9).
$$(h_i, c_i) = C(h_{i-1}, c_{i-1}, x_i) \tag{9}$$
In the experiment, we compare the performance of the proposed model with the univariate and multivariate LSTM models. The univariate LSTM takes the first input (i) described in Section 4.1, and the multivariate LSTM takes both those inputs.

5.3. Bidirectional LSTM

Bi-LSTM is also an RNN. It utilizes information in both the forward and backward directions during the training phase [45]. The fundamental principle of the Bi-LSTM model is that it examines a given sequence from both the front and the back, using one LSTM layer for forward processing and another for backward processing. The network can thus record the evolution of energy consumption drawing on both its history and its future within the training window [46]. This bidirectional processing is achieved by duplicating the hidden layers of the LSTM: one set of layers processes the input sequence in the forward direction and another set processes it in the reverse direction. As illustrated in Figure 8, the hidden state $h_i^f$ and the cell state $c_i^f$ of the $i$th forward LSTM cell are calculated similarly to Equation (9). By contrast, each cell in the backward LSTM takes the following hidden state $h_{i+1}^b$, the following cell state $c_{i+1}^b$, and $x_i$ as input. Therefore, the hidden state $h_i^b$ and cell state $c_i^b$ of the $i$th backward LSTM cell are calculated as in Equation (10).
$$(h_i^b, c_i^b) = C(h_{i+1}^b, c_{i+1}^b, x_i) \tag{10}$$
After both forward and backward LSTM cells are computed, the hidden states of the two directions can be concatenated or otherwise combined to obtain the output; a common combination uses a sigmoid function, as noted in Figure 8. The output is fed into the dense layer to obtain the final prediction. As with the LSTM model, we also compare the proposed model with univariate and multivariate Bi-LSTM variants. A Keras sketch of this benchmark follows.
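This is a minimal Keras sketch of the one-unit Bi-LSTM benchmark described in Section 3.2; the framework and training details are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# One-unit Bi-LSTM benchmark: a single bidirectional LSTM layer
# (forward + backward pass) followed by a dense prediction layer.
bilstm = models.Sequential([
    layers.Input(shape=(96, 2)),            # k = 96 time steps, 2 features
    layers.Bidirectional(layers.LSTM(1)),   # duplicated forward/backward layers
    layers.Dense(1),                        # final prediction
])
bilstm.compile(optimizer="adam", loss="mae")  # loss is an assumption
```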

5.4. Support Vector Machine

Support Vector Machines (SVMs) are supervised machine learning models used for both classification and regression tasks. In this paper, an SVM with a Radial Basis Function (RBF) kernel is used for the regression task. This kind of SVM model utilizes the RBF kernel to transform the input space into a higher-dimensional feature space, enabling the SVM to learn nonlinear decision boundaries. The RBF kernel is defined in Equation (11).

$$K(x, x') = \exp\left(-\gamma \,\| x - x' \|^2\right) \tag{11}$$

where $x$ and $x'$ are input data points, $\| \cdot \|$ denotes the Euclidean distance, and $\gamma$ is a parameter that controls the width of the Gaussian curve; higher values of $\gamma$ result in more localized and complex decision boundaries. Furthermore, the decision function of the SVM with the RBF kernel can be represented as in Equation (12).

$$f(x) = b + \sum_{i=1}^{D} \alpha_i \, y_i \, K(x, x_i) \tag{12}$$

where $x$ is the input data point, $b$ is the bias term, $\alpha_i$ is the Lagrange multiplier associated with the $i$th support vector, $y_i$ is the corresponding label, $K(x, x_i)$ is the RBF kernel function, and the summation runs over all $D$ support vectors.
The SVM with the RBF kernel aims to find the optimal hyperplane that maximizes the margin while allowing some errors; in the regression setting used here, an ε-insensitive band around the regression function plays the role of this margin. The RBF kernel enables the SVM to capture complex, nonlinear patterns in the data by mapping them to a higher-dimensional feature space. The model is trained by solving a quadratic programming problem to find the Lagrange multipliers $\alpha_i$ and the bias term $b$ that define the decision function.
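For concreteness, a scikit-learn sketch of this benchmark follows; the library choice and all hyperparameter values are our assumptions, and the arrays are synthetic placeholders.

```python
import numpy as np
from sklearn.svm import SVR

# RBF-kernel support vector regression on flattened input windows
# of shape (samples, k * n); here k = 96 steps and n = 2 features.
X = np.random.rand(500, 96 * 2)   # placeholder training windows
y = np.random.rand(500)           # placeholder targets

svr = SVR(kernel="rbf", gamma="scale", C=1.0, epsilon=0.1)
svr.fit(X, y)
y_pred = svr.predict(X[:5])       # forecasts for the first five windows
```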

6. Experiment

6.1. Metrics

To evaluate performance, a model is tested by producing a set of predictions $\hat{Y} = \{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_D\}$ and comparing it with the set of known actual values $Y = \{y_1, y_2, \ldots, y_D\}$, where $D$ is the size of the test set. Three common metrics are used to compare the overall distance between these two sets: the mean absolute percentage error ($MAPE$), the normalized root mean squared error ($NRMSE$), and the R-squared score ($R^2$ score).
MAPE. As shown in Equation (13), $MAPE$ is calculated by taking the absolute difference between the predicted and actual values, dividing it by the actual value, and averaging these ratios over the entire dataset. The result is a single number representing the average percentage difference between the predicted and actual values. The smaller the $MAPE$ value, the better the model’s performance.

$$MAPE = \frac{100\%}{D} \sum_{i=1}^{D} \left| \frac{\hat{y}_i - y_i}{y_i} \right| \tag{13}$$
NRMSE. The Normalized Root Mean Squared Error ($NRMSE$) is a metric used to evaluate the accuracy of a prediction model. It measures the normalized average magnitude of the residuals, or errors, between the predicted and actual values, as shown in Equation (14).

$$NRMSE = \frac{\sqrt{\frac{1}{D} \sum_{i=1}^{D} (\hat{y}_i - y_i)^2}}{y_{max} - y_{min}} \tag{14}$$

where $\hat{y}_i$ and $y_i$ denote the predicted and actual values, the term $(\hat{y}_i - y_i)^2$ is the squared residual between them, and $y_{max}$ and $y_{min}$ are the maximum and minimum of the actual values, respectively. The smaller the $NRMSE$ value, the better the model’s performance.
$R^2$ score. The $R^2$ score, also known as the coefficient of determination, is a statistical measure indicating the proportion of the variance in the dependent variable that is predictable from the independent variables in a regression model. It is typically used to evaluate the fitness of a regression model, as formulated in Equation (15).

$$R^2 = 1 - \frac{\sum_{i=1}^{D} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{D} (y_i - y^*)^2} \tag{15}$$

where $y^*$ is the mean of the actual values. In essence, the $R^2$ score measures how well the regression model fits the data and provides an assessment of its predictive performance. A higher $R^2$ score indicates a better fit and stronger explanatory power.
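The three metrics translate directly into NumPy; the following helpers are our transcription of Equations (13)–(15), with $MAPE$ returned on the (0, 1) scale used in Table 2.

```python
import numpy as np

def mape(y_true, y_pred):
    """Equation (13), as a fraction in (0, 1) rather than a percentage."""
    return np.mean(np.abs((y_pred - y_true) / y_true))

def nrmse(y_true, y_pred):
    """Equation (14): RMSE normalized by the range of the actual values."""
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    return rmse / (y_true.max() - y_true.min())

def r2_score(y_true, y_pred):
    """Equation (15): coefficient of determination."""
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```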

6.2. Results and Discussion

This study experimentally assesses the effectiveness of the proposed model by answering two research questions: (i) what is the general performance of the proposed model under the two preprocessing strategies, and (ii) how does it compare with the other competitive models?

6.2.1. General Performance

In the first part of the experiments, $M_{\text{LSTM}}$ is trained and evaluated with two data preprocessing strategies: with a labelled time field and without a labelled time field. Figure 9 shows the results on the test sets of Dataset $S_1$, Dataset $S_2$, and Dataset $S_3$. Two settings are considered: (i) a sufficient training set and (ii) a lack of training data.
For the first setting (i), the model is sufficiently trained with data from 2017 and 2018 and evaluated on the 2019 data of Dataset $S_1$ and Dataset $S_2$. Figure 9 shows that the model achieves better performance on all three metrics with the labelled time field. The results are similar under the same settings for the other models; details are provided in Table 2 and the line plot in Figure 10. Hence, the models can learn and extract more valuable features when trained with the appropriate data preprocessing strategy.
For the second setting (ii), the model is trained only with data from 2018 and evaluated on the 2019 data of Dataset $S_3$. Figure 10 shows that the model predicts and matches the actual values well, as evidenced by its superior fit compared with the other models in Figures 9 and 10. These findings suggest that the proposed preprocessing method is effective, particularly when only a limited amount of training data is available.

6.2.2. Comparison of Different Models

In the second part of the experiments, we compare the performance of $M_{\text{LSTM}}$ with that of the other competitive models (LSTM, Bi-LSTM, LR, and SVM) on the three datasets in terms of the three metrics. Two sets of performance metrics are presented: one set includes time-label information ($MAPE_t$, $NRMSE_t$, $R_t^2$ score), whereas the other does not ($MAPE$, $NRMSE$, $R^2$ score). Table 2 presents the evaluation results, and Figure 11 shows the $MAPE$ error of the models with and without labelled time.
In general, the results indicate that models using labelled time information perform better than the same models without it; for example, on Dataset $S_1$, all five models obtain lower error values with labelled time than without it. Among the different models, $M_{\text{LSTM}}$ with labelled time information performs best overall, with the lowest $MAPE_t$ and $NRMSE_t$ and the highest $R_t^2$ score in most cases. To summarize, the proposed preprocessing method with time information tends to improve model performance, and the $M_{\text{LSTM}}$ model using time-label information performs best in general. The labelled time field provides useful information for predicting energy consumption in peak and non-peak periods. These findings suggest that considering time information helps to accurately predict the target variable in the studied datasets.

7. Conclusions

In conclusion, this work presents a method for pre-processing data and a model for accurately predicting energy consumption in commercial buildings, specifically focusing on buildings on the Hawthorn and Wantirna campuses. The proposed pre-processing method effectively improves the accuracy of energy consumption prediction, even when training data are limited. The results demonstrate the applicability of the proposed method and model for accurately predicting energy consumption in various commercial buildings.
The proposed model, denoted $M_{\text{LSTM}}$, achieved the lowest $MAPE$ values of 0.159, 0.139, and 0.072 for Dataset $S_1$, Dataset $S_2$, and Dataset $S_3$, respectively. This achievement is crucial for effective energy management and conservation in commercial buildings. The practicality of this approach extends to other commercial buildings with similar energy consumption patterns, making it a viable solution for energy management in the commercial building sector. Visualizations were also provided to aid in understanding the data patterns and trends in the model predictions. Additionally, further research can explore the effectiveness of the proposed pre-processing method and models in predicting energy consumption for different types of buildings or larger datasets. Exploring alternative techniques, such as seasonal decomposition or time-series analysis, for incorporating time information into the models could also yield valuable insights. These advancements in energy consumption forecasting contribute to significant cost savings and environmental benefits in commercial buildings.

Author Contributions

Individual Contribution: Conceptualization, T.N.D., G.S.T., M.S., S.M. and A.S.; Methodology, T.N.D., G.S.T. and M.S.; Software, T.N.D. and G.S.T.; Validation, M.S., A.S. and S.M.; Formal analysis, G.S.T., M.S., A.S. and S.M.; Investigation, G.S.T., M.S., S.M. and A.S.; Resources, M.S., A.S. and S.M.; Data curation, G.S.T., T.N.D. and M.S.; Writing—original draft preparation, T.N.D. and G.S.T.; Writing—review and Editing, M.S., S.M. and A.S.; Visualization, G.S.T., T.N.D. and M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Khalil, M.; McGough, A.S.; Pourmirza, Z.; Pazhoohesh, M.; Walker, S. Machine learning, deep learning and statistical analysis for forecasting building energy consumption—A systematic review. Eng. Appl. Artif. Intell. 2022, 115, 105287.
2. Agrawal, R.; Bhagia, S.; Satlewal, A.; Ragauskas, A.J. Urban mining from biomass, brine, sewage sludge, phosphogypsum and e-waste for reducing the environmental pollution: Current status of availability, potential, and technologies with a focus on LCA and TEA. Environ. Res. 2023, 224, 115523.
3. Alok, S.; Ruchi, A.; Samarthya, B.; Art, R. Rice straw as a feedstock for biofuels: Availability, recalcitrance, and chemical properties. Biofuels Bioprod. Biorefining 2017, 12, 83–107.
4. IEA. Clean Energy Transitions in Emerging and Developing Economies; IEA: Paris, France, 2021.
5. Shin, S.-Y.; Woo, H.-G. Energy consumption forecasting in Korea using machine learning algorithms. Energies 2022, 15, 4880.
6. Özbay, H.; Dalcalı, A. Effects of COVID-19 on electric energy consumption in Turkey and ANN-based short-term forecasting. Turk. J. Electr. Eng. Comput. Sci. 2021, 29, 78–97.
7. Ji, Y.; Lomas, K.J.; Cook, M.J. Hybrid ventilation for low energy building design in south China. Build. Environ. 2009, 44, 2245–2255.
8. Manu, S.; Shukla, Y.; Rawal, R.; Thomas, L.E.; De Dear, R. Field studies of thermal comfort across multiple climate zones for the subcontinent: India Model for Adaptive Comfort (IMAC). Build. Environ. 2016, 98, 55–70.
9. Delzendeh, E.; Wu, S.; Lee, A.; Zhou, Y. The impact of occupants’ behaviours on building energy analysis: A research review. Renew. Sustain. Energy Rev. 2017, 80, 1061–1071.
10. Itzhak, N.; Tal, S.; Cohen, H.; Daniel, O.; Kopylov, R.; Moskovitch, R. Classification of univariate time series via temporal abstraction and deep learning. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17–20 December 2022; pp. 1260–1265.
11. Ibrahim, M.; Badran, K.M.; Hussien, A.E. Artificial intelligence-based approach for univariate time-series anomaly detection using hybrid CNN-BiLSTM model. In Proceedings of the 2022 13th International Conference on Electrical Engineering (ICEENG), Cairo, Egypt, 29–31 March 2022; pp. 129–133.
12. Hu, M.; Ji, Z.; Yan, K.; Guo, Y.; Feng, X.; Gong, J.; Zhao, X.; Dong, L. Detecting anomalies in time series data via a meta-feature based approach. IEEE Access 2018, 6, 27760–27776.
13. Niu, Z.; Yu, K.; Wu, X. LSTM-based VAE-GAN for time-series anomaly detection. Sensors 2020, 20, 3738.
14. Warrick, P.; Homsi, M.N. Cardiac arrhythmia detection from ECG combining convolutional and long short-term memory networks. In Proceedings of the 2017 Computing in Cardiology (CinC), Rennes, France, 24–27 September 2017; pp. 1–4.
15. Karim, F.; Majumdar, S.; Darabi, H.; Harford, S. Multivariate LSTM-FCNs for time series classification. Neural Netw. 2019, 116, 237–245.
16. Gasparin, A.; Lukovic, S.; Alippi, C. Deep learning for time series forecasting: The electric load case. CAAI Trans. Intell. Technol. 2021, 7, 1–25.
17. Wei, N.; Li, C.; Peng, X.; Zeng, F.; Lu, X. Conventional models and artificial intelligence-based models for energy consumption forecasting: A review. J. Pet. Sci. Eng. 2019, 181, 106187.
18. Kim, Y.; Son, H.G.; Kim, S. Short term electricity load forecasting for institutional buildings. Energy Rep. 2019, 5, 1270–1280.
19. Chitalia, G.; Pipattanasomporn, M.; Garg, V.; Rahman, S. Robust short-term electrical load forecasting framework for commercial buildings using deep recurrent neural networks. Appl. Energy 2020, 278, 115410.
20. Pinto, T.; Praça, I.; Vale, Z.; Silva, J. Ensemble learning for electricity consumption forecasting in office buildings. Neurocomputing 2021, 423, 747–755.
21. Pallonetto, F.; Jin, C.; Mangina, E. Forecast electricity demand in commercial building with machine learning models to enable demand response programs. Energy AI 2022, 7, 100121.
22. Skomski, E.; Lee, J.Y.; Kim, W.; Chandan, V.; Katipamula, S.; Hutchinson, B. Sequence-to-sequence neural networks for short-term electrical load forecasting in commercial office buildings. Energy Build. 2020, 226, 110350.
23. Dagdougui, H.; Bagheri, F.; Le, H.; Dessaint, L. Neural network model for short-term and very-short-term load forecasting in district buildings. Energy Build. 2019, 203, 109408.
24. Khan, Z.A.; Hussain, T.; Ullah, A.; Rho, S.; Lee, M.; Baik, S.W. Towards efficient electricity forecasting in residential and commercial buildings: A novel hybrid CNN with a LSTM-AE based framework. Sensors 2020, 20, 1399.
25. Karijadi, I.; Chou, S.Y. A hybrid RF-LSTM based on CEEMDAN for improving the accuracy of building energy consumption prediction. Energy Build. 2022, 259, 111908.
26. Hwang, J.; Suh, D.; Otto, M.O. Forecasting electricity consumption in commercial buildings using a machine learning approach. Energies 2020, 13, 5885.
27. Fernández-Martínez, D.; Jaramillo-Morán, M.A. Multi-step hourly power consumption forecasting in a healthcare building with recurrent neural networks and empirical mode decomposition. Sensors 2022, 22, 3664.
28. Jozi, A.; Pinto, T.; Marreiros, G.; Vale, Z. Electricity consumption forecasting in office buildings: An artificial intelligence approach. In Proceedings of the 2019 IEEE Milan PowerTech, Milan, Italy, 23–27 June 2019; pp. 1–6.
29. Mariano-Hernández, D.; Hernández-Callejo, L.; Solís, M.; Zorita-Lamadrid, A.; Duque-Pérez, O.; Gonzalez-Morales, L.; Alonso-Gómez, V.; Jaramillo-Duque, A.; Santos García, F. Comparative study of continuous hourly energy consumption forecasting strategies with small data sets to support demand management decisions in buildings. Energy Sci. Eng. 2022, 10, 4694–4707.
30. Divina, F.; Torres, M.G.; Vela, F.A.G.; Noguera, J.L.V. A comparative study of time series forecasting methods for short term electric energy consumption prediction in smart buildings. Energies 2019, 12, 1934.
31. Johannesen, N.J.; Kolhe, M.; Goodwin, M. Relative evaluation of regression tools for urban area electrical energy demand forecasting. J. Clean. Prod. 2019, 218, 555–564.
32. Singhal, R.; Choudhary, N.; Singh, N. Short-term load forecasting using hybrid ARIMA and artificial neural network model. In Advances in VLSI, Communication, and Signal Processing: Select Proceedings of VCAS 2018; Springer: Singapore, 2020; pp. 935–947.
33. Li, K.; Zhang, T. Forecasting electricity consumption using an improved grey prediction model. Information 2018, 9, 204.
34. del Real, A.J.; Dorado, F.; Duran, J. Energy demand forecasting using deep learning: Applications for the French grid. Energies 2020, 13, 2242.
35. Fathi, S.; Srinivasan, R.S.; Kibert, C.J.; Steiner, R.L.; Demirezen, E. AI-based campus energy use prediction for assessing the effects of climate change. Sustainability 2020, 12, 3223.
36. Khan, S.U.; Khan, N.; Ullah, F.U.M.; Kim, M.J.; Lee, M.Y.; Baik, S.W. Towards intelligent building energy management: AI-based framework for power consumption and generation forecasting. Energy Build. 2023, 279, 112705.
37. Athiyarath, S.; Paul, M.; Krishnaswamy, S. A comparative study and analysis of time series forecasting techniques. SN Comput. Sci. 2020, 1, 175.
38. Noor, R.M.; Yik, N.S.; Kolandaisamy, R.; Ahmedy, I.; Hossain, M.A.; Yau, K.L.A.; Shah, W.M.; Nandy, T. Predict arrival time by using machine learning algorithm to promote utilization of urban smart bus. Preprints.org 2020, 2020020197.
39. Ciampiconi, L.; Elwood, A.; Leonardi, M.; Mohamed, A.; Rozza, A. A survey and taxonomy of loss functions in machine learning. arXiv 2023, arXiv:2301.05579.
40. Bianco, V.; Manca, O.; Nardini, S. Electricity consumption forecasting in Italy using linear regression models. Energy 2009, 34, 1413–1421.
41. Saab, S.; Badr, E.; Nasr, G. Univariate modeling and forecasting of energy consumption: The case of electricity in Lebanon. Energy 2001, 26, 1–14.
42. Yuan, X.; Li, L.; Wang, Y. Nonlinear dynamic soft sensor modeling with supervised long short-term memory network. IEEE Trans. Ind. Inform. 2019, 16, 3168–3176.
43. Durand, D.; Aguilar, J.; R-Moreno, M.D. An analysis of the energy consumption forecasting problem in smart buildings using LSTM. Sustainability 2022, 14, 13358.
44. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
45. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610.
46. Le, T.; Vo, M.T.; Vo, B.; Hwang, E.; Rho, S.; Baik, S.W. Improving electric energy consumption prediction using CNN and Bi-LSTM. Appl. Sci. 2019, 9, 4237.
Figure 1. Common categories of forecasting models in the literature.

Figure 2. Location of Hawthorn Campus and Wantirna Campus in Metropolitan Melbourne, Victoria, Australia.

Figure 3. Electricity accumulation in every 15 min at Hawthorn Campus—ATC Building.

Figure 4. Average electricity accumulation in every 1 h at Hawthorn Campus—ATC Building in 2018. As can be seen, electricity consumption peaks between 8 a.m. and 9 p.m.

Figure 5. Workflow of the proposed model.

Figure 6. Illustration of the $i$th LSTM cell at layer $m$.

Figure 7. Overview of the LSTM model.

Figure 8. Overview of the Bi-LSTM model.

Figure 9. Comparison of $M_{\text{LSTM}}$ trained with labelled time and without labelled time: (a) $MAPE$ error on the scale (0, 1); (b) $NRMSE$ error; (c) $R^2$ score. Note that a higher $R^2$ score indicates better model performance.

Figure 10. Comparison of $M_{\text{LSTM}}$ with labelled time ($M_{\text{LSTM}}^t$), $M_{\text{LSTM}}$ without labelled time, and the other models in the case of limited training data (Dataset $S_3$ in 2019). The time step is 7 days.

Figure 11. Bar chart comparing $M_{\text{LSTM}}$ with labelled time ($M_{\text{LSTM}}^t$), $M_{\text{LSTM}}$ without labelled time, and the other models with and without labelled time in the case of limited training data (Dataset $S_3$ in 2019).
Table 1. Summary of different forecasting models for commercial buildings.

| No. | Forecasting Model | Year | Country | Forecast Horizon | Ref. | MAPE | Normalised RMSE | MAE |
|---|---|---|---|---|---|---|---|---|
| 1 | ANN model with external variables (NARX) | 2019 | Korea | hour ahead | [18] | 1.69% | — | 85.44 |
| 2 | Long Short-Term Memory network with attention (LSTM) | 2020 | USA | hour ahead | [19] | 5.96% | — | 7.21 |
| 3 | AdaBoost.R2 | 2021 | Portugal | hour ahead | [20] | 5.34% | — | — |
| 4 | Support Vector Machine (SVM) | 2022 | Ireland | hour ahead | [21] | 5.3% | 3.82 | 11.94 kW |
| 5 | Seq2seq RNN | 2020 | USA | hour ahead | [22] | — | — | 3.74 kW |
| 6 | Bayesian regularized (BR) (12 inputs) | 2019 | Canada | hour ahead | [23] | 1.83% | — | 105.03 kW |
|  | Levenberg–Marquardt (LM) (12 inputs) | 2019 | Canada | hour ahead | [23] | 1.82% | — | 104.21 kW |
| 7 | Hybrid convolutional neural network (CNN) with an LSTM autoencoder (LSTM-AE) | 2020 | Korea | hour ahead | [24] | 0.76% | 0.47 | 0.31 |
| 8 | Hybrid Random Forest (RF) and LSTM based on Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) | 2022 | USA | hour ahead | [25] | 5.33% | 0.57 | 0.43 |
| 9 | Seasonal autoregressive integrated moving average (SARIMAX) | 2020 | Korea | day ahead | [26] | 27.15% | — | 557.6 kW |
| 10 | Gated Recurrent Unit (GRU) | 2022 | Spain | day ahead | [27] | 7.86% | — | 156.11 |
| 11 | Hybrid Neural Fuzzy Inference System (HyFIS) | 2019 | Portugal | day ahead | [28] | 8.71% | — | — |
|  | Wang and Mendel’s Fuzzy Rule Learning Method (WM) | 2019 | Portugal | day ahead | [28] | 8.58% | — | — |
|  | Genetic fuzzy system for fuzzy rule learning based on the MOGUL methodology (GFS.FR.MOGUL) | 2019 | Portugal | day ahead | [28] | 9.87% | — | — |
| 12 | XGBoost | 2022 | Spain | day ahead | [29] | — | — | 8.83 |
Table 2. Comparison of $M_{\text{LSTM}}$ with competitive models with and without labelled time. $MAPE_t$, $NRMSE_t$, and $R_t^2$ score denote metrics for models trained with labelled time; $MAPE$, $NRMSE$, and $R^2$ score denote metrics for models trained without labelled time. $MAPE_t$ and $MAPE$ are rescaled to the range from 0 to 1. Better values are marked in bold.

| Dataset | Model | $MAPE_t$ | $NRMSE_t$ | $R_t^2$ Score | $MAPE$ | $NRMSE$ | $R^2$ Score |
|---|---|---|---|---|---|---|---|
| Dataset $S_1$ | $M_{\text{LSTM}}$ | **0.159** | **0.071** | **0.543** | **0.251** | 0.084 | **0.347** |
|  | LSTM | 0.260 | 0.090 | 0.252 | 0.270 | 0.097 | 0.135 |
|  | Bi-LSTM | 0.324 | 0.097 | 0.180 | 0.343 | 0.134 | 0.115 |
|  | Linear Regression | 0.285 | 0.094 | 0.191 | 0.311 | 0.093 | 0.216 |
|  | SVM | 0.239 | 0.074 | 0.490 | 0.258 | **0.081** | 0.310 |
| Dataset $S_2$ | $M_{\text{LSTM}}$ | **0.139** | **0.034** | **0.831** | **0.176** | **0.044** | **0.719** |
|  | LSTM | 0.385 | 0.075 | 0.156 | 0.495 | 0.091 | −0.230 |
|  | Bi-LSTM | 0.352 | 0.087 | 0.143 | 0.476 | 0.093 | −0.258 |
|  | Linear Regression | 0.167 | **0.034** | 0.827 | 0.345 | 0.069 | 0.291 |
|  | SVM | 0.208 | 0.058 | 0.248 | 0.388 | 0.078 | 0.122 |
| Dataset $S_3$ | $M_{\text{LSTM}}$ | **0.072** | **0.130** | **0.506** | **0.099** | **0.134** | **0.399** |
|  | LSTM | 0.184 | 0.146 | 0.378 | 0.431 | 0.144 | 0.395 |
|  | Bi-LSTM | 0.449 | 0.331 | −2.196 | 0.814 | 0.295 | −1.536 |
|  | Linear Regression | 0.312 | 0.182 | 0.035 | 0.390 | 0.138 | 0.141 |
|  | SVM | 0.192 | 0.136 | 0.404 | 0.397 | 0.147 | 0.341 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
