1. Introduction
Complex energy systems that support global urbanization trends play an important role in the definition, implementation, and evaluation of future smart cities. Within the built environment, modern sensing, computing, communication, and control technologies improve the operation of various systems and the well-being of inhabitants. Of particular relevance is an electrical grid that supplies reliable, clean, and cost-effective energy, catering to ever-increasing urban needs. The main target of our work is developing improved load-forecasting models for medium and large commercial buildings, which play a determining role as consumers, prosumers, or balancing entities for grid stability. Statistical-learning algorithms, such as classical and deep neural networks, represent a prime example. The black-box models obtained through such techniques have proven able to accurately capture the underlying patterns and trends driving energy consumption, which can be used to forecast load profiles and improve high-level control strategies. In this way, significant economic benefits, through cost savings, and environmental benefits, through limited use of scarce energy resources, can be achieved.
An often-quoted figure in the scientific literature [1] places building energy use at almost 40% of primary energy use in many developed countries, with an increasing trend. In modern buildings, a centralized software solution, often denoted as a Building Management System (BMS), collects all relevant data streams originating in building subsystems and provides the means for intelligent algorithms to act on the processed data. Energy-relevant data, generated through heterogeneous instrumentation networks, are subsequently leveraged to control relevant energy parameters in daily operation. For existing older buildings, new technology can be used to upgrade legacy devices, such as electrical meters, and integrate them into wired or wireless communication networks in a cost-effective manner. Finally, the resulting preprocessed energy-measurement time series serve as input for accurate models of energy prediction and control.
Statistical-learning algorithms have become robust and well-adopted in the last few years through concentrated efforts in both research and industry. This was stimulated by data availability and exponentially increasing computing resources at lower costs, including cloud systems. Neural networks are one example of such algorithms, offering good results in many application areas, including the modelling and prediction of energy systems. This holds for both classification and regression tasks, where the objective is to predict an output numerical value of interest. Deep-learning networks are highly intricate neural networks with many hidden layers that are able to learn patterns of increasing complexity in input datasets. Initially deployed through industry-driven initiatives in the areas of multimedia processing and translation systems, other technical applications currently stand to benefit from the availability of open-source algorithms and tools. For time series and sensor data, a particular type is gaining traction with the research community, namely, sequence models based on recurrent neural networks that can capture long-term dependencies in input examples. In the machine-learning (ML) taxonomy for smart buildings described in Reference [2], our work fits within the area of using ML to estimate aspects related to either energy or devices, in particular, energy profiling and demand estimation.
Within this approach, large commercial buildings provide operators/owners with economic incentives and returns on investment for energy-efficiency projects, where small percentage gains on large absolute values of energy use become more attractive. An equally large market exists for improving energy-forecasting accuracy in the residential sector; it is, however, more fragmented, and the incentives to deploy such approaches have to come from the energy supplier or through large-scale public programs.
The main contributions of the paper can thus be summarised as:
illustrating a deep-learning approach to model large-commercial-building electrical-energy usage as an alternative to conventional modelling techniques;
presenting an experimental case study in which the chosen deep-learning techniques enable reliable forecasting of building energy use;
analyzing the results in terms of accuracy metrics, both absolute and relative, which enables replicable results and comparison with other related research.
Additional contributions that extend the previous conference paper [3] are summarised as follows. As the main goal of the extended version, we provide new experiment results for recurrent neural-network modelling of large-commercial-building energy consumption. These are further analyzed, also taking into account several performance metrics and computational aspects. More technical clarifications regarding the methods and the data-processing and -modelling pipeline are also included. Significant revisions and extensions were carried out in the related-work section, for a more timely and focused state of the art to frame the work, as well as in other parts of the paper, to improve readability and allow interested researchers to replicate the results on the neural-network architectures presented in an energy-management system.
We briefly outline the structure of the paper.
Section 2 discusses timely recent publications that deal with models of electrical-energy consumption of direct relevance to the previously stated contribution areas. In Section 3, sequence models are introduced as computational-intelligence methods for this task. Most notably, Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units are used. The selected deep-learning methods are applied as a case study in Section 4 on publicly available data stemming from four large commercial buildings. The salient findings are also discussed in detail, including computational aspects pertaining to the architectures of the learning algorithms that were implemented. Section 5 concludes the paper with regard to the applicability of the derived black-box models for in situ electrical-load forecasting.
4. Experiment Evaluation for Building-Energy Time-Series Forecasting
We first present the preprocessed time series for the buildings that make up our study, consisting of hourly active-power measurements from the electrical meters. Figure 3 presents the input data for the buildings in Zurich and Chicago, while Figure 4 presents the input data for the building in New York and the second building in Chicago. All are from academic campuses; in terms of absolute electrical-energy load, New York uses the most energy, followed by Zurich and Chicago in a similar range, with Chicago 2 having the lowest energy needs.
The classical LSTM algorithm was implemented for experiment assessment and forecasting. The base network architecture consisted of one sequence input layer, one hidden LSTM layer with a varying number of units, one fully connected layer, and one regression output layer for the resulting forecasted value. Each network has a different configuration given by the number of hidden units in the LSTM layer. Based on this, the following network structures were implemented; in total, 20 networks were trained, validated, and evaluated: C-0, C-1, C-2, C-3, C-4, Z-0, Z-1, Z-2, Z-3, Z-4, C2-0, C2-1, C2-2, C2-3, C2-4, NY-0, NY-1, NY-2, NY-3, NY-4. The identifier before the dash reflects the analyzed building: C stands for the Chicago building, Z for the Zurich building, C2 for the second building from Chicago, and NY for the New York building. The number after the building identifier marks the complexity of the network in terms of LSTM hidden units implemented in the hidden layer, increasing from five hidden units for ID 0, to 25 hidden units for ID 1, 50 hidden units for ID 2, 100 hidden units for ID 3, and, finally, 125 hidden units for ID 4.
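As an illustration, the layer stack described above can be expressed in a few lines of MATLAB. This is a minimal sketch, assuming a univariate input series; the variable names are ours and not taken from the original implementation.

```matlab
% Base architecture: sequence input -> LSTM -> fully connected -> regression
numFeatures    = 1;    % one input channel: hourly active power
numResponses   = 1;    % one forecasted output value
numHiddenUnits = 50;   % varied per configuration: 5, 25, 50, 100, 125

layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits)
    fullyConnectedLayer(numResponses)
    regressionLayer];
```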
The optimization method of choice for training the neural networks was the Adaptive Moment Estimation (ADAM) algorithm [25]. This is an often-used general method for first-order gradient-based optimization of stochastic objective functions with momentum. One of the key optimization parameters for neural-network training is the learning rate. It implements a trade-off between the speed of training and its precision, in the sense that a large learning rate can in many situations miss the optimal value of the objective metric. In our case, the learning rate was established through an empirical adjustment process. The initial value was set at 0.1, followed by subsequent decreases by a factor of 0.2 every 200 iterations. From observing the performance over multiple initial training runs, a second parameter, the number of training iterations, was set at 200.
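A hedged sketch of the corresponding training configuration, using the MATLAB trainingOptions interface, is given below; it assumes that one epoch corresponds to one training iteration, as is the case when training on a single long sequence, and XTrain/YTrain are hypothetical names for the training inputs and targets.

```matlab
% ADAM optimizer with the empirically chosen learning-rate schedule:
% start at 0.1 and multiply by 0.2 every 200 iterations.
options = trainingOptions('adam', ...
    'MaxEpochs', 200, ...
    'InitialLearnRate', 0.1, ...
    'LearnRateSchedule', 'piecewise', ...
    'LearnRateDropFactor', 0.2, ...
    'LearnRateDropPeriod', 200, ...
    'Verbose', false, ...
    'Plots', 'training-progress');

net = trainNetwork(XTrain, YTrain, layers, options);
```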
Figure 5, Figure 6, Figure 7 and Figure 8 present the prediction response of the LSTM neural network with 50 hidden units in the LSTM layer versus the real data for each of the four buildings. The plots show that the forecasting performance of the LSTM models on the testing datasets was very good.
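The forecasts plotted in these figures are step-by-step predictions. A sketch of how such a forecast can be produced in MATLAB is shown below; predictAndUpdateState keeps the LSTM state between steps, and XTrain, XTest, and YTest are hypothetical variable names for the training inputs, testing inputs, and testing targets.

```matlab
% Warm up the network state on the training data, then forecast step by step.
net = predictAndUpdateState(net, XTrain);
numSteps = numel(XTest);
YPred = zeros(1, numSteps);
for i = 1:numSteps
    [net, YPred(i)] = predictAndUpdateState(net, XTest(i));
end

% Compare the forecast against the measured load.
plot(YTest); hold on; plot(YPred, '--');
legend('Measured load', 'LSTM forecast');
xlabel('Hour'); ylabel('Active power');
```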
To evaluate the prediction models, three performance metrics were used: Mean Squared Error (MSE), Root MSE (RMSE), and Mean Absolute Percentage Error (MAPE). In addition, we included the Coefficient of Variation (CV) of the RMSE based on the evaluation discussed in [9]. The metrics were computed according to the following equations:

$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2$$

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}$$

$$\mathrm{MAPE}=\frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|$$

$$\mathrm{CV(RMSE)}=\frac{\mathrm{RMSE}}{\bar{y}}\times 100$$

where $n$ represents the number of samples, $y_i$ and $\hat{y}_i$ stand for the actual data and the predicted data, respectively, and $\bar{y}$ is the mean of the actual data.
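A direct MATLAB implementation of these four metrics is straightforward; in the sketch below, y and yhat are hypothetical vectors of actual and predicted values.

```matlab
% Evaluation metrics for a forecast yhat against measurements y.
err  = y - yhat;
n    = numel(y);
MSE  = mean(err.^2);                 % mean squared error
RMSE = sqrt(MSE);                    % root mean squared error
MAPE = 100 * mean(abs(err ./ y));    % mean absolute percentage error
CV   = 100 * RMSE / mean(y);         % CV(RMSE), relative to the mean load
```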
A summary of the experiment results is listed in Table 1, Table 2, Table 3 and Table 4. These include the error metrics MSE, RMSE, CV(RMSE), and MAPE, as well as the training/computation time for the previously defined RNN LSTM networks. The figures in bold mark the best results achieved.
The main outcome of the learning models, as reflected by the aggregate performance metrics from Table 1, Table 2, Table 3 and Table 4, pinpoints the best network architecture for all four testing scenarios to be the one with 50 LSTM units in the hidden layer.
Figure 9 also presents the evolution of the MAPE and the computation time for each building over each defined network. Computation time increases linearly with the number of neurons in the LSTM layer, which is helpful when planning further tests. Up to the configuration that achieves the best MAPE, with 50 neurons in the LSTM layer, the added computation time is a good compromise; beyond this point, computation time increases considerably without better performance. The reference computer includes a 2.6 GHz seventh-generation Intel i5 CPU, 8 GB of RAM, and a solid-state disk, with Windows 10 as the operating system. This is the baseline for the reported computation/training time in all test cases. Algorithms were written and run under MATLAB, version R2018a, which provides a robust high-level technical programming environment. We leveraged built-in functions from the machine- and deep-learning toolboxes, as well as dedicated scripts for data ingestion and preprocessing.
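The reported training times can be reproduced with simple wall-clock timing around each training run. The sketch below assumes a hypothetical helper layersFor(h) that builds the layer stack with h hidden LSTM units.

```matlab
% Measure training time per network configuration on the reference machine.
hiddenUnits = [5 25 50 100 125];
trainTime = zeros(size(hiddenUnits));
for k = 1:numel(hiddenUnits)
    layers = layersFor(hiddenUnits(k));   % hypothetical helper, see above
    tic;
    trainNetwork(XTrain, YTrain, layers, options);
    trainTime(k) = toc;                   % seconds
end
```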
Table 5 provides a summary of the statistical indicators for the comparable relative-accuracy metrics, CV(RMSE) and MAPE, over the tested scenarios: four buildings with five networks each. The reported statistical indicators are: minimum, maximum, mean, standard deviation, skewness, and kurtosis.
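These indicators can be obtained directly from the vectors of per-network metric values; the sketch below uses a hypothetical vector mapeAll holding the 20 MAPE values, and relies on skewness and kurtosis from the Statistics and Machine Learning Toolbox.

```matlab
% Summary statistics over the 20 tested scenarios (4 buildings x 5 networks).
stats = [min(mapeAll), max(mapeAll), mean(mapeAll), std(mapeAll), ...
         skewness(mapeAll), kurtosis(mapeAll)];
```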
Performance evolution during training for the Zurich and New York buildings is graphically depicted in Figure 10 and Figure 11. The graphs show the gradual decrease in the RMSE metric over 200 iterations for the worst- and best-case scenarios. In the first case, the worst performance is seen on the Z-4 network, which tended to overfit the data given its more complex structure. As such, the RMSE presented multiple increases and decreases over the training horizon. In the positive case, Z-2, we observed convergence in just under 100 training iterations, as compared to the 120 iterations needed by the denser network. The corresponding networks are shown for the New York dataset. Different behavior was observed in this case, with the best-case convergence of the RMSE being slower at the beginning, with a gentler slope over the first training steps.