1. Introduction
As of 2020, buildings accounted for 36% of global energy demand and 37% of total global energy-related CO2 emissions. Residential buildings had the highest share at 22%, non-residential buildings had 8%, and the remaining 6% was related to the construction industry [1]. Hence, buildings are at the frontier of energy research and will be key to realizing future smart grids and greener, sustainable energy systems. This can be achieved by developing efficient, smart, and adaptive buildings that go beyond the conventional role of passive energy consumers. Moreover, future smart buildings must adapt to the rising complexity of modern power grids, which is driven by the stochasticity of renewable energy generation and the decentralization of power supply. To accomplish these goals, investment in building energy efficiency has been increasing rapidly. As of 2020, total investment had reached $180 billion, an increase of 39.5% since 2015 [1]. Realizing these targets relies on several overlapping paradigms, as shown in Figure 1. Specifically, these include increasing buildings' renewable power generation [2], using more efficient electrical products [3], improving their thermal design [4], and activating their role in the energy market [5]. All these measures are employed together so that future buildings are better aligned with global sustainability goals, consume less energy, and act as smart, active players in the energy market.
Building energy management systems (BEMS) are integral to realizing smart buildings. Built on advanced energy management, BEMS tie together all the other paradigms [6]. A BEMS must deliver high performance while conforming to occupants' comfort levels. It needs to determine the best schedule for certain appliances [7] or efficient operational set points [8]. Moreover, a BEMS controls how the thermal mass of a building is used to store energy [9]. For example, it decides when to pre-cool the building or pre-heat the building or water during a surplus of renewable energy. Finally, a BEMS manages buying and selling energy to and from the grid to minimize costs and increase profits [10]. Achieving these targets requires a deep understanding of each of the paradigms illustrated in Figure 1. It also requires an understanding of how these paradigms can be optimized and integrated using state-of-the-art BEMS.
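The scheduling decision mentioned above can be made concrete with a minimal sketch. It assumes a single shiftable appliance with a constant power draw that must run for consecutive hours under a known hourly tariff. The function name, tariff, and appliance parameters are hypothetical. A DRL-based BEMS would instead learn such decisions from interaction, typically under uncertain prices and alongside comfort and storage decisions, but the sketch shows the cost trade-off being optimized.

```python
from typing import Sequence, Tuple


def cheapest_start(prices: Sequence[float], duration_h: int, power_kw: float) -> Tuple[int, float]:
    """Return (start_hour, cost) minimizing the cost of one uninterrupted appliance run.

    prices: hypothetical hourly tariff (currency units per kWh).
    duration_h: number of consecutive hours the appliance must run.
    power_kw: constant power draw while running, so each hour uses power_kw kWh.
    """
    if not 0 < duration_h <= len(prices):
        raise ValueError("duration must fit within the price horizon")
    best_start, best_cost = 0, float("inf")
    # Brute force over every feasible start hour; ties keep the earliest start.
    for start in range(len(prices) - duration_h + 1):
        cost = sum(prices[start:start + duration_h]) * power_kw
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost


if __name__ == "__main__":
    # Hypothetical 24-hour time-of-use tariff: cheap at night, expensive in the evening peak.
    tariff = [0.10] * 7 + [0.20] * 10 + [0.35] * 4 + [0.15] * 3
    start, cost = cheapest_start(tariff, duration_h=2, power_kw=1.5)
    print(f"start at hour {start}, cost {cost:.2f}")
```

Brute force is adequate here because a single appliance over a 24-hour horizon has at most 24 candidate start times. The search space grows combinatorially once several interacting appliances, storage, and comfort constraints are scheduled jointly, and this growth is one motivation for the learning-based approaches reviewed here.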
State-of-the-art BEMS are at the core of future smart buildings. The inception and wide deployment of advanced BEMS is now imminent. This is driven by recent breakthroughs in artificial intelligence (AI) and machine learning (ML) [11], the rapid development of IoT devices and sensing technology [12], and low-cost controllers with high computational power. Previously, real-time BEMS were operated using conventional control methods, such as rule-based methods and proportional-integral-derivative (PID) control. Both methods are static and rely on heuristic rules or hand-tuned parameters. However, conventional BEMS face many limitations and challenges related to building modelling, satisfying and controlling multiple objectives, system generalization, and scaling.
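For reference, the sketch below shows a conventional discrete-time PID loop of the kind mentioned above, regulating a toy first-order room model. The gains, the setpoint, the room parameters, and the model itself are illustrative assumptions. The point is that the gains are fixed in advance and do not adapt to prices, forecasts, or occupancy, which is what makes such controllers static.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Discrete-time PID controller whose gains are fixed in advance."""
    kp: float
    ki: float
    kd: float
    dt: float  # time step in hours
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(hours: float = 48.0, dt: float = 0.1) -> float:
    """Run the PID loop on a toy room model and return the final indoor temperature.

    Assumed toy model: dT/dt = (T_out - T) / tau + g * u, with heating power u clipped to [0, u_max].
    """
    t_out, tau, g, u_max = 5.0, 5.0, 1.0, 10.0  # degC, h, degC/(h*kW), kW (illustrative values)
    setpoint, temp = 21.0, 15.0
    pid = PID(kp=2.0, ki=0.5, kd=0.0, dt=dt)
    for _ in range(int(round(hours / dt))):
        u = min(max(pid.step(setpoint, temp), 0.0), u_max)  # actuator saturation
        temp += dt * ((t_out - temp) / tau + g * u)  # forward-Euler room update
    return temp


if __name__ == "__main__":
    print(f"indoor temperature after 48 h: {simulate():.2f} degC")
```

The controller only tracks the setpoint. It does not take into account when energy is cheap or when renewable generation is available, so exploiting such signals requires additional hand-written rules or a different control paradigm.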
As shown in Figure 2, constructing precise models of a building's thermal characteristics is a complex task. It involves stochastic elements that arise from the assumed schedules of appliance usage, human presence, and various other elements in buildings [13]. The problem is amplified when real-time modelling is required for the BEMS to make decisions within a sub-second time window, especially for local control problems. This type of micro-real-time control will be more critical in future smart homes. In such homes, supervisory control with larger time windows might fail to achieve optimal energy savings because of many fast-changing variables, such as energy prices, renewable energy availability, and human behaviour. In this case, the complex, physics-based white-box model cannot be used because of its high computational time and memory requirements [14]. Such models are generally better suited to building design standards and optimal building design. Hence, simpler models, such as gray-box or black-box models, are required [15]. In addition, smart buildings may include renewable energy, storage systems, electric vehicles, smart appliances, and heating, ventilating, and air-conditioning (HVAC) systems. These buildings must coordinate energy management among these different entities, where the operational constraints and objectives are multidimensional and intricate, with high-dimensional solution spaces [16,17].
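As an illustration of the gray-box approach, a single-zone resistance-capacitance (1R1C) model treats the building as one thermal capacitance C (kWh/K) coupled to the outdoor air through one thermal resistance R (K/kW). The model is C dT/dt = (T_out - T)/R + Q, where Q is the net heating power (kW). The minimal sketch below, written under these assumptions, simulates that model with a forward-Euler step. It then re-identifies R and C from the simulated data by linear least squares. The parameter values and the synthetic weather and heating signals are hypothetical.

```python
from typing import Tuple

import numpy as np


def simulate_1r1c(r: float, c: float, t_out: np.ndarray, q: np.ndarray, t0: float, dt: float) -> np.ndarray:
    """Forward-Euler simulation of C dT/dt = (T_out - T) / R + Q; returns len(q) + 1 temperatures."""
    temps = np.empty(len(q) + 1)
    temps[0] = t0
    for k in range(len(q)):
        temps[k + 1] = temps[k] + dt / c * ((t_out[k] - temps[k]) / r + q[k])
    return temps


def fit_1r1c(temps: np.ndarray, t_out: np.ndarray, q: np.ndarray, dt: float) -> Tuple[float, float]:
    """Estimate (R, C) from dT_k = a * (T_out,k - T_k) + b * Q_k, where a = dt/(RC) and b = dt/C."""
    features = np.column_stack([t_out - temps[:-1], q])
    targets = np.diff(temps)
    (a, b), *_ = np.linalg.lstsq(features, targets, rcond=None)
    return b / a, dt / b  # R = b / a, C = dt / b


def demo(r_true: float = 2.0, c_true: float = 10.0, dt: float = 0.25) -> Tuple[float, float]:
    """Generate one synthetic, noise-free week of data and re-identify R and C."""
    rng = np.random.default_rng(0)
    n = int(24 * 7 / dt)
    hours = np.arange(n) * dt
    t_out = 5.0 + 5.0 * np.sin(2 * np.pi * hours / 24)  # hypothetical outdoor temperature, degC
    q = rng.uniform(0.0, 5.0, n)  # hypothetical heating power, kW
    temps = simulate_1r1c(r_true, c_true, t_out, q, t0=18.0, dt=dt)
    return fit_1r1c(temps, t_out, q, dt)


if __name__ == "__main__":
    r_hat, c_hat = demo()
    print(f"R = {r_hat:.3f} K/kW, C = {c_hat:.3f} kWh/K")
```

Because the discretized model is linear in a = dt/(RC) and b = dt/C, identification reduces to a single least-squares solve. This is what makes such gray-box models cheap enough for real-time use compared with detailed white-box simulation. With real, noisy measurements, the estimates would be approximate rather than exact.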
Furthermore, system objectives related to comfort levels, energy savings, cost savings, and health and environmental goals introduce an additional layer of objectives and constraints that the BEMS must handle. In addition, conventional modelling and control methods in BEMS rely on hand-engineered designs for each specific building and case. As a result, such systems cannot easily be generalized to other buildings, which generally have many different and unique designs [18]. Finally, extending the scope of a BEMS to multiple buildings increases the complexity of its design and further raises the cost and time required for wide-scale deployment [19].
Owing to the previous challenges and the rise of ML methods in many fields since 2011, there has been increasing interest in exploiting the benefits of such data-driven methods in BEMS. Applications range from improved forecasting of energy demand [20] and human behaviour in buildings [21] to anomaly detection [22] and classification of different building states [21]. In particular, reinforcement learning (RL) [23] and deep reinforcement learning (DRL) have attracted significant interest in the last five years, in both academia and industry. This interest stems from their potential to address the challenges shown in Figure 2. One notable application of RL-based BEMS was DeepMind's work on Google's data centres, where cooling costs were reduced by up to 40% [24]. This highlights that advanced energy management offers more than marginal gains, with large potential savings. This is particularly true for HVAC loads, which are the largest energy consumers in buildings. As discussed by Yu et al. [6], the ways in which DRL can help solve the challenges of BEMS can be summarized as follows:
Real-time modelling: By utilizing neural networks, complex environments can be modelled at a low computational cost once training is complete, since inference requires only forward propagation. Furthermore, DRL can operate model-free or learn a representation of the environment without explicitly knowing its detailed model. This is similar to how humans navigate the real world: they do not know its detailed physical model but learn how to interact with it.
Handling multiple objectives: In DRL, multiple objectives can be maximized while satisfying constraints through careful engineering of the objective (reward) function.
Generalization: Similar to human intelligence, operating model-free and learning to maximize rewards in a stochastic environment can increase the generalizability of BEMS.
Scaling: Given the previous points, BEMS can be scaled in real time using a less complex approach. This avoids relying on complex modelling and hand-engineered solutions for optimizing multiple variables.
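The following is a minimal sketch of the reward engineering described under "Handling multiple objectives". It assumes a weighted sum of an energy-cost term and a comfort-violation penalty for a single control step. The comfort band, the weights, and the function name are illustrative choices rather than a formulation taken from the reviewed studies.

```python
from typing import Tuple


def bems_reward(energy_kwh: float, price_per_kwh: float, indoor_temp: float,
                comfort_band: Tuple[float, float] = (20.0, 24.0),
                w_cost: float = 1.0, w_comfort: float = 0.5) -> float:
    """Negative weighted sum of energy cost and comfort violation for one control step."""
    low, high = comfort_band
    cost = energy_kwh * price_per_kwh
    # Degrees outside the comfort band (zero when the temperature lies inside it).
    violation = max(low - indoor_temp, 0.0) + max(indoor_temp - high, 0.0)
    return -(w_cost * cost + w_comfort * violation)


if __name__ == "__main__":
    print(bems_reward(2.0, 0.25, 22.0))  # comfortable but more energy
    print(bems_reward(1.0, 0.25, 26.0))  # cheaper but 2 degC too warm
```

Raising w_comfort makes the learned policy more conservative about comfort at the expense of cost. Hard constraints are often approximated in this way by a large penalty weight, although other formulations, such as constrained or safe RL, also exist.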
It must be noted that utilizing neural networks and optimizing multiple objectives are not unique to DRL. These features are, to some extent, shared with other control paradigms, such as model predictive control (MPC) and PID control. Svetozarevic et al. provided a comprehensive comparative discussion of these paradigms and their characteristics [25]. Finally, the main objective of this work is to systematically investigate and summarize recent advances in DRL applications in BEMS, with a building-type-centric discussion.
1.1. Related Work
Owing to the promising benefits of data-driven methods, such as ML and DRL, recent literature has devoted considerable attention to reviewing their various applications in BEMS-related areas. Each review focused on one or more aspects of BEMS, and some considered broader applications in energy systems beyond buildings, as summarized in Table 1.
First, recent reviews place a strong emphasis on explaining and comparing methods, owing to the large variations among RL/DRL models. This emphasis on method-centric classification is clearly evident in the detailed work of Wang and Hong, who classified the reviewed studies according to the varying internal configurations of RL/DRL. For each part of the algorithm, they further investigated the parameters and configurations chosen in recent research, such as algorithms, states, actions, rewards, and implementation environment [26].
Table 1. Recent reviews related to building energy modelling and BEMS.
| Ref | Year Published | Coverage Span (Up to) | Review Objectives | Main Methods | BEMS-Centric | Multiple-Building-Type-Centric |
|---|---|---|---|---|---|---|
| [27] | 2022 | ~2021 | Extensive RL-centric review related to BEMS; primary emphasis on the classification, types, and applications of RL algorithms. | RL, DRL | Yes | No |
| [21] | 2022 | ~2021 | System-level review of the integration of learning methods for realizing intelligent building management. | ML, RL, DRL | Yes | No |
| [28] | 2021 | ~2020 | Broad review of RL applications in energy systems. | RL, DRL | No | No |
| [6] | 2021 | ~2020 | System-scale-centric review of the application of DRL in BEMS. | DRL | Yes | No |
| [29] | 2020 | ~2019 | Energy management of AC in buildings via computational intelligence (CI) algorithms. | White box, black box, gray box | No | No |
| [30] | 2020 | ~2019 | General CI algorithms for BEMS of residential homes. | Mathematical optimization, GA, ML, RL, MPC | Yes | No |
| [31] | 2020 | ~2019 | Modelling occupant behaviour. | Rule-based, stochastic, and data-driven | No | No |
| [32] | 2020 | ~2019 | General DRL applications in power systems. | DRL | No | No |
| [26] | 2020 | ~2019 | RL-based BEMS applications, with a detailed intrinsic review of the different RL methods, variations, configurations, and simulation vs. real-environment analysis. | RL | Yes | No |
| [33] | 2019 | 2019 | RL-based occupant comfort control in buildings. | RL | Yes | No |
| [34] | 2019 | ~2018 | RL-based BEMS review focusing on method variations, with an energy-appliance-centric perspective. | RL | Yes | No |
| This study | - | ~2022 | DRL-based BEMS review centred on applications in different building types, focusing on the details of promising recent advances and limitations. | DRL | Yes | Yes |
1.2. Motivation and Objectives
Recent reviews of RL applications for BEMS are detailed and informative, but they do not include building-type-centric discussions. Different building types, such as residential, commercial, and others, are expected to have distinct characteristics, challenges, limitations, and potential, particularly from a data-driven perspective. Thus, a building-type-centric overview of this research landscape would be highly useful for researchers in this field, particularly in terms of challenges and opportunities. Furthermore, this field is growing rapidly, as discussed in Section 2. Identifying the most recent creative and innovative research directions in this promising area can therefore contribute to the existing literature. Accordingly, the present study has the following objectives:
To systematically review recent advances and innovations in data-driven DRL-based BEMS.
To conduct a building-type-centric review and analysis.
To discuss the limitations and challenges related to each building type.
To identify promising directions for DRL-based BEMS research, especially from a building-type-centric perspective.
The remainder of this paper is organized as follows. Section 2 discusses the basic classification of RL and DRL methods, as well as the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) approach. Section 3 discusses recent research for each building type. Section 4 discusses the main observations and future research recommendations. Finally, Section 5 presents the conclusions of the study.
3. Recent Advances in DRL-Based BEMS per Building Type
The DRL-based BEMS field has grown rapidly in the last five years. It has produced numerous creative ideas and innovations for integrating advanced data-driven control methods into the development of fully enabled smart buildings. Although residential buildings are by far the largest energy consumers, other building types, such as offices and educational buildings, are also being investigated. It is therefore useful to identify the different research directions, types of applications, and innovative ideas being implemented for each building type. This is particularly important from a data-centric perspective, because training and using data-driven methods require large amounts of data, especially when deploying such systems in the real world. Therefore, it is worth understanding how this challenge is addressed in different types of buildings.
3.1. Residential Buildings
As previously mentioned, residential buildings account for almost 22% of global energy demand, making them the most energy-consuming building type. In contrast, some types of commercial buildings are primarily used by employees during the day. Depending on the work culture of the country and on post-COVID-19 working patterns, their demand can be better aligned with solar energy availability, which makes them more grid-friendly. Residential energy demand, on the other hand, increases after work hours and peaks in the evening, when solar generation is low. This can be a sensitive period for grid operators, who must compensate for the resulting supply-demand imbalance. This may be one reason why residential buildings receive significant attention in this field, particularly from a demand response (DR) perspective.
Table 4 presents recent research conducted on DRL-based BEMS in residential buildings.
As indicated in Table 4, DRL-based BEMS research can consider one or multiple buildings. These studies either measure the performance of DRL algorithms under different scenarios or test a multi-agent DRL approach that manages energy flow across multiple buildings or zones simultaneously [75]. Glatt et al. built on the decentralized actor-critic reinforcement learning algorithm MARLISA by integrating a centralized critic (MARLISA_DACC). This critic coordinates the control of energy storage systems (ESS), such as batteries and thermal energy storage (TES), among various buildings in a manner that enhances DR performance and reduces carbon footprints [76]. As the number of residential buildings increases, multi-agent approaches can learn to share information and act cooperatively, improving BEMS performance over single-agent approaches. Ahrarinouri et al. utilized a distributed reinforcement learning energy management (DRLEM) approach to control the energy flow of combined heat and power (CHP) units and boilers among multiple buildings. The connection between the agents reduced heat losses and costs by 18.3% and 3.3%, respectively, and increased energy sharing during peak time by 23% [73]. Hence, distributed and multi-agent approaches are likely to be key methods in further research on residential neighbourhoods and buildings. In these settings, renewable energy and electric vehicles can be coordinated among different houses to reduce renewable energy curtailment and maximize profits in peer-to-peer local energy trading hubs.
The large variety of appliances and BEMS targets presents major opportunities for deploying DRL-based BEMS in residential buildings. There is also high potential for DR, because residential buildings contribute to both morning and evening peak demands [79] and detached houses have space for renewable energy integration. In the recently reviewed literature in Table 4, 77% of the studies considered DR by integrating the varying electricity price into the objectives of the control logic. Of the studies, 42% and 45% also considered the integration of ESS and PV renewable energy, respectively. Furthermore, 74% of the studies applied the BEMS to HVAC systems, 32% included different types of shiftable or fixed appliances, and 19% investigated the inclusion of electric vehicles (EVs).
Table 4 classifies the general BEMS target systems in residential buildings. Table 5 provides a detailed list of the appliances that were directly controlled, apart from HVAC systems and TES; notable appliances include dishwashers, washing machines, and EVs. The diversity of BEMS targets in residential buildings is considerably high, giving this building type unique potential and research perspectives. This is probably because homeowners may have higher relative demand flexibility than office occupants, for example, owing to direct cost benefits. In contrast, the office working environment tends to involve higher levels of stress. Individuals also gain no direct benefit from compromising their comfort, because the savings accrue to business owners.
Regarding DRL methods, the most widely used algorithm was DQN, while DDQN and DDPG were also prominent. Many studies compared different DRL algorithms to determine which method best achieves the system objectives. Others investigated hybrid methods. Examples include the mixed deep reinforcement learning (MDRL) introduced by Huang et al. [58], which combines DQN and DDPG for enhanced performance, and the RLMPC implemented by Arroyo et al. [68], which combines MPC and DDQN to leverage the benefits of both methods. Two recent and distinctive DRL variants were also observed. First, the actor-critic using Kronecker-factored trust region (ACKTR) approach applied by Chu et al. [53] increased sample efficiency and integrated discrete and continuous action spaces, exhibiting high potential. Second, Zenginis et al. combined clustering with DDPG. Their method partitions the training data into homogeneous subsets using a clustering method and trains a separate agent on each subset, achieving higher energy efficiency than a single agent [54]. Although these methods are not directly related to the building type, highlighting them can help researchers choose recent advanced DRL implementations based on their application and building type. Finally, deep neural networks (DNNs) have been the most commonly used value/policy function approximators, whereas very few studies used other architectures, such as convolutional neural networks (CNNs). In general, owing to the mixed types of state variables, DNNs can effectively map state-action spaces and can be considered the default approximator. However, this also indicates potential for testing other architectures.
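Because DQN and DDQN dominate the residential studies, the sketch below contrasts how the two compute their bootstrapped learning targets for a batch of transitions. It assumes that the Q-values of the next states have already been produced by an online network and a target network, represented here simply as arrays. The networks, replay buffer, and training loop are omitted.

```python
import numpy as np


def dqn_targets(rewards: np.ndarray, next_q_target: np.ndarray, dones: np.ndarray,
                gamma: float = 0.99) -> np.ndarray:
    """DQN: bootstrap with the maximum of the target network's next-state Q-values."""
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)


def ddqn_targets(rewards: np.ndarray, next_q_online: np.ndarray, next_q_target: np.ndarray,
                 dones: np.ndarray, gamma: float = 0.99) -> np.ndarray:
    """Double DQN: the online network selects the next action; the target network evaluates it."""
    best_actions = next_q_online.argmax(axis=1)
    evaluated = next_q_target[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated


if __name__ == "__main__":
    rewards = np.array([1.0, 0.0])
    dones = np.array([0.0, 1.0])  # the second transition ends an episode
    q_online = np.array([[1.0, 3.0], [2.0, 0.0]])
    q_target = np.array([[2.0, 1.0], [5.0, 4.0]])
    print(dqn_targets(rewards, q_target, dones, gamma=0.5))
    print(ddqn_targets(rewards, q_online, q_target, dones, gamma=0.5))
```

Decoupling action selection from action evaluation reduces the overestimation bias that arises when the same maximization is used for both.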
The primary objectives of most BEMS are typically the same: maintaining comfort and reducing energy use or cost. Energy and cost are highly correlated, so a reduction in one usually implies a reduction in the other. However, studies report their primary improvements in terms of either energy or cost depending on whether DR is considered; when it is, an analysis of energy prices is included. Other secondary objectives highlighted by some studies include health factors, such as indoor CO2 levels, and peak demand reduction. Moreover, the reported improvements usually refer to the gain over a rule-based baseline controller or to a comparison between single- and multi-agent methods. Hence, high energy-saving percentages do not necessarily reflect the overall energy reduction, which makes it difficult to cross-compare studies based on these numbers. Nevertheless, they highlight the energy-saving advantages of DRL in residential buildings. Finally, real implementations are significantly lacking. Only three of the 31 studies (<10%) validated their models outside a simulation environment, which highlights a clear research gap.
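To illustrate why headline percentages are hard to cross-compare, the sketch below converts a saving reported relative to a single subsystem's baseline (for example, HVAC) into a saving relative to total building energy. The function name and the numbers in the example (a 25% HVAC saving and a 40% HVAC share) are hypothetical.

```python
def whole_building_saving(subsystem_saving: float, subsystem_share: float) -> float:
    """Fraction of total building energy saved when only one subsystem improves.

    subsystem_saving: fractional saving relative to that subsystem's own baseline.
    subsystem_share: the subsystem's fraction of total building energy in the baseline.
    """
    if not 0.0 <= subsystem_share <= 1.0:
        raise ValueError("share must be in [0, 1]")
    return subsystem_saving * subsystem_share


if __name__ == "__main__":
    # Hypothetical: a 25% HVAC saving in a building where HVAC is 40% of total use.
    print(f"{whole_building_saving(0.25, 0.40):.0%} of total building energy")
```

In this hypothetical case, the same 25% figure corresponds to a 10% whole-building reduction. It would correspond to a different value in a building with a different load mix.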
3.2. Office Buildings
Office buildings offer a limited variety of controllable appliances apart from HVAC systems. Because they are mainly located in cities and high-rise buildings, they also have limited space for installing renewable energy. Keeping these facts in perspective, recent applications of DRL-based BEMS in offices are listed in Table 6.
The number of recent office building studies is comparable to that for residential buildings. The first difference appears in the appliance categories, which are primarily HVAC systems. Only two studies investigated EVs. A few other control targets were also investigated, such as TES, blind control, light control, and personal comfort systems (PCSs). HVAC systems are the main energy consumers in offices and offer flexibility and energy-saving potential. Beyond HVAC control, recent innovations can be found in BEMS integrated with EVs. Liang et al. included EVs in a BEMS that utilized a safe reinforcement learning (SRL) strategy to mitigate the effect of extreme weather events and increase building resilience and proactivity [102]. Meanwhile, Mbuwir et al. used EVs as the core and only BEMS target in an office building. They showed that a multi-agent DRL approach can achieve a promising saving potential of up to 62.5% [96]. Furthermore, only 24% of the studies considered DR systems, and only 21% included PV or energy storage systems.
The DRL methods utilized in office buildings are more diverse than those observed in residential buildings and include the asynchronous advantage actor-critic (A3C) and the soft actor-critic (SAC). These methods have shown improved performance over rule-based baseline controllers. One downside, however, is that they have not always been compared with other DRL algorithms. Zhang et al. introduced a branching dueling Q-network (BDQN) and compared it with both PPO and SAC. They reported that BDQN converged to the highest reward, followed by SAC. These two algorithms exhibited higher sample complexity than PPO and trained more slowly, but they consumed less memory. This result reveals a trade-off among training time, RAM usage, and reward. In another comparison, Lee et al. compared the advantage actor-critic (A2C) with PPO and found that A2C performed better [90]. Such comparisons are useful in guiding researchers to choose the best subset of algorithms from the current large pool of DRL algorithms.
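The dueling idea behind BDQN can be illustrated by its aggregation step. The sketch below uses the common dueling formulation Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). For action branching, it uses one advantage head per action dimension (for example, one per HVAC actuator), with all heads sharing the same state value. The arrays stand in for network outputs. This is a sketch of the general technique under these assumptions, not the network used by Zhang et al.

```python
from typing import List

import numpy as np


def dueling_q(value: np.ndarray, advantages: np.ndarray) -> np.ndarray:
    """Combine V(s) of shape (batch,) with A(s, a) of shape (batch, n_actions) into Q(s, a)."""
    return value[:, None] + advantages - advantages.mean(axis=1, keepdims=True)


def branching_dueling_q(value: np.ndarray, branch_advantages: List[np.ndarray]) -> List[np.ndarray]:
    """Return one Q-array per action branch; all branches share the same state value V(s)."""
    return [dueling_q(value, adv) for adv in branch_advantages]


def greedy_joint_action(branch_q: List[np.ndarray]) -> np.ndarray:
    """Pick each sub-action independently; the result has shape (batch, n_branches)."""
    return np.stack([q.argmax(axis=1) for q in branch_q], axis=1)


if __name__ == "__main__":
    value = np.array([1.0])
    # Hypothetical heads: a 2-level damper branch and a 3-level fan-speed branch.
    branches = [np.array([[0.0, 2.0]]), np.array([[3.0, 0.0, 0.0]])]
    q = branching_dueling_q(value, branches)
    print(greedy_joint_action(q))
```

With branching, the number of network outputs grows with the sum of the per-branch action counts rather than with their product. This property makes such architectures attractive for buildings with several jointly controlled actuators.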
A critical observation related to office buildings is the significance of indoor thermal comfort for maintaining high worker productivity. This is reflected in four studies that highlighted the reduction in discomfort or temperature violations as a system objective. Because DR is less often included in office BEMS, more studies report energy savings rather than cost savings compared with residential buildings. Finally, only three studies, all conducted by Zhang et al., implemented and validated their models in real systems [106].
3.3. Educational Buildings
As shown in Table 7, recent research on educational buildings mainly concerns either schools or university facilities and laboratories. The BEMS targets are primarily HVAC systems. One study investigated TES control and other ventilation systems by controlling windows and air cleaners. Only two recent works included DR systems with integrated energy storage, mainly TES. In terms of objectives, health was considered by An et al., who deployed DQN to control ventilation in two laboratory rooms to reduce economic loss and PM2.5-related health risks [109]. This offers an interesting co-benefit perspective. It quantifies not only energy and cost reductions but also the impact on human health, and it integrates the findings into the BEMS objective. Furthermore, Chemingui et al. included the reduction of indoor contamination as a core target of their BEMS. They achieved this by optimizing the HVAC system managing 21 zones in a school model. The result was a 44% increase in thermal comfort, a 21% reduction in energy consumption, and a low indoor CO2 concentration [110]. Regarding real implementations, three studies validated their models in real settings: one in a laboratory, one in a university building, and one in a school. Laboratories are suitable for real-system validation, although acquiring data to train the agent can be challenging if such data do not already exist. An et al. first conducted an offline training phase based on an apartment model coupled with particle dynamics for PM2.5 modelling. They then tested the trained agent in a laboratory room with different PM2.5 levels [109]. Schmidt et al. conducted a 43-day experiment in a Spanish school by deploying a BEMS that utilized fitted Q-iteration and a Bayesian regularized neural network coupled with genetic optimization. They found that, when comfort levels were kept similar to those of the reference period, energy consumption decreased by almost 33%. When higher comfort was prioritized, energy consumption increased by only 5% [111].
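Fitted Q-iteration, used by Schmidt et al., is a batch method. It repeatedly regresses Q-values onto bootstrapped targets computed from a fixed set of logged transitions. The sketch below uses ordinary least squares with one linear model per action and a toy two-state batch. The regressor, features, and data are assumptions made for illustration and differ from the Bayesian regularized neural network used in that study.

```python
import numpy as np


def fitted_q_iteration(states: np.ndarray, actions: np.ndarray, rewards: np.ndarray,
                       next_states: np.ndarray, n_actions: int,
                       gamma: float = 0.9, iterations: int = 50) -> np.ndarray:
    """Return weights W of shape (n_features, n_actions) such that Q(s, a) = s @ W[:, a].

    states, next_states: feature matrices of shape (n_samples, n_features).
    actions: integer actions of shape (n_samples,); every action must appear in the batch.
    No terminal states are assumed, so every target bootstraps from the next state.
    """
    weights = np.zeros((states.shape[1], n_actions))
    for _ in range(iterations):
        # Bootstrapped targets computed with the previous Q estimate.
        targets = rewards + gamma * (next_states @ weights).max(axis=1)
        new_weights = np.zeros_like(weights)
        for a in range(n_actions):
            mask = actions == a
            new_weights[:, a], *_ = np.linalg.lstsq(states[mask], targets[mask], rcond=None)
        weights = new_weights
    return weights


if __name__ == "__main__":
    eye = np.eye(2)  # one-hot features for two states
    s_idx = np.array([0, 0, 1, 1])
    acts = np.array([0, 1, 0, 1])
    # Toy deterministic dynamics: action a moves to state a; landing in state 1 gives reward 1.
    w = fitted_q_iteration(eye[s_idx], acts, (acts == 1).astype(float), eye[acts], n_actions=2, gamma=0.5)
    print(np.round(w, 3))  # row = state, column = action
```

Because the method needs only a fixed batch of logged transitions, it can suit settings where data are first collected under an existing control strategy and the learned policy is deployed afterwards.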
Finally, a recent innovative idea introduced by Zhou et al. combines DRL with deep learning for building energy prediction. It was not included in Table 7 because it is only indirectly related to BEMS. They utilized DDPG to add an additional learning layer to an LSTM forecaster, with the agent learning to tune the hyperparameters of the LSTM as new training data arrive. They demonstrated that when the new training data exhibit high variation, prediction accuracy can be increased by up to 23.5% [117].
3.4. Data Centres
As listed in Table 8, few studies have investigated data centres. In these studies, the BEMS does not consider DR, renewable energy, or storage systems and primarily focuses on HVAC systems. In general, the main objective of the BEMS is to lower energy demand while meeting operational constraints. Comfort can be slightly compromised in other building types. In data centres, however, operational constraints are more critical, since violating them can compromise the data centre's main operation.
One unique study by Narantuya et al. utilized a DQN-based multi-agent DRL (mDRL) approach to optimize computational resource allocation in high-performance computing (HPC)/AI systems. Their system was further deployed in real time, reducing task completion time by 20% and energy consumption by 40% [119]. Finally, Biemann et al. conducted a comparative analysis of four DRL methods for controlling a simulated HVAC system of a data centre. Their computational experiments revealed that SAC has exceptionally high sample efficiency, reaching stable performance with 10 times less data than PPO, TRPO, and TD3. Hence, SAC is recommended for future use, particularly in noisy environments. Moreover, all models were reported to achieve an energy reduction of approximately 10% compared with a baseline controller [120].
3.5. Other Commercial Buildings
Finally, Table 9 includes commercial buildings that are not classified as educational buildings, offices, or data centres. These buildings are described as general commercial buildings, storehouses, industrial parks, or a mix of building types (retail and restaurant buildings, offices, and residential buildings) [123,124].
All of the studies listed in Table 9 investigated HVAC systems as the main BEMS target. Two studies also included TES, and one considered WHP and renewable energy inverters. DR systems were included in seven studies, particularly those at larger scales, such as industrial parks or multiple buildings. One notable method was the dueling SAC-based memory-augmented DRL introduced by Zhao et al. to overcome the time-lag limitation of district heating systems in an industrial park. Their methodology reduced energy costs by 2.8% [127]. Furthermore, two multi-agent approaches were observed. First, Fu et al. utilized a multi-agent DRL method to develop a cooling water system control (MA-CWSC). This control sets the frequencies of the cooling tower and cooling water pump in a multi-chiller system. Compared with a single-agent DQN, the proposed model trained faster and had a simpler action space, resulting in an 11.1% energy saving over the rule-based baseline [125]. Second, Yu et al. introduced a multi-agent actor-critic (MAAC) algorithm for a multi-zone HVAC system. Their objective was not only to minimize energy costs but also to reduce the indoor CO2 concentration in the building [16].
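The "simpler action space" of a multi-agent design can be quantified with a short count. Assume n devices (for example, cooling towers and pumps), each with k discrete frequency levels. A single agent choosing the joint action faces k^n options, whereas each of n independent agents faces only k. The device and level counts below are hypothetical.

```python
from typing import List


def joint_action_count(n_devices: int, levels: int) -> int:
    """Size of the joint discrete action space seen by one centralized agent."""
    return levels ** n_devices


def per_agent_action_counts(n_devices: int, levels: int) -> List[int]:
    """Action-space size of each agent when every device has its own agent."""
    return [levels] * n_devices


if __name__ == "__main__":
    n, k = 6, 5  # hypothetical: 6 devices, 5 frequency levels each
    print("single agent:", joint_action_count(n, k))      # 5**6 = 15625 joint actions
    print("multi-agent:", per_agent_action_counts(n, k))  # six agents with 5 actions each
```

The reduction comes at a price. Because each independent agent sees only its own action, the agents must learn to coordinate implicitly. For this reason, multi-agent designs often add information sharing or a centralized critic, as in MARLISA_DACC.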
In terms of secondary objectives, Pigott et al. considered voltage regulation for a simulated IEEE 33-bus system connected to nine buildings. The building types are diverse and include 37 fast-food restaurants, four medium offices, five retail stores, a mall, and 145 residential houses. These models were based on the recent CityLearn framework, a platform dedicated to multi-agent models in smart grids, which contains both building and power-flow models. Using multiple DRL agents, their model nominally reduced under-voltage and overvoltage occurrences by 34% [123]. Moreover, Pinto et al. considered both peak demand and the peak-to-average ratio. These were reduced by 23% and 20%, respectively, using a centralized SAC agent that controlled four different building types (small and medium offices, retail, and restaurant) [129]. Finally, no real-system validation was observed for this building category.
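The two metrics used by Pinto et al. can be computed directly from an aggregate load profile. The following minimal sketch assumes hourly aggregate demand in kW. The function names and load profiles are hypothetical and are not part of CityLearn or any other framework.

```python
from typing import Tuple

import numpy as np


def peak_and_par(load_kw: np.ndarray) -> Tuple[float, float]:
    """Return (peak demand, peak-to-average ratio) of an aggregate load profile."""
    peak = float(np.max(load_kw))
    return peak, peak / float(np.mean(load_kw))


def relative_reduction(baseline: float, controlled: float) -> float:
    """Fractional reduction of a metric relative to its baseline value."""
    return (baseline - controlled) / baseline


if __name__ == "__main__":
    baseline = np.array([2.0, 2.0, 2.0, 6.0])    # hypothetical evening peak
    controlled = np.array([2.5, 2.5, 2.5, 4.5])  # same total energy, flatter profile
    (pb, rb), (pc, rc) = peak_and_par(baseline), peak_and_par(controlled)
    print(f"peak reduction {relative_reduction(pb, pc):.0%}, PAR reduction {relative_reduction(rb, rc):.0%}")
```

When control only shifts load in time, the average demand stays the same and the peak and PAR reductions coincide, as in this hypothetical profile. Different reductions for the two metrics indicate that total consumption also changed.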
5. Conclusions
In this study, a systematic review based on the PRISMA methodology was conducted on DRL-based BEMS research in the context of different building types. Five major building types were identified from a pool of 470 papers: residential buildings, offices, educational buildings, data centres, and other commercial buildings. The main goal was to investigate the relationship between the unique characteristics of each building type and its recent research landscape in the context of DRL-based BEMS. In doing so, unique research directions can be identified more clearly, and innovations applied in one building-type context may prove useful in another. First, residential and office buildings were the most explored building types, and residential buildings are the largest energy consumers of all types. Second, the detailed characteristics of each building type in this context were identified. Examples include the emphasis on reducing discomfort in offices, the lack of DR and renewable energy in data centre research, and the trend toward district-level BEMS for residential buildings. Third, the main challenge related to the need for large amounts of data was discussed; recent studies approached this challenge using transfer learning and data-efficient DRL models. Finally, there is still a clear gap in real implementations and system validations, as only 11% of the recent works have reported them so far.