Model-Free HVAC Control in Buildings: A Review

Panagiotis Michailidis; Iakovos Michailidis; Dimitrios Vamvakas; Elias Kosmatopoulos

doi:10.3390/en16207124

¹ Center for Research and Technology Hellas, 57001 Thessaloniki, Greece

² Department of Electrical and Computer Engineering, Democritus University of Thrace, 67100 Xanthi, Greece

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Energies 2023, 16(20), 7124; https://doi.org/10.3390/en16207124

This article belongs to the Section G: Energy and Buildings

Abstract

The efficient control of HVAC devices in building structures is mandatory for achieving energy savings and comfort. To balance these objectives efficiently, it is essential to incorporate adequate advanced control strategies to adapt to varying environmental conditions and occupant preferences. Model-free control approaches for building HVAC systems have gained significant interest due to their flexibility and ability to adapt to complex, dynamic systems without relying on explicit mathematical models. The current review presents the recent advancements in HVAC control, with an emphasis on reinforcement learning, artificial neural networks, fuzzy logic control, and their hybrid integration with other model-free algorithms. The main focus of this study is a literature review of the most notable research from 2015 to 2023, highlighting the most highly cited applications and their contributions to the field. After analyzing the concept of each work according to its control strategy, a detailed evaluation across different thematic areas is conducted. To this end, the prevalence of methodologies, utilization of different HVAC equipment, and diverse testbed features, such as building zoning and utilization, are further discussed considering the entire body of work to identify different patterns and trends in the field of model-free HVAC control. Last but not least, based on a detailed evaluation of the research in the field, the current work provides future directions for model-free HVAC control considering different aspects and thematic areas.

Keywords:

model-free control; optimal control; HVAC; reinforcement learning; artificial neural networks; fuzzy logic control; energy; comfort

1. Introduction

The United Nations Environment Programme reports that buildings account for 36% of the world’s energy consumption. Notably, integrated Heating, Ventilation, and Air-Conditioning (HVAC) systems form a significant portion of this energy demand [1,2]. These systems play a pivotal role in striking a balance between energy efficiency and ensuring the comfort of occupants. Yet, with escalating energy costs and mounting environmental concerns, the importance of optimizing energy use without compromising indoor comfort is ever more critical [3,4]. In order to satisfy such diverse objectives, the efficient control of HVAC systems potentially represents the most crucial factor in ensuring proper energy conservation, cost reduction, environmental sustainability, and occupant comfort in building structures.

For many years, occupants and building users have relied on simple, manual methods to operate HVAC systems, including adjusting thermostats, opening windows, or switching fans on and off based on immediate comfort needs [5]. However, although these manual approaches provide a direct and tangible way to influence the indoor environment, they often lack optimal energy utilization and the preservation of consistent comfort over time. To this end, the need for a more sophisticated control approach has become clear, fostering the introduction of basic thermostats and timers. These elements allow for preset temperature settings and scheduled operations, offering a degree of autonomous control while maintaining consistent comfort levels [3]. However, such approaches lack adaptability to real-time environmental changes and occupant preferences. Their inflexible settings often lead to energy wastage during unoccupied periods or unexpected weather shifts, necessitating a more advanced control theme [4,6].

In addition to basic automation, rule-based control (RBC) is a prevalent method for operating HVAC systems. The rules are often shaped by empirical observations and expert knowledge, defining conditions and corresponding actions like lowering temperatures during off-hours or adjusting ventilation based on occupancy levels. Inevitably, such control schemes are unable to provide real-time optimal control for HVAC systems, since their operation is subject to a myriad of influences, such as outdoor weather conditions, building occupancy, and equipment variability, which are often uncertain or difficult to capture with a set of predefined rules [7]. Furthermore, the RBC approach generally focuses on maintaining comfort conditions and fails to account for energy efficiency or cost optimality. RBC also does not account for the complicated interactions between different HVAC components, which can lead to suboptimal overall system performance [8,9].

This uncertainty and complexity have posed significant challenges in the efficient operation of HVAC systems and paved the way for the development of algorithm-inspired control strategies, such as intelligent model-based or model-free methodologies, which can adequately handle the inherent complexity and nonlinearity of the thermal dynamics in buildings [2,7,10,11]. Model-based control approaches, such as the well-known model predictive control (MPC) approach, have become increasingly popular in HVAC control due to their ability to handle constraints and optimize over a forecast. While model-based approaches portray a far more sophisticated approach than RBC, their implementation comes with its own challenges. First, such approaches strictly depend on mathematical models of the system to predict future behavior and optimize control actions based on these predictions. The development of accurate mathematical models adds a significant drawback due to their complex and nonlinear characteristics. Moreover, these models are also inadequate for capturing the dynamic nature of numerous significant factors that influence HVAC performance, including unpredictable external conditions like weather and occupancy [8,12,13,14]. Second, methodologies such as MPC may be computationally intensive, especially for large multifunctional systems, as they require solving an optimization problem at each control step. This fact illustrates a significant challenge regarding implementing such schemes for real-time control, particularly in systems with fast dynamics, where control decisions need to be made frequently [9]. To this end, the need for significant computational resources and the complexity of the algorithms have transformed model-based control into a tedious and costly approach, pushing users to maintain simpler control strategies such as RBC for HVAC control operation. 
Furthermore, both RBC and MPC require significant tuning and maintenance efforts to maintain their performance at an optimal level. This need for ongoing manual intervention increases the operational cost and makes these control strategies less suitable for applications where resources for maintenance and tuning are limited [8].

In light of these challenges, model-free control methodologies have emerged as promising alternatives. Model-free control strategies offer several advantages for the control of sophisticated frameworks such as building energy management systems (BEMSs), electric vehicle charging stations (EVCSs), traffic management systems (TMSs), IoT ecosystems, and even autonomous navigation of robotics [15,16,17,18,19,20,21,22,23]. To this end, model-independent control methodologies provide a fruitful control approach for sophisticated autonomous HVAC control. Firstly, such strategies are not strictly dependent on explicit models of the system, eliminating the need for the time-consuming and often inaccurate process of system modeling [24]. This is particularly advantageous in HVAC control, where system dynamics are highly nonlinear, complex, and influenced by numerous external factors, especially in large-scale building structures. Moreover, model-free control algorithms, such as the well-known reinforcement learning and artificial neural network approaches, are adequate for directly learning from data, adapting to changes, and effectively handling complex, nonlinear system dynamics. Such data may be historical or real-time data, fostering continual improvements in control policies over time. Such adaptability allows them to outperform traditional control strategies like RBC in situations where the system’s dynamics are changing or uncertain [25,26]. Additionally, model-free strategies are adequate for optimizing multiple objectives simultaneously, such as energy efficiency, comfort, and equipment lifespan, providing a more holistic approach to HVAC control. They can also handle the interactions between different components of the HVAC system, which may lead to improved overall system performance, e.g., multifunctional HVAC frameworks integrating numerous heterogeneous devices. 
These unique attributes mean that model-free control is a significant focus in the literature on HVAC control. Figure 1 illustrates the aggregated citations per year for model-free HVAC control research approaches for the period 2015–2022, as measured in Scopus, which highlights their importance and potential in the literature.

Figure 1. Number of citations per year (2015–2022) for model-free HVAC control according to Scopus.

However, despite these advantages, model-independent control approaches also come with challenges. The training of model-free algorithms can become data-intensive and computationally demanding, especially for complex systems like HVAC. Often, large amounts of high-quality data, which are difficult to obtain in practice, are required for the learning process [24,27]. Another significant challenge concerns the lack of interpretability or transparency in model-free algorithms. Unlike model-based strategies, which base control actions on a clear mathematical model of the system, the decision-making process is often opaque. This “black box” nature poses difficulties in diagnosing problems, validating the control strategy, or ensuring it will behave safely and correctly under all possible conditions [24]. Lastly, while model-free control strategies are adequate for adapting to potential system alterations, they also need to balance adaptability with stability. Rapid changes in the control policy may lead to erratic system behavior, increased wear and tear on the equipment, and discomfort for the building occupants [27]. Therefore, developing strategies to ensure the stability and robustness of model-free control poses a significant challenge.

1.1. Literature Analysis Approach

In this extensive review, our goal is to delve into academic articles on model-free HVAC control in both commercial and residential settings. We have combed through diverse papers, examining their main ideas, control strategies, applied algorithms, and specific applications. Our approach is methodical, ensuring each selected article receives thorough attention, and includes the following steps:

Criteria for articles: The related articles were selected based on the following themes: reinforcement learning HVAC control in buildings; neural network HVAC control in buildings; fuzzy logic HVAC control in buildings; and hybrid HVAC control in buildings.
Keyword selection: Relevant terms related to our subject were examined in the recent literature. Search phrases included building HVAC reinforcement learning control; building HVAC deep reinforcement learning control; building HVAC artificial neural network control; building HVAC fuzzy logic control; building HVAC hybrid control; and building HVAC model-free control. These terms were chosen considering the unique challenges and facets of HVAC systems in buildings.
Article selection: Our literature search, primarily in Google Scholar and Scopus, led us to numerous articles. After a quick scan of their abstracts, we selected the most relevant ones for our detailed review.
Data collection: We categorized the information in each article, focusing on the method used for HVAC control and the application’s context by considering various factors like its benefits, limitations, and practical implications, especially regarding ideal HVAC control scenarios.
Quality assessment: Each article selected was assessed for quality based on numerous criteria. These criteria included the number of citations of the paper, the scientific contributions of the authors, and the methodologies employed in the research. This helped gauge each article’s relevance and impact.
Data analysis: Our findings were organized into clear categories, allowing for easy comparison and understanding.

To this end, the primary aim of the current review paper is to provide a clear overview of current model-free HVAC control methods in buildings, highlighting both their strengths and areas for potential improvement.

1.2. Previous Literature Works

Numerous noteworthy reviews concerning HVAC control were identified in the literature, encompassing both model-based and model-free control approaches. The authors of the current work thoroughly examined their contributions to the respective literature in order to set an example and inspire others to undertake specific reviews. Notably, Belic et al. [8] presented an examination of the various existing HVAC control approaches and their practical implementations within buildings. Additionally, Gholamzadehmir et al. [28] evaluated the deployment and setup of novel adaptive control strategies while pinpointing unexplored areas in this domain that require additional scrutiny. They also highlighted the drawbacks of model-based control approaches in novel applications. Ref. [29] should also be mentioned, as it provided a thorough and analytical overview of computational intelligence methods for forecasting, fine-tuning, managing, and diagnosing faults in HVAC systems. Moreover, its evaluation of the trends indicated that energy usage reduction was the primary goal in the discussed studies, with enhancing thermal comfort, indoor air quality, and inhabitant preferences being secondary objectives. Ref. [30] should also be highlighted for its perspective on field studies related to thermal comfort in residential structures through the identification of a wide range of comfort temperatures in residences, which were found to be largely influenced by climatic conditions and building functionalities. Last but not least, a recent work was also identified [31], which, similar to the current paper, discussed numerous research papers found in the Scopus database. However, it primarily focused on office structures, highlighting that outdoor weather elements have a more pronounced impact on indoor thermal environments compared to those with air-conditioning.

1.3. Novelty and Contributions

The present review stands out from the aforementioned studies by focusing exclusively on model-free approaches to HVAC control within building structures. The current work analyzes a significant number of related articles (around 350) and chooses the most highly cited ones (number of citations > 50) for further, detailed exploration. This selection encompasses multiple facets of model-free control, including its foundational principles, applied methods, HVAC equipment used, control strategy details, and testbed characteristics. Notably, our scope is broader than most previous studies, capturing the essence of the latest HVAC research in building environments. The current work additionally offers a granular breakdown of various dimensions related to buildings’ HVAC control. It should be mentioned that, at the moment, there are no related reviews in the literature that cover model-free HVAC applications in buildings as a whole; the majority of reviews focus solely on one approach (such as RL, ANNs, or FLC). Moreover, the philosophy of the current effort is distinct since it considers multiple aspects of model-free applications. While the majority of the reviews focus on methodologies, the current work goes further and identifies other aspects in the evaluation section (e.g., the utilization of different single or multi-HVAC equipment in the field) and even highlights the contribution of multi-zone and real-life applications in comparison with single-zone and simulation works, respectively.

Furthermore, our research concerns model-free HVAC control applications from 2015 to 2023, ensuring the identification of recent trends and charting out potential control trajectories. Our selection of highly cited papers (citations > 50) prioritizes citation counts, authors’ contributions, and the adopted research strategies. Measurements, statistics, and diagrams have also been created according to Scopus metrics in order to identify the current status of model-free HVAC control in buildings. By aggregating this information in a systematic manner, the aim of the current work is to provide summarized information about the current state of knowledge in the field. This not only aids in identifying patterns, gaps, or trends in the existing recent literature but also guides the potential reader in navigating the vast expanse of related works. Furthermore, such an approach facilitates a quick comparison across studies, enabling the interested reader to engage in different research approaches and their outcomes.

1.4. Paper Structure

The sections of this paper are structured as follows. In Section 1, the introduction and motivation of the current work are illustrated, whereas in Section 2, a general description of HVAC operations and types is provided. In Section 3, the mathematical background and overview of primary model-free methodologies and their subdivisions in terms of HVAC applications are presented. Section 4 discusses the primary literature works, divided into the aforementioned model-free approaches. More than 70 highly cited papers from the 2015–2023 period, along with their conceptual backgrounds and results according to the different model-free approaches, are highlighted. Section 5 summarizes the features of the relevant papers regarding different thematic areas. Each column represents a different thematic area concerning the primary model-free approach and its agent type, optimization target, HVAC type, building zone type, building testbed type, and building use type, as well as a summary of the achievements and number of citations of each work. Section 6 utilizes information from Section 5 regarding the different thematic areas and presents separate evaluations of the related testbed attributes that characterize the aforementioned HVAC control works. Section 7 presents valuable conclusions from the evaluation of the highly cited HVAC literature works. Figure 2 portrays a visual representation of the paper structure per integrated Section/Chapter.

Figure 2. Visual representation of the architecture of the current work.

2. General Description of HVAC Systems

HVAC Operations and Types

HVAC systems operate on the fundamental principles of thermodynamics and heat transfer. By leveraging processes like conduction, convection, and radiation, these systems either add or remove heat from indoor spaces. Ventilation components ensure proper air circulation, maintaining optimal air quality. Collectively, HVAC systems create a balanced indoor climate, ensuring both comfort and health for occupants [5,32].

More specifically, HVAC systems utilize a medium (like a refrigerant, water, or air) to transport heat and fans or pumps to facilitate the flow of this medium and air. HVAC systems include the following operations:

Cooling Operation: The cooling operation of an HVAC system starts with the compressor, where the refrigerant is pressurized and heated, converting it into a high-pressure, high-temperature gas. This gas then flows through the condenser coils, typically located outside the building. As outdoor air is blown over these coils by a fan, the heat from the refrigerant dissipates into the environment, causing the refrigerant to condense into a high-pressure liquid. This liquid then passes through the expansion valve, where its pressure drops suddenly, leading to a significant decrease in temperature. The cold refrigerant flows into the evaporator coil situated inside the building. As indoor air is circulated over these coils by another fan, the refrigerant absorbs the heat from the air, thereby cooling it. The refrigerant, now warmed, returns to the compressor, and the cycle repeats.
Heating Operation: The heating operation essentially reverses this process. The system extracts heat from the outdoor air even when it is cold, raises its temperature using the compressor, and then transfers this heat indoors through the indoor coil, which acts as the condenser in this mode, thereby warming the interior space.

Several HVAC types are commonly used in various regions and building types [5]. The most common types of HVAC equipment, as denoted in the literature [5], are as follows:

Air-Conditioners (A/C): These are designed to cool the air in a space and include central air-conditioners, window units, or split systems. The control challenge involves precise temperature regulation while optimizing energy consumption, especially for central systems that need to account for the entire building’s thermal dynamics.
Heat Pumps: These pumps provide both heating and cooling by transferring heat energy from one place to another and include types like air-source, ground-source, and water-source pumps. The control challenge usually concerns the optimization of heat transfer, especially during transitional seasons when temperature differences are minimal.
Air-Handling Units (AHUs): These units condition and circulate air as part of an HVAC system and consist of components like blowers, heating or cooling elements, and filters. The control challenge lies in the coordination of these components to ensure optimal air circulation and conditioning while minimizing energy use.
Variable Air-Volume (VAV) Systems: These systems supply variable airflow rates to save energy and better control comfort. In order to potentially optimize their operation, the adjustment of airflow rates in real time based on occupancy and thermal demand is necessary.
Radiant Heating Devices: These devices transfer thermal energy for space heating through connections to boilers or operate using electricity. The control challenge is to maintain consistent heat output and ensure efficient heat transfer.
Boilers: These produce hot water or steam for heating, which is then circulated through pipes. The potential control challenge in this type of equipment usually concerns the preservation of the desired temperature and pressure, ensuring efficient fuel combustion.
Coolers: Evaporative coolers work by evaporating water to cool the air, which is effective in dry regions. For their efficient operation, it is necessary to optimize the evaporation process and manage water consumption.
Furnaces: These are high-temperature heating devices used for central heating. The control challenge is to achieve high-temperature heating without wasting fuel and ensure even distribution of heated air.
Multi-HVAC Systems: These systems integrate multiple types of HVAC equipment into a single framework, enabling zoning. Here, the control challenge is significantly more demanding than single-HVAC units. The coordination of various components to work harmoniously while considering the distinct thermal demands of different zones presents a significantly more complicated task.

3. Conceptual Background of Model-Free Methodologies for HVAC Control

Model-free methodologies for HVAC control have gained traction due to their ability to operate without explicit knowledge of the system dynamics or a mathematical representation of the system. Among these approaches, reinforcement learning (RL) and deep reinforcement learning (DRL) stand out as the most common control strategies for HVAC control. They learn optimal control strategies by interacting with the environment and receiving feedback in the form of rewards or penalties. Artificial neural networks (ANNs) offer another compelling avenue. ANNs, being inherently model-free, learn from data rather than relying on predefined equations of system dynamics. They adjust their internal parameters to approximate functions and predict outcomes based on input patterns, distinguishing them from model-based controls that operate based on an established system model. Meanwhile, fuzzy logic controllers use linguistic rules and fuzzy sets to handle uncertainties and make decisions, again without the need for an explicit model. Such model-free approaches offer flexibility and adaptability and can handle the nonlinearities and complexities inherent in HVAC systems, making them increasingly popular in modern control paradigms.

Figure 3 portrays a flowchart that provides a visual snapshot of how different elements work together in a model-free HVAC control system to achieve optimization targets. Input data are fed into data preprocessing. Data preprocessing feeds into the control algorithms. The control algorithms make decisions and send commands to the HVAC system components. The HVAC system components execute those decisions and their actions result in outputs. The outputs and other feedback mechanisms loop back into the control algorithms for continuous refinement. A regular check is made if the optimization goals are met. If the goals are met, the process might stabilize or undergo minor adjustments. If not, the feedback loop with the control algorithms continues to adjust and optimize. The arrows in the flowchart indicate the direction of flow and interaction between the elements.

Figure 3. Visual representation of model-free HVAC control.

The following subsections describe the mathematical concepts of frameworks that are able to optimize the control of HVAC operations in building structures.

3.1. Reinforcement Learning

Reinforcement learning (RL) is a branch of machine learning (ML) in which autonomous agents are deployed to tackle dynamic and intricate problems, present in both static and continuously changing environments [33]. These intelligent agents operate within their designated environments, receiving feedback in the form of rewards. Based on their behavior, their actions can lead to either improved outcomes or states, resulting in greater rewards, or to suboptimal outcomes, yielding lower rewards. The primary objective for these agents is to maximize the cumulative rewards, accumulated from all states they traverse, and to discover an optimal policy, which is considered the strategy that illustrates the best course of action for a specific problem. As seen in Figure 4, when the agent interacts with its environment, it receives two signals: one for the state and one for the reward.

Figure 4. Basic reinforcement learning framework.

The formulation of the RL problem revolves around Markov decision processes (MDPs), which are mathematical frameworks that describe the interaction of the agent with the environment and the way it receives rewards for its actions. Generally, MDPs are used to model decision-making problems in which the outcome of a situation is uncertain. They are commonly used in mathematics, robotics, economics, healthcare, and artificial intelligence. An RL decision-making problem described by an MDP consists of the tuple $(S, A, P, R, \gamma, \pi)$, where

$S$ is the finite set of all possible states in the environment, called the state space;
$A$ is the finite set of all possible actions the agent can take in each state, called the action space;
$P(s_{t+1} \mid s_{t}, a_{t})$ is the set of transition probabilities, which defines the likelihood that the agent transitions from one state to another, with $s_{t+1}$ being the next state and $s_{t}$ the current one;
$R(s_{t}, a_{t})$ is the function that provides rewards to the agent for its actions, called the reward function;
$\gamma \in [0, 1)$ is the discount factor, which determines the trade-off between immediate and future rewards, with values closer to 0 indicating that immediate rewards hold more weight than future ones; and
$\pi(a \mid s)$ is the strategy of the agent, which links states to actions, called the agent’s policy.

MDPs utilize Bellman equations to characterize the value of states and actions. These equations are mathematical expressions used to calculate the total expected reward that can be received by taking an action in a certain state and following a specific policy thereafter. The value estimates are updated through iterative procedures according to the Bellman optimality principle until convergence, at which point the solution is the optimal one. The Bellman optimality principle states that a complex decision-making problem should be decomposed into smaller subproblems to reach the optimal solution, as the optimal solution to the initial complex problem is often unreachable without this strategy. Bellman equations involve value functions that estimate the state values and the state-action values of a system, where the former indicates the expected return from a state under a given policy, whereas the latter indicates the expected return of taking a particular action in a certain state and following the policy thereafter. These equations can be seen in Equations (1) and (2), respectively.

$V_{\pi}(s)$ is the state-value function, whose Bellman equation calculates the value of a state $s$ when the agent operates under a given policy $\pi$. $Q_{\pi}(s, a)$ is the action-value function, otherwise known as the “Quality” or “Q-function”, whose Bellman equation calculates the value of an action $a$ that the agent chose in state $s$ under a given policy $\pi$.

$$V_{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma V_{\pi}(s') \right], \quad \forall s \in S. \tag{1}$$

$$Q_{\pi}(s, a) = \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \sum_{a'} \pi(a' \mid s') Q_{\pi}(s', a') \right], \quad \forall s \in S, \ a \in A. \tag{2}$$
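
As a minimal sketch of how Equations (1) and (2) can be applied, the Python snippet below runs iterative policy evaluation on a two-state toy problem. The states, actions, transition probabilities, rewards, and the uniform policy are all hypothetical values chosen for illustration; none of them come from the reviewed works.

```python
# Hypothetical two-state thermal MDP; every number below is made up for illustration.
STATES = ["cold", "comfy"]
ACTIONS = ["heat_off", "heat_on"]
GAMMA = 0.9  # discount factor

# P[s][a] is a list of (probability, next_state, reward) triples, i.e. p(s', r | s, a).
P = {
    "cold": {
        "heat_off": [(1.0, "cold", -1.0)],
        "heat_on": [(0.8, "comfy", -0.5), (0.2, "cold", -1.5)],
    },
    "comfy": {
        "heat_off": [(0.7, "comfy", 0.0), (0.3, "cold", -1.0)],
        "heat_on": [(1.0, "comfy", -0.5)],
    },
}

# Assumed fixed stochastic policy pi(a | s): uniform over the two actions.
POLICY = {s: {a: 0.5 for a in ACTIONS} for s in STATES}


def q_from_v(v, s, a):
    """Expected r + gamma * v(s') under p(s', r | s, a).

    When v equals V_pi, this is Q_pi(s, a) from Equation (2),
    because sum_a' pi(a' | s') Q_pi(s', a') = V_pi(s').
    """
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in P[s][a])


def evaluate_policy(policy, tol=1e-10, max_sweeps=10_000):
    """Repeatedly apply the right-hand side of Equation (1) until values stop changing."""
    v = {s: 0.0 for s in STATES}
    for _ in range(max_sweeps):
        new_v = {
            s: sum(policy[s][a] * q_from_v(v, s, a) for a in ACTIONS)
            for s in STATES
        }
        delta = max(abs(new_v[s] - v[s]) for s in STATES)
        v = new_v
        if delta < tol:
            break
    return v


V = evaluate_policy(POLICY)
Q = {(s, a): q_from_v(V, s, a) for s in STATES for a in ACTIONS}
```

Under these assumed numbers, heating in the "cold" state has a higher action value than leaving the heater off, which is the kind of comparison a value-based controller relies on.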

Both the state-value function and the action-value function have crucial roles in RL, often complementing each other yet also serving distinct purposes depending on the nature of the problem and type of action. For instance, the state-value function is employed in various policy-based methods, such as the classic value iteration, an established dynamic programming (DP) method; policy improvement; policy gradient methods; and approximation techniques [33,34,35]. Its main objective is to estimate the expected cumulative reward for different states and guide necessary policy improvements. On the other hand, the action-value function is used in value-based methods, such as Q-learning or SARSA, both of which are characterized as temporal difference (TD) learning methods. These algorithms are especially suitable for cases where actions are discrete and sequential [36,37]. It is noteworthy that SARSA shares the TD update structure of Q-learning but bootstraps from the action actually taken in the next state rather than from the maximizing one. Methods combining both kinds of RL approaches are called actor–critic methods, most of which belong to the deep reinforcement learning category. Value-based, policy-based, and actor–critic approaches are described below.

3.1.1. Value-Based RL Approach

In value-based reinforcement learning, the primary focus is on learning the value function or the Q-function without explicitly representing the policy. The optimal policy

π^{*}

is then derived by selecting actions that maximize the expected value. One popular algorithm in this category is Q-learning, which iteratively updates Q-values using the Bellman equation:

Q (s, a) \leftarrow (1 - α) Q (s, a) + α (r (s, a) + γ max_{a^{'}} Q (s^{'}, a^{'}))

(3)

where α is the learning rate and a^{'} denotes the possible actions in the next state s^{'}.
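The update in Equation (3) can be sketched for a tabular Q-function as follows. This is an illustrative sketch only: the table size, the interpretation of states as temperature bins and actions as on/off commands, and the ε-greedy helper are assumptions made for the example, not details taken from any reviewed work.

```python
import numpy as np


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply Equation (3) in place to the entry Q[s, a]."""
    target = r + gamma * np.max(Q[s_next])  # best value reachable from s'
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
    return Q


def epsilon_greedy(Q, s, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))


# Hypothetical setting: 3 temperature bins (states), 2 actions (off, on).
Q = np.zeros((3, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
```

Starting from an all-zero table, a single update with reward 1.0 moves `Q[0, 1]` a fraction α of the way toward the target, which shows how the learning rate blends the old estimate with the new one.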

3.1.2. Policy-Based RL Approach

Unlike value-based methods, policy-based approaches aim to directly learn the optimal policy π^{*} (a | s) without undertaking the intermediate step of estimating value functions. The objective is to find a policy that maximizes the expected reward:

J (π) = E_{π} [\sum_{t = 0}^{\infty} γ^{t} r_{t}]

(4)

Policy gradient methods update the policy parameters θ in the direction that improves the expected reward, using the gradient \nabla_{θ} J (π).
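One common way to estimate this gradient from samples is the REINFORCE rule, sketched below for a state-independent softmax policy. The choice of REINFORCE, the softmax parameterization with one logit per action, and the two-action reward values are all assumptions made for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def reinforce_step(theta, action, ret, lr=0.1):
    """One gradient-ascent step on J using a single sampled return.

    For a softmax policy, the gradient of log pi(action) with respect to
    the logits is onehot(action) - pi.
    """
    grad_log_pi = -softmax(theta)
    grad_log_pi[action] += 1.0
    return theta + lr * ret * grad_log_pi


# Hypothetical two-action problem in which action 1 yields the higher reward.
rng = np.random.default_rng(0)
rewards = np.array([0.0, 1.0])
theta = np.zeros(2)
for _ in range(500):
    a = int(rng.choice(2, p=softmax(theta)))
    theta = reinforce_step(theta, a, rewards[a])
```

Because only the rewarded action produces a nonzero update in this toy problem, the probability assigned to it grows over the course of training.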

3.1.3. Actor–Critic RL Approach

The actor–critic reinforcement learning approach combines aspects of both value- and policy-based approaches. The actor produces an action given a state, represented as π (a | s; θ), and the critic evaluates the chosen action using the value function V (s; ϕ). The actor updates its policy in the direction suggested by the critic. The update rules for the actor’s policy parameters θ and the critic’s value function parameters ϕ can be represented as follows:

θ \leftarrow θ + α \nabla_{θ} \log π (a | s; θ) V (s; ϕ)

(5)

ϕ \leftarrow ϕ + β (r + γ V (s^{'}; ϕ) - V (s; ϕ)) \nabla_{ϕ} V (s; ϕ)

(6)

where α and β are the learning rates.
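A tabular version of these updates can be sketched as follows. Note one deliberate deviation, stated here as an assumption: the actor step is weighted by the temporal-difference error (the bracketed term in Equation (6)) rather than by V (s; ϕ) as written in Equation (5), which is a widely used variant. The softmax actor, the table sizes, and the `done` flag are likewise illustrative choices.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def actor_critic_step(theta, phi, s, a, r, s_next, done,
                      alpha=0.05, beta=0.1, gamma=0.9):
    """One tabular actor-critic update (modifies theta and phi in place).

    theta[s, a] holds the actor's logits; phi[s] is the critic's V(s; phi).
    """
    v_next = 0.0 if done else phi[s_next]
    delta = r + gamma * v_next - phi[s]  # TD error from Equation (6)
    # Critic: for a tabular V, the gradient with respect to phi[s] is 1.
    phi[s] += beta * delta
    # Actor: gradient of log softmax with respect to the logits of state s.
    grad_log_pi = -softmax(theta[s])
    grad_log_pi[a] += 1.0
    theta[s] += alpha * delta * grad_log_pi  # assumed variant of Equation (5)
    return theta, phi, delta


theta = np.zeros((2, 2))
phi = np.zeros(2)
theta, phi, delta = actor_critic_step(theta, phi, s=0, a=1, r=1.0,
                                      s_next=1, done=False)
```

A positive TD error raises both the critic's estimate for the visited state and the actor's preference for the action just taken; a negative one does the opposite.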

3.2. Artificial Neural Networks

Artificial neural networks (ANNs) are computational models inspired by the human brain’s neural structure and are designed to recognize patterns and make decisions. These networks consist of interconnected nodes or “neurons” that process and transmit information [38]. Unlike traditional control methods that rely on predefined mathematical models of the system, ANNs are considered model-free because they learn directly from data. They adaptively adjust to capture the underlying dynamics of the system without requiring explicit knowledge of its internal workings. In the realm of HVAC control, ANNs have found prominence due to their ability to handle the system’s complex, nonlinear interactions. Through continuous learning from the environment, ANNs can optimize HVAC operations, striking a balance between energy efficiency and occupant comfort. By doing so, they pave the way for buildings that consume less energy while ensuring a pleasant and consistent indoor climate for inhabitants [39]. The simple artificial neuron can be described mathematically by Equation (7), in which y is the output; α is the activation, calculated as the sum of the products of the weights and input signals; and θ is the threshold that the activation must reach to activate the neuron [40,41]. This equation determines the output that the ANN will provide depending on the value of the input x. In Equation (8), the activation is shown as the sum of the aforementioned products between the weights and inputs. This is known as the weighted sum of the neural network’s inputs and is an important concept that determines which neurons will activate and which will not. One of the simplest forms of ANNs, a network with one input neuron, one output neuron, and no hidden layers, can be seen in Figure 5.

y = \begin{cases} 1 & \text{if } α \geq θ \\ 0 & \text{if } α < θ \end{cases}

(7)

α = \sum_{i} w_{i} x_{i}

(8)

Figure 5. The simplest form of an artificial neural network, with only one input and one output neuron.
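The threshold neuron of Equations (7) and (8) can be sketched in a few lines of Python. The weights, inputs, and threshold used in the example are arbitrary values chosen for illustration.

```python
import numpy as np


def threshold_neuron(x, w, theta):
    """Return 1 if the weighted sum of the inputs reaches the threshold."""
    activation = float(np.dot(w, x))      # Equation (8)
    return 1 if activation >= theta else 0  # Equation (7)


# Hypothetical weights and threshold.
w = np.array([0.6, 0.4])
fires = threshold_neuron(np.array([1.0, 0.0]), w, theta=0.5)
silent = threshold_neuron(np.array([0.0, 1.0]), w, theta=0.5)
```

With these values, the first input produces an activation of 0.6, which reaches the threshold of 0.5, whereas the second produces 0.4 and leaves the neuron inactive.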

Neural networks (NNs) offer a versatile approach for modeling and controlling complex systems such as HVAC. The core principle lies in approximating a function that maps inputs (like environment conditions) to outputs (control actions) by adjusting the network’s internal parameters. Given an HVAC system’s environmental state, represented by vector s, a neural network aims to produce an optimal control action a. The architecture of a typical feedforward NN consists of an input layer, several hidden layers, and an output layer:

\begin{matrix} h^{(1)} & = σ_{1} (W^{(1)} s + b^{(1)}) \\ h^{(2)} & = σ_{2} (W^{(2)} h^{(1)} + b^{(2)}) \\ ⋮ \\ a & = σ_{L} (W^{(L)} h^{(L - 1)} + b^{(L)}) \end{matrix}

where h^{(l)} is the activation of the l-th layer; W^{(l)} and b^{(l)} represent the weight matrix and bias vector of the l-th layer; and σ_{l} is an activation function such as ReLU, sigmoid, or tanh. Training involves adjusting the weights W and biases b to minimize a loss function L, which quantifies the difference between the predicted actions and actual optimal actions. With the mean squared error (MSE) as the loss function, this becomes:

L = \frac{1}{N} \sum_{i = 1}^{N} {(a_{predicted}^{(i)} - a_{true}^{(i)})}^{2}

(9)

In order to minimize this loss, a backpropagation procedure is employed to calculate how each NN parameter contributes to the error. Using this information, optimization techniques like stochastic gradient descent (SGD) or other algorithms adjust these parameters to reduce the error. This iterative process of adjusting parameters to minimize error is known as training the artificial neural networks (ANNs) [42]. Once trained, ANNs are well suited for providing real-time, adaptive control actions based on the current environmental state, thus achieving energy efficiency and comfort.
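The training loop described above can be sketched with NumPy for a network with one tanh hidden layer and a linear output. This is a minimal sketch under stated assumptions: it uses full-batch gradient descent rather than SGD for reproducibility, the layer sizes and learning rate are arbitrary, and the synthetic "optimal actions" are invented purely so that the example runs.

```python
import numpy as np


def init_params(n_in, n_hidden, n_out, rng):
    """Small random weights and zero biases for a two-layer network."""
    return {
        "W1": rng.normal(0.0, 0.5, (n_hidden, n_in)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_out, n_hidden)), "b2": np.zeros(n_out),
    }


def forward(params, S):
    """Forward pass for a batch S of shape (N, n_in)."""
    H = np.tanh(S @ params["W1"].T + params["b1"])
    A = H @ params["W2"].T + params["b2"]
    return H, A


def mse(A_pred, A_true):
    """Mean squared error, as in Equation (9)."""
    return float(np.mean((A_pred - A_true) ** 2))


def train_step(params, S, A_true, lr=0.05):
    """One gradient-descent step; returns the loss before the update."""
    H, A_pred = forward(params, S)
    dA = 2.0 * (A_pred - A_true) / A_pred.size  # dL/dA
    dW2 = dA.T @ H                               # backpropagate to layer 2
    db2 = dA.sum(axis=0)
    dZ = (dA @ params["W2"]) * (1.0 - H ** 2)    # through tanh
    dW1 = dZ.T @ S                               # backpropagate to layer 1
    db1 = dZ.sum(axis=0)
    for name, grad in (("W1", dW1), ("b1", db1), ("W2", dW2), ("b2", db2)):
        params[name] -= lr * grad
    return mse(A_pred, A_true)


# Synthetic data: states in [-1, 1]^2 and an invented target action.
rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, (64, 2))
A_true = 0.5 * S[:, :1] - 0.3 * S[:, 1:]
params = init_params(2, 4, 1, rng)
losses = [train_step(params, S, A_true) for _ in range(200)]
```

Each call to `train_step` computes the gradient of the loss with respect to every parameter via the chain rule and moves the parameters a small step against it, so the recorded losses trace the progress of training.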

The most common ANN architectures used for HVAC control are feedforward neural networks (FNNs) and recurrent neural networks (RNNs) such as LSTMs. FNNs model static relationships in HVAC systems, analyzing current conditions like room temperature or humidity to determine immediate control actions without considering past states, whereas RNNs and LSTMs consider temporal dynamics in HVAC control, allowing past conditions and actions to influence present control decisions, making them ideal for predicting longer-term environmental changes.

3.2.1. Feedforward Neural Networks (FNNs)

FNNs represent a foundational architecture in neural networks and consist of multiple interconnected layers, where data flows from the input to the output without looping back. In the context of HVAC, FNNs can model static relationships between input conditions (like current room temperature or humidity) and the required control actions. By design, FNNs are memoryless, making them suited for scenarios where current input data predominantly dictate the control action and there is minimal dependence on past states or actions. Mathematically, FNNs transform input data through successive layers using weights and biases:

a = f_{L} (f_{L - 1} (\dots f_{2} (f_{1} (s; W^{(1)}, b^{(1)}); W^{(2)}, b^{(2)}) \dots); W^{(L)}, b^{(L)})

(10)

where f_{l} represents the activation function of the l-th layer.
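The nested composition in Equation (10) amounts to a simple loop over layers, as the following sketch shows. The layer sizes, weights, and the two-element state vector are hypothetical values chosen so that the result can be checked by hand.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def identity(z):
    return z


def fnn_forward(s, layers):
    """Apply each (W, b, activation) triple in turn, as in Equation (10)."""
    h = np.asarray(s, dtype=float)
    for W, b, act in layers:
        h = act(W @ h + b)
    return h


# Hypothetical two-layer network mapping a 2-D state to one control action.
layers = [
    (np.array([[1.0, -1.0], [0.5, 0.5]]), np.zeros(2), relu),
    (np.array([[1.0, 1.0]]), np.array([0.1]), identity),
]
action = fnn_forward([2.0, 1.0], layers)
```

For the state [2.0, 1.0], the hidden layer yields [1.0, 1.5] and the output layer sums these and adds the bias, giving an action of 2.6. No information from earlier time steps enters the computation, which reflects the memoryless property noted above.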

3.2.2. Recurrent Neural Networks (RNNs)

RNNs are ANNs specifically designed for sequential data and allow information to persist through loops in their structure. This makes the RNN architecture advantageous for HVAC control when considering the temporal dynamics, as past conditions can influence present control decisions. However, traditional RNNs suffer from vanishing and exploding gradient problems in long sequences. To this end, long short-term memory networks (LSTMs)—a specialized RNN architecture—were created in order to overcome these limitations by introducing gates and cell states, enabling them to learn long-term dependencies. For an HVAC system, this means that an LSTM could consider how the temperature has been changing throughout the day, not just the current reading, to make informed decisions:

(h_{t}, c_{t}) = LSTM (h_{t - 1}, c_{t - 1}, s_{t}; Θ)

(11)

where h_{t} and c_{t} are the hidden state and cell state at time t, s_{t} is the input, and Θ denotes the parameters of the LSTM.
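A single step of the recurrence in Equation (11) can be sketched with the standard LSTM gate equations. The packing of the parameters Θ into the arrays `W`, `U`, and `b`, the gate ordering, and the dimensions in the example are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(h_prev, c_prev, s_t, W, U, b):
    """One LSTM step.

    W: (4n, m) input weights, U: (4n, n) recurrent weights, b: (4n,) bias,
    where n is the hidden size and m the input size. Assumed gate order:
    input, forget, output, candidate.
    """
    n = h_prev.shape[0]
    z = W @ s_t + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate
    f = sigmoid(z[n:2 * n])      # forget gate
    o = sigmoid(z[2 * n:3 * n])  # output gate
    g = np.tanh(z[3 * n:4 * n])  # candidate cell update
    c_t = f * c_prev + i * g     # cell state carries long-term information
    h_t = o * np.tanh(c_t)       # hidden state exposed to the next layer
    return h_t, c_t


# Hypothetical sizes: 3 input features (e.g., sensor readings), 2 hidden units.
rng = np.random.default_rng(0)
n, m = 2, 3
W = rng.normal(0.0, 0.1, (4 * n, m))
U = rng.normal(0.0, 0.1, (4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for s_t in rng.normal(size=(5, m)):  # unroll over a short input sequence
    h, c = lstm_step(h, c, s_t, W, U, b)
```

The forget gate controls how much of the previous cell state is retained at each step, which is the mechanism that lets the network carry information such as a day-long temperature trend across many time steps.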

It should also be mentioned that other novel ANN architectures and concepts have been utilized in the literature. For instance, radial basis functions (RBFs), which are not considered traditional neural networks in the same sense as FNNs or RNNs, can be used in a neural network context, known as an RBF network (RBFN). These networks use radial basis functions as activation functions, utilizing their localized nonlinearities. Such an approach offers the ability to capture intricate relationships in HVAC data. This soft-based control is suitable for modeling the effect of specific environmental conditions like a sudden rise in temperature or humidity on the required control action, allowing for nuanced and tailored responses.

3.3. Fuzzy Logic Control

Fuzzy logic (FL) [38,43] is an alternative approach to address complex control problems, especially when dealing with unknown system models or situations involving uncertainty. Fuzzy logic controllers (FLCs) are particularly well suited for controlling HVAC systems because they offer a more intuitive and human-like decision-making approach. Unlike traditional control systems that operate on crisp values, FLCs work with fuzzy sets, allowing them to handle imprecise and uncertain data. More specifically, HVAC-related parameters, such as temperature, humidity, or occupancy, are translated into fuzzy sets, representing terms like “cold”, “comfortable”, or “crowded”. These fuzzy sets are defined by membership functions that describe how much a particular value belongs to a set. Based on a predefined rule base, which might include rules like “If the room is cold and occupancy is high, then increase the heating”, the FLC determines the appropriate control action. Such rules are often designed based on expert knowledge or historical data, capturing the nuances of HVAC operations. As conditions change, the system continually re-evaluates the rules so that the HVAC equipment operates efficiently. This can yield energy savings, as the system only uses what is necessary to achieve comfort. Moreover, by accounting for the blurred boundaries of comfort, an FLC can keep the indoor environment consistently pleasant, balancing energy conservation with occupant satisfaction.

According to the literature, the fundamental components of an FLC for HVAC control are the fuzzifier, rule base, inference engine, and defuzzifier [43].

Fuzzifier: The fuzzifier translates crisp inputs like the current room temperature or humidity into fuzzy sets. For instance, the temperature might be categorized as ‘Cold’, ‘Comfortable’, or ‘Hot’.

μ_{Cold} (T) = \begin{cases} 1 & \text{if } T \leq 18 \\ (22 - T) / 4 & \text{if } 18 < T < 22 \\ 0 & \text{if } T \geq 22 \end{cases}

(12)

Rule Base: The rule base contains a set of if-then rules that dictate the controller’s behavior. These rules are often derived from expert knowledge. An example rule might be: “If the temperature is ‘Cold’ and humidity is ‘High’, then turn on the heating and reduce ventilation”.

Inference Engine: Given the fuzzy inputs and the rule base, the inference engine determines the degree to which each rule applies. Using methods like the Mamdani [44] or Sugeno [45] approaches, it computes the fuzzy output set.

Defuzzifier: Finally, the defuzzifier converts the fuzzy output sets into crisp control actions. Common defuzzification methods include the centroid method, bisector method, and maximum membership principle.
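The four components above can be combined into a very small controller, sketched below. Only the ‘Cold’ membership function follows Equation (12); the complementary ‘Warm’ set, the two rules, the linear output sets for heating power, and the min/max (Mamdani-style) inference with a discretized centroid are assumptions chosen to keep the example short.

```python
import numpy as np


def mu_cold(T):
    """Equation (12): 1 at or below 18, 0 at or above 22, linear in between."""
    return float(np.clip((22.0 - T) / 4.0, 0.0, 1.0))


def mu_warm(T):
    """Hypothetical complement of 'Cold', assumed only for this sketch."""
    return 1.0 - mu_cold(T)


def heating_command(T, n_points=101):
    """Return a crisp heating power in percent for room temperature T."""
    u = np.linspace(0.0, 100.0, n_points)  # candidate heating powers
    low = 1.0 - u / 100.0                  # output set 'Low' (assumed shape)
    high = u / 100.0                       # output set 'High' (assumed shape)
    # Rule 1: IF T is Cold THEN heating is High.
    # Rule 2: IF T is Warm THEN heating is Low.
    # Inference: clip each output set at its rule's firing strength (min),
    # then aggregate the clipped sets (max).
    fired = np.maximum(np.minimum(mu_cold(T), high),
                       np.minimum(mu_warm(T), low))
    # Defuzzifier: centroid of the aggregated output set.
    return float(np.sum(u * fired) / np.sum(fired))


commands = {T: heating_command(T) for T in (16.0, 20.0, 24.0)}
```

In this sketch the commanded heating power decreases smoothly as the room warms, with no abrupt switching at a single set point, which illustrates the gradual behavior that fuzzy sets provide.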

3.3.1. Mamdani FLC Approach

Among the various FLC approaches, the Mamdani and Sugeno methods are the most prominent. The Mamdani FLC embodies the classical approach to fuzzy inference. It utilizes human-like reasoning by encapsulating knowledge via if-then rules involving fuzzy sets and linguistic terms. The mathematical representation of a typical rule can be denoted as:

R : IF x is A THEN y is B

3.3.2. Sugeno FLC Approach

The Sugeno or TSK (Takagi–Sugeno–Kang) method, proposed by Takagi and Sugeno in 1985, functions as an inference mechanism. Rather than using fuzzy sets for the consequent, the Sugeno model employs polynomial functions or constants. A representative rule in the Sugeno model can be described as follows:

R : IF x is A THEN y = a x + b

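In the Sugeno model, each rule produces a crisp value, and the overall output is computed as the average of these values weighted by the degree to which each rule fires. Given that the output is therefore already a weighted average, the process of defuzzification becomes redundant or is significantly streamlined. While the Mamdani method aligns with human intuition and is more interpretable, the Sugeno approach is computationally efficient and apt for optimization and control contexts.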
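The weighted-average output of a Sugeno controller can be sketched as follows. The two rules, their coefficients a and b, and the ‘Warm’ membership function are hypothetical; only the ‘Cold’ membership function follows Equation (12).

```python
def mu_cold(T):
    """Equation (12): 1 at or below 18, 0 at or above 22, linear in between."""
    return min(1.0, max(0.0, (22.0 - T) / 4.0))


def mu_warm(T):
    """Hypothetical complement of 'Cold', assumed only for this sketch."""
    return 1.0 - mu_cold(T)


def sugeno_output(x, rules):
    """Evaluate rules of the form 'IF x is A THEN y = a * x + b'.

    rules: list of (membership_function, a, b) triples. The result is the
    average of the rule outputs weighted by each rule's firing strength.
    """
    weights = [mu(x) for mu, _, _ in rules]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    outputs = [a * x + b for _, a, b in rules]
    return sum(w * y for w, y in zip(weights, outputs)) / total


# Hypothetical rules: heating power in percent as a function of temperature.
rules = [
    (mu_cold, -5.0, 170.0),  # IF T is Cold THEN y = -5 T + 170
    (mu_warm, 0.0, 10.0),    # IF T is Warm THEN y = 10
]
power = sugeno_output(20.0, rules)
```

At 20 degrees both rules fire with strength 0.5, so the output is the midpoint of their consequents (70 and 10), that is, 40. No separate defuzzification step is needed because every intermediate quantity is already a crisp number.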

4. Literature Review of Model-Free Applications in HVAC Control

This section discusses the numerous highly cited model-free research applications related to HVAC control optimization in an effort to categorize them into the aforementioned model-free HVAC control application sub-fields: reinforcement learning (RL); artificial neural network (ANN); fuzzy logic controller (FLC); hybrid (i.e., the integration of multiple model-free methodologies); and other applications that are not related to any of the aforementioned approaches.

Along with analyzing the related highly cited applications from the period 2015–2023, this section also presents tables that contain summaries of each model-free control sub-field. To this end, Table 1, Table 2, Table 3, Table 4 and Table 5 contain the following features:

Reference: Denoted as Ref. in the first column of each table.
Year: The publication year of each research application.
Methodology: The specific RL/ANN/FLC/hybrid/other type of control methodology applied in the related work.
Agent: Indicates whether the applied control strategy utilizes a single- or multi-agent control philosophy.
HVAC: The specific HVAC equipment type of each application, as described in the published work. Air-conditioning is denoted as AC; heat pumps are denoted as heat pumps; radiant heating is denoted as radiators; cooling devices are denoted as coolers; variable air-volume equipment is denoted as VAV; air-handling units are denoted as AHUs; and multi-HVAC equipment frameworks integrating more than a single device for control are denoted as multi.
Single-zone: An “x” in this column indicates that the testbed application concerns a single-zone building control application.
Multi-zone: An “x” in this column indicates that the testbed application concerns a multi-zone building control application.
Simulation: An “x” in this column indicates that the testbed application concerns a simulation building control application.
Real-life: An “x” in this column indicates that the testbed application concerns a real-world or real-life building control application.
Residential: An “x” in this column indicates that the testbed application concerns a residential building control application.
Commercial: An “x” in this column indicates that the testbed application concerns a commercial building control application.
Citations: Indicates the number of citations of the related work according to Scopus.

The abbreviation “NaN” represents the “not identified” elements in Tables and Figures. In the following subsections, the integrated research applications are described regarding their motivations, their conceptual control methodologies, and their results.

4.1. Literature Review of Reinforcement Learning Control Applications

In 2015, Barrett et al. presented a pioneering study introducing a reinforcement learning (RL) architecture, with its primary application in creating an intelligent thermostat for HVAC systems [46]. This framework was aimed at the autonomous regulation of HVAC systems, prioritizing the dual optimization of energy expenses and occupant comfort. To enable the successful deployment of reinforcement learning within HVAC control, the researchers proposed a unique formalization of the state-action space. Their findings highlighted the efficacy of the RL paradigm, which achieved up to a 10% cost reduction when benchmarked against a traditional programmable thermostat while ensuring superior levels of occupant comfort.

A 2017 study by Wei et al. [47] examined the application of deep reinforcement learning (DRL) to optimal HVAC system control. The study used the EnergyPlus simulation framework and found that the DRL methodology outperformed conventional rule-based control (RBC) and traditional Q-learning RL approaches. Empirical evaluations using detailed EnergyPlus models, alongside real-world weather and pricing data, demonstrated the superior efficiency of the implemented DRL-based algorithms. These encompassed a standard DRL algorithm, as well as a heuristic adaptation catering to multi-zone control, both of which proved proficient in curtailing energy expenses while preserving a pleasant ambient temperature.

Also in 2017, Wang et al. [48] focused on optimizing energy in buildings by controlling the HVAC system using a model-free actor–critic reinforcement learning (RL) controller with long short-term memory (LSTM) networks. The goal was to improve thermal comfort and energy consumption. The RL controller was tested in an office space model, resulting in an average improvement of 15% in thermal comfort and 2.5% in energy efficiency compared to traditional controls. The RL controller offered the possibility of implementing customized control in building HVAC systems with minimal human intervention.

In a compelling 2018 study, Chen et al. [49] proposed the application of a model-free Q-learning RL control approach to optimize HVAC and window systems. The objective was to simultaneously attenuate energy consumption and thermal discomfort. At each time step, the control system assessed the indoor and outdoor environments, considering factors such as temperature, humidity, solar radiation, and wind velocity to formulate optimal control decisions congruent with current and future goals. The effectiveness of the approach was confirmed through illustrative simulation case studies in a hot and humid climate (Miami, USA) and a warm, temperate climate (Los Angeles, USA). The results demonstrated a notable edge over the RBC heuristic control strategy, achieving a reduction in energy consumption by 13% and 23%, a decrease in discomfort degree hours by 62% and 80%, and mitigation of high humidity hours by 63% and 77% in Miami and Los Angeles, respectively.

Another interesting study was conducted by Chen et al. in 2019 [50], who considered the Gnu-RL methodology. Gnu-RL represents a novel approach to HVAC control that uses historical data from existing HVAC controllers in order to enable the practical deployment of reinforcement learning (RL) without prior information. Gnu-RL adopts a differentiable model predictive control policy for planning and system dynamics and uses imitation learning for pre-training. The agent then continues to improve its policy using a policy gradient algorithm. In the simulation experiment, Gnu-RL achieved a 6.6% reduction in energy consumption compared to the best-published RL result in the same environment while maintaining a higher level of occupant comfort. In both the simulation and real-world testbed experiments, Gnu-RL demonstrated up to a 16.7% decrease in cooling demand compared to the existing controller while maintaining occupant comfort.

In noteworthy research from 2019, Valladares et al. [51] introduced a DRL algorithm that aimed to achieve an equilibrium between optimal thermal comfort and air quality levels while curtailing the energy demands of air-conditioning units and ventilation fans. The AI agent’s training was based on 10-year simulated data from a laboratory and classroom setting, with occupancy rates of up to 60 users. The adept RL agent balanced the requirements of thermal comfort, indoor air quality, and energy consumption, leading to enhanced outcomes. This included an improved predicted mean vote (PMV), an index that predicts people’s thermal sensation on a seven-point scale (+3, hot; +2, warm; +1, slightly warm; 0, neutral; −1, slightly cool; −2, cool; −3, cold). Furthermore, the AI-controlled system exhibited a 10% decrease in CO₂ levels compared to the existing control system while demonstrating an energy reduction of 4–5%.

Also in 2019, an important work by Liu et al. [52] introduced a novel deep deterministic policy gradient (DDPG) algorithm for short-term energy consumption prediction in HVAC systems. The study utilized an autoencoder (AE) to efficiently process raw data, enhancing the DDPG’s ability to identify important features and thereby improving the prediction model. Real-world data from the operation of a ground-source heat pump system in an office building in Henan, China, were exploited to train and test the models. The results illustrated that the DDPG-based models were well suited for short-term energy consumption forecasting, with MAE, RMSE, and R^{2} values of 3.858, 19.092, and 0.992, respectively. The integration of the autoencoder also proved beneficial. In comparison to traditional models such as backpropagation ANNs and support vector machines (SVMs), the AE-DDPG methodology enhanced prediction efficiency: the R^{2} improved by more than 1.12%, whereas the MAE and RMSE were reduced by more than 22.46% and 25.96%, respectively.

To address the challenges of low sample efficiency and safety-aware exploration in DRL techniques such as deep Q-networks (DQNs) applied to complex HVAC systems, Zhang et al. [53] proposed an RL approach that was tailored for learning system dynamics using an artificial neural network. The multifunctional HVAC equipment encompassed elements such as a heat exchanger, a chiller providing chilled water to the heat exchanger, a circulating air fan, the thermal space, connecting ductwork, dampers, and mixing air components. Control was conducted by a random-sampling shooting approach, and the proposed method was evaluated through simulation in a two-zone data center case study using the EnergyPlus tool (https://energyplus.net/ 2019). The results demonstrated a reduction in total energy consumption ranging from 17.1% to 21.8% compared to baseline approaches, achieving convergence rates 10x faster than other model-free RL approaches while maintaining an average deviation of trajectories sampled from learned dynamics below 20%.

In 2019, Gao et al. [54] proposed an innovative RL framework known as DeepComfort. Its dual objectives were to optimize energy utilization while upholding thermal comfort in smart buildings through the application of deep reinforcement learning. The building’s thermal control was construed as a cost-minimization problem, considering both the HVAC system’s energy expenditure and the occupants’ thermal comfort. To accomplish this, the researchers deployed an FNN to predict the occupants’ thermal comfort. Additionally, they utilized deep deterministic policy gradients (DDPGs) to learn the thermal control policy. The validation of this approach was executed via a building thermal control simulation system, encompassing diverse scenarios. According to the evaluation, the implementation resulted in substantial improvements, including a 14.5% improvement in the prediction accuracy of thermal comfort, a 4.31% reduction in the HVAC system’s energy consumption, and a commendable 13.6% improvement in maintaining occupants’ comfort.

In 2020, Azuatalam et al. [55] put forth a comprehensive framework encapsulating an efficient reinforcement learning (RL)–PPO controller for a holistic building model. This framework aimed at optimizing HVAC operations by improving energy efficiency and comfort, as well as achieving pertinent demand-response objectives. The multifunctional HVAC was equipped with a VAV, boiler, and chiller. The simulation results showed that the employment of RL for routine HVAC operations could culminate in a peak weekly energy reduction of up to 22% when compared to a manually constructed baseline controller. Furthermore, the adoption of a demand-response-aware RL controller during periods of demand response could potentially lead to average power decreases or increases of up to 50% on a weekly basis compared to a standard RL controller.

In an interesting study in 2020, Zou et al. [56] introduced a strategy designed to optimize the control of air-handling units (AHUs) using long short-term memory (LSTM) networks. The aim was to replicate the functioning of real-world HVAC systems and create efficient deep reinforcement learning (DRL) training conditions. The plan also used advanced DRL algorithms such as deep deterministic policy gradients to achieve optimum control of AHUs. The model was tested on three AHUs, each with two years’ worth of building automation system (BAS) data. The hybrid LSTM-based DRL training environments, which were generated from the first year’s BAS data, achieved a highly accurate approximation of the AHU parameters. When deployed under test conditions created from the second year’s BAS data, the DRL agents managed to save 27–30% energy compared to the actual consumption while ensuring a low level of predicted discomfort.

Also in 2020, Lork et al. [57] proposed a data-driven approach designed to address uncertainty in the control of split-type inverter air-conditioners (ACs) in residential buildings. The scientists compiled data from similar ACs and residential units to create balanced datasets and then used Bayesian convolutional neural networks (BCNNs) to model AC performance and uncertainties in these data. Next, a Q-learning-based reinforcement learning algorithm was utilized to make set-point decisions, using the BCNN models for transition sampling. An illustrative case study based on this framework was used to demonstrate the effectiveness of the approach. According to the experimental results, the controller aware of uncertainties performed better compared to conventional rule-based control (RBC), achieving a 7.69% improvement in discomfort measures and a 3.59% improvement in energy-saving potential.

Another notable study from 2020 aimed at reducing the energy cost of a multi-zone commercial building’s HVAC system while considering random zone occupancy, thermal comfort, and indoor air quality comfort. However, this optimization problem was complex, since it involved unknown thermal dynamics, parameter uncertainties, and constraints associated with indoor temperature and CO₂ concentration. Moreover, the large discrete solution space and the non-convex and non-separable objective function made it even more challenging. To this end, Yu et al. [58] reformulated the energy cost minimization problem as a Markov game and proposed an HVAC control algorithm based on multi-agent deep reinforcement learning with an attention mechanism. The proposed algorithm was able to operate without prior knowledge of uncertain parameters or building thermal dynamics models. The simulation results using real-world data showed that the proposed multi-agent deep reinforcement learning (MADRL) algorithm was effective, robust, scalable, and capable of reducing the total energy cost by 75.25% and 56.50% when compared to the rule-based scheme and heuristic control scheme, respectively, while delivering an adequate comfort level for the occupants.

Table 1. Summary of RL model-free approaches for HVAC control (2015–2023).

Ref.	Year	Methodology	Agent	HVAC	Single-Zone	Multi-Zone	Simulation	Real-Life	Residential	Commercial	Citations
[46]	2015	Q-learning	Single	NaN	x		x				74
[47]	2017	Q-learning	Single	VAV		x	x				246
[48]	2017	OPMCAC	Multi	VAV	x		x			x	79
[49]	2018	Q-learning	Single	Multi	x		x		x		189
[50]	2019	Gnu-RL	Single	VAV	x		x	x		x	73
[51]	2019	DQN	Single	Cooler		x	x			x	92
[52]	2019	DDPG	Single	Heat Pump	x		x				54
[53]	2019	PPO	Single	Multi		x	x			x	64
[54]	2019	DDPG	Single	NaN	x		x				95
[55]	2020	PPO/TRPO	Single	Multi	x		x			x	88
[56]	2020	DDPG	Single	AHU	x		x	x		x	73
[57]	2020	Q-learning	Multi	AC		x	x		x	x	46
[58]	2020	MAAC	Multi	AHU		x	x			x	114
[59]	2021	SAC/TD3	Single	Multi	x		x			x	47
[60]	2021	DDPG	Multi	Heat Pump		x	x	x			114
[61]	2021	DQN	Single	Radiator	x			x	x	x	52

A 2021 study by Biemann et al. [59] evaluated the effectiveness of four actor–critic algorithms in a simulated data center by assessing their ability to maintain thermal stability and enhance energy efficiency while adapting to weather changes. The focus was on data efficiency, given its practical importance. The performance of the actor–critic algorithms (SAC/TD3) was compared to PPO and TRPO policy-based approaches, as well as to a model-based controller used in EnergyPlus. According to the implementation, the HVAC system comprised multiple components, including the air economizer, variable volume fan, direct–indirect evaporative cooler, a cooling coil in the west zone, a chilled water cooling coil in the east zone, and outdoor air damper. The results indicated that all the RL-applied algorithms were able to maintain the hourly average temperature within the desired range while reducing energy consumption by at least 10%. With increasing training, a smaller trade-off was observed between thermal stability and energy reduction.

In a substantial contribution to the field in 2021, Du et al. [60] introduced a unique methodology to optimize multi-zone residential HVAC systems, employing a deep reinforcement learning (DRL) technique. The primary objective of their research was to minimize energy consumption costs while ensuring user comfort. The methodology, known as the deep deterministic policy gradient (DDPG), exhibited effectiveness in learning through ongoing interactions with a simulated building environment, devoid of any prior model knowledge. According to the simulation results, the DDPG-based HVAC control strategy surpassed the contemporary deep Q-network (DQN), achieving a reduction in the energy consumption cost of 15% and a significant decrease in the comfort violation of 79%. Furthermore, when juxtaposed against a rule-based HVAC control strategy, the DDPG-based strategy demonstrated remarkable efficacy, mitigating the comfort violation by an impressive 98%.

In a 2021 study, Gupta et al. [61] contributed another intriguing approach to HVAC control. In this research, the authors introduced a deep reinforcement learning (DRL) heating controller designed to enhance thermal comfort while minimizing energy costs in smart buildings. The efficacy of the controller was rigorously assessed through comprehensive simulation experiments employing real-world outdoor temperature data. The results obtained through these evaluations corroborated the superiority of the proposed DRL-based controller over traditional thermostat controllers. This novel approach demonstrated improvements in thermal comfort ranging from 15% to 30%, alongside reductions in energy costs ranging from 5% to 12%. The study further extended its investigations to compare the performance of a centralized DRL-based controller with a decentralized configuration, where each heating unit possessed its own DRL-based controller. The empirical findings revealed that as the count of buildings and the variance in their set-point temperatures rose, the decentralized control configuration exhibited superior performance compared to its centralized counterpart.

In 2022, by utilizing the proximal policy optimization (PPO) principles from reinforcement learning algorithms, Li et al. [62] employed a neural network to develop a comprehensive model for producing distinct control actions, specifically, thermostat adjustments. A novel method for minimizing the objective function was introduced to constrain the size of the update steps, thereby increasing the algorithm’s stability. As a result, a co-simulation platform for the thermal storage air-conditioning system was created, linking TRNSYS (https://www.trnsys.com/ 2022) and MATLAB (https://www.mathworks.com/products/matlab.html 2022). This research developed a demand-response strategy informed by time-of-use electricity pricing, considering elements like the environment, thermal comfort, and energy usage. The proposed RL algorithm was able to adapt to the thermostat adjustments during demand-response periods, and thus, the findings demonstrated efficiency in controlling temperature set points. Moreover, compared to a non-thermal storage air-conditioning system with a fixed set point, the approach resulted in an operational cost reduction of 9.17%, indicating the potential of the tool in optimizing HVAC systems.

A study by Lei et al. [63] proposed a practical control framework based on DRL to integrate personalized thermal comfort and the presence of occupants. A branching dueling Q-network (BDQ), an advanced learning agent, was employed to effectively manage the complex, multi-dimensional control tasks associated with HVAC systems. Additionally, a method used to model personal comfort based on tabular data was incorporated, allowing for seamless integration into operations that involve human users. The BDQ agent was first trained in a simulated environment and then applied in a real office setting, where it applied five-dimensional optimization decisions. This real-world deployment allowed the collection of real-time comfort feedback from users, and according to the outlined advantages, resulted in a notable 14% decrease in energy usage for cooling and an 11% improvement in overall thermal comfort.

In 2022, Deng et al. [64] proposed a novel non-stationary deep Q-network (DQN) methodology in order to address the dynamic behavior of HVAC systems in buildings. This methodology was able to identify the points at which the environment in a building changed and to generate optimal control decisions for the HVAC system under these evolving conditions. The non-stationary DQN method was able to outperform the existing DQN method in both single- and multi-zone control tasks. Moreover, the simulation results revealed that the novel methodology was able to save up to 13% more energy and enhance thermal comfort by 9% compared to the conventional DQN methodology.

Also in 2022, Yu et al. [65] proposed the coordinated control of personal comfort systems (PCSs) and a central HVAC system in a co-working office environment, aiming at optimal energy usage and individual thermal comfort. The study began by formulating an energy optimization problem for both the PCSs and the HVAC system. Given the unknown thermal dynamics of the building and the uncertain parameters involved, solving this problem was complex. Hence, Yu et al. reformulated the problem as a Markov game with multiple agents. To obtain a real-time control strategy, the authors utilized an attention-based multi-agent deep reinforcement learning methodology, bypassing the need for detailed thermal dynamics models or prior knowledge of the uncertain parameters. Simulations indicated that the approach decreased energy usage by up to 4.18% and reduced thermal comfort deviation by approximately 72.08% compared to the baseline benchmarks.

4.2. Literature Review of Artificial Neural Network Control Applications

In noteworthy research from 2015, Huang et al. [66] proposed an ANN methodology for modeling multi-zone buildings, taking into account various energy inputs and thermal interactions between zones. This framework enabled accurate temperature predictions and reduced energy consumption. According to the results, the size of an ANN does not necessarily dictate its accuracy. The optimal ANN usually contains an order number no greater than four, as oversized networks may result in large prediction errors with high-frequency noise. The study further illustrated that the proposed multi-zone model exhibited faster computational speed than single-zone models, thereby enabling the development of more accurate and effective ANN-based predictive control.

In 2016, Sholahudin et al. [67] proposed a strategy for forecasting the hourly heating load of a building, relying on various input parameter combinations via a dynamic ANN. The heating load of a standard apartment complex in Seoul was simulated over a winter month using the EnergyPlus software (https://energyplus.net/ 2016). The acquired datasets were then utilized to train the time-delay neural network (TDNN) models. The Taguchi method was employed to explore the impact of individual input parameters on the heating load: dry-bulb temperature, dew-point temperature, direct normal radiation, diffuse-horizontal radiation, and wind velocity. The findings revealed that external temperature and wind velocity were the most impactful parameters, and the dynamic model yielded superior outcomes compared to the static model. Overall, the Taguchi method effectively reduced the number of input parameters, and the dynamic ANN accurately forecasted instantaneous heating loads using this reduced set of inputs.

Also in 2016, Javed et al. [68] integrated a decentralized smart controller into an Internet of Things (IoT) framework coupled with cloud computing for the training of a random neural network (RandNN). The network assessed parameters such as temperature, humidity, HVAC airflow, and passive infrared sensor (PIR) data. The RandNN-based controller comprised three primary elements: a base station, sensor nodes, and cloud intelligence, each endowed with distinct functionalities. A sensor node with an embedded RandNN-based occupancy estimator approximated the number of occupants in the room and communicated this information to the base station. Accordingly, the base station, equipped with RandNN models, regulated the HVAC based on the set points for heating and cooling. The real-life implementation was compared to basic RBC controllers, illustrating the RandNN controller’s ability to reduce HVAC energy consumption by 27.12%.

Also in 2016, Chae et al. [69] introduced a near-term building energy consumption prediction framework using an ANN enhanced with a Bayesian regularization algorithm. Predicting electricity consumption on a sub-hourly basis was challenging given the intricate consumption trends and data variability. The approach delved into the impact of network design parameters like time lag, hidden neuron count, and training dataset on the model’s performance and adaptability. According to the simulation findings in three urban office buildings, the developed model was able to accurately predict electricity use in 15-min increments and daily peak consumption in a commercial building cluster test scenario by utilizing adaptive training techniques.

In a 2017 research effort, Chen et al. [70] proposed a data-based methodology that established a loop for precise predictive modeling and real-time regulation of building thermal dynamics. The approach was based on a deep recurrent neural network (RNN), which made use of substantial amounts of sensor data. The trained RNN was subsequently incorporated directly into a finite-horizon constrained optimization problem. To solve it, the constrained optimization was transformed into an unconstrained problem, and an iterative gradient descent approach with momentum was employed to determine the optimal control inputs. The simulation results revealed that the proposed method enhanced performance compared to the model-based approach in both building system modeling and control. According to the results, the RNN approach identified a series of control decisions sufficient to reduce energy usage by 30.74%. Conversely, the solution identified by the model-based resistance-capacitance (RC) approach provided a mere 4.07% reduction in energy consumption.
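The general idea of turning a constrained finite-horizon problem into an unconstrained one and solving it by gradient descent with momentum can be sketched as follows. This is a toy illustration only: a hypothetical first-order linear thermal model stands in for the trained RNN, a quadratic penalty is assumed as the transformation, and all coefficients, names, and step sizes are illustrative assumptions rather than values from Chen et al. [70].

```python
def simulate_temperatures(heating, t_start=18.0, t_outdoor=10.0, a=0.9, b=0.5):
    """Toy indoor temperature model (assumed stand-in for a learned model)."""
    temps, t = [], t_start
    for u in heating:
        t = a * t + (1.0 - a) * t_outdoor + b * u
        temps.append(t)
    return temps


def penalized_cost(heating, t_min=20.0, rho=5.0):
    """Energy cost plus a quadratic penalty replacing the constraint T >= t_min."""
    energy = sum(u * u for u in heating)
    violation = sum(max(0.0, t_min - t) ** 2 for t in simulate_temperatures(heating))
    return energy + rho * violation


def numerical_gradient(f, x, h=1e-6):
    """Central finite-difference gradient; avoids hand-derived derivatives."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def momentum_descent(f, x0, lr=0.005, beta=0.8, steps=2000):
    """Gradient descent with (heavy-ball) momentum on an unconstrained cost."""
    x = list(x0)
    velocity = [0.0] * len(x)
    for _ in range(steps):
        grad = numerical_gradient(f, x)
        velocity = [beta * v - lr * g for v, g in zip(velocity, grad)]
        x = [xi + v for xi, v in zip(x, velocity)]
    return x


initial_plan = [0.0] * 6  # six-step horizon with no heating
optimized_plan = momentum_descent(penalized_cost, initial_plan)
```

The penalty weight `rho` trades constraint satisfaction against energy use; a soft penalty of this kind does not guarantee that the temperature bound is met exactly.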

Table 2. Summary of ANN model-free approaches for HVAC control (2015–2023).

Ref.	Year	Methodology	Agent	HVAC	Single-Zone	Multi-Zone	Simulation	Real-Life	Residential	Commercial	Citations
[66]	2015	MLP/FNN	Single	AHU		x	x			x	99
[67]	2016	TDNN/FNN	Multi	Multi	x		x		x		104
[68]	2016	RandNN/FNN	Single	NaN	x			x		x	67
[69]	2016	MLP/FNN	Single	NaN		x	x			x	277
[70]	2017	RNN	Single	Heat Pump	x		x			x	67
[71]	2017	FNN	Single	NaN	x		x			x	544
[72]	2018	MLP/FNN	Single	Multi	x		x	x		x	100
[73]	2018	MLP/FNN	Single	AHU		x	x			x	51
[74]	2018	MLP/FNN	Single	Heat Pump		x	x			x	70
[75]	2019	LSTM/RNN	Single	NaN	x		x			x	60
[76]	2019	MLP/FNN	Single	NaN		x		x		x	60
[77]	2020	LSTM/RNN	Single	Heat Pump	x			x		x	67
[78]	2021	LSTM/RNN	Single	VAV	x		x			x	56

In another important work from 2017, Ahmad et al. [71] evaluated the performance of a common feedforward neural network (FNN) trained with backpropagation to estimate the hourly HVAC energy consumption of a hotel in Madrid, Spain. The predictive performance of the FNN was compared with that of the random forest (RF), an ensemble methodology increasingly utilized in forecasting. The inclusion of social variables like guest numbers slightly boosted predictive accuracy in both scenarios. According to the evaluation results based on the root-mean-square error (RMSE), mean absolute percentage error (MAPE), mean absolute deviation (MAD), coefficient of variation (CV), and R^{2} metrics, the FNN surpassed the RF across all metrics. However, the margin was small: both methodologies exhibited similar predictive accuracy, indicating they were almost equally viable for applications in building energy management.
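For reference, the error metrics named above can be computed as in the following sketch. The exact definitions used by Ahmad et al. [71] are not restated in this review, so two conventions are assumed here: MAD is taken as the mean absolute error of the forecast, and CV as the RMSE normalized by the mean of the measured values. The function name and the sample data in the usage note are hypothetical.

```python
import math


def forecast_metrics(actual, predicted):
    """Return RMSE, MAPE (%), MAD, CV (%) and R^2 for a forecast."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n

    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares

    rmse = math.sqrt(ss_res / n)
    return {
        "rmse": rmse,
        # MAPE requires non-zero measured values.
        "mape": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "mad": sum(abs(e) for e in errors) / n,  # assumed: mean absolute error
        "cv": 100.0 * rmse / mean_actual,        # assumed: RMSE / mean of actual
        "r2": 1.0 - ss_res / ss_tot,
    }
```

For example, `forecast_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])` yields a MAPE of 5% and an R^{2} of 0.99.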

In 2018, Gonzales et al. [72] introduced an innovative multi-agent system (MAS) approach within a cloud-based ecosystem coupled with a wireless sensor network (WSN) to enhance HVAC energy efficiency. The agents within the MAS acquired social patterns through data assimilation and the application of an artificial neural network (ANN). Moreover, the system utilized sensor data to adapt to the building’s climate and occupancy and also incorporated weather forecasts and non-working periods to optimize HVAC operations. The approach allowed smoother temperature adjustments, reducing sudden shifts that escalated energy use. According to the case study evaluation, the strategy achieved an average energy saving of 41% in office spaces. The reduction in energy consumption was not linearly related to the difference between the outdoor and indoor temperatures.

Also in 2018, Deb et al. [73] focused on the development of two data-driven forecasting tools for energy conservation linked to HVAC systems in commercial buildings in Singapore. Two predictive frameworks, multiple linear regression (MLR) and an artificial neural network (ANN), were formulated. The essence of the research revolved around choosing the optimal predictors, involving an extensive exploration of 819,150 permutations of 14 variables to pinpoint the most precise model. The main metric observed was the variance in energy use intensity (EUI) pre- and post-modification. The findings highlighted the efficiency of the ANN approach, which achieved a deviation rate of 14.8% compared to the MLR approach.

The same year, Kim et al. [74] introduced a cost-focused demand-response approach for multi-zone offices, balancing HVAC energy expenses with user comfort. A user-friendly digital platform was introduced, allowing users to express their comfort preferences. These preferences were then interpreted using neural network methods and embedded into the demand-response plan. The empirical findings confirmed the efficacy of the methodology in refining heat pump operation and ensuring user comfort, thereby minimizing potential deviations from the ideal demand-response scheme.

In an interesting work by Wang et al. in 2019 [75], a long short-term memory (LSTM) network, a special type of RNN, was employed to forecast various electrical loads, lighting loads, occupant counts, and internal heat gains in two U.S. office buildings. Building A was located in Berkeley, was constructed in 2015, and occupied 6397 m^{2}, whereas Building B was located in Philadelphia, was constructed in 1911, and occupied 6410 m^{2}. The strength of LSTMs lies in their ability to remember patterns over time, which makes them well suited for time-series prediction tasks. Using data collected in 2018 and 2014, respectively, and compared to the predefined schedules suggested by ASHRAE guidelines, the LSTM predictions reduced the prediction errors of internal heat gains from 12% to 8% in Building A and from 26% to 16% in Building B.

In 2019, Peng et al. [76] showcased a design methodology and a regulation scheme with learning properties in order to enable HVAC systems to adjust to occupant thermal preferences under dynamically changing conditions. Four basic variables were utilized in order to generate datasets for the preference models: time, indoor weather conditions, outdoor weather conditions, and occupants’ behavior. An ANN framework, along with suitable hyperparameters, was trained on the different thermal preferences. For five months, the learning-based thermal preference control (LTPC) was applied to an HVAC system in single- and multi-user office spaces under real-life conditions. The results highlighted energy conservation of between 4% and 25% compared to the static temperature baselines. Moreover, the necessity of user intervention regarding temperature changes was decreased from 4–9 days per month to 1 day per month.

An important work by Sendra et al. [77] introduced the design and development of an ANN predictor that was specifically designed to anticipate the next day’s power consumption of a building’s HVAC system. The HVAC system under study was located within MagicBox in Madrid, an actual self-sustaining solar-powered dwelling equipped with a monitoring system. In order to build the predictor, multiple LSTM neural network architectures were proposed, along with appropriate data preparation methods, to refine the raw dataset. According to the evaluation, the LSTM networks achieved significant results, with a normalized root-mean-square error (NRMSE) of 0.13 on the test set and a correlation of 0.797 between the predictions and the actual test time series. The findings were compared with a simplified one-hour-ahead prediction that provided nearly optimal results, offering promising insights into real-time energy prediction in building structures.

In 2021, notable research was conducted by Elmaz et al. [78], who presented a novel convolutional neural network–long short-term memory (CNN–LSTM) architecture, which integrated superior feature extraction and sequential learning capabilities for room temperature prediction. The control framework was built using a range of variables collected from a room at Antwerp University. The approach was compared to the conventional multi-layer perceptron (MLP) and standard LSTM approaches over prediction horizons of 1 to 120 min. Despite the efficient performance of all the concerned ANN frameworks at the 1-min horizon, the CNN–LSTM proved to be more stable and accurate in extended horizons, maintaining an R^{2} > 0.9 over a 120-min prediction horizon.

4.3. Literature Review of Fuzzy Logic Control Applications

In 2015, Saepullah et al. [79] investigated the use of three different fuzzy inference methods: Mamdani, Sugeno, and Tsukamoto. The research was based on experiments using various room temperature and humidity settings as inputs and compressor speeds as the output. In the first experiment, at 27 °C and 44% humidity, the Mamdani method resulted in energy savings of 34.9%. However, the Tsukamoto and Sugeno methods yielded greater energy savings of 58.099% and 73.8%, respectively. In the second experiment, with a room temperature of 33 °C and humidity of 68%, the Mamdani method resulted in energy savings of 13.8%. In comparison, the Tsukamoto method performed better, with energy savings of 31.176%. After comparing the results of all three methods, the researchers concluded that the Sugeno method was the most effective in terms of reducing electrical energy consumption, with average energy savings of 74.2775%.
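As a rough illustration of how a Sugeno-type controller maps room temperature and humidity to a compressor speed, the sketch below implements a zero-order Sugeno inference with four rules. The membership ranges, the rule base, and the speed constants are hypothetical and are not taken from Saepullah et al. [79].

```python
def ramp_up(x, low, high):
    """Membership rising linearly from 0 at `low` to 1 at `high`."""
    return max(0.0, min(1.0, (x - low) / (high - low)))


def sugeno_compressor_speed(temperature_c, humidity_pct):
    """Zero-order Sugeno inference; returns a compressor speed in percent."""
    # Fuzzification with two complementary sets per input (assumed ranges).
    warm = ramp_up(temperature_c, 24.0, 34.0)
    mild = 1.0 - warm
    humid = ramp_up(humidity_pct, 40.0, 80.0)
    dry = 1.0 - humid

    # Rule base: (firing strength using min as AND, constant consequent).
    rules = [
        (min(mild, dry), 30.0),
        (min(mild, humid), 50.0),
        (min(warm, dry), 70.0),
        (min(warm, humid), 100.0),
    ]

    # Sugeno defuzzification: firing-strength-weighted average of consequents.
    total_weight = sum(w for w, _ in rules)
    return sum(w * speed for w, speed in rules) / total_weight
```

Because the two fuzzy sets of each input are complementary, at least one rule always fires, so the weighted average is well defined. Mamdani and Tsukamoto inference differ mainly in how the rule consequents are represented and defuzzified.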

Also in 2015, Keshtkar et al. [80] proposed a methodology for utilizing fuzzy logic, along with wireless sensors and smart grid incentives, in order to reduce energy wastage in domestic heating and cooling systems. Programmable communicating thermostats (PCTs) regulated these systems, aiming to cut energy usage while preserving daily routines in response to smart grid incentives such as demand-response (DR) programs, time-of-use (TOU) rates, and real-time pricing (RTP). Since manually adjusting energy use in response to such incentives is a cumbersome procedure for household users, the fuzzy logic methodology was incorporated into the PCTs in order to add the intelligence needed for reducing loads and safeguarding comfort levels. The PCT was replicated in simulation software, acting as a test framework for the efficiency of the fuzzy logic method across various scenarios. According to the results, this approach efficiently controlled set points while maintaining comfort by employing specific rules based on data from sensors and smart grid incentives, thereby providing superior energy and cost efficiency compared to the traditional PCT approach.

In 2016, Ulpiani et al. [81] examined the energy and comfort outcomes of three distinct control strategies (binary, PID, and FLC) for managing a heating system in a green construction. Experiments were conducted in a test structure fitted with electric heaters and an array of sensors, monitoring the interior and exterior thermal states. Assessments were conducted in real time over a span of roughly a week during unoccupied, varied seasonal conditions in a Mediterranean environment. Each control strategy was evaluated for both comfort and energy efficiency, and as the results indicated, the Mamdani fuzzy logic controller (FLC) outperformed the other control schemes, achieving energy savings of between 30% and 70% and consistently retaining adequate comfort parameters.

Similarly, in 2017, Keshtkar et al. [82] proposed a solution for energy management in residential HVAC systems, with an adaptable, autonomous approach using supervised fuzzy logic learning (SFLL), wireless sensor capabilities, and dynamic electricity pricing to create a smart thermostat. In cases where user interaction affected system decisions, the adaptive fuzzy logic model (AFLM) was incorporated to adapt to new user preferences. To simulate a flexible residential environment, a house energy simulator was developed, incorporating an HVAC system, smart meter, and thermostat. The autonomous thermostat demonstrated a 21.3% energy-saving potential over a month of simulation, exhibiting its capability to modify daily temperature set points, thereby conserving energy and costs without sacrificing user comfort or requiring user intervention. Notably, when the user altered schedules or preferences, the AFLM illustrated a strong capacity for adaptation, learning from these alterations while maintaining energy efficiency.

In a 2018 study, Ain et al. [83] developed fuzzy inference systems (FISs) (both Mamdani and Sugeno), which used humidity and room temperature changes to optimize thermostat settings, thereby enhancing user comfort and energy efficiency. An automatic rule generation approach helped manage the complexity of the system. The lightweight FIS, compatible with IoT systems like RIOT, showed a reduction in energy consumption of 28% in simulations. Incorporating additional factors, such as outdoor temperature, occupancy, set points, and price tariffs, increased energy savings by up to 50% in the worst-case scenarios. According to the results, the methodology evaluated under various environmental conditions demonstrated that the Mamdani FIS performed better in warm climates, whereas the Sugeno FIS excelled in colder ones.

In another important work from 2019, Li et al. [84] introduced a unique method for efficiently regulating indoor climates using real-time tracking of thermal user sensations. By employing a fuzzy logic assessment, cumulative thermal feedback from all occupants was established, which was processed by a linear algorithm to refine temperature guidelines. The objective was to instantly respond to occupants’ thermal preferences without requiring manual data. According to the evaluation, the proposed method could account for individual comfort levels, resulting in timely and accurate temperature modifications, while the participants reported higher comfort levels (score of 5.56) under the new system compared to a score of 5.10 using the traditional method. Significant energy savings were also achieved, demonstrating energy conservation of 20.5% for AHUs and 13.4% for water loops. Overall, there was a reduction of 13.8% in energy consumption compared to conventional approaches.

Table 3. Summary of FLC model-free approaches for HVAC control (2015–2023).

Ref.	Year	Methodology	Agent	HVAC	Single-Zone	Multi-Zone	Simulation	Real-Life	Residential	Commercial	Citations
[79]	2015	Mamdani	Single	NaN	x		x				51
[80]	2015	NaN	Single	NaN	x		x		x		58
[81]	2016	Mamdani	Single	Radiator	x			x	x		50
[82]	2017	AFLM	Single	NaN	x		x			x	89
[83]	2018	Mamdani	Single	Radiator	x		x			x	58
[84]	2019	NaN	Single	VAV		x		x		x	50

4.4. Literature Review of Hybrid Model-Free Control Applications

In 2015, an interesting hybrid methodology was proposed by Hussain et al. [85], in which computational intelligence and optimization strategies were used to enhance a fuzzy controller’s efficacy within an HVAC system. The goal was to moderate energy use without compromising the comfort of the inhabitants. The study used EnergyPlus to compute the predicted mean vote (PMV) and predicted percentage dissatisfied (PPD) indices, whereas the fuzzy controller and the optimization framework were co-simulated using BCVTB and Simulink. These techniques were compared with EnergyPlus’s traditional thermal control of HVAC. The study concluded that a genetic algorithm (GA) could be used to fine-tune the fuzzy logic controller (FLC) to achieve an improved outcome. Compared to the EnergyPlus baseline control, the PMV was reduced and the overall energy consumption decreased by 16.1% for cooling and 18.1% for heating.

Also in 2015, Wei et al. [86] utilized a multi-layer perceptron (MLP) ensemble, a type of ANN, to construct a comprehensive energy model for a building. The modeling framework also included three indoor air quality (IAQ) models: the facility temperature model, the facility relative humidity model, and the facility CO_{2} concentration model. To strike a balance between power usage and indoor air quality, a four-objective optimization problem was designed. This problem was addressed using a revised particle swarm optimization (PSO) algorithm, yielding control parameters for the supply air temperature and static pressure of the air-handling unit. By assigning varying weights to the objectives within the model, the derived control parameters optimized the HVAC system by trading off between power usage and the facility’s thermal comfort. According to simulated evaluations, the MLP ensemble demonstrated superior performance compared to seven other techniques and was chosen to form the comprehensive energy model and the three IAQ models. The total power savings for the dataset examined in this paper were 17.4% when IAQ constraints were not applied and 12.4% when IAQ constraints were imposed for one of the eight user preference scenarios.

In 2016, Attaran et al. [87] suggested a novel approach for the energy optimization of HVAC systems using a combination of the radial basis function neural network (RBFNN) and the epsilon constraint (EC) PID approach. This innovative hybrid method leveraged the RBFNN in the HVAC system to predict residual discrepancies, amplify the control signal, and diminish errors. The main aim of this work was to design and test the EC-RBFNN for a self-adjusting PID controller tailored for a distinct bilinear HVAC system, focusing on temperature and humidity control. Comparative simulation case studies revealed the superior precision of the EC-RBFNN method over the standard PID optimization and combined PID-RBFNN.

In 2017, a self-learning control strategy for HVAC systems was proposed by Ghahramani et al. [88] to adjust a building’s HVAC parameters for optimal efficiency and comfort. The specific system combined three key elements: (i) a metaheuristic element employing a k-nearest neighbor (k-NN) stochastic hill-climbing technique; (ii) a machine learning framework using a decision tree (DT) for regression analysis; and (iii) a self-tuning module that carries out a recursive brute-force search. The control strategy sets daily optimal parameters as its primary method of control, ensuring that it enhances, rather than disrupts, existing building management systems. To assess the performance of the novel strategy, Ghahramani et al. employed the reference model of a small office building from the U.S. Department of Energy across all U.S. climate zones. By simulating various control policies using the EnergyPlus software, the novel framework led to energy savings of 31.17% compared to standard operations (a set point of 22.5 °C with a 3 K deadband). In terms of measurement accuracy across all climate zones, as defined by the normalized root-mean-square error, the algorithm demonstrated a performance score of 0.047.

A data-driven neuro-fuzzy approach was presented in 2018 by Sala-Cardoso et al. [89] for the provision of short-term predictions concerning the HVAC thermal power demand in smart buildings. The innovation lies in estimating the building’s activity level to enhance the prediction system’s response and context awareness, thus increasing accuracy by factoring in the building’s usage pattern. The methodology combined a recurrent neural network (RNN), which learns the dynamics of a specially developed activity indicator, with an adaptive neuro-fuzzy inference system, which correlates activity predictions with outdoor and bus return temperatures to describe the building’s HVAC thermal power demand. An estimation method was also proposed for the indirect monitoring of the aggregated power consumption of the terminal units. Real data from a research building were experimentally utilized for the evaluation of the hybrid approach, and according to the results, a substantial performance enhancement was observed compared to the baseline methodologies, achieving a mean absolute error below 10%.

In their 2019 research, Satrio et al. [90] evaluated the annual energy usage and thermal comfort in a university building equipped with radiant cooling and a variable air-volume (VAV) system, as measured by the predicted percentage of dissatisfied (PPD) value. A multi-goal optimization approach combining artificial neural networks (ANNs) and multi-objective genetic algorithms (MOGAs) was effectively used to determine the optimal operation of the building. The specifically designed ANN configuration demonstrated accurate predictions during the training phase, as indicated by a root-mean-square error (RMSE) of 0.3 for energy consumption and a PPD value of 1. The multi-objective optimization revealed substantial enhancements in the operation of the HVAC system in terms of thermal comfort while maintaining low annual energy consumption compared to the base design.

Recent studies have shown that deep reinforcement learning holds great potential for controlling HVAC systems. However, its complex nature and slow computation limit its practical use in real-time HVAC optimal control. To address this issue, in 2019, Zhang et al. [91] proposed a practical RL control framework called BEM-DRL. The control framework was tested in a real-life application, considering a commercial office that integrated a novel radiant heating system. The control scheme followed a four-step process: building energy modeling, model calibration, deep reinforcement learning training, and control deployment. The results of the 78-day real-life evaluation showed that the BEM-DRL framework achieved a 16.7% reduction in heating demand with more than 95% probability compared to the previous rule-based control (RBC) approach.

In another hybrid control approach from 2022, Ren et al. [92] proposed a novel prediction-driven optimization strategy for real-time scheduling based on upcoming environmental patterns. Utilizing an advanced deep reinforcement learning technique (dueling DDQN), the home energy management system (HEMS) was dispatched optimally. Given the non-standard distribution of HVAC temperature data both indoors and outdoors, a unique generalized correntropy-assisted long short-term memory (GC-LSTM) neural model was presented. This model leveraged the generalized correntropy (GC) loss function for outdoor temperature forecasts. By implementing this technique in an HEMS scenario, the results revealed a notable decrease in user cost while maintaining user comfort.
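To give a sense of why a correntropy-based loss is attractive for non-Gaussian temperature errors, the following sketch evaluates a generalized correntropy loss in its commonly cited form, based on a generalized Gaussian kernel with shape parameter alpha and bandwidth beta. This form, the parameter defaults, and the function name are assumptions for illustration; the exact loss used in the GC-LSTM of Ren et al. [92] is not restated in this review.

```python
import math


def generalized_correntropy_loss(targets, predictions, alpha=2.0, beta=1.0):
    """Generalized correntropy loss: gamma * (1 - mean(exp(-lam * |e|^alpha))).

    alpha: kernel shape (alpha = 2 gives the usual Gaussian-kernel correntropy).
    beta:  kernel bandwidth. Both are hypothetical defaults.
    """
    gamma = alpha / (2.0 * beta * math.gamma(1.0 / alpha))  # kernel normalization
    lam = 1.0 / (beta ** alpha)
    kernel_mean = sum(
        math.exp(-lam * abs(t - p) ** alpha)
        for t, p in zip(targets, predictions)
    ) / len(targets)
    return gamma * (1.0 - kernel_mean)
```

Unlike the squared error, the per-sample contribution saturates for large errors, so a few outliers in the temperature data cannot dominate the training signal.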

Table 4. Summary of hybrid model-free approaches for HVAC control (2015–2023).

Ref.	Year	Methodology	Agent	HVAC	Single-Zone	Multi-Zone	Simulation	Real-Life	Residential	Commercial	Citations
[85]	2015	GA-Fuzzy	Single	Multi	x		x			x	61
[86]	2015	PSO-ANN	Single	AHU	x		x			x	111
[87]	2016	EC-ANN	Single	AHU	x		x				54
[88]	2017	k-NN DTBF	Single	VAV		x	x			x	61
[89]	2018	Neuro-Fuzzy	Single	AHU	x		x			x	31
[91]	2019	GA-RL	Single	Radiator	x			x		x	138
[90]	2019	GA-Fuzzy	NaN	VAV		x	x		x		136

4.5. Literature Review of Other Model-Free Control Applications

In 2016, Cai et al. [93] introduced a versatile multi-agent control approach that was suitable for optimizing building energy systems in a “plug-and-play” fashion in order to reduce building-specific engineering efforts. To facilitate distributed decision making, two distinct consensus-based distributed optimization algorithms (a subgradient method and the alternating direction method of multipliers (ADMM)) were adapted and integrated within the framework. The overall approach was validated via simulations in two case studies: the optimization of a chilled water cooling plant and the optimal control of a direct-expansion (DX) air-conditioning system serving a multi-zone building. In both cases, the multi-agent controller effectively found near-optimal solutions, resulting in an overall energy savings potential of 42.7% compared to the baseline centralized conventional approach.

In 2017, Wang et al. [94] addressed the complex issue of reducing the long-term cumulative cost of the HVAC system in a multi-zone commercial building within a smart grid framework. The total cost of the objective function was depicted as a combination of the energy expenses and the cost related to thermal discomfort. The paper formulated a stochastic program that considered various uncertainties, such as fluctuations in electricity prices, outdoor temperatures, preferred comfort levels, and external thermal disruptions. Moreover, the constraints involved were coupled both spatially and temporally, and the uncertainty of future parameters added further challenges. To address this problem, the authors introduced a real-time HVAC control algorithm that utilized Lyapunov optimization techniques. This innovative approach did not require predictions or specific knowledge about stochastic information, focusing instead on constructing and stabilizing virtual queues connected to indoor temperatures across various zones. The implementation of the proposed cost-aware real-time algorithm (CDRA) was distributed, emphasizing user privacy and improving scalability. Through extensive simulations based on real-world data, the study demonstrated that the introduced algorithm could effectively reduce energy costs by up to 52.43% while only minimally impacting thermal comfort.

In 2017, Peng et al. [95] employed a k-nearest neighbor (k-NN) learning-based, demand-driven control strategy for sensible cooling aimed at predicting the occupants’ future presence and the duration of that presence for the rest of the day by learning from their past and current behaviors. The research approach integrated seven months of occupancy data from motion signals in six offices occupied by ten individuals in a commercial building, encompassing both private and multi-person offices. The predicted occupancy information was indirectly used to deduce setback temperature set points, based on specific rules outlined in the study. During a two-month period, both a baseline control and the innovative demand-driven control were deployed on forty-two real-world occupancy weekdays. According to the final evaluation, the use of demand-driven control led to an energy saving of 20.3% compared to the standard benchmark.

Similarly, in 2018, Peng et al. [96] focused on enhancing the efficiency of HVAC systems by adapting them to occupants’ real-time behavior, specifically in office environments. Since rooms in office buildings are not continuously occupied during scheduled HVAC service times, there exists potential to reduce unnecessary energy usage linked with occupants’ actions. To address this, the study conducted a comprehensive analysis of occupants’ unpredictable behavior within an office building and proposed a demand-driven control strategy. This strategy automatically adapted to occupants’ energy-related actions to reduce energy consumption while maintaining room temperatures comparable to static cooling. The approach by Peng et al. included two kinds of machine learning techniques: unsupervised and supervised learning. These data-based approaches were adapted to occupants’ behavior in two distinct learning processes. The information gathered about occupancy was then utilized through a defined set of rules to deduce real-time room set points for managing the office space’s cooling system. This method aimed to minimize the need for human involvement in the control of the cooling system. The proposed strategy was put into practice for controlling the cooling system in real-world office settings, covering three typical office types—single-person offices, multi-person offices, and meeting rooms—across eleven case study office spaces. The experiments demonstrated energy savings ranging from 7% to 52% compared to traditionally scheduled cooling systems.

In 2020, Li et al. [97] proposed a distributed multi-agent approach for the optimal control of multi-zone ventilation systems. The approach focused on indoor air quality (IAQ) and energy use by optimizing individual room ventilation volumes and the primary air-handling unit (PAU). The complex optimization problem was divided into simpler parts and handled by distributed agents, each representing an individual room or the PAU, and a coordinating agent integrated these agents to find the optimal solutions. The proposed approach was validated on a TRNSYS-MATLAB co-simulation platform using two control test cases under different weather conditions, where it was compared with a standard control method and a centralized optimal control method. The results showed that the distributed approach could match the optimum output of the centralized control approach.

Table 5. Summary of other model-free approaches for HVAC control (2015–2023).

Ref.	Year	Methodology	Agent	HVAC	Single-Zone	Multi-Zone	Simulation	Real-Life	Residential	Commercial	Citations
[93]	2016	ADMM	Multi	Multi		x	x			x	70
[94]	2017	CDRA	Single	VAV		x	x			x	58
[95]	2017	kNN	Single	Multi	x			x		x	70
[96]	2018	kNN	Single	Cooler		x	x		x		199
[97]	2020	ADMM	Multi	AHU		x	x			x	50

5. Evaluation

In this section, we draw insights from the research works reviewed above. Each of the tables in the previous section was based on a different thematic area, namely RL, ANN, FLC, hybrid, and other model-free strategies. Furthermore, each table recorded various features of these applications: the single- or multi-agent character of each algorithmic approach, the specific type of HVAC equipment, the single- or multi-zone nature of the tested building, whether the results were obtained in simulation or in real life, the residential or commercial use of each building tested, and the number of citations of each research application. Each of these aspects of HVAC control in buildings is examined using visual analytics and statistics to highlight the most common features of model-free HVAC control.

5.1. Evaluation of Model-Free Control Strategies

The current work gathered a significant number of highly cited research papers published between 2015 and 2023, which represented the majority of research according to the number of citations. While the overall model-free control applications for HVACs numbered approximately 9469 citations, the current work utilized 4256 citations (44.5%) covering the specific thematic areas to guarantee an established and verified evaluation. To this end, Figure 6 illustrates the distribution of citations (%) for each model-free HVAC control methodology for the period 2015–2022, as recorded in Scopus. This helps identify which approach was most prevalent overall and garnered the most interest in recent years (2015–2023).

Figure 6. Distribution of citations (%) per model-free control methodology according to Scopus (2015–2022).

Similarly, Figure 7 illustrates the number of citations of the different model-free approaches per methodology and year in an effort to identify which approach was the most prevalent on a yearly basis.

Figure 7. Number of citations annually per model-free control methodology according to Scopus (2015–2022).

5.1.1. Evaluation of Reinforcement Learning Control Strategies

According to the overall RL evaluation, value-based RL types represent the most prevalent approach, at least in terms of citation numbers. Figure 8 (right) reveals that value-based approaches garnered the most research interest, accounting for 47% of the overall share of citations, even though the number of integrated applications was not the highest (Figure 8 left). Among them, Q-learning-based algorithmic methodologies emerged as the most common methodology for HVAC control in buildings (Figure 8 center). Such methodologies utilize Q-tables to estimate the value of taking certain actions in certain states and exhibit particular efficiency in scenarios where the number of possible states and actions is relatively small, as the Q-table may become impractically large otherwise [46,47,49,57]. Another value-based (and also Q-learning-based) methodology that was commonly used in the literature was the deep Q-network (DQN). In this DRL approach, the integrated NN is used to approximate the Q-function, which provides the expected return (i.e., cumulative discounted reward) for taking a certain action in a certain state following a given policy. The NN receives as input the current state of the environment and outputs a Q-value for each possible action. The agent then chooses the action with the highest Q-value. The use of NNs in this way is tailored for discrete action spaces and allows DRL to handle complex tasks [51,61].
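
The Q-table mechanism described above can be sketched in a few lines. This is a minimal, generic tabular Q-learning update rather than the implementation of any reviewed work. The toy state and action encoding, the learning rate `alpha`, and the discount factor `gamma` are assumptions chosen for illustration.

```python
import numpy as np

def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.1, gamma=0.95):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Bootstrapped target: immediate reward plus discounted best next-state value.
    td_target = reward + gamma * np.max(q_table[next_state])
    td_error = td_target - q_table[state, action]
    q_table[state, action] += alpha * td_error
    return td_error

def greedy_action(q_values):
    """Pick the action with the highest Q-value.

    A DQN agent applies the same rule to the Q-values produced by its network.
    """
    return int(np.argmax(q_values))

# Hypothetical toy problem: 3 discretized temperature states,
# 2 actions (0 = HVAC off, 1 = HVAC on).
q = np.zeros((3, 2))
q_learning_update(q, state=0, action=1, reward=1.0, next_state=2)
best = greedy_action(q[0])
```

The table has one entry per state-action pair, so its size grows with the product of the two counts. This is why the tabular form suits small, discretized problems, while DQN replaces the table with a network that maps a state to one Q-value per action.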

Figure 8. Occurrence of RL types (Left), percentage share (%) of citations for each RL type (Right), and occurrence of specific RL methodologies (Center) in the related literature (2015–2023).

On the other hand, policy-based approaches, such as proximal policy optimization (PPO) [53] and trust region policy optimization (TRPO) [55], attracted less research interest. According to the evaluation, they achieved the lowest (15%) share of citations (Figure 8 right), which is consistent with their less common occurrence in the highly cited related research works (Figure 8 center and left). Such approaches seek to improve policy updates by constraining them, ensuring they do not deviate too far from the previous policy [53,62].
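
The idea of constraining policy updates can be illustrated with the clipped surrogate objective commonly associated with PPO. The sketch below is a generic NumPy version, not code from the reviewed works, and the clipping range `clip_eps=0.2` is an assumed, commonly used default. Trust region policy optimization pursues the same goal differently, by bounding the KL divergence between the old and new policies instead of clipping.

```python
import numpy as np

def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to be maximized).

    ratio     : pi_new(a|s) / pi_old(a|s) for each sampled transition
    advantage : advantage estimate for each sampled transition
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Taking the element-wise minimum removes any incentive to push the
    # probability ratio outside the interval [1 - clip_eps, 1 + clip_eps].
    return float(np.mean(np.minimum(unclipped, clipped)))
```

For a positive advantage, a ratio of 1.5 contributes only as much as a ratio of 1.2, so large jumps away from the previous policy earn no additional objective value.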

However, actor–critic RL types attracted significant research interest, with a 37% share of overall citations (Figure 8 right), and their appearance was also relatively high in the related literature (Figure 8 left). The most common actor–critic approach was DDPG (Figure 8 center), which combines the concepts of value- and policy-based methods, utilizing both a Q-function and a policy network, whereas DQN primarily focuses on approximating the Q-function [52,54,56,60]. Other actor–critic methodologies discussed in the related literature include the soft actor–critic (SAC) approach [59], multi-agent actor–critic (MAAC) approach [58], on-policy Monte Carlo actor–critic (OPMCAC) approach [48], etc. (Figure 8 center).

Figure 8 (left) shows the occurrence of RL types, Figure 8 (right) shows the percentage share of citations for each RL type, and Figure 8 (center) illustrates the occurrence of the specific RL methodologies in the related literature (2015–2023).

5.1.2. Evaluation of Artificial Neural Network Control Strategies

Feedforward neural networks (FNNs) [66,67,68,69,71,73,74,76] dominated in terms of both occurrence (Figure 9 left) and share of citations (85%, as illustrated in Figure 9 right). According to the overall ANN evaluation, the multi-layer perceptron (MLP) was the dominant FNN methodology among the research implementations that integrated ANNs for HVAC control (Figure 9 center). MLPs contain multiple hidden layers and provide a balance between model complexity and computational efficiency, ensuring timely decisions in real-time control scenarios [66,69,72,73,74,76]. Moreover, while simple FNN architectures [71] might require specific input structures or activation functions, MLPs offer greater flexibility in design and adaptability to diverse HVAC system characteristics. Other FNN architectures found in the literature included random neural networks (RandNN), which are suitable for real-time HVAC control, where swift adaptation to changing conditions is crucial [68], and time-delay neural networks (TDNN), which are designed for capturing temporal dependencies in data [67].

Figure 9. Number of highly cited research works per ANN Type (Left), Number of highly cited research works per ANN Methodology (Center), and share (%) of citations per ANN type (Right) (2015–2022).

Another common type was recurrent neural networks (RNNs) (15%, as illustrated in Figure 9 center and right). Such frameworks are specially designed to process sequential data, rendering them well suited to HVAC control, where past states influence current decisions. RNNs’ inherent memory capability allows them to capture temporal dependencies, vital for understanding patterns in fluctuating environmental conditions and occupant behaviors [70,75,77,78]. This ability to consider historical data ensures more informed and adaptive control decisions, optimizing energy usage and comfort over time. However, RNNs commonly suffer from vanishing (and exploding) gradient problems, which make long-term dependencies difficult to learn. As a result, a special type of RNN, the long short-term memory (LSTM) network, was widely utilized (Figure 9 center) in order to retain long-term dependencies in the data. LSTMs are also adept at understanding longer patterns in HVAC sequences, which may span days or seasons [75,77,78]. Furthermore, their gated mechanisms, including input, forget, and output gates, provide more nuanced control over the information flow, ensuring that relevant data are maintained while discarding noise or irrelevant past states, leading to more accurate and efficient HVAC control decisions.
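
To make the role of the input, forget, and output gates concrete, the following is a minimal NumPy sketch of a single LSTM time step. It follows the standard LSTM equations and is not taken from any reviewed work. The layer sizes, the random weights, and the stacking order of the gate parameters are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4*H, D), U: (4*H, H), b: (4*H,), with rows stacked in the
    (assumed) order [input gate, forget gate, output gate, candidate].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much of the cell to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate cell update
    c = f * c_prev + i * g       # new cell state (long-term memory)
    h = o * np.tanh(c)           # new hidden state (short-term output)
    return h, c

# Hypothetical sizes: 3 input features (e.g., sensor readings), 4 hidden units.
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):  # a short sequence of 5 time steps
    h, c = lstm_step(x, h, c, W, U, b)
```

The cell state `c` is updated additively, scaled by the forget gate, which is what allows information to persist over many steps instead of being repeatedly squashed.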

Figure 9 (left) shows the occurrence of ANN types, Figure 9 (right) shows the percentage share of citations for each ANN type, and Figure 9 (center) illustrates the occurrence of the specific ANN methodologies in the related works (2015–2023).

5.1.3. Evaluation of Fuzzy Logic Control Strategies

The FLC approach can effectively handle uncertainties and imprecision, which are common in HVAC environments. The FLC design framework is also comparatively simple, especially when expert knowledge about system behavior is available, allowing for the intuitive setting of rules. Additionally, unlike deep learning-based methods (e.g., DRL and ANNs), FLC does not require extensive data for training, making it more suitable for environments with limited data. Another important factor lies in the fact that FLC often requires less computational power and resources, leading to faster real-time responses, which is crucial in HVAC systems.

The literature includes numerous highly cited research works where the Mamdani FLC type is utilized as the HVAC control methodology. Mamdani FLC is commonly preferred (Figure 10) in HVAC control since it can capture human-like reasoning with its interpretable rule-based structure, rendering it intuitive for experts to design and adjust [79,81,83]. Furthermore, the widespread recognition and established methodologies of the Mamdani approach have rendered it a trusted choice among HVAC system researchers.

Figure 10. Number of highly cited research works per FLC type (2015–2022).

Another common FLC approach is the Sugeno fuzzy logic controller, also known as the Takagi–Sugeno approach. The Sugeno output functions are typically linear or constant, allowing for faster calculations, which is crucial in real-time HVAC control [79,83]. Additionally, the Sugeno FLC method facilitates mathematical analysis and optimization, making it easier to integrate with other control strategies or adapt based on data-driven approaches, ensuring precise and adaptive HVAC operations. It may therefore offer a more compact and computationally efficient alternative to the Mamdani FLC approach. Notably, the two types were frequently compared against each other in the related research works [79,83]. Figure 10 reveals the Mamdani and Sugeno approaches as the most common types of FLCs in building HVAC systems in the related literature.
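
The computational simplicity of constant Sugeno consequents can be seen in a small sketch. The zero-order Sugeno controller below is purely illustrative: the triangular membership functions, the three rules, and the output levels are hypothetical and do not come from the reviewed works. A Mamdani controller would instead produce fuzzy output sets and require a defuzzification step such as a centroid computation.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical rule base: temperature error (setpoint minus room temperature,
# in deg C) mapped to a constant heating power fraction.
RULES = [
    (lambda e: tri(e, -4.0, -2.0, 0.0), 0.0),  # room too warm -> heater off
    (lambda e: tri(e, -2.0, 0.0, 2.0), 0.3),   # near setpoint -> low power
    (lambda e: tri(e, 0.0, 2.0, 4.0), 1.0),    # room too cold -> full power
]

def sugeno_output(error):
    """Weighted average of rule consequents, weighted by firing strength."""
    weights = [mf(error) for mf, _ in RULES]
    total = sum(weights)
    if total == 0.0:
        # Outside the covered range: saturate instead of dividing by zero.
        return 1.0 if error > 0 else 0.0
    return sum(w * out for w, (_, out) in zip(weights, RULES)) / total
```

For example, an error of 1 deg C fires the "near setpoint" and "too cold" rules equally, so the output is the plain average of their consequents, 0.65.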

Figure 10 illustrates the number of highly cited research works for each FLC methodology.

5.1.4. Evaluation of Hybrid Model-Free Control Strategies

The primary advantage of hybrid schemes lies in combining the strengths of the individual methodologies, offering better control strategies that can adapt to the highly dynamic building environment and uncertain external factors like weather changes, occupancy variations, and energy price fluctuations. To this end, the occurrence of the prevailing model-free strategies in hybrid control schemes is another interesting topic, illustrating the potential of each methodology to act as part of an integrated HVAC control solution. According to the number of occurrences for each of the model-free control strategies, as illustrated in Figure 11, fuzzy logic controller-based (or FLC-based) and GA-based approaches attracted significant interest.

Figure 11. Number of hybrid highly cited research works per type (2015–2022).

The occurrence of FLC-based control schemes in the hybrid optimization mix was quite common. FLC-based approaches are known for their provision of a robust framework for handling uncertainties and imprecision, often encountered in HVAC systems, without requiring extensive training data. FLC integration ensures interpretability and deterministic reasoning, augmenting the adaptability and precision of hybrid control strategies, especially in scenarios with sparse or noisy data [85,89,90]. One widely utilized type of fuzzy-based control is neuro-fuzzy control, which ingeniously combines the inherent interpretability and linguistic representation of fuzzy control with the robust learning capabilities of ANNs [89]. As a result, neuro-fuzzy systems provide a transparent view of their behavior, making them suitable for intricate applications. At the same time, they can learn and adapt to the best rules derived from data, bypassing the need for exhaustive expert inputs [89].

Moreover, a fruitful integration of model-free strategies combines genetic algorithms (GAs) with ANN [86], FLC [85,90], and RL [91] methodologies. Inspired by natural evolution, this type of evolutionary algorithm provides a robust search mechanism for optimizing control strategies, which are further fine-tuned by the learning ability of RL, the rule-based reasoning of FLCs, or the pattern recognition of neural networks.

ANNs are also widely used, providing an efficient data-driven framework that captures nonlinearities and intricate interactions, thereby enhancing the adaptability of the control strategy. According to the evaluations, the flexibility of ANNs ensures consistent performance and generalization across diverse building configurations and environmental conditions, cementing their role in multiple hybrid control methodologies [86,87,89,90,92].

While RL-based approaches are less common in hybrid solutions compared to neuro- and fuzzy-based approaches, they can provide significant adaptability attributes, enabling the system to continuously refine its decisions based on real-time feedback and changing conditions. According to the literature evaluations, RL integration ensures that the control strategy remains responsive to evolving energy efficiency goals and occupant comfort preferences, even in the face of unforeseen challenges [91,92].

5.1.5. Evaluation of Other Model-Free Control Strategies

Unlike reinforcement learning (RL), which often requires a significant amount of data and iterative training to fine-tune control policies, the remaining model-free algorithms reviewed here can be deployed with minimal data overhead. Compared to artificial neural networks (ANNs), these independent model-free approaches sidestep the need for extensive training datasets and the computational overhead associated with neural network training. Furthermore, compared to the fuzzy logic controller (FLC), they present a more straightforward implementation, eliminating the complexities of defining membership functions and rule bases.

The k-nearest neighbors (k-NN) algorithm attracted significant interest in model-free HVAC control. Its non-parametric nature allows for a flexible adaptation to diverse data distributions without a predefined model structure. By leveraging historical data, k-NN provides real-time predictions based on the proximity of current conditions to past scenarios. Its simplicity, combined with its ability to handle multi-dimensional inputs, makes k-NN an effective tool in situations where intricate dynamics need quick, data-driven decisions without the overhead of more complex algorithms [95,96].
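
The proximity-based prediction described above can be sketched with a few lines of NumPy. This is a generic k-NN regressor, not the occupancy model of [95,96]. The feature choice (hour of day and an occupancy flag), the toy history, and the value of `k` are assumptions made for illustration.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training samples."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    # Euclidean distance from the query to every stored sample.
    dists = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    return float(train_y[nearest].mean())

# Hypothetical history: features = (hour of day, occupied an hour earlier? 0/1),
# target = remaining hours of occupancy observed on that past day.
history_x = [[9, 1], [10, 1], [11, 1], [17, 1], [18, 0], [19, 0]]
history_y = [8.0, 7.0, 6.0, 1.0, 0.0, 0.0]

remaining = knn_predict(history_x, history_y, query=[10, 1], k=3)
```

No model is fitted in advance: the stored history itself acts as the model, which is the non-parametric property noted above. In practice the features would usually be scaled so that no single input dominates the distance.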

Additionally, the alternating direction method of multipliers (ADMM) is also common due to its ability to decompose large optimization problems into smaller sub-problems, making it scalable in complex HVAC systems with multiple components. The iterative nature of ADMM allows control decisions to be adapted and generated in real time. Furthermore, ADMM can effectively handle constraints, ensuring HVAC operations remain within specified limits [93,97].
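
The decomposition idea can be illustrated with the simplest ADMM variant, global consensus on a scalar. In the sketch below, each agent holds a private quadratic cost and the agents must agree on one shared value, for example a common supply setpoint. The cost form, the weights, the targets, and the penalty parameter `rho` are assumptions for illustration, not the formulations used in [93,97].

```python
import numpy as np

def admm_consensus(weights, targets, rho=1.0, iters=200):
    """Scaled-form consensus ADMM for min_z sum_i 0.5 * a_i * (z - t_i)^2.

    Each agent i keeps a local copy x_i and a scaled dual variable u_i;
    a coordinator maintains the shared consensus value z.
    """
    a = np.asarray(weights, dtype=float)
    t = np.asarray(targets, dtype=float)
    x = np.zeros_like(t)
    u = np.zeros_like(t)
    z = 0.0
    for _ in range(iters):
        # Local step: each agent solves its own small problem in closed form.
        x = (a * t + rho * (z - u)) / (a + rho)
        # Coordination step: average the local proposals.
        z = float(np.mean(x + u))
        # Dual step: accumulate each agent's disagreement with the consensus.
        u = u + x - z
    return z, x

# Hypothetical example: two zones preferring 20 and 24 deg C,
# with the second zone weighted three times as heavily.
z_star, x_star = admm_consensus([1.0, 3.0], [20.0, 24.0])
```

Only the local proposals and the consensus value are exchanged in each iteration, while every agent's cost parameters stay local. The result approaches the weighted average of the targets, here 23 deg C.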

Last but not least, it should be noted that the modern literature presents numerous other algorithmic approaches for achieving efficient control in HVAC systems. For instance, centralized and distributed cognitive adaptive optimization methodologies [98], as well as distributive adaptive control [99], serve as valuable tools for scenarios where rapid deployment with minimal system knowledge is crucial. These methodologies are capable of continuous learning from occupant preferences, ensuring enhanced comfort and energy efficiency. Their adaptive nature enables dynamic responses to changing environmental conditions and user behavior, optimizing HVAC operations in real time, similar to ADMM approaches [98,99].

5.2. Evaluation of Agent-Based Optimization Strategies

The adoption of multi-agent or distributed-control approaches for HVAC systems has the advantage of localized decision making, facilitating quicker and more tailored responses to dynamic environmental and occupancy conditions. Especially in model-free approaches, agents make decisions based on data, experience, or heuristic methods, without relying on an explicit model of the system dynamics. Each agent learns or decides its actions based on its own experience or the shared experience of other agents. This can be particularly useful in environments where creating an accurate model is challenging or impractical. However, it might require significant data or interaction time for agents to learn optimal or near-optimal strategies. With such approaches, HVAC systems can address micro-environmental variations within larger spaces, enhancing occupant comfort and optimizing energy usage in specific zones without overarching central commands. This ensures a more granular level of control, often resulting in energy savings and increased system efficiency. Moreover, in the event of a system failure or malfunction in one zone, a decentralized approach can prevent cascading failures across the entire system.

Multi-agent control, however, requires sophisticated coordination algorithms to ensure that agents cooperate efficiently and avoid counterproductive actions. The initial setup, calibration, and maintenance of such systems can be more complex than their centralized counterparts. Furthermore, ensuring seamless communication between agents is challenging, especially in larger buildings with more potential for interference. Recent innovations in decentralized model-free control have mitigated such drawbacks by employing agents that do not directly interact with each other. In such cases, the agents are steered toward a common optimization goal, which is centrally calculated and transmitted back to agents in a decentralized manner in the form of control decisions. A useful example of such approaches is decentralized cognitive adaptive optimization (CAO) [42,98,100], which was widely used in the related works for HVAC control. In such cases, no data need to be exchanged between agents, thereby significantly reducing communication requirements compared to the centralized control approach.

Figure 12 reveals the number of applications and percentage share of citations for multi-agent control approaches for the period 2015–2023.

Figure 12. Number of single- and multi-agent applications and share (%) of citations in the HVAC control-related papers (2015–2022).

Multi-agent control in HVAC arguably embodies the future of building energy management, providing a responsive and adaptive approach that aligns with the complexities of contemporary architectural designs and occupant needs. The current paper discusses and analyzes numerous highly cited research works from the period 2015–2023 regarding multi-agent approaches [58,61,68,93,94,97,101].

5.3. Evaluation of HVAC Equipment Types

Single-HVAC systems encompass a diverse range of equipment types, including heat pumps, air coolers, radiant heaters, AHUs (air-handling units), VAV (variable air-volume) systems, boilers, furnaces, etc. Each of these equipment types possesses distinct operational characteristics and challenges for conventional control schemes. For instance, heat pumps utilize the vapor-compression cycle, alternating between heating and cooling modes, introducing a level of operational nonlinearity that may challenge conventional control schemes [52,60,70,74,77]. In contrast, air coolers largely depend on ambient conditions and exhibit distinct transient behavior during the startup and shutdown phases. This transient nature may increase the complexity of prediction frameworks, as they often require consistent patterns to achieve accurate forecasting [51,62]. Radiant heaters provide warmth by emitting infrared radiation, and their radiant nature means that the warmth is not uniformly distributed but rather focused on surfaces directly in their path. This localized effect, especially in large spaces, can introduce complexities for control schemes that rely on generalized rules to modulate system behavior [61,81,82,91]. Air-handling units (AHUs) and variable air-volume (VAV) systems add another layer of complexity. AHUs condition and circulate air, often integrating both heating and cooling coils. Their multi-faceted role means that any control scheme must juggle multiple variables simultaneously [56,58,73,86,87,89,97]. VAV systems, on the other hand, modulate airflow, introducing nonlinear dynamics that may challenge reinforcement learning (RL) algorithms, particularly when attempting to balance comfort with energy conservation [47,48,50,78,84,88,90,94]. According to our evaluation, the most commonly considered type of HVAC equipment in the literature implementations is the VAV system. Figure 13 (left) illustrates the high occurrence of VAV systems in the related literature, which attracted the highest share of research interest in terms of the number of citations (17%, as illustrated in Figure 13 right).

Figure 13. Number of applications and share of citations regarding the different types of equipment identified in the HVAC control-related papers (2015–2022).

Multi-HVAC systems, however, represent the most challenging control implementations for HVAC frameworks since they combine several of the above types of single-HVAC equipment [49,53,55,59,67,72,85,93]. Particularly in the case of integrating non-HVAC equipment into the optimization mix, the problem’s complexity rises sharply. Such inter-dependencies and compounded operational intricacies can be challenging, even for sophisticated control schemes. For instance, the feedback loop essential for RL can become muddled when multiple equipment types react in tandem. Similarly, data-driven models might struggle to discern patterns due to the multitude of variables at play, while rule-based schemes must grapple with defining a comprehensive rule base that encapsulates all equipment interactions. It should be mentioned that model-free control schemes are potentially more suitable for such multi-HVAC frameworks than model-based approaches, primarily because of their adaptability. Integrated multi-HVAC systems involve complex interactions, and predicting these using predefined models can be challenging. Model-free control methods, such as reinforcement learning (RL) or neural networks, are adept at learning directly from the system’s responses, eliminating the need for precise system modeling. This offers greater flexibility in handling unforeseen dynamics or nonlinearities. Moreover, model-free approaches can adapt over time, enhancing efficiency and performance. However, the trade-off is the significant data requirement and computational intensity for training these controllers. According to our evaluation of the related literature on model-free HVAC control, multi-HVAC implementations represent the highest number of applications (Figure 13 left) and also the highest share of citations (17%, as illustrated in Figure 13 right), along with VAV equipment.

Figure 13 illustrates the number of applications (left) and share of citations (right) (%) for HVAC control applications regarding the different types of equipment identified in the related literature for the period 2015–2023. NaN indicates cases that do not belong to any of the above schemes.

5.4. Evaluation of Building Zones

Both single- and multi-zone HVAC control are utilized depending on the specific requirements and characteristics of the building testbed. Single-zone control most commonly involves a single HVAC control system, e.g., a heat pump, that regulates the entire building’s environment. Such approaches typically concern cases where standard temperature and air quality levels across spaces are acceptable or desired by residents. This approach offers simplicity and straightforwardness in terms of the experimental setup; however, it does not account for potential variations in individual room occupancy or the varying thermal gains and losses in different areas of the building. The majority of the related highly cited research works between 2015 and 2023 pertain to single-zone HVAC control, offering a simpler control scheme that mainly focuses on the aggregate needs of the entire building (Figure 14 left).

Figure 14. Comparison of the number of applications and share of citations (%) for single-zone and multi-zone HVAC control-related papers (2015–2022).

On the other hand, multi-zone applications are designed to cater to larger-scale implementations with distinct sections that may have differing requirements concerning HVAC operations. In a multi-zone setup, each zone represents an individual control system, ensuring that different parts of the building maintain specific temperatures and air quality levels. This approach is particularly useful in cases where different sections of the structure have unique requirements due to their function and features, such as an office building housing a server room or a museum with specific temperature-sensitive areas. The complexity of multi-zone systems lies in ensuring harmonious operation, where each control does not counteract another, thereby optimizing overall energy usage. To this end, multi-zone control necessitates a more layered strategy, accounting for interactions between zones and the dynamic demands of each [47,53,55,57,58,60,64,66,76,88,91,93,94,96,97,102].

While multi-zone implementations are fewer than single-zone ones, they also represent a significant research interest in the literature (41%, see Figure 14 right), highlighting their importance and prominence. Figure 14 illustrates the number of applications and share of citations of single- and multi-zone building control testbeds for the period 2015–2023. “NaN” indicates cases that do not belong to any of the above schemes.

5.5. Evaluation of Building Testbeds

Real-life implementations in HVAC control research represent a critical step beyond theoretical designs and simulation results. While simulations are undeniably valuable for initial evaluations and optimization, they inherently operate under predefined conditions and assumptions. In contrast, real-life implementations confront the vast intricacies and unpredictability of the actual environment. Operating in real-life conditions allows for efficient testing and refinement of HVAC control systems in the face of genuine challenges, thereby enabling direct feedback on system performance, user comfort, and energy efficiency. Of course, real-life implementations require significant efforts; thus, in the literature, they are relatively limited. Figure 15 illustrates the number of applications and share of citations for simulations and real-life building control testbeds for the period 2015–2023. NaN indicates cases that do not belong to any of the above schemes.

Figure 15. Comparison of the number of applications and share of citations (%) for simulations and real-life HVAC control-related papers (2015–2022).

The current work discusses and analyzes several highly cited research works, allowing us to highlight the significance of real-life research conducted between 2015 and 2023 [7,50,56,63,68,76,91].

5.6. Evaluation of Building Use

Commercial buildings, which commonly include office spaces, research centers, universities, malls, museums, etc., have a broader spectrum of requirements. They usually incorporate more multi-HVAC frameworks due to their larger size, varied room functions, and higher occupant density. In such spaces, the HVAC system’s primary goal is to maintain a uniform comfort level throughout the building, catering to a general populace rather than individual preferences. Therefore, the control strategies employed in commercial settings often tend toward predictive algorithms, anticipating the daily influx and egress of people, and adjusting the HVAC operations preemptively. These strategies also consider aspects like indoor air quality, which becomes critical given the higher occupant density. Commercial testbeds are chosen when research targets the examination of large-scale system optimizations, integration with other building systems, or energy conservation in high-density environments. The majority of the often-cited model-free research works discussed here primarily concern commercial or tertiary buildings, where experiments have the opportunity to be conducted efficiently and accurately on a daily basis and energy-saving potential is noticeably larger compared to residential testbeds. Figure 16 illustrates the dominance of commercial building testbeds in the literature in terms of both numbers and share of citations (%).

Figure 16. Comparison of the number of applications and share of citations (%) for commercial and residential HVAC control-related papers (2015–2022).

Residential buildings typically contain fewer occupants than commercial ones and usually have single-HVAC systems. Such residential testbeds are characterized by sporadic occupancy patterns, which are tied to routines like work and sleep. Consequently, HVAC control in residential settings often prioritizes individual comfort, catering to specific rooms or zones based on occupancy and the personal preferences of the inhabitants. The control strategies employed here are often more reactive and adjust quickly to occupant feedback and external conditions like changing weather patterns. Residential testbeds are ideal when the goal is to study user-centric algorithms, personalized comfort settings, and energy savings tied to individual behavior and preferences.

More specifically, the current work discusses and analyzes several highly cited research papers, allowing us to highlight the significance of residential research conducted between 2015 and 2023 [49,57,60,61,62,67,80,81,82,83,92].

6. Future Directions in HVAC Control Using Model-Free Strategies

The future directions arising from the holistic evaluation of the studies are suggestions for improving the quality of current research, considering the various deployed model-free strategies and the trends in the field of HVAC control.

Reinforcement Learning (RL): According to our evaluation, the most widely used strategy within the model-free HVAC control framework is RL, specifically DRL. The potential of RL in HVAC control is vast, but there are several areas ripe for exploration. One of the primary challenges is the sample inefficiency of many RL algorithms such as Q-learning, deep Q-networks (DQNs), and vanilla policy gradient methods. These algorithms often require a large number of samples (interactions with the environment) to learn a satisfactory policy, which can be costly and time-consuming in real-world HVAC scenarios. More advanced algorithms and techniques have been developed to address this inefficiency. Algorithms like proximal policy optimization (PPO) and trust region policy optimization (TRPO) have been reported to be more stable, and in various applications more sample-efficient, than vanilla policy gradient methods. Such approaches remain scarce in the HVAC literature, and more effort is needed to further advance the concept of efficient HVAC control in buildings. Additionally, model-based RL approaches, where a model of the environment is learned and then used to simulate and optimize the policy, may also prove particularly beneficial for HVAC systems. This is because they can leverage the model to generate “synthetic” samples, reducing the need for real-world interactions. To this end, future research should focus on developing algorithms that can learn more effectively from limited interactions with the environment. Transfer learning, where knowledge gained in one environment is applied to another, could be a key technique for addressing this.
Additionally, multi-agent RL systems, where multiple agents collaboratively learn and operate, could be employed to manage large-scale HVAC systems with numerous components. According to our evaluation, multi-agent RL approaches are limited compared to single-agent techniques, hindering the efficient application of RL control in large-scale buildings, where the potential for energy savings is vast. There is also a need for the development of RL algorithms that are robust to uncertainties such as unpredictable weather changes or equipment malfunction. Incorporating Bayesian approaches into RL might offer a solution by allowing the system to reason about its uncertainty and make more informed decisions.
Artificial Neural Networks (ANNs): ANNs have shown great promise in predicting and optimizing HVAC operations. However, the black-box nature of ANNs can be a hindrance, especially in critical applications where interpretability is crucial. Future research should explore the development of interpretable neural network architectures for HVAC applications. Techniques like attention mechanisms, which highlight the importance of different inputs, could be integrated into HVAC-specific ANNs. Furthermore, the integration of temporal convolutional networks or recurrent architectures can better capture the time-dependent dynamics of HVAC systems. It should be noted that the vast majority of ANN applications consider FNN architectures, with MLP being the most prominent, whereas RNN applications are sparser. To this end, examining RNNs may lead to more sophisticated HVAC control in buildings, and direct comparisons between FNN and RNN architectures may provide a valuable research perspective. In addition, other architectures should be examined more intensively in future works: attention mechanisms and Transformer architectures for forecasting tasks where certain time points might be more relevant than others; hybrid models that combine the strengths of different architectures, like CNNs for spatial data and LSTMs for temporal data, or that combine ANNs with other techniques such as RL; and neural ordinary differential equations (neural ODEs), which model HVAC systems with continuous-time dynamics and can be trained by differentiating through a differential equation solver.
Fuzzy Logic Control (FLC): FLC offers a more understandable approach to HVAC control, as it makes use of linguistic rules. However, the manual design of membership functions and rule bases can be tedious and may not always capture the complexities of real-world scenarios. In comparison to reinforcement learning (RL) and artificial neural network (ANN) model-free frameworks, FLC applications are limited in the literature between 2015 and 2023. One primary reason is the manual design requirement of FLC systems. Designing membership functions and rule bases can be intricate, especially for multifaceted systems, whereas ANNs and RL inherently learn from data, diminishing the need for manual intervention. Additionally, the scalability of FLC becomes a concern with complex HVAC systems. As these systems grow in complexity, the number of rules in an FLC system can surge exponentially, making it challenging to design and maintain. In contrast, ANN and RL algorithms offer better scalability, thereby adeptly handling larger systems. Furthermore, although FLC can be adaptive, it often necessitates extra mechanisms to modify its rules based on real-time data.
Future research should focus on adaptive fuzzy systems that can evolve their rules and membership functions based on real-time data. The key is to integrate FLC with data-driven approaches like ANNs to create hybrid models that combine the interpretability of FLC with the predictive power of ANNs. This integration is already being pursued in numerous research efforts under the name of neuro-fuzzy control, which offers greater adaptability to changing environments than simple FLC. Moreover, neuro-fuzzy systems can handle complex, nonlinear relationships more efficiently, making them more robust and versatile. Last but not least, future research should also focus on the development of standardized frameworks for designing, testing, and validating FLC systems in HVAC applications.
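To make the FLC design burden discussed above more concrete, the following is a minimal sketch of a hand-designed fuzzy controller. It is not taken from any of the reviewed works: the two inputs (temperature error and its trend), the three linguistic terms, the spans, and all rule consequents are hypothetical values chosen for illustration. It uses triangular membership functions, a product AND, and a zero-order Sugeno-style weighted average, and it shows how a complete rule base grows as (number of terms) to the power of (number of inputs).

```python
from itertools import product

TERMS = ("neg", "zero", "pos")  # hypothetical linguistic terms shared by both inputs


def tri(x, a, b, c):
    """Triangular membership: 0 at the feet a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x, span):
    """Membership degrees of x in three evenly spaced terms on [-span, span]."""
    x = max(-span, min(span, x))  # clamp so the outer terms saturate at 1
    return {
        "neg": tri(x, -2.0 * span, -span, 0.0),
        "zero": tri(x, -span, 0.0, span),
        "pos": tri(x, 0.0, span, 2.0 * span),
    }


# Hand-written rule base (assumed values): (error term, trend term) -> command.
# Command in [-1, 1]: negative means heating, positive means cooling.
RULES = {
    ("neg", "neg"): -1.0, ("neg", "zero"): -1.0, ("neg", "pos"): -0.5,
    ("zero", "neg"): -0.5, ("zero", "zero"): 0.0, ("zero", "pos"): 0.5,
    ("pos", "neg"): 0.5, ("pos", "zero"): 1.0, ("pos", "pos"): 1.0,
}


def control(error, trend):
    """Weighted average of rule consequents (zero-order Sugeno-style inference)."""
    mu_e = fuzzify(error, span=3.0)  # error = room temperature - setpoint, in deg C
    mu_t = fuzzify(trend, span=1.0)  # trend = change of the error per control step
    num = den = 0.0
    for (term_e, term_t), command in RULES.items():
        weight = mu_e[term_e] * mu_t[term_t]  # product t-norm for the AND
        num += weight * command
        den += weight
    return num / den if den > 0.0 else 0.0


def full_rule_count(n_inputs, n_terms):
    """Size of a complete rule base: one rule per combination of input terms."""
    return n_terms ** n_inputs


if __name__ == "__main__":
    print(control(1.5, 0.0))                # room slightly too warm, steady trend
    print(full_rule_count(2, len(TERMS)))   # size of the rule base above
    print(full_rule_count(6, 5))            # six inputs with five terms each
```

Every entry of `RULES` and every breakpoint in `fuzzify` is a manual design decision, which is the effort that adaptive and neuro-fuzzy schemes aim to reduce by tuning these quantities from data.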

According to our evaluation, multi-zone applications are considerably limited compared to single-zone applications. However, it should be mentioned that, in general, multi-zone HVAC control fosters more efficient energy usage, as only occupied or specific zones can be conditioned based on demand, rather than conditioning an entire building uniformly. From a research perspective, future works should focus on developing advanced algorithms and sensors that can seamlessly integrate and manage the complexities of multi-zone systems. Emphasis should also be placed on predictive analytics to forecast zone-specific demands and on integrating IoT devices for real-time feedback and adaptive control. As buildings become more dynamic and user-centric, the shift toward multi-zone HVAC control will be pivotal in achieving both energy efficiency and the enhanced well-being of occupants.

Another tendency identified in our evaluation is the limited application to real-world buildings. Real-world results are the ultimate test of any HVAC control strategy since various unforeseen factors, such as sudden weather changes, equipment malfunction, or varying occupancy patterns, can come into play. These factors can significantly impact the performance of an HVAC system, leading to deviations from simulated predictions. Moving forward, future research efforts should focus on bridging this gap. This could involve developing more sophisticated simulation models that better mimic real-world complexities or creating adaptive control strategies that can learn and adjust in real time based on actual performance data.
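One way to combine the two ideas above, a learned simulation model and real-time adaptation, while limiting costly real-world interactions, is a Dyna-style scheme in which every real transition is also stored in a learned model and replayed as synthetic experience. The sketch below illustrates this with tabular Q-learning on a toy room. The five temperature levels, the dynamics, the reward, and all hyperparameters are hypothetical and far simpler than any real HVAC system; it illustrates the idea only and is not a method from the reviewed works.

```python
import random

N_STATES, TARGET = 5, 2   # toy temperature levels 0..4; level 2 is the comfort target
ACTIONS = (0, 1)          # 0 = HVAC off (room warms), 1 = cooling on


def step(state, action):
    """Hypothetical deterministic room: warms one level when off, cools one when on."""
    nxt = max(0, state - 1) if action == 1 else min(N_STATES - 1, state + 1)
    reward = -abs(nxt - TARGET) - 0.1 * action  # discomfort plus a small energy cost
    return nxt, reward


def dyna_q(episodes=200, horizon=20, planning_steps=5,
           alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Q-learning on real transitions plus replay of synthetic ones from a learned model."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}            # learned model: (state, action) -> (next_state, reward)
    updates = 0

    def update(s, a, r, s2):
        nonlocal updates
        best_next = max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        updates += 1

    for _ in range(episodes):
        state = rng.randrange(N_STATES)          # random start level for each episode
        for _ in range(horizon):
            if rng.random() < eps:               # epsilon-greedy exploration
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward = step(state, action)    # one real interaction
            update(state, action, reward, nxt)
            model[(state, action)] = (nxt, reward)
            for _ in range(planning_steps):      # synthetic samples drawn from the model
                s, a = rng.choice(list(model))
                s2, r = model[(s, a)]
                update(s, a, r, s2)
            state = nxt

    policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)}
    return policy, model, updates


if __name__ == "__main__":
    policy, model, updates = dyna_q()
    print("greedy action per temperature level:", policy)
    print("value updates performed:", updates)
```

With `planning_steps=5`, each real interaction drives six value updates, so the same number of real samples is reused several times. In a real building the learned model would be imperfect, and the quality of the synthetic samples would bound the benefit.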

7. Conclusions

The current work summarizes the most impactful highly cited papers between 2015 and 2023 concerning various model-free approaches for the control of HVAC systems in buildings. According to the evaluation, model-free control approaches, including methods like reinforcement learning, ANNs, FLC, and their hybrid variations, are gaining traction in the HVAC domain due to their inherent adaptability to complex systems. HVAC systems in large and diverse buildings can be intricate, and their complexity makes detailed modeling challenging. Not relying on a precise model of the environment and instead learning from interactions offers a more flexible solution. This adaptability reduces the need for in-depth modeling of a building’s thermal dynamics, making model-free control a preferred choice for complicated use cases such as large-scale multifunctional structures.

According to our examination of different papers, RL approaches, especially DRL approaches, hold significant potential for controlling HVAC systems since they are well suited to dynamically adapting to changing conditions, including variations in occupancy, external weather conditions, and building-specific quirks. Their use in future works is expected to grow and may be further supported by hybrid schemes, which are currently less common for RL than for ANNs and FLC. Moreover, artificial neural networks (ANNs) are constantly attracting the interest of researchers due to their adaptive nature, which facilitates learning from vast amounts of data and optimizing HVAC performance over time. According to our evaluation, ANNs are commonly employed in hybrid schemes in an effort to improve the performance of conventional approaches. As ANN models continue to grow in size and complexity, their applicability is expected to further advance, thereby offering unique potential for future implementations. On the other hand, fuzzy logic control (FLC) is less prevalent than RL and ANNs in the literature since it often requires expert knowledge to define its rules and membership functions. As a result, hybrid fuzzy approaches have attracted more interest from researchers than other approaches, a tendency that is expected to continue as a wider range of hybrid fuzzy approaches emerges.

Looking forward, as the emphasis on energy efficiency and sustainable practices grows, there is an increasing demand for HVAC solutions that can minimize energy use while maximizing comfort. Additionally, as more buildings become equipped with sensors and IoT devices, there is a surge in the volume of available data. Model-free control frameworks are ideally suited to leverage these data, making them promising tools for the future of HVAC systems in buildings. Their ability to continuously improve with more data also means that as building usage patterns evolve or as HVAC equipment ages, these systems can adjust and adapt without needing manual recalibrations or overhauls. This combination of adaptability, efficiency, and the push toward smarter buildings makes the integration of such approaches a promising direction, especially for large-scale structures.

Author Contributions

Conceptualization, P.M. and I.M.; methodology, P.M.; software, P.M.; validation, all authors; formal analysis, P.M.; investigation, all authors; resources, all authors; writing—original draft preparation, P.M. and D.V.; writing—review and editing, P.M., I.M. and D.V.; visualization, P.M.; supervision, I.M. and E.K. All authors have read and agreed to the published version of the manuscript.

Funding

We acknowledge the support of the project “Study, Design, Development and Implementation of a Holistic System for Upgrading the Quality of Life and Activity of the Elderly” (MIS 5047294), which was implemented under the action “Support for Regional Excellence” funded by the Operational Programme “Competitiveness, Entrepreneurship, and Innovation” (NSRF 2014–2020) and co-financed by Greece and the European Union (European Regional Development Fund).

Data Availability Statement

The data considered in this work were obtained through the Scopus platform (https://www.scopus.com/).

Acknowledgments

The research leading to these results was partially funded by the project “Study, Design, Development and Implementation of a Holistic System for Upgrading the Quality of Life and Activity of the Elderly” (MIS 5047294), which was implemented under the action “Support for Regional Excellence” funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014–2020) and co-financed by Greece and the European Union (European Regional Development Fund). https://aspida.ee.duth.gr/ (accessed on 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

AC	Air-Conditioner
ADMM	Alternating Direction Method of Multipliers
AFLM	Adaptive Fuzzy Logic Model
AHU	Air-Handling Unit
ANN	Artificial Neural Network
BCNN	Bayesian Convolutional Neural Network
BCVTB	Building Control Virtual Testbed
BDQ	Branching Dueling Q-Network
BEMS	Building Energy Management System
CAO	Cognitive Adaptive Optimization
CV	Coefficient of Variation
DDPG	Deep Deterministic Policy Gradient
DQN	Deep Q-Network
DR	Demand Response
DRL	Deep Reinforcement Learning
DT	Decision Tree
FIS	Fuzzy Inference System
FLC	Fuzzy Logic Control
FNN	Feedforward Neural Network
GA	Genetic Algorithm
HVAC	Heating, Ventilation, and Air-Conditioning
IAQ	Indoor Air Quality
IoT	Internet of Things
k-NN	k-Nearest Neighbor
LSTM	Long Short-Term Memory
LTPC	Learning-Based Thermal Preference Control
MAAC	Multi-Agent Actor–Critic
MAD	Mean Absolute Deviation
MADRL	Multi-Agent Deep Reinforcement Learning
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error
MDP	Markov Decision Process
MLP	Multi-Layer Perceptron
MLR	Multiple Linear Regression
MPC	Model Predictive Control
NRMSE	Normalized Root-Mean-Square Error
PCM	Personal Comfort System
PIR	Passive Infrared Sensor
PMV	Predicted Mean Vote
PPD	Predicted Percentage Dissatisfied
PPO	Proximal Policy Optimization
PSO	Particle Swarm Optimization
RBC	Rule-Based Control
RBFNN	Radial Basis Function Neural Network
RL	Reinforcement Learning
RMSE	Root-Mean-Square Error
RNN	Recurrent Neural Network
RTP	Real-Time Pricing Program
RandNN	Random Neural Network
Rsquare or R2	Coefficient of Determination
SAC	Soft Actor–Critic
SARSA	State–Action–Reward–State–Action
SFLL	Supervised Fuzzy Logic Learning
SGD	Stochastic Gradient Descent
SVM	Support Vector Machine
TD	Temporal Difference
TD3	Twin Delayed Deep Deterministic Policy Gradient
TDNN	Time-Delay Neural Network
TRNSYS	Transient System Simulation Program
TRPO	Trust Region Policy Optimization
VAV	Variable Air-Volume
WSN	Wireless Sensor Network

References

Global Alliance for Buildings and Construction. Global Status Report for Buildings and Construction; Global Alliance for Buildings and Construction: Paris, France, 2020.
Li, Y.; Wang, W.; Wang, Y.; Xin, Y.; He, T.; Zhao, G. A review of studies involving the effects of climate change on the energy consumption for building heating and cooling. Int. J. Environ. Res. Public Health 2021, 18, 40.
Behrooz, F.; Mariun, N.; Marhaban, M.H.; Mohd Radzi, M.A.; Ramli, A.R. Review of control techniques for HVAC systems—Nonlinearity approaches based on Fuzzy cognitive maps. Energies 2018, 11, 495.
Serale, G.; Fiorentini, M.; Capozzoli, A.; Bernardini, D.; Bemporad, A. Model predictive control (MPC) for enhancing building and HVAC system energy efficiency: Problem formulation, applications and opportunities. Energies 2018, 11, 631.
Seyam, S. Types of HVAC Systems. In HVAC System; InTech Open: London, UK, 2018; pp. 49–66. Available online: https://www.intechopen.com/chapters/62059 (accessed on 12 October 2023).
Rafati, A.; Shaker, H.R.; Ghahghahzadeh, S. Fault detection and efficiency assessment for hvac systems using non-intrusive load monitoring: A review. Energies 2022, 15, 341.
Michailidis, P.; Pelitaris, P.; Korkas, C.; Michailidis, I.; Baldi, S.; Kosmatopoulos, E. Enabling optimal energy management with minimal IoT requirements: A legacy A/C case study. Energies 2021, 14, 7910.
Belic, F.; Hocenski, Z.; Sliskovic, D. HVAC control methods-a review. In Proceedings of the 2015 19th International Conference on System Theory, Control and Computing (ICSTCC), Cheile Gradistei, Romania, 14–16 October 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 679–686.
Yao, Y.; Shekhar, D.K. State of the art review on model predictive control (MPC) in Heating Ventilation and Air-conditioning (HVAC) field. Build. Environ. 2021, 200, 107952.
Akram, M.W.; Mohd Zublie, M.F.; Hasanuzzaman, M.; Rahim, N.A. Global prospects, advance technologies and policies of energy-saving and sustainable building systems: A review. Sustainability 2022, 14, 1316.
Kim, D.; Lee, J.; Do, S.; Mago, P.J.; Lee, K.H.; Cho, H. Energy modeling and model predictive control for HVAC in buildings: A review of current research trends. Energies 2022, 15, 7231.
Michailidis, I.T.; Sangi, R.; Michailidis, P.; Schild, T.; Fuetterer, J.; Mueller, D.; Kosmatopoulos, E.B. Balancing energy efficiency with indoor comfort using smart control agents: A simulative case study. Energies 2020, 13, 6228.
Ali, S.; Zheng, Z.; Aillerie, M.; Sawicki, J.P.; Pera, M.C.; Hissel, D. A review of DC Microgrid energy management systems dedicated to residential applications. Energies 2021, 14, 4308.
Paone, A.; Bacher, J.P. The impact of building occupant behavior on energy efficiency and methods to influence it: A review of the state of the art. Energies 2018, 11, 953.
Korkas, C.D.; Baldi, S.; Michailidis, P.; Kosmatopoulos, E.B. A cognitive stochastic approximation approach to optimal charging schedule in electric vehicle stations. In Proceedings of the 2017 25th Mediterranean Conference on Control and Automation (MED), Valletta, Malta, 3–6 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 484–489.
Michailidis, I.T.; Michailidis, P.; Rizos, A.; Korkas, C.; Kosmatopoulos, E.B. Automatically fine-tuned speed control system for fuel and travel-time efficiency: A microscopic simulation case study. In Proceedings of the 2017 25th Mediterranean Conference on Control and Automation (MED), Valletta, Malta, 3–6 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 915–920.
Michailidis, I.T.; Michailidis, P.; Alexandridou, K.; Brewick, P.T.; Masri, S.F.; Kosmatopoulos, E.B.; Chassiakos, A. Seismic Active Control under Uncertain Ground Excitation: An Efficient Cognitive Adaptive Optimization Approach. In Proceedings of the 2018 5th International Conference on Control, Decision and Information Technologies (CoDIT), Thessaloniki, Greece, 10–13 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 847–852.
Michailidis, I.T.; Manolis, D.; Michailidis, P.; Diakaki, C.; Kosmatopoulos, E.B. Autonomous self-regulating intersections in large-scale urban traffic networks: A Chania City case study. In Proceedings of the 2018 5th International Conference on Control, Decision and Information Technologies (CoDIT), Thessaloniki, Greece, 10–13 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 853–858.
Michailidis, I.T.; Manolis, D.; Michailidis, P.; Diakaki, C.; Kosmatopoulos, E.B. A decentralized optimization approach employing cooperative cycle-regulation in an intersection-centric manner: A complex urban simulative case study. Transp. Res. Interdiscip. Perspect. 2020, 8, 100232.
Michailidis, I.T.; Kapoutsis, A.C.; Korkas, C.D.; Michailidis, P.T.; Alexandridou, K.A.; Ravanis, C.; Kosmatopoulos, E.B. Embedding autonomy in large-scale IoT ecosystems using CAO and L4G-CAO. Discov. Internet Things 2021, 1, 1–22.
Keroglou, C.; Kansizoglou, I.; Michailidis, P.; Oikonomou, K.M.; Papapetros, I.T.; Dragkola, P.; Michailidis, I.T.; Gasteratos, A.; Kosmatopoulos, E.B.; Sirakoulis, G.C. A Survey on Technical Challenges of Assistive Robotics for Elder People in Domestic Environments: The ASPiDA Concept. IEEE Trans. Med. Robot. Bionics 2023, 5, 196–205.
Karatzinis, G.D.; Michailidis, P.; Michailidis, I.T.; Kapoutsis, A.C.; Kosmatopoulos, E.B.; Boutalis, Y.S. Coordinating heterogeneous mobile sensing platforms for effectively monitoring a dispersed gas plume. Integr. Comput.-Aided Eng. 2022, 29, 1–19.
Salavasidis, G.; Kapoutsis, A.C.; Chatzichristofis, S.A.; Michailidis, P.; Kosmatopoulos, E.B. Autonomous trajectory design system for mapping of unknown sea-floors using a team of AUVs. In Proceedings of the 2018 European Control Conference (ECC), Limassol, Cyprus, 12–15 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1080–1087.
Gao, C.; Wang, D. Comparative study of model-based and model-free reinforcement learning control performance in HVAC systems. J. Build. Eng. 2023, 74, 106852.
Vamvakas, D.; Michailidis, P.; Korkas, C.; Kosmatopoulos, E. Review and Evaluation of Reinforcement Learning Frameworks on Smart Grid Applications. Energies 2023, 16, 5326.
Macieira, P.; Gomes, L.; Vale, Z. Energy Management Model for HVAC Control Supported by Reinforcement Learning. Energies 2021, 14, 8210.
Boodi, A.; Beddiar, K.; Benamour, M.; Amirat, Y.; Benbouzid, M. Intelligent systems for building energy and occupant comfort optimization: A state of the art review and recommendations. Energies 2018, 11, 2604.
Gholamzadehmir, M.; Del Pero, C.; Buffa, S.; Fedrizzi, R. Adaptive-predictive control strategy for HVAC systems in smart buildings—A review. Sustain. Cities Soc. 2020, 63, 102480.
Ahmad, M.W.; Mourshed, M.; Yuce, B.; Rezgui, Y. Computational intelligence techniques for HVAC systems: A review. In Building Simulation; Tsinghua University Press: Beijing, China, 2016; Volume 9, pp. 359–398.
Aqilah, N.; Rijal, H.B.; Zaki, S.A. A review of thermal comfort in residential buildings: Comfort threads and energy saving potential. Energies 2022, 15, 9012.
Lamsal, P.; Bajracharya, S.B.; Rijal, H.B. A Review on Adaptive Thermal Comfort of Office Building for Energy-Saving Building Design. Energies 2023, 16, 1524.
American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc. HVAC Systems and Equipment; American Society of Heating, Refrigerating, and Air Conditioning Engineers: Atlanta, GA, USA, 1996; Volume 39.
Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018.
Bertsekas, D.P.; Tsitsiklis, J.N. Neuro-Dynamic Programming; Athena Scientific: Nashua, NH, USA, 1996; Volume 27.
Williams, R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 1992, 8, 229–256.
Watkins, C.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
Rummery, G.; Niranjan, M. On-Line Q-Learning Using Connectionist Systems; Technical Report CUED/F-INFENG/TR 166; University of Cambridge, Department of Engineering: Cambridge, UK, 1994.
Boutalis, I.S.; Syrakoulis, G.C. Computational Intelligence & Applications, 1st ed.; Krikos: Xanthi, Greece, 2008.
Haykin, S.S. Neural Networks and Learning Machines; Pearson Education India: Tamil Nadu, India, 2009; Available online: https://lps.ufrj.br/~caloba/Livros/Haykin2009.pdf (accessed on 12 October 2023).
Krose, B.; van der Smagt, P. An Introduction to Neural Networks; MIT Press: Cambridge, MA, USA, 1996.
Russell, S.; Norvig, P.; Chang, M.W.; Devlin, J.; Dragan, A.; Forsyth, D.; Goodfellow, I.; Malik, J.M.; Mansinghka, V.; Pearl, J.; et al. Artificial Intelligence a Modern Approach, 4th ed.; Pearson: New York, NY, USA, 2022.
Michailidis, P.; Michailidis, I.T.; Gkelios, S.; Karatzinis, G.; Kosmatopoulos, E.B. Neuro-distributed cognitive adaptive optimization for training neural networks in a parallel and asynchronous manner. Integr. Comput.-Aided Eng. 2023, 1–23.
Zadeh, L. Fuzzy sets. Inf. Control 1965, 8, 338–353.
Mamdani, E.; Assilian, S. An experiment in linguistic synthesis with a fuzzy logic controller. Int. J. Man-Mach. Stud. 1975, 7, 1–13.
Takagi, T.; Sugeno, M. Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Syst. Man Cybern. 1985, 15, 116–132.
Barrett, E.; Linder, S. Autonomous hvac control, a reinforcement learning approach. In Proceedings of the Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2015, Porto, Portugal, 7–11 September 2015; Proceedings, Part III 15. Springer: Berlin/Heidelberg, Germany, 2015; pp. 3–19.
Wei, T.; Wang, Y.; Zhu, Q. Deep reinforcement learning for building HVAC control. In Proceedings of the 54th Annual Design Automation Conference, Austin, TX, USA, 18 June 2017; pp. 1–6.
Wang, Y.; Velswamy, K.; Huang, B. A long-short term memory recurrent neural network based reinforcement learning controller for office heating ventilation and air conditioning systems. Processes 2017, 5, 46.
Chen, Y.; Norford, L.K.; Samuelson, H.W.; Malkawi, A. Optimal control of HVAC and window systems for natural ventilation through reinforcement learning. Energy Build. 2018, 169, 195–205.
Chen, B.; Cai, Z.; Bergés, M. Gnu-rl: A precocial reinforcement learning solution for building hvac control using a differentiable mpc policy. In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, New York, NY, USA, 13–14 November 2019; pp. 316–325.
Valladares, W.; Galindo, M.; Gutiérrez, J.; Wu, W.C.; Liao, K.K.; Liao, J.C.; Lu, K.C.; Wang, C.C. Energy optimization associated with thermal comfort and indoor air control via a deep reinforcement learning algorithm. Build. Environ. 2019, 155, 105–117.
Liu, T.; Xu, C.; Guo, Y.; Chen, H. A novel deep reinforcement learning based methodology for short-term HVAC system energy consumption prediction. Int. J. Refrig. 2019, 107, 39–51.
Zhang, C.; Kuppannagari, S.R.; Kannan, R.; Prasanna, V.K. Building HVAC scheduling using reinforcement learning via neural network based model approximation. In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, New York, NY, USA, 13–14 November 2019; pp. 287–296.
Gao, G.; Li, J.; Wen, Y. Energy-efficient thermal comfort control in smart buildings via deep reinforcement learning. arXiv 2019, arXiv:1901.04693.
Azuatalam, D.; Lee, W.L.; de Nijs, F.; Liebman, A. Reinforcement learning for whole-building HVAC control and demand response. Energy AI 2020, 2, 100020.
Zou, Z.; Yu, X.; Ergan, S. Towards optimal control of air handling units using deep reinforcement learning and recurrent neural network. Build. Environ. 2020, 168, 106535.
Lork, C.; Li, W.T.; Qin, Y.; Zhou, Y.; Yuen, C.; Tushar, W.; Saha, T.K. An uncertainty-aware deep reinforcement learning framework for residential air conditioning energy management. Appl. Energy 2020, 276, 115426.
Yu, L.; Sun, Y.; Xu, Z.; Shen, C.; Yue, D.; Jiang, T.; Guan, X. Multi-agent deep reinforcement learning for HVAC control in commercial buildings. IEEE Trans. Smart Grid 2020, 12, 407–419.
Biemann, M.; Scheller, F.; Liu, X.; Huang, L. Experimental evaluation of model-free reinforcement learning algorithms for continuous HVAC control. Appl. Energy 2021, 298, 117164.
Du, Y.; Zandi, H.; Kotevska, O.; Kurte, K.; Munk, J.; Amasyali, K.; Mckee, E.; Li, F. Intelligent multi-zone residential HVAC control strategy based on deep reinforcement learning. Appl. Energy 2021, 281, 116117.
Gupta, A.; Badr, Y.; Negahban, A.; Qiu, R.G. Energy-efficient heating control for smart buildings with deep reinforcement learning. J. Build. Eng. 2021, 34, 101739.
Li, Z.; Sun, Z.; Meng, Q.; Wang, Y.; Li, Y. Reinforcement learning of room temperature set-point of thermal storage air-conditioning system with demand response. Energy Build. 2022, 259, 111903.
Lei, Y.; Zhan, S.; Ono, E.; Peng, Y.; Zhang, Z.; Hasama, T.; Chong, A. A practical deep reinforcement learning framework for multivariate occupant-centric control in buildings. Appl. Energy 2022, 324, 119742.
Deng, X.; Zhang, Y.; Qi, H. Towards optimal HVAC control in non-stationary building environments combining active change detection and deep reinforcement learning. Build. Environ. 2022, 211, 108680.
Yu, L.; Xu, Z.; Zhang, T.; Guan, X.; Yue, D. Energy-efficient personalized thermal comfort control in office buildings based on multi-agent deep reinforcement learning. Build. Environ. 2022, 223, 109458.
Huang, H.; Chen, L.; Hu, E. A neural network-based multi-zone modelling approach for predictive control system design in commercial buildings. Energy Build. 2015, 97, 86–97. [Google Scholar] [CrossRef]
Sholahudin, S.; Han, H. Simplified dynamic neural network model to predict heating load of a building using Taguchi method. Energy 2016, 115, 1672–1678. [Google Scholar] [CrossRef]
Javed, A.; Larijani, H.; Ahmadinia, A.; Emmanuel, R.; Mannion, M.; Gibson, D. Design and implementation of a cloud enabled random neural network-based decentralized smart controller with intelligent sensor nodes for HVAC. IEEE Internet Things J. 2016, 4, 393–403. [Google Scholar] [CrossRef]
Chae, Y.T.; Horesh, R.; Hwang, Y.; Lee, Y.M. Artificial neural network model for forecasting sub-hourly electricity usage in commercial buildings. Energy Build. 2016, 111, 184–194. [Google Scholar] [CrossRef]
Chen, Y.; Shi, Y.; Zhang, B. Modeling and optimization of complex building energy systems with deep neural networks. In Proceedings of the 2017 51st Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 29 October–1 November 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1368–1373. [Google Scholar]
Ahmad, M.W.; Mourshed, M.; Rezgui, Y. Trees vs Neurons: Comparison between random forest and ANN for high-resolution prediction of building energy consumption. Energy Build. 2017, 147, 77–89. [Google Scholar] [CrossRef]
González-Briones, A.; Prieto, J.; De La Prieta, F.; Herrera-Viedma, E.; Corchado, J.M. Energy optimization using a case-based reasoning strategy. Sensors 2018, 18, 865. [Google Scholar] [CrossRef]
Deb, C.; Lee, S.E.; Santamouris, M. Using artificial neural networks to assess HVAC related energy saving in retrofitted office buildings. Sol. Energy 2018, 163, 32–44. [Google Scholar] [CrossRef]
Kim, Y.J. Optimal price based demand response of HVAC systems in multizone office buildings considering thermal preferences of individual occupants buildings. IEEE Trans. Ind. Inform. 2018, 14, 5060–5073. [Google Scholar] [CrossRef]
Wang, Z.; Hong, T.; Piette, M.A. Data fusion in predicting internal heat gains for office buildings through a deep learning approach. Appl. Energy 2019, 240, 386–398. [Google Scholar] [CrossRef]
Peng, Y.; Nagy, Z.; Schlüter, A. Temperature-preference learning with neural networks for occupant-centric building indoor climate controls. Build. Environ. 2019, 154, 296–308. [Google Scholar] [CrossRef]
Sendra-Arranz, R.; Gutiérrez, A. A long short-term memory artificial neural network to predict daily HVAC consumption in buildings. Energy Build. 2020, 216, 109952. [Google Scholar] [CrossRef]
Elmaz, F.; Eyckerman, R.; Casteels, W.; Latré, S.; Hellinckx, P. CNN-LSTM architecture for predictive indoor temperature modeling. Build. Environ. 2021, 206, 108327. [Google Scholar] [CrossRef]
Saepullah, A.; Wahono, R.S. Comparative analysis of Mamdani, Sugeno and Tsukamoto method of fuzzy inference system for air conditioner energy saving. J. Intell. Syst. 2015, 1, 143–147.
Keshtkar, A.; Arzanpour, S.; Keshtkar, F.; Ahmadi, P. Smart residential load reduction via fuzzy logic, wireless sensors, and smart grid incentives. Energy Build. 2015, 104, 165–180.
Ulpiani, G.; Borgognoni, M.; Romagnoli, A.; Di Perna, C. Comparing the performance of on/off, PID and fuzzy controllers applied to the heating system of an energy-efficient building. Energy Build. 2016, 116, 1–17.
Keshtkar, A.; Arzanpour, S. An adaptive fuzzy logic system for residential energy management in smart grid environments. Appl. Energy 2017, 186, 68–81.
Ain, Q.u.; Iqbal, S.; Khan, S.A.; Malik, A.W.; Ahmad, I.; Javaid, N. IoT operating system based fuzzy inference system for home energy management system in smart buildings. Sensors 2018, 18, 2802.
Li, W.; Zhang, J.; Zhao, T. Indoor thermal environment optimal control for thermal comfort and energy saving based on online monitoring of thermal sensation. Energy Build. 2019, 197, 57–67.
Hussain, S.; Gabbar, H.A.; Bondarenko, D.; Musharavati, F.; Pokharel, S. Comfort-based fuzzy control optimization for energy conservation in HVAC systems. Control Eng. Pract. 2014, 32, 172–182.
Wei, X.; Kusiak, A.; Li, M.; Tang, F.; Zeng, Y. Multi-objective optimization of the HVAC (heating, ventilation, and air conditioning) system performance. Energy 2015, 83, 294–306.
Attaran, S.M.; Yusof, R.; Selamat, H. A novel optimization algorithm based on epsilon constraint-RBF neural network for tuning PID controller in decoupled HVAC system. Appl. Therm. Eng. 2016, 99, 613–624.
Ghahramani, A.; Karvigh, S.A.; Becerik-Gerber, B. HVAC system energy optimization using an adaptive hybrid metaheuristic. Energy Build. 2017, 152, 149–161.
Sala-Cardoso, E.; Delgado-Prieto, M.; Kampouropoulos, K.; Romeral, L. Activity-aware HVAC power demand forecasting. Energy Build. 2018, 170, 15–24.
Satrio, P.; Mahlia, T.M.I.; Giannetti, N.; Saito, K. Optimization of HVAC system energy consumption in a building using artificial neural network and multi-objective genetic algorithm. Sustain. Energy Technol. Assess. 2019, 35, 48–57.
Zhang, Z.; Chong, A.; Pan, Y.; Zhang, C.; Lam, K.P. Whole building energy model for HVAC optimal control: A practical framework based on deep reinforcement learning. Energy Build. 2019, 199, 472–490.
Ren, M.; Liu, X.; Yang, Z.; Zhang, J.; Guo, Y.; Jia, Y. A novel forecasting based scheduling method for household energy management system based on deep reinforcement learning. Sustain. Cities Soc. 2022, 76, 103207.
Cai, J.; Kim, D.; Jaramillo, R.; Braun, J.E.; Hu, J. A general multi-agent control approach for building energy system optimization. Energy Build. 2016, 127, 337–351.
Wang, W.; Chen, J.; Huang, G.; Lu, Y. Energy efficient HVAC control for an IPS-enabled large space in commercial buildings through dynamic spatial occupancy distribution. Appl. Energy 2017, 207, 305–323.
Peng, Y.; Rysanek, A.; Nagy, Z.; Schlüter, A. Occupancy learning-based demand-driven cooling control for office spaces. Build. Environ. 2017, 122, 145–160.
Peng, Y.; Rysanek, A.; Nagy, Z.; Schlüter, A. Using machine learning techniques for occupancy-prediction-based cooling control in office buildings. Appl. Energy 2018, 211, 1343–1358.
Li, W.; Wang, S. A multi-agent based distributed approach for optimal control of multi-zone ventilation systems considering indoor air quality and energy use. Appl. Energy 2020, 275, 115371.
Michailidis, I.T.; Schild, T.; Sangi, R.; Michailidis, P.; Korkas, C.; Fütterer, J.; Müller, D.; Kosmatopoulos, E.B. Energy-efficient HVAC management using cooperative, self-trained, control agents: A real-life German building case study. Appl. Energy 2018, 211, 113–125.
Lymperopoulos, G.; Ioannou, P. Building temperature regulation in a multi-zone HVAC system using distributed adaptive control. Energy Build. 2020, 215, 109825.
Apostolidis, S.; Koutras, D.; Orfanidis, G.; Michailidis, P.; Ioannidis, K.; Kapoutsis, A.; Vrochidis, S.; Kompatsiaris, I.; Kosmatopoulos, E. D3.5 Dynamic and Adaptive Swarm Optimization V1. 2020. Available online: https://aresibo.eu/sites/default/files/documents/d3.5.pdf (accessed on 29 September 2020).
Shahnazari, H.; Mhaskar, P.; House, J.M.; Salsbury, T.I. Modeling and fault diagnosis design for HVAC systems using recurrent neural networks. Comput. Chem. Eng. 2019, 126, 189–203.
Elnour, M.; Meskin, N.; Al-Naemi, M. Sensor data validation and fault diagnosis using Auto-Associative Neural Network for HVAC systems. J. Build. Eng. 2020, 27, 100935.

Figure 1. Number of citations per year (2015–2022) for model-free HVAC control according to Scopus.

Figure 2. Visual representation of the architecture of the current work.

Figure 3. Visual representation of model-free HVAC control.

Figure 4. Basic reinforcement learning framework.

Figure 5. The simplest form of an artificial neural network, with only one input and one output neuron.

Figure 6. Distribution of citations (%) per model-free control methodology according to Scopus (2015–2022).

Figure 7. Number of citations annually per model-free control methodology according to Scopus (2015–2022).

Figure 8. Occurrence of RL types (Left), percentage share (%) of citations for each RL type (Right), and occurrence of specific RL methodologies (Center) in the related literature (2015–2023).

Figure 9. Number of highly cited research works per ANN type (Left), number of highly cited research works per ANN methodology (Center), and share (%) of citations per ANN type (Right) (2015–2022).

Figure 10. Number of highly cited research works per FLC type (2015–2022).

Figure 11. Number of hybrid highly cited research works per type (2015–2022).

Figure 12. Number of single- and multi-agent applications and share (%) of citations in the HVAC control-related papers (2015–2022).

Figure 13. Number of applications and share of citations regarding the different types of equipment identified in the HVAC control-related papers (2015–2022).

Figure 14. Comparison of the number of applications and share of citations (%) for single-zone and multi-zone HVAC control-related papers (2015–2022).

Figure 15. Comparison of the number of applications and share of citations (%) for simulations and real-life HVAC control-related papers (2015–2022).

Figure 16. Comparison of the number of applications and share of citations (%) for commercial and residential HVAC control-related papers (2015–2022).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
