A Hybrid Control Strategy Combining Reinforcement Learning and MPC-LSTM for Energy Management in Building

Azzi, Amal; Abid, Meryem; Hanif, Ayoub; Bensag, Hassna; Tabaa, Mohamed; Hachimi, Hanaa; Youssfi, Mohamed

doi:10.3390/en18174783


by Amal Azzi ^1,2, Meryem Abid ^1, Ayoub Hanif ^1,3, Hassna Bensag ^1,3, Mohamed Tabaa ^1,*, Hanaa Hachimi ^2 and Mohamed Youssfi ^3

^1 Multidisciplinary Laboratory of Research and Innovation (LPRI) Lab, Moroccan School of Engineering Sciences (EMSI), Casablanca 20250, Morocco

^2 Laboratory of Advanced Systems Engineering (LISA), Ibn Tofail University (UIT), Kenitra 14000, Morocco

^3 Computer Science, Artificial Intelligence and Cyber Security Laboratory (2IACS), ENSET, University Hassan II of Casablanca, Mohammedia 28830, Morocco

^* Author to whom correspondence should be addressed.

Energies 2025, 18(17), 4783; https://doi.org/10.3390/en18174783

Submission received: 30 July 2025 / Revised: 2 September 2025 / Accepted: 3 September 2025 / Published: 8 September 2025

(This article belongs to the Section G: Energy and Buildings)


Abstract

Growing awareness of the harmful effects of excessive exploitation of natural resources and of the greenhouse gas emissions linked to the building sector has given rise to the concept of smart buildings, i.e., buildings that use clean energy efficiently. Such buildings require intelligent control systems to manage residential energy-consuming devices, most notably the HVAC (Heating, Ventilation, and Air-Conditioning) system, which consumes up to 50% of the total energy used by a building. In this paper, we introduce a hybrid RL (Reinforcement Learning) and MPC-LSTM (Model Predictive Control-Long Short-Term Memory) control system that combines DNNs (Deep Neural Networks), through RL, with the long short-term memory of LSTM and the control characteristics of MPC. The goal of our model is to maintain residents' thermal comfort while optimizing energy consumption. To train and test the model, we generate our own dataset using a building model of a corporate building in Casablanca, Morocco, combined with weather data for the same city. Simulations confirm the robustness of our model, which outperforms basic control methods in terms of thermal comfort and energy consumption, especially during summer. Compared to conventional methods, our approach reduced energy consumption by 45.4% in winter and by 70.9% in summer. It also resulted in 26 fewer comfort violations during winter. During summer, our approach found a compromise between energy consumption and comfort, exceeding the ideal upper temperature limit by no more than 2.5 °C.

Keywords:

energy management system; building; LSTM; reinforcement learning; MPC

1. Introduction

According to a study conducted by the United Nations Environment Programme, the residential and commercial building sector accounts for about 34% of global energy demand for heating, cooling and lighting, energy that is predominantly generated from fossil fuels, which release GHG (Greenhouse Gases) into the atmosphere when burned [1]. Furthermore, GHG are established as a major contributor to climate change because they trap excess heat in the atmosphere, leading to global warming and its harmful consequences for the survival of living creatures [2].

With residential areas expanding worldwide due to urbanization, efforts have been directed towards decarbonizing the building sector, with the aim of achieving NZCB (Net Zero Carbon Buildings) by 2050 [3,4,5,6]. In the context of buildings, NZCB covers two aspects. First, the energy used at every stage, from producing the raw materials needed for construction to powering appliances once the building is occupied, should ideally come entirely from clean renewable sources such as solar or wind energy. This aspect can be demanding in terms of time and equipment. On the other hand, the second aspect requires that the energy efficiency of buildings be optimized so that the building uses energy only when necessary, while maintaining residents' comfort [7,8]. As a result, EMSs (Energy Management Systems) become crucial to building energy efficiency: they monitor, control and optimize energy use to reduce operating costs [9,10]. In other words, EMSs act as a mediator between residents, electric devices and energy sources, ensuring that electric devices are used optimally to satisfy residents' needs at the lowest possible cost (e.g., quantity of energy used, bills, device degradation). Buildings equipped with an EMS are often known as smart buildings, where automated controls and applications are deployed for the intelligent use of energy devices [11].

While buildings require energy for countless purposes (lighting, cleaning, entertainment, etc.), the HVAC system remains the largest consumer, accounting for around 50% of total building energy consumption [12]. Therefore, optimizing HVAC energy consumption contributes substantially to optimizing the overall energy consumption of a building and, consequently, to its energy efficiency. In this paper, we focus on optimizing HVAC energy consumption while ensuring residents' comfort through control systems [13], on which EMSs usually rely to execute tasks by acting on physical components.

HVAC systems play a major role in buildings. They are responsible for ensuring thermal comfort and improving indoor air quality, both of which are very important for the long-term well-being of occupants. The literature shows that the control of these systems has evolved considerably, moving from traditional or conventional control methods to advanced control techniques.

The key strengths of conventional controllers are their simplicity and stability under static conditions. ON/OFF control provides simple switching regulation of HVAC systems, whereas PID (Proportional-Integral-Derivative) control acts on the difference between the actual and the desired temperature to provide reactive control of the HVAC equipment [14]. For multi-objective control or more complex applications, however, these approaches are severely constrained: they cannot adapt to changing circumstances or account for variations in the surrounding environment, occupancy (occupants' presence or absence), weather, and thermal inertia [13]. Here, predictive controllers offer a viable alternative, since they use dynamic models to simulate the system's evolution, forecast its future behavior and improve control accordingly [15].
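To make the contrast concrete, here is a minimal Python sketch of both conventional controllers. It is not taken from the cited works: the deadband, the gains and the normalized 0 to 1 actuator output are hypothetical. Both controllers act only on the currently measured error, which is why they cannot anticipate weather, occupancy or thermal inertia.

```python
def on_off_heating(t_zone: float, t_set: float, deadband: float, heater_on: bool) -> bool:
    """ON/OFF (hysteresis) heating control: on below the band, off above it."""
    if t_zone < t_set - deadband / 2:
        return True
    if t_zone > t_set + deadband / 2:
        return False
    return heater_on  # inside the deadband: keep the previous state


class PID:
    """Discrete PID controller acting on the error e = T_set - T_zone."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float,
                 u_min: float = 0.0, u_max: float = 1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0
        self.prev_error = None

    def step(self, t_set: float, t_zone: float) -> float:
        error = t_set - t_zone
        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp to the actuator range (hypothetical: 0 = off, 1 = full heating power)
        return min(max(u, self.u_min), self.u_max)


# Usage with hypothetical gains: one control step for a zone 1.5 K below setpoint
pid = PID(kp=0.5, ki=0.05, kd=0.0, dt=1.0)
u = pid.step(t_set=21.0, t_zone=19.5)
```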

In this paper, we propose a hybrid approach combining three methods widely supported by the literature. First, we use LSTM to accurately predict the evolution of thermal conditions and external climate, capturing long-term temporal dependencies. Second, we rely on the dynamic adaptation capabilities of reinforcement learning to select the optimal setpoint temperature, one that ensures residents' comfort while minimizing energy consumption in the face of unexpected environmental variations. Finally, we use MPC to optimize the control actions applied to the HVAC system over a given time horizon while respecting energy and comfort constraints. Combining these three approaches lets us benefit from the advantages of each, yielding robust, predictive, and adaptive control that improves both the energy efficiency and the thermal comfort of the building.

The remainder of this paper is structured as follows. Section 2 reviews recent advances in hybrid control systems and how they compare to our model. Section 3 describes the study environment, as well as data generation and preprocessing. Section 4 describes the components of our hybrid model, along with the objective function and the equations used to constrain it. Section 5 summarizes the simulation parameters and evaluates the performance of our model. Finally, we discuss our findings and possible future directions in the conclusion.

2. Related Works

Because of their complex dynamics and the need for energy optimization, HVAC systems in the residential sector are notoriously difficult to control. Predictive control strategies are particularly well suited to such circumstances because they can combine occupancy profiles and weather forecasts to make better control decisions. The authors of [16] use predictive control to guarantee thermal comfort in intelligent buildings, with the primary objective of controlling predicted temperature and humidity. Their solution is based on a "gray box" model of the HVAC system, which combines data-driven and model-based approaches, and achieves a very low mean error of around 0.36 degrees. With the same objective, MPC is used in [17] to regulate the temperature of a room with an accuracy of 0.5 degrees. In [18], MPC is used to guarantee occupants' comfort by controlling indoor air quality levels while minimizing costs, achieving energy savings of 15% compared to binary control. Meanwhile, the authors of [19] applied MPC with periodic learning of building dynamics for reliable planning without human intervention, aiming to reduce the energy consumption of HVAC systems while guaranteeing thermal comfort.

However, the MPC strategy must overcome certain obstacles. When managing multiple zones or buildings, computational complexity is a major drawback, as the required optimization can be difficult to solve in real time [20]. In addition, performance depends on the type of building and on the reliability of the HVAC system model [21]. To obtain well-adapted control during the prediction and optimization phases, MPC needs highly accurate and reliable data [22]. These difficulties underline the need for new methods that simplify modeling and reduce computational cost while making full use of MPC's predictive capabilities.

To this end, recent technological developments have introduced ML (Machine Learning) into building control. The integration of AI (Artificial Intelligence) offers several simplifications, notably of dynamic thermal modeling [23]. The first step in integrating AI into MPC was the use of ANNs (Artificial Neural Networks) [24]. In [25], the authors propose an ANN-MPC architecture to improve the energy efficiency of heating in buildings using a database simulated with the Dynamic Thermal Simulator, reducing energy consumption by around 38.53%. In [26], the authors developed a real-time MPC system based on ANN models to control the operating variables of HVAC systems; compared to conventional control strategies, this method reduced energy consumption by around 31.7%.

The main advantage of ANNs is that they are universal approximators, which makes it easier to model complicated, non-linear behaviors that traditional models often struggle to reproduce. However, a major drawback of feedforward ANNs is that data flows in one direction only, so they cannot capture temporal dependencies. RNNs (Recurrent Neural Networks) address this by adding an essential temporal component to predictive control, which makes them considerably more effective than ordinary feedforward networks for this task. For instance, a constrained MPC model combined with a dynamic RNN reduced the energy consumption of air treatment plants by 69.9% compared to the real systems [27].

However, a number of studies have highlighted the limitations of RNNs in this sector, as in [28], which reports that RNNs suffer from instability in the face of long time scales, making it more difficult to accurately capture thermal dynamics of buildings over long durations. In addition, the authors of [29] highlight the need to provide RNNs with a large volume of data per zone, which limits their scalability in multi-zone buildings. These issues, which come from RNNs’ limited memory and sensitivity to changes, have led to the creation of extended-memory networks like LSTMs or the use of RL for more flexible and independent controls over time.

The vanishing gradient problem was addressed by the development of LSTMs, a class of neural networks designed to process sequential data [30]. Numerous studies demonstrate LSTM's ability to improve energy and thermal control in the residential sector [31]. Beyond simplifying modeling, LSTM can also be combined with MPC. For instance, in [32], a data-driven MPC approach for thermal input design is combined with an LSTM that predicts temperature. With an accuracy of 0.5 °C, the MPC regulates the flow to maintain a stable indoor temperature when the temperature setting is changed, outperforming PID control.

Although the LSTM-MPC architecture is well suited to accurately predicting thermal variables for short-term optimization, it remains limited in managing uncertainties and dynamic variability in the environment. In this context, applying RL to MPC appears to be a promising alternative for continuous adaptation without an explicit model. Given the variability of occupancy behaviors and fluctuating environmental conditions, RL is an attractive solution for designing adaptive control strategies in residential environments. Furthermore, RL agents can propose setpoint temperatures based on circumstantial parameters such as user preferences, occupancy rate and weather conditions [33]. Similarly, the authors of [34] propose a hybrid method for controlling HVAC systems that combines MPC and deep RL, demonstrating superior performance compared to existing algorithms. In addition, the work in [35] provides a detailed analysis of the state of the art in RL and MPC control algorithms for BEMS (Building Energy Management Systems). Although a few studies address pairwise combinations such as LSTM-MPC or RL-MPC, the literature has not yet thoroughly examined the three-way combination of MPC, LSTM and RL.

This gap highlights the novelty and relevance of the methodology used in this work. By combining RL with LSTM and MPC, each component of our model excels in a different aspect and complements the other two. LSTM's strong long short-term prediction capabilities are used to forecast indoor and outdoor conditions; RL's adaptive strategy is used to determine the target temperature; and MPC, with its focus on feasibility and constraint satisfaction, directly controls the HVAC configuration to apply the decision made by RL in the environment forecast by LSTM. Together, the three components form a robust, dynamic model that manages HVAC energy consumption efficiently while maintaining occupants' comfort autonomously.

3. Data and Environment Description

3.1. Building Description

Our building is a corporate open space located in Casablanca, Morocco, and occupied by 18 employees. It covers a floor area of 80 m², with walls 2.75 m high and windows on two facades, as illustrated in Figure 1. The first step towards optimizing energy and comfort in a building is to understand how that building behaves thermally under various conditions.

While outdoor temperature evidently influences indoor temperature, it is well established that radiation, wind speed, occupancy, the number of windows (if any) and wall characteristics, amongst others, also influence how much energy enters the building and how long the building retains it [36]. To estimate indoor conditions, we therefore need to understand how, and to what extent, these factors influence the building's thermal behavior.

To tackle this problem, we resort to building modeling, i.e., a data-driven or mathematical representation of a building's thermal and energy behavior. Table 1 describes the building's envelope and materials, as dictated by typical construction practices in the region.

3.2. Building Modeling Technique

In the literature, a building can be modeled through mathematical equations that represent physical phenomena such as heat conduction through walls; this type of modeling is called white box [37]. While easier to interpret and suitable for model-based control strategies such as MPC, white box models are also more complex and time-consuming to build, and they require solid knowledge of thermodynamic principles. Alternatively, historical data can be used to train ML models to predict the behavior of a building under given circumstances without modeling the underlying physical processes, which is referred to as black box modeling [38]. Because black box models rely entirely on the quality and size of the data used to train them, they tend to generalize poorly outside the training data, despite delivering accurate predictions within it. Moreover, black box outputs are difficult to interpret because we cannot determine how the model arrived at them.

In this paper, we combine LSTM, which is data-driven, with MPC, which is model-based; our approach is therefore a gray box strategy that combines the predictive accuracy of black box models with the generalization of white box models [37,38,39]. In the absence of a thermal model of our building, we built our own using this gray box technique. To train the LSTM model and, eventually, use it for predictions, we need to supply it with a time series dataset that reflects the building's behavior under different, continuously varying conditions. We therefore used EnergyPlus 24.1.0, a white box building energy simulation program developed by the U.S. Department of Energy [40]. Amongst other advantages, EnergyPlus predicts the energy consumption of buildings at every time step over the course of a whole year, using physics-based equations that represent the building's structural characteristics, the weather in its region, its occupancy and the electric appliances used inside it, HVAC included. In essence, EnergyPlus requires three inputs. First, the building's technical file, which describes its geometry, construction materials, thermal zone distribution, geographic location, energy-consuming devices, and HVAC subsystems as per Table 1. Second, the weather data for the building's geographic location. Finally, the building's thermal model.

To capture the thermodynamic properties of our building, we used BINAYATE 2014, a software tool developed by AMEE (the Moroccan Agency for Energy Efficiency) to promote energy efficiency in buildings [41]. BINAYATE is provided with the building description of Table 1 and generates a thermal model of our building. As output, EnergyPlus generates a dataset describing the meteorological conditions and the indoor and outdoor conditions of the building over the course of 1 year, with a time step of 1 h.

3.3. Data Preprocessing

Our deep learning model, i.e., the LSTM, aims to predict energy consumption from the various features provided in the dataset. To achieve this, it needs to quantify the cooling and heating loads applied to the HVAC system, which directly determine its energy consumption. By load, we mean the quantity of heat that the HVAC system must add to or remove from the building, through warm or cool supply air, to maintain its temperature within comfort limits. This quantity is not directly provided by the models used; therefore, we use Equation (1) to compute it and add it to the generated dataset. The load is partly driven by the convective load, i.e., the contribution of heat sources in the building zone such as occupants, computers, and lights.

C_{z} \frac{dT_{z}}{dt} = \sum_{i=1}^{N_{sl}} \dot{Q}_{i} + \sum_{i=1}^{N_{surfaces}} h_{i} A_{i} (T_{si} - T_{z}) + \sum_{i=1}^{N_{zones}} \dot{m}_{i} C_{p} (T_{zi} - T_{z}) + \dot{m}_{inf} C_{p} (T_{\infty} - T_{z}) + \dot{Q}_{sys}

(1)

where $\sum_{i=1}^{N_{sl}} \dot{Q}_{i}$ is the sum of the convective internal loads, $\sum_{i=1}^{N_{surfaces}} h_{i} A_{i} (T_{si} - T_{z})$ is the convective heat transfer from the zone surfaces, $\sum_{i=1}^{N_{zones}} \dot{m}_{i} C_{p} (T_{zi} - T_{z})$ is the heat transfer due to interzone air mixing, $\dot{m}_{inf} C_{p} (T_{\infty} - T_{z})$ is the heat transfer due to infiltration of outside air, $\dot{Q}_{sys}$ is the air system's output, and $C_{z} \frac{dT_{z}}{dt}$ is the energy stored in the building zone air.
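A minimal sketch of how Equation (1) can be turned into a load estimate, assuming the load is the $\dot{Q}_{sys}$ needed to bring the zone air to a target temperature over one time step (the text does not detail this computation). The function names, the forward-difference discretization and all numerical values are hypothetical; EnergyPlus solves this balance internally with its own algorithms.

```python
CP_AIR = 1006.0  # specific heat of air C_p, J/(kg*K)


def heat_balance_terms(t_zone, internal_loads, surfaces, interzone, m_inf, t_out):
    """Sum of the right-hand-side terms of Equation (1), excluding Q_sys (W).

    internal_loads: convective internal gains Q_i (W)
    surfaces: (h_i [W/(m2*K)], A_i [m2], T_si [degC]) tuples
    interzone: (m_i [kg/s], T_zi [degC]) tuples
    m_inf: infiltration mass flow rate (kg/s)
    t_out: outdoor air temperature, i.e. T_inf (degC)
    """
    q = sum(internal_loads)
    q += sum(h * a * (t_s - t_zone) for h, a, t_s in surfaces)
    q += sum(m * CP_AIR * (t_zi - t_zone) for m, t_zi in interzone)
    q += m_inf * CP_AIR * (t_out - t_zone)
    return q


def required_system_load(c_zone, t_zone, t_target, dt, **terms):
    """Q_sys (W) that moves T_z from t_zone to t_target over dt seconds.

    c_zone is the zone air heat capacity C_z (J/K). Rearranges Eq. (1) with a
    forward-difference derivative:
        Q_sys = C_z * (t_target - t_zone) / dt - (other terms)
    Positive values mean heating is needed, negative values mean cooling.
    """
    storage = c_zone * (t_target - t_zone) / dt
    return storage - heat_balance_terms(t_zone, **terms)


# Usage with hypothetical values: hold a 25 degC zone at 25 degC for one hour
terms = dict(internal_loads=[500.0], surfaces=[(3.0, 10.0, 20.0)],
             interzone=[], m_inf=0.01, t_out=30.0)
load = required_system_load(c_zone=1e5, t_zone=25.0, t_target=25.0, dt=3600.0, **terms)
```

Here the internal gains and the warm infiltrating air outweigh the cooler surfaces, so the sign of `load` indicates that cooling is required.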

Table 2 describes the features that constitute our dataset, with their designation, unit and type. We then applied preprocessing techniques to clean the dataset and prepare it for the next step, i.e., prediction. Using Python (version 3.11.6) and its Pandas library, we removed NaN (i.e., missing) values and unnecessary whitespace, and converted the date column to a standard datetime format. Missing timestamps were filled using the next hour's timestamp to maintain consistency. Numerical columns used commas as decimal separators; these were cleaned and converted to floating-point values. Additionally, we generated a correlation matrix to study associations between features.
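The cleaning steps above could look like the following pandas sketch. The column names, the `;` field separator and the sample rows are hypothetical, and filling a missing timestamp as the previous timestamp plus one hour is only one reading of the filling rule described in the text. `read_csv(decimal=",")`, `to_datetime(errors="coerce")`, `dropna` and `corr` are standard pandas calls.

```python
import io

import pandas as pd

# Hypothetical extract: ";" field separator, "," decimal separator,
# padded headers, one missing timestamp and one missing measurement.
SAMPLE_CSV = """Date/Time ; T_out ; T_in
2023-01-01 00:00;12,5;20,1
;12,0;20,0
2023-01-01 02:00;11,8;
"""


def clean_dataset(csv_text, time_col="Date/Time"):
    # decimal="," parses values such as "12,5" as 12.5;
    # skipinitialspace drops spaces that follow each separator
    df = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",", skipinitialspace=True)
    df.columns = df.columns.str.strip()  # remove stray whitespace in headers
    df[time_col] = pd.to_datetime(df[time_col], format="%Y-%m-%d %H:%M", errors="coerce")

    # Fill each missing timestamp with the previous timestamp + 1 h
    # (an assumed reading of the hourly filling rule)
    stamps = df[time_col].tolist()
    for k in range(1, len(stamps)):
        if pd.isna(stamps[k]):
            stamps[k] = stamps[k - 1] + pd.Timedelta(hours=1)
    df[time_col] = pd.to_datetime(pd.Series(stamps, index=df.index))

    df = df.dropna().reset_index(drop=True)     # drop rows with missing measurements
    corr = df.drop(columns=[time_col]).corr()   # correlation matrix between features
    return df, corr


df, corr = clean_dataset(SAMPLE_CSV)
```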

Finally, our dataset consists of features of various scales, some with much larger ranges than others, which increases the risk of dominant features overpowering smaller ones and biasing the model. To overcome this issue, we apply scaling to ensure a fair contribution from all features. Among the various scalers used in the literature, we chose the Min-Max scaler [42,43], a technique commonly used in machine learning to rescale numerical features to the range [0, 1].

x' = \frac{x - \min(x)}{\max(x) - \min(x)}

(2)

This scaling, also known as normalization, rescales each value x according to Equation (2). It keeps feature contributions balanced and helps prevent exploding or vanishing gradients during training.
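A minimal, dependency-free sketch of Equation (2), applied column by column. The class name is hypothetical (scikit-learn offers an equivalent `MinMaxScaler`), and storing the training minima and maxima so that new data and model outputs can be scaled and unscaled consistently is an assumed practice, not a detail given in the text.

```python
class SimpleMinMaxScaler:
    """Column-wise Min-Max scaling per Equation (2)."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.mins = [min(col) for col in columns]
        self.maxs = [max(col) for col in columns]
        return self

    def transform(self, rows):
        scaled = []
        for row in rows:
            out = []
            for x, lo, hi in zip(row, self.mins, self.maxs):
                span = hi - lo
                # A constant column would divide by zero; map it to 0.0 instead
                out.append((x - lo) / span if span else 0.0)
            scaled.append(out)
        return scaled

    def inverse_transform(self, rows):
        """Map scaled values (e.g. model outputs) back to physical units."""
        return [[x * (hi - lo) + lo for x, lo, hi in zip(row, self.mins, self.maxs)]
                for row in rows]


# Usage: two features with very different ranges (temperature, radiation)
train = [[10.0, 0.0], [20.0, 500.0], [30.0, 1000.0]]
scaler = SimpleMinMaxScaler().fit(train)
scaler.transform(train)           # -> [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
scaler.transform([[40.0, 250.0]])  # -> [[1.5, 0.25]]
```

Because min(x) and max(x) come from the training data, values outside that range map outside [0, 1], as the last call shows.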

4. Model Architecture

The aim of our model is to optimize the energy consumption of the HVAC system without compromising residents' thermal comfort. While no single thermal comfort range can be generalized, most studies in this field place it between 16 °C and 31 °C, depending on the region, season, building type (residential, hospital, etc.) and personal preferences [44]. In our model, we therefore define the thermal comfort zone as the interval between 18 °C and 22 °C in winter, and between 22 °C and 24 °C in summer. Indoor temperature should be maintained within these intervals, taking into account the external and internal parameters that might affect it (radiation, outdoor temperature, etc.). Given the limitations of classic control methods, our model hybridizes three components, as illustrated in Figure 2.

Our architecture is based on three interconnected elements that work in synergy. The LSTM model is trained to predict the building's future thermal and energy conditions, including indoor temperature, humidity, solar radiation, etc. In parallel, the RL agent is trained on the same dataset to learn a policy that generates suggested setpoints, also known as the reference temperature. This temperature is adapted to the current conditions of the simulation and to the forecasts provided by the LSTM module. These two complementary flows of information constitute the inputs to the MPC controller, which solves a constrained optimization problem that integrates them (the setpoint temperature and the predicted indoor/outdoor conditions) to generate the optimal final control command (supply air temperature) applied to the HVAC system.
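The data flow of Figure 2 can be sketched as a single control loop. Every component below is a hypothetical placeholder (a persistence forecast for the LSTM predictors, the middle of the comfort band for the RL actor, a proportional rule for the MPC optimizer, and a first-order model for the environment), so the sketch only illustrates the order in which information moves, not the paper's models.

```python
COMFORT_BOUNDS = {"winter": (18.0, 22.0), "summer": (22.0, 24.0)}


def lstm_forecast(state):
    """Placeholder for the LSTM predictors: assume conditions persist."""
    return {"t_zone": state["t_zone"], "t_out": state["t_out"]}


def rl_setpoint(forecast, season):
    """Placeholder for the RL actor: a setpoint inside the seasonal comfort band."""
    lower, upper = COMFORT_BOUNDS[season]
    return (lower + upper) / 2


def mpc_supply(setpoint, forecast, gain=2.0, t_min=12.0, t_max=35.0):
    """Placeholder for the MPC optimizer: push supply air towards the setpoint."""
    t_supply = setpoint + gain * (setpoint - forecast["t_zone"])
    return min(max(t_supply, t_min), t_max)


def environment_step(state, t_supply):
    """Placeholder first-order zone response to the supply air and outdoor air."""
    t_zone = state["t_zone"]
    t_zone += 0.3 * (t_supply - t_zone) + 0.05 * (state["t_out"] - t_zone)
    return {**state, "t_zone": t_zone}


def run(state, season, steps):
    history = []
    for _ in range(steps):
        forecast = lstm_forecast(state)              # 1. predict conditions
        setpoint = rl_setpoint(forecast, season)     # 2. RL proposes a setpoint
        t_supply = mpc_supply(setpoint, forecast)    # 3. MPC computes the command
        state = environment_step(state, t_supply)    # 4. apply it to the HVAC
        history.append((setpoint, t_supply, state["t_zone"]))
    return history


# Usage: a cold winter morning, 24 hourly steps (hypothetical values)
history = run({"t_zone": 15.0, "t_out": 10.0}, "winter", 24)
```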

A more detailed explanation of each component of the LSTM-MPC-RL architecture is provided in the following sections.

4.1. Long-Short Term Memory (LSTM)

Before the overall model can act on the HVAC system to bring indoor conditions to comfort levels, it requires prior knowledge of both indoor and outdoor circumstances such as temperature, humidity, radiation and occupancy, amongst others. Each of these factors contributes to some degree to temperature fluctuations and, by extension, to the HVAC configuration. For instance, an increase in outdoor temperature will eventually cause the indoor temperature to rise as well, as walls and windows allow heat to seep indoors. With that in mind, we use the LSTM model to predict future indoor and outdoor conditions.

LSTM is a type of RNN, a family of models that learn patterns and feature dependencies in sequential data to predict future values [45]. One benefit of LSTM is its ability to retain information from previous steps, which makes it better than classic RNNs at capturing long-term dependencies. Building on the RNN architecture, an LSTM cell relies mainly on three components: the cell state ($C_{t}$), the hidden state ($h_{t}$), and the gates ($f_{t}$, $i_{t}$, and $o_{t}$), as portrayed in Figure 3, which is adapted from [45]. The cell state $C_{t}$ acts as the memory cell: it carries long-term information across steps and updates it over time. The hidden state $h_{t}$ describes the current state based on the information available at that time, which helps pass short-term information to the next step. Finally, the gates filter information and decide what to forget (forget gate $f_{t}$), what new information to add to the cell state (input gate $i_{t}$, applied to the candidate values ${C'}_{t}$), and what to forward to the next cell (output gate $o_{t}$). The process starts by reading the current data and deciding which information to forget using the following equation:

f_{t} = \sigma(W_{f} \cdot [h_{t-1}, X_{t}] + b_{f})

(3)

where $h_{t-1}$ is the hidden state of the previous step, $X_{t}$ is the input vector of the current step, $W_{f}$ is the weight matrix, $b_{f}$ is the bias vector, and $\sigma$ is the gate activation function. Because each gate value must lie between 0 and 1, $\sigma$ is the sigmoid function, which maps any input to the interval (0, 1); ReLU, which outputs its input directly if it is positive and 0 otherwise and mitigates the vanishing gradient problem simply and effectively compared to other activation functions [46,47], is unbounded above and therefore cannot play this gating role. The output $f_{t}$ is thus a vector of values between 0 and 1, each of which weights the corresponding information in the previous cell state $C_{t-1}$. Information with a weight close to 0 is discarded from $C_{t-1}$, i.e., forgotten. Next, the system determines which values to update using Equation (4).

i_{t} = \sigma(W_{i} \cdot [h_{t-1}, X_{t}] + b_{i})

(4)

The system also decides how much of the new information should be stored. To this end, it uses $\tanh$, a function that yields values between −1 and 1, to compute the new candidate values ${C'}_{t}$, as per Equation (5). The system then updates the cell state $C_{t}$ by combining the old information $C_{t-1}$, filtered by $f_{t}$, with the new information ${C'}_{t}$, weighted by $i_{t}$, as per Equation (6).

{C'}_{t} = \tanh(W_{c} \cdot [h_{t-1}, X_{t}] + b_{c})

(5)

C_{t} = f_{t} \odot C_{t-1} + i_{t} \odot {C'}_{t}

(6)

where $\odot$ denotes element-wise multiplication. Finally, the output gate $o_{t}$ decides how much of the current memory $C_{t}$ is passed on to the following cells, and the hidden state $h_{t}$ is updated accordingly, using Equations (7) and (8), respectively.

o_{t} = \sigma(W_{o} \cdot [h_{t-1}, X_{t}] + b_{o})

(7)

h_{t} = o_{t} \odot \tanh(C_{t})

(8)
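For reference, Equations (3) to (8) translate into one NumPy time step as follows. This is a sketch with random, untrained weights; the sigmoid is used for σ, as is standard for LSTM gates, and the names `lstm_step`, `W` and `b` are hypothetical. In practice, a library layer such as Keras `LSTM` or PyTorch `nn.LSTM` would be trained instead.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Equations (3)-(8).

    W maps gate name -> weight matrix of shape (n_hidden, n_hidden + n_inputs);
    b maps gate name -> bias vector of shape (n_hidden,).
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, X_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate, Eq. (3)
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate, Eq. (4)
    c_cand = np.tanh(W["c"] @ z + b["c"])    # candidate values C'_t, Eq. (5)
    c_t = f_t * c_prev + i_t * c_cand        # element-wise update, Eq. (6)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate, Eq. (7)
    h_t = o_t * np.tanh(c_t)                 # new hidden state, Eq. (8)
    return h_t, c_t, (f_t, i_t, o_t)


# Usage with random, untrained weights (illustrative only)
rng = np.random.default_rng(0)
n_inputs, n_hidden = 3, 4
W = {g: rng.normal(scale=0.5, size=(n_hidden, n_hidden + n_inputs)) for g in "fico"}
b = {g: np.zeros(n_hidden) for g in "fico"}
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
sequence = rng.normal(size=(5, n_inputs))  # 5 time steps of 3 scaled features
for x_t in sequence:
    h, c, gates = lstm_step(x_t, h, c, W, b)
```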

Our LSTM model was trained on the previously described data to predict various indoor and outdoor parameters through six predictors:

Weather predictor: uses historical data to predict future weather conditions such as outdoor temperature and humidity.
Indoor conditions predictor: uses current states and control actions to predict the future state of the HVAC system as well as indoor temperature and humidity.
Radiation predictor: predicts solar radiation.
Occupancy predictor: estimates the number of occupants in the building.
Temperature predictor: forecasts the future indoor temperature of the building for a given supply temperature.
Mass flow rate predictor: predicts the mass flow rate of supply air provided by the HVAC.

Using these predictors, the LSTM component provides the system with information about future conditions (temperature, occupancy, …), which is used to define the temperature setpoint.
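Before training, each predictor needs its time series cut into input windows and targets. The sketch below shows one common way to do this; the text does not state the window length or forecast horizon, so `lookback`, `horizon` and `target_col` are hypothetical parameters.

```python
import numpy as np


def make_windows(series, lookback, horizon=1, target_col=0):
    """Turn a (T, n_features) array into (X, y) pairs for LSTM training.

    X[k] holds `lookback` consecutive rows; y[k] is column `target_col`
    taken `horizon` steps after the end of that window.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series is too short for this lookback/horizon")
    X = np.stack([series[k:k + lookback] for k in range(n)])
    y = np.array([series[k + lookback + horizon - 1, target_col] for k in range(n)])
    return X, y  # X: (n, lookback, n_features), y: (n,)


# Usage: 10 hourly rows of 2 features, 3-hour lookback, 1-hour-ahead target
data = np.arange(20).reshape(10, 2)
X, y = make_windows(data, lookback=3, horizon=1, target_col=0)
```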

4.2. Reinforcement Learning (RL)

Using the indoor and outdoor conditions provided by the LSTM module, together with the comfort zone limits, the RL model sets a target temperature at every time step, which we refer to as the temperature setpoint. Our RL model relies on the DDPG (Deep Deterministic Policy Gradient) approach, given its ability to handle continuous action spaces together with high-dimensional state spaces such as our environment [48]. Like standard RL models, it employs an agent that learns to make decisions through trial and error, trying various actions and obtaining rewards based on the outcome of each. To interact with its environment, the agent relies on its perception of the state of that environment and on its own actions. Three major components define the agent's experience and performance:

State space: denotes the possible scenarios the agent might face. In our case, it comprises all potential combinations of values of predicted occupancy, outdoor/indoor temperature, outdoor/indoor humidity, supply air temperature, mass flow rate, wind speed, solar radiation from 7 directions, the current time of day and the date. These values are generated by the LSTM model and form an 18-feature vector reflecting the environment's current state, which is provided to the agent at every time step $t$.
Action space: refers to the possible actions the agent can take at any time step $t$. Since our goal is to determine the optimal temperature setpoint, the action is a numerical value $a(t) \in [18.0, 22.0]$ during winter and $a(t) \in [22.0, 24.0]$ during summer, these intervals being the respective comfort bounds.
Reward function: to determine the quality of an action, the agent calculates the reward $r_{t}$ it gets at each time step $t$ using the following equation:

r_{t} = -\left(\alpha \cdot E_{HVAC,t} + \beta \cdot |T_{setpoint} - T_{zone}| + \gamma \cdot Tr_{error}\right),

(9)

where $T_{setpoint}$ is the temperature setpoint, $T_{zone}$ is the temperature of the environment as predicted by the LSTM model, $\alpha$, $\beta$ and $\gamma$ are the weights associated with each component, $Tr_{error}$ is the comfort penalty that measures the gap between $T_{zone}$ and the limits of the comfort range, and $E_{HVAC,t}$ is the energy consumed by the HVAC system at instant $t$, which is calculated as follows:

E_{HVAC,t} = \dot{m} \cdot c \cdot |T_{supply} - T_{zone}| \cdot 1000,

(10)

where $\dot{m}$ is the predicted mass flow rate (kg/s), $T_{supply}$ is the supply air temperature, and $c$ is the specific heat of air (1006 J/(kg·K)).
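Equations (9) and (10) can be written as plain functions. This is a sketch: the weights α, β and γ are left as parameters, the piecewise form of $Tr_{error}$ (distance to the nearest comfort bound, zero inside the band) is an assumption consistent with its description, and the factor of 1000 is kept exactly as it appears in Equation (10).

```python
CP_AIR = 1006.0  # specific heat of air c, J/(kg*K)
COMFORT_BOUNDS = {"winter": (18.0, 22.0), "summer": (22.0, 24.0)}


def hvac_energy(m_dot, t_supply, t_zone):
    """Equation (10) as written, including its factor of 1000."""
    return m_dot * CP_AIR * abs(t_supply - t_zone) * 1000


def comfort_error(t_zone, lower, upper):
    """Assumed form of Tr_error: distance (K) to the comfort band, 0 inside it."""
    return max(lower - t_zone, 0.0) + max(t_zone - upper, 0.0)


def reward(t_setpoint, t_zone, t_supply, m_dot, season, alpha, beta, gamma):
    """Equation (9): a negative weighted sum of energy, tracking and comfort terms."""
    lower, upper = COMFORT_BOUNDS[season]
    energy = hvac_energy(m_dot, t_supply, t_zone)
    return -(alpha * energy
             + beta * abs(t_setpoint - t_zone)
             + gamma * comfort_error(t_zone, lower, upper))


# Usage with hypothetical weights: a cold zone (17 degC) heated with 30 degC supply air
r = reward(t_setpoint=21.0, t_zone=17.0, t_supply=30.0, m_dot=0.5,
           season="winter", alpha=1e-6, beta=1.0, gamma=10.0)
```

Because the energy term is several orders of magnitude larger than the temperature terms, the relative scale of α matters: with a weight of order 1, the energy term alone would dominate the reward.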

When handling continuous control tasks, deep RL adopts an actor-critic approach, in which the actor is the decision-maker that chooses which action to take, i.e., which setpoint to adopt, and the critic evaluates the quality of that action. To further improve learning, the agent uses a replay buffer that stores past experiences, defined as (state, action, reward, next state) tuples, from which random samples are drawn to learn from [49]. Finally, the agent adds some randomness to its actions to improve exploration and avoid repeatedly taking the same action.
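The replay buffer and exploration noise mentioned above can be sketched in a few lines. This is an illustration, not the authors' code: the class and function names, the capacity, the noise scale and the use of clipped Gaussian noise are assumptions (the original DDPG paper used Ornstein-Uhlenbeck noise; Gaussian noise is a common simpler choice).

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive time steps
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def explore(action, sigma, low, high, rng):
    """Add Gaussian noise to the actor's setpoint and clip it to the comfort bounds."""
    return min(max(action + rng.gauss(0.0, sigma), low), high)


# Usage: store a few hypothetical transitions, then draw a mini-batch
buf = ReplayBuffer(capacity=3, seed=0)
for k in range(5):
    buf.push([k], 18.0 + 0.5 * k, -float(k), [k + 1])
batch = buf.sample(2)
noisy_setpoint = explore(21.5, sigma=0.5, low=18.0, high=22.0, rng=random.Random(1))
```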

4.3. Model Predictive Control (MPC)

The MPC optimizer uses the setpoint provided by the RL agent and the environment conditions predicted by the LSTM model, namely $T_{zone}$, to estimate the optimal supply air temperature, denoted $T_{supply}$. A value of $T_{supply}$ is considered optimal if it follows the given setpoint, ensures occupants' comfort and reduces energy consumption.

To identify the optimal $T_{supply}$ value, the MPC component tests various candidate $T_{supply}$ values and evaluates them using the following cost function:

cost = \sum_{k=0}^{p-1} \left(\alpha \cdot Tr_{error}(k) + \beta \cdot \left(T_{supply}(k) - T_{supply}(k-1)\right)^{2} + \delta \cdot \left(T_{setpoint\_agent}(k) - T_{zone}(k)\right)^{2} + \gamma \cdot E_{HVAC}(k)\right),

(11)

where:

- $p$ is the prediction horizon.
- $\alpha$, $\beta$, $\delta$ and $\gamma$ are weighting coefficients.
- $Tr_{error}(k)$ and $E_{HVAC}(k)$ are the comfort penalty and the HVAC energy defined for Equations (9) and (10), evaluated at step $k$.
- $\left(T_{supply}(k) - T_{supply}(k-1)\right)^{2}$ penalizes abrupt changes in the control actions.
- $\left(T_{setpoint\_agent}(k) - T_{zone}(k)\right)^{2}$ penalizes the difference between the setpoint $T_{setpoint\_agent}$ provided by the RL agent and the temperature of the environment $T_{zone}$ predicted by the LSTM model.

Once the optimal $T_{supply}$ value is selected, the system applies it to the HVAC, and the environment state evolves according to the predictions provided by the LSTM models.

In this framework, the RL agent determines the setpoint temperature at each time step and transmits it to the MPC as a reference. The MPC seeks to track this reference while ensuring comfort and efficiency. Traditional MPC relies on complex physics-based thermal models; our approach replaces them with LSTM predictors trained on the dataset generated with EnergyPlus. These predictors provide the MPC with the forecasts the optimizer needs to make effective decisions.

Figure 4 summarizes our solution's architecture, from data generation to energy consumption and HVAC monitoring. To generate our dataset, we provide the building modeling block, i.e., EnergyPlus, with three inputs: 1 year of historical weather data for Casablanca obtained from [50], the building's thermal model generated by BINAYATE, and HVAC templates. The model also requires prior knowledge of energy use schedules, such as occupancy periods and the operating hours of lighting and equipment.

4.4. Performance Evaluation Metrics

To evaluate the performance of our proposed solution, we use classic evaluation metrics for AI models [51]. The first metric is the RMSE (Root Mean Square Error), the square root of the mean of the squared differences between predicted and actual values, as per Equation (12). Because it is expressed in the same unit as the predicted variable, this metric is easy to interpret, which makes it well suited to physical science models such as our HVAC study.

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_{i} - \hat{y}_{i}\right)^{2}}

(12)

The second metric is the MAE (Mean Absolute Error), a more straightforward measure of a model's accuracy that is less sensitive to outliers. MAE is the average of the absolute differences between predictions and actual values [52,53]. It is calculated as follows:

MAE = \frac{1}{n} \sum_{i=1}^{n} \left|y_{i} - \hat{y}_{i}\right|.

(13)
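Equations (12) and (13) translate directly into code. The pure-Python sketch below also shows why the MAE must take the absolute value of each error before averaging: otherwise, errors of opposite sign cancel.

```python
import math

def rmse(y_true, y_pred):
    """Eq. (12): square root of the mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Eq. (13): mean of the absolute errors."""
    n = len(y_true)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n

y_true, y_pred = [1.0, 1.0], [2.0, 0.0]
print(mae(y_true, y_pred))   # 1.0
print(rmse(y_true, y_pred))  # 1.0
# Averaging the signed errors first would give |(-1 + 1) / 2| = 0, hiding both errors.
```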

These measures are essential for comparing and evaluating the predictive capabilities of our solution. Finally, our data consist of features with different scales; features with larger values can dominate the others and bias the model. To avoid this problem, we scaled our data using MinMaxScaler as per Equation (2). This ensures a fair contribution from each feature and helps prevent exploding/vanishing gradients.
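Equation (2) is not reproduced in this section. The sketch below assumes it is the standard min-max transformation $x' = (x - x_{min}) / (x_{max} - x_{min})$, which is what scikit-learn's `MinMaxScaler` applies with its default feature range of [0, 1]. Fitting the minima and maxima on the training split only, then reusing them for validation and test data, avoids leaking information from held-out data. This is a recommended practice, not a detail confirmed by the text.

```python
def fit_minmax(rows):
    """Per-feature minimum and maximum, computed on training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def transform(rows, mins, maxs):
    """Scale each feature to [0, 1]; constant features map to 0."""
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, mins, maxs)] for row in rows]

def inverse_transform(rows, mins, maxs):
    """Map scaled values (e.g. LSTM outputs) back to physical units."""
    return [[x * (hi - lo) + lo for x, lo, hi in zip(row, mins, maxs)] for row in rows]

train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]  # two features on different scales
mins, maxs = fit_minmax(train)
print(transform(train, mins, maxs))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

The inverse transform matters when reporting errors in physical units (°C, W/m²) rather than on the scaled [0, 1] range.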

5. Simulations and Results

All simulations were run on the CPU of a computer equipped with a 13th-generation Intel Core i7-13700 processor and 15.7 GB of RAM. Since our model comprises three main components, we detail the simulation parameters and results of each component separately hereafter.

5.1. LSTM Training

As described in the model architecture, our LSTM component consists of 6 predictors. Each predictor takes its own set of inputs and was trained to predict part of the environment state; some predictors provide several outputs.

Table 3 details the input/output shapes and hyperparameters of each predictor, and Table 4 reports their performance in terms of the MAE and RMSE metrics. To train our predictors, the dataset was split into training, test and validation sets with ratios of 70%, 15% and 15%, respectively. The validation set provides unseen data for tuning the hyperparameters during training, while the test set is held out to assess how well the models generalize.
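Table 3 lists inputs of shape (24, n), which we read as 24 consecutive hourly time steps of n features. The sketch below builds the samples under that reading. It also assumes a chronological 70/15/15 split, because the text gives the ratios but not whether the data were shuffled. `make_windows` and `split_70_15_15` are hypothetical helpers.

```python
def make_windows(rows, target_cols, window=24):
    """Turn a time-ordered list of feature rows into (input window, next-step target) pairs."""
    X, y = [], []
    for i in range(len(rows) - window):
        X.append(rows[i:i + window])                          # shape (window, n_features)
        y.append([rows[i + window][c] for c in target_cols])  # values at the following hour
    return X, y

def split_70_15_15(X, y):
    """Chronological split into train (70%), validation (15%) and test (15%) sets."""
    n = len(X)
    n_train = n * 70 // 100
    n_val = n * 15 // 100
    train = (X[:n_train], y[:n_train])
    val = (X[n_train:n_train + n_val], y[n_train:n_train + n_val])
    test = (X[n_train + n_val:], y[n_train + n_val:])
    return train, val, test

rows = [[float(h), 2.0 * h] for h in range(124)]  # toy hourly data with 2 features
X, y = make_windows(rows, target_cols=[0])
train, val, test = split_70_15_15(X, y)
print(len(X), len(train[0]), len(val[0]), len(test[0]))  # 100 70 15 15
```

A chronological split keeps future hours out of the training windows, which matters for time-series forecasting.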

All predictors used the Adam optimizer, one of the most popular optimizers for neural network applications. Compared to other methods, it is simple, relatively insensitive to hyperparameter choices and makes rapid initial progress in training; it is also reported to perform well across a wide range of problems [54]. All predictors were trained with a batch size of 64 for 30 epochs, using the default learning rate of 0.001.
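For reference, the Adam update applied to each trainable weight can be sketched on a single scalar parameter. The decay rates and epsilon below are the defaults proposed by Kingma and Ba (β1 = 0.9, β2 = 0.999, ε = 1e-8); framework defaults for ε may differ slightly. The toy objective is purely illustrative.

```python
import math

def adam_minimize(grad, x0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a 1-D function, given its gradient, using the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment (uncentered variance) estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction for zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = (x - 3)^2 with gradient 2(x - 3); a larger lr than 0.001 is used to converge quickly.
x_star = adam_minimize(lambda x: 2 * (x - 3.0), x0=0.0, lr=0.01, steps=3000)
print(round(x_star, 2))
```

Because each step is normalized by the running root-mean-square of the gradient, the early step size stays close to `lr` regardless of the gradient's scale. This is one reason Adam is comparatively insensitive to hyperparameters.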

Table 4 shows that most of the LSTM predictors yielded good results. The weather and radiation predictors have higher errors. This is likely due to their multiple outputs (3 and 7 variables, respectively) and to the larger numerical range of their targets, e.g., solar radiation in W/m². Nevertheless, these predictors provide the controller with forecasts accurate enough to support its decisions.

5.2. RL Training

The RL agent was trained for 100 episodes of 14 steps each. An episode covers the period during which the building is occupied (14 h, from 07:00 to 21:00), and each step marks the beginning of a one-hour time frame. The environment is therefore reset to the beginning of a day at the start of each episode. Figure 5 summarizes the RL training process used in our architecture.

At the beginning of each step, a setpoint temperature within the comfort zone interval is selected and used to simulate the next state. A reward is then returned according to the quality of the selected setpoint. Since the reward in Equation (9) is a negative cost, poor setpoints receive more negative values that act as penalties.
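The episode structure described above can be outlined as a generic loop. Each of the 100 episodes runs 14 hourly steps from 07:00 to 21:00. At each step, a setpoint is chosen, then a simulated next state and a reward are returned. The toy environment and the random policy below are hypothetical placeholders for the LSTM-based environment and the RL agent; only the loop structure reflects the text.

```python
import random

START_HOUR, END_HOUR = 7, 21   # occupied period: 07:00-21:00
STEPS = END_HOUR - START_HOUR  # one step per hour -> 14 steps per episode

def train(reset, step, act, learn, n_episodes=100):
    """Run n_episodes of STEPS steps each and return the cumulative reward per episode."""
    returns = []
    for _ in range(n_episodes):
        state = reset()                    # start of a day at 07:00
        total = 0.0
        for _ in range(STEPS):
            action = act(state)            # setpoint within the comfort bounds
            next_state, reward = step(state, action)
            learn(state, action, reward, next_state)
            total += reward
            state = next_state
        returns.append(total)
    return returns

# Hypothetical toy environment: the zone temperature drifts toward the chosen setpoint.
def toy_reset():
    return {"hour": START_HOUR, "t_zone": 20.0}

def toy_step(state, setpoint):
    t_zone = state["t_zone"] + 0.5 * (setpoint - state["t_zone"])
    return {"hour": state["hour"] + 1, "t_zone": t_zone}, -abs(setpoint - t_zone)

returns = train(toy_reset, toy_step,
                act=lambda s: random.uniform(18.0, 22.0),
                learn=lambda *transition: None)
print(len(returns))  # 100
```

The list of per-episode returns is the quantity plotted in a reward-progress curve such as Figure 6.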

We also tracked the rewards accumulated by the RL agent throughout training, as illustrated in Figure 6. The rewards improve over time, indicating that the agent learns to make decisions that balance comfort and energy use.

5.3. MPC Performance

Once the first two components of our hybrid architecture (LSTM and RL) were trained, the three modules were tested together in a 48 h simulation. The aim was to track the HVAC energy consumption and the building's temperature during this period. We compared the building's temperature with the setpoint generated by the RL agent and with the thermal comfort limits. By doing so, we measure the gap between the thermal comfort zone and the building's temperature under our controller. This evaluates the system's ability to keep the building in ideal thermal conditions while optimizing energy consumption.

As shown in Figure 7, our hybrid solution maintained the building's temperature within the comfort bounds in both the winter and summer scenarios. During winter, the temperature was kept within the 18 °C to 22 °C range, ensuring thermal comfort throughout the occupied period. In summer, the building's temperature remained within the 22 °C to 24 °C range at all occupied times.

During unoccupied periods, however, indoor temperature rose above the upper comfort bound. This was an intentional energy-saving strategy, since occupants' comfort is not required during these time slots. It would have been possible to keep the HVAC system running constantly, maintaining the temperature within comfort limits even with no occupants present. However, this would have significantly increased energy consumption, which conflicts with the first of our two objectives: reducing energy consumption. The second objective is occupants' comfort.

Similarly, we tracked the energy consumption of the HVAC system with and without our hybrid approach, as illustrated in Figure 8. The hybrid system consumes less energy overall, with a smoother, less abrupt and more efficient consumption profile than the conventional system. Additionally, the hybrid system optimizes energy use during unoccupied hours, which reduces the overall daily consumption.

In addition, Table 5 presents a comparative evaluation of the hybrid RL-MPC-LSTM control system against the conventional historical baseline. Our system reduced energy consumption by 70.9% in summer and 45.4% in winter while avoiding all comfort violations, whereas the historical system recorded 29 comfort violations in summer and 26 in winter. The building's average temperature remained within the comfort range in both seasons (23.5 °C in summer and 18.9 °C in winter), showing that comfort was maintained while saving energy.
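The percentages above follow directly from the totals in Table 5. The short check below computes the relative reduction as 1 − hybrid / baseline.

```python
# Total energy from Table 5 (Wh): (RL-MPC-LSTM, historical baseline)
TOTALS = {"summer": (93_145, 320_212), "winter": (109_757, 200_867)}

def reduction_pct(hybrid_wh: float, baseline_wh: float) -> float:
    """Relative energy reduction of the hybrid controller versus the baseline, in percent."""
    return 100.0 * (1.0 - hybrid_wh / baseline_wh)

for season, (hybrid, baseline) in TOTALS.items():
    print(season, round(reduction_pct(hybrid, baseline), 1))
# summer 70.9
# winter 45.4
```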

In Figure 8, we use different shades of red and gray to distinguish the energy consumption of the HVAC system before and after the deployment of the RL-MPC-LSTM system. Energy consumption (light red) is equivalent to historical energy consumption (light gray) during the occupied hours, and null otherwise, as the HVAC system was turned off. After the deployment of our control system, energy consumption (red) dropped significantly during occupied hours. The HVAC system remains operational (gray) even when no occupants are present. As previously discussed, when the HVAC runs continuously, indoor temperature remains stable, and the system does not require excessive energy to control it.

Table 6 compares our results with similar works.

- [27]: our model's summer energy reduction (70.9%) exceeds the approximately 55% reduction reported relative to their previous work (69.9% relative to the real system).
- [32]: the proposed PSO-LSTM-MPC approach kept indoor temperature within ±0.5 °C of the comfort zone, outperforming our model in summer. However, their model requires 3 h to reach the desired temperature, which is comparatively long.
- [55]: the authors used MPC with deep neural networks to cool a large factory building (80 m × 60 m × 9.7 m). Their model reduced energy consumption by 35%, which demonstrates MPC's efficacy at this scale.
- [56]: the authors combined XGBoost (eXtreme Gradient Boosting) with Deep Q-Networks and achieved an energy reduction and a thermal comfort increase of approximately 25% and 24%, respectively. Our model outperforms both figures.
- [57]: the authors deployed a cloud-based MPC system with an ARX (AutoRegressive with eXogenous inputs) model in an educational building. They achieved energy reductions of 12.83% to 19.21% depending on the control method, which suggests that combining MPC with RL and LSTM can further improve performance.

Finally, while our model offers a robust approach to energy consumption optimization, it also adds computational complexity. We validated our model on a single-zone case study; in practice, applying it to multi-zone environments would require more resources. Several solutions can address this issue:

- Training can be performed offline or in the cloud before deployment to avoid costly on-site learning, as in [57].
- Given the rapid advances in training hardware, GPU acceleration can improve scalability to larger zones and speed up responsiveness.
- Partitioning can be used, where each module (MPC, LSTM, RL) is implemented on separate, regular hardware units instead of one large machine.

It is worth noting that each model's performance depends largely on the building type, on the dataset used to feed the HVAC optimization/control system, and on the approach used (PSO, MPC, RNN…). Accordingly, the novelty of our model lies not only in its formulation but also in the environment it is applied to (a research laboratory), which makes direct comparisons with existing models methodologically problematic.

6. Conclusions

With the rapid expansion of residential and corporate areas, buildings' energy consumption is increasing significantly. Given the sources of this energy (e.g., fossil fuels) and their impact on the environment, reducing energy consumption in buildings has become crucial. Studies have identified the HVAC system as the major energy-consuming device in buildings, shifting the focus toward reducing its consumption in the residential sector. However, reducing the energy consumption of HVAC systems directly affects residents' comfort. This dilemma calls for an efficient energy management system that minimizes the energy consumption of HVAC systems without compromising residents' comfort.

Previous works have approached this subject in various ways. In this paper, we introduce a new smart control system for HVAC in a corporate office in Casablanca, Morocco, combining RL, MPC and LSTM. Our main goal is to reduce energy consumption while keeping indoor conditions within a comfortable range.

To achieve this, we leverage LSTM's ability to learn long- and short-term dependencies between features to predict indoor and outdoor conditions. To train our model, we generated our own dataset using EnergyPlus to reflect the building's behavior under various conditions (occupancy, temperature, wind speed, radiation…). EnergyPlus uses the building's thermal model provided by BINAYATE to simulate a realistic building environment and generate a one-year dataset of indoor and outdoor environmental conditions.

We also use RL to learn the optimal temperature setpoint that meets our dual objective: reducing energy consumption and increasing residents' comfort. The MPC module controls the HVAC system to keep indoor temperature as close as possible to the setpoint defined by the RL module. The generated dataset was also used to train the RL module and to test the RL and MPC modules during different periods of the year.

The results show that our system achieved energy savings of up to 70.9% in summer and 45.4% in winter while keeping indoor temperatures within the seasonal comfort range. Our system also recorded no thermal comfort violations, compared with 29 (summer) and 26 (winter) for the historical baseline. These results support our claim that combining MPC-LSTM with RL can substantially improve energy efficiency in buildings.

As future work, since our solution tackles HVAC control and energy optimization in a single-zone building, we plan to extend it to multi-zone and larger buildings. We also aim to support multiple simultaneous control actions, such as regulating both the supply air temperature and the mass flow rate. We plan to test other predictive models, such as transformers, which have been reported to outperform LSTMs in some forecasting tasks. Finally, our model yields promising results with reasonable computational resources, but heavier alternatives would require cloud-based solutions to scale to larger environments. These extensions could make our system smarter and more efficient.

Author Contributions

A.A.: writing—original draft, resources, conceptualization; M.A.: visualization, software, investigation, validation; A.H.: software, writing—original draft; H.B.: validation, supervision; M.T.: investigation, validation and supervision; H.H.: supervision and validation; M.Y.: supervision and validation. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Dataset available on request from the authors.

Acknowledgments

This work was supported by the Multidisciplinary Research and Innovation Laboratory (LPRI) of Moroccan School of Engineering Sciences (EMSI) Casablanca.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial intelligence
AMEE	Moroccan Agency for Energy Efficiency (Agence Marocaine pour l'Efficacité Energétique)
ANN	Artificial Neural Network
(B)EMS	(Building) Energy Management System
DDPG	Deep Deterministic Policy Gradient
DNN	Deep Neural Network
GHG	Greenhouse Gas
HVAC	Heating, Ventilation, and Air-Conditioning
MAE	Mean Absolute Error
ML	Machine Learning
MPC	Model Predictive Control
NZCB	Net Zero Carbon Building
PID	Proportional, Integral, Derivative
RL	Reinforcement Learning
RMSE	Root Mean Square Error
RNN	Recurrent Neural Network

References

United Nations Environment Programme. 2023 Global Status Report for Buildings and Construction: Beyond Foundations—Mainstreaming Sustainable Solutions to Cut Emissions from the Buildings Sector; United Nations Environment Programme: Nairobi, Kenya, 2024; ISBN 978-92-807-4131-5.
Kabir, M.; Habiba, U.E.; Khan, W.; Shah, A.; Rahim, S.; De los Rios-Escalante, P.R.; Farooqi, Z.-U.-R.; Ali, L.; Shafiq, M. Climate Change Due to Increasing Concentration of Carbon Dioxide and Its Impacts on Environment in 21st Century; A Mini Review. J. King Saud Univ.-Sci. 2023, 35, 102693.
Mirasgedis, S.; Cabeza, L.F.; Vérez, D. Contribution of Buildings Climate Change Mitigation Options to Sustainable Development. Sustain. Cities Soc. 2024, 106, 105355.
Ohene, E.; Chan, A.P.C.; Darko, A.; Nani, G. Navigating toward Net Zero by 2050: Drivers, Barriers, and Strategies for Net Zero Carbon Buildings in an Emerging Market. Build. Environ. 2023, 242, 110472.
Maduta, C.; Melica, G.; D’Agostino, D.; Bertoldi, P. Towards a Decarbonised Building Stock by 2050: The Meaning and the Role of Zero Emission Buildings (ZEBs) in Europe. Energy Strategy Rev. 2022, 44, 101009.
Tirelli, D.; Besana, D. Moving toward Net Zero Carbon Buildings to Face Global Warming: A Narrative Review. Buildings 2023, 13, 684.
Hafez, F.S.; Sa’di, B.; Safa-Gamal, M.; Taufiq-Yap, Y.H.; Alrifaey, M.; Seyedmahmoudian, M.; Stojcevski, A.; Horan, B.; Mekhilef, S. Energy Efficiency in Sustainable Buildings: A Systematic Review with Taxonomy, Challenges, Motivations, Methodological Aspects, Recommendations, and Pathways for Future Research. Energy Strategy Rev. 2023, 45, 101013.
Tahmasbi, F.; Khdair, A.I.; Aburumman, G.A.; Tahmasebi, M.; Thi, N.H.; Afrand, M. Energy-Efficient Building Façades: A Comprehensive Review of Innovative Technologies and Sustainable Strategies. J. Build. Eng. 2025, 99, 111643.
Raza, A.; Jingzhao, L.; Ghadi, Y.; Adnan, M.; Ali, M. Smart Home Energy Management Systems: Research Challenges and Survey. Alex. Eng. J. 2024, 92, 117–170.
Han, B.; Zahraoui, Y.; Mubin, M.; Mekhilef, S.; Seyedmahmoudian, M.; Stojcevski, A. Home Energy Management Systems: A Review of the Concept, Architecture, and Scheduling Strategies. IEEE Access 2023, 11, 19999–20025.
Aliero, M.S.; Asif, M.; Ghani, I.; Pasha, M.F.; Jeong, S.R. Systematic Review Analysis on Smart Building: Challenges and Opportunities. Sustainability 2022, 14, 3009.
Pérez-Lombard, L.; Ortiz, J.; Pout, C. A Review on Buildings Energy Consumption Information. Energy Build. 2008, 40, 394–398.
Azzi, A.; Tabaa, M.; Chegari, B.; Hachimi, H. Balancing Sustainability and Comfort: A Holistic Study of Building Control Strategies That Meet the Global Standards for Efficiency and Thermal Comfort. Sustainability 2024, 16, 2154.
Pereira Silva, F.H. On/Off Control Versus PID Control: A Comparative Case Study on Condensers of Cooling Systems. In Proceedings of the 4th International Conference on Advanced Research in Applied Science and Engineering, Brussels, Belgium, 9–11 September 2022.
Felez, R.; Felez, J. Advanced Energy Management for Residential Buildings Optimizing Costs and Efficiency Through Thermal Energy Storage and Predictive Control. Appl. Sci. 2025, 15, 880.
Ambroziak, A.; Borkowski, P. Temperature and Humidity Model for Predictive Control of Smart Buildings. J. Build. Eng. 2025, 100, 111668.
Lin, C.-Y.; Liao, T.-K.; Chou, H.-H.; Wu, Y.-C.; Wang, C.-C.; Nian, S.-H.; Tsai, M.-Y.; Hung, T.-W. Model Predictive Control of Variable Refrigerant Flow Systems for Room Temperature Control. IEEE Access 2024, 12, 123193–123207.
Tarragona, J.; Gangolells, M.; Casals, M. Model Predictive Control for Managing Indoor Air Quality Levels in Buildings. Energy Rep. 2024, 12, 787–797.
Zeng, T.; Barooah, P. An Adaptive Model Predictive Control Scheme for Energy-Efficient Control of Building HVAC Systems. ASME J. Eng. Sustain. Build. Cities 2021, 2, 031001.
Taheri, S.; Hosseini, P.; Razban, A. Model Predictive Control of Heating, Ventilation, and Air Conditioning (HVAC) Systems: A State-of-the-Art Review. J. Build. Eng. 2022, 60, 105067.
Kim, D.; Lee, J.; Do, S.; Mago, P.J.; Lee, K.H.; Cho, H. Energy Modeling and Model Predictive Control for HVAC in Buildings: A Review of Current Research Trends. Energies 2022, 15, 7231.
Why Has Advanced Commercial HVAC Control Not Yet Achieved Its Promise? Available online: https://www.alphaxiv.org/overview/2411.06204v1 (accessed on 22 July 2025).
Khan, O.; Parvez, M.; Seraj, M.; Yahya, Z.; Devarajan, Y.; Nagappan, B. Optimising Building Heat Load Prediction Using Advanced Control Strategies and Artificial Intelligence for HVAC System. Therm. Sci. Eng. Prog. 2024, 49, 102484.
Gordon, D.C.; Winkler, A.; Bedei, J.; Schaber, P.; Pischinger, S.; Andert, J.; Koch, C.R. Introducing a Deep Neural Network-Based Model Predictive Control Framework for Rapid Controller Implementation. In Proceedings of the 2024 American Control Conference (ACC), Toronto, ON, Canada, 8–9 July 2024; pp. 5232–5237.
Agouzoul, A.; Simeu, E.; Tabaa, M. Synthesis of Model Predictive Control Based on Neural Network for Energy Consumption Enhancement in Building. AEU-Int. J. Electron. Commun. 2024, 173, 155021.
Kim, Y.S.; Park, C.S. Real-Time Predictive Control of HVAC Systems for Factory Building Using Lightweight Data-Driven Model. J. Build. Perform. Simul. 2023, 16, 507–525. Available online: https://www.tandfonline.com/doi/abs/10.1080/19401493.2023.2182363 (accessed on 22 July 2025).
Asvadi-Kermani, O.; Momeni, H.; Justo, A.; Guerrero, J.M.; Vasquez, J.C.; Rodriguez, J.; Khan, B. Energy Optimization of Air Handling Units Using Constrained Predictive Controllers Based on Dynamic Neural Networks. IEEE Access 2022, 10, 56578–56590.
Li, Z.; Wang, P.; Zhang, J.; Mu, S. A Strategy of Improving Indoor Air Temperature Prediction in HVAC System Based on Multivariate Transfer Entropy. Build. Environ. 2022, 219, 109164.
Hassanpour, H.; Mhaskar, P.; Risbeck, M.J. A Hybrid Machine Learning Approach Integrating Recurrent Neural Networks with Subspace Identification for Modelling HVAC Systems. Can. J. Chem. Eng. 2022, 100, 3620–3634.
Noh, S.-H. Analysis of Gradient Vanishing of RNNs and Performance Comparison. Information 2021, 12, 442.
Taboga, V.; Bellahsen, A.; Dagdougui, H. An Enhanced Adaptivity of Reinforcement Learning-Based Temperature Control in Buildings Using Generalized Training. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 6, 255–266.
Ma, L.; Huang, Y.; Zhang, J.; Zhao, T. A Model Predictive Control for Heat Supply at Building Thermal Inlet Based on Data-Driven Model. Buildings 2022, 12, 1879.
Kim, H.; Ejaz, M.A.; Lee, K.; Cho, H.-M.; Kim, D.H. Predictive Optimal Control Mechanism of Indoor Temperature Using Modbus TCP and Deep Reinforcement Learning. Appl. Sci. 2025, 15, 7248.
Chen, L.; Meng, F.; Zhang, Y. MBRL-MC: An HVAC Control Approach via Combining Model-Based Deep Reinforcement Learning and Model Predictive Control. IEEE Internet Things J. 2022, 9, 19160–19173.
Al-Ani, O.; Das, S. Reinforcement Learning: Theory and Applications in HEMS. Energies 2022, 15, 6392.
Lin, Y.; Huang, T.; Yang, W.; Hu, X.; Li, C. A Review on the Impact of Outdoor Environment on Indoor Thermal Environment. Buildings 2023, 13, 2600.
Yu, J.; Chang, W.-S.; Dong, Y. Building Energy Prediction Models and Related Uncertainties: A Review. Buildings 2022, 12, 1284.
Pan, Y.; Zhu, M.; Lv, Y.; Yang, Y.; Liang, Y.; Yin, R.; Yang, Y.; Jia, X.; Wang, X.; Zeng, F.; et al. Building Energy Simulation and Its Application for Building Performance Optimization: A Review of Methods, Tools, and Case Studies. Adv. Appl. Energy 2023, 10, 100135.
Broholt, T.H.; Knudsen, M.D.; Petersen, S. The Robustness of Black and Grey-Box Models of Thermal Building Behaviour against Weather Changes. Energy Build. 2022, 275, 112460.
EnergyPlus. Available online: https://www.energy.gov/eere/buildings/articles/energyplus (accessed on 22 July 2025).
Chegari, B.; Tabaa, M.; Simeu, E.; Moutaouakkil, F.; Medromi, H. Multi-Objective Optimization of Building Energy Performance and Indoor Thermal Comfort by Combining Artificial Neural Networks and Metaheuristic Algorithms. Energy Build. 2021, 239, 110839.
Wang, Y.; Yu, L.; Ali, M.; Khan, I.A.; Maqsood, T.; Gao, H.; Wang, Q.; Guo, X. A Hybrid CFD and Machine Learning Study of Energy Performance of Photovoltaic Systems with a Porous Collector: Model Development and Validation. Case Stud. Therm. Eng. 2025, 69, 105998.
Henderi, H.; Wahyuningsih, T.; Rahwanto, E. Comparison of Min-Max Normalization and Z-Score Normalization in the K-Nearest Neighbor (kNN) Algorithm to Test the Accuracy of Types of Breast Cancer. Int. J. Inform. Inf. Syst. 2021, 4, 13–20.
Mishra, A.K.; Ramgopal, M. Field Studies on Human Thermal Comfort—An Overview. Build. Environ. 2013, 64, 94–106.
Al-Selwi, S.M.; Hassan, M.F.; Abdulkadir, S.J.; Muneer, A.; Sumiea, E.H.; Alqushaibi, A.; Ragab, M.G. RNN-LSTM: From Applications to Modeling Techniques and beyond—Systematic Review. J. King Saud Univ.-Comput. Inf. Sci. 2024, 36, 102068.
Sharma, S.; Sharma, S.; Athaiya, A. Activation Functions in Neural Networks. Towards Data Sci. 2017, 6, 310–316.
Ding, B.; Qian, H.; Zhou, J. Activation Functions and Their Characteristics in Deep Neural Networks. In Proceedings of the 2018 Chinese Control And Decision Conference (CCDC), Shenyang, China, 9–11 June 2018; pp. 1836–1841.
Shakya, A.K.; Pillai, G.; Chakrabarty, S. Reinforcement Learning Algorithms: A Brief Survey. Expert Syst. Appl. 2023, 231, 120495.
Liu, R.; Zou, J. The Effects of Memory Replay in Reinforcement Learning. In Proceedings of the 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 2–5 October 2018; pp. 478–485.
Willmott, C.J. On the Validation of Models. Phys. Geogr. 1981, 2, 184–194.
Mean Absolute Error—An Overview | ScienceDirect Topics. Available online: https://www.sciencedirect.com/topics/engineering/mean-absolute-error (accessed on 23 July 2025).
Willmott, C.J.; Matsuura, K. Advantages of the Mean Absolute Error (MAE) over the Root Mean Square Error (RMSE) in Assessing Average Model Performance. Clim. Res. 2005, 30, 79–82.
Climate.Onebuilding.Org. Available online: https://climate.onebuilding.org/ (accessed on 1 September 2025).
Sun, R.-Y. Optimization for Deep Learning: An Overview. J. Oper. Res. Soc. China 2020, 8, 249–294.
Ra, S.J.; Kim, J.-H.; Park, C.S. Real-Time Model Predictive Cooling Control for an HVAC System in a Factory Building. Energy Build. 2023, 285, 112860.
Liu, X.; Gou, Z. Occupant-Centric HVAC and Window Control: A Reinforcement Learning Model for Enhancing Indoor Thermal Comfort and Energy Efficiency. Build. Environ. 2024, 250, 111197.
Taheri, S.; Amiri, A.J.; Razban, A. Real-World Implementation of a Cloud-Based MPC for HVAC Control in Educational Buildings. Energy Convers. Manag. 2024, 305, 118270.

Figure 1. Building used in this study with overhead lighting [yellow arrows: natural light].

Figure 2. Proposed LSTM-RL-MPC architecture.

Figure 3. LSTM architecture.

Figure 4. Pipeline of our proposed RL-MPC-LSTM solution.

Figure 5. RL training process.

Figure 6. Rewards’ progress over 100 episodes.

Figure 7. Building temperature and thermal comfort zone (historical vs. RL-MPC-LSTM controlled) [top: summer, bottom: winter].

Figure 8. Energy consumption of the HVAC (historical vs. RL-MPC-LSTM controlled) [top: summer, bottom: winter].

Table 1. Building construction characteristics.

Construction	Material	Thickness [mm]	Conductivity [W/(m·K)]	Density [kg/m³]	Specific Heat [kJ/(kg·K)]
Exterior wall	Cement	15	1.8	2500	1
	Raw earth	100	1.04	2300	1
	Air layer	50	R = 0.18 m²·K/W
	Raw earth	100	1.04	2350	1
	Cement	15	1.8	2500	1
Internal wall	Cement	15	1.8	2500	1
	Concrete block (6 holes)	120	0.56	768	0.83
	Cement	15	1.8	2500	1
Roof	Plaster	20	0.56	1350	1
	Hollow-block slab (hourdis)	200	1.32	1327	1
	Concrete	50	2	2450	1
	Cement	15	1.8	2500	1
	Floor tile	15	1.3	2300	0.84
Ground floor	Cement	15	1.8	2500	1
	Hollow-block slab (hourdis)	160	1.18	1372	1
	Concrete	40	2	2450	1
	Cement	15	1.8	2500	1

Table 2. Description of data generated by EnergyPlus.

Feature Name	Designation	Unit	Type
Date	Date and time of observation	-	datetime
People_Count	Number of occupants	-	integer
Temp_Outdoor	Outdoor temperature	°C	float
Temp_Zone	Indoor temperature	°C	float
Temp_Top	Operative temperature	°C	float
Temp_Supply	Supply air temperature	°C	float
RH_Indoor	Indoor relative humidity	%	float
RH_Outdoor	Outdoor relative humidity	%	float
Mass_Flow_Rate	Mass flow rate of supply air	kg/s	float
Wind_Speed	Wind speed	m/s	float
SolarRad_North	Solar radiation incoming from the north	W/m²	float
SolarRad_South	Solar radiation incoming from the south	W/m²	float
SolarRad_East	Solar radiation incoming from the east	W/m²	float
SolarRad_West	Solar radiation incoming from the west	W/m²	float
SolarRad_Roof	Solar radiation incoming on the roof	W/m²	float
SolarRad_East_Window	Solar radiation incoming on east-side windows	W/m²	float
SolarRad_West_Window	Solar radiation incoming on west-side windows	W/m²	float
Energy_HVAC	Energy consumed by HVAC	kWh	float
Total number of rows (observations)			8736

Table 3. Hyperparameters of the LSTM module predictors.

Model	Input Shape (time steps, features)	Outputs	Layers	Units	Dropout Rate
Indoor conditions predictor	(24, 14)	2	3	80	0.2
Mass flow rate predictor	(24, 5)	1	4	80
Occupancy predictor	(24, 1)	1	3	50
Radiations predictor	(24, 7)	7	3	80
Weather predictor	(24, 3)	3	4	80
Temp zone predictor	(24, 15)	1	4	80	0.3

Table 4. Performance of the LSTM predictors.

Model	MAE	RMSE
Indoor conditions predictor	0.0310	0.0421
Mass flow rate predictor	0.0176	0.0779
Occupancy predictor	0.1124	0.1738
Radiations predictor	10.0157	25.0250
Weather predictor	1.8957	2.6426
Temp zone predictor	0.1490	0.1733

Table 5. Hybrid vs. conventional control system performance.

System	Season	Total Energy (Wh)	Comfort Violations	Average Building Temp (°C)
RL-MPC-LSTM system	Summer	93,145	0	23.5
RL-MPC-LSTM system	Winter	109,757	0	18.9
Historical baseline	Summer	320,212	29	25.3
Historical baseline	Winter	200,867	26	22.2

Table 6. Performance and architecture comparison between our model and similar works.

Work	Model Architecture	Building (Data) Type	Energy Reduction Ratio
[27]	RNN + RLS-based AGPC + FNN	Medium/large academic and research building	54.95% (vs. previous work); 69.9% (vs. real system)
[32]	LSTM + PSO + MPC	3 teaching buildings + 1 office building	Not provided
[55]	10 DNN + MPC + Global Search	Factory building (July–September)	35.10%
[56]	XGBoost + Deep Q-Network	ASHRAE Global Building Occupant Behavior Database: 18 rooms in 4 cities	24.70%
[57]	ARX + MPC + cloud based microservices architecture + 3 control methods: PI (Proportional–integral) MPC Optimized Occupancy-based control	3 months of winter in a two-floor educational facility	PI: 12.83% MPC Optimized: 19.21% Occupancy-based: 14.98%
Our model	LSTM-MPC-RL	Research laboratory (office)	Summer: 70.9% Winter: 45.4%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Azzi, A.; Abid, M.; Hanif, A.; Bensag, H.; Tabaa, M.; Hachimi, H.; Youssfi, M. A Hybrid Control Strategy Combining Reinforcement Learning and MPC-LSTM for Energy Management in Building. Energies 2025, 18, 4783. https://doi.org/10.3390/en18174783

AMA Style

Azzi A, Abid M, Hanif A, Bensag H, Tabaa M, Hachimi H, Youssfi M. A Hybrid Control Strategy Combining Reinforcement Learning and MPC-LSTM for Energy Management in Building. Energies. 2025; 18(17):4783. https://doi.org/10.3390/en18174783

Chicago/Turabian Style

Azzi, Amal, Meryem Abid, Ayoub Hanif, Hassna Bensag, Mohamed Tabaa, Hanaa Hachimi, and Mohamed Youssfi. 2025. "A Hybrid Control Strategy Combining Reinforcement Learning and MPC-LSTM for Energy Management in Building" Energies 18, no. 17: 4783. https://doi.org/10.3390/en18174783

APA Style

Azzi, A., Abid, M., Hanif, A., Bensag, H., Tabaa, M., Hachimi, H., & Youssfi, M. (2025). A Hybrid Control Strategy Combining Reinforcement Learning and MPC-LSTM for Energy Management in Building. Energies, 18(17), 4783. https://doi.org/10.3390/en18174783

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
