A Hybrid Control Strategy Combining Reinforcement Learning and MPC-LSTM for Energy Management in Building
Abstract
1. Introduction
2. Related Works
3. Data and Environment Description
3.1. Building Description
3.2. Building Modeling Technique
3.3. Data Preprocessing
4. Model Architecture
4.1. Long Short-Term Memory (LSTM)
- Weather predictor: uses historical data to predict future weather conditions such as outdoor temperature and humidity.
- Indoor conditions predictor: uses current states and control actions to predict the future state of the HVAC system as well as indoor temperature and humidity.
- Radiations predictor: predicts solar radiation.
- Occupancy predictor: estimates the number of occupants in the building.
- Temperature predictor: forecasts the future indoor temperature of the building for a given supply temperature.
- Mass flow rate predictor: predicts the mass flow rate of supply air provided by the HVAC.
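Each predictor above consumes a fixed-length history of observations. As a minimal sketch (not the authors' code), the snippet below shows how a time-ordered series could be sliced into input windows paired with the next value as the target. The 24-step look-back matches the first dimension of the input shapes reported for the predictors; the one-step-ahead target, the function name `make_windows`, and the toy data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]


def make_windows(
    rows: Sequence[Row], lookback: int = 24, target_col: int = 0
) -> Tuple[List[List[Row]], List[float]]:
    """Slice a time-ordered table into (window, next-step target) pairs.

    Each window holds `lookback` consecutive rows; its target is the value of
    column `target_col` in the row that immediately follows the window.
    """
    inputs: List[List[Row]] = []
    targets: List[float] = []
    for end in range(lookback, len(rows)):
        inputs.append(list(rows[end - lookback:end]))
        targets.append(rows[end][target_col])
    return inputs, targets


# Toy data (hypothetical): 30 time steps with 3 features each.
rows = [(float(h), 50.0 + h, 0.1 * h) for h in range(30)]
X, y = make_windows(rows)
print(len(X), len(X[0]), len(X[0][0]))  # 6 24 3
print(y[0])  # 24.0
```

Each element of `X` then has the (time steps, features) layout that a recurrent model expects for one sample.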
4.2. Reinforcement Learning (RL)
- State space: denotes the possible scenarios the agent might face. In our case, it comprises all potential combinations of predicted occupancy, outdoor/indoor temperature, outdoor/indoor humidity, supply air temperature, mass flow rate, wind speed, solar radiation from 7 directions, the current time of day, and the date. These values are generated by the LSTM model and assembled into an 18-feature vector that reflects the environment’s current state, which is provided to the agent at every time step.
- Action space: refers to the possible actions the agent can take at any time step. Given that our goal is to determine the optimal temperature setpoint, the action is a single numerical value confined to a range whose two limits represent the comfort bounds during winter and summer, respectively.
- Reward function: to determine the quality of an action, the agent calculates the reward it gets at each time step using the following equation:
4.3. Model Predictive Control (MPC)
4.4. Performance Evaluation Metrics
5. Simulations and Results
5.1. LSTM Training
5.2. RL Training
5.3. MPC Performance
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
AI | Artificial Intelligence |
AMEE | Association for Medical Education in Europe |
ANN | Artificial Neural Network |
(B)EMS | (Building) Energy Management System |
DDPG | Deep Deterministic Policy Gradient |
DNN | Deep Neural Network |
GHG | Greenhouse Gas |
HVAC | Heating, Ventilation, and Air Conditioning |
MAE | Mean Absolute Error |
ML | Machine Learning |
MPC | Model Predictive Control |
NZCB | Net Zero Carbon Building |
PID | Proportional, Integral, Derivative |
RL | Reinforcement Learning |
RMSE | Root Mean Square Error |
RNN | Recurrent Neural Network |
References
1. United Nations Environment Programme. 2023 Global Status Report for Buildings and Construction: Beyond Foundations—Mainstreaming Sustainable Solutions to Cut Emissions from the Buildings Sector; United Nations Environment Programme: Nairobi, Kenya, 2024; ISBN 978-92-807-4131-5.
2. Kabir, M.; Habiba, U.E.; Khan, W.; Shah, A.; Rahim, S.; De los Rios-Escalante, P.R.; Farooqi, Z.-U.-R.; Ali, L.; Shafiq, M. Climate Change Due to Increasing Concentration of Carbon Dioxide and Its Impacts on Environment in 21st Century; A Mini Review. J. King Saud Univ.-Sci. 2023, 35, 102693.
3. Mirasgedis, S.; Cabeza, L.F.; Vérez, D. Contribution of Buildings Climate Change Mitigation Options to Sustainable Development. Sustain. Cities Soc. 2024, 106, 105355.
4. Ohene, E.; Chan, A.P.C.; Darko, A.; Nani, G. Navigating toward Net Zero by 2050: Drivers, Barriers, and Strategies for Net Zero Carbon Buildings in an Emerging Market. Build. Environ. 2023, 242, 110472.
5. Maduta, C.; Melica, G.; D’Agostino, D.; Bertoldi, P. Towards a Decarbonised Building Stock by 2050: The Meaning and the Role of Zero Emission Buildings (ZEBs) in Europe. Energy Strategy Rev. 2022, 44, 101009.
6. Tirelli, D.; Besana, D. Moving toward Net Zero Carbon Buildings to Face Global Warming: A Narrative Review. Buildings 2023, 13, 684.
7. Hafez, F.S.; Sa’di, B.; Safa-Gamal, M.; Taufiq-Yap, Y.H.; Alrifaey, M.; Seyedmahmoudian, M.; Stojcevski, A.; Horan, B.; Mekhilef, S. Energy Efficiency in Sustainable Buildings: A Systematic Review with Taxonomy, Challenges, Motivations, Methodological Aspects, Recommendations, and Pathways for Future Research. Energy Strategy Rev. 2023, 45, 101013.
8. Tahmasbi, F.; Khdair, A.I.; Aburumman, G.A.; Tahmasebi, M.; Thi, N.H.; Afrand, M. Energy-Efficient Building Façades: A Comprehensive Review of Innovative Technologies and Sustainable Strategies. J. Build. Eng. 2025, 99, 111643.
9. Raza, A.; Jingzhao, L.; Ghadi, Y.; Adnan, M.; Ali, M. Smart Home Energy Management Systems: Research Challenges and Survey. Alex. Eng. J. 2024, 92, 117–170.
10. Han, B.; Zahraoui, Y.; Mubin, M.; Mekhilef, S.; Seyedmahmoudian, M.; Stojcevski, A. Home Energy Management Systems: A Review of the Concept, Architecture, and Scheduling Strategies. IEEE Access 2023, 11, 19999–20025.
11. Aliero, M.S.; Asif, M.; Ghani, I.; Pasha, M.F.; Jeong, S.R. Systematic Review Analysis on Smart Building: Challenges and Opportunities. Sustainability 2022, 14, 3009.
12. Pérez-Lombard, L.; Ortiz, J.; Pout, C. A Review on Buildings Energy Consumption Information. Energy Build. 2008, 40, 394–398.
13. Azzi, A.; Tabaa, M.; Chegari, B.; Hachimi, H. Balancing Sustainability and Comfort: A Holistic Study of Building Control Strategies That Meet the Global Standards for Efficiency and Thermal Comfort. Sustainability 2024, 16, 2154.
14. Pereira Silva, F.H. On/Off Control Versus PID Control: A Comparative Case Study on Condensers of Cooling Systems. In Proceedings of the 4th International Conference on Advanced Research in Applied Science and Engineering, Brussels, Belgium, 9–11 September 2022.
15. Felez, R.; Felez, J. Advanced Energy Management for Residential Buildings Optimizing Costs and Efficiency Through Thermal Energy Storage and Predictive Control. Appl. Sci. 2025, 15, 880.
16. Ambroziak, A.; Borkowski, P. Temperature and Humidity Model for Predictive Control of Smart Buildings. J. Build. Eng. 2025, 100, 111668.
17. Lin, C.-Y.; Liao, T.-K.; Chou, H.-H.; Wu, Y.-C.; Wang, C.-C.; Nian, S.-H.; Tsai, M.-Y.; Hung, T.-W. Model Predictive Control of Variable Refrigerant Flow Systems for Room Temperature Control. IEEE Access 2024, 12, 123193–123207.
18. Tarragona, J.; Gangolells, M.; Casals, M. Model Predictive Control for Managing Indoor Air Quality Levels in Buildings. Energy Rep. 2024, 12, 787–797.
19. Zeng, T.; Barooah, P. An Adaptive Model Predictive Control Scheme for Energy-Efficient Control of Building HVAC Systems. ASME J. Eng. Sustain. Build. Cities 2021, 2, 031001.
20. Taheri, S.; Hosseini, P.; Razban, A. Model Predictive Control of Heating, Ventilation, and Air Conditioning (HVAC) Systems: A State-of-the-Art Review. J. Build. Eng. 2022, 60, 105067.
21. Kim, D.; Lee, J.; Do, S.; Mago, P.J.; Lee, K.H.; Cho, H. Energy Modeling and Model Predictive Control for HVAC in Buildings: A Review of Current Research Trends. Energies 2022, 15, 7231.
22. Why Has Advanced Commercial HVAC Control Not Yet Achieved Its Promise? Available online: https://www.alphaxiv.org/overview/2411.06204v1 (accessed on 22 July 2025).
23. Khan, O.; Parvez, M.; Seraj, M.; Yahya, Z.; Devarajan, Y.; Nagappan, B. Optimising Building Heat Load Prediction Using Advanced Control Strategies and Artificial Intelligence for HVAC System. Therm. Sci. Eng. Prog. 2024, 49, 102484.
24. Gordon, D.C.; Winkler, A.; Bedei, J.; Schaber, P.; Pischinger, S.; Andert, J.; Koch, C.R. Introducing a Deep Neural Network-Based Model Predictive Control Framework for Rapid Controller Implementation. In Proceedings of the 2024 American Control Conference (ACC), Toronto, ON, Canada, 8–9 July 2024; pp. 5232–5237.
25. Agouzoul, A.; Simeu, E.; Tabaa, M. Synthesis of Model Predictive Control Based on Neural Network for Energy Consumption Enhancement in Building. AEU-Int. J. Electron. Commun. 2024, 173, 155021.
26. Kim, Y.S.; Park, C.S. Real-Time Predictive Control of HVAC Systems for Factory Building Using Lightweight Data-Driven Model. J. Build. Perform. Simul. 2023, 16, 507–525. Available online: https://www.tandfonline.com/doi/abs/10.1080/19401493.2023.2182363 (accessed on 22 July 2025).
27. Asvadi-Kermani, O.; Momeni, H.; Justo, A.; Guerrero, J.M.; Vasquez, J.C.; Rodriguez, J.; Khan, B. Energy Optimization of Air Handling Units Using Constrained Predictive Controllers Based on Dynamic Neural Networks. IEEE Access 2022, 10, 56578–56590.
28. Li, Z.; Wang, P.; Zhang, J.; Mu, S. A Strategy of Improving Indoor Air Temperature Prediction in HVAC System Based on Multivariate Transfer Entropy. Build. Environ. 2022, 219, 109164.
29. Hassanpour, H.; Mhaskar, P.; Risbeck, M.J. A Hybrid Machine Learning Approach Integrating Recurrent Neural Networks with Subspace Identification for Modelling HVAC Systems. Can. J. Chem. Eng. 2022, 100, 3620–3634.
30. Noh, S.-H. Analysis of Gradient Vanishing of RNNs and Performance Comparison. Information 2021, 12, 442.
31. Taboga, V.; Bellahsen, A.; Dagdougui, H. An Enhanced Adaptivity of Reinforcement Learning-Based Temperature Control in Buildings Using Generalized Training. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 6, 255–266.
32. Ma, L.; Huang, Y.; Zhang, J.; Zhao, T. A Model Predictive Control for Heat Supply at Building Thermal Inlet Based on Data-Driven Model. Buildings 2022, 12, 1879.
33. Kim, H.; Ejaz, M.A.; Lee, K.; Cho, H.-M.; Kim, D.H. Predictive Optimal Control Mechanism of Indoor Temperature Using Modbus TCP and Deep Reinforcement Learning. Appl. Sci. 2025, 15, 7248.
34. Chen, L.; Meng, F.; Zhang, Y. MBRL-MC: An HVAC Control Approach via Combining Model-Based Deep Reinforcement Learning and Model Predictive Control. IEEE Internet Things J. 2022, 9, 19160–19173.
35. Al-Ani, O.; Das, S. Reinforcement Learning: Theory and Applications in HEMS. Energies 2022, 15, 6392.
36. Lin, Y.; Huang, T.; Yang, W.; Hu, X.; Li, C. A Review on the Impact of Outdoor Environment on Indoor Thermal Environment. Buildings 2023, 13, 2600.
37. Yu, J.; Chang, W.-S.; Dong, Y. Building Energy Prediction Models and Related Uncertainties: A Review. Buildings 2022, 12, 1284.
38. Pan, Y.; Zhu, M.; Lv, Y.; Yang, Y.; Liang, Y.; Yin, R.; Yang, Y.; Jia, X.; Wang, X.; Zeng, F.; et al. Building Energy Simulation and Its Application for Building Performance Optimization: A Review of Methods, Tools, and Case Studies. Adv. Appl. Energy 2023, 10, 100135.
39. Broholt, T.H.; Knudsen, M.D.; Petersen, S. The Robustness of Black and Grey-Box Models of Thermal Building Behaviour against Weather Changes. Energy Build. 2022, 275, 112460.
40. EnergyPlus. Available online: https://www.energy.gov/eere/buildings/articles/energyplus (accessed on 22 July 2025).
41. Chegari, B.; Tabaa, M.; Simeu, E.; Moutaouakkil, F.; Medromi, H. Multi-Objective Optimization of Building Energy Performance and Indoor Thermal Comfort by Combining Artificial Neural Networks and Metaheuristic Algorithms. Energy Build. 2021, 239, 110839.
42. Wang, Y.; Yu, L.; Ali, M.; Khan, I.A.; Maqsood, T.; Gao, H.; Wang, Q.; Guo, X. A Hybrid CFD and Machine Learning Study of Energy Performance of Photovoltaic Systems with a Porous Collector: Model Development and Validation. Case Stud. Therm. Eng. 2025, 69, 105998.
43. Henderi, H.; Wahyuningsih, T.; Rahwanto, E. Comparison of Min-Max Normalization and Z-Score Normalization in the K-Nearest Neighbor (kNN) Algorithm to Test the Accuracy of Types of Breast Cancer. Int. J. Inform. Inf. Syst. 2021, 4, 13–20.
44. Mishra, A.K.; Ramgopal, M. Field Studies on Human Thermal Comfort—An Overview. Build. Environ. 2013, 64, 94–106.
45. Al-Selwi, S.M.; Hassan, M.F.; Abdulkadir, S.J.; Muneer, A.; Sumiea, E.H.; Alqushaibi, A.; Ragab, M.G. RNN-LSTM: From Applications to Modeling Techniques and beyond—Systematic Review. J. King Saud Univ.-Comput. Inf. Sci. 2024, 36, 102068.
46. Sharma, S.; Sharma, S.; Athaiya, A. Activation Functions in Neural Networks. Towards Data Sci. 2017, 6, 310–316.
47. Ding, B.; Qian, H.; Zhou, J. Activation Functions and Their Characteristics in Deep Neural Networks. In Proceedings of the 2018 Chinese Control And Decision Conference (CCDC), Shenyang, China, 9–11 June 2018; pp. 1836–1841.
48. Shakya, A.K.; Pillai, G.; Chakrabarty, S. Reinforcement Learning Algorithms: A Brief Survey. Expert Syst. Appl. 2023, 231, 120495.
49. Liu, R.; Zou, J. The Effects of Memory Replay in Reinforcement Learning. In Proceedings of the 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 2–5 October 2018; pp. 478–485.
50. Willmott, C.J. On the Validation of Models. Phys. Geogr. 1981, 2, 184–194.
51. Mean Absolute Error—An Overview|ScienceDirect Topics. Available online: https://www.sciencedirect.com/topics/engineering/mean-absolute-error (accessed on 23 July 2025).
52. Willmott, C.J.; Matsuura, K. Advantages of the Mean Absolute Error (MAE) over the Root Mean Square Error (RMSE) in Assessing Average Model Performance. Clim. Res. 2005, 30, 79–82.
53. Climate.Onebuilding.Org. Available online: https://climate.onebuilding.org/ (accessed on 1 September 2025).
54. Sun, R.-Y. Optimization for Deep Learning: An Overview. J. Oper. Res. Soc. China 2020, 8, 249–294.
55. Ra, S.J.; Kim, J.-H.; Park, C.S. Real-Time Model Predictive Cooling Control for an HVAC System in a Factory Building. Energy Build. 2023, 285, 112860.
56. Liu, X.; Gou, Z. Occupant-Centric HVAC and Window Control: A Reinforcement Learning Model for Enhancing Indoor Thermal Comfort and Energy Efficiency. Build. Environ. 2024, 250, 111197.
57. Taheri, S.; Amiri, A.J.; Razban, A. Real-World Implementation of a Cloud-Based MPC for HVAC Control in Educational Buildings. Energy Convers. Manag. 2024, 305, 118270.
Construction | Material | Thickness [mm] | Conductivity [W/(m·K)] | Density [kg/m³] | Specific Heat [kJ/(kg·K)] |
---|---|---|---|---|---|
Exterior wall | Cement | 15 | 1.8 | 2500 | 1 |
Exterior wall | Raw earth | 100 | 1.04 | 2300 | 1 |
Exterior wall | Air layer | 50 | R = 0.18 m²·K/W | - | - |
Exterior wall | Raw earth | 100 | 1.04 | 2350 | 1 |
Exterior wall | Cement | 15 | 1.8 | 2500 | 1 |
Internal wall | Cement | 15 | 1.8 | 2500 | 1 |
Internal wall | Concrete block (agglo), 6 holes | 120 | 0.56 | 768 | 0.83 |
Internal wall | Cement | 15 | 1.8 | 2500 | 1 |
Roof | Plaster | 20 | 0.56 | 1350 | 1 |
Roof | Hourdis (hollow slab block) | 200 | 1.32 | 1327 | 1 |
Roof | Concrete | 50 | 2 | 2450 | 1 |
Roof | Cement | 15 | 1.8 | 2500 | 1 |
Roof | Floor tile | 15 | 1.3 | 2300 | 0.84 |
Ground floor | Cement | 15 | 1.8 | 2500 | 1 |
Ground floor | Hourdis (hollow slab block) | 160 | 1.18 | 1372 | 1 |
Ground floor | Concrete | 40 | 2 | 2450 | 1 |
Ground floor | Cement | 15 | 1.8 | 2500 | 1 |
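The layer properties in the table above determine how strongly each construction resists conductive heat flow. As a rough illustration (not a calculation taken from the paper), the sketch below sums the conductive resistances d/k of the exterior-wall layers in series and adds the stated air-layer resistance. Ignoring the interior and exterior surface film resistances is a simplifying assumption, and the helper name `series_resistance` is hypothetical.

```python
# Exterior-wall layers from the table: (material, thickness in mm, conductivity in W/(m·K)).
# The air layer is specified directly as a resistance, so its conductivity is None.
EXTERIOR_WALL = [
    ("Cement", 15, 1.8),
    ("Raw earth", 100, 1.04),
    ("Air layer", 50, None),
    ("Raw earth", 100, 1.04),
    ("Cement", 15, 1.8),
]
AIR_LAYER_R = 0.18  # m²·K/W, as stated in the table


def series_resistance(layers, air_r=AIR_LAYER_R):
    """Sum layer resistances in series (surface films ignored: an assumption)."""
    total = 0.0
    for _name, thickness_mm, conductivity in layers:
        if conductivity is None:
            total += air_r  # resistance given directly for the air gap
        else:
            total += (thickness_mm / 1000.0) / conductivity  # R = d / k
    return total


r_wall = series_resistance(EXTERIOR_WALL)
u_wall = 1.0 / r_wall  # conductance of the layered wall, W/(m²·K)
print(f"R = {r_wall:.3f} m²·K/W, U = {u_wall:.2f} W/(m²·K)")
# R = 0.389 m²·K/W, U = 2.57 W/(m²·K)
```

Under these assumptions the air gap accounts for nearly half of the wall's total resistance.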
Feature Name | Designation | Unit | Type |
---|---|---|---|
Date | Date and time of observation | - | datetime |
People_Count | Number of occupants | - | integer |
Temp_Outdoor | Outdoor temperature | °C | float |
Temp_Zone | Indoor temperature | °C | float |
Temp_Top | Operative temperature | °C | float |
Temp_Supply | Supply air temperature | °C | float |
RH_Indoor | Indoor relative humidity | % | float |
RH_Outdoor | Outdoor relative humidity | % | float |
Mass_Flow_Rate | Mass flow rate of supply air | kg/s | float |
Wind_Speed | Wind speed | m/s | float |
SolarRad_North | Solar radiation incoming from the north | W/m² | float |
SolarRad_South | Solar radiation incoming from the south | W/m² | float |
SolarRad_East | Solar radiation incoming from the east | W/m² | float |
SolarRad_West | Solar radiation incoming from the west | W/m² | float |
SolarRad_Roof | Solar radiation incoming from the roof | W/m² | float |
SolarRad_East_Window | Solar radiation incoming from the east side windows | W/m² | float |
SolarRad_West_Window | Solar radiation incoming from the west side windows | W/m² | float |
Energy_HVAC | Energy consumed by HVAC | kWh | float |
Total number of rows (observations) | 8736 | | |
Model | Input | Output | Layers | Units | Dropout Rate |
---|---|---|---|---|---|
Indoor conditions predictor | (24, 14) | 2 | 3 | 80 | 0.2 |
Mass flow rate predictor | (24, 5) | 1 | 4 | 80 | 0.2 |
Occupancy predictor | (24, 1) | 1 | 3 | 50 | 0.2 |
Radiations predictor | (24, 7) | 7 | 3 | 80 | 0.2 |
Weather predictor | (24, 3) | 3 | 4 | 80 | 0.2 |
Temp zone predictor | (24, 15) | 1 | 4 | 80 | 0.3 |
Model | MAE | RMSE |
---|---|---|
Indoor conditions predictor | 0.0310 | 0.0421 |
Mass flow rate predictor | 0.0176 | 0.0779 |
Occupancy predictor | 0.1124 | 0.1738 |
Radiations predictor | 10.0157 | 25.0250 |
Weather predictor | 1.8957 | 2.6426 |
Temp zone predictor | 0.1490 | 0.1733 |
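The MAE and RMSE values in the table above follow the standard definitions: MAE is the mean of the absolute errors, and RMSE is the square root of the mean squared error. A minimal sketch of both metrics is given below; the sample values are made up for illustration and are not data from the study.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between two equally long sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between two equally long sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# Hypothetical indoor temperatures (°C): measured vs. predicted.
measured = [22.0, 23.0, 24.0, 25.0]
predicted = [22.5, 23.0, 23.0, 25.5]
print(mae(measured, predicted))             # 0.5
print(round(rmse(measured, predicted), 4))  # 0.6124
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE, which is consistent with every row of the table.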
System | Season | Total Energy (Wh) | Comfort Violations | Average Building Temp (°C) |
---|---|---|---|---|
RL-MPC-LSTM system | Summer | 93,145 | 0 | 23.5 |
RL-MPC-LSTM system | Winter | 109,757 | 0 | 18.9 |
Historical baseline | Summer | 320,212 | 29 | 25.3 |
Historical baseline | Winter | 200,867 | 26 | 22.2 |
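The seasonal savings implied by the table above can be checked with the usual relative-reduction formula, (baseline - controlled) / baseline. The short sketch below applies it to the reported energy totals; the function name `reduction_pct` is only illustrative.

```python
def reduction_pct(baseline_wh: float, controlled_wh: float) -> float:
    """Relative energy reduction of the controlled case, in percent."""
    return 100.0 * (baseline_wh - controlled_wh) / baseline_wh


# Totals in Wh from the table: (historical baseline, RL-MPC-LSTM system).
TOTALS_WH = {
    "Summer": (320_212, 93_145),
    "Winter": (200_867, 109_757),
}

for season, (baseline, controlled) in TOTALS_WH.items():
    print(f"{season}: {reduction_pct(baseline, controlled):.1f}%")
# Summer: 70.9%
# Winter: 45.4%
```

These figures agree with the 70.9% and 45.4% reported for the proposed model in the comparison table.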
Work | Model Architecture | Building (Data) Type | Energy Reduction Ratio |
---|---|---|---|
[27] | RNN + RLS-based AGPC + FNN | Medium/large academic and research building | vs. previous work: 54.95%; vs. real system: 69.9% |
[32] | LSTM + PSO + MPC | 3 teaching buildings + 1 office building | Not provided |
[55] | 10 DNN + MPC + Global Search | Factory building (July–September) | 35.10% |
[56] | XGBoost + Deep Q-Network | ASHRAE Global Building Occupant Behavior Database: 18 rooms in 4 cities | 24.70% |
[57] | ARX + MPC + cloud-based microservices architecture + 3 control methods (PI, MPC optimized, occupancy-based) | 3 months of winter in a two-floor educational facility | PI: 12.83%; MPC optimized: 19.21%; Occupancy-based: 14.98% |
Our model | LSTM-MPC-RL | Research laboratory (office) | Summer: 70.9%; Winter: 45.4% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Azzi, A.; Abid, M.; Hanif, A.; Bensag, H.; Tabaa, M.; Hachimi, H.; Youssfi, M. A Hybrid Control Strategy Combining Reinforcement Learning and MPC-LSTM for Energy Management in Building. Energies 2025, 18, 4783. https://doi.org/10.3390/en18174783