An End-to-End Relearning Framework for Building Energy Optimization
Abstract
1. Introduction
2. Related Work
3. Problem Formulation
4. End-to-End Relearning Framework
4.1. Outer Loop: Online Operation and Performance Monitoring
4.2. Inner Loop: Update and Relearning
4.2.1. Modeling of the Dynamic Systems
4.2.2. Exogenous Variable Prediction Models
4.2.3. Deep Reinforcement Learning Controller
Algorithm 1: End-to-End Relearning Algorithm
5. Hyperparameter Selection
5.1. HPO: Problem Formulation
5.2. Decomposition of Hyperparameter Space
5.2.1. Metric-Based Decomposition
5.2.2. Separation of Hyperparameters: Connected Components and D-Separation
- Indirect Causal Effect: X can influence Y through Z if Z has not been observed.
- Indirect Evidential Effect: evidence about X can influence Y through Z if Z has not been observed.
- Common Cause: X can influence Y if their common cause Z has not been observed.
- Common Effect: X can influence Y only if their common effect Z has been observed; otherwise, they are independent.
- The evaluation of local metrics blocks the effect of the local hyperparameters on global metrics (Rule 1). Therefore, any two models affecting the same global metric will not have their hyperparameters related via a common effect (Rule 4).
- Assume that Z is the set of global hyperparameters, while X and Y represent local hyperparameter sets associated with individual models. If they belong to the same connected component (sub-graph), we would have to jointly optimize the hyperparameters of the individual models, increasing the computational complexity of the optimization process. Instead, we can consider a two-level optimization process: first, we choose candidate values for the global hyperparameters in Z. Given the values of the variables in Z, we can independently optimize the hyperparameters connected to components X and Y, thus decomposing the hyperparameter space further by applying Rule 3.
- In many cases, all global hyperparameters are linked to the global metrics, and then they have to be jointly optimized following Rule 4.
- Choose candidate values of the global hyperparameters Z.
- With the chosen values of Z, optimize the local hyperparameters associated with each model in the ensemble, using the corresponding model performance as the objective function.
- Evaluate the performance of the candidate hyperparameter subsets on the global metrics using Equation (1).
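The four influence rules above can be encoded compactly for the three canonical three-node structures over (X, Z, Y). This is a minimal illustrative sketch, not code from the paper; the structure names are purely illustrative:

```python
def can_influence(structure: str, z_observed: bool) -> bool:
    """Can X influence Y in a three-node structure over (X, Z, Y)?

    'causal_chain'     X -> Z -> Y  (indirect causal effect)
    'evidential_chain' X <- Z <- Y  (indirect evidential effect)
    'common_cause'     X <- Z -> Y  (common cause)
    'common_effect'    X -> Z <- Y  (common effect / collider)
    """
    if structure == "common_effect":
        # A collider transmits influence only once Z has been observed (Rule 4).
        return z_observed
    # Chains and forks are blocked exactly when Z is observed (Rules 1-3).
    return not z_observed


# Rules 1-3: influence flows while Z is unobserved, and is blocked otherwise.
assert can_influence("causal_chain", z_observed=False)
assert not can_influence("common_cause", z_observed=True)
# Rule 4: a common effect behaves in the opposite way.
assert can_influence("common_effect", z_observed=True)
assert not can_influence("common_effect", z_observed=False)
```

Evaluating local metrics plays the role of "observing" the blocking variables, which is what licenses the decomposition used in the two-level procedure.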
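The three steps above can be sketched as a two-level search. This is a hypothetical sketch under assumed names (`global_space`, `local_spaces`, `local_score`, `global_metric` are illustrative stand-ins, not the paper's implementation; the stand-in objectives return random scores):

```python
import random

random.seed(0)

# Candidate values for the global hyperparameters Z
# (e.g., monitoring window size K and forecast horizon N).
global_space = {"window_K": [6, 12, 18], "horizon_N": [48, 72]}

# Local hyperparameter spaces, one per model in the ensemble.
local_spaces = {
    "dynamic_model":  {"layers": [1, 2, 3]},
    "exog_predictor": {"layers": [1, 2]},
}

def local_score(model, local_cfg, global_cfg):
    # Stand-in for training `model` and evaluating its *local* metric.
    return random.random()

def global_metric(global_cfg, best_locals):
    # Stand-in for evaluating the end-to-end (global) objective.
    return random.random()

best = None
for K in global_space["window_K"]:            # Step 1: candidate global values
    for N in global_space["horizon_N"]:
        g = {"window_K": K, "horizon_N": N}
        # Step 2: given Z, each model's local hyperparameters are
        # optimized independently (the d-separation decomposition).
        locals_best = {
            m: min(space["layers"],
                   key=lambda L: local_score(m, {"layers": L}, g))
            for m, space in local_spaces.items()
        }
        # Step 3: score this global configuration on the global metric.
        s = global_metric(g, locals_best)
        if best is None or s < best[0]:
            best = (s, g, locals_best)
```

Because the local optimizations are independent given Z, each inner search scales with its own (small) space rather than with the product of all local spaces.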
5.2.3. Bayesian Optimization for Hyperparameter Tuning
6. Experimental Settings
6.1. System Description
6.2. Reinforcement Learning Definitions
6.3. Implementation of the Solution
6.3.1. Dynamic System Model
6.3.2. Experience Buffer
6.3.3. Supervisory Controller
6.3.4. Exogenous Variable Predictors
6.3.5. Performance Monitor Module
6.4. Hyperparameter Selection
6.4.1. Bayes Net
6.4.2. Separation of Hyperparameters
6.4.3. Two-Step Hyperparameter Optimization
6.4.4. Global and Local Hyperparameter Choices
7. Results and Discussion
7.1. Results on Tuned Model Architecture and Model Evaluation
7.1.1. Dynamic System Model
7.1.2. Experience Buffer
7.1.3. Supervisory Controller
7.1.4. Exogenous Variable Predictor Models
7.1.5. Performance Monitor Module
7.2. Benchmark Experiments
7.3. Results
8. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
HVAC | Heating, Ventilation, and Air Conditioning
CPS | Cyber-Physical Systems
MPC | Model Predictive Control
ML | Machine Learning
RL | Reinforcement Learning
EWC | Elastic Weight Consolidation
ASHRAE | American Society of Heating, Refrigerating and Air-Conditioning Engineers
MDP | Markov Decision Process
LC-NSMDP | Lipschitz Continuous Non-Stationary Markov Decision Process
LSTM | Long Short-Term Memory
FCNN | Fully Connected Neural Network
FIFO | First In, First Out Queue
OOD | Out of Distribution
HPO | Hyperparameter Optimization
DAG | Directed Acyclic Graph
SMBO | Sequential Model-Based Optimization
FMU | Functional Mock-up Unit
VAV | Variable Air Volume Unit
MPPI | Model Predictive Path Integral Control
PPO | Proximal Policy Optimization
DDPG | Deep Deterministic Policy Gradient
Component | Variables
---|---
State | 1. Outside Air Temperature (oat); 2. Outside Air Relative Humidity (orh); 3. Five Zone Temperatures; 4. Total Energy Consumption; 5. AHU Discharge Air Temperature
Action | AHU Discharge Temperature Setpoint
Reward | (expression not recovered)
Non-Stationary Transition Model | 1. Total Energy Consumption Model; 2. Zone Temperature Model; 3. VAV Damper Percentage Model
Hyperparameter(s) | Metric Affected | Global/Local
---|---|---
Performance Monitor Module Window Sizes | | Global
Memory Buffer Size | | Global
Forecast Horizon Length: N | | Global
Exogenous Variable Predictor: Nodes, Layers, Activations | | Local
Dynamic System Model(s): Nodes, Layers, Activations | | Local
RL Agent Policy Network: Nodes, Layers, Activations | | Local
RL Agent Value Network: Nodes, Layers, Activations | | Local
Discount Factor | | Local
Episode Length | | Local
Hyperparameter | Search Space (Range) |
---|---|
(Performance Monitor Module) | h |
(Performance Monitor Module) | min |
(Experience Buffer) | h |
N(Exogenous Variable Predictor and Dynamic System Model) | h |
Hyperparameter | Search Space (Range) |
---|---|
Dynamic System Model Weights (, , ) | |
Dynamic System Model Activation (, , ) | |
Exogenous Variable Model Weights | |
Exogenous Variable Model Activation | |
Actor() Critic () Network weights | |
Actor() Critic () Network Activation | |
Episode Length | days
Non-Stationarity | K (h) | N (h)
---|---|---
Weather | 18 | 72
Zone Set Point | 6 | 72
Thermal Load | 6 | 72
Combined | 12 | 48
Non-Stationarity | Values (h)
---|---
Weather | 6, 2
Zone Setpoint | 3, 2
Thermal Load | 3, 2
Combined | 6, 3
Share and Cite
Naug, A.; Quinones-Grueiro, M.; Biswas, G. An End-to-End Relearning Framework for Building Energy Optimization. Energies 2025, 18, 1408. https://doi.org/10.3390/en18061408