A Physics-Informed Reinforcement Learning Framework for HVAC Optimization: Thermodynamically-Constrained Deep Deterministic Policy Gradients with Simulation-Based Validation
by Sattar Hedayat 1,*, Tina Ziarati 1 and Matteo Manganelli 1,2,*
1 Faculty of Civil and Industrial Engineering, Sapienza University of Rome, 00184 Rome, Italy
2 Nuclear Department, ENEA, 40121 Bologna, Italy
* Authors to whom correspondence should be addressed.
Energies 2025, 18(23), 6310; https://doi.org/10.3390/en18236310 (registering DOI)
Submission received: 31 October 2025 / Revised: 16 November 2025 / Accepted: 25 November 2025 / Published: 30 November 2025
Abstract
This paper presents a physics-informed reinforcement learning framework that embeds thermodynamic constraints directly into the policy network of a continuous control agent for HVAC optimization. We introduce a Thermodynamically-Constrained Deep Deterministic Policy Gradient (TC-DDPG) algorithm that operates on continuous actions and enforces physical feasibility through a differentiable constraint layer coupled with physics-regularized loss functions. In a simulation-based evaluation using a custom Python multi-zone resistance-capacitance (RC) thermal model, the proposed method achieves a 34.7% reduction in annual HVAC electricity consumption relative to a rule-based baseline (95% CI: 31.2–38.1%, n = 50 runs) and outperforms standard DDPG by 16.1 percentage points. During occupied hours, the predicted mean vote (PMV) stays within [−0.5, 0.5] for 98.3% of operational time. Peak demand decreases by 35.8%, and the simulated coefficient of performance (COP) improves from 2.87 ± 0.08 to 4.12 ± 0.10. Physics constraint violations are reduced by approximately 98.6% compared to unconstrained DDPG, demonstrating the effectiveness of architectural enforcement mechanisms within the simulation environment. We present a reference prototype and commit to a future public release of the code, configurations, and hyperparameters sufficient to reproduce the reported results. The paper explicitly addresses the limitations of simulation-based studies and presents a staged roadmap toward hardware-in-the-loop testing and pilot deployments in real buildings.
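To make the two ingredients named in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a single-zone first-order RC model (the paper uses a multi-zone model) and uses a tanh squashing function as one common way to keep a continuous action inside physical bounds while remaining differentiable. The function names `constrain_action` and `rc_step`, and all parameter values, are hypothetical.

```python
import math


def constrain_action(raw, q_min, q_max):
    """Map an unbounded raw policy output to [q_min, q_max].

    Assumption: tanh squashing stands in for the paper's differentiable
    constraint layer, whose exact form is not given in the abstract.
    """
    mid = 0.5 * (q_max + q_min)
    half = 0.5 * (q_max - q_min)
    return mid + half * math.tanh(raw)


def rc_step(t_zone, t_out, q_hvac, r, c, dt):
    """One forward-Euler step of C * dT/dt = (T_out - T) / R + Q_hvac.

    t_zone, t_out: zone and outdoor temperature (deg C)
    q_hvac: thermal power delivered to the zone (W, negative = cooling)
    r: envelope thermal resistance (K/W); c: thermal capacitance (J/K)
    dt: time step (s)
    """
    return t_zone + dt * ((t_out - t_zone) / r + q_hvac) / c


# Hypothetical values chosen only for illustration.
q = constrain_action(raw=-0.5, q_min=-5000.0, q_max=5000.0)
t_next = rc_step(t_zone=24.0, t_out=30.0, q_hvac=q, r=0.005, c=1e7, dt=300.0)
print(f"bounded action: {q:.1f} W, next zone temperature: {t_next:.3f} C")
```

Because the squashing is smooth, gradients can flow from a critic through the bounded action back into the actor, which is the general property a differentiable constraint layer relies on.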