A Physics-Informed Reinforcement Learning Framework for HVAC Optimization: Thermodynamically-Constrained Deep Deterministic Policy Gradients with Simulation-Based Validation

Hedayat, Sattar; Ziarati, Tina; Manganelli, Matteo

doi:10.3390/en18236310

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

by Sattar Hedayat ¹,*, Tina Ziarati ¹ and Matteo Manganelli ¹,²,*

¹ Faculty of Civil and Industrial Engineering, Sapienza University of Rome, 00184 Rome, Italy

² Nuclear Department, ENEA, 40121 Bologna, Italy

* Authors to whom correspondence should be addressed.

Energies 2025, 18(23), 6310; https://doi.org/10.3390/en18236310 (registering DOI)

Submission received: 31 October 2025 / Revised: 16 November 2025 / Accepted: 25 November 2025 / Published: 30 November 2025

(This article belongs to the Special Issue New Insights into Hybrid Renewable Energy Systems in Buildings)

Abstract

This paper presents a physics-informed reinforcement learning framework that embeds thermodynamic constraints directly into the policy network of a continuous control agent for HVAC optimization. We introduce a Thermodynamically-Constrained Deep Deterministic Policy Gradient (TC-DDPG) algorithm that operates on continuous actions and enforces physical feasibility through a differentiable constraint layer coupled with physics-regularized loss functions. In a simulation-based evaluation using a custom Python multi-zone resistance-capacitance (RC) thermal model, the proposed method achieves a 34.7% reduction in annual HVAC electricity consumption relative to a rule-based baseline (95% CI: 31.2–38.1%, n = 50 runs) and outperforms standard DDPG by 16.1 percentage points. During occupied hours, the predicted mean vote (PMV) remains within [−0.5, 0.5] for 98.3% of operational time; peak demand decreases by 35.8%, and the simulated coefficient of performance (COP) improves from 2.87 ± 0.08 to 4.12 ± 0.10. Physics constraint violations are reduced by approximately 98.6% compared to unconstrained DDPG, demonstrating the effectiveness of architectural enforcement mechanisms within the simulation environment. We present a reference prototype and commit to a future public release of the code, configurations, and hyperparameters sufficient to reproduce the reported results. The paper explicitly addresses the limitations of simulation-based studies and presents a staged roadmap toward hardware-in-the-loop testing and pilot deployments in real buildings.

Keywords: physics-informed reinforcement learning; TC-DDPG; continuous control; HVAC optimization; thermodynamic constraints; building energy management; simulation validation
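
To make the two ingredients named in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a single-zone 1R1C model (the paper uses a multi-zone RC model) and uses tanh squashing as one simple, smooth way to keep an action inside actuator limits; the paper's actual constraint layer is not described in this section. The names `rc_step` and `constrain_action` and all parameter values are hypothetical.

```python
import math


def rc_step(t_zone, t_out, q_hvac, r=0.005, c=2.0e6, dt=300.0):
    """One forward-Euler step of a 1R1C zone model (illustrative values).

    Energy balance: C * dT/dt = (T_out - T_zone) / R + Q_hvac
    t_zone, t_out in deg C; q_hvac in W (negative = cooling);
    r in K/W; c in J/K; dt in s.
    """
    return t_zone + dt * ((t_out - t_zone) / r + q_hvac) / c


def constrain_action(raw, q_min=-5000.0, q_max=5000.0):
    """Smoothly map an unbounded policy output into [q_min, q_max].

    tanh is differentiable everywhere, so gradients can flow through
    this mapping. This is an assumed stand-in for a constraint layer,
    not the mechanism used in the paper.
    """
    mid = 0.5 * (q_max + q_min)
    half = 0.5 * (q_max - q_min)
    return mid + half * math.tanh(raw)


if __name__ == "__main__":
    q = constrain_action(-0.4)            # bounded cooling power in W
    t_next = rc_step(26.0, 32.0, q)       # zone temperature after 5 min
    print(f"q = {q:.1f} W, T_next = {t_next:.3f} C")
```

In this sketch the bound is enforced by construction rather than by a penalty, which is the general idea behind architectural enforcement: the policy cannot emit an out-of-range action regardless of its raw output.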
