Open Access Article
A Value-Driven Multi-Agent Reinforcement Learning Framework for Decentralized Adaptive Energy Management in Prosumer Smart Grids
by Otilia Elena Dragomir and Florin Dragomir *
Automation, Computer Science and Electrical Engineering Department, Valahia University of Târgoviște, 13 Aleea Sinaia Street, 130004 Târgoviște, Romania
* Author to whom correspondence should be addressed.
Buildings 2026, 16(10), 1974; https://doi.org/10.3390/buildings16101974 (registering DOI)
Submission received: 24 April 2026 / Revised: 10 May 2026 / Accepted: 14 May 2026 / Published: 16 May 2026
Abstract
Prosumer communities, aggregations of residential and commercial entities equipped with distributed energy resources (DER), including photovoltaic systems, battery storage, and flexible loads, are emerging as critical organizational units in decarbonising smart grid architectures. Managing these communities effectively requires balancing economic efficiency with equity, autonomy, and environmental sustainability, objectives that conventional centralized control methods and existing multi-agent reinforcement learning (MARL) implementations fail to address simultaneously. This article proposes a value-aligned hierarchical multi-agent reinforcement learning (VA-HMARL) framework as a formally unified architecture that embeds equity (Jain’s Fairness Index J ≥ 0.90), individual autonomy, and carbon sustainability as hard constraints within the MARL reward structure. The framework integrates: a multi-objective Value Alignment Module (VAM) combining economic, fairness, sustainability, and comfort objectives; attention-based implicit coordination for scalable agent interaction; and differentially private federated policy aggregation (ε = 1.0, δ = 10⁻⁵) for GDPR-compliant collaborative learning. Simulation on a 20-prosumer community modelled on the IEEE 33-bus feeder over 10 Monte Carlo runs (300 episodes each) demonstrates: a 6.2% energy cost reduction versus the Rule-Based baseline (p = 0.0004); a Jain’s Fairness Index of 0.912 ± 0.031 at policy convergence (final 50 episodes), satisfying the J ≥ 0.90 community equity floor; and an 18.0% reduction in CO₂ emissions. The economic efficiency trade-off relative to performance-optimized MARL baselines is limited to 2.4%, within the 5% design target. These results establish VA-HMARL as a technically feasible and ethically grounded paradigm for autonomous decentralized energy governance.
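For readers unfamiliar with the equity metric, the following is a minimal sketch of the standard Jain's Fairness Index, J = (Σxᵢ)² / (n · Σxᵢ²), together with a check against the J ≥ 0.90 floor quoted above. The abstract does not say which per-prosumer quantity the index is computed over, so the `benefits` argument (for example, per-prosumer cost savings) and both function names are assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def jain_fairness_index(benefits: Sequence[float]) -> float:
    """Standard Jain's Fairness Index: J = (sum x_i)^2 / (n * sum x_i^2).

    `benefits` holds one non-negative value per prosumer. Which quantity
    the article uses here is an assumption; the abstract does not say.
    J ranges from 1/n (one agent receives everything) to 1 (equal shares).
    """
    n = len(benefits)
    sum_sq = sum(x * x for x in benefits)
    if n == 0 or sum_sq == 0:
        # The ratio is undefined for an empty or all-zero allocation.
        raise ValueError("index is undefined for empty or all-zero input")
    total = sum(benefits)
    return (total * total) / (n * sum_sq)


def meets_equity_floor(benefits: Sequence[float], floor: float = 0.90) -> bool:
    """Hypothetical helper: does the allocation satisfy the J >= floor target?"""
    return jain_fairness_index(benefits) >= floor


if __name__ == "__main__":
    equal_shares = [5.0] * 20            # 20 prosumers, identical benefit
    one_takes_all = [1.0, 0.0, 0.0, 0.0]  # worst case for n = 4
    print(jain_fairness_index(equal_shares))   # 1.0
    print(jain_fairness_index(one_takes_all))  # 0.25, i.e. 1/n
    print(meets_equity_floor(one_takes_all))   # False
```

The index is scale-invariant, so it measures how evenly a benefit is distributed regardless of its absolute size.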