A Value-Driven Multi-Agent Reinforcement Learning Framework for Decentralized Adaptive Energy Management in Prosumer Smart Grids

Dragomir, Otilia Elena; Dragomir, Florin

doi:10.3390/buildings16101974

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article

by Otilia Elena Dragomir and Florin Dragomir *

Automation, Computer Science and Electrical Engineering Department, Valahia University of Târgoviște, 13 Aleea Sinaia Street, 130004 Târgoviște, Romania

* Author to whom correspondence should be addressed.

Buildings 2026, 16(10), 1974; https://doi.org/10.3390/buildings16101974 (registering DOI)

Submission received: 24 April 2026 / Revised: 10 May 2026 / Accepted: 14 May 2026 / Published: 16 May 2026

(This article belongs to the Special Issue AI-Driven Distributed Optimization for Building Energy Management)

Abstract

Prosumer communities, aggregations of residential and commercial entities equipped with distributed energy resources (DER), including photovoltaic systems, battery storage, and flexible loads, are emerging as critical organizational units in decarbonising smart grid architectures. Managing these communities effectively requires balancing economic efficiency with equity, autonomy, and environmental sustainability, objectives that conventional centralized control methods and existing multi-agent reinforcement learning (MARL) implementations fail to address simultaneously. This article proposes a value-aligned hierarchical multi-agent reinforcement learning (VA-HMARL) framework as a formally unified architecture that embeds equity (Jain’s Fairness Index J ≥ 0.90), individual autonomy, and carbon sustainability as hard constraints within the MARL reward structure. The framework integrates: a multi-objective Value Alignment Module (VAM) combining economic, fairness, sustainability, and comfort objectives; attention-based implicit coordination for scalable agent interaction; and differentially private federated policy aggregation (ε = 1.0, δ = 10⁻⁵) for GDPR-compliant collaborative learning. Simulation on a 20-prosumer community modelled on the IEEE 33-bus feeder over 10 Monte Carlo runs (300 episodes each) demonstrates: a 6.2% energy cost reduction versus the Rule-Based baseline (p = 0.0004); a Jain’s Fairness Index of 0.912 ± 0.031 at policy convergence (final 50 episodes), satisfying the J ≥ 0.90 community equity floor; and an 18.0% reduction in CO₂ emissions. The economic efficiency trade-off relative to performance-optimized MARL baselines is limited to 2.4%, within the 5% design target. These results establish VA-HMARL as a technically feasible and ethically grounded paradigm for autonomous decentralized energy governance.

Keywords: multi-agent reinforcement learning; prosumer communities; value alignment; decentralized energy management; federated learning; peer-to-peer energy trading; smart grid optimization
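
The abstract's equity criterion rests on Jain's Fairness Index, whose standard definition is J = (Σ x_i)² / (n · Σ x_i²), ranging from 1/n (one participant receives everything) to 1 (perfectly equal shares). The following is a minimal sketch of that computation and of a check against the J ≥ 0.90 floor. The function names, the choice of per-prosumer "benefit" as the allocated quantity, and the sample values are illustrative assumptions and are not taken from the article.

```python
from typing import Sequence

# Equity floor quoted in the abstract (J >= 0.90).
EQUITY_FLOOR = 0.90


def jain_fairness_index(allocations: Sequence[float]) -> float:
    """Return Jain's Fairness Index: (sum x)^2 / (n * sum x^2).

    The result lies in [1/n, 1] for non-negative allocations.
    """
    n = len(allocations)
    if n == 0:
        raise ValueError("need at least one allocation")
    total = sum(allocations)
    sum_sq = sum(x * x for x in allocations)
    if sum_sq == 0:
        # Assumption: an all-zero allocation is treated as perfectly equal.
        return 1.0
    return (total * total) / (n * sum_sq)


def meets_equity_floor(allocations: Sequence[float],
                       floor: float = EQUITY_FLOOR) -> bool:
    """Check whether an allocation satisfies the fairness floor."""
    return jain_fairness_index(allocations) >= floor


if __name__ == "__main__":
    # Hypothetical per-prosumer benefits for a 20-member community.
    equal_shares = [5.0] * 20
    skewed_shares = [20.0] + [1.0] * 19
    print(jain_fairness_index(equal_shares), meets_equity_floor(equal_shares))
    print(jain_fairness_index(skewed_shares), meets_equity_floor(skewed_shares))
```

How the framework turns this index into a constraint inside the reward structure is not specified in the abstract; the sketch only shows how the index itself is evaluated for a given allocation.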
