Reinforcement Learning for Energy Community Management: A European-Scale Study
Abstract
1. Introduction
1.1. Hypothesis of This Study
1.2. Enhancing Novelty: Methodological Advancements and Unexplored Territories
1.3. Regional Variations in Energy Dynamics: Insights from Italian Regions and Beyond
2. Related Works
Paper’s Contributions
3. Methods
3.1. Problem Formulation
3.2. Optimal Control Policy
3.3. Reinforcement Learning Approach
3.4. Actor–Critic Architecture
3.5. Optimization Procedure
- Exploitation of optimal control actions: During the training phase, we have access to the optimal actions computed by the MILP algorithm outlined in Section 3.2, and we leverage this information to enhance the agent’s training. By feeding the optimal actions as input to the Value-DNN, we aim to improve the critic’s ability to evaluate the actions taken by the actor (see the first sketch after this list).
- Reward penalties for constraint violations: At each time step, the Policy-DNN of the agent generates $U$ actions, one for each entity. Since the training objective is to maximize social welfare, the social welfare at that time step serves as the reward signal for the actor’s actions. The Policy-DNN outputs actions in the range $[-1, 1]$; each action is then scaled by the rated power of the BESS of the corresponding entity to obtain the actual power for charging/discharging the storage system. However, the actor’s actions may violate feasibility constraints. In such cases, non-compliant actions are replaced with physically feasible ones, and a penalty $p_{u,t}$ is computed for each violated action using (10). The resulting total penalty is the average of the individual penalties, $\bar{p}_t = \frac{1}{U} \sum_{u=1}^{U} p_{u,t}$. This overall penalty is subtracted from the reward, yielding the reward signal used to train the agent, $r_t = \mathrm{SW}_t - \bar{p}_t$, where $\mathrm{SW}_t$ denotes the social welfare at time $t$ (a sketch of this shaping step follows the list).
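To make the first mechanism concrete, the following is a minimal sketch of a critic that is additionally fitted on the MILP-optimal actions, assuming a PyTorch implementation. The ValueDNN layout, the replay fields (state, action, target, optimal_action, optimal_target), and the 0.5 auxiliary-loss weight are illustrative assumptions, not the paper’s exact architecture.

```python
import torch
import torch.nn as nn

class ValueDNN(nn.Module):
    """Critic: maps a (state, joint action) pair to a scalar value estimate."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def critic_loss(critic: ValueDNN, batch: dict) -> torch.Tensor:
    """Fit the critic on the actor's transitions and, as extra supervision,
    on the MILP-optimal actions stored for the same states."""
    q_actor = critic(batch["state"], batch["action"])
    loss = nn.functional.mse_loss(q_actor, batch["target"])
    # Optimal actions (and their returns) are known at training time,
    # so the critic can also be trained on (state, optimal_action) pairs.
    q_opt = critic(batch["state"], batch["optimal_action"])
    return loss + 0.5 * nn.functional.mse_loss(q_opt, batch["optimal_target"])
```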
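And a sketch of the second mechanism (action scaling, feasibility repair, and penalty averaging) in plain NumPy. Here penalty_fn stands in for Eq. (10), whose exact form is not reproduced in this excerpt; l1_violation below is only one plausible choice, and all names are hypothetical.

```python
import numpy as np

def shaped_reward(raw_actions, rated_power, feasible_bounds, social_welfare,
                  penalty_fn):
    """Scale actor outputs to powers, repair infeasible actions, and
    return the penalized reward described in Section 3.5.

    raw_actions     : array of U actor outputs in [-1, 1]
    rated_power     : array of U BESS rated powers (kW)
    feasible_bounds : (lo, hi) arrays of physically feasible powers (kW)
    penalty_fn      : per-action penalty, standing in for Eq. (10)
    """
    power = raw_actions * rated_power        # action -> charge/discharge power
    lo, hi = feasible_bounds
    feasible = np.clip(power, lo, hi)        # replace infeasible actions
    penalties = penalty_fn(power, feasible)  # one penalty per entity
    total_penalty = penalties.mean()         # average over the U entities
    return social_welfare - total_penalty, feasible

def l1_violation(requested, feasible):
    """A plausible stand-in for Eq. (10): penalize the size of the repair."""
    return np.abs(requested - feasible)
```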
3.6. Simulation Environment
3.7. Reinforcement Learning Logic Concept
4. Results
4.1. Dataset Selection and Rationale
Production across Six Zones
Energy Purchase and Sale Prices in Four Zones
4.2. Training Details
4.3. Evaluation Scenarios, Baselines, and Metrics
1. FRAN: France, Paris;
2. SVIZ: Switzerland, Berne;
3. SLOV: Slovenia, Ljubljana;
4. GREC: Greece, Athens;
5. NORD: northern Italy;
6. CNOR: central-northern Italy;
7. CSUD: central-southern Italy;
8. SUD: southern Italy;
9. CALA: Calabria region, Italy;
10. SICI: Sicily island, Italy;
11. SARD: Sardinia island, Italy.
- Optimal Controller (OC), as detailed in Section 3.2. The optimal scheduling of the BESSs for each day was determined by a mixed-integer linear programming (MILP) algorithm. This approach assumes complete knowledge of generation and consumption data over all 24 h, yielding the BESS control actions that maximize daily community welfare (a simplified sketch is given after Algorithm 1).
- Rule-Based Controller (RBC). The BESSs’ actions were determined by the predefined rules of Algorithm 1, an approach commonly employed to schedule the charge and discharge policies of storage systems. For each entity, the RBC charged the BESS with surplus energy as long as the battery had not reached maximum capacity; conversely, when less energy was produced than required, the loads were supplied from the BESS, if energy was available (a sketch of this rule also follows Algorithm 1).
Algorithm 1: Rule-based controller action selection
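For reference, the following is a rough sketch of the OC baseline’s daily scheduling problem using cvxpy. It keeps only the linear core (the paper’s MILP also contains integer variables, e.g. to rule out simultaneous charging and discharging), assumes an empty battery at the start of the day, and models the community self-consumption incentive as the minimum of aggregate injections and withdrawals, following the nomenclature below; all function and argument names are assumptions.

```python
import cvxpy as cp
import numpy as np

def daily_optimal_schedule(gen, dem, c_imp, c_exp, c_sto, incentive,
                           cap, p_rated, eta_c, eta_d, dt=1.0):
    """Welfare-maximizing daily BESS schedule for a U-entity community.

    gen, dem, c_imp, c_exp, c_sto : (U, T) arrays (kWh and EUR/kWh)
    incentive                     : (T,) nonnegative incentive (EUR/kWh)
    cap, p_rated, eta_c, eta_d    : (U,) battery parameters
    """
    U, T = gen.shape
    e_ch = cp.Variable((U, T), nonneg=True)     # energy supplied to battery
    e_dis = cp.Variable((U, T), nonneg=True)    # energy withdrawn from battery
    imp = cp.Variable((U, T), nonneg=True)      # energy imported from grid
    exp = cp.Variable((U, T), nonneg=True)      # energy exported to grid
    soc = cp.Variable((U, T + 1), nonneg=True)  # battery energy level

    eta_c2 = np.tile(eta_c[:, None], (1, T))
    eta_d2 = np.tile(eta_d[:, None], (1, T))
    e_max = np.tile((p_rated * dt)[:, None], (1, T))

    cons = [
        soc[:, 0] == 0,  # assumption: battery empty at the start of the day
        soc[:, 1:] == soc[:, :-1] + cp.multiply(eta_c2, e_ch)
                      - cp.multiply(1.0 / eta_d2, e_dis),
        soc[:, 1:] <= np.tile(cap[:, None], (1, T)),
        e_ch <= e_max, e_dis <= e_max,
        gen + e_dis + imp == dem + e_ch + exp,  # per-entity energy balance
    ]

    # Shared energy: min of community-wide exports and imports per time step.
    shared = cp.minimum(cp.sum(exp, axis=0), cp.sum(imp, axis=0))
    welfare = (cp.sum(cp.multiply(c_exp, exp)) - cp.sum(cp.multiply(c_imp, imp))
               - cp.sum(cp.multiply(c_sto, e_ch + e_dis)) + incentive @ shared)
    prob = cp.Problem(cp.Maximize(welfare), cons)
    prob.solve()
    return e_ch.value, e_dis.value, prob.value
```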
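And a minimal sketch of the per-entity rule described by Algorithm 1; the signature is hypothetical and charging/discharging efficiencies are omitted for brevity.

```python
def rbc_action(gen_t, dem_t, soc_t, cap, p_rated, dt=1.0):
    """Rule-based control for one entity at one time step: charge the BESS
    with any surplus while capacity remains, discharge to cover any deficit
    while stored energy remains. Returns the battery energy exchange (kWh),
    positive for charging and negative for discharging."""
    surplus = gen_t - dem_t
    if surplus >= 0:
        # Charge with the surplus, limited by rated power and free capacity.
        return min(surplus, p_rated * dt, cap - soc_t)
    # Discharge to supply the deficit, limited by rated power and SOC.
    return -min(-surplus, p_rated * dt, soc_t)
```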
4.4. Results Discussion
5. Conclusions
Implications and Limitations of This Study
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| REC | Renewable Energy Community |
| BESS | Battery Energy Storage System |
| PV | Photovoltaic |
| RL | Reinforcement Learning |
| DRL | Deep Reinforcement Learning |
| DNN | Deep Neural Network |
| SAC | Soft Actor–Critic |
| RBC | Rule-Based Controller |
| OC | Optimal Control |
| MILP | Mixed-Integer Linear Programming |
Constants and sets

| Symbol | Description |
|---|---|
| $U$ | number of entities forming the community |
| $T$ | number of time periods per day |
|  | duration of a time period (h) |
| $S$ | set of states |
| $A$ | set of actions |

Variables

| Symbol | Description |
|---|---|
|  | energy imported from the grid by entity $u$ at time $t$ (kWh) |
|  | energy exported to the grid by entity $u$ at time $t$ (kWh) |
|  | energy level of the battery of entity $u$ at time $t$ (kWh) |
|  | energy supplied to the battery of entity $u$ at time $t$ (kWh) |
|  | energy withdrawn from the battery of entity $u$ at time $t$ (kWh) |

Parameters

| Symbol | Description |
|---|---|
|  | maximum capacity of the battery of entity $u$ (kWh) |
|  | energy generated by the PV plant of entity $u$ at time $t$ (kWh) |
|  | energy demand of entity $u$ at time $t$ (kWh) |
|  | rated power of the battery of entity $u$ (kW) |
|  | discharging efficiency of the battery of entity $u$ |
|  | charging efficiency of the battery of entity $u$ |
|  | unit price of energy exported to the grid by entity $u$ at time $t$ (EUR/kWh) |
|  | unit price of energy imported from the grid by entity $u$ at time $t$ (EUR/kWh) |
|  | unit cost for usage of the energy storage of entity $u$ at time $t$ (EUR/kWh) |
|  | unit incentive for community self-consumption at time $t$ (EUR/kWh) |
References
| Feature scope | Features |
|---|---|
| Global | month, day type, hour |
| Individual |  |
| Community | Train | Test1 | Test2 |
|---|---|---|---|
| 3 entities | SUD | SICI | GREC |
| 5 entities | CNOR | SLOV | SARD |
| 7 entities | CALA | SVIZ | CNOR |
| 9 entities | FRAN | CSUD | NORD |
| Entity ID | PV (3 ent.) | BESS (3 ent.) | PV (5 ent.) | BESS (5 ent.) | PV (7 ent.) | BESS (7 ent.) | PV (9 ent.) | BESS (9 ent.) |
|---|---|---|---|---|---|---|---|---|
| 1 | - | - | - | - | - | - | 120 | 140 |
| 2 | - | - | - | - | - | - | 70 | 80 |
| 3 | - | - | - | - | 30 | 60 | 50 | 45 |
| 4 | - | - | - | - | 60 | 70 | 40 | 75 |
| 5 | - | - | 25 | 50 | 50 | 50 | 25 | 50 |
| 6 | - | - | 20 | 30 | 10 | 30 | 20 | 30 |
| 7 | 35 | 20 | 20 | 40 | 35 | 50 | 25 | 35 |
| 8 | 20 | 35 | 30 | 40 | 40 | 50 | 40 | 50 |
| 9 | 25 | 40 | 20 | 35 | 40 | 50 | 30 | 35 |
| Community | Controller | Train | Test1 | Test2 |
|---|---|---|---|---|
| 3 entities |  | SUD | SICI | GREC |
|  | RL | 99.55% | 98.72% | 91.91% |
|  | RBC | 97.32% | 96.04% | 82.23% |
| 5 entities |  | CNOR | SLOV | SARD |
|  | RL | 99.57% | 96.17% | 98.94% |
|  | RBC | 95.38% | 93.28% | 96.75% |
| 7 entities |  | CALA | SVIZ | CNOR |
|  | RL | 97.70% | 95.77% | 97.61% |
|  | RBC | 64.49% | 94.48% | 94.79% |
| 9 entities |  | FRAN | CSUD | NORD |
|  | RL | 97.95% | 95.87% | 95.74% |
|  | RBC | 56.91% | 94.91% | 93.58% |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).