A Low-Carbon Economic Scheduling Strategy for Multi-Microgrids with Communication Mechanism-Enabled Multi-Agent Deep Reinforcement Learning
Abstract
1. Introduction
- (1) To realistically reflect the carbon emissions of a generating unit over long-term operation, a dynamic carbon emission calculation model is constructed.
- (2) A new low-carbon optimal scheduling model for MMG systems, based on communication mechanism-enabled MADRL, is proposed to efficiently integrate the local observation information of the MGs, where each MG acts as a separate agent. The optimal scheduling decision is obtained through local observations and an information game with the other agents.
2. Theoretical Studies on Carbon Emission Flows
2.1. Overview of the Carbon Emissions Theory
2.2. Definitions Related to Carbon Flow Theory
2.2.1. Carbon Emission Flow Rate
2.2.2. Branch Carbon Emission Intensity
2.2.3. Node Carbon Emission Intensity
2.3. Carbon Emission Flow Calculation Method
2.4. Dynamic Carbon Emission Calculation Model
2.5. Multi-Phase Carbon Cost Calculation Model
3. Multi-Microgrid Optimal Scheduling
3.1. Optimal Scheduling Model
3.1.1. MMG Scheduling Objective Function
3.1.2. Constraints
3.2. MADRL-Based Optimal Scheduling Approach
3.2.1. Markov Game Process
- Agent: Each microgrid acts as an agent;
- Environment: All the information of the microgrid system;
- State: The state information required by each agent g at each time t is organized as follows:
- Action: The agent action variables for the g-th microgrid are as follows:
- Reward: The optimization objective of the proposed control approach is to maximize the reward value. The reward for each agent is calculated as shown below:
- State transition function: The transition function yields the next state from the currently observed state, the joint actions, and the environment's random variables.
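The Markov game elements listed above can be sketched as a minimal environment interface. This is an illustrative placeholder only: the class name `MMGMarkovGame`, the state dimension, and the toy dynamics and reward are assumptions, not the paper's actual MMG model.

```python
import random
from dataclasses import dataclass, field

@dataclass
class MMGMarkovGame:
    """Illustrative sketch of the Markov game elements above.

    All names and dynamics here are placeholders, not the paper's
    actual environment model."""
    n_agents: int = 3  # one agent per microgrid
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def reset(self):
        # State: a local observation vector for each agent g
        # (placeholder values standing in for loads, prices, etc.).
        return [[self.rng.random() for _ in range(4)]
                for _ in range(self.n_agents)]

    def step(self, states, actions):
        # State transition: the next state depends on the current
        # states, the joint actions, and environment random variables
        # (modeled here as Gaussian noise).
        noise = [self.rng.gauss(0.0, 0.1) for _ in range(self.n_agents)]
        next_states = [[x + a + w for x in s]
                       for s, a, w in zip(states, actions, noise)]
        # Reward: each agent maximizes its reward (placeholder:
        # negative squared action as a stand-in cost term).
        rewards = [-a * a for a in actions]
        return next_states, rewards
```

A training loop would alternate `step` calls with policy updates; the actual reward in the paper combines operating cost and the carbon cost terms of Section 2.5.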
3.2.2. Proposed Approach Based on MADRL
3.2.3. Critic Network
3.2.4. Actor Network
4. Case Study
4.1. Dataset
4.2. Baselines
4.2.1. Benchmark Approaches
- (1) Particle swarm optimization (PSO) [42]: Each particle continually adjusts its velocity and position according to its own historical best position and the swarm's global best position, moving toward better solutions until the swarm converges to an approximately optimal solution.
- (2) TD3: To enhance the performance and stability of DRL algorithms in continuous action spaces, TD3 employs two critic networks to mitigate value-estimation bias and adopts a delayed-update strategy to stabilize training.
- (3) Multi-agent TD3 (MATD3) [43]: An extension of single-agent TD3 whose core objective is to address the training instability caused by environmental non-stationarity and policy-coordination difficulties in multi-agent environments.
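The velocity-and-position update described for the PSO baseline can be sketched as follows. This is a generic textbook PSO, not necessarily the exact variant of [42]; the inertia weight `w` and acceleration coefficients `c1`, `c2` are assumed typical values.

```python
import random

def pso(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO sketch: each particle moves toward its personal best
    and the swarm's global best (minimization)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:               # improve personal best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:              # improve global best
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For the scheduling baseline, `f` would encode the day-ahead dispatch cost over the decision variables; here it can be tested on a simple quadratic.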
4.2.2. Performance in Training
4.2.3. Performance in Testing
4.3. Performance on the Test Set over a Day
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhao, P.F.; Hu, W.H.; Cao, D.; Huang, R.; Wu, X.W.; Huang, Q.; Chen, Z. Causal Mechanism-Enabled Zero-Label Learning for Power Generation Forecasting of Newly-Built PV Sites. IEEE Trans. Sustain. Energy 2025, 16, 392–406. [Google Scholar] [CrossRef]
- Yu, G.Z.; Zhang, Z.L.; Cui, G.A.; Dong, Q.; Wang, S.Y.; Li, X.C.; Shen, L.X.; Yan, H.Z. Low-carbon economic dispatching strategy based on feasible region of cooperative interaction between wind-storage system and carbon capture power plant. Renew. Energy 2024, 228, 120706. [Google Scholar] [CrossRef]
- Zhan, J.Y.; Wang, C.; Wang, H.H.; Zhang, F.; Li, Z.H. Pathways to achieve carbon emission peak and carbon neutrality by 2060: A case study in the Beijing-Tianjin-Hebei region, China. Renew. Sustain. Energy Rev. 2024, 189, 113955. [Google Scholar] [CrossRef]
- Cao, D.; Zhao, J.B.; Hu, W.H.; Huang, Q.; Chen, Z.; Blaabjerg, F. Data-driven multi-agent deep reinforcement learning for distribution system decentralized voltage control with high penetration of PVs. IEEE Trans. Smart Grid 2021, 12, 4137–4150. [Google Scholar] [CrossRef]
- Cao, D.; Hu, W.H.; Zhao, J.B.; Huang, Q.; Chen, Z.; Blaabjerg, F. A multi-agent deep reinforcement learning based voltage regulation using coordinated PV inverters. IEEE Trans. Power Syst. 2020, 35, 4120–4123. [Google Scholar] [CrossRef]
- IEA. Global Energy Review 2025 Dataset; IEA: Paris, France, 2025; Available online: https://www.iea.org/data-and-statistics/data-product/global-energy-review-2025-dataset (accessed on 25 March 2025).
- Zou, H.B.; Tao, J.; Elsayed, S.K.; Elattar, E.E.; Almalaq, A.; Mohamed, M.A. Stochastic multi-carrier energy management in the smart islands using reinforcement learning and un-scented transform. Int. J. Electr. Power Energy Syst. 2021, 130, 106988. [Google Scholar] [CrossRef]
- Dong, W.; Chen, C.F.; Fang, X.L.; Zhang, F.; Yang, Q. Enhanced integrated energy system planning through unified model coupling multiple energy and carbon emission flows. Energy 2024, 307, 132799. [Google Scholar] [CrossRef]
- Zhao, P.F.; Cao, D.; Hu, W.H.; Huang, Y.H.; Hao, M.; Huang, Q.; Chen, Z. Geometric Loss-Enabled Complex Neural Network for Multi-Energy Load Forecasting in Integrated Energy Systems. IEEE Trans. Power Syst. 2024, 39, 5659–5671. [Google Scholar] [CrossRef]
- Feng, Z.H.; Zhang, J.; Lu, J.; Zhang, Z.D.; Bai, W.W.; Ma, L.; Lu, H.N.; Lin, J. Low-Carbon Economic Dispatch Strategy for Integrated Energy Systems under Uncertainty Counting CCS-P2G and Concentrating Solar Power Stations. Energy Eng. 2025, 122, 1531–1560. [Google Scholar] [CrossRef]
- Wang, R.T.; Wen, X.Y.; Wang, X.Y.; Fu, Y.B.; Zhang, Y. Low carbon optimal operation of integrated energy system based on carbon capture technology, LCA carbon emissions and ladder-type carbon trading. Appl. Energy 2022, 311, 118664. [Google Scholar] [CrossRef]
- Gao, C.; Niu, K.; Chen, W.J.; Wang, C.W.; Chen, Y.B.; Qu, R. Novel Low-Carbon Optimal Operation Method for Flexible Distribution Network Based on Carbon Emission Flow. Energy Eng. 2025, 122, 785–803. [Google Scholar] [CrossRef]
- O’Shaughnessy, E.; Cruce, J.R.; Xu, K.F. Too much of a good thing? Global trends in the curtailment of solar PV. Sol. Energy 2020, 208, 1068–1077. [Google Scholar] [CrossRef] [PubMed]
- Yan, N.; Zhao, Z.J.; Li, X.J.; Yang, J.L. Multi-time scales low-carbon economic dispatch of integrated energy system considering hydrogen and electricity complementary energy storage. J. Energy Storage 2024, 104, 114514. [Google Scholar] [CrossRef]
- Hu, J.; Qin, K.; Ma, R.; Liu, W.X.; Zhang, J.Y.; Pang, L.M.; Zhang, J.W. A study on carbon emission flow tracking for new type power systems. Int. J. Electr. Power Energy Syst. 2025, 165, 110455. [Google Scholar] [CrossRef]
- Li, S.C.; Cao, D.; Hu, W.H.; Huang, Q.; Chen, Z.; Blaabjerg, F. Multi-energy management of interconnected multi-microgrid system using multi-agent deep reinforcement learning. J. Mod. Power Syst. Clean Energy 2023, 11, 1606–1617. [Google Scholar] [CrossRef]
- Yu, B.Y.; Fu, J.H.; Dai, Y. Multi-agent simulation of policies driving CCS technology in the cement industry. Energy Policy 2025, 199, 114527. [Google Scholar] [CrossRef]
- Zhao, B.; Wang, X.J.; Lin, D.; Calvin, M.M.; Morgan, J.C.; Qin, R.W.; Wang, C.S. Energy management of multiple microgrids based on a system of systems architecture. IEEE Trans. Power Syst. 2018, 33, 6410–6421. [Google Scholar] [CrossRef]
- Zhou, B.; Zou, J.T.; Chung, C.Y.; Wang, H.Z.; Liu, N.A.; Voropai, N.; Xu, D.S. Multi-microgrid energy management systems: Architecture, communication, and scheduling strategies. J. Mod. Power Syst. Clean Energy 2021, 9, 463–476. [Google Scholar] [CrossRef]
- Li, S.C.; Hu, W.H.; Cao, D.; Hu, J.X.; Chen, Z.; Blaabjerg, F. Coordinated operation of multiple microgrids with heat–electricity energy based on graph surrogate model-enabled robust multiagent deep reinforcement learning. IEEE Trans. Ind. Inform. 2025, 21, 248–257. [Google Scholar] [CrossRef]
- Wang, K.; Xue, Z.H.; Cao, D.; Liu, Y.; Fang, Y.P. Two-Stage Stochastic Resilience Optimization of Converter Stations Under Uncertain Mainshock-Aftershock Sequences. IEEE Trans. Power Syst. 2025, 1–13. [Google Scholar] [CrossRef]
- Zou, C.J.; Wang, K.; Xiahou, T.F.; Cao, D.; Liu, Y. Two-Stage Distributionally Robust Optimization for Infrastructure Resilience Enhancement: A Case Study of 220 kV Power Substations Under Earthquake Disasters. IEEE Trans. Reliab. 2024, 1–15. [Google Scholar] [CrossRef]
- Yi, Y.Q.; Xu, J.Z.; Zhang, W.M. A low-carbon driven price approach for energy transactions of multi-microgrids based on non-cooperative game model considering uncertainties. Sustain. Energy Grids Netw. 2024, 40, 101570. [Google Scholar] [CrossRef]
- Wang, Y.; Li, J.X.; Qu, D.Q.; Wang, X. Low-carbon economic operation strategy for a multi-microgrid system considering internal carbon pricing and emission monitoring. J. Process Control 2024, 143, 103313. [Google Scholar] [CrossRef]
- Cao, Z.H.; Li, Z.S.; Yang, C. Credible joint chance-constrained low-carbon energy Management for Multi-energy Microgrids. Appl. Energy 2025, 377, 124390. [Google Scholar] [CrossRef]
- Chen, H.P.; Yang, S.S.; Chen, J.D.; Wang, X.Y.; Li, Y.; Shui, S.Y.; Yu, H. Low-carbon environment-friendly economic optimal scheduling of multi-energy microgrid with integrated demand response considering waste heat utilization. J. Clean. Prod. 2024, 450, 141415. [Google Scholar] [CrossRef]
- Cao, D.; Zhao, J.B.; Hu, J.X.; Pei, Y.S.; Huang, Q.; Chen, Z.; Hu, W.H. Physics-informed graphical representation-enabled deep reinforcement learning for robust distribution system voltage control. IEEE Trans. Smart Grid 2024, 15, 233–246. [Google Scholar] [CrossRef]
- Cao, D.; Hu, W.H.; Zhao, J.B.; Zhang, G.Z.; Zhang, B.; Liu, Z.; Chen, Z.; Blaabjerg, F. Reinforcement Learning and Its Applications in Modern Power and Energy Systems: A Review. J. Mod. Power Syst. Clean Energy 2020, 8, 1029–1042. [Google Scholar] [CrossRef]
- Cao, D.; Hu, W.H.; Xu, X.; Wu, Q.W.; Huang, Q.; Chen, Z.; Blaabjerg, F. Deep reinforcement learning based approach for optimal power flow of distribution networks embedded with renewable energy and storage devices. J. Mod. Power Syst. Clean Energy 2021, 9, 1101–1110. [Google Scholar] [CrossRef]
- Li, S.C.; Hu, W.H.; Cao, D.; Hu, J.X.; Chen, Z.; Blaabjerg, F. A novel MADRL with spatial-temporal pattern capturing ability for robust decentralized control of multiple microgrids under anomalous measurements. IEEE Trans. Sustain. Energy 2024, 5, 1872–1884. [Google Scholar] [CrossRef]
- Ye, T.; Haung, Y.P.; Yang, W.J.; Cai, G.T.; Yang, Y.Y.; Pan, F. Safe multi-agent deep reinforcement learning for decentralized low-carbon operation in active distribution networks and multi-microgrids. Appl. Energy 2025, 387, 125609. [Google Scholar] [CrossRef]
- Xu, X.S.; Xu, K.; Zeng, Z.Y.; Tang, J.L.; He, Y.X.; Shi, G.Z.; Zhang, T. Collaborative optimization of multi-energy multi-microgrid system: A hierarchical trust-region multi-agent reinforcement learning approach. Appl. Energy 2024, 375, 123923. [Google Scholar] [CrossRef]
- Yang, T.; Xu, Z.M.; Ji, S.J.; Liu, G.L.; Li, X.H.; Kong, H.B. Cooperative optimal dispatch of multi-microgrids for low carbon economy based on personalized federated reinforcement learning. Appl. Energy 2025, 378, 124641. [Google Scholar] [CrossRef]
- Wang, C.; Wang, M.C.; Wang, A.Q.; Zhang, X.J.; Zhang, J.H.; Ma, H.; Yang, N.; Zhao, Z.L.; Lai, C.S.; Lai, L.L. Multiagent deep reinforcement learning-based cooperative optimal operation with strong scalability for residential microgrid clusters. Energy 2025, 314, 134165. [Google Scholar] [CrossRef]
- Yang, Y.; Takase, T. Spatial characteristics of carbon dioxide emission intensity of urban road traffic and driving factors: Road network and land use. Sustain. Cities Soc. 2024, 113, 105700. [Google Scholar] [CrossRef]
- Yang, X.H.; Li, L.X. A joint sharing-sharing platform for coordinating supply and demand resources at distributed level: Coupling electricity and carbon flows under bounded rationality. Appl. Energy 2025, 393, 126051. [Google Scholar] [CrossRef]
- Gao, X.; Zhang, J.H.; Chang, J.; Wang, S.Q.; Su, Z.A.; Mu, Z.Y. Employing battery energy storage systems for flexible ramping products in a fully renewable energy power grid: A market mechanism and strategy analysis through multi-Agent Markov games. Energy Rep. 2024, 12, 5066–5082. [Google Scholar] [CrossRef]
- Pang, X.F.; Fang, X.; Yu, P.; Zheng, Z.D.; Li, H.B. Optimal scheduling method for electric vehicle charging and discharging via Q-learning-based particle swarm optimization. Energy 2025, 316, 134611. [Google Scholar] [CrossRef]
- Wang, J.H.; Du, C.Q.; Yan, F.W.; Hua, M.; Gongye, X.Y.; Yuan, Q.; Xu, H.M.; Zhou, Q. Bayesian optimization for hyper-parameter tuning of an improved twin delayed deep deterministic policy gradients based energy management strategy for plug-in hybrid electric vehicles. Appl. Energy 2025, 381, 125171. [Google Scholar] [CrossRef]
- Zhang, Z.L.; Wan, Y.N.; Qin, J.H.; Fu, W.M.; Kang, Y. A deep RL-based algorithm for coordinated charging of electric vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 18774–18784. [Google Scholar] [CrossRef]
- Sukhbaatar, S.; Fergus, R. Learning multiagent communication with backpropagation. Neural Inf. Process. Syst. 2016, 29, 2244–2252. [Google Scholar]
- Du, W.Y.; Ma, J.; Yin, W.J. Orderly charging strategy of electric vehicle based on improved PSO algorithm. Energy 2023, 271, 127088. [Google Scholar] [CrossRef]
- Wang, X.; Zhou, J.S.; Qin, B.; Guo, L.Z. Coordinated control of wind turbine and hybrid energy storage system based on multi-agent deep reinforcement learning for wind power smoothing. J. Energy Storage 2023, 57, 106297. [Google Scholar] [CrossRef]
- PJM. Historical Hourly Electricity Price from PJM. May 2017. Available online: https://www.pjm.com/ (accessed on 15 March 2025).
Line | From Node | To Node | Resistance (Ω) | Reactance (Ω) |
---|---|---|---|---|
1 | 1 | 2 | 0.0125 | 0.005 |
2 | 2 | 3 | 0.0375 | 0.015 |
3 | 2 | 4 | 0.0300 | 0.012 |
4 | 2 | 5 | 0.0225 | 0.009 |
5 | 2 | 6 | 0.0150 | 0.006 |
6 | 2 | 7 | 0.7500 | 0.300 |
7 | 2 | 12 | 0.7500 | 0.300 |
8 | 1 | 7 | 0.0125 | 0.005 |
9 | 7 | 8 | 0.0250 | 0.010 |
10 | 7 | 9 | 0.0200 | 0.008 |
11 | 7 | 10 | 0.0225 | 0.009 |
12 | 7 | 11 | 0.0275 | 0.011 |
13 | 7 | 12 | 0.7500 | 0.300 |
14 | 1 | 12 | 0.0125 | 0.005 |
15 | 12 | 13 | 0.0175 | 0.007 |
16 | 12 | 14 | 0.0200 | 0.008 |
17 | 12 | 15 | 0.0250 | 0.010 |
18 | 12 | 16 | 0.0250 | 0.010 |
Symbol | Value | Unit | Symbol | Value | Unit |
---|---|---|---|---|---|
 | 0.8 | tCO2/MW·h | | 10 | - |
 | 8 | $/tCO2 | | 0–0.05 | - |
 | 30.4 | $/MW·h | | 0.05 | - |
 | 1040 | $/(MW)²·h | | 0.5 | - |
 | 1.3 | $ | | 5 | $ |
MG | [MW] | [MW] | [MW] | [MW] | [MW] |
---|---|---|---|---|---|
MG1 | 0.20 | 0.25 | 0.05 | 0.02 | 0.050 |
MG2 | 0.15 | 0.20 | 0.05 | 0.02 | 0.045 |
MG3 | 0.10 | 0.15 | 0.05 | 0.02 | 0.040 |
MG | [MW] | [MW/h] | [MW/h] | | |
---|---|---|---|---|---|
MG1 | 0.20 | 1 | 0.05 | 0.025 | 0.025 |
MG2 | 0.18 | 1 | 0.05 | 0.025 | 0.025 |
MG3 | 0.16 | 1 | 0.05 | 0.025 | 0.025 |
Parameters | Value |
---|---|
Train epochs | 10^4 |
Batch size | 256 |
Reward discount factor | 0.95 |
Memory capacity | 10^6 |
Learning rate of actor | 10^-4 |
Learning rate of critic | 10^-4 |
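The hyperparameters above can be collected into a single training configuration. The values come from the table; the dictionary structure and key names are illustrative assumptions, not the authors' actual code.

```python
# Hyperparameters from the table above, gathered into one config.
# Key names are illustrative; the paper's implementation may differ.
TRAINING_CONFIG = {
    "train_epochs": 10**4,
    "batch_size": 256,
    "gamma": 0.95,             # reward discount factor
    "memory_capacity": 10**6,  # experience replay buffer size
    "lr_actor": 1e-4,
    "lr_critic": 1e-4,
}
```

Identical actor and critic learning rates with a large replay buffer are a common starting point for TD3-family training.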
Approach | Average Cost (USD/Day) | Carbon Emissions (tCO2/Day) |
---|---|---|
PSO | 3244.55 | 10.95 |
TD3 | 2130.45 | 9.46 |
MATD3 | 1641.67 | 8.55 |
Proposed approach | 1246.33 | 7.89 |
Share and Cite
Nie, L.; Long, B.; Yu, M.; Zhang, D.; Yang, X.; Jing, S. A Low-Carbon Economic Scheduling Strategy for Multi-Microgrids with Communication Mechanism-Enabled Multi-Agent Deep Reinforcement Learning. Electronics 2025, 14, 2251. https://doi.org/10.3390/electronics14112251