Physics-Informed Multi-Agent DRL-Based Active Distribution Network Zonal Balancing Control Strategy for Security and Supply Preservation
Abstract
1. Introduction
- Compared with previous safe reinforcement learning algorithms, this paper proposes a multi-agent safe reinforcement learning algorithm based on a local physical model, which guarantees policy safety through strict physical-model constraints. The proposed method relies on no a priori knowledge, and the system's security constraints are satisfied throughout operation.
- For large-scale, clustered grid integration of distributed photovoltaics, we design a training structure based on the centralized training–decentralized execution (CTDE) framework and incorporate a sequential-update scheme to improve training effectiveness and achieve zonal balancing control with efficient supply preservation.
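The safety mechanism described in the first contribution can be illustrated with a small sketch: a raw action proposed by the RL policy is projected onto the feasible set defined by a local physical model before being applied. The function, variable names, and limits below are illustrative assumptions, not taken from the paper; a simple battery power/SOC constraint stands in for the paper's local optimization model.

```python
# Hypothetical sketch: projecting a raw RL set-point onto the feasible set
# of a local physical model (a battery power/SOC constraint is used here
# purely for illustration; names and limits are not from the paper).

def project_action(p_raw, p_max, soc, soc_min, soc_max, capacity, dt=0.25):
    """Clip a raw charging/discharging set-point so that both the power
    rating and the resulting state of charge stay within bounds."""
    # power-rating constraint: |p| <= p_max
    p = max(-p_max, min(p_max, p_raw))
    # SOC constraint: soc + p * dt / capacity must stay in [soc_min, soc_max]
    p_lo = (soc_min - soc) * capacity / dt   # most negative feasible power
    p_hi = (soc_max - soc) * capacity / dt   # most positive feasible power
    return max(p_lo, min(p_hi, p))
```

Because the projection is a hard constraint rather than a learned penalty, any action the agent outputs is mapped to a feasible one, which is the sense in which safety is guaranteed by construction.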
2. Materials and Methods
2.1. A Secure Scheduling Framework Based on Local Physical Models
2.1.1. Overall Framework and Core Principles
2.1.2. Local Optimization Models Based on Physical Models
2.1.3. Global ADN Model
2.2. A Partition Balancing Control Strategy Based on Sequential Updates
2.2.1. Dec-POMDP Formulation
2.2.2. Sequential Update Strategy
| Algorithm 1 The Proposed Hybrid Model–Data-Driven Approach with MAPPO |
1: Initialize the replay buffer, the number of training episodes, and the episode length.
2: for each episode do
3:   for each time step do
4:     for each agent do
5:       Select an action from the agent's current policy.
6:     end for
7:     Execute the joint action.
8:     Solve the local model and obtain operation set-points.
9:     Observe the rewards and record the transitions in the buffer.
10:  end for
11:  Sample a mini-batch from the buffer.
12:  for each agent do
13:    Update the actor based on Equations (25) and (26).
14:  end for
15:  Update the critic by using MSE:
16:    Minimize the mean-squared error between the predicted and target values.
17: end for
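Algorithm 1 can be sketched as a minimal runnable loop. Everything below is a stub standing in for the paper's components: the environment reset, the policy, the local-model solve (replaced here by a simple clip), the reward, and the actor/critic updates are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the Algorithm 1 training loop (CTDE with a local-model
# projection step). All stubs and names are illustrative, not from the paper.
import random

def run_training(n_agents=3, n_episodes=2, episode_len=4, batch_size=4):
    buffer = []
    for episode in range(n_episodes):
        state = [0.0] * n_agents                      # reset environment (stub)
        for t in range(episode_len):
            # each agent selects an action from its local policy (stub)
            actions = [random.uniform(-1, 1) for _ in range(n_agents)]
            # solve the local physical model for safe set-points; a clip
            # stands in for the projection here
            setpoints = [max(-0.5, min(0.5, a)) for a in actions]
            reward = -sum(abs(s) for s in setpoints)  # placeholder reward
            buffer.append((state, setpoints, reward)) # record in buffer
        # sample a mini-batch; the sequential actor updates per Eqs. (25)-(26)
        # and the MSE critic update would use this batch
        batch = random.sample(buffer, min(batch_size, len(buffer)))
    return len(buffer)
```

The key structural point the sketch preserves is the ordering: actions are made safe by the local model *before* execution, and network updates happen only after a full episode of safe transitions has been buffered.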
3. Results
3.1. Simulation Setup
3.2. Performance Evaluation of the Proposed Algorithm
3.3. Comparison with Existing Algorithms
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| DG | Distributed generation |
| DPV | Distributed photovoltaics |
| ADN | Active distribution network |
| MINLP | Mixed-integer nonlinear programming |
| MILP | Mixed-integer linear programming |
| DRL | Deep reinforcement learning |
| MARL | Multi-agent reinforcement learning |
| CTDE | Centralized training–decentralized execution |
| DNN | Deep neural network |
| SOC | State of charge |
| MADDPG | Multi-agent deep deterministic policy gradient |
| PPO | Proximal policy optimization |
| ReLU | Rectified linear unit |
References
- Ma, Z. Form and Development Trend of Future Distribution System; China Electric Power Research Institute: Beijing, China, 2015. [Google Scholar]
- Lopes, J.A.P.; Hatziargyriou, N.; Mutale, J.; Djapic, P.; Jenkins, N. Integrating distributed generation into electric power systems: A review of drivers, challenges and opportunities. Electr. Power Syst. Res. 2007, 77, 1189–1203. [Google Scholar] [CrossRef]
- Qian, T.; Fang, M.; Hu, Q.; Shao, C.; Zheng, J. V2Sim: An Open-Source Microscopic V2G Simulation Platform in Urban Power and Transportation Network. IEEE Trans. Smart Grid 2025, 1. [Google Scholar] [CrossRef]
- Qian, T.; Liang, Z.; Shao, C.; Guo, Z.; Hu, Q.; Wu, Z. Unsupervised learning for efficiently distributing EVs charging loads and traffic flows in coupled power and transportation systems. Appl. Energy 2025, 377, 124476. [Google Scholar] [CrossRef]
- Liang, Z.; Qian, T.; Korkali, M.; Glatt, R.; Hu, Q. A Vehicle-to-Grid planning framework incorporating electric vehicle user equilibrium and distribution network flexibility enhancement. Appl. Energy 2024, 376, 124231. [Google Scholar] [CrossRef]
- Shang, Y.; Li, D.; Li, Y.; Li, S. Explainable spatiotemporal multi-task learning for electric vehicle charging demand prediction. Appl. Energy 2025, 384, 125460. [Google Scholar] [CrossRef]
- Xie, S.; Wu, Q.; Zhang, M.; Guo, Y. Coordinated Energy Pricing for Multi-Energy Networks Considering Hybrid Hydrogen-Electric Vehicle Mobility. IEEE Trans. Power Syst. 2024, 39, 7304–7317. [Google Scholar] [CrossRef]
- Xin-gang, Z.; Zhen, W. Technology, cost, economic performance of distributed photovoltaic industry in China. Renew. Sustain. Energy Rev. 2019, 110, 53–64. [Google Scholar] [CrossRef]
- Carrasco, J.M.; Franquelo, L.G.; Bialasiewicz, J.T.; Galvan, E.; PortilloGuisado, R.C.; Prats, M.A.M.; Leon, J.I.; Moreno-Alfonso, N. Power-Electronic Systems for the Grid Integration of Renewable Energy Sources: A Survey. IEEE Trans. Ind. Electron. 2006, 53, 1002–1016. [Google Scholar] [CrossRef]
- Hu, Z.; Su, R.; Veerasamy, V.; Huang, L.; Ma, R. Resilient Frequency Regulation for Microgrids Under Phasor Measurement Unit Faults and Communication Intermittency. IEEE Trans. Ind. Inform. 2025, 21, 1941–1949. [Google Scholar] [CrossRef]
- Braun, M.; Stetz, T.; Bründlinger, R.; Mayr, C.; Ogimoto, K.; Hatta, H.; Kobayashi, H.; Kroposki, B.; Mather, B.; Coddington, M.; et al. Is the distribution grid ready to accept large-scale photovoltaic deployment? State of the art, progress, and future prospects. Prog. Photovolt. Res. Appl. 2012, 20, 681–697. [Google Scholar] [CrossRef]
- Yan, G.; Wang, Q.; Zhang, H.; Wang, L.; Wang, L.; Liao, C. Review on the Evaluation and Improvement Measures of the Carrying Capacity of Distributed Power Supply and Electric Vehicles Connected to the Grid. Energies 2024, 17, 4407. [Google Scholar] [CrossRef]
- Huang, Y.; Lin, Z.; Liu, X.; Yang, L.; Dan, Y.; Zhu, Y.; Ding, Y.; Wang, Q. Bi-level Coordinated Planning of Active Distribution Network Considering Demand Response Resources and Severely Restricted Scenarios. J. Mod. Power Syst. Clean Energy 2021, 9, 1088–1100. [Google Scholar] [CrossRef]
- Huang, S.; Han, D.; Pang, J.Z.F.; Chen, Y. Optimal Real-Time Bidding Strategy for EV Aggregators in Wholesale Electricity Markets. IEEE Trans. Intell. Transp. Syst. 2025, 26, 5538–5551. [Google Scholar] [CrossRef]
- Xie, S.; Wu, Q.; Hatziargyriou, N.D.; Zhang, M.; Zhang, Y.; Xu, Y. Collaborative Pricing in a Power-Transportation Coupled Network: A Variational Inequality Approach. IEEE Trans. Power Syst. 2023, 38, 783–795. [Google Scholar] [CrossRef]
- Wang, C.; Yan, M.; Pang, K.; Wen, F.; Teng, F. Cyber-Physical Interdependent Restoration Scheduling for Active Distribution Network via Ad Hoc Wireless Communication. IEEE Trans. Smart Grid 2023, 14, 3413–3426. [Google Scholar] [CrossRef]
- Wang, C.; Lin, W.; Wang, G.; Shahidehpour, M.; Liang, Z.; Zhang, W.; Chung, C.Y. Frequency-Constrained Optimal Restoration Scheduling in Active Distribution Networks With Dynamic Boundaries for Networked Microgrids. IEEE Trans. Power Syst. 2025, 40, 2061–2077. [Google Scholar] [CrossRef]
- Ji, H.; Wang, C.; Li, P.; Ding, F.; Wu, J. Robust Operation of Soft Open Points in Active Distribution Networks With High Penetration of Photovoltaic Integration. IEEE Trans. Sustain. Energy 2019, 10, 280–289. [Google Scholar] [CrossRef]
- Zhao, J.; Li, Y.; Li, P.; Wang, C.; Ji, H.; Ge, L.; Song, Y. Local voltage control strategy of active distribution network with PV reactive power optimization. In Proceedings of the 2017 IEEE Power & Energy Society General Meeting, Chicago, IL, USA, 16–20 July 2017; pp. 1–5. [Google Scholar]
- Li, X.; Hu, C.; Luo, S.; Lu, H.; Piao, Z.; Jing, L. Distributed Hybrid-Triggered Observer-Based Secondary Control of Multi-Bus DC Microgrids Over Directed Networks. IEEE Trans. Circuits Syst. I Regul. Pap. 2025, 72, 2467–2480. [Google Scholar] [CrossRef]
- Qi, J.; Ying, A.; Zhang, B.; Zhou, D.; Weng, G. Distributed Frequency Regulation Method for Power Grids Considering the Delayed Response of Virtual Power Plants. Energies 2025, 18, 1361. [Google Scholar] [CrossRef]
- Wang, C.; Xu, Y.; Pang, K.; Shahidehpour, M.; Wang, Q.; Wang, G.; Wen, F. Imposing Fine-Grained Synthetic Frequency Response Rate Constraints for IBR-Rich Distribution System Restoration. IEEE Trans. Power Syst. 2025, 40, 2799–2802. [Google Scholar] [CrossRef]
- Yang, Q.; Wang, G.; Sadeghi, A.; Giannakis, G.B.; Sun, J. Two-Timescale Voltage Control in Distribution Grids Using Deep Reinforcement Learning. IEEE Trans. Smart Grid 2020, 11, 2313–2323. [Google Scholar] [CrossRef]
- Chen, X.; Qu, G.; Tang, Y.; Low, S.; Li, N. Reinforcement Learning for Selective Key Applications in Power Systems: Recent Advances and Future Challenges. IEEE Trans. Smart Grid 2022, 13, 2935–2958. [Google Scholar] [CrossRef]
- Qian, T.; Liang, Z.; Shao, C.; Zhang, H.; Hu, Q.; Wu, Z. Offline DRL for Price-Based Demand Response: Learning From Suboptimal Data and Beyond. IEEE Trans. Smart Grid 2024, 15, 4618–4635. [Google Scholar] [CrossRef]
- Qian, T.; Ming, W.; Shao, C.; Hu, Q.; Wang, X.; Wu, J.; Wu, Z. An Edge Intelligence-Based Framework for Online Scheduling of Soft Open Points With Energy Storage. IEEE Trans. Smart Grid 2024, 15, 2934–2945. [Google Scholar] [CrossRef]
- Qian, T.; Liang, Z.; Chen, S.; Hu, Q.; Wu, Z. A Tri-Level Demand Response Framework for EVCS Flexibility Enhancement in Coupled Power and Transportation Networks. IEEE Trans. Smart Grid 2025, 16, 598–611. [Google Scholar] [CrossRef]
- Cao, D.; Zhao, J.; Hu, J.; Pei, Y.; Huang, Q.; Chen, Z.; Hu, W. Physics-Informed Graphical Representation-Enabled Deep Reinforcement Learning for Robust Distribution System Voltage Control. IEEE Trans. Smart Grid 2024, 15, 233–246. [Google Scholar] [CrossRef]
- Jiang, C.; Lin, Z.; Liu, C.; Chen, F.; Shao, Z. MADDPG-Based Active Distribution Network Dynamic Reconfiguration with Renewable Energy. Prot. Control. Mod. Power Syst. 2024, 9, 143–155. [Google Scholar] [CrossRef]
- Sun, X.; Qiu, J. Two-Stage Volt/Var Control in Active Distribution Networks With Multi-Agent Deep Reinforcement Learning Method. IEEE Trans. Smart Grid 2021, 12, 2903–2912. [Google Scholar] [CrossRef]
- Wang, T.; Ma, S.; Tang, Z.; Xiang, T.; Mu, C.; Jin, Y. A Multi-Agent Reinforcement Learning Method for Cooperative Secondary Voltage Control of Microgrids. Energies 2023, 16, 5653. [Google Scholar] [CrossRef]
- Yang, X.; Liu, H.; Wu, W. Attention-Enhanced Multi-Agent Reinforcement Learning Against Observation Perturbations for Distributed Volt-VAR Control. IEEE Trans. Smart Grid 2024, 15, 5761–5772. [Google Scholar] [CrossRef]
- Zhang, L.; Yang, F.; Yan, D.; Qian, G.; Li, J.; Shi, X.; Xu, J.; Wei, M.; Ji, H.; Yu, H. Multi-Agent Deep Reinforcement Learning-Based Distributed Voltage Control of Flexible Distribution Networks with Soft Open Points. Energies 2024, 17, 5244. [Google Scholar] [CrossRef]
- Zhan, H.; Jiang, C.; Lin, Z. A Novel Graph Reinforcement Learning-Based Approach for Dynamic Reconfiguration of Active Distribution Networks with Integrated Renewable Energy. Energies 2024, 17, 6311. [Google Scholar] [CrossRef]
- Zhang, Q.; Dehghanpour, K.; Wang, Z.; Qiu, F.; Zhao, D. Multi-Agent Safe Policy Learning for Power Management of Networked Microgrids. IEEE Trans. Smart Grid 2021, 12, 1048–1062. [Google Scholar] [CrossRef]
- Zhang, J.; Sang, L.; Xu, Y.; Sun, H. Networked Multiagent-Based Safe Reinforcement Learning for Low-Carbon Demand Management in Distribution Networks. IEEE Trans. Sustain. Energy 2024, 15, 1528–1545. [Google Scholar] [CrossRef]
- Li, H.; He, H. Learning to Operate Distribution Networks With Safe Deep Reinforcement Learning. IEEE Trans. Smart Grid 2022, 13, 1860–1872. [Google Scholar] [CrossRef]
- Bhatnagar, S.; Lakshmanan, K. An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes. J. Optim. Theory Appl. 2012, 153, 688–708. [Google Scholar] [CrossRef]
- Hua, D.; Peng, F.; Liu, S.; Lin, Q.; Fan, J.; Li, Q. Coordinated Volt/VAR Control in Distribution Networks Considering Demand Response via Safe Deep Reinforcement Learning. Energies 2025, 18, 333. [Google Scholar] [CrossRef]
- Kim, D.; Oh, S. TRC: Trust Region Conditional Value at Risk for Safe Reinforcement Learning. IEEE Robot. Autom. Lett. 2022, 7, 2621–2628. [Google Scholar] [CrossRef]
- Zhang, Q.; Leng, S.; Ma, X.; Liu, Q.; Wang, X.; Liang, B.; Liu, Y.; Yang, J. CVaR-Constrained Policy Optimization for Safe Reinforcement Learning. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 830–841. [Google Scholar] [CrossRef]
- Chen, P.; Liu, S.; Wang, X.; Kamwa, I. Physics-Shielded Multi-Agent Deep Reinforcement Learning for Safe Active Voltage Control With Photovoltaic/Battery Energy Storage Systems. IEEE Trans. Smart Grid 2023, 14, 2656–2667. [Google Scholar] [CrossRef]
- Wang, C.; Zhou, S.; Wang, L.; Lu, Z.; Wu, C.; Wen, X.; Shou, G. Autonomous Driving via Knowledge-Enhanced Safe Reinforcement Learning. IEEE Trans. Intell. Veh. 2024, 1–14. [Google Scholar] [CrossRef]
- Zhao, H.; Zhao, J.; Qiu, J.; Liang, G.; Dong, Z.Y. Cooperative Wind Farm Control With Deep Reinforcement Learning and Knowledge-Assisted Learning. IEEE Trans. Ind. Inform. 2020, 16, 6912–6921. [Google Scholar] [CrossRef]
- Lin, H.; Shen, X.; Guo, Y.; Ding, T.; Sun, H. A linear Distflow model considering line shunts for fast calculation and voltage control of power distribution systems. Appl. Energy 2024, 357, 122467. [Google Scholar] [CrossRef]
- Neumann, F.; Hagenmeyer, V.; Brown, T. Assessments of linear power flow and transmission loss approximations in coordinated capacity expansion problems. Appl. Energy 2022, 314, 118859. [Google Scholar] [CrossRef]
- Song, T.; Han, X.; Zhang, B. Multi-Time-Scale Optimal Scheduling in Active Distribution Network with Voltage Stability Constraints. Energies 2021, 14, 7107. [Google Scholar] [CrossRef]
- Kuba, J.; Feng, X.; Ding, S.; Dong, H.; Wang, J.; Yang, Y. Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL. arXiv 2022, arXiv:2208.01682. [Google Scholar]
- Yu, C.; Velu, A.; Vinitsky, E.; Wang, Y.; Bayen, A.M.; Wu, Y. The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games. arXiv 2021, arXiv:2103.01955. [Google Scholar]

| Classification | Features | References |
|---|---|---|
| Penalty function method | Most direct, but cannot fully guarantee safety and is the least effective | [38,39] |
| Trust region method | Ensures safety, but is too conservative to reach the optimum | [40,41] |
| Safe exploration method | Improves safety during training, but correct and sufficient a priori knowledge is difficult to obtain | [42,43,44] |
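The first row of the table can be made concrete with a small sketch: in the penalty-function approach, constraint violations are folded into the reward as a soft penalty, so safety is encouraged but never guaranteed. The function, voltage limits, and penalty weight below are illustrative assumptions, not values from the paper.

```python
# Sketch of the penalty-function approach (table, row 1): a voltage-limit
# violation reduces the reward but does not block the unsafe action.
# Limits and the penalty weight are illustrative, not from the paper.

def penalized_reward(reward, voltage, v_min=0.95, v_max=1.05, weight=100.0):
    """Subtract a penalty proportional to the voltage-limit violation."""
    violation = max(0.0, v_min - voltage) + max(0.0, voltage - v_max)
    return reward - weight * violation
```

This also shows why the method "cannot completely guarantee safety": nothing prevents the agent from taking a violating action if the penalized return still looks attractive, which motivates the hard physical-model constraints used in this paper instead.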
| Parameter | Value |
|---|---|
| Actor learning rate | 0.001 |
| Critic learning rate | 0.001 |
| Clip coefficient | 0.2 |
| Entropy coefficient | 0.01 |
| Discount factor | 0.99 |
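The clip coefficient in the table is the epsilon of the standard PPO clipped surrogate objective, which MAPPO applies per agent. The single-sample sketch below is the textbook form of that objective, shown here for illustration; the function name is an assumption, not from the paper.

```python
# Single-sample sketch of the PPO clipped surrogate term that the clip
# coefficient (0.2 in the table) controls: min(r*A, clip(r, 1-eps, 1+eps)*A),
# where r is the new/old policy probability ratio and A is the advantage.

def clipped_surrogate(ratio, advantage, clip_coef=0.2):
    """Return the clipped PPO objective term for one sample."""
    clipped = max(1.0 - clip_coef, min(1.0 + clip_coef, ratio))
    return min(ratio * advantage, clipped * advantage)
```

With a positive advantage, the clip caps how much the new policy can be rewarded for increasing an action's probability beyond a factor of 1.2; with a negative advantage, it symmetrically limits the incentive to decrease it below 0.8.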
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhai, B.; Li, Y.; Qiu, W.; Zhang, R.; Jiang, Z.; Wang, W.; Qian, T.; Hu, Q. Physics-Informed Multi-Agent DRL-Based Active Distribution Network Zonal Balancing Control Strategy for Security and Supply Preservation. Energies 2025, 18, 2959. https://doi.org/10.3390/en18112959

