Attention-Enhanced Multi-Agent Deep Reinforcement Learning for Inverter-Based Volt-VAR Control in Active Distribution Networks
Abstract
1. Introduction
- (1) Dec-POMDP-based modeling and attention-aided centralized critic. We formulate inverter-based VVC in ADNs as a Dec-POMDP to explicitly capture stochastic variations and imperfect local observations. We embed a multi-head attention mechanism into the centralized critic to evaluate the global action-value function, enabling selective aggregation of the most relevant information from other agents during training. This attention-aided critic improves training stability and learning efficiency compared with conventional MADRL approaches.
- (2) CTDE paradigm for practical deployment. The proposed method adopts a CTDE paradigm, in which the policy networks are trained offline in the centralized training stage and executed online using only local observations. This design keeps computational and communication overhead low during real-time operation, which distinguishes the proposed framework from fully centralized control schemes that require extensive global information exchange. As a result, the method is well suited for ADNs with imperfect communication infrastructures and stringent real-time control requirements. Moreover, inverter-based strategies leverage existing inverter capabilities and thus avoid additional regulating devices, offering a more economical and readily implementable solution.
- (3) Effective and scalable inverter-based VVC performance. The proposed approach mitigates voltage violations and suppresses voltage fluctuations without resorting to active power curtailment. By coordinating the reactive power capability of distributed PV inverters, cooperative control among agents is achieved, while each agent independently determines its reactive power adjustment based on shared information within the MADRL framework. Furthermore, the proposed method maintains its control performance as the number of agents and the penetration level of DGs increase, demonstrating robustness and generalization capability. This, in turn, enhances the hosting capacity for DGs and improves the overall safety, stability, and reliability of ADNs.
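
The selective aggregation described in contribution (1) can be illustrated with a minimal single-head scaled dot-product attention step. This is a sketch under stated assumptions, not the authors' implementation: the function names, the embedding sizes, and the random projection matrices `w_q`, `w_k`, and `w_v` are all hypothetical, and a trained critic would learn these projections rather than sample them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attend_to_others(embeddings, w_q, w_k, w_v, agent):
    """Aggregate the other agents' embeddings for one agent (single head)."""
    n_agents = embeddings.shape[0]
    others = [j for j in range(n_agents) if j != agent]
    query = embeddings[agent] @ w_q        # shape (d_k,)
    keys = embeddings[others] @ w_k        # shape (n_agents - 1, d_k)
    values = embeddings[others] @ w_v      # shape (n_agents - 1, d_k)
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = softmax(scores)              # attention weights sum to 1
    return weights @ values, weights

# Hypothetical sizes: 6 agents, 8-dimensional embeddings, 4-dimensional keys.
rng = np.random.default_rng(0)
n_agents, d_embed, d_k = 6, 8, 4
embeddings = rng.normal(size=(n_agents, d_embed))
w_q = rng.normal(size=(d_embed, d_k))
w_k = rng.normal(size=(d_embed, d_k))
w_v = rng.normal(size=(d_embed, d_k))

context, weights = attend_to_others(embeddings, w_q, w_k, w_v, agent=0)
```

Here `weights` holds one non-negative weight per other agent, so agents whose embeddings align with the query contribute more to `context`, which a critic could then combine with the agent's own embedding to estimate its action value.
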
2. Inverter-Based VVC Model
2.1. Principle of PV Inverter Participating in VVC
2.2. VVC Model in Power Distribution Networks
3. VVC Method Based on MAAC
3.1. Multi-Agent Deep Reinforcement Learning
3.2. Offline Centralized Training and Online Decentralized Execution Framework
3.3. Multi-Agent Actor–Attention–Critic (MAAC)
Algorithm 1 Offline Centralized Training Process

- Randomly initialize the parameters of the actor network and the centralized critic network of each agent.
- Initialize the target networks and the replay buffer.
- for episode = 1, 2, …, H do
  - Reset the environment and obtain the initial global state.
  - Obtain the initial local observation for each agent.
  - for time step = 1, 2, …, T per episode do
    - Select an action for each agent.
    - Execute the joint action, and receive the rewards and the next state.
    - Obtain the next observations.
    - Store the transitions in the replay buffer.
    - Set the current observations to the next observations.
    - Randomly sample a minibatch from the replay buffer.
    - for agent = 1, 2, …, N do
      - Update the centralized critic using the attention-based Q-function according to (12) and (13).
      - Update the actor network using the policy gradient according to (15) and (16).
    - end for
    - Soft update the target networks.
  - end for
- end for
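
Two bookkeeping steps of the training process, storing and sampling transitions and softly updating the target networks, can be sketched as follows. This is an illustrative sketch only: the `ReplayBuffer` class, the `soft_update` helper, and the placeholder transitions are hypothetical, and the update direction (the factor multiplies the online parameters) is the common Polyak-averaging convention, assumed here rather than taken from the paper. The buffer size of 800, batch size of 64, and soft update factor of 0.1 follow the parameter table.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-capacity buffer; the oldest transitions are evicted first."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def store(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)

def soft_update(target, online, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return {name: tau * online[name] + (1.0 - tau) * target[name]
            for name in online}

buffer = ReplayBuffer(capacity=800)
rng = random.Random(0)
for step in range(1000):
    buffer.store((step, step + 1))      # placeholder transition

minibatch = buffer.sample(64, rng)

online = {"w": np.ones(3)}              # stand-in for online network weights
target = {"w": np.zeros(3)}             # stand-in for target network weights
target = soft_update(target, online, tau=0.1)
```

With a factor of 0.1, each update moves the target parameters one tenth of the way toward the online parameters, which keeps the bootstrapped critic targets slowly varying.
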
Algorithm 2 Online Decentralized Execution Process

- Load the trained actor parameters for each agent.
- for time step t = 1, 2, …, T do
  - for agent i = 1, 2, …, N do
    - Obtain the local observation.
    - Calculate the action.
    - Output the control action to the corresponding inverter.
  - end for
- end for
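
The execution loop of Algorithm 2 can be sketched with small stand-in actors. Everything below is hypothetical: the one-hidden-layer actor, the hidden width of 64, the observation dimension, and the randomly drawn parameters and observations stand in for trained networks and real local measurements. The point is only that each agent maps its own observation to an action without any exchange with the others.

```python
import numpy as np

def actor_forward(params, local_obs):
    """Hypothetical one-hidden-layer actor; tanh keeps the output in [-1, 1]."""
    hidden = np.tanh(local_obs @ params["w1"] + params["b1"])
    return np.tanh(hidden @ params["w2"] + params["b2"])

def make_actor(rng, obs_dim, hidden_dim=64):
    # Stand-in for loading trained actor parameters from storage.
    return {
        "w1": rng.normal(scale=0.1, size=(obs_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(scale=0.1, size=(hidden_dim, 1)),
        "b2": np.zeros(1),
    }

rng = np.random.default_rng(0)
n_agents, obs_dim, horizon = 6, 4, 3
actors = [make_actor(rng, obs_dim) for _ in range(n_agents)]

actions = np.zeros((horizon, n_agents))
for t in range(horizon):
    for i, params in enumerate(actors):
        local_obs = rng.normal(size=obs_dim)   # placeholder local measurements
        # Each agent uses only its own observation; no critic is involved.
        actions[t, i] = actor_forward(params, local_obs)[0]
```
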
3.4. Formulation of Dec-POMDP
3.5. Stability and Feasibility Analysis
4. Case Study
4.1. Simulation Example and Experiment Settings
4.2. Results and Analysis
4.3. Proof of Scalability
4.4. Discussion and Insights
- (1) Better coordination and credit assignment under partial observability. In practical ADNs, each inverter has only local information. Conventional methods may produce myopic actions and struggle to coordinate across multiple inverters. By contrast, our centralized critic leverages an attention mechanism to selectively aggregate the most relevant information from other agents during training, enabling more accurate evaluation of joint actions and facilitating consistent credit assignment. As a result, agents learn cooperative behaviors that reduce system-wide voltage deviations and violations.
- (2) Improved training stability by alleviating multi-agent non-stationarity. Multi-agent environments are inherently non-stationary from the perspective of each individual agent because the other agents' policies are updated simultaneously. The centralized critic helps stabilize the learning signal and reduces gradient variance, which explains the smoother training curves and better convergence behavior compared with conventional MADRL baselines.
- (3) Deployment-oriented CTDE design with physically feasible actions. Although the critic uses joint information during centralized training, the learned actors execute in a fully decentralized manner using only local observations, which matches realistic communication constraints. In addition, actions are constrained by inverter capability limits, ensuring that the learned policy remains physically feasible. This CTDE structure enables the method to retain coordination benefits without incurring heavy online computation requirements.
- (4) Scalability and robustness. The method maintains performance as the number of agents and the PV penetration increase, indicating that attention-based aggregation can focus on influential interactions and avoid redundant information, thereby improving generalization to more complex operating conditions.
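
The capability constraint mentioned in point (3) can be illustrated with the standard apparent-power limit of an inverter, under which the reactive power magnitude cannot exceed sqrt(S² − P²). This is a sketch under assumptions: the function names are hypothetical, the values are in arbitrary per-unit quantities, and the exact constraint form used in the paper is not shown in this text.

```python
import math

def reactive_power_limit(s_rated, p_pv):
    """Reactive headroom left by the apparent-power rating: sqrt(S^2 - P^2)."""
    p = min(abs(p_pv), s_rated)     # guard against P slightly above the rating
    return math.sqrt(s_rated**2 - p**2)

def scale_action(action, s_rated, p_pv):
    """Map a normalized action in [-1, 1] to a feasible reactive power setpoint."""
    a = max(-1.0, min(1.0, action))  # clip out-of-range actor outputs
    return a * reactive_power_limit(s_rated, p_pv)

# Hypothetical operating point: rating 1.0, active output 0.6, action 0.5.
q_limit = reactive_power_limit(s_rated=1.0, p_pv=0.6)
q_setpoint = scale_action(0.5, s_rated=1.0, p_pv=0.6)
```

Because the limit shrinks as the active power output approaches the rating, scaling the normalized action by the current limit keeps every setpoint feasible without curtailing active power.
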
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Parameters | Values |
|---|---|
| Batch size | 64 |
| Replay buffer size | 800 |
| Discount factor | 0.99 |
| Step size | 480 |
| Actor network’s learning rate | 0.0001 |
| Critic network’s learning rate | 0.0001 |
| Measurement noise standard deviation | 0.1 |
| Attention head | 1 |
| Actor/Critic hidden layers | 64 |
| Target network update frequency | 120 |
| Behavior network update frequency | 30 |
| Entropy temperature parameter | 0.001 |
| Soft update factor | 0.1 |

| Method | Average Voltage Deviation (p.u.) | Execution Time (ms) |
|---|---|---|
| Original | 0.0206 | - |
| Droop control | 0.0117 | 14.58 |
| MADDPG | 0.0112 | 34.29 |
| MATD3 | 0.0162 | 34.77 |
| MAPPO | 0.0178 | 35.21 |
| MAAC | 0.0074 | 33.83 |
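
The relative improvement of MAAC over each alternative can be worked out directly from the average voltage deviation values in the table above. The short calculation below is an illustration added for convenience; the helper name `relative_reduction` is hypothetical, and the numbers are copied from the table.

```python
# Average voltage deviation (p.u.) per method, copied from the comparison table.
avg_dev = {
    "Original": 0.0206,
    "Droop control": 0.0117,
    "MADDPG": 0.0112,
    "MATD3": 0.0162,
    "MAPPO": 0.0178,
    "MAAC": 0.0074,
}

def relative_reduction(baseline, value):
    """Percentage reduction of `value` relative to `baseline`."""
    return 100.0 * (baseline - value) / baseline

reductions = {
    method: relative_reduction(dev, avg_dev["MAAC"])
    for method, dev in avg_dev.items()
    if method != "MAAC"
}

for method, reduction in reductions.items():
    print(f"MAAC vs {method}: {reduction:.1f}% lower average deviation")
```

On these figures, MAAC lowers the average deviation by about 64% relative to the uncontrolled case and by about 34% relative to MADDPG, the closest learning-based baseline.
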

| Number of Agents | Node Location of PV | PV Penetration | Average Voltage Deviation (p.u.) |
|---|---|---|---|
| 6 | 13, 18, 22, 25, 29, and 33 | 131% | 0.0074 |
| 8 | 5, 9, 13, 18, 22, 25, 29, and 33 | 172% | 0.0069 |
| 10 | 5, 9, 13, 15, 18, 20, 22, 25, 29, and 33 | 211% | 0.0044 |
| 12 | 4, 7, 10, 13, 15, 18, 20, 22, 25, 29, 31, and 33 | 251% | 0.0038 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Chen, W.; Niu, H.; Liu, L.; Lin, J.; Quan, H. Attention-Enhanced Multi-Agent Deep Reinforcement Learning for Inverter-Based Volt-VAR Control in Active Distribution Networks. Mathematics 2026, 14, 839. https://doi.org/10.3390/math14050839

