An End-to-End Hierarchical Intelligent Inference Model for Collaborative Operation of Grid Switches
Abstract
1. Introduction
- (1) This paper constructs a two-layer power grid graph model with bays as intermediate nodes. The upper layer uses bays as nodes to represent the power grid topology and power flow distribution, while the lower layer uses switchgear as nodes to represent the substation busbar layout. This maps the grid-level topology onto the substation-level busbar configuration and creates a unified knowledge reasoning environment.
- (2) A new hierarchical reinforcement learning framework is designed that automatically transforms the state and action spaces between the upper and lower layers, decoupling the complex constraints of load transfer and switching operations. The upper layer builds the power grid load transfer optimization model, while the lower layer generates the substation switching operation sequences, effectively reducing the dimension of the problem.
- (3) A multi-task learning mechanism is introduced that treats the switching operations of different bays as parallel sub-tasks. This enables a secondary decomposition of the solution space and facilitates knowledge transfer and reuse across tasks, thereby improving the efficiency of model training.
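
The two-layer structure in contribution (1) can be illustrated with a minimal in-memory sketch. It is not the Neo4j model itself: the `Bay` class and the `connect` and `expand` helpers are hypothetical names, and the adjacency between the two bays is assumed purely for illustration. The bay and switch identifiers are taken from the WB33 case-study table.

```python
from dataclasses import dataclass, field


@dataclass
class Bay:
    """Upper-layer node: one bay in the grid-level topology."""
    name: str
    neighbors: set = field(default_factory=set)     # adjacent bays (upper layer)
    switchgear: list = field(default_factory=list)  # devices inside the bay (lower layer)


def connect(a: Bay, b: Bay) -> None:
    """Add an undirected upper-layer edge between two bays."""
    a.neighbors.add(b.name)
    b.neighbors.add(a.name)


def expand(bays: dict, selected: list) -> list:
    """Map an upper-layer selection of bays to the switchgear they contain."""
    return [dev for name in selected for dev in bays[name].switchgear]


bays = {name: Bay(name) for name in ("1052bay", "1054bay")}
bays["1052bay"].switchgear = ["1052", "10521", "10522", "105211", "105212"]
bays["1054bay"].switchgear = ["1054", "10541", "10542", "105411", "105412"]
connect(bays["1052bay"], bays["1054bay"])  # assumed adjacency, for illustration only

switching_space = expand(bays, ["1052bay", "1054bay"])
print(switching_space)
```

Expanding the two transfer bays yields the ten switch identifiers listed as the transfer switching space in the case study, which is the kind of upper-to-lower mapping the hierarchy relies on.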
2. Structure of the Hierarchical Optimization Model
3. Neo4j Graph Model Based on Bay Topology Integration
4. “Load Transfer—Switching Operation” Hierarchical Optimization Model
4.1. Transfer Supply Space and Maintenance Space
- (1) In the graph model, retrieve the detailed information of the maintenance equipment and input it into the power flow calculation model. Simulate the isolation of the maintenance equipment and perform a power flow analysis of the grid, calculating system parameters such as current, voltage, and power. Identify the outage load areas caused by the equipment shutdown, and feed the outage load area information back into the graph model.
- (2) In the graph model, perform a path search from the outage load area to other areas, recording the switch bays along the path as the transfer supply space.
- (3) In the graph model, use the maintenance equipment as the starting point to search for the power supply path connected to non-switching equipment, and record the switch bays along the path as the maintenance space. In this process, the bays in the transfer-supply and maintenance spaces are collected into a set whose number of elements equals the number of switch bays in the path.
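
The path search in steps (2) and (3) can be sketched as a breadth-first search over an adjacency map. This is a minimal illustration under stated assumptions: the source does not say which search strategy is used, so shortest-path BFS is assumed, and the function `bays_on_path`, the predicate `is_switch_bay`, and all node names below are hypothetical.

```python
from collections import deque


def bays_on_path(graph, is_switch_bay, start, targets):
    """BFS from `start`; return the switch bays on the shortest path to the nearest target."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in targets:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            path.reverse()
            return [n for n in path if is_switch_bay(n)]
        for nxt in graph.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return []  # no target reachable


# Hypothetical toy topology: an outage area reaches an energized area through two bays.
graph = {
    "outage_area": ["bay_A"],
    "bay_A": ["outage_area", "bus_1"],
    "bus_1": ["bay_A", "bay_B"],
    "bay_B": ["bus_1", "energized_area"],
    "energized_area": ["bay_B"],
}
transfer_space = bays_on_path(
    graph, lambda n: n.startswith("bay_"), "outage_area", {"energized_area"}
)
print(transfer_space)
```

The same routine could serve the maintenance space by starting from the maintenance equipment instead of the outage area; in a Neo4j deployment the search would more likely be expressed as a graph query.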
4.2. Transfer-Supply Reasoning Model Based on D3QN
4.2.1. D3QN Algorithm
4.2.2. Dynamic Adjustment Mechanism of Reward Weights
4.2.3. Transfer-Supply State Space
4.2.4. Reward and Penalty Function
- (1) Node voltage deviation penalty. The deviation coefficient of the load transfer optimization model adjusts the sensitivity of the reward and penalty to deviations from the target limit. This coefficient plays a key role in balancing rewards and penalties, thereby guiding the agent’s learning process. The maximum allowable node voltage deviation is 5, and it is compared with the actual voltage deviation at each node. When a node’s voltage deviation exceeds the limit, the agent receives a negative penalty that grows with the magnitude of the deviation, which discourages excessive voltage deviations.
- (2) Line overload penalty. The maximum line load rate is 100%, and it is compared with the actual load rate of the line. When the line load rate exceeds the limit, a negative penalty is applied that grows with the degree of overload, which mitigates the risk of line overloading.
- (3) Closing impulse current penalty. The actual closing impulse current of the system is compared with the setting value of the instantaneous current-breaking protection, and the actual closing steady-state current is compared with the setting value of the overcurrent protection. When either the impulse current or the steady-state current exceeds its corresponding protection setting, a negative penalty is applied to prevent excessive current surges during the closing operation.
- (4) Transformer overload penalty. The actual loading rate of the transformer is compared with the maximum loading rate, which is set to 100%. When the transformer loading rate exceeds the limit, a negative penalty is applied that grows with the degree of overload, which prevents transformer overloading.
- (5) Repeated action penalty. The penalty depends on the number of repeated actions for a bay and on the total number of bays in the action space. When a bay switches from a non-operating state to an operating state and is then switched back to a non-operating state by a repeated action, an invalid action is generated. A larger number of repetitions results in a greater penalty, which prevents ineffective operations during the reasoning process.
- (6) Operation efficiency reward. The primary objective of load-transfer optimization is to ensure that no load area is de-energized and that all safety constraints are satisfied. The secondary objective is to minimize the number of bay operations to improve system operating efficiency. The operation efficiency reward therefore depends on the number of operated bays and on the initial reward for load transfer. The initial reward is set to 10 so that the agent prioritizes the high reward associated with reaching the target state, thereby avoiding inefficient policies driven by small immediate rewards.
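
The limit-based penalties (1) to (4) share one pattern: no penalty inside the limit, and a negative value that grows with the excess. The sketch below illustrates that pattern only. The linear form, the helper names `excess_penalty`, `closing_current_penalty`, and `limit_penalties`, and the default coefficient `k=1.0` are assumptions; the text does not give the exact expressions. The limits of 5 and 100% are taken from the text.

```python
def excess_penalty(actual, limit, k=1.0):
    """Zero within the limit; otherwise negative and growing with the excess (assumed linear)."""
    return -k * max(0.0, actual - limit)


def closing_current_penalty(i_impulse, i_steady, inst_setting, oc_setting, k=1.0):
    """Penalize an impulse or steady-state closing current above its protection setting."""
    return excess_penalty(i_impulse, inst_setting, k) + excess_penalty(i_steady, oc_setting, k)


def limit_penalties(voltage_devs, line_loads, transformer_loads, k=1.0):
    """Sum the voltage, line, and transformer penalties over all monitored elements."""
    total = sum(excess_penalty(abs(dv), 5.0, k) for dv in voltage_devs)    # limit: 5
    total += sum(excess_penalty(load, 100.0, k) for load in line_loads)    # limit: 100 %
    total += sum(excess_penalty(load, 100.0, k) for load in transformer_loads)
    return total


# Hypothetical operating points: one within all limits, one with a 10 % line overload.
print(limit_penalties([3.0], [90.0], [80.0]))
print(limit_penalties([3.0], [110.0], [80.0]))
```

Raising `k` makes the agent more sensitive to violations, which matches the role the text assigns to the deviation coefficient.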
4.3. Switching Operation Reasoning Model Based on MT-D3QN
4.3.1. MT-D3QN Algorithm
4.3.2. Switching Operation Space
4.3.3. State Space
4.3.4. Reward and Penalty Function
- (1) Penalty for repeated switch operation. The deviation coefficient of the switch operation optimization model determines the sensitivity of the rewards and penalties to deviations from the target limits. The penalty depends on the number of repeated actions and on the total number of controllable switches. The greater the number of repetitions, the larger the penalty, which prevents invalid switch operations during the reasoning process.
- (2) State reward. The reward function is designed based on the proximity between the target state and the current device state. The state priorities are set in the following order: operating state, hot standby state, cold standby state, maintenance state, and degraded state. The state reward depends on the priority of the target state, the priority of the current switch state, and the initial reward for the switch operation. Setting the initial reward to 10 allows the agent to prioritize the high reward of the target state during decision-making, preventing it from being trapped in non-target state strategies due to small immediate rewards.
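
A proximity-based state reward of this kind can be sketched as follows. Everything numeric here is an assumption except the ordering of the states and the initial reward of 10: the integer priorities, the gap-based reward, and the ratio form of the repetition penalty are hypothetical choices, and the function names `state_reward` and `repeat_penalty` do not come from the source.

```python
# Assumed integer priorities that preserve the ordering given in the text.
PRIORITY = {
    "operating": 4,
    "hot_standby": 3,
    "cold_standby": 2,
    "maintenance": 1,
    "degraded": 0,
}


def state_reward(current, target, initial_reward=10.0):
    """Full reward at the target state; otherwise a penalty equal to the priority gap."""
    gap = abs(PRIORITY[target] - PRIORITY[current])
    return initial_reward if gap == 0 else -float(gap)


def repeat_penalty(repeats, total_switches, k=1.0):
    """Assumed form: penalty proportional to the share of repeated switch actions."""
    return -k * repeats / total_switches


print(state_reward("maintenance", "maintenance"))
print(state_reward("cold_standby", "maintenance"))
print(state_reward("operating", "maintenance"))
```

Under these assumptions a switch that is one step from the target state is penalized less than one that is three steps away, so the reward increases as the device approaches the target.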
5. Case Study Analysis
5.1. Case Study 1: Maintenance Verification of the WB33 Busbar at the 35 kV Substation
5.1.1. Experimental Environment and Algorithm Parameter Configuration
5.1.2. Experimental Results and Performance Analysis
5.1.3. Sensitivity Analysis of the Reward Function
5.2. Case Study 2: Random Task Maintenance Test
5.2.1. Performance Analysis
5.2.2. Metrics and Comparison
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| Neo4j | Graph database |
| DQN | Deep Q-Network |
| D3QN | Dueling Double Deep Q-Network |
| MT-D3QN | Multi-Task Dueling Double Deep Q-Network |
| MILP | Mixed-Integer Linear Programming |
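
For readers unfamiliar with the D3QN abbreviation, the two ingredients it combines can be sketched in a few lines. This shows the standard textbook formulation of the dueling aggregation and the Double DQN target, not necessarily the exact variant used in this paper; the function names and the sample numbers are illustrative assumptions.

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


q_values = dueling_q(1.0, [1.0, 2.0, 3.0])
target = double_dqn_target(1.0, 0.9, [0.2, 0.8], [5.0, 3.0], done=False)
print(q_values, target)
```

Decoupling action selection from action evaluation is what reduces the overestimation bias of plain DQN, since the target network's largest value (5.0 here) is not used unless the online network also prefers that action.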

| Substation Number | Main Transformer | Voltage Ratio (kV/kV) | Capacity (MVA) | Area Number | Area Load (MW) |
|---|---|---|---|---|---|
| Substation 1 | T31 | 35/10 | 8 | Area1 | 4.2 |
| Substation 1 | T32 | 35/10 | 8 | Area2 | 3.5 |
| Substation 2 | T33 | 35/10 | 8 | Area3 | 3.9 |
| Substation 2 | T34 | 35/10 | 8 | Area4 | 3.0 |

| Parameter | D3QN | MT-D3QN |
|---|---|---|
| Network Architecture | 256 × 1024 × 512 | 128 × 1024 × 512 |
| Experience Replay Capacity | 10,000 | 10,000 |
| Batch Size | 64 | 128 |
| Learning Rate | 0.005 | 0.009 |
| Exploration Rate | 0.9 | 0.9 |
| Discount Factor | 0.9 | 0.95 |
| Minimum Exploration Rate | 0.01 | 0.01 |
| Update Frequency | 50 | 50 |
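
The hyperparameters in the table above could be kept in a small immutable configuration object so that both agents are built from the same code path. This is a sketch only: the class name `AgentConfig` and its field names are hypothetical, and the table does not say whether the architecture triple denotes layer widths or whether the update frequency refers to target-network synchronization, so the values are stored as given without interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hyperparameters copied from the table; defaults are the values shared by both agents."""
    architecture: tuple            # stored as listed, e.g. (256, 1024, 512)
    batch_size: int
    learning_rate: float
    discount_factor: float
    replay_capacity: int = 10_000
    exploration_rate: float = 0.9
    min_exploration_rate: float = 0.01
    update_frequency: int = 50


D3QN_CONFIG = AgentConfig(
    architecture=(256, 1024, 512), batch_size=64, learning_rate=0.005, discount_factor=0.9
)
MT_D3QN_CONFIG = AgentConfig(
    architecture=(128, 1024, 512), batch_size=128, learning_rate=0.009, discount_factor=0.95
)
print(D3QN_CONFIG)
print(MT_D3QN_CONFIG)
```

Freezing the dataclass guards against accidental mutation during training, which helps keep experiments reproducible.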

| Scenario | Optimization Stage | Result |
|---|---|---|
| WB33 Busbar Maintenance | Transfer Space | 1052bay, 1054bay |
| WB33 Busbar Maintenance | Maintenance Space | 1037bay, 1031bay |
| WB33 Busbar Maintenance | Transfer Switching Space | 1052, 10521, 10522, 105211, 105212, 1054, 10541, 10542, 105411, 105412 |
| WB33 Busbar Maintenance | Maintenance Switching Space | 1037, 10371, 10372, 103711, 103712, 1031, 10311, 10312, 103111, 103112 |
| WB33 Busbar Maintenance | Switching Operation Sequence | Close 10522, 10521, 1052, 10541, 10542, 1054; Open 1031, 1037, 10312, 10372, 10311, 10371; Close 103111, 103711, 103112, 103712 |

| Substation Number | Main Transformer | Voltage Ratio (kV/kV) | Capacity (MVA) | Area Number | Area Load (MW) |
|---|---|---|---|---|---|
| Substation 1 | T11 | 220/110 | 70 | Area1, Area2 | 14.00, 15.40 |
| Substation 1 | T12 | 110/10 | 31.5 | | |
| Substation 1 | T13 | 110/10 | 31.5 | | |
| Substation 2 | T21 | 220/110 | 70 | Area3, Area4 | 12.80, 12.60 |
| Substation 2 | T22 | 110/10 | 31.5 | | |
| Substation 2 | T23 | 110/10 | 31.5 | | |
| Substation 3 | T31 | 220/110 | 70 | Area5, Area6 | 15.10, 15.70 |
| Substation 3 | T32 | 110/10 | 31.5 | | |
| Substation 3 | T33 | 110/10 | 31.5 | | |

| Algorithm | Transfer Success Rate (%) | Maintenance Success Rate (%) | Power Flow Violation Rate (%) | Switching Misoperation Rate (%) | Average Reward | Average Inference Time (s) |
|---|---|---|---|---|---|---|
| MILP | 71.17 | 71.17 | / | / | / | 1680.46 |
| DQN | 46.67 | 66.67 | 53.33 | 33.33 | 9.49 | 831.07 |
| D3QN | 73.33 | 86.67 | 26.67 | 13.33 | 12.03 | 660.87 |
| D3QN + MT-D3QN | 86.67 | 100 | 13.33 | 0 | 13.71 | 309.47 |

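
The inference-time column of the comparison table can be turned into relative reductions with a short calculation. The numbers are copied from the table; the helper name `reduction_pct` is an illustrative choice.

```python
# Average inference time in seconds, copied from the comparison table.
inference_time_s = {
    "MILP": 1680.46,
    "DQN": 831.07,
    "D3QN": 660.87,
    "D3QN + MT-D3QN": 309.47,
}


def reduction_pct(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


proposed = inference_time_s["D3QN + MT-D3QN"]
reductions = {
    name: reduction_pct(t, proposed)
    for name, t in inference_time_s.items()
    if name != "D3QN + MT-D3QN"
}
for name, pct in reductions.items():
    print(f"{name}: {pct:.2f}% shorter average inference time")
```

With the tabulated values, the hierarchical model's average inference time is roughly 81.6% shorter than MILP, 62.8% shorter than DQN, and 53.2% shorter than single-layer D3QN.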
| Scenario | Algorithm | Inference Results |
|---|---|---|
| T33 Maintenance | DQN | Close 1043, 1047, 10431, 10471, 10432, 10472; Open 1032, 138, 10321, 1381, 10322, 1382; Close 103211, 13811, 103212, 13812 |
| T33 Maintenance | D3QN | Close 1043, 1046, 10431, 10461, 10432, 10462; Open 1032, 138, 10321, 1381, 10322, 1382; Close 103211, 13811, 103212, 13812 |
| T33 Maintenance | MILP | Close 1045, 1042, 1046, 10451, 10422, 10461, 10452, 10421, 10462; Open 1032, 138, 10321, 1381, 10322, 1382; Close 103211, 13811, 103212, 13812 |
| T33 Maintenance | D3QN + MT-D3QN | Close 1050, 1046, 10501, 10461, 10502, 10462; Open 1032, 138, 10321, 1381, 10322, 1382; Close 103211, 13811, 103212, 13812 |
| L1 Maintenance | DQN | Close 1043, 1049, 10431, 10491, 10432, 10492; Open 111, 213, 1111, 2131, 1112, 2132; Close 1113, 11111, 21311, 11112, 21312 |
| L1 Maintenance | D3QN | Close 1042, 1044, 10421, 10441, 10422, 10442; Open 111, 213, 1111, 2131, 1112, 2132; Close 11111, 21311, 11112, 21312 |
| L1 Maintenance | MILP | Close 1043, 1045, 1048, 10432, 10452, 10481, 10432, 10451, 10482; Open 111, 213, 1111, 2131, 1112, 2132; Close 1113, 11111, 21311, 11112, 21312 |
| L1 Maintenance | D3QN + MT-D3QN | Close 1045, 1044, 10451, 10441, 10452, 10442; Open 111, 213, 1111, 2131, 1112, 2132; Close 11111, 21311, 11112, 21312 |
| QF123 Maintenance | DQN | Close 1041, 1044, 10411, 10441, 10412, 10442; Open 123, 1231, 1233; Close 12311, 12312 |
| QF123 Maintenance | D3QN | Close 1043, 1044, 1045, 10431, 10441, 10451, 10432, 10442, 10452; Open 123, 1231, 1233; Close 12311, 12312 |
| QF123 Maintenance | MILP | Close 1043, 1048, 1044, 10431, 10481, 10441, 10432, 10482, 10442; Open 123, 1231, 1233; Close 12311, 12312 |
| QF123 Maintenance | D3QN + MT-D3QN | Close 1047, 1045, 10471, 10451, 10472, 10452; Open 123, 1231, 1233; Close 12311, 12312 |

| Scenario | Algorithm | Voltage Deviation (MAE) | Average Line Load Rate (%) | Line Max Load (%) | Average Transformer Load Rate (%) | Transformer Max Load (%) | Total Switch Operations | Degraded State Ratio (%) | Loop Impact Limit Violation Rate (%) |
|---|---|---|---|---|---|---|---|---|---|
| T33 Maintenance | MILP | 0.823 | 40.392 | 41.378 | 64.113 | 78.123 | 18 | 0 | 0 |
| T33 Maintenance | DQN | 0.915 | 45.426 | 48.435 | 67.517 | 86.752 | 15 | 0 | 0 |
| T33 Maintenance | D3QN | 0.847 | 40.1 | 40.241 | 65.376 | 80.537 | 15 | 0 | 0 |
| T33 Maintenance | D3QN + MT-D3QN | 0.798 | 37.142 | 38.232 | 62.896 | 72.154 | 15 | 0 | 0 |
| L1 Maintenance | MILP | 1.642 | 42.392 | 46.978 | 64.113 | 83.123 | 20 | 0 | 0 |
| L1 Maintenance | DQN | 1.897 | 47.426 | 49.456 | 69.517 | 88.723 | 20 | 25 | 50 |
| L1 Maintenance | D3QN | 1.786 | 43.213 | 47.251 | 65.376 | 85.557 | 18 | 0 | 0 |
| L1 Maintenance | D3QN + MT-D3QN | 1.498 | 39.542 | 42.331 | 61.853 | 77.256 | 18 | 0 | 0 |
| QF123 Maintenance | MILP | 1.598 | 17.026 | 46.978 | 63.113 | 83.123 | 20 | 0 | 0 |
| QF123 Maintenance | DQN | 1.783 | 17.047 | 49.456 | 71.517 | 90.723 | 20 | 0 | 50 |
| QF123 Maintenance | D3QN | 1.598 | 17.028 | 47.251 | 65.376 | 85.557 | 18 | 0 | 0 |
| QF123 Maintenance | D3QN + MT-D3QN | 1.498 | 17.013 | 42.331 | 62.896 | 77.762 | 18 | 0 | 0 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhao, M.; Chen, T.; Yuan, J.; Jiang, Y.; Ren, J. An End-to-End Hierarchical Intelligent Inference Model for Collaborative Operation of Grid Switches. Energies 2025, 18, 6574. https://doi.org/10.3390/en18246574

