LLM-Based Dynamic Distribution Network Reconfiguration with Distributed Photovoltaics
Abstract
1. Introduction
- An LLM-based human-machine collaborative framework for dynamic distribution network reconfiguration is proposed. The framework integrates a DRL-based base policy with an LLM-based instruction-supervision layer. It allows the operator to inject diverse, temporary natural language instructions (e.g., limiting switching operations within specific time windows or avoiding particular feeders) into the DNR process without re-modeling or retraining the control policy.
- We design a two-stage decision chain in which the DRL policy first generates candidate reconfiguration actions and the LLM then audits and, when necessary, modifies the remaining safe actions in light of the operator’s instructions. Because the LLM operates only on safety-filtered candidates, all executed actions satisfy the network constraints while remaining aligned with the natural language instructions.
- Hour-level dynamic DNR simulations are conducted on an IEEE 16-bus distribution network and an IEEE 33-bus distribution network with high PV penetration and stochastic uncertainties. The proposed DRL and LLM instruction-supervised framework is compared with a conventional model-based DNR method and a pure DRL-based DNR baseline. The results demonstrate that the proposed method simultaneously improves loss reduction and voltage quality, controls the number of switching operations, and significantly enhances instruction satisfaction, illustrating the practical value of LLM-based human–AI collaboration in active distribution network operation.
2. Framework Description
2.1. DRL-Based Base Policy Layer
2.2. LLM-Based Instruction Supervision Layer
- The action proposed by the DRL agent.
- System information summarizing the current operating condition of the distribution network.
- The operator’s natural language instructions.
2.3. Distribution Network Environment and Information Flow
2.4. Operating Modes
- Autonomous mode: When no natural language instruction is issued, the DRL agent operates as a conventional dynamic DNR controller. Its output is applied to the distribution network (after basic feasibility checks embedded in the environment), and the LLM agent is effectively bypassed.
- Instruction-driven mode: When the operator issues temporary natural language instructions, the LLM agent is activated. The LLM evaluates and revises the DRL-generated action based on both system information and the instructions, and the revised action is implemented. In this mode, human operators can flexibly impose short-term preferences or operational policies without changing the DRL model.
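The two operating modes amount to a simple dispatch rule. The following is a minimal sketch of that rule, not the authors' implementation: `select_action`, `toy_supervisor`, and the list-of-binary-statuses action encoding are hypothetical names introduced only for illustration, and the LLM call is replaced by a toy stand-in.

```python
from typing import Callable, List, Optional

Action = List[int]  # assumed encoding: one binary status per controllable switch


def select_action(
    drl_action: Action,
    instruction: Optional[str],
    supervise: Callable[[Action, str], Action],
) -> Action:
    """Return the action to apply under the two operating modes."""
    # Autonomous mode: no instruction is active, so the LLM layer is bypassed.
    if instruction is None or not instruction.strip():
        return list(drl_action)
    # Instruction-driven mode: the supervisor audits and may revise the action.
    return supervise(drl_action, instruction)


def toy_supervisor(action: Action, instruction: str) -> Action:
    """Toy stand-in for the LLM: open switch 0 if the instruction names it."""
    if "switch 0" in instruction:
        return [0, *action[1:]]
    return list(action)


autonomous = select_action([1, 0, 1], None, toy_supervisor)
supervised = select_action([1, 0, 1], "keep switch 0 open", toy_supervisor)
print(autonomous, supervised)
```

Keeping the supervisor behind a plain callable reflects the point made above: the DRL model itself is unchanged when instructions are added or withdrawn.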
3. Formulation and DRL Solution
3.1. Dynamic DNR Objective and Operational Constraints
3.1.1. Dynamic Multi-Term Objective
3.1.2. Power Balance and DistFlow Model
3.1.3. Operational Limits
3.1.4. Radial Topology Constraints
3.2. MDP Formulation for DRL-Based Dynamic DNR
3.2.1. State Space
3.2.2. Action Space and Topology Projection
3.2.3. State Transition
- Determine from according to (24), and, if necessary, project infeasible choices back to to respect (17)–(21).
- Solve the distribution power-flow model under and the realized load and RES profiles at time , enforcing (6)–(16) and (19)–(21), and obtain , as well as curtailment and shedding levels.
- Sample the next-step exogenous profiles from the scenario model.
- Construct the next state as (25).
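The four steps above can be sketched as a single transition function. This is an illustrative skeleton only: the `State` layout and the three callables (`project`, `solve_power_flow`, `sample_profiles`) are hypothetical placeholders for the topology projection, the power-flow model, and the scenario model, whose actual definitions are given by the paper's equations and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class State:
    profiles: Tuple[float, ...]      # realized load/RES profiles (assumed layout)
    topology: Tuple[int, ...]        # branch statuses, 1 = closed
    measurements: Tuple[float, ...]  # selected electrical quantities
    t: int                           # time index


def transition(
    state: State,
    action: Sequence[int],
    project: Callable[[Sequence[int]], Tuple[int, ...]],
    solve_power_flow: Callable[[Tuple[int, ...], Tuple[float, ...]], Tuple[float, ...]],
    sample_profiles: Callable[[int], Tuple[float, ...]],
) -> State:
    # Step 1: map the raw action to a feasible topology (projection if needed).
    topology = project(action)
    # Step 2: solve the power flow under that topology and the realized profiles.
    measurements = solve_power_flow(topology, state.profiles)
    # Step 3: draw the next-step exogenous profiles from the scenario model.
    next_profiles = sample_profiles(state.t + 1)
    # Step 4: assemble the next state.
    return State(next_profiles, topology, measurements, state.t + 1)


# Toy callables, used only to show the data flow.
s0 = State(profiles=(1.0, 2.0), topology=(1, 1, 0), measurements=(), t=0)
s1 = transition(
    s0,
    [1, 0, 1],
    project=lambda a: tuple(a),
    solve_power_flow=lambda topo, prof: (sum(topo) * sum(prof),),
    sample_profiles=lambda t: (float(t),),
)
print(s1)
```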
3.2.4. Reward Design
3.2.5. Episodic Return
3.3. Soft Actor-Critic for Dynamic DNR
4. LLM-Based Natural Language Instruction Supervision
4.1. Information Flow and Module Roles
4.2. Prompt Design and Instruction Understanding
4.3. LLM-Supervised Action Revision
| Algorithm 1: Deterministic safety filtering for LLM-revised actions |
|---|
| Input: System-condition summary at time ; operator instruction ; DRL action ; network graph data and switch metadata. |
| Output: Executed action . |
| 1. Query the LLM with , the system-condition summary, and . Obtain a compliance judgement and an optional revision . |
| 2. If compliant, set . |
| 3. Otherwise, validate the revised action : |
| 3.1 Parse . If the format is invalid, set . |
| 3.2 Check basic operability (all referenced switches are controllable and available at time ). If failed, set . |
| 3.3 Check radiality consistency using a deterministic graph test (cycle-free and connected topology after applying ). If failed, set . |
| 3.4 If all checks pass, set . |
| 4. Execute after standard electrical feasibility checks. |
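A minimal sketch of the validation steps 2 to 3.4 in Algorithm 1 follows. It rests on several assumptions that are not stated in the text above: actions are dictionaries mapping switch identifiers to 0/1 statuses, a failed check falls back to the DRL action, the network has a single source so that a radial topology is a spanning tree over buses numbered from 0, and the DRL action itself is already radial. The names `is_radial` and `safety_filter` are hypothetical. The LLM query (step 1) and the electrical feasibility checks (step 4) are outside the sketch.

```python
from typing import Dict, List, Optional, Set, Tuple

Branch = Tuple[int, int]


def is_radial(n_buses: int, closed: List[Branch]) -> bool:
    """True if the closed branches form a spanning tree (connected, no loops)."""
    if len(closed) != n_buses - 1:
        return False
    parent = list(range(n_buses))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in closed:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this branch would close a loop
        parent[ru] = rv
    return True  # n-1 loop-free branches on n buses are necessarily connected


def safety_filter(
    drl_action: Dict[str, int],
    compliant: bool,
    revision: Optional[object],
    switches: Dict[str, Branch],   # controllable switch id -> branch endpoints
    available: Set[str],           # switches operable at the current step
    fixed: List[Branch],           # non-switchable, always-closed branches
    n_buses: int,
) -> Tuple[Dict[str, int], bool]:
    """Return (executed_action, fallback_flag)."""
    if compliant:                                   # step 2
        return dict(drl_action), False
    fallback = (dict(drl_action), True)
    # Step 3.1: format check on the parsed revision.
    if not isinstance(revision, dict) or not revision:
        return fallback
    if any(not isinstance(k, str) or v not in (0, 1) for k, v in revision.items()):
        return fallback
    # Step 3.2: every referenced switch must be controllable and available.
    if any(k not in switches or k not in available for k in revision):
        return fallback
    # Step 3.3: radiality test on the topology after applying the revision.
    merged = {**drl_action, **revision}
    closed = list(fixed) + [switches[k] for k, v in merged.items() if v == 1]
    if not is_radial(n_buses, closed):
        return fallback
    return merged, False                            # step 3.4


# Toy 4-bus example: two fixed branches and two controllable switches.
switches = {"s1": (2, 3), "s2": (0, 3)}
fixed = [(0, 1), (1, 2)]
drl = {"s1": 1, "s2": 0}
ok = safety_filter(drl, False, {"s1": 0, "s2": 1}, switches, {"s1", "s2"}, fixed, 4)
loop = safety_filter(drl, False, {"s2": 1}, switches, {"s1", "s2"}, fixed, 4)
print(ok, loop)
```

Because every check is deterministic, the second return value can be logged per step and later aggregated into a fallback statistic.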
4.4. Quantitative Evaluation of Instruction Compliance
4.4.1. Instruction Grounding and Clause Representation
4.4.2. Clause-Level Satisfaction with Partial and Conditional Fulfilment
- (i) Prohibition clause.
- (ii) Switching-budget clause.
- (iii) Performance clause.
- (iv) Conditional clause.
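To make the four clause types concrete, the sketch below shows one plausible way to turn each into a per-step score in [0, 1] with partial and conditional fulfilment. Every formula here is an assumption introduced for illustration (the function names, the ratio-based partial credit, the linear tolerance, and the convention that an untriggered condition counts as satisfied); it should not be read as the paper's definitions.

```python
def prohibition_score(violated: bool) -> float:
    """Assumed: a prohibition is either respected (1) or violated (0)."""
    return 0.0 if violated else 1.0


def budget_score(count: int, budget: int) -> float:
    """Assumed partial credit: full score within budget, budget/count beyond it."""
    if count <= budget:
        return 1.0
    return budget / count


def performance_score(deviation: float, tolerance: float) -> float:
    """Assumed linear decay: 1 at zero deviation, 0 at or beyond the tolerance."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    return max(0.0, 1.0 - deviation / tolerance)


def conditional_score(triggered: bool, consequent_score: float) -> float:
    """Assumed: an untriggered condition is treated as satisfied."""
    return consequent_score if triggered else 1.0


scores = {
    "prohibition": prohibition_score(violated=False),
    "budget": budget_score(count=8, budget=4),
    "performance": performance_score(deviation=0.01, tolerance=0.05),
    "conditional": conditional_score(triggered=False, consequent_score=0.2),
}
print(scores)
```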
4.4.3. Compound Instructions and Priority-Aware Aggregation
4.4.4. Fallback Rate for Auditable Robustness
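As a small illustration, a fallback rate over an evaluation window can be computed as the mean of per-step fallback indicators. The normalisation used below (dividing by the number of steps in the window) is an assumption, and `fallback_rate` is a hypothetical helper name.

```python
from typing import Sequence


def fallback_rate(indicators: Sequence[int]) -> float:
    """Fraction of steps in the window where the filter fell back (indicator = 1)."""
    if not indicators:
        raise ValueError("the evaluation window is empty")
    if any(i not in (0, 1) for i in indicators):
        raise ValueError("indicators must be 0 or 1")
    return sum(indicators) / len(indicators)


# One fallback in a four-step window.
rate = fallback_rate([0, 0, 1, 0])
print(f"{100 * rate:.1f}%")  # prints "25.0%"
```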
5. Case Study
5.1. Case Study on the IEEE 16-Bus Distribution System
5.1.1. System Description and Simulation Settings
5.1.2. Training Performance of DRL Agents on the IEEE 16-Bus Feeder
5.1.3. Baseline Comparison Without Natural Language Instructions
5.1.4. Instruction-Aware Operation with SAC and LLM
5.2. Case Study on the IEEE 33-Bus Distribution System
6. Discussion
6.1. On Stochastic Operating Conditions and Distribution Shift
6.2. Reproducibility of the LLM-Based Supervisory Layer
6.3. Scalability Considerations
- (1) Token-efficient full-state serialization
- (2) Partitioned multi-agent deployment
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Nomenclature
| Indices and Sets | |
| Indices of buses | |
| Time index within the operating horizon | |
| Set of buses | |
| Set of candidate branches | |
| Set of branches equipped with remotely controllable switches, | |
| Set of buses directly downstream of bus in the radial tree | |
| Number of decision steps in the operating horizon | |
| State space of the MDP formulation | |
| Action space of the DRL agent | |
| Replay buffer storing transitions | |
| Bus set associated with performance clause | |
| Parameters | |
| Time step length | |
| Active and reactive demand at bus and time | |
| Active and reactive power of RESs at bus and time | |
| Reference voltage magnitude at bus | |
| Lower and upper bounds of voltage magnitude at bus | |
| Apparent power rating (thermal limit) of branch | |
| Resistance and reactance of branch | |
| Energy price for active power purchased from the upstream grid | |
| Weights of loss, voltage deviation, switching cost | |
| Weights of topology deviation, branch overload, and load shedding | |
| Penalty coefficient for topology deviation on branch | |
| Operation cost coefficient of the controllable switch on branch | |
| Minimum and maximum active and reactive power output of controllable generation | |
| Maximum allowed number of switching operations | |
| Maximum cumulative number of switching operations for branch | |
| Discount factor in the SAC updates | |
| Temperature parameter in SAC that balances expected return | |
| Temperature factor of SAC | |
| LLM sampling temperature | |
| top-p | LLM nucleus sampling parameter |
| Variables | |
| Active power purchased from the upstream grid at time | |
| Total active power loss of the distribution network at time | |
| Active and reactive power output of controllable generation at bus and time | |
| Active load shedding at bus and time | |
| Active and reactive power flow on branch at time | |
| Voltage magnitude at bus and time | |
| Binary energization status of branch at time | |
| Vectors collecting nodal active and reactive demands at time | |
| Vectors collecting nodal RES injections at time | |
| Vector of selected branch current and voltage magnitudes at time | |
| Voltage deviation index | |
| Switching cost index | |
| Topology deviation index | |
| Branch-level overload/violation-related penalty | |
| MDP state at time | |
| Multi-binary action vector at time | |
| Immediate reward at time | |
| Expected discounted return associated with policy | |
| Stochastic policy of the SAC agent with parameters | |
| Soft Q-functions of the SAC agent | |
| Parameter vector of the policy network | |
| Policy objective used to update | |
| Active natural language instruction at time | |
| Priority tiers of clauses (hard prohibitions/conditional or safety-relevant rules/soft preferences) | |
| Actionable indicator | |
| Actionable rate over window | |
| Per-step satisfaction score of clause | |
| Per-step satisfaction score of clause | |
| Violation indicator for the prohibition clause | |
| Switching count and budget for clause | |
| Deviation metric for the performance clause | |
| Trigger indicator for the conditional clause | |
| Tier-wise compliance vector | |
| Tier-wise satisfaction | |
| Fallback indicator | |
| Fallback rate over window |

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Number of buses | 16 | Number of sectionalizing switches | 13 |
| Number of branches | 16 | Number of tie switches | 4 |
| Base voltage | 12.66 kV | Number of PV units | 6 |
| Base power | 10 MW | Buses with PV units | 4, 6, 8, 11, 13, 15 |

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Actor Learning Rate | 1 × 10⁻⁴ | Buffer Size | 10,000 |
| Critic Learning Rate | 1 × 10⁻⁴ | Minimal Size | 200 |
| Alpha Learning Rate | 1 × 10⁻⁴ | Discount Factor | 0.99 |
| Episode Number | 2000 | Temperature Factor | 0.05 |
| Batch Size | 64 | Target Entropy | −1.0 |
| LLM Model | Moonshot-v1-32k | Context Window | 32k |
| Top-p | 0.7 | Temperature | 0.3 |
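
For reproducibility, the SAC and LLM settings listed in the table above can be gathered into a single configuration object. The sketch below only restates the tabulated values; the class and field names are hypothetical, and the comment on `minimal_size` is an interpretation rather than something the table states.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SACConfig:
    actor_lr: float = 1e-4
    critic_lr: float = 1e-4
    alpha_lr: float = 1e-4
    episodes: int = 2000
    batch_size: int = 64
    buffer_size: int = 10_000
    minimal_size: int = 200        # assumed: transitions stored before updates begin
    discount_factor: float = 0.99
    temperature_factor: float = 0.05
    target_entropy: float = -1.0


@dataclass(frozen=True)
class LLMConfig:
    model: str = "Moonshot-v1-32k"  # name as written in the table
    context_window: str = "32k"     # kept as a label; exact token count not given
    top_p: float = 0.7
    temperature: float = 0.3


sac_cfg = SACConfig()
llm_cfg = LLMConfig()
print(asdict(sac_cfg))
print(asdict(llm_cfg))
```

Freezing the dataclasses prevents accidental edits to the settings during a run.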

| Metric | Model-Based | Heuristic | AC | SAC |
|---|---|---|---|---|
| Daily purchasing cost (10³ $) | 18.72 | 19.21 | 20.05 | 18.79 |
| Daily energy loss (MWh) | 9.50 | 10.65 | 11.95 | 9.64 |
| Avg. voltage deviation (p.u.) | 0.0287 | 0.0412 | 0.0605 | 0.0293 |
| Max. voltage deviation (p.u.) | 0.0450 | 0.0635 | 0.0803 | 0.0474 |
| Daily switching operations | 10 | 12 | 14 | 12 |
| Computation time per day (s) | 12.4 | 15.7 | 0.18 | 0.19 |

| Category | Metric | SAC | SAC and Rule Supervisor | SAC and LLM |
|---|---|---|---|---|
| Conventional system metrics | Daily purchasing cost (10³ $) | 18.79 | 18.83 | 18.83 |
| | Daily energy loss (MWh) | 9.64 | 9.88 | 9.72 |
| | Load shedding (MWh) | 0.00 | 0.00 | 0.00 |
| | Branch overloading cost (10³ $) | 0.00 | 0.00 | 0.00 |
| | Avg. voltage deviation (p.u.) | 0.0293 | 0.0321 | 0.0295 |
| | Max. voltage deviation (p.u.) | 0.0474 | 0.0498 | 0.0478 |
| | Daily switching operations | 12 | 12 | 12 |
| | Computation time per day (s) | 0.09 | 0.26 | 0.30 |
| Natural language instruction satisfaction metrics | Actionable rate | - | 1 | 1 |
| | Violation hours of line (5,11) within (h) | 6 | 0 | 0 |
| | Tier-1 prohibition compliance (%) | 14.3 | 100 | 100 |
| | Fallback rate (%) | - | 0 | 0 |

| Metric | Model-Based | AC | SAC |
|---|---|---|---|
| Daily purchasing cost (10³ $) | 2.74 | 2.82 | 2.76 |
| Daily energy loss (MWh) | 4.80 | 5.20 | 4.86 |
| Avg. voltage deviation (p.u.) | 0.024 | 0.027 | 0.025 |
| Max. voltage deviation (p.u.) | 0.066 | 0.072 | 0.068 |
| Daily switching operations | 10 | 9 | 8 |
| Computation time per day (s) | 42.7 | 0.62 | 0.58 |

| Category | Metric | SAC | SAC and LLM |
|---|---|---|---|
| Conventional system metrics | Daily purchasing cost (10³ $) | 2.76 | 2.80 |
| | Daily energy loss (MWh) | 4.86 | 4.98 |
| | Avg. voltage deviation (p.u.) | 0.025 | 0.024 |
| | Min. voltage at buses 29–33 (p.u.) | 0.936 | 0.953 |
| | Hours with V < 0.95 p.u. at buses 29–33 | 7 | 2 |
| | Daily switching operations | 8 | 9 |
| | Computation time per day (s) | 0.6 | 2.4 |
| Natural language instruction satisfaction metrics | Actionable rate | - | 1 |
| | Triggered hours within (overload or voltage violation) (h) | 1 | 1 |
| | Non-essential lateral switching hours within (h) | 2 | 0 |
| | Tier-2 conditional lateral compliance (%) | 60 | 100 |
| | Tier-3 voltage-quality compliance (%) | 72.0 | 80.0 |
| | Fallback rate (%) | - | 0 |

| Metric | Normal | PV Forecast Error | Cloud-Transient |
|---|---|---|---|
| Daily purchasing cost (10³ $) | 2.76 | 2.82 | 2.91 |
| Daily energy loss (MWh) | 4.86 | 4.93 | 4.98 |
| Avg. voltage deviation (p.u.) | 0.025 | 0.031 | 0.044 |
| Max. voltage deviation (p.u.) | 0.068 | 0.082 | 0.095 |
| Daily switching operations | 8 | 8 | 9 |
| Computation time per day (s) | 0.58 | 0.66 | 0.61 |

| Category | Metric | Repeatability Result (Mean ± Std) |
|---|---|---|
| Conventional system metrics | Daily purchasing cost (10³ $) | 2.64 ± 0.12 |
| | Daily energy loss (MWh) | 4.80 ± 0.07 |
| | Avg. voltage deviation (p.u.) | 0.041 ± 0.008 |
| | Min. voltage at buses 29–33 (p.u.) | 0.953 ± 0.012 |
| | Hours with V < 0.95 p.u. at buses 29–33 | 2.0 ± 0.0 |
| | Daily switching operations | 6.3 ± 2.5 |
| | Computation time per day (s) | 2.404 ± 0.115 |
| Natural language instruction satisfaction metrics | Actionable rate | 1 ± 0.0 |
| | Triggered hours within (overload or voltage violation) (h) | 0 ± 0.0 |
| | Non-essential lateral switching hours within (h) | 0 ± 0.0 |
| | Tier-2 conditional lateral compliance (%) | 100 ± 0.0 |
| | Tier-3 voltage-quality compliance (%) | 91.7 ± 0.0 |
| | Fallback rate (%) | 0 ± 0.0 |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhang, H.; Zhou, H. LLM-Based Dynamic Distribution Network Reconfiguration with Distributed Photovoltaics. Electronics 2026, 15, 566. https://doi.org/10.3390/electronics15030566
