1. Introduction
As the critical link between the power grid and end-users, distribution systems are undergoing a profound transformation characterized by “high penetration of renewable energy and high levels of power electronics” [1]. Driven by China’s carbon neutrality goals, the installed capacity of distributed generation (DG) has surged; renewable capacity is projected to exceed 1.7 TW by 2025, raising its share by 16 percentage points compared to 2019 [2]. Meanwhile, total electricity consumption reached 9.22 trillion kWh in 2023 and is expected to rise to 9.8 trillion kWh in 2024, placing unprecedented demands on system reliability and flexibility [3]. The randomness and volatility of wind and photovoltaic (PV) generation have fundamentally disrupted the traditional radial, unidirectional power flow, transforming distribution grids into active, bidirectional “source-grid-load-storage” coupled systems [4,5]. This shift severely challenges rapid fault recovery and self-healing control: approximately 80% of all outages originate from distribution system failures, and fault-related outages account for 40.16% of total outage duration [2,4]. Extreme weather events such as typhoons, ice storms, and heatwaves further highlight the urgency of resilience, as these high-impact low-probability (HILP) events can cause widespread, long-duration outages with catastrophic consequences [6]. Traditional fault restoration methods are therefore increasingly inadequate for modern active distribution networks. In response, recent policies have explicitly prioritized the construction of safe, efficient, low-carbon, flexible, and intelligent distribution systems with enhanced self-healing capabilities [7].
Traditional fault restoration relies heavily on centralized decision-making, whose limitations have grown pronounced under the “dual-high” context. First, the exponential increase in operating states driven by high DG penetration escalates computational complexity, challenging real-time requirements [8]. While mixed-integer linear programming (MILP) and mixed-integer second-order cone programming (MISOCP) can provide globally optimal solutions, they suffer from the curse of dimensionality in large networks, with solving times rising exponentially with system size [9]. Metaheuristics such as genetic algorithms and particle swarm optimization (PSO) offer faster computation but are prone to premature convergence and lack rigorous convergence guarantees [10]. Expert systems, though intuitive, rely on manually curated rule bases that are difficult to adapt to dynamic topologies and uncertain DG outputs [4,9]. Second, deterministic models fail to adequately capture the stochastic nature of DG and load fluctuations, leading to restoration plans that may be infeasible or suboptimal under real-world variability [11]. Third, centralized architectures demand high communication bandwidth and reliability, are vulnerable to single-point failures, and raise data privacy and security concerns in multi-stakeholder environments [12].
In this context, the concept of grid-based operation offers a promising paradigm for distribution network fault recovery. The three-layer spatial information grid architecture [13] and grid GIS heterogeneous resource integration [14] established the theoretical and technical basis for dynamic power grid modeling. Gao’s open grid service system advanced dynamic grid standardization, though its fixed-boundary assumption limited adaptability to post-fault topology changes [15]. Wang et al. identified three bottlenecks of static partitioning and stressed the need for dynamic operational grids (DOGs) to accommodate DG fluctuations and real-time control [16]. Zhao et al. integrated dynamic grid construction with load restoration but overlooked the synergy between topology reconfiguration and source-load matching [17]. Dong et al.’s “grid-subgrid-node” three-level framework provided architectural support for cross-level resource sharing and coordinated restoration in DOGs [18]. The deployment of flexible interconnection devices such as soft open points (SOPs) has further enhanced operational flexibility by enabling continuous power flow control between feeders, although high investment costs and complex control remain barriers [19].
At the algorithm level, adapting intelligent optimization to the dynamic and uncertain nature of DOGs is a key focus. Ji et al. introduced multi-agent immune algorithms for self-healing but could not address the real-time bottleneck in large-scale dynamic grids [20]. The finite-state machine model of Binb et al. lacks economic optimization and struggles with multi-objective restoration requirements [21]. Fan’s multi-timescale load restoration and Xu’s two-level optimized scheduling provide important references for hierarchical restoration and source-storage coordination in DOGs [22,23]. Yuan’s improved hippopotamus optimization algorithm offers an efficient solution for the multi-objective DOG partitioning problem [24]. Deep reinforcement learning (DRL) has shown promise in handling high-dimensional state spaces and real-time uncertainty, with algorithms such as DQN, graph reinforcement learning, and SAC applied to restoration tasks; however, DRL requires extensive training data, lacks interpretability, and can generate unsafe actions that violate constraints [25]. Uncertainty handling remains critical: stochastic programming depends heavily on accurate prior distributions, robust optimization can be overly conservative, and chance-constrained programming balances robustness and economy but is computationally expensive in high-dimensional settings [26].
Moreover, mobile flexibility resources have opened new avenues for post-disaster restoration. Mobile energy storage systems (MESS) and electric vehicles (EVs) provide spatial–temporal flexibility by relocating to affected areas, complementing stationary resources [27]. Song et al. proposed a bi-level restoration method coordinating switches, SOPs, and MESS, significantly improving restoration efficiency [28]. Kang et al. extended this to DC distribution networks, exploiting synergies between EV charging/discharging and SOP-based reconfiguration [29]. Yang et al. combined SOP control with maintenance scheduling to optimize both repair sequences and restoration actions [30]. On the demand side, integrated demand response (IDR) programs that leverage flexible loads can adjust load profiles to match available generation, enhancing restoration [31]. Ge et al. demonstrated that combining IDR with network reconfiguration and emergency power supplies markedly improves the resilience of park-level distribution networks [32]. Despite these advances, most existing grid-based restoration strategies treat grid boundaries as static or semi-static, failing to fully exploit the synergies between dynamic grid partitioning and the spatial–temporal flexibility of mobile and demand-side resources.
Thus, three major challenges persist in grid-based operation for new distribution systems. First, traditional optimization algorithms lack sufficient adaptability under dynamic grid partitioning and multi-objective constraint coupling, struggling to meet both the stringent real-time requirements of critical load restoration and the economic efficiency of general load restoration. Second, the control-execution coordination in hierarchical architectures remains immature, with temporal discrepancies between centralized optimization decisions and distributed local execution frequently causing control errors and suboptimal performance. Third, collaborative research on multi-timescale control and dynamic topological reconfiguration is lacking, and existing methods have not effectively addressed transient instability risks during islanded-to-grid-connected mode transitions.
To address these research gaps, this paper proposes a rapid fault recovery and self-healing strategy for distribution systems based on dynamic operational grids. The strategy is designed to advance beyond existing restoration paradigms. Unlike traditional rule-based expert systems that struggle with adaptability under high distributed generation penetration, or multi-agent algorithms that often encounter real-time computational and convergence bottlenecks in large-scale networks, the proposed approach establishes an efficient hierarchical balance between speed and economy. Furthermore, compared to conventional dynamic microgrid formations that frequently overlook the deep synergy between network reconfiguration and source-load matching, this framework secures topologically optimal supply paths via the Floyd algorithm and rapid, localized resource allocation through dynamic programming (DP) at the primary layer, while an improved PSO algorithm optimizes the restoration of the remaining loads at the secondary layer. The main contributions are: (1) establishing a weighted graph theory model for dynamic operational grids and its construction process, thereby providing the theoretical framework for rapid system reconfiguration after a fault; (2) designing a two-layer recovery mechanism combining primary and secondary restoration layers, achieving an effective balance between recovery speed and economic efficiency; (3) combining the Floyd algorithm with an optimized adaptive DP algorithm in the primary restoration layer to generate rapid power supply paths for critical loads; (4) improving the traditional particle swarm optimization algorithm in the secondary restoration layer through adaptive weights, converged particles, and a Lévy flight perturbation mechanism, which enhances the algorithm’s convergence performance and global search capability. Case studies systematically verify the effectiveness and superiority of the proposed strategy under various fault scenarios.
3. Emergency Control Strategy Based on Particle Preprocessing and Stepwise Optimization
In the grid-based operation of modern distribution systems, rapid self-healing is key to suppressing fault propagation. As a core component of fault response, the emergency control strategy aims to: rapidly isolate the fault, reconfigure adjacent static grids to maintain their self-balance, and construct a dynamic operating grid capable of supplying at least all Level 1 loads. This strategy minimizes the impact of the fault and rapidly restores power supply by jointly optimizing power generation output and network topology within the faulted area. This is essentially a high-dimensional mixed-integer nonlinear programming problem. This section proposes a stepwise optimization method based on particle preprocessing to enhance solution efficiency and stability under emergency conditions.
3.1. Particle Structure Design and Preprocessing
The decision space for emergency control strategies is encoded using a composite particle vector. As shown in Figure 2, the particle consists of three groups of dimensions, whose sizes are set by the number of power plants and the number of branches:
- (1)
Active Power Output Control Dimensions: The first group, one dimension per power plant, controls the active power output percentage of each distributed power source; each particle value maps to a percentage within the source’s valid range.
- (2)
Reactive Power Output Control Dimensions: The next group, again one dimension per power plant, controls the reactive power output percentage of each distributed power source within its valid range.
- (3)
Branch State Control Dimensions: The final group, one dimension per branch, controls the on/off state of each branch. A step function, defined relative to the upper and lower bounds of the particle values, maps the continuous particle values to discrete on/off states.
This encoding method naturally integrates continuous optimization algorithms with discrete network topology decisions.
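The encoding described above can be illustrated with a minimal Python sketch. The function name `decode_particle`, the normalized value range [0, 1], and the midpoint threshold used for the step function are assumptions made for illustration; the paper's exact ranges and threshold are not given in this text.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DecodedParticle:
    p_percent: List[float]    # active power output share per source
    q_percent: List[float]    # reactive power output share per source
    branch_closed: List[int]  # 1 = branch closed, 0 = branch open


def decode_particle(x: Sequence[float], n_sources: int, n_branches: int,
                    lower: float = 0.0, upper: float = 1.0) -> DecodedParticle:
    """Split a composite particle into its three groups of dimensions.

    Assumption: all particle values share the bounds [lower, upper] and the
    step function switches at the midpoint of those bounds.
    """
    expected = 2 * n_sources + n_branches
    if len(x) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(x)}")
    clipped = [min(max(v, lower), upper) for v in x]
    threshold = 0.5 * (lower + upper)
    p = clipped[:n_sources]
    q = clipped[n_sources:2 * n_sources]
    # Step function: continuous value -> discrete on/off branch state.
    s = [1 if v >= threshold else 0 for v in clipped[2 * n_sources:]]
    return DecodedParticle(p, q, s)


if __name__ == "__main__":
    particle = [0.2, 0.9, 0.5, 0.1, 0.7, 0.3, 1.4]
    print(decode_particle(particle, n_sources=2, n_branches=3))
```

Because the topology is derived from continuous values, any continuous optimizer can search the full vector while the network model only ever sees valid on/off states.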
3.2. Step-by-Step Optimization Solution Strategy
Direct optimization searches on high-dimensional mixed particles are prone to convergence difficulties and excessive computational time due to strong variable coupling. To address this issue, a step-by-step preprocessing solution strategy is proposed, in which the particle is decomposed into its active power dimension, reactive power dimension, and branch connection dimension, and the optimization algorithm is applied to these components in turn. The specific step-by-step optimization procedure is as follows:
- (1)
Step 1 Optimization: Fixed Topology, Optimized Output. Keep the initial grid topology after the fault unchanged, and use the optimization algorithm to find the optimal power source output combination that minimizes power imbalance and voltage overshoot.
- (2)
Step 2 Optimization: Fixed Output, Optimized Topology. With the optimal output obtained in Step 1 held fixed, the grid topology is optimized to identify network connection configurations that better support this output scheme and restore more loads, yielding the optimized topology.
- (3)
Step 3 Joint Fine-Tuning. Using the high-quality solution obtained in the first two steps as the initial point, a comprehensive joint optimization fine-tuning is performed to obtain a final solution with superior overall performance.
This strategy effectively reduces the complexity of a single optimization run and, through information exchange between steps, provides an initial solution close to the optimal region for the final optimization, significantly improving the efficiency and reliability of emergency decision-making.
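The three-step procedure can be sketched as follows. This is only an illustration under stated assumptions: `local_search` is a hypothetical stand-in for "the optimization algorithm" (a simple accept-if-better random perturbation, not the paper's optimizer), particle values are assumed to lie in [0, 1], and the score function is assumed to be maximized.

```python
import random
from typing import Callable, List, Sequence, Tuple

Score = Callable[[Sequence[float]], float]


def local_search(score: Score, start: Sequence[float], free: Sequence[int],
                 rng: random.Random, iters: int = 200,
                 step: float = 0.2) -> List[float]:
    """Hypothetical stand-in optimizer: perturb only the `free` indices and
    keep a candidate when it scores strictly higher. Values stay in [0, 1]."""
    best = list(start)
    best_score = score(best)
    for _ in range(iters):
        cand = list(best)
        for i in free:
            cand[i] = min(1.0, max(0.0, cand[i] + rng.uniform(-step, step)))
        cand_score = score(cand)
        if cand_score > best_score:
            best, best_score = cand, cand_score
    return best


def stepwise_optimize(score: Score, x0: Sequence[float], n_sources: int,
                      n_branches: int, seed: int = 0
                      ) -> Tuple[List[float], List[float], List[float]]:
    """Return the solutions after Step 1, Step 2, and Step 3."""
    rng = random.Random(seed)
    output_idx = list(range(2 * n_sources))  # active + reactive dimensions
    topology_idx = list(range(2 * n_sources, 2 * n_sources + n_branches))
    x1 = local_search(score, x0, output_idx, rng)    # Step 1: topology fixed
    x2 = local_search(score, x1, topology_idx, rng)  # Step 2: output fixed
    x3 = local_search(score, x2, output_idx + topology_idx, rng,
                      step=0.05)                     # Step 3: joint fine-tuning
    return x1, x2, x3


if __name__ == "__main__":
    def toy_score(x: Sequence[float]) -> float:
        return -sum((v - 0.7) ** 2 for v in x)

    start = [0.1, 0.2, 0.3, 0.4, 0.5, 0.9]
    for stage in stepwise_optimize(toy_score, start, n_sources=2, n_branches=2):
        print(round(toy_score(stage), 4))
```

Each stage starts from the previous stage's result and only accepts improvements, so the score never degrades between steps; the smaller perturbation in Step 3 reflects its role as fine-tuning near an already good solution.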
3.3. Objective Function
The emergency control strategy aims to generate a safe, stable, and resilient initial recovery plan for dynamically operating grids. Its objective function employs a structured, 100-point comprehensive scoring system to systematically guide the optimization algorithm in searching for the global optimal solution. This system moves away from the traditional approach of single-constraint penalization and instead establishes a multidimensional evaluation framework that integrates power supply reliability, operational safety, and system resilience. The mathematical description of the objective function is as follows:
In the equation, the final target score is built from the step function, the power output within the dynamic operating grid after partitioning, the load capacity within the dynamic operating grid after partitioning, the target score for the static operating grid after partitioning (out of 100 points), and the weights of the respective targets. The static-grid score consists primarily of three parts: power quality assessment, power flow out-of-limit risk assessment, and risk resilience assessment.
- (1)
Power quality assessment. This metric consists of three components, online operation rate, voltage quality, and frequency quality, each scaled by a weighting coefficient that adjusts the importance of the indicator in the overall evaluation. The online operation rate reflects the proportion of system nodes with normal power supply, that is, the number of nodes with both voltage and frequency within acceptable ranges relative to the total number of system nodes. The voltage quality indicator evaluates the concentration of the voltage distribution in mean-standard deviation form, using the average voltage of all nodes and the standard deviation of node voltages; a higher ratio indicates greater overall voltage stability and lower dispersion. The frequency quality indicator measures the deviation of the system’s actual operating frequency from its rated frequency.
- (2)
Power flow out-of-limit risk assessment. This indicator is computed from the mean and the standard deviation of the branch currents. The greater the difference between the upper and lower limits of the branch current, the larger the indicator becomes, and the lower the power supply quality and economic efficiency of the power grid.
- (3)
Risk resilience assessment. This indicator is computed from the total maximum capacity of the energy storage system within the static operating grid, the actual active and reactive power outputs of each distributed power source (power station), and the total apparent power currently output by all distributed power sources within the grid. Together these give the remaining available capacity (apparent power) of the energy storage system after meeting the current output of all distributed power sources. The closer the value is to 1, the greater the remaining capacity of the energy storage system and the stronger its risk resilience.
Unlike the macroscopic profile weights in Equation (3), the selection logic for these coefficients is strictly state-dependent, adjusting dynamically to real-time fault severity. When post-fault voltage stability is highly compromised, the weight on power quality is maximized; conversely, during widespread regional outages with high uncertainty, the weight on risk resilience is prioritized to aggressively preserve energy storage margins.
The objective function signifies that the partitioned dynamic operating grid possesses sufficient power generation capacity to restore the primary loads within the grid, while the partitioned static operating grid can achieve self-balanced operation. Guided by this comprehensive evaluation function, the emergency control strategy can identify a preliminary restoration plan that is optimally balanced in terms of power quality, operational safety, and future risk resilience, while satisfying the hard constraints of primary load supply and system safety and stability.
4. A Rapid Self-Healing Recovery Method for Grid-Based Operation Following Faults
After successfully constructing the dynamic operational grid, the core task shifts to how to efficiently restore power supply to loads within the grid. This paper proposes a two-layer load restoration strategy, which divides the restoration process into two stages: the primary restoration layer focuses on the rapid restoration of critical loads, emphasizing speed and reliability; the secondary restoration layer is dedicated to maximizing the restoration of remaining loads, while balancing system operational economy.
The distinct conceptual advancement of this framework, compared to conventional two-stage or hierarchical restoration methods, lies in its strict mathematical coupling and boundary-locking mechanism rather than a loose sequential execution. Traditional hierarchical methods typically decouple the restoration process based on temporal scales or voltage levels, often allowing subsequent optimization stages to iteratively alter previously established power paths, which introduces computational delays and reliability risks for critical loads. In contrast, our proposed architecture utilizes a strict “Topological and Capacity Locking” paradigm. The primary layer generates a deterministic Source-Load (SL) matrix via the Floyd algorithm and allocates resources via an optimized adaptive DP algorithm. Crucially, the resulting power supply backbone acts as an immutable hard boundary, both topologically and dynamically (as formulated via the coupling constraints in Section 4.2), for the secondary layer’s optimization space. Consequently, the secondary layer’s improved PSO algorithm explores a drastically reduced, strictly safe feasible region, ensuring absolute uninterrupted supply for Level 1 loads while fundamentally eliminating the convergence uncertainties traditionally associated with high-dimensional metaheuristic searches.
4.1. Primary Restoration Layer
The primary restoration layer is tasked with ensuring power supply to Level 1 (primary) loads. Given that optimization algorithms may take a long time to execute in emergency situations, this layer relies more heavily on predefined recovery strategies and rapid response mechanisms.
4.1.1. Shortest Path Search Based on the Floyd Algorithm
The primary restoration layer must first identify the optimal electrical path from Level 1 load points to power source points. This paper employs the Floyd–Warshall algorithm to solve the shortest path problem between all vertex pairs in a weighted graph. Based on dynamic programming, this algorithm efficiently calculates the shortest path length between any two points and the nodes traversed along that path. The algorithm flowchart is shown in Figure 3. To clarify the operational process depicted in Figure 3, the algorithm consists of several sequential blocks:
- (1)
Initialization block: The algorithm first initializes a distance matrix to store the initial edge weights between directly connected nodes, alongside a path matrix to track the sequence of intermediate nodes.
- (2)
Nested iteration blocks: The core computational structure is a triple-nested loop that systematically iterates through all possible intermediate nodes ($k$), starting nodes ($i$), and destination nodes ($j$).
- (3)
Condition and update block (Relaxation): Within the nested loops, the algorithm evaluates whether the path from node $i$ to node $j$ via intermediate node $k$ yields a smaller total weight than the currently recorded path. If the condition is met, both the distance and path matrices are updated.
- (4)
Termination block: The iterative process concludes when all nodes have been evaluated as potential intermediate points, outputting the globally optimal shortest paths for the primary restoration layer.
The state transition equation is as follows:
$$d_{ij}^{(k)} = \min\left(d_{ij}^{(k-1)},\; d_{ik}^{(k-1)} + d_{kj}^{(k-1)}\right)$$
In the equation, the element $d_{ij}^{(k)}$ represents the shortest path weight from node $i$ to node $j$, where the intermediate nodes belong only to the set $\{1, 2, \ldots, k\}$. The choice of edge weights affects path selection and interpretation, and can be categorized into the following cases:
In the formula, the first category comprises edge weights that are available without a power flow calculation, namely the equivalent resistance and the equivalent impedance of the branch. The second category comprises edge weights that can only be obtained after performing the power flow calculation; they are expressed in terms of the branch current, the branch equivalent reactance, the branch reference voltage and terminal voltage, the branch power factor, and the power angle. Within this second category, the first equation represents the voltage drop, while the second equation represents the line’s transmission capacity. Depending on the specific circumstances, different edge weights can be selected as the basis for finding the minimum restoration path for a primary load.
Regarding algorithm selection and scalability, the standard Floyd-Warshall algorithm possesses a time complexity of $O(n^3)$, where $n$ is the number of nodes. While this typically poses severe computational bottlenecks in centralized, large-scale distribution networks, it is well-suited for the proposed dynamic operational grid framework. The DOG architecture inherently decentralizes large-scale systems by partitioning the extensive fault-affected areas into smaller, autonomous local grids. By executing the Floyd algorithm exclusively within the boundaries of these dimensionally reduced local grids, the effective number of nodes ($n$) is significantly constrained. This architectural “divide-and-conquer” approach circumvents the algorithmic scalability bottleneck, allowing the Floyd algorithm to guarantee mathematically optimal shortest paths within milliseconds. Consequently, this method is suitable for large-scale systems, as the computational burden does not grow with the cube of the total system size, but rather depends on the size of the localized dynamic grids.
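The blocks described in this subsection correspond to the textbook Floyd-Warshall procedure, sketched below in Python with a next-hop matrix for path backtracking. The undirected-branch assumption, the function names, and the example weights (standing in for, say, branch resistances) are illustrative choices rather than details taken from the paper.

```python
import math
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
NextHop = List[List[Optional[int]]]


def floyd_warshall(n: int, edges: Dict[Tuple[int, int], float]
                   ) -> Tuple[Matrix, NextHop]:
    """All-pairs shortest paths on an undirected weighted graph.

    Returns the distance matrix and a next-hop matrix for reconstruction.
    """
    # Initialization block: direct edge weights and trivial next hops.
    dist: Matrix = [[math.inf] * n for _ in range(n)]
    nxt: NextHop = [[None] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        nxt[i][i] = i
    for (u, v), w in edges.items():
        if w < dist[u][v]:
            dist[u][v] = dist[v][u] = w
            nxt[u][v], nxt[v][u] = v, u
    # Nested iteration blocks: intermediate k, start i, destination j.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:  # relaxation
                    dist[i][j] = via_k
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def shortest_path(nxt: NextHop, src: int, dst: int) -> List[int]:
    """Backtrack the node sequence from src to dst; [] if unreachable."""
    if nxt[src][dst] is None:
        return []
    path = [src]
    while src != dst:
        src = nxt[src][dst]
        path.append(src)
    return path


if __name__ == "__main__":
    branches = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 5.0, (2, 3): 1.0}
    dist, nxt = floyd_warshall(5, branches)  # node 4 is isolated
    print(dist[0][3], shortest_path(nxt, 0, 3))
```

Storing the next hop instead of the full path keeps memory at $O(n^2)$ and lets a supply path be rebuilt on demand once source-load pairs have been chosen.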
4.1.2. SL Source-Load Matrix and Optimized Adaptive DP Algorithm
Using the Floyd algorithm, this paper obtains the shortest path weights from all power source nodes to all primary load nodes and stores them in the SL (Source-Load) matrix, as shown in Figure 4. The rows of this matrix represent primary load nodes, the columns represent power source nodes, and the element $SL_{ij}$ is the shortest path weight from load node $i$ to power source node $j$.
Next, the problem of allocating power source capacity among different primary load nodes must be addressed. This paper models the core load restoration problem as a multi-constraint resource allocation problem and solves it using an optimized adaptive DP algorithm, whose flowchart is shown in Figure 5.
Suppose each primary load is a special knapsack, and distributed power sources are special items. The items must be packed into the knapsacks, with the requirement that the volume of the items exceeds the capacity of each knapsack. If the weights of the SL matrix are treated as values, the goal is to minimize the total value while allowing items to overflow from each knapsack. As items, distributed power sources are measured in units of capacity; therefore, each item can be considered as 1 unit of capacity. In the calculations, energy losses are represented by the weights of the SL matrix. Unlike the standard knapsack problem, the source-load matching problem is subject to the following constraints:
- (1)
On each line, the power source’s capacity must allow for a certain degree of frequency and voltage regulation while ensuring the basic power supply for primary loads. The constraint relates the maximum allowable transmission capacity of the branch (line) to the total apparent power of the load actually flowing through that branch, scaled by a capacity factor that is selected according to frequency and voltage regulation requirements. It is worth noting that, rather than employing computationally intensive stochastic mathematical models, the short-term uncertainties of distributed generation and load variability are implicitly accommodated through this capacity factor. By reserving a deterministic 10–30% capacity margin, the primary restoration layer avoids the severe computational delays inherent in probabilistic sampling, while structurally ensuring robust power supply against real-time source-load fluctuations during the emergency response phase.
- (2)
A single line may contain multiple distributed power sources (items) and primary loads (knapsacks), and the capacity of a power source can be split.
- (3)
The value of the source-load matching problem primarily lies in its economic efficiency. In this paper, the weights of the SL matrix are used to represent the value of a distributed power source per unit capacity of a primary load: the element at row $i$ and column $j$ of the source-load matrix (SL matrix) represents the value of the $j$-th distributed power source’s capacity to the $i$-th primary load.
The value of capacity to a primary load is equivalent to the benefits of reliability and economy; these factors must be considered to ensure that the algorithm yields results with the highest economic benefit and the best power quality. When there are no primary loads, the total value is 0, which serves as the boundary condition of the recursion.
Assuming there are $m$ distributed power sources and $n$ primary loads in the dynamic operating grid, the state transition equation is written over the state, a state variable representing the maximum load capacity of each primary load, and the unit capacity, which serves as the algorithm’s step size.
Strictly speaking, the proposed optimized adaptive DP algorithm provides an approximate solution rather than a mathematically guaranteed global optimum. Given that the multi-constrained source-load matching problem is inherently NP-hard, searching for an absolute global optimum would incur unacceptable computational delays during emergency fault scenarios. By selecting the local optimum (the path with the lowest electrical weight) at each capacity step-size iteration, this algorithm functions as a greedy-DP approximation. Consequently, its theoretical guarantee lies not in absolute economic optimality, but rather in strict computational boundedness and absolute supply feasibility: it guarantees the allocation of sufficient capacity for 100% of the Level 1 critical loads within millisecond-level timeframes. Once this allocation is completed, the algorithm outputs the specific combination of power source nodes matched to each primary load; by then utilizing the Floyd algorithm to backtrack and determine the specific routing, a complete and explicit set of physical power supply paths for the primary restoration layer is formally established.
To clarify the execution logic illustrated in Figure 5, the step-by-step implementation of the optimized adaptive DP algorithm is detailed as follows:
Step 1: Step Size Determination. Compare the capacities of all available distributed generation (DG) units and primary loads to select the minimum capacity value as the allocation step size.
Step 2: Optimal Pair Selection. Identify the minimum electrical weight in the SL matrix to determine the current optimal match. Record its row index (representing the primary load) and column index (representing the DG), and update the DP state matrix accordingly.
Step 3: Capacity Update. Deduct the step size from both the remaining capacity of the DG and the unfulfilled demand of the primary load, and proceed to the next iteration with the updated values.
Step 4: Matrix Reduction. If a primary load’s demand is fully met or a DG’s capacity is exhausted (reduced to 0), remove the corresponding row or column from the SL matrix to exclude it from further allocations.
Step 5: Termination and Backtracking. Constrained by the dynamic grid generation requirements, the total DG capacity inherently exceeds the primary load demand. The iteration terminates when all primary load demands are satisfied (i.e., relevant rows in the SL matrix are cleared). Finally, the algorithm outputs the matched DG node indices for each primary load and invokes the Floyd-Warshall algorithm to backtrack and construct the explicit physical power supply paths.
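Steps 1 to 4 and the termination test of Step 5 can be sketched in Python as below. Several points are assumptions made for illustration: the function name `allocate`, recomputing the step size in every iteration from the remaining capacities and unmet demands (one reading of "adaptive"), first-found tie-breaking among equal weights, and omission of the per-line capacity factor constraint. Path backtracking via the Floyd algorithm is left out.

```python
from typing import Dict, List, Sequence, Tuple


def allocate(sl: Sequence[Sequence[float]], dg_capacity: Sequence[float],
             load_demand: Sequence[float], tol: float = 1e-9
             ) -> Dict[Tuple[int, int], float]:
    """Greedy step-wise source-load matching.

    sl[i][j] is the electrical weight between primary load i (row) and
    DG j (column). Returns {(load, dg): allocated capacity}.
    """
    cap = {j: c for j, c in enumerate(dg_capacity) if c > tol}  # live columns
    dem = {i: d for i, d in enumerate(load_demand) if d > tol}  # live rows
    alloc: Dict[Tuple[int, int], float] = {}
    while dem:  # Step 5: stop once every primary load is satisfied
        if not cap:
            raise ValueError("total DG capacity is insufficient")
        # Step 1: step size = smallest remaining capacity or demand.
        step = min(min(cap.values()), min(dem.values()))
        # Step 2: pick the live (load, DG) pair with minimum weight.
        i, j = min(((r, c) for r in dem for c in cap),
                   key=lambda rc: sl[rc[0]][rc[1]])
        alloc[(i, j)] = alloc.get((i, j), 0.0) + step
        # Step 3: deduct the step from both sides.
        cap[j] -= step
        dem[i] -= step
        # Step 4: drop exhausted columns and satisfied rows.
        if cap[j] <= tol:
            del cap[j]
        if dem[i] <= tol:
            del dem[i]
    return alloc


if __name__ == "__main__":
    sl_matrix: List[List[float]] = [[1.0, 3.0], [2.0, 1.5]]
    print(allocate(sl_matrix, dg_capacity=[4, 3], load_demand=[3, 2]))
```

Because the step never exceeds any remaining capacity or demand, no quantity goes negative, and whenever the smallest remaining value belongs to the chosen pair, a row or column is removed, which keeps the iteration count bounded.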
4.2. Secondary Restoration Layer
Upon the establishment of the power supply backbone for primary loads by the primary restoration layer, the secondary restoration phase is initiated. Its fundamental objective is to maximize the restoration of secondary and tertiary loads utilizing the residual generation capacity, thereby minimizing total outage losses and enhancing the economic efficiency of power restoration. To achieve this while strictly safeguarding the continuous power supply to primary loads and ensuring overall system stability, the mathematical boundary between these hierarchical layers must be rigidly defined. Consequently, the topological state and resource allocation generated by the primary layer act as immutable hard boundary conditions for the secondary layer’s optimization space. The coupling constraints are defined over the set of all branches, the set of primary load nodes, and the set of distributed generation nodes, and are formalized as follows:
Topological locking constraint, which ensures the primary supply backbone is not disrupted: it ties the state of each branch during secondary optimization to the connection state of that branch determined by the primary layer.
Primary load power conservation constraint, which guarantees the absolute supply of critical loads: it ties the active power supplied to each primary load node in the second stage to the active power allocated to that node in the first stage.
Residual generation capacity constraint for secondary and tertiary load restoration: it is expressed in terms of the maximum available capacity of each generator and the network loss incurred by the primary restoration backbone.
The secondary optimization algorithm executes its search strictly within the feasible region defined by these coupling constraints.
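To make the role of the coupling constraints concrete, the following Python sketch checks a candidate secondary-layer plan against one plausible reading of them. The exact inequalities are not reproduced in this text, so the forms used here are assumptions: backbone branches closed by the primary layer must stay closed, primary load power must be unchanged, and the additional restored load must not exceed total generator capacity minus primary allocation minus backbone loss. All names are hypothetical.

```python
from typing import Dict


def secondary_plan_is_feasible(primary_state: Dict[str, int],
                               secondary_state: Dict[str, int],
                               p_primary_stage1: Dict[str, float],
                               p_primary_stage2: Dict[str, float],
                               gen_max: Dict[str, float],
                               primary_loss: float,
                               extra_load: float,
                               tol: float = 1e-6) -> bool:
    """Return True if the candidate plan respects all three couplings."""
    # Topological locking (assumed form): a branch closed by the primary
    # layer must remain closed during secondary optimization.
    for branch, closed in primary_state.items():
        if closed == 1 and secondary_state.get(branch, 0) != 1:
            return False
    # Primary load power conservation (assumed form): stage-2 supply to
    # each primary load equals its stage-1 allocation.
    for node, p1 in p_primary_stage1.items():
        if abs(p_primary_stage2.get(node, 0.0) - p1) > tol:
            return False
    # Residual capacity (assumed form): what is left after serving primary
    # loads and backbone losses bounds the additional restored load.
    residual = (sum(gen_max.values())
                - sum(p_primary_stage1.values())
                - primary_loss)
    return extra_load <= residual + tol


if __name__ == "__main__":
    ok = secondary_plan_is_feasible(
        primary_state={"b1": 1, "b2": 1, "b3": 0},
        secondary_state={"b1": 1, "b2": 1, "b3": 1},
        p_primary_stage1={"L1": 30.0},
        p_primary_stage2={"L1": 30.0},
        gen_max={"G1": 60.0, "G2": 40.0},
        primary_loss=2.0,
        extra_load=50.0,
    )
    print(ok)
```

A check of this kind could serve as a filter that rejects or repairs particles before fitness evaluation, so that the search never leaves the locked feasible region.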
4.2.1. Improved Particle Swarm Optimization (PSO) Algorithm
Since the secondary restoration layer has relatively relaxed requirements regarding restoration time but places higher demands on the economic efficiency of restoration schemes, this paper employs an improved Particle Swarm Optimization (PSO) algorithm to solve this high-dimensional, nonlinear combinatorial optimization problem. The algorithm flowchart is shown in
Figure 6.
The specific steps are as follows:
In secondary restoration, the decision variables primarily concern the power grid’s topological structure, specifically the on/off states of each branch. For a network with $m$ controllable branches, a particle is an $m$-dimensional vector:
$$X_i = [x_{i1}, x_{i2}, \ldots, x_{im}]$$
where $x_{ij}$ represents the position of the $i$-th particle regarding the state of the $j$-th branch.
Figure 7 illustrates the distribution of particles at different time points, as well as their movement trajectories within the search space. Because each line is either connected or disconnected, every particle component can only take the value 0 or 1, so a normalization function is used to map the continuous particle values to the actual switch states:
where
represents the step function, with 1 indicating a connected line and 0 indicating a disconnected line;
represents the maximum velocity of the
particle in the
dimension;
and
represent the upper and lower limits of the particle’s velocity, respectively, with
set to 0.5.
- (2)
Initialization and Fitness Function Calculation
Using the backbone produced by the primary restoration layer as part of the initial population, the fitness value of each particle (that is, each candidate network structure) is calculated. The mathematical expression for the fitness function is as follows:
where
is the fitness score used for algorithm optimization, and
is the score for the optimization objective, with 0 representing the worst result and 100 representing the best result.
- (3)
Particle Update
Particles are updated according to the improved velocity and position formulas, and particles that meet the conditions undergo Lévy flight perturbations via the improved velocity formula to avoid becoming stuck in local optima. The position formula is as follows:
$$x_i^{t+1} = x_i^{t} + v_i^{t+1}$$
where $x_i^{t}$ is the position of the $i$-th particle before the update, $x_i^{t+1}$ is the position of the $i$-th particle after the update, and $v_i^{t+1}$ represents the velocity of the $i$-th particle.
- (4)
Iterate using the improved strategy
Repeat the evaluation and update process until convergence or the maximum number of iterations is reached. By continuously adjusting their positions, the particles gradually approach the optimal solution, and the optimal network reconfiguration scheme is output. The iteration formula for the optimized particle swarm algorithm is as follows:
In the equation, $\chi$ represents the contracting particle term, $v$ denotes the velocity of the particle, $v_{\min}$ is the set minimum velocity threshold, and $V_{\max}$ represents the maximum velocity of the particle in the high-dimensional search space, where $v_{\max}$ is the maximum velocity in each dimension. $r$ represents a random real number in the interval $[0, 1]$, $x$ represents the position of the particle, $p_{\mathrm{best}}$ represents the optimal position of the particle, and $g_{\mathrm{best}}$ represents the optimal position found among all particles. $\omega$, $c_1$, and $c_2$ are three weights representing inertial weight, individual weight, and social weight, respectively. $t$ and $T_{\max}$ represent the current iteration number and total iteration number, respectively, $L(\lambda)$ is the random jump term introducing the Lévy distribution, and $\delta$ is the conditional trigger factor for the Lévy flight perturbation. The Lévy flight perturbation is introduced only when the particle velocity is less than the set minimum velocity threshold ($\lVert v \rVert < v_{\min}$) and the random term is greater than 0.5 ($r > 0.5$, i.e., with a 50% probability). This ensures that strong perturbations are applied only to particles that have stagnated (become trapped in a local optimum), making the strategy for escaping local optima more flexible and robust.
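The conditional Lévy trigger described above can be sketched in Python as follows. This is a simplified illustration, not the paper's exact iteration formula: it uses the textbook PSO velocity update without the contracting particle term, draws Lévy steps with Mantegna's algorithm (a swapped-in, commonly used generator), and all function and parameter names are hypothetical.

```python
import math
import random

def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step using Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
             ) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0) or 1e-12  # guard against division by zero
    return u / abs(v) ** (1 / beta)

def update_velocity(v, x, p_best, g_best, w, c1, c2,
                    v_threshold, v_max, rng=random):
    """Standard PSO velocity update with a conditional Levy perturbation."""
    new_v = []
    for vj, xj, pj, gj in zip(v, x, p_best, g_best):
        r1, r2 = rng.random(), rng.random()
        new_v.append(w * vj + c1 * r1 * (pj - xj) + c2 * r2 * (gj - xj))
    speed = math.sqrt(sum(c * c for c in new_v))
    # Perturb only stagnated particles, and only with probability 0.5.
    if speed < v_threshold and rng.random() > 0.5:
        new_v = [c + levy_step(rng=rng) for c in new_v]
    # Clamp every component to the per-dimension velocity limit.
    return [max(-v_max, min(v_max, c)) for c in new_v]

# A moving particle sitting on both best positions: no perturbation applies.
moving = update_velocity([1.0, -1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0],
                         w=0.5, c1=2.0, c2=2.0,
                         v_threshold=0.1, v_max=4.0, rng=random.Random(0))
```

A particle whose speed stays above the threshold is updated deterministically here, while a stationary one receives a heavy-tailed jump in roughly half of the updates.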
4.2.2. Network Reconfiguration and Objective Function
The grid restructuring of the secondary restoration layer is based on the power supply backbone constructed by the primary restoration layer, and an optimal topological structure is generated using an improved PSO algorithm. The flowchart of the grid restructuring method is shown in
Figure 8. This restructuring problem can be formulated as a mixed-integer nonlinear programming model:
where $P_{\mathrm{loss}}$ represents the total active power network losses of the system, serving as the economic objective of this optimization model; minimizing this value improves operational efficiency. $N_{\mathrm{sw}}$ represents the number of circuit breaker operations required during topological reconstruction, serving as the operational cost and speed objective of this model; minimizing this value reduces equipment switching losses and accelerates the restoration process. $\lambda_2$ and $\lambda_3$ are the economic weighting coefficients for the restoration capacities of secondary and tertiary loads, respectively. $S_2$ and $S_3$ represent the total restored capacities of secondary and tertiary loads, respectively; maximizing their weighted sum aims to maximize the comprehensive value of restored load. $S_{\mathrm{DG}}$ is the current available output capacity of distributed generation and energy storage within the dynamically operating grid, and $S_{\mathrm{L}}$ is the total demand capacity of loads to be restored. $U$ and $f$ represent the system node voltage and system frequency, respectively.
The objective function can be expressed as:
$$F = \alpha_1 f_U + \alpha_2 f_E, \quad \alpha_1 + \alpha_2 = 1$$
where $F$ is the comprehensive optimization objective function for secondary-level load restoration and grid reconfiguration; $f_U$ is the voltage quality evaluation metric, as described in Section 3.3; and $f_E$ is the economic objective function, which comprehensively evaluates load restoration effectiveness, operational losses, and operational costs. Within $f_E$, $S_2$ and $S_3$ represent the restored capacities of secondary and tertiary loads, respectively, while $S_2^{\mathrm{tot}}$ and $S_3^{\mathrm{tot}}$ represent the total capacities of secondary and tertiary loads to be restored; $P_{\mathrm{loss}}$ represents the total active power network losses of the reconstructed system; $N_{\mathrm{sw}}$ and $N_{\mathrm{sw}}^{\max}$ represent the actual number of circuit breaker operations required for this reconstruction and the maximum allowable number of operations, respectively; and $\beta_1$ to $\beta_4$ are the weighting coefficients for the internal sub-objectives of $f_E$, corresponding to the restoration of secondary loads, the restoration of tertiary loads, network losses, and the cost of switching operations, respectively. The parameter-setting rationale for these four internal factors reflects a fine-grained hierarchy of non-critical operational priorities. Generally, secondary load restoration is assigned the highest weight among the four, followed by tertiary loads, while network losses and switching operation costs are assigned lower weights to prevent operational penalties from inadvertently restricting load recovery. $\alpha_1$ and $\alpha_2$ are the weighting coefficients assigned to $f_U$ and $f_E$, respectively, satisfying the normalization constraint and used to balance the trade-off between power supply quality and restoration economy.
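A minimal sketch of how such a weighted objective might be evaluated is given below. The exact form of the economic term is not reproduced here, so several elements are assumptions made purely for illustration: the normalization of losses by a reference value `p_loss_ref`, the "one minus ratio" form of the two penalty terms, and the numerical weights (chosen only to respect the stated ordering, with secondary restoration highest and the two penalties lowest).

```python
def economic_score(s2, s3, s2_total, s3_total, p_loss, p_loss_ref,
                   n_sw, n_sw_max, weights=(0.4, 0.3, 0.15, 0.15)):
    """Illustrative economic sub-objective in [0, 1]; higher is better."""
    b1, b2, b3, b4 = weights
    return (b1 * s2 / s2_total              # secondary load restored
            + b2 * s3 / s3_total            # tertiary load restored
            + b3 * (1 - p_loss / p_loss_ref)  # assumed loss normalization
            + b4 * (1 - n_sw / n_sw_max))   # fewer switching operations

def composite_objective(f_u, f_e, alpha_u=0.5):
    """Weighted sum of voltage quality and economy; weights sum to one."""
    return alpha_u * f_u + (1 - alpha_u) * f_e

# Hypothetical candidates: same topology cost, different secondary restoration.
full = economic_score(10, 20, 10, 20, p_loss=0.0, p_loss_ref=1.0,
                      n_sw=0, n_sw_max=10)
partial = economic_score(5, 20, 10, 20, p_loss=0.0, p_loss_ref=1.0,
                         n_sw=0, n_sw_max=10)
combined = composite_objective(0.8, 0.6, alpha_u=0.5)
```

With these placeholder weights, losing half of the secondary load costs more than any single penalty term can recover, which mirrors the priority ordering described in the text.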
5. Case Study Analysis
To verify the effectiveness of the dynamic operational grid construction method and the two-level self-healing restoration strategy proposed in this paper, a simulation analysis was conducted using a 56-node novel distribution system adapted from real-world data from a specific location. The simulation platform used was MATLAB R2021b. The detailed network topology diagram and the corresponding comprehensive electrical parameters of the 56-node system are provided in
Appendix A.
5.1. Performance Analysis of the Improved PSO Algorithm
To verify the overall performance of the improved PSO algorithm proposed in
Section 4.2.1 when solving the high-dimensional mixed-integer programming problem of emergency control strategy, this section analyzes simulation results based on standard test functions. The population size is set to 50, and the maximum number of iterations ($T_{\max}$) is set to 100. The stopping criterion is defined as either reaching $T_{\max}$ or observing no improvement in the global best fitness value for 20 consecutive iterations. Regarding the parameter initialization ranges, the time-varying inertia weight $\omega$ is initialized at 0.9 and dynamically decreases to 0.4. The individual and social learning factors ($c_1$ and $c_2$) are initially set to 2.0 and dynamically adjusted within the range of [0.5, 2.5] during the iterative process. The particle positions are strictly mapped to discrete binary states {0, 1} representing the switch states, while the minimum velocity threshold $v_{\min}$ for triggering the Lévy-flight perturbation is empirically set to 10% of the maximum velocity $V_{\max}$. These specific hyperparameters were empirically calibrated to balance convergence quality with the stringent time constraints of emergency restoration. This configuration ensures the total optimization runtime remains strictly within the 1 s margin (as subsequently verified in
Section 5.3.2), while the defined perturbation probability prevents premature convergence without excessively disrupting the algorithmic search trajectory.
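The inertia schedule and the two-part stopping criterion described above can be sketched as follows. The linear decay from 0.9 to 0.4 is an assumption (the text only states that the weight decreases dynamically), the stagnation test assumes that lower fitness is better, and the function names are hypothetical.

```python
def inertia_weight(t, t_max, w_start=0.9, w_end=0.4):
    """Inertia weight at iteration t, assuming a linear decrease."""
    return w_start - (w_start - w_end) * t / t_max

def should_stop(best_history, t_max=100, patience=20):
    """Stop at t_max iterations or after `patience` iterations without gain.

    best_history holds the global best fitness after each iteration,
    assuming lower values are better.
    """
    if len(best_history) >= t_max:
        return True
    if len(best_history) <= patience:
        return False
    recent_best = min(best_history[-patience:])
    earlier_best = min(best_history[:-patience])
    return recent_best >= earlier_best

# Hypothetical fitness traces.
stalled = [10, 9, 8, 7, 6] + [6] * 20      # flat for 20 iterations
improving = list(range(30, 0, -1))         # still improving each iteration
```

Here `should_stop(stalled)` triggers the stagnation rule well before the iteration cap, while `should_stop(improving)` lets the search continue.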
Figure 9 compares the convergence curves of the four PSO variants during the optimization process, clearly revealing the mechanisms by which each improvement strategy affects algorithm performance.
In
Figure 9, PSO1 employs a time-varying inertia weight; PSO2 adds converging particles to PSO1; PSO3 extends PSO2 with time-varying individual and social weights, which are also used to update the convergence factor; and PSO4 extends PSO3 with an escape mechanism against population stagnation, using Lévy flights to relocate particles that have become stationary.
The results show that time-varying weights accelerate particle convergence but weaken the final optimization performance: the more weights are made time-varying, the faster the convergence and the poorer the final result. The converging particles reduce this acceleration effect but significantly enhance the final optimization performance, and the perturbation mechanism allows the algorithm to keep updating its best solution rather than stagnating.
5.2. Dynamic Mesh Partitioning Validation
To evaluate the adaptability of the dynamic mesh generation method, three typical fault scenarios were set up:
- (1)
Scenario A (Single-point Fault): A short-circuit fault occurs in Feeder 57 (Nodes 29–42).
- (2)
Scenario B (Multi-point Fault): Feeder 12 (nodes 12–13) and Feeder 13 (nodes 13–14) fail simultaneously, causing the static operational grid to be divided into three power supply zones.
- (3)
Scenario C (Regional Power Outage): Due to a typhoon, multiple lines—including Feeder 4, Feeder 5, and Feeder 10—are severed, causing all wind turbines in the region to shut down. The regional power supply stations primarily consist of small hydropower plants, photovoltaic power plants, and substations connected to the main grid.
Based on the theory and methods of dynamic operational grid partitioning, the results of dynamic grid partitioning under the three scenarios—single-point failure, multi-point failure, and regional power outage—are shown in
Figure 10,
Figure 11 and
Figure 12. In these figures, yellow nodes indicate nodes with primary loads; nodes with red borders indicate nodes with distributed power sources; white nodes are regular nodes; and the green numbers on the lines represent branch numbers. Additionally, red dashed lines indicate the specific fault locations, and the distinct colored areas represent the dynamically partitioned grids.
An examination of the power flow results in
Figure 13,
Figure 14 and
Figure 15 reveals that Scenarios A, B, and C all complete the power quality regulation tasks for each isolated island within the capacity limits of the regional power sources, and are able to temporarily maintain the stability of the local grid after fault clearance. Analysis of Scenario C reveals that during regional power outages—particularly those caused by natural disasters—the power capacity required to maintain local grid stability far exceeds that needed for single-point or multi-point faults, especially in the affected areas. Therefore, under conditions of insufficient capacity, priority must be given to ensuring the restoration and operation of critical loads.
5.3. Evaluation of the Two-Layer Load Restoration Strategy
To fully demonstrate the effectiveness of the two-layer load restoration strategy under extreme conditions, this test simulates a severe scenario: extreme weather causes multiple line failures, while wind turbines and some energy storage systems and power plants are unable to operate, reducing the total available capacity of regional distributed power sources to 155 MVA—far below the total regional load demand of 302.65 MVA.
5.3.1. Results of the Primary Restoration Layer
First, the comprehensive electrical distance from all power source nodes to primary load nodes was calculated using the Floyd algorithm, generating an SL source-load matrix. The results are visualized in
Figure 16.
Subsequently, the optimized adaptive DP algorithm was applied to calculate the minimum-weight path combinations required to supply all primary load nodes. The results of the primary restoration layer are shown in
Figure 17. In the diagram, yellow nodes indicate nodes with primary loads; nodes with red borders indicate nodes with distributed power sources; white nodes are regular nodes; the numbers on the lines represent branch numbers; and the numbers inside the black circles represent the grid numbers formed by the primary restoration layer.
This scheme successfully identified feasible power sources for all primary load nodes and established power supply paths. The scheme generation time for the primary restoration layer was approximately 25.74 ms, achieving a millisecond-level rapid response.
5.3.2. Results of the Secondary Restoration Layer
The results from the primary restoration layer are used as the initial input for the secondary restoration layer. An improved PSO algorithm is employed for optimization, with a population size of 50 and a maximum of 100 iterations. The optimization objective is to maximize the restoration of secondary and tertiary loads while ensuring voltage quality and minimizing switching operations.
The final restoration results of the secondary restoration layer are shown in
Figure 18. Despite severe constraints on power supply capacity, the algorithm still identified an optimal network structure that maximized the restoration of other loads while ensuring the full restoration of primary loads. The corresponding distributed power generation output plan and power flow calculation results validated the feasibility of this solution, with system voltage maintained within a stable range.
The power flow calculation results after restoration are shown in
Figure 19, and the restoration status statistics for each load level are presented in
Table 1. The results indicate that the proposed strategy performs excellently under extreme conditions: the restoration rate for Level 1 loads reaches 100%, ensuring power supply to the most critical users; the restoration rate for Level 2 loads is as high as 90.88%; and despite a massive capacity shortfall, 44.63% of Level 3 loads were restored, significantly reducing outage losses.
In terms of computational efficiency, a single iteration of the secondary restoration layer (including power flow calculation and fitness evaluation) takes an average of approximately 10 ms. The improved PSO algorithm converges within 50 iterations, with a total runtime of approximately 1 s for 100 iterations. Considering real-time requirements, a local optimal solution suitable for engineering applications can be obtained in about 120 ms, fully meeting the expected goal of “minute-level restoration.”
5.4. Statistical Performance, Computational Complexity, and Scalability Analysis
While
Section 5.2 and
Section 5.3 detailed the specific step-by-step execution of the proposed method under individual fault occurrences, evaluating its statistical robustness and real-time reliability requires extensive repeated testing. To systematically quantify the computational complexity and validate the execution stability, 100 repeated simulation tests were conducted for each typical fault scenario on the MATLAB platform. The unified test scenarios were configured with specific boundary conditions: the available power capacity was strictly defined (e.g., incorporating main grid capacity, distributed wind/photovoltaic power, and energy storage), and the node priorities were clearly designated, with nodes 6, 11, 25, 28, and 47 acting as the 5 secondary (Level 2) load nodes, while the remaining non-critical nodes were classified as tertiary (Level 3) loads. Regarding practical implementation requirements, the proposed framework is conceptualized to integrate as an advanced application module within existing edge-cloud Distribution Management Systems (DMS). To evaluate the ideal execution bounds of the algorithms, the test scenarios assume a robust fiber-optic Ethernet communication infrastructure, specifically characterized by an end-to-end transmission latency of
10 ms, a data sampling period of 5 ms, and zero communication packet loss. It is important to note that the algorithm modules within the primary layer (Floyd and adaptive DP) and the secondary layer operate as a continuous, tightly coupled execution sequence. Therefore, their execution times and system performance are evaluated holistically. The comprehensive statistical results are summarized in
Table 2.
As demonstrated by the statistical data, the proposed framework restores loads quickly and consistently. Across the 100 repeated tests of each scenario, the decision delays show very low dispersion (with maximum variations constrained to approximately 30 ms). Even under severe regional power outages, the maximum recorded delay was bounded at 328 ms. Furthermore, the framework consistently maintained a 100% restoration rate for Level 1 loads and highly stable voltage profiles.
Take the single-point fault as an example. Regarding the Level 1 load restoration: immediately following a fault trigger, the primary restoration layer algorithm completes the power supply path reconfiguration in an average of 17.69 ms. All primary load nodes successfully achieve uninterrupted power supply with a 100% restoration rate, resulting in zero outage loss. Regarding the Level 2 load restoration: the secondary restoration layer algorithm efficiently completes the full-region grid optimization and reconfiguration, 4 out of the 5 secondary load nodes achieve full-capacity restoration, and 1 edge node achieves 98% capacity restoration, yielding an overall Level 2 restoration rate of 99.8%. The complete secondary load restoration process requires an average total time of 212 ms, successfully satisfying the dual objectives of minimal delay and maximum restoration. These results conclusively verify the excellent repeatability, convergence stability, and practical engineering value of the proposed ‘millisecond-level decision, minute-level restoration’ capability.
To further systematically elaborate on the computational complexity and scalability of the proposed framework, it is essential to analyze the algorithmic bounds within the context of the dynamic operational grid architecture. Theoretically, the primary layer employs the Floyd algorithm, which has a time complexity of $O(n^3)$ (where $n$ is the number of nodes within a specific localized dynamic grid), followed by the adaptive DP algorithm whose complexity is polynomially bounded by the local capacity step size. The secondary layer utilizes the improved PSO, yielding a complexity of $O(T_{\max} \cdot N_p \cdot C_{\mathrm{pf}})$, where $T_{\max}$ is the maximum number of iterations, $N_p$ is the population size, and $C_{\mathrm{pf}}$ represents the computational cost of the power flow evaluation.
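As a rough worked illustration of these bounds (not a result reported in this study), take the full 56-node test system as an upper limit on the local grid size $n$, and the PSO settings of Section 5.1, namely a population of $N_p = 50$ and at most $T_{\max} = 100$ iterations. The symbols $n$, $N_p$, and $T_{\max}$ are shorthand assumed here for the node count, population size, and iteration limit.

```latex
\begin{aligned}
n^3 &= 56^3 = 175{,}616 \quad \text{(Floyd relaxations, upper bound)} \\
T_{\max} \cdot N_p &= 100 \times 50 = 5000 \quad \text{(power flow evaluations, upper bound)} \\
\left(\tfrac{n}{2}\right)^3 &= 28^3 = 21{,}952 = \tfrac{1}{8}\, n^3
\end{aligned}
```

The last line shows why partitioning matters for the primary layer: halving the number of nodes in a local grid reduces the cubic Floyd term by a factor of eight, whereas the PSO term depends on the grid size only through the cost of each power flow evaluation.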
Crucially, while standard centralized restoration methods suffer from the curse of dimensionality as the total network size increases, the DOG framework inherently ensures scalability through a “divide-and-conquer” mechanism. By partitioning extensive fault-affected areas into smaller, autonomous local grids, the effective optimization search space ($n$) is strictly bounded. The comprehensive statistical results in
Table 2 validate the highly consistent millisecond-to-second level execution speed and stability within such a bounded local grid scale. Because of this architectural decoupling, the computational burden does not grow exponentially with the expansion of the broader macroscopic network. Instead, relying on the proven local execution efficiency, this framework provides a robust theoretical foundation for maintaining rapid response performance even when deployed across significantly larger interconnected systems.
6. Conclusions
This paper systematically proposes a rapid recovery method based on dynamic operational grids to address the fault self-healing requirements of new-generation distribution systems under high penetration of renewable energy. The study establishes a weighted graph theory model for the dynamic operational grid and a comprehensive construction process, laying the theoretical foundation for post-fault system reconfiguration. Building on this, a novel “primary-secondary” two-layer load restoration mechanism is designed: the primary restoration layer integrates the Floyd algorithm with an optimized adaptive DP algorithm to achieve millisecond-level precise restoration of critical loads; the secondary restoration layer improves the traditional PSO algorithm by introducing adaptive weights, converging particles, and Lévy flight perturbations, effectively resolving the local optimum problem in network reconfiguration and significantly enhancing the restoration efficiency and economic performance for general loads. Simulation verification based on a 56-node system demonstrates that the proposed method exhibits strong adaptability across various fault scenarios. Notably, even under extreme capacity shortages, it ensures full restoration of critical loads while efficiently restoring a large number of secondary and tertiary loads. Ultimately, this forms a highly efficient self-healing system characterized by “second-level detection and minute-level restoration,” significantly enhancing the resilience and power supply reliability of new-generation distribution systems.
However, the practical deployment of the proposed strategy faces specific limitations that warrant further investigation. First, the current control-execution coordination mechanism assumes ideal communication conditions; in real-world smart grids, communication latency and data packet loss could impact the synchronized execution of the hierarchical restoration commands. Second, the current study primarily validates the proposed models through steady-state power flow analysis, whereas the highly dynamic transition into islanded mode inherently involves complex transient processes. Third, a comprehensive quantitative sensitivity analysis of the dynamic weighting parameters—especially regarding how varying weights influence transient stability boundaries during complex fault recoveries—remains to be systematically explored. Fourth, the undirected graph model utilized for topological reconfiguration represents a mathematical simplification tailored for computational speed, abstracting away detailed directional protection coordination and unbalanced operations. Fifth, as the primary objective of this manuscript is to establish a novel architectural paradigm for dynamic operational grids, comprehensive quantitative comparisons against standard benchmark algorithms (such as traditional FLISR or standard PSO) are relatively limited in the current scope. Sixth, while the convergence behaviors of the improved PSO variants have been evaluated, a comprehensive quantitative sensitivity analysis of its hyperparameters—such as population size, iteration limits, and Lévy-flight perturbation probabilities across varying dynamic grid scales—requires exhaustive simulation data and will be systematically conducted in our future work.
Consequently, establishing a comprehensive roadmap for future research is essential. Immediate future work will focus on addressing these practical constraints by investigating robust, delay-tolerant control algorithms to guarantee execution reliability. Concurrently, detailed electromagnetic transient (EMT) models will be developed to conduct rigorous time-domain simulations, ensuring that transient stability criteria are fully satisfied during dynamic mesh reconfigurations. Future studies will also aim to integrate directed graph theories with high-fidelity protection modeling and prioritize rigorous comparative benchmarking against existing state-of-the-art methods to further evaluate relative superiority. Additionally, as the internal scale of individual dynamic grids continues to expand in extremely large systems, integrating advanced sparse-graph algorithms (such as optimized Dijkstra or A* heuristic search) to complement the Floyd algorithm will be explored to further compress computational latency. Ultimately, by combining these EMT models and delay-tolerant mechanisms with digital twin technology, a comprehensive simulation platform can be constructed, enabling real-time evaluation of both steady-state and transient fault scenarios in realistic communication environments.