1. Introduction
With the global energy transition, renewable energy, energy storage systems (ESSs), and flexible adjustable loads (FALs) have penetrated power systems on a large scale [1]. Distribution networks (DNs) are therefore challenged by both source and load uncertainty and by the interaction between them [2], so the traditional DN management approach, i.e., fully centralized control [3,4], demonstrates limited effectiveness in regulating DNs with a high proportion of distributed generators (DGs), ESSs, and FALs [5]. To improve the security, control flexibility, renewable energy resource utilization rate (RERUR), and operation efficiency of DNs, the aggregation concept [6,7,8] and the related dispatching strategies are regarded as an attractive technological path. The scattered DGs, ESSs, and FALs in a DN can be organized and controlled in the form of clusters (or groups), which reduces the number of objects the central controller must manage to the number of clusters; each cluster controller needs to regulate only the resources within its cluster, and all cluster controllers can work in parallel, so more flexible, robust, and efficient operation can be obtained [9]. However, how to realize this aggregation process, ensuring that the aggregators respond intelligently to grid demands while optimizing resource allocation, remains a new and crucial technological challenge.
For the control and allocation of DN resources, conventional control algorithms [10,11,12] provide deterministic optimal or quasi-optimal solutions for every controllable resource. However, as the physical and mathematical scale of the problem grows, the computational complexity and the required model accuracy increase significantly, which limits the adaptability of conventional algorithms to the dynamic and uncertain objects in DNs [13]. Heuristic algorithms, by contrast, adapt better to modern DNs [14,15,16], but their relatively slow convergence still inhibits wider application.
Reinforcement learning (RL) [17], a machine learning method, provides a new perspective for the dynamic management of DNs: it can automatically discover the optimal control strategy through trial-and-error interaction with the environment without an explicit model of the plant. This makes it well suited to DNs, which are time-varying, complex, and uncertain, e.g., for adjusting the output of DGs, FALs, and ESSs to realize load balancing, voltage control, and even frequency regulation in isolated networks [18]. By introducing hierarchical constrained RL and a gated recurrent unit, Ref. [19] proposes an interaction power optimization strategy among several microgrids that improves dispatching robustness and stability while simplifying the decision space. To reuse historical dispatching knowledge, Ref. [20] proposes a real-time dispatching optimization method for power systems integrated with deep transfer RL; by expanding the input and output channels of the deep-learning network, knowledge of the state and action spaces is transferred to the expanded scenario, and the learning efficiency of the agents is improved. Ref. [21] presents a deep RL method for the optimal power flow (OPF) problem in DNs, in which operational knowledge extracted from historical data is approximated by deep neural networks to realize quasi-optimal decision-making.
In addition, for DNs with high penetration of adjustable resources (ARs), multi-agent RL is regarded as an effective technical path for optimal dispatching. By learning a coordinated control strategy through a counter-training model, Ref. [22] presents a multi-agent deep RL (MADRL) method to mitigate voltage issues in DNs. In [23], an attention mechanism is adopted to decompose the DN into several sub-DNs, and a modified MADRL is developed to execute distributed voltage-reactive control. Furthermore, to enhance the agents' interregional cooperation, an interregional auxiliary reward mechanism is proposed in [24], and an evolutionary game strategy optimization method is further developed to improve the voltage regulation performance in DNs.
As discussed above, RL shows great potential for DN optimal regulation, but little research addresses dynamic cluster partitioning of DNs with RL-based methods, even though partitioning is normally regarded as a critical step in the effective organization and control of modern DNs [25,26,27,28,29]. During the cluster partitioning phase, the resources most conducive to the DN's control objectives can be aggregated according to their inherent characteristics and the grid topology, which alleviates the computational burden in subsequent regulation stages and enhances control efficiency [27]. Furthermore, once the cluster partitioning model is well trained, its execution is no longer bound by traditional optimization models, improving computational efficiency [28]; it also strengthens the DN's decision-making capability regarding cluster responses under extreme operating conditions [29]. Hence, exploring an RL strategy suitable for dynamic cluster partitioning of DNs is valuable for improving smart DN management.
Based on the above discussion, this paper proposes an intelligent dynamic cluster partitioning strategy for DNs with an embedded self-consistent regulation framework. The main contributions of this paper are as follows:
- (1) An environmental model with a continuous state space, a discrete action space, and two dispatching-performance-oriented reward functions is developed for DN cluster partitioning.
- (2) A novel random forest Q-learning network (RF-QN) with a node-based multi-agent parallel computing framework and a weight self-adjusting mechanism is developed and trained to implement the cluster partitioning; by combining the advantages of deep learning and decision trees, the generalization and robustness of the trained model are improved.
- (3) A dispatching framework is proposed that intelligently generates clusters in each cluster partitioning period and coordinates the clusters in parallel at each regulating moment. It leverages the advantages of RL in handling uncertain and complex decision-making; while ensuring efficient utilization of the ARs within each cluster, the operation flexibility and robustness of the whole system are improved.
The remainder of this article is organized as follows. The cluster partitioning environmental model is developed in Section 2. The deep RL cluster partitioning strategy and its training method for implementing dynamic cluster partitioning are presented in Section 3. Then, the overall dynamic cluster partitioning and dispatching framework is discussed in Section 4. Section 5 gives the simulation results, and the conclusion is given in Section 6.
3. Deep Reinforcement Learning Cluster Partitioning Strategy
As discussed in Section 2, the environmental model for dynamic cluster partitioning is formulated as a Markov decision process with a continuous state space (reflecting real-time power variations) and a discrete action space (representing node-level cluster assignment choices). Within this framework, each node observes its operational state, including power limits, installed capacity, and upstream cluster affiliation, and selects an action to either join the same cluster as its upstream neighbor or initiate a new cluster. Following the partitioning decision, a dispatch-oriented reward is computed based on renewable energy utilization, network loss, and voltage deviation. Dynamic cluster partitioning thus amounts to finding, in every time section of interest, a resource grouping that balances the fluctuating power supply and demand, a task well suited to a deep reinforcement learning algorithm.
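To make this formulation concrete, the following is a minimal Python sketch of such an environment. The field names (p_min, p_max, capacity, upstream), the reward weights, and the evaluate_dispatch helper are assumptions introduced here for illustration, not the paper's exact implementation.

```python
import numpy as np

class ClusterPartitionEnv:
    """Illustrative sketch of the Section 2 environment; field names,
    reward weights, and the dispatch call are assumptions."""

    def __init__(self, nodes, w=(1.0, 1.0, 1.0)):
        self.nodes = nodes            # list of dicts: p_min, p_max, capacity, upstream
        self.w = w
        self.reset()

    def reset(self):
        self.i = 1                    # node 0 (root) always starts cluster 0
        self.cluster_of = [0] + [None] * (len(self.nodes) - 1)
        self.n_clusters = 1
        return self._state()

    def _state(self):
        n = self.nodes[self.i]
        # Continuous state: power limits, installed capacity, upstream cluster id.
        return np.array([n["p_min"], n["p_max"], n["capacity"],
                         self.cluster_of[n["upstream"]]], dtype=float)

    def step(self, action):
        # Discrete action: 0 = join the upstream node's cluster, 1 = open a new cluster.
        up = self.nodes[self.i]["upstream"]
        if action == 0:
            self.cluster_of[self.i] = self.cluster_of[up]
        else:
            self.cluster_of[self.i] = self.n_clusters
            self.n_clusters += 1

        # Dispatch-oriented reward: hypothetical helper returning renewable
        # utilisation, network loss, and voltage deviation for this partition.
        util, loss, vdev = evaluate_dispatch(self.nodes, self.cluster_of)
        reward = self.w[0] * util - self.w[1] * loss - self.w[2] * vdev

        self.i += 1
        done = self.i >= len(self.nodes)
        return (None if done else self._state()), reward, done
```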
3.1. Classical Deep Q-Learning Network (DQN) Model
As can be seen from Figure 1, a deep neural network (DNN) is embedded in the deep Q-learning network (DQN) [31], which enables the DQN to fit an approximate Q-function by learning from the continuous state space (e.g., (1)) and thereby predicting the expected reward for each possible state–action pair $(s_t, a_t)$.
By defining the Q-value predicted by the DNN as $Q(s_t, a_t; \theta)$, where $\theta$ denotes the network parameters, and the true Q-value as $Q^{*}(s_t, a_t)$, the loss function in the fitting process can be expressed as

$$L(\theta) = \mathbb{E}\big[\big(Q^{*}(s_t, a_t) - Q(s_t, a_t; \theta)\big)^{2}\big]$$

By minimizing $L(\theta)$ in the training process, the DNN achieves value network regression and finds the Q-value at every step to realize dynamic cluster partitioning.
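As an illustration of how such a value network and loss could be realized, a short PyTorch sketch follows. The layer sizes, the four-dimensional state, and the use of a temporal-difference target (introduced formally in Section 3.3) in place of the unavailable true Q-value are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Minimal DQN value network for the partitioning state; layer sizes
    are illustrative."""
    def __init__(self, state_dim=4, n_actions=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),   # Q(s, a) for the two cluster actions
        )

    def forward(self, state):
        return self.net(state)

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Mean-squared error between the predicted Q-value and the
    temporal-difference target standing in for the true Q-value."""
    s, a, r, s_next, done = batch            # tensors: a is long, done is float
    q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(s_next).max(dim=1).values
        q_target = r + gamma * (1.0 - done) * q_next
    return nn.functional.mse_loss(q_pred, q_target)
```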
3.2. Random Forest Q-Learning Network (RF-QN) Model
To address the limitations of classical DQNs in handling noisy grid data and high-dimensional state spaces, we propose a random forest (RF) Q-learning network (RF-QN) that synergizes ensemble learning with reinforcement learning. As shown in Figure 2, the RF-QN replaces the DNN in a traditional DQN with an RF regression system. The RF-QN leverages ensemble learning to achieve better generalization (through feature subsampling and voting) and inherent stability against noisy grid data [32]. The integration provides three distinct advantages [33]: (1) RF's parallelizable structure enables faster training, (2) decision trees naturally handle sparse data regimes, reducing experience buffer dependency, and (3) explicit feature importance outputs make cluster partitioning decisions interpretable.
The loss function of the RF-QN is

$$L_{\mathrm{RF}} = \mathbb{E}\big[\big(Q^{*}(s_t, a_t) - Q_{\mathrm{RF}}(s_t, a_t)\big)^{2}\big]$$

where $Q_{\mathrm{RF}}(s_t, a_t)$ is the Q-function fitted by the RF. By minimizing $L_{\mathrm{RF}}$ in the training process, the RF achieves value network regression and finds a more suitable Q-value for dynamic cluster partitioning.
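A minimal sketch of an RF-based Q-function built on scikit-learn's RandomForestRegressor is shown below; the state–action feature encoding, the hyperparameters, and the class/method names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

class RFQFunction:
    """Random-forest Q-function: regresses Q(s, a) from concatenated
    state-action features. Hyperparameters are illustrative."""

    def __init__(self, n_actions=2, n_estimators=100, max_depth=10):
        self.n_actions = n_actions
        self.model = RandomForestRegressor(n_estimators=n_estimators,
                                           max_depth=max_depth)
        self.fitted = False

    def predict(self, states):
        """Return Q-values for every action of every state."""
        if not self.fitted:                      # cold start before the first fit
            return np.zeros((len(states), self.n_actions))
        q = [self.model.predict(np.hstack([states,
                                           np.full((len(states), 1), a)]))
             for a in range(self.n_actions)]
        return np.stack(q, axis=1)

    def fit(self, states, actions, targets):
        """Regress the TD targets; the forest is refit from scratch each time,
        which motivates the rumination update of Section 3.3."""
        X = np.hstack([states, actions.reshape(-1, 1)])
        self.model.fit(X, targets)
        self.fitted = True
```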
3.3. Training and Parameter Update Principles
For the DNN-integrated DQN [31], the update principle of the Q-value can be expressed as

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\big[r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)\big]$$

where $\alpha$ is the learning rate, $\gamma$ is the discount factor, $r_t$ indicates the current reward, $s_{t+1}$ is the state at the next moment, and $a_{t+1}$ is the action at the next moment.

Based on the update principle shown in (12), $Q^{*}(s_t, a_t)$ can be approximated as

$$Q^{*}(s_t, a_t) \approx r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; \theta)$$

and (10) can be rewritten as

$$L(\theta) = \mathbb{E}\big[\big(r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; \theta) - Q(s_t, a_t; \theta)\big)^{2}\big]$$
Through this incremental update, the Q-value network fitted by the DNN gradually approaches the temporal-difference target and, in turn, the true Q-value function.
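For concreteness, one incremental update step of the rule above with assumed numerical values:

```python
# One incremental Q-value update with illustrative numbers.
alpha, gamma = 0.1, 0.95          # learning rate and discount factor
q_sa, r = 0.40, 1.0               # current estimate Q(s_t, a_t) and reward r_t
q_next_max = 0.60                 # max_a Q(s_{t+1}, a)

td_target = r + gamma * q_next_max            # 1.0 + 0.95 * 0.60 = 1.57
q_sa = q_sa + alpha * (td_target - q_sa)      # 0.40 + 0.1 * 1.17 = 0.517
```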
For the proposed RF-QN, unlike the DNN, the RF regression model cannot update its parameter weights in real time. To address this, a training result rumination update mechanism for the regression model is proposed in this paper; its framework is shown in Figure 3.

As can be seen from Figure 3, an RF regression model is first initialized; then several state–action pairs $(s_t, a_t)$ are randomly selected as the input set (note: they must be restricted within their operating boundaries), and the corresponding rewards predicted by the regression model are saved as the output set. Before training a new regression model, the current state–action pair and its reward are combined with the input and output sets from the previous step to construct new input and output sets, and the reorganized sets are used to train the model. This process is repeated until the expected accuracy or the maximum number of training iterations is reached. Reconstructing the trained results together with the current input and output set facilitates updating the RF's weights, and immature early results are gradually abandoned in the process.
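The following Python sketch outlines this rumination update under assumed interfaces: sample_within_bounds, predict_sa, refit, and the replay-buffer call are hypothetical names introduced here, and the sketch mirrors the recall-combine-refit flow of Figure 3 rather than reproducing the paper's code.

```python
import numpy as np

def rumination_update(rf_q, replay, n_recall=256):
    """Sketch of the training result rumination update (Figure 3):
    the current RF's own predictions on randomly sampled, within-bounds
    state-action pairs are mixed with the newly observed experiences,
    and the forest is refit on the combined set."""
    # 1) Recall: sample state-action pairs inside the operating boundaries
    #    and record the current model's predicted targets for them.
    recalled_sa = sample_within_bounds(n_recall)        # hypothetical helper
    recalled_q = rf_q.predict_sa(recalled_sa)           # outputs of the old model

    # 2) Combine the recalled knowledge with the current (fresh) experiences.
    fresh_sa, fresh_q = replay.latest_batch()           # hypothetical buffer API
    X = np.vstack([recalled_sa, fresh_sa])
    y = np.concatenate([recalled_q, fresh_q])

    # 3) Refit the forest on the reorganized set; immature early results are
    #    gradually diluted and abandoned as fresh targets accumulate.
    rf_q.refit(X, y)
```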
3.4. Specific Training Procedure
Figure 4 illustrates the detailed training procedure of the proposed RF-QN for dynamic cluster partitioning, which consists of the following steps (a condensed code sketch of the full loop follows the step list):
Step 1: Initialize the hyperparameters of the RF Q-network, with admissible cluster numbers ranging from 1 to (number of nodes − 1).
Step 2: Construct the current and target Q-networks of the RF model.
Step 3: Initialize the experience replay buffer (note: the buffer stores and retrieves experiences for each agent independently).
Step 4: Let Markov sequence number e = 1 and the training counter count = 0.
Step 5: Obtain the environmental state of the concerned DN, including the output power and the installed capacity of the ARs (for details, refer to Section 3.5).
Step 6: Let the time step t = 1.
Step 7: Initialize the environmental states of the concerned DN, assign each node to a cluster using the ε-greedy algorithm, and record the current states, actions (i.e., cluster partitioning modes), and rewards in the experience replay buffer.
According to the ε-greedy algorithm, when the random number is larger than the preset value, the system randomly selects a cluster partitioning mode from the sample pool; otherwise, the procedure follows the process shown in Figure 5 (see Section 3.5).
Step 8: Calculate the current reward (see (4)) and update the environmental state (i.e., the output power and the installed capacity of the ARs) at time step t.
Step 9: Store the current states, actions, and rewards in the experience replay buffer. If the buffer reaches its preset capacity, randomly sample data to train the RF Q-network with the following objective:

$$\min \; \mathbb{E}\big[\big(r_t + \gamma \max_{a_{t+1}} \hat{Q}_{\mathrm{RF}}(s_{t+1}, a_{t+1}) - Q_{\mathrm{RF}}(s_t, a_t)\big)^{2}\big]$$

where $\hat{Q}_{\mathrm{RF}}$ is the trained target Q-network/function.
Step 10: Update the Q-network and increment count by 1.
Step 11: If count exceeds the preset threshold, update the target Q-network as $\hat{Q}_{\mathrm{RF}} \leftarrow Q_{\mathrm{RF}}$.
Step 12: Increment t by 1 and proceed to the next training round until t reaches the preset maximum number of time steps.
Step 13: Terminate the training process once all Markov sequences are completed (i.e., e reaches its preset value).
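As referenced above, the following condensed Python sketch strings Steps 1–13 into a single loop. The buffer size, ε value, synchronization interval, and the two-action RF Q-function interface (predict/fit, as in the Section 3.2 sketch) are assumptions, and the node-based multi-agent parallelism is omitted for brevity.

```python
import copy
import random

import numpy as np

def train_rf_qn(env, rf_q, target_q, episodes=200, max_steps=32,
                eps=0.1, gamma=0.95, buffer_cap=1024, batch_size=128,
                sync_every=50):
    """Condensed single-agent sketch of Steps 1-13."""
    buffer, count = [], 0
    for e in range(episodes):                        # Markov sequences (Steps 4, 12-13)
        s = env.reset()                              # Steps 5-7: environment state
        for t in range(max_steps):                   # Step 6
            # Step 7: epsilon-greedy choice between the two cluster actions
            if random.random() < eps:
                a = random.randrange(2)
            else:
                a = int(rf_q.predict([s]).argmax())
            s_next, r, done = env.step(a)            # Step 8: reward and new state
            buffer.append((s, a, r, s_next, done))   # Step 9: replay buffer
            if len(buffer) > buffer_cap:
                buffer.pop(0)
            if len(buffer) >= batch_size:
                batch = random.sample(buffer, batch_size)
                targets = np.array([ri if di else
                                    ri + gamma * target_q.predict([sn]).max()
                                    for (_, _, ri, sn, di) in batch])
                rf_q.fit(np.stack([si for (si, *_rest) in batch]),
                         np.array([ai for (_, ai, *_rest) in batch]),
                         targets)
                count += 1                           # Step 10
            if count and count % sync_every == 0:    # Step 11: sync target network
                target_q = copy.deepcopy(rf_q)
            if done:
                break
            s = s_next
    return rf_q
```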
3.5. The RF-QN Model-Driven Cluster Partitioning Method
To match the training process given in Figure 4, the cluster partitioning method driven by the RF-QN model is shown in Figure 5. The basic idea is to assign a cluster number to each node in the DN, after which nodes sharing the same cluster number are grouped together. As can be seen from Figure 5, the current cluster number and node 1's cluster number are first set to 1. The following nodes, e.g., node $i$, are then numbered based on the cluster number of the upstream node, and two possible cluster numbers can be assigned to node $i$: action 1, joining the cluster of its upstream node; and action 2, initiating a new cluster with the next cluster number (see (2) and (3) in Section 2). The action with the largest RF-QN reward is finally implemented, and node $i$'s cluster number is then determined based on (3). Finally, the current cluster number and the node number are updated until all nodes have been assigned a cluster number.
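A greedy execution-time sketch of this Figure 5 procedure is given below; it assumes the RF Q-function interface from the Section 3.2 sketch, hypothetical node field names, and that nodes are visited in topological order so that each upstream node is already assigned.

```python
import numpy as np

def partition_with_rfqn(nodes, rf_q):
    """Greedy execution of the trained RF-QN: each node either inherits its
    upstream node's cluster or opens a new one, whichever action the
    Q-function scores higher. Field names and 0-based indexing are assumed."""
    cluster_of = [1] + [None] * (len(nodes) - 1)    # node 1's cluster number = 1
    n_clusters = 1
    for i in range(1, len(nodes)):
        n = nodes[i]
        state = np.array([n["p_min"], n["p_max"], n["capacity"],
                          cluster_of[n["upstream"]]], dtype=float)
        q_join, q_new = rf_q.predict([state])[0]
        if q_new > q_join:                          # action 2: start a new cluster
            n_clusters += 1
            cluster_of[i] = n_clusters
        else:                                       # action 1: join upstream cluster
            cluster_of[i] = cluster_of[n["upstream"]]
    return cluster_of
```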
6. Conclusions
This paper aimed to develop an intelligent dynamic cluster partitioning and regulation strategy for distribution networks with high penetration of distributed and flexible resources, with the goals of improving operational efficiency, enhancing renewable energy utilization, and maintaining voltage stability while reducing computational complexity. To achieve these objectives, we first constructed an environment model based on a Markov decision process, which formally describes the state space, action space, and dispatch-oriented reward function for cluster partitioning. We then proposed a novel RF-QN (random forest Q-learning network) with a node-based multi-agent framework, incorporating a rumination update mechanism that enables adaptive weight adjustment during training, thereby enhancing the model’s generalization and robustness. Furthermore, an integrated dispatching framework was designed to realize dynamic cluster partitioning and cooperative regulation in practical operations.
The proposed strategy was validated on a modified IEEE-33-node test system. The results demonstrate the following:
- Significant performance improvement: Compared with static clustering and DQN-based dynamic clustering, the RF-QN approach increases the renewable energy utilization rate to 94.73%, reduces network losses to 0.0493 p.u., and lowers the voltage deviation rate to 2.984% under the AVC scenario (see Table 5).
- Operational efficiency: The cluster partitioning and dispatch execution time remains under one minute, meeting the requirements of real-time or intra-hour grid operation.
- Practical applicability: The framework is particularly suitable for modern distribution networks with high shares of photovoltaic, wind, energy storage, and electric vehicle charging loads, where traditional centralized or static control methods struggle with variability and uncertainty.
In conclusion, the objectives of this study have been successfully achieved. The proposed intelligent dynamic cluster partitioning strategy not only provides a scalable and efficient solution for resource aggregation and dispatch but also offers a novel integration of random forest and reinforcement learning that balances interpretability, robustness, and computational performance. Future work will focus on extending the method to larger-scale networks and more extreme operational scenarios to further validate its adaptability and resilience.