Coordinated Energy Storage Optimization for Power Quality in High-Renewable Distribution Networks

Duan, Ruiqin; Jiang, Yan; Zhu, Xinchun; Song, Xiaolong; Luo, Junjie; Jia, Youwei

doi:10.3390/en19102373


Ruiqin Duan ¹, Yan Jiang ¹, Xinchun Zhu ¹, Xiaolong Song ², Junjie Luo ³ and Youwei Jia ³,*

¹ Yunnan Electric Power Dispatching and Control Center, Kunming 650011, China
² Yunnan Electric Power Test & Research Institute (Group) Co., Ltd., Kunming 650217, China
³ Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen 518055, China
* Author to whom correspondence should be addressed.

Energies 2026, 19(10), 2373; https://doi.org/10.3390/en19102373

Submission received: 29 January 2026 / Revised: 6 March 2026 / Accepted: 10 March 2026 / Published: 15 May 2026

(This article belongs to the Section F1: Electrical Power System)


Abstract

The increasing penetration of single-phase photovoltaic (PV) generation and electric vehicle (EV) charging has aggravated phase current asymmetry in low-voltage distribution networks. In contrast to voltage-oriented control strategies, this work focuses directly on mitigating current imbalance at the point of common coupling (PCC). A coordinated control framework based on multi-agent deep deterministic policy gradient (MADDPG) is developed to regulate distributed battery energy storage systems (BESS). The control objective is formulated in terms of the Current Unbalance Factor (IUF), derived from symmetrical component theory. A linearized DistFlow model is embedded in the learning environment to preserve physical consistency while maintaining computational tractability. Device-level constraints, including state-of-charge limits and ramp-rate bounds, are enforced through action projection, whereas network security limits are incorporated via reward penalties. Case studies on a modified residential feeder indicate that coordinated BESS control reduces the peak IUF from 2.75% to 2.50% under the studied operating condition. The maximum dominant-phase current decreases from 125 A to 110 A. The performance is close to that of centralized convex optimization while enabling decentralized real-time execution after offline training. These results suggest that multi-agent reinforcement learning can serve as a feasible alternative for phase imbalance mitigation in distribution networks with high renewable penetration.

Keywords: high proportion of renewable energy; energy storage system; reinforcement learning; collaborative optimization; three-phase imbalance

1. Introduction

The rapid transition toward low-carbon energy systems has significantly increased the penetration of distributed generation (DG), particularly rooftop photovoltaics (PV), in low-voltage distribution networks (LVDNs) [1,2]. Meanwhile, the accelerating deployment of single-phase electric vehicle (EV) charging infrastructure has introduced highly uneven phase-level loading conditions [3,4,5]. Unlike transmission systems, LVDNs are typically radial and cable-dominated, with relatively high R/X ratios and limited short-circuit capacity [6,7]. Under high distributed energy resource (DER) penetration, the coexistence of stochastic PV generation and asymmetric single-phase loads has led to increasingly severe three-phase current imbalance. Excessive phase imbalance produces significant neutral currents, increases copper losses, accelerates transformer thermal aging, and may even trigger protection malfunctions [8,9,10]. Although voltage deviation has traditionally been the primary focus of distribution-level control, several studies have demonstrated that transformer overloading and thermal stress are more directly associated with current magnitude asymmetry and negative-sequence currents rather than voltage magnitude deviation alone [11,12,13]. Therefore, mitigating current imbalance is critical for ensuring secure and sustainable operation of modern LVDNs.

Conventional mitigation measures include manual phase reallocation, feeder reconfiguration, and capacitor-based compensation [14,15,16]. However, these approaches are either labor-intensive, static in nature, or primarily oriented toward voltage support rather than active power redistribution across phases. PV inverters can provide reactive power compensation to regulate voltage profiles [17,18], but their capability to correct severe phase current asymmetry is limited by inverter capacity and intermittent solar irradiance. Given these limitations, active power balancing through controllable distributed resources has emerged as a promising direction.

Battery Energy Storage Systems (BESS), owing to their fast response capability and bidirectional active power controllability, provide an effective mechanism for reshaping phase power flows [19]. By dynamically adjusting charging and discharging power at specific nodes and phases, BESS units can directly influence phase current magnitudes and alleviate asymmetry. Existing research has investigated BESS deployment for voltage support, loss reduction, and reliability enhancement in unbalanced distribution networks [20]. Some works formulate centralized optimal power flow (OPF) models to coordinate storage dispatch in unbalanced feeders, while others consider multi-timescale scheduling frameworks integrating EV charging and storage systems [21,22,23]. Despite these advances, several research gaps remain:

In existing studies, many approaches focus on minimizing voltage deviation or the Voltage Unbalance Factor (VUF) [24,25]. However, voltage-oriented metrics do not necessarily capture the thermal stress induced by negative-sequence currents. The Current Unbalance Factor (IUF), derived from symmetrical component theory and standardized definitions of unbalance indices [26], more directly quantifies phase current asymmetry at the Point of Common Coupling (PCC). Few studies explicitly adopt IUF or negative-sequence current magnitude as the primary optimization objective. OPF-based imbalance mitigation typically requires accurate network parameters, full communication of global states, and centralized computation at each time step [25,27]. Such implementations may become computationally intensive and difficult to deploy in large-scale LVDNs with numerous distributed BESS units.

In addition, several data-driven control frameworks do not incorporate detailed network physics or strict feasibility enforcement. Deep Reinforcement Learning (DRL) has recently been applied to voltage control, demand response, and distributed energy management in distribution networks [28,29,30]. Algorithms such as Deep Deterministic Policy Gradient (DDPG) and Multi-Agent Deep Deterministic Policy Gradient (MADDPG) have demonstrated strong potential for continuous control problems [31,32]. However, many DRL-based studies treat the grid as a black box or rely on simplified resistive models that neglect reactive power coupling and branch impedance effects [32,33]. Moreover, critical engineering constraints—such as battery state-of-charge (SoC) limits, ramp-rate limits, and network security bounds—are often handled solely through soft penalty terms [34,35,36], which may not guarantee feasible actions during exploration.

Table 1 compares representative imbalance mitigation strategies in terms of control architecture, model requirements, and optimization objective. As shown in Table 1, most existing approaches either (i) rely on centralized OPF-based optimization, or (ii) focus on voltage-oriented objectives rather than directly minimizing current imbalance. In addition, RL-based schemes rarely combine physics-informed modeling with strict engineering constraint enforcement under a decentralized execution paradigm.

To address these limitations, this paper proposes a coordinated control framework based on Multi-Agent Deep Deterministic Policy Gradient (MADDPG) for mitigating three-phase current imbalance in low-voltage distribution networks (LVDNs). Compared with conventional centralized optimization approaches, the proposed framework enables distributed real-time phase balancing through coordinated operation of distributed battery energy storage systems (BESS). By performing offline training and relying only on local measurements during execution, the approach significantly reduces communication requirements and avoids repeated centralized optimization, thereby improving the practical feasibility of phase imbalance mitigation in communication-constrained LVDNs.

The main contributions of this paper are summarized as follows:

(1) IUF-Oriented Imbalance Mitigation: A global reward function is explicitly designed to minimize the Current Unbalance Factor (IUF) derived from symmetrical components, directly targeting phase current asymmetry associated with transformer thermal stress and neutral conductor loading.
(2) Physics-Informed Multi-Agent Learning Framework: A linearized DistFlow-based environment is embedded within the reinforcement learning framework, enabling agents to learn physically consistent control policies that account for branch impedance and power flow sensitivity in distribution networks.
(3) Centralized Training and Decentralized Execution (CTDE): The MADDPG architecture enables centralized offline training using global state information while allowing decentralized real-time execution based solely on locally measured states. This structure significantly reduces communication requirements and avoids repeated centralized optimization, improving the scalability and practical applicability of the proposed strategy for large-scale LVDNs.

The remainder of this paper is organized as follows. Section 2 presents the system modeling framework, including the linearized DistFlow formulation and BESS operational constraints. Section 3 introduces the proposed MADDPG-based coordinated control strategy. Section 4 provides simulation setup and case study results. Finally, Section 5 concludes the paper and outlines future research directions.

2. System Framework

2.1. Overall System Structure

The three-phase four-wire low-voltage distribution network established in this paper is modified from [19] and is shown in Figure 1. The system includes 21 residential households, comprising six households with single-phase energy storage, five with rooftop photovoltaic installations, three with three-phase loads, and seven with both single-phase loads and rooftop photovoltaic systems.

To provide a clear overview of the proposed methodology, the overall framework is illustrated in Figure 2. The core idea of this work is to mitigate three-phase current imbalance by dynamically regulating the charging and discharging power of distributed BESS units. By adjusting active power injection at different feeder locations, the phase power distribution is reshaped, thereby reducing neutral-line current and lowering the Current Unbalance Factor (IUF).

Figure 3 provides an overview of the imbalance mitigation framework based on multi-agent reinforcement learning. Each agent, corresponding to a single-phase energy storage system, observes its local state, while all agents are trained with a shared reward function. During training, all agents share experience through a centralized Critic network, enabling collaborative learning to maximize the cumulative reward. After training, each agent retains only its Actor network and acts independently using its local observation alone. Consequently, single-phase battery systems deployed throughout the distribution network are able to operate in a coordinated manner via autonomous charging and discharging decisions, effectively reducing the neutral current at the point of common coupling (PCC) without directly obtaining phase-specific connection information.

2.2. Establishment of the System Model

(1): Power balance and power flow constraints

Low-voltage residential feeders are typically short, radial, and cable-based, resulting in relatively high $R/X$ ratios compared to transmission systems, indicating resistive-dominant characteristics. Under such conditions, the DistFlow formulation provides a reasonable trade-off between accuracy and computational efficiency. To accurately reflect the physical characteristics of the distribution network, we model both active and reactive power injections. For each node $n$ and phase $\phi$ at time $t$, the net power injections satisfy the following balance equations:

$$P_{n,\phi,t}^{\mathrm{inj}} = P_{n,\phi,t}^{\mathrm{PV}} - P_{n,\phi,t}^{\mathrm{Load}} + \sum_{k \in K} A_{k,(n,\phi)} P_{k,t}^{\mathrm{BESS}} \tag{1}$$

$$Q_{n,\phi,t}^{\mathrm{inj}} = Q_{n,\phi,t}^{\mathrm{PV}} - Q_{n,\phi,t}^{\mathrm{Load}} \tag{2}$$

We adopt the linearized DistFlow model to formulate the relationship between power flow and voltage. Unlike the simplified resistive model, this formulation explicitly accounts for the branch reactance $x_{ij}$ and the reactive power flow $Q_{ij}$, which are necessary to accurately represent voltage variations and power flows in low-voltage networks:

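$$\sum_{i:(i,n) \in E} P_{in,\phi,t} - \sum_{j:(n,j) \in E} P_{nj,\phi,t} = -P_{n,\phi,t}^{\mathrm{inj}} \tag{3}$$

$$\sum_{i:(i,n) \in E} Q_{in,\phi,t} - \sum_{j:(n,j) \in E} Q_{nj,\phi,t} = -Q_{n,\phi,t}^{\mathrm{inj}} \tag{4}$$

Here $P_{ij,\phi,t}$ and $Q_{ij,\phi,t}$ are the power flows on branch $(i,j)$, taken as positive from $i$ to $j$. The net inflow at node $n$ therefore equals its net consumption, i.e., the negative of the generation-positive injections in (1) and (2).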

The voltage drop along branch $(i,j)$ is given by:

$$v_{j,\phi,t} = v_{i,\phi,t} - 2\left(r_{ij,\phi} P_{ij,\phi,t} + x_{ij,\phi} Q_{ij,\phi,t}\right) \tag{5}$$

where $v_{n,\phi,t}$ denotes the squared voltage magnitude at node $n$. This formulation ensures that the impact of reactive power on nodal voltage variations is properly captured.

For each node $n$, the voltage must remain within a specified range, as shown in (6):

$$V_{\min}^{2} \leq v_{n,\phi,t} \leq V_{\max}^{2}, \quad \forall n, \phi, t \tag{6}$$
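As an illustration of Equations (3)–(6), the following minimal Python sketch evaluates the linearized DistFlow relations on a single-phase chain feeder. It is not the simulation environment used in this paper: the function names `lindistflow_chain` and `voltage_ok`, the two-node example, and all per-unit values are hypothetical. The sketch assumes generation-positive injections as in Equation (1), branch flows that are positive when directed away from the substation, and lossless branches.

```python
def lindistflow_chain(p_inj, q_inj, r, x, v0=1.0):
    """Linearized DistFlow on a chain feeder 0 -> 1 -> ... -> N (one phase).

    p_inj[i], q_inj[i]: net injection at node i+1 (generation positive), p.u.
    r[i], x[i]: resistance and reactance of the branch feeding node i+1, p.u.
    v0: squared voltage magnitude at the substation node, p.u.
    Returns (P, Q, v): branch flows and squared voltages at nodes 1..N.
    """
    n = len(p_inj)
    P = [0.0] * n
    Q = [0.0] * n
    down_p = down_q = 0.0
    # Backward sweep: each branch carries the net consumption of all
    # downstream nodes (losses are neglected in the linearized model).
    for i in reversed(range(n)):
        down_p += -p_inj[i]
        down_q += -q_inj[i]
        P[i], Q[i] = down_p, down_q
    # Forward sweep: voltage drop relation of Equation (5).
    v = []
    v_prev = v0
    for i in range(n):
        v_prev = v_prev - 2.0 * (r[i] * P[i] + x[i] * Q[i])
        v.append(v_prev)
    return P, Q, v


def voltage_ok(v, v_min=0.95, v_max=1.05):
    """Check the squared-voltage bounds of Equation (6); limits are assumed."""
    return all(v_min ** 2 <= vi <= v_max ** 2 for vi in v)


# Hypothetical two-node feeder with pure loads (negative injections).
P, Q, v = lindistflow_chain(
    p_inj=[-0.1, -0.2], q_inj=[-0.05, 0.0],
    r=[0.01, 0.02], x=[0.005, 0.01],
)
```

In this example the first branch carries the combined load of both nodes, and the squared voltage decreases monotonically along the feeder, as expected for a load-only case.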

(2): Energy storage constraints

To ensure physical feasibility, each BESS is subject to power, energy, and operational constraints.

The charging and discharging power is bounded by inverter capacity:

$$-P_{k}^{\max} \leq P_{k,t}^{\mathrm{BESS}} \leq P_{k}^{\max} \tag{7}$$

where positive values indicate discharging and negative values indicate charging. The state of charge (SOC) is restricted within safe limits:

$$SOC_{\min} \leq S_{k,t} \leq SOC_{\max} \tag{8}$$

The evolution of the SOC accounts for conversion losses. Let $\eta_{c}$ and $\eta_{d}$ denote the charging and discharging efficiencies, respectively, and let $[x]_{+} = \max(0, x)$:

$$S_{k,t+1} = S_{k,t} + \eta_{c} \left[-P_{k,t}^{\mathrm{BESS}}\right]_{+} \Delta t - \frac{1}{\eta_{d}} \left[P_{k,t}^{\mathrm{BESS}}\right]_{+} \Delta t \tag{9}$$
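The update in Equation (9) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `soc_step`, the default efficiencies of 0.95, and the units (stored energy in kWh, power in kW, $\Delta t$ in hours) are assumptions, not values taken from this paper.

```python
def soc_step(s, p_bess, dt, eta_c=0.95, eta_d=0.95):
    """One step of the storage energy update in Equation (9).

    s: stored energy at time t (kWh, assumed unit)
    p_bess: BESS power, positive = discharging, negative = charging (kW)
    dt: step length (h)
    eta_c, eta_d: charging/discharging efficiencies (assumed values)
    """
    charge = max(0.0, -p_bess)      # [-P]_+ : charging power
    discharge = max(0.0, p_bess)    # [P]_+  : discharging power
    return s + eta_c * charge * dt - discharge * dt / eta_d


# Charging at 4 kW for half an hour stores less than 2 kWh due to losses,
# while discharging 4.5 kW for one hour drains more than 4.5 kWh.
after_charge = soc_step(20.0, -4.0, 0.5, eta_c=0.9, eta_d=0.9)
after_discharge = soc_step(20.0, 4.5, 1.0, eta_c=0.9, eta_d=0.9)
```

Because only one of the two bracket terms is nonzero at any step, the same function covers charging, discharging, and idling ($P = 0$).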

To reflect inverter response limits, a ramp-rate constraint is imposed:

$$\left| P_{k,t}^{\mathrm{BESS}} - P_{k,t-1}^{\mathrm{BESS}} \right| \leq R_{k}^{\mathrm{ramp}} \tag{10}$$

To limit excessive cycling, the total daily energy throughput is constrained as:

$$\sum_{t=1}^{T} \left( \left[-P_{k,t}^{\mathrm{BESS}}\right]_{+} + \left[P_{k,t}^{\mathrm{BESS}}\right]_{+} \right) \Delta t \leq E_{k}^{\mathrm{th}} \tag{11}$$

In addition, an energy storage unit can also be idle. An association matrix is therefore defined to indicate whether storage unit $k$ is connected to phase $\phi$ of node $n$, as given in (12):

$$A_{k,(n,\phi)} = \begin{cases} 1, & \text{if unit } k \text{ is connected to phase } \phi \text{ of node } n \\ 0, & \text{otherwise} \end{cases} \tag{12}$$

(3): Optimization Model

The objective of the proposed strategy is to mitigate three-phase current imbalance at the Point of Common Coupling (PCC).

The coordinated actions of distributed BESS units modify the feeder power flow. The resulting phase currents at the PCC are obtained from the total active and reactive power exchanged with the main grid. With nominal voltage magnitude $V_{\mathrm{nom}}$, the PCC phase currents at time $t$ are approximated as:

$$I_{\phi,t}^{\mathrm{PCC}} = \frac{P_{\mathrm{line},\phi,t}^{\mathrm{PCC}} - j Q_{\mathrm{line},\phi,t}^{\mathrm{PCC}}}{V_{\mathrm{nom}}}, \quad \forall \phi \in \{a, b, c\} \tag{13}$$

Using the Symmetrical Component Method, the positive-sequence current $I_{1,t}$ and the negative-sequence current $I_{2,t}$ are derived from the phase currents:

$$\begin{bmatrix} I_{0,t} \\ I_{1,t} \\ I_{2,t} \end{bmatrix} = \frac{1}{3} \begin{bmatrix} 1 & 1 & 1 \\ 1 & \alpha & \alpha^{2} \\ 1 & \alpha^{2} & \alpha \end{bmatrix} \begin{bmatrix} I_{a,t}^{\mathrm{PCC}} \\ I_{b,t}^{\mathrm{PCC}} \\ I_{c,t}^{\mathrm{PCC}} \end{bmatrix} \tag{14}$$

where $\alpha = e^{j \frac{2\pi}{3}}$ is the rotation operator, and the sequence components are evaluated at the PCC to quantify grid-side imbalance.

The Current Unbalance Factor (IUF) is defined as the ratio of the negative-sequence current magnitude to the positive-sequence current magnitude:

$$\mathrm{IUF}_{t} = \frac{|I_{2,t}|}{|I_{1,t}| + \epsilon} \tag{15}$$

where $\epsilon$ is a small constant ($10^{-6}$) that prevents numerical instability during islanded or zero-flow conditions.
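Equations (13)–(15) can be traced numerically with the short Python sketch below. It is an illustrative sketch rather than the authors' implementation: the function names, the 230 V nominal voltage, and the example powers are hypothetical. One assumption is added explicitly: the nominal phase voltages are taken at angles of 0°, −120°, and +120°, which Equation (13) leaves implicit, so that a balanced load produces a pure positive-sequence current.

```python
import cmath
import math

ALPHA = cmath.exp(2j * math.pi / 3)  # rotation operator in Equation (14)
# Assumed nominal voltage angles of phases a, b, c (balanced source).
PHASE_ANGLE = {"a": 0.0, "b": -2 * math.pi / 3, "c": 2 * math.pi / 3}


def pcc_currents(p, q, v_nom):
    """Phase current phasors from per-phase P (W) and Q (var), Equation (13)."""
    return {
        ph: (p[ph] - 1j * q[ph]) / v_nom * cmath.exp(1j * PHASE_ANGLE[ph])
        for ph in "abc"
    }


def sequence_components(i):
    """Zero-, positive-, and negative-sequence currents, Equation (14)."""
    ia, ib, ic = i["a"], i["b"], i["c"]
    i0 = (ia + ib + ic) / 3
    i1 = (ia + ALPHA * ib + ALPHA ** 2 * ic) / 3
    i2 = (ia + ALPHA ** 2 * ib + ALPHA * ic) / 3
    return i0, i1, i2


def iuf(i, eps=1e-6):
    """Current Unbalance Factor, Equation (15)."""
    _, i1, i2 = sequence_components(i)
    return abs(i2) / (abs(i1) + eps)


zero_q = {"a": 0.0, "b": 0.0, "c": 0.0}
balanced = pcc_currents({"a": 10e3, "b": 10e3, "c": 10e3}, zero_q, 230.0)
unbalanced = pcc_currents({"a": 12e3, "b": 10e3, "c": 10e3}, zero_q, 230.0)
iuf_balanced = iuf(balanced)
iuf_unbalanced = iuf(unbalanced)
```

For the unbalanced example, the extra 2 kW on phase a yields an IUF of about 0.0625 (6.25%), while the balanced case gives an IUF that is numerically zero.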

The global optimization problem minimizes the expected cumulative imbalance over the operation horizon $T$:

$$\min_{\pi} J = \mathbb{E}_{\pi} \left[ \sum_{t=1}^{T} \mathrm{IUF}_{t} \right] \tag{16}$$

3. Three-Phase Unbalance Optimization Based on MADRL

3.1. Markov Decision Process Model

This paper formulates the coordinated control of distributed single-phase energy storage systems for three-phase imbalance mitigation as a multi-agent partially observable Markov decision process (POMDP), represented by an eight-tuple $\langle G, S, O, A, P, r, \rho, \gamma \rangle$. Here, $G$ is the set of agents, each corresponding to an energy storage unit in this paper; $S$ is the state space; $O$ is the observation space; $A$ is the action space; $P$ is the state transition probability function; $r$ is the reward function; $\rho$ is the initial state distribution; and $\gamma$ is the discount factor. The system state contains global operational information of the three-phase distribution network and is denoted as:

$$S = \{E, P^{pv}, P^{load}, I, I^{n}, t\},$$

where $E$ represents the energy states of the energy storage units, $P^{pv}$ and $P^{load}$ are the photovoltaic and load power, respectively, $I$ represents the three-phase current amplitudes of all nodes, $I^{n}$ represents the neutral current amplitude, and $t$ is the time step.

The observation space of each agent consists of local information, with the observation vector denoted as $o_{k} = \{E_{k}, I_{a}, I_{b}, I_{c}, P_{n,\phi}^{pv}, P_{n,\phi}^{load}, t\}$, where $E_{k}$ is the battery state of charge, $I_{a}, I_{b}, I_{c}$ are the phase currents at node $n$, and $P_{n,\phi}^{pv}$ and $P_{n,\phi}^{load}$ are the photovoltaic and load power at the corresponding node and phase. The action space of each agent corresponds to the charging and discharging power of the associated energy storage unit and is modeled as a continuous domain. For each unit, the action set is given by:

$$a_{k} \in \left[-P_{k}^{\max}, P_{k}^{\max}\right] \tag{17}$$

where $P_{k}^{\max}$ denotes the maximum charging and discharging power of the battery. Positive values correspond to discharging and negative values to charging, consistent with the sign convention in Section 2.

Given the current state $s_{t}$ and joint action $a_{t}$, the next state $s_{t+1}$ is generated by the physical dynamics of the distribution network and the BESS energy update. Specifically, the BESS energy state evolves according to Equation (9), while the nodal power injections and branch power flows are computed using the linearized DistFlow equations in Equations (1)–(6) under the realized PV generation and load demand.

The system-level control objective is to mitigate three-phase current imbalance at the PCC. Accordingly, a shared global reward is adopted for all agents and defined as:

$$r_{t} = -\mathrm{IUF}_{t} \tag{18}$$

The optimization objective is achieved by assigning a negative reward, where a lower imbalance results in a higher reward value, reflecting improved three-phase balance in the distribution network.

Because the variables in the global state set $S$ differ widely in physical magnitude, all continuous features are normalized before being fed into the neural networks (e.g., voltages are expressed in per-unit values, and power and energy variables are linearly scaled to $[0, 1]$ based on their operational limits), which improves training stability.
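A minimal sketch of this linear scaling is given below. The feature names and the operational limits in the example (a 4–36 kWh energy window and a ±5 kW power range) are hypothetical and serve only to show the mapping to $[0, 1]$.

```python
def scale_unit(x, lo, hi):
    """Linearly map x from the operating range [lo, hi] to [0, 1]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return (x - lo) / (hi - lo)


def normalize_observation(obs, limits):
    """Scale each named feature using its (lo, hi) operational limits.

    Features are emitted in the key order of `limits`, so the network
    always sees a fixed layout.
    """
    return [scale_unit(obs[name], *limits[name]) for name in limits]


# Hypothetical limits: energy in kWh, power in kW.
limits = {"energy": (4.0, 36.0), "p_pv": (0.0, 5.0), "p_bess": (-5.0, 5.0)}
obs = {"energy": 20.0, "p_pv": 2.5, "p_bess": -5.0}
features = normalize_observation(obs, limits)
```

Signed quantities such as the BESS power are shifted as well as scaled, so full charging maps to 0 and full discharging maps to 1.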

3.2. Constraint Enforcement in the RL Framework

The physical constraints defined in Section 2 are enforced through a two-tier mechanism within the reinforcement learning loop.

Battery power limits, SOC bounds, and ramp-rate constraints are strictly enforced through action projection before applying actions to the environment. The raw output of the Actor network is clipped as:

$$P_{k,t}^{\mathrm{BESS}} \leftarrow \mathrm{clip}\left(P_{k,t}^{\mathrm{BESS}}, -P_{k}^{\max}, P_{k}^{\max}\right) \tag{19}$$
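Equation (19) shows the power-limit clipping only. One possible way to fold the ramp-rate limit (10) and the energy limits (8)–(9) into the same projection step is sketched below. This is an assumption about how such a projection could be built, not the authors' implementation: the function `project_action`, its parameter names, and the choice to let the energy limits take precedence over the ramp-rate limit when the two conflict are all hypothetical.

```python
def clip(x, lo, hi):
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def project_action(p_raw, p_prev, energy, *, p_max, ramp,
                   e_min, e_max, dt, eta_c=0.95, eta_d=0.95):
    """Project a raw actor output onto the device-feasible power range.

    Positive power = discharging, negative = charging (kW);
    energy in kWh, dt in hours (assumed units).
    """
    # Step 1: ramp-rate limit around the previous set-point, Equation (10).
    p = clip(p_raw, p_prev - ramp, p_prev + ramp)
    # Step 2: inverter limit, Equation (7), tightened so that the energy
    # update of Equation (9) stays inside [e_min, e_max], Equation (8).
    max_discharge = min(p_max, (energy - e_min) * eta_d / dt)
    max_charge = min(p_max, (e_max - energy) / (eta_c * dt))
    # If steps 1 and 2 conflict, the energy/power limits win (design choice).
    return clip(p, -max_charge, max_discharge)


common = dict(p_max=5.0, ramp=2.0, e_min=4.0, e_max=36.0, dt=1.0)
ramp_limited = project_action(10.0, 0.0, 20.0, **common)
nearly_empty = project_action(4.0, 3.0, 4.5, **common)
nearly_full = project_action(-5.0, -1.0, 35.5, **common)
```

Because the two steps are interval clips applied in sequence, the result coincides with the exact projection onto the intersection of both intervals whenever that intersection is non-empty.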

Network-level operational limits (e.g., voltage bounds) are incorporated as soft constraints to encourage feasible operation. To discourage actions that compromise system safety, a penalty term is added to the global reward. The modified reward function is expressed as:

$$r_{t} = -\mathrm{IUF}_{t} - \beta \cdot C_{\mathrm{pen},t} \tag{20}$$

where $\beta$ is a positive penalty coefficient balancing the trade-off between imbalance mitigation and constraint satisfaction. The constraint violation term $C_{\mathrm{pen},t}$ penalizes voltage magnitude violations and is defined as:

$$C_{\mathrm{pen},t} = \sum_{n \in N} \sum_{\phi \in \{a,b,c\}} \left[ \max\left(0, v_{n,\phi,t} - V_{\max}^{2}\right) + \max\left(0, V_{\min}^{2} - v_{n,\phi,t}\right) \right] \tag{21}$$

This term ensures that the agents receive a lower reward whenever nodal voltages deviate from the permissible range $[V_{\min}, V_{\max}]$, thereby encouraging safe grid operation during training. This mechanism guides the learning process toward secure operating conditions without requiring hard projection on network states.
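The penalized reward of Equations (20) and (21) can be sketched as follows. The voltage limits of 0.95 and 1.05 p.u. and the penalty coefficient $\beta = 10$ are placeholder values chosen for the example, not settings reported in this paper, and the function names are hypothetical.

```python
def voltage_penalty(v_sq, v_min=0.95, v_max=1.05):
    """Sum of squared-voltage limit violations over all nodes and phases.

    v_sq: iterable of squared voltage magnitudes (p.u.), Equation (21).
    """
    return sum(
        max(0.0, v - v_max ** 2) + max(0.0, v_min ** 2 - v)
        for v in v_sq
    )


def penalized_reward(iuf_t, v_sq, beta=10.0, v_min=0.95, v_max=1.05):
    """Shared reward of Equation (20): imbalance term plus soft penalty."""
    return -iuf_t - beta * voltage_penalty(v_sq, v_min, v_max)


# One node inside the limits, one above V_max^2, one below V_min^2.
v_sq = [1.0, 1.1125, 0.9]
pen = voltage_penalty(v_sq)
r = penalized_reward(0.025, v_sq)
```

When every squared voltage lies inside the limits, the penalty vanishes and the reward reduces to the pure imbalance term $-\mathrm{IUF}_t$ of Equation (18).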

3.3. MADDPG Algorithm

Within the formulated Markov decision framework, the strong coupling and interactions among multiple energy storage units render conventional single-agent reinforcement learning inadequate for handling environmental non-stationarity and the resulting high-dimensional action space. To address this challenge, the multi-agent deep deterministic policy gradient (MADDPG) algorithm is adopted. MADDPG extends the DDPG framework and is specifically tailored for continuous control problems involving multiple agents. In contrast to DDPG, MADDPG employs multiple Actor networks together with their corresponding Critic networks. During training, each Actor aims to maximize the expected discounted return, whereas the Critic is optimized by minimizing the temporal-difference error. Accordingly, the objective function of agent $i$ can be expressed as follows:

$$J_{i} = \mathbb{E}_{d^{\mu}} \left[ \sum_{t=1}^{T} \gamma^{t-1} r_{i}^{t} \right] \tag{22}$$

where $r_{i}^{t}$ represents the immediate reward and $\gamma$ is the discount factor. During centralized training, the Critic network conditions on the global state $s$ and the joint action $(a_{1}, \dots, a_{N})$, whereas each Actor network operates solely based on its local observation $o_{i}$. This centralized training and decentralized execution (CTDE) structure is explicitly reflected in the gradient formulation below:

$$\nabla_{\theta_{i}} J_{i} = \mathbb{E}_{s, a \sim D} \left[ \nabla_{\theta_{i}} \mu_{i}(o_{i}) \, \nabla_{a_{i}} Q_{i}^{\mu}(s, a_{1}, \dots, a_{N}) \big|_{a_{i} = \mu_{i}(o_{i})} \right] \tag{23}$$

where $D$ is the experience replay buffer, $o_{i}$ denotes the local observation of agent $i$, and $s$ denotes the corresponding global state used by the centralized Critic. For the Critic network, the loss over a mini-batch of $M$ samples, indexed by $m$, is defined as:

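$$L(\phi_{i}) = \frac{1}{M} \sum_{m} \left( y_{i}^{m} - Q_{i}^{\mu}(s^{m}, a_{1}^{m}, \dots, a_{N}^{m}) \right)^{2} \tag{24}$$

where the target value, evaluated at the next global state $s'^{m}$ with the next actions $a_{j}'^{m} = \mu_{j}'(o_{j}'^{m})$ produced by the target Actors, is:

$$y_{i}^{m} = r_{i}^{m} + \gamma \, Q_{i}^{\mu'}(s'^{m}, a_{1}'^{m}, \dots, a_{N}'^{m}) \tag{25}$$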

In the formula, $\mu'$ and $Q_{i}^{\mu'}$ denote the target Actor and target Critic networks, with parameters $\theta_{i}'$ and $\phi_{i}'$, respectively. To improve stability, the target parameters are updated with the soft update method:

$$\begin{array}{l} \theta_{i}' \leftarrow \tau \theta_{i} + (1 - \tau) \theta_{i}' \\ \phi_{i}' \leftarrow \tau \phi_{i} + (1 - \tau) \phi_{i}' \end{array} \tag{26}$$

where $\theta_{i}'$ and $\phi_{i}'$ are the parameters of the target Actor and Critic networks, $\theta_{i}$ and $\phi_{i}$ are those of the online networks, and $\tau$ is the soft update coefficient. The proposed method consists of two phases: centralized offline training and distributed online execution, as shown in Algorithm 1.
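The soft update in Equation (26) amounts to an exponential moving average of the online parameters. The sketch below shows this on plain Python lists. It is a framework-free illustration: a real implementation would apply the same rule to each tensor of the Actor and Critic networks, and the function name and the value $\tau = 0.01$ are assumptions.

```python
def soft_update(target, online, tau):
    """Return updated target parameters: tau * online + (1 - tau) * target."""
    if len(target) != len(online):
        raise ValueError("parameter lists must have the same length")
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


target_params = [0.0, 1.0]
online_params = [1.0, 3.0]
target_params = soft_update(target_params, online_params, tau=0.01)
```

With a small $\tau$ the target networks trail the online networks slowly, which keeps the regression target in Equation (25) from shifting abruptly between updates.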

During the training phase, the agents interact with the distribution network environment based on historical photovoltaic and load data, generating state transitions, actions, and rewards. The Actor and Critic networks are continuously updated through an experience replay mechanism, with soft updates of the target networks to stabilize convergence. During the execution phase, each energy storage unit independently feeds its local observations into the trained policy network, which outputs the charging or discharging power, thereby achieving distributed three-phase imbalance management.

In practical deployment, the proposed framework can be implemented through a two-stage architecture. During the offline stage, the distribution management system (DMS) collects historical operational data and trains the MADDPG model using the distribution network environment. Once training converges, the learned Actor network parameters are deployed to local controllers of distributed BESS units. During real-time operation, each controller determines charging and discharging actions based on locally measured variables such as nodal voltage, phase current, and battery state of charge.

Since the decision-making process relies only on local measurements and lightweight neural network inference, the proposed strategy can be integrated into existing smart inverters or battery management systems, enabling distributed real-time mitigation of phase imbalance in practical LVDNs.

Algorithm 1: Multi-agent energy storage scheduling based on MADDPG
Initialize the environment, the multi-agent Actor–Critic networks, and the experience replay memory $\mathcal{R}$
For each episode $m = 1$ to $M$ do:
    Reset the environment and get observations $\{o_{k}\}$ for all agents
    For each time step $t = 1$ to $T$ do:
        For each agent $i = 1, 2, \dots$ do:
            Choose an action according to the policy network $\mu_{i}(o_{i}^{t} \mid \theta_{i}^{\mu})$
        End for
        Execute the joint action $a^{t} = \{a_{i}^{t}\}$ in the environment
        Observe the next state and the reward $r_{i}^{t}$ for each agent, and store the transition $(s^{t}, a^{t}, r^{t}, s^{t+1})$ in $\mathcal{R}$
        If the number of samples in $\mathcal{R}$ exceeds the batch size $B$ then:
            Sample a mini-batch of transitions from the replay memory $\mathcal{R}$
            For each agent $i$ do:
                Compute the target value $y_{i} = r_{i} + \gamma Q_{i}'(s', a' \mid \theta_{i}^{Q'})$
                Update the critic by minimizing the loss $L(\theta_{i}^{Q}) = \mathbb{E}\left[(y_{i} - Q_{i}(s, a \mid \theta_{i}^{Q}))^{2}\right]$
                Update the actor policy using the sampled policy gradient $\nabla_{\theta_{i}^{\mu}} J \approx \mathbb{E}\left[\nabla_{\theta_{i}^{\mu}} \mu_{i}(o_{i}) \nabla_{a_{i}} Q_{i}(s, a)\right]$
            End for
            Update the target networks
        End if
    End for
End for
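Two building blocks of Algorithm 1, the experience replay memory and the target-value computation, can be sketched without any deep learning framework. The code below is a hedged illustration, not the training code of this paper: the class `ReplayBuffer`, the function `td_targets`, and the transition layout are hypothetical, and the target Actors and target Critic are passed in as plain Python callables standing in for neural networks.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity memory of joint transitions; oldest entries drop out."""

    def __init__(self, capacity, seed=None):
        self._data = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, obs, state, actions, reward, next_obs, next_state):
        self._data.append((obs, state, actions, reward, next_obs, next_state))

    def __len__(self):
        return len(self._data)

    def sample(self, batch_size):
        """Draw a mini-batch uniformly at random, without replacement."""
        return self._rng.sample(list(self._data), batch_size)


def td_targets(batch, target_actors, target_critic, gamma):
    """Target value y = r + gamma * Q'(s', a') for every sampled transition.

    target_actors[j] maps agent j's next local observation to its action;
    target_critic maps (next global state, joint next action) to a scalar.
    """
    targets = []
    for _obs, _state, _actions, reward, next_obs, next_state in batch:
        next_actions = [mu(o) for mu, o in zip(target_actors, next_obs)]
        targets.append(reward + gamma * target_critic(next_state, next_actions))
    return targets


# Toy stand-ins for the networks, used only to exercise the code path.
actors = [lambda o: 0.5 * o, lambda o: 0.5 * o]
critic = lambda s, a: sum(s) + sum(a)

buffer = ReplayBuffer(capacity=3, seed=0)
for k in range(4):
    buffer.push([1.0, 2.0], [0.0], [0.1, 0.2], -0.02, [2.0, 4.0], [1.0, 1.0])
y = td_targets(buffer.sample(2), actors, critic, gamma=0.9)
```

The split between `next_obs` (one entry per agent) and `next_state` (global) mirrors the CTDE structure: each Actor sees only its own observation, while the Critic receives the global state and the joint action.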

4. Case Study

4.1. Parameter Settings

The three-phase four-wire low-voltage distribution network considered in this study is adapted from [19] and illustrated in Figure 1. The network consists of 20 nodes in total. Among the residential users, four are connected to three-phase loads, while three are equipped with single-phase rooftop photovoltaic (PV) systems. Eight households are supplied by single-phase loads with rooftop PV installations, and an additional six households are equipped with single-phase loads, rooftop PV systems, and battery energy storage units, as summarized in Table 2. The hyperparameters used for training the Actor and Critic neural networks are listed in Table 3.

Each single-phase battery system is modeled as an intelligent agent, which regulates the imbalance through charging and discharging behaviors. All battery systems are assumed to have an energy capacity of 40 kWh. The training dataset is obtained from the open-source GEFCom dataset [37], and the photovoltaic generation and load profiles adopted for training are illustrated in Figure 4.

To mitigate the observed imbalance, an active power balancing approach based on MADDPG is adopted. The dataset shown in Figure 4 is used for agent training, which is conducted over 5000 episodes. At the beginning of each episode, the battery active power is initialized to a uniformly distributed random value. A shared reward mechanism is adopted for all single-phase energy storage agents throughout training. The reward function, detailed in Section 3.2, penalizes three-phase imbalance at the PCC. As shown in Figure 5, the agents mitigate imbalance poorly at the beginning of training but steadily improve through experience-driven learning.

4.2. Example Result

To comprehensively evaluate the proposed approach, four control strategies are compared: (a) No BESS: No energy storage is deployed. (b) Random Policy: Batteries charge or discharge randomly. (c) Convex Optimization (OPF): A centralized optimal power flow model solved by Gurobi 11.0.3. (d) Proposed MADDPG Framework: Decentralized multi-agent reinforcement learning. The OPF solution serves as a near-optimal performance benchmark.

Figure 6 presents the time-varying Current Unbalance Factor (IUF) under the four strategies. The no-control case exhibits pronounced imbalance throughout the day, with a peak value of approximately 2.75% around 18:00.

The random charging policy fails to mitigate imbalance effectively and even increases the peak IUF to 3.08%, demonstrating that uncoordinated battery actions may aggravate phase asymmetry. In contrast, the proposed MADDPG framework reduces the peak IUF to approximately 2.50%. The convex optimization solution achieves a slightly lower peak IUF of 2.48%, indicating that MADDPG closely approximates centralized optimal performance. More importantly, during the critical evening peak at 18:00, the IUF decreases from 2.75% in the no-control case to 2.50% under MADDPG, a relative peak reduction of 9.1%. This demonstrates the capability of the proposed method to suppress extreme negative-sequence current events, which are particularly harmful to transformer thermal limits and equipment lifespan.

Figure 7 compares the maximum daily Current Unbalance Factor (IUF) under different control strategies. In the no-control scenario, the peak IUF reaches 2.75%, indicating severe phase imbalance during high-load periods. The random charging strategy further increases the maximum IUF to 3.08%, demonstrating that uncoordinated battery actions may even aggravate phase asymmetry.

In contrast, the proposed MADDPG framework reduces the peak IUF to 2.50%, corresponding to a 9.1% relative reduction compared to the no-control case and an 18.8% relative reduction compared to random charging.

The centralized convex optimization (OPF) solution achieves a slightly lower peak IUF of 2.48%. The performance gap between MADDPG and OPF is about 0.02 percentage points, indicating that the learning-based approach closely approximates centralized optimal control.
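The relative reductions quoted above follow directly from the peak IUF values reported for Figure 7. The short Python check below recomputes them. The helper name `relative_reduction` and the dictionary keys are hypothetical; the numbers are the peak values stated in the text.

```python
def relative_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Peak IUF values (in %) as reported for the four strategies.
peak_iuf = {"no_bess": 2.75, "random": 3.08, "maddpg": 2.50, "opf": 2.48}

vs_no_bess = relative_reduction(peak_iuf["no_bess"], peak_iuf["maddpg"])
vs_random = relative_reduction(peak_iuf["random"], peak_iuf["maddpg"])
# Gap to the OPF benchmark, expressed in percentage points of IUF.
gap_to_opf = peak_iuf["maddpg"] - peak_iuf["opf"]
```

Note the distinction between the two kinds of figures: the 9.1% and 18.8% values are relative changes, whereas the gap to the OPF benchmark is an absolute difference between two IUF percentages.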

As shown in Figure 8, the daily average IUF under the no-control scenario is 0.666%, while the proposed MADDPG framework reduces it to 0.628%, corresponding to a relative reduction of about 5.7%. Compared with random charging, the imbalance is further reduced by 5.9%, indicating that coordinated multi-agent learning is essential for effective phase-level power redistribution.

The centralized convex optimization solution achieves the lowest average imbalance of 0.626% and serves as the performance benchmark. The performance gap between MADDPG and the optimal solution is only 0.002 percentage points (about 0.3% in relative terms), indicating that the learning-based method closely approximates centralized optimal control.

Although convex optimization provides slightly better performance, it requires solving a global optimization problem at every time step and depends on full network parameter knowledge. In contrast, the proposed MADDPG framework performs offline training and only requires neural network inference during online execution, enabling decentralized implementation and real-time scalability for large distribution feeders.

Figure 9 illustrates the three-phase current at the PCC under different control strategies. In the no-control case, significant phase loading asymmetry is observed. Phase A consistently carries higher current than Phases B and C, especially during the evening peak period, indicating substantial imbalance in phase-level power distribution.

The random charging strategy fails to mitigate this asymmetry. In certain intervals, the dominant-phase current even increases compared to the no-control scenario, demonstrating that uncoordinated battery operation does not inherently guarantee power quality improvement.

In contrast, the proposed MADDPG framework effectively suppresses the peak current of the overloaded phase. For example, during the evening peak around 18:00, the Phase A current decreases from 130 A in the no-control case to 122 A under MADDPG, a reduction of approximately 6%. Meanwhile, the currents of the lighter phases remain relatively stable, indicating successful redistribution of active power among phases.

The centralized convex optimization solution achieves the lowest phase current imbalance overall. Notably, the MADDPG results closely follow the optimal trajectory, demonstrating that the learning-based approach can approximate centralized optimal control while operating in a decentralized and model-free manner.

The reduction in dominant-phase current directly lowers the negative-sequence current component at the PCC, thereby explaining the IUF improvements observed in Figure 7.
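The link between dominant-phase current and IUF can be made concrete with the symmetrical component (Fortescue) transform. The sketch below assumes the standard definition IUF = |I2| / |I1| × 100% and uses hypothetical phasors (ideal 120° spacing, magnitudes chosen for illustration only); it does not reproduce the feeder data of the case study.

```python
import cmath
import math

# Fortescue operator a = 1 at an angle of 120 degrees.
A = cmath.exp(2j * math.pi / 3)


def sequence_currents(ia, ib, ic):
    """Return zero-, positive- and negative-sequence current phasors."""
    i0 = (ia + ib + ic) / 3
    i1 = (ia + A * ib + A**2 * ic) / 3
    i2 = (ia + A**2 * ib + A * ic) / 3
    return i0, i1, i2


def iuf_percent(ia, ib, ic):
    """Current unbalance factor |I2| / |I1| in percent."""
    _, i1, i2 = sequence_currents(ia, ib, ic)
    return 100.0 * abs(i2) / abs(i1)


def phasor(magnitude, angle_deg):
    return cmath.rect(magnitude, math.radians(angle_deg))


# HYPOTHETICAL phase currents in amperes, abc sequence.
ib, ic = phasor(100, -120), phasor(100, 120)
iuf_balanced = iuf_percent(phasor(100, 0), ib, ic)   # numerically zero
iuf_heavy = iuf_percent(phasor(108, 0), ib, ic)      # Phase A overloaded
iuf_relieved = iuf_percent(phasor(104, 0), ib, ic)   # Phase A partly relieved
```

In this illustrative setting, lowering only the Phase A magnitude shrinks the negative-sequence component faster than the positive-sequence one, so `iuf_relieved` is smaller than `iuf_heavy`.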

Figure 10 compares the maximum and minimum phase currents at the PCC under different control strategies. In the no-control case, the maximum phase current reaches approximately 125 A, indicating significant dominant-phase loading. Under the random charging policy, the maximum current further increases to about 129 A, suggesting that uncoordinated battery actions may intensify phase stress.

In contrast, the proposed MADDPG framework reduces the maximum phase current to approximately 110 A, corresponding to a 12.0% reduction compared to the no-control scenario and a 14.7% reduction compared to random charging.

The centralized OPF solution achieves a slightly lower maximum current of about 109 A, indicating that MADDPG closely approaches centralized optimal performance.

5. Conclusions

This paper proposes a coordinated phase balancing framework for low-voltage distribution networks using distributed battery energy storage systems. By formulating the control objective based on the Current Unbalance Factor (IUF), the proposed approach shifts the focus from voltage regulation to phase-level current redistribution at the PCC.

Simulation results demonstrate that the coordinated control strategy effectively reduces both peak and average imbalance compared with no-control and random-operation scenarios. The peak IUF decreases from 2.75% to 2.50%, and the maximum phase current is reduced by approximately 12% during heavy loading periods. The achieved performance approaches that of a centralized convex optimization benchmark while avoiding repeated global optimization during online operation.

This study is based on a linearized DistFlow approximation and a single-day operating profile. Future work will investigate multi-day stochastic scenarios and larger-scale networks to further evaluate robustness and scalability.

Overall, by combining centralized offline training with decentralized real-time execution, the proposed framework provides a practical solution for mitigating phase imbalance in low-voltage distribution networks with high penetration of distributed energy resources.

Author Contributions

Conceptualization, J.L.; Methodology, R.D., Y.J. (Yan Jiang), X.Z., J.L. and Y.J. (Youwei Jia); Validation, R.D., Y.J. (Yan Jiang), X.Z., X.S., J.L. and Y.J. (Youwei Jia); Formal analysis, J.L.; Investigation, R.D., X.S. and Y.J. (Youwei Jia); Resources, R.D. and X.S.; Writing—original draft, J.L.; Writing—review & editing, R.D., Y.J. (Yan Jiang), X.Z. and Y.J. (Youwei Jia); Supervision, Y.J. (Youwei Jia); Funding acquisition, R.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Xiaolong Song was employed by the company Yunnan Electric Power Test & Research Institute (Group) Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

PV	Photovoltaic
PCC	Point of common coupling
MADDPG	Multi-agent deep deterministic policy gradient
BESS	Battery energy storage systems
IUF	Current unbalance factor
DG	Distributed generation
LVDNs	Low-voltage distribution networks
DER	Distributed energy resource
OPF	Optimal power flow
VUF	Voltage unbalance factor
CTDE	Centralized training and decentralized execution
$P_{n,\phi,t}^{\mathrm{inj}}$	Active power injection at node $n$, phase $\phi$, time $t$
$Q_{n,\phi,t}^{\mathrm{inj}}$	Reactive power injection at node $n$, phase $\phi$, time $t$
$v_{n,\phi,t}$	Squared voltage magnitude at node $n$, phase $\phi$, time $t$
$SOC_{k,t}$	State of charge of battery energy storage
$IUF_{t}$	Current unbalance factor
$P_{k,t}^{\mathrm{BESS}}$	Charging/discharging power of storage unit $k$
$E_{k}^{\mathrm{th}}$	Energy throughput limit of storage unit $k$
$R_{k}^{\mathrm{ramp}}$	Ramp-rate limit of storage unit $k$
$V_{\mathrm{nom}}$	Nominal voltage magnitude
$x_{ij}$	Line reactance
$r_{ij}$	Line resistance
$I_{\phi,t}^{\mathrm{PCC}}$	PCC phase current
$I_{1,t}$	Positive sequence current
$I_{2,t}$	Negative sequence current

References

1. Hamza, E.A.; Sedhom, B.E.; Badran, E.A. Impact and Assessment of the Overvoltage Mitigation Methods in Low-voltage Distribution Networks with Excessive Penetration of PV Systems: A Review. Int. Trans. Electr. Energ. Syst. 2021, 31, e13161.
2. Licari, J.; Rhaili, S.E.; Micallef, A.; Staines, C.S. Addressing Voltage Regulation Challenges in Low Voltage Distribution Networks with High Renewable Energy and Electrical Vehicles: A Critical Review. Energy Rep. 2025, 14, 2977–2997.
3. Gabdullin, Y.; Azzopardi, B. Impacts of Photovoltaics in Low-Voltage Distribution Networks: A Case Study in Malta. Energies 2022, 15, 6731.
4. Hungbo, M.; Gu, M.; Meegahapola, L.; Littler, T.; Bu, S. Impact of Electric Vehicles on Low-voltage Residential Distribution Networks: A Probabilistic Analysis. IET Smart Grid 2023, 6, 536–548.
5. Helm, S.; Hauer, I.; Wolter, M.; Wenge, C.; Balischewski, S.; Komarnicki, P. Impact of Unbalanced Electric Vehicle Charging on Low-Voltage Grids. In Proceedings of the 2020 IEEE PES Innovative Smart Grid Technologies Europe (ISGT-Europe); IEEE: New York, NY, USA, 2020; pp. 665–669.
6. Ibrahim, I.A.; Hossain, M.J. Low Voltage Distribution Networks Modeling and Unbalanced (Optimal) Power Flow: A Comprehensive Review. IEEE Access 2021, 9, 143026–143084.
7. Mukwekwe, L.; Venugopal, C.; Davidson, I.E. A Review of the Impacts and Mitigation Strategies of High PV Penetration in Low Voltage Networks. In 2017 IEEE PES Power Africa; IEEE: New York, NY, USA, 2017; pp. 274–279.
8. Zeraati, M.; Golshan, M.E.H.; Guerrero, J.M. Voltage Quality Improvement in Low Voltage Distribution Networks Using Reactive Power Capability of Single-Phase PV Inverters. IEEE Trans. Smart Grid 2018, 10, 5057–5065.
9. Ju, Y.; Liu, W.; Zhang, Z.; Zhang, R. Distributed Three-Phase Power Flow for AC/DC Hybrid Networked Microgrids Considering Converter Limiting Constraints. IEEE Trans. Smart Grid 2022, 13, 1691–1708.
10. Pratama, N.A.; Rahmawati, Y. Evaluation of Unbalanced Load Impacts on Distribution Transformer Performances. Front. Energy Syst. Power Eng. 2020, 2, 28–35.
11. Moses, P.S.; Masoum, M.A. Three-Phase Asymmetric Transformer Aging Considering Voltage-Current Harmonic Interactions, Unbalanced Nonlinear Loading, Magnetic Couplings, and Hysteresis. IEEE Trans. Energy Convers. 2012, 27, 318–327.
12. Ulinuha, A.; Sari, E.M. The Influence of Harmonic Distortion on Losses and Efficiency of Three-Phase Distribution Transformer. In Proceedings of the Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2021; Volume 1858, p. 012084.
13. Chicco, G.; Pons, E.; Russo, A.; Spertino, F.; Porumb, R.; Postolache, P.; Toader, C. Assessment of Unbalance and Distortion Components in Three-Phase Systems with Harmonics and Interharmonics. Electr. Power Syst. Res. 2017, 147, 201–212.
14. Wang, J.; Li, D. A Three-Phase-Unbalance Mitigation Strategy Based on the Synergy of Intelligent Phase-Shifting Switches and Flexible Self-Balancing Switches. Electronics 2026, 15, 776.
15. Babu, M.N.; Dhal, P.K. Impact of Load Flow and Network Reconfiguration for Unbalanced Distribution Systems. Meas. Sens. 2024, 32, 101078.
16. Lotfi, H.; Hajiabadi, M.E.; Parsadust, H. Power Distribution Network Reconfiguration Techniques: A Thorough Review. Sustainability 2024, 16, 10307.
17. Almeida, D.; Pasupuleti, J.; Ekanayake, J. Comparison of Reactive Power Control Techniques for Solar PV Inverters to Mitigate Voltage Rise in Low-Voltage Grids. Electronics 2021, 10, 1569.
18. Aboshady, F.M.; Pisica, I.; Zobaa, A.F.; Taylor, G.A.; Ceylan, O.; Ozdemir, A. Reactive Power Control of PV Inverters in Active Distribution Grids with High PV Penetration. IEEE Access 2023, 11, 81477–81496.
19. Nour, A.M.M.; Hatata, A.Y.; Helal, A.A.; El-Saadawi, M.M. Review on Voltage-violation Mitigation Techniques of Distribution Networks with Distributed Rooftop PV Systems. IET Gener. Transm. Distrib. 2020, 14, 349–361.
20. Alzahrani, A.; Alharthi, H.; Khalid, M. Minimization of Power Losses through Optimal Battery Placement in a Distributed Network with High Penetration of Photovoltaics. Energies 2019, 13, 140.
21. Aljohani, T.M.; Saad, A.; Mohammed, O.A. Two-Stage Optimization Strategy for Solving the VVO Problem Considering High Penetration of Plug-In Electric Vehicles to Unbalanced Distribution Networks. IEEE Trans. Ind. Appl. 2021, 57, 3425–3440.
22. Shang, X.-J.; Mishra, Y.; Yang, Y.; Liu, J.-L.; Yu, Z.-G.; Tian, Y.-C. Optimal Coordinated Scheduling of Electric Vehicles and Battery Energy Storage Systems. IEEE Trans. Consum. Electron. 2025, 71, 2944–2954.
23. Zamzam, T.; Shaban, K.; Gaouda, A.; Massoud, A. Performance Assessment of Two-Timescale Multi-Objective Volt/Var Optimization Scheme Considering EV Charging Stations, BESSs, and RESs in Active Distribution Networks. Electr. Power Syst. Res. 2022, 207, 107843.
24. Zhang, D.; Shafiullah, G.M.; Das, C.K.; Wong, K.W. Optimal Allocation of Battery Energy Storage Systems to Enhance System Performance and Reliability in Unbalanced Distribution Networks. Energies 2023, 16, 7127.
25. Nazir, N.; Almassalkhi, M. Stochastic Multi-Period Optimal Dispatch of Energy Storage in Unbalanced Distribution Feeders. Electr. Power Syst. Res. 2020, 189, 106783.
26. Chicco, G.; Mazza, A. 100 Years of Symmetrical Components. Energies 2019, 12, 450.
27. Gan, L.; Low, S.H. Convex Relaxations and Linear Approximation for Optimal Power Flow in Multiphase Radial Networks. In Proceedings of the 2014 Power Systems Computation Conference, Wroclaw, Poland, 18–22 August 2014; pp. 1–9.
28. Ye, Y.; Wang, H.; Chen, P.; Tang, Y.; Strbac, G. Safe Deep Reinforcement Learning for Microgrid Energy Management in Distribution Networks with Leveraged Spatial–Temporal Perception. IEEE Trans. Smart Grid 2023, 14, 3759–3775.
29. Wang, W.; Yu, N.; Gao, Y.; Shi, J. Safe Off-Policy Deep Reinforcement Learning Algorithm for Volt-VAR Control in Power Distribution Systems. IEEE Trans. Smart Grid 2020, 11, 3008–3018.
30. Bahrami, S.; Chen, Y.C.; Wong, V.W.S. Deep Reinforcement Learning for Demand Response in Distribution Networks. IEEE Trans. Smart Grid 2021, 12, 1496–1506.
31. Pinthurat, W.; Hredzak, B. Multi-Agent Deep Reinforcement Learning for Mitigation of Unbalanced Active Powers Using Distributed Batteries in Low Voltage Residential Distribution System. Electr. Power Syst. Res. 2025, 245, 111599.
32. Zhang, Z.; Zhang, D.; Qiu, R.C. Deep Reinforcement Learning for Power System Applications: An Overview. CSEE J. Power Energy Syst. 2020, 6, 213–225.
33. Sun, X.; Qiu, J. Two-Stage Volt/Var Control in Active Distribution Networks with Multi-Agent Deep Reinforcement Learning Method. IEEE Trans. Smart Grid 2021, 12, 2903–2912.
34. Wang, Y.; Xiao, M.; You, Y.; Poor, H.V. Optimized Energy Dispatch for Microgrids with Distributed Reinforcement Learning. IEEE Trans. Smart Grid 2024, 15, 2946–2956.
35. Xu, H.; Sun, H.; Nikovski, D.; Kitamura, S.; Mori, K.; Hashimoto, H. Deep Reinforcement Learning for Joint Bidding and Pricing of Load Serving Entity. IEEE Trans. Smart Grid 2019, 10, 6366–6375.
36. Fan, J.; Liebman, A.; Wang, H. Safety-Aware Reinforcement Learning for Electric Vehicle Charging Station Management in Distribution Network. In Proceedings of the 2024 IEEE Power & Energy Society General Meeting (PESGM), Seattle, WA, USA, 21–25 July 2024; pp. 1–5.
37. Zhang, Y.; Wang, J. GEFCom2014 Probabilistic Solar Power Forecasting Based on K-Nearest Neighbor and Kernel Density Estimator. In Proceedings of the 2015 IEEE Power & Energy Society General Meeting, Denver, CO, USA, 26–30 July 2015; pp. 1–5.

Figure 1. Low-voltage distribution network system (a, b, and c indicate phases; numbers indicate nodes).

Figure 2. Overall framework of BESS-based three-phase imbalance mitigation.

Figure 3. Reinforcement learning framework.

Figure 4. Photovoltaic and load change data.

Figure 5. Reward of the agent training process.

Figure 6. Current unbalance factor (IUF) at the PCC under different control strategies.

Figure 7. Comparison of the maximum IUF under different control strategies.

Figure 8. Comparison of the average IUF under different control strategies.

Figure 9. Phase current at the PCC under different control strategies.

Figure 10. Maximum and minimum current magnitudes at the PCC under different control strategies.

Table 1. Comparison with the existing literature.

Literature	Centralized OPF	Distributed BESS	RL-Based	Direct IUF Objective	CTDE	Distributed Real-Time Execution
[14,15,16]	✓	✕	✕	✕	✕	✕
[23,25]	✓	✓	✕	△	✕	✕
[21,22]	✓	✓	✕	△	✕	✕
[27]	△	✓	✕	△	✕	△
[26,27,28]	✕	△	✕	✕	△	✓
[32,33,34]	✕	△	✓	✕	△	✓
[31]	✕	✓	✓	△	✕	✓
This work	✕	✓	✓	✓	✓	✓

✓, considered; ✕, not considered; △, partially considered.

Table 2. Energy storage parameters.

Agent No.	1	2	3	4	5	6
Node	1	4	10	12	17	19
Phase	A	B	A	B	B	C

Table 3. Training parameters.

Parameter	Value	Parameter	Value
Learning Rate	1 × 10⁻³	Discount Factor	0.95
Soft Update Coefficient	0.01	Replay Buffer	100,000
Actor network layers	3	Actor network neurons	256
Critic network layers	3	Critic network neurons	512
Actor activation function	ReLU/Tanh	Critic activation function	ReLU
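Two of the values in Table 3 enter the training loop in a simple way that is easy to show in code. The sketch below assumes the standard DDPG-style temporal-difference target and Polyak (soft) target update; the function names and the toy parameter arrays are illustrative and are not taken from the original implementation.

```python
import numpy as np

GAMMA = 0.95  # discount factor (Table 3)
TAU = 0.01    # soft update coefficient (Table 3)


def td_target(reward, next_q, done=False, gamma=GAMMA):
    """Critic regression target y = r + gamma * Q'(s', a') for one transition."""
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target_params, online_params, tau=TAU):
    """Move each target parameter a fraction tau toward its online counterpart."""
    return [tau * w + (1.0 - tau) * wt
            for wt, w in zip(target_params, online_params)]


# Toy parameters: the online network is all ones, the target all zeros.
online = [np.ones((2, 2)), np.ones(2)]
target = [np.zeros((2, 2)), np.zeros(2)]
target = soft_update(target, online)  # every entry becomes 0.01

y = td_target(reward=-1.0, next_q=-10.0)  # -1.0 + 0.95 * (-10.0) = -10.5
```

With a coefficient of 0.01 the target networks track the online networks slowly, which is the usual way of stabilizing the critic targets in this family of algorithms.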

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

