Fairness–Performance Trade-Offs in Active Power Curtailment for Radial Distribution Grids with Battery Energy Storage

Giorgos Gotzias; Eleni Stai; Symeon Papavassiliou

doi:10.3390/en18225873


School of Electrical and Computer Engineering, National Technical University of Athens, Iroon Polytechniou 9, 15772 Zografou, Greece

* Author to whom correspondence should be addressed.

Energies 2025, 18(22), 5873; https://doi.org/10.3390/en18225873

This article belongs to the Special Issue Techno-Economic Analysis and Optimization for Energy Systems: 3rd Edition on the Way to Green Transition


Abstract

The increasing integration of decentralized technologies such as photovoltaic (PV) systems and electric vehicles (EVs) poses significant challenges to the reliable operation of radial distribution grids. In this paper, we study Active Power Curtailment (APC), a cost-effective method that maintains grid safety by temporarily reducing power injections. However, APC can place a disproportionate curtailment burden on certain grid buses, which may undermine the continued adoption of PVs and EVs. In this work, we propose several novel APC methods that incorporate fairness properties for radial grids with PVs, EVs, and battery energy storage systems (BESSs). In addition, we integrate BESSs and show their benefits in lowering APC levels and achieving better PV and EV utilization while enhancing fairness. The proposed APC designs allow for fast decision making and can be generalized to unseen grids. To this end, a two-step solution is adopted: in the first step, a reinforcement learning (RL)-based agent determines uniform per-feeder APC and BESS actions, and in the second step, heuristic controllers disaggregate these actions into tailored per-bus decisions while incorporating fairness features. Through simulations, the controllers are shown to mitigate over 99% of constraint violations and significantly enhance fairness in curtailment distribution. BESSs are shown to improve the trade-off between the number of violations and APC, leaning towards reduced APC percentages. Finally, we exemplify how the solution generalizes effectively to unseen grid configurations.

Keywords:

active power curtailment; fairness; active distribution grids; grid safety constraints; battery energy storage; renewable energy; electric vehicles; reinforcement learning

1. Introduction

The reliable operation of electric power systems depends on their ability to maintain stability and avoid service interruptions by keeping voltages and ampacities within safe operating margins. The complexity of this task is increasing as distribution grids are being transformed by the rapid uptake of decentralized, uncertain, and variable technologies such as photovoltaic (PV) generation and electric vehicles (EVs). EV charging clusters can exacerbate local voltage drops during peak demand periods [], while PV systems can cause overvoltages when generation exceeds consumption []. Managing these emerging challenges as grids grow in scale requires control strategies that can handle uncertain and rapidly changing power injections while remaining cost-effective and operationally simple. The literature offers a portfolio of solutions to ensure the reliability of power grids in the presence of PVs and EVs. Common approaches to reduce voltage constraint violations include On-Load Tap Changer control [,], reactive power control [], Hybrid Transformers [], and Active Power Curtailment (APC) [].

In this work, we focus on APC, which remains a widely applied and cost-effective method for maintaining safe grid operation by dynamically reducing active power injections according to the prevailing demand and generation. APC is essential for preventing violations of grid safety constraints; yet, if applied without consideration of fairness, it risks creating persistent inequities among grid prosumers, undermining public acceptance of distributed energy integration and electromobility. Indeed, in our previous work in [], a sensitivity analysis showed that APC levels are minimized when curtailing specific buses, namely those farthest from the PCC, which implies the risk of being persistently unfair to particular grid participants. Embedding fairness principles into APC is, therefore, crucial to split the curtailment burden fairly across grid participants while maintaining low overall curtailment percentages.

In parallel, battery energy storage systems (BESSs) are emerging as a critical resource to mitigate these challenges and reduce APC levels. By absorbing surplus PV output and supplying energy during charging peaks, BESSs can substantially reduce the need for curtailment and provide flexibility that supports grid security. Although the combination of APC with battery control has been studied in the literature, showing that BESSs can reduce APC levels, it has rarely been considered together with fairness for grids with PVs, EVs, and BESSs. This is exactly the gap we aim to fill in this work by proposing APC strategies for power grids that include PVs, EVs, and BESSs, assessing their impact on fairness, and revealing the emerging fairness–performance trade-offs. In addition, to keep pace with the increasing scale of future power grids, the proposed APC policies are based on reinforcement learning (RL) to enable fast decision making and are designed to be generalizable, i.e., they can be applied to unseen grids without re-training while maintaining satisfactory performance.

The contributions of this paper can be summarized as follows:

We develop an APC strategy for radial distribution grids with PVs, EVs, and BESSs that determines optimal curtailments for PV generation and EV demand per bus. The APC strategy is based on our previous work in [] and consists of two steps, namely, one step that applies RL to determine uniform per-feeder curtailments and a second step that applies heuristic disaggregators to determine tailored per-bus curtailments. The novelty of the proposed scheme in this work compared to the existing one lies in the consideration of fairness in per-bus curtailments while not impacting the generalization and low complexity properties of the two-step approach as well as the integration of BESSs that can further enhance fairness and performance.
Fairness is incorporated into the heuristic design of the disaggregation step rather than imposed as a hard constraint. A hard constraint would have required explicit system modeling, impeding generalization to unseen grids, and would have increased computational complexity by requiring the solution of complex optimization problems. BESS decisions are incorporated via RL-based decision making and are guaranteed to be fair per feeder.
We have performed extensive evaluations and comparisons of the proposed solution approach with emphasis on validating its fairness properties as well as the advantages brought by the integration of BESSs. In summary, fairness in curtailments is significantly enhanced, thus encouraging the integration of EVs and PVs at all buses independently of their location, and with BESSs the trade-off between fairness and performance further improves.

The rest of the paper is organized as follows. Section 2 positions our work within the related literature. Section 3 explains the considered system, formulates the problem, and details the fairness metrics that will be incorporated into our approach. Section 4 presents the RL-based solution approach. Section 5 includes all the numerical evaluations and comparisons and assesses the generalization properties of the proposed method. Finally, Section 6 discusses conclusive remarks and future steps.

2. Related Works

The APC method has been widely studied in the literature to ensure satisfaction of voltage and current constraints in power grids by limiting active power injection at the buses. Common approaches include droop-based methods [], OPF with linearized grid models [], non-convex AC OPF solutions [], and model predictive control []. Recent studies such as [,,] have focused on machine learning and reinforcement learning approaches that do not require a grid model, offering significant potential for generalization and computational tractability. Specifically, in our previous work [], an RL-based solution for voltage regulation in radial low-voltage grids with high penetration of PVs and EVs was proposed. The approach uses a per-feeder aggregated state and action representation, which enables generalization possibilities. The solution was enhanced in [] with disaggregation heuristics based on a sensitivity analysis study. However, these techniques tend to penalize buses located farther from the PCC, raising fairness concerns. The present work builds upon the approach in [], while explicitly addressing these fairness issues.

Ensuring fairness in APC is critical to encourage sustainability and continued adoption of renewable generation and electromobility by sharing the burden proportionally and avoiding persistent impact on specific buses/producers. Due to its importance, fairness in APC has been studied in the literature. The work in [] proposes an online feedback optimization approach to minimize curtailment needs while satisfying operational constraints; a modification of the voltage sensitivity matrix is suggested to provide fair solutions. In [], three notions of fairness are studied, a linearized OPF formulation is proposed for each, and their impacts on MV/LV distribution grids are assessed. Furthermore, [] proposes an ADMM-based algorithm for Volt–Var Control (VVC), which addresses fairness in APC for PVs. Jointly controlling VVC and APC minimizes the number of voltage violations but increases the communication burden. In [], a heuristic control algorithm, inspired by a sensitivity analysis, suggests similar curtailment percentages for all PVs to achieve fairness. A decentralized control method for Volt–Var–Watt control of PV inverters with minimal communication requirements is studied in []; by limiting the curtailment, the proportional fairness metrics are improved. An RL-based approach for controlling PV inverters that considers fairness is presented in [], where fairness is incorporated into the reward of the RL agent. Similarly, the work in [] employs an RL agent to ensure satisfaction of grid operational limits in unbalanced three-phase networks, with a reward formulated to account for fairness.

The aforementioned works considering fairness in APC ([,,,,,]) do not include EVs in their proposed control schemes. However, the increasing adoption of EVs may be exploited to reduce curtailment needs in PV-dominated grids []. In this spirit, in [], EV batteries are controlled along with APC for PVs to mitigate voltage rises above operational limits. In addition, in [], a droop-based EV charging strategy combined with APC is proposed to improve the economic and environmental benefits in a low-voltage distribution network. The vehicle-to-grid (V2G) capability studied in previous works ([,]) emerges as a promising direction to enhance the utilization of EV batteries. However, it requires explicit modeling of EV arrivals and departures, which impedes the generalizability of the control strategies to different grids and loads. The works above often consider fairness regarding PV curtailment but neglect fairness with respect to EV charging. In [], a max–min fairness notion regarding charging delay is formulated as a mixed-integer program, and a distributed algorithm that iteratively solves linear programs is proposed to approximate the optimal solution efficiently. A fairness-focused approach to EV charging is presented in []: the suggested heuristic algorithm, augmented with a forecasting model that predicts EV departure times, evenly balances the chances of EVs reaching their desired charging levels. In [], a hierarchical framework is suggested, where the system operator considers both grid operational constraints and fairness in allocating the maximum charging power among EV aggregators, while at the lower level a multi-agent RL approach manages the charging of individual EVs. However, fairness is enforced at the upper level by solving an OPF problem, which is computationally demanding.

Another approach attracting increasing research interest is the combination of battery energy storage systems (BESSs) with APC, which aims to reduce curtailment percentages. BESSs are becoming increasingly popular as battery investment costs decrease [], in combination with their low operational costs []. The study in [] demonstrates that jointly controlling PV inverters and BESSs can significantly reduce the APC level, yielding financial benefits. In [], traditional volt–var and volt–watt techniques are used to control BESSs along with APC; the study focuses on optimal BESS sizing and shows that including BESSs allows higher PV adoption without compromising the power grid’s reliability. However, the literature lacks studies that jointly consider APC in the presence of BESSs while accounting for fairness in terms of both PV and EV curtailments and BESS control. In our previous work [], we studied distributed battery control using multi-agent RL while considering fairness among batteries. In comparison to [], the current paper additionally studies fair APC for PV generation and EV demands.

In this paper, we propose diverse APC schemes for distribution grids including PVs, EVs, and BESSs. The studied APC schemes are based on RL and are designed with generalization properties so that they can be applied in different grid topologies and under different loads without re-training. This is achieved via a two-step solution approach: first, an RL agent makes uniform per-feeder decisions, and then, heuristic controllers disaggregate the decisions into tailored per-bus PV and EV curtailments. We focus on enhancing the disaggregation with fairness properties based on fairness criteria such as those in [], while not impacting the generalization and low complexity properties of the solution. We thoroughly study the emerging trade-off between fairness and performance achieved for the different designs. Moreover, we integrate BESSs control in our framework to further improve the performance–fairness trade-off. Experiments demonstrate that the augmented framework that includes BESSs can also be generalized to unseen grids without re-training.

3. System Model and Problem Formulation

3.1. System Model

We consider a low-voltage distribution grid with a radial structure and model it as a graph consisting of the set $\mathcal{N}$ of PQ buses and the set $\mathcal{L}$ of lines. A single bus, with ID 0, is set as both the slack bus and the Point of Common Coupling (PCC) between the low-voltage and the upper-level grid. The operation of the upper-level grid is outside the scope of this work. Because of its radial structure, the grid can be organized into $F$ feeders, each directly connected to the PCC. We denote the set of feeders by $\mathcal{F}$. The buses and lines belonging to feeder $f$ form the sets $\mathcal{N}_f = \{(f,1), \dots, (f,N_f)\} \subset \mathcal{N}$ and $\mathcal{L}_f = \{(f,1), \dots, (f,L_f)\} \subset \mathcal{L}$, respectively. We assume that indices are ordered by increasing geometric distance from the PCC, i.e., if $i < j$, then $(f,i)$ is at most as far from the PCC as $(f,j)$. Buses located farther from the PCC than a given bus, in terms of geometric distance, are referred to as ‘downstream’, while those closer to the PCC are considered ‘upstream’. We also denote by $\mathcal{N}_f^{EV}$, $\mathcal{N}_f^{PV}$, and $\mathcal{N}_f^{BESS}$ the subsets of buses in feeder $f$ where an EV, a PV, or a BESS is connected, respectively.
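The ordering convention and the upstream/downstream terminology can be captured in a small data structure. The sketch below is illustrative only: the `Feeder` class, its `distances` field, and the 0-based bus positions (the text numbers buses from 1) are assumptions made for this example. Ties in distance are treated as neither upstream nor downstream, matching the "farther"/"closer" wording.

```python
from dataclasses import dataclass


@dataclass
class Feeder:
    """A radial feeder whose buses are listed by increasing distance from the PCC.

    Positions are 0-based here, whereas the text numbers buses (f, 1), ..., (f, N_f).
    """

    feeder_id: int
    distances: list[float]  # hypothetical geometric distance of each bus from the PCC

    def __post_init__(self):
        # Ordering assumption of the text: i < j implies d_i <= d_j.
        if any(a > b for a, b in zip(self.distances, self.distances[1:])):
            raise ValueError("buses must be ordered by non-decreasing distance from the PCC")

    def downstream(self, i):
        """Positions of buses strictly farther from the PCC than bus i."""
        return [j for j, d in enumerate(self.distances) if d > self.distances[i]]

    def upstream(self, i):
        """Positions of buses strictly closer to the PCC than bus i."""
        return [j for j, d in enumerate(self.distances) if d < self.distances[i]]


feeder = Feeder(feeder_id=1, distances=[10.0, 25.0, 25.0, 40.0])
print(feeder.downstream(1), feeder.upstream(1))  # [3] [0]
```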

The control horizon is discretized into $T$ time slots, each with a duration of $\Delta t$. Let $t$ denote the time index, with $t \in \mathcal{T} = \{1, \dots, T\} \subset \mathbb{N}$. The voltage magnitude at bus $n$ of feeder $f$ is denoted by $U_{f,n}(t)$, and the maximum current magnitude along line $l$ of feeder $f$ by $I_{f,l}(t)$. For convenience, we use $U(t)$ and $I(t)$ to represent the collections of voltage and current magnitudes across the whole grid. The total apparent power at bus $(f,n) \in \mathcal{N}_f$ is calculated as

$$
S_{f,n}(t, a_{f,n}^{EV}, a_{f,n}^{PV}, a_{f,n}^{BESS}) = S_{f,n}^{H}(t) + S_{f,n}^{EV}(t, a_{f,n}^{EV}) + S_{f,n}^{PV}(t, a_{f,n}^{PV}) + S_{f,n}^{BESS}(t, a_{f,n}^{BESS}), \tag{1}
$$

where $S_{f,n}^{H}(t) = P_{f,n}^{H,d}(t) + j \cdot Q_{f,n}^{H,d}(t)$ is the inflexible household (H) power injection at bus $n$ of feeder $f$, with $P$ denoting the active power and $Q$ the reactive power. Further, $S_{f,n}^{X}(t, a_{f,n}^{X}) = a_{f,n}^{X} \cdot P_{f,n}^{X,d}(t) + j \cdot Q_{f,n}^{X,d}(t)$, $X \in \{EV, PV\}$, is the EV or PV apparent power injection at bus $n$ of feeder $f$, where $P_{f,n}^{X,d}(t)$ denotes the active power demand and $a_{f,n}^{X} \in [0,1]$ the controllable curtailment action, that is, the ratio of the allowed active power to the active power demand of the EV or PV. Positive values of power correspond to consumption, while negative values correspond to generation. Similarly, $S_{f,n}^{BESS}(t, a_{f,n}^{BESS}) = a_{f,n}^{BESS} \cdot \bar{P}_{f,n}^{BESS} = P_{f,n}^{BESS}(t)$ is the apparent power injection of the battery located at bus $n$ of feeder $f$, which is considered fully controllable through the variable $a_{f,n}^{BESS} \in [-1, 1]$, i.e., the ratio of the battery’s applied power injection to its maximum power limit $\bar{P}_{f,n}^{BESS}$. The battery’s reactive power is omitted for simplicity, as reactive power control is outside the scope of this work.
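As an illustration of (1), the per-bus apparent power can be computed with complex arithmetic. This is a minimal sketch under the sign convention stated above (consumption positive, generation negative); the function name, argument order, and the numeric values in the example (read as kW/kvar) are hypothetical.

```python
def bus_apparent_power(p_h, q_h, p_ev, q_ev, a_ev, p_pv, q_pv, a_pv, p_bess_max, a_bess):
    """Total apparent power S_{f,n}(t) of Eq. (1), returned as the complex number P + jQ.

    Sign convention of the text: positive power is consumption, negative is generation,
    so the PV demand p_pv is expected to be <= 0.
    """
    if not (0.0 <= a_ev <= 1.0 and 0.0 <= a_pv <= 1.0):
        raise ValueError("EV/PV curtailment actions must lie in [0, 1]")
    if not -1.0 <= a_bess <= 1.0:
        raise ValueError("BESS action must lie in [-1, 1]")
    s_h = complex(p_h, q_h)                   # inflexible household load
    s_ev = complex(a_ev * p_ev, q_ev)         # only the active EV power is scaled
    s_pv = complex(a_pv * p_pv, q_pv)         # only the active PV power is scaled
    s_bess = complex(a_bess * p_bess_max, 0)  # BESS reactive power omitted, as in the text
    return s_h + s_ev + s_pv + s_bess


# Hypothetical values: the EV charges at half its demand, the PV injects 80%
# of its available output, and the BESS charges at half its power limit.
s = bus_apparent_power(2.0, 0.5, 7.0, 0.0, 0.5, -5.0, 0.0, 0.8, 3.0, 0.5)
print(s)  # (3+0.5j)
```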

The grid dynamics are determined by the set of nonlinear power flow equations, which couple the bus voltages $U(t)$ and line currents $I(t)$ with the power injections at each bus and the network parameters. The voltage and current values must be constrained to ensure the grid’s safe operation; thus, the following constraints are defined:

$$
U_{f,n}(t) \in [\underline{U}, \bar{U}], \quad \forall f \in \mathcal{F}, \forall (f,n) \in \mathcal{N}_f, \forall t \in \mathcal{T}, \tag{2}
$$

$$
I_{f,l}(t) \in [0, \bar{I}], \quad \forall f \in \mathcal{F}, \forall (f,l) \in \mathcal{L}_f, \forall t \in \mathcal{T}, \tag{3}
$$

where $[\underline{U}, \bar{U}]$ is the interval of safe operation with respect to voltage magnitudes and $[0, \bar{I}]$ is the interval of safe operation with respect to current magnitudes.

Regarding the battery model, we follow similar lines as in []. The battery’s state-of-charge (SoC) is denoted by $SoC_{f,n}(t)$, with $\eta_{f,n}^{ch} \leq 1$ and $\eta_{f,n}^{dis} \leq 1$ being the battery’s charging and discharging efficiency coefficients, respectively, and $E_{f,n}^{max}$ the maximum energy capacity of the battery. The following set of equations and constraints determines the battery’s operation:

$$
SoC_{f,n}(t) = SoC_{f,n}(t-1) + \eta_{f,n}^{ch} \cdot \frac{\max\{P_{f,n}^{BESS}(t), 0\}}{E_{f,n}^{max}} \cdot \Delta t + \frac{\min\{P_{f,n}^{BESS}(t), 0\}}{\eta_{f,n}^{dis} \cdot E_{f,n}^{max}} \cdot \Delta t, \tag{4}
$$

$$
SoC_{f,n}^{min} \leq SoC_{f,n}(t) \leq SoC_{f,n}^{max}, \tag{5}
$$

$$
-\bar{P}_{f,n}^{BESS} \leq P_{f,n}^{BESS}(t) \leq \bar{P}_{f,n}^{BESS}, \tag{6}
$$

$$
SoC_{f,n}(T) = SoC_{f,n}(0), \quad \forall f \in \mathcal{F}, \forall (f,n) \in \mathcal{N}_f, \forall t \in \mathcal{T}. \tag{7}
$$

Equation (4) is the battery’s SoC evolution equation, whereas (5) and (6) impose the SoC and active power bounds, respectively, with $SoC_{f,n}^{min}$ and $SoC_{f,n}^{max}$ denoting the minimum and maximum allowed SoC. Equation (7) ensures that the battery’s SoC at the end of the control horizon equals its initial value $SoC_{f,n}(0)$, i.e., its value at the beginning of the control horizon, which is considered given.
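The battery model (4)–(7) translates directly into a one-step update plus two checks. The sketch below is a hypothetical illustration: the function names and the example battery (10 kWh, 15 min slots, 90% efficiencies) are assumptions, not parameters from our evaluations.

```python
def soc_step(soc_prev, p_bess, e_max, dt, eta_ch, eta_dis):
    """SoC update of Eq. (4). p_bess > 0 charges the battery, p_bess < 0 discharges it."""
    charge = eta_ch * max(p_bess, 0.0) / e_max * dt
    discharge = min(p_bess, 0.0) / (eta_dis * e_max) * dt
    return soc_prev + charge + discharge


def within_bounds(soc, p_bess, soc_min, soc_max, p_max):
    """Bounds (5) and (6) for a single time slot."""
    return soc_min <= soc <= soc_max and -p_max <= p_bess <= p_max


def is_cyclic(soc_trajectory, tol=1e-6):
    """Condition (7): soc_trajectory[0] is SoC(0) and soc_trajectory[-1] is SoC(T)."""
    return abs(soc_trajectory[-1] - soc_trajectory[0]) <= tol


# Hypothetical 10 kWh battery, 15 min slots (dt = 0.25 h), 90% efficiencies.
soc = [0.5]
for p in [2.0, 2.0, -1.62, -1.62]:  # kW
    soc.append(soc_step(soc[-1], p, e_max=10.0, dt=0.25, eta_ch=0.9, eta_dis=0.9))
print([round(x, 3) for x in soc])  # [0.5, 0.545, 0.59, 0.545, 0.5]
```

In this example the SoC returns to its initial value, so (7) holds, yet the battery delivers only 0.81 kWh of the 1 kWh it absorbed, i.e., the round-trip factor $\eta^{ch} \cdot \eta^{dis}$.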

3.2. Problem Formulation

The aim of controlling the EV, PV, and BESS power injections is to achieve safe grid operation by minimizing voltage and current constraint violations while also minimizing the curtailment of PV generation and EV demand. This problem is similar to the one tackled in our previous work in [], the difference being that controllable batteries are now included, which introduce time dependencies among the optimization variables via their SoC evolution equation (Equation (4)). Under the ideal condition that the values of the uncertain quantities over time (i.e., consuming loads and PV generation) are given by an oracle, our grid control problem can be expressed as the following optimization problem:

$$
\min_{a(t), U(t), I(t)} \sum_{t=1}^{T} \sum_{f=1}^{F} \Big[ w_1 \sum_{n=1}^{N_f} \phi_1(U_{f,n}(t)) + w_2 \sum_{n=1}^{N_f} \phi_2(U_{f,n}(t)) + w_3 \sum_{l=1}^{L_f} \phi_3(I_{f,l}(t)) + w_4 \sum_{n=1}^{N_f} (1 - a_{f,n}^{EV}(t)) + w_5 \sum_{n=1}^{N_f} (1 - a_{f,n}^{PV}(t)) \Big] \tag{8}
$$

s.t. the power flow equations with power injections

$$
S_{f,n}(t, a_{f,n}^{EV}, a_{f,n}^{PV}, a_{f,n}^{BESS}), \quad \forall t \in \mathcal{T}, \forall f \in \mathcal{F}, \forall (f,n) \in \mathcal{N}_f, \tag{9}
$$

$$
(4)\text{–}(7), \quad \forall t \in \mathcal{T}, \forall f \in \mathcal{F}, \forall (f,n) \in \mathcal{N}_f^{BESS}, \tag{10}
$$

where $a(t) = [a_{1,1}^{EV}(t), a_{1,1}^{PV}(t), a_{1,1}^{BESS}(t), \dots, a_{F,N_F}^{EV}(t), a_{F,N_F}^{PV}(t), a_{F,N_F}^{BESS}(t)]^{\top}$ is the vector of controllable quantities and $w_1, w_2, w_3, w_4, w_5 \in [0,1]$ are the weights of the individual objectives. Voltage and current constraint violations are penalized via the functions $\phi_1(U_{f,n}(t)) = \max(0, U_{f,n}(t) - \bar{U})$, $\phi_2(U_{f,n}(t)) = \max(0, \underline{U} - U_{f,n}(t))$, and $\phi_3(I_{f,l}(t)) = \max(0, I_{f,l}(t) - \bar{I})$, which are soft-constraint versions of (2) and (3). The last two terms of the objective in (8) minimize the APC for EVs and PVs, respectively, so as to maximize the satisfaction of EV charging demands and PV utilization. The nonlinearity of the power flow equations renders the problem non-convex.
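For a single time slot, the bracketed term of (8) summed over feeders can be evaluated as below; summing it over $t = 1, \dots, T$ gives the objective. This is a sketch under stated assumptions: voltages and currents are taken to be in per unit, action lists contain only buses that host the device (a bus without an EV or PV is treated as uncurtailed, $a = 1$), and the function names are hypothetical. The power flow solution that produces `voltages` and `currents` is not modeled here.

```python
def phi_1(u, u_max):
    """Overvoltage penalty: max(0, U - U_max)."""
    return max(0.0, u - u_max)


def phi_2(u, u_min):
    """Undervoltage penalty: max(0, U_min - U)."""
    return max(0.0, u_min - u)


def phi_3(i, i_max):
    """Overcurrent penalty: max(0, I - I_max)."""
    return max(0.0, i - i_max)


def slot_cost(voltages, currents, a_ev, a_pv, u_min, u_max, i_max, w):
    """Bracketed term of (8) for one time slot, summed over all feeders.

    voltages[f] / currents[f]: per-bus voltages and per-line currents of feeder f.
    a_ev[f] / a_pv[f]: per-bus actions, listed only for buses hosting the device.
    w: weights (w1, ..., w5).
    """
    w1, w2, w3, w4, w5 = w
    cost = 0.0
    for f in voltages:
        cost += w1 * sum(phi_1(u, u_max) for u in voltages[f])
        cost += w2 * sum(phi_2(u, u_min) for u in voltages[f])
        cost += w3 * sum(phi_3(i, i_max) for i in currents[f])
        cost += w4 * sum(1.0 - a for a in a_ev.get(f, []))
        cost += w5 * sum(1.0 - a for a in a_pv.get(f, []))
    return cost


# Hypothetical single-feeder snapshot: 0.01 + 0.02 + 0.1 + 0.5 + 0.2 = 0.83
cost = slot_cost(voltages={1: [1.06, 1.00, 0.93]}, currents={1: [1.1, 0.9]},
                 a_ev={1: [0.5]}, a_pv={1: [1.0, 0.8]},
                 u_min=0.95, u_max=1.05, i_max=1.0, w=(1, 1, 1, 1, 1))
```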

In this work, we design causal grid controllers that perform APC jointly with battery control. Causality means that decisions are based solely on currently known information. This work enhances and extends our previous works in [,] by additionally adopting BESSs and introducing fairness criteria in curtailment decisions. In particular, we design an RL-based solution of (8)–(10) as well as heuristic APC controllers that integrate fair BESS control and fairness among buses in the APC process. We aim to design controllers that can be generalized to different grid topologies and loads without re-training. Moreover, we evaluate the performance of the heuristic controllers in terms of safety constraint violations, amount of curtailed power, and fairness of the per-bus PV and EV-demand curtailments.

3.3. Fairness Criteria

In this section, we mathematically formulate the fairness criteria considered in this work for the APC process. Starting with PVs, an often-used definition of fairness in the context of PVs’ APC is proportional fair generation. This criterion is widely adopted in the literature, for instance, in [,,]. According to proportional fair generation, the ratio of curtailed power to maximum generation should be equal across buses that belong to the same feeder. In other words,

$$
a_{f,n_i}^{PV}(t) = a_{f,n_j}^{PV}(t), \quad \forall f \in \mathcal{F}, \forall ((f,n_i), (f,n_j)) \in \mathcal{N}_f^{PV} \times \mathcal{N}_f^{PV}. \tag{11}
$$

A similar fairness condition is adopted for the EV curtailment, i.e.,

$$
a_{f,n_i}^{EV}(t) = a_{f,n_j}^{EV}(t), \quad \forall f \in \mathcal{F}, \forall ((f,n_i), (f,n_j)) \in \mathcal{N}_f^{EV} \times \mathcal{N}_f^{EV}. \tag{12}
$$

Contrary to existing works in the literature, e.g., ref. [], we apply a more generalizable EV fairness condition, aligned with our target of designing a solution approach that generalizes to different grid topologies and loads. For instance, the EV fairness criterion used in [] involves the charging time of each EV and thus requires knowledge of the detailed EV charging model, whereas (12) does not demand information tailored to each EV. Note that these fairness conditions are not directly imposed on the proposed controllers; instead, they are indirectly accounted for through parameter tuning or other design features, as detailed in the following section. This is essential for keeping the computational complexity low and suitable for real-time operation. Imposing the fairness conditions as hard constraints would require solving an optimal power flow for the whole grid, which is difficult to scale to very fast time scales.
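Since (11) and (12) only require equal actions within each feeder, they can be checked after the fact by looking at the spread of the per-bus actions. The helper below is a hypothetical diagnostic for illustration, not one of the fairness metrics used in our evaluation: a feeder satisfies the condition exactly when the difference between its largest and smallest action is zero (here, up to a numerical tolerance).

```python
def fairness_spread(actions_by_feeder):
    """Per-feeder spread max(a) - min(a) of the per-bus actions of one device type.

    Condition (11) (PVs) or (12) (EVs) holds for feeder f exactly when its spread is zero.
    """
    return {f: (max(a) - min(a) if a else 0.0) for f, a in actions_by_feeder.items()}


def satisfies_equal_ratio(actions_by_feeder, tol=1e-9):
    """True if every feeder meets the equal-ratio condition up to a numerical tolerance."""
    return all(spread <= tol for spread in fairness_spread(actions_by_feeder).values())


pv_actions = {1: [0.7, 0.7, 0.7], 2: [0.9, 0.6]}  # hypothetical a^{PV}_{f,n}(t)
print(satisfies_equal_ratio(pv_actions))  # False: feeder 2 curtails its buses unequally
```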

4. Solution Approach

We develop a heuristic and causal solution to the problem stated in Section 3.2 that consists of two steps. The first step uses a trained RL agent to determine uniform per-feeder curtailment and battery power decisions. In other words, it determines three coefficient values for each feeder $f$: the first applies to all EVs in $\mathcal{N}_f^{EV}$, the second to all PVs in $\mathcal{N}_f^{PV}$, and the third to all batteries in $\mathcal{N}_f^{BESS}$. The second step uses heuristic controllers that disaggregate the per-feeder coefficients into per-bus coefficients while accounting for the fairness criteria of Section 3.3 among different buses. This methodology is illustrated in Figure 1.

Figure 1. Two-step APC and BESS control solution.
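The control flow of Figure 1 can be sketched as a function that calls the step-1 agent once and then the step-2 disaggregator per feeder and device type. In this illustrative sketch, `stub_agent` and `uniform_disaggregator` are hypothetical placeholders standing in for the trained RL agent and the heuristic disaggregators; as described in Section 4.2, BESS decisions are kept uniform per feeder rather than disaggregated.

```python
def two_step_control(state, agent, disaggregate, buses):
    """Two-step decision for one time slot.

    agent(state) -> {f: (a_ev, a_pv, a_bess)}: uniform per-feeder actions (step 1).
    disaggregate(x, f, a_uniform, bus_list) -> per-bus actions for device type x (step 2).
    buses: {f: {"EV": [...], "PV": [...], "BESS": [...]}} buses hosting each device.
    """
    per_bus = {}
    for f, (a_ev, a_pv, a_bess) in agent(state).items():
        per_bus[f] = {
            "EV": disaggregate("EV", f, a_ev, buses[f]["EV"]),
            "PV": disaggregate("PV", f, a_pv, buses[f]["PV"]),
            # BESS decisions are not disaggregated: the uniform feeder action is kept.
            "BESS": [a_bess] * len(buses[f]["BESS"]),
        }
    return per_bus


# Hypothetical stand-ins for the trained agent and a disaggregator.
def stub_agent(state):
    return {1: (0.8, 0.6, -0.5)}


def uniform_disaggregator(x, f, a, bus_list):
    return [a] * len(bus_list)


buses = {1: {"EV": [2, 4], "PV": [1, 3, 4], "BESS": [3]}}
decision = two_step_control(state=None, agent=stub_agent,
                            disaggregate=uniform_disaggregator, buses=buses)
print(decision)  # {1: {'EV': [0.8, 0.8], 'PV': [0.6, 0.6, 0.6], 'BESS': [-0.5]}}
```

Passing the disaggregator as a function keeps step 1 unchanged when a different step-2 heuristic, such as the LD below, is swapped in.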

Before describing the controllers of each step, we state the assumptions regarding their operation. Each controller has access to the system state, denoted by $s(t)$, which includes overvoltage, undervoltage, and overcurrent violation quantities from previous time steps, the current SoC values of the BESSs, and the current time step index. For the controllers of the second step, depending on their design, additional state information may be required; this is contained in the vector $x_f^D(t)$ defined for each feeder $f \in \mathcal{F}$. Examples of such additional information are the location of each bus or forecasts of future PV and EV realizations.

4.1. RL Agent for the First Solution Step: Aggregated Agent

In this section, we design the RL-based agent of the first step, which requires as input only aggregated information per feeder and outputs uniform per-feeder curtailment coefficients. The proposed agent extends the RL agent of our previous work in [], referred to as the Aggregated Agent (AA), to include BESSs. The evaluation in [] demonstrated that the AA enhances training efficiency owing to the relatively low dimensionality of its state and action spaces and can be applied to grids with similar topologies and load patterns without re-training. The per-feeder aggregation of the state and action spaces is the key feature of the AA that enables its good generalization properties. In the current design, we leverage the advantages of the previously developed AA and enhance its state and action spaces with battery SoC values and uniform per-feeder BESS power decisions, respectively. Regarding the BESSs, we assume that those belonging to the same feeder have identical characteristics and the same initial SoC values; thus, a uniform action is always applicable. Next, we describe in detail the Markov Decision Process (MDP) formulation used for the AA.

State: The state of the agent is formulated as

$$
s(t) = [\bar{U}(t-h-1), \dots, \bar{U}(t-1), \underline{U}(t-h-1), \dots, \underline{U}(t-1), \bar{I}(t-h-1), \dots, \bar{I}(t-1), \widetilde{SoC}(t-1), t], \tag{13}
$$

where (i) $\bar{U}(t) = [\bar{U}_1(t), \dots, \bar{U}_F(t)]$, with $\bar{U}_f(t) = \max_{n \in \mathcal{N}_f} \{U_{f,n}(t)\}$, $\forall f \in \mathcal{F}$; (ii) $\underline{U}(t) = [\underline{U}_1(t), \dots, \underline{U}_F(t)]$, with $\underline{U}_f(t) = \min_{n \in \mathcal{N}_f} \{U_{f,n}(t)\}$, $\forall f \in \mathcal{F}$; (iii) $\bar{I}(t) = [\bar{I}_1(t), \dots, \bar{I}_F(t)]$, with $\bar{I}_f(t) = \max_{l \in \mathcal{L}_f} \{I_{f,l}(t)\}$, $\forall f \in \mathcal{F}$; and (iv) $\widetilde{SoC}(t) = [\widetilde{SoC}_1(t), \dots, \widetilde{SoC}_F(t)]$, where $\widetilde{SoC}_f(t)$ is the average SoC of the batteries belonging to feeder $f$. Here, $h$ is the historical window size. Historical data for $h$ time steps prior to $t-1$ are included to improve the agent’s forecasting capabilities by capturing violation patterns.
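The aggregated state (13) can be assembled from per-feeder extremes kept in fixed-length histories. The sketch below makes several assumptions not stated in the text: histories are pre-filled with nominal values (1.0 p.u. voltage, zero current) before the first simulated slot, a feeder without batteries reports an average SoC of 0.0, and the class and method names are hypothetical. The state length is $3(h+1)F + F + 1$.

```python
from collections import deque
from statistics import mean


class AggregatedStateBuilder:
    """Builds the aggregated state s(t) of Eq. (13) from per-feeder extremes."""

    def __init__(self, feeders, h, u_init=1.0, i_init=0.0):
        self.feeders = list(feeders)
        n = h + 1  # slots t-h-1, ..., t-1, as written in (13)
        # Assumption: before any simulation, histories hold nominal values.
        self.u_max = deque([[u_init] * len(self.feeders)] * n, maxlen=n)
        self.u_min = deque([[u_init] * len(self.feeders)] * n, maxlen=n)
        self.i_max = deque([[i_init] * len(self.feeders)] * n, maxlen=n)

    def record(self, voltages, currents):
        """Append per-feeder max/min voltage and max current of the last simulated slot."""
        self.u_max.append([max(voltages[f]) for f in self.feeders])
        self.u_min.append([min(voltages[f]) for f in self.feeders])
        self.i_max.append([max(currents[f]) for f in self.feeders])

    def observation(self, soc, t):
        """Flatten the histories, the mean SoC per feeder, and the time index t."""
        obs = [x for slot in self.u_max for x in slot]
        obs += [x for slot in self.u_min for x in slot]
        obs += [x for slot in self.i_max for x in slot]
        obs += [mean(soc[f]) if soc.get(f) else 0.0 for f in self.feeders]  # 0.0 if no BESS
        obs.append(float(t))
        return obs


builder = AggregatedStateBuilder(feeders=[1, 2], h=1)
builder.record(voltages={1: [1.02, 0.98], 2: [1.00, 0.96]}, currents={1: [0.5, 0.7], 2: [0.3]})
s = builder.observation(soc={1: [0.4, 0.6], 2: [0.5]}, t=7)
print(len(s))  # 15 = 3 * (h + 1) * F + F + 1
```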

Action: The AA suggests a uniform action for each feeder, i.e., the same action is applied to all buses that belong to the same feeder. Thus, the action vector is

$$
\tilde{a}(t) = [\tilde{a}_1^{EV}(t), \dots, \tilde{a}_F^{EV}(t), \tilde{a}_1^{PV}(t), \dots, \tilde{a}_F^{PV}(t), \bar{a}_1^{BESS}(t), \dots, \bar{a}_F^{BESS}(t)], \tag{14}
$$

and an action projection mechanism may then adjust $\bar{a}_f^{BESS}(t)$ to $\tilde{a}_f^{BESS}(t)$ in order to satisfy constraints (5)–(7). The action applied to bus $n$ of feeder $f$ is $a_{f,n}^{X}(t) = \tilde{a}_f^{X}(t)$, $X \in \{EV, PV, BESS\}$, if $(f,n) \in \mathcal{N}_f^{X}$. Note that if a feeder has no EVs, the corresponding action is omitted from the action space; the same holds for PVs and BESSs.
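The layout of (14), including the omission of entries for feeders without a given device, and the broadcast of each uniform action to the buses of $\mathcal{N}_f^X$ can be sketched as follows. The helper names are hypothetical, and the BESS entries are assumed to be already projected onto (5)–(7); the projection mechanism itself is not specified here and is therefore not implemented.

```python
DEVICE_TYPES = ("EV", "PV", "BESS")  # block order of the action vector (14)


def action_layout(devices):
    """(X, f) pairs in the order of (14); feeders without device X get no entry.

    devices: {f: {"EV": [...], "PV": [...], "BESS": [...]}} buses hosting each device.
    """
    return [(x, f) for x in DEVICE_TYPES for f in sorted(devices) if devices[f].get(x)]


def broadcast(action_vector, devices):
    """Apply a_{f,n}^X(t) = tilde a_f^X(t) to every bus (f, n) in N_f^X."""
    layout = action_layout(devices)
    if len(action_vector) != len(layout):
        raise ValueError(f"expected {len(layout)} actions, got {len(action_vector)}")
    per_bus = {}
    for (x, f), a in zip(layout, action_vector):
        for n in devices[f][x]:
            per_bus[(x, f, n)] = a
    return per_bus


devices = {1: {"EV": [2], "PV": [1, 3], "BESS": []},
           2: {"EV": [], "PV": [1], "BESS": [2]}}
print(action_layout(devices))  # [('EV', 1), ('PV', 1), ('PV', 2), ('BESS', 2)]
per_bus = broadcast([0.9, 0.7, 0.6, -0.3], devices)
```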

Transition Process and Reward Function: After the action is decided based on the current state observation, the uncertain quantities for time $t$ (i.e., the PV generation, the EV demand, and other consumption loads) are realized and can be observed. Then, the voltage and current values for time $t$ are derived through power flow calculations, which take as input the action at $t$ and the PV, EV, and load realizations at $t$. The reward function corresponds to the negated part of the objective function (8) associated with time $t$, expressed in terms of the per-feeder aggregated quantities, i.e.,

$$
r(t) = -w_1 \sum_{f=1}^{F} \phi_1(\bar{U}_f(t)) - w_2 \sum_{f=1}^{F} \phi_2(\underline{U}_f(t)) - w_3 \sum_{f=1}^{F} \phi_3(\bar{I}_f(t)) - w_4 \sum_{f=1}^{F} (1 - \tilde{a}_f^{EV}(t)) - w_5 \sum_{f=1}^{F} (1 - \tilde{a}_f^{PV}(t)). \tag{15}
$$
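The reward (15) can be computed from the per-feeder extremes and the uniform actions. Unlike the per-bus sums in (8), it penalizes only the worst voltage and current per feeder and counts each feeder’s uniform curtailment once. In this minimal sketch, the function name is hypothetical, values are assumed to be in per unit, and feeders without EVs (or PVs) are simply left out of the corresponding action list.

```python
def aa_reward(u_max_f, u_min_f, i_max_f, a_ev_f, a_pv_f, u_lo, u_hi, i_hi, w):
    """Reward r(t) of Eq. (15).

    u_max_f, u_min_f, i_max_f: per-feeder extreme voltage/current values at time t.
    a_ev_f, a_pv_f: uniform per-feeder EV/PV actions (feeders without the device omitted).
    w: weights (w1, ..., w5).
    """
    w1, w2, w3, w4, w5 = w
    r = -w1 * sum(max(0.0, u - u_hi) for u in u_max_f)   # overvoltage
    r -= w2 * sum(max(0.0, u_lo - u) for u in u_min_f)   # undervoltage
    r -= w3 * sum(max(0.0, i - i_hi) for i in i_max_f)   # overcurrent
    r -= w4 * sum(1.0 - a for a in a_ev_f)               # EV curtailment
    r -= w5 * sum(1.0 - a for a in a_pv_f)               # PV curtailment
    return r


# Hypothetical two-feeder snapshot: -(0.01 + 0.01 + 0.2 + 0.05 + 0.01) = -0.28
r = aa_reward(u_max_f=[1.06, 1.03], u_min_f=[0.97, 0.94], i_max_f=[0.8, 1.2],
              a_ev_f=[1.0, 0.5], a_pv_f=[0.9],
              u_lo=0.95, u_hi=1.05, i_hi=1.0, w=(1.0, 1.0, 1.0, 0.1, 0.1))
```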

The AA is trained through the Soft Actor–Critic (SAC) algorithm [], which is an RL algorithm well-suited for continuous action spaces.

A noteworthy feature of the AA is that applying a uniform per-feeder action satisfies the fairness conditions of Section 3.3. Thus, the AA is fair by design, which is why we have not explicitly included the fairness criteria in our problem formulation. In the following, we present the heuristic controllers of step 2 that disaggregate the uniform per-feeder RL actions into per-bus decisions.

4.2. Heuristic Controllers for the Second Solution Step: Disaggregators

In this section, we briefly describe the disaggregators developed in [], which take as input the uniform per-feeder action produced by the RL agent in the first step and compute individualized actions for each PV and EV based on the location of the corresponding bus. Furthermore, in this paper, we propose two enhanced versions of these disaggregators with improved performance and fairness in several scenarios. Disaggregation is applied solely to determine per-bus PV and EV curtailments, while the battery decisions from the first step remain unchanged; in other words, a uniform per-feeder action is applied to the BESSs. This choice is motivated by the temporal interdependencies that the batteries introduce into the problem: these affect the satisfaction of hard constraints and, upon disaggregation, may introduce inconsistencies in the actual impact of the AA’s uniform decision. Although this choice may appear limiting, our evaluations demonstrate that the inclusion of BESSs improves the cost performance of the disaggregators while maintaining fairness.

The key design principle of the disaggregators stems from a sensitivity analysis presented in [], which showed that buses farther from the PCC have a higher impact on voltage magnitudes, since their power injections affect the voltages of all upstream buses, i.e., the buses closer to the PCC.
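A minimal sketch of this principle under the linearized DistFlow (LinDistFlow) approximation, assuming a single feeder with identical line resistances r, voltages close to 1 p.u., and active-power injections only (these modeling choices are ours, not the sensitivity study of []). An injection p at bus j reverses the flow on the j lines between the PCC and bus j, so bus k gains roughly r·p for every line it shares with that path, i.e., r·p·min(j, k).

```python
def voltage_rise(n_buses: int, inj_bus: int, p: float, r: float) -> list[float]:
    """Approximate voltage rise (p.u.) at buses 1..n_buses of a uniform radial feeder.

    LinDistFlow with V0 = 1 p.u. and no reactive power: an injection p at bus
    `inj_bus` lifts bus k by r * p for each line shared with the path from the
    PCC (bus 0) to `inj_bus`, i.e. min(inj_bus, k) lines.
    """
    return [r * p * min(inj_bus, k) for k in range(1, n_buses + 1)]


near = voltage_rise(5, inj_bus=1, p=0.01, r=0.05)  # injection next to the PCC
far = voltage_rise(5, inj_bus=5, p=0.01, r=0.05)   # same injection at the feeder end
```

The far injection raises every bus at least as much as the near one, and its effect grows along the feeder, which is why the disaggregators curtail distant buses first.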

LD disaggregator: The first disaggregator is referred to as the Linear Disaggregator (LD). A per-bus action $a_{f,n}^{X}(t)$, $\forall (f,n) \in N_{f}^{X}$, $X \in \{EV, PV\}$, is determined such that

\frac{\sum_{(f,n) \in N_{f}^{X}} a_{f,n}^{X}(t)}{|N_{f}^{X}|} = \tilde{a}_{f}^{X}(t),

i.e., the mean value of the curtailment multipliers is equal to the uniform action of the AA (the solution of step 1). The exact value of $a_{f,n}^{X}(t)$ at bus $(f,n) \in N_{f}^{X}$ for $X \in \{EV, PV\}$ is given as follows:

\begin{aligned} n_{f}^{ref,X} &= \tfrac{1}{2}\big(|N_{f}^{X}| - 1\big), \\ Δ\bar{a}_{f}^{X}(t) &= \min\{1 - \tilde{a}_{f}^{X}(t),\, \tilde{a}_{f}^{X}(t)\}, \\ Δa_{f,n}^{X}(t) &= \begin{cases} m_{s}^{X}\big(\mathrm{reindex}(X,f,n) - n_{f}^{ref,X}\big), & \text{if } \big|m_{s}^{X}\big(\mathrm{reindex}(X,f,n) - n_{f}^{ref,X}\big)\big| \leq Δ\bar{a}_{f}^{X}(t), \\ \operatorname{sign}\big(m_{s}^{X}\big(\mathrm{reindex}(X,f,n) - n_{f}^{ref,X}\big)\big) \cdot Δ\bar{a}_{f}^{X}(t), & \text{otherwise}, \end{cases} \\ a_{f,n}^{X}(t) &= \tilde{a}_{f}^{X}(t) + Δa_{f,n}^{X}(t), \end{aligned}

(16)

where $Δa_{f,n}^{X}(t)$ denotes the offset of the disaggregated action from the uniform AA action, $m_{s}^{X} < 0$ is a parameter of the LD controller that corresponds to the slope of the linear disaggregation, and $Δ\bar{a}_{f}^{X}(t)$ is an upper bound on the absolute value of the offset $Δa_{f,n}^{X}(t)$. $\mathrm{reindex}(X,f,n)$ is an auxiliary function that calculates the relative index of $(f,n)$ in the set $N_{f}^{X}$, i.e., it returns k if $(f,n)$ is the k-th nearest bus to the PCC among those in $N_{f}^{X}$, with k counted from 0, so that $n_{f}^{ref,X}$ is the mean index and the offsets average to zero, as required by the mean condition above. Note that the reference bus $n_{f}^{ref,X}$ may be a virtual bus located between two actual buses when the cardinality of $N_{f}^{X}$ is even. The absolute value of the offset $Δa_{f,n}^{X}(t)$ is equal for buses located at equal distances from the reference bus $n_{f}^{ref,X}$. The definition of $Δ\bar{a}_{f}^{X}(t)$ ensures that $a_{f,n}^{X}(t) \in [0, 1]$, $\forall (f,n) \in N_{f}^{X}$.
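Equation (16) can be sketched in Python for one feeder and one device type as follows. The sketch assumes that buses are listed from the nearest to the PCC to the farthest and that the list position plays the role of reindex counted from 0 (the choice for which $n_f^{ref,X} = (|N_f^X|-1)/2$ is the mean index); the function name is ours.

```python
import math


def linear_disaggregate(a_uniform: float, n_buses: int, slope: float) -> list[float]:
    """Per-bus multipliers of the Linear Disaggregator, Eq. (16).

    Index k = 0 is the bus nearest to the PCC; `slope` is m_s^X (< 0), so buses
    beyond the reference bus are curtailed more and nearer buses less.
    """
    n_ref = 0.5 * (n_buses - 1)                    # reference (possibly virtual) bus
    max_offset = min(1.0 - a_uniform, a_uniform)   # keeps every multiplier in [0, 1]
    multipliers = []
    for k in range(n_buses):
        offset = slope * (k - n_ref)
        if abs(offset) > max_offset:                # saturate at +/- max_offset
            offset = math.copysign(max_offset, offset)
        multipliers.append(a_uniform + offset)
    return multipliers


pv = linear_disaggregate(0.6, 10, -0.1)  # PV setting of the toy example of Figure 2
```

Here `pv` is approximately `[1.0, 0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.2]`: the two outermost offsets are clipped symmetrically to ±0.4, so the mean stays at 0.6.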

PBD disaggregator: The second disaggregator is denoted as the Power-Based Disaggregator (PBD). PBD allows the same PV generation or EV consumption as the AA would have permitted with its uniform action. To achieve this, PBD uses the additional information vector $x_{f}^{D}(t)$, whose entries are the forecasts of the per-bus next-time-step generation and consumption, denoted as $P_{f,n}^{H,p}(t)$, $P_{f,n}^{EV,p}(t)$, and $P_{f,n}^{PV,p}(t)$ for household demand, EV demand, and PV generation, respectively. Initially, PBD sets $a_{f,n}^{X}(t) = 1$, $\forall (f,n) \in N_{f}^{X}$, $X \in \{PV, EV\}$. Then, it checks whether generation or consumption dominates the feeder when no curtailment action is applied and curtails only PV or EV power, respectively. If the quantity

\sum_{(f, n) \in N_{f}} P_{f, n}^{H, p} (t) + \sum_{(f, n) \in N_{f}^{E V}} P_{f, n}^{E V, p} (t) + \sum_{(f, n) \in N_{f}^{P V}} P_{f, n}^{P V, p} (t) + \sum_{(f, n) \in N_{f}^{B E S S}} {\tilde{a}}_{f, n}^{B E S S} (t) \cdot {\bar{P}}_{f, n}^{B E S S}

is positive, then consumption dominates the feeder. In this case, PBD greedily curtails EV demand, starting from the bus located farthest from the PCC by setting $a_{f,n}^{EV}(t) = 0$ and continuing upstream towards the PCC until the allowed EV consumption equals that of the AA, i.e., until the following condition holds:

\sum_{n \in N_{f}^{E V}} a_{f, n}^{E V} (t) P_{f, n}^{E V, p} (t) = {\tilde{a}}_{f}^{E V} (t) \sum_{n \in N_{f}^{E V}} P_{f, n}^{E V, p} (t) .

(17)

When generation dominates the feeder, PBD curtails only PVs in the same way as with the EVs above until the following condition is fulfilled:

\sum_{n \in N_{f}^{P V}} a_{f, n}^{P V} (t) P_{f, n}^{P V, p} (t) = {\tilde{a}}_{f}^{P V} (t) \sum_{n \in N_{f}^{P V}} P_{f, n}^{P V, p} (t) .

(18)
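A minimal Python sketch of PBD for one feeder, under three assumptions of ours: (i) the load sign convention implied by the dominance test, with consumption positive and PV generation negative; (ii) buses listed from the one nearest to the PCC to the farthest; and (iii) the last curtailed bus may receive a fractional multiplier so that (17) or (18) holds with equality. The helper names are hypothetical.

```python
def feeder_dominance(household, ev, pv, bess) -> str:
    """Return 'EV' if consumption dominates the feeder without curtailment, else 'PV'.

    Load convention: consumption > 0, generation < 0 (PV forecasts are negative);
    `bess` holds the BESS set-points, i.e. multiplier times rated power.
    """
    net = sum(household) + sum(ev) + sum(pv) + sum(bess)
    return "EV" if net > 0 else "PV"


def power_based_disaggregate(power, a_uniform, tol=1e-12):
    """PBD: curtail fully from the farthest bus upstream until (17)/(18) holds.

    `power` are the forecasts of the curtailed type (EV demand or PV generation),
    nearest bus first; only magnitudes matter.
    """
    magnitudes = [abs(p) for p in power]
    multipliers = [1.0] * len(power)
    excess = (1.0 - a_uniform) * sum(magnitudes)  # power the uniform AA action removes
    for i in reversed(range(len(power))):          # farthest bus first
        if excess <= tol:
            break
        if magnitudes[i] == 0.0:
            continue
        cut = min(magnitudes[i], excess)           # full cut, or partial on the last bus
        multipliers[i] = 1.0 - cut / magnitudes[i]
        excess -= cut
    return multipliers


toy = power_based_disaggregate([20.0] * 10, 0.6)  # PV case of the Figure 2 toy example
```

In the toy case, the AA allows 0.6 × 200 = 120 kW, so `toy` is `[1.0] * 6 + [0.0] * 4`, i.e., buses 7–10 are fully curtailed.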

CPBD disaggregator: In this paper, we propose a variation of PBD, called Constraint PBD (CPBD), which aims to enhance PBD's fairness properties in line with the fairness criteria of Section 3.3. CPBD bounds the per-bus curtailment multipliers from below and above by quantities that depend on the uniform action of the AA. In particular, in CPBD the curtailment multipliers are constrained by

(1 - ϵ) \cdot \tilde{a}_{f}^{X}(t) \leq a_{f,n}^{X}(t) \leq (1 + ϵ) \cdot \tilde{a}_{f}^{X}(t),

(19)

where $ϵ \geq 0$ is a parameter that determines the maximum relative deviation from the uniform action. To determine such curtailment multipliers, we initially set $a_{f,n}^{X}(t) = 1$, $\forall (f,n) \in N_{f}^{X}$, $X \in \{PV, EV\}$. Then, we set $a_{f,n}^{X}(t) = \min\{(1 + ϵ) \cdot \tilde{a}_{f}^{X}(t), 1\}$, $\forall n \in N_{f}^{X}$, $\forall f \in F$, where X is EV when consumption dominates without curtailment and PV otherwise. Then, CPBD starts curtailing from the bus farthest from the PCC, but sets the action to $a_{f,n}^{X}(t) = (1 - ϵ) \cdot \tilde{a}_{f}^{X}(t)$ instead of the zero used by PBD. Curtailment stops at the bus where condition (17) or (18) becomes valid, for dominating consumption or generation, respectively.
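CPBD differs from PBD only in the two bounds of (19), which the following Python sketch makes explicit. As in a plain PBD greedy pass, we assume buses are listed nearest to the PCC first, that only forecast magnitudes matter, that $0 \leq ϵ \leq 1$, and that the last bus touched may end between the two bounds so that (17)/(18) holds with equality; the function name is ours.

```python
def constrained_pbd(power, a_uniform, eps, tol=1e-12):
    """CPBD sketch: PBD-style greedy curtailment kept inside the bounds of Eq. (19)."""
    high = min((1.0 + eps) * a_uniform, 1.0)   # starting (maximum) multiplier
    low = max((1.0 - eps) * a_uniform, 0.0)    # replaces PBD's full curtailment (zero)
    magnitudes = [abs(p) for p in power]
    multipliers = [high] * len(power)
    # power allowed at the upper bound minus the power the AA would allow
    excess = high * sum(magnitudes) - a_uniform * sum(magnitudes)
    for i in reversed(range(len(power))):      # farthest bus first
        if excess <= tol:
            break
        if magnitudes[i] == 0.0:
            continue
        cut = min((high - low) * magnitudes[i], excess)
        multipliers[i] = high - cut / magnitudes[i]
        excess -= cut
    return multipliers


toy = constrained_pbd([20.0] * 10, 0.6, 0.35)  # PV case of the Figure 2 toy example
```

With ϵ = 0.35, `toy` has about 0.81 on buses 1–5 and 0.39 on buses 6–10; with ϵ = 0, the bounds coincide and CPBD returns the uniform AA action.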

EPBD disaggregator: An alternative way to improve fairness in the utilization of the feeders is to consider the bus export. In this respect, we propose the Export-considering PBD (EPBD) disaggregator. EPBD curtails either PVs or EVs, depending on whether the feeder is dominated by generation or consumption (without curtailment), respectively. Then, starting from the bus farthest from the PCC, it curtails the PV or EV power of each bus (depending on the decision above) until that bus's power export is zero. Curtailment stops when condition (17) or (18) becomes active, or at the bus $(f,1)$ that is closest to the PCC. If consumption (generation) dominates the feeder but generation (consumption) dominates locally at a bus $(f,n)$, no curtailment is applied at that bus, i.e., $a_{f,n}^{X}(t) = 1$, $X \in \{EV, PV\}$.
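A minimal Python sketch of EPBD for one feeder, assuming buses listed nearest to the PCC first and positive magnitudes. Which local quantity counts as the opposing power (for instance, local EV demand when PV is curtailed, possibly plus household demand) is left to the caller, since that detail is our assumption; we also assume the last curtailed bus may be cut only partially once (17)/(18) is met. The function and argument names are hypothetical.

```python
def export_based_disaggregate(curtailable, local_offset, a_uniform, tol=1e-12):
    """EPBD sketch.

    curtailable[i]: magnitude of the dominating type at bus i (PV generation if the
        feeder is generation-dominated, EV demand otherwise), nearest bus first.
    local_offset[i]: magnitude of the opposing local power; curtailing down to it
        makes bus i's export zero.
    """
    multipliers = [1.0] * len(curtailable)
    excess = (1.0 - a_uniform) * sum(curtailable)  # power the uniform AA action removes
    for i in reversed(range(len(curtailable))):    # farthest bus first, down to bus (f, 1)
        if excess <= tol:                          # condition (17)/(18) reached
            break
        p, q = curtailable[i], local_offset[i]
        if p <= q:                                 # opposite type dominates locally
            continue
        cut = min(p - q, excess)                   # down to zero export at most
        multipliers[i] = 1.0 - cut / p
        excess -= cut
    return multipliers                             # the AA target may remain unmet


toy = export_based_disaggregate([20.0] * 10, [10.0] * 10, 0.6)  # Figure 2 toy example
```

In the toy case, each bus can give up 10 kW before its export reaches zero, so eight buses (3–10) are set to 0.5 and buses 1 and 2 keep multiplier 1.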

In Figure 2, the individual per-bus actions derived from the proposed disaggregators are depicted and compared using a toy example. Consider a single-feeder network consisting of 10 buses, where the PV generation is equal to 20 kW and the EV demand is equal to 10 kW at each bus. BESSs are not included in this example. The AA suggests the uniform actions $\tilde{a}_{f}^{EV}(t) = 0.7$ and $\tilde{a}_{f}^{PV}(t) = 0.6$, and these actions are applied to all buses. The LD controller, whose slope is set to $m_{s}^{EV} = m_{s}^{PV} = -0.1$ in this example, suggests actions that decrease linearly with the distance from the PCC and curtails both PVs and EVs. In contrast, the remaining disaggregators curtail only PV generation, as it exceeds the EV demand without curtailment. PBD curtails all PV generation at buses 7–10 (the farthest from the PCC) and allows the full PV generation of the remaining buses. CPBD with $ϵ = 0.35$ instead suggests the minimum acceptable action $(1 - ϵ) \cdot \tilde{a}_{f}^{PV}(t) = 0.39$ for the last five buses (6–10) and the maximum one, $(1 + ϵ) \cdot \tilde{a}_{f}^{PV}(t) = 0.81$, for the remaining ones. Lastly, EPBD sets the curtailment multiplier $a_{f,n}^{PV}(t)$ equal to 0.5 for the buses with index $n \geq 3$ and all other multipliers to 1. Thus, for buses with $n \geq 3$, the allowed PV generation is equal to their local EV demand, so their export to the grid is zero. Buses 1 and 2 remain unaffected, as condition (18) is already fulfilled without curtailing their PV generation.
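As a quick arithmetic check of this toy example, the PV power allowed by each disaggregator adds up to the AA allowance of 0.6 × 200 kW. For LD, this follows because its multipliers average to 0.6 and every bus has the same 20 kW of PV generation.

\begin{aligned} \text{AA:}\;& 0.6 \cdot (10 \cdot 20) = 120 \text{ kW}, \\ \text{LD:}\;& 20 \cdot \textstyle\sum_{n} a_{f,n}^{PV}(t) = 20 \cdot (10 \cdot 0.6) = 120 \text{ kW}, \\ \text{PBD:}\;& 6 \cdot 20 + 4 \cdot 0 = 120 \text{ kW}, \\ \text{CPBD:}\;& 5 \cdot 0.81 \cdot 20 + 5 \cdot 0.39 \cdot 20 = 81 + 39 = 120 \text{ kW}, \\ \text{EPBD:}\;& 2 \cdot 20 + 8 \cdot (0.5 \cdot 20) = 40 + 80 = 120 \text{ kW}. \end{aligned}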

Figure 2. Toy example illustrating the disaggregation decisions of the different controllers for solution step 2.

Rules-Based Curtailment (RBC): RBC is inspired by droop-based control and was proposed in our previous work []. Here, we provide a brief overview of how it works and, as an enhancement of its version in [], we show how per-bus fairness is incorporated into the APC process and how BESS control is included.

For BESSs, RBC applies the uniform action computed by the AA for their feeder. For EVs and PVs, curtailments are computed with the heuristic scheme explained below. Since the RBC curtailments for PVs and EVs are computed myopically, BESS time interdependencies and future SoC constraints cannot easily be encoded within the RBC framework, which is why the AA is used for the BESSs. However, a simplified version of the AA can be trained to control the BESSs along with the RBC. The state and the reward of the simplified AA remain the same as in Section 4.1, but the action vector includes only the uniform per-feeder BESS power decisions, i.e.,

\tilde{a}(t) = \big[\bar{a}_{1}^{BESS}(t), \dots, \bar{a}_{F}^{BESS}(t)\big],

(20)

together with the action projection (so that constraints (5)–(7) are satisfied) that computes $\tilde{a}_{f}^{BESS}(t)$, $\forall f \in F$. By reducing the action space in this way, training becomes faster and more efficient. During the training of the AA, RBC is executed in the environment to decide the APC of PVs and EVs, so the agent controlling the BESSs is expected to learn the dynamics corresponding to RBC and to improve the performance of the method.

For PVs and EVs, RBC suggests curtailment actions proportional to the deviations of voltages and currents from the RBC bounds, which are tighter than the grid operational limits. When a violation occurs at a bus indexed by n or at a line indexed by l, all downstream buses, as well as $η$ upstream buses, share the curtailment amount of the RBC. The choice of curtailing all downstream buses is aligned with the sensitivity analysis of [], whereas the parameter $η$ is exploited in the current work to improve the fairness of the controller, since with $η \geq 1$ more buses share the penalty.

RBC uses tightened bounds to compute violations, namely, $\underline{U}^{RBC} \geq \underline{U}$ and $\bar{U}^{RBC} \leq \bar{U}$ for the lower and upper voltage limits and $\bar{I}^{RBC} \leq \bar{I}$ for the upper ampacity limit. In particular, for bus n and line l of feeder $f \in F$, the violation quantities are calculated as

\begin{aligned} Δ\underline{U}_{f,n}(t) &= \max\{\underline{U}^{RBC} - U_{f,n}(t),\, 0\}, \\ Δ\bar{U}_{f,n}(t) &= \max\{U_{f,n}(t) - \bar{U}^{RBC},\, 0\}, \\ Δ\bar{I}_{f,l}(t) &= \max\{I_{f,l}(t) - \bar{I}^{RBC},\, 0\}. \end{aligned}

(21)

Then, RBC applies a uniform curtailment action to the $η$ upstream buses of the violation location (here considered at bus n or line l) and to all downstream buses, which is computed as follows. First, the following auxiliary curtailment multipliers are computed:

c_{f,n}^{\underline{U}}(t) = 1 - M^{\underline{U}} \cdot \frac{Δ\underline{U}_{f,n}(t-1)}{N_{f} - \max\{n - η, 0\} + 1},

(22)

c_{f,n}^{\bar{U}}(t) = 1 - M^{\bar{U}} \cdot \frac{Δ\bar{U}_{f,n}(t-1)}{N_{f} - \max\{n - η, 0\} + 1},

(23)

c_{f,l}^{\bar{I}}(t) = 1 - M^{\bar{I}} \cdot \frac{Δ\bar{I}_{f,l}(t-1)}{N_{f} - \max\{n - η, 0\} + 1},

(24)

where $M^{\underline{U}}$, $M^{\bar{U}}$, and $M^{\bar{I}}$ are the sensitivity coefficients for undervoltage, overvoltage, and overcurrent violations, respectively, which are computed experimentally. Then, the auxiliary curtailment multipliers are normalized by the ratio of the allowed power at the previous time step to the expected power at the current time step, i.e.,

π_{f,n}^{X}(t, c) = \begin{cases} \min\Big\{\dfrac{P_{f,n}^{X}(t-1) \cdot c}{P_{f,n}^{X,p}(t)},\, 1\Big\}, & \text{if } P_{f,n}^{X,p}(t) \neq 0, \\ c, & \text{otherwise}, \end{cases}

(25)

where $X \in \{EV, PV\}$ and c is one of the coefficients in (22)–(24). Intuitively, when the upcoming power injection is expected to be lower than the previous realization, a decreased curtailment action can be adequate; otherwise, the suggested curtailment amount is increased to prevent constraint violations.
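Equations (21)–(25) translate into a few small Python functions, sketched below. The RBC bounds, the sensitivity coefficient M, and the power values in the usage lines are hypothetical numbers chosen for illustration; the paper determines M experimentally. The helper names are ours, and powers in (25) are assumed to carry the same sign at t−1 and t.

```python
def voltage_violations(u: float, u_min_rbc: float, u_max_rbc: float) -> tuple[float, float]:
    """Eq. (21) for one bus: (undervoltage, overvoltage) amounts w.r.t. the RBC bounds."""
    return max(u_min_rbc - u, 0.0), max(u - u_max_rbc, 0.0)


def overcurrent(i: float, i_max_rbc: float) -> float:
    """Eq. (21) for one line: overcurrent amount w.r.t. the RBC ampacity bound."""
    return max(i - i_max_rbc, 0.0)


def aux_multiplier(delta_prev: float, sensitivity: float, n: int, eta: int, n_buses: int) -> float:
    """Eqs. (22)-(24): c = 1 - M * Delta(t-1) / (N_f - max(n - eta, 0) + 1)."""
    sharing = n_buses - max(n - eta, 0) + 1
    return 1.0 - sensitivity * delta_prev / sharing


def normalize(c: float, allowed_prev: float, forecast: float) -> float:
    """Eq. (25): scale c by previous allowed power over the forecast for this step."""
    if forecast != 0.0:
        return min(allowed_prev * c / forecast, 1.0)
    return c


under, over = voltage_violations(1.06, 0.92, 1.05)               # hypothetical RBC bounds
c = aux_multiplier(0.02, sensitivity=40.0, n=10, eta=0, n_buses=17)  # M = 40 is hypothetical
```

With an overvoltage of 0.02 p.u. at bus 10, η = 0, and 17 buses, eight buses share the penalty and c = 1 − 40 · 0.02 / 8 = 0.9. If the forecast power is 12 kW after 10 kW were allowed, (25) tightens this to 0.75; a forecast of 8 kW relaxes it to the cap of 1.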

If no violations occur in the entire feeder, RBC still updates the curtailment action: compared to its value at the previous time step, the curtailment multiplier increases (i.e., curtailment is relaxed) in proportion to the expected relative drop in power generation/consumption from the previous time step. This is expressed as follows:

ψ_{f,n}^{X}(t) = \begin{cases} \dfrac{P_{f,n}^{X,d}(t-1) - P_{f,n}^{X,p}(t)}{P_{f,n}^{X,d}(t-1)} + a_{f,n}^{X}(t-1), & \text{if } 0 < |P_{f,n}^{X,p}(t)| < |P_{f,n}^{X,d}(t-1)|, \\ 1, & \text{else if } P_{f,n}^{X,p}(t) = 0, \\ 0, & \text{else if } a_{f,n}^{X}(t-1) = 0, \\ a_{f,n}^{X}(t-1), & \text{otherwise}. \end{cases}

(26)

Note that when multiple curtailment actions are suggested for the same bus because of constraint violations at different locations, the controller applies the most restrictive one, i.e., the smallest curtailment multiplier.
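Equation (26) maps directly onto a chain of conditionals evaluated in the stated order; a Python sketch follows (the function name is ours). The result may exceed 1, which is why RBC clips it with min{ψ, 1} when it is applied.

```python
def no_violation_multiplier(a_prev: float, p_prev: float, p_forecast: float) -> float:
    """Eq. (26): RBC multiplier update when no violation occurs on the feeder.

    a_prev:     multiplier a_{f,n}^X(t-1)
    p_prev:     uncurtailed power P_{f,n}^{X,d}(t-1)
    p_forecast: forecast P_{f,n}^{X,p}(t)
    """
    if 0.0 < abs(p_forecast) < abs(p_prev):
        # relax curtailment by the expected relative drop in power
        return (p_prev - p_forecast) / p_prev + a_prev
    if p_forecast == 0.0:
        return 1.0
    if a_prev == 0.0:
        return 0.0
    return a_prev


psi = no_violation_multiplier(0.5, 20.0, 15.0)  # 25% expected drop on top of 0.5
```

Here `psi` equals 0.75. A forecast above the previous power (e.g., 25 kW) keeps the previous multiplier.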

The steps of the RBC controller are summarized below:

For BESSs, set $a_{f,n}^{BESS}(t) \leftarrow \tilde{a}_{f}^{BESS}(t)$, $\forall n \in N_{f}^{BESS}$.
Set $a_{f,n}^{EV}(t) \leftarrow 1$, $\forall n \in N_{f}^{EV}$, and $a_{f,n}^{PV}(t) \leftarrow 1$, $\forall n \in N_{f}^{PV}$.
$Y \leftarrow \begin{cases} EV, & \text{if there is net consumption over the feeder}, \\ PV, & \text{otherwise}. \end{cases}$
Compute $c_{f,n}^{\underline{U}}(t)$, $c_{f,n}^{\bar{U}}(t)$, $c_{f,l}^{\bar{I}}(t)$, $\forall n \in N_{f}$, $\forall l \in L_{f}$, according to (22)–(24).
For every n where $c_{f,n}^{\underline{U}}(t) > 0$, do $a_{f,j}^{EV}(t) \leftarrow \min\{a_{f,j}^{EV}(t),\, π_{f,n}^{EV}(t, c_{f,n}^{\underline{U}}(t))\}$, $\forall j \in \{n - η, \dots, N_{f}\} \cap N_{f}^{EV}$.
For every n where $c_{f,n}^{\bar{U}}(t) > 0$, do $a_{f,j}^{PV}(t) \leftarrow \min\{a_{f,j}^{PV}(t),\, π_{f,n}^{PV}(t, c_{f,n}^{\bar{U}}(t))\}$, $\forall j \in \{n - η, \dots, N_{f}\} \cap N_{f}^{PV}$.
For every n where $c_{f,n}^{\bar{I}}(t) > 0$, do $a_{f,j}^{Y}(t) \leftarrow \min\{a_{f,j}^{Y}(t),\, π_{f,n}^{Y}(t, c_{f,n}^{\bar{I}}(t))\}$, $\forall j \in \{n - η, \dots, N_{f}\} \cap N_{f}^{Y}$.
If there are no violations, then $a_{f,n}^{Y}(t) \leftarrow \min\{ψ_{f,n}^{Y}(t), 1\}$, $\forall n \in \{1, \dots, N_{f}\} \cap N_{f}^{Y}$.
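The steps above can be sketched for a single feeder in Python. Several points are assumptions of ours: the caller passes only the locations where a violation was detected, together with their auxiliary multipliers from (22)–(24), instead of reproducing the test on c; buses are numbered 1 to N_f from the PCC, so the sharing range is clipped at bus 1; and the normalization (25) and the update (26) are supplied as callables. The BESS step is left to the AA.

```python
from typing import Callable, Dict, Set


def rbc_feeder_step(
    n_buses: int,
    eta: int,
    ev_buses: Set[int],
    pv_buses: Set[int],
    net_consumption: bool,
    c_under: Dict[int, float],   # violation location n -> c^{U_}_{f,n}(t)
    c_over: Dict[int, float],    # violation location n -> c^{U-bar}_{f,n}(t)
    c_amp: Dict[int, float],     # violation location n -> c^{I-bar}(t)
    pi: Callable[[str, int, float], float],   # Eq. (25): pi(X, n, c)
    psi: Callable[[str, int], float],         # Eq. (26): psi(X, n)
) -> Dict[str, Dict[int, float]]:
    """Steps 2-8 of RBC for one feeder (step 1, the BESS action, comes from the AA)."""
    a = {"EV": {n: 1.0 for n in ev_buses}, "PV": {n: 1.0 for n in pv_buses}}  # step 2
    y = "EV" if net_consumption else "PV"                                     # step 3

    def share(x: str, n: int, c: float) -> None:
        # eta upstream buses and all downstream buses keep the tighter multiplier;
        # pi is evaluated at the violation location n, as in the listed steps
        target = pi(x, n, c)
        for j in range(max(n - eta, 1), n_buses + 1):
            if j in a[x]:
                a[x][j] = min(a[x][j], target)

    for n, c in c_under.items():  # step 5: undervoltage -> EV demand
        share("EV", n, c)
    for n, c in c_over.items():   # step 6: overvoltage -> PV generation
        share("PV", n, c)
    for n, c in c_amp.items():    # step 7: overcurrent -> dominating type Y
        share(y, n, c)

    if not (c_under or c_over or c_amp):  # step 8: no violation on the feeder
        for n in a[y]:
            a[y][n] = min(psi(y, n), 1.0)
    return a
```

For example, on a 5-bus feeder with PVs at buses 1, 3, and 5, an overvoltage at bus 4 with c = 0.8, η = 1, and an identity normalization curtails the PVs at buses 3 and 5 to 0.8, while bus 1 and all EVs keep multiplier 1.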

5. Results

5.1. Simulation Setup and Sketch of Experiments

We consider a dual-feeder radial low-voltage grid consisting of the PCC and 17 households per feeder. This number of buses is realistic, since it may represent real distribution grids such as the one used in []. However, in Section 5.4, we also consider larger grids. The grid parameters are obtained from the SimBench dataset []. Specifically, the voltage bounds are set to $\underline{U} = 0.9$ p.u. and $\bar{U} = 1.1$ p.u., with a base voltage of 0.4 kV, and the ampacity limit is set to $\bar{I} = 0.27$ kA. We assume that an EV is connected to every bus, whereas PVs are connected to half of the buses (9 of the 17 buses of each feeder), specifically those with indices n in the set $\{1, 3, 5, \dots, 17\}$. BESSs are connected to the buses with indices n in the set $\{1, 5, \dots, 17\}$ of each feeder, and their parameters are obtained from [], i.e., the SoC capacity is set to $E_{f,n}^{max} = 20$ kWh, the maximum power injection to $\bar{P}_{f,n}^{BESS}(t) = 10$ kW, and the charging/discharging efficiencies to $η_{f,n}^{ch} = η_{f,n}^{dis} = 0.91$, for each bus $(f,n) \in N_{f}^{BESS}$, $\forall f \in F$. The open-source Python library PandaPower 2.13.1 [] is used to model the grid.

The household demand, EV demand, and PV generation data are obtained from SimBench []. The data cover a horizon of 366 days with a time resolution of 15 min; thus, the time step duration is also set to $Δt = 15$ min. Reactive power is considered only for the household demand, whereas PVs and EVs only generate and consume active power, respectively. Similarly to [], 90% of the days are used for training, 5% for hyperparameter tuning, and 5% as the test set.

RL agents are trained using the open-source implementation of the SAC algorithm in Stable-Baselines3 2.1.0 []. The hyperparameter values are set as in []: the learning rate is set to $3 \times 10^{-4}$, the minibatch size to 100, and the replay buffer size to $10^{6}$. The actor and critic networks have identical architectures, consisting of two layers with 256 neurons each. The discount factor $γ$ is set to 0 when there are no BESSs. However, due to the temporal dependency introduced by BESSs, a positive value of $γ$ is applied when BESSs are present; after experimentation, it is set to $γ = 0.5$. Training is performed for six epochs, where an epoch corresponds to training over the whole dataset once, which is sufficient for convergence. For the RL agent trained to control the BESSs along with RBC, the learning rate is set to $10^{-4}$ and the discount factor to $γ = 0.9$; four epochs are sufficient for the training process to converge in this case. The remaining parameters of the optimization problem and the controllers are listed in Table 1.
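The hyperparameters above can be collected into keyword arguments for the Stable-Baselines3 `SAC` constructor, whose parameters `learning_rate`, `buffer_size`, `batch_size`, `gamma`, and `policy_kwargs` accept these values. The Python sketch below only builds the dictionary, so it runs without Stable-Baselines3 installed; the helper name is ours, and for the BESS-only agent we assume that the values not mentioned in the text stay unchanged.

```python
def sac_kwargs(with_bess: bool, rbc_bess_agent: bool = False) -> dict:
    """Keyword arguments for stable_baselines3.SAC following Section 5.1 (a sketch)."""
    kwargs = {
        "learning_rate": 3e-4,
        "batch_size": 100,          # minibatch size
        "buffer_size": 1_000_000,   # replay buffer size, 10^6
        "gamma": 0.5 if with_bess else 0.0,
        # two layers of 256 units; in SB3 a plain list applies to actor and critics alike
        "policy_kwargs": {"net_arch": [256, 256]},
    }
    if rbc_bess_agent:
        # agent controlling only the BESSs while RBC handles PVs/EVs;
        # other values are assumed unchanged
        kwargs.update(learning_rate=1e-4, gamma=0.9)
    return kwargs
```

With Stable-Baselines3 installed and a Gymnasium-compatible environment `env` wrapping the grid simulation (hypothetical here), the agent would be created as `SAC("MlpPolicy", env, **sac_kwargs(with_bess=True))` and trained with `model.learn(total_timesteps=...)`, where six epochs correspond to six times the number of training time steps.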

Table 1. Optimization and controller/disaggregator parameters.

We perform four sets of experiments to evaluate the proposed two-step solution approach. The first does not include BESSs and aims to assess the performance of the disaggregators in terms of reducing constraint violations and power curtailment, as well as achieving fairness. The performance of the disaggregators is compared to linearized OPF solutions, which are explained in more detail in []. The first baseline, NLOPF, is used to estimate the optimal performance values. In particular, NLOPF minimizes the objective function (8) independently for each time step, which is equivalent to minimizing the aggregated objective, as there is no time dependency when BESSs are excluded. Constraint (9) is replaced by a linearized approximation of the power flow equations based on a sensitivity study to improve computational efficiency. The voltage and current operating limits are tightened to $[0.905, 1.095]$ p.u. and $[0, 0.265]$ kA, respectively, to account for approximation errors. We assume that NLOPF has access to exact predictions so as to achieve optimal performance. The second baseline, referred to as ULOPF, is similar to NLOPF but applies a uniform curtailment action per feeder, analogous to the AA. As the uniform actions enforce the fairness conditions at each time step, this approach can be considered a fairness-aware OPF formulation. Next, we perform a more detailed sensitivity analysis to show which parameters can be adjusted to tune the observed performance–fairness trade-off. The second set of experiments includes BESSs to examine the performance and fairness improvements they bring. In addition, a third set of experiments examines the training needs for more challenging scenarios with higher load and larger grids. Finally, we study the generalization capabilities of the proposed solution approach, including BESSs, by evaluating its performance on grid topologies and loads that differ from those used to train the AA. Unless otherwise specified, the reported results are averaged over five trained AAs. In all cases, persistent forecasting is applied for the PBD, CPBD, EPBD, and RBC controllers, i.e., the currently observed PV generation and EV demand are used as forecasts for the next time slot.

5.2. Evaluation Metrics

The first two metrics are defined to assess the two-step solution performance. The first metric used is the APC percentage, which is calculated as

\text{APC percentage} = \left(1 - \frac{\sum_{t \in T} \sum_{f \in F} \Big(\sum_{n \in N_{f}^{EV}} a_{f,n}^{EV}(t) \cdot P_{f,n}^{EV,d}(t) - \sum_{n \in N_{f}^{PV}} a_{f,n}^{PV}(t) \cdot P_{f,n}^{PV,d}(t)\Big)}{\sum_{t \in T} \sum_{f \in F} \Big(\sum_{n \in N_{f}^{EV}} P_{f,n}^{EV,d}(t) - \sum_{n \in N_{f}^{PV}} P_{f,n}^{PV,d}(t)\Big)}\right) \cdot 100\%.

(27)

The second metric is the ratio of the number of violations occurring in the test set to the number of violations occurring when no control is performed.
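A Python sketch of the two performance metrics. The APC formula (27) subtracts the PV terms, so we assume, as in the dominance test of PBD, that PV generation carries a negative sign (load convention); the denominator then adds EV and PV magnitudes. Inputs are flat lists over all (t, f, n) samples, the violation ratio is returned in percent as in Figure 3, and the function names are ours.

```python
def apc_percentage(a_ev, p_ev, a_pv, p_pv) -> float:
    """Eq. (27). EV demand > 0 and PV generation < 0 (assumed load convention)."""
    allowed = sum(a * p for a, p in zip(a_ev, p_ev)) - sum(a * p for a, p in zip(a_pv, p_pv))
    total = sum(p_ev) - sum(p_pv)
    return (1.0 - allowed / total) * 100.0


def violation_ratio(n_controlled: int, n_uncontrolled: int) -> float:
    """Violations with control over violations without control, in percent."""
    return 100.0 * n_controlled / n_uncontrolled
```

For instance, two EV samples of 10 kW with multipliers 1 and 0.5 and two PV samples of -20 kW with multipliers 0.5 and 1 leave 45 of 60 kW allowed, i.e., an APC percentage of 25%.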

The remaining metrics quantify the fairness properties of the different disaggregators in the two-step solution approach. Fairness is assessed for PV and EV curtailment. Let us first define the following auxiliary variables:

\begin{matrix} y_{f, n}^{P V} = \frac{\sum_{t \in T} (P_{f, n}^{P V, d} (t) - a_{f, n}^{P V} (t) \cdot P_{f, n}^{P V, d} (t))}{\sum_{t \in T} P_{f, n}^{P V, d} (t)}, \end{matrix}

(28)

\begin{matrix} y_{f, n}^{E V} = \frac{\sum_{t \in T} (P_{f, n}^{E V, d} (t) - a_{f, n}^{E V} (t) \cdot P_{f, n}^{E V, d} (t))}{\sum_{t \in T} P_{f, n}^{E V, d} (t)} . \end{matrix}

(29)

The first considered fairness metric is the Min–Max Ratio (MMR), defined as follows:

MMR_{f}^{X} = \frac{\min_{(f,n) \in N_{f}^{X}} y_{f,n}^{X}}{\max_{(f,n) \in N_{f}^{X}} y_{f,n}^{X}} \cdot 100\%, \quad X \in \{EV, PV\}.

(30)

The second fairness metric is Jain's Fairness Index (JFI), defined as follows:

JFI_{f}^{X} = \frac{\big(\sum_{(f,n) \in N_{f}^{X}} y_{f,n}^{X}\big)^{2}}{|N_{f}^{X}| \sum_{(f,n) \in N_{f}^{X}} \big(y_{f,n}^{X}\big)^{2}} \cdot 100\%, \quad X \in \{EV, PV\}.

(31)

MMR assesses fairness based on extreme cases, whereas JFI considers all buses of the feeder. In the results that follow, for each fairness metric, we present its minimum value along all the feeders, which corresponds to the worst case in terms of fairness.
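Equations (28)–(31) and the worst-case aggregation over feeders can be sketched in Python as follows. Both metrics are undefined when no bus of a feeder is curtailed; returning 100% in that case is our convention, not the paper's. The function names are ours.

```python
def curtailed_share(a, p) -> float:
    """Eqs. (28)-(29) for one bus: curtailed energy over available energy (signs cancel)."""
    return sum(pt - at * pt for at, pt in zip(a, p)) / sum(p)


def mmr(y) -> float:
    """Eq. (30): min-max ratio of the per-bus curtailed shares, in percent."""
    return 100.0 if max(y) == 0.0 else min(y) / max(y) * 100.0


def jfi(y) -> float:
    """Eq. (31): Jain's fairness index of the per-bus curtailed shares, in percent."""
    squares = sum(v * v for v in y)
    return 100.0 if squares == 0.0 else sum(y) ** 2 / (len(y) * squares) * 100.0


def worst_case(metric, feeders) -> float:
    """Report the minimum of a fairness metric over all feeders."""
    return min(metric(y) for y in feeders)
```

For curtailed shares of 0.2 and 0.4 on a feeder, MMR is 50% and JFI is 90%; a feeder whose buses are curtailed equally scores 100% on both.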

Another fairness-related metric evaluated in this work is the Minimum Utilization. We define the utilization of a bus n as the percentage of the allowed power relative to the power obtained when no curtailment is applied:

ut_{f,n}^{X} = \frac{\sum_{t \in T} a_{f,n}^{X}(t) \cdot P_{f,n}^{X,d}(t)}{\sum_{t \in T} P_{f,n}^{X,d}(t)} \cdot 100\%, \quad X \in \{EV, PV\}.

(32)

The evaluated metric is the minimum utilization over all buses of the grid. This metric is critical, especially for PVs, as low values may discourage investment.
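Comparing (32) with (28)–(29) shows that $ut_{f,n}^{X} = 100\% \cdot (1 - y_{f,n}^{X})$, so the minimum utilization is set by the most curtailed bus. A Python sketch (function names are ours):

```python
def utilization(a, p) -> float:
    """Eq. (32): allowed energy over uncurtailed energy at one bus, in percent."""
    return sum(at * pt for at, pt in zip(a, p)) / sum(p) * 100.0


def min_utilization(buses) -> float:
    """Minimum of Eq. (32) over all buses; `buses` maps bus id -> (multipliers, powers)."""
    return min(utilization(a, p) for a, p in buses.values())
```

A bus curtailed by half in one of two equal-power time steps has a utilization of 75%, the complement of its curtailed share of 0.25.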

5.3. Evaluation of the Proposed Disaggregators Without BESSs

Figure 3a depicts the performance metrics, i.e., the APC percentage and the remaining violations, for the AA, the two-step solution with the different disaggregators, and the OPF solutions. All disaggregators improve on the AA in terms of APC; for RBC, this comes at the expense of a slight increase in the violation count. This comparison reveals the benefits of tailored per-bus curtailments over uniform per-feeder decisions. All disaggregators mitigate 99% or more of the violations compared to the uncontrolled case, leaving fewer than 0.049 violations per time step on average. The lowest number of violations is achieved by EPBD, while RBC achieves the lowest APC percentages. In particular, when the parameter $η$ is set to 0, i.e., no upstream buses are affected, RBC achieves an APC percentage of 40.28%, which is much lower than that of the second-best disaggregator, EPBD (50.44%). PBD performs similarly to EPBD. CPBD and LD exhibit equivalent performance, which is better than that of the AA but not as good as that of PBD and EPBD.

Figure 3. Performance evaluation of the controllers in terms of ratio of violations and APC percentage (a) without BESSs; (b) with BESSs.

Regarding the RBC controller, we examine three values of the parameter $η$, namely, $η = 0, 8, 16$. RBC with $η = 0$ does not apply violation penalties to buses upstream of the violation location, whereas for $η = 16$ all buses (all upstream and all downstream buses) share the violation penalty. The evaluation results indicate that lower values of $η$ improve performance, as RBC with $η = 0$ achieves an APC percentage that is 6.3 percentage points lower, together with at least 35% fewer violations, than RBC with $η = 8$. RBC with $η = 16$ still curtails less power than EPBD, but its number of violations is doubled, i.e., it is less efficient.

The OPF approaches yield near-optimal solutions and thus, as expected, achieve much better performance than the two-step solution approach. However, they are not suitable for real-time decision making, as they are computationally intensive and require a full grid model and accurate forecasts. The computational complexity worsens when BESSs are included, because the time dependencies they introduce require multi-period optimization; for this reason, OPF solutions are not meaningful for real-time operation in that case and are not included in the comparisons of Figure 3b.

Next, we discuss the fairness properties of the AA, the disaggregators, and the OPF-based solutions. Figure 4 presents the MMR and JFI values for the curtailment of PVs and EVs across buses. As expected, the AA is the fairest controller. Indeed, it achieves almost 100% in both metrics for PV curtailment, and it also significantly outperforms the remaining controllers in terms of fair EV curtailment. Its MMR for EVs is lower because the EV demand of some buses is compensated by PV generation, which reduces their EV curtailment needs (and thus the numerator of MMR); nevertheless, it performs even better than ULOPF, which verifies its fair design. Regarding the OPF baselines, although NLOPF performs best in terms of violations and APC, it has the worst fairness (note that fairness is not accounted for in its objective). In contrast, ULOPF, by determining a uniform per-feeder action, achieves high fairness, which for PVs is close to that of the AA, but at the expense of APC performance. Next, we focus on PV curtailment by the proposed disaggregators to assess the fairness of our schemes, which allows for more accurate conclusions, since the grid in this simulation is PV-dominated.

Figure 4. Fairness evaluation of the controllers in the absence of BESSs. (a) PV curtailment. (b) EV curtailment.

For RBC, the evaluation indicates that higher values of $η$ yield fairer strategies. In particular, RBC with $η = 16$ is the most competitive alternative to the AA for PV curtailment, whereas RBC with $η = 0$ is the least fair. Thus, by properly setting $η$, RBC's trade-off between fairness in curtailment and performance can be tuned; we elaborate further on this tuning in the next subsection. PBD is much less effective in providing fair solutions, especially with respect to MMR, which is expected, as it heavily penalizes buses far from the PCC. Regarding its enhancements, CPBD performs much better than EPBD in terms of both fairness metrics, indicating that, although the additional constraint (19) induces a small performance drop compared to PBD and EPBD, it improves fairness among buses.

Lastly, Figure 5 shows the minimum utilization of PVs and EVs for the suggested controllers and the OPF baselines. Among the proposed controllers, RBC with $η = 16$ obtains the highest minimum utilization for both PVs and EVs. In general, RBC with any value of $η$ achieves a higher minimum EV utilization than the other controllers, but lower values of $η$ underperform in terms of minimum PV utilization compared to the other disaggregators or RBC with higher $η$. Specifically, for RBC with $η = 0$, the minimum PV utilization is 8.23%, whereas RBC with $η = 16$ allows at least 38% of the generated PV power at every bus. Moreover, the AA performs close to RBC, with a minimum PV utilization of approximately 32%. CPBD follows with around 23.6% minimum PV utilization and 50.2% minimum EV utilization, whereas for the remaining disaggregators, there is at least one bus where less than 10% of the PV generation is permitted. Regarding the OPF-based methods, ULOPF achieves optimal utilization for both PVs and EVs, which makes the impact of fairness evident. NLOPF, although it does not account for fairness, remains competitive with the fairness-aware controllers in terms of minimum PV utilization, owing to its significantly lower curtailment requirements. In general, we conclude that the methods achieving higher fairness in terms of the MMR and JFI metrics, i.e., the AA, RBC with high $η$, and CPBD, also achieve high minimum PV and EV utilization, which underlines the correlation between minimum utilization and fairness.

Figure 5. Minimum utilization metric for all controllers: x-axis represents the minimum EV utilization and y-axis the minimum PV utilization.

5.4. Sensitivity Analysis of the Performance–Fairness Trade-off to Controller Parameters

In this section, we explain which parameters can be adjusted to tune the trade-off between fairness and performance and we perform a sensitivity analysis to assess their impact.

Regarding the LD controller, the parameters $m_{s}^{PV}$ and $m_{s}^{EV}$ are significant in balancing fairness and performance. In the previous section, we showed that LD achieves better performance but reduced fairness compared to the AA when $m_{s}^{PV}$ and $m_{s}^{EV}$ are set to $-0.1$. Notably, LD becomes identical to the AA when $m_{s}^{PV} = m_{s}^{EV} = 0$. Next, we set $m_{s}^{PV} = m_{s}^{EV} = m_{s}$ and simulate the controller for $m_{s}$ values ranging from $-0.5$ to 0 with a step of 0.05. We do not study values lower than $-0.5$ because, in our setting, the cardinalities of $N_{f}^{EV}$ and $N_{f}^{PV}$ are odd and, thus, values of $m_{s}$ lower than $-0.5$ do not affect the disaggregation. The corresponding results are illustrated in Figure 6.
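The claim about values below $-0.5$ can be checked directly from (16). For odd $|N_{f}^{X}|$, the reference index is an integer, and the offset bound never exceeds one half:

\begin{aligned} &|N_{f}^{X}| \text{ odd} \;\Rightarrow\; n_{f}^{ref,X} \in \mathbb{Z} \;\Rightarrow\; \big|\mathrm{reindex}(X,f,n) - n_{f}^{ref,X}\big| \in \{0, 1, 2, \dots\}, \\ &Δ\bar{a}_{f}^{X}(t) = \min\{1 - \tilde{a}_{f}^{X}(t),\, \tilde{a}_{f}^{X}(t)\} \leq \tfrac{1}{2}, \\ &m_{s}^{X} \leq -\tfrac{1}{2},\; \mathrm{reindex}(X,f,n) \neq n_{f}^{ref,X} \;\Rightarrow\; \big|m_{s}^{X}\big(\mathrm{reindex}(X,f,n) - n_{f}^{ref,X}\big)\big| \geq \tfrac{1}{2} \geq Δ\bar{a}_{f}^{X}(t). \end{aligned}

Hence, every non-reference bus takes the offset $\pm Δ\bar{a}_{f}^{X}(t)$, the reference bus keeps a zero offset, and the LD multipliers no longer depend on $m_{s}^{X}$.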

Figure 6. Sensitivity of the LD-obtained metric values with respect to the parameter $m_{s}$. (a) Active Power Curtailment. (b) Remaining violations. (c) JFI of PVs. (d) Minimum PV utilization.

As observed in Figure 6a, higher $m_{s}$ values result in higher curtailment, while for $m_{s} < -0.2$, the improvement in curtailment becomes marginal. In terms of violation mitigation (Figure 6b), lower $m_{s}$ values generally lead to fewer violations, with the minimum observed for $m_{s} = -0.1$. Figure 6c demonstrates that increasing $m_{s}$ improves the fairness of PV curtailment; similar trends are observed for EVs and for the MMR metric. Furthermore, $m_{s}$ values close to 0 are beneficial for the minimum utilization metric (Figure 6d).

Regarding the CPBD controller, the parameter $ϵ$ distinguishes it from PBD and enhances fairness. We simulate the controller for $ϵ$ values ranging from 0 to 1, and the results are presented in Figure 7. The sensitivity analysis indicates that lower $ϵ$ values lead to improvements in the fairness-related metrics, whereas higher values are more favorable in terms of performance. However, excessively high $ϵ$ values may increase the number of violations. Regarding the fairness metrics, similar results are obtained for the JFI of EVs and the MMR of both PVs and EVs.

Figure 7. Sensitivity of the CPBD-obtained metric values with respect to the parameter $ϵ$. (a) Active Power Curtailment. (b) Remaining violations. (c) JFI of PVs. (d) Minimum PV utilization.

The last controller with adjustable parameters affecting the fairness–performance trade-off is the RBC. In the previous section, representative values of $\eta$ were selected to demonstrate that RBC is sensitive to this parameter. In this section, we analyze the behavior of RBC by examining all possible values of $\eta$ for the dual-feeder grid with 17 buses per feeder, i.e., $\eta \in \{0, 1, \dots, 16\}$. As illustrated in Figure 8, higher values of $\eta$ enhance the fairness-related metrics at the cost of performance. Moreover, RBC is the most sensitive of the three controllers to its parameter: the increase from the best to the worst APC value is approximately 20% for RBC, whereas it is below 10% for LD and CPBD.

Figure 8. Sensitivity of the RBC-obtained metric values with respect to the parameter $\eta$. (a) Active Power Curtailment. (b) Remaining violations. (c) JFI of PVs. (d) Minimum PV utilization.

5.5. Evaluation of the Proposed Disaggregators with BESSs

The results of the previous section highlight the emerging trade-off between fairness and performance: the best-performing controllers do not yield fair solutions, resulting in poor PV utilization or high curtailment of EV demand at specific buses, which may make investing in PVs or charging stations at those buses unattractive. Thus, we include BESSs in the grid to examine their potential to increase fairness in PV use and EV charging while maintaining high performance.

Figure 3b presents the APC percentage and the ratio of violations achieved by the AA and the different disaggregators in the presence of BESSs. With BESSs, the AA controller shows a significant decrease in APC percentage (from 55.60% to 47.20%) compared to the case without BESSs, with a slight penalty in the ratio of violations (from 0.59% to 0.99%). Similar behavior is observed for the LD, PBD, CPBD, and EPBD disaggregators, for which the APC percentage is reduced by around 15%, while the ratio of violations remains below 1%. In particular, LD is the best-performing controller in terms of violation mitigation, allowing only 0.03 violations per time step on average. In the experiments with BESSs, RBC is evaluated only with $\eta = 16$, as this choice led to the highest fairness without BESSs. For this RBC version, BESSs bring a modest decrease in the APC percentage (by 2 percentage points) and a slight increase in the ratio of violations, which reaches around 1.4%. Overall, when BESSs are included, the EPBD disaggregator achieves the lowest curtailment, with PBD achieving comparable results.
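A quick arithmetic check relates the AA numbers above to the "around 15%" reported for the disaggregators: the APC drop from 55.60% to 47.20% is 8.4 percentage points, or about 15.1% in relative terms, whereas the ratio of violations rises by 0.40 percentage points. The helper names below are our own; the snippet only separates absolute (percentage-point) changes from relative ones.

```python
def pp_change(before, after):
    """Absolute change, in percentage points."""
    return after - before


def relative_change(before, after):
    """Change relative to the value without BESSs."""
    return (after - before) / before


# AA without vs. with BESSs (values reported in the text above).
apc_before, apc_after = 55.60, 47.20
viol_before, viol_after = 0.59, 0.99

print(f"APC: {pp_change(apc_before, apc_after):+.2f} pp, "
      f"{relative_change(apc_before, apc_after):+.1%}")  # APC: -8.40 pp, -15.1%
print(f"Violations ratio: {pp_change(viol_before, viol_after):+.2f} pp")  # +0.40 pp
```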

Regarding the fairness metrics, which are illustrated in Figure 9, the relative ordering of the controllers remains the same as in Figure 4. The AA maintains its JFI values, but with BESSs it achieves a lower MMR value than without BESSs. This observation can be explained by the MMR formula in Equation (30). Focusing on a single feeder, the denominator of the MMR metric is equal to $1 - \min_{(f, n) \in N_{f}^{X}} ut_{f, n}^{X}$; thus, an increase in the minimum utilization value implies a decrease in the denominator. Similarly, the numerator is given by $1 - \max_{(f, n) \in N_{f}^{X}} ut_{f, n}^{X}$. Let $\delta$ denote the absolute increase in $\min_{(f, n) \in N_{f}^{X}} ut_{f, n}^{X}$, $\delta'$ the absolute increase in $\max_{(f, n) \in N_{f}^{X}} ut_{f, n}^{X}$, and $\widehat{MMR}_{f}^{X}$ the initial MMR value. Writing $A$ and $B$ for the initial numerator and denominator, the new value $(A - \delta')/(B - \delta)$ is lower than $A/B$ if and only if $B\delta' > A\delta$ (for $\delta > 0$ and $B - \delta > 0$), i.e., if $\frac{\delta'}{\delta} > \widehat{MMR}_{f}^{X}$. Considering the uniform actions of the AA and the reduced curtailment needs, $\delta$ and $\delta'$ are expected to be similar and positive, so their ratio is close to 1, whereas $\widehat{MMR}_{f}^{X}$ is at most 1; hence, the drop in the MMR value is justified. Furthermore, RBC and CPBD are again the fairest disaggregators, even though their JFI and MMR values have slightly dropped compared to the case without BESSs. Also worth mentioning are the improvements in the PV fairness metrics of the PBD and EPBD controllers and in the EV JFI of EPBD and RBC.
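The argument above can be checked numerically. The sketch below computes the single-feeder MMR from per-bus utilizations using the numerator $1 - \max ut$ and denominator $1 - \min ut$ stated above, assuming utilizations lie in [0, 1] and are not all equal to 1. It then compares a direct recomputation with the condition $\delta'/\delta > \widehat{MMR}_{f}^{X}$. The utilization values are illustrative only and are not taken from the paper.

```python
def mmr(utilizations):
    """Single-feeder MMR: (1 - max utilization) / (1 - min utilization)."""
    u_min, u_max = min(utilizations), max(utilizations)
    if u_min >= 1.0:
        raise ValueError("MMR is undefined when every bus is fully utilized")
    return (1.0 - u_max) / (1.0 - u_min)


def mmr_decreases(mmr_hat, delta, delta_prime):
    """Condition from the text: the new MMR is lower iff delta'/delta > MMR_hat
    (valid for delta > 0 and a denominator that stays positive)."""
    return delta_prime / delta > mmr_hat


# Illustrative utilizations before adding BESSs (not taken from the paper).
before = [0.50, 0.60, 0.80]
# Uniform AA actions: every bus gains roughly the same utilization.
after = [u + 0.10 for u in before]

delta = min(after) - min(before)        # increase of the minimum utilization
delta_prime = max(after) - max(before)  # increase of the maximum utilization
mmr_hat = mmr(before)

print(f"MMR before: {mmr_hat:.2f}, after: {mmr(after):.2f}")  # 0.40, 0.25
print(mmr_decreases(mmr_hat, delta, delta_prime))  # True
```

If instead the minimum utilization rose much more than the maximum (e.g., $\delta = 0.2$, $\delta' = 0.05$ from the same starting point), the condition fails and the MMR would increase.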

Figure 9. Fairness evaluation of the controllers after the integration of BESSs. (a) PV curtailment. (b) EV curtailment.

Figure 10 depicts the minimum utilization values for all controllers. We observe that BESSs contribute significantly to increasing this metric, allowing for more efficient PV utilization and higher EV demand satisfaction. The AA and RBC controllers achieve the best performance: the former has a slight advantage for PVs, while the latter is better at fulfilling EV demand. Figure 11 plots the relative improvement in the minimum utilization metric due to BESSs, i.e., the increase in minimum utilization with BESSs relative to its value without BESSs. All methods except RBC present an average improvement of more than 35% in minimum PV utilization. The improvement achieved by RBC is only about 1%. Since RBC is the most effective controller with respect to minimum utilization, it leaves less room for improvement; hence, this result is satisfactory. On average, the two-step solution combined with BESS control also improves the minimum EV utilization.
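The relative improvement plotted in Figure 11 is bounded by how much room a controller has left. Assuming utilizations are fractions in [0, 1] (consistent with the $1 - ut$ terms in the MMR formula), a controller whose minimum utilization without BESSs is already $u$ can improve by at most $(1 - u)/u$ in relative terms. The sketch below illustrates this bound; the function names and example baselines are hypothetical and are not read from Figures 10 and 11.

```python
def relative_improvement(u_without, u_with):
    """Relative improvement in minimum utilization due to BESSs."""
    return (u_with - u_without) / u_without


def max_relative_improvement(u_without):
    """Upper bound on the relative improvement when utilization cannot exceed 1."""
    return (1.0 - u_without) / u_without


# Hypothetical baselines: a low starting point leaves far more headroom.
for u in (0.5, 0.9):
    print(f"u = {u:.1f}: at most {max_relative_improvement(u):.1%}")
# u = 0.5: at most 100.0%
# u = 0.9: at most 11.1%
```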

Figure 10. Minimum utilization metric for all controllers after the adoption of BESSs: x-axis is the minimum EV utilization and y-axis the minimum PV utilization.

Figure 11. Improvement in the minimum utilization metric due to the integration of BESSs. (a) Minimum PV utilization. (b) Minimum EV utilization.

5.6. Study on the Scalability of the Proposed Controllers

In this section, we examine the scaling properties of the AA, when BESSs are included, by varying the peak load and the grid topology. Scalability mostly concerns training, since once the AA agent is trained, decision making is fast: the AA agent infers a decision almost instantly, and the disaggregators are also very fast, scaling in the worst case linearly with the maximum number of buses over all feeders. For the LD method, if each bus is aware of its relative position on the feeder, an efficient implementation exists in which each bus computes its disaggregated action locally. Since training is performed offline, it does not affect the decision time during deployment, which is studied at the end of this section.

For each configuration, we train three agents for the same number of epochs as in the reference case, i.e., six epochs for AA and four epochs for RBC, and report averages over the three trained agents. The first experiment increases the peak PV generation and EV demand by 50%, which we expect to be a more challenging scenario in terms of training effort. Nevertheless, we keep the training effort the same as in the reference case and observe the quality of the obtained AA. Figure 12 shows that the controllers remain effective in this setting. In particular, the LD controller mitigates 99.7% of the violations observed in the uncontrolled case, i.e., only 0.024 violations per time step occur on average. As expected, the required curtailment increases due to the higher generation and demand levels. RBC achieves the best APC performance at the expense of approximately four times as many violations as LD. In terms of fairness, the observed trends are consistent with the lower-load case. Notably, the JFI metric is improved for the AA-based disaggregators; in particular, CPBD presents better fairness properties than RBC in this scenario. Overall, although we applied the same training effort to obtain the AA, the two-step solution retains its good performance under higher load conditions.
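The two LD figures above are linked by a simple relation: if a fraction $p$ of the uncontrolled violations is mitigated, the remaining rate is $(1 - p)$ times the uncontrolled rate. Inverting it, 0.024 remaining violations per time step at 99.7% mitigation implies roughly 8 violations per time step without control. Because 99.7% is a rounded value, the sketch brackets it with 99.65% and 99.75%, which gives roughly 7 to 10. This is a back-calculation from the reported numbers, not a reported result; the helper names are our own.

```python
def remaining_rate(uncontrolled_rate, mitigated_fraction):
    """Violations per time step left after mitigating a fraction of them."""
    return uncontrolled_rate * (1.0 - mitigated_fraction)


def implied_uncontrolled_rate(remaining, mitigated_fraction):
    """Back-calculate the uncontrolled violation rate (inverse of the above)."""
    return remaining / (1.0 - mitigated_fraction)


# LD, higher-load scenario: 0.024 violations/step remain at 99.7% mitigation.
# 99.7% is rounded, so bracket it with 99.65% and 99.75%.
for p in (0.9965, 0.997, 0.9975):
    print(f"p = {p:.2%}: ~{implied_uncontrolled_rate(0.024, p):.1f} violations/step")
```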

Figure 12. Evaluation of the controllers for a higher load scenario. (a) Performance metrics. (b) Fairness metrics.

In the next experiment, we train the AA and evaluate the proposed methods on a larger grid consisting of four feeders with 17 households each. For reference, training for six epochs takes 3.5 h in this case, whereas the corresponding training time for the dual-feeder grid was 2 h. As shown in Figure 13, the performance is similar to that in the dual-feeder case. A notable improvement is observed for the RBC controller, which achieves the lowest APC percentage. However, RBC allows 0.17 constraint violations per time step on average, which is significantly higher than the violations observed for the PBD, CPBD, and LD controllers. Regarding fairness, the observed trends are consistent with those in the dual-feeder grid scenario. Again, although we applied the same training effort to obtain the AA, the proposed approach maintains its performance on larger grids.
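A quick check on the two reported training times: doubling the number of feeders (from 34 to 68 households in total) increased the six-epoch training time by a factor of 1.75. With only two data points, this describes this particular pair of grids rather than a general scaling law; the snippet simply makes the arithmetic explicit.

```python
# Six-epoch training times reported above (hours), keyed by number of feeders.
train_hours = {2: 2.0, 4: 3.5}

time_ratio = train_hours[4] / train_hours[2]  # growth in training time
size_ratio = 4 / 2                            # twice as many feeders (and buses)
print(f"time x{time_ratio:.2f} for a x{size_ratio:.0f} larger grid")

for n_feeders, hours in train_hours.items():
    print(f"{n_feeders} feeders: {hours / n_feeders:.3f} h per feeder")
```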

Figure 13. Evaluation of the controllers on a grid with 4 feeders. (a) Performance metrics. (b) Fairness metrics.

Finally, we assess the scalability of the controllers in terms of action inference time during deployment. Figure 14 presents the average computation time per time step for varying numbers of households and feeders. In all cases, the controllers achieve inference times below 9 ms, far below the 15 min control interval $\Delta t$ (Table 1), which demonstrates their applicability to real-time control. Note that the reported times correspond to a setup where the AA is executed centrally, followed by parallel per-feeder disaggregation computations; the exception is RBC, for which the centralized BESS action inference runs in parallel with the RBC-based PV and EV curtailment computations. The highest computation time is observed for the LD disaggregator. However, it can be significantly reduced by an alternative implementation in which each bus performs the disaggregation locally, provided that it knows its index within the feeder. This modification is expected to yield computation times similar to those of the AA controller.
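The two deployment setups described above imply different critical paths. Assuming that a parallel stage finishes when its slowest branch does, the per-step latency is the central AA time plus the slowest feeder's disaggregation time in the first setup, and the maximum of the BESS inference and RBC computation times in the RBC setup. The sketch below encodes this simple model and relates the result to the control interval $\Delta t$ of Table 1. The function names and timing values are hypothetical placeholders, not the measurements of Figure 14.

```python
DELTA_T_S = 15 * 60  # control interval Delta t from Table 1, in seconds


def latency_aa_then_disaggregate(t_aa, t_disagg_per_feeder):
    """Central AA inference, then per-feeder disaggregation in parallel.

    The parallel stage ends when the slowest feeder finishes.
    """
    return t_aa + max(t_disagg_per_feeder)


def latency_rbc(t_bess_inference, t_rbc_curtailment):
    """Central BESS inference running in parallel with RBC curtailment."""
    return max(t_bess_inference, t_rbc_curtailment)


# Hypothetical timings (seconds) for a dual-feeder grid; not Figure 14 data.
t = latency_aa_then_disaggregate(0.002, [0.003, 0.004])
print(f"per-step latency: {t * 1e3:.1f} ms ({t / DELTA_T_S:.1e} of Delta t)")
```

Under this model, adding feeders only lengthens the parallel stage if a new feeder is slower than the existing ones, which is consistent with disaggregation scaling with the maximum number of buses over all feeders.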

Figure 14. Inference time of controllers. (a) Varying number of buses per feeder. (b) Varying number of feeders.

5.7. Generalization Potential of AA with BESSs

In the absence of BESSs, the AA can be effectively applied to similar but not identical networks []. By aggregating its state and action spaces per feeder, the AA can be applied to grids with varying numbers of buses, provided that the number of feeders remains the same as in the training setup, due to the fixed dimensions of its input and output vectors. In this section, we examine whether the AA performs satisfactorily on unseen grid topologies and loads when BESSs are included. In particular, we applied the AA agent trained in the previous experiments to unseen dual-feeder radial grids with the same PV, EV, and BESS adoption rates but a different number of households: one grid with 13 and one with 21 households per feeder.
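A toy sketch of why per-feeder aggregation decouples the AA's input size from the number of buses but not from the number of feeders. The aggregated features (total PV generation and total EV demand per feeder) and all names are hypothetical assumptions chosen for illustration; the AA's actual state and action definitions are those given in Section 4.1.

```python
def feeder_observation(feeders):
    """Aggregate per-bus data into a fixed-length, per-feeder observation.

    feeders: one list per feeder; each bus is a dict with the hypothetical
    keys "pv" (PV generation, kW) and "ev" (EV demand, kW).
    Returns two features per feeder, regardless of how many buses it has.
    """
    obs = []
    for buses in feeders:
        obs.append(sum(bus["pv"] for bus in buses))  # total PV generation
        obs.append(sum(bus["ev"] for bus in buses))  # total EV demand
    return obs


def toy_grid(n_feeders, buses_per_feeder):
    """Hypothetical grid in which every bus has identical PV and EV values."""
    return [
        [{"pv": 3.0, "ev": 7.4} for _ in range(buses_per_feeder)]
        for _ in range(n_feeders)
    ]


# Dual-feeder grids with 13, 17 and 21 households give the same input size ...
print([len(feeder_observation(toy_grid(2, n))) for n in (13, 17, 21)])  # [4, 4, 4]
# ... whereas a four-feeder grid does not.
print(len(feeder_observation(toy_grid(4, 17))))  # 8
```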

In the grid with 13 households per feeder, the AA mitigates 99.7% of the violations while curtailing only 41.65% of the active power. In the more challenging grid with 21 households per feeder, 96% of the violations are avoided, while the APC percentage is about 52.1%. As shown in Figure 15, the trend is similar to that without BESSs, which indicates that integrating BESSs does not degrade the generalization properties of the AA. Finally, these experiments confirm once more that including BESSs significantly decreases the APC percentage, by around 15%, at the cost of a small increase in the number of violations, even under unseen grid topologies and loads.

Figure 15. Evaluation of the AA with and without BESSs on multiple grids: (a) Remaining violations compared to no control case. (b) Active Power Curtailment percentage.

6. Conclusions

In this work, we studied the problem of APC in radial grids with PVs, EVs, and BESSs by proposing a fast solution approach that enhances fairness among buses in the determined curtailments. Fairness is not imposed as a hard constraint but is encouraged through the design of the disaggregators that determine tailored per-bus curtailments from the RL-based uniform per-feeder actions. BESS control is also incorporated in the proposed fairness-encouraging APC strategy and further improves the fairness–performance trade-off with respect to the considered metrics. In detail, BESSs significantly reduce the APC percentages and increase the minimum utilization metric for all controllers without changing the controllers' ordering with respect to the JFI and MMR fairness metrics. The most notable advantage of BESSs is the significant increase in the minimum PV utilization, which is beneficial for increasing the adoption of PV systems in the grid. Lastly, we have shown via simulations that the AA can be applied with satisfactory performance to unseen grids even in the presence of BESSs, and that integrating BESSs reduces APC levels even in unseen settings.

In future work, we plan to investigate optimal BESS placement to improve APC performance and fairness. Moreover, properly tuning the initial and terminal BESS SoC levels is a promising direction for further improving the fairness–performance trade-off. In addition, we plan to study the integration of Vehicle-to-Grid (V2G) into our generalizable two-step solution approach. V2G can serve as a cost-efficient alternative to BESSs, avoiding installation costs while still providing energy storage capabilities.

Author Contributions

Conceptualization, G.G., E.S. and S.P.; methodology, G.G. and E.S.; software, G.G.; validation, G.G.; formal analysis, G.G. and E.S.; investigation, G.G.; writing—original draft preparation, G.G. and E.S.; writing—review and editing, E.S. and S.P.; visualization, G.G.; supervision, E.S. and S.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Rahman, S.; Khan, I.A.; Khan, A.A.; Mallik, A.; Nadeem, M.F. Comprehensive Review and Impact Analysis of Integrating Projected Electric Vehicle Charging Load to the Existing Low Voltage Distribution System. Renew. Sustain. Energy Rev. 2022, 153, 111756.
Haque, M.M.; Wolfs, P. A Review of High PV Penetrations in LV Distribution Networks: Present Status, Impacts and Mitigation Measures. Renew. Sustain. Energy Rev. 2016, 62, 1195–1208.
Xu, H.; Domínguez-García, A.D.; Sauer, P.W. Optimal Tap Setting of Voltage Regulation Transformers Using Batch Reinforcement Learning. IEEE Trans. Power Syst. 2020, 35, 1990–2001.
Garcia, I.; Santana, R. Unified Framework for the Analysis of the Effect of Control Strategies on On-Load Tap-Changer’s Automatic Voltage Controller. IEEE Trans. Autom. Sci. Eng. 2024, 21, 1539–1548.
Turitsyn, K.; Šulc, P.; Backhaus, S.; Chertkov, M. Options for Control of Reactive Power by Distributed Photovoltaic Generators. Proc. IEEE 2011, 99, 1063–1073.
Hayward, S.; Merlin, M.; Williams, M.; Morstyn, T. Coordination of Smart Hybrid Transformers in Distribution Networks. IEEE Trans. Smart Grid 2025, 16, 973–988.
Stai, E.; Guscetti, M.; Duckheim, M.; Hug, G. Reinforcement Learning Models for Adaptive Low Voltage Power System Operation. In Proceedings of the 2023 IEEE Belgrade PowerTech, Belgrade, Serbia, 25–29 June 2023.
Koepele, C.; Guscetti, M.; Duckheim, M.; Hug, G.; Stai, E. Comparison of Active Power Curtailment Methods for Safe Operation in Low Voltage Power Systems. In Proceedings of the 2024 International Conference on Smart Energy Systems and Technologies (SEST), Torino, Italy, 10–12 September 2024.
Paudyal, S.; Bhattarai, B.P.; Tonkoski, R.; Dahal, S.; Ceylan, O. Comparative Study of Active Power Curtailment Methods of PVs for Preventing Overvoltage on Distribution Feeders. In Proceedings of the 2018 IEEE Power & Energy Society General Meeting (PESGM), Portland, OR, USA, 5–10 August 2018.
Weckx, S.; Gonzalez, C.; Driesen, J. Combined Central and Local Active and Reactive Power Control of PV Inverters. IEEE Trans. Sustain. Energy 2014, 5, 776–784.
Valverde, G.; Cutsem, T.V. Model Predictive Control of Voltages in Active Distribution Networks. IEEE Trans. Smart Grid 2013, 4, 2152–2161.
Vassallo, M.; Benzerga, A.; Bahmanyar, A.; Ernst, D. Fair Reinforcement Learning Algorithm for PV Active Control in LV Distribution Networks. In Proceedings of the International Conference on Clean Electrical Power (ICCEP), Terrasini, Italy, 27–29 June 2023; pp. 796–802.
Zhan, S.; Morren, J.; Akker, W.v.; der Molen, A.v.; Paterakis, N.G.; Slootweg, J.G. Fairness-Incorporated Online Feedback Optimization for Real-Time Distribution Grid Management. IEEE Trans. Smart Grid 2024, 15, 1792–1806.
Liu, M.Z.; Procopiou, A.T.; Petrou, K.; Ochoa, L.F.; Langstaff, T.; Harding, J.; Theunissen, J. On the Fairness of PV Curtailment Schemes in Residential Distribution Networks. IEEE Trans. Smart Grid 2020, 11, 4502–4512.
Gebbran, D.; Mhanna, S.; Ma, Y.; Chapman, A.; Verbic, G. Fair Coordination of Distributed Energy Resources with Volt-Var Control and PV Curtailment. Appl. Energy 2021, 286, 116546.
Vadavathi, A.R.; Hoogsteen, G.; Hurink, J. PV Inverter-Based Fair Power Quality Control. IEEE Trans. Smart Grid 2023, 14, 3776–3790.
Gerdroodbari, Y.Z.; Razzaghi, R.; Shahnia, F. Decentralized Control Strategy to Improve Fairness in Active Power Curtailment of PV Inverters in Low-Voltage Distribution Networks. IEEE Trans. Sustain. Energy 2021, 12, 2282–2292.
Ali, A.; Hredzak, B.; Li, C. Safe and Fair PV Curtailment for Voltage Control in Unbalanced Active Distribution Network. In Proceedings of the IECON 2024—50th Annual Conference of the IEEE Industrial Electronics Society, Chicago, IL, USA, 3–6 November 2024; pp. 1–6.
Heider, A.; Helfenbein, K.; Schachler, B.; Röpcke, T.; Hug, G. On the Integration of Electric Vehicles Into German Distribution Grids Through Smart Charging. IEEE Trans. Ind. Appl. 2025, 61, 2001–2010.
Zeraati, M.; Golshan, M.E.H.; Guerrero, J.M. A Consensus-Based Cooperative Control of PEV Battery and PV Active Power Curtailment for Voltage Regulation in Distribution Networks. IEEE Trans. Smart Grid 2019, 10, 670–680.
Demirci, A.; Tercan, S.M.; Ahmed, E.E.E.; Cali, U.; Nakir, I. A Novel Electric Vehicle Charging Management with Dynamic Active Power Curtailment Framework for PV-Rich Prosumers. IEEE Access 2024, 12, 120239–120249.
Tsaousoglou, G.; Giraldo, J.S.; Pinson, P.; Paterakis, N.G. Fair and Scalable Electric Vehicle Charging Under Electrical Grid Constraints. IEEE Trans. Intell. Transp. Syst. 2023, 24, 15169–15177.
Frendo, O.; Gaertner, N.; Stuckenschmidt, H. Improving Smart Charging Prioritization by Predicting Electric Vehicle Departure Time. IEEE Trans. Intell. Transp. Syst. 2021, 22, 6646–6653.
Kamrani, A.S.; Dini, A.; Dagdougui, H.; Sheshyekani, K. Multi-Agent Deep Reinforcement Learning with Online and Fair Optimal Dispatch of EV Aggregators. Mach. Learn. Appl. 2025, 19, 100620.
Walter, D.; Bond, K.; Butler-Sloss, S.; Speelman, L.; Numata, Y.; Atkinson, W. X-Change: Batteries: The Battery Domino Effect. RMI Rep. 2023. Available online: https://rmi.org/insight/x-change-batteries/ (accessed on 25 September 2025).
Bozorg, M.; Sossan, F.; Boudec, J.-Y.L.; Paolone, M. Influencing the Bulk Power System Reserve by Dispatching Power Distribution Networks Using Local Energy Storage. Electr. Power Syst. Res. 2018, 163, 270–279.
Sharma, V.; Haque, M.H.; Aziz, S.M.; Kauschke, T. Smart Inverter and Battery Storage Controls to Reduce Financial Loss due to Overvoltage-induced PV Curtailment in Distribution Feeders. Sustain. Energy Grids Netw. 2023, 34, 101030.
Antunes, H.M.A.; Torquato, H.R.; Callegari, J.M.S.; Araujo, L.S.; Brandao, D.I. Hosting-Capacity Improvement Using Rooftop Photovoltaics and Decentralized BESS in Low Voltage Grids. IEEE Access 2025, 13, 67286–67300.
Gotzias, G.; Stai, E. Fair MARL-Based Control of Multiple Batteries in Active Distribution Grids. In Proceedings of the IEEE PowerTech, Kiel, Germany, 29 June–3 July 2025.
Jin, L.; Zhan, S.; Cudjoe, S.; Paterakis, N. Empowering Low-Voltage Grids: Real-World Implementation of Home Batteries for Effective Congestion Management. In Proceedings of the IEEE PowerTech, Kiel, Germany, 29 June–3 July 2025.
Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018.
Stai, E.; Wang, C.; Boudec, J.-Y.L. Online Battery Storage Management via Lyapunov Optimization in Active Distribution Grids. IEEE Trans. Control Syst. Technol. 2021, 29, 672–690.
Meinecke, S.; Sarajlic, D.; Drauz, S.R.; Klettke, A.; Lauven, L.-P.; Rehtanz, C.; Moser, A.; Braun, M. SimBench—A Benchmark Dataset of Electric Power Systems to Compare Innovative Solutions Based on Power Flow Analysis. Energies 2020, 13, 3290.
Fortenbacher, P.; Mathieu, J.; Andersson, G. Modeling and Optimal Operation of Distributed Battery Storage in Low Voltage Grids. IEEE Trans. Power Syst. 2017, 32, 4340–4350.
Thurner, L.; Scheidler, A.; Schäfer, F.; Menke, J.H.; Dollichon, J.; Wenderoth, F.; Meinecke, S.; Braun, M. Pandapower—An Open-Source Python Tool for Convenient Modeling, Analysis, and Optimization of Electric Power Systems. IEEE Trans. Power Syst. 2018, 33, 6510–6521.
Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-Baselines3: Reliable Reinforcement Learning Implementations. J. Mach. Learn. Res. 2021, 22, 1–8.

Figure 1. Two-step APC and BESS control solution.

Figure 2. Toy example illustrating the disaggregation decisions of the different controllers for solution step 2.

Figure 3. Performance evaluation of the controllers in terms of ratio of violations and APC percentage (a) without BESSs; (b) with BESSs.

Figure 4. Fairness evaluation of the controllers in the absence of BESSs. (a) PV curtailment. (b) EV curtailment.

Figure 5. Minimum utilization metric for all controllers: x-axis represents the minimum EV utilization and y-axis the minimum PV utilization.

Table 1. Optimization and controller/disaggregator parameters.

Parameter	Value
$w_{1}$	$1.093$
$w_{2}$	$1.633$
$w_{3}$	$0.560$
$w_{4}$	$0.008$
$w_{5}$	$0.008$
$h$	2
$m_{s}^{PV}$	$-0.1$
$m_{s}^{EV}$	$-0.1$
$T$	96
$\Delta t$	15 min
$\epsilon$	$0.35$
$\underline{U}^{RBC}$	$0.92$ p.u.
$\bar{U}^{RBC}$	$1.08$ p.u.
$\bar{I}^{RBC}$	$0.24$ kA
$M^{\underline{U}}$	250
$M^{\bar{U}}$	175
$M^{\bar{I}}$	$87.5$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
