DDPG-ADRC-Based Load Frequency Control for Multi-Region Power Systems with Renewable Energy Sources and Energy Storage Equipment

Dou, Zhenlan; Zhang, Chunyan; Zhou, Xichao; Gao, Dan; Liu, Xinghua

doi:10.3390/en18143610

by Zhenlan Dou ¹, Chunyan Zhang ¹, Xichao Zhou ², Dan Gao ³ and Xinghua Liu ³,*

¹ State Grid Shanghai Municipal Electric Power Company, Shanghai 200120, China
² State Grid Integrated Energy Services Group Co., Ltd., Beijing 100032, China
³ School of Electrical Engineering, Xi’an University of Technology, Xi’an 710048, China
* Author to whom correspondence should be addressed.

Energies 2025, 18(14), 3610; https://doi.org/10.3390/en18143610

Submission received: 7 June 2025 / Revised: 27 June 2025 / Accepted: 6 July 2025 / Published: 8 July 2025

(This article belongs to the Topic Intelligent, Flexible, and Effective Operation of Smart Grids with Novel Energy Technologies and Equipment)

Abstract

A load frequency control (LFC) scheme based on the deep deterministic policy gradient (DDPG) and active disturbance rejection control (ADRC) is proposed for multi-region interconnected power systems with renewable energy sources (RESs) and energy storage (ES). The dynamic models of multi-region interconnected power systems are analyzed, which provides a basis for the subsequent RES access. Superconducting magnetic energy storage (SMES) and capacitor energy storage (CES) are adopted because of their rapid response and fast charge–discharge characteristics. To suppress frequency fluctuations, a first-order ADRC is designed, whose disturbance estimation capability is used to achieve effective control. The system states are estimated using a linear extended state observer (LESO), and the feedback control law is selected based on the observer output. A DDPG-ADRC parameter optimization model is constructed to adaptively adjust the ADRC control parameters based on the target frequency deviation and power deviation. The actor and critic networks are continuously updated according to the actual system response to ensure stable system operation. Finally, simulations show that the proposed method outperforms traditional methods across all performance indicators, in particular reducing the adjustment time by 45.8% and the overshoot by 60%.

Keywords:

multi-region interconnected power system; energy storage equipment; load frequency control; first-order active disturbance rejection controller; deep deterministic policy gradient algorithm

1. Introduction

With the growing global energy demand and the increasing emphasis on renewable energy sources (RESs), RESs are becoming more prominent in the energy structure and are gradually evolving into a crucial component of the power system. This shift significantly contributes to reducing dependence on traditional fossil fuels and protecting the environment. However, due to the inherent instability and intermittency of RESs, their large-scale integration introduces new challenges to the long-term maintenance of the power system. Liu et al. [1] presented a comprehensive description of research in this field. Load frequency control (LFC), as a critical component for ensuring system stability, urgently requires more advanced control strategies to address these challenges. Mi et al. [2] investigated a robust LFC strategy for power systems considering delay and parameter uncertainty. A sampled proportional–integral (PI) LFC scheme was explored, which considers variable-period sampling of the control signal [3]. Despite these efforts, none of the proposed controllers have been able to significantly improve LFC performance in practical engineering applications. To solve this problem, Xia et al. [4] studied a robust LFC scheme based on disturbance estimation for power systems subject to random time-delay attacks. A novel model predictive control strategy was proposed, which can achieve economic dispatch of load and stable frequency control in a decentralized manner between multiple interconnected power regions [5]. In addition, many other control strategies were proposed, such as the fractional-order cascaded control method [6], the decentralized dynamic output feedback controller [7], the intelligent variable structure fuzzy controller [8], and the parallel-type load damping factor controller [9]. Although existing LFC strategies demonstrate satisfactory performance under certain conditions, significant limitations remain. The robust LFC strategy employing periodic sampling PI control exhibits limited adaptability to disturbances induced by RES fluctuations. Furthermore, model predictive control (MPC) strategies, while effective in multi-area coordination, require global state information. This dependency can amplify communication delays and imposes substantial computational demands, often beyond what real-time operation in engineering practice allows. Strategies such as fractional-order control and fuzzy logic control enhance the dynamic response but lack inherent self-learning capabilities. Consequently, they struggle to address the complex nonlinearities inherent in integrated energy storage (ES) and RESs. These limitations highlight a critical gap: current approaches not only inherit the integration challenges of combining traditional control with intelligent algorithms but also fail to reconcile the inherent trade-off between control accuracy and engineering practicality.

To address these challenges, this paper proposes a novel distributed reinforcement learning framework based on the deep deterministic policy gradient (DDPG) algorithm. This framework integrates a linear extended state observer (LESO), core to active disturbance rejection control (ADRC), to estimate total system disturbances in real time. Crucially, the observer dynamically adjusts the DDPG agent’s decision-making cycle. By optimizing the DDPG action space through the LESO’s error compensation mechanism, the framework develops a self-learning disturbance rejection capability. Simultaneously, the end-to-end learning process of the dual closed-loop DDPG structure automatically optimizes the key parameters of the ADRC controller. This co-adaptive learning fosters a synergistic relationship between the RL agent and the disturbance observer. To facilitate engineering practice, ADRC was linearized in [10], greatly simplifying its structure and parameters. The linear ADRC technique is applied to many fields such as permanent magnet synchronous motors [11,12], power electronics [13,14], boiler forced ventilation systems [15,16], and power system control and optimization [17,18]. The nonlinear ADRC of power systems was investigated to solve the hybrid problem of microgrid communication delay [19]. Since the system uncertainty and control difficulty increase accordingly, various intelligent algorithms have been proposed, such as the confined iterative rational Krylov algorithm [20], the multiple-step greedy policy [21], and the large-variance genetic algorithm [22]. In this context, many researchers have conducted extensive research in this field. A deep reinforcement learning parameter optimization model has been constructed in [23], which enables ADRC to achieve optimal control performance. Zhu et al. [24] proposed an ADRC derived from a DDPG and verified the robustness of this method in practical applications.

With the deepening of the dual-carbon strategy, the grid connection of RESs has brought new difficulties to the LFC of interconnected power grids, while also realizing the reliable integration of RESs in the power system [25]. The introduction of RESs increases the complexity of the power system, making stable real-time operation more challenging. To improve the utilization of RESs, a new control strategy for a storage system was adopted in [26]. Meanwhile, the application of energy storage equipment in RES power systems has also attracted increasing attention [27]. ES devices can be combined with ADRC strategies to smooth out power fluctuations. Wind power is the most widely used form of renewable energy generation and has been extensively studied. A linear ADRC method was examined for the LFC of complex power systems with wind energy conversion systems [28]. Ebrahimi et al. [29] examined an ES coordination approach with advanced uncertainty control for a realistic multi-region power system integrated with wind energy. In [30], an integrated and coordinated suppression strategy was proposed to realize fault ride-through at the access points of large-capacity wind energy. Liu et al. [31] investigated the effects of grid robustness and phase-locked loop characteristics on the stability of wind farms. A method was investigated to smooth the fluctuations of centralized wind energy generation using a grid-connected vehicle system [32]. Rouhanian et al. [33] obtained an improved LFC model by considering the participation of wind farms. Although ADRC demonstrates excellent performance in improving stability, its practical applications need further exploration to ensure the effectiveness and reliability of control strategies, especially in complex multi-region systems involving RESs. The introduction of ADRC in RES power systems is therefore expected to effectively enhance the robustness and disturbance immunity of the system.

The objective of this paper is to discuss the key technologies and application prospects of ADRC in LFC. In RES power systems, the introduction of ADRC is expected to provide theoretical guidance and technical support for ensuring stable and secure operation. Additionally, the study of the DDPG-ADRC application contributes to the progress of power system control technology.

The contributions of this paper are as follows:

In comparison with the schemes in [34,35], the LFC scheme designed in this paper considers the access of ES devices. The ES system is deeply integrated into the framework combining DDPG and ADRC, achieving coordinated frequency control between the ES system and other power generation equipment.
This paper proposes an ADRC with a simple structure to enhance the system stability. Meanwhile, this article organically combines DDPG with ADRC, utilizing DDPG for the adaptive adjustment of key parameters in ADRC, thereby forming a novel LFC strategy.
Compared with other methods [36,37,38,39], the method to optimize ADRC with DDPG proposed in this article has better anti-interference capability and is more suitable for dealing with uncertainties in multi-region power systems. In addition, relative to the solution without ES units, this method significantly enhances the transient response speed of the power system in the event of frequency deviation, which is improved by at least 15%.

The rest of this paper is structured as follows. Section 2 analyzes the dynamic model of the power system. Section 3 proposes a first-order ADRC structure and uses the DDPG algorithm for parameter optimization. Section 4 verifies the effectiveness of the proposed method by simulation. Section 5 summarizes the study.

2. System Model

The LFC model for the i-th region is depicted in Figure 1. In multi-region power systems, thermal power plants in each region are the main power generation mode, which can provide a stable power supply. All thermal power unit elements are mathematically described with first-order equations, as shown below. RESs include wind farms and PV power plants. ES devices include two types of elements. Superconducting magnetic energy storage (SMES) can respond rapidly to system frequency variations, allowing it to rapidly absorb or release large amounts of electrical energy, and capacitor energy storage (CES) fine-tunes the system power through fast charging and discharging. SMES and CES can assist in maintaining system frequency stability, effectively enhancing the frequency control performance of multi-region power systems. The implementation of all these models is as follows.

2.1. RESs Modeling

The power of a wind turbine is

P_{WT} = \frac{1}{2} ρ A_{b} C_{f} V_{s}^{3},

(1)

where ρ is the air density, A_{b} is the swept area of the blades, V_{s} is the actual wind speed, i.e., the velocity of the airflow passing through the blades of the wind turbine, and C_{f} is the power coefficient of the wind turbine, which can be expressed as

C_{f} = (0.44 - 0.0167 α) \sin \left[ \frac{(μ - 3) π}{15 - 0.3 α} \right] - 0.0184 (μ - 3) α,

(2)

where α is the pitch angle and μ is the tip-speed ratio, which can be expressed as

μ = \frac{R_{b} ω_{r}}{V_{s}},

(3)

where R_{b} is the blade radius and ω_{r} is the rotational speed of the blades.
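As a hedged illustration of Equations (1)–(3), the following Python sketch evaluates the tip-speed ratio, the power coefficient, and the turbine power. The function names and all numerical inputs are illustrative assumptions and are not taken from the paper. The pitch angle α is assumed to be given in degrees, while the argument of the sine is evaluated in radians.

```python
import math

def tip_speed_ratio(r_b, omega_r, v_s):
    """Eq. (3): mu = R_b * omega_r / V_s."""
    return r_b * omega_r / v_s

def power_coefficient(mu, alpha):
    """Eq. (2); alpha is assumed to be given in degrees."""
    angle = (mu - 3.0) * math.pi / (15.0 - 0.3 * alpha)
    return (0.44 - 0.0167 * alpha) * math.sin(angle) - 0.0184 * (mu - 3.0) * alpha

def wind_power(rho, a_b, v_s, c_f):
    """Eq. (1): P_WT = 0.5 * rho * A_b * C_f * V_s**3 (watts for SI inputs)."""
    return 0.5 * rho * a_b * c_f * v_s ** 3

# Hypothetical operating point (illustrative values only).
mu = tip_speed_ratio(r_b=35.0, omega_r=3.0, v_s=10.0)
c_f = power_coefficient(mu, alpha=0.0)  # zero pitch: the sine term is at its peak here
p_wt = wind_power(rho=1.225, a_b=math.pi * 35.0 ** 2, v_s=10.0, c_f=c_f)
print(f"mu = {mu:.2f}, C_f = {c_f:.3f}, P_WT = {p_wt / 1e6:.2f} MW")
```

Because the power scales with the cube of V_{s}, small wind-speed changes produce large power swings, which is why the RES output is later treated as a disturbance in the LFC loop.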

The output power of PV is

P_{pv} = η_{ce} S ϕ [1 - 0.005 (T_{at} + 25)],

(4)

where S is the effective light area and η_{ce} is the conversion efficiency, both of which are usually constant. The output power is therefore determined by two independent variables, namely the sunlight intensity ϕ and the ambient temperature T_{at}.
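Equation (4) can be evaluated directly. The sketch below is a minimal example with hypothetical inputs (efficiency, area, irradiance, and temperature are assumptions chosen only to show the temperature derating).

```python
def pv_power(eta_ce, area, irradiance, t_at):
    """Eq. (4): P_pv = eta_ce * S * phi * [1 - 0.005 * (T_at + 25)]."""
    return eta_ce * area * irradiance * (1.0 - 0.005 * (t_at + 25.0))

# Hypothetical inputs: 15 % efficiency, 1000 m^2, 800 W/m^2, 25 degC ambient.
p_pv = pv_power(eta_ce=0.15, area=1000.0, irradiance=800.0, t_at=25.0)
print(f"P_pv = {p_pv / 1e3:.1f} kW")
```

With S and η_{ce} fixed, the output is linear in ϕ and decreases linearly with T_{at}.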

2.2. ES Equipment Modeling

The linearization framework of SMES is presented in Figure 2. Based on [40,41], the electrical loss in SMES is expressed as

P_{el} = \frac{P_{hl} + P_{cll}}{η_{c}} + P_{lb},

(5)

P_{hl} = 0.25 π δ h (d_{0} - d_{i})^{2},

(6)

P_{cll} = I_{s} n' ψ, \quad P_{lb} = R_{jr} I_{s}^{2},

(7)

where η_{c} is the cooling efficiency, d_{0} is the outer diameter, d_{i} is the inner diameter, δ is the cooling capacity, h is the height of the magnet, n' is the number of current leads, ψ is the minimum heat leakage, R_{jr} is the joint resistance, and I_{s} is the inductor current in SMES. P_{el} is the total electrical loss of the unit, P_{hl} is the low-temperature loss, P_{cll} is the current lead loss, and P_{lb} is the loss of the interface wiring.

Equations (5)–(7) are expanded in a Taylor series and linearized as

ΔP_{el} = ΔI_{s} (n' ψ + 2 I_{0} R_{jr}),

(8)

where I_{0} is the initial value of the current and ΔI_{s} is the incremental deviation of the inductor current in SMES.

The incremental voltage deviation across the SMES coil is expressed as

ΔE_{i} = \frac{1}{1 + T_{sc} s} [ACE_{i} K_{α} - K_{if} ΔI_{s}],

(9)

where ΔI_{s} = \frac{ΔE_{i}}{s L}, with L the coil inductance, K_{if} is the gain of the feedback signal, T_{sc} is the time constant of the converter delay, K_{α} is the amplification gain, and ΔE_{i} is the incremental voltage deviation of the SMES.

The incremental change in the active power of SMES is

ΔP_{SMES} = (ΔI_{s} + I_{0}) ΔE_{i} - ΔP_{el},

(10)

where ΔP_{SMES} is the power deviation caused by the superconducting magnetic ES.

The linearization framework of CES is presented in Figure 3. The incremental voltage deviation of CES is expressed as

ΔE_{cs} = \frac{1}{1 + T_{c} s} \cdot \frac{1}{c s + \frac{1}{R'}} (ACE_{i} K_{cf} - K_{vd} ΔI_{cs}),

(11)

where c is the capacitance of the capacitor, R' is the equivalent series resistance, T_{c} is the time constant of the capacitor, K_{cf} is the amplification gain, K_{vd} is the feedback loop gain, ΔE_{cs} is the incremental voltage deviation of CES, and ΔI_{cs} is the incremental deviation of the current in CES.

The power deviation of CES is expressed as

ΔP_{CES} = ΔE_{cs} ΔI_{cs},

(12)

where ΔP_{CES} is the power deviation caused by the capacitor ES.

2.3. LFC Model of Power System

The model of the governor is

G_{gi}(s) = \frac{K_{g}}{1 + T_{gi} s}.

(13)

According to [42], the transfer function of the reheat turbine is

G_{ti}(s) = \frac{K_{T} (1 + K_{r} T_{ri} s)}{(1 + T_{ti} s)(1 + T_{ri} s)}.

(14)

In accordance with [43], the generator frequency–load relationship is

G_{pi}(s) = \frac{K_{p}}{1 + T_{pi} s}.

(15)

The entire loop can be represented as

G(s) = \frac{G_{ti}(s) G_{pi}(s) G_{gi}(s)}{1 + G_{ti}(s) G_{pi}(s) G_{gi}(s) / R_{i}}.

(16)
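To make the loop structure in (13)–(16) concrete, the following sketch evaluates the transfer functions at a complex frequency s using plain Python complex arithmetic. The parameter values in `params` are placeholders chosen for illustration only; they are not the values of Table 2.

```python
def governor(s, k_g, t_g):
    """Eq. (13)."""
    return k_g / (1 + t_g * s)

def reheat_turbine(s, k_t, k_r, t_r, t_t):
    """Eq. (14)."""
    return k_t * (1 + k_r * t_r * s) / ((1 + t_t * s) * (1 + t_r * s))

def generator(s, k_p, t_p):
    """Eq. (15)."""
    return k_p / (1 + t_p * s)

def closed_loop(s, p):
    """Eq. (16): forward path G_t * G_p * G_g closed through the droop 1/R."""
    forward = (governor(s, p["K_g"], p["T_g"])
               * reheat_turbine(s, p["K_T"], p["K_r"], p["T_r"], p["T_t"])
               * generator(s, p["K_p"], p["T_p"]))
    return forward / (1 + forward / p["R"])

# Placeholder parameters (assumptions, not the paper's Table 2).
params = {"K_g": 1.0, "T_g": 0.08, "K_T": 1.0, "K_r": 0.5, "T_r": 10.0,
          "T_t": 0.3, "K_p": 120.0, "T_p": 20.0, "R": 2.4}

dc_gain = closed_loop(0j, params)        # steady-state gain, s = 0
g_1rad = closed_loop(1j * 1.0, params)   # response at 1 rad/s
print(f"|G(0)| = {abs(dc_gain):.4f}, |G(j1)| = {abs(g_1rad):.4f}")
```

At s = 0 every first-order factor reduces to its static gain, so the steady-state gain is K/(1 + K/R_{i}) with K = K_{T} K_{p} K_{g}, which gives a quick sanity check on the implementation.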

The variation in active power on the tie-line between regions is

ΔP_{tiei} = \sum_{j = 1, j \neq i}^{n} \frac{2 π K_{ij}}{s} (Δf_{i} - Δf_{j}).

(17)

From Figure 1, Δf_{i} can be deduced as

Δf_{i} = G(s) (u_{i} - ΔP_{di} - ΔP_{ei} - ΔP_{tiei}).

(18)

The expression for the area control error (ACE) is

ACE_{i} = Δf_{i} B_{i} + ΔP_{tiei} = (u_{i} - ΔP_{di} - ΔP_{ei}) B_{i} G(s) + (1 - G(s) B_{i}) ΔP_{tiei},

(19)

where Δf_{i} is the frequency deviation in region i; ΔP_{di} is the load disturbance in area i, which represents the total change in output power of the wind turbines and PV; ΔP_{ei} is the power deviation caused by the energy storage devices; ΔP_{tiei} is the tie-line power deviation in the i-th region; B_{i} is the frequency bias factor of region i; G_{gi}(s), G_{ti}(s), and G_{pi}(s) are the transfer functions of the governor, the reheat turbine, and the generator, respectively; K_{g} is the governor static gain factor; K_{r} is the reheat factor; K_{T} is the turbine gain factor; K_{p} is the gain of the generator system; K_{ij} is the synchronizing coefficient of the tie-line between regions i and j; R_{i} is the speed regulation coefficient of the governor; T_{gi} is the time constant of the governor; T_{ti} is the time constant of the turbine; T_{ri} is the time constant of the reheater; and T_{pi} is the time constant of the generator system.
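The second equality in (19) follows from substituting (18) into the definition of the ACE and collecting the tie-line terms. Written out as a reader aid, the intermediate step is

```latex
\begin{aligned}
ACE_{i} &= B_{i}\,\Delta f_{i} + \Delta P_{tiei} \\
        &= B_{i} G(s)\left(u_{i} - \Delta P_{di} - \Delta P_{ei} - \Delta P_{tiei}\right) + \Delta P_{tiei} \\
        &= \left(u_{i} - \Delta P_{di} - \Delta P_{ei}\right) B_{i} G(s) + \left(1 - G(s) B_{i}\right) \Delta P_{tiei}.
\end{aligned}
```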

3. DDPG-ADRC Framework

DDPG is specifically designed for problems with continuous action spaces. In LFC, the controller output is usually a continuous value, which DDPG can handle directly. DDPG is also capable of online learning: when the output of RESs fluctuates and the load changes frequently, it can extract new state information from the system and adjust toward the optimal control scheme, adapting to dynamic changes in the system. ADRC, in turn, is responsible for quickly estimating and compensating for disturbances. Combining DDPG with ADRC can therefore further enhance the robustness and dynamic performance of the system.

3.1. Design of First-Order ADRC

For multi-region interconnected power systems, ACE_{i} is applied as the controller input to drive the tie-line power deviation and the frequency deviation of each area to 0 [44]. The architecture of the first-order linear ADRC is shown in Figure 4. The input of the ADRC is ACE_{i}, corresponding to the reference input v. In Figure 4, the output u of the ADRC is used directly as the output u_{i} of the controller in Figure 1. The ADRC generates the control signal u_{i} directly from the area control error, adjusting the mechanical power output of the generator and the ES output. The LESO estimates and compensates for the lumped disturbance, including fluctuations in RESs and variations in load.

In practical power system applications, ignoring ΔP_{tiei}, (19) can be expressed as

Y_{i}(s) = G_{bi}(s) U_{i}(s) + G_{di}(s) (D_{i}(s) + D_{ei}(s)),

(20)

where Y_{i}(s), U_{i}(s), D_{i}(s), and D_{ei}(s) are the Laplace transforms of ACE_{i}, u_{i}, ΔP_{di}, and ΔP_{ei}, respectively.

Combining (13), (14), and (15), G_{bi}(s) and G_{di}(s) can be written as

\begin{cases} G_{bi}(s) = \frac{K_{p} K_{g} B_{i} R_{i}}{(T_{gi} s + 1)(T_{ti} s + 1)(T_{pi} s + 1) R_{i} + K_{p} K_{g}} \\ G_{di}(s) = \frac{(T_{gi} s + 1)(T_{ti} s + 1) K_{p} K_{g} B_{i} R_{i}}{(T_{gi} s + 1)(T_{ti} s + 1)(T_{pi} s + 1) R_{i} + K_{p} K_{g}}. \end{cases}

(21)

Multiplying (20) by the common denominator of (21) and transforming to the time domain, with d(t) denoting the combined disturbance ΔP_{di} + ΔP_{ei}, we obtain the differential form

\begin{aligned} & R_{i} T_{gi} T_{ti} T_{pi} y_{i}^{(3)}(t) + R_{i} (T_{gi} T_{ti} + T_{gi} T_{pi} + T_{ti} T_{pi}) \ddot{y}_{i}(t) + R_{i} (T_{gi} + T_{ti} + T_{pi}) \dot{y}_{i}(t) + (R_{i} + K_{p} K_{g}) y_{i}(t) \\ & = K_{p} K_{g} B_{i} R_{i} u(t) + T_{gi} T_{ti} K_{p} K_{g} B_{i} R_{i} \ddot{d}(t) + (T_{gi} + T_{ti}) K_{p} K_{g} B_{i} R_{i} \dot{d}(t) + K_{p} K_{g} B_{i} R_{i} d(t). \end{aligned}

(22)

Rewriting the above process as a generic first-order system, we have

\dot{y} = k(t, y, \ddot{y}, y^{(3)}, d, ξ, w) + b u,

(23)

where k is a lumped function of t, y, \ddot{y}, y^{(3)}, d, ξ, and w. Here y is the system output, u is the system input, y^{(3)} is the third-order dynamic, d is the unmeasurable perturbation of the system, w is the dynamic uncertainty of the system, and ξ is the measurement noise. The actual system gain b is approximated by a nominal value b_{0}, which gives the following equations:

\dot{y} = k(t, y, \ddot{y}, y^{(3)}, d, ξ, w) + (b - b_{0}) u + b_{0} u = f + b_{0} u,

(24)

f = k(t, y, \ddot{y}, y^{(3)}, d, ξ, w) + (b - b_{0}) u,

(25)

where f is the total perturbation of the control system.

For this first-order system, the states are defined as x_{1} = y and x_{2} = f, so that x = [x_{1} \; x_{2}]^{T} = [y \; f]^{T}.

Treating the total perturbation as an extended state, its derivative is written as \dot{x}_{2} = \dot{f} = h(x, d). The state-space form is then

\begin{cases} \dot{x}_{1} = x_{2} + b_{0} u \\ \dot{x}_{2} = h(x, d) \\ y = x_{1}. \end{cases}

(26)

The state-space model (26) can be written as

\begin{cases} \dot{x} = A x + B u + E \dot{f} \\ y = C x, \end{cases}

(27)

where

A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} b_{0} \\ 0 \end{bmatrix}, \quad E = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 0 \end{bmatrix}.

According to modern control theory, estimates of the state and of the total disturbance f can be obtained from a state observer. A LESO is designed as

\begin{cases} \dot{z} = A z + B u + L (x_{1} - z_{1}) \\ \hat{y} = C z, \end{cases}

(28)

where z = [z_{1} \; z_{2}]^{T} is the observer state, z_{1} is the estimate of the system output, z_{2} is the estimate of the total perturbation, and L = [β_{1} \; β_{2}]^{T} is the observer gain vector. When L is chosen appropriately, z tracks x well.

The state feedback control law of the ADRC is chosen as

u = \frac{u_{0} - z_{2}}{b_{0}},

(29)

where u_{0} = w_{c} (r - z_{1}), w_{c} is the bandwidth of the controller, and r is the set value of y (the reference input value is 0 under steady-state conditions).
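The interplay of the LESO (28) and the control law (29) can be sketched with a forward-Euler simulation. Everything about the plant here is an assumption made for illustration: a toy first-order plant dy/dt = -a y + b u + d with a constant disturbance stands in for the power system, and the observer gains use the common bandwidth-style choice β_{1} = 2 w_{o}, β_{2} = w_{o}^{2}, which is not stated in the text above.

```python
def simulate_ladrc(w_c=5.0, w_o=20.0, b0=1.0, r=0.0, t_end=5.0, dt=1e-3):
    """Euler simulation of the first-order LADRC (28)-(29) on a toy plant.

    Toy plant (assumption): dy/dt = -a*y + b*u + d with a = 1, b = 1 and a
    constant disturbance d = 0.5, so the total perturbation is f = -a*y + d.
    """
    a, b, d = 1.0, 1.0, 0.5
    beta1, beta2 = 2.0 * w_o, w_o ** 2   # bandwidth-style observer gains (assumption)
    y, z1, z2 = 0.0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        u0 = w_c * (r - z1)              # u_0 = w_c (r - z_1)
        u = (u0 - z2) / b0               # control law (29)
        e = y - z1                       # observer innovation x_1 - z_1
        dz1 = z2 + b0 * u + beta1 * e    # LESO (28), first row
        dz2 = beta2 * e                  # LESO (28), second row
        dy = -a * y + b * u + d          # toy plant dynamics
        y, z1, z2 = y + dt * dy, z1 + dt * dz1, z2 + dt * dz2
    return y, z1, z2

y_end, z1_end, z2_end = simulate_ladrc()
print(f"y = {y_end:.4f}, z1 = {z1_end:.4f}, z2 = {z2_end:.4f}")
```

In this toy setting the output settles at the reference r = 0 and z_{2} settles at the value of the total perturbation, which illustrates how the term -z_{2}/b_{0} in (29) cancels the disturbance without an explicit plant model.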

3.2. DDPG-Based Parameter Adjuster

The DDPG adjuster in this paper is defined as a quadruple [S, A, T, R]. It accepts the state information of the i-th region, optimizes the control parameters (w_{c}, β_{1}, β_{2}, b_{0}), and feeds them back to the ADRC. The structure of the LFC method based on DDPG-ADRC is shown in Figure 5.

The state space S includes the actual frequency deviation Δf_{i}, the target frequency deviation Δf_{i}^{*}, the actual power deviation ΔP_{tiei}, the target power deviation ΔP_{tiei}^{*}, the actual area control error ACE_{i}, and the target area control error ACE_{i}^{*}.

The action space A includes the actual bandwidth w_{c} of the ADRC, the target bandwidth w_{c}^{*} of the ADRC, the actual observer gains β_{1} and β_{2}, the target observer gains β_{1}^{*} and β_{2}^{*}, the actual system gain b_{0}, and the target system gain b_{0}^{*}.

T is the state transition function, which describes the relationship between the current state and the next state. The output parameters of DDPG are passed to the ADRC, and the state variables are obtained through the LESO.

The goal of the DDPG parameter adjuster is to minimize Δf_{i}^{*}, ΔP_{tiei}^{*}, and ACE_{i}^{*}. Therefore, the reward function R can be defined as

r(t) = - \sum_{i = 1}^{T} [|B_{i} Δf_{i}| + |ΔP_{tiei}|].

(30)
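A minimal sketch of the reward (30) is given below. It assumes that the sum runs over the regions passed in; the function name and the sample values are hypothetical.

```python
def lfc_reward(bias, delta_f, delta_p_tie):
    """Eq. (30): r = -sum_i (|B_i * df_i| + |dP_tie_i|) over the given regions."""
    return -sum(abs(b * df) + abs(dp)
                for b, df, dp in zip(bias, delta_f, delta_p_tie))

# Hypothetical two-region snapshot (illustrative values only).
r_t = lfc_reward(bias=[0.5, 0.5], delta_f=[0.02, -0.04], delta_p_tie=[0.01, -0.01])
print(f"r(t) = {r_t:.3f}")
```

The reward is never positive and reaches its maximum of zero only when every frequency and tie-line deviation vanishes, so maximizing it is equivalent to driving the ACE components to zero.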

The DDPG used in this paper includes an actor network and a critic network. The actor network can generate control actions, while the critic network can evaluate the value of the actions.

The input layer of the actor network has a state-space dimension of 6. Its hidden layers consist of two fully connected layers with 256 neurons evenly distributed across the two layers, and the activation function used is ReLU. The output layer of the actor network has an action space dimension of 8, with the activation function being tanh, and the output range constrained to [−1, 1]. The input layer of the critic network comprises 14 units, which is the sum of the state-space dimension and the action space dimension. Its hidden layers consist of two fully connected layers, with the first layer containing 192 neurons and the second layer containing 64 neurons, employing ReLU as the activation function. The output layer is one-dimensional with no activation function. A schematic diagram of the network structure is provided in Figure 6.
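The layer sizes described above can be checked with a small dependency-free sketch. It is not the TensorFlow/Keras implementation used in the paper: the weights are random, the helper names are hypothetical, and the statement that 256 neurons are evenly distributed across two hidden layers is read here as 128 neurons per layer, which is an assumption.

```python
import math
import random

ACTOR_SIZES = [6, 128, 128, 8]    # state -> two hidden layers -> action (128 each: assumption)
CRITIC_SIZES = [14, 192, 64, 1]   # state + action -> hidden layers -> Q value

def count_params(sizes):
    """Number of weights and biases of a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

def init_mlp(sizes, rng):
    """Random small weights and zero biases, one (W, b) pair per layer."""
    return [([[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]

def forward(layers, x, out_act=None):
    """ReLU on hidden layers; optional activation on the output layer."""
    for k, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if k < len(layers) - 1:
            x = [max(0.0, v) for v in x]
        elif out_act is not None:
            x = [out_act(v) for v in x]
    return x

rng = random.Random(0)
actor = init_mlp(ACTOR_SIZES, rng)
critic = init_mlp(CRITIC_SIZES, rng)

state = [0.01, 0.0, -0.02, 0.0, 0.005, 0.0]        # hypothetical 6-dimensional state
action = forward(actor, state, out_act=math.tanh)  # 8 values in [-1, 1]
q_value = forward(critic, state + action)          # single linear output
print(len(action), len(q_value), count_params(ACTOR_SIZES), count_params(CRITIC_SIZES))
```

The critic input is the concatenation of state and action, which is where the 14 input units come from.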

The actor policy network is λ(s_{t} | δ^{λ}). Noise is also taken into account when the actor generates an action, so the output a_{t} of the actor network is expressed as

a_{t} = λ(s_{t} | δ^{λ}) + noise = λ(Δf_{i}, Δf_{i}^{*}, ΔP_{tiei}, ΔP_{tiei}^{*}, ACE_{i}, ACE_{i}^{*} | δ^{λ}) + noise,

(31)

where s_{t} is the current state.

The critic network is Q(s_{t}, a_{t} | δ^{Q}), which can be expressed as

Q(s_{t}, a_{t} | δ^{Q}) = Q(Δf_{i}, Δf_{i}^{*}, ΔP_{tiei}, ΔP_{tiei}^{*}, ACE_{i}, ACE_{i}^{*}, a_{t} | δ^{Q}).

(32)

The evaluation value p_{i} of the target Q network is represented as

p_{i} = r_{i} + ε Q'[s_{i+1}, λ'(s_{i+1} | δ^{λ'}) | δ^{Q'}],

(33)

where ε is the discount factor with ε \leq 1, and r_{i} is the reward given by (30).

The critic network is updated by minimizing the error between the target value and the output of the critic network. With M the number of samples in a mini-batch, the loss function is

D = \frac{1}{M} \sum_{i = 1}^{M} [p_{i} - Q(s_{i}, a_{i} | δ^{Q})]^{2}.

(34)
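The target value (33) and the critic loss (34) reduce to a few lines once the network outputs are available. The sketch below takes those outputs as plain lists; the function names and the numbers are illustrative assumptions.

```python
def td_targets(rewards, next_q_values, discount):
    """Eq. (33): p_i = r_i + eps * Q'(s_{i+1}, lambda'(s_{i+1}))."""
    return [r + discount * q_next for r, q_next in zip(rewards, next_q_values)]

def critic_loss(targets, q_values):
    """Eq. (34): mean squared error between targets p_i and Q(s_i, a_i)."""
    return sum((p - q) ** 2 for p, q in zip(targets, q_values)) / len(targets)

# Hypothetical mini-batch of two transitions.
targets = td_targets(rewards=[-1.0, -2.0], next_q_values=[-10.0, -20.0], discount=0.9)
loss = critic_loss(targets, q_values=[-9.0, -22.0])
print(targets, loss)
```

Here `next_q_values` stands for the target-critic output evaluated at the target-actor action, so the targets change only as fast as the target networks do.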

The goal of the actor network update is to maximize J = \sum_{t = 0}^{T} r(t). The following policy gradient is used to update the parameters of the actor network:

\nabla_{δ^{λ}} J \approx \frac{1}{M} \sum_{i = 1}^{M} \left[ \nabla_{a} Q(s, a | δ^{Q}) |_{s = s_{i}, a = λ(s_{i} | δ^{λ})} \nabla_{δ^{λ}} λ(s | δ^{λ}) |_{s = s_{i}} \right].

(35)

3.3. Parameter Optimization

The parameter optimization diagram is presented in Figure 7. The algorithm optimizes itself by updating δ^{λ} and δ^{Q}. To make learning more stable, the update process uses separate target networks λ'(s_{t} | δ^{λ'}) and Q'(s_{t}, λ'(s_{t} | δ^{λ'}) | δ^{Q'}), whose parameters slowly track the learned parameters δ^{λ} and δ^{Q}:

\begin{cases} δ^{Q'} \leftarrow μ δ^{Q} + (1 - μ) δ^{Q'} \\ δ^{λ'} \leftarrow μ δ^{λ} + (1 - μ) δ^{λ'}, \end{cases}

(36)

where μ is the parameter update rate and μ \ll 1.
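The soft update (36) is an exponential moving average of the learned parameters. A minimal sketch on flat parameter lists follows; the function name and the values are hypothetical.

```python
def soft_update(target_params, online_params, mu):
    """Eq. (36): target <- mu * online + (1 - mu) * target, element-wise."""
    return [mu * w + (1.0 - mu) * w_t
            for w, w_t in zip(online_params, target_params)]

online = [1.0, 2.0]
target = [0.0, 0.0]
target = soft_update(target, online, mu=0.1)
print(target)
```

Applied repeatedly with fixed learned parameters, the target parameters converge geometrically toward them at rate (1 - μ) per update, which is what keeps the target values in (33) slowly varying.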

To keep the optimized parameters feasible, we have

\begin{cases} \tilde{δ}^{i} = \frac{δ^{i} - δ_{min}^{i}}{δ_{max}^{i}} \\ δ^{i} = δ_{max}^{i} \tilde{δ}^{i} + δ_{min}^{i} \\ \tilde{δ}_{k+1}^{i} = \tilde{δ}_{k}^{i} + a_{i}, \end{cases}

(37)

\tilde{δ}^{i} = \begin{cases} \frac{0.05 (\tilde{δ}^{i} + 1)}{1.05}, & \tilde{δ}^{i} < 0.05 \\ \tilde{δ}^{i}, & 0.05 \leq \tilde{δ}^{i} \leq 0.95 \\ \frac{1 - 0.05 (2 - \tilde{δ}^{i})}{1.05}, & \tilde{δ}^{i} > 0.95, \end{cases}

(38)

where \tilde{δ}^{i} is the correction value of the i-th parameter, δ^{i} is the initial value of the i-th parameter, δ_{max}^{i} is the upper limit of the i-th parameter, and δ_{min}^{i} is the lower limit of the i-th parameter. All parameters and parameter correction actions are normalized to the range [0, 1] to prevent parameters from exceeding their boundaries.
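The mapping in (37) between a physical parameter and its normalized value can be sketched as follows. The helper names and the bounds used for w_{c} are hypothetical, and the boundary correction (38) is deliberately left out of this sketch.

```python
def normalize(delta, d_min, d_max):
    """First line of (37): physical value -> normalized value."""
    return (delta - d_min) / d_max

def denormalize(delta_tilde, d_min, d_max):
    """Second line of (37): normalized value -> physical value."""
    return d_max * delta_tilde + d_min

def apply_action(delta_tilde, action):
    """Third line of (37): the agent's action shifts the normalized value."""
    return delta_tilde + action

# Hypothetical bounds for the controller bandwidth w_c.
w_c_min, w_c_max = 1.0, 20.0
w_c_tilde = normalize(5.0, w_c_min, w_c_max)
w_c_new = denormalize(apply_action(w_c_tilde, 0.1), w_c_min, w_c_max)
print(f"normalized = {w_c_tilde:.2f}, updated w_c = {w_c_new:.2f}")
```

Because the first two lines of (37) are exact inverses, the agent can act entirely in normalized coordinates and the ADRC receives the corresponding physical value.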

The training process terminates when any one of the predefined convergence criteria is satisfied. The first criterion requires the average reward fluctuation to remain below 1% for 50 consecutive training rounds, as calculated by the reward function (30). Alternatively, training stops when the total number of steps reaches 200,000. A supplementary condition mandates that the average control error must maintain a value below 0.05 for 100 consecutive rounds. To ensure thorough optimization and avoid premature convergence to local optima, the training continues for a minimum of 1000 iterations regardless of other conditions being met earlier.
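One possible reading of these stopping rules is sketched below. Several points are assumptions: "fluctuation" is taken as the relative change between consecutive average rewards, the 200,000-step limit is treated as a hard cap that overrides the minimum of 1000 iterations, and the function names are hypothetical.

```python
def reward_converged(avg_rewards, window=50, tol=0.01):
    """True if the relative change of the average reward stayed below `tol`
    for the last `window` rounds (one reading of 'fluctuation')."""
    if len(avg_rewards) < window + 1:
        return False
    recent = avg_rewards[-(window + 1):]
    for prev, cur in zip(recent, recent[1:]):
        if abs(cur - prev) / max(abs(prev), 1e-12) >= tol:
            return False
    return True

def error_converged(avg_errors, window=100, threshold=0.05):
    """True if the average control error stayed below `threshold` for `window` rounds."""
    if len(avg_errors) < window:
        return False
    return all(abs(e) < threshold for e in avg_errors[-window:])

def should_stop(avg_rewards, avg_errors, total_steps, iterations,
                min_iterations=1000, max_steps=200_000):
    if total_steps >= max_steps:      # step budget treated as a hard cap (assumption)
        return True
    if iterations < min_iterations:   # enforce the minimum training length
        return False
    return reward_converged(avg_rewards) or error_converged(avg_errors)

# Hypothetical training history: flat reward, large control error.
flat_rewards = [-1.0] * 1200
large_errors = [0.5] * 1200
print(should_stop(flat_rewards, large_errors, total_steps=10_000, iterations=1200))
```

Keeping each criterion in its own function makes it easy to change the window lengths or thresholds without touching the overall stopping logic.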

4. Simulation Results

In general, the RES generation part is regarded as a perturbation of the system. However, its large-scale access has a significant effect on the power balance and frequency stability. ES devices can mitigate this problem by providing a buffer when power fluctuates, helping to maintain the stability of the system frequency. The load adjustment is compensated through the disturbances observed by the ADRC, the adaptive learning of DDPG, and the collaborative control of ES devices. As presented in Figure 8, the test was performed on an IEEE 39-bus power network with 10 generators. The three-region interconnected power system is composed of thermal power units, RESs, and ES units. The three-region model is composed as follows: region 1 contains thermal power, wind power, and SMES; region 2 contains thermal power, PV power generation, and CES; and region 3 contains thermal power, wind power, PV power generation, CES, and SMES.

This study uses MATLAB/Simulink R2023a as the primary modeling tool, combined with Python 3.8 to implement the deep learning algorithms. An IEEE 39-bus power system is established in MATLAB, and the ADRC is constructed using an S-Function. The DDPG algorithm is built with the TensorFlow 2.6 framework in Python, using the Keras interface to construct the actor–critic networks. Python is used for training the DDPG algorithm and outputting the policies, while Simulink accepts the control signals and provides feedback on the system status. The two platforms exchange data in real time, creating a closed-loop simulation process.

4.1. Three-Region Interconnected Power System Without ES

The input signal is the load disturbance change signal; the output signals are Δf_{i} and ΔP_{tie}^{ij}. The system frequency is controlled by the first-order ADRC, whose parameters are optimized by DDPG. Four methods (PID, PI, ADRC, and GA-ADRC) are compared to illustrate the performance of the DDPG. For the power system model used in this paper, a 0.01 p.u. load disturbance is applied to the three-region power system at t = 0. The first-order ADRC designed above is used for LFC: the controller is applied to each power system region, and the response of the system frequency to the load change is observed. After 100 iterations, the relevant parameters of the first-order ADRC are listed in Table 1. The power system parameters are listed in Table 2. The training parameters of DDPG are outlined in Table 3.

Wind speed measured in the Zafarana region of Egypt is used for practical relevance. The wind speed varies continuously between approximately 5.1 and 15.8 m/s, with an average value of 11 m/s. The output power of PV modules depends mainly on light intensity and temperature, and real PV data are used in the system: temperature and solar irradiance data for a full day were collected at the Benban site in Egypt. The power curves of the wind turbines and PV are shown in Figure 9. The installed capacity of the synchronous generators was set based on the configuration of conventional thermal power units, with each unit rated at 100 MW. Region 1 has a wind energy installed capacity of 60 MW, region 2 has a PV installed capacity of 30 MW, and region 3 has 60 MW of wind energy and 30 MW of PV. The wind turbine parameters were selected based on [45] and are adjusted in real time according to the frequency response rather than fixed at average values; they also need to be consistent with the scale of the wind farm and the control strategies considered in this study. The parameter values of the wind turbines are listed in Table 4. The frequency deviation of each region and the tie-line power deviation between regions are shown in Figure 10.

This paper uses synchronized phasor measurement units for real-time sampling. Zero-crossing detection captures the zero-crossing points of the voltage waveform to obtain the period $T$, and the frequency is then determined as $f = 1 / T$. The sampling frequency is set to 10 kHz, with 200 sampling points per power-frequency cycle, which meets the Nyquist sampling theorem requirements for 50/60 Hz systems. According to the cyclic measurement method, the theoretical resolution at the rated frequency is $Δ f = 1 / (N T_{s})$, where $N = 200$ and the sampling period is $T_{s} = 0.1$ s, resulting in $Δ f \approx 0.05$ Hz. The simulation results indicate that the absolute error in frequency measurement is less than or equal to 0.05 Hz, which meets the requirements for measurement accuracy.
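The zero-crossing measurement described above can be sketched as follows. This is an illustrative example rather than the authors' implementation: the test signal is a synthetic 50 Hz sine sampled at 10 kHz, the linear interpolation between samples is an added assumption, and the function name `estimate_frequency` is hypothetical.

```python
import math


def estimate_frequency(samples, fs):
    """Estimate frequency as 1/T from rising zero crossings of a sampled waveform."""
    crossings = []
    for k in range(1, len(samples)):
        prev, cur = samples[k - 1], samples[k]
        if prev < 0.0 <= cur:
            # Linearly interpolate the crossing instant between the two samples.
            frac = -prev / (cur - prev)
            crossings.append((k - 1 + frac) / fs)
    if len(crossings) < 2:
        raise ValueError("need at least two rising zero crossings")
    # Average period T over all complete cycles that were observed.
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return 1.0 / period


fs = 10_000.0   # sampling frequency in Hz, as stated in the text
f_true = 50.0   # synthetic test frequency (assumption)
samples = [math.sin(2.0 * math.pi * f_true * k / fs + 0.3) for k in range(2000)]
f_est = estimate_frequency(samples, fs)
print(f"estimated frequency: {f_est:.4f} Hz")
```

Averaging over several cycles, as done here, trades response speed for lower sensitivity to noise on any single crossing.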

The above results show that all methods respond accurately to the load change when the system load is perturbed. In the dynamic response curves, the transient response increases rapidly when RESs are connected because of their overproduction. The designed first-order ADRC then handles this situation and returns the response to its normal value, so it plays a positive role in LFC. Each method can drive the system frequency deviation to 0. However, compared to the other two methods, the designed first-order ADRC stabilizes the deviation between regions more effectively, with a faster response and less overshoot, showing good control performance and anti-interference ability. Moreover, compared with the manually tuned ADRC and the ADRC optimized by a traditional genetic algorithm, the DDPG-optimized ADRC demonstrates stronger disturbance suppression and the shortest recovery time, which can significantly enhance system robustness. The steady-state and dynamic response performance indicators are compared in Table 5. These outcomes indicate that the first-order ADRC optimization approach can also be applied to the control of RESs and ES units, where its control effect is verified further.

4.2. Performance Evaluation of the System Under Communication Delays

The aforementioned experimental scenario only considers the impact of renewable energy integration on system performance. In this section, additional uncertainty factors are included in the simulation. To account for communication delays, and taking the previous two areas as an example, a random delay of 100–500 ms is introduced into the control loop to test the robustness of DDPG-ADRC. The results of the experiment are illustrated in Figure 11.

The results indicate that the extended state observer of ADRC can estimate the delay effects in real time. The frequency deviation of DDPG-ADRC is reduced by 45% compared to the other two algorithms. When the time delay is ≤500 ms, the frequency deviation can still be maintained within ±0.02 Hz, indicating that communication delays do not have a significant impact on frequency deviation.
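One way to inject a random 100–500 ms delay into a discrete-time control loop is to keep a history of controller outputs and apply the command computed one delay interval earlier. The sketch below illustrates only this delay mechanism. The first-order plant, the PI controller and its gains, the step size, and the function name are all hypothetical and are not taken from the paper.

```python
import random


def simulate_with_random_delay(t_end=60.0, dt=0.01, delay_range=(0.1, 0.5), seed=0):
    """Toy loop in which the applied control is a randomly delayed past command."""
    rng = random.Random(seed)
    kp, ki = 1.0, 0.5        # hypothetical PI gains (not the paper's controller)
    load_step = 0.01         # p.u. load disturbance applied at t = 0
    y, integral = 0.0, 0.0
    commands = []            # controller output at every step, oldest first
    peak = 0.0
    for k in range(round(t_end / dt)):
        integral += y * dt
        commands.append(-kp * y - ki * integral)
        # Draw a delay between 100 ms and 500 ms and convert it to whole steps.
        delay_steps = int(round(rng.uniform(*delay_range) / dt))
        idx = k - delay_steps
        u = commands[idx] if idx >= 0 else 0.0  # nothing has arrived yet at start-up
        # Hypothetical first-order plant: dy/dt = -y + u - load_step.
        y += dt * (-y + u - load_step)
        peak = max(peak, abs(y))
    return peak, abs(y)


peak_dev, final_dev = simulate_with_random_delay()
print(f"peak |y| = {peak_dev:.5f}, final |y| = {final_dev:.2e}")
```

Fixing the seed makes a run repeatable, which is convenient when different controllers are compared under the same delay sequence.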

Furthermore, to verify the impact of $Δ P_{tie,i}$ on the experimental results presented in this article, two regions are taken as examples. Scenario 1 is based on the original model, ignoring $Δ P_{tie,i}$. Scenario 2 considers the dynamics of $Δ P_{tie,i}$. The experimental results are illustrated in Figure 12.

The results indicate that in both scenarios, the frequency deviation of DDPG-ADRC is less than 3%, and the overshoot variation does not exceed 5%. This suggests that, under the current system parameters, the effect on control performance of ignoring $Δ P_{tie,i}$ is within an acceptable range.

4.3. System Performance Evaluation with ES

In this subsection, four further algorithms are added to the comparison (chaos-optimized FOPID, MPC, generalized ADRC, and SAC-ADRC). The capacity of the SMES is 30 MW, and the capacity of the CES is 3.5 MW. The SMES capacity is selected with reference to [41]; rather than simply adopting an average value, the parameters are adjusted to the disturbance assumptions and control objectives of the research scenario. The CES parameters are selected in a similar way. The parameters of the ES units are listed in Table 6. After the ES units are added, the response fluctuation is significantly suppressed; the frequency and power deviation curves with ES units are shown in Figure 13, and the ACE response curve is shown in Figure 14. The dynamic response indicators of the different algorithms are shown in Table 7.

As can be seen from Figure 13, compared to other algorithms, DDPG-ADRC exhibits significant advantages in both dynamic response and steady-state accuracy. Under the same load disturbance conditions, the average rise time of the dynamic response is reduced by 27.25%, and the overshoot is decreased by approximately 66.7%. The dynamic response curve in Figure 14 indicates that without the integration of ES, the frequency fluctuations exhibit a distinct long-term oscillation pattern. However, after integrating ES with an optimized controller, the oscillations rapidly diminish, and the system returns to steady-state operation in a short period. Therefore, the addition of ES devices can improve the entire power system’s performance. The above results fully demonstrate the capability of ES and advanced control strategies to collaboratively address the uncertainties in interconnected power systems.
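The averages quoted above can be reproduced from Table 7 if the Improvement column is read as the relative reduction of each index with respect to SAC-ADRC, rounded to one decimal place and then averaged over the four rows. This reading is an assumption inferred from the tabulated numbers, and the dictionary keys and function names below are illustrative.

```python
# (SAC-ADRC, DDPG-ADRC) value pairs copied from Table 7.
rise_time = {"df1": (1.80, 1.25), "df2": (1.95, 1.35),
             "df3": (1.70, 1.20), "dPtie12": (0.55, 0.45)}
overshoot = {"df1": (12.2, 4.2), "df2": (13.0, 4.5),
             "df3": (11.5, 3.8), "dPtie23": (8.0, 2.5)}


def improvement(baseline, proposed):
    """Relative reduction in percent, rounded to one decimal as in the table."""
    return round(100.0 * (baseline - proposed) / baseline, 1)


def mean_improvement(rows):
    """Average of the per-row improvements for one performance index."""
    values = [improvement(base, prop) for base, prop in rows.values()]
    return sum(values) / len(values)


rise_avg = mean_improvement(rise_time)
overshoot_avg = mean_improvement(overshoot)
print(f"average rise-time reduction: {rise_avg:.2f} %")
print(f"average overshoot reduction: {overshoot_avg:.1f} %")
```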

5. Conclusions

An LFC method for a multi-region interconnected power system with RESs was studied to improve the power system’s stability and reliability. Specifically, by focusing on the system model structure and control strategies, the LFC scheme with RESs was optimized to enhance system performance. The fundamental modules of the LFC loops were established, and the RESs and ES units were modeled. A practical ADRC was designed based on the power system model and ADRC principles to enhance the system’s anti-interference ability, and its parameters were tuned using DDPG. To enhance the LFC ability and support the integration of RESs into the grid, real data were collected from the test site and simulated at the actual nodes of the power network. The simulation results indicate that the proposed method maintains the frequency deviation within 0.02 Hz under a 0.01 p.u. load disturbance, reducing it by 23% compared to the standalone ADRC control and improving it by 18% compared to DDPG. Meanwhile, when integrating RESs, the overshoot remains below 0.15 Hz. This method significantly enhances the robustness of frequency regulation in the system through the coupling mechanism of reinforcement learning and disturbance-resistant control.

The method and model studied in this paper assume that the forecasting error of the RES output is within 20%. They do not consider the excessively large forecasting errors that extreme weather conditions may cause in real scenarios, which could in turn affect the strategy optimization performance of the DDPG algorithm. In addition, validation has so far only been conducted through software simulation, and the hardware compatibility of the controller has not been tested on a real-time digital simulation platform. In the future, a distributed computing architecture will be adopted to transform the centralized DDPG algorithm into a multi-agent reinforcement learning framework to address more sources of uncertainty, and the performance of the controller will be tested on a real-time digital simulation platform.

Author Contributions

Conceptualization and methodology, X.Z.; validation and formal analysis, Z.D.; investigation and data curation, C.Z.; writing—original draft preparation, X.Z.; writing—review and editing, D.G. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Technology Project of the State Grid Shanghai Municipal Electric Power Company (No. 52093324000J).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Authors Zhenlan Dou and Chunyan Zhang were employed by the company State Grid Shanghai Municipal Electric Power Company. Author Xichao Zhou was employed by the company State Grid Integrated Energy Services Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Liu, G.; Vrakopoulou, M.; Mancarella, P. Assessment of the capacity credit of renewables and storage in multi-area power systems. IEEE Trans. Power Syst. 2021, 36, 2334–2344.
2. Mi, Y.; Ma, Y.; He, X.; Yang, X.; Gong, J.; Zhao, Y. Robust load frequency control for isolated microgrids based on double-loop compensation. CSEE J. Power Energy Syst. 2023, 9, 1359–1369.
3. Li, M.; Zhang, Z.; Hu, S.; Lian, H. Sampling PI load frequency control for new energy power system. CSEE J. Power Energy Syst. 2023, 43, 939–950.
4. Xia, K.S.; Liu, Y.; Wu, Q.H. Robust load frequency control of power systems against random time-delay attacks. IEEE Trans. Smart Grid 2021, 12, 909–911.
5. Jia, Y.B.; Zhao, Y.D.; Sun, C.Y.; Meng, K. Cooperation-based distributed economic MPC for economic load dispatch and load frequency control of interconnected power systems. IEEE Trans. Power Syst. 2019, 34, 3964–3966.
6. Babu, N.R.; Bhagat, S.K.; Chiranjeevi, T.; Pushkarna, M.; Saha, A.; Kotb, H.; AboRas, K.M.; Alsaif, F.; Alsulamy, S.; Ghadi, Y.Y.; et al. Frequency control of a realistic dish stirling solar thermal system and accurate HVDC models using a cascaded FOPI-IDDN-based crow search algorithm. Int. J. Energy Res. 2023, 1, 9976375.
7. Chen, P.; Liu, S.; Zhang, D.; Yu, L. Adaptive event-triggered decentralized dynamic output feedback control for load frequency regulation of power systems with communication delays. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 5949–5961.
8. Pathak, P.K.; Yadav, A.K. Fuzzy assisted optimal tilt control approach for LFC of renewable dominated micro-grid: A step towards grid decarbonization. Sustain. Energy Technol. Assessments 2023, 60, 103551.
9. Zhang, Z.; Liao, S.; Sun, Y.; Xu, J.; Ke, D.; Wang, B. A parallel-type load damping factor controller for frequency regulation in power systems with high penetration of renewable energy sources. J. Mod. Power Syst. Clean Energy 2024, 12, 1019–1030.
10. Liu, S.; You, H.H.; Li, J.L.; Kai, S.J.; Yang, L.Y. Active disturbance rejection control based distributed secondary control for a low-voltage DC microgrid. Sustain. Energy Grids Netw. 2021, 27, 100515.
11. Cui, Y.; Yin, Z.; Luo, P.; Yuan, D.; Liu, J. Linear active disturbance rejection control of IPMSM based on quasi-proportional resonance and disturbance differential compensation linear extended state observer. IEEE Trans. Ind. Electron. 2024, 71, 11910–11924.
12. Lin, S.; Cao, Y.; Li, C.; Wang, Z.; Shi, T.; Xia, C. Two-degree-of-freedom active disturbance rejection current control for permanent magnet synchronous motors. IEEE Trans. Power Electron. 2023, 38, 3640–3652.
13. Xu, L.; Zhuo, S.; Liu, J.; Jin, S.; Huang, Y.F.; Gao, F. Advancement of active disturbance rejection control and its applications in power electronics. IEEE Trans. Ind. Appl. 2024, 60, 1680–1694.
14. Meng, Q.; Hou, Z. Active disturbance rejection based repetitive learning control with applications in power inverters. IEEE Trans. Control Syst. Technol. 2021, 29, 2038–2048.
15. Wang, Q.C.; Xu, H.C.; Pan, L.; Sun, L. Active disturbance rejection control of boiler forced draft system: A data-driven practice. Sustainability 2020, 12, 4171.
16. Dong, Z.; Li, B.W.; Li, J.Y.; Guo, Z.W.; Huang, X.; Zhang, Y.; Zhang, Z. Flexible control of nuclear cogeneration plants for balancing intermittent renewables. Energy 2021, 221, 119906.
17. Li, X.H.; Wang, W.; Fang, F.; Liu, J.Z.; Chen, Z. Improving active power regulation for wind turbine by phase leading cascaded error-based active disturbance rejection control and multi-objective optimization. Renew. Energy 2025, 243, 122629.
18. Safiullah, S.; Rahman, A.; Lone, S.A.; Hussain, S.M.S. Novel COVID-19 based optimization algorithm (C-19BOA) for performance improvement of power systems. Sustainability 2022, 14, 14287.
19. Jain, S.; Hote, Y.V. Design of improved nonlinear active disturbance rejection controller for hybrid microgrid with communication delay. IEEE Trans. Sustain. Energy 2022, 13, 1101–1111.
20. Hosseini, S.A.; Toulabi, M.; Dobakhshari, A.S.; Ashouri-Zadeh, A.; Ranjbar, A.M. Delay compensation of demand response and adaptive disturbance rejection applied to power system frequency control. IEEE Trans. Power Syst. 2020, 35, 2037–2046.
21. Xi, L.; Zhang, L.; Xu, Y.; Wang, S.; Yang, C. Automatic generation control based on multiple-step greedy attribute and multiple-level allocation strategy. CSEE J. Power Energy Syst. 2022, 8, 281–292.
22. Chen, Z.Q.; Huang, Z.Y.; Sun, M.W.; Sun, Q.L. Active disturbance rejection control of load frequency based on big probability variation’s genetic algorithm for parameter optimization. CAAI Trans. Intell. Syst. 2020, 15, 41–49.
23. Wang, Y.C.; Fang, S.H.; Hu, J.X.; Huang, D.M. Multiscenarios parameter optimization method for active disturbance rejection control of PMSM based on deep reinforcement learning. IEEE Trans. Ind. Electron. 2023, 70, 10957–10968.
24. Wang, Y.C.; Fang, S.H.; Hu, J.X. Active disturbance rejection control based on deep reinforcement learning of PMSM for more electric aircraft. IEEE Trans. Power Electron. 2023, 38, 406–416.
25. Jain, H.; Mather, B.; Jain, A.K. Grid-supportive loads-a new approach to increasing renewable energy in power systems. IEEE Trans. Smart Grid 2022, 13, 2959–2972.
26. Yu, Y.J.; Cai, Z.F.; Liu, Y.C. Double deep Q-learning coordinated control of hybrid energy storage system in island micro-grid. Int. J. Energy Res. 2020, 45, 3315–3326.
27. Gu, C.J.; Wang, J.X.; Yang, Q.; Wang, X.L. Assessing operational benefits of large-scale energy storage in power system: Comprehensive framework, quantitative analysis, and decoupling method. Int. J. Energy Res. 2021, 45, 10191–10207.
28. Tang, Y.M.; Bai, Y.; Huang, C.Z.; Du, B. Linear active disturbance rejection-based load frequency control concerning high penetration of wind energy. Energy Convers. Manag. 2015, 95, 259–271.
29. Ebrahimi, H.; Abapour, M.; Mohammadi-Ivatloo, B.; Golshannavaz, S.; Yazdaninejadi, A. Decentralized approach for security enhancement of wind-integrated energy systems coordinated with energy storages. Int. J. Energy Res. 2021, 46, 5006–5027.
30. Jiang, S.; Xu, Y.; Li, G.; Xin, Y.; Wang, L. Coordinated control strategy of receiving-end fault ride-through for DC grid connected large-scale wind power. IEEE Trans. Power Deliv. 2022, 37, 2673–2683.
31. Liu, J.; Yao, W.; Wen, J.; Fang, J.; Jiang, L.; He, H. Impact of power grid strength and PLL parameters on stability of grid-connected DFIG wind farm. IEEE Trans. Sustain. Energy 2020, 11, 545–557.
32. Wang, W.; Liu, L.; Liu, J.; Chen, Z. Energy management and optimization of vehicle-to-grid systems for wind power integration. CSEE J. Power Energy Syst. 2021, 7, 172–180.
33. Rouhanian, A.; Aliamooei-Lakeh, H.; Aliamooei-Lakeh, S.; Toulabi, M. Improved load frequency control in power systems with high penetration of wind farms using robust fuzzy controller. Electr. Power Syst. Res. 2023, 224, 109511.
34. Ye, F.; Hu, Z.J. Robust load frequency control of interconnected power systems with back propagation neural network-proportional-integral-derivative-controlled wind power integration. Sustainability 2024, 16, 8062.
35. Wang, P.; Guo, J.; Cheng, F.; Gu, Y.; Yuan, F.; Zhang, F. A MPC-based load frequency control considering wind power intelligent forecasting. Renew. Energy 2025, 244, 122636.
36. Barakat, M. Novel chaos game optimization tuned-fractional-order PID fractional-order PI controller for load-frequency control of interconnected power systems. Prot. Control Mod. Power Syst. 2022, 7, 1–20.
37. Jain, S.; Hote, Y.V. Generalized active disturbance rejection controller for load frequency control in power systems. IEEE Control Syst. Lett. 2020, 4, 73–78.
38. Li, Z.; Li, X.; Lin, Y.; Wei, Y.; Li, Z.; Li, Z. Active disturbance rejection control for static power converters in flexible AC traction power supply systems. IEEE Trans. Energy Convers. 2022, 37, 2851–2862.
39. Song, D.; Chang, Q.; Zheng, S.; Yang, S.; Yang, J.; Joo, Y.H. Adaptive model predictive control for yaw system of variable-speed wind turbines. J. Mod. Power Syst. Clean Energy 2021, 9, 219–224.
40. Baškarad, T.; Holjevac, N.; Kuzle, I. Photovoltaic system control for power system frequency support in case of cascading events. IEEE Trans. Sustain. Energy 2023, 14, 1324–1334.
41. Mekhamer, A.S.; Hasanien, H.M.; Alharbi, M. Coati optimization algorithm-based optimal frequency control of power systems including storage devices and electric vehicles. J. Energy Storage 2024, 93, 112367.
42. Gulzar, M.M.; Sibtain, D.; Khalid, M. Cascaded fractional model predictive controller for load frequency control in multiarea hybrid renewable energy system with uncertainties. Int. J. Energy Res. 2023, 2023, 5999997.
43. Du, X.B.; Guo, C.Y.; Zhao, C.Y. Hydropower units are modeled by HVDC transmission system and multi-band oscillation mode analysis. Autom. Electr. Power Syst. 2022, 46, 75–83.
44. Yi, X.Q.; Wang, D.; Liu, H.T. Dynamic simulation model of medium-voltage DC generator set considering the speed regulation characteristics of prime mover. Trans. China Electrotech. Soc. 2024, 39, 2974–2983.
45. Sun, L.; Xue, W.; Li, D.; Zhu, H.; Su, Z.G. Quantitative tuning of active disturbance rejection controller for FOPTD model with application to power plant control. IEEE Trans. Ind. Electron. 2022, 69, 805–815.

Figure 1. Load frequency control model of region i.

Figure 2. SMES structure diagram.

Figure 3. CES structure diagram.

Figure 4. Diagram sketch of the first-order ADRC controller.

Figure 5. Structure of DDPG-ADRC parameter adjuster.

Figure 6. Neuronal structure diagram of DDPG.

Figure 7. Flowchart of ADRC parameter optimization based on DDPG.

Figure 8. The IEEE 39-bus power system with RESs and ES.

Figure 9. The output power curves. (a) The output power curves of the wind turbines. (b) The output power curves of the PV.

Figure 10. Frequency deviation and power deviation. (a) Frequency deviation of region 1. (b) Frequency deviation of region 2. (c) Frequency deviation of region 3. (d) Power deviation of the tie-line between region 1 and region 2. (e) Power deviation of the tie-line between region 1 and region 3. (f) Power deviation of the tie-line between region 2 and region 3.

Figure 11. Maximum frequency deviation under communication delay. (a) The maximum frequency deviation of region 1. (b) The maximum frequency deviation of region 2.

Figure 12. Influence curve of $Δ P_{tie,i}$ on system frequency variation. (a) The frequency deviation of region 1. (b) The frequency deviation of region 2.

Figure 13. Frequency deviation and power deviation under ES units. (a) Frequency deviation of region 1. (b) Frequency deviation of region 2. (c) Frequency deviation of region 3. (d) Power deviation of the tie-line between region 1 and region 2. (e) Power deviation of the tie-line between region 1 and region 3. (f) Power deviation of the tie-line between region 2 and region 3.

Figure 14. Area control error response curve under ES units.

Table 1. Optimized ADRC parameters.

Symbol	$β_{1}^{*}$	$β_{2}^{*}$	$w_{c}^{*}$	$b_{0}^{*}$
Value	75.62	37.21	4	150

Table 2. System parameters.

Symbol	$R_{i}$	$B_{i}$	$K_{p}$	$K_{g}$	$K_{T}$	$K_{r}$	$K_{i j}$	$T_{p i}$	$T_{t i}$	$T_{g i}$	$T_{r i}$
Value	2.4	3	120	0.58	10	0.1	0.2	20	0.3	0.08	0.03

Table 3. Training parameter values.

Parameters	Learning Rate (Actor)	Learning Rate (Critic)	Noise	Batch Size	$μ$	$ε$
Value	$10^{- 4}$	$3 \times 10^{- 4}$	0.2	64	$10^{- 3}$	0.99

Table 4. Wind turbine parameters.

Symbol	$ρ$	$A_{b}$	$V_{s}$	$R_{t}$	$ω_{t}$	$α$	$μ$	$C_{f}$
Value	1.225 $kg / m^{3}$	5905 $m^{2}$	10 m/s	50 m	15 rad/min	3.6	8	0.421

Table 5. Performance comparison of control methods under load disturbance.

Index	Region	Traditional Methods				DDPG-ADRC	Improvement
Index	Region	PID	PI	ADRC	GA-ADRC	DDPG-ADRC	(%)
Rise time (s)	$Δ f_{1}$	2.30	2.25	2.10	2.05	1.25	40.5
	$Δ f_{2}$	2.45	2.40	2.25	2.20	1.35	40.0
	$Δ f_{3}$	2.20	2.15	2.05	1.95	1.20	41.5
	$Δ P_{t i e}^{12}$	0.95	0.90	0.85	0.80	0.45	47.1
Settling time (s)	$Δ f_{1}$	9.20	8.80	8.50	8.30	4.20	45.8
	$Δ f_{2}$	9.60	9.30	9.00	8.70	4.50	45.8
	$Δ f_{3}$	8.80	8.50	8.20	8.00	4.00	46.2
	$Δ P_{t i e}^{13}$	4.20	4.00	3.80	3.60	1.90	45.8
Overshoot (%)	$Δ f_{1}$	21.5	20.0	18.5	16.0	4.2	60.0
	$Δ f_{2}$	22.2	20.8	19.2	16.5	4.5	60.2
	$Δ f_{3}$	20.5	19.0	17.8	15.2	3.8	60.1
	$Δ P_{t i e}^{23}$	14.5	13.2	12.0	10.5	2.5	60.0
ACE	1	0.98	0.92	0.85	0.78	0.32	62.4
	2	1.05	0.98	0.92	0.85	0.35	62.0
	3	0.92	0.85	0.78	0.70	0.28	64.1

Table 6. ES device parameters.

Symbol	$K_{i f}$	$η_{c}$	$δ$	h	$d_{0}$	$d_{i}$
Value	0.4 kV/kA	0.18	1.8 $W / m^{3}$	0.625 m	2.412 m	0.517 m
Symbol	$n^{'}$	$I_{0}$	$R_{j r}$	$K_{α}$	L	$T_{s c}$
Value	2	5000 A	150 $Ω$	100 kV/p.u. MW	3.4 H	0.3 s
Symbol	c	$R^{'}$	$T_{c}$	$K_{c f}$	$K_{v d}$
Value	1.5 F	80 $Ω$	0.05 s	100 kA/p.u. MW	0.25 kA/kV

Table 7. Performance comparison of control methods under different algorithms.

Index	Region	Other Methods				DDPG-ADRC	Improvement
Index	Region	Chaos-optimized FOPID	MPC	Generalized ADRC	SAC-ADRC	DDPG-ADRC	(%)
Rise time (s)	$Δ f_{1}$	1.95	1.90	1.85	1.80	1.25	30.6
	$Δ f_{2}$	2.10	2.05	2.00	1.95	1.35	30.8
	$Δ f_{3}$	1.85	1.80	1.75	1.70	1.20	29.4
	$Δ P_{t i e}^{12}$	0.70	0.65	0.60	0.55	0.45	18.2
Settling time (s)	$Δ f_{1}$	7.80	7.50	7.20	7.00	4.20	40.0
	$Δ f_{2}$	8.10	7.80	7.50	7.30	4.50	38.4
	$Δ f_{3}$	7.20	6.90	6.60	6.40	4.00	37.5
	$Δ P_{t i e}^{13}$	3.30	3.10	2.90	2.70	1.90	29.6
Overshoot (%)	$Δ f_{1}$	14.5	13.8	13.0	12.2	4.2	65.6
	$Δ f_{2}$	15.2	14.5	13.8	13.0	4.5	65.4
	$Δ f_{3}$	13.8	13.0	12.2	11.5	3.8	67.0
	$Δ P_{t i e}^{23}$	9.5	9.0	8.5	8.0	2.5	68.8
ACE	1	0.72	0.68	0.65	0.60	0.32	46.7
	2	0.78	0.74	0.70	0.65	0.35	46.2
	3	0.65	0.62	0.58	0.54	0.28	48.1

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

