Article

Adaptive Active Disturbance Rejection Load Frequency Control for Power System with Renewable Energies Using the Lyapunov Reward-Based Twin Delayed Deep Deterministic Policy Gradient Algorithm

1 College of Artificial Intelligence, Nankai University, Tianjin 300350, China
2 Silo AI, 00100 Helsinki, Finland
* Authors to whom correspondence should be addressed.
Sustainability 2023, 15(19), 14452; https://doi.org/10.3390/su151914452
Submission received: 24 August 2023 / Revised: 27 September 2023 / Accepted: 30 September 2023 / Published: 3 October 2023
(This article belongs to the Section Sustainable Engineering and Science)

Abstract

The substitution of renewable energy sources (RESs) for conventional fossil fuels in electricity generation is essential in addressing environmental pollution and resource depletion. However, the integration of RESs into the load frequency control (LFC) of power systems can degrade the frequency deviation response, resulting in a decline in power quality. Moreover, load disturbances can also affect the stability of the frequency deviation. Hence, this paper presents an LFC method that utilizes the Lyapunov reward-based twin delayed deep deterministic policy gradient (LTD3) algorithm to optimize linear active disturbance rejection control (LADRC). Being model-free and able to mitigate unknown disturbances, LADRC can regulate load disturbances and renewable energy deviations. Additionally, the LTD3 algorithm, based on a Lyapunov reward function, is employed to optimize the controller parameters in real time, resulting in enhanced control performance. Finally, LADRC-LTD3 is evaluated on a two-area power system in which each area comprises thermal, hydro, and gas power plants, as well as RESs, namely a noise-based wind turbine and a photovoltaic (PV) system. A comparative analysis is conducted between the proposed controller and other control techniques, including the integral controller (IC), the fractional-order proportional–integral–derivative (FOPID) controller, I-TD, ID-T, and TD3-optimized LADRC. The results indicate that the proposed method effectively addresses the LFC problem.

1. Introduction

Maintaining system frequency within an acceptable range is a crucial requirement in power system operation, and load frequency control (LFC) plays a vital role in achieving this objective. LFC ensures the balance between active power generation and load [1]. Ensuring the safe operation of the system, particularly in the face of load disturbances, necessitates the effective regulation of system frequency. Hence, this study focuses on addressing the load frequency control problem. The evolving landscape of modern power systems has witnessed substantial transformations in recent years. These changes are driven by the growing diversity of loads and the escalating adoption of renewable energy sources (RESs) as alternatives to traditional power plants, addressing their inherent limitations [2]. However, integrating RESs can lead to reduced system stability and increased frequency deviation. As a result, it is crucial to address the challenges posed by load disturbance and RESs and achieve efficient load frequency control.
Currently, numerous power system models are being studied. Based on the number of areas involved, these models can be broadly categorized into single-area [3,4,5] and multi-area [6,7,8] models. Among them, thermal plants [9], hydro plants [10], and natural gas plants [11] are the most common. Moreover, in the past few years, RESs, particularly wind turbines [12] and solar energy [13], have been the subject of extensive research. Bakeer et al. [14] provide an extensive discussion of the first-order transfer function model [15], the second-order transfer function model [16], and the noise-based model [17] of photovoltaic (PV) systems. In a similar vein, for wind turbines, researchers have explored the first-order transfer function [18], the second-order wind farm with a high-voltage direct current (HVDC) line [19], the noise-based model [20], and the doubly fed induction generator (DFIG)-driven wind power system [21]. In addition to investigating various energy sources, researchers have introduced nonlinear components such as governor deadband (GDB) and generation rate constraint (GRC) into the system to enhance its resemblance to real-world power systems.
In recent years, intelligent control methods that integrate the control system with optimization algorithms have gained popularity for load frequency control. Among these methods, proportional–integral–derivative (PID) control remains the dominant choice [22,23,24]. However, the control performance achievable with an ordinary PID is limited, so many improved PID-based control methods have been proposed. For example, Guha et al. [21] proposed a cascade fractional-order controller (CC-FOC) consisting of a three-degrees-of-freedom PID (3DOF-PID) and a tilt–integral–derivative (TID) controller, implemented it on a three-area power system with DFIG, and compared its performance against a conventional PID. In addition, various control strategies such as sliding mode control [25,26], model predictive control [27], and linear active disturbance rejection control (LADRC) [28] have been presented as effective solutions for LFC. In particular, the LADRC controller features a simple structure, model-free operation, and the capability to leverage an extended state observer (ESO) to estimate unknown disturbances [29]. As a result, it has found extensive applications beyond LFC, such as servo systems [30], motion control [31,32], and liquid-level control [33]. To date, LADRC has primarily been employed for load frequency control in power systems that do not incorporate renewable energy sources; in this paper, it is applied to power systems that feature them.
The optimization algorithms currently used in load frequency control systems mainly rely on heuristic methods, such as the grasshopper optimization algorithm (GOA) [34], the dragonfly search algorithm (DSA) [35], and the whale optimization algorithm [36]. However, these algorithms usually provide optimal parameters only for a specific scenario, which means they cannot produce adaptive parameters. This problem can be addressed with deep reinforcement learning (DRL). DRL algorithms have gained prominence in artificial intelligence, primarily due to their ability to acquire optimal policies for sequential decision-making problems [37,38]. Numerous studies have explored the use of DRL for LFC in power systems. For example, Khalid et al. [39] tuned PID controller parameters with a twin delayed deep deterministic policy gradient (TD3) algorithm for a two-area interconnected power system, where each area contains a TD3 agent. Zheng et al. [40] proposed a soft actor–critic (SAC) algorithm to optimize the LADRC method and applied it to LFC in power systems. In these studies, however, the design of the reward function lacked a corresponding theoretical analysis; there is as yet no unified, efficient design rule for reward functions in DRL algorithms. Drawing inspiration from the Lyapunov reward function [41,42], this paper presents LTD3, which combines a Lyapunov-based reward function with the TD3 algorithm and enables real-time optimization of the LADRC controller parameters.
The main contributions are summarized below:
  • A Lyapunov-based reward function is constructed for LFC systems to improve the convergence of TD3; the resulting algorithm is named LTD3.
  • The design of LADRC aims to mitigate the effects of load disturbances and the integration of RESs in power systems.
  • The proposed LTD3 algorithm dynamically adjusts the parameters $\omega_o$ and $\omega_c$ of the LADRC by observing the system state. To evaluate its performance, a comparative analysis is conducted against other techniques such as IC, FOPID, and ID-T [20,43,44]. The evaluations are conducted on a two-area power system incorporating a noise-based wind turbine and a PV system, considering different scenarios.
The remainder of the paper is organized as follows. Section 2 presents the mathematical model of a general power system. Section 3 explains the LADRC design process for LFC. Section 4 introduces the LTD3-based LADRC parameter optimization method. Section 5 presents simulation results that validate the proposed method. Lastly, Section 6 concludes the paper and offers suggestions for future work.

2. Multi-Source Power System Modeling

Currently, the majority of electricity generation in most countries is derived from thermal, hydro, and gas power plants, which are utilized to fulfill the base load demand. Further, renewable energy sources are also used to meet load demand. Therefore, this paper considers a multi-source power system comprising thermal, hydro, and gas turbines and renewable energies, as illustrated in Figure 1.
Figure 1 exhibits the general structure of one area, denoted the $i$-th area, of an $n$-area interconnected power system. It illustrates the relationship between $\Delta f_i$, $\mathrm{ACE}_i$, and $\Delta P_{tie,i}$, expressed mathematically in Equation (1):

$$\mathrm{ACE}_i = B_i \Delta f_i + \Delta P_{tie,i} \tag{1}$$

where

$$\begin{cases} \Delta f_i = \left(\Delta P_{tp,i} + \Delta P_{hp,i} + \Delta P_{gp,i} + \Delta P_{re,i} - \Delta P_{d,i} - \Delta P_{tie,i}\right) G_{p,i} \\[4pt] \Delta \dot{P}_{tie,i} = \sum_{j=1, j \neq i}^{n} 2\pi T_{ij} \left(\Delta f_i - \Delta f_j\right) \end{cases} \tag{2}$$
Generally, the generator of the power system is represented by the following transfer function:

$$G_{p,i} = \frac{1}{2Hs + D} \tag{3}$$

The transfer functions of the thermal power plant $G_{th}$, the hydro plant $G_{hp}$, and the gas plant $G_{gp}$, together with the renewable energy sources $\Delta P_w$ and $\Delta P_{Solar}$, are introduced in Sections 2.1–2.5, respectively, following [20]. It is worth noting that the mechanistic construction of the power system model is not considered in this article. Additionally, referring to Figure 1, we can derive the following expression:

$$\Delta P_{tp,i} = G_{th} \Delta P_{c1,i}, \quad \Delta P_{hp,i} = G_{hp} \Delta P_{c2,i}, \quad \Delta P_{gp,i} = G_{gp} \Delta P_{c3,i} \tag{4}$$

where $\Delta P_{c1,i}$, $\Delta P_{c2,i}$, and $\Delta P_{c3,i}$ are related to the controller output $u_i$; also from Figure 1, their expressions are

$$\Delta P_{c1,i} = u_i - \frac{\Delta f_i}{R_{i1}}, \quad \Delta P_{c2,i} = u_i - \frac{\Delta f_i}{R_{i2}}, \quad \Delta P_{c3,i} = u_i - \frac{\Delta f_i}{R_{i3}} \tag{5}$$

Combining Equations (2), (4), and (5), the expression for $\Delta f_i$ can be rearranged as

$$\Delta f_i = \frac{\left(G_{th} + G_{hp} + G_{gp}\right) G_{p,i}}{1 + \left(\frac{G_{th}}{R_{i1}} + \frac{G_{hp}}{R_{i2}} + \frac{G_{gp}}{R_{i3}}\right) G_{p,i}}\, u_i + \frac{\left(\Delta P_{re,i} - \Delta P_{d,i} - \Delta P_{tie,i}\right) G_{p,i}}{1 + \left(\frac{G_{th}}{R_{i1}} + \frac{G_{hp}}{R_{i2}} + \frac{G_{gp}}{R_{i3}}\right) G_{p,i}} \tag{6}$$

Thus, combining Equations (1) and (6), the relationship between $\mathrm{ACE}_i$ and $u_i$ for the $i$-th area is obtained as

$$\mathrm{ACE}_i = \frac{B_i \left(G_{th} + G_{hp} + G_{gp}\right) G_{p,i}}{1 + \left(\frac{G_{th}}{R_{i1}} + \frac{G_{hp}}{R_{i2}} + \frac{G_{gp}}{R_{i3}}\right) G_{p,i}}\, u_i + \frac{B_i \left(\Delta P_{re,i} - \Delta P_{d,i} - \Delta P_{tie,i}\right) G_{p,i}}{1 + \left(\frac{G_{th}}{R_{i1}} + \frac{G_{hp}}{R_{i2}} + \frac{G_{gp}}{R_{i3}}\right) G_{p,i}} + \Delta P_{tie,i} \tag{7}$$

2.1. Thermal Power Plant

In this paper, we consider a thermal power plant that includes a reheat turbine and additionally incorporates the nonlinear GDB and GRC links, as shown in Figure 2. The GRC is taken as 3% pu/min, while the transfer function of the GDB is

$$\mathrm{GDB} = 0.8 - \frac{0.2}{\pi} s \tag{8}$$

The transfer function of the thermal power plant is then

$$G_{th} = \frac{\Delta P_{tp}}{\Delta P_{c1}} = \frac{K_T \left(0.8 - \frac{0.2}{\pi} s\right) \left(K_r T_r s + 1\right)}{\left(T_g s + 1\right)\left(T_t s + 1\right)\left(T_r s + 1\right)} \tag{9}$$

2.2. Hydro Power Plant

The transfer function diagram of the hydroelectric power unit is illustrated in Figure 3, consisting of a governor, transient droop compensation, and a penstock turbine with GRC constraints. The GRC is 270% pu/min for upward adjustments and 360% pu/min for downward adjustments.
The transfer function of the hydropower plant is

$$G_{hp} = \frac{\Delta P_{hp}}{\Delta P_{c2}} = \frac{K_H \left(T_{rs} s + 1\right) \left(1 - T_w s\right)}{\left(T_{gh} s + 1\right)\left(T_{rh} s + 1\right)\left(0.5\, T_w s + 1\right)} \tag{10}$$

2.3. Gas Power Plant

Similarly, the gas system, shown in Figure 4, has the following expression:
$$G_{gp} = \frac{\Delta P_{gp}}{\Delta P_{c3}} = \frac{K_G \left(X_g s + 1\right) \left(1 - T_{cr} s\right)}{\left(B_g s + C_g\right)\left(Y_g s + 1\right)\left(T_f s + 1\right)\left(T_{cd} s + 1\right)} \tag{11}$$

2.4. Noise-Based Wind Turbine Generator

This paper examines wind energy as a form of renewable energy. Wind speed $V_w$ (in m/s) affects the performance of wind energy systems. Figure 5a shows the noise-based wind speed model, which is used to simulate the unstable wind speed of a real environment. Based on the noise-based wind speed, the mechanical power output of the wind energy system can be calculated according to Equation (12):

$$P_w = \frac{1}{2} \rho A_r C_p V_w^3 \tag{12}$$

where $\rho$ (kg/m³) is the air density and $A_r$ (m²) is the blade swept area. The rotor power coefficient $C_p$ is expressed by Equation (13):

$$C_p = c_1 \left(\frac{c_2}{\lambda_I} - c_3 \beta - c_4\right) e^{-c_5 / \lambda_I} + c_6 \lambda, \qquad \frac{1}{\lambda_I} = \frac{1}{\lambda + 0.08\beta} - \frac{0.035}{\beta^3 + 1} \tag{13}$$

Figure 5b illustrates the relationship between $C_p$, $\lambda$, and $\beta$, and highlights the maximum value of $C_p$. Additionally, Table 1 presents the pertinent parameter values for the wind turbine power system.
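As a concrete illustration of Equations (12) and (13), the following Python sketch evaluates the power coefficient and the mechanical power using the Table 1 values; the function and variable names are ours for illustration, not taken from the authors' implementation.

```python
import numpy as np

# Coefficients c1..c6 and physical constants taken from Table 1.
c1, c2, c3, c4, c5, c6 = 0.5176, 116, 0.4, 5, 21, 0.0068
rho, A_r = 1.225, 1684.0     # air density (kg/m^3), blade swept area (m^2)

def rotor_power_coefficient(lam, beta):
    """C_p from Equation (13)."""
    lam_i_inv = 1.0 / (lam + 0.08 * beta) - 0.035 / (beta**3 + 1.0)
    return c1 * (c2 * lam_i_inv - c3 * beta - c4) * np.exp(-c5 * lam_i_inv) + c6 * lam

def wind_power(v_w, lam=8.68, beta=1.0):
    """Mechanical power output P_w from Equation (12), in watts."""
    return 0.5 * rho * A_r * rotor_power_coefficient(lam, beta) * v_w**3

print(wind_power(15.0))  # power at the nominal wind speed of Table 1
```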

2.5. Noise-Based Photovoltaic System

This paper considers both wind and photovoltaic (PV) systems as renewable energy sources. In the noise-based PV system model, the power variation accounts for both uniform and non-uniform insolation, as illustrated in Figure 6. As stated in Ref. [14], the PV power deviation expressed by Equation (14) reproduces a realistic pattern of PV variation:

$$\Delta P_{Solar} = 0.6 \sqrt{P_{Solar}} \tag{14}$$

3. LADRC Design Process for LFC

Generally, a control strategy must be employed to preserve the stability of $\Delta f_i$ and $\Delta P_{tie,i}$ and to mitigate the impact of load disturbances. Since this paper neglects the market mechanism, no frequency deviation or tie-line power deviation should remain at steady state. Moreover, Equation (7) shows that each area is affected not only by the load disturbance $\Delta P_{d,i}$ but also by the frequency deviations of other areas. In essence, LFC must suppress load disturbances and counteract inter-area coupling. Additionally, the renewable energy output power deviation $\Delta P_{re,i}$ can also be regarded as a disturbance to the load frequency system. Theoretically speaking, then, LFC for a power system incorporating renewable energy is more challenging.
LADRC is a model-free controller that can effectively overcome the impact of unknown disturbances. Additionally, it can treat coupling information as unknown disturbances, enabling it to achieve decoupling. Therefore, this paper adopts LADRC for the LFC of multi-source power systems.

3.1. System Order Analysis

While LADRC has model-free characteristics, obtaining the system order is essential for designing the LESO. For convenience, $d = \Delta P_{re,i} - \Delta P_{d,i}$ is defined as the disturbance. Taking the Laplace transform of Equation (7) then yields

$$Y(s) = G(s) U(s) + G_d(s) D(s) \tag{15}$$

where $Y(s)$, $U(s)$, and $D(s)$ are the Laplace transforms of $\mathrm{ACE}_i(t)$, $u_i(t)$, and the disturbance $d(t)$, respectively. We then obtain

$$G(s) = \frac{R_{i1} R_{i2} R_{i3} B_i \left(G_{th} + G_{hp} + G_{gp}\right) G_{p,i}}{R_{i1} R_{i2} R_{i3} + \left(R_{i2} R_{i3} G_{th} + R_{i1} R_{i3} G_{hp} + R_{i1} R_{i2} G_{gp}\right) G_{p,i}} \tag{16}$$

As a result, combining Equations (9)–(11) and (16), the order of the system in Equation (15) is found to be 2. That is, Equation (7) can be represented as

$$\ddot{\mathrm{ACE}}_i = f_1\left(\mathrm{ACE}_i, \Delta P_{d,i}, \Delta P_{re,i}\right) + b u_i \tag{17}$$

where $f_1$ denotes the total disturbance, containing the internal modeling information, the load disturbance, and the renewable energy output power deviation; $b$ is a constant coefficient. Furthermore, an adjustable parameter $b_0$ can be used in place of $b$:

$$\ddot{\mathrm{ACE}}_i = f_1 + \left(b - b_0\right) u_i + b_0 u_i = f + b_0 u_i \tag{18}$$

Hence, the variable $f = f_1 + (b - b_0) u_i$ represents the total disturbance in the final formulation.

3.2. Disturbance Estimation and Elimination for LADRC

In Section 3.1, the multi-source power system under investigation is transformed into a second-order system subject to an unknown disturbance. Consequently, this paper uses the LADRC controller to eliminate the unknown disturbance and realize load frequency control. First, an LESO is used to estimate the disturbance.
For the system in Equation (18), the states can be defined as $x_1 = \mathrm{ACE}_i$, $x_2 = \dot{\mathrm{ACE}}_i$, and $x_3 = f$, which yields the state-space equations

$$\begin{cases} \dot{x}_1 = x_2 \\ \dot{x}_2 = x_3 + b_0 u_i \\ \dot{x}_3 = \dot{f} \end{cases} \tag{19}$$

Suppose that $z_1$, $z_2$, and $z_3$ are the observed values of $x_1$, $x_2$, and $x_3$, respectively. The full-order LESO shown in Equation (20) is designed to estimate these states [45]:

$$\begin{cases} \dot{z}_1 = z_2 - l_1\left(z_1 - x_1\right) \\ \dot{z}_2 = z_3 + b_0 u_i - l_2\left(z_1 - x_1\right) \\ \dot{z}_3 = -l_3\left(z_1 - x_1\right) \end{cases} \tag{20}$$

where $l_1$, $l_2$, and $l_3$ are observer gains. Denote the observer error vector as $e = \left[e_1, e_2, e_3\right]^T$ with $e_1 = x_1 - z_1$, $e_2 = x_2 - z_2$, and $e_3 = x_3 - z_3$; then

$$\dot{e} = A e + E h \tag{21}$$

where $A = \begin{bmatrix} -l_1 & 1 & 0 \\ -l_2 & 0 & 1 \\ -l_3 & 0 & 0 \end{bmatrix}$, $E = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T$, and $h = \dot{f}$. The observer gains therefore determine the stability of the observer errors. Letting $\lambda_A$ denote an eigenvalue of $A$, $\det\left(\lambda_A I - A\right) = 0$ results in

$$\lambda_A^3 + l_1 \lambda_A^2 + l_2 \lambda_A + l_3 = 0 \tag{22}$$

All three poles are then placed at $-\omega_o$ ($\omega_o > 0$), i.e., $\left(\lambda_A + \omega_o\right)^3 = 0$, which guarantees the convergence of the observer errors and gives $l_1 = 3\omega_o$, $l_2 = 3\omega_o^2$, and $l_3 = \omega_o^3$. In other words, the pole placement design reduces the observer's adjustable parameters to a single one [46]. When $\omega_o$ is chosen appropriately, $z_3 \approx f$; the control law in Equation (23) is then designed to eliminate the disturbance, where $y_r$ is the target value of $\mathrm{ACE}_i$, i.e., $y_r = 0$:

$$u_i = \frac{k_p \left(y_r - z_1\right) - k_d z_2 - z_3}{b_0} \tag{23}$$

Substituting Equation (23) into (18) shows that the unknown disturbance $f$ is compensated by $z_3$. Similarly, the controller poles are placed at $-\omega_c$ ($\omega_c > 0$) through pole placement, giving $k_p = \omega_c^2$ and $k_d = 2\omega_c$. In summary, LADRC requires only the system's order to design an LESO that estimates the unknown disturbance; the control law then cancels the estimated disturbance, enabling LFC. The LADRC presented in this paper thus has three parameters: the observer parameter $\omega_o$, the control-law parameter $\omega_c$, and the gain $b_0$. The structure diagram of LADRC is shown in Figure 7.
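To make the design concrete, the following Python sketch simulates one LADRC update step using Equations (20)–(23) under a forward-Euler discretization; the discretization, step size, and class layout are our assumptions rather than the authors' implementation.

```python
import numpy as np

class LADRC:
    """Second-order LADRC: LESO (Equation (20)) plus control law (Equation (23))."""
    def __init__(self, wo, wc, b0, dt=0.01):
        self.l1, self.l2, self.l3 = 3 * wo, 3 * wo**2, wo**3   # observer gains
        self.kp, self.kd = wc**2, 2 * wc                        # control-law gains
        self.b0, self.dt = b0, dt
        self.z = np.zeros(3)                                    # LESO states z1, z2, z3
        self.u = 0.0

    def step(self, y, yr=0.0):
        """One update: y is the measured ACE_i, yr its target value (0)."""
        e = self.z[0] - y
        dz = np.array([self.z[1] - self.l1 * e,
                       self.z[2] + self.b0 * self.u - self.l2 * e,
                       -self.l3 * e])
        self.z += self.dt * dz                                  # forward-Euler LESO update
        # Equation (23): PD action on the estimates, minus the disturbance estimate z3
        self.u = (self.kp * (yr - self.z[0]) - self.kd * self.z[1] - self.z[2]) / self.b0
        return self.u
```

In this structure, raising $\omega_o$ speeds up disturbance estimation at the cost of noise sensitivity, while $\omega_c$ sets the closed-loop bandwidth; this is exactly the trade-off the LTD3 agent of Section 4 adjusts online.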

4. Adaptive LTD3-LADRC Approach

The performance of a controller is directly impacted by its parameters, so it is crucial to select and optimize them carefully to ensure stability and improve robustness. As the number of areas in a power system increases, parameter tuning becomes more challenging and time-consuming, often requiring advanced optimization techniques and simulations to achieve satisfactory results. Adaptive tuning of controller parameters is a sequential decision-making process in which the parameters are selected in a time-ordered manner. A suitable representation of this problem is a Markov decision process (MDP), denoted $M = \{S, A, P, R\}$, where $S$ is the state set, $A$ the action space, $P$ the state transition probability, and $R$ the reward.
RL is an effective means to deal with sequential decision-making problems, while DRL combines the computing power of deep neural networks based on RL. However, the training outcomes heavily rely on the reward function’s design, and there is no unified and efficient design rule yet. A Lyapunov-based reward function is provided for LFC problems based on the TD3 algorithm in this paper, namely the LTD3 algorithm. This approach enables the adaptive optimization of LADRC controller parameters.

4.1. The Basics of TD3

TD3 is an improvement of DDPG that addresses the value-overestimation problem of DDPG [47]. Consistent with the general idea of reinforcement learning, the TD3 algorithm is based on the interplay between the environment and the agent, using the reward received during the interaction to guide the agent toward the optimal strategy. The optimal policy is usually obtained by maximizing the following state–action value function:

$$Q(s, a) = \mathbb{E}\left[R_c \mid S_t = s, A_t = a\right] \tag{24}$$

where $R_c = \sum_{t=0}^{\infty} \gamma^t r_{t+1}$ is the cumulative reward, $r \in R$ is the instantaneous reward, and the discount factor $\gamma$ quantifies the significance of future rewards relative to the present moment.
Figure 8 shows the frame diagram of the TD3 algorithm, encompassing a replay buffer and six neural networks. The critic and target critic networks share the same structure, as do the actor and target actor networks. The replay buffer stores the data required for training the neural networks. The six networks are critic network 1 $Q_{\theta_1}(s, a)$, critic network 2 $Q_{\theta_2}(s, a)$, the actor network $\pi_{\phi}(s)$, target critic network 1 $Q_{\theta_1'}(s, a)$, target critic network 2 $Q_{\theta_2'}(s, a)$, and the target actor network $\pi_{\phi'}(s)$. Regarding this notation, taking $Q_{\theta_1}(s, a)$ as an instance, the network input includes $s$ and $a$, the network weights are $\theta_1$, and the network output is $Q$. The network diagram in Figure 8 illustrates the input and output variables. Notably, the critic networks approximate the state–action value $Q$, while the actor network generates the action $a$ from the given state $s$.
In addition, compared with the DDPG algorithm, TD3 adds target critic networks and leverages the double deep Q-network (DDQN) idea, thus alleviating Q-value overestimation to a certain extent. That is, to bring the critic networks' Q values closer to the true values, the estimate of the next-moment state–action value uses the smaller of the two target critic outputs. This yields the target value

$$y = r + \gamma \min_{i=1,2} Q_{\theta_i'}\left(s', a'\right) \tag{25}$$

where $s'$ is the state at the next time step, and the next action $a'$ is obtained from the target actor output: $a' = \pi_{\phi'}\left(s'\right) + \epsilon$, with the noise satisfying $\epsilon \sim \operatorname{clip}\left(\mathcal{N}(0, \sigma), -c, c\right)$. The TD errors are then

$$L_{ci} = \left(y - Q_{\theta_i}(s, a)\right)^2, \quad i = 1, 2 \tag{26}$$

On the basis of Equation (26), $\theta_1$ and $\theta_2$ are updated by gradient descent. The actor network is updated once every $\kappa$ critic updates, with the objective that the action $a = \pi_{\phi}(s)$ generated by the actor maximizes $Q_{\theta_1}(s, a)$.
The target networks are updated with an exponential moving average:

$$\theta_i' \leftarrow \tau \theta_i + (1 - \tau)\theta_i', \qquad \phi' \leftarrow \tau \phi + (1 - \tau)\phi' \tag{27}$$

where $\tau$ is a smoothing factor. The pseudocode of the TD3 algorithm is given in Algorithm 1, and a hedged code sketch of its core updates follows the algorithm.
Algorithm 1 TD3 Algorithm
  • Initialize critic networks $Q_{\theta_1}(s, a)$, $Q_{\theta_2}(s, a)$ and actor network $\pi_{\phi}(s)$ with random parameters $\theta_1$, $\theta_2$, and $\phi$.
  • Initialize target networks $Q_{\theta_1'}(s, a)$, $Q_{\theta_2'}(s, a)$, and $\pi_{\phi'}(s)$ with weights $\theta_1' \leftarrow \theta_1$, $\theta_2' \leftarrow \theta_2$, and $\phi' \leftarrow \phi$.
  • Initialize replay buffer $D$.
  • for $t = 1$ to $T$ do
  •  Select action with exploration noise $a \sim \pi_{\phi}(s) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$; observe reward $r$ and new state $s'$.
  •  Store the transition tuple $(s, a, r, s')$ in $D$.
  •  Sample a mini-batch of $m$ transitions $(s, a, r, s')$ from $D$.
  •  Obtain the next action from the target actor network: $a' \leftarrow \pi_{\phi'}(s') + \epsilon$, with $\epsilon \sim \operatorname{clip}(\mathcal{N}(0, \sigma), -c, c)$.
  •  Calculate the target value by Equation (25): $y \leftarrow r + \gamma \min_{i=1,2} Q_{\theta_i'}(s', a')$.
  •  Update the critics via the TD errors: $\theta_i \leftarrow \arg\min_{\theta_i} \frac{1}{m} \sum \left(y - Q_{\theta_i}(s, a)\right)^2$.
  •  if $t \bmod \kappa = 0$ then
  •   Update $\phi$ by the deterministic policy gradient: $\nabla_{\phi} J(\phi) = \frac{1}{m} \sum \nabla_a Q_{\theta_1}(s, a)\big|_{a = \pi_{\phi}(s)} \nabla_{\phi} \pi_{\phi}(s)$.
  •   Update the target networks by the moving-average method: $\theta_i' \leftarrow \tau \theta_i + (1 - \tau)\theta_i'$, $\phi' \leftarrow \tau \phi + (1 - \tau)\phi'$.
  •  end if
  • end for
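The sketch below illustrates the clipped double-Q target of Equation (25) and the soft update of Equation (27) in PyTorch; the network objects, the hyperparameter values, and the function boundaries are placeholders we chose for illustration, not the authors' implementation.

```python
import torch

gamma, tau, sigma, c = 0.99, 0.005, 0.2, 0.5   # assumed hyperparameters

def td3_target(r, s_next, actor_t, critic1_t, critic2_t):
    """Target value y of Equation (25) with target policy smoothing."""
    with torch.no_grad():
        a_next = actor_t(s_next)
        noise = torch.clamp(torch.randn_like(a_next) * sigma, -c, c)
        a_next = a_next + noise                        # clipped smoothing noise
        q_next = torch.min(critic1_t(s_next, a_next),
                           critic2_t(s_next, a_next))  # clipped double-Q
        return r + gamma * q_next

def soft_update(target_net, net):
    """Exponential moving average of Equation (27)."""
    for p_t, p in zip(target_net.parameters(), net.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```

In practice, the smoothed action would also be clamped to the admissible action range of Equation (37) before being fed to the target critics.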

4.2. Lyapunov-Based Reward Function for TD3

The reward function design significantly impacts the effectiveness of DRL. It serves as the feedback signal given by the environment, guiding the agent to learn the correct behavior. The reward function determines what the agent should strive to achieve and what actions should be avoided. A well-designed reward function can help the agent learn the optimal policy faster and more accurately. At the same time, a poorly designed one can lead to incorrect behavior or slow learning.
In this paper, a Lyapunov-based [48] reward function [49] for TD3 is proposed as

$$R_{lyap} = -L(s, a) + \lambda \left(L(s, a) - \gamma L\left(s', a'\right)\right) \tag{28}$$

where $\lambda$ is a weight; when $\lambda = 0$, the Lyapunov reward function reduces to the ordinary reward function. $L(s, a) = -R(s, a)$ is a Lyapunov function with a decreasing property, where $R(s, a)$ denotes the feedback received when action $a$ is taken in state $s$. The following analysis can then be carried out.
Assumption 1.
The discount factor is close to 1, that is, $\gamma \approx 1$.
Assumption 2.
There exists a state–action pair $\left(s^*, a^*\right)$ that maximizes the reward value, denoted as $R\left(s^*, a^*\right) = \max_{s, a} R(s, a)$.
Theorem 1.
Under the premise of Assumptions 1 and 2, if for all $(s, a)$ the following inequality holds:

$$L\left(s', a'\right) \leq L(s, a) \tag{29}$$

then the state–action pair $(s, a)$ converges to the optimum $\left(s^*, a^*\right)$.
Proof of Theorem 1.
Under Assumption 1, Equation (28) can be rearranged as

$$R_{lyap} = -(1 - \lambda) L(s, a) - \lambda L\left(s', a'\right) \tag{30}$$

Combining Equations (29) and (30) leads to

$$R_{lyap} \geq -L(s, a) \tag{31}$$

Moreover, since $L(s, a) = -R(s, a)$, Equation (30) further yields the inequality

$$R_{lyap} = (1 - \lambda) R(s, a) + \lambda R\left(s', a'\right) \geq R(s, a) \tag{32}$$

Because the Lyapunov function $L(s, a)$ exhibits the descent property [50], and a maximal reward $R\left(s^*, a^*\right)$ exists (Assumption 2), $L(s, a)$ is bounded below by $L(s, a) \geq -R\left(s^*, a^*\right)$. This establishes the convergence of $L(s, a)$. Since $L(s, a) = -R(s, a)$, the convergence of $L(s, a)$ to $L\left(s^*, a^*\right)$ is equivalent to $R(s, a)$ converging to $R\left(s^*, a^*\right)$. As a result, the state–action pair $(s, a)$ converges to the optimum $\left(s^*, a^*\right)$. □
Remark 1.
The conventional reward function is formulated on the premise that the reward value increases as the state quantity approaches the desired value, as illustrated in Ref. [51]. However, this reward function lacks a solid theoretical foundation. In contrast, the reward function presented in this paper incorporates the Lyapunov function, which not only provides a theoretical basis but also exhibits convergence, as validated by Theorem 1.

4.3. Environment and Agent Settings for Multi-Source Power System

As mentioned, DRL algorithms can continuously learn to find the best policy. In this paper, the environment comprises the multi-source power system and the corresponding LADRC, while the agent refers to the decision maker in the TD3 algorithm. Before training, it is necessary to clarify state and action space composition and establish a reward function based on the corresponding state variables.
This paper takes the action space to be the parameters to be adjusted in LADRC. According to Section 3, each area contains an LADRC, and each controller has three parameters: $\omega_o$, $\omega_c$, and $b_0$. Only the tuning of $\omega_o$ and $\omega_c$ is considered here, because Ref. [52] notes that $b_0$ has little influence on disturbance suppression. Since the number of controller parameters grows with the number of areas, the action space is defined as

$$\left\{a_{11}, a_{12}, \ldots, a_{n1}, a_{n2} \in A \mid a_{11} = \omega_{o1},\ a_{12} = \omega_{c1},\ \ldots,\ a_{n1} = \omega_{on},\ a_{n2} = \omega_{cn}\right\} \tag{33}$$

where $n$ is the number of areas. The detected states in each area are $\mathrm{ACE}(t)$ and $\mathrm{ACE}(t) - \mathrm{ACE}(t-1)$, which correspond to the error and its derivative $\dot{\mathrm{ACE}}(t)$. Therefore, the state space is defined as

$$\left\{s_{11}, s_{12}, \ldots, s_{n1}, s_{n2} \in S \mid s_{11} = \mathrm{ACE}_1,\ s_{12} = \dot{\mathrm{ACE}}_1,\ \ldots,\ s_{n1} = \mathrm{ACE}_n,\ s_{n2} = \dot{\mathrm{ACE}}_n\right\} \tag{34}$$

Based on the observed state, we expect the area control error to converge quickly and smoothly to the desired value of 0. The reward function therefore needs to satisfy the following two points:
(1) $\mathrm{ACE}_i$ should be as close to 0 as possible; the smaller the area control error, the larger the reward value should be.
(2) The overshoot and undershoot of the system response should be as slight as possible.
Therefore, the reward feedback $R_t$ based on the state variables is designed as

$$R_t = -2 \sum_{i=1}^{n} \left[100\, \mathrm{ACE}_i(t)^2 + 10\, \dot{\mathrm{ACE}}_i(t)^2\right] \tag{35}$$

where the coefficients 100 and 10 keep the magnitudes of the $\mathrm{ACE}_i$ and $\dot{\mathrm{ACE}}_i$ terms consistent, preventing the agent from being overly influenced by one particular state during training. Moreover, $R_t$ has a maximum value, thus satisfying Assumption 2.
Then, according to Equations (28) and (35), the Lyapunov-based reward function can be written as

$$R_{lyap}(t) = R_t + \lambda \left(\gamma R_{t+1} - R_t\right) \tag{36}$$

For a power system with $n$ areas, the structure diagram of the proposed LTD3-LADRC is shown in Figure 9, and a small code sketch of the reward computation follows below. The purpose of LTD3 is to obtain the adaptive controller parameters in Equation (33) by observing the real-time states in Equation (34); feeding these parameters into the controller then produces a new state vector. Continuously repeating this process achieves effective load frequency control.
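The following Python sketch implements Equations (35) and (36) for an $n$-area system; the array layout and the hyperparameter values ($\gamma = 0.99$, $\lambda = 0.3$, matching Section 5) are assumptions chosen for illustration.

```python
import numpy as np

gamma, lam = 0.99, 0.3   # discount factor and Lyapunov shaping weight

def step_reward(ace, ace_dot):
    """R_t of Equation (35); ace and ace_dot are length-n arrays of ACE_i and its derivative."""
    ace, ace_dot = np.asarray(ace), np.asarray(ace_dot)
    return -2.0 * np.sum(100.0 * ace**2 + 10.0 * ace_dot**2)

def lyapunov_reward(r_t, r_next):
    """R_lyap(t) of Equation (36), shaped with the next-step reward."""
    return r_t + lam * (gamma * r_next - r_t)

# Example: two areas with shrinking residual errors give an increasing shaped reward.
r_t = step_reward([0.002, -0.001], [0.01, -0.02])
r_next = step_reward([0.001, -0.0005], [0.005, -0.01])
print(lyapunov_reward(r_t, r_next))
```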

5. Simulation Verification and Analysis

In this paper, the simulation is performed on the MATLAB/Simulink R2021b platform. Please refer to Appendix A for the studied model [20] and its corresponding parameters. Two LADRC controllers are needed, and LTD3 is used to optimize the parameters $\omega_{o1}$, $\omega_{c1}$, $\omega_{o2}$, and $\omega_{c2}$. Additionally, Figure 10 illustrates the architecture of the critic and actor networks described in Algorithm 1; the number of neurons in each hidden layer and the activation functions (e.g., ReLU) are indicated within the dashed box.
Moreover, the hyperparameters of the LTD3 algorithm are chosen as $\gamma = 0.99$, $\lambda = 0.3$, and $\tau = 0.005$. To showcase the efficacy of the proposed LTD3 algorithm, we conducted experiments on the LADRC load frequency control system, applying the TD3 algorithm with the reward function in Equation (35) and the LTD3 algorithm with the reward function in Equation (36). Throughout training, one episode represents the execution of one loop of Algorithm 1 with a time interval of 0.01 s and a total simulation time of 15 s. For the control system, $b_{01}$ and $b_{02}$ are both set to 20, and the action space for DRL is

$$\omega_{oi} \in [5, 15], \quad \omega_{ci} \in [1, 6], \quad i = 1, 2 \tag{37}$$
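Actor networks typically emit actions in a normalized range (e.g., via a tanh output layer); mapping them onto the bounds of Equation (37) can be done with an affine rescaling, as in the hypothetical sketch below. The normalization scheme is our assumption, not a detail given in the paper.

```python
import numpy as np

BOUNDS = {"wo": (5.0, 15.0), "wc": (1.0, 6.0)}   # parameter ranges of Equation (37)

def scale_action(a_norm, low, high):
    """Map a tanh-squashed output in [-1, 1] onto [low, high]."""
    return low + 0.5 * (np.clip(a_norm, -1.0, 1.0) + 1.0) * (high - low)

a = np.array([0.2, -0.5, -0.1, 0.8])             # actor output for the two areas
wo1, wc1 = scale_action(a[0], *BOUNDS["wo"]), scale_action(a[1], *BOUNDS["wc"])
wo2, wc2 = scale_action(a[2], *BOUNDS["wo"]), scale_action(a[3], *BOUNDS["wc"])
```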
Figure 11 illustrates the progression of the episode reward value for TD3 and LTD3 throughout the training process. The graph reveals that TD3’s reward value approaches stability around 500 episodes, whereas LTD3 achieves stability in approximately 90 episodes. This observation indicates that LTD3 exhibits faster convergence in its training.
To enhance the verification of the agent trained above, simulations are conducted under the following three scenarios:
  • Scenario 1: System response performance without RESs;
  • Scenario 2: System response performance with wind turbines and photovoltaic systems;
  • Scenario 3: System response performance under system parameter variations.
The difference between Scenario 1 and Scenario 2 lies in the variation of the model environment. Scenario 1 involves the model without renewable energy sources, whereas Scenario 2 incorporates wind and solar energy. Scenario 3 simulates parameter perturbation problems that may occur in the actual environment, essentially to verify the robustness of the LTD3-LADRC method.

5.1. Scenario 1: System Response Performance without RESs

Initially, we analyze the scenario that excludes wind turbines and photovoltaic systems. This scenario is examined in the following two cases.
Case A: System considers an SLP
In this case, a 0.01 p.u. step load perturbation (SLP) is introduced in Area I at 5 s, i.e., $\Delta P_{d1} = 0.01$ p.u. Figure 12 and Figure 13 present the simulation results, including the response curves of several published methods, namely integral control (IC) [44], FOPID [43], and the ID-T controller tuned by the Archimedes optimization algorithm (AOA) [20], together with the curve generated by LADRC-TD3 for comparison. Furthermore, Table 2 provides a numerical comparison of performance indicators, including undershoot, overshoot, settling time, and the integral absolute error (IAE) defined in Equation (38):

$$\mathrm{IAE} = \int_0^T |e(t)| \, dt \tag{38}$$

Specifically, Figure 12 depicts the dynamic response of the frequency deviations in Area I and Area II, along with the tie-line exchanged power, under the influence of the SLP. The system exhibits a transient undershoot in response to the disturbance. Table 2 further indicates that the LADRC controllers based on the TD3 and LTD3 algorithms achieve smaller undershoot than the other methods. It is worth noting that LTD3 performs slightly worse than TD3 in terms of $\Delta P_{tie}$, as indicated in Table 2. This occurs because $\mathrm{ACE}_i$ is a linear combination of $\Delta f_i$ and $\Delta P_{tie,i}$, as shown in Equation (1). The algorithm aims to minimize $\mathrm{ACE}_i$, treating a small $\mathrm{ACE}_i$ as ideal. However, since $\Delta f_i$ generally has a larger magnitude than $\Delta P_{tie,i}$, it exerts a greater influence on $\mathrm{ACE}_i$. Consequently, LTD3 achieves a better $\Delta f_i$ response, while the $\Delta P_{tie,i}$ responses of the two algorithms do not differ significantly. The difference is negligible, and in terms of frequency deviation, LTD3 still outperforms TD3. The evolution of the controller parameters in the LADRC-TD3 and LADRC-LTD3 methods, obtained from the trained TD3 and LTD3 agents, respectively, is presented in Figure 13. LTD3 and TD3 provide different controller parameter strategies, and the performance results in Figure 12 show that the strategy obtained from the proposed LTD3 is better.
Case B: System considers a multi-step load perturbation (MSLP)
In this case, the multi-step load perturbation (MSLP) shown in Figure 14 is added to Area I; that is, multiple step load perturbations occur throughout the simulation. The dynamic responses of I-TD [20], ID-T [20], LADRC-TD3, and the proposed LADRC-LTD3 are shown in Figure 15 and Figure 16. All four control methods attain steady-state responses under the various step disturbances. By comparison, LADRC-TD3 and LADRC-LTD3 exhibit significantly less oscillation during the transient response than the other two methods. Furthermore, Table 3 provides a numerical comparison of the IAE and ITAE (integral of time multiplied by absolute error, Equation (39)) values for the $\Delta f_1$, $\Delta f_2$, and $\Delta P_{tie}$ responses of the four control methods. The proposed method achieves the smallest IAE and ITAE values, suggesting that it outperforms the other three methods in control accuracy and convergence speed. Figure 16 also displays the adaptive parameters of TD3 and LTD3, obtained by observing the system state during the simulation.

$$\mathrm{ITAE} = \int_0^T t\, |e(t)| \, dt \tag{39}$$
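For reference, the two indices of Equations (38) and (39) can be approximated from a sampled error signal as in the short sketch below; trapezoidal integration and the toy error trace are our choices for illustration.

```python
import numpy as np

def iae(t, e):
    """Integral absolute error, Equation (38)."""
    return np.trapz(np.abs(e), t)

def itae(t, e):
    """Integral of time multiplied by absolute error, Equation (39)."""
    return np.trapz(t * np.abs(e), t)

t = np.arange(0.0, 15.0, 0.01)          # matches the 15 s / 0.01 s simulation setup
e = 0.01 * np.exp(-t) * np.sin(5 * t)   # toy error trace, not a simulated response
print(iae(t, e), itae(t, e))
```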

5.2. Scenario 2: System Response Performance with RESs

In this scenario, the noise-based wind turbine illustrated in Figure 5 and the photovoltaic system depicted in Figure 6 are added to the two-area interconnected power system described earlier. With the white noise, the deviation outputs of wind and solar energy fluctuate, as illustrated in Figure 17. To further evaluate the effectiveness of the proposed method in regulating LFC with RESs, the MSLP shown in Figure 18 is added to Area I. The wind turbine is connected at 100 s, and the photovoltaic system at 250 s.
The simulation results are depicted in Figure 19 and Figure 20. It is noticeable that, in the presence of renewable energy, the system experiences oscillations, leading to a more significant response overshoot than the load frequency system without renewable energy. However, the proposed LADRC-LTD3 method effectively suppresses the oscillations and promptly restores the system to a stable state. Table 4 further displays the performance results of the four methods regarding IAE and ITAE, indicating that the proposed LADRC-LTD3 outperforms the other three methods significantly. Additionally, Figure 20 illustrates the parameters obtained by the TD3 agent and LTD3 agent. The selection of parameters varies significantly when a step disturbance occurs. In addition, white noise also impacts the selection of parameters.

5.3. Scenario 3: System Response Performance Considering System Parameter Variations

When evaluating controller performance, it is crucial to consider both the dynamic response of the system and the controller’s robustness against variations in system parameters. Therefore, the Monte Carlo method is adopted for the robustness test.
Based on Scenario 2, Monte Carlo simulations are conducted for 50 runs with the model parameters $T_{sg}$, $T_t$, $T_{gh}$, $X_g$, and $Y_g$ varying within $\pm 50\%$ of their nominal values. Figure 21 displays the simulation results under this uncertainty. The results demonstrate that the proposed method retains its ability to regulate load disturbances and renewable energy and to achieve a stable response even when the model parameters are modified, indicating strong robustness.
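A minimal sketch of this robustness test is given below: each run draws the five time constants uniformly within ±50% of the nominal values in Table A1, and `simulate` stands in for the Simulink plant model; both the sampling scheme and that function are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
nominal = {"T_sg": 0.06, "T_t": 0.3, "T_gh": 0.2, "X_g": 0.6, "Y_g": 1.1}

responses = []
for run in range(50):                               # 50 Monte Carlo runs
    perturbed = {k: v * rng.uniform(0.5, 1.5) for k, v in nominal.items()}
    # responses.append(simulate(perturbed))         # hypothetical call to the plant model
```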

6. Conclusions

In conclusion, this paper proposes a Lyapunov reward-based TD3 algorithm, denoted LTD3, to optimize the LADRC method for LFC in multi-source power systems with RESs. The effectiveness of LADRC-LTD3 is validated through simulations on a nonlinear two-area interconnected power system that includes thermal, hydro, and gas power plants, as well as wind turbine and photovoltaic systems. The proposed LADRC-LTD3 method is compared with existing methods, and the results show that it effectively addresses the LFC problem in the presence of renewable energy and load disturbances. Furthermore, the proposed method is robust and maintains system stability even under parameter perturbations.
This paper employs the Lyapunov reward function to enhance the convergence speed of the algorithm, and the effectiveness of the proposed LTD3 algorithm is validated through simulations. Further theoretical analysis of the LTD3 algorithm is needed in future work to improve its robustness and performance. Moreover, the current study does not include the load side in the load frequency control framework; in forthcoming research, we intend to broaden our model and validate our methodology in this setting.

Author Contributions

Conceptualization, Y.Z. and Z.C.; methodology, J.T.; software, Y.Z.; validation, Y.Z. and Q.S.; formal analysis, Y.Z.; investigation, H.S.; resources, H.S.; data curation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, J.T. and M.S.; supervision, Q.S. and J.T.; project administration, J.T.; funding acquisition, J.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61973172, 61973175, 62003175, 62003177, and 62073177) and Key Technologies Research and Development Program of Tianjin (Grant No. 19JCZDJC32800).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing not applicable.

Conflicts of Interest

The authors declare that they do not have any conflicts of interest.

Nomenclature

$\mathrm{ACE}_i$  Area control error (p.u.)
$\Delta f_i$  Frequency deviation (Hz)
$\Delta P_{tie,i}$  Tie-line exchanged power (p.u.)
$B_i$  Frequency bias constant (MW/Hz)
$R_i$  Speed regulation constant (Hz/MW)
$\Delta P_{d,i}$  Load disturbance (p.u.)
$\Delta P_{tp,i}$, $\Delta P_{hp,i}$, $\Delta P_{gp,i}$, $\Delta P_{re,i}$  Power output deviations of the thermal plant, hydro plant, gas plant, and renewable energy (p.u.)
$T_{ij}$  Synchronization time constant (s)
$T_g$, $T_t$, $T_r$  Time constants in the thermal plant (s)
$K_T$, $K_H$, $K_G$  Proportional constants
$K_r$  Reheat gain
$T_{gh}$, $T_{rs}$, $T_{rh}$, $T_w$  Time constants in the hydro plant (s)
$B_g$, $C_g$, $X_g$, $Y_g$, $T_f$, $T_{cr}$, $T_{cd}$  Time constants in the gas plant (s)
$H$, $D$  Time constants in the generator (s)
$V_w$  Wind speed (m/s)
$\rho$  Air density (kg/m³)
$A_r$  Blade swept area (m²)
$C_p$  Rotor power coefficient
$\lambda$, $\beta$  Tip-speed ratio (rpm) and pitch angle (°)
$\omega_o$  Observer gain
$\omega_c$  Controller pole
$\tau$, $\gamma$  Smoothing factor and discount factor
$\theta$ ($\theta'$)  Network weights of the (target) critic network
$\phi$ ($\phi'$)  Network weights of the (target) actor network
$R_{lyap}$, $L(s, a)$  Lyapunov-based reward function and Lyapunov function

Appendix A

The simulation presented in this paper is conducted on the MATLAB/Simulink platform. The model is explained in detail in Section 2. This Appendix shows the overall framework of the studied model in Figure A1, along with its corresponding parameters in Table A1.
Table A1. Symbolic description of the studied power system.
Thermal power plant: $T_g = 0.06$, $T_t = 0.3$, $T_r = 10.2$, $K_r = 0.3$, $K_T = 0.5747$
Hydropower plant: $T_{gh} = 0.2$, $T_{rs} = 4.9$, $T_{rh} = 28.749$, $T_w = 1.1$, $K_H = 0.2873$
Gas power plant: $B_g = 0.049$, $C_g = 1$, $X_g = 0.6$, $Y_g = 1.1$, $T_{cr} = 0.01$, $T_f = 0.239$, $T_{cd} = 0.2$, $K_G = 0.138$
Generator: $H = 0.5 \times 11.49 / 68.9655$, $D = 1 / 68.9655$
Others: $R_1 = R_2 = R_3 = 2.4$, $B_1 = B_2 = 0.4312$, $T_{12} = 0.0433$
Figure A1. Simulink model of the studied power system.

References

  1. Shayeghi, H.; Shayanfar, H.A.; Jalili, A. Load frequency control strategies: A state-of-the-art survey for the researcher. Energy Convers. Manag. 2009, 50, 344–353. [Google Scholar] [CrossRef]
  2. Abou El-Ela, A.A.; El-Sehiemy, R.A.; Shaheen, A.M.; Diab, A.E.G. Design of cascaded controller based on coyote optimizer for load frequency control in multi-area power systems with renewable sources. Control Eng. Pract. 2022, 121, 05058. [Google Scholar] [CrossRef]
  3. Sönmez, S.; Ayasun, S. Stability region in the parameter space of PI controller for a single-area load frequency control system with time delay. IEEE Trans. Power Syst. 2015, 31, 829–830. [Google Scholar] [CrossRef]
  4. Mohamed, T.H.; Shabib, G.; Abdelhameed, E.H.; Khamies, M.; Qudaih, Y. Load frequency control in a single area system using model predictive control and linear quadratic Gaussian techniques. J. Electr. Eng. Technol. 2015, 3, 141–143. [Google Scholar] [CrossRef]
  5. Ramakrishna, K.S.S.; Bhatti, T.S. Sampled-data automatic load frequency control of a single area power system with multi-source power generation. ElecTR Power Compon. Syst. 2007, 35, 955–980. [Google Scholar] [CrossRef]
  6. Shakibjoo, A.D.; Moradzadeh, M.; Moussavi, S.Z.; Mohammadzadeh, A.; Vandevelde, L. Load frequency control for multi-area power systems: A new type-2 fuzzy approach based on Levenberg–Marquardt algorithm. ISA Trans. 2022, 121, 40–52. [Google Scholar]
  7. Jin, L.; Zhang, C.-K.; He, Y.; Jiang, L.; Wu, M. Delay-dependent stability analysis of multi-area load frequency control with enhanced accuracy and computation efficiency. IEEE Trans. Power Syst. 2019, 34, 3687–3696. [Google Scholar] [CrossRef]
  8. Lamba, R.; Singla, S.K.; Sondhi, S. Design of fractional order PID controller for load frequency control in perturbed two area interconnected system. Electr. Power Compon. Syst. 2019, 47, 98–1011. [Google Scholar] [CrossRef]
  9. Ali, E.S.; Abd-Elazim, S.M. BFOA based design of PID controller for two area load frequency control with nonlinearities. Int. J. Electr. Power Energy Syst. 2013, 51, 24–231. [Google Scholar] [CrossRef]
  10. Tan, W. Unified PID load frequency controller tuning for power systems via IMC. IEEE Trans. Power Syst. 2009, 25, 341–350. [Google Scholar] [CrossRef]
  11. Dutta, A.; Prakash, S. Utilizing electric vehicles and renewable energy sources for load frequency control in deregulated power system using emotional controller. IETE J. Res. 2022, 68, 1500–1511. [Google Scholar] [CrossRef]
  12. Abouheaf, M.; Wail, G.; Adel, S. Model-free adaptive learning control scheme for wind turbines with doubly fed induction generators. IET Renew. Power Gener. 2018, 12, 675–1686. [Google Scholar] [CrossRef]
  13. Şahin, E.; Halil, O. Parallel-connected buck–boost converter with FLC for hybrid energy system. Electr. Power Compon. Syst. 2021, 48, 117–2129. [Google Scholar] [CrossRef]
  14. Bakeer, A.; Magdy, G.; Chub, A.; Bevrani, H. A sophisticated modeling approach for photovoltaic systems in load frequency control. Int. J. Electr. Power Energy Syst. 2022, 134, 07330. [Google Scholar] [CrossRef]
  15. Khooban, M.H.; Gheisarnejad, M. A novel deep reinforcement learning controller based type-II fuzzy system: Frequency regulation in microgrids. IEEE Trans. Emerg. Top Comput. Intell. 2020, 5, 689–699. [Google Scholar] [CrossRef]
  16. Abd-Elazim, S.M.; Ali, E.S. Load frequency controller design of a two-area system composed of PV grid and thermal generator via firefly algorithm. Neural. Comput. Appl. 2018, 30, 07–616. [Google Scholar] [CrossRef]
  17. Magdy, G.; Shabib, G.; Elbaset, A.A.; Mitani, Y. Renewable power systems dynamic security using a new coordination of frequency control strategy based on virtual synchronous generator and digital frequency protection. Int. J. Electr. Power Energy Syst. 2019, 109, 51–368. [Google Scholar] [CrossRef]
  18. Irudayaraj, A.X.R.; Wahab, N.I.A.; Premkumar, M.; Radzi, M.A.M.; Bin Sulaiman, N.; Veerasamy, V.; Farade, R.A.; Islam, M.Z. Renewable sources-based automatic load frequency control of interconnected systems using chaotic atom search optimization. Appl. Soft Comput. 2022, 119, 08574. [Google Scholar] [CrossRef]
  19. Tavakoli, M.; Pouresmaeil, E.; Adabi, J.; Godina, R.; Catalão, J.P. Load-frequency control in a multi-source power system connected to wind farms through multi-terminal HVDC systems. Comput. Oper. Res. 2018, 96, 5–315. [Google Scholar] [CrossRef]
  20. Ahmed, M.; Magdy, G.; Khamies, M.; Kamel, S. Modified TID controller for load frequency control of a two-area interconnected diverse-unit power system. Int. J. Electr. Power Energy Syst. 2022, 135, 07528. [Google Scholar] [CrossRef]
  21. Guha, D.; Roy, P.K.; Banerjee, S. Equilibrium optimizer-tuned cascade fractional-order 3DOF-PID controller in load frequency control of power system having renewable energy resource integrated. Int. Trans. Electr. Energy Syst. 2021, 31, e12702. [Google Scholar] [CrossRef]
  22. Tan, W. Tuning of PID load frequency controller for power systems. Energy Convers. Manag. 2009, 50, 1465–1472. [Google Scholar] [CrossRef]
  23. Kumar, A.; Pan, S. Design of fractional order PID controller for load frequency control system with communication delay. ISA Trans. 2022, 129, 38–149. [Google Scholar] [CrossRef] [PubMed]
  24. Jalali, N.; Razmi, H.; Doagou-Mojarrad, H. Optimized fuzzy self-tuning PID controller design based on Tribe-DE optimization algorithm and rule weight adjustment method for load frequency control of interconnected multi-area power systems. Appl. Soft Comput. 2020, 93, 06424. [Google Scholar] [CrossRef]
  25. Mani, P.; Joo, Y.H. Fuzzy logic-based integral sliding mode control of multi-area power systems integrated with wind farms. Inf. Sci. 2021, 545, 53–169. [Google Scholar] [CrossRef]
  26. Şahin, E.; Halil, O. Comparison of different controllers and stability analysis for photo-voltaic powered buck-boost DC-DC converter. Electr. Power Compon. Syst. 2018, 46, 149–161. [Google Scholar] [CrossRef]
  27. Oshnoei, A.; Kheradmandi, M.; Muyeen, S.M. Robust control scheme for distributed battery energy storage systems in load frequency control. IEEE Trans. Power Syst. 2020, 35, 4781–4791. [Google Scholar] [CrossRef]
  28. Tan, W.; Hao, Y.; Li, D. Load frequency control in deregulated environments via active disturbance rejection. Int. J. Electr. Power Energy Syst. 2015, 66, 66–177. [Google Scholar] [CrossRef]
  29. Gao, Z. On the foundation of active disturbance rejection control. Control Theory Appl. 2013, 30, 850–857. [Google Scholar]
  30. Liu, C.; Luo, G.; Duan, X.; Chen, Z.; Zhang, Z.; Qiu, C. Adaptive LADRC-based disturbance rejection method for electromechanical servo system. IEEE Trans. Ind. Appl. 2019, 56, 876–889. [Google Scholar] [CrossRef]
  31. Li, H.; An, X.; Feng, R.; Chen, Y. Motion Control of Autonomous Underwater Helicopter Based on Linear Active Disturbance Rejection Control with Tracking Differentiator. Appl. Sci. 2023, 13, 3836. [Google Scholar] [CrossRef]
  32. Zheng, Y.; Tao, J.; Sun, Q.; Sun, H.; Chen, Z.; Sun, M.; Duan, F. Deep-reinforcement-learning-based active disturbance rejection control for lateral path following of parafoil system. Sustainability 2022, 15, 435. [Google Scholar] [CrossRef]
  33. Li, D.; Wang, Z.; Yu, W.; Li, Q.; Jin, Q. Application of LADRC with stability region for a hydrotreating back-flushing process. Control Eng. Pract. 2018, 79, 85–194. [Google Scholar] [CrossRef]
  34. Gouran-Orimi, S.; Ghasemi-Marzbali, A. Load Frequency Control of multi-area multi-source system with nonlinear structures using modified Grasshopper Optimization Algorithm. Appl. Soft Comput. 2023, 137, 10135. [Google Scholar] [CrossRef]
  35. Çelik, E. Design of new fractional order PI-fractional order PD cascade controller through dragonfly search algorithm for advanced load frequency control of power systems. Soft Comput. 2021, 25, 1193–1217. [Google Scholar] [CrossRef]
  36. Khadanga, R.K.; Kumar, A.; Panda, S. A novel modified whale optimization algorithm for load frequency controller design of a two-area power system composing of PV grid and thermal generator. Neural Comput. Appl. 2020, 32, 8205–8216. [Google Scholar] [CrossRef]
  37. Wang, R.; Chen, Z.; Xing, Q.; Zhang, Z.; Zhang, T. A modified rainbow-based deep reinforcement learning method for optimal scheduling of charging station. Sustainability 2022, 14, 1884. [Google Scholar] [CrossRef]
  38. Ibarz, J.; Tan, J.; Finn, C.; Kalakrishnan, M.; Pastor, P.; Levine, S. How to train your robot with deep reinforcement learning: Lessons we have learned. Int. J. Robot. Res. 2021, 40, 98–721. [Google Scholar] [CrossRef]
  39. Khalid, J.; Ramli, M.A.; Khan, M.S.; Hidayat, T. Efficient load frequency control of renewable integrated power system: A twin delayed DDPG-based deep reinforcement learning approach. IEEE Access 2022, 10, 1561–51574. [Google Scholar] [CrossRef]
  40. Zheng, Y.; Tao, J.; Sun, Q.; Sun, H.; Chen, Z.; Sun, M. Deep reinforcement learning based active disturbance rejection load fre-quency control of multi-area interconnected power systems with renewable energy. J. Franklin Inst. 2022, in press.
  41. Dong, Y.; Tang, X.; Yuan, Y. Principled reward shaping for reinforcement learning via Lyapunov stability theory. Neurocomputing 2020, 393, 3–90. [Google Scholar] [CrossRef]
  42. Yu, X.; Xu, S.; Fan, Y.; Ou, L. A Self-adaptive LSAC-PID Approach based on Lyapunov Reward Shaping for Mobile Robots [OL]. arXiv 2021, arXiv:111.02283. [Google Scholar] [CrossRef]
  43. Morsali, J.; Zare, K.; Hagh, M.T. Comparative performance evaluation of fractional order controllers in LFC of two-area diverse-unit power system considering GDB and GRC effects. J. Electr. Syst. Inf. Technol. 2018, 5, 708–722. [Google Scholar] [CrossRef]
  44. Morsali, J.; Zare, K.; Hagh, M.T. Performance comparison of TCSC with TCPS and SSSC controllers in AGC of realistic interconnected multi-source power system. Ain. Shams. Eng. J. 2016, 7, 143–158. [Google Scholar] [CrossRef]
  45. Wang, Y.; Liu, J.; Chen, Z.; Sun, M.; Sun, Q. On the stability and convergence rate analysis for the nonlinear uncertain systems based upon active disturbance rejection control. Int. J. Robust Nonlin 2020, 30, 5728–5750. [Google Scholar] [CrossRef]
  46. Gao, Z. Scaling and bandwidth-parameterization based controller tuning. In Proceedings of the 2003 American Control Conference, Denver, CO, USA, 4–6 July 2003; pp. 989–4996. [Google Scholar]
  47. Huang, R.; He, H.; Zhao, X.; Wang, Y.; Li, M. Battery health-aware and naturalistic data-driven energy management for hybrid electric bus based on TD3 deep reinforcement learning algorithm. Appl. Energy 2022, 321, 19353. [Google Scholar] [CrossRef]
  48. Khalil, H.K. Lyapunov stability. Control. Syst. Robot. Autom. 2009, 12, 15. [Google Scholar]
  49. Ng, A.Y.; Harada, D.; Russell, S. Policy invariance under reward Transformations: Theory and application to reward shaping. In Proceedings of the Sixteenth International Conference on Machine Learning, Bled, Slovenia, 27–30 June 1999; Volume 99, pp. 78–287. [Google Scholar]
  50. Zhou, P.; Hu, X.; Zhu, Z.; Ma, J. What is the most suitable Lyapunov function? Chaos Solitons & Fractals 2021, 150, 11154. [Google Scholar]
  51. Zhao, Y.; Qi, X.; Ma, Y.; Li, Z.; Malekian, R.; Sotelo, M.A. Path following optimization for an underactuated USV using smoothly-convergent deep reinforcement learning. IEEE Trans. Intell. Transp. Syst. 2020, 22, 6208–6220. [Google Scholar] [CrossRef]
  52. Zhou, R.; Tan, W. Analysis and tuning of general linear active disturbance rejection controllers. IEEE Trans. Ind. Electron. 2018, 66, 5497–5507. [Google Scholar] [CrossRef]
Figure 1. An n-area power system: structure diagram for the i-th area.
Figure 2. Thermal power plant.
Figure 3. Hydro power plant.
Figure 4. Gas power plant.
Figure 5. Schematic diagram of wind speed and plot of $C_p$ with $\lambda$ and $\beta$.
Figure 6. Noise-based PV model.
Figure 7. Structure diagram of LADRC.
Figure 8. Frame diagram of TD3 algorithm.
Figure 9. Structure diagram of LTD3-LADRC.
Figure 10. Architecture of the critic and actor networks.
Figure 11. Comparison results of episode reward.
Figure 12. Response curves for Scenario 1—Case A.
Figure 13. Controller parameter changing process for Scenario 1—Case A.
Figure 14. Diagram of the MSLP added to Area I in Scenario 1—Case B.
Figure 15. Response curves in Scenario 1—Case B.
Figure 16. Adaptive parameters by TD3 and LTD3 in Scenario 1—Case B.
Figure 17. Power output deviation of RESs.
Figure 18. Diagram of the MSLP added to Area I in Scenario 2.
Figure 19. Response curves in Scenario 2.
Figure 20. Adaptive parameters by TD3 and LTD3 in Scenario 2.
Figure 21. Response curves of the two-area interconnected power system in Scenario 3.
Table 1. Parameter settings for the wind turbine generator.
Rated capacity $P_w$ = 750 kW; $c_1$ = 0.5176
$V_w$ = 15 m/s; $c_2$ = 116
$\rho$ = 1.225 kg/m³; $c_3$ = 0.4
$A_r$ = 1684 m²; $c_4$ = 5
$\lambda$ = 8.68 rpm; $c_5$ = 21
$\beta$ = 1; $c_6$ = 0.0068
Table 2. Comparison results of dynamic performance indicators for Scenario 1—Case A.

Indicator            IC       FOPID    ID-T-AOA   LADRC-TD3   LADRC-LTD3
Δf1:  US (10⁻²)      −4.37    −3.33    −2.78      −2.26       −2.14
Δf1:  OS (10⁻³)      9.8      7.7      8.8        9           4.3
Δf1:  ST (s)         50       15.3     11.6       15.6        12.75
Δf1:  IAE (10⁻²)     29.4     7.36     3.85       4.51        3.37
Δf2:  US (10⁻²)      −5.83    −3.14    −2.38      −1.64       −1.57
Δf2:  OS (10⁻³)      11.1     7.8      5.4        13.8        6.2
Δf2:  ST (s)         48.9     15.4     13.0       15.2        13.53
Δf2:  IAE (10⁻²)     28.77    7.45     3.66       3.88        3.20
ΔPtie: US (10⁻³)     −8.6     −5.5     −4.2       −3.6        −3.7
ΔPtie: OS (10⁻⁴)     2.68     7.9      14         6.3         7.3
ΔPtie: ST (s)        42.7     9.7      11.0       9.7         12.5
ΔPtie: IAE (10⁻²)    7.57     1.27     0.95       0.69        0.89

The abbreviations US, OS, and ST denote undershoot, overshoot, and settling time, respectively. Here, the settling time is the minimum duration needed for the output to reach and remain within ±0.0005.
Table 3. Comparison results of IAE and ITAE for Scenario 1—Case B.

Index          I-TD     ID-T     LADRC-TD3   LADRC-LTD3
IAE:  Δf1      0.7772   0.3143   0.2674      0.1722
IAE:  Δf2      0.8245   0.2834   0.2376      0.1423
IAE:  ΔPtie    0.2344   0.0637   0.0540      0.0326
ITAE: Δf1      78.88    34.67    26.33       16.03
ITAE: Δf2      86.91    29.16    23.35       13.25
ITAE: ΔPtie    23.46    7.22     5.41        3.00
Table 4. Comparison results of IAE and ITAE for Scenario 2.

Index          I-TD     ID-T     LADRC-TD3   LADRC-LTD3
IAE:  Δf1      2.1498   0.7702   0.5689      0.4230
IAE:  Δf2      1.9650   0.6334   0.5392      0.4167
IAE:  ΔPtie    0.8144   0.3412   0.1767      0.1243
ITAE: Δf1      298.95   104.27   72.93       53.58
ITAE: Δf2      272.86   85.43    69.97       53.11
ITAE: ΔPtie    119.59   55.12    22.24       15.39

