1. Introduction
Maintaining system frequency within an acceptable range is a crucial requirement in power system operation, and load frequency control (LFC) plays a vital role in achieving this objective. LFC ensures the balance between active power generation and load [1]. Ensuring the safe operation of the system, particularly in the face of load disturbances, necessitates the effective regulation of system frequency. Hence, this study focuses on the load frequency control problem. Modern power systems have undergone substantial transformations in recent years, driven by the growing diversity of loads and the escalating adoption of renewable energy sources (RESs) as alternatives to traditional power plants [2]. However, integrating RESs can reduce system stability and increase frequency deviation. As a result, it is crucial to address the challenges posed by load disturbances and RESs and achieve efficient load frequency control.
Numerous power system models are currently under study. These models can be broadly categorized into single-area [3,4,5] and multi-area [6,7,8] models, based on the number of areas involved. Among these models, thermal plants [9], hydro plants [10], and natural gas plants [11] are the most common. Moreover, in the past few years, RESs, particularly wind turbines [12] and solar energy [13], have been an extensive research subject. Bakeer et al. [14] provide an extensive discussion on the first-order transfer function model [15], the second-order transfer function model [16], and the noise-based model [17] of photovoltaic (PV) systems. In a similar vein, for wind turbines, researchers have explored the first-order transfer function [18], the second-order wind farm with a high-voltage direct current (HVDC) line [19], the noise-based model [20], and the doubly fed induction generator (DFIG)-driven wind power system [21]. Beyond investigating various energy sources, researchers have introduced nonlinear components such as governor deadband (GDB) and generation rate constraint (GRC) into the system to enhance its resemblance to real-world power systems.
In recent years, intelligent control methods that integrate the control system with optimization algorithms have gained popularity for load frequency control. Among these methods, proportional–integral–derivative (PID) control remains the dominant choice [22,23,24]. However, the performance achieved by an ordinary PID controller is limited, and many improved PID-based control methods have therefore been proposed. For example, Guha et al. [21] proposed a cascade fractional-order controller (CC-FOC) consisting of three-degrees-of-freedom PID (3DOF-PID) and tilt–integral–derivative (TID) controllers, implemented it on a three-area power system with DFIG, and compared its performance against a conventional PID. In addition, various control strategies such as sliding mode control [25,26], model predictive control [27], and linear active disturbance rejection control (LADRC) [28] have also been presented as effective solutions for LFC. In particular, the LADRC controller features a simple structure, model-free operation, and the capability to leverage the extended state observer (ESO) to estimate unknown disturbances [29]. As a result, it has found extensive applications beyond LFC, such as servo systems [30], motion control [31,32], and liquid level control [33]. To date, LADRC has primarily been employed for load frequency control in power systems that do not incorporate renewable energy sources; in this paper, LADRC is applied to power systems that feature renewable energy sources.
The optimization algorithms currently used in load frequency control systems mainly rely on heuristic methods, such as the grasshopper optimization algorithm (GOA) [34], the dragonfly search algorithm (DSA) [35], and the whale optimization algorithm [36]. However, these algorithms usually only provide the optimal parameters for a specific scenario, which implies that they cannot obtain adaptive parameters. This problem can be addressed by deep reinforcement learning (DRL). DRL algorithms have gained prominence in artificial intelligence, primarily due to their ability to acquire optimal policies for sequential decision-making problems [37,38]. Numerous studies have explored the utilization of DRL for LFC in power systems. For example, Khalid et al. [39] tuned the PID controller parameters with a twin delayed deep deterministic policy gradient (TD3) algorithm for a two-area interconnected power system, where each area contains a TD3 agent. Zheng et al. [40] proposed a soft actor–critic (SAC) algorithm to optimize the LADRC method and applied it to LFC in power systems. In these studies, however, the design of the reward function lacked theoretical analysis; as a result, there is no unified and efficient design rule for the reward function in DRL algorithms. Drawing inspiration from the Lyapunov reward function [41,42], this paper presents LTD3, based on the Lyapunov function and the TD3 algorithm, which enables real-time optimization of the LADRC controller parameters.
The main contributions are summarized below:
A Lyapunov-based reward function is constructed for LFC systems to improve the convergence of TD3, yielding the LTD3 algorithm.
The LADRC is designed to mitigate the effects of load disturbances and the integration of RESs in power systems.
The proposed LTD3 algorithm is utilized to dynamically adjust the parameters $\omega_o$ and $b_0$ of the LADRC by observing the system state. To evaluate its performance, a comparative analysis is conducted against other techniques such as IC, FOPID, and ID-T [20,43,44]. The evaluations are conducted on a two-area power system incorporating a noise-based wind turbine and PV system, considering different scenarios.
The remainder of the paper is organized as follows. Section 2 presents the mathematical model of a general power system. Section 3 explains the LADRC design process for LFC. Section 4 introduces the LTD3-based LADRC parameter optimization method. Section 5 presents simulation results to validate the proposed method. Lastly, Section 6 concludes the paper and offers suggestions for future work.
2. Multi-Source Power System Modeling
Currently, the majority of electricity generation in most countries is derived from thermal, hydro, and gas power plants, which are utilized to fulfill the base load demand. In addition, renewable energy sources are used to meet load demand. Therefore, this paper considers a multi-source power system comprising thermal, hydro, and gas turbines together with renewable energies, as illustrated in Figure 1.
Figure 1 exhibits the general structure of one area in an n-area interconnected power system, denoted as the i-th area. It illustrates the relationship between the area control error $ACE_i$, the frequency deviation $\Delta f_i$, and the tie-line exchanged power deviation $\Delta P_{tie,i}$, whose mathematical expression is shown in Equation (1),

$$ACE_i = B_i \Delta f_i + \Delta P_{tie,i}, \qquad (1)$$

where $B_i$ denotes the frequency bias coefficient of the i-th area.
Generally, the generator of the power system is represented by the transfer function in Equation (2).
The transfer functions of the thermal power plant, hydro power plant, gas power plant, and renewable energy sources are introduced in Section 2.1, Section 2.2, Section 2.3, Section 2.4 and Section 2.5, respectively, as referenced in [20]. It is worth noting that the mechanistic construction process of the power system model is not considered in this article. Additionally, referring to Figure 1, we can derive the following expression,
Figure 1, we can derive the following expression:
where
,
, and
are related to the controller output
, and also according to
Figure 1, their expressions can be obtained as
With Equations (
2), (
4), and (
5), the new expression for
can be organized as
Thus, combining Equations (
1) and (
6), the expression about
and
of the
i-
area can be obtained as
2.1. Thermal Power Plant
In this paper, we consider a thermal power plant that includes a reheat turbine and additionally incorporates the nonlinear links of GDB and GRC, as shown in Figure 2. The GRC limits the generation rate in pu/min, while the GDB is commonly described by the transfer function $0.8 - \frac{0.2}{\pi}s$. The transfer function of the thermal power plant is then given in Equation (9).
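For readers who want to experiment with the linear part of this model, the following sketch assembles a reheat thermal transfer function in the spirit of Equation (9) in Python. The time constants here are illustrative assumptions rather than the paper's values, and the GDB/GRC nonlinearities of Figure 2 are omitted.

```python
# Sketch: linear reheat thermal plant (governor, reheat, turbine in series)
# and its step response. Time constants Tg, Tt, Tr, Kr are assumed values.
import numpy as np
from scipy import signal

Tg, Tt, Tr, Kr = 0.08, 0.3, 10.0, 0.5   # assumed governor/turbine/reheat constants

def series(*tfs):
    """Multiply transfer functions given as (num, den) coefficient lists."""
    num, den = [1.0], [1.0]
    for n, d in tfs:
        num, den = np.polymul(num, n), np.polymul(den, d)
    return num, den

# governor 1/(Tg s + 1), reheat (Kr Tr s + 1)/(Tr s + 1), turbine 1/(Tt s + 1)
num, den = series(([1.0], [Tg, 1.0]),
                  ([Kr * Tr, 1.0], [Tr, 1.0]),
                  ([1.0], [Tt, 1.0]))
t, y = signal.step(signal.TransferFunction(num, den), N=500)
print(f"steady-state gain ~ {y[-1]:.3f}")   # approaches 1 for this linear model
```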
2.2. Hydro Power Plant
The diagram expressing the transfer function of a hydroelectric power unit is illustrated in Figure 3, consisting of a governor, a transient droop compensation, and a penstock turbine with GRC constraints. The GRC for upward adjustments is 270% pu/min, while for downward adjustments it is 360% pu/min. The transfer function of the hydropower plant is then given in Equation (10).
2.3. Gas Power Plant
Similarly, the gas power plant, shown in Figure 4, has the transfer function given in Equation (11).
2.4. Noise-Based Wind Turbine Generator
This paper examines wind energy as a form of renewable energy. The wind speed ($V_W$, in m/s) affects the performance of the wind energy system. Figure 5a shows the noise-based wind speed model, which is used to simulate the unstable wind speed of an actual environment. Based on the noise-based wind speed, the mechanical power output of the wind energy system can be calculated according to Equation (12),

$$P_{WT} = \frac{1}{2}\rho A_r C_p(\lambda, \beta) V_W^3, \qquad (12)$$

where $\rho$ refers to the air density and $A_r$ denotes the blade swept area. The rotor power coefficient $C_p$, a function of the tip-speed ratio $\lambda$ and the blade pitch angle $\beta$, is expressed by Equation (13). Figure 5b illustrates the relationship between $C_p$, $\lambda$, and $\beta$, and highlights the maximum value of $C_p$. Additionally, Table 1 presents the pertinent parameter values for the wind turbine power system.
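As a quick numerical illustration of Equation (12), the sketch below evaluates the mechanical power for a noise-perturbed wind speed. The values of $\rho$, the swept area, and $C_p$ are assumptions for illustration, since the paper obtains $C_p$ from Equation (13) and the parameters from Table 1.

```python
# Sketch of Equation (12): mechanical power of the wind turbine from a
# noise-perturbed wind speed. rho, A_r, and the fixed Cp are assumed numbers.
import numpy as np

rng = np.random.default_rng(0)
rho, A_r, Cp = 1.225, 1648.0, 0.48          # air density, swept area, power coeff.
v = 7.5 + 0.5 * rng.standard_normal(100)    # noisy wind speed in m/s

P_wt = 0.5 * rho * A_r * Cp * v**3          # Equation (12), in watts
print(f"mean output: {P_wt.mean()/1e6:.2f} MW")
```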
2.5. Noise-Based Photovoltaic System
This paper considers both wind and photovoltaic (PV) systems as renewable energy sources. In the noise-based PV system model, the power variation accounts for both uniform and non-uniform insolation, as illustrated in Figure 6. As stated in Ref. [14], the PV power deviation expressed by Equation (14) can be used to represent an actual pattern of PV output variation.
3. LADRC Design Process for LFC
Generally, it is imperative to employ a control strategy to preserve the stability of the frequency and the tie-line exchanged power and to mitigate the impact of load disturbances. Since this paper neglects the market mechanism, no frequency deviation or tie-line exchanged power deviation should remain under steady-state conditions. Moreover, Equation (7) demonstrates that each area is affected not only by the load disturbance but also by the frequency deviations of other areas. In essence, LFC must address the impact of load disturbances and counteract inter-area coupling. Additionally, the renewable energy output power deviation can also be seen as a disturbance in the load frequency system. Thus, theoretically speaking, the LFC of a power system incorporating renewable energy is more challenging.
LADRC is a model-free controller that can effectively overcome the impact of unknown disturbances. Additionally, it can treat coupling information as unknown disturbances, enabling it to achieve decoupling. Therefore, this paper adopts LADRC for the LFC of multi-source power systems.
3.1. System Order Analysis
While LADRC has model-free characteristics, obtaining the system order is essential for designing the LESO. For convenience, the aggregate of the load disturbance, the renewable energy power deviation, and the inter-area coupling is defined as the disturbance $d_i$. Then, carrying out the Laplace transform on Equation (7) yields Equation (15), where $\Delta F_i(s)$, $U_i(s)$, and $D_i(s)$ are the Laplace forms of $\Delta f_i$, $u_i$, and the disturbance $d_i$, respectively. Then, we can obtain Equation (16). As a result, combining Equations (9)-(11) with Equation (16), the system order of Equation (15) can be determined to be 2. That is, Equation (7) can be represented as

$$\ddot{y} = f_0 + bu, \qquad (17)$$

where $f_0$ denotes the total disturbance containing the internal modeling information, the load disturbance, and the renewable energy output power deviation, and $b$ is a constant coefficient. Furthermore, an adjustable parameter $b_0$ can be used to replace $b$,

$$\ddot{y} = f + b_0 u. \qquad (18)$$

Hence, the variable $f$ represents the cumulative disturbance at the final stage, expressed as $f = f_0 + (b - b_0)u$.
3.2. Disturbance Estimation and Elimination for LADRC
In Section 3.1, the multi-source power system under investigation was transformed into a second-order system subject to an unknown disturbance. Consequently, this paper uses the LADRC controller to eliminate the unknown disturbance and realize load frequency control. Firstly, an LESO is used to estimate the unknown disturbance.
For the system shown in Equation (18), the states can be defined as $x_1 = y$, $x_2 = \dot{y}$, and $x_3 = f$, allowing for the derivation of the state-space equation in Equation (19).
Suppose that $z_1$, $z_2$, and $z_3$ are the observed states of $x_1$, $x_2$, and $x_3$, respectively. Then, the full-order LESO shown in Equation (20) is designed to obtain the estimates of the above states [45],

$$\begin{aligned} \dot{z}_1 &= z_2 + \beta_1(y - z_1),\\ \dot{z}_2 &= z_3 + b_0 u + \beta_2(y - z_1),\\ \dot{z}_3 &= \beta_3(y - z_1), \end{aligned} \qquad (20)$$

where $\beta_1$, $\beta_2$, and $\beta_3$ are the observer gains. Denote the observer error vector as $e = [e_1, e_2, e_3]^T$ with $e_1 = x_1 - z_1$, $e_2 = x_2 - z_2$, and $e_3 = x_3 - z_3$; then, there is

$$\dot{e} = A_e e + E\dot{f}, \qquad (21)$$

where $A_e = \begin{bmatrix} -\beta_1 & 1 & 0 \\ -\beta_2 & 0 & 1 \\ -\beta_3 & 0 & 0 \end{bmatrix}$ and $E = [0, 0, 1]^T$. It can be found that the observer gains affect the stability of the observer errors. Supposing the eigenvalue is $\lambda$, we have $|\lambda I - A_e| = 0$, resulting in the characteristic polynomial $\lambda^3 + \beta_1\lambda^2 + \beta_2\lambda + \beta_3$. Then, configuring all the poles at $-\omega_o$, that is, $\lambda^3 + \beta_1\lambda^2 + \beta_2\lambda + \beta_3 = (\lambda + \omega_o)^3$, guarantees the convergence of the observer errors. As a result, there are $\beta_1 = 3\omega_o$, $\beta_2 = 3\omega_o^2$, and $\beta_3 = \omega_o^3$. That is to say, the number of adjustable observer parameters is reduced to one by the pole placement design method [46].
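The pole-placement gains can be verified symbolically; the following snippet expands $(\lambda + \omega_o)^3$ and recovers the coefficients $3\omega_o$, $3\omega_o^2$, and $\omega_o^3$ stated above.

```python
# Quick check of the pole-placement result: matching the characteristic
# polynomial lambda^3 + b1*lambda^2 + b2*lambda + b3 with (lambda + w_o)^3.
import sympy as sp

lam, wo = sp.symbols("lambda omega_o", positive=True)
poly = sp.expand((lam + wo) ** 3)
print(sp.Poly(poly, lam).all_coeffs())   # [1, 3*omega_o, 3*omega_o**2, omega_o**3]
```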
When $\omega_o$ is appropriate, there is $z_3 \approx f$; thus, the control law shown in Equation (22) is designed to realize the elimination of the disturbance,

$$u = \frac{u_0 - z_3}{b_0}, \qquad (22)$$

with the virtual control $u_0$ given by

$$u_0 = k_p(r - z_1) - k_d z_2, \qquad (23)$$

where $r$ is the target value of $y$, i.e., $r = 0$.
Substituting Equation (23) into (18), it can be found that the unknown disturbance $f$ is compensated by $z_3$. Similarly, the controller poles are placed at $-\omega_c$ through the pole placement method, obtaining $k_p = \omega_c^2$ and $k_d = 2\omega_c$. In summary, LADRC requires only the system's order information to design an LESO for estimating the unknown disturbance; the control law then eliminates the estimated disturbance, enabling LFC. Moreover, the LADRC presented in this paper includes three parameters: the observer parameter $\omega_o$, the control law parameter $\omega_c$, and the compensation parameter $b_0$. The structure diagram of the LADRC is shown in Figure 7.
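To make the design above concrete, the following is a minimal discrete-time sketch of the second-order LADRC, assuming an Euler-discretized LESO and a stand-in double-integrator plant instead of the paper's power system model; the bandwidths and step size are illustrative.

```python
# Minimal sketch of the second-order LADRC of Section 3: an LESO
# (Equation (20)) with gains from pole placement at -omega_o, and the
# disturbance-compensating control law (Equations (22)-(23)) with gains
# from pole placement at -omega_c.
import numpy as np

def make_ladrc(wo, wc, b0, dt):
    b1, b2, b3 = 3 * wo, 3 * wo**2, wo**3      # observer gains
    kp, kd = wc**2, 2 * wc                     # control-law gains
    z = np.zeros(3)                            # LESO states: y, dy, disturbance f

    def step(y, u, r=0.0):
        nonlocal z
        e = y - z[0]                           # observation error
        z = z + dt * np.array([z[1] + b1 * e,
                               z[2] + b0 * u + b2 * e,
                               b3 * e])        # Euler-discretized Equation (20)
        u0 = kp * (r - z[0]) - kd * z[1]       # PD law on the estimated states
        return (u0 - z[2]) / b0                # cancel the estimated disturbance
    return step

dt = 0.001
ctrl = make_ladrc(wo=40.0, wc=20.0, b0=1.0, dt=dt)
y = dy = u = 0.0
for k in range(5000):
    f_ext = 0.5 if k > 2000 else 0.0           # unknown step disturbance
    ddy = f_ext + u                            # stand-in plant with b = b0 = 1
    dy += dt * ddy; y += dt * dy
    u = ctrl(y, u)
print(f"output after disturbance: {y:.4f}")    # regulated back near 0
```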
4. Adaptive LTD3-LADRC Approach
The performance of a controller is directly impacted by its parameters, and thus, it is crucial to carefully select and optimize them to ensure stability and improve robustness. As the number of areas in a power system increases, the parameter tuning process becomes more challenging and time-consuming, often requiring advanced optimization techniques and simulations to achieve satisfactory results. The adaptive tuning of controller parameters is a sequential decision-making process in which the parameters are selected in a time-ordered manner. A suitable representation for this problem is a Markov decision process (MDP) denoted by the tuple $(S, A, P, R)$, where $S$ represents the state set, $A$ denotes the action space, $P$ indicates the state transition probability, and $R$ represents the reward value.
RL is an effective means of dealing with sequential decision-making problems, and DRL augments RL with the computing power of deep neural networks. However, the training outcome heavily relies on the design of the reward function, for which there is not yet a unified and efficient design rule. In this paper, a Lyapunov-based reward function is provided for LFC problems on the basis of the TD3 algorithm, namely the LTD3 algorithm. This approach enables the adaptive optimization of the LADRC controller parameters.
4.1. The Basics of TD3
TD3 is an improved algorithm built upon DDPG that solves the overestimation issue of DDPG [47]. Consistent with the idea of reinforcement learning, the TD3 algorithm is based on the dynamic interplay between the environment and the agent, using the reward obtained during interaction to guide the agent toward the optimal policy. The optimal policy is usually achieved by maximizing the state–action value function in Equation (24),

$$Q(s, a) = \mathbb{E}\left[ R_t \mid s_t = s, a_t = a \right], \qquad (24)$$

where $R_t$ is the cumulative reward, expressed as $R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k}$. Here, $r_t$ is the instant reward, and the discount factor $\gamma \in (0, 1)$ quantifies the significance of next-moment reward values in relation to the present moment.
Figure 8 showcases the frame diagram of the TD3 algorithm, encompassing essential components such as a replay buffer and six neural networks. The target critic networks share the same structure as the critic networks, and the target actor network shares the same structure as the actor network. The replay buffer stores the data required for training the neural networks. The six neural networks are critic network 1 $Q_{\theta_1}(s, a)$, critic network 2 $Q_{\theta_2}(s, a)$, the actor network $\pi_{\phi}(s)$, target critic network 1 $Q_{\theta_1'}(s, a)$, target critic network 2 $Q_{\theta_2'}(s, a)$, and the target actor network $\pi_{\phi'}(s)$. As for the meaning of these symbols, taking $Q_{\theta_1}(s, a)$ as an instance, the network input includes $s$ and $a$, the network weights are represented by $\theta_1$, and the network output is $Q$. The network diagram depicted in Figure 8 illustrates the input and output variables. Notably, the critic networks are designed to approximate the state–action value $Q$, while the actor network generates the action value $a$ based on the given state $s$.
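As an illustration of this architecture, a minimal PyTorch definition of the critic and actor networks is sketched below; the hidden-layer sizes and ReLU activations are assumptions standing in for the exact layout of Figure 10.

```python
# Sketch of the critic (input (s, a) -> scalar Q) and actor (input s -> action)
# networks described above. Layer widths are assumed values.
import torch
import torch.nn as nn

class Critic(nn.Module):
    def __init__(self, s_dim, a_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

class Actor(nn.Module):
    def __init__(self, s_dim, a_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, a_dim), nn.Tanh())   # actions scaled to [-1, 1]
    def forward(self, s):
        return self.net(s)
```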
In addition, compared with the DDPG algorithm, TD3 adds target critic networks and leverages the idea of the double deep Q-network (DDQN), thus alleviating the problem of Q-value overestimation to a certain extent. That is to say, to enhance the proximity between the critic network's generated Q value and the actual Q value, the estimation of the state–action value at the subsequent moment utilizes the smaller output of the two target critic networks. This process yields the target value

$$y_t = r + \gamma \min_{j=1,2} Q_{\theta_j'}(s', \tilde{a}), \qquad (25)$$

where $s'$ represents the state in the subsequent time step; the action $\tilde{a}$ in the subsequent time step is obtained through the target actor network output: $\tilde{a} = \pi_{\phi'}(s') + \epsilon$, in which the noise $\epsilon$ satisfies $\epsilon \sim \mathrm{clip}(\mathcal{N}(0, \tilde{\sigma}), -c, c)$. Then, the TD errors can be obtained as

$$\delta_j = y_t - Q_{\theta_j}(s, a), \quad j = 1, 2. \qquad (26)$$
On the basis of Equation (26), the updates of $\theta_1$ and $\theta_2$ can be realized using the gradient descent method. Additionally, the actor network is updated after every $d$ steps of critic network updates, and the update principle is that the action generated by the actor network should maximize $Q_{\theta_1}(s, \pi_{\phi}(s))$. The update of the target networks adopts an exponential moving average method,

$$\theta_j' \leftarrow \tau \theta_j + (1 - \tau)\theta_j', \quad \phi' \leftarrow \tau \phi + (1 - \tau)\phi', \qquad (27)$$

where $\tau$ is a smoothing factor. The pseudocode of the TD3 algorithm is shown in Algorithm 1.
Algorithm 1 TD3 Algorithm
1: Initialize critic networks $Q_{\theta_1}$, $Q_{\theta_2}$ and actor network $\pi_{\phi}$ with random parameters $\theta_1$, $\theta_2$, and $\phi$.
2: Initialize target networks $Q_{\theta_1'}$, $Q_{\theta_2'}$, and $\pi_{\phi'}$ with weights $\theta_1' \leftarrow \theta_1$, $\theta_2' \leftarrow \theta_2$, and $\phi' \leftarrow \phi$. Initialize replay buffer $\mathcal{B}$.
3: for $t = 1$ to $T$ do
4:   Select the action with exploration noise $a = \pi_{\phi}(s) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$; observe the reward $r$ and new state $s'$.
5:   Store the transition tuple $(s, a, r, s')$ in $\mathcal{B}$.
6:   Sample a mini-batch of $m$ transitions from $\mathcal{B}$.
7:   Obtain the next action from the target actor network: $\tilde{a} = \pi_{\phi'}(s') + \epsilon$, where the noise satisfies $\epsilon \sim \mathrm{clip}(\mathcal{N}(0, \tilde{\sigma}), -c, c)$.
8:   Calculate the target value by Equation (25): $y = r + \gamma \min_{j=1,2} Q_{\theta_j'}(s', \tilde{a})$.
9:   Update the critics by the TD errors: $\theta_j \leftarrow \arg\min_{\theta_j} m^{-1}\sum (y - Q_{\theta_j}(s, a))^2$.
10:  if $t$ mod $d$ then
11:    Update $\phi$ by the deterministic policy gradient: $\nabla_{\phi} J(\phi) = m^{-1}\sum \nabla_a Q_{\theta_1}(s, a)\big|_{a=\pi_{\phi}(s)} \nabla_{\phi}\pi_{\phi}(s)$.
12:    Update the target networks by the moving average method: $\theta_j' \leftarrow \tau\theta_j + (1-\tau)\theta_j'$, $\phi' \leftarrow \tau\phi + (1-\tau)\phi'$.
13:  end if
14: end for
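For reference, a compact Python sketch of one TD3 update step, mirroring Equations (25)–(27) and Algorithm 1, is given below. It reuses the Actor/Critic classes sketched earlier; the hyperparameter defaults are illustrative, not the paper's values.

```python
# Sketch of one TD3 update: clipped double-Q targets with target-policy
# smoothing (Eq. (25)), TD-error critic updates (Eq. (26)), delayed actor
# updates, and moving-average target updates (Eq. (27)). Tensors are batched.
import torch
import torch.nn.functional as F

def td3_update(batch, q1, q2, pi, q1_t, q2_t, pi_t, opt_q, opt_pi,
               step, gamma=0.99, tau=0.005, sigma=0.2, c=0.5, d=2):
    s, a, r, s2 = batch                                    # sampled transitions
    with torch.no_grad():
        noise = (torch.randn_like(a) * sigma).clamp(-c, c) # target policy smoothing
        a2 = (pi_t(s2) + noise).clamp(-1.0, 1.0)
        y = r + gamma * torch.min(q1_t(s2, a2), q2_t(s2, a2))  # Equation (25)
    q_loss = F.mse_loss(q1(s, a), y) + F.mse_loss(q2(s, a), y) # Equation (26)
    opt_q.zero_grad(); q_loss.backward(); opt_q.step()

    if step % d == 0:                                      # delayed policy update
        pi_loss = -q1(s, pi(s)).mean()                     # ascend Q1 along policy
        opt_pi.zero_grad(); pi_loss.backward(); opt_pi.step()
        for net, tgt in ((q1, q1_t), (q2, q2_t), (pi, pi_t)):
            for p, p_t in zip(net.parameters(), tgt.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)  # Equation (27)
```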
4.2. Lyapunov-Based Reward Function for TD3
The reward function design significantly impacts the effectiveness of DRL. It serves as the feedback signal given by the environment, guiding the agent to learn the correct behavior. The reward function determines what the agent should strive to achieve and what actions should be avoided. A well-designed reward function can help the agent learn the optimal policy faster and more accurately. At the same time, a poorly designed one can lead to incorrect behavior or slow learning.
In this paper, a Lyapunov [48]-based reward function [49] for TD3 is proposed as

$$r_L(s_t, a_t) = r(s_t, a_t) + \alpha\left[ L(s_t) - L(s_{t+1}) \right], \qquad (28)$$

where $\alpha$ represents the weight value; it can be observed that when $\alpha = 0$, the Lyapunov reward function is consistent with the normal reward function. $L(\cdot)$ is the Lyapunov function with the decreasing property, and the notation $r(s_t, a_t)$ represents the feedback received when action $a_t$ is taken in state $s_t$. Then, the following analysis can be carried out.
Assumption 1. γ approximates 1, that is, $\gamma \to 1$.
Assumption 2. There exists a state–action pair that maximizes the reward value, denoted as $(s^*, a^*)$.
Theorem 1. Under the premise of Assumptions 1 and 2, if the reward function takes the form of Equation (28), the inequality in Equation (29) holds, and the state–action pair $(s_t, a_t)$ can converge to the optimum $(s^*, a^*)$.
Proof of Theorem 1. Equation (28) can be rearranged with Assumption 1 as Equation (30).
With Equation (29), Equation (30) leads to Equation (31). Moreover, Equation (30) further yields the inequality in Equation (32). Since the Lyapunov function $L(\cdot)$ exhibits the descent property [50] and a maximal reward exists (Assumption 2), the resulting sequence is monotonic and bounded, which demonstrates its convergence. Consequently, the reward converges to its maximum, and the state–action pair $(s_t, a_t)$ converges to the optimum $(s^*, a^*)$. □
Remark 1. The conventional reward function is formulated on the premise that the reward value increases as the state quantity approaches the desired value, as illustrated in Ref. [51]. However, such a reward function lacks a solid theoretical foundation. In contrast, the reward function presented in this paper incorporates the Lyapunov function, which not only provides a theoretical basis but also exhibits convergence, as validated by Theorem 1.
4.3. Environment and Agent Settings for Multi-Source Power System
As mentioned, DRL algorithms can continuously learn to find the best policy. In this paper, the environment comprises the multi-source power system and the corresponding LADRC, while the agent refers to the decision maker in the TD3 algorithm. Before training, it is necessary to clarify the composition of the state and action spaces and to establish a reward function based on the corresponding state variables.
This paper considers the action space to be the parameters to be adjusted in the LADRC. According to Section 3, each area contains an LADRC, and each controller involves three parameters: $\omega_o$, $\omega_c$, and $b_0$. Moreover, only the tuning of $\omega_o$ and $b_0$ is considered, because it is mentioned in Ref. [52] that $\omega_c$ has little influence on suppressing disturbances. It is conceivable that as the number of areas increases, the number of controller parameters requiring adjustment increases correspondingly, so the action space can be defined as

$$a = [\omega_{o1}, b_{01}, \omega_{o2}, b_{02}, \ldots, \omega_{on}, b_{0n}], \qquad (33)$$

where $n$ is the number of areas. The detected states in each area are $ACE_i$ and its derivative $\dot{ACE}_i$, which are equivalent to the two variables of error and derivative of error. Therefore, the state space is defined as

$$s = [ACE_1, \dot{ACE}_1, \ldots, ACE_n, \dot{ACE}_n]. \qquad (34)$$
Based on the observed state, we expect the area control error to converge quickly and smoothly to the desired value of 0. Then, the following two points about the reward function need to be satisfied:
(1) The $ACE$ should be as close to 0 as possible; the smaller the area control error, the larger the reward value should be.
(2) The overshoot and undershoot of the system response should be as slight as possible.
Therefore, the reward feedback $r(s_t, a_t)$ based on the state variables is designed as shown in Equation (35), where the coefficients of 100 and 10 are utilized to ensure consistent magnitudes of the $ACE$ and $\dot{ACE}$ terms. This approach prevents the agent from being overly influenced by a specific state during training. Moreover, $r(s_t, a_t)$ has a maximum value, thus satisfying Assumption 2.
Then, according to Equations (30) and (35), the expression of the Lyapunov-based reward function can be written as Equation (36). For a power system with $n$ areas, the structure diagram of the proposed LTD3-LADRC is shown in Figure 9. The purpose of LTD3 is to obtain the adaptive controller parameters in Equation (33) by detecting the real-time state in Equation (34); feeding these parameters into the controllers then generates a new state vector. By continuously repeating this process, load frequency control can be effectively achieved.
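A minimal sketch of this reward construction is given below; the quadratic forms chosen for $r(s_t, a_t)$ and the Lyapunov candidate $L(\cdot)$, as well as the weight value, are assumptions illustrating the structure of Equations (28) and (35)–(36), not the paper's exact expressions.

```python
# Sketch of the LTD3 reward: a state-based feedback term penalizing ACE and
# its derivative (the 100/10 weighting mentioned for Equation (35)), plus a
# Lyapunov shaping term alpha*(L(s) - L(s')) in the spirit of Equation (28).
import numpy as np

ALPHA = 0.5                                    # shaping weight (assumed value)

def base_reward(state):
    # state = [ACE_1, dACE_1, ..., ACE_n, dACE_n], as in Equation (34)
    ace, dace = state[0::2], state[1::2]
    return -float(100.0 * np.sum(ace**2) + 10.0 * np.sum(dace**2))

def lyapunov(state):
    return float(np.sum(state**2))             # a simple quadratic candidate

def ltd3_reward(state, next_state):
    # shaped reward: bonus whenever the Lyapunov value decreases
    return base_reward(state) + ALPHA * (lyapunov(state) - lyapunov(next_state))

s  = np.array([0.02, -0.01, 0.01, 0.00])
s2 = np.array([0.01, -0.005, 0.005, 0.00])
print(ltd3_reward(s, s2))
```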
5. Simulation Verification and Analysis
In this paper, the simulation is performed on the MATLAB/Simulink R2021b platform. Please refer to Appendix A for the model under study [20] and its corresponding parameters. Two LADRC controllers are needed, and LTD3 is used to realize the parameter optimization of $\omega_{o1}$, $b_{01}$, $\omega_{o2}$, and $b_{02}$. Additionally, Figure 10 illustrates the architecture of the critic and actor networks mentioned in Algorithm 1; the number of neurons in each hidden layer and the employed activation functions, such as ReLU, are indicated within the dashed boxes.
Moreover, the hyperparameters in the LTD3 algorithm, such as the discount factor $\gamma$, the smoothing factor $\tau$, and the exploration noise, are chosen appropriately. To showcase the efficacy of the proposed LTD3 algorithm, we conducted experiments on the LADRC load frequency control system. Specifically, we applied the TD3 algorithm with the reward function described in Equation (35) and the LTD3 algorithm with the reward function outlined in Equation (36). Throughout the training process, one episode represents the execution of one loop in Algorithm 1 with a time interval of 0.01 s and a total simulation time of 15 s. For the control system, the parameters $\omega_{c1}$ and $\omega_{c2}$ are both chosen as 20, and the action space for DRL is given in Equation (37).
Figure 11 illustrates the progression of the episode reward value for TD3 and LTD3 throughout the training process. The graph reveals that TD3's reward value approaches stability at around 500 episodes, whereas LTD3 achieves stability in approximately 90 episodes. This observation indicates that LTD3 converges faster during training.
To further verify the agent trained above, simulations are conducted under the following three scenarios:
Scenario 1: System response performance without RESs;
Scenario 2: System response performance with wind turbines and photovoltaic systems;
Scenario 3: System response performance under system parameter variations.
The difference between Scenario 1 and Scenario 2 lies in the variation of the model environment. Scenario 1 involves the model without renewable energy sources, whereas Scenario 2 incorporates wind and solar energy. Scenario 3 simulates parameter perturbation problems that may occur in the actual environment, essentially to verify the robustness of the LTD3-LADRC method.
5.1. Scenario 1: System Response Performance without RESs
Initially, we analyze the scenario that excludes wind turbines and photovoltaic systems. This scenario is examined in the following two cases.
Case A: System considers an SLP
In this case, a 0.01 p.u. step load perturbation (SLP) is introduced in Area I at 5 s, i.e., $\Delta P_{L1} = 0.01$ p.u. Figure 12 and Figure 13 present the simulation results, including the response curves of various published methods, such as integral control (IC) [44], FOPID [43], and the ID-T controller tuned by the Archimedes optimization algorithm (AOA) [20], as well as the curve generated by LADRC-TD3 for comparison. Furthermore, Table 2 provides a numerical comparison of performance indicators, including undershoot, overshoot, settling time, and the integral absolute error (IAE) defined in Equation (38).
Specifically, Figure 12 depicts the dynamic response of the frequency deviations in Area I and Area II, along with the tie-line exchanged power, under the influence of the SLP. It shows that the system exhibits a transient undershoot response under the disturbance. Table 2 further indicates that the LADRC controller based on the TD3 and LTD3 algorithms achieves smaller undershoot than the other methods. It is worth noting that LTD3 exhibits slightly lower performance in terms of $\Delta P_{tie}$ compared to TD3, as indicated in Table 2. This occurs because the $ACE$ is derived from the linear combination of $\Delta f$ and $\Delta P_{tie}$, as shown in Equation (1). The algorithm aims to minimize the $ACE$, considering it ideal for the $ACE$ to be as small as possible. However, since $\Delta f$ generally has a larger magnitude than $\Delta P_{tie}$, it exerts a greater influence on the $ACE$. Consequently, the $\Delta f$ performance achieved by LTD3 is superior, while the $\Delta P_{tie}$ performance does not differ significantly between the two algorithms. The difference is negligible, and in terms of the frequency deviation objective, LTD3 still outperforms TD3. The changing processes of the controller parameters in the LADRC-TD3 and LADRC-LTD3 methods, obtained from the trained TD3 and LTD3 agents, respectively, are presented in Figure 13. It can be found that LTD3 and TD3 provide different controller parameter strategies, and judging from the performance results in Figure 12, the strategy obtained by the proposed LTD3 is better than that of TD3.
Case B: System considers a multi-step load perturbation (MSLP)
In this case, the multi-step load perturbation (MSLP) shown in Figure 14 is added to Area I, which denotes the presence of multiple step load perturbations throughout the simulation process. The dynamic responses of I-TD [20], ID-T [20], LADRC-TD3, and the proposed LADRC-LTD3 are shown in Figure 15 and Figure 16. All four control methods attain steady-state responses when faced with the various step disturbances. By comparison, LADRC-TD3 and LADRC-LTD3 exhibit significantly less oscillation during the transient response than the other two methods. Furthermore, Table 3 provides a numerical comparison of the IAE and the integral of time multiplied by absolute error (ITAE, expressed in Equation (39)) for the $\Delta f_1$, $\Delta f_2$, and $\Delta P_{tie}$ responses of the four control methods. It indicates that the proposed method achieves the smallest IAE and ITAE values, which suggests that it outperforms the other three methods regarding control accuracy and convergence speed. Figure 16 also displays the adaptive parameters of TD3 and LTD3, obtained by observing the system state during the simulation.
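For completeness, the two indices compared in Table 2 and Table 3 can be computed as in the sketch below, assuming a sampled error trace; the signal used here is synthetic.

```python
# Sketch of the performance indices: IAE (Equation (38)) and ITAE
# (Equation (39)), evaluated by trapezoidal integration.
import numpy as np

def iae(t, e):
    return np.trapz(np.abs(e), t)              # integral of |e| dt

def itae(t, e):
    return np.trapz(t * np.abs(e), t)          # integral of t*|e| dt

t = np.linspace(0.0, 15.0, 1501)               # 15 s horizon, 0.01 s step
e = 0.01 * np.exp(-t) * np.cos(3 * t)          # synthetic frequency-deviation error
print(f"IAE  = {iae(t, e):.5f}")
print(f"ITAE = {itae(t, e):.5f}")
```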
5.2. Scenario 2: System Response Performance with RESs
In this scenario, the noise-based wind turbine illustrated in Figure 5 and the photovoltaic system depicted in Figure 6 are incorporated into the two-area interconnected power system described earlier. Owing to the white noise, the output deviations of the wind and solar energy exhibit fluctuations, as illustrated in Figure 17. To further evaluate the effectiveness of the proposed method in regulating LFC systems that incorporate RESs, the MSLP shown in Figure 18 is added to Area I. The wind turbine is connected at 100 s, and the photovoltaic system is connected at 250 s.
The simulation results are depicted in Figure 19 and Figure 20. It is noticeable that, in the presence of renewable energy, the system experiences oscillations, leading to a more significant response overshoot than in the load frequency system without renewable energy. However, the proposed LADRC-LTD3 method effectively suppresses the oscillations and promptly restores the system to a stable state. Table 4 further displays the performance of the four methods regarding IAE and ITAE, indicating that the proposed LADRC-LTD3 significantly outperforms the other three methods. Additionally, Figure 20 illustrates the parameters obtained by the TD3 and LTD3 agents. The selection of parameters varies significantly when a step disturbance occurs; in addition, the white noise also impacts the parameter selection.
5.3. Scenario 3: System Response Performance Considering System Parameter Variations
When evaluating controller performance, it is crucial to consider both the dynamic response of the system and the controller’s robustness against variations in system parameters. Therefore, the Monte Carlo method is adopted for the robustness test.
Based on Scenario 2, Monte Carlo simulations are conducted for 50 runs, with five of the model parameters varying within a prescribed range of their nominal values. Considering such uncertainty, Figure 21 displays the obtained simulation results. The results demonstrate that the proposed method retains its ability to regulate the load disturbance and the renewable energy and achieves a stable response even when the model parameters are modified. This indicates that the proposed method exhibits strong robustness.
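A sketch of such a Monte Carlo robustness test is shown below; the parameter set, the perturbation range, and the toy first-order closed loop are placeholders for the paper's two-area model and are assumptions for illustration only.

```python
# Sketch of the Monte Carlo robustness test of Scenario 3: 50 runs, each with
# model parameters perturbed around nominal values before re-simulating.
import numpy as np

rng = np.random.default_rng(42)
nominal = {"Tp": 20.0, "Kp": 120.0}            # assumed plant constants

def simulate(Tp, Kp, k_fb=2.0, dt=0.01, T=15.0, dPL=0.01):
    """Toy first-order frequency loop under a step load disturbance."""
    df, worst = 0.0, 0.0
    for _ in range(int(T / dt)):
        ddf = (-df + Kp * (-k_fb * df - dPL)) / Tp
        df += dt * ddf
        worst = min(worst, df)
    return worst                                # peak frequency undershoot

results = []
for _ in range(50):
    Tp = nominal["Tp"] * rng.uniform(0.75, 1.25)   # assumed +/-25% perturbation
    Kp = nominal["Kp"] * rng.uniform(0.75, 1.25)
    results.append(simulate(Tp, Kp))
print(f"undershoot spread: [{min(results):.4f}, {max(results):.4f}] p.u.")
```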
6. Conclusions
In conclusion, this paper proposes a Lyapunov reward-based TD3 algorithm, denoted LTD3, to optimize the LADRC method for LFC in multi-source power systems with RESs. The effectiveness of LTD3-LADRC is validated through simulations on a nonlinear two-area interconnected power system that includes thermal, hydro, and gas power plants, as well as wind turbine and photovoltaic systems. The proposed LADRC-LTD3 method is compared with existing methods, and the results show that it effectively addresses the LFC problem in the presence of renewable energy and load disturbances. Furthermore, the proposed method exhibits robustness and maintains system stability even under parameter perturbations.
This paper employs the Lyapunov reward function to enhance the convergence speed of the algorithm, and the effectiveness of the proposed LTD3 algorithm is validated through simulations. Further theoretical analysis of the LTD3 algorithm is necessary in future work to improve its robustness and performance. Furthermore, the current study does not encompass the participation of the load side within the load frequency control framework; thus, in forthcoming research, we intend to broaden our model and validate our methodologies in this domain.