Article

A Transferable DRL-Based Intelligent Secondary Frequency Control for Islanded Microgrids

Department of Energy (AAU Energy), Aalborg University, 9220 Aalborg, Denmark
* Author to whom correspondence should be addressed.
Electronics 2025, 14(14), 2826; https://doi.org/10.3390/electronics14142826
Submission received: 5 June 2025 / Revised: 8 July 2025 / Accepted: 11 July 2025 / Published: 14 July 2025
(This article belongs to the Special Issue Recent Advances in Control and Optimization in Microgrids)

Abstract

Frequency instability poses a significant challenge to the overall stability of islanded microgrid systems. Deep reinforcement learning (DRL)-based intelligent control strategies are attracting considerable attention because they require no prior knowledge of the system dynamics and are capable of autonomous learning. This paper proposes an intelligent secondary frequency compensation solution that divides the traditional secondary frequency control into two layers: the first layer is based on a PID controller, and the second layer is an intelligent controller based on DRL. To address the typically long training times of DRL controllers, transfer learning is integrated, which significantly expedites the training process. This scheme improves control accuracy and reduces computational redundancy. Simulation tests are executed on an islanded microgrid with four distributed generators, and an IEEE 13-bus system is utilized for further validation. Finally, the proposed method is validated on the OPAL-RT real-time test platform. The results demonstrate the superior performance of the proposed method.

1. Introduction

Microgrids offer accessible operational solutions for utilizing distributed generators (DGs) powered by clean energy sources such as photovoltaic and wind power, and they also reduce users’ reliance on the main grid [1,2,3]. Moreover, effectively controlling and optimizing power flows while operating multiple DGs is a critical issue that must be addressed to ensure efficient microgrid performance. To tackle such problems, hierarchical control schemes comprising primary, secondary, and tertiary control have been proposed [4]. The objective of primary control is to achieve stable voltage and frequency output while realizing accurate power distribution. The secondary control compensates the voltage and frequency output of the primary control to return them to their nominal values. Tertiary control is typically used to achieve economic dispatch and optimized power flow [5].
Droop control allows for distributed control with plug-and-play features and is thus extensively used in primary control [6]. However, sudden changes in load can cause the system frequency to deviate from the rated value. Maintaining frequency stability is an essential requirement for microgrid operation, as frequency deviations can lead to equipment failure, clock synchronization failure, and other stability issues [7,8]. As a result, secondary frequency compensation control has attracted wide attention from scholars and is also the focus of this paper. Secondary control can be categorized into centralized control and distributed control. The distributed control scheme eliminates the need for a centralized control center, as agents only need to exchange information with adjacent units. This reduces the communication overhead and offers greater flexibility, widening its potential application scope [9]. Various control schemes such as proportional-integral-derivative (PID) control, model predictive control (MPC), and consensus algorithms are widely applied in the secondary control of microgrids [10]. In [11], the authors utilized PID controllers for frequency compensation. Building on this research, ref. [12] introduced a PSO-PID controller, which employs the particle swarm optimization (PSO) algorithm to tune the parameters of the PID controller. A two-stage fractional-order scheme based on an imperialistic competitive algorithm for PID parameter design was proposed in [13]. However, the PID controller still suffers from overshoot and poor dynamic characteristics. In [14], a secondary frequency compensation based on an MPC controller was proposed with the operating cost taken as a constraint. However, MPC has drawbacks such as sensitivity to parameter variations and complex calculations, potentially leading to reduced robustness. In [15], the authors proposed a secondary control based on small AC signal injection, which injects an AC signal into each DG output to achieve frequency compensation while maintaining active power distribution. A finite-time frequency secondary compensation was proposed in [16], but its convergence is affected by the initial operating conditions of the microgrid, making convergence difficult to guarantee.
Machine learning (ML) for control systems has become a research focus in recent years, and ML techniques with powerful data processing and computational capabilities have the potential to realize optimal control under complex operating conditions involving multiple DGs [17]. In particular, reinforcement learning (RL), an important branch of ML, has become an alternative to traditional control due to its improved dynamic performance and robustness [18]. In [19], the authors used Q-learning to achieve frequency compensation with plug-and-play functionality, but the compensation accuracy of Q-learning is relatively low and depends heavily on the initialized Q-table. In [20], a secondary frequency control based on actor–critic RL was proposed, which requires no a priori data and takes into account disturbances from uncertain line parameters, but the compensation accuracy under multi-DG operating conditions remains to be enhanced. In contrast to Q-learning, deep reinforcement learning combines deep neural networks (DNNs) with RL, offering a more sophisticated control solution. A DRL frequency compensation scheme based on a continuous action domain was proposed in [21], which improves the compensation efficiency. In [22], a multi-agent DRL frequency compensation scheme based on the deep deterministic policy gradient (DDPG) was proposed to improve the compensation accuracy by adjusting each agent’s parameters on the basis of a globally optimal control objective. In [23], a multi-agent quantum DRL controller was proposed for frequency compensation; it utilized a quantum neural network instead of a traditional NN, which enhanced compensation accuracy but at higher computational intensity and complexity. While employing a DRL-based control agent improves the compensation accuracy, achieving complete accuracy during agent training remains a challenge, and insufficiently accurate frequency compensation is more likely under complex operating conditions caused by large sudden load changes [18]. Furthermore, all agents in [19,20,21,22,23] adopt the strategy of offline training followed by online application to achieve frequency compensation. Nevertheless, training or optimizing the neural network parameters of all agents requires great computational effort and is inefficient. Transfer learning (TL) provides a solution to simplify this computational process: it aims to use previous learning experience to improve the computational efficiency on the target task [24,25]. In [26], the authors adopted a TL-based fault detection protocol to improve computational efficiency. The use of a TL technique in [27] significantly reduced the computational effort of DNN-based secondary control.
To overcome the above challenges, this paper proposes an intelligent distributed secondary frequency control scheme combining a PID controller and DRL for AC islanded microgrids. The method requires no a priori data and no system modeling. The traditional secondary frequency compensation control is divided into two layers: the first layer is compensated by a traditional PID controller, and the second layer is a model-free DRL controller based on the deep Q-learning (DQN) algorithm that re-compensates the output of the first layer. Compared to the Q-learning algorithm, the DQN algorithm with its DNN has better state and action space coverage and function approximation capability [28], as well as more accurate compensation capability. In the second layer of DRL-based frequency compensation, the agent is trained offline and applied online. During offline training, the training dataset is derived from the deviations that remain between the PID-based frequency compensation and the rated values. In the online application, the agent follows the frequency output to provide compensation. To better illustrate the research gap, Table 1 provides a comparative analysis of representative secondary frequency control strategies in microgrids. The main contributions are as follows:
  • An intelligent two-layer distributed secondary frequency compensation structure is proposed which, unlike other ML-based secondary control schemes, preserves the PID controller. Building on the first layer of PID compensation, a second layer of DRL-based intelligent frequency control is added to achieve precise frequency compensation.
  • The DRL-based frequency controller employs the DQN algorithm to improve the accuracy of frequency compensation. Further, fine-tuning-based TL is used to reduce the DQN training time under multi-agent conditions.
The rest of the paper is organized as follows. The primary control principle is presented in Section 2. In Section 3, the proposed secondary control principle is illustrated and discussed. In Section 4, case study results are provided. Finally, Section 5 states the conclusions of the research.

2. Primary Control of AC Islanded Microgrids

Voltage source inverters (VSIs) are suitable for achieving voltage–frequency stabilized output in autonomous operation [11]. It is assumed that all DGs are integrated into the islanded microgrid through interfacing VSIs equipped with voltage and current control loops as well as a droop controller [8]. Further, virtual impedance is used to solve the line impedance inconsistency problem. The voltage and current loops use PI-based controllers to ensure stable voltage and current operation. Droop control is adopted as the primary control scheme to achieve active power and reactive power distribution and synchronized control of all VSIs. Specifically, the droop control mechanism operates as follows:
$\omega_i = \omega^{*} - K_{\omega i} \left( P_i^{m} - P_i^{*} \right)$  (1)
$V_i = v^{*} - K_{V i} \left( Q_i^{m} - Q_i^{*} \right)$  (2)
where $\omega_i$ and $V_i$ are the frequency and voltage output of DG $i$, and $\omega^{*}$ and $v^{*}$ represent the rated values. $K_{\omega i}$ and $K_{V i}$ are the droop coefficients, whose selection must consider the system stability requirements, the frequency response rate, and the load variation. $P_i^{m}$ and $Q_i^{m}$ are the active and reactive power at the output of DG $i$ after low-pass filtering, as shown in (3) and (4), while $P_i^{*}$ and $Q_i^{*}$ represent the rated values.
$P_i^{m} = \dfrac{\omega_c}{\omega_c + s} \left( v_{o\alpha i}\, i_{o\alpha i} + v_{o\beta i}\, i_{o\beta i} \right)$  (3)
$Q_i^{m} = \dfrac{\omega_c}{\omega_c + s} \left( v_{o\beta i}\, i_{o\alpha i} - v_{o\alpha i}\, i_{o\beta i} \right)$  (4)
where $v_{o\alpha i} = i_{o\alpha i} R_{vi} - i_{o\beta i} \omega_i L_{vi}$ and $v_{o\beta i} = i_{o\beta i} R_{vi} + i_{o\alpha i} \omega_i L_{vi}$, with $R_{vi}$ and $L_{vi}$ the virtual resistance and inductance. The instantaneous output voltages and currents of DG $i$ are expressed in the $\alpha\beta$ reference frame, and $\omega_c$ is the cutoff frequency of the low-pass filter. Conventional droop control cannot constrain frequency deviations under abrupt load variations, resulting in a steady-state error from the nominal frequency. To address this limitation, secondary frequency control is required to restore the system frequency to its reference value, which constitutes the core focus of this study. The secondary compensation component $\Omega_i$, the output variable of the secondary control, is added to (1), as given in (5).
$\omega_i = \omega^{*} - K_{\omega i} \left( P_i^{m} - P_i^{*} \right) + \Omega_i$  (5)
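As a concrete illustration of (1)–(5), the following Python sketch discretizes the low-pass filter with a forward-Euler step and applies the droop relations. The class and variable names, and the use of a fixed time step, are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of the droop relations (1)-(5); names and the Euler
# discretization of the low-pass filter w_c/(w_c + s) are illustrative.
class DroopController:
    def __init__(self, w_nom, v_nom, K_w, K_v, P_rated, Q_rated, w_c, dt):
        self.w_nom, self.v_nom = w_nom, v_nom          # rated frequency / voltage
        self.K_w, self.K_v = K_w, K_v                  # droop coefficients
        self.P_rated, self.Q_rated = P_rated, Q_rated  # rated active / reactive power
        self.w_c, self.dt = w_c, dt                    # LPF cutoff, integration step
        self.P_m, self.Q_m = 0.0, 0.0                  # filtered powers

    def step(self, v_oa, v_ob, i_oa, i_ob, omega_sec=0.0):
        # Instantaneous powers in the alpha-beta frame, then LPF as in (3)-(4)
        p_inst = v_oa * i_oa + v_ob * i_ob
        q_inst = v_ob * i_oa - v_oa * i_ob
        self.P_m += self.dt * self.w_c * (p_inst - self.P_m)
        self.Q_m += self.dt * self.w_c * (q_inst - self.Q_m)
        # Droop with the secondary correction term Omega_i as in (5)
        w_i = self.w_nom - self.K_w * (self.P_m - self.P_rated) + omega_sec
        V_i = self.v_nom - self.K_v * (self.Q_m - self.Q_rated)
        return w_i, V_i
```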

3. Proposed Intelligent Secondary Control

3.1. Frequency Control Framework

In this work, droop control is adopted as the primary control. An intelligent distributed secondary frequency control system with a dual-layer architecture is proposed to compensate for the frequency deviation generated when the active load power changes, as depicted in Figure 1, where $z_v$ is the virtual impedance and Load is the active load. The first layer incorporates a conventional PID controller whose output compensation value is $\delta f_1$. Its input is the difference between the DG output frequency $f_i$ and the rated value $f^{*}$. The proportional and integral coefficients $k_p$ and $k_i$ are depicted in Figure 1, and the compensation principle is given in (6).
$\delta f_1 = k_p \left( f_i - f^{*} \right) + k_i \int \left( f_i - f^{*} \right) dt$  (6)
The second layer integrates a DRL scheme based on the DQN algorithm. Notably, the DRL scheme provides further compensation on top of the output generated by the first layer. The DRL-based frequency compensation in the second layer unfolds in two phases: offline training and online application. During offline training, the agent interacts with the environment (the corresponding DG), as shown in Figure 2. The agent explores actions contingent on the frequency output of the corresponding DG, while the DG dynamically regulates its output frequency in response to the frequency action values provided by the agent. Following this interaction, a reward function is employed to motivate the agent to adapt its behavior, with the overarching objective of minimizing the frequency deviation. In addition, this paper combines the offline training process with fine-tuning transfer learning, which effectively reduces the training time for the remaining agents.
In the online application phase, the state is the input deviation, namely the frequency deviation that persists after the first layer of PID compensation, denoted as $d_f$ and given in (7). The trained agent selects the Q-value based on this input state and then determines the action value in accordance with the chosen Q-value. The second layer of compensation is $\delta f_2$. The frequency control for each DG is constructed upon this two-layer framework, featuring an autonomous agent, i.e., an intelligent frequency controller, engineered to facilitate precise frequency restoration.
$d_f = f^{*} - f_i + \delta f_1$  (7)
Combining the two layers, the secondary control quantity in (5) is generalized as follows:
$\Omega_i = \delta f_1 + \delta f_2$  (8)
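The two-layer compensation in (6)–(8) can be sketched as follows. The PID gains, the sampling step, and the `agent.act` interface of the trained DQN agent are illustrative assumptions about how the pieces fit together, not the authors' code.

```python
# Minimal sketch of the two-layer secondary compensation (6)-(8); the PID gains,
# the sampling step `dt` and the `agent` interface are illustrative assumptions.
class TwoLayerSecondaryControl:
    def __init__(self, kp, ki, dt, f_nom, agent):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.f_nom = f_nom
        self.agent = agent          # trained DQN agent: residual deviation -> delta_f2
        self.integral = 0.0

    def compensate(self, f_i):
        # First layer: conventional PID on the frequency error, eq. (6)
        err = f_i - self.f_nom
        self.integral += err * self.dt
        delta_f1 = self.kp * err + self.ki * self.integral
        # Residual deviation fed to the DRL layer, eq. (7)
        d_f = self.f_nom - f_i + delta_f1
        # Second layer: the trained agent returns an additional correction
        delta_f2 = self.agent.act(d_f)
        # Total secondary control quantity Omega_i, eq. (8)
        return delta_f1 + delta_f2
```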

3.2. Application Progress of Deep Q-Learning

3.2.1. Deep Q-Learning Principle

DQN is an algorithm that fuses the principles of deep learning with reinforcement learning, aiming to tackle decision-making challenges within complex input spaces. Unlike traditional Q-learning, which relies on a Q-table for discrete state–action pairs, DQN leverages a DNN to handle more intricate inputs efficiently. The algorithm learns to discern optimal decisions by approximating the action–value function (Q-function) through a neural network. A pivotal strategy in DQN is experience replay, where a repository of past interactions, including states, actions, rewards, and subsequent states, is maintained. Training the network on randomly sampled experiences from this collection significantly reduces temporal correlations, thereby improving the stability of the training process [29].
Additionally, DQN incorporates a dual-DNN structure, comprising a main (prediction) network and a target network, to further stabilize training by decoupling fluctuations in the learning targets. These innovations allow DQN to excel in domains such as control and optimization, laying a solid groundwork for further developments in deep reinforcement learning. The DQN training process and architecture are shown in Figure 3. The environment provides the current state $s$ and the agent selects an action $a$ via the prediction network. After executing $a$, the environment returns a reward $r$ and the next state $s'$. The transition tuple $(s, a, r, s')$ is stored in the memory buffer. The prediction network estimates $Q_P(s, a)$, while the target network computes $Q_T(s', a')$ for the loss calculation. The loss is used to update the prediction network, and the target network is periodically synchronized. Further principles and details are given in the following paragraphs. In Figure 1, once the training of the agent is completed, the DNN directly gives the optimal frequency compensation value based on the Q-value of the input.
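A minimal PyTorch sketch of the components described above (prediction network, periodically synchronized target network, and replay buffer) is given below. The layer sizes follow Table 3; everything else, including the object names, is an illustrative assumption.

```python
import random
from collections import deque
import torch
import torch.nn as nn

# Sketch of the DQN building blocks: a prediction network, a target network that
# starts synchronized with it, and a replay buffer for (s, a, r, s', done) tuples.
class QNetwork(nn.Module):
    def __init__(self, n_states=300, n_actions=300, hidden=32):
        super().__init__()
        # Layer sizes loosely follow Table 3 (300 -> 32 -> 32 -> 300, ReLU activations)
        self.net = nn.Sequential(
            nn.Linear(n_states, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

prediction_net = QNetwork()
target_net = QNetwork()
target_net.load_state_dict(prediction_net.state_dict())  # start fully synchronized
replay_buffer = deque(maxlen=10_000)                      # stores (x, a, r, x', done)
```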

3.2.2. Application Process

Regarding the input state of the proposed second-layer compensation agent, and in accordance with IEEE 1547 [30] requirements while avoiding redundant calculations, the control objective of this section is to limit the frequency deviation to within 0.01 Hz. The state representation $s$ is crucial for determining how the DQN senses its environment. To produce the input vector, $s$ is represented as an encoding of the frequency deviation and decomposed into three components $s_1$, $s_2$, $s_3$, representing units, tenths, and hundredths, as shown in Equation (9).
$s(t) = s_1(t) + 0.1 \, s_2(t) + 0.01 \, s_3(t)$  (9)
The input vector $x$ is then formed as a column vector; the vector $x(t)$ is used to represent the input variable instead of the original $s(t)$.
$x(t) = \left[ s_1(t), \, s_2(t), \, s_3(t) \right]^{T}$  (10)
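A small sketch of the state decomposition in (9) and (10) is shown below, assuming the absolute deviation is split digit-wise into units, tenths, and hundredths; the helper name is hypothetical.

```python
import numpy as np

# Illustrative sketch of (9)-(10): split the absolute frequency deviation into
# unit, tenth and hundredth digits and stack them into a column vector x(t).
def encode_state(deviation_hz: float) -> np.ndarray:
    mag = abs(deviation_hz)
    s1 = int(mag)                 # units
    s2 = int(mag * 10) % 10       # tenths
    s3 = int(mag * 100) % 10      # hundredths
    return np.array([[s1], [s2], [s3]], dtype=np.float32)  # column vector x(t)

# Example: a residual deviation of 0.13 Hz maps to x = [0, 1, 3]^T
print(encode_state(0.13).ravel())   # -> [0. 1. 3.]
```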
To minimize the frequency deviation, define the objective as maximizing the expected cumulative reward over time, as shown in Equation (11).
$\pi^{*} = \arg\max_{\pi} \; \mathbb{E} \left[ \sum_{t=0}^{\infty} \gamma^{t} \, r(x, a) \right]$  (11)
where $a$ is the compensation action, $\gamma$ is the discount factor for future rewards, $r(x, a)$ is the immediate reward for taking action $a$ in state $x$, and $t$ is the time step. The Q-learning algorithm updates the Q-values, which represent the expected future rewards for state–action pairs. The update rule is shown in Equation (12).
$Q_{\text{new}}(x, a) = (1 - \alpha) \, Q(x, a) + \alpha \left[ r(x, a) + \gamma \max_{a'} Q(x', a') \right]$  (12)
where $\alpha$ is the learning rate, $x'$ is the next input, $a'$ is the next action, and $Q(x', a')$ denotes the Q-values of the possible actions $a'$ in state $x'$.
DQN utilizes a DNN to approximate the Q-function that estimates the quality of taking an action in a given state. $\theta$ denotes the parameters of the DNN, with $\theta_l = [W^{(l)}, b^{(l)}]$. The DNN architecture is given by Equations (13) and (14).
$z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$  (13)
$a^{(l)} = \sigma \left( z^{(l)} \right)$  (14)
where $W^{(l)}$ and $b^{(l)}$ are the weights and biases of the $l$-th layer of the DNN, respectively, $a^{(0)} = x$, and $\sigma$ is the activation function. To direct the agent to improve its decision-making process through the Q-value update, the reward function is set as in (15), where $K$ is a positive constant that scales the penalty for deviations greater than 0.01. A reward threshold of 0.01 Hz is adopted as a practical boundary for acceptable frequency deviation, and additional segmentation for larger errors is omitted to simplify the reward design [31,32].
$r = \begin{cases} -K \left( |s + a| - 0.01 \right), & \text{if } |s + a| > 0.01 \\ 1, & \text{otherwise} \end{cases}$  (15)
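The piecewise reward in (15) can be expressed directly as a function; the numerical value of $K$ below is an assumed example, since only its sign and role are specified above.

```python
# Sketch of the piecewise reward in (15); K = 100.0 is an assumed example value
# for the positive penalty scale, and 0.01 Hz is the threshold stated above.
def reward(s: float, a: float, K: float = 100.0) -> float:
    residual = abs(s + a)               # deviation remaining after the action
    if residual > 0.01:
        return -K * (residual - 0.01)   # penalty grows with the excess deviation
    return 1.0                          # bonus once the 0.01 Hz band is reached
```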
To break the correlation between consecutive samples, DQN uses a replay buffer and samples from it randomly. The transition information $(x, a, r, x', done)$ is stored in a replay buffer $D$, where $done$ indicates whether an episode is completed: it is “True” if completed and “False” otherwise. The DNN parameters are updated mainly with data obtained by random sampling from $D$. In addition, DQN uses two networks: a main network (for the current Q-value prediction, $Q_p$) and a target network (for the Q-value prediction of the next state, $Q_t$). The target network is introduced to increase the stability of the learning process and to reduce the oscillation and divergence problems that could arise from using the same network to estimate current and future Q-values. For a batch of $N$ samples, the loss function is defined as:
$L(\theta) = \dfrac{1}{N} \sum_{i=1}^{N} \left( Q_p(x_i, a_i) - Q_t(x_i, a_i) \right)^{2}$  (16)
The gradients with respect to the parameters $\theta$ of each layer are systematically calculated using the chain rule, as follows:
$\dfrac{\partial L}{\partial W} = \dfrac{1}{N} \sum_{i=1}^{N} 2 \left( Q_p(x_i, a_i) - Q_t(x_i, a_i) \right) \cdot \dfrac{\partial Q_p(x_i, a_i)}{\partial W}$  (17)
$\dfrac{\partial L}{\partial b} = \dfrac{1}{N} \sum_{i=1}^{N} 2 \left( Q_p(x_i, a_i) - Q_t(x_i, a_i) \right) \cdot \dfrac{\partial Q_p(x_i, a_i)}{\partial b}$  (18)
The updated weights and biases are shown below:
$W = W - \alpha \dfrac{\partial L}{\partial W}$  (19)
$b = b - \alpha \dfrac{\partial L}{\partial b}$  (20)
By employing gradient descent to minimize the loss function, the DNN weights and biases are iteratively adjusted. These updates are made at each training step, and as training progresses, the main network learns to more accurately estimate the value of taking different actions in different states, thus improving the strategy.
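One DQN update step corresponding to (16)–(20) is sketched below with PyTorch autograd, using the Bellman target of (12) to form $Q_t$. It reuses the `prediction_net`, `target_net`, and `replay_buffer` objects sketched in Section 3.2.1, assumes transitions are stored as flat arrays matching the network input size, and the plain-SGD optimizer choice is an assumption.

```python
import random
import numpy as np
import torch
import torch.nn.functional as F

# Illustrative DQN update step: sample a minibatch from the replay buffer,
# form the target with the frozen target network, and take one gradient step
# on the mean-squared loss (16); autograd handles the chain rule of (17)-(20).
optimizer = torch.optim.SGD(prediction_net.parameters(), lr=0.009)

def train_step(batch_size=32, gamma=0.99):
    batch = random.sample(replay_buffer, batch_size)
    x, a, r, x_next, done = zip(*batch)            # transitions (x, a, r, x', done)
    x      = torch.as_tensor(np.stack(x), dtype=torch.float32)
    a      = torch.as_tensor(a, dtype=torch.int64)
    r      = torch.as_tensor(r, dtype=torch.float32)
    x_next = torch.as_tensor(np.stack(x_next), dtype=torch.float32)
    done   = torch.as_tensor(done, dtype=torch.float32)

    # Q_p(x, a): predicted value of the action actually taken
    q_pred = prediction_net(x).gather(1, a.unsqueeze(1)).squeeze(1)
    # Q_t: bootstrap target from the target network, consistent with (12)
    with torch.no_grad():
        q_target = r + gamma * target_net(x_next).max(dim=1).values * (1.0 - done)

    loss = F.mse_loss(q_pred, q_target)   # eq. (16)
    optimizer.zero_grad()
    loss.backward()                       # chain-rule gradients as in (17)-(18)
    optimizer.step()                      # gradient-descent update as in (19)-(20)
    return loss.item()
```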
To promote a more efficient exploration process for the DQN, an exponentially decaying $\epsilon$-greedy approach is utilized. This method selects random actions with probability $\epsilon_t$ and actions that maximize the Q-value with probability $1 - \epsilon_t$, where $\epsilon_t$ is given in (21) with decay rate $\delta$. The agent training is finally accomplished based on (12) and (16)–(21) to achieve optimal frequency compensation.
$\epsilon_t = \epsilon_{\text{start}} \times \delta^{t}$  (21)
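The decaying $\epsilon$-greedy selection in (21) can be sketched as follows, using the hyperparameters reported later in Section 4.1; the function signature is hypothetical and `x` is assumed to be a (1, n_states) input tensor for the prediction network.

```python
import random
import torch

# Sketch of the exponentially decaying epsilon-greedy policy in (21);
# eps_start = 0.7 and delta = 0.995 follow the values reported in Section 4.1.
def select_action(q_net, x, t, n_actions=300, eps_start=0.7, delta=0.995):
    eps_t = eps_start * (delta ** t)               # eq. (21)
    if random.random() < eps_t:
        return random.randrange(n_actions)         # explore: random action
    with torch.no_grad():
        return int(q_net(x).argmax().item())       # exploit: action maximizing Q
```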

3.3. Application Progress of Fine-Tuning Transfer Learning

Transfer learning is a method that exploits a model trained on one task (source task) and applies it to a different, yet related, task (target task). This approach is particularly useful when addressing challenges in new domains (target domains), enabling rapid updates to the model using the experience gained from an existing domain (source domain). It finds extensive application in control fields [25].
In the proposed control scheme, each DG in the islanded microgrid contains a DQN controller, and the operating conditions of each DG may vary, which means that the agent of each DG needs to be trained during offline training; this results in large computational redundancy. For training the different agents, the target task and the source task are the same, i.e., realizing frequency compensation, while the target domain differs slightly from the source domain because each DG has different operating conditions. As shown in Figure 2, to improve the computational efficiency, this paper adopts fine-tuning TL: after completing the training of one agent, part of the trained experience is inherited by the agents of the remaining DGs. This approach significantly reduces computational demands and boosts efficiency.
The DNN of the agent consists of one input layer, two hidden layers, and one output layer. The input layer and the first hidden layer are retained and transferred from the trained agent$_i$ (source) to the remaining agents$_j$ (targets), as shown in Figure 4. This transfer includes retaining the parameters $[W_1, b_1]$. During the training of agent$_j$, the DNN training procedure is divided into two main phases: feed-forward propagation and backward updating.
In forward propagation, only the weights and biases $[W_2, b_2]$ and $[W_3, b_3]$ of the second hidden layer and the output layer are updated during training, while the weights and biases of the input layer and the first hidden layer remain unchanged. The mean squared error between the predicted outputs and the true outputs is then calculated as follows.
$\zeta = \dfrac{1}{m} \sum_{k=1}^{m} \left( \hat{y}^{(k)} - y^{(k)} \right)^{2}$  (22)
where $m$ is the number of neurons in the output layer, $y$ is the true output, and $\hat{y}$ is the predicted output. The gradient of the loss function with respect to each trainable parameter is computed in the backward update according to (16)–(20), and the parameters are updated using gradient descent. The rest of the agent’s training process is the same as described in the previous section.
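A minimal PyTorch sketch of this fine-tuning transfer is given below: the target agent's network is initialized from the trained source network, the transferred parameters $[W_1, b_1]$ are frozen, and only the remaining layers stay trainable. It reuses the `QNetwork` sketch from Section 3.2.1; the helper name and optimizer settings are assumptions.

```python
import copy
import torch

# Sketch of fine-tuning TL: copy the trained source network, freeze the first
# hidden layer (the transferred [W1, b1]), and train only the remaining layers.
def build_target_agent(source_net: QNetwork) -> QNetwork:
    target = copy.deepcopy(source_net)      # inherit all trained parameters
    first_hidden = target.net[0]            # Linear(n_states -> 32), i.e. [W1, b1]
    for p in first_hidden.parameters():
        p.requires_grad = False             # keep the transferred layer fixed
    return target

target_net_j = build_target_agent(prediction_net)
# Only the still-trainable parameters (second hidden and output layers)
# are handed to the optimizer of agent_j.
optimizer_j = torch.optim.SGD(
    (p for p in target_net_j.parameters() if p.requires_grad), lr=0.009)
```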

4. Case Study

The effectiveness of the proposed intelligent secondary frequency control is demonstrated in this section. The islanded microgrid system is built in MATLAB/Simulink 2023, and the DQN-based compensation and fine-tuning TL are implemented in Python 3.7 with the PyTorch library. These computations are carried out on a laptop equipped with an i7-1165G7 CPU operating at 2.8 GHz and 32 GB of RAM.

4.1. Offline Training Process

In the first step of offline training, we constructed an AC islanded microgrid model with four DGs, as depicted in Figure 5. This model incorporates two variable loads. Each DG incorporates primary control through droop control and traditional secondary control using PID regulation, with the specific control parameters detailed in Table 2. The load demands within the system are defined at specific time intervals: at t = 0 s, Load 1 is 20 kW and Load 2 is 15 kW; at t = 1 s, Load 1 increases to 30 kW and Load 2 to 20 kW; and at t = 2 s, they revert to 20 kW and 15 kW, respectively. The simulation time is set to 3 s. All four DGs have a rated active power of 25 kW. Following the establishment of the microgrid model, the next phase involved initializing the agent’s architecture. For this purpose, a DNN comprising four layers, including the input layer, was integrated into the DQN, with the parameters of this network outlined in Table 3. The learning rate $\alpha$ was set to 0.009, the buffer size of $D$ to 10,000, and the batch size (the number of samples drawn from $D$) to 32, with a discount factor $\gamma$ of 0.99, $\epsilon_{\text{start}}$ of 0.7, a decay rate $\delta$ of 0.995, and 3000 training episodes.
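For reference, the training hyperparameters reported above can be collected into a single configuration dictionary; the dictionary itself is only an illustrative convenience for the sketches in Section 3, not part of the original implementation.

```python
# Training hyperparameters as reported in this section.
DQN_CONFIG = {
    "learning_rate": 0.009,
    "buffer_size": 10_000,
    "batch_size": 32,
    "discount_factor": 0.99,
    "eps_start": 0.7,
    "eps_decay": 0.995,
    "episodes": 3000,
}
```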
Based on the property that DRL does not require a large amount of a priori data, in the third phase the study utilized the frequency output signal after the PID-based secondary compensation from the first step, quantifying the deviation $d_f$ between this signal and the nominal value to compile a dataset for agent training. The frequency-compensation actions of the agent outputs during training are $\delta f_{2i}$ and $\delta f_{2j}$. After successfully training the agent associated with DG$_i$, fine-tuning TL is applied to streamline and improve the training process for the subsequent agents. Figure 6 illustrates the whole offline training process of the system, with i = 1 indicating the initially trained agent and j representing the agents corresponding to DGs in the range [2, 4].
Following the successful training of the agent for DG1, the study moved forward to evaluate the performance of the agents for DG2, DG3, and DG4. This evaluation was conducted using two distinct methodologies: the traditional training approach, which treats each agent’s training process independently from scratch, and a fine-tuning TL strategy, which leverages the pre-trained model of DG1 to accelerate and potentially enhance the learning process for the subsequent agents. Figure 7 succinctly illustrates the progression of loss function over the training period, from which it is apparent that the fine-tuning TL methodology achieves a more rapid convergence and yields a lower loss in comparison to the original scheme. A detailed comparison of the duration of training under the two methodologies is presented in Table 4, where the compensation accuracy is also given. These results underscore the profound influence of fine-tuning TL on the training process. By employing this approach, the improved method successfully reduced the training duration for datasets DG2, DG3, and DG4 by 28.81%, 28.91%, and 31.33%, respectively. Notably, this time efficiency was achieved without compromising the precision of the original compensation accuracy.
To evaluate the robustness of the DRL-based controller, we conducted additional training experiments in which 5% zero-mean Gaussian noise was added to the source data (specifically, frequency deviation signals used as agent inputs). The results show that the trained agents are able to maintain stable learning and control performance despite the injected noise. The final frequency compensation accuracy remains consistent, with less than 1% performance degradation in steady-state error compared to the noise-free case as shown in Table 4. These results demonstrate the robustness and generalization capability of the proposed method.
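The noise-injection step described above can be sketched as follows; scaling the 5% level by the standard deviation of the deviation signal is an assumption about how the noise magnitude is defined, and the helper name is hypothetical.

```python
import numpy as np

# Sketch of the robustness check: add 5% zero-mean Gaussian noise to the
# frequency-deviation training signals before agent training.
rng = np.random.default_rng(0)

def add_noise(deviation_signal: np.ndarray, level: float = 0.05) -> np.ndarray:
    sigma = level * np.std(deviation_signal)          # assumed definition of "5%"
    return deviation_signal + rng.normal(0.0, sigma, size=deviation_signal.shape)
```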

4.2. Simulation Results

To assess the effectiveness of the training outcomes, simulations were conducted in MATLAB/Simulink for various islanded microgrid scenarios. These simulations aimed to validate the trained agent by comparing the proposed control strategy not only with conventional PID control but also against a Q-learning-based approach detailed in [33], in which the learning rate and discount factor were set to 0.1 and 0.9, respectively. In the simulation scenario shown in Figure 5, only DG1, DG2, and Load 1 are running, with the simulation duration set to 4 s. Initially, Load 1 consumed 40 kW, which increased to 50 kW at t = 2 s. Further simulation parameters are detailed in Table 2.
Figure 8 shows the performance of DG1 under three different control schemes. Table 5 details the maximum frequency deviation of each DG operating under different control schemes. This comparison highlights that the proposed strategy not only exhibits a dynamic performance comparable to that of the Q-learning approach but also surpasses the traditional PID control in terms of frequency compensation and dynamic response. Moreover, the proposed control scheme significantly improves the compensation accuracy over the PID + Q-learning method. In Figure 9, it can be seen that both DG1 and DG2 output active power is stabilized at 19.17 kW before 2 s and 23.91 kW after 2 s. This indicates effective load management and distribution.
In a subsequent validation step, simulations are conducted based on the configuration outlined in Figure 5. Different droop control parameters are provided, with the K value set to 2 × 10⁻⁴ for DG1 and DG3, and 1.6 × 10⁻⁴ for DG2 and DG4. The rest of the simulation parameters are consistent with those listed in Table 2. All components operate normally over a total simulation time of 6 s. At t = 0 s, Load 1 and Load 2 are 40 kW; they are adjusted at 2 s to 60 kW for Load 1 and 44 kW for Load 2, before returning to 40 kW for both loads at 4 s. The results of these simulations for the islanded AC microgrid under normal operating conditions are depicted in Figure 10, which shows a performance comparison of DG1 under three different control strategies. A quantitative analysis of the DG operation results is detailed in Table 6. The results indicate that the PID + Q-learning method and the proposed scheme have better dynamic performance than PID control. Additionally, a comparison was made with a PID + Soft Actor-Critic (SAC) scheme, where the SAC network was set to two layers, each containing 128 neural units, with a learning rate of 3 × 10⁻⁴ and a discount factor of 0.9. Based on the results in Table 6, the proposed scheme demonstrates excellent compensation accuracy, with the highest improvement for DG1 reaching 78.79%. The proposed scheme outperforms the PID + SAC scheme in compensation because DQN is better suited to discrete controllers, while the continuous action space of SAC is prone to error accumulation. Figure 11 shows the system’s active power output when using the proposed scheme. Before the load change, DG1 and DG3 output 21.13 kW, while DG2 and DG4 output 17.05 kW. After the load change, DG1 and DG3 increase to 27.26 kW, and DG2 and DG4 increase to 22.31 kW, demonstrating stable active power output proportional to the droop coefficients.
Furthermore, the IEEE 13-bus system is utilized to assess the effectiveness of the proposed control strategy. The configuration of the control system, as detailed in Figure 12, incorporates four DGs. The offline training methodology for each DG adheres to the previously described protocols, employing identically configured DNNs. Each DG has an active power rating of 10 kW. The load distribution initially assigns 4 kW to Load 1, 2 kW to Load 3, and 8 kW each to Loads 2, 4, 5, 6, and 7. The simulation time is 18 s. At the 6 s mark, Load 1 is adjusted to 12 kW, and at the 12 s mark it returns to 4 kW. Frequency training datasets are generated using the IEEE 13-bus system framework based on PID compensation. This study compares the PID + Q-learning method and PID + SAC with the proposed control strategy. Figure 13 shows the frequency compensation results for DG1. The maximum deviation of the proposed scheme is 0.089 Hz, a reduction of 59.56%, while PID + SAC reaches 0.12 Hz and the PID + Q-learning method 0.22 Hz. Correspondingly, the frequency deviations of DG2, DG3, and DG4 under the proposed scheme are reduced by 57.23%, 58.78%, and 58.91%, respectively. Figure 14 shows the active power output of the system based on the proposed control scheme, which still achieves equal active power sharing.

4.3. Real-Time Test Result

An OPAL-RT platform is used for real-time simulation to verify the performance. The 4-DG test structure in Figure 5 is constructed in MATLAB/Simulink, and the DRL controller is trained offline and preloaded in the control system. At t = 0 s, Load 1 and Load 2 are 40 kW; at t = 0.82 s, Load 1 changes to 60 kW and Load 2 to 44 kW. The total simulation time is 2 s, and the rest of the parameters are the same as in Table 2. The real-time simulation platform is shown in Figure 15. Figure 16 and Figure 17 show the frequency output results based on PID + Q-learning compensation and the proposed scheme, respectively; both schemes can accurately compensate the frequency when load fluctuations occur. It is worth noting that, compared with the PID + Q-learning method, the frequency deviation of DG1 was reduced from 0.9 to 0.65, DG2 from 1.05 to 0.72, DG3 from 1.05 to 0.72, and DG4 from 1.27 to 0.85, corresponding to improvements of 27.78%, 31.43%, 31.43%, and 33.07%. In addition, the active power output based on the proposed scheme is given in Figure 18, which demonstrates the stability of the active power output.

5. Conclusions

A distributed intelligent secondary frequency control scheme was proposed to improve the frequency compensation performance by combining a data-driven DRL-based control scheme with a traditional solution in an AC islanded microgrid. Unlike other ML-based compensation solutions, this work did not completely abandon or replace the PID controller. The secondary control was divided into two layers: the first layer was based on PID control, and the second layer was a DQN-based intelligent controller. Offline training with online application was adopted, and the fine-tuning TL scheme was used to inherit previous learning experience during offline training, reducing the offline training time and the computational redundancy. In the validation stage, the proposed control strategy was shown in MATLAB/Simulink to achieve higher compensation accuracy than the other schemes. According to the verification results, the proposed scheme achieves an average improvement of 83.21% in compensation accuracy compared to the traditional PID scheme and, through TL, saves an average of 29.68% of the training time, significantly enhancing performance. It was also validated on the IEEE standard 13-bus system. Further, real-time test results demonstrated the effectiveness of the proposed control strategy. Future work will focus on applying the proposed scheme to more complex network structures and addressing challenges related to communication links and data quality.

Author Contributions

Conceptualization, S.L.; Software, S.L.; Validation, S.L.; Investigation, S.L.; Supervision, F.B. and A.A.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Khan, T.A.; Kahwash, F.; Ahmed, J.; Goh, K.; Papadopoulos, S. From Design to Deployment: A Comprehensive Review of Theoretical and Experimental Studies of Multi-Energy Systems for Residential Applications. Electronics 2025, 14, 2221. [Google Scholar] [CrossRef]
  2. Ma, Z.G.; Værbak, M.; Cong, L.; Billanes, J.D.; Jørgensen, B.N. Enhancing Island Energy Resilience: Optimized Networked Microgrids for Renewable Integration and Disaster Preparedness. Electronics 2025, 14, 2186. [Google Scholar] [CrossRef]
  3. Duan, H.; Shi, F.; Wang, S.; Cui, Q.; Zeng, M. A Time Synchronization Hop-Count-Control Algorithm Based on Synchronization Error Convergence Probability Estimation. Electronics 2025, 14, 2086. [Google Scholar] [CrossRef]
  4. Kuyumcu, A.; Karabacak, M.; Boz, A.F. High-Fidelity Modeling and Stability Analysis of Microgrids by Considering Time Delay. Electronics 2025, 14, 1625. [Google Scholar] [CrossRef]
  5. Han, Y.; Zhang, K.; Li, H.; Coelho, E.A.A.; Guerrero, J.M. MAS-Based Distributed Coordinated Control and Optimization in Microgrid and Microgrid Clusters: A Comprehensive Overview. IEEE Trans. Power Electron. 2018, 33, 6488–6508. [Google Scholar] [CrossRef]
  6. Chen, M.; Xiao, X.; Guerrero, J.M. Secondary Restoration Control of Islanded Microgrids with a Decentralized Event-Triggered Strategy. IEEE Trans. Ind. Inform. 2018, 14, 3870–3880. [Google Scholar] [CrossRef]
  7. Sahoo, S.; Yang, Y.; Blaabjerg, F. Resilient Synchronization Strategy for AC Microgrids Under Cyber Attacks. IEEE Trans. Power Electron. 2021, 36, 73–77. [Google Scholar] [CrossRef]
  8. Blaabjerg, F.; Teodorescu, R.; Liserre, M.; Timbus, A.V. Overview of Control and Grid Synchronization for Distributed Power Generation Systems. IEEE Trans. Ind. Electron. 2006, 53, 1398–1409. [Google Scholar] [CrossRef]
  9. Li, Z.; Cheng, Z.; Liang, J.; Si, J.; Dong, L.; Li, S. Distributed Event-Triggered Secondary Control for Economic Dispatch and Frequency Restoration Control of Droop-Controlled AC Microgrids. IEEE Trans. Sustain. Energy 2020, 11, 1938–1950. [Google Scholar] [CrossRef]
  10. Chen, X.; Qu, G.; Tang, Y.; Low, S.; Li, N. Reinforcement Learning for Selective Key Applications in Power Systems: Recent Advances and Future Challenges. IEEE Trans. Smart Grid 2022, 13, 2935–2958. [Google Scholar] [CrossRef]
  11. Guerrero, J.M.; Vasquez, J.C.; Matas, J.; de Vicuna, L.G.; Castilla, M. Hierarchical Control of Droop-Controlled AC and DC Microgrids—A General Approach Toward Standardization. IEEE Trans. Ind. Electron. 2011, 58, 158–172. [Google Scholar] [CrossRef]
  12. Das, D.C.; Roy, A.K.; Sinha, N. PSO based frequency controller for wind-solar-diesel hybrid energy generation/energy storage system. In Proceedings of the 2011 International Conference on Energy, Automation and Signal, Bhubaneswar, India, 28–30 December 2011. [Google Scholar]
  13. Singh, K.; Amir, M.; Ahmad, F.; Refaat, S.S. Enhancement of Frequency Control for Stand-Alone Multi-Microgrids. IEEE Access 2021, 9, 79128–79142. [Google Scholar] [CrossRef]
  14. Yi, Z.; Xu, Y.; Gu, W.; Fei, Z. Distributed Model Predictive Control Based Secondary Frequency Regulation for a Microgrid with Massive Distributed Resources. IEEE Trans. Sustain. Energy 2021, 12, 1078–1089. [Google Scholar] [CrossRef]
  15. Liu, B.; Wu, T.; Liu, Z.; Liu, J. A Small-AC-Signal Injection Based Decentralized Secondary Frequency Control for Droop-Controlled Islanded Microgrids. IEEE Trans. Power Electron. 2020, 35, 11634–11651. [Google Scholar] [CrossRef]
  16. Xu, Y.; Sun, H.; Gu, W.; Xu, Y.; Li, Z. Optimal Distributed Control for Secondary Frequency and Voltage Regulation in an Islanded Microgrid. IEEE Trans. Ind. Inform. 2019, 15, 225–235. [Google Scholar] [CrossRef]
  17. Du, Y.; Wu, D. Deep Reinforcement Learning From Demonstrations to Assist Service Restoration in Islanded Microgrids. IEEE Trans. Sustain. Energy 2022, 13, 1062–1072. [Google Scholar] [CrossRef]
  18. Trivedi, R.; Khadem, S. Implementation of Artificial Intelligence Techniques in Microgrid Control Environment: Current Progress and Future Scopes. Energy AI 2022, 8, 100147. [Google Scholar] [CrossRef]
  19. Liu, W.; Shen, J.; Zhang, S.; Li, N.; Zhu, Z.; Liang, L.; Wen, Z. Distributed Secondary Control Strategy Based on Q-learning and Pinning Control for Droop-Controlled Microgrids. J. Mod. Power Syst. Clean Energy 2022, 10, 1314–1325. [Google Scholar] [CrossRef]
  20. Adibi, M.; van der Woude, J. Secondary Frequency Control of Microgrids: An Online Reinforcement Learning Approach. IEEE Trans. Autom. Control 2022, 67, 4824–4831. [Google Scholar] [CrossRef]
  21. Yan, Z.; Xu, Y. Data-Driven Load Frequency Control for Stochastic Power Systems: A Deep Reinforcement Learning Method with Continuous Action Search. IEEE Trans. Power Syst. 2019, 34, 1653–1656. [Google Scholar] [CrossRef]
  22. Yan, Z.; Xu, Y. A Multi-Agent Deep Reinforcement Learning Method for Cooperative Load Frequency Control of a Multi-Area Power System. IEEE Trans. Power Syst. 2020, 35, 4599–4608. [Google Scholar] [CrossRef]
  23. Yan, R.; Wang, Y.; Xu, Y.; Dai, J. A Multiagent Quantum Deep Reinforcement Learning Method for Distributed Frequency Control of Islanded Microgrids. IEEE Trans. Control Netw. Syst. 2022, 9, 1622–1632. [Google Scholar] [CrossRef]
  24. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
  25. Zhu, Z.; Lin, K.; Jain, A.K.; Zhou, J. Transfer Learning in Deep Reinforcement Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 13344–13362. [Google Scholar] [CrossRef] [PubMed]
  26. Wang, W.; Wang, Z.; Zhou, Z.; Deng, H.; Zhao, W.; Wang, C.; Guo, Y. Anomaly Detection of Industrial Control Systems Based on Transfer Learning. Tsinghua Sci. Technol. 2021, 26, 821–832. [Google Scholar] [CrossRef]
  27. Xia, Y.; Xu, Y.; Mondal, S.; Gupta, A.K. A Transfer Learning-Based Method for Cyber-Attack Tolerance in Distributed Control of Microgrids. IEEE Trans. Smart Grid 2024, 15, 1258–1270. [Google Scholar] [CrossRef]
  28. Li, Y.; Wang, R.; Yang, Z. Optimal Scheduling of Isolated Microgrids Using Automated Reinforcement Learning-Based Multi-Period Forecasting. IEEE Trans. Sustain. Energy 2022, 13, 159–169. [Google Scholar] [CrossRef]
  29. Li, S.; Oshnoei, A.; Blaabjerg, F.; Anvari-Moghaddam, A. Hierarchical Control for Microgrids: A Survey on Classical and Machine Learning-Based Methods. Sustainability 2023, 15, 8952. [Google Scholar] [CrossRef]
  30. IEEE Std 1547-2018; IEEE Standard for Interconnection and Interoperability of Distributed Energy Resources with Associated Electric Power Systems Interfaces. IEEE: Piscataway, NJ, USA, 2018.
  31. Gao, H.; Jiang, S.; Li, Z.; Wang, R.; Liu, Y.; Liu, J. A Two-Stage Multi-Agent Deep Reinforcement Learning Method for Urban Distribution Network Reconfiguration Considering Switch Contribution. IEEE Trans. Power Syst. 2024, 39, 7064–7076. [Google Scholar] [CrossRef]
  32. Shang, Y.; Li, D.; Li, Y.; Li, S. Explainable Spatiotemporal Multi-Task Learning for Electric Vehicle Charging Demand Prediction. Appl. Energy 2025, 384, 125460. [Google Scholar] [CrossRef]
  33. Li, S.; Gao, X.; Blaabjerg, F.; Anvari-Moghaddam, A. A Distributed Two-Layer Frequency Compensation for Islanded Microgrids Based on Q-learning and PI Controllers. In Proceedings of the 8th IEEE Workshop on the Electronic Grid (eGRID), Karlsruhe, Germany, 16–18 October 2023; pp. 1–6. [Google Scholar]
Figure 1. Framework for online application of the proposed strategy.
Figure 2. Framework for offline training of agents used in Figure 1.
Figure 3. DQN training process diagram.
Figure 4. Principle of fine-tuning TL in offline training.
Figure 5. A typical 4-DG islanded microgrid.
Figure 6. Offline learning process flowchart.
Figure 7. Comparison of losses observed during training.
Figure 8. Frequency output of DG1 when Load 1 suddenly changes to 50 kW at t = 2 s.
Figure 9. Total system active power output in response to Load 1 change to 50 kW at t = 2 s.
Figure 10. Frequency output of DG1 under load disturbances. Load 1 changes to 60 kW and Load 2 to 44 kW at t = 2 s; both loads change to 40 kW at t = 4 s.
Figure 11. Total system active power output during load disturbances. Load 1 changes to 60 kW and Load 2 to 44 kW at t = 2 s; both loads change to 40 kW at t = 4 s.
Figure 12. A 13-bus system for testing the control structure.
Figure 13. Frequency response of DG1 in the 13-bus system. Load 1 changes to 12 kW at t = 6 s and to 4 kW at t = 12 s.
Figure 14. Total active power output of the 13-bus system in response to Load 1 variations at t = 6 s and t = 12 s.
Figure 15. Real-time test platform setup.
Figure 16. Frequency output of the 4-DG system using Q-learning control. At t = 0.82 s, Load 1 changes to 60 kW and Load 2 to 44 kW in the OPAL-RT real-time simulation.
Figure 17. Frequency output of the 4-DG system using the proposed control method. At t = 0.82 s, Load 1 changes to 60 kW and Load 2 to 44 kW in the OPAL-RT real-time simulation.
Figure 18. Active power output of the proposed method in OPAL-RT; at t = 0.82 s, Load 1 changes to 60 kW and Load 2 to 44 kW.
Table 1. Comparison of typical secondary control methods for microgrids.

| Method | Model | Accuracy | Training Type | Advantages | Disadvantages |
|---|---|---|---|---|---|
| PID | No | Low | Offline | Simple, easy to implement | Poor dynamic performance, overshoot |
| PSO-PID | No | Medium | Offline | Optimized tuning | Limited adaptability under dynamic conditions |
| FOPID-ICA | No | Medium | Offline | Improved robustness | High tuning complexity |
| MPC | Yes | High | Model-based | High accuracy under ideal model | Sensitive to modeling errors, complex |
| Q-learning | No | Low | Online | Plug-and-play, no model required | Slow learning, Q-table dependent |
| AC-RL | No | Medium | Online | Handles uncertainties | Limited accuracy under multi-DG settings |
| DDPG | No | High | Online | Accurate continuous control | High training cost per agent |
| Quantum DRL | No | High | Online | Enhanced accuracy via quantum design | Very high computation cost |
| Proposed | No | High | Offline | Accurate and efficient with TL | Offline training still required |
Table 2. Simulation parameters.

| Parameter | Value |
|---|---|
| Rated frequency | 50 Hz |
| Rated voltage | 311 V |
| Droop gain (P–ω) | 2 × 10⁻⁴ |
| Droop gain (Q–V) | 2 × 10⁻⁴ |
| Sampling time | 1 × 10⁻⁶ s |
| DC voltage | 700 V |
| Capacitance of LC filter | 150 μF |
| Inductance of LC filter | 3 mH |
| Line impedance 1 | 0.16 Ω, 0.1 mH |
| Line impedance 2 | 0.32 Ω, 0.1 mH |
| Line impedance 3 | 0.23 Ω, 0.1 mH |
| Line impedance 4 | 0.25 Ω, 0.23 mH |
| Line impedance 5 | 0.642 Ω, 0.2 mH |
Table 3. Parameters of the DNN used in DQN.

| Layer | Input Size | Output Size | Bias Quantity | Activation Function |
|---|---|---|---|---|
| Input | 300 | 300 | 0 | None |
| Hidden 1 | 300 | 32 | 32 | ReLU |
| Hidden 2 | 32 | 32 | 32 | ReLU |
| Output | 32 | 300 | 300 | ReLU |
Table 4. Transfer learning efficiency and accuracy comparison.

| DG_j | DQN | Fine-Tuning TL DQN | Time Saved Proportion | Accuracy | Accuracy (5% Noise) |
|---|---|---|---|---|---|
| DG2 | 20.34 s | 14.48 s | 28.81% | 99.21% | 98.5% |
| DG3 | 19.99 s | 14.21 s | 28.91% | 99.34% | 98.5% |
| DG4 | 22.34 s | 15.34 s | 31.33% | 99.52% | 98.6% |
Table 5. Comparison of frequency compensation results.

| Deviation | PID (Hz) | PID + Q-Learning (Hz) | Improve | Proposed Method (Hz) | Improve |
|---|---|---|---|---|---|
| DG1 | 0.62 | 0.41 | 33.87% | 0.11 | 82.26% |
| DG2 | 0.64 | 0.42 | 31.25% | 0.12 | 81.25% |
Table 6. Comparison of frequency compensation results.

| Deviation | PID (Hz) | PID + Q-Learning (Hz) | Improve | PID + SAC (Hz) | Improve | Proposed Method (Hz) | Improve |
|---|---|---|---|---|---|---|---|
| DG1 | 0.99 | 0.59 | 40.4% | 0.28 | 71.72% | 0.21 | 78.79% |
| DG2 | 0.48 | 0.29 | 39.58% | 0.16 | 66.67% | 0.09 | 81.25% |
| DG3 | 0.92 | 0.56 | 39.13% | 0.27 | 76.51% | 0.18 | 89.13% |
| DG4 | 0.49 | 0.26 | 46.94% | 0.21 | 57.14% | 0.08 | 83.67% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Li, S.; Blaabjerg, F.; Anvari-Moghaddam, A. A Transferable DRL-Based Intelligent Secondary Frequency Control for Islanded Microgrids. Electronics 2025, 14, 2826. https://doi.org/10.3390/electronics14142826
