Optimal Operation of Virtual Power Plants Based on Stackelberg Game Theory

Zhang, Weishi; He, Chuan; Wang, Haichao; Qian, Hanhan; Lin, Zhemin; Qi, Hui

doi:10.3390/en17153612

Open AccessArticle

Optimal Operation of Virtual Power Plants Based on Stackelberg Game Theory

by

Weishi Zhang

,

Chuan He

,

Haichao Wang

,

Hanhan Qian

,

Zhemin Lin

and

Hui Qi

^*

Anhui Power Exchange Center Co., Ltd., Hefei 230022, China

^*

Author to whom correspondence should be addressed.

Energies 2024, 17(15), 3612; https://doi.org/10.3390/en17153612

Submission received: 8 June 2024 / Revised: 14 July 2024 / Accepted: 17 July 2024 / Published: 23 July 2024

(This article belongs to the Section F: Electrical Engineering)

Download

Browse Figures

Versions Notes

Abstract

As the scale of units within virtual power plants (VPPs) continues to expand, establishing an effective operational game model for these internal units has become a pressing issue for enhancing management and operations. This paper integrates photovoltaic generation, wind power, energy storage, and constant-temperature responsive loads, and it also considers micro gas turbines as auxiliary units, collectively forming a typical VPP case study. An operational optimization model was developed for the VPP control center and the micro gas turbines, and the game relationship between them was analyzed. A Stackelberg game model between the VPP control center and the micro gas turbines was proposed. Lastly, an improved D3QN (Dueling Double Deep Q-network) algorithm was employed to compute the VPP’s optimal operational strategy based on Stackelberg game theory. The results demonstrate that the proposed model can balance the energy complementarity between the VPP control center and the micro gas turbines, thereby enhancing the overall economic efficiency of operations.

Keywords:

virtual power plant; Stackelberg game; deep reinforcement learning; operation optimization

1. Introduction

A virtual power plant (VPP) is an energy management system realized through advanced information and communication technologies. Reference [1] proposes integrating distributed generation, controllable loads, and storage devices for energy management and scheduling, thereby reducing the impact of the uncertainties associated with distributed generation on the stable operation of the power system. As the scale of distributed generation continues to expand, VPPs offer a broader space for improving energy utilization efficiency in the electricity market.

The control problem of peak shaving demand response (DR) participation in a virtual power plant involves the effective integration of the downward adjustment potential of a large number of distributed energy resources (DERs) to dynamically track predefined target load profiles. These target load profiles can be determined based on settlement results from electricity ancillary service markets [2], customized operational strategies [3], and discrepancies between DR baselines and actual response capacities [4]. This control problem can be classified into two approaches: open-loop and closed-loop methods. Conventional open-loop control methods are suited for centrally dispatching energy storage systems, gas turbines, and other DERs with strong control capabilities within VPP operations [5]. DERs with weaker control authority, such as air conditioning units, are more suitable for participating in peak shaving DR using adaptive closed-loop control methods that require rapid adaptation to changes. It should be noted that the effectiveness of closed-loop control methods may be adversely affected by uncertainties such as control delays and model-measurement errors [6].

Reference [7] proposes that many scholars both domestically and internationally have conducted extensive research on demand-side management inspired by game theory. For instance, reference [8] considers the degree of demand response while setting electricity prices at the VPP control center, ultimately establishing a Stackelberg game optimization model for the internal units of the VPP. Reference [9] proposes the concepts of “price-based quantity” and “quantity-based price” in the context of electric vehicle charging management, modeling the pursuit of maximum benefits by users and electricity retailers as a Stackelberg game. Reference [10] presents a real-time optimal control model for demand-side response based on the Stackelberg game, with electricity retailers as leaders and various loads as followers, obtaining optimal strategies for all participants. Reference [11] establishes a two-tier Stackelberg game model for market transactions involving electric vehicle operators and distributed generators. Reference [12] constructs a two-stage Stackelberg model considering the operational hierarchy of the electricity market, involving the upper-level market, electricity retailers, and users. However, these studies mainly focus on demand-side management without game modeling for energy-side units, which is crucial for reducing power wastage on the energy side.

Deep reinforcement learning (DRL) provides powerful algorithms for solving problems in power system research. Reference [13] proposes an optimization method for the operation of a photovoltaic-storage combined power plant’s energy storage system in the spot market environment based on DQN (Deep Q-network), considering deviation penalty costs, energy revenue, and frequency regulation ancillary service revenue, to maximize the economic benefits of the PV-storage combined power plant. Reference [14] introduces the DDPG (Deep Deterministic Policy Gradient) algorithm, utilizing the experience replay mechanism and proportionally regulating perception learning to effectively enhance algorithm stability, successfully applied to the dynamic energy scheduling of power systems. References [15,16] propose the application of both the DDQN (Double Deep Q-network) algorithm and the Dueling-DQN (Dueling Deep Q-network) algorithm to the optimal scheduling of power systems. Additionally, reference [17] uses the D3QN (Dueling Double Deep Q-network) algorithm for real-time energy management, but this algorithm may encounter strategy oscillation issues in continuous and complex decision-making environments, primarily because it relies on extensive exploration to maintain performance, potentially leading to instability in rapidly changing environments.

This paper designs a VPP containing energy storage, micro gas turbines, wind farms, photovoltaics, and temperature-controlled loads. Under the framework of setting electricity sales and purchase price strategies at the VPP control center with energy storage, considering the flexibility of micro gas turbines and the dispatchability of demand response resources, a VPP operation optimization model based on the Stackelberg game is established. An improved D3QN algorithm is proposed to adapt to the interactive supply and demand within the VPP, and the economic efficiency of the proposed model is verified through case studies.

The remaining sections of this work are organized as follows: Section 2 presents the modeling of sources, loads, and storage within the virtual power plant. Section 3 introduces the Stackelberg game model, focusing on the game-theoretic modeling between the virtual power plant and gas turbines. Section 4 employs deep reinforcement learning to solve the game model, specifically attempting to integrate the D3QN algorithm with the electricity market to address strategic interactions in the virtual power plant context. In Section 5, a case study is used to analyze and compare the algorithm, thereby validating its practical effectiveness. Finally, Section 6 summarizes the main conclusions of this paper and outlines potential future research challenges.

2. Framework of Virtual Power Plant with Energy Storage

The number of loads within a virtual power plant jurisdiction can be quite large, making it impractical for the dispatch center to manage each load individually. A more feasible approach to categorize the loads and model different types of loads separately is proposed in Reference [18]. To streamline optimization, the load user groups within the VPP are equivalently considered as non-dispatchable loads and temperature-responsive loads, with the latter divided into air conditioning loads and water heater loads. In addition to distributed generation and loads, the VPP also includes energy storage and gas turbines, as shown in Figure 1.

The operation of each entity within the VPP is as follows: The VPP control center predicts the output of distributed generation within the virtual power plant for the next day based on historical data and sets reasonable electricity purchase and sale prices in advance. It then selects the set temperatures for air conditioning and water heater loads based on weather information and real-time electricity prices. Finally, the micro gas turbine optimizes its output throughout the day based on the electricity prices provided by the VPP to maximize its own revenue.

This paper investigates the energy optimization and scheduling problem of a VPP. The VPP is connected to the main grid, allowing it to purchase electricity from the main grid in case of a power shortage and sell electricity to the main grid in case of a power surplus. This effectively addresses the supply–demand imbalance issues during the operation of the VPP system.

2.1. Distributed Generation Units

Distributed generation components are the core elements of a VPP. As the penetration rate of renewable energy generation within the VPP continues to increase, the uncertainty and variability of its power output present challenges for the economic operation of the VPP. The distributed generation data used in this paper are processed from real-time power generation data of typical photovoltaic and wind power plants in the East China region.

2.2. Micro Gas Turbine Power Generation

On the power supply side, there is a micro gas turbine primarily fueled by natural gas to provide electrical energy to the user side. The relationship between its fuel cost and the output electrical power of the unit in the

t - t h

time period within a day can be expressed as

E_{t}^{M T} = \frac{q_{g a s}}{Q_{g a s} η^{M T}} P_{t}^{M T}

(1)

P_{M T}^{\min} \leq P_{t}^{M T} \leq P_{M T}^{\max}

(2)

P_{M T}^{d o w n} \leq P_{t}^{M T} - P_{t - 1}^{M T} \leq P_{M T}^{u p}

(3)

In the equations,

E_{t}^{M T}

represents the generation cost of the micro gas turbine during time period

t

.

P_{t}^{M T}

represents the electricity generated by the micro gas turbine during time period

t

.

q_{g a s}

represents the unit price of natural gas.

Q_{g a s}

represents the lower heating value of natural gas.

η^{M T}

represents the efficiency of the gas turbine.

P_{M T}^{\min}

represents the minimum active power of the generator.

P_{M T}^{\max}

represents the minimum active power of the generator.

P_{t}^{M T}

and

P_{t - 1}^{M T}

represent the power output of the gas turbine at the current moment and the previous moment, respectively.

P_{M T}^{d o w n}

and

P_{M T}^{u p}

represent the upper and lower limits of the ramping power of the gas turbine.

2.3. Constant-Temperature Control Load

2.3.1. Air Conditioning Load Model

The power variation of the air conditioning system is generally related to the temperature difference between indoors and outdoors. To model the air conditioning system, including the influence of outdoor temperature on its performance, a variation model dependent on outdoor temperature can be introduced. This type of model more accurately reflects the operational efficiency of the air conditioning system under different environmental temperatures. A common approach is to represent

λ

as a function of outdoor temperature:

Q_{l o a d} = k \cdot (T_{i n} - T_{o u t})

(4)

P_{t}^{A C} (T_{o u t}) = \frac{Q_{l o a d}}{λ (T_{o u t})}

(5)

λ (T_{o u t}) = a + b \cdot T_{o u t} + c \cdot T_{o u t}^{2}

(6)

In these equations,

Q_{l o a d}

represents the heat or cold load caused by the temperature difference between indoors and outdoors. The calculation method may vary depending on the cooling or heating requirements.

k

is the heat transfer coefficient, representing the insulation performance of the building, with units of kW/°C.

T_{i n}

is the indoor temperature.

P_{t}^{A C} (T_{o u t})

represents the power consumption of the air conditioning load at time

t

.

λ (T_{o u t})

is the performance coefficient of the air conditioning system, which varies with outdoor temperature

T_{o u t}

.

a

,

b

, and

c

are model parameters obtained through fitting with actual data.

T_{o u t}

represents the outdoor temperature.

The air conditioning switch is set to

{S_{t}}^{A C} = \{\begin{matrix} 0 & T_{i n} > T_{s e t}^{+} \\ 1 & T_{i n} < T_{s e t}^{-} \\ S_{t - 1}^{A C} & o t h e r w i s e \end{matrix}

(7)

In this equation,

S_{t}

and

S_{t - 1}

represent the states of the air conditioning system at the current moment and the previous moment, respectively.

T_{s e t}^{+}

represents the upper limit of the set temperature, while

T_{s e t}^{-}

represents the lower limit of the set temperature. The power of the air conditioning can be obtained as

P_{t}^{A C} = S_{t} P_{t}^{A C} (T_{o u t})

(8)

An air conditioning energy consumption model has been obtained that takes into account the influence of outdoor temperature. This model can calculate the power consumption of the air conditioning system based on different outdoor temperatures, thereby assisting the virtual power plant in optimizing its dispatching strategy to adapt to various environmental conditions and operational efficiencies.

2.3.2. Water Heater Load Model

Electric water heaters are primarily used for bathing and kitchen water in daily life, and their internal load must accurately reflect the water usage habits at different times. Throughout the day, peak periods of hot water demand occur during morning and evening bathing peaks and meal preparation periods, during which the electricity consumption of the water heater noticeably increases.

Probability density functions for water heater usage are established for different time periods, and the overall probability density is calculated by integrating the probability density functions across different time periods. This probability density can describe the variation in the probability of water heater usage at different times throughout the day, as shown in Equation (9):

\begin{matrix} f_{i} (x) = \frac{1}{\sqrt{2 π} σ_{i}} \exp (- \frac{{(x - μ_{i})}^{2}}{2 {σ_{i}}^{2}}) & i = 1, 2, 3, 4, 5 \end{matrix}

(9)

\{\begin{array}{l} f_{t} (x) = \sum_{i = 1}^{5} ρ_{i} \cdot f_{i} (x) \\ ρ_{1} + ρ_{2} + ρ_{3} + ρ_{4} + ρ_{5} = 1 \end{array}

(10)

In these equations,

f_{i} (x)

i = 1, 2, 3, 4, 5

represents the probability density of water usage during different time periods, including morning bathing, breakfast, lunch, dinner, and evening bathing.

Similar to air conditioning switch settings, electric water heater switches can be configured as follows:

{S_{t}}^{W H} = \{\begin{matrix} 0 & T_{i n} > T_{s e t}^{+} \\ 1 & T_{i n} < T_{s e t}^{-} \\ S_{T - 1}^{W H} & o t h e r w i s e \end{matrix}

(11)

In this equation,

S_{t}

and

S_{t - 1}

represent the states of the water heater system at the current moment and the previous moment, respectively.

T_{s e t}^{+}

represents the upper limit of the set temperature, while

T_{s e t}^{-}

represents the lower limit of the set temperature.

2.4. Energy Storage Battery Model

The VPP control center directly controls the charging and discharging actions of the energy storage battery component. The charging and discharging constraints of the energy storage battery are as follows:

S_{t} = S_{t - 1} + η^{+} e_{t}^{+} - e_{t}^{-} / η^{-}, \forall t

(12)

\{\begin{matrix} 0 \leq e_{t}^{+} \leq u_{t} R_{m}^{+} & \forall t \\ 0 \leq e_{t}^{-} \leq (1 - u_{t}) R_{m}^{-} & \forall t \\ 0 \leq S_{t} \leq S_{\max} & \forall t \\ S_{1} = S_{T} \end{matrix}

(13)

In these equations,

S_{t}

represents the state of charge of the energy storage battery at time

t

.

e_{t}^{+}

and

e_{t}^{-}

represent the charging and discharging amounts of the energy storage device at time

t

.

u_{t}

is a Boolean variable indicating the state of the energy storage device at time

t

.

R_{m}^{+}

and

R_{m}^{-}

represent the charging power limits.

η^{+}

and

η^{-}

represent the efficiency of the energy storage charging and discharging equipment.

S_{\max}

represents the maximum capacity of the energy storage.

S_{1}

represents the state of charge of the battery at the beginning of the day.

S_{T}

represents the state of charge of the battery at the end of the day.

3. Real-Time Interactive Stackelberg Game Model

3.1. Stackelberg Game Framework

To study the game optimization operation of internal units within a VPP, this paper establishes a Stackelberg game model featuring the VPP control center and a micro gas turbine as the main entities, as shown in Figure 2.

The upper-level leader is the VPP control center, while the lower-level follower is the micro gas turbine. Both the VPP control center and the micro gas turbine aim to maximize their respective profits. The decision variable for the VPP control center is the electricity purchase price from the micro gas turbine, and the decision variable for the micro gas turbine is its own power generation output.

The upper-level leader, the VPP control center, is constrained by the status of temperature-responsive loads, the operation status of energy storage, and the electricity purchase price from loads. The lower-level follower, the micro gas turbine, is constrained by its maximum and minimum operational power, as well as its maximum ramp rate.

3.2. Objective Function Design

Economic dispatch aims to maximize expected returns. To achieve the economic optimal operation control of the VPP control center, this paper takes the maximum operating profit of the VPP control center as the objective function for the economic optimization scheduling of the VPP. The operational costs for the VPP control center mainly include the cost of purchasing electricity from the micro gas turbine, the cost of distributed generation, and the cost of purchasing electricity from the main grid. The revenue of the VPP primarily comes from the profits gained from meeting various power load demands and selling electricity to the main grid. Therefore, the objective function of the VPP control center is set as follows:

\begin{matrix} \max \sum_{t = 0}^{T} F_{v p p, t} = p_{l o a d} L_{l o a d} + p_{t c l} \sum_{i = 0}^{N_{t c l}} L_{t c l}^{i, t} + p_{t}^{s} E_{t}^{s} + \\ p_{s o c} (e_{t}^{+} + e_{t}^{-}) - p_{\cos t} G_{t} - p_{t}^{b} E_{t}^{b} \end{matrix}

(14)

In this equation,

F_{v p p, t}

represents the profit of the VPP control center at time

t

.

p_{l o a d}

is the electricity price for price-responsive loads.

L_{l o a d}

represents the output status of the daily load.

p_{t c l}

is the electricity price for constant-temperature control loads.

N_{t c l}

is the quantity of constant-temperature control loads.

p_{t}^{s}

is the selling price of electricity to the main grid.

p_{t}^{b}

is the purchasing price of electricity from the main grid.

E_{t}^{s}

is the amount of electricity sold to the main grid.

p_{s o c}

is the unit price for charging and discharging of the energy storage battery.

E_{t}^{b}

is the amount of electricity purchased from the main grid.

p_{\cos t}

is the generation cost of distributed generation components.

e_{t}^{+}

and

e_{t}^{-}

are the charging and discharging amounts of the energy storage device at time

t

.

G_{t}

is the power generation of distributed generation components.

The objective function of the micro gas turbine is as follows:

\max F_{t} = \sum_{t = 0}^{T} (p_{t}^{M T} P_{t}^{M T} - E_{t}^{M T})

(15)

In this equation,

F_{t}

represents the profit of the micro gas turbine during time period

t

.

E_{t}^{M T}

represents the generation cost of the micro gas turbine during time period

t

.

P_{t}^{M T}

represents the electricity generation of the micro gas turbine during time period

t

.

p_{t}^{M T}

represents the purchasing price offered by the virtual power plant to the micro gas turbine during time period

t

.

4. Model Solution Based on Deep Reinforcement Learning

4.1. Problem Description

4.1.1. State Space

The state space

s_{t}

contained in this paper includes the following: information regarding the relative state of charge of constant-temperature control load components; information about the power generation of distributed generation components; information about the power generation of the gas turbine unit; information about the overall power consumption of price-responsive load components; information about the state of charge of energy storage components; information about the charging and discharging prices of energy storage components; information about the prices of price-responsive load components; time information; the electricity price for purchasing from the main grid; the electricity price for selling to the main grid.

4.1.2. Action Space

The action space

a_{t}

in this paper consists of four parts: In Section 2.2, action space

a_{c}

for gas turbine generation can be obtained; in Section 2.3, action space

a_{t c l}

for constant-temperature control load can be obtained; action space

a_{e}

for system power shortage and action space

a_{d}

for system power surplus can be obtained by interacting with the main grid. This paper defines the action space for temperature-controlled loads to include eight possible actions, which specify the adjustment of temperatures for air conditioning and water heater loads

a_{t} = {(a_{t c l}, a_{c}, a_{d}, a_{e})}_{t}

. The gas turbine has four actions: base load mode, peak shaving mode, standby mode, and start/stop operations. The VPP control center has two actions for power shortage and two for power surplus, which specify the priority between battery charging/discharging and purchasing/selling electricity from/to the main grid. The action space in the virtual power plant comprises 128 possible potential actions.

4.1.3. Reward Function

Reinforcement learning aims to maximize expected returns. To achieve the economic optimization control of the VPP control center, this paper sets the highest operating profit of the VPP control center as the objective function for VPP economic optimization scheduling. It is considered that the operating costs of the VPP mainly include electricity trading transmission costs, wind power generation costs, and purchasing costs from the main grid, and the revenue of the VPP mainly includes profits from meeting various electricity load demands and profits from selling electricity to the main grid. Therefore, the reward function of the VPP control center in this paper is derived from Equation (14) as follows:

\begin{matrix} r_{t} = p_{l o a d} L_{l o a d} + p_{t c l} \sum_{i = 0}^{N_{t c l}} L_{t c l}^{i, t} + p_{t}^{s} E_{t}^{s} + \\ p_{s o c} (e_{t}^{+} + e_{t}^{-}) - p_{\cos t} G_{t} - p_{t}^{b} E_{t}^{b} \end{matrix}

(16)

The reward function for the micro gas turbine is derived from Equation (15) as follows:

r_{t} = p_{t}^{M T} P_{t}^{M T} - E_{t}^{M T}

(17)

4.2. Designing the Improved D3QN Algorithm

On the basis of the game model of the virtual power plant with energy storage mentioned above, this paper proposes an improved D3QN game algorithm to assist agents in finding pricing or quantification strategies that maximize their own rewards. During the learning process, the intelligent agent interacts with the environment in the improved D3QN game algorithm, selecting the best actions to achieve the optimization control objectives of the VPP system. The Q-value function can be defined as follows [17]:

\begin{array}{l} Q (s_{t}, a_{t}; θ, θ_{v}, θ_{A}) = V (s_{t}; θ, θ_{v}) \\ + (A (s_{t}, a_{t}; θ, θ_{A}) - \frac{1}{|A|} A (s_{t}, a_{t + 1}; θ, θ_{A})) \end{array}

(18)

In this equation,

Q (•)

is the output function of the Q-network algorithm.

V (•)

is the state value function that characterizes the goodness or badness of the state.

A (•)

is the state-action advantage function.

θ

represents the shared network parameters.

θ_{v}

and

θ_{A}

are the neural network parameters unique to the state value function and the action advantage function, respectively.

a_{t + 1}

is the action at time

t + 1

.

1 / |A|

is the average value of the action advantage function [17].

\begin{array}{l} y_{t} = R_{t + 1} + γ Q (s_{t + 1}, \\ \arg \max_{a + 1} Q (s_{t + 1}, a_{t + 1}; θ, θ_{v}, θ_{A}); θ^{'}, {θ^{'}}_{v}, {θ^{'}}_{A}) \end{array}

(19)

In this equation,

r_{t + 1}

is the reward obtained by the agent at time

t + 1

.

s_{t + 1}

is the state set at time

t + 1

.

γ

is the discount factor.

θ

,

θ_{v}

, and

θ_{A}

are parameters of the evaluation network, and the parameters

θ

,

θ_{v}

, and

θ_{A}

are continuously updated.

θ^{'}

,

{θ_{v}}^{'}

, and

θ_{A}

are parameters of the target network, and they are replaced with the latest evaluation network parameters every N time steps.

To enhance the exploration ability of the improved D3QN reinforcement learning agent and prevent the algorithm from converging to local optima, the concept of a noise network is proposed in Reference [19]. Noise is added to the fully connected layers of the neural network, allowing parameters to be learned in a noisy environment, thus effectively enhancing network generalization and improving algorithm robustness. Considering the forward calculation formula of the next fully connected layer as

y = ω ∙ x + b

, the network parameters

ω

in

u^{ω} + δ^{ω} ∙ ξ^{ω}

and the network parameters

b

in

u^{b} + δ^{b} ∙ ξ^{b}

are transformed, where the network parameters

u

and

δ

represent the mean and standard deviation, respectively.

ξ

represents random noise, and

ξ \sim (0, 1)

. By adding noise to the neural network parameters, the exploration capability of the reinforcement learning algorithm is effectively enhanced.

In the traditional D3QN algorithm, the optimal action is selected through strategy

ε - g r e e d y

. The agent selects the action with the highest Q-value with probability

ε

and a random action with probability

1 - ε

. As the number of episodes increases, the value does not increase significantly. Considering that the electricity selling price to the grid

p_{t}^{s}

is relatively low, this paper combines strategy

ε - g r e e d y

with the power surplus action

a_{d}

to balance action exploration. When

a_{d}

is true, the value of

ε

is reduced to increase the chances of favorable exploration. Table 1 presents the pseudocode for the improved D3QN algorithm. This table outlines the key steps and modifications implemented in the algorithm Figure 3 shows the structure of the D3QN algorithm with a noisy network.

Based on the above network structure and real-time electricity price incentive mechanism, the training rules for the VPP energy management method using the improved D3QN algorithm are as follows.

5. Example Analysis

5.1. Scenario Description

This paper employs the Python language and utilizes the TensorFlow tool to design a virtual power plant as depicted in Figure 1, to validate the feasibility and effectiveness of the proposed virtual power plant internal unit game method based on the improved D3QN algorithm.

The distributed generation data and meteorological information required for the experiments were processed from real-time power generation data of typical photovoltaic and wind power plants in the East China region. The experimental sample data were collected once per hour, covering a year of characteristic information for the components within the virtual power plant. To maintain generality, the power generation data of distributed energy components were averaged over the year, as shown in Figure 4. Figure 5 shows the price curve of the virtual power plant for selling electricity to residents.

The loads include non-adjustable loads and temperature-controlled loads, with the latter divided into air conditioning loads and water heater loads. Figure 6 shows the power load of air conditioning for one day at different set temperatures. The outdoor temperature was processed using real-time summer temperatures from a specific location, with experimental sample data collected every 5 min. Table 2 presents the probability density parameters for water heater usage. Table 3 lists other parameters in the virtual power plant system environment.

5.2. Algorithm Performance Analysis

The algorithm was run on a computer with an Intel(R) Core(TM) i5-7300HQ CPU @ 2.50 GHz and 8 GB RAM. It was implemented using Python 3.10 and TensorFlow 2.11.0 as the deep learning framework.

In the algorithm, rewards were calculated every hour and accumulated to obtain the total daily reward. The training set for the VPP control center consisted of 2000 episodes, and 10 experiments were conducted under different algorithms to obtain the average reward value. The convergence of the VPP control center during iterations is shown in Figure 7. The algorithm exhibited significant fluctuations in the first 30 episodes, primarily due to the randomness in strategy selection during this stage. Subsequently, while gradually optimizing its strategy, the VPP control center and the gas turbine engaged in a non-cooperative Stackelberg game, which dominated the reward fluctuations during this period.

As observed from the figure, the convergence range of the Dueling DQN algorithm is between 510 and 530, the DDQN algorithm converges between 530 and 550, and the improved D3QN algorithm converges between 530 and 590. The convergence curves indicate that the improved D3QN algorithm exhibits better network optimization effects and has larger fluctuation amplitudes, making it less likely to fall into local optima.

To further verify the impact of real-time electricity price incentives on energy scheduling within a virtual power plant, Figure 8 presents a comparison of the VPP control center’s profits over seven days under three different algorithms.

Under the DDQN algorithm, the total profit of the virtual power plant control over seven days is CNY 350,453.7. When using the Dueling DQN algorithm, the total profit is CNY 384,555.2. With the improved D3QN algorithm, the total profit over seven days for the VPP control center is CNY 391,069.6, representing an increase of 11.6% compared to the DDQN algorithm and 1.7% compared to the Dueling DQN algorithm.

5.3. Analyzing the Results of Reinforcement Learning Optimization Scheduling

The improved D3QN algorithm was used for training the VPP control center and gas turbine agents. Using historical data, the agents were trained offline, with Figure 9 showing the price changes over 5000 iterations. The colormap used ranges from blue to yellow, where blue represents lower price values and yellow represents higher price values. This visual aid allows for an easier comparison of how prices fluctuate over time and across different simulations. Figure 10 presents the scheduling results obtained after convergence.

From Figure 4, Figure 5, Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10, it can be seen that in the first half of the day, the wind farm generates a significant amount of electricity. During the 00:00–03:00 time period, when the electricity selling price to the main grid is low, the energy storage battery prioritizes charging. During the 12:00–18:00 period, when the electricity selling price is higher, the energy storage battery discharges in a timely manner. Additionally, the VPP purchases any shortfall from the main grid to meet the internal power load demands of the virtual power plant.

During periods when distributed generation in the virtual power plant management system is high, sufficient electricity is allocated to temperature-controlled loads to maintain a comfortable temperature for the rest of the day without needing additional power. Using the improved D3QN algorithm for virtual power plant energy scheduling can avoid purchasing large amounts of electricity during high-demand and low-generation periods. This approach achieves peak shaving and valley filling, enhancing the overall daytime revenue of the VPP control center. Consequently, it maximizes the economic dispatch benefits of the internal game units within the virtual power plant.

6. Conclusions

This paper successfully proposes an improved D3QN algorithm for optimizing the economic dispatch within a virtual power plant. First, a comprehensive VPP economic dispatch game model was established, encompassing various energy resources such as energy storage, micro gas turbines, wind power, and photovoltaics. By integrating a greedy selection mechanism and a noisy network into the D3QN algorithm, this study not only enhances the model’s capability to handle complex and dynamic market environments but also improves the real-time responsiveness and robustness of the decision-making process.

Simulation experiments based on real data demonstrated that the improved D3QN algorithm significantly outperforms traditional DDQN and Dueling DQN algorithms. It shows better adaptability in handling the challenges of continuous decision-making environments, effectively reducing operational costs while maximizing the satisfaction rate of power load demands and the profitability of power transactions with the main grid. Compared to the DDQN and Dueling DQN algorithms, the improved D3QN algorithm exhibited profit increases ranging from 11.6% to 1.7% across multiple cycles of experiments, thereby confirming its economic advantages.

Furthermore, the primary objective of this study is to investigate the optimal scheduling problem within a virtual power plant and to construct a model for the game between the virtual power plant and its internal units. Against this backdrop, the research contribution lies in the development of an internal game model for the virtual power plant and the application of an enhanced D3QN algorithm integrated with real-time electricity prices to solve the model. Ultimately, the case study demonstrates the economic feasibility of the model and the effectiveness of the algorithm.

The study also reveals the potential of demand-side management achieved through intelligent scheduling strategies, further validating the application value of deep reinforcement learning technologies in the field of energy system optimization. Future work will explore the integration of more real-time datasets and algorithm adjustments to further enhance the operational efficiency and responsiveness of the VPP.

Author Contributions

Conceptualization, W.Z.; Methodology, W.Z.; Validation, C.H. and H.W.; Formal analysis, H.W.; Writing—original draft, W.Z.; Writing—review & editing, C.H., H.W., H.Q. (Hanhan Qian) and H.Q. (Hui Qi); Visualization, H.Q. (Hui Qi); Supervision, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the PetroChina Research Project “Research on the Design and Methods of Virtual Power Plant Participation in the Electricity Market Mechanism” (Project No. B6120922000L).

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Conflicts of Interest

All authors were employed by Anhui Power Exchange Center Co., Ltd. They declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

Yuan, T.; Hou, X.; Lian, J. Two-layer two-stage scheduling optimization model for distributed energy system in active distribution network. Sci. Technol. Eng. 2023, 23, 1978–1983. [Google Scholar]
Guan, S.; Wang, X.; Jiang, C.; Qian, G.; Zhang, M.; Lü, R.; Guo, M. Classification and Aggregation of Controllable Loads Based on Different Responses and Optimal Bidding Strategy of VPP in Ancillary Market. Power Syst. Technol. 2022, 46, 933–944. [Google Scholar]
Liu, S.; Ai, Q.; Zheng, J.; Wu, R. Bi-level Coordination Mechanism and Operation Strategy of Multi-time Scale Multiple Virtual Power Plants. Proc. CSEE 2018, 38, 753–761. [Google Scholar]
Development and Reform Commission of Zhejiang Province. Notice on Carrying out Power Demand Response in 2021[EB/OL]. [2021-06-01]. Available online: https://fzggw.zj.gov.cn/art/2021/6/8/art_1229629046_4906648.html (accessed on 19 April 2024).
Gough, M.; Santos, S.F.; Lotfi, M.; Javadi, M.S.; Osório, G.J.; Ashraf, P.; Castro, R. Operation of a Technical Virtual Power Plant Considering Diverse Distributed Energy Resources. IEEE Trans. Ind. Appl. 2022, 58, 2547–2558. [Google Scholar] [CrossRef]
Duan, C.; Bharati, G.; Chakraborty, P.; Chen, B.; Nishikawa, T.; Motter, A.E. Practical Challenges in Real-Time Demand Response. IEEE Trans. Smart Grid 2021, 12, 4573–4576. [Google Scholar] [CrossRef]
Liang, Z.; Alsafasfeh, Q.; Jin, T.; Pourbabak, H.; Su, W. Risk-constrained optimal energy management for virtual power plants considering correlated demand response. IEEE Trans. Smart Grid 2019, 10, 1577–1587. [Google Scholar] [CrossRef]
Huang, W.; Su, Z.; Liang, X.; Chen, T.; Wang, L.; Zhou, L. Optimal operation strategy of virtual power plant considering adjustable market and external demand response. Electr. Power 2023, 56, 156–163. [Google Scholar]
Wei, W.; Chen, Y.; Liu, F.; Mei, S.; Tian, F.; Zhang, X. Agent pricing strategy and electric vehicle charging management in intelligent community based on master-slave game. Power Grid Technol. 2015, 39, 939–945. [Google Scholar]
Yu, M.; Hong, S.H. A real-time demand-response algorithm for smart grids: A Stackelberg game approach. IEEE Trans. SmartGrid 2016, 7, 879–888. [Google Scholar] [CrossRef]
Salyani, P.; Abapour, M.; Zare, K. Stackelberg based optimal planning of DGs and electric vehicle parking lot by implementing demand response program. Sustain. Cities Soc. 2019, 51, 101743. [Google Scholar] [CrossRef]
Yu, M.; Hong, S.H. Incentive-based demand response considering hierarchical electricity market: A Stackelberg game approach. Appl. Energy 2017, 203, 267–279. [Google Scholar] [CrossRef]
Liu, J.; Chen, J.; Wang, X.; Zeng, J.; Huang, Q. Research on Energy Management and Optimization Strategy of micro-energy networks based on Deep Reinforcement Learning. Power Grid Technol. 2020, 44, 3794–3803. [Google Scholar]
Gong, K.; Wang, X.; Deng, H.; Jiang, C.; Ma, J.; Fang, L. Optimal operation method of energy storage system of optical storage combined power station based on deep reinforcement learning in spot market environment. Power Grid Technol. 2012, 46, 3365–3377. [Google Scholar]
Yang, Q.L.; Wang, G.; Sadeghi, A.; Giannakis, G.B.; Sun, J. Two-time scale voltage control in distribution grids using deep reinforcement learning. IEEE Trans. Smart Grid 2020, 11, 2313–2323. [Google Scholar] [CrossRef]
Wang, B.; Li, Y.; Ming, W.Y.; Wang, S. Deep reinforcement learning method for demand response management of interruptible load. IEEE Trans. Smart Grid 2020, 11, 3146–3155. [Google Scholar] [CrossRef]
Li, X.; Zhao, C.; Cheng, Z.; Xu, H. Energy coordination optimization of microgrid based on MD3QN algorithm. J. Anhui Univ. (Nat. Sci. Ed.) 2019, 48, 46–52. [Google Scholar]
Wang, S.; Feng, T.; Sun, N.; He, K.; Li, J.; Yang, C.; Cui, H. Virtual power plant optimal scheduling strategy considering electric-gas-thermal coupling and demand response. Electr. Power 2019, 57, 101–114. [Google Scholar]
Peng, X.; Xu, H.; Jiang, L.; Rao, N.; Song, B. A deep reinforcement learning communication jamming resource allocation algorithm fused with noise network. J. Electron. Inf. Technol. 2022, 45, 1043–1054. [Google Scholar]

Figure 1. VPP frame diagram.

Figure 2. Master–slave game frame.

Figure 3. D3QN algorithm structure diagram.

Figure 4. Average electricity generation of renewable energy.

Figure 5. Price Curve of the Virtual Power Plant for Selling Electricity to Residents.

Figure 6. Air conditioning load power at different set temperatures. (a) Temperature variation indoor and outdoor with air conditioning set to 26 °C; (b) Power curve of the air conditioning with setting at 26 °C; (c) Temperature variation indoor and outdoor with air conditioning set to 24 °C; (d) Power curve of the air conditioning with setting at 24 °C; (e) Temperature variation indoor and outdoor with air conditioning set to 22 °C; (f) Power curve of the air conditioning with setting at 22 °C.

Figure 7. Change curve of reward value under different algorithms.

Figure 8. VPP 7-day profit comparison chart.

Figure 9. Iterative convergence diagram of VPP to purchase price of gas turbine.

Figure 10. Virtual power plant internal scheduling results. (a) Gas turbine operation plan; (b) Battery operation plan; (c) Power grid purchase and sale plan; (d) Optimal power purchase price.

Table 1. Improved D3QN game algorithm.

input: episodes,

γ

,

θ

,

θ_{v}

,

θ_{A}

,

θ^{'}

,

{θ_{v}}^{'}

,

{θ_{A}}^{'}

.

exportation: action

E_{t}^{+}

,

E_{t}^{-}

,

e_{t}^{+}

,

e_{t}^{-}

,

λ_{i t}

,

q_{i t}

1. initialize

Q (•)

,

y_{t}

2. for every episode in episodes

3. for step

t = 1

to

T

do

4. The leader takes action

a_{t}^{v p p}

based on

s_{t}

;

5. The follower takes action

a_{t}

according to

s_{t}

and

a_{t}^{v p p}

;

6. Observe

s_{t + 1}

;

7. Calculate

R_{t}

by (15), (16);

8. Renewal Q-value by (17), (18);

9.

s_{t} \leftarrow s_{t + 1}

10.

t = t + 1

11. end for

12. Every N time steps

θ

,

θ_{v}

,

θ_{A}

\to

θ^{'}

,

{θ_{v}}^{'}

,

{θ_{A}}^{'}

13. return action

a_{t}

14. end for

Table 2. Probability density function parameter.

Parameter	Value	Parameter	Value	Parameter	Value
$μ_{1}$	7	$σ_{1}$	0.5	$ρ_{1}$	0.3
$μ_{2}$	7.5	$σ_{2}$	0.5	$ρ_{2}$	0.1
$μ_{3}$	12.5	$σ_{3}$	0.8	$ρ_{3}$	0.15
$μ_{4}$	18	$σ_{4}$	0.9	$ρ_{4}$	0.15
$μ_{5}$	20	$σ_{5}$	1.2	$ρ_{5}$	0.3

Table 3. Environmental parameters of virtual power plant system.

Parameter	Value
Coefficient of charging efficiency $η_{c}$	0.9
Discharge efficiency coefficient $η_{d}$	0.9
Maximum storage capacity of battery $B_{\max}$	500 kW
Maximum charging power of a battery $c_{\max}$	500 kW
Maximum discharge power of a battery $D_{\max}$	800 kW
The number of air conditioning loads $N_{l o a d s}$	2000
Water heater load quantity $N_{t c l s}$	2000
Distributed energy generation cost value $p_{\cos t}$	69.8 CNY/MWh
Cost of purchasing electricity from the grid $p_{t}^{b}$	650 CNY/MWh

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, W.; He, C.; Wang, H.; Qian, H.; Lin, Z.; Qi, H. Optimal Operation of Virtual Power Plants Based on Stackelberg Game Theory. Energies 2024, 17, 3612. https://doi.org/10.3390/en17153612

AMA Style

Zhang W, He C, Wang H, Qian H, Lin Z, Qi H. Optimal Operation of Virtual Power Plants Based on Stackelberg Game Theory. Energies. 2024; 17(15):3612. https://doi.org/10.3390/en17153612

Chicago/Turabian Style

Zhang, Weishi, Chuan He, Haichao Wang, Hanhan Qian, Zhemin Lin, and Hui Qi. 2024. "Optimal Operation of Virtual Power Plants Based on Stackelberg Game Theory" Energies 17, no. 15: 3612. https://doi.org/10.3390/en17153612

APA Style

Zhang, W., He, C., Wang, H., Qian, H., Lin, Z., & Qi, H. (2024). Optimal Operation of Virtual Power Plants Based on Stackelberg Game Theory. Energies, 17(15), 3612. https://doi.org/10.3390/en17153612

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Optimal Operation of Virtual Power Plants Based on Stackelberg Game Theory

Abstract

1. Introduction

2. Framework of Virtual Power Plant with Energy Storage

2.1. Distributed Generation Units

2.2. Micro Gas Turbine Power Generation

2.3. Constant-Temperature Control Load

2.3.1. Air Conditioning Load Model

2.3.2. Water Heater Load Model

2.4. Energy Storage Battery Model

3. Real-Time Interactive Stackelberg Game Model

3.1. Stackelberg Game Framework

3.2. Objective Function Design

4. Model Solution Based on Deep Reinforcement Learning

4.1. Problem Description

4.1.1. State Space

4.1.2. Action Space

4.1.3. Reward Function

4.2. Designing the Improved D3QN Algorithm

5. Example Analysis

5.1. Scenario Description

5.2. Algorithm Performance Analysis

5.3. Analyzing the Results of Reinforcement Learning Optimization Scheduling

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI