1. Introduction
In order to further enhance competitiveness, the bilateral spot electricity market (EM) has been introduced in recent years to support the restructuring of the power industry in many countries [1]. Moreover, increasingly prominent global environmental and energy issues have led the governments of many countries to place a high value on developing renewable energy resources alongside the reform of the power industry [2,3]. These highly random, intermittent, and non-dispatchable renewable power resources make it more difficult to develop a proper EM modeling approach, which is a necessary tool for decision-making analysis, market simulation, policy design analysis, etc. [4,5,6].
In a bilateral spot EM with renewable power penetration, non-renewable power generation companies (NRGenCOs) and distribution companies (or retailers or large consumers; for convenience, we call all of them DisCOs) must bid in the stochastically fluctuating environment created by renewable power generation in order to improve their own profits. The independent system operator (ISO) must clear the market, i.e., decide the scheduled power of every NRGenCO, RPGenCO (renewable power generation company), and DisCO, as well as the marginal price of every node, under constraints of system balance, congestion, generation limits, etc., in order to maximize social welfare (SW). The aim of this paper is to apply the Gradient Descent Continuous Actor-Critic (GDCAC) algorithm to the problem of modeling a bilateral spot EM with renewable power penetration.
Generally speaking, EM modeling approaches can be divided into two categories: game-based models and agent-based models. Among game-based models, [7,8,9] established EM models based on supply function equilibrium (SFE) [7], multi-level parametric linear programming [8], and a static game model [9], respectively, to find the Nash Equilibrium (NE) points of EM bidding. Similar studies using game-based models can also be found in [10,11,12,13,14,15]. However, game-based EM models have the following shortcomings [2,16]: (1) the mathematical forms of some game-based EM modeling approaches are sets of nonlinear equations that are difficult to solve or have no solution; (2) with many participants bidding in the EM, some game-based approaches require repeatedly solving a multi-level mathematical programming model for every participant, a computational burden that limits their application to more realistic situations; and (3) participants in many game-based approaches need common knowledge about the other players' cost or revenue functions, which is hard to obtain in reality.
In order to overcome the deficiencies mentioned above and make EM modeling approaches more applicable in practice, agent-based EM modeling approaches have been proposed. In a spot EM, an agent refers to a market participant with adaptive learning ability (e.g., generation companies (GenCOs) in a unilateral EM; GenCOs and DisCOs in a bilateral EM). EM modeling approaches based on the concept of an agent are called agent-based approaches: each agent dynamically adjusts its bidding strategy through interaction with the market environment, according to its accumulated experience, in order to maximize profit. Common agent-based EM models include the Q-learning-based model proposed in [16], the simulated annealing Q-learning-based model proposed in [17], and the Roth–Erev reinforcement learning-based test bed (MASCEM: Multi-Agent Simulator of Competitive Electricity Markets) proposed in [18]. Similar studies on agent-based EM modeling approaches can also be found in [19,20,21,22,23]. It can be seen from [16,17,18,19,20,21,22,23] that: (1) most agent-based EM modeling approaches require neither sets of nonlinear equations nor the repeated solution of a multi-level mathematical programming model for every agent, so their computational complexity is significantly lower than that of game-based EM models; and (2) an agent needs no common knowledge about other agents' cost or revenue functions when adjusting its bidding strategy to improve profit. However, in [16,17,18,19,20,21,22,23], both the EM environment state space and the agents' action (bidding strategy) spaces are assumed to be discrete, which means an agent can hardly obtain the globally optimal bidding strategy that maximizes its profit [24]. In the study of Lau et al. [25], a modified Roth–Erev reinforcement learning algorithm was proposed to model GenCOs' strategic bidding behavior in continuous state and action spaces, and the superiority of that spot EM model over the simulated annealing Q-learning and variant Roth–Erev reinforcement learning EM models was demonstrated; however, the model in [25] considers neither renewable power penetration nor a bilateral bidding environment.
Recently, studies have taken renewable power penetration into account. Sharma et al. [26] and Vilim et al. [27] point out that RPGenCOs (such as wind and solar photovoltaic) often participate in the spot EM as "price takers", so the production level is their only bidding parameter. Kang et al. [28] hold that, with renewable power penetration, the strategic behaviors of other dispatchable EM participants (e.g., NRGenCOs) are significantly affected by these highly random, intermittent, and non-dispatchable power resources, which in turn changes the market clearing prices and scheduled power results. Dallinger et al. [29], combining an agent-based EM model with a stochastic model, studied the impact on market price of a load with demand-price elasticity but no strategic bidding ability, in a spot EM with renewable power penetration; this work remains within the scope of unilateral EM because the demands in [29] cannot be considered strategic agents. In the study of Miadreza et al. [30], a heuristic dynamic game-based EM model considering renewable power penetration was proposed to study the market power of NRGenCOs. Reeg et al. [31] used an agent-based approach to study the policy design problem of fostering the integration of renewable energy sources into the EM. Haring et al. [32] proposed a multi-agent Q-learning approach to study the effects of renewable power penetration and demand-side participation on the spot EM. Gabriel et al. [33] modified the MASCEM test bed to consider renewable power penetration. Abrell et al. [34] used a stochastic optimization model to study the effect of random renewable power output on the Nash Equilibrium (NE) in a unilateral hour-ahead EM. Zhao et al. [35] estimated the strategic behaviors of NRGenCOs in a unilateral hour-ahead EM with renewable power penetration using a stochastic optimization model. Zou et al. [36] compared the NEs obtained in a unilateral EM game under different proportions in the power structure. Similar studies considering renewable power penetration in EM modeling can also be found in [2,37,38,39]. However, the non-agent-based EM models mentioned above [30,34,35,36] more or less share the limitations of game-based EM models, and the agent-based models mentioned above [29,31,32,33,37,38,39] cannot resolve the contradiction between the reality of continuous state and action spaces in EM and the "curse of dimensionality".
Mohammad et al. [2] point out that in a spot EM with renewable penetration, when every agent (in [2], the NRGenCOs are considered as agents) bids in the EM to maximize its profit, the predicted power output of every RPGenCO, which is a continuous random variable, ought to be considered. Hence, in [2], the fuzzy Q-learning algorithm was applied to unilateral hour-ahead EM modeling, in which the EM state space is made continuous but the action set of every NRGenCO is still assumed to be a discrete, scalar one. Moreover, it was verified in [2] that the fuzzy Q-learning approach is more applicable to EM modeling than other agent-based approaches such as Q-learning, in terms of improving an agent's obtained profit, the overall SW, etc.
This paper addresses the problem of bilateral hour-ahead EM modeling considering renewable power penetration, adopting the Gradient Descent Continuous Actor-Critic (GDCAC) algorithm [40] instead of the fuzzy Q-learning approach applied in [2]. GDCAC is a modified reinforcement learning algorithm (proposed in [40]) that can solve Markov decision-making problems with continuous state and action spaces. Hence, in this paper we propose a GDCAC-based bilateral hour-ahead EM model considering renewable power penetration, with which the impact of renewable power output on hourly equilibrium results is examined. In addition, our proposed model is compared with that of [2] under the same conditions in the simulation section of this paper.
The rest of this paper is organized as follows: Section 2 explains the multi-agent bilateral hour-ahead EM model considering renewable power penetration. Sections 3 and 4 describe the detailed procedures for applying the GDCAC approach to EM modeling. Section 5 evaluates the performance of our proposed method and explores the impact of renewable power output on hourly equilibrium results, based on a case study. Section 6 concludes the paper.
2. Multi-Agent Hour-Ahead EM Modeling
In this paper, we consider a bilateral hour-ahead EM. For the sake of simplicity, our proposed EM rests on the following assumptions:
- (1) Every GenCO (NRGenCO or RPGenCO) has only one generation unit;
- (2) Similar to [2], the considered hour-ahead EM is a single-period EM; hence, each hour, every NRGenCO and DisCO sends its bid curve for the next hour to the ISO. The proposed single-period EM modeling approach can, however, be extended to a multi-period one such as a day-ahead EM;
- (3) Each hour, every RPGenCO submits to the ISO only its own predicted production for the next hour, at a bidding price of 0 ($/MWh), because of its low marginal cost and its role of "price taker" [2,33,35]; the only strategic players are the NRGenCOs [2] and the DisCOs. Therefore, each NRGenCO and DisCO can be considered an agent that adaptively adjusts its bidding strategy in order to maximize profit.
After receiving all agents' supply and demand bid curves and all RPGenCOs' predicted production submissions for each hour, the ISO performs congestion management and sends the market clearing results, including power schedules and prices, to all market participants (NRGenCOs, RPGenCOs, and DisCOs). The pricing mechanism in the market clearing model is the locational marginal price (LMP), which is widely used in most developed countries.
A flowchart describing how the considered bilateral hour-ahead EM works is shown in Figure 1.
For the next hour t, the supply bid curve submitted by NRGenCO i (i = 1, 2, …, Ng1) to the ISO in hour t−1 can be formulated as [13]:

$$ \lambda_{i,t}(P_{i,t}) = k_{i,t}\,(a_i P_{i,t} + b_i) \quad (1) $$

where $P_{i,t}$ and $k_{i,t}$ are the power production (MW) and the bidding strategy ratio of NRGenCO i for the next hour t, respectively. NRGenCO i can change its bid curve by adjusting its parameter $k_{i,t}$.
The marginal cost function of NRGenCO i is:

$$ MC_i(P_{i,t}) = a_i P_{i,t} + b_i \quad (2) $$

where $a_i$ and $b_i$ represent the slope and intercept parameters, respectively.
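To make the supply side concrete, the short sketch below evaluates an NRGenCO's bid price and its true generation cost, assuming (as in [13]-style models) that the bid price is the strategy ratio times the linear marginal cost. The function names and all numeric values are illustrative assumptions, not figures from this paper's case study:

```python
def supply_bid_price(p, k_i, a_i, b_i):
    """Bid price of NRGenCO i at output p (MW): strategy ratio times marginal cost."""
    return k_i * (a_i * p + b_i)

def generation_cost(p, a_i, b_i):
    """True total cost: the integral of the marginal cost a_i*p + b_i from 0 to p."""
    return 0.5 * a_i * p ** 2 + b_i * p

# With k_i = 1 the NRGenCO bids its true marginal cost; with k_i > 1 it bids above it.
truthful = supply_bid_price(100.0, 1.0, 0.1, 10.0)   # 20.0 ($/MWh)
strategic = supply_bid_price(100.0, 1.2, 0.1, 10.0)  # 24.0 ($/MWh)
cost = generation_cost(100.0, 0.1, 10.0)             # 1500.0 ($)
```

Adjusting $k_{i,t}$ thus rotates the bid curve around the origin of the price axis, which is the single degree of freedom the agent learns to control.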
For the next hour t, the demand bid curve submitted by DisCO j (j = 1, 2, …, Nd) to the ISO in hour t−1 can be formulated as [13]:

$$ \lambda_{j,t}(P_{j,t}) = k_{j,t}\,(-c_j P_{j,t} + d_j) \quad (3) $$

where $P_{j,t}$ and $k_{j,t}$ are the power demand (MW) and the bidding strategy ratio of DisCO j for the next hour t, respectively. DisCO j can change its bid curve by adjusting its parameter $k_{j,t}$.
The marginal revenue function of DisCO j is:

$$ MR_j(P_{j,t}) = -c_j P_{j,t} + d_j \quad (4) $$

where $c_j$ and $d_j$ represent the slope and intercept parameters, respectively.
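The demand side mirrors the supply side, and each agent's hourly reward is its profit at the cleared quantity and nodal price. The sketch below computes a DisCO's bid price and both agents' profits, assuming a linear marginal revenue curve scaled by the strategy ratio and taking total cost/revenue as the integrals of the marginal curves; all names and numbers are illustrative assumptions:

```python
def demand_bid_price(p, k_j, c_j, d_j):
    """Bid price of DisCO j at demand p (MW): strategy ratio times marginal revenue."""
    return k_j * (-c_j * p + d_j)

def nrgenco_profit(p, lmp, a_i, b_i):
    """Market revenue at the nodal price minus true generation cost."""
    return lmp * p - (0.5 * a_i * p ** 2 + b_i * p)

def disco_profit(p, lmp, c_j, d_j):
    """True revenue (integral of marginal revenue) minus purchase cost at the LMP."""
    return (-0.5 * c_j * p ** 2 + d_j * p) - lmp * p

# A DisCO bidding below its true marginal revenue (k_j < 1) tries to depress the LMP.
bid = demand_bid_price(100.0, 1.0, 0.08, 40.0)  # 32.0 ($/MWh)
g_profit = nrgenco_profit(100.0, 30.0, 0.1, 10.0)  # 1500.0 ($)
d_profit = disco_profit(120.0, 25.0, 0.08, 40.0)   # 1224.0 ($)
```

These profit functions are what each learning agent seeks to maximize by adjusting its single strategy parameter each hour.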
In order to generate the LMPs of all nodes as well as the corresponding supply and demand power schedules for the next hour t, the ISO must solve the following congestion management model [41]:

$$ \max\; SW = \sum_{j=1}^{N_d} \int_0^{P_{j,t}} k_{j,t}(-c_j x + d_j)\,dx \;-\; \sum_{i=1}^{N_{g1}} \int_0^{P_{i,t}} k_{i,t}(a_i x + b_i)\,dx \quad (5) $$

subject to:

$$ \sum_{i=1}^{N_{g1}} P_{i,t} + \sum_{v=1}^{N_{g2}} P_{v,t} - \sum_{j=1}^{N_d} P_{j,t} = 0 \quad (6) $$

$$ F_{l,t} = \sum_{n} GSF_{l,n}\, P_{n,t}^{\,inj} \quad (7) $$

$$ F_{l,t} \le F_l^{\max} \quad (8) $$

$$ F_{l,t} \ge -F_l^{\max} \quad (9) $$

where Ng1 is the number of NRGenCOs, Ng2 is the number of RPGenCOs, and Nd is the number of DisCOs; $F_{l,t}$ is the power flow in line l, $GSF_{l,n}$ is the generation shift factor of line l with respect to the net injection $P_{n,t}^{\,inj}$ at node n, and $F_l^{\max}$ is the capacity of line l. Equation (5) shows that the objective of the ISO is to maximize social welfare. Equation (6) represents the power balance constraint of the whole system; Equations (7)–(9) represent the power flow constraints on each transmission line l [41]. In this paper, the power production of RPGenCO v (v = 1, 2, …, Ng2) for hour t, represented as $P_{v,t}$, is assumed to be an exogenous stochastic parameter in our proposed congestion management model.
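To make the clearing step concrete, the sketch below clears a single-node market, i.e., it enforces only the power balance constraint and ignores the line-flow constraints of the full congestion management model. With linear bids, maximizing bid-in social welfare is equivalent to finding the uniform price at which bid-in supply (plus the zero-price renewable production) meets bid-in demand, which bisection finds directly. All parameter tuples and names are illustrative assumptions:

```python
def clear_single_node(supply, demand, renewable, lo=0.0, hi=200.0, tol=1e-6):
    """Bisect on the uniform clearing price (single node, no congestion).

    supply: list of (k_i, a_i, b_i, p_max) -- supply bid price k_i*(a_i*P + b_i)
    demand: list of (k_j, c_j, d_j, p_max) -- demand bid price k_j*(-c_j*P + d_j)
    renewable: total predicted renewable output (MW), offered at price 0.
    """
    def total_supply(price):
        s = renewable
        for k, a, b, pmax in supply:
            p = (price / k - b) / a        # invert the linear supply bid
            s += min(max(p, 0.0), pmax)    # respect capacity limits
        return s

    def total_demand(price):
        tot = 0.0
        for k, c, d, pmax in demand:
            p = (d - price / k) / c        # invert the linear demand bid
            tot += min(max(p, 0.0), pmax)
        return tot

    # Excess demand is decreasing in price, so bisection converges.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total_demand(mid) > total_supply(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# One supplier and one consumer, both bidding truthfully (k = 1):
price_no_rp = clear_single_node([(1.0, 0.1, 10.0, 1000.0)],
                                [(1.0, 0.1, 40.0, 1000.0)], renewable=0.0)
# Injecting 20 MW of zero-price renewable output lowers the clearing price:
price_with_rp = clear_single_node([(1.0, 0.1, 10.0, 1000.0)],
                                  [(1.0, 0.1, 40.0, 1000.0)], renewable=20.0)
```

In the full model the ISO instead solves the constrained welfare maximization of Equations (5)–(9), so LMPs can differ across nodes when a line limit binds; the sketch only illustrates how stochastic renewable output shifts the supply curve and hence the equilibrium price.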