Day-Ahead Market Modeling for Strategic Wind Power Producers under Robust Market Clearing

Zhao, Huiru; Wang, Yuwei; Zhao, Mingrui; Tan, Qingkun; Guo, Sen

doi:10.3390/en10070924

School of Economics and Management, North China Electric Power University, Beijing 102206, China

* Author to whom correspondence should be addressed.

Energies 2017, 10(7), 924; https://doi.org/10.3390/en10070924

Submission received: 21 March 2017 / Revised: 10 June 2017 / Accepted: 28 June 2017 / Published: 4 July 2017

Abstract

In this paper, considering real time wind power uncertainties, the strategic behaviors of wind power producers adopting two different bidding modes in the day-ahead electricity market are modeled and experimentally compared. In the first bidding mode, a producer submits only a wind power output plan; in the second, it submits a bidding curve consisting of a bidding price and a power output. On the one hand, to significantly improve wind power accommodation, a robust market clearing model is employed for the day-ahead market clearing implemented by an independent system operator. On the other hand, since the Least Squares Continuous Actor-Critic algorithm has been demonstrated to be an effective method for Markov decision-making problems with continuous state and action sets, we propose Least Squares Continuous Actor-Critic-based approaches to model and simulate the dynamic bidding interaction of multiple wind power producers adopting each of the two bidding modes in the day-ahead electricity market under robust market clearing conditions. Simulations are implemented on the IEEE 30-bus test system with five strategic wind power producers, and they verify the rationality of the proposed approaches. Moreover, the quantitative analysis and comparisons conducted in our simulations yield some suggestions for guiding wind power producers to bid reasonably in the market and to select a bidding mode.

Keywords:

day-ahead electricity market; wind power producer; wind power uncertainty; robust market clearing; Least Squares Continuous Actor-Critic algorithm

1. Introduction

The day-ahead electricity market (EM) is a crucial component of the EM system [1]. In recent years, wind power resources have experienced unprecedented growth in day-ahead EMs worldwide, and the literature on wind power bidding in day-ahead EMs with wind power penetration is extensive. References [2,3,4], citing reasons such as the low marginal cost of a wind power producer (WPP), hold that the bidding mode (BM) of a WPP is to send the independent system operator (ISO) only its power output plan for each period of the next day (namely, BM 1). The ISO ensures wind power accommodation according to every WPP’s day-ahead power output plan, but a WPP is financially penalized when its real time power output deviates from its day-ahead bid [3]. References [5,6,7,8,9], based on actual EMs such as PJM (Pennsylvania-New Jersey-Maryland), point out that the BM of a WPP is to provide the ISO with a bidding curve for each period of the next day (namely, BM 2). A bidding curve consists of a bidding price and a power output. According to the day-ahead bidding curves provided by the WPPs, the ISO can dispatch the power outputs of the WPPs in the day-ahead EM within a certain range of forecasted power outputs for each WPP. However, a WPP is again financially penalized when its real time power output deviates from its day-ahead schedule [9]. In this work, we argue that the different BMs adopted by WPPs may lead to different market results, such as profits, clearing prices and the operation cost of the power system. Hence, one motivation of this work is to experimentally compare the two BMs adopted by WPPs in the day-ahead EM.

In addition, owing to the inherent intermittence, fluctuation, and low predictability of wind power output, uncertainties grow significantly with the increasing penetration of wind power resources, which poses major challenges to wind power accommodation in the EM [10]. Therefore, many references propose that wind power accommodation can be improved by modifying the market clearing model (MCM) used by the ISO in the day-ahead EM. References [9,11,12,13] applied the stochastic optimization (SO) method in the market clearing procedure. The SO-MCM significantly increases the number of constraints in the MCM by generating real time wind power output stochastic scenarios (WPOSSs) based on real time wind power output probability distributions. The market clearing results (the scheduled power and the clearing price at every node) are obtained by optimizing the expected value of the objective function in the SO-MCM [9]. This approach takes into account the security constraints of the power system under different WPOSSs and improves, to a certain extent, the wind power accommodation capacity of the power system. However, the SO-MCM has the following shortcomings, which greatly reduce its feasibility and rationality [14,15]: (1) in practice, the probability distribution of real time wind power output is difficult to obtain; (2) a small number of real time WPOSSs may reduce the ability of the power system to withstand random deviations of the real time wind power output from its day-ahead (bidding or scheduled) value; (3) a large number of WPOSSs may significantly increase the computational complexity of the model, making it difficult to solve. To overcome these shortcomings, many recent studies have applied robust optimization (RO) methods to the construction of power system dispatch models. Reference [16] proposed a two-stage robust security-constrained unit-commitment (SCUC) model. The key idea of this two-stage robust SCUC model is to determine, in the first stage, the optimal unit-commitment (UC) solution that leads to the least operation cost for the worst wind power output scenario (WPOS) in the second stage. However, this approach is very conservative because it optimizes the operation cost of the worst WPOS in the second stage. The authors of [17] combined the stochastic and robust approaches using a weight factor in the objective function to address the conservativeness issue. Reference [18] employed the Affine Policy (AP) to formulate and solve the robust security-constrained economic dispatch (SCED) model. Reference [19] proposed a robust optimization framework for robust SCUC and robust SCED which repeatedly calculates the UC and ED solutions in the first stage so as to optimize the operation cost of the basic WPOS while passing the security test in the second stage. RO-based power system dispatch models do not need the probability distribution of real time wind power output, and the number of constraints need not increase significantly with the size of the uncertainty set [19]. The optimal robust UC and ED solutions can satisfy every unit-wise and system-wise constraint under the worst WPOS [19], which means that RO-based power system dispatch models can both improve wind power accommodation and maintain low computational complexity, which promotes their application in practice. However, the approaches in [16,17,18,19] cannot be used directly to modify the MCM in the day-ahead EM because those studies do not address how to price the power outputs, loads, reserves and deviations (uncertainties). Recently, Ye et al. [20] addressed this shortcoming by combining the cost causation principle and the locational marginal price (LMP) in robust SCUC and robust SCED modeling approaches, thereby modifying the MCM in the day-ahead EM by using an RO method. Therefore, inspired by [20], in this work, no matter which BM the WPPs adopt in day-ahead EM bidding, the MCM used by the ISO is modified by using an RO method so that the power system can accommodate any deviation caused by real time wind power uncertainties within a certain range.

Finally, a WPP participating in EM bidding aims at profit maximization. In a day-ahead EM, there are many participants competing with each other. In addition to real time wind power uncertainties, WPPs, like conventional generation companies (GenCOs), face complex market environment conditions, such as imperfect and incomplete information. Hence, there are many similarities between EM modeling approaches with and without WPPs participating in market bidding. The EM modeling approaches proposed in [21,22,23,24,25] are based on game theory, and those proposed in [26,27,28,29,30,31] are based on machine learning algorithms. Recently, many relevant studies have taken renewable energy (e.g., wind power) bidding into account. Reference [32] supposed that a WPP strategically bids with BM 2 in the day-ahead EM, and put forward a closed-form analysis of the WPP’s strategic behavior based on the Stackelberg game model. Reference [33] proposed an autoregressive integrated moving average (ARIMA) model to obtain the optimal bidding strategy for a WPP that bids with BM 2 in the day-ahead EM. The authors of [34] studied the behaviors of strategic WPPs bidding in the EM with BM 1 based on the Cournot game model. Reference [35] proposed an imbalance cost minimization bidding strategy for a WPP adopting BM 1 by forecasting the real time wind power probability distribution functions. Reference [36] analyzed the strategic behavior of a WPP adopting BM 1 in the day-ahead EM based on the Roth-Erev reinforcement learning algorithm. In [37], a stochastic programming problem was proposed for obtaining the optimal offering and operating strategy for a large wind-storage system adopting BM 2. Reference [38] considered the uncertainty of the electricity price through a set of exogenous scenarios and solved the bidding problem of a thermal-wind power producer adopting BM 1 by using a stochastic mixed-integer linear programming approach. The authors of [39] proposed a two-stage stochastic bidding model based on kernel density estimation (KDE) for a WPP adopting BM 2 to obtain the optimal day-ahead bidding strategy. The approaches in [24,25,33,35,37,38,39] require repeatedly solving multi-level mathematical programming models for every participant, and the resulting computational complexity limits their application in more realistic situations. The methods proposed in [21,22,23,32,34] produce sets of nonlinear equations which are difficult to solve or have no solutions. The approaches in [26,27,28,29,30,31,36] belong to the agent-based EM modeling approaches, in which every bidding participant is considered as an agent that is capable of adaptive learning and thereby improves its profit over the course of repeated bidding in the market. Table-based reinforcement learning (TBRL) algorithms are usually used to depict agents’ adaptive learning in EM bidding, such as the Q-learning-based approach proposed in [26,27], the simulated annealing Q-learning-based approach proposed in [28], the Roth-Erev reinforcement learning-based EM test bed (called Multi-Agent Simulator of Competitive Electricity Markets, or MASCEM) proposed in [29,36], the SARSA (state-action-reward-state-action)-based approach proposed in [30], and the fuzzy Q-learning-based approach proposed in [31]. With agent-based EM modeling approaches, it is necessary neither to repeatedly solve multi-level mathematical programming models for every agent nor to establish sets of nonlinear equations which are difficult to solve or have no solutions. Low computational complexity and low reliance on common knowledge make these approaches more applicable in EM modeling [26]. However, in TBRL algorithms, both an agent’s action set (i.e., its bidding strategies) and its state set (i.e., the market environment) must be assumed to be discrete in order to avoid the “curse of dimensionality”. This assumption does not conform to the actual situation of the day-ahead EM and hinders a strategic WPP from obtaining its globally optimal bidding strategy, no matter which BM it adopts. So far as we know, there is no reasonable way to solve this issue in the published literature studying wind power and other renewable energy bidding in EMs. Recently, Chen [40] proposed a modified reinforcement learning (RL) algorithm called the Least Squares Continuous Actor-Critic (LSCAC) algorithm, which allows both the action and state sets to be continuous without causing the “curse of dimensionality”. Therefore, another motivation of this work is to apply, for the first time, the LSCAC algorithm to model the strategic bidding behaviors of WPPs in day-ahead EMs. On the one hand, this approach resolves the conflict between making every agent’s action and state sets continuous and avoiding the “curse of dimensionality”. On the other hand, it provides a reasonable EM test bed to simulate and experimentally compare the two BMs adopted by WPPs in a day-ahead EM.

Therefore, the main novelty of this paper is that it proposes, for the first time, an LSCAC-based day-ahead EM modeling approach for strategic WPPs under robust market clearing conditions. The purpose of employing the robust MCM is to improve wind power accommodation by reconstructing the market clearing mechanism. The motivation for proposing the LSCAC-based EM modeling approach is to help strategic WPPs make more appropriate bidding decisions, so as to improve both the WPPs’ profits and the economic efficiency of the whole market compared with TBRL-based approaches. Moreover, the comparison between the different BMs can offer some suggestions for improving wind power resource development and market economic efficiency.

The rest of this paper is organized as follows: in Section 2, the mathematical formulations of the WPPs’ different BMs and the robust day-ahead MCM are presented. Section 3 puts forward the proposed LSCAC-based day-ahead EM modeling approach for WPPs. Section 4 conducts the simulations and comparisons. Section 5 concludes the paper.

2. Problem Description

2.1. Model Assumptions

According to [41], the day-ahead EM is actually a dynamic complex system in which dynamic (direct and indirect) interactions exist among all participants. When considering WPPs’ strategic behaviors in the day-ahead EM, on one hand, the MCM of the ISO should be modified in order to accommodate the deviations caused by real time wind power output uncertainties [10,19,20], while on the other hand, EM modeling approaches should be proposed to help with obtaining WPPs’ reasonable strategies under different BMs.

In this section, strategic WPPs’ different BMs and the robust day-ahead MCM are mathematically formulated. For the sake of simplicity and without loss of generality, we make the following assumptions:

Like [9], in our study, the problem of SCUC is assumed to have been solved exogenously in advance, and consequently, the UC constraints (e.g., ramping rates, startup costs/times, minimum down-times) are not considered. However, the proposed single period EM modeling approach containing the single period robust MCM can be extended to a multi-period one. Moreover, network loss is ignored, and the shift factor matrix is constant.
Because we mainly consider the strategic behaviors of WPPs in a day-ahead EM, the bidding strategies of conventional generators are neglected [9,32,33]. We also assume that the load at every bus is inelastic, without load shedding [9].
Uncertainties are caused only by WPPs, and the uncertainty set can be accurately formulated by the ISO [10,19,20]. The marginal cost of every WPP is neglected [3,31]. Hence, when a WPP is in BM 2, there is only one bidding price in this WPP’s bidding curve.

2.2. Different BMs of WPP

Real time wind power output cannot be accurately predicted from the day-ahead horizon, which forces the WPPs and the ISO to carefully consider these strong uncertainties when bidding and when implementing the day-ahead market clearing, respectively. However, the ISO (or a WPP) can predict the interval within which the real time wind power output of a certain WPP will lie more accurately than it can predict the output itself. If the number of WPPs in a power system is N_W, then, consistent with [10,19,20], the uncertainty set of those N_W WPPs’ real time power outputs can be modeled as:

\begin{array}{l} U = \{ P w = (P w_{1}, P w_{2}, ..., P w_{i}, ..., P w_{N_{W}}) : \\ l w_{i} \leq P w_{i} \leq u w_{i}, \forall i, \sum_{i = 1}^{N_{W}} \frac{| 2 P w_{i} - (u w_{i} + l w_{i}) |}{(u w_{i} - l w_{i})} \leq Λ \} \end{array}

(1)

where $i$ is the index for WPP, $P w_{i}$ represents the actual power output of the i-th WPP, $l w_{i}$ and $u w_{i}$ ( $0 \leq l w_{i} \leq u w_{i}$ ) represent the lower and upper bounds of $P w_{i}$ , and $Λ$ is the budget parameter, assumed to be an integer [20]. Moreover, accurate prediction of $l w_{i}$ and $u w_{i}$ can provide a valuable reference for WPP_i’s bidding decision-making. Therefore:

When BM 1 is adopted by WPP_i, the only bidding parameter is its power output plan (bidding power output) $P w_{i}^{b}$ which must satisfy Equation (2):

$l w_{i} \leq P w_{i}^{b} \leq u w_{i}, \forall i$

(2)

Under this BM, the bidding strategy of WPP_i is to adjust the value of $P w_{i}^{b}$ .
When BM 2 is adopted by WPP_i, it provides to ISO the bidding curve which is as follows:

$ρ (P w_{i}^{b}) = ρ_{i}^{b}, l w_{i} \leq P w_{i}^{b} \leq u w_{i}, ρ_{l o w, i} \leq ρ_{i}^{b} \leq ρ_{u p p, i}$

(3)

where, $ρ_{i}^{b}$ is WPP_i’s bidding price. $ρ_{l o w, i}$ , $ρ_{u p p, i}$ ( $0 \leq ρ_{l o w, i} \leq ρ_{u p p, i}$ ) represent the lower and upper limits of $ρ_{i}^{b}$ . Under this BM, the bidding strategy of WPP_i is to adjust the value of $ρ_{i}^{b}$ .
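
The uncertainty set in Equation (1) and the bid limits in Equations (2) and (3) can be checked numerically. The following is a minimal sketch in Python, not the authors' implementation; the function names `in_uncertainty_set`, `valid_bm1_bid` and `valid_bm2_bid`, as well as the tolerance `tol`, are hypothetical and introduced only for illustration.

```python
from typing import Sequence


def in_uncertainty_set(pw: Sequence[float], lw: Sequence[float],
                       uw: Sequence[float], budget: float,
                       tol: float = 1e-9) -> bool:
    """Return True if the realized outputs pw lie in the set U of Equation (1)."""
    used = 0.0
    for p, lo, hi in zip(pw, lw, uw):
        # Box constraint: lw_i <= Pw_i <= uw_i.
        if p < lo - tol or p > hi + tol:
            return False
        # Normalized distance from the interval midpoint (0 = midpoint, 1 = bound).
        # A degenerate interval (hi == lo) is assumed to contribute nothing.
        if hi > lo:
            used += abs(2.0 * p - (hi + lo)) / (hi - lo)
    # Budget constraint: the summed normalized distances must not exceed Lambda.
    return used <= budget + tol


def valid_bm1_bid(pw_bid: float, lo: float, hi: float) -> bool:
    """Equation (2): the BM 1 output plan must lie within [lw_i, uw_i]."""
    return lo <= pw_bid <= hi


def valid_bm2_bid(pw_bid: float, price: float, lo: float, hi: float,
                  price_lo: float, price_hi: float) -> bool:
    """Equation (3): output within [lw_i, uw_i], price within its limits."""
    return lo <= pw_bid <= hi and price_lo <= price <= price_hi
```

With two WPPs, bounds `lw = [10, 20]`, `uw = [30, 40]` and a budget of 1, the point `[30, 30]` is inside the set (one WPP at its bound, the other at its midpoint), while `[30, 40]` is not, because both WPPs sit at their upper bounds and the summed normalized distance is 2.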

2.3. Robust Market Clearing Models under Different BMs of WPPs

According to Section 2.1, because the problem of SCUC is assumed to have been solved exogenously in advance, we propose a single period day-ahead robust MCM which is mainly focused on the ED procedure. The purpose of doing so is to make the power system accommodate any wind power deviation caused by real time wind power output uncertainty within a certain uncertainty set. With this robust MCM, the ISO, based on the day-ahead bids (curves) of all participants, seeks the optimal robust ED solution in the base-case scenario [19]. Under the optimal robust ED solution, the ISO can re-dispatch flexible resources, such as adjustable conventional generators with fast ramping capabilities, to follow the load when a deviation occurs. Obtaining an optimal robust ED solution in the base-case scenario is significantly less conservative than doing so in the worst-case scenario [16]. Moreover, this robust MCM can reasonably generate prices for power outputs, loads, reserves and deviations, which are byproducts of the optimal robust ED solution [10,20].

The mathematical formulation of this robust MCM can be described as follows:

(Problem) \min_{P_{j}, P w_{i}^{d i s p}, \forall j, i} \sum_{j = 1}^{N_{G}} c_{j} P_{j} + \sum_{i = 1}^{N_{W}} ρ_{i}^{b} P w_{i}^{d i s p}

(4)

s . t . \sum_{j = 1}^{N_{G}} P_{j} + \sum_{i = 1}^{N_{W}} P w_{i}^{d i s p} = \sum_{m = 1}^{N_{b u s}} d_{m}

(5)

P_{j} \leq P_{j}^{\max}, \forall j

(6)

- P_{j} \leq - P_{j}^{\min}, \forall j

(7)

\sum_{m = 1}^{N_{b u s}} Γ_{l, m} \sum_{i, j \in Θ (m)} (P_{j} + P w_{i}^{d i s p} - d_{m}) \leq F_{l}, \forall l

(8)

- \sum_{m = 1}^{N_{b u s}} Γ_{l, m} \sum_{i, j \in Θ (m)} (P_{j} + P w_{i}^{d i s p} - d_{m}) \leq F_{l}, \forall l

(9)

and the robust feasibility set:

Ω = \{ P_{G W} = (P_{1}, P_{2}, ..., P_{j}, ..., P_{N_{G}}, P w_{1}^{d i s p}, P w_{2}^{d i s p}, ..., P w_{i}^{d i s p}, ..., P w_{N_{W}}^{d i s p}) : \forall P w \in U, \exists Δ P = (Δ P_{1}, Δ P_{2}, ..., Δ P_{j}, ..., Δ P_{N_{G}}) \text{ such that Equations (10)–(16) hold} \}

with:

\sum_{j = 1}^{N_{G}} Δ P_{j} + \sum_{i = 1}^{N_{W}} (P w_{i} - P w_{i}^{d i s p}) = 0

(10)

P_{j} + Δ P_{j} \leq P_{j}^{\max}, \forall j

(11)

- P_{j} - Δ P_{j} \leq - P_{j}^{\min}, \forall j

(12)

Δ P_{j} \leq r_{j}^{u}, \forall j

(13)

- Δ P_{j} \leq r_{j}^{d}, \forall j

(14)

\sum_{m = 1}^{N_{b u s}} Γ_{l, m} \sum_{i, j \in Θ (m)} (P_{j} + Δ P_{j} + P w_{i} - d_{m}) \leq F_{l}, \forall l

(15)

- \sum_{m = 1}^{N_{b u s}} Γ_{l, m} \sum_{i, j \in Θ (m)} (P_{j} + Δ P_{j} + P w_{i} - d_{m}) \leq F_{l}, \forall l

(16)

where j and N_G represent the index and number of conventional generators that have been committed (started up) in advance, respectively. Because the UC solution is assumed to be fixed exogenously, N_G can also be regarded, for simplicity, as the number of conventional generators in the power system. $P_{j}$ is the dispatched power output of the j-th generator, $c_{j}$ is the cost coefficient of the j-th generator, and m and N_bus represent the index and number of buses in the power system. In the base-case scenario, Equation (5) indicates the power balance of the system, Equations (6) and (7) give the power limits of the generators, and Equations (8) and (9) stand for the transmission constraints of all lines in the system. In case of wind power deviations, Equation (10) indicates the power balance of the system in re-dispatch, Equations (11) and (12) give the power limits of the generators in re-dispatch, Equations (13) and (14) are the constraints on the power re-dispatch variables $Δ P_{j}$ , and Equations (15) and (16) stand for the transmission constraints of all lines in the system in re-dispatch.

If BM 1 is adopted by WPPs, then:

$ρ_{i}^{b} = 0, \forall i$

(17)

$P w_{i}^{d i s p} = P w_{i}^{b}, \forall i$

(18)

Equations (4)–(9), (17) and (18), together with $Ω$ , constitute the day-ahead robust MCM for the ISO under BM 1 (namely, RMCM 1).
If BM 2 is adopted by WPPs, ISO dispatches WPPs’ day-ahead power output schedules which should at least satisfy [9]:

$P w_{i}^{d i s p} \leq u w_{i}, \forall i$

(19)

$- P w_{i}^{d i s p} \leq - l w_{i}, \forall i$

(20)

Equations (4)–(9), (19) and (20), together with $Ω$ , constitute the day-ahead robust MCM for the ISO under BM 2 (namely, RMCM 2).
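
To illustrate what membership in $Ω$ requires, the sketch below checks the re-dispatch conditions in Equations (10)–(14) for one realized wind scenario. It is a simplified illustration under a stated assumption: the line-flow limits in Equations (15) and (16) are ignored (a single-bus view), so that feasibility reduces to comparing the required total re-dispatch with the summed per-generator headroom. The function name `redispatch_feasible` and its argument names are hypothetical.

```python
from typing import Sequence


def redispatch_feasible(p: Sequence[float], p_min: Sequence[float],
                        p_max: Sequence[float], r_up: Sequence[float],
                        r_down: Sequence[float], pw_disp: Sequence[float],
                        pw_real: Sequence[float], tol: float = 1e-9) -> bool:
    """Check Equations (10)-(14) for one scenario, ignoring line limits (15)-(16)."""
    # Equation (10): total re-dispatch must offset the total wind deviation.
    required = -sum(real - disp for real, disp in zip(pw_real, pw_disp))
    low_total = 0.0
    high_total = 0.0
    for pj, lo, hi, ru, rd in zip(p, p_min, p_max, r_up, r_down):
        low_j = max(-rd, lo - pj)   # Equations (12) and (14): downward room
        high_j = min(ru, hi - pj)   # Equations (11) and (13): upward room
        if low_j > high_j + tol:
            return False            # base dispatch already violates a limit
        low_total += low_j
        high_total += high_j
    # Any total between the summed bounds can be split across the generators.
    return low_total - tol <= required <= high_total + tol
```

For example, with two generators dispatched at 50 and 30, limits [10, 60] and [10, 40], and reserves of 5 and 10 in each direction, a wind shortfall of 10 can be covered, whereas a shortfall of 20 exceeds the 15 units of upward room. A complete check that includes the network constraints would require solving a linear program.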

2.4. Robust MCM Reformulation

By solving RMCM 1 or RMCM 2, the obtained optimal robust ED solution $P_{G W} \in Ω$ is immunized against any uncertainty $P w \in U$ [19]. When an uncertainty $P w$ occurs, the deviations it causes can be accommodated by the power re-dispatch $Δ P$ . However, it should be noted that neither RMCM 1 nor RMCM 2 can be solved directly. Similar to [19,20], reformulation is adopted to solve the two RMCMs. To facilitate the description, the reformulation of Equations (4)–(9) together with $Ω$ , which consists of a master problem (MP) and a sub-problem (SP), is first established as follows:

(MP) \min_{P_{j}, P w_{i}^{d i s p}, \forall j, i} \sum_{j = 1}^{N_{G}} c_{j} P_{j} + \sum_{i = 1}^{N_{W}} ρ_{i}^{b} P w_{i}^{d i s p}

(21)

s . t . Equations (5)–(9), and

\sum_{j = 1}^{N_{G}} Δ P_{j k} + \sum_{i = 1}^{N_{W}} (P w_{i k} - P w_{i}^{d i s p}) = 0, \forall k \in κ

(22)

P_{j} + Δ P_{j k} \leq P_{j}^{\max}, \forall j, \forall k \in κ

(23)

- P_{j} - Δ P_{j k} \leq - P_{j}^{\min}, \forall j, \forall k \in κ

(24)

Δ P_{j k} \leq r_{j}^{u}, \forall j, \forall k \in κ

(25)

- Δ P_{j k} \leq r_{j}^{d}, \forall j, \forall k \in κ

(26)

\sum_{m = 1}^{N_{b u s}} Γ_{l, m} \sum_{i, j \in Θ (m)} (P_{j} + Δ P_{j k} + P w_{i k} - d_{m}) \leq F_{l}, \forall l, \forall k \in κ

(27)

- \sum_{m = 1}^{N_{b u s}} Γ_{l, m} \sum_{i, j \in Θ (m)} (P_{j} + Δ P_{j k} + P w_{i k} - d_{m}) \leq F_{l}, \forall l, \forall k \in κ

(28)

and:

(SP) Ψ = \max_{P w \in U} \min_{s^{+}, s^{-}, Δ P} \sum_{i = 1}^{N_{W}} (s_{i}^{+} + s_{i}^{-})

(29)

s . t . \sum_{j = 1}^{N_{G}} Δ P_{j} + \sum_{i = 1}^{N_{W}} (P w_{i} - P w_{i}^{d i s p} + s_{i}^{+} - s_{i}^{-}) = 0

(30)

- s_{i}^{+} \leq 0, \forall i

(31)

- s_{i}^{-} \leq 0, \forall i

(32)

\sum_{m = 1}^{N_{b u s}} Γ_{l, m} \sum_{i, j \in Θ (m)} (P_{j} + Δ P_{j} + P w_{i}^{d i s p} + P w_{i} - P w_{i}^{d i s p} + s_{i}^{+} - s_{i}^{-} - d_{m}) \leq F_{l}, \forall l

(33)

- \sum_{m = 1}^{N_{b u s}} Γ_{l, m} \sum_{i, j \in Θ (m)} (P_{j} + Δ P_{j} + P w_{i}^{d i s p} + P w_{i} - P w_{i}^{d i s p} + s_{i}^{+} - s_{i}^{-} - d_{m}) \leq F_{l}, \forall l

(34)

together with Equations (11)–(14), where $κ$ is the index set for the worst uncertainty points $P w_{k}$ , which are dynamically generated in (SP) during the solution procedure.

If BM 1 is adopted by WPPs, Equations (17) and (18) should be added to (MP). If BM 2 is adopted by WPPs, Equations (19) and (20) should be added to (MP). According to References [19,20], the objective function in (SP) is the sum of the non-negative slack variables $s_{i}^{+}$ and $s_{i}^{-}$ , which evaluates the violation associated with the solution from (MP). The slack variables $s_{i}^{+}$ and $s_{i}^{-}$ can be interpreted as un-followed uncertainties (e.g., generation shedding) due to system limitations. Hence, solving (SP) amounts to finding the worst point $P w_{k}$ in U for the given ED solution. The solution procedure is [19]:

(1): $κ \leftarrow \emptyset, k \leftarrow 1, Ψ \leftarrow + \infty,$ define feasibility tolerance $Δ$ ;
(2): while $Ψ \geq Δ$ do
(3): Solve (MP), obtain optimal $P_{j}, P w_{i}^{d i s p}, \forall j, i$ ;
(4): Solve (SP) with $P_{j}, P w_{i}^{d i s p}, \forall j, i$ , get solution $(Ψ, P w_{k})$ ;
(5): $κ \leftarrow κ \cup \{ k \}, k \leftarrow k + 1$ ;
(6): end while.
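
The control flow of steps (1)–(6) can be sketched in Python as follows. This is only an outline of the loop, not a solver: `solve_mp` and `solve_sp` are hypothetical callables that stand in for an optimization routine for (MP) and (SP), and the `max_iter` safeguard is an addition that is not part of the listed procedure.

```python
from typing import Any, Callable, List, Tuple


def robust_clearing_loop(
    solve_mp: Callable[[List[Any]], Any],
    solve_sp: Callable[[Any], Tuple[float, Any]],
    tol: float = 1e-6,
    max_iter: int = 50,
) -> Tuple[Any, List[Any]]:
    """Alternate between (MP) and (SP) until the worst-case violation is below tol."""
    worst_points: List[Any] = []      # step (1): kappa starts empty
    violation = float("inf")          # step (1): Psi starts at +infinity
    dispatch = None
    iterations = 0
    while violation >= tol:           # step (2)
        if iterations >= max_iter:
            raise RuntimeError("no robust dispatch found within max_iter")
        # Step (3): solve (MP) with the scenario constraints collected so far.
        dispatch = solve_mp(worst_points)
        # Step (4): find the worst point in U and its violation for this dispatch.
        violation, pw_k = solve_sp(dispatch)
        # Step (5): record the new worst point so that (MP) must cover it.
        worst_points.append(pw_k)
        iterations += 1
    return dispatch, worst_points
```

In a toy setting where (MP) simply holds enough reserve for the largest deviation seen so far and (SP) always reports a worst deviation of 7, the loop stops after two passes: the first (MP) solution holds no reserve and is found to be violated, and the second holds 7 and passes the check.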

2.5. Clearing Price Mechanism

After the convergence of the abovementioned solution procedure, the optimal robust ED solution can be obtained by solving (MP) for the last time. We set $λ$ , $β_{j 1}$ ( $\forall j$ ), $β_{j 2}$ ( $\forall j$ ), $η_{l 1}$ ( $\forall l$ ), $η_{l 2}$ ( $\forall l$ ) to represent the generalized Lagrange multipliers (GLMs) for Equations (5)–(9), respectively, $μ_{i 1}$ ( $\forall i$ ), $μ_{i 2}$ ( $\forall i$ ) to represent the GLMs for Equations (19) and (20), respectively, and $λ_{k}$ , $β_{j 1 k}$ , $β_{j 2 k}$ , $ζ_{j 1 k}$ , $ζ_{j 2 k}$ , $η_{l 1 k}$ , $η_{l 2 k}$ ( $\forall j$ , $\forall l$ , $\forall k \in κ$ ) to represent the GLMs for Equations (22)–(28), respectively. Consistent with the cost causation principle and the LMP calculation method mentioned in [10,20], when (MP) is solved for the last time, the clearing price mechanism of our proposed RMCMs can be described as follows:

No matter which BM WPPs adopt, the day-ahead LMP for energy credit and load payment at bus m can be calculated as:

$\begin{array}{l} π_{m} = \frac{\partial L (P_{G W}, Δ P, λ, β, ζ, η)}{\partial d_{m}} = \frac{\partial L (P_{G W}, Δ P, λ, β, μ, ζ, η)}{\partial d_{m}} \\ = λ - \sum_{l} Γ_{l, m} (η_{l 1} - η_{l 2}) - \sum_{l} \sum_{k \in κ} Γ_{l, m} (η_{l 1 k} - η_{l 2 k}) \end{array}$

(35)

where, $L (P_{G W}, Δ P, λ, β, ζ, η)$ is the generalized Lagrange function (GLF) for (MP) when BM 1 is adopted by WPPs, and $L (P_{G W}, Δ P, λ, β, μ, ζ, η)$ is the GLF for (MP) when BM 2 is adopted by WPPs.
The uncertainty marginal price (UMP) is defined as the marginal cost of immunizing the next unit increment of uncertainty [20]. No matter which BM WPPs adopt, for the deviation $P w_{k} - P w^{d i s p}$ corresponding to a worst point $P w_{k}$ , the UMP for reserve credit and deviation payment at bus m is:

$\begin{array}{l} π_{m k} = \frac{\partial L (P_{G W}, Δ P, λ, β, ζ, η)}{\partial ε_{m k}} = \frac{\partial L (P_{G W}, Δ P, λ, β, μ, ζ, η)}{\partial ε_{m k}} \\ = λ_{k} - \sum_{l} Γ_{l, m} (η_{l 1 k} - η_{l 2 k}) \end{array}$

(36)

where, $ε_{m k} = \sum_{i \in Θ (m)} (P w_{i}^{d i s p} - P w_{i k})$ . Moreover, $π_{m k} > 0$ indicates that the direction of power re-dispatch corresponding to the worst point $P w_{k}$ at bus m is upward ( $Δ P_{j k} \geq 0, \forall j \in Θ (m)$ ), and $π_{m k} < 0$ indicates that it is downward ( $Δ P_{j k} \leq 0, \forall j \in Θ (m)$ ).
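
Once the multipliers are available, the closed-form expressions in Equations (35) and (36) are straightforward to evaluate. The sketch below assumes the multipliers have already been read from a solver; the function names `lmp` and `ump` and the data layout (one list entry per line l, one inner list per worst point k) are hypothetical choices made for illustration.

```python
from typing import Sequence


def ump(lam_k: float, gamma_m: Sequence[float],
        eta1_k: Sequence[float], eta2_k: Sequence[float]) -> float:
    """Equation (36): UMP at bus m for one worst point k.

    gamma_m[l] is the shift factor of line l with respect to bus m;
    eta1_k[l] and eta2_k[l] are the multipliers of Equations (27) and (28).
    """
    congestion = sum(g * (a - b) for g, a, b in zip(gamma_m, eta1_k, eta2_k))
    return lam_k - congestion


def lmp(lam: float, gamma_m: Sequence[float],
        eta1: Sequence[float], eta2: Sequence[float],
        eta1_by_k: Sequence[Sequence[float]],
        eta2_by_k: Sequence[Sequence[float]]) -> float:
    """Equation (35): LMP at bus m from base-case and scenario multipliers."""
    # Base-case congestion term from the multipliers of Equations (8) and (9).
    base = sum(g * (a - b) for g, a, b in zip(gamma_m, eta1, eta2))
    # Scenario congestion terms, summed over every worst point k in kappa.
    scen = sum(
        g * (a - b)
        for e1, e2 in zip(eta1_by_k, eta2_by_k)
        for g, a, b in zip(gamma_m, e1, e2)
    )
    return lam - base - scen
```

With two lines, shift factors `[0.5, -0.2]`, base-case multipliers `eta1 = [4, 0]` and `eta2 = [0, 5]`, one worst point with `eta1_k = [2, 0]` and `eta2_k = [0, 0]`, and `lam = 30`, the base congestion term is 3 and the scenario term is 1, so the LMP is 26.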

Moreover, the structural differences between RMCM 1 and RMCM 2 may make the primal solutions ( $P_{G W}, P w^{d i s p}, Δ P$ ) and the dual solutions ( $λ, β, (μ), ζ, η$ ) obtained by solving RMCM 1 and RMCM 2 differ from each other. Therefore, although the clearing price formulas under the two BMs are identical according to Equations (35) and (36), the different primal-dual solutions obtained by solving RMCM 1 and RMCM 2 still make the resulting clearing prices ( $π_{m}$ , $π_{m k}$ ) under the two BMs different.

In summary, no matter which BM is adopted by WPPs, the estimated profit of WPP_i ( $\forall i$ ) in one day-ahead bidding can be calculated as:

R_{i} = π_{m} P w_{i}^{d i s p} - \sum_{k \in κ} π_{m k} (P w_{i}^{d i s p} - P w_{i k}), i \in Θ (m)

(37)

Hence, the objective of WPP_i ( $\forall i$ ) bidding in the day-ahead EM is to maximize $R_{i}$ .
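
As a small worked illustration of Equation (37), the sketch below computes the estimated profit of one WPP from the LMP at its bus, its dispatched output, and the UMPs and worst-point outputs collected during clearing. The function name `estimated_profit` and the numbers in the usage note are hypothetical.

```python
from typing import Sequence


def estimated_profit(lmp_m: float, pw_disp: float,
                     umps: Sequence[float],
                     worst_outputs: Sequence[float]) -> float:
    """Equation (37): energy credit minus deviation payments over all worst points."""
    # Energy credit: the dispatched output is paid at the LMP of the WPP's bus.
    energy_credit = lmp_m * pw_disp
    # Deviation payment: each worst point k is charged at its UMP.
    deviation_payment = sum(
        pi_k * (pw_disp - pw_k) for pi_k, pw_k in zip(umps, worst_outputs)
    )
    return energy_credit - deviation_payment
```

For instance, with an LMP of 26, a dispatched output of 20, and two worst points with UMPs of 2 and -1 and outputs of 15 and 25, the energy credit is 520 and the deviation payment is 2·5 + (-1)·(-5) = 15, giving an estimated profit of 505.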

3. LSCAC-Based Day-Ahead EM Modeling Approach for WPPs

3.1. Definitions

Although the BM and MCM in a day-ahead EM can be specified by the relevant regulators in advance, a strategic WPP in the EM still has limited information about its rivals. Owing to this incomplete and imperfect information in the day-ahead EM [41], strategic WPPs must dynamically improve their profits through repeated bidding in the day-ahead EM, which is in effect a dynamic multi-participant decision-making process. This work proposes an LSCAC-based day-ahead EM modeling approach to simulate this dynamic multi-WPP decision-making process. Hence, similar to Reference [41], some necessary definitions are given as follows:

Agent: we consider every WPP as an agent who, for the purpose of improving its profit, has the adaptive learning ability to dynamically adjust its bidding strategy according to its accumulated experiences through repeated bidding. Hence, the multi-WPP decision-making process can be also considered as multi-agent decision-making process. In our work, LSCAC algorithm is applied to depict this adaptive learning ability and to assist every WPP in bidding decision making.
Iteration: since the market is assumed to be cleared on a day-ahead, single-period basis, we consider each transaction day as an iteration T.
State variable: in iteration T, the LMP and UMPs at bus m cleared in iteration T − 1 are taken as the market environment state for WPP_i connected to bus m ( $i \in Θ (m)$ ), because WPP_i has no access to other market information. Taking $x_{i, T}$ to represent the state variable vector for WPP_i in iteration T, the relationship between $x_{i, T}$ and the clearing prices is as follows [20]:

$x_{i, T} = (π_{m, T - 1}, π_{m, T - 1}^{u p}, π_{m, T - 1}^{d o w n}), i \in Θ (m)$

(38)

$π_{m, T - 1}^{u p} = \sum_{k \in κ_{m, T - 1}^{u p}} π_{m k, T - 1}; π_{m, T - 1}^{d o w n} = \sum_{k \in κ_{m, T - 1}^{d o w n}} π_{m k, T - 1}$

(39)

$κ_{m, T - 1}^{u p} = \{ k : π_{m k, T - 1} \geq 0 \}; κ_{m, T - 1}^{d o w n} = \{ k : π_{m k, T - 1} < 0 \}$

(40)

where, $π_{m, T - 1}$ and $π_{m k, T - 1}$ represent the LMP and the k-th UMP in bus m cleared in iteration T − 1, respectively.
Action variable: in iteration T, the bidding strategy of WPP_i is considered as its action, and $a_{i, T}$ represents the action variable for WPP_i. If WPP_i bids in the EM under BM 1, the relationship between $a_{i, T}$ and the bidding strategy is as follows:

$a_{i, T} = P w^{b}_{i, T}$

(41)

where, $P w^{b}_{i, T}$ is the bidding output (strategy) of WPP_i in iteration T. If WPP_i bids in EM under BM 2, the relationship between $a_{i, T}$ and bidding strategy is as follows:

$a_{i, T} = ρ_{i, T}^{b}$

(42)

where, $ρ_{i, T}^{b}$ is the bidding price (strategy) of WPP_i in iteration T.
Reward: in iteration T, WPP_i’s reward is:

$r_{i, T} = R_{i, T}$

(43)

where, $R_{i, T}$ is WPP_i’s estimated profit obtained from bidding in iteration T.
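
The state construction in Equations (38)–(40) can be illustrated with a short sketch: the UMPs cleared in the previous iteration are split by sign and summed into an upward and a downward component. The function name `state_vector` is hypothetical.

```python
from typing import Sequence, Tuple


def state_vector(lmp_prev: float,
                 umps_prev: Sequence[float]) -> Tuple[float, float, float]:
    """Build x_{i,T} from the LMP and UMPs at the WPP's bus in iteration T - 1."""
    # Equation (40): non-negative UMPs form the "up" set, negative ones the "down" set.
    ump_up = sum(p for p in umps_prev if p >= 0)     # Equation (39), upward sum
    ump_down = sum(p for p in umps_prev if p < 0)    # Equation (39), downward sum
    # Equation (38): the state is the triple (LMP, summed up-UMP, summed down-UMP).
    return (lmp_prev, ump_up, ump_down)
```

For example, an LMP of 26 with UMPs of 2, -1 and 0.5 gives the state (26, 2.5, -1).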

3.2. LSCAC Algorithm

In TBRL-based EM modeling approaches [26,27,28,29,30,31,36], both the state and action sets should be assumed as discrete, otherwise the problem of “curse of dimensionality” will be caused so as to significantly hinder agents from improving their profits. However, according to Section 3.1 and [41], in day-ahead EM with many strategic WPPs, both $x_{i, T}$ ( $\forall i$ ) and $a_{i, T}$ ( $\forall i$ ) are within the continuous, bounded and closed sets (spaces). Therefore, a modified RL algorithm must be applied in day-ahead EM modeling for the study of strategic behaviors of WPPs.

In our work, we apply the LSCAC algorithm to this issue for the first time. The LSCAC algorithm is a modified actor-critic based RL algorithm which can rapidly tackle the dynamic multi-agent decision-making problem with continuous action and state sets. In the LSCAC algorithm, state value function and policy function of every agent are approximated by using linear combinations of basis functions. Linear parameters in state value functions corresponding to agents’ critic parts are updated online by using the temporal difference error (TD(0))-based method, the specific procedure of which can be found in [41]. The online updating procedure of linear parameters in policy functions corresponding to agents’ actor parts is described as follows [40]: by using a linear function, we estimate and repeatedly update in an agent’s actor part an optimal policy function $\hat{I} : X \to A$ defined on the continuous state space $X$ :

$a_{x}^{(optimal)} = \hat{I} (x) = ϕ {(x)}^{T} ω = \sum_{h = 1}^{n} ϕ_{h} (x) ω_{h}, \quad x \in X$

(44)

where, $ϕ_{h} : X \to R (h = 1, 2, ..., n)$ represents the h-th basis function of state $x \in X$ . $A$ represents the continuous action set of an agent, $a_{x}^{(optimal)} \in A$ represents the optimal action in face of state $x$ . The linear parameter vector $ω$ can be described as: $ω = {(ω_{1}, ω_{2}, ..., ω_{n})}^{T} \in R^{n}$ .

An agent must generate a corresponding action $a \in A$ in face of any state $x \in X$ based on the policy maintained and repeatedly updated by its actor part. The policy is actually an action generating model which has the ability of balancing the exploration and exploitation, and can be mathematically formulated as follows [40,41]:

$pro (x, a) = \frac{1}{\sqrt{2 π} σ} \exp \left\{- \frac{1}{2 σ^{2}} {(a - ϕ {(x)}^{T} ω)}^{2}\right\}$

(45)

where, $σ > 0$ is a standard deviation parameter which represents the exploring ability of the LSCAC algorithm.
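As an illustration of Equation (45), the following Python sketch evaluates the Gaussian policy density and draws an action from it. The function names and the plain-list representation of $ϕ (x)$ and $ω$ are assumptions made for this example only.

```python
import math
import random
from typing import Sequence


def policy_mean(phi_x: Sequence[float], omega: Sequence[float]) -> float:
    """Linear policy mean phi(x)^T omega, as in Equation (44)."""
    return sum(p * w for p, w in zip(phi_x, omega))


def policy_density(a: float, phi_x: Sequence[float],
                   omega: Sequence[float], sigma: float) -> float:
    """Gaussian density pro(x, a) of Equation (45)."""
    mean = policy_mean(phi_x, omega)
    return math.exp(-((a - mean) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def sample_action(phi_x: Sequence[float], omega: Sequence[float],
                  sigma: float, rng: random.Random) -> float:
    """Draw a ~ N(phi(x)^T omega, sigma^2); larger sigma means more exploration."""
    return rng.gauss(policy_mean(phi_x, omega), sigma)
```

A larger `sigma` spreads the sampled bids further from the current estimate of the optimal action, which is how the exploration and exploitation balance is expressed in this policy.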

Hence, the MSE function of $ω$ is defined as [40,41]:

$MSE (ω) = \frac{1}{2} \int_{x \in X} P^{(pro)} (x) \int_{a \in A} sig [δ (x, a)] {[ϕ {(x)}^{T} ω - a]}^{2} \, d a \, d x$

(46)

where, $P^{(pro)} (x)$ is the probability distribution function of $x$ under policy pro, and $sig [δ (x, a)]$ is the sigmoid function of $δ (x, a)$ , where $δ (x, a)$ denotes the TD(0) error of selecting action $a$ in face of state $x$ . Its formulation is as follows [40,41]:

$sig [δ (x, a)] = \frac{1}{1 + e^{- q δ (x, a)}}, \quad q > 0$

(47)

In iteration T, using $δ_{T}$ to replace $δ (x_{T}, a_{T})$ , formulation of $δ_{T}$ is as follows [40,41]:

$δ_{T} = r_{T} + γ ϕ {(x_{T + 1})}^{T} θ_{T} - ϕ {(x_{T})}^{T} θ_{T}$

(48)

where, linear vector $θ_{T}$ is composed of linear parameters in value functions in iteration T [41], $0 \leq γ \leq 1$ is a discount factor.
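Equations (47) and (48) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the branch in `sigmoid_weight` is an implementation choice added here to avoid floating-point overflow for large negative TD errors; it is not part of the formulation in [40,41].

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def td_error(r: float, phi_x: Sequence[float], phi_next: Sequence[float],
             theta: Sequence[float], gamma: float) -> float:
    """TD(0) error of Equation (48) with a linear value function phi^T theta."""
    return r + gamma * dot(phi_next, theta) - dot(phi_x, theta)


def sigmoid_weight(delta: float, q: float) -> float:
    """Sigmoid of Equation (47), written to stay finite for any delta."""
    z = q * delta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)
```

Actions that produced a positive TD error receive a weight above 0.5 and therefore pull the fitted policy towards them more strongly than actions that produced a negative TD error.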

Setting the derivative of Equation (46) with respect to $ω$ equal to 0 gives:

$\int_{x \in X} P^{(pro)} (x) \int_{a \in A} \frac{1}{1 + e^{- q δ (x, a)}} [ϕ {(x)}^{T} ω - a] ϕ (x) \, d a \, d x = 0$

(49)

It should be noted that the integral on the left side of Equation (49) is hard to calculate. If the sample points from iteration 0 to iteration N are $(x_{0}, a_{0}, r_{0}, x_{1}), (x_{1}, a_{1}, r_{1}, x_{2}), ..., (x_{N}, a_{N}, r_{N}, x_{N + 1})$ , Equation (49) can be approximately replaced by:

$\sum_{T = 0}^{N} \frac{1}{1 + e^{- q δ_{T}}} (ϕ {(x_{T})}^{T} ω - a_{T}) ϕ (x_{T}) = 0$

(50)

The reformulation of Equation (50) is:

$\left[\sum_{T = 0}^{N} \frac{1}{1 + e^{- q δ_{T}}} ϕ (x_{T}) ϕ {(x_{T})}^{T}\right] ω = \sum_{T = 0}^{N} \frac{1}{1 + e^{- q δ_{T}}} a_{T} ϕ (x_{T})$

(51)

Define an n-order matrix $A_{N}$ and an n-dimensional vector $b_{N}$ , respectively:

$A_{N} = \sum_{T = 0}^{N} \frac{1}{1 + e^{- q δ_{T}}} ϕ (x_{T}) ϕ {(x_{T})}^{T}$

(52)

$b_{N} = \sum_{T = 0}^{N} \frac{1}{1 + e^{- q δ_{T}}} a_{T} ϕ (x_{T})$

(53)

hence:

$ω = {(A_{N})}^{- 1} b_{N}$

(54)

Because ${(A_{N})}^{- 1}$ may not exist, it can be approximately replaced by ${(A_{N} + Π I)}^{- 1}$ based on the method of ridge regression [40], where $Π$ ( $Π > 0$ ) is a small constant and $I$ is an n-order identity matrix. When N is large, the calculation of parameter vector $ω$ in Equation (54) may be unstable. Reference [40] has proposed a new calculation formula for parameter vector $ω$ , which is as follows:

$ω = {\left[\frac{1}{N} (A_{N} + Π I)\right]}^{- 1} \left(\frac{1}{N} b_{N}\right)$

(55)
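The weighted least-squares solution in Equations (52)–(55) can be sketched in plain Python as follows. This is a minimal illustration, not the implementation of [40]: the helper names are hypothetical, each sample is assumed to be a triple of feature vector, action and sigmoid weight, the scaling factor is taken as the number of samples, and a small Gaussian elimination routine stands in for a library linear solver.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float, float]  # (phi(x_T), a_T, sigmoid weight)


def solve(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def actor_weights(samples: Sequence[Sample], ridge: float) -> List[float]:
    """Compute omega from Equation (55) with ridge term Pi * I."""
    n, count = len(samples[0][0]), len(samples)
    a_mat = [[ridge if i == j else 0.0 for j in range(n)] for i in range(n)]
    b_vec = [0.0] * n
    for phi, action, weight in samples:
        for i in range(n):
            b_vec[i] += weight * action * phi[i]          # Equation (53)
            for j in range(n):
                a_mat[i][j] += weight * phi[i] * phi[j]   # Equation (52)
    scaled_a = [[e / count for e in row] for row in a_mat]
    scaled_b = [e / count for e in b_vec]
    return solve(scaled_a, scaled_b)
```

The common factor 1/N cancels algebraically; it only keeps the entries of the matrix and vector at a moderate magnitude when many samples have been accumulated.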

3.3. The Step-by-Step Procedure of the Proposed Approach

In summary, the step-by-step procedure of LSCAC-based day-ahead EM modeling approach for WPPs (under two BMs) can be described as follows:

(1): Input: basis function vector $ϕ : X \to R^{n}$ , step length factor series $\{α_{i, T}\}_{T = 0}^{\infty}$ where $\sum_{T = 0}^{\infty} α_{i, T} = \infty$ , $\sum_{T = 0}^{\infty} {(α_{i, T})}^{2} < \infty$ , and values of $σ, q, γ$ (for WPP_i ( $\forall i$ )).
(2): Set T = 0, N = 0, and set the iterative termination condition, such as the maximum number of iterations (T_max).
(3): Set $θ_{i, T} = θ_{i, 0}$ and $ω_{i, T} = ω_{i, 0}$ , $A_{i, N} = A_{i, 0} = Π_{i} I, Π_{i} > 0$ and $b_{i, N} = b_{i, 0} = 0$ (for WPP_i ( $\forall i$ )).
(4): Set $x_{i, T} = x_{i, 0}$ (for WPP_i ( $\forall i$ )).
(5): In iteration T, generate $a_{i, T} \sim N (ϕ {(x_{i, T})}^{T} ω_{i, T}, σ^{2})$ where $a_{i, T}$ represents $P w^{b}_{i, T}$ or $ρ_{i, T}^{b}$ (for WPP_i ( $\forall i$ )), and then the ISO implements the robust MCM represented by RMCM 1 or RMCM 2.
(6): After market clearing, WPP_i ( $\forall i$ ) obtains the immediate reward $r_{i, T}$ using Equation (37) and a new market state $x_{i, T + 1}$ which can be generated by the ISO using Equations (35), (36), (38)–(40).
(7): $δ_{i, T} = r_{i, T} + γ ϕ {(x_{i, T + 1})}^{T} θ_{i, T} - ϕ {(x_{i, T})}^{T} θ_{i, T}, \forall i$ .
(8): $θ_{i, T + 1} = θ_{i, T} + α_{i, T} δ_{i, T} ϕ (x_{i, T}), \forall i$ .
(9): $A_{i, N + 1} = A_{i, N} + \frac{1}{1 + e^{- q δ_{i, T}}} ϕ (x_{i, T}) ϕ {(x_{i, T})}^{T}, \forall i$ .
(10): $b_{i, N + 1} = b_{i, N} + \frac{1}{1 + e^{- q δ_{i, T}}} a_{i, T} ϕ (x_{i, T}), \forall i$ .
(11): $ω_{i, T + 1} = {\left[\frac{1}{N + 1} A_{i, N + 1}\right]}^{- 1} \left(\frac{1}{N + 1} b_{i, N + 1}\right), \forall i$ .
(12): T = T + 1, N = N + 1.
(13): Check the iterative termination condition: if it is satisfied, go on to step (14); otherwise, return to step (5).
(14): Output: ${θ_{i}}^{*} = θ_{i, T}, {ω_{i}}^{*} = ω_{i, T}$ (for WPP_i ( $\forall i$ )), based on which WPP_i ( $\forall i$ ) can select the optimal bidding strategy (under BM 1 or 2) in face of any market state.
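The procedure in steps (1)–(14) can be sketched for a single agent as the following Python loop. This is a toy illustration under strong assumptions and not the authors' simulation: the robust market clearing of steps (5)–(6) is replaced by a hypothetical reward function `toy_reward` and a randomly drawn next state, the state is a scalar, the basis centers and width are arbitrary, and the step length is taken as 1/(T + 1), which satisfies the conditions in step (1).

```python
import math
import random
from typing import List, Tuple

CENTERS = [0.0, 0.5, 1.0]   # hypothetical basis centers on a scalar state
WIDTH = 0.25                # hypothetical basis width


def features(x: float) -> List[float]:
    return [math.exp(-((x - c) ** 2) / (2 * WIDTH ** 2)) for c in CENTERS]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def solve(m: List[List[float]], v: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def toy_reward(x: float, a: float) -> float:
    """Hypothetical stand-in for the profit returned by market clearing."""
    return -(a - (1.0 + x)) ** 2


def run_lscac(iterations: int = 300, sigma: float = 0.3, q: float = 1.0,
              gamma: float = 0.9, ridge: float = 1e-3,
              seed: int = 0) -> Tuple[List[float], List[float]]:
    rng = random.Random(seed)
    n = len(CENTERS)
    theta, omega = [0.0] * n, [0.0] * n                          # step (3)
    a_mat = [[ridge if i == j else 0.0 for j in range(n)] for i in range(n)]
    b_vec = [0.0] * n
    x = rng.random()                                             # step (4)
    for t in range(iterations):
        phi = features(x)
        a = rng.gauss(dot(phi, omega), sigma)                    # step (5)
        r, x_next = toy_reward(x, a), rng.random()               # step (6), toy
        delta = r + gamma * dot(features(x_next), theta) - dot(phi, theta)  # (7)
        theta = [th + delta * p / (t + 1) for th, p in zip(theta, phi)]     # (8)
        w = sigmoid(q * delta)
        for i in range(n):                                       # steps (9)-(10)
            b_vec[i] += w * a * phi[i]
            for j in range(n):
                a_mat[i][j] += w * phi[i] * phi[j]
        s = 1.0 / (t + 1)                                        # step (11)
        omega = solve([[s * e for e in row] for row in a_mat],
                      [s * e for e in b_vec])
        x = x_next                                               # step (12)
    return theta, omega                                          # step (14)
```

In the multi-agent setting described above, one such set of `theta`, `omega`, `a_mat` and `b_vec` would be kept per WPP, and the reward and next state of all agents would come from a single market clearing run per iteration.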

Moreover, according to [40], we choose a Gaussian radial basis function as $ϕ (x)$ .
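A Gaussian radial basis feature vector can be computed as in the short sketch below. The centers and the common width are free parameters that are not specified in this section, so the argument names `centers` and `width` are assumptions for illustration.

```python
import math
from typing import List, Sequence


def gaussian_rbf_features(x: Sequence[float],
                          centers: Sequence[Sequence[float]],
                          width: float) -> List[float]:
    """Return phi(x), one Gaussian bump per center.

    phi_h(x) = exp(-||x - c_h||^2 / (2 * width^2)), with hypothetical
    centers c_h and a shared width.
    """
    feats = []
    for c in centers:
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        feats.append(math.exp(-sq_dist / (2.0 * width ** 2)))
    return feats
```

Each feature equals 1 when the state coincides with its center and decays smoothly with distance, so nearby market states produce similar feature vectors.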

4. Simulations and Discussions

4.1. System Data

In this section, by implementing the robust MCM mentioned in Section 2, our proposed day-ahead EM modeling approaches under different BMs are simulated on the IEEE 30-bus test system with five strategic WPPs [9]. Matlab R2014a is utilized to conduct our simulations. Figure 1 shows the schematic structure of the test system. Table 1 depicts the predicted single period loads distributed in different buses [42]. Consistent with the assumptions in Section 2, no uncertainty in this test system is caused by loads. Parameters of conventional generators can be seen in Table 2, and the predicted power output intervals of the five WPPs, which are the crucial components of the uncertainty set, are listed in Table 3. For the sake of simplicity and without losing generality, we assume the power output interval of WPP_i ( $\forall i$ ) predicted by the ISO is the same as that predicted by WPP_i ( $\forall i$ ) itself.

4.2. Robust MCM Testing

The value of the budget parameter $Λ$ is related to the size of the uncertainty set. The smaller $Λ$ is, the smaller the uncertainty set estimated by the ISO. That is to say, the day-ahead market clearing procedure of the ISO tends to be more deterministic as the value of $Λ$ ( $Λ \geq 0$ ) decreases. When $Λ = 0$ , the predicted power output of every WPP is deemed by the ISO to be a definite value which, according to Equation (1) and [10,19,20], is equal to the intermediate value of its power output interval, and the day-ahead MCM of the ISO reduces to a conventional deterministic MCM similar to [30]. However, uncertainties exist objectively in a power system with WPPs. If the market clearing procedure were implemented by the ISO without considering enough uncertainties, the reserve capacity dispatched day-ahead might be unable to accommodate the deviations caused by WPPs’ power output uncertainties in real time, which can seriously affect the security of the system and cause large extra costs such as wind abandonment. Therefore, market clearing results under different $Λ$ values must be compared so as to verify the necessity of proposing the robust MCM in a day-ahead EM with wind power penetration.
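The role of the budget parameter can be illustrated with a membership test for a budget-constrained uncertainty set. The sketch assumes the common form in which each WPP's deviation from the midpoint of its interval is normalized by the half-width and the normalized deviations sum to at most $Λ$ ; the exact definition used in this paper is Equation (1), which should be consulted, and the function name and tolerance are hypothetical.

```python
from typing import Sequence


def in_uncertainty_set(pw: Sequence[float], lower: Sequence[float],
                       upper: Sequence[float], budget: float,
                       tol: float = 1e-9) -> bool:
    """Check whether a joint wind output point lies in an assumed budget set."""
    total = 0.0
    for p, lo, hi in zip(pw, lower, upper):
        if p < lo - tol or p > hi + tol:
            return False                      # outside the predicted interval
        half_width = (hi - lo) / 2.0
        if half_width > 0:
            total += abs(p - (lo + hi) / 2.0) / half_width
    return total <= budget + tol
```

Under this assumed form, a budget of 0 admits only the interval midpoints, while a budget equal to the number of WPPs admits every point of the box, including all of its corners.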

In this section, in order to facilitate the market clearing comparison, we assume every WPP is under BM 1 and sends the ISO the intermediate value of its power output interval. In fact, the same key conclusions obtained from market clearing comparisons can also be generated with the other BM and other strategies. Moreover, no matter what value of $Λ$ the ISO assumes, the actual value of $Λ$ , which represents the objective existence of uncertainty, is fixed by us to the number of WPPs in the system ( $Λ = 5$ ). Table 4 shows the market clearing results under different $Λ$ values.

In Table 4, because the marginal cost of every WPP is neglected [3,31], the “operation cost” in column 2 can be calculated by using $\sum_{j = 1}^{N_{G}} c_{j} P_{j}$ when the optimal ED solution is obtained. Moreover, the “uncertainty that cannot be accommodated” in column 3 means whether there exist uncertainty points in $U (Λ = 5)$ that cannot be accommodated when the optimal ED solution is obtained. The “number of uncertainty poles that cannot be accommodated” in column 4 denotes the number of poles in $U (Λ = 5)$ that cannot be accommodated when the optimal ED solution is obtained. From Table 4, it can be concluded that:

The total operation cost increases with $Λ$ . However, uncertainties that cannot be accommodated tend to be eliminated by increasing $Λ$ . On the one hand, the conservatism of the ISO increases with $Λ$ , which reduces the economic efficiency of scheduling to a certain extent. On the other hand, the operation cost is calculated based on the basic scenario (WPPs’ day-ahead bids), in which the extra cost caused by uncertainties that cannot be accommodated is not taken into account. Although that extra cost is hard to calculate precisely, for reasons such as the lack of day-ahead information about the real-time realization of uncertainty, it can be considerable once any uncertainty that cannot be accommodated occurs in practice. Therefore, it is necessary to eliminate uncertainties that cannot be accommodated by reasonably increasing the value of $Λ$ .
When $Λ = 0$ , the ISO clears the market using the conventional deterministic MCM. The number of uncertainty poles that cannot be accommodated in case $Λ = 0$ is 22, which is significantly more than in any other case listed in Table 4 (in fact, the number of uncertainty points that cannot be accommodated in case $Λ = 0$ is infinite). That is to say, it is necessary to employ a modified MCM, such as our proposed robust MCM, in a day-ahead EM with considerable uncertainties (i.e., WPPs).
Comparing the cases of $Λ = 4$ and $Λ = 5$ , on the one hand, there are no uncertainties that cannot be accommodated in either case; on the other hand, the operation cost in case $Λ = 4$ is equal to that in case $Λ = 5$ . Moreover, increasing $Λ$ increases the computational complexity of solving the robust MCM [15,16,19,20]. Hence, the proposed robust MCM with $Λ = 4$ is applied for market clearing in our subsequent simulations.

4.3. LSCAC-Based EM Modeling Approach Testing

In this section and our subsequent simulations, no matter under which BM, every WPP (agent) starts with a training process of 3000 iterations. During this training process, all WPPs balance exploration and exploitation when selecting bidding strategies (actions) in each iteration [41]. After the training process, a decision-making process of 500 iterations is implemented by every WPP, in which only the greedy policy is adopted when selecting actions in face of any state of the market. Moreover, we randomly set the action for every WPP at the beginning of the first training iteration because every WPP starts with limited experience in strategy selection.
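The two phases described above can be sketched as a single action selection rule. The sketch assumes that the greedy policy means bidding the policy mean $ϕ {(x)}^{T} ω$ of Equation (44) without noise, which is an interpretation made for this example; the function and parameter names are hypothetical.

```python
import random


def select_action(mean: float, sigma: float, iteration: int,
                  n_training: int, rng: random.Random) -> float:
    """Explore during training, act greedily afterwards.

    mean is the current policy mean phi(x)^T omega; iterations are
    counted from 0, so iterations 0..n_training-1 form the training phase.
    """
    if iteration < n_training:
        return rng.gauss(mean, sigma)   # training: sample from Equation (45)
    return mean                         # decision making: no exploration noise
```

With `n_training = 3000` and 3500 iterations in total, the last 500 calls return the policy mean unchanged, which corresponds to the decision-making phase.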

Whether our proposed LSCAC-based day-ahead EM approach under BM 1 reaches dynamic stability after 3000 training iterations is tested in Figure 2, Figure 3 and Figure 4. Likewise, whether the approach under BM 2 reaches dynamic stability after 3000 training iterations is tested in Figure 5, Figure 6 and Figure 7. In Figure 4 and Figure 7, the summed UMP at bus m in each iteration is calculated by using Equation (39).

Before we analyze BMs and strategies for WPPs by using our proposed LSCAC-based day-ahead EM modeling approach, it should first be tested whether our proposed approaches under different BMs converge to dynamic stability after every WPP experiences enough iterations of online training. If convergence is verified, the market state and obtained action of every WPP no longer change after enough training iterations. It should be noted that in the existing TBRL-based approaches [26,27,28,29,30,31,36], the action set of every agent is discrete and finite, and the optimality of an agent’s final obtained action can be easily verified by using the method mentioned in [31], which is to compare the profits brought by all actions in this agent’s action set while fixing the actions of other agents. However, in our proposed LSCAC-based approach, the action set of every agent is continuous. It is impossible to directly test the optimality of an agent’s final obtained action because there are infinitely many actions other than this final obtained one. Therefore, we propose the following three steps to further test the performance of the LSCAC-based day-ahead EM modeling approach:

To test the optimality of a WPP’s final obtained strategy in TBRL-based (i.e., Q-learning algorithm [26,27]) day-ahead EM modeling approaches after converging to dynamic stability, by comparing the profits brought by all of this WPP’s strategies while fixing other WPPs’ obtained strategies. The specific optimality test method can be seen in [31].
To test whether a WPP can obtain more profit by using the LSCAC algorithm than by using the TBRL algorithm (Q-learning algorithm [26,27]), after converging to dynamic stability.
To test whether the whole market can reach a lower operation cost with our proposed LSCAC-based approach than with the TBRL-based (Q-learning algorithm [26,27]) one, after converging to dynamic stability.
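The optimality test in the first step, comparing all of one agent's discrete actions while the other agents' actions stay fixed, can be sketched as follows. The details of the method are given in [31]; this example is only an illustration, and `toy_profit` is a hypothetical profit function standing in for a market clearing run.

```python
from typing import Callable, List, Sequence


def is_best_response(agent: int, actions: Sequence[float],
                     action_set: Sequence[float],
                     profit: Callable[[int, List[float]], float],
                     tol: float = 1e-9) -> bool:
    """True if no alternative in action_set gives the agent a higher profit
    while all other agents keep their current actions."""
    base = profit(agent, list(actions))
    for alternative in action_set:
        trial = list(actions)
        trial[agent] = alternative
        if profit(agent, trial) > base + tol:
            return False
    return True


def toy_profit(agent: int, actions: List[float]) -> float:
    """Hypothetical profit: own quantity times a price that falls with total supply."""
    return actions[agent] * (10.0 - sum(actions))
```

Because the check enumerates the whole action set, it is only feasible for discrete and finite sets, which is why it applies to the Q-learning-based agents and not directly to the LSCAC-based ones.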

The related parameters of our LSCAC-based day-ahead EM modeling approach are listed in Table 5.

From Figure 2, Figure 3 and Figure 4, it can be seen that, after randomly fluctuating in 3000 training iterations, the adjustment processes of estimated profit, LMP and summed UMP of every WPP remain constant during 500 decision-making iterations. Actually, other adjustment processes such as that of operation cost, every WPP’s bidding strategy, etc. also become constant after 3000 training iterations. Therefore, our proposed approach under BM 1 can converge to dynamic stability after every WPP experiences 3000 iterations of online training.

From Figure 5, Figure 6 and Figure 7, it can be seen that, after randomly fluctuating in 3000 training iterations, the adjustment processes of estimated profit, LMP and summed UMP of every WPP remain constant during 500 decision-making iterations. Actually, other adjustment processes such as that of operation cost, every WPP’s bidding strategy, etc. also become constant after 3000 training iterations. Therefore, our proposed approach under BM 2 can converge to dynamic stability after every WPP experiences 3000 iterations of online training.

The main reason for the fluctuating trends in the 3000 training iterations in Figure 2, Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7 is that, in order to balance exploration and exploitation during these training iterations, every WPP must maintain the ability to explore, i.e., to randomly select bidding strategies according to the repeatedly updated Equation (45). All WPPs’ insufficient experience and unstable action-selecting policies therefore make the dynamic training process of the EM fluctuate randomly. The main reason for the constant trends in the 500 decision-making iterations in Figure 2, Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7 is that, after accumulating enough experience, every WPP adopts the greedy policy, i.e., it selects only the bidding strategy it considers optimal in face of any observed EM state in each of the 500 decision-making iterations. All WPPs’ sufficient experience and stable action-selecting policies therefore make the dynamic decision-making process of the EM converge to stability. Therefore, it may be concluded that enough training iterations considering the balance of exploration and exploitation, as well as the greedy action-selecting policy adopted in decision-making iterations, are the two main factors resulting in EM dynamic stability. Taking the EM approach under BM 1 as an example, Figure 8 shows the dynamic adjustment process of WPP₁’s estimated profit when every WPP experiences 1000 training iterations and 500 decision-making iterations, and Figure 9 shows the dynamic adjustment process of WPP₁’s estimated profit when every WPP experiences 3500 training iterations without a greedy action-selecting policy.

From Figure 8, it is shown that although the greedy action selecting policies are adopted by WPPs in decision-making iterations, insufficient training iterations, which mean insufficient experiences accumulated, still make WPP₁’s estimated profit fluctuate during decision-making process. Actually, the dynamic adjustment processes of other WPPs’ estimated profits also fluctuate during decision-making iterations.

From Figure 9, it is shown that although more than 3000 training iterations considering the balance of exploration and exploitation are conducted, WPP₁’s estimated profit still fluctuates during the last 500 iterations due to its lack of a greedy action selection policy. Actually, the dynamic adjustment processes of other WPPs’ estimated profits also fluctuate during the last 500 iterations. Moreover, WPP₁’s estimated profit in Figure 9 is much more volatile during the last 500 iterations than that in Figure 8, which is mainly because WPPs lacking greedy action selection policies tend to bid more randomly in the EM.

Therefore, EM dynamic stability cannot be reached if either the training iterations are insufficient or the greedy action-selecting policy is not adopted. To a certain extent, this verifies our conclusions about the two main factors resulting in EM dynamic stability and suggests that the proposed 3000 training iterations and 500 decision-making iterations are comparatively reasonable for our proposed LSCAC-based approach to reach EM dynamic stability.

To further test the performance of our proposed approaches under different BMs, two Q-learning-based day-ahead EM approaches (QDEMAs) are taken for comparison. In these QDEMAs, some WPPs are designated as Q-learning-based agents while the other, undesignated WPPs remain LSCAC-based ones. A Q-learning-based agent dynamically adjusts its action based on the Q-learning algorithm, which uses the ε-greedy policy [26] to balance exploration and exploitation in the 3000 training iterations and the greedy policy in the 500 decision-making iterations. These QDEMAs differ only in the number of Q-learning-based agents. Parameters related to these two QDEMAs are listed in Table 6.
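For comparison with the continuous-action policy, the discrete ε-greedy selection and the standard tabular Q-learning update used by such agents can be sketched as below. This is a generic illustration of the technique and not the specific configuration of [26]; the table layout (one list of action values per discrete state) and the function names are assumptions.

```python
import random
from typing import List


def epsilon_greedy(q_values: List[float], epsilon: float,
                   rng: random.Random) -> int:
    """Pick a random action index with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def q_update(q_table: List[List[float]], state: int, action: int,
             reward: float, next_state: int,
             alpha: float, gamma: float) -> None:
    """Standard tabular Q-learning update, applied in place."""
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])
```

Setting `epsilon` to 0 in the decision-making iterations turns the rule into the purely greedy policy.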

After 3000 training and 500 decision-making iterations, the obtained market results of those two QDEMAs and our proposed approach under BM 1 are listed in Table 7, and the obtained results of those two QDEMAs and our proposed approach under BM 2 are listed in Table 8. Moreover, like our proposed LSCAC-based approach, no matter under which BM, both of these QDEMAs can converge to dynamic stability after every WPP experiences 3000 iterations of online training. That means the results listed in Table 7 and Table 8 are not obtained accidentally: an LSCAC-based or Q-learning-based WPP does not change its strategy when the market state affected by all WPPs’ strategies remains unchanged.

From Table 7 and Table 8, it can be concluded that:

By using the optimality test method in [31], no matter under which BM and in which QDEMA, every Q-learning-based WPP’s final obtained strategy can be verified as its optimal one in its discrete action set, which can bring it the most profit when other WPPs’ strategies are fixed.
No matter under which BM, on the one hand, estimated profits of WPP₁ and WPP₂ in QDEMA 2 are higher than those in QDEMA 1, respectively, and estimated profits of WPP₃, WPP₄ and WPP₅ in our proposed LSCAC-based approach are higher than those in QDEMA 2, respectively, which, to some extent, indicates one can get more profit by using the LSCAC algorithm to bid in EM than the Q-learning one within the same conditions; on the other hand, the operation cost in our proposed LSCAC-based approach is lower than that in QDEMA 2, and the operation cost in QDEMA 2 is lower than that in QDEMA 1, which, to some extent, indicates that with the increase in the number of LSCAC-based agents in EM, the operation cost of whole system can be reduced.

In conclusion, no matter under which BM, Q-learning-based WPPs can finally find their optimal bidding strategies from their discrete and finite action sets. If these WPPs are transformed into LSCAC-based ones, they can finally find their more applicable strategies from their continuous action sets, which not only bring more profits for themselves but also bring lower operation cost for the whole system than that based on Q-learning method. Hence, although it is hard to directly test the optimality of every LSCAC-based WPP’s final obtained strategy, our further test has, to some extent, verified the rationality and scientific basis of applying our proposed LSCAC-based approach in day-ahead EM modeling for strategic WPPs.

Moreover, no matter under which BM, the simulation of our proposed approach on the IEEE 30-bus test system with five strategic WPPs takes only about 43 seconds to reach the final results (after 3500 iterations). That is to say, the computational time of our proposed approach is relatively low, so that it can be extended to the modeling and simulation of more realistic and more complex EM systems.

4.4. BMs Analysis for WPPs

In this section, our proposed LSCAC-based approach is applied to analyze the obtained market results under different BMs after 3000 training iterations and 500 decision-making iterations.

Moreover, it should be noted that under BM 2, in order to lead WPP_i ( $\forall i$ ) to reasonably bid in market, we set lower and upper limits $ρ_{low, i}$ and $ρ_{upp, i}$ ($/MWh) for its bidding price $ρ_{i}^{b}$ . Values of $ρ_{low, i}$ and $ρ_{upp, i}$ ( $\forall i$ ) may affect the obtained market results such as final obtained LMPs, estimated profits of all WPPs and operation cost of the system etc. after 3500 iterations. Hence, different values of $ρ_{low, i}$ and $ρ_{upp, i}$ ( $\forall i$ ) should be taken into account when considering BMs. For the sake of simplicity and without losing generality, we set $ρ_{low, i} = ρ_{low}, ρ_{upp, i} = ρ_{upp}, \forall i$ , and different values of $ρ_{low}$ and $ρ_{upp}$ are considered.

After 3500 iterations, considering different values of $ρ_{low}$ (while fixing the upper limit $ρ_{upp}$ to 50 $/MWh, the same as that listed in Table 5), Table 9 compares the obtained market results under different BMs.

After 3500 iterations, considering different values of $ρ_{upp}$ (while fixing the lower limit $ρ_{low}$ to 30 $/MWh, the same as that listed in Table 5), Table 10 compares the obtained market results under different BMs.

From Table 9 and Table 10, it can be seen that:

In Table 9, when values of $ρ_{l o w}$ are 0, 10, 20 and 30 ($/MWh), respectively, the obtained profit of every WPP, operation cost and average LMP of 30 buses under BM 2 remain unchanged. Actually, if $ρ_{l o w}$ ≤ 30 $/MWh, the obtained bidding price (strategy) of every WPP under BM 2 is higher than 30 ($/MWh), which means values of $ρ_{l o w}$ lower than 30 ($/MWh) cannot affect every WPP’s bidding decision-making. Therefore, in our opinion, it is hard to weaken the market power of every WPP by only reducing the value of lower limit of every WPP’s bidding price while WPPs provide ISO their bidding curves consisting of bidding prices and power outputs for the next day.
In Table 10, the obtained profit of every WPP, the operation cost and the average LMP of the 30 buses under BM 2 increase with the value of $ρ_{u p p}$ . That may be mainly because the higher the value of $ρ_{u p p}$ is, the greater the market power WPPs have. Therefore, in our opinion, the upper limit of every WPP’s bidding price should not be set too high when WPPs provide the ISO with their bidding curves consisting of bidding prices and power outputs for the next day.
In most cases of $ρ_{l o w}$ and $ρ_{u p p}$ , WPPs under BM 2 can get more profits than under BM 1, which may be because WPPs under BM 2 can obtain greater market power by directly adjusting their bidding prices so as to further improve their profits compared with WPPs under BM 1. However, from the perspective of the whole market, both the obtained operation cost and average LMP under BM 2 are higher than that under BM 1, which, to some extent, indicates WPPs adopting BM 2 cause lower economic efficiency in the whole market than adopting BM 1. Therefore, in our opinion, if the purpose of permitting WPPs to bid is to promote the development of wind power resources by improving WPPs’ profits, providing ISO their bidding curves is more applicable, and if the purpose of permitting WPPs to bid is to improve the economic efficiency of the whole market, only sending their power output plans is more applicable.

5. Conclusions

In this paper, we present an LSCAC-based day-ahead EM modeling approach with a robust market clearing mechanism embedded in it, and the strategic behaviors of WPPs under two different BMs are successively mimicked and compared by using our proposed approach. By employing the robust MCM, the day-ahead ED solution of the market can be immunized against any uncertainty within the real-time wind power uncertainty set estimated by the ISO, which not only ensures the ability of wind power accommodation in the EM, but also reasonably generates LMPs for energy credit and load payment as well as UMPs for reserve credit and deviation payment. By employing the LSCAC algorithm, every WPP can significantly improve its profit and the operation cost of the system can also be remarkably decreased compared with employing the TBRL (i.e., Q-learning) algorithms. The low computational time (only about 43 seconds for our simulation on the IEEE 30-bus test system to reach the final results) makes our proposed approach easily extendible as a reasonable test bed for the simulation of more realistic and more complex EM systems. Moreover, by conducting comparisons on market results under different BMs in simulation, some suggestions leading WPPs to reasonably bid in the market and on BM selection are put forward for the purposes of promoting the development of wind power resources and improving the economic efficiency of the whole market.

Acknowledgments

This study is supported by the National Natural Science Foundation of China under Grant No. 71373076, the Fundamental Research Funds for the Central Universities under Grant No. JB2016183, and the National Key R&D Program of China under Grant No. 2016YFB0900501.

Author Contributions

Huiru Zhao guided the research; Yuwei Wang established the model, implemented the simulation and wrote this article; Mingrui Zhao, Qingkun Tan and Sen Guo collected references.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

Acronym
EM	Electricity market
WPP	Wind power producer
BM	Bidding mode
ISO	Independent system operator
MCM	Market clearing model
SO	Stochastic optimization
WPOSS/WPOS	Wind power output stochastic scenario/wind power output scenario
RO	Robust optimization
SCUC/UC	security constraint unit-commitment/unit-commitment
SCED/ED	security constraint economic-dispatch/economic-dispatch
AP	Affine policy
GenCO	generation company
ARIMA	Autoregressive integrated moving average
SARSA	State-action-reward-state-action
KDE	Kernel density estimation
TBRL	Table-based reinforcement learning
MASCEM	Multi-agent simulator of competitive electricity market
LSCAC	Least square continuous actor-critic
RMCM	Robust market clearing model
MP/SP	Master problem/sub-problem
LMP	Locational marginal price
UMP	Uncertainty marginal price
GLM/F	Generalized Lagrange multiplier/function
TD	Temporal difference
QDEMA	Q-learning-based day-ahead EM approach
Index
i, j, m	Indices for WPP, conventional generator and bus
k	index of the worst point for uncertainty (element in $κ$ )
l	Index for transmission line
T, N	Indices for iteration in LSCAC-based EM approach
h	Index for the dimension of basis function vector
Set and Function
$U$	Real-time wind power output uncertainty set
$Ω$	Robust feasible region for every day-ahead ED solution which is immunized against any real-time uncertainties within $U$
$Θ (m)$	Set of WPPs, conventional generators and loads connected in bus m
$κ$	Index set for worst uncertainty points
$κ_{m, T - 1}^{u p}$ , $κ_{m, T - 1}^{d o w n}$	Set of indices k for upward and downward UMPs at bus m iteration T − 1
$X$ / $A$	State space/action space
$ρ (\cdot)$	WPP’s bidding function (curve) under BM 2, which is assumed to be identically equal to a certain bidding price $ρ_{i}^{b}$ no matter what its bidding power output $P w_{i}^{b}$ is
$L (\cdot)$	Generalized Lagrange function
$\hat{I} (\cdot)$	Optimal action selecting policy function
$ϕ_{h} (\cdot)$ / $ϕ (\cdot)$	h-th basis function/basis function vector
$p r o (\cdot, \cdot)$	Probability density function for selecting action a under state x (representing action selecting policy during training iterations)
$M S E (\cdot)$	Mean squared error function
$P^{(p r o)} (\cdot)$	Probability distribution function of state x under policy pro
$s i g (\cdot)$	Sigmoid function
$δ (\cdot, \cdot)$	TD error function for selecting action a under state x
Constant
N_w, N_G, N_bus	The numbers of WPPs, conventional generators and buses
$l w_{i}$ , $u w_{i}$	lower and upper bounds for i-th WPP’s real-time wind power output
$ρ_{l o w, i}$ , $ρ_{u p p, i}$	lower and upper limits for i-th WPP’s bidding price
$Λ$	Budget parameter relating to the size of uncertainty set
c_j	cost coefficient of j-th conventional generator
d_m	Aggregated equivalent load at bus m
$P_{j}^{\max}$ , $P_{j}^{\min}$	Maximum and minimum generation outputs
$Γ_{l, m}$	Shift factor for line l with respect to bus m
$F_{l}$	Transmission line flow limit for line l
$r_{j}^{u}$ , $r_{j}^{d}$	j-th conventional generator’s ramping-up/down limits for uncertainty accommodation
$Δ$	Feasibility tolerance
n	Dimension of the basis function vector
q	Coefficient in sigmoid function
$Π$	Small positive constant for calculating the linear parameter vector $ω$ in $\hat{I} (x)$ based on ridge regression
$α$	Step length factor
$σ$	Standard deviation
$γ$	Discount factor
Parametric Variable
$P w_{i}^{b}$	i-th WPP’s bidding power output parameter
$ρ_{i}^{b}$	i-th WPP’s bidding price parameter
$θ$	Linear parameter vector in TD error function (and value function)
$ω$	Linear parameter vector in optimal action selecting policy function
$θ^{*}$	Convergence value of $θ$
$ω^{*}$	Convergence value of $ω$
A_N/b_N	Intermediate parameter matrix/vector in LSCAC iterations for obtaining convergence value of $ω$
Variable
$P w$ / $P w_{i}$	Random variable vector representing the uncertainty of the WPPs’ joint real-time power outputs/random variable representing the uncertainty of the i-th WPP’s real-time power output; $P w$ / $P w_{i}$ can also be regarded as part of the decision variables (vector) in SP
$P w_{k}$	k-th worst uncertainty point of the WPPs’ joint real-time power outputs; $P w_{k}$ also represents part of SP’s solution when SP is solved for the k-th time
$P_{j}$ , $P w_{i}^{d i s p}$ , $Δ P_{j k}$	Decision variables in MP, representing day-ahead dispatched power output of j-th conventional generator, i-th WPP, as well as real-time power re-dispatch incremental result of j-th conventional generator under k-th worst uncertainty point, respectively
$P_{G W}$ , $P w^{d i s p}$ , $Δ P$	Variable vector consisting of $P_{j}$ ( $\forall j$ ) and $P w_{i}^{d i s p}$ ( $\forall i$ ), variable vector consisting of $P w_{i}^{d i s p}$ ( $\forall i$ ), variable vector consisting of $Δ P_{j k}$ ( $\forall j$ , $\forall k$ )
$s_{i}^{+}$ , $s_{i}^{-}$ , $Δ P_{j}$	Other decision variables in SP: $s_{i}^{+}$ and $s_{i}^{-}$ are non-negative slack variables whose sum measures the violation associated with the solution from MP; $Δ P_{j}$ represents the real-time power re-dispatch of the j-th conventional generator for accommodating uncertainties within $U$
$Ψ$	Value of SP’s objective function
$λ, β, μ, ζ, η$	Generalized Lagrange multiplier vectors
$ε_{m k}$	Deviation of the real-time power output generated by the WPPs connected at bus m under the k-th worst uncertainty point from the day-ahead bidding (dispatched) one
$π_{m}$ , $π_{m k}$	LMP in bus m, UMP in bus m under k-th worst uncertainty point
$R_{i}$	Estimated profit of i-th WPP
x, a, r	State variable, action variable, reward
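
To make the roles of $θ$, $ϕ (\cdot)$, $γ$, $Π$ and A_N/b_N in the list above more concrete, the following minimal Python sketch may help. It assumes the standard temporal-difference error for a linear value function, $δ = r + γ θ^{T} ϕ (x') - θ^{T} ϕ (x)$, and a ridge-regularized solve of the form $(A_N + Π I) ω = b_N$; both forms are assumptions made for illustration and are not taken from the nomenclature itself. The function names `td_error` and `ridge_solve` and all numerical inputs are hypothetical.

```python
import numpy as np

def td_error(theta, phi_x, phi_next, reward, gamma=0.5):
    """TD error for an assumed linear value function V(x) = theta . phi(x)."""
    return reward + gamma * float(theta @ phi_next) - float(theta @ phi_x)

def ridge_solve(A, b, pi_const=1e-3):
    """Solve (A + pi_const * I) w = b; pi_const plays the role of the small constant Π."""
    n = A.shape[0]
    return np.linalg.solve(A + pi_const * np.eye(n), b)

# Illustrative inputs only (not data from the paper).
theta = np.array([1.0, 2.0])
delta = td_error(theta, np.array([1.0, 0.0]), np.array([0.0, 1.0]), reward=1.0)
omega = ridge_solve(np.zeros((2, 2)), np.array([1.0, 2.0]), pi_const=0.5)
```

The regularization term keeps the linear system solvable even when the accumulated matrix is singular, which is the usual motivation for adding a small positive constant in ridge regression.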

References

Prabavathi, M.; Gnanadass, R. Energy bidding strategies for restructured electricity market. Int. J. Electr. Power Energy Syst. 2015, 64, 956–966.
Majumder, S.; Khaparde, S.A. Revenue and ancillary benefit maximization of multiple non-collocated wind power producers considering uncertainties. IET Gener. Trans. Distrib. 2016, 10, 789–797.
Li, J.; Wan, C.; Xu, Z. Robust offering strategy for a wind power producer under uncertainties. In Proceedings of the 2016 IEEE International Conference on Smart Grid Communications (SmartGridComm), Sydney, Australia, 6–9 November 2016; pp. 752–757.
Zhao, Q.; Shen, Y.; Li, M. Control and bidding strategy for virtual power plants with renewable generation and inelastic demand in electricity markets. IEEE Trans. Sustain. Energy 2016, 7, 562–575.
Zugno, M.; Morales, J.M.; Pinson, P.; Madsen, H. Pool strategy of a price-maker wind power producer. IEEE Trans. Power Syst. 2013, 28, 3440–3450.
Shafie-khah, M.; Heydarian-Forushani, E.; Golshan, M.E.H.; Moghaddam, M.P.; Sheikh-El-Eslami, M.K.; Catalão, J.P.S. Strategic offering for a price-maker wind power producer in oligopoly markets considering demand response exchange. IEEE Trans. Ind. Inf. 2015, 11, 1542–1553.
Delikaraoglou, S.; Papakonstantinou, A.; Ordoudis, C.; Pinson, P. Price-maker wind power producer participating in a joint day-ahead and real-time market. In Proceedings of the 12th International Conference on the European Energy Market (EEM), Lisbon, Portugal, 19–22 May 2015; pp. 1–5.
De la Nieta, A.A.S.; Contreras, J.; Muñoz, J.I.; O’Malley, M. Modeling the impact of a wind power producer as a price-maker. IEEE Trans. Power Syst. 2014, 29, 2723–2732.
Lei, M.; Zhang, J.; Dong, X.; Ye, J.J. Modeling the bids of wind power producers in the day-ahead market with stochastic market clearing. Sustain. Energy Technol. Assessm. 2016, 16, 151–161.
Ye, H.; Ge, Y.; Shahidehpour, M.; Li, Z. Pricing energy and flexibility in robust Security-Constrained Unit Commitment model. In Proceedings of the 2016 IEEE Power and Energy Society General Meeting (PESGM), Boston, MA, USA, 17–21 July 2016; pp. 1–5.
Morales, J.M.; Conejo, A.J.; Perez-Ruiz, J. Economic valuation of reserves in power systems with high penetration of wind power. IEEE Trans. Power Syst. 2009, 24, 900–910.
Morales, J.M.; Conejo, A.J.; Liu, K.; Zhong, J. Pricing electricity in pools with wind producers. IEEE Trans. Power Syst. 2012, 27, 1366–1376.
Catalao, J.P.S.; Pousinho, H.M.I.; Mendes, V.M.F. Optimal offering strategies for wind power producers considering uncertainty and risk. IEEE Syst. J. 2012, 6, 270–277.
Zugno, M.; Conejo, A.J. A robust optimization approach to energy and reserve dispatch in electricity markets. Eur. J. Oper. Res. 2015, 247, 659–671.
Wei, W.; Liu, F.; Mei, S. Robust and economical scheduling methodology for power systems. Part one: Theoretical foundations. Autom. Electr. Power Syst. 2013, 37, 37–43.
Jiang, R.; Wang, J.; Guan, Y. Robust unit commitment with wind power and pumped storage hydro. IEEE Trans. Power Syst. 2012, 27, 800–810.
Zhao, C.; Guan, Y. Unified stochastic and robust unit commitment. IEEE Trans. Power Syst. 2013, 28, 3353–3361.
Warrington, J.; Goulart, P.; Mariéthoz, S.; Morari, M. Policy-based reserves for power systems. IEEE Trans. Power Syst. 2013, 28, 4427–4437.
Ye, H.; Wang, J.; Li, Z. MIP reformulation for max-min problems in two-stage robust SCUC. IEEE Trans. Power Syst. 2017, 32, 1237–1247.
Ye, H.; Ge, Y.; Shahidehpour, M.; Li, Z. Uncertainty marginal price, transmission reserve, and day-ahead market clearing with robust unit commitment. IEEE Trans. Power Syst. 2016, 32, 1782–1795.
Langary, D.; Sadati, N.; Ranjbar, A.M. Direct approach in computing robust Nash strategies for generating companies in electricity markets. Int. J. Electr. Power Energy Syst. 2014, 54, 442–453.
Salem, Y.; Agtash, A. Supply curve bidding of electricity in constrained power networks. Energy 2010, 35, 2886–2892.
Min, C.G.; Kim, M.K.; Park, J.K.; Yoon, Y.T. Game-theory-based generation maintenance scheduling in electricity markets. Energy 2013, 55, 310–318.
Wang, J.; Zhou, Z.; Botterud, A. An evolutionary game approach to analyzing bidding strategies in electricity markets with elastic demand. Energy 2011, 36, 3459–3467.
Shivaie, M.; Ameli, M.T. An environmental/techno-economic approach for bidding strategy in security-constrained electricity markets by a bi-level harmony search algorithm. Renew. Energy 2015, 83, 881–896.
Rahimiyan, M.; Mashhadi, H.R. Supplier’s optimal bidding strategy in electricity pay-as-bid auction: Comparison of the Q-learning and a model-based approach. Electr. Power Syst. Res. 2008, 78, 165–175.
Xiong, G.; Hashiyama, T.; Okuma, S. An electricity supplier bidding strategy through Q-Learning. In Proceedings of the IEEE Power Engineering Society Summer Meeting, Chicago, IL, USA, 21–25 July 2002; pp. 1516–1521.
Ziogos, N.P.; Tellidou, A.C. An agent-based FTR auction simulator. Electr. Power Syst. Res. 2011, 81, 1239–1246.
Santos, G.; Fernandes, R.; Pinto, T.; Praça, I.; Vale, Z.; Morais, H. MASCEM: EPEX SPOT Day-Ahead market integration and simulation. In Proceedings of the 18th International Conference on Intelligent System Application to Power Systems (ISAP), Porto, Portugal, 11–16 September 2015; pp. 1–5.
Bach, T. Using Reinforcement Learning to Study the Features of the Participants’ Behavior in Wholesale Power Market. Available online: https://www.science-definition.com/whatis/Using_Reinforcement_Learning_to_Study_the_Features_of_the_Participants%C2%A1%C2%AF_Behavior_in_Wholesale_Power_Market (accessed on 29 June 2017).
Salehizadeh, M.R.; Soltaniyan, S. Application of fuzzy Q-learning for electricity market modeling by considering renewable power penetration. Renew. Sustain. Energy Rev. 2016, 56, 1172–1181.
Xiao, Y.; Wang, X.; Wang, X.; Dang, C.; Lu, M. Behavior analysis of wind power producer in electricity market. Appl. Energy 2016, 171, 325–335.
Ravnaas, K.W.; Doorman, G.; Farahmand, H. Optimal wind farm bids under different balancing market arrangements. In Proceedings of the IEEE 11th International Conference on Probabilistic Methods Applied to Power Systems, Singapore, 14–17 June 2010; pp. 30–35.
Sharma, K.C.; Bhakar, R.; Tiwari, H.P. Strategic bidding for wind power producers in electricity markets. Energy Conv. Manag. 2014, 86, 259–267.
Matevosyan, J.; Soder, L. Minimization of imbalance cost trading wind power on the short term power market. IEEE Trans. Power Syst. 2005, 21, 1–7.
Soares, T.; Santos, G.; Pinto, T.; Morais, H.; Pinson, P.; Vale, Z. Analysis of strategic wind power participation in energy market using MASCEM simulator. In Proceedings of the 2015 18th International Conference on Intelligent System Application to Power Systems (ISAP), Porto, Portugal, 11–16 September 2015; pp. 1–6.
Ding, H.; Pinson, P.; Hu, Z.; Wang, J.; Song, Y. Optimal offering and operating strategy for a large wind-storage system as a price maker. IEEE Trans. Power Syst. 2017, PP, 1.
Laia, R.; Pousinho, H.M.I.; Melíco, R.; Mendes, V.M.F. Bidding strategy of wind-thermal energy producers. Renew. Energy 2016, 99, 673–681.
Vilim, M.; Botterud, A. Wind power bidding in electricity markets with high wind penetration. Appl. Energy 2014, 118, 141–155.
Chen, G. Research on Value Function Approximation Methods in Reinforcement Learning. Master’s Thesis, Soochow University, Jiangsu, China, 2014.
Zhao, H.; Wang, Y.; Guo, S.; Zhao, M.; Zhang, C. Application of gradient descent continuous actor-critic algorithm for double-side day-ahead electricity market modeling. Energies 2016, 9, 725.
Index of /Data. Available online: http://motor.ece.iit.edu/Data/ (accessed on 29 June 2017).
Buygi, M.O.; Zareipour, H.; Rosehart, W.D. Impacts of large-scale integration of intermittent resources on electricity markets: A supply function equilibrium approach. IEEE Syst. J. 2012, 6, 220–232.

Figure 1. Diagram of the test system (Note: For the sake of simplicity, the flow limit of every transmission line is assumed to be 30 MW).

Figure 2. Dynamic adjustment process of estimated profit corresponding to every WPP under BM 1.

Figure 3. Dynamic adjustment process of LMP corresponding to every WPP under BM 1.

Figure 4. Dynamic adjustment process of summed UMP corresponding to every WPP under BM 1.

Figure 5. Dynamic adjustment process of estimated profit corresponding to every WPP under BM 2.

Figure 6. Dynamic adjustment process of LMP corresponding to every WPP under BM 2.

Figure 7. Dynamic adjustment process of summed UMP corresponding to every WPP under BM 2.

Figure 8. Dynamic adjustment process of WPP₁’s estimated profit under BM 1 when every WPP experiences 1000 training iterations and 500 decision-making iterations.

Figure 9. Dynamic adjustment process of WPP₁’s estimated profit under BM 1 when every WPP experiences 3500 training iterations without greedy action selecting policy.

Table 1. Values of inelastic single-period loads.

Bus	2	3	4	7	8
Load (MW)	26.7	7.4	12.6	27.8	35
Bus	10	12	14	15	16
Load (MW)	10.8	16.2	11.2	13.2	8.5
Bus	17	18	19	20	21
Load (MW)	14	8.2	14.5	7.2	22.5
Bus	23	24	26	29	30
Load (MW)	8.2	13.7	8.5	7.4	15.6

Table 2. Parameters of conventional generators.

Bus	Generators	c_j (10³ $/MWh)	P^max (MW)	$ramp^{up, \max}$ (MW)	$ramp^{down, \max}$ (MW)
1	G₁	36	80	24	24
2	G₂	31.5	80	24	24
13	G₃	41.25	50	12	12
22	G₄	37.087	55	12	12
23	G₅	37.5	40	10	10
27	G₆	40	40	10	10

Note: Because the bidding strategies of conventional generators are neglected, the value of c_j ( $\forall j$ ) in Table 2 is taken as the marginal cost of the j-th conventional generator when its power output reaches P_j^max. The marginal cost parameters of every conventional generator are given in Reference [43].

Table 3. Predicted power output intervals of WPPs.

Bus	WPP	Lw (MW)	Uw (MW)
3	WPP₁	5	20
10	WPP₂	10	25
12	WPP₃	10	25
15	WPP₄	20	30
25	WPP₅	5	20

Table 4. Market clearing results under different $Λ$ values.

$Λ$ Value	Operation Cost (10³ $)	Uncertainty That Cannot Be Accommodated	Number of Uncertainty Poles That Cannot Be Accommodated
0	5.8955	yes	22
1	6.0087	yes	14
2	6.1023	yes	9
3	6.1851	yes	3
4	6.2078	no	0
5	6.2078	no	0

Table 5. Related parameters in LSCAC-based day-ahead EM modeling approach.

Under BM 1
LSCAC-Based Agents	EM State Set X₁ ($/MWh)	EM State Set X₂ ($/MWh)	Action Set	$γ$	$α$	$σ$	q
WPP₁	[0,100]	[−50, 50]	[5,20] MW	0.5	0.1	4.5	1
WPP₂	[0,100]	[−50, 50]	[10,25] MW	0.5	0.1	4.5	1
WPP₃	[0,100]	[−50, 50]	[10,25] MW	0.5	0.1	4.5	1
WPP₄	[0,100]	[−50, 50]	[20,30] MW	0.5	0.1	3	1
WPP₅	[0,100]	[−50, 50]	[5,20] MW	0.5	0.1	4.5	1
Under BM 2
LSCAC-Based Agents	EM State Set X₁ ($/MWh)	EM State Set X₂ ($/MWh)	Action Set	$γ$	$α$	$σ$	q
WPP₁	[0,100]	[−50, 50]	[30,50] $/MWh	0.5	0.1	4.5	1
WPP₂	[0,100]	[−50, 50]	[30,50] $/MWh	0.5	0.1	4.5	1
WPP₃	[0,100]	[−50, 50]	[30,50] $/MWh	0.5	0.1	4.5	1
WPP₄	[0,100]	[−50, 50]	[30,50] $/MWh	0.5	0.1	3	1
WPP₅	[0,100]	[−50, 50]	[30,50] $/MWh	0.5	0.1	4.5	1
Central point sets in the Gauss radial basis function corresponding with X₁ and X₂		X₁	{0, 5, 10, 15, …, 100}
		X₂	{−50, −45, −40, …, 50}
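
As an illustration of how the central point sets listed in Table 5 could enter a Gauss radial basis function vector $ϕ (\cdot)$, the following Python sketch builds one feature per pair of centres. The Cartesian-product layout of the centres (giving 21 × 21 = 441 features), the shared `width` parameter and the function name `rbf_features` are assumptions made for this sketch; the table only specifies the centre values themselves.

```python
import numpy as np

# Centre grids as listed in Table 5.
X1_CENTRES = np.linspace(0.0, 100.0, 21)    # {0, 5, 10, ..., 100}
X2_CENTRES = np.linspace(-50.0, 50.0, 21)   # {-50, -45, ..., 50}

def rbf_features(x1, x2, width=5.0):
    """Gaussian RBF vector over the (assumed) Cartesian product of the centre grids."""
    c1, c2 = np.meshgrid(X1_CENTRES, X2_CENTRES, indexing="ij")
    sq_dist = (x1 - c1) ** 2 + (x2 - c2) ** 2
    # `width` is a hypothetical kernel width, not a value from the paper.
    return np.exp(-sq_dist / (2.0 * width ** 2)).ravel()

# Illustrative state only: it coincides with the centre (40, -5).
phi = rbf_features(40.0, -5.0)
```

With this layout, a state that coincides with a centre activates that centre's feature fully, and features decay smoothly with distance, which is what allows a linear parameter vector to generalize across the continuous state space.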

Table 6. Related parameters of Q-learning-based agents in two QDEMAs.

QDEMA 1 under BM 1
Q-Learning-Based Agents	EM State Set X₁ ($/MWh)	EM State Set X₂ ($/MWh)	Action Set (MW)	$ε$	$γ$	$α$
WPP₁	{0, 5, 10, …, 100}	{−50, −45, −40, …, 50}	{5, 6, …, 20}	0.1	0.5	0.1
WPP₂	{0, 5, 10, …, 100}	{−50, −45, −40, …, 50}	{10, 11, …, 25}	0.1	0.5	0.1
WPP₃	{0, 5, 10, …, 100}	{−50, −45, −40, …, 50}	{10, 11, …, 25}	0.1	0.5	0.1
WPP₄	{0, 5, 10, …, 100}	{−50, −45, −40, …, 50}	{20, 21, …, 30}	0.1	0.5	0.1
WPP₅	{0, 5, 10, …, 100}	{−50, −45, −40, …, 50}	{5, 6, …, 20}	0.1	0.5	0.1
QDEMA 1 under BM 2
Q-Learning-Based Agents	EM State Set X₁ ($/MWh)	EM State Set X₂ ($/MWh)	Action Set ($/MWh)	$ε$	$γ$	$α$
WPP₁	{0, 5, 10, …, 100}	{−50, −45, −40, …, 50}	{30, 31, 32, …, 50}	0.1	0.5	0.1
WPP₂	{0, 5, 10, …, 100}	{−50, −45, −40, …, 50}	{30, 31, 32, …, 50}	0.1	0.5	0.1
WPP₃	{0, 5, 10, …, 100}	{−50, −45, −40, …, 50}	{30, 31, 32, …, 50}	0.1	0.5	0.1
WPP₄	{0, 5, 10, …, 100}	{−50, −45, −40, …, 50}	{30, 31, 32, …, 50}	0.1	0.5	0.1
WPP₅	{0, 5, 10, …, 100}	{−50, −45, −40, …, 50}	{30, 31, 32, …, 50}	0.1	0.5	0.1
QDEMA 2 under BM 1
Q-Learning-Based Agents	EM State Set X₁ ($/MWh)	EM State Set X₂ ($/MWh)	Action Set (MW)	$ε$	$γ$	$α$
WPP₃	{0, 5, 10, …, 100}	{−50, −45, −40, …, 50}	{10, 11, …, 25}	0.1	0.5	0.1
WPP₄	{0, 5, 10, …, 100}	{−50, −45, −40, …, 50}	{20, 21, …, 30}	0.1	0.5	0.1
WPP₅	{0, 5, 10, …, 100}	{−50, −45, −40, …, 50}	{5, 6, …, 20}	0.1	0.5	0.1
QDEMA 2 under BM 2
Q-Learning-Based Agents	EM State Set X₁ ($/MWh)	EM State Set X₂ ($/MWh)	Action Set ($/MWh)	$ε$	$γ$	$α$
WPP₃	{0, 5, 10, …, 100}	{−50, −45, −40, …, 50}	{30, 31, 32, …, 50}	0.1	0.5	0.1
WPP₄	{0, 5, 10, …, 100}	{−50, −45, −40, …, 50}	{30, 31, 32, …, 50}	0.1	0.5	0.1
WPP₅	{0, 5, 10, …, 100}	{−50, −45, −40, …, 50}	{30, 31, 32, …, 50}	0.1	0.5	0.1
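
For readers unfamiliar with how the discrete sets and the parameters $ε$, $γ$ and $α$ in Table 6 are typically used, the sketch below shows a generic tabular Q-learning agent with ε-greedy action selection, configured with WPP₁'s BM 1 settings. It is a textbook formulation and not necessarily the exact QDEMA implementation; the nearest-grid-point state mapping, the function names and the numerical observations and reward are assumptions for illustration.

```python
import numpy as np

EPSILON, GAMMA, ALPHA = 0.1, 0.5, 0.1    # values from Table 6
X1_GRID = np.arange(0, 101, 5)           # {0, 5, ..., 100} $/MWh
X2_GRID = np.arange(-50, 51, 5)          # {-50, -45, ..., 50} $/MWh
ACTIONS = np.arange(5, 21)               # WPP1 under BM 1: {5, 6, ..., 20} MW

def state_index(x1, x2):
    """Snap an observation to the nearest grid state (assumed discretization)."""
    i = int(np.abs(X1_GRID - x1).argmin())
    j = int(np.abs(X2_GRID - x2).argmin())
    return i * len(X2_GRID) + j

def choose_action(q_table, s, rng):
    """Epsilon-greedy: explore with probability EPSILON, otherwise act greedily."""
    if rng.random() < EPSILON:
        return int(rng.integers(len(ACTIONS)))
    return int(q_table[s].argmax())

def q_update(q_table, s, a, reward, s_next):
    """Standard Q-learning update toward reward + GAMMA * max_a' Q(s_next, a')."""
    target = reward + GAMMA * q_table[s_next].max()
    q_table[s, a] += ALPHA * (target - q_table[s, a])

rng = np.random.default_rng(0)
q = np.zeros((len(X1_GRID) * len(X2_GRID), len(ACTIONS)))
s = state_index(38.7, 2.1)               # hypothetical observation
a = choose_action(q, s, rng)
q_update(q, s, a, reward=0.65, s_next=state_index(39.3, -1.0))
```

The table has 441 states and 16 actions for this agent, which illustrates why a discretized approach scales poorly compared with a continuous actor-critic method when finer bid resolution is wanted.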

Table 7. Obtained market results of those two QDEMAs and our proposed approach under BM 1 (10³ $).

Approaches	Profit of WPP₁	Profit of WPP₂	Profit of WPP₃	Profit of WPP₄	Profit of WPP₅	Operation Cost
QDEMA 1	0.6497	0.9183	0.8195	1.0683	0.6631	6.1012
QDEMA 2	0.8034	1.0859	0.7706	0.9175	0.5989	5.9973
LSCAC-based approach	0.7185	1.0667	0.9917	1.1578	0.7432	5.8309

Table 8. Obtained market results of those two QDEMAs and our proposed approach under BM 2 (10³ $).

Approaches	Profit of WPP₁	Profit of WPP₂	Profit of WPP₃	Profit of WPP₄	Profit of WPP₅	Operation Cost
QDEMA 1	0.6816	0.9643	0.8740	1.1375	0.7012	6.9908
QDEMA 2	0.8319	1.1083	0.8172	1.0864	0.6398	6.7637
LSCAC-based approach	0.7318	1.0797	1.0316	1.2073	0.7726	6.4134

Table 9. Comparison of obtained market results under different BMs by considering different values of $ρ_{low}$.

BMs	$ρ_{low}$	Profit of WPP₁ (10³ $)	Profit of WPP₂ (10³ $)	Profit of WPP₃ (10³ $)	Profit of WPP₄ (10³ $)	Profit of WPP₅ (10³ $)	Operation Cost (10³ $)	Average LMP ($/MWh)
Under BM 2	0	0.7318	1.0797	1.0316	1.2073	0.7726	6.4134	39.3228
	10	0.7318	1.0797	1.0316	1.2073	0.7726	6.4134	39.3228
	20	0.7318	1.0797	1.0316	1.2073	0.7726	6.4134	39.3228
	30	0.7318	1.0797	1.0316	1.2073	0.7726	6.4134	39.3228
Under BM 1		0.7185	1.0667	0.9917	1.1578	0.7432	5.8309	38.7490

Table 10. Comparison of obtained market results under different BMs by considering different values of $ρ_{upp}$.

BMs	$ρ_{upp}$	Profit of WPP₁ (10³ $)	Profit of WPP₂ (10³ $)	Profit of WPP₃ (10³ $)	Profit of WPP₄ (10³ $)	Profit of WPP₅ (10³ $)	Operation Cost (10³ $)	Average LMP ($/MWh)
Under BM 2	50	0.7318	1.0797	1.0316	1.2073	0.7726	6.4134	39.3228
	60	0.7646	1.1082	1.0656	1.2322	0.7953	6.5973	39.4137
	70	0.8769	1.1231	1.0921	1.2547	0.9080	6.8231	39.8759
	80	0.9228	1.1439	1.1292	1.2861	0.9242	7.0197	40.0011
Under BM 1		0.7185	1.0667	0.9917	1.1578	0.7432	5.8309	38.7490

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhao, H.; Wang, Y.; Zhao, M.; Tan, Q.; Guo, S. Day-Ahead Market Modeling for Strategic Wind Power Producers under Robust Market Clearing. Energies 2017, 10, 924. https://doi.org/10.3390/en10070924

