Day-Ahead Market Modeling for Strategic Wind Power Producers under Robust Market Clearing

In this paper, considering real-time wind power uncertainties, the strategic behaviors of wind power producers adopting two different bidding modes in the day-ahead electricity market are modeled and experimentally compared. In the first bidding mode, a producer submits only a wind power output plan; in the second, it submits a bidding curve consisting of a bidding price and a power output. On the one hand, to significantly improve wind power accommodation, a robust market clearing model is employed for day-ahead market clearing implemented by an independent system operator. On the other hand, since the Least Squares Continuous Actor-Critic algorithm has been demonstrated to be an effective method for Markov decision-making problems with continuous state and action sets, we propose Least Squares Continuous Actor-Critic-based approaches to model and simulate the dynamic bidding interaction of multiple wind power producers under each of the two bidding modes in the day-ahead electricity market under robust market clearing conditions. Simulations are implemented on the IEEE 30-bus test system with five strategic wind power producers, and they verify the rationality of the proposed approaches. Moreover, the quantitative analysis and comparisons conducted in our simulations yield suggestions for guiding wind power producers toward reasonable bidding and bidding mode selection.


Introduction
The day-ahead electricity market (EM) is a crucial component of the EM system [1]. In recent years, wind power resources have experienced unprecedented growth in day-ahead EMs worldwide, and studies on wind power bidding in the day-ahead EM with wind power penetration are numerous. References [2][3][4], citing reasons such as the low marginal cost of a wind power producer (WPP), hold that the bidding mode (BM) of a WPP is to send the independent system operator (ISO) only its power output plan for each period of the next day (namely, BM 1). The ISO ensures wind power accommodation according to every WPP's day-ahead power output plan, but a WPP is financially penalized when its real-time power output deviates from its day-ahead bid [3]. References [5][6][7][8][9], based on actual EMs such as PJM (Pennsylvania-New Jersey-Maryland), point out that the BM of a WPP is to provide the ISO with a bidding curve for each period of the next day (namely, BM 2). A bidding curve consists of a bidding price and a power output. According to the day-ahead bidding curves provided by WPPs, the ISO can dispatch the power outputs of WPPs in the day-ahead EM within a certain range of forecasted power outputs for each WPP. However, a WPP is also financially penalized when its real-time power output deviates from the day-ahead scheduled one [9]. In this work, we believe that different BMs adopted by WPPs may lead to different market results, such as profits, clearing prices and the operation cost of the power system.
Energies 2017, 10, 924
reconstructing the market clearing mechanism. The motivation for proposing the LSCAC-based EM modeling approach is to assist strategic WPPs in making more appropriate bidding decisions so as to improve both the WPPs' profits and the economic efficiency of the whole market compared with TBRL-based approaches. Moreover, comparison between different BMs can offer some suggestions about improving wind power resource development and market economic efficiency.
The rest of this paper is organized as follows: in Section 2, the concrete mathematical formulations of WPPs' different BMs and the robust day-ahead MCM are proposed. Section 3 puts forward the proposed LSCAC-based day-ahead EM modeling approach for WPPs. Section 4 conducts the simulations and comparisons. Section 5 concludes the paper.

Model Assumptions
According to [41], the day-ahead EM is actually a dynamic complex system in which dynamic (direct and indirect) interactions exist among all participants. When considering WPPs' strategic behaviors in the day-ahead EM, on the one hand, the MCM of the ISO should be modified in order to accommodate the deviations caused by real-time wind power output uncertainties [10,19,20], while on the other hand, EM modeling approaches should be proposed to help obtain WPPs' reasonable strategies under different BMs.
In this section, strategic WPPs' different BMs and the robust day-ahead MCM are mathematically formulated. For the sake of simplicity and without loss of generality, we make the following assumptions before conducting any further research:
• Like [9], in our study, the problem of SCUC is assumed to have been solved exogenously in advance, and consequently, the UC constraints (i.e., ramping rates, startup costs/times, minimum down-times) are not considered. However, the proposed single-period EM modeling approach containing the single-period robust MCM can be extended to a multi-period one. Moreover, network loss is ignored, and the shift factor matrix is constant.
• Because we mainly consider the strategic behaviors of WPPs in a day-ahead EM, the bidding strategies of conventional generators are neglected [9,32,33]. We also assume that the load at every bus is inelastic, without load shedding [9].
• Uncertainties are caused only by WPPs, and the uncertainty set can be accurately formulated by the ISO [10,19,20]. The marginal cost of every WPP is neglected [3,31]. Hence, when a WPP is in BM 2, there is only one bidding price in this WPP's bidding curve.

Different BMs of WPP
Real-time wind power output cannot be accurately predicted from the day-ahead horizon, which forces WPPs and the ISO to carefully consider these strong uncertainties when bidding and implementing the day-ahead market clearing, respectively. However, the ISO (or a WPP) can predict the interval of a WPP's real-time power output more accurately than its exact value. If the number of WPPs in a power system is $N_W$, consistent with [10,19,20], the uncertainty set corresponding to those $N_W$ WPPs' real-time power outputs can be modeled as:
$$U = \left\{ Pw = (Pw_1, Pw_2, ..., Pw_i, ..., Pw_{N_W}) : \ lw_i \le Pw_i \le uw_i, \ \forall i; \ \sum_{i=1}^{N_W} \frac{\left| Pw_i - (lw_i + uw_i)/2 \right|}{(uw_i - lw_i)/2} \le \Lambda \right\} \quad (1)$$
where $i$ is the index for WPPs, $Pw_i$ represents the actual power output of the $i$-th WPP, $lw_i$ and $uw_i$ ($0 \le lw_i \le uw_i$) represent the lower and upper bounds of $Pw_i$, and $\Lambda$ is the budget parameter, assumed to be an integer [20]. Moreover, accurate prediction of $lw_i$ and $uw_i$ can provide a valuable reference for WPP$_i$'s bidding decision-making. Therefore:

• When BM 1 is adopted by WPP$_i$, the only bidding parameter is its power output plan (bidding power output) $Pw_i^b$, which must satisfy Equation (2):
$$lw_i \le Pw_i^b \le uw_i \quad (2)$$
Under this BM, the bidding strategy of WPP$_i$ is to adjust the value of $Pw_i^b$.
• When BM 2 is adopted by WPP$_i$, it provides the ISO with a bidding curve (Equation (3)) whose single bidding price $\rho_i^b$ is confined between a lower and an upper limit. Under this BM, the bidding strategy of WPP$_i$ is to adjust the value of $\rho_i^b$.
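The uncertainty set and the BM 1 bid restriction can be illustrated with a minimal Python sketch. It assumes the budgeted form commonly used in [19,20]: each output stays inside its predicted interval, and the deviations from the interval midpoints, each scaled by the interval half-width, sum to at most Λ. The function names and the two-WPP numbers are hypothetical and serve only as an illustration.

```python
from typing import Sequence


def in_uncertainty_set(pw: Sequence[float], lower: Sequence[float],
                       upper: Sequence[float], budget: float,
                       tol: float = 1e-9) -> bool:
    """Check box bounds and the budget constraint for one output vector."""
    total_deviation = 0.0
    for p, lo, up in zip(pw, lower, upper):
        if p < lo - tol or p > up + tol:
            return False  # outside the predicted interval [lw_i, uw_i]
        half_width = (up - lo) / 2.0
        if half_width > 0.0:
            # Deviation from the interval midpoint, scaled to [0, 1].
            total_deviation += abs(p - (lo + up) / 2.0) / half_width
    return total_deviation <= budget + tol


def is_valid_bm1_bid(bid: float, lo: float, up: float) -> bool:
    """A BM 1 output plan is assumed to lie inside the predicted interval."""
    return lo <= bid <= up


# Hypothetical two-WPP example (numbers are illustrative only).
lower, upper = [10.0, 20.0], [30.0, 40.0]
print(in_uncertainty_set([20.0, 30.0], lower, upper, budget=0))  # midpoints
print(in_uncertainty_set([30.0, 40.0], lower, upper, budget=1))  # two bounds
print(in_uncertainty_set([30.0, 40.0], lower, upper, budget=2))
```

Under these assumptions, a budget of 0 admits only the interval midpoints, while a budget equal to the number of WPPs admits the whole box of predicted intervals.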

Robust Market Clearing Models under Different BMs of WPPs
According to Section 2.1, because the problem of SCUC is assumed to have been solved exogenously in advance, we propose a single-period day-ahead robust MCM which is mainly focused on an ED procedure. The purpose is to make the power system accommodate any wind power deviation caused by real-time wind power output uncertainty within a certain uncertainty set. With this robust MCM, the ISO, based on the day-ahead bids (curves) of all participants, seeks the optimal robust ED solution in the base-case scenario [19]. Under the optimal robust ED solution, the ISO can re-dispatch flexible resources, such as adjustable conventional generators with fast ramping capabilities, to follow the load when a deviation occurs. Obtaining an optimal robust ED solution in the base-case scenario is significantly less conservative than doing so in the worst-case scenario [16]. Moreover, this robust MCM can reasonably generate prices for power outputs, loads, reserves and deviations, which are byproducts of the optimal robust ED solution [10,20].
The mathematical formulation of this robust MCM, (Problem), consists of the objective function, Equation (4), which is minimized subject to the base-case constraints, Equations (5)-(9), and to the requirement that the ED solution $P^{GW} = (P_1, P_2, ..., P_j, ...)$ belongs to the set $\Omega$, that is, for every $Pw \in U$ there exists a power re-dispatch $\Delta P$ satisfying Equations (10)-(16). Two of these constraints are the lower power limit of generators, $-P_j \le -P_j^{\min}, \ \forall j$, and the re-dispatch limit, $\Delta P_j \le r_j^u, \ \forall j$. Here, $j$ and $N_G$ represent the index and number of conventional generators that are determined to be committed (started up) in advance, respectively. Because it is assumed that the UC solution is fixed exogenously, $N_G$ can also be considered as the number of conventional generators in the power system for simplicity. $P_j$ is the dispatched power output of the $j$-th generator, $c_j$ is the cost coefficient of the $j$-th generator, and $m$ and $N_{bus}$ represent the index and number of buses in the power system. In the base-case scenario, Equation (5) indicates the power balance of the system, Equations (6) and (7) show the power limits of generators, and Equations (8) and (9) stand for the transmission constraints of all lines in the system. In case of wind power deviations, Equation (10) indicates the power balance of the system in re-dispatch, Equations (11) and (12) show the power limits of generators in re-dispatch, Equations (13) and (14) are constraints on the power re-dispatch variables $\Delta P_j$, and Equations (15) and (16) stand for the transmission constraints of all lines in the system in re-dispatch.

Robust MCM Reformulation
By solving RMCM 1 or RMCM 2, the obtained optimal robust ED solution $P^{GW} \in \Omega$ is immunized against any uncertainty $Pw \in U$ [19]. When an uncertainty $Pw$ occurs, the deviations it causes can be accommodated by the power re-dispatch $\Delta P$. However, it should be noted that neither RMCM 1 nor RMCM 2 can be solved directly. Similar to [19,20], reformulation is adopted to solve the two RMCMs. To facilitate the description, the reformulation of {Equations (4)-(9)} ∪ Ω, which contains a master problem (MP) and a sub-problem (SP), is established first. (MP) is subject to Equations (5)-(9) and to the re-dispatch constraints written for every worst uncertainty point $Pw_k$, $k \in \kappa$, where $\kappa$ is the index set for the worst uncertainty points $Pw_k$, which are dynamically generated in (SP) during the solution procedure.
If BM 1 is adopted by WPPs, Equations (17) and (18) should be added to (MP). If BM 2 is adopted by WPPs, Equations (19) and (20) should be added to (MP). According to References [19,20], the objective function in (SP) contains the summation of the non-negative slack variables $s_i^+$ and $s_i^-$, which evaluates the violation associated with the solution from (MP). $s_i^+$ and $s_i^-$ can be interpreted as un-followed uncertainties (i.e., generation shedding etc.) due to system limitations. Hence, solving (SP) amounts to finding the worst point $Pw_k$ in $U$ for a given ED solution. The iterative solution procedure follows [19].
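The alternation between (MP) and (SP) can be sketched as a generic loop. The following Python skeleton is only an illustration under stated assumptions: `solve_master` and `solve_sub` are hypothetical callables standing in for the optimization problems (MP) and (SP), the tolerance and iteration cap are arbitrary, and the toy stand-ins at the bottom replace the real ED model with a single reserve that must cover the largest deviation.

```python
from typing import Any, Callable, List, Tuple


def solve_robust_clearing(
    solve_master: Callable[[List[Any]], Any],
    solve_sub: Callable[[Any], Tuple[float, Any]],
    tol: float = 1e-6,
    max_iter: int = 50,
) -> Tuple[Any, List[Any]]:
    """Alternate between (MP) and (SP) until no violation remains."""
    worst_points: List[Any] = []  # plays the role of the index set kappa
    for _ in range(max_iter):
        # (MP): dispatch that is immune to all worst points found so far.
        dispatch = solve_master(worst_points)
        # (SP): total slack (violation) and the worst point that causes it.
        violation, worst = solve_sub(dispatch)
        if violation <= tol:
            return dispatch, worst_points
        worst_points.append(worst)  # feed the new worst point back to (MP)
    raise RuntimeError("no robust dispatch found within max_iter iterations")


# Toy stand-ins (hypothetical): one reserve must cover the largest deviation.
DEVIATIONS = [2.0, 5.0, 3.0]


def toy_master(worst_points: List[float]) -> dict:
    return {"reserve": max(worst_points, default=0.0)}


def toy_sub(dispatch: dict) -> Tuple[float, float]:
    worst = max(DEVIATIONS)
    return max(0.0, worst - dispatch["reserve"]), worst


dispatch, points = solve_robust_clearing(toy_master, toy_sub)
print(dispatch, points)
```

In the toy run, the first master solution carries no reserve, the sub-problem reports the deviation of 5.0 as the worst point, and the second master solution covers it, so the loop stops after two iterations.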

• The uncertainty marginal price (UMP) is defined as the marginal cost of immunizing the next unit increment of uncertainty [20]. No matter which BM WPPs adopt, for the deviation $Pw_k - Pw^{disp}$ corresponding to a worst point $Pw_k$, the UMP for reserve credit and deviation payment at bus $m$, $\pi_{mk}$, is given by Equation (36). Moreover, when $\pi_{mk} > 0$, the direction of power re-dispatch corresponding to the worst point $Pw_k$ at bus $m$ is upward ($\Delta P_{jk} \ge 0, \ \forall j \in \Theta(m)$), and when $\pi_{mk} < 0$, it is downward ($\Delta P_{jk} \le 0, \ \forall j \in \Theta(m)$).
Moreover, the structural differences between RMCM 1 and RMCM 2 may make the primal solutions ($P^{GW}$, $Pw^{disp}$, $\Delta P$) and the dual solutions (λ, β, (µ), α, η) obtained by solving RMCM 1 and RMCM 2 differ from each other. Therefore, although the clearing price formulas under the two BMs are identical according to Equations (35) and (36), the different primal-dual solutions obtained by solving RMCM 1 and RMCM 2 still make the obtained clearing prices ($\pi_m$, $\pi_{mk}$) under the two BMs different.
In summary, no matter which BM is adopted by WPPs, the estimated profit $R_i$ of WPP$_i$ (∀i) in one day-ahead bidding can be calculated by Equation (37). Hence, the objective of WPP$_i$ (∀i) when bidding in the day-ahead EM is to maximize $R_i$.

Definitions
Although the BM and MCM in a day-ahead EM can be specified by relevant regulators in advance, a strategic WPP in the EM still has limited information about its rivals. Owing to this incomplete and imperfect information in the day-ahead EM [41], strategic WPPs must dynamically improve their profits through repeated bidding in the day-ahead EM, which is a dynamic multi-participant decision-making process. This work proposes an LSCAC-based day-ahead EM modeling approach to simulate this dynamic multi-WPP decision-making process. Hence, similar to Reference [41], some necessary definitions are organized as follows:
• Agent: we consider every WPP as an agent who, for the purpose of improving its profit, has the adaptive learning ability to dynamically adjust its bidding strategy according to the experience it accumulates through repeated bidding. Hence, the multi-WPP decision-making process can also be considered as a multi-agent decision-making process. In our work, the LSCAC algorithm is applied to depict this adaptive learning ability and to assist every WPP in bidding decision-making.
• Iteration: since the market is assumed to be cleared on a day-ahead single-period basis, we consider each transaction day as an iteration $T$.
• State variable: in iteration $T$, the LMP and UMPs at bus $m$ cleared in iteration $T-1$ are considered as the market environment states for WPP$_i$ connected at bus $m$ ($i \in \Theta(m)$), because WPP$_i$ ($i \in \Theta(m)$) has no access to other market information. Taking $x_{i,T}$ to represent the state variable vector for WPP$_i$ in iteration $T$, the relationship between $x_{i,T}$ and the clearing prices is given by Equations (38)-(40) [20], where $\pi_{m,T-1}$ and $\pi_{mk,T-1}$ represent the LMP and the $k$-th UMP at bus $m$ cleared in iteration $T-1$, respectively.
• Action variable: in iteration $T$, the bidding strategy of WPP$_i$ is considered as its action, represented by the action variable $a_{i,T}$. If WPP$_i$ bids in the EM under BM 1, then $a_{i,T} = Pw_{i,T}^b$, where $Pw_{i,T}^b$ is the bidding output (strategy) of WPP$_i$ in iteration $T$. If WPP$_i$ bids in the EM under BM 2, then $a_{i,T} = \rho_{i,T}^b$, where $\rho_{i,T}^b$ is the bidding price (strategy) of WPP$_i$ in iteration $T$.
• Reward: in iteration $T$, WPP$_i$'s reward is $r_{i,T} = R_{i,T}$, where $R_{i,T}$ is WPP$_i$'s estimated profit obtained from bidding in iteration $T$.

LSCAC Algorithm
In TBRL-based EM modeling approaches [26][27][28][29][30][31][36], both the state and action sets must be assumed to be discrete; otherwise, the "curse of dimensionality" arises and significantly hinders agents from improving their profits. However, according to Section 3.1 and [41], in a day-ahead EM with many strategic WPPs, both $x_{i,T}$ (∀i) and $a_{i,T}$ (∀i) lie within continuous, bounded and closed sets (spaces). Therefore, a modified RL algorithm must be applied in day-ahead EM modeling for the study of strategic behaviors of WPPs.
In our work, we apply the LSCAC algorithm to this issue for the first time. The LSCAC algorithm is a modified actor-critic-based RL algorithm which can rapidly tackle the dynamic multi-agent decision-making problem with continuous action and state sets. In the LSCAC algorithm, the state value function and the policy function of every agent are approximated by linear combinations of basis functions. The linear parameters in the state value functions corresponding to agents' critic parts are updated online by using the temporal difference error (TD(0))-based method, the specific procedure of which can be found in [41]. The online updating procedure of the linear parameters in the policy functions corresponding to agents' actor parts is described as follows [40]: by using a linear function, we estimate and repeatedly update in an agent's actor part an optimal policy function $\hat{I} : X \to A$ defined on the continuous state space $X$:
$$\hat{I}(x) = \sum_{h=1}^{n} \omega_h \varphi_h(x) = \varphi(x)^T \omega$$
where $\varphi_h : X \to R$ ($h = 1, 2, ..., n$) represents the $h$-th basis function of state $x \in X$, $A$ represents the continuous action set of an agent, and $\hat{I}(x)$ estimates $a_x^{(optimal)} \in A$, the optimal action in face of state $x$.
The linear parameter vector $\omega$ can be described as:
$$\omega = (\omega_1, \omega_2, ..., \omega_n)^T$$
An agent must generate a corresponding action $a \in A$ in face of any state $x \in X$ based on the policy maintained and repeatedly updated by its actor part. The policy is an action-generating model that balances exploration and exploitation, and can be mathematically formulated as follows [40,41]:
$$pro(a \mid x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{\left(a - \varphi(x)^T \omega\right)^2}{2\sigma^2} \right) \quad (45)$$
where $\sigma > 0$ is a standard deviation parameter which represents the exploring ability of the LSCAC algorithm. The MSE function of $\omega$, Equation (46), is defined in [40,41] in terms of $P^{(pro)}(x)$, the probability distribution function of $x$ under policy $pro$, and $sig[\delta(x, a)]$, the sigmoid function of $\delta(x, a)$, where $\delta(x, a)$ is the TD(0) error of selecting action $a$ in face of state $x$. In iteration $T$, using $\delta_T$ to replace $\delta(x_T, a_T)$, the formulation of $\delta_T$ is as follows [40,41]:
$$\delta_T = r_T + \gamma \varphi(x_{T+1})^T \theta_T - \varphi(x_T)^T \theta_T$$
where the linear vector $\theta_T$ is composed of the linear parameters in the value functions in iteration $T$ [41], and $0 \le \gamma \le 1$ is a discount factor. Setting the derivative of Equation (46) with respect to $\omega$ equal to 0 yields Equation (49). It should be noted that the integral on the left side of Equation (49) is hard to calculate. If the sample points from iteration 0 to iteration $N$ are $(x_0, a_0, r_0, x_1), (x_1, a_1, r_1, x_2), ..., (x_N, a_N, r_N, x_{N+1})$, Equation (46) can be approximately replaced by its sample-based counterpart, Equation (50), whose reformulation defines an $n$-order matrix $A_N$ and an $n$-dimensional vector $b_N$ such that:
$$\omega = (A_N)^{-1} b_N \quad (54)$$
Because $(A_N)^{-1}$ may not exist, it can be approximately replaced by $(A_N + \Pi I)^{-1}$ based on the method of ridge regression [40], where $\Pi$ ($\Pi > 0$) is a small constant and $I$ is an $n$-order identity matrix. When $N$ is large, the calculation of the parameter vector $\omega$ in Equation (54) may be unstable. Reference [40] has proposed a new calculation formula for the parameter vector $\omega$, Equation (55).
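The actor-side computations described above can be made concrete with a small pure-Python sketch. It is an illustration under explicit assumptions rather than the exact formulas of [40]: the value function is assumed to use the same basis functions as the policy, each sample is weighted by the sigmoid of its TD(0) error, $A_N$ and $b_N$ are taken to be the resulting weighted least-squares normal-equation terms, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def rbf_features(x: Sequence[float], centers: Sequence[Sequence[float]],
                 width: float) -> Vector:
    """Gaussian radial basis features phi(x), one per center."""
    return [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                     / (2.0 * width ** 2)) for c in centers]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def sample_action(phi: Vector, omega: Vector, sigma: float,
                  rng: random.Random) -> float:
    """Draw a ~ N(phi^T omega, sigma^2): exploration around the policy mean."""
    return rng.gauss(dot(phi, omega), sigma)


def td_error(reward: float, phi: Vector, phi_next: Vector,
             theta: Vector, gamma: float) -> float:
    """TD(0) error with an assumed linear value function V(x) = phi(x)^T theta."""
    return reward + gamma * dot(phi_next, theta) - dot(phi, theta)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve_linear(a: List[Vector], b: Vector) -> Vector:
    """Gaussian elimination with partial pivoting (matrix must be non-singular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def actor_update(samples: Sequence[Tuple[Vector, float, float]],
                 ridge: float) -> Vector:
    """Weighted ridge least squares: omega = (A + ridge * I)^-1 b.

    Each sample is (phi, action, delta); its weight is sigmoid(delta).
    """
    n = len(samples[0][0])
    a = [[ridge if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for phi, action, delta in samples:
        w = sigmoid(delta)
        for i in range(n):
            b[i] += w * phi[i] * action
            for j in range(n):
                a[i][j] += w * phi[i] * phi[j]
    return solve_linear(a, b)
```

With this weighting, actions that turned out better than expected (large positive TD error) pull the policy mean toward themselves more strongly than actions with negative TD error, and a positive `ridge` keeps the system solvable when the accumulated matrix is singular.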

The Step-by-Step Procedure of the Proposed Approach
In summary, the step-by-step procedure of the LSCAC-based day-ahead EM modeling approach for WPPs (under the two BMs) can be described as follows:
(1) Input: basis function vector $\varphi$ and values of $\sigma$, $q$, $\gamma$ (for WPP$_i$ (∀i)).
(5) In iteration $T$, generate $a_{i,T} \sim N(\varphi(x_{i,T})^T \omega_{i,T}, \sigma^2)$, where $a_{i,T}$ represents $Pw_{i,T}^b$ or $\rho_{i,T}^b$ (for WPP$_i$ (∀i)); the ISO then implements the robust MCM represented by RMCM 1 or RMCM 2.
(6) After market clearing, WPP$_i$ (∀i) obtains the immediate reward $r_{i,T}$ using Equation (37) and a new market state $x_{i,T+1}$, which can be generated by the ISO using Equations (35), (36), (38)-(40).
(7) Check the iterative termination condition: if it is met, go on to step (14); otherwise, return to step (5).
(14) Output: the learned policy of WPP$_i$ (∀i), based on which WPP$_i$ (∀i) can select the optimal bidding strategy (under BM 1 or 2) in face of any market state.
Moreover, according to [40], we choose a Gaussian radial basis function as $\varphi(x)$.

System Data
In this section, by implementing the robust MCM mentioned in Section 2, our proposed day-ahead EM modeling approaches under different BMs are simulated on the IEEE 30-bus test system with five strategic WPPs [9].Matlab R2014a is utilized to conduct our simulations.Figure 1 shows the schematic structure of the test system.Table 1 depicts the predicted single period loads distributed in different buses [42].Consistent with the assumptions in Section 2, any uncertainties in this test system are not caused by loads.Parameters of conventional generators can be seen in Table 2, and the predicted power output intervals of the five WPPs which are the crucial components of the uncertainty set, are listed in Table 3.For the sake of simplicity and without losing generality, we assume the power output interval of WPP i (∀i) predicted by ISO is the same as that predicted by WPP i (∀i) itself.
Note: Because the bidding strategies of conventional generators are neglected, the value of $c_j$ (∀j) in Table 2 is obtained by using the marginal cost of the $j$-th conventional generator when its power output reaches $P_j^{\max}$. Moreover, the parameters of every conventional generator's marginal cost can be found in Reference [43].

Robust MCM Testing
The value of the budget parameter Λ is related to the size of the uncertainty set: the smaller Λ is, the smaller the uncertainty set estimated by the ISO. That is to say, the day-ahead market clearing procedure of the ISO becomes more deterministic as Λ (Λ ≥ 0) decreases. When Λ = 0, the predicted power output of every WPP is deemed by the ISO to be a definite value which, according to Equation (1) and [10,19,20], is equal to the intermediate value of its power output interval, and the day-ahead MCM of the ISO reduces to a conventional deterministic MCM similar to [30]. However, uncertainties exist objectively in a power system with WPPs. If the market clearing procedure were implemented by the ISO without considering enough uncertainties, the reserve capacity of the system dispatched day-ahead might be unable to accommodate deviations caused by WPPs' power output uncertainties in real time, which can seriously affect the security of the system and cause large extra costs such as wind abandonment. Therefore, market clearing results under different Λ values must be compared so as to verify the necessity of proposing a robust MCM in a day-ahead EM with wind power penetration.
In this section, in order to facilitate the market clearing comparison, we assume every WPP is under BM 1 and sends the ISO the intermediate value of its power output interval. In fact, the same key conclusions can also be obtained with the other BM and other strategies. Moreover, no matter what value of Λ the ISO assumes, the actual value of Λ, which represents the objectively existing uncertainty, is fixed by us to the number of WPPs in the system (Λ = 5). Table 4 shows the market clearing results under different Λ values. In Table 4, because the marginal cost of every WPP is neglected [3,31], the "operation cost" in column 2 is calculated as $\sum_{j=1}^{N_G} c_j P_j$ when the optimal ED solution is obtained. The "uncertainty that cannot be accommodated" in column 3 indicates whether there exist uncertainty points in U(Λ = 5) that cannot be accommodated when the optimal ED solution is obtained. The "number of uncertainty poles that cannot be accommodated" in column 4 denotes the number of poles (extreme points) in U(Λ = 5) that cannot be accommodated when the optimal ED solution is obtained. From Table 4, it can be concluded that:

• The total operation cost increases with Λ, while uncertainties that cannot be accommodated tend to be eliminated by increasing Λ. On the one hand, the conservatism of the ISO increases with Λ, which reduces the economic efficiency of scheduling to a certain extent. On the other hand, the operation cost is calculated based on the base-case scenario (WPPs' day-ahead bids), in which the extra cost caused by uncertainties that cannot be accommodated is not taken into account. Although that extra cost is hard to calculate precisely, for reasons such as missing information about the real-time occurrence of uncertainty from the day-ahead horizon, it can be considerable once any uncertainty that cannot be accommodated occurs in practice. Therefore, it is necessary to eliminate uncertainties that cannot be accommodated by reasonably increasing the value of Λ.
• When Λ = 0, the ISO clears the market using the conventional deterministic MCM. The number of uncertainty poles that cannot be accommodated in the case Λ = 0 is 22, which is significantly more than in any other case listed in Table 4 (the number of uncertainty points that cannot be accommodated in the case Λ = 0 is actually infinite). That is to say, it is necessary to employ a modified MCM, such as our proposed robust MCM, in a day-ahead EM with considerable uncertainties (i.e., WPPs).
• Comparing the cases Λ = 4 and Λ = 5, on the one hand, there are no uncertainties that cannot be accommodated in either case; on the other hand, the operation cost in the case Λ = 4 is equal to that in the case Λ = 5. Moreover, increasing Λ increases the computational complexity of solving the robust MCM [15,16,19,20]. Hence, the proposed robust MCM with Λ = 4 is applied for market clearing in our subsequent simulations.
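The pole count used in this comparison can be illustrated with a short Python sketch. It assumes the budgeted uncertainty set with an integer Λ and strictly positive interval widths, in which case each pole has exactly Λ WPPs at a bound of their interval and the remaining WPPs at the interval midpoint. The predicate `can_accommodate` is a hypothetical stand-in for the feasibility check of the re-dispatch problem, and the two-WPP numbers are illustrative only.

```python
from itertools import combinations, product
from typing import Callable, Iterable, Iterator, List, Sequence


def uncertainty_poles(lower: Sequence[float], upper: Sequence[float],
                      budget: int) -> Iterator[List[float]]:
    """Yield the extreme points of the budgeted set for an integer budget."""
    n = len(lower)
    mid = [(lo + up) / 2.0 for lo, up in zip(lower, upper)]
    k = min(budget, n)
    for chosen in combinations(range(n), k):      # which WPPs deviate fully
        for sides in product((0, 1), repeat=k):   # lower (0) or upper (1) bound
            point = mid[:]
            for idx, side in zip(chosen, sides):
                point[idx] = upper[idx] if side else lower[idx]
            yield point


def count_unaccommodated(poles: Iterable[List[float]],
                         can_accommodate: Callable[[List[float]], bool]) -> int:
    """Count the poles for which the given dispatch cannot follow the deviation."""
    return sum(1 for p in poles if not can_accommodate(p))


# Hypothetical example: two WPPs, and a dispatch that tolerates a total
# deviation of at most 5 from the scheduled total wind output of 10.
poles = list(uncertainty_poles([0.0, 0.0], [10.0, 10.0], budget=2))
print(count_unaccommodated(poles, lambda p: abs(sum(p) - 10.0) <= 5.0))
```

Under these assumptions, five WPPs with Λ = 5 give 2^5 = 32 poles, which is the population against which a count such as the 22 reported for Λ = 0 would be measured.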

LSCAC-Based EM Modeling Approach Testing
In this section and our subsequent simulations, under either BM, every WPP (agent) starts with a training process of 3000 iterations. During this training process, all WPPs balance exploration and exploitation when selecting bidding strategies (actions) in each iteration [41]. After the training process, a decision-making process of 500 iterations is carried out by every WPP, in which only the greedy policy is adopted when selecting actions in face of any state of the market. Moreover, we randomly set the action of every WPP at the beginning of the first training iteration because every WPP starts with limited experience in strategy selection.
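The two-phase schedule described above can be sketched as follows. This is a simplified illustration, not the simulation code of this study: the policy mean is held fixed instead of being learned, clipping the bid to an admissible range is an assumption made here, and all names and numbers are hypothetical.

```python
import random
from typing import List


def select_action(policy_mean: float, sigma: float, greedy: bool,
                  low: float, high: float, rng: random.Random) -> float:
    """Pick a bid around the policy mean; explore only when not greedy."""
    action = policy_mean if greedy else rng.gauss(policy_mean, sigma)
    # Clipping to [low, high] is an assumption that keeps bids admissible.
    return min(max(action, low), high)


def simulate_schedule(policy_mean: float, sigma: float, low: float, high: float,
                      n_train: int = 3000, n_decide: int = 500,
                      seed: int = 0) -> List[float]:
    """Return one agent's actions over the training and decision-making phases."""
    rng = random.Random(seed)
    actions = []
    for t in range(n_train + n_decide):
        greedy = t >= n_train  # exploration stops once training is over
        actions.append(select_action(policy_mean, sigma, greedy, low, high, rng))
    return actions


actions = simulate_schedule(policy_mean=25.0, sigma=3.0, low=10.0, high=30.0)
print(len(set(actions[3000:])))  # number of distinct bids in the decision phase
```

In this sketch the decision-making phase produces a single repeated bid because the policy mean is fixed; in the actual approach the mean depends on the observed market state and on the parameters learned during training.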
Whether our proposed LSCAC-based day-ahead EM approach under BM 1 reaches dynamic stability after 3000 training iterations is shown in Figures 2-4, and the corresponding results under BM 2 are shown in Figures 5-7. In Figures 4 and 7, the summed UMP at bus $m$ in each iteration is calculated using Equation (39).
Before we analyze BMs and strategies for WPPs by using our proposed LSCAC-based day-ahead EM modeling approach, it should first be tested whether our proposed approaches under different BMs converge to dynamic stability after every WPP experiences enough iterations of online training. If convergence is verified, the market state and obtained action of every WPP no longer change after enough training iterations. It should be noted that in the existing TBRL-based approaches [26][27][28][29][30][31][36], the action set of every agent is discrete and finite, and the optimality of an agent's final obtained action can be easily verified by using the method mentioned in [31], which is to compare the profits brought by all actions in this agent's action set while fixing the actions of other agents. However, in our proposed LSCAC-based approach, the action set of every agent is continuous. It is impossible to directly test the optimality of an agent's final obtained action because there are infinitely many actions other than the final obtained one. Therefore, we propose the following three steps to further test the performance of the LSCAC-based day-ahead EM modeling approach:

• To test the optimality of a WPP's final obtained strategy in TBRL-based (i.e., Q-Learning algorithm [26,27]) day-ahead EM modeling approaches after convergence to dynamic stability, by comparing the profits brought by all of this WPP's strategies while fixing the other WPPs' obtained strategies. The specific optimality test method can be found in [31].
• To test whether a WPP can obtain more profit by using the LSCAC algorithm than by using the TBRL algorithm (Q-Learning algorithm [26,27]) after convergence to dynamic stability.
• To test whether the whole market can reach a lower operation cost with our proposed LSCAC-based approach than with the TBRL-based (Q-Learning algorithm [26,27]) one after convergence to dynamic stability.
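The optimality test for discrete action sets mentioned in the first step can be sketched as a best-response check. The Python snippet below is a minimal illustration assuming a user-supplied profit function; the linear-price toy market is purely hypothetical and is not the robust MCM used in this study.

```python
from typing import Callable, Dict, Sequence


def is_best_response(agent: int, chosen: Dict[int, float],
                     action_set: Sequence[float],
                     profit: Callable[[int, Dict[int, float]], float],
                     tol: float = 1e-9) -> bool:
    """True if no alternative action of `agent` yields a higher profit
    while the other agents' actions stay fixed."""
    base = profit(agent, chosen)
    for alt in action_set:
        trial = dict(chosen)
        trial[agent] = alt  # deviate unilaterally
        if profit(agent, trial) > base + tol:
            return False
    return True


def toy_profit(agent: int, actions: Dict[int, float]) -> float:
    """Hypothetical market: price falls linearly with total offered output."""
    price = 100.0 - sum(actions.values())
    return price * actions[agent]


action_set = [10.0, 20.0, 30.0, 40.0]
print(is_best_response(0, {0: 30.0, 1: 30.0}, action_set, toy_profit))
print(is_best_response(0, {0: 10.0, 1: 30.0}, action_set, toy_profit))
```

Such an exhaustive comparison is feasible only because the action set is finite, which is why it applies to the TBRL-based approaches and not directly to the continuous action sets of the LSCAC-based approach.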
The related parameters of our LSCAC-based day-head EM modeling approach are listed in Table 5.From Figures 2-4, it can be seen that, after randomly fluctuating in 3000 training iterations, the adjustment processes of estimated profit, LMP and summed UMP of every WPP remain constant during 500 decision-making iterations.Actually, other adjustment processes such as that of operation cost, every WPP's bidding strategy, etc. also become constant after 3000 training iterations.Therefore, our proposed approach under BM 1 can converge to dynamic stability after every WPP experiences 3000 iterations of online training.
From Figures 5-7, it can be seen that, after randomly fluctuating in 3000 training iterations, the adjustment processes of estimated profit, LMP and summed UMP of every WPP remain constant during 500 decision-making iterations.Actually, other adjustment processes such as that of operation cost, every WPP's bidding strategy, etc. also become constant after 3000 training iterations.Therefore, our proposed approach under BM 2 can converge to dynamic stability after every WPP experiences 3000 iterations of online training.
The main reason for the fluctuating trends in the 3000 training iterations in Figures 2-7 is that, in order to balance exploration and exploitation during these iterations, every WPP must maintain the ability to explore, that is, to randomly select bidding strategies according to the repeatedly updated Equation (45). All WPPs' insufficient experience and unstable action-selection policies make the dynamic training process of the EM fluctuate randomly. The main reason for the constant trends in the 500 decision-making iterations in Figures 2-7 is that, after accumulating enough experience, every WPP adopts the greedy policy, which is to select only the bidding strategy it considers optimal in face of any observed EM state in each of the 500 decision-making iterations. All WPPs' sufficient experience and stable action-selection policies make the dynamic decision-making process of the EM converge to stability. Therefore, it may be concluded that enough training iterations considering the balance of exploration and exploitation, as well as the greedy action-selection policy adopted in the decision-making iterations, are the two main factors resulting in EM dynamic stability. Taking the EM approach under BM 1 as an example, Figure 8 shows the dynamic adjustment process of WPP$_1$'s estimated profit when every WPP experiences 1000 training iterations and 500 decision-making iterations, and Figure 9 shows the dynamic adjustment process of WPP$_1$'s estimated profit when every WPP experiences 3500 training iterations without a greedy action-selection policy.
From Figure 8, it can be seen that although greedy action-selection policies are adopted by the WPPs in the decision-making iterations, insufficient training iterations, which mean insufficient accumulated experience, still make WPP1's estimated profit fluctuate during the decision-making process. The dynamic adjustment processes of the other WPPs' estimated profits also fluctuate during the decision-making iterations.
From Figure 9, it can be seen that although more than 3000 training iterations balancing exploration and exploitation are conducted, WPP1's estimated profit still fluctuates during the last 500 iterations because no greedy action-selection policy is adopted. The dynamic adjustment processes of the other WPPs' estimated profits also fluctuate during the last 500 iterations. Moreover, WPP1's estimated profit in Figure 9 is much more volatile during the last 500 iterations than that in Figure 8, mainly because WPPs lacking greedy action-selection policies tend to bid more randomly in the EM.
Therefore, EM dynamic stability cannot be reached when either the training iterations are insufficient or the greedy action-selection policy is not adopted. To a certain extent, this verifies our conclusion about the two main factors resulting in EM dynamic stability, and it also suggests that the proposed 3000 training iterations and 500 decision-making iterations are comparatively reasonable for our proposed LSCAC-based approach to reach EM dynamic stability.
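The role of the greedy policy can be illustrated with a minimal single-agent sketch. It is not the LSCAC algorithm: the profit curve, the Gaussian exploration noise and the "keep the best bid seen so far" learning rule are all hypothetical stand-ins chosen only to show why profit keeps fluctuating while exploration continues and becomes constant once the agent acts greedily. Because there is only one agent, the sketch cannot reproduce the multi-agent fluctuation seen in Figure 8; fewer training iterations here can only yield a frozen bid that is no better than the one found with more training.

```python
import random

def profit(bid_mw, optimum_mw=40.0):
    """Hypothetical concave profit curve with its peak at optimum_mw."""
    return 100.0 - (bid_mw - optimum_mw) ** 2 / 10.0

def simulate(n_train, n_decide, greedy, sigma=5.0, seed=0):
    """Return the profits observed in the last n_decide iterations."""
    rng = random.Random(seed)
    mean_bid = 10.0                      # initial (poor) bidding strategy
    best_profit = profit(mean_bid)
    tail = []
    for it in range(n_train + n_decide):
        exploring = it < n_train or not greedy
        if exploring:
            # Exploration: sample a bid around the current best bid.
            bid = mean_bid + rng.gauss(0.0, sigma)
        else:
            # Greedy decision-making: always play the current best bid.
            bid = mean_bid
        p = profit(bid)
        if exploring and p > best_profit:
            # Stand-in learning rule (assumption): keep the best bid seen so far.
            mean_bid, best_profit = bid, p
        if it >= n_train:
            tail.append(p)
    return tail

def spread(values):
    """Range of a profit series; zero means the series is constant."""
    return max(values) - min(values)

greedy_tail = simulate(3000, 500, greedy=True)    # training, then greedy
explore_tail = simulate(3000, 500, greedy=False)  # 3500 exploring iterations
print(spread(greedy_tail), spread(explore_tail))
```

In this toy setting the greedy run has zero spread over its last 500 iterations, whereas the run that never switches to the greedy policy keeps a positive spread, which mirrors the qualitative contrast between Figures 2-7 and Figure 9.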
To further test the performance of our proposed approaches under different BMs, two Q-learning-based day-ahead EM approaches (QDEMAs) are taken for comparison. In these QDEMAs, some WPPs are designated as Q-learning-based agents, while the remaining WPPs are still LSCAC-based. A Q-learning-based agent dynamically adjusts its action based on the Q-learning algorithm, which uses the ε-greedy policy [26] to balance exploration and exploitation in the 3000 training iterations and the greedy policy in the 500 decision-making iterations. The only difference between the two QDEMAs is the number of Q-learning-based agents. Parameters related to the two QDEMAs are listed in Table 6. After 3000 training and 500 decision-making iterations, the obtained market results of the two QDEMAs and our proposed approach under BM 1 are listed in Table 7, and those under BM 2 are listed in Table 8. Moreover, like our proposed LSCAC-based approach, both QDEMAs converge to dynamic stability under either BM after every WPP experiences 3000 iterations of online training. This means that the results listed in Tables 7 and 8 are not obtained accidentally: an LSCAC-based or Q-learning-based WPP does not change its strategy when the market state, which is affected by all WPPs' strategies, remains unchanged. From Tables 7 and 8, it can be concluded that:

• By using the optimality test method in [31], under either BM and in either QDEMA, every Q-learning-based WPP's final strategy can be verified as the optimal one in its discrete action set, that is, the one that brings it the most profit when the other WPPs' strategies are fixed.

• Under either BM, the estimated profits of WPP1 and WPP2 in QDEMA 2 are higher than those in QDEMA 1, and the estimated profits of WPP3, WPP4 and WPP5 in our proposed LSCAC-based approach are higher than those in QDEMA 2. To some extent, this indicates that a WPP can obtain more profit by bidding in the EM with the LSCAC algorithm than with the Q-learning algorithm under the same conditions. In addition, the operation cost in our proposed LSCAC-based approach is lower than that in QDEMA 2, and the operation cost in QDEMA 2 is lower than that in QDEMA 1, which, to some extent, indicates that the operation cost of the whole system decreases as the number of LSCAC-based agents in the EM increases.
In conclusion, under either BM, Q-learning-based WPPs can finally find their optimal bidding strategies within their discrete and finite action sets. If these WPPs are replaced by LSCAC-based ones, they can find more suitable strategies within their continuous action sets, which bring not only more profit for themselves but also a lower operation cost for the whole system than the Q-learning method does. Hence, although it is hard to directly test the optimality of every LSCAC-based WPP's final strategy, our further test has, to some extent, verified the rationality and scientific basis of applying our proposed LSCAC-based approach to day-ahead EM modeling for strategic WPPs.
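The behavior of a Q-learning-based agent over a discrete action set, and the idea of checking its final strategy against all alternatives while rivals are held fixed, can be sketched as follows. This is a simplified illustration under stated assumptions, not the optimality test of [31] and not the paper's market model: the bid grid `BIDS`, the profit function, the single-state (discount factor 0) update and the values of `epsilon` and `alpha` are all hypothetical.

```python
import random

# Hypothetical discrete action set of bidding prices ($/MWh).
BIDS = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0]

def profit(bid):
    """Hypothetical profit with rivals fixed: accepted MW shrinks as the bid rises."""
    accepted_mw = max(0.0, 100.0 - 2.0 * bid)
    return bid * accepted_mw

def greedy_action(q):
    """Index of the action with the highest Q-value (first one on ties)."""
    return max(range(len(q)), key=q.__getitem__)

def train_q_agent(n_train=3000, epsilon=0.2, alpha=0.5, seed=0):
    """Epsilon-greedy training of a single-state Q-table."""
    rng = random.Random(seed)
    q = [0.0] * len(BIDS)
    for _ in range(n_train):
        if rng.random() < epsilon:
            a = rng.randrange(len(BIDS))   # explore: random bid
        else:
            a = greedy_action(q)           # exploit: best bid so far
        # Stateless update (discount factor 0): move Q toward the observed profit.
        q[a] += alpha * (profit(BIDS[a]) - q[a])
    return q

def is_best_response(action):
    """Brute-force check: no other bid in the discrete set earns more profit."""
    return all(profit(BIDS[action]) >= profit(b) for b in BIDS)

q_values = train_q_agent()
learned = greedy_action(q_values)          # greedy policy after training
print(BIDS[learned], is_best_response(learned))
```

A brute-force check of this kind is only feasible because the action set is finite, which is consistent with the remark above that the optimality of a strategy chosen from a continuous action set is hard to test directly.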
Moreover, under either BM, the simulation of our proposed approach on the IEEE 30-bus test system with five strategic WPPs takes only about 43 s to reach the final results (after 3500 iterations). That is, the computational time of our proposed approach is relatively low, so it can be extended to the modeling and simulation of more realistic and more complex EM systems.

BMs Analysis for WPPs
In this section, our proposed LSCAC-based approach is applied to analyze the obtained market results under different BMs after 3000 training iterations and 500 decision-making iterations.
Moreover, it should be noted that under BM 2, in order to lead WPP i (∀i) to bid reasonably in the market, we set lower and upper limits ρ_low,i and ρ_upp,i ($/MWh) for its bidding price ρ^b_i. The values of ρ_low,i and ρ_upp,i (∀i) may affect the market results obtained after 3500 iterations, such as the final LMPs, the estimated profits of all WPPs and the operation cost of the system. Hence, different values of ρ_low,i and ρ_upp,i (∀i) should be taken into account when comparing BMs. For the sake of simplicity and without loss of generality, we set ρ_low,i = ρ_low and ρ_upp,i = ρ_upp for all i, and consider different values of ρ_low and ρ_upp.
After 3500 iterations, considering different values of ρ_low (while fixing the upper limit ρ_upp to 50 $/MWh, the same as listed in Table 5), Table 9 is provided for the comparison of obtained market results under different BMs. From Tables 9 and 10, it can be seen that:

• In Table 9, when the value of ρ_low is 0, 10, 20 or 30 $/MWh, the obtained profit of every WPP, the operation cost and the average LMP of the 30 buses under BM 2 remain unchanged. In fact, if ρ_low ≤ 30 $/MWh, the obtained bidding price (strategy) of every WPP under BM 2 is higher than 30 $/MWh, which means that values of ρ_low not exceeding 30 $/MWh cannot affect any WPP's bidding decision. Therefore, in our opinion, it is hard to weaken the market power of WPPs merely by reducing the lower limit of their bidding prices when WPPs provide the ISO with bidding curves consisting of bidding prices and power outputs for the next day.

• In Table 10, the obtained profit of every WPP, the operation cost and the average LMP of the 30 buses under BM 2 increase as the value of ρ_upp increases. This may be mainly because the higher the value of ρ_upp, the greater the market power of the WPPs. Therefore, in our opinion, the upper limit of every WPP's bidding price should not be set too high when WPPs provide the ISO with bidding curves consisting of bidding prices and power outputs for the next day.

• For most values of ρ_low and ρ_upp, WPPs obtain more profit under BM 2 than under BM 1, which may be because WPPs under BM 2 can gain greater market power by directly adjusting their bidding prices and thereby further improve their profits. However, from the perspective of the whole market, both the operation cost and the average LMP under BM 2 are higher than those under BM 1, which, to some extent, indicates that WPPs adopting BM 2 cause lower economic efficiency in the whole market than WPPs adopting BM 1. Therefore, in our opinion, if the purpose of permitting WPPs to bid is to promote the development of wind power resources by improving WPPs' profits, providing the ISO with bidding curves is more suitable; if the purpose is to improve the economic efficiency of the whole market, sending only power output plans is more suitable.
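The reasoning behind the first two observations can be made concrete with a small worked example. It assumes, purely for illustration, that each WPP has a single price it would like to bid (`desired`, a hypothetical 60 $/MWh) and that the limits act as a simple projection onto the interval between the lower and upper limits; the paper's agents choose their bids through learning, so this is only a sketch of why a non-binding lower limit is irrelevant while a binding upper limit is not. The ρ_low values and the fixed limits of 50 and 30 $/MWh follow the text above, whereas the swept ρ_upp values are assumptions.

```python
def clip_bid(unconstrained_bid, rho_low, rho_upp):
    """Project a desired bidding price onto the admissible interval [rho_low, rho_upp]."""
    return min(max(unconstrained_bid, rho_low), rho_upp)

desired = 60.0  # hypothetical price a WPP would bid without limits ($/MWh)

# Lowering the floor below the price the WPP wants anyway changes nothing.
floor_sweep = [clip_bid(desired, rho_low, 50.0) for rho_low in (0.0, 10.0, 20.0, 30.0)]

# Raising the cap lets the submitted bid grow until the cap stops binding.
cap_sweep = [clip_bid(desired, 30.0, rho_upp) for rho_upp in (40.0, 50.0, 60.0, 70.0)]

print(floor_sweep)  # [50.0, 50.0, 50.0, 50.0]
print(cap_sweep)    # [40.0, 50.0, 60.0, 60.0]
```

Under these assumptions the submitted bid is insensitive to every tested lower limit but tracks the upper limit whenever the upper limit binds, which is the mechanism suggested for the patterns reported for Tables 9 and 10.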

Conclusions
In this paper, we present an LSCAC-based day-ahead EM modeling approach with a robust market clearing mechanism embedded in it, and the strategic behaviors of WPPs under two different BMs are simulated and compared by using our proposed approach. By employing the robust MCM, the day-ahead ED solution of the market is immunized against any uncertainty within the real-time wind power uncertainty set estimated by the ISO, which not only ensures the ability of wind power accommodation in the EM, but also reasonably generates LMPs for energy credit and load payment as well as UMPs for reserve credit and deviation payment. By employing the LSCAC algorithm, every WPP can significantly improve its profit and the operation cost of the system can also be remarkably decreased compared with employing the TBRL (i.e., Q-learning) algorithms. The low computational time (only about 43 s for our simulation on the IEEE 30-bus test system to reach the final results) makes our proposed approach easily extendible, so that it can provide a reasonable test bed for the simulation of more realistic and more complex EM systems. Moreover, by comparing market results under different BMs in simulation, some suggestions on leading WPPs to bid reasonably in the market and on BM selection are put forward for the purposes of promoting the development of wind power resources and improving the economic efficiency of the whole market.

Figure 1. Diagram of the test system (Note: For the sake of simplicity, here it is assumed that the maximum congestion constraints in all transmission lines are 30 MW).

Figure 4. Dynamic adjustment process of summed UMP corresponding to every WPP under BM 1.

Figure 6. Dynamic adjustment process of LMP corresponding to every WPP under BM 2.

Figure 7. Dynamic adjustment process of summed UMP corresponding to every WPP under BM 2.

Figure 8. Dynamic adjustment process of WPP1's estimated profit under BM 1 when every WPP experiences 1000 training iterations and 500 decision-making iterations.

Figure 9. Dynamic adjustment process of WPP1's estimated profit under BM 1 when every WPP experiences 3500 training iterations without greedy action selecting policy.

Table 1. Values of inelastic single-period loads.

Table 2. Parameters of conventional generators.
Columns: Bus; Generators; c_j (10^3 $/MWh); P_min (MW); P_max (MW).
Note: Because the bidding strategies of conventional generators are neglected, the value of c_j (∀j) in Table 2 is obtained by using the marginal cost of the j-th conventional generator when its power output

Table 3. Predicted power output intervals of WPPs.

Table 4. Market clearing results under different Λ values.

Table 5. Related parameters in LSCAC-based day-ahead EM modeling approach.

Table 6. Related parameters of Q-learning-based agents in two QDEMAs.

Table 7. Obtained market results of those two QDEMAs and our proposed approach under BM 1 (10^3 $).

Table 8. Obtained market results of those two QDEMAs and our proposed approach under BM 2 (10^3 $).

Table 9. Comparison of obtained market results under different BMs by considering different values of ρ_low.

After 3500 iterations, considering different values of ρ_upp (while fixing the lower limit ρ_low to 30 $/MWh, the same as listed in Table 5), Table 10 is provided for the comparison of obtained market results under different BMs.

Table 10. Comparison of obtained market results under different BMs by considering different values of ρ_upp.
φ_h(•)/φ(•): h-th basic function/basic function vector
pro(•, •): Probability density function for selecting action a under state x (representing the action selecting policy during training iterations)
TD error function for selecting action a under state x
Constant
N_w, N_G, N_bus: The numbers of WPPs, conventional generators and buses
lw_i, uw_i: Lower and upper bounds for the i-th WPP's real-time wind power output
ρ_low,i, ρ_upp,i: Lower and upper limits for the i-th WPP's bidding price
Pw/Pw_i: Random variable vector representing the uncertainty of WPPs' joint real-time power outputs/random variable representing the uncertainty of the i-th WPP's real-time power output; moreover, Pw/Pw_i can also be considered as part of the decision variables (vector) in SP
Pw_k: k-th worst uncertainty point of WPPs' joint real-time power outputs; moreover, Pw_k also represents part of SP's solutions when solving SP for the k-th time
Decision variables in MP, representing the day-ahead dispatched power output of the j-th conventional generator and of the i-th WPP, as well as the real-time power re-dispatch incremental result of the j-th conventional generator under the k-th worst uncertainty point, respectively
P_GW, Pw_disp, ∆P: Variable vector consisting of P_j (∀j) and Pw (∀i); variable vector consisting of ∆P_jk (∀j, ∀k)
s+_i, s−_i, ∆P_j: Other decision variables in SP; s+_i and s−_i are non-negative slack variables, the sum of which evaluates the violation associated with the solution from MP; ∆P_j represents the real-time power re-dispatch approach of the j-th conventional generator for accommodating uncertainties within U
Deviation of the real-time power output generated by WPPs connected to bus m under the k-th worst uncertainty point from the day-ahead bidding (dispatched) one
π_m, π_mk: LMP in bus m; UMP in bus m under the k-th worst uncertainty point
R_i: Estimated profit of the i-th WPP
x, a, r: State variable, action variable, reward