Article

Designing an Incentive Contract Menu for Sustaining the Electricity Market

Ying Yu, Tongdan Jin and Chunjie Zhong
1 School of Mechatronics Engineering and Automation, Shanghai University, Shanghai 200072, China
2 Ingram School of Engineering, Texas State University, San Marcos, TX 78666, USA
* Author to whom correspondence should be addressed.
Energies 2015, 8(12), 14197-14218; https://doi.org/10.3390/en81212419
Submission received: 14 October 2015 / Revised: 4 December 2015 / Accepted: 7 December 2015 / Published: 16 December 2015
(This article belongs to the Special Issue Electric Power Systems Research)

Abstract: This paper designs an incentive contract menu to achieve long-term stability for electricity prices in a day-ahead electricity market. A bi-level Stackelberg game model is proposed to search for the optimal incentive mechanism under a one-leader and multi-followers gaming framework. A multi-agent simulation platform was developed to investigate the effectiveness of the incentive mechanism using an independent system operator (ISO) and multiple power generating companies (GenCos). Further, a Q-learning approach was implemented to analyze and assess the response of GenCos to the incentive menu. Numerical examples are provided to demonstrate the effectiveness of the incentive contract.

1. Introduction

As the vertically integrated power industry evolves into a competitive market, electricity can now be treated as a commodity governed by the interaction of demand and generation. Competition among generating companies (GenCos) is highly encouraged in order to lower the energy price and benefit end consumers. However, when GenCos are given more flexibility to choose their bidding strategies, larger uncertainties are also brought into the electricity markets. Factors that affect bidding strategies include the risk appetite of GenCos, the price volatility of fuels, weather conditions, network congestion, and overloading. These factors and their interactions may lead to larger price volatility in the deregulated power market.
Many efforts have been made to design and optimize bidding strategies in the presence of uncertainty. Zhang et al. [1] proposed an efficient decision system based on a Lagrangian relaxation method to find the optimal bidding strategies of GenCos. Kian and Cruz [2] modeled the oligopolistic electricity market as a non-linear dynamical system and used dynamic game theory to develop bidding strategies for market participants. Swider and Weber [3] proposed a methodology that enables a strategically behaving bidder to maximize revenue under price uncertainty. Centeno et al. [4] used a scenario tree to represent uncertain variables that may affect price formation, including hydro inflows, power demand, and fuel prices, and presented a model for GenCos' medium-term strategic analysis. In [5], a dynamic bid model was used to simulate the bidding behaviors of the players and to study the inter-relational effects of the players' behaviors and the market conditions on the bidding strategies of players over time. Li and Shi [6] proposed an agent-based model to study strategic bidding in a day-ahead electricity market, and found that applying learning algorithms could help increase the net earnings of the market participants. Nojavan and Zare [7] proposed an information gap decision theory model to solve the optimal bidding strategy problem by incorporating the uncertainty of the market price; their case study further shows that risk-averse or risk-taking decisions could affect the expected profit and the bidding curve in the day-ahead electricity market. Qiu et al. [8] discussed the impacts of model deviations on the design of a GenCo's bidding strategies using conjectural variation (CV) based methods, and further proposed a CV-based dynamic learning algorithm with data filtering to alleviate the influence of demand uncertainty. Kardakos et al. [9] pointed out that when making a bidding decision, a GenCo takes into account the behavior of its competitors as well as the specific features and enacted rules of the electricity market; they further developed an optimal bidding strategy for a strategic generator in a transmission-constrained day-ahead electricity market. Other studies [10,11,12] emphasized that transmission constraints, volatile loads, market power exertion, and collusion may induce GenCos to bid higher prices than their true marginal costs, thereby aggravating the price volatility issue.
As concern for the sustainability of the power market increases, efforts also have been made to reduce the risk of price variation. Most studies focus on the employment of price-based demand response (DR) programs for the electricity users to control and reduce the peak-to-average load ratio [13,14,15,16,17,18,19,20,21,22]. For instance, Oh and Hildreth [13] proposed a novel decision model to determine whether or not to participate in the DR program, and further assessed the impact of the DR program on the market stability. Faria et al. [14] suggested that adequate tolls could motivate the potential market players to adopt the DR programs. Ghazvini et al. [15] showed that multi-objective decision-making is more realistic for retailers to optimize the resource schedule in a liberalized retail market. In [16], a two-stage stochastic programming model was formulated to hedge the financial losses in the retail electricity market. Zhong et al. [17] proposed a new type of DR program to improve social welfare by offering coupon incentives. The researchers in [18,19] handled the energy scheduling issue by optimizing the DR program in a smart grid environment. Yousefi et al. [20] proposed a dynamic model to simulate a time-based DR program, and used Q-learning methods to optimize the decisions for the market stakeholders. In [21,22,23], much more detailed reviews were provided for benefit analyses and applications of DR in a smart grid environment.
However, in electricity markets where the demand side is regulated, designing and optimizing GenCos' bidding strategies is treated as one of the most efficient ways to sustain the market price in the presence of uncertainty. Several studies have proposed incentive mechanisms or contracts for GenCos to mitigate the risk of price variation caused by their subjective preferences during the bidding process [24,25,26,27]. Silva et al. [24] introduced an incentive compatibility mechanism, which is individually rational and feasible, to resolve the asymmetric information problem. Liu et al. [25] proposed an incentive electricity generation mechanism to control GenCos' market power and reduce pollutant emissions using the signal transduction of game theory. Cai et al. [26] proposed a sealed dynamic price cap to prevent GenCos from exercising market power. Heine [27] performed a series of studies on the effectiveness of regulatory schemes in energy markets, and pointed out that potential improvements exist in contemporary systems when incentive-based regulations are appropriately implemented.
Although there is a large body of literature on bidding and incentive policy, most studies neglect to assess the long-term effects of incentive programs on the GenCos' learning behaviors. In addition, little attention has been paid to the dynamic response of the GenCos to volatile loads and incentive schemes. To maintain market stability, it is highly desirable to understand the interplay between the incentive mechanism and the GenCos' adaptive responses to the variable market. This paper aims to fill this gap by proposing an incentive mechanism in a day-ahead power market to reduce price variance, and by assessing the subsequent long-term impacts of the incentive mechanism. To that end, a two-level Stackelberg game model is developed to analyze the bidding strategies of the market participants, including one independent system operator (ISO) and several GenCos. An optimal menu of incentive contracts is derived under a one-leader, multi-followers game-theoretic framework. Finally, a Stackelberg-based Q-learning approach is employed to assess the GenCos' response to the incentive-based generation mechanism.
The remainder of the paper is organized as follows: in Section 2, we introduce the menu of incentive contracts, and describe the workflow of the multi-agent game framework; in Section 3, we give a mathematical description of the problem, and present the details of the Stackelberg model; in Section 4, we use a Q-learning methodology to investigate the long-term effectiveness of the incentive contracts; in Section 5, numerical examples are provided to demonstrate the application and performance of the method; Section 6 concludes the paper.

2. Problem Statement

2.1. Description of the Menu of Incentive Contracts

A commercial agreement between the ISO and the GenCo is proposed, which defines a reward scheme in exchange for consistent bidding behavior: the GenCo agrees to bid a reasonable power generation with a constant bidding curve during the contract period. Note that the reasonable power output should be larger than the regulated threshold of power output.
In general, it is not efficient to design a uniform incentive contract with a constant threshold, because GenCos usually exhibit diverse bidding behaviors. It is also rather complex to design customized incentive contracts for all GenCos. One viable approach is to design a pertinent incentive menu comprising characteristic incentive contracts for certain target GenCos, which are selected as representatives of the entire group of power generators. Though the ISO cannot precisely predict the target GenCos' bidding information in a future bidding round, customized incentive contracts can still be devised by incorporating the target GenCos' interests through inference from historical bidding data. For a target GenCo, the expected profit is amplified if it complies with the incentive contract, which takes into account its individual rationality constraints and incentive compatibility constraints. Hence it is reasonable to assume that target GenCos would not reject the customized contract.
Though the incentive contracts in the menu are designed based on the individual rationality and incentive compatibility of the target GenCos, they also benefit non-target GenCos that are willing to accept the incentive contracts. For a non-target GenCo, the incentive contract would be appealing if the expected profit is higher by abiding with the agreement. To better illustrate how the menu of the incentive contracts works, some notations are given as follows.

2.1.1. Target GenCo and Contracted Generating Companies

The GenCos are termed the target GenCos if their individual rationality constraints and incentive compatibility constraints are considered so that they could be motivated to accept the incentive contract.
We define $A^0$ as the set of all possible combinations of target GenCos. For each $a^0 \in A^0$, $a^0 = (a_1^0, a_2^0, \ldots, a_m^0)$, where $a_i^0$ indicates whether GenCo i is chosen as a target GenCo. Further, we define $I = \{k \mid a_k^0 = 1, k \in M\}$ as the set of target GenCos.
Note that $a$ is a combination of the GenCos' strategies, $a = (a_1, a_2, \ldots, a_m)$, where $a_i$ indicates whether GenCo i decides to accept the menu of incentive contracts. If $a_i = 0$, the answer is "no"; otherwise it is "yes". If $a_i \neq 0$, then $a_i = k$ with $k \in I$, meaning GenCo i accepts the incentive contract tailored to target GenCo k. GenCos with $a_i \neq 0$ are termed contracted GenCos.

2.1.2. Bidding Curve and Market Clearing Price

For electricity transactions, the study in [28] shows that GenCos submit power output in MW (megawatts) along with associated prices, in either discrete or continuous form. A bid in discrete form with three blocks is shown in Table 1: if the power output level is below 30 MW, the price is 10 $/MWh; if it is between 30 MW and 60 MW, the price is 15 $/MWh; and if it is between 60 MW and 90 MW, the price is 20 $/MWh. In general, a bid can also take the continuous form shown in Figure 1. Without loss of generality, a continuous bid curve model is adopted in this paper. For GenCo i, the bidding curve at time t takes the form $p_{it} = \alpha_{it} + \beta_{it} q_{it}$, where $\alpha_{it}$ and $\beta_{it}$ are the bidding coefficients of GenCo i at time t, and $p_{it}$ and $q_{it}$ are, respectively, the bidding price and the bidding power output of GenCo i at time t.
Table 1. Block bid.

Blocks | Price ($/MWh) | Power output level (MW)
Block 0 | 10 | 30
Block 1 | 15 | 60
Block 2 | 20 | 90
Figure 1. Block bid and continuous bid curves.
The market clearing price (MCP) is a uniform price shared by all GenCos, and the actual MCP depends on all GenCos' bidding behaviors. Assume the bidding curve of GenCo i is $p_{it} = \alpha_{it} + \beta_{it} q_{it}$, and the electricity demand at time t is $D_t$. The MCP at time t, denoted $p_t$, can be obtained by solving the following power balance equations:
$$D_t = \sum_{i=1}^{m} q_{it} \qquad (1)$$
$$p_t = \alpha_{it} + \beta_{it} q_{it}, \quad i = 1, 2, \ldots, m \qquad (2)$$
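With linear bid curves, Equations (1) and (2) admit a closed-form MCP whenever all GenCos are dispatched at positive output: substituting $q_{it} = (p_t - \alpha_{it})/\beta_{it}$ into Equation (1) gives $p_t = \big(D_t + \sum_i \alpha_{it}/\beta_{it}\big) / \sum_i (1/\beta_{it})$. The following minimal Python sketch illustrates the computation; the coefficients and demand are illustrative values, not the case data of Section 5.

```python
# Minimal sketch of market clearing under linear bid curves p = alpha_i + beta_i * q_i.
# Substituting q_i = (p - alpha_i) / beta_i into the power balance sum_i q_i = D
# gives p = (D + sum_i alpha_i / beta_i) / (sum_i 1 / beta_i).

def clear_market(alphas, betas, demand):
    """Return the uniform MCP p_t and each GenCo's dispatched output q_it.
    Assumes every GenCo clears at a positive output (no capacity limits)."""
    p = (demand + sum(a / b for a, b in zip(alphas, betas))) / sum(1.0 / b for b in betas)
    return p, [(p - a) / b for a, b in zip(alphas, betas)]

p, q = clear_market(alphas=[10, 11, 8], betas=[0.5, 0.8, 0.6], demand=90)
print(f"MCP = {p:.2f} $/MWh, outputs = {[round(x, 1) for x in q]} MW")
```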

2.1.3. Menu of the Incentive Contracts

The menu of incentive contracts is composed of multiple contracting terms in the form $(\alpha_i, \beta_i, \pi_i)$, where $\alpha_i$ and $\beta_i$ represent the thresholds of the bidding coefficients, and $\pi_i$ is the associated reward for meeting the incentive contract. Though each incentive contract is originally tailored to the individual-rationality and incentive-compatibility constraints of a certain target GenCo, these contracts are also designed to motivate non-target GenCos to participate in the incentive program.
Assume an incentive contract is customized for a target GenCo with bidding coefficients $\alpha_{i0}$ and $\beta_{i0}$, and that the reward amount $\pi_{i0}$ is calculated from the target GenCo's individual rationality and incentive compatibility conditions. This contract can be expressed by the triplet $(\alpha_{i0}, \beta_{i0}, \pi_{i0})$, which specifies the reward and the obligation of the contracted GenCo: if the dispatched power output of the contracted GenCo during the contract period is always greater than the required level, prescribed as $\underline{q} = (p_t - \alpha_{i0})/\beta_{i0}$ with $p_t$ being the MCP at time t, a reward of $\pi_{i0}$ is received at the end of the contract period.
All triplets $(\alpha_i, \beta_i, \pi_i)$ are further collected into $(A_L, B, \pi)$, where $A_L$, $B$, and $\pi$ are the vectors of the $\alpha_i$, $\beta_i$, and $\pi_i$, respectively. Hence the menu of incentive contracts can be concisely specified in the form $(A_L, B, \pi)$.
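To make the contract structure concrete, the sketch below represents the triplet and the compliance test described above; the class, field names, and numerical values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Contract:
    alpha: float   # threshold bidding intercept (alpha_i0)
    beta: float    # threshold bidding slope (beta_i0)
    reward: float  # pi_i0, paid at the end of the contract period

def complies(contract, mcp_series, output_series):
    """The contracted GenCo earns the reward only if its dispatched output
    stays above q_ = (p_t - alpha_i0) / beta_i0 on every day of the period."""
    return all(q >= (p - contract.alpha) / contract.beta
               for p, q in zip(mcp_series, output_series))

menu = [Contract(10, 0.5, 2.0e5), Contract(8, 0.6, 1.5e5)]  # hypothetical menu (A_L, B, pi)
```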

2.2. Scenario-Based Approach

At certain times, unexpected events such as hot weather, network congestion, and demand spikes occur randomly, causing load surges and demand forecast errors. A scenario-based approach [16,29,30] is often employed to address these types of uncontrollable events: the uncertain events are characterized by scenarios with corresponding probabilities. The scenarios considered in this paper include a normal scenario and a bad scenario; in the latter, the load is 20% higher than the average demand. The probability of each scenario can be inferred from historical data and experience. The Monte Carlo method is adopted to simulate both the normal and bad scenarios.
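A minimal Monte Carlo sketch of this two-scenario model is shown below; the average demand and the bad-scenario probability are assumed values chosen for illustration.

```python
import random

def sample_demand(avg_demand=200.0, p_bad=0.2, rng=random):
    """Return (scenario, load): the bad scenario occurs with probability p_bad
    and carries a load 20% above the average demand."""
    if rng.random() < p_bad:
        return "bad", 1.2 * avg_demand
    return "normal", avg_demand

draws = [sample_demand() for _ in range(10000)]
print(sum(1 for s, _ in draws if s == "bad") / len(draws))  # approximately 0.2
```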

2.3. The Workflow of Multi-Agent System

In this paper, a multi-agent system, adapted to a simulated context with multiple GenCos and one ISO, is proposed to study a day-ahead electricity market based on the proposed menu of incentive contracts. The multi-agent system was developed on a Java platform partly inherited from the Repast platform [31]. Some actions of the GenCos were coded in Matlab, then packaged and integrated into the multi-agent Java platform.
Figure 2 shows the flowchart of the multi-agent system (MAS) scheduling procedure.
Figure 2. Flow chart of the multi-agent system (MAS) scheduling mechanism (note: Y = yes and N = no). ISO: independent system operator.
At the beginning of a specified period $\tilde t$ that consists of T days (i.e., a period may comprise several days or several months), the ISO announces the menu of incentive contracts. The GenCos, acting in their own interest, decide whether to accept an incentive contract by using the periodic Q-learning method. This decision-making system resembles a one-leader, multi-followers Stackelberg game in which the ISO is the leader and the GenCos are the followers. An algorithm based on the Stackelberg game, illustrated further in Section 4.3, is presented for the ISO to find the initial optimal menu of incentive contracts in the first period. In addition, a periodic Stackelberg-based Q-learning method, illustrated in Section 4.2, is proposed for the ISO to find the subsequent optimal menus over the following periods.
At the beginning of a period, GenCo i decides whether or not to accept the incentive program by using its periodic Q-learning method. On each day of the period, GenCo i chooses to place a high bid or a normal bid by using its daily Q-learning method, and submits a bid of the form $p_{it} = \alpha_{it} + \beta_{it} q_{it}$. After the ISO receives all bids from the participants, the relevant information is aggregated and stored in a central repository. Based on the estimated hourly electricity demand of the next day, the ISO determines the unified hourly MCP and announces each GenCo's hourly power output schedule for the next day.
At the end of period $\tilde t$, the ISO computes the relevant rewards based on the bidding data retrieved from the central repository. If a GenCo's bidding data in the given period are constant and always in alignment with a certain contract in the menu, the GenCo receives the relevant reward. Over the repeated bidding periods, both the ISO and the GenCos improve their pricing policies and bidding strategies using the Q-learning algorithm.

3. Multi-Agent Stackelberg Game Model

3.1. Model Assumption and Description

Model assumptions are given as follows:
(1) To prevent GenCos from reaping extra profits by modifying their bidding data to satisfy the incentive contract, it is stipulated that any GenCo using new bidding coefficients is not eligible to join the incentive program until several rounds later.
(2) At times, due to uncertainties in weather conditions, network reliability, and consumer behavior, unexpected demand spikes may occur, and the load may vary with a large degree of uncertainty. Probabilistic scenario trees are adopted to accommodate the uncertain characteristics of the load profile. For instance, the electricity demand at time t may be estimated as 100 MW with probability 0.8 for the normal demand scenario, and 150 MW with probability 0.2 for the high-demand (bad) scenario. Enumeration methods can capture all possible scenarios if the problem size is not too large. Let $\Lambda$ denote the set of uncertain scenarios, and $\lambda_t$ a realized scenario in $\Lambda$ at time t. In addition, $\Lambda_B$ denotes the set of bad scenarios.
(3) It is assumed that each GenCo within the MAS framework has two bidding options: place a high bid (i.e., $b_{i,t} = 1$) or place a normal bid (i.e., $b_{i,t} = 0$), where $b_{i,t}$ is the bidding strategy of GenCo i at time t. The coefficients for the two bidding options are defined as follows:
$$\alpha_{i,t} = \begin{cases} \alpha_i^c & b_{i,t} = 0 \\ \alpha_i^h & b_{i,t} = 1 \end{cases} \qquad (3)$$
$$\beta_{i,t} = \begin{cases} \beta_i^c & b_{i,t} = 0 \\ \beta_i^h & b_{i,t} = 1 \end{cases} \qquad (4)$$
where $\alpha_i^c$, $\beta_i^c$ are the parameters of the normal bidding curve for GenCo i, and $\alpha_i^h$, $\beta_i^h$ are the parameters of the corresponding high bidding curve. Obviously, if a GenCo has accepted an incentive contract, then $b_{i,t} = 0$, $\alpha_{i,t} = \alpha_i^c$, and $\beta_{i,t} = \beta_i^c$ for $t \in \tilde t$.

3.2. Single-Period Decision-Making Model of GenCo

Based on the given menu of incentive contracts, in each time period, a GenCo tries to maximize its profit by choosing the best bidding strategy as follows:
$$\max \Pi_i(a_i) \qquad (5)$$
For a GenCo that does not accept any incentive contract, the profit is given as:
$$\Pi_i(a_i = 0) = \sum_{t \in \tilde t} \sum_{\lambda_t \in \Lambda} \big( Y \times \rho(\lambda_t) \big) \qquad (6)$$
where $Y = \Pi_i(\lambda_t, a_i = 0)$, $\rho(\lambda_t)$ represents the probability of $\lambda_t$, and $\pi_k$ is the reward specified in the incentive contract for target GenCo k.
It is usually difficult for a GenCo to know the actual bidding behavior of the others, but it is reasonable to assume that the probabilities of its competitors' decisions can be inferred from historical data. Hence $Y = \Pi_i(\lambda_t, a_i = 0)$ can be obtained as:
$$Y = \sum_{t \in \tilde t,\ \lambda_t \in \Lambda} X \qquad (7)$$
where:
$$X = \text{pos}_i(b_t^{i,C}) \prod_{j \neq i} \text{pos}_j(b_{j,t}) \big( p(\lambda_t, b_t^{i,C}) K_{i,t} - c_{i1} K_{i,t} - 0.5\, c_{i2} K_{i,t}^2 \big) + \text{pos}_i(b_t^{i,H}) \prod_{j \neq i} \text{pos}_j(b_{j,t}) \big( p(\lambda_t, b_t^{i,H}) K_{i,t} - c_{i1} K_{i,t} - 0.5\, c_{i2} K_{i,t}^2 \big) \qquad (8)$$
where $K_{i,t} = q_{i,t}(\lambda_t, b_{i,t})$ is the power output of GenCo i when its bidding action is $b_{i,t}$ in scenario $\lambda_t$, and $c_{i1}$ and $c_{i2}$ are the cost coefficients of GenCo i. $b_t^{i,C} = \{b_{1,t}, b_{2,t}, \ldots, b_{i-1,t}, 0, b_{i+1,t}, \ldots, b_{m,t}\}$ represents a bidding combination in which GenCo i places a normal bid. Note that $\text{pos}_j(b_{j,t})$ is the probability of GenCo j taking action $b_{j,t}$. Here $p(\lambda_t, b_t(a))$ is the expected electricity price when the bidding combination of the GenCos is $b_t(a)$ in scenario $\lambda_t$; it equals $p(\lambda_t, b_t^{i,C})$ when the bidding combination is $b_t^{i,C}$, and $p(\lambda_t, b_t^{i,H})$ when it is $b_t^{i,H}$.
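As an illustration of the expectation in Equation (8), the sketch below enumerates the competitors' bid combinations, weights each by its probability, and averages GenCo i's profit under the MCP that clears each combination. The helper, argument names, and data layout are assumptions; the clearing rule reuses the linear-bid solution of Equations (1) and (2).

```python
from itertools import product

def clear(alphas, betas, demand):
    # Uniform MCP for linear bids p = alpha_j + beta_j * q_j with sum_j q_j = D.
    p = (demand + sum(a / b for a, b in zip(alphas, betas))) / sum(1.0 / b for b in betas)
    return p, [(p - a) / b for a, b in zip(alphas, betas)]

def expected_profit(i, own_bid, coeffs, pos_high, costs, demand):
    """coeffs[j][b] = (alpha, beta) of GenCo j for b in {0: normal, 1: high};
    pos_high[j] = probability that GenCo j places a high bid."""
    m, total = len(coeffs), 0.0
    others = [j for j in range(m) if j != i]
    for combo in product((0, 1), repeat=m - 1):   # competitors' bid combinations
        bids, prob = [0] * m, 1.0
        bids[i] = own_bid
        for j, b in zip(others, combo):
            bids[j] = b
            prob *= pos_high[j] if b else 1 - pos_high[j]
        sel = [coeffs[j][bids[j]] for j in range(m)]
        p, q = clear([ab[0] for ab in sel], [ab[1] for ab in sel], demand)
        c1, c2 = costs[i]                         # profit: p*K - c1*K - 0.5*c2*K^2
        total += prob * (p * q[i] - c1 * q[i] - 0.5 * c2 * q[i] ** 2)
    return total
```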
For a contracted GenCo that accepts an incentive contract tailored to target GenCo k, the profit is calculated as follows:
$$\Pi_i(a_i = k) = \sum_{t \in \tilde t} \sum_{\lambda_t \in \Lambda} \rho(\lambda_t)\, \Pi_i(\lambda_t, a_i = k) + \pi_k \qquad (9)$$
Assuming the incentive contract is prescribed by the triplet $(\alpha_k, \beta_k, \pi_k)$, we have:
$$K_{i,t} \geq \frac{p_t - \alpha_k}{\beta_k} \qquad (10)$$
$$\text{pos}_i(b_t^{i,C}) = 1 \qquad (11)$$
$$\text{pos}_i(b_t^{i,H}) = 0 \qquad (12)$$
So $\Pi_i(\lambda_t, a_i = k)$ can be calculated as:
$$\Pi_i(\lambda_t, a_i = k) = \sum_{t \in \tilde t} \Big( \prod_{j \neq i} \text{pos}_j(b_{j,t}) \big( p(\lambda_t, b_t^{i,C}) K_{i,t} - c_{i1} K_{i,t} - 0.5\, c_{i2} K_{i,t}^2 \big) \Big), \quad k \in I \qquad (13)$$
subject to:
$$K_{i,t} \geq \frac{p_t - \alpha_k}{\beta_k} \qquad (14)$$
$$\text{pos}_i(b_t^{i,C}) = 1 \qquad (15)$$
$$\text{pos}_i(b_t^{i,H}) = 0 \qquad (16)$$
The optimization problem faced by a GenCo is to choose a bidding strategy that maximizes its expected profit. Hence the incentive-compatibility constraint can be formulated as:
$$IC: a_i = \arg\max \{\Pi_i(a_i = 0), \Pi_i(a_i = k)\}, \quad k \in I \qquad (17)$$
If a GenCo accepts an incentive contract, its expected profit should be higher than the alternative. Thus the personal rationality constraint can be formulated as:
$$PC: \Pi_i(a_i = 0) < \Pi_i(a_i = k) + \pi_k, \quad k \in I \qquad (18)$$
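A small sketch of the participation logic implied by constraints (17) and (18) follows; the helper names and the margin constant are assumptions used only for illustration.

```python
def accepts(profit_free, profit_contracted, reward):
    """PC, Equation (18): accept iff profit_free < profit_contracted + reward."""
    return profit_free < profit_contracted + reward

def min_reward(profit_free, profit_contracted, margin=1.0):
    """The smallest reward satisfying PC must exceed the profit the GenCo
    forgoes by bidding normally throughout the period; margin is arbitrary."""
    return max(0.0, profit_free - profit_contracted) + margin
```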

3.3. Optimization Problem of Independent System Operator

From the ISO's point of view, the goal is to design an optimal menu of incentive contracts such that the average MCP during the period remains at a relatively stable level, or the volatility of the price in the worst scenarios is mitigated, while the total electricity payment is minimized. To that end, the ISO must identify the optimal set of target GenCos (i.e., $a^0$) and design the incentive menu to attract contracted GenCos, so that its objectives are optimized. Since an incentive contract, specified by the triplet $(\alpha_i, \beta_i, \pi_i)$, depends on $a^0$, targeting suitable GenCos is the key to designing an optimal incentive menu. Hence the ISO's initial decision is to choose the optimal $a^0$ so as to minimize the total cost while maintaining price stability.
As the leader of the Stackelberg game, the ISO can analyze the responses of the followers (i.e., the GenCos) to find the optimal decision variable $a^0$. A two-level programming model is proposed to facilitate the ISO's decision-making. The first-level sub-problem enables the ISO to minimize the total cost with price stability by finding an optimal value of $a^0$. The second-level sub-problem can be treated as the GenCos' reaction model upon the release of the menu of incentive contracts from the first-level decision:
$$\min_{a^0 \in A^0} C(a^0) \qquad (19)$$
$$\min_{a^0 \in A^0} \big( (1 - \delta) \times \text{EP}(a^0) + \delta \times \text{BP}(a^0) \big) \qquad (20)$$
subject to:
$$a = a(a^0) = (a_1(a^0), a_2(a^0), \ldots, a_m(a^0)) \triangleq (a_1, \ldots, a_i, \ldots, a_m) \qquad (21)$$
$$C(a^0) = \sum_{t \in \tilde t} \sum_{\lambda_t \in \Lambda} \Big( \rho(\lambda_t)\, P(\lambda_t, a) \sum_{i=1}^{m} K_{i,t} \Big) + \sum_{i=1}^{m} \pi(a_i) \qquad (22)$$
$$\pi(a_i) = \begin{cases} \pi_k & a_i = k,\ k \in I \\ 0 & a_i = 0 \end{cases} \qquad (23)$$
$$\text{EP}(a^0) = \sum_{t \in \tilde t} \sum_{\lambda_t \in \Lambda} \rho(\lambda_t) \times P(\lambda_t, a) \qquad (24)$$
$$\text{BP}(a^0) = \sum_{t \in \tilde t} \sum_{\lambda_t \in \Lambda_B} \rho(\lambda_t) \times [P(\lambda_t, a) - \text{EP}^*]^2 \qquad (25)$$
$$\sum_{i=1}^{m} K_{i,t} = D(\lambda_t) \qquad (26)$$
$$P(\lambda_t, a) = \prod_{j \in M} \text{pos}_j(b_{j,t})\, p(\lambda_t, b_t(a)) \qquad (27)$$
$$p(\lambda_t, b_t(a)) = \alpha_{i,t} + \beta_{i,t} K_{i,t}, \quad i \in M \qquad (28)$$
$$b_t(a) = (b_{1,t}(a), b_{2,t}(a), \ldots, b_{m,t}(a)) \qquad (29)$$
$$b_{i,t}(a_i) = \begin{cases} 0 & a_i > 0 \\ 1 \text{ or } 0 & a_i = 0 \end{cases} \qquad (30)$$
where $a_i$ is obtained by solving the following:
$$IC: a_i = \arg\max \{\Pi_i(a_i(a^0))\} \qquad (31)$$
$$\text{s.t. } PC: \Pi_i(a_i = 0) < \Pi_i(a_i = k) + \pi_k, \quad k \in I \qquad (32)$$
$$K_{i,t} \geq \frac{p_t - \alpha_k}{\beta_k} \quad \text{for } a_i > 0 \qquad (33)$$
$$\text{pos}_i(b_t^{i,C}) = 1 \quad \text{for } a_i > 0 \qquad (34)$$
$$\text{pos}_i(b_t^{i,H}) = 0 \quad \text{for } a_i > 0 \qquad (35)$$
where $C(a^0)$ is the total power purchasing cost when the combination of target GenCos is $a^0$, and $\delta$ is a balance parameter. $\text{EP}(a^0)$ is the expected electricity price when the combination of target GenCos is $a^0$, and $\text{EP}^*$ is the best expected price. $\text{BP}(a^0)$ is the variance of the mean price versus $\text{EP}^*$ when the combination of target GenCos is $a^0$. Here $P(\lambda_t, a)$ is the expected MCP when the combination of contracted GenCos is $a$ in scenario $\lambda_t$. The MCP is $p(\lambda_t, b_t(a))$ when the bidding behavior of the GenCos is $b_t(a)$ in scenario $\lambda_t$.
As shown in Equation (19), one of the ISO's objectives is to minimize the electricity payment. Equation (20) is the ISO's other objective, which contains dual goals: first, minimizing the common price in the contract period, and second, minimizing the volatility of the price; the two goals are combined by a balance factor. Equation (21) indicates that $a$ is also decided by $a^0$. Equations (22) and (23) are the mathematical descriptions of the cost and the reward, respectively. Equation (24) calculates the average price in one period under multiple scenarios. Equation (25) calculates the variation of the mean price versus $\text{EP}^*$ in one period under multiple scenarios. Equation (26) ensures that the electricity demand is always satisfied. Equation (27) computes the average price by multiplying the price of a given bid combination in a specified scenario by its occurrence probability. Equations (28)–(30) provide the mathematical descriptions of $p(\lambda_t, b_t(a))$, $b_t(a)$, and $b_{i,t}(a_i)$, respectively. Equation (31) represents the GenCos' objective, which is also their incentive-compatibility constraint, with $a_i$ being the decision variable for GenCo i. Equation (32) gives the personal rationality constraint of a GenCo that is willing to accept an incentive contract. Finally, Equations (33)–(35) define the constraints of contracted GenCos, including the power output capacity of contracted GenCos and the probabilities of contracted GenCos placing high or normal bids.
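For illustration, the price-stability metrics of Equations (24) and (25) can be computed as in the sketch below; the nested-list data layout is an assumption.

```python
def ep_bp(prices, probs, is_bad, ep_star):
    """prices[t][s] and probs[t][s] give the MCP and probability of scenario s
    at time t; is_bad[t][s] flags bad scenarios. EP follows Equation (24);
    BP, Equation (25), accumulates squared deviations from EP* in bad scenarios."""
    ep = sum(probs[t][s] * prices[t][s]
             for t in range(len(prices)) for s in range(len(prices[t])))
    bp = sum(probs[t][s] * (prices[t][s] - ep_star) ** 2
             for t in range(len(prices)) for s in range(len(prices[t]))
             if is_bad[t][s])
    return ep, bp
```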
A multi-objective optimization problem can be solved by converting it into a single-objective model through appropriate weighting of each objective function. Using a weight w to combine the two objectives in Equations (19) and (20), the ISO's decision model can be expressed as:
$$\max_{a^0 \in A^0} J = w\left( \frac{C_{\max} - C(a^0)}{C_{\max} - C_{\min}} \right) + (1 - w)\left( \frac{\text{EPM}_{\max} - \text{EPM}(a^0)}{\text{EPM}_{\max} - \text{EPM}_{\min}} \right) \qquad (36)$$
subject to:
$$\text{EPM}(a^0) = (1 - \delta) \times \text{EP}(a^0) + \delta \times \text{BP}(a^0) \qquad (37)$$
$$a = a(a^0) = (a_1(a^0), a_2(a^0), \ldots, a_m(a^0)) \triangleq (a_1, \ldots, a_i, \ldots, a_m) \qquad (38)$$
$$C(a^0) = \sum_{t \in \tilde t} \sum_{\lambda_t \in \Lambda} \Big( \rho(\lambda_t)\, P(\lambda_t, a) \sum_{i=1}^{m} K_{i,t} \Big) + \sum_{i=1}^{m} \pi(a_i) \qquad (39)$$
$$\pi(a_i) = \begin{cases} \pi_k & a_i = k,\ k \in I \\ 0 & a_i = 0 \end{cases} \qquad (40)$$
$$\text{EP}(a^0) = \sum_{t \in \tilde t} \sum_{\lambda_t \in \Lambda} \rho(\lambda_t) \times P(\lambda_t, a) \qquad (41)$$
$$\text{BP}(a^0) = \sum_{t \in \tilde t} \sum_{\lambda_t \in \Lambda_B} \rho(\lambda_t) \times [P(\lambda_t, a) - \text{EP}^*]^2 \qquad (42)$$
$$\sum_{i=1}^{m} K_{i,t} = D(\lambda_t) \qquad (43)$$
$$P(\lambda_t, a) = \prod_{j \in M} \text{pos}_j(b_{j,t})\, p(\lambda_t, b_t(a)) \qquad (44)$$
$$p(\lambda_t, b_t(a)) = \alpha_{i,t} + \beta_{i,t} K_{i,t}, \quad i \in M \qquad (45)$$
$$b_t(a) = (b_{1,t}(a), b_{2,t}(a), \ldots, b_{m,t}(a)) \qquad (46)$$
$$b_{i,t}(a_i) = \begin{cases} 0 & a_i > 0 \\ 1 \text{ or } 0 & a_i = 0 \end{cases} \qquad (47)$$
where $a_i$ is obtained by solving the following:
$$IC: a_i = \arg\max \{\Pi_i(a_i(a^0))\} \qquad (48)$$
$$\text{s.t. } PC: \Pi_i(a_i = 0) < \Pi_i(a_i = k) + \pi_k, \quad k \in I \qquad (49)$$
$$K_{i,t} \geq \frac{p_t - \alpha_k}{\beta_k} \quad \text{for } a_i > 0 \qquad (50)$$
$$\text{pos}_i(b_t^{i,C}) = 1 \quad \text{for } a_i > 0 \qquad (51)$$
$$\text{pos}_i(b_t^{i,H}) = 0 \quad \text{for } a_i > 0 \qquad (52)$$
where $C_{\max}$ and $C_{\min}$ are the maximum and minimum attainable values of $C(a^0)$; $\text{EPM}(a^0)$ is a balance between price minimization and price-variation minimization when the decision variable is $a^0$; and $\text{EPM}_{\max}$ and $\text{EPM}_{\min}$ are the maximum and minimum attainable values of $\text{EPM}(a^0)$. Equations (38)–(52) are the same as Equations (21)–(35).
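A one-function sketch of the scalarized objective in Equation (36) is given below; the numerical arguments are illustrative.

```python
def objective_j(cost, epm, c_min, c_max, epm_min, epm_max, w=0.5):
    """Equation (36): min-max normalize cost and EPM onto [0, 1] and combine
    them with weight w (cost) and 1 - w (EPM); larger J is better."""
    return (w * (c_max - cost) / (c_max - c_min)
            + (1 - w) * (epm_max - epm) / (epm_max - epm_min))

print(objective_j(cost=9.0e5, epm=19.8, c_min=8.5e5, c_max=1.1e6,
                  epm_min=19.7, epm_max=20.4))  # illustrative values
```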

4. Q-Learning for Agents’ Optimal Decision Making

Each agent interacts with the volatile market environment under uncertain load and without precise knowledge of its competitors. It is therefore imperative for the agents to evolve their actions by learning from repeated bidding processes. Q-learning is a reinforcement learning method that can guide the agents to improve their decision making over time. In each period, an agent perceives the state of the market environment and takes actions based on its perception and past experience, which results in a new state; this sequential learning process reinforces its subsequent actions. Quite a few studies have been done on Q-learning, and its application in the electricity market has been reported. For instance, Rahimiyan and Mashhadi [32] proposed a fuzzy Q-learning method to model GenCos' strategic bidding behavior in a competitive market, and found that GenCos could accumulate more profit by using fuzzy Q-learning. Naghibi-Sistani et al. [33] developed a modified reinforcement learning method based on temperature variation, and applied it to the electricity market to determine the GenCos' optimal strategies. Attempts have also been made to combine Q-learning with Nash-Stackelberg games to reach a long-run equilibrium. Haddad et al. [34] incorporated a Nash-Stackelberg fuzzy Q-learning into a hierarchical and distributed learning framework for decision-making, in which mobile users are guided toward the equilibrium state that optimizes the utilities of all network participants.
In this paper, Q-learning methods are adopted by the ISO and the GenCos for decision making. Different learning algorithms are designed for the ISO and the GenCos because they have different goals. For the GenCos, both a periodic and a daily Q-learning method are applied to the bidding decision process. For the ISO, a Stackelberg-based Q-learning method is adopted to design the menu of incentive contracts in each period.

4.1. Periodic and Daily Q-Learning Methods for Generating Companies

At the start of a period, a GenCo decides whether to accept or decline the incentive contract. On each day of the period, the GenCo chooses to place a high bid or a normal bid. In particular, if a contracted GenCo places a high bid, the reward at the end of the period is forfeited. To account for the potential reward, a multi-step Q-learning method is adopted by the GenCo to decide its bidding strategy on a daily basis. Two Q-learning methods are thus proposed for the GenCo's periodic and daily decision making. The states, actions, rewards, and Q-value functions are defined as follows:

4.1.1. State Identification

State $s_{\tilde t}$ is defined for the GenCo's periodic Q-learning method; it comprises the values of all possible average electricity prices over one period.
State $s_t$ denotes the state for the GenCo's daily Q-learning method; it comprises the values of all possible average electricity prices over one day.

4.1.2. Action Selection

Let the discrete set of actions $a_{i,\tilde t} \in \{0, k\}$, $k \in I$, denote the action selection of GenCo i at the starting point of a period for the periodic Q-learning method. When $a_{i,\tilde t} = 0$, GenCo i chooses not to accept any incentive contract over period $\tilde t$. When $a_{i,\tilde t} = k$, $k \in I$, GenCo i accepts the incentive contract tailored to target GenCo k over period $\tilde t$.
$a_{i,t}$ denotes the daily action selection of GenCo i for the daily Q-learning method, and its value is 0 or 1. When $a_{i,t} = 1$, GenCo i adopts a normal bidding strategy; when $a_{i,t} = 0$, GenCo i adopts a high bidding strategy.
When $a_{i,\tilde t} \neq 0$, GenCo i places normal bids on each day of period $\tilde t$. Hence, for a contracted GenCo, $a_{i,t} = 1$ with high probability.

4.1.3. Reward Calculation

The periodic reward function for the Q-learning method over period $\tilde t$ is defined as:
$$r(s_{\tilde t}, a_{i,\tilde t}) = \sum_{t \in \tilde t} \Pi_{i,t}(a_{i,t}) + R(a_{i,\tilde t}) \qquad (53)$$
Equation (53) represents the reward assigned to action $a_{i,\tilde t}$ taken from the old state $s_{\tilde t}$. If $a_{i,\tilde t} = 0$, meaning the menu of incentive contracts is not accepted over period $\tilde t$, then $R(a_{i,\tilde t}) = 0$. If $a_{i,\tilde t} = k$, $k \in I$, meaning GenCo i accepts the incentive contract tailored to target GenCo k over period $\tilde t$, an amount $R(a_{i,\tilde t})$ is received as the reward for meeting the incentive contract. The reward further influences the periodic Q-value, which guides the GenCo in deciding whether to accept the incentive menu in the next period.
Every day the GenCo agent evaluates the current state, and chooses the best action that optimizes its objectives. Then the current state evolves to the new state, with a transition probability, and the agent receives a reward.
The reward r for daily Q-learning is made up of two parts. The first is the direct profit, which depends on all GenCo agents' bidding behaviors, the loads, and the GenCo agent's own costs. The second is a portion of the expected reward if the GenCo accepts the incentive contract. If a GenCo agent accepts the incentive menu and fulfills the contractual obligations over the contract period, the reward is obtained at the end of the period, i.e., it is a delayed reward. A multi-step reward function that captures this delayed reward is defined for the GenCo agent's daily Q-learning as follows:
$$r(s_t, a_{i,t}) = \Pi_{i,t}(a_{i,t}) + R(a_{i,t}) \qquad (54)$$
subject to:
$$R(a_{i,t}) = \begin{cases} \Gamma & \text{for } \prod_{t \in \tilde t} a_{i,t} = 1 \\ 0 & \text{for } \prod_{t \in \tilde t} a_{i,t} = 0 \end{cases} \qquad (55)$$
where:
$$\Gamma = T_s \frac{1}{T}\pi(a_{i,t}) + (T - T_s)\Big( \varphi \frac{1}{T}\pi(a_{i,t}) + \varphi^2 \frac{1}{T}\pi(a_{i,t}) + \cdots + \varphi^{T - T_s} \frac{1}{T}\pi(a_{i,t}) \Big) = \frac{1}{T}\pi(a_{i,t})\Big[ T_s + (T - T_s)\sum_{i=1}^{T - T_s} \varphi^i \Big], \quad T_s = 1, 2, \ldots, T \qquad (56)$$
where $\varphi$ is a discount factor, $\pi(a_{i,t})$ is the reward for GenCo i meeting the incentive contract terms at time t, T is the total number of days in a contract period, and $T_s$ is the number of days elapsed in the period.
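The delayed reward Γ can be computed directly from Equation (56), as in the sketch below; the numerical arguments are illustrative.

```python
def gamma_reward(pi, T, T_s, phi):
    """Equation (56): credit the per-day share pi/T for the T_s elapsed days,
    and discount the remaining T - T_s days by powers of phi."""
    per_day = pi / T
    discounted_tail = sum(phi ** i for i in range(1, T - T_s + 1))
    return per_day * (T_s + (T - T_s) * discounted_tail)

print(gamma_reward(pi=7000.0, T=7, T_s=3, phi=0.9))  # illustrative values
```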

4.1.4. Q-Value Update

Through Q-learning, using the Bellman optimality updates in Equations (57) and (58), each GenCo agent tries to find the optimal action that maximizes the Q-value of each state in the long run.
The periodic Q-value function defined for GenCo i over period $\tilde t$ is given as follows:
$$Q_{\tilde t + 1}(s_{\tilde t}, a_{i,\tilde t}) = Q_{\tilde t}(s_{\tilde t}, a_{i,\tilde t}) + \epsilon_{\tilde t}\Big[ r(s_{\tilde t}, a_{i,\tilde t}) + \gamma_{\tilde t} \max_{a_{i,\tilde t + 1}} Q(s_{\tilde t + 1}, a_{i,\tilde t + 1}) - Q_{\tilde t}(s_{\tilde t}, a_{i,\tilde t}) \Big] \qquad (57)$$
where $\epsilon_{\tilde t}$ is a positive learning rate at period $\tilde t$, and $\gamma_{\tilde t}$ is a discount parameter at period $\tilde t$.
These action-state value functions $Q_{\tilde t + 1}(s_{\tilde t}, a_{i,\tilde t})$ (i = 1, ..., m), which are strongly shaped by the reward function in Equation (53), determine the GenCo agents' most suitable actions for the next run. That is, if the Q-value for accepting the incentive menu is less than the Q-value for not accepting it, the GenCo agent will not accept the incentive menu; otherwise, it will. The daily Q-value function defined for GenCo i on each day is given as follows:
$$Q_{t+1}(s_t, a_{i,t}) = Q_t(s_t, a_{i,t}) + \epsilon_t\Big[ r(s_t, a_{i,t}) + \gamma_t \max_{a_{i,t+1}} Q(s_{t+1}, a_{i,t+1}) - Q_t(s_t, a_{i,t}) \Big] \qquad (58)$$
where $\epsilon_t$ is a positive learning rate on day t, and $\gamma_t$ is a discount parameter on day t.
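A minimal tabular implementation of the daily update in Equation (58) is sketched below; the epsilon-greedy exploration rule is an assumption, as the action-selection policy is not specified above.

```python
import random
from collections import defaultdict

Q = defaultdict(float)  # Q-values keyed by (state, action); zero-initialized

def q_update(s, a, r, s_next, actions=(0, 1), lr=0.1, gamma=0.9):
    """Equation (58): Q <- Q + lr * (r + gamma * max_a' Q(s', a') - Q)."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += lr * (r + gamma * best_next - Q[(s, a)])

def choose_action(s, actions=(0, 1), eps=0.1):
    """Assumed epsilon-greedy policy: explore with probability eps."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])
```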

4.2. Q-Learning for the Leader of the Stackelberg Game (Independent System Operator)

4.2.1. State Identification

State $s_{\tilde t} = \{(s_{\tilde t}, a^0)\}$ for the ISO's Q-learning method is composed of two state variables: the values of all possible average electricity prices during the period, and the decision variable for the menu of incentive contracts.

4.2.2. Action Selection

The set $A^0$ of all possible combinations of target GenCos is defined as the ISO agent's action set. The ISO takes an action at each step, i.e., at the starting point of each period. $(a^0)_{\tilde t}$ denotes the action selected by the ISO in period $\tilde t$.

4.2.3. Reward Calculation

The reward function $r(s_{\tilde t}, (a^0)_{\tilde t})$ is given by:
$$r(s_{\tilde t}, (a^0)_{\tilde t}) = w\left( \frac{C_{\max} - C((a^0)_{\tilde t})}{C_{\max} - C_{\min}} \right) + (1 - w)\left( \frac{\text{EPM}_{\max} - \text{EPM}((a^0)_{\tilde t})}{\text{EPM}_{\max} - \text{EPM}_{\min}} \right) \qquad (59)$$
$$\text{s.t. } a = a[(a^0)_{\tilde t}] = \big[ a_{1,\tilde t}((a^0)_{\tilde t}), a_{2,\tilde t}((a^0)_{\tilde t}), \ldots, a_{m,\tilde t}((a^0)_{\tilde t}) \big] \triangleq (a_{1,\tilde t}, a_{2,\tilde t}, \ldots, a_{m,\tilde t}) \qquad (60)$$
$$a_{i,\tilde t} = \arg\max \big( Q_{\tilde t + 1}(s_{\tilde t}, a_{i,\tilde t}) \big) \qquad (61)$$
Equations (59)–(61) illustrate that $a_{i,\tilde t}$, GenCo i's action in period $\tilde t$, depends on its own Q-learning; the reward of the ISO is therefore obtained by a Stackelberg-based Q-learning method.

4.2.4. Q-Value Update

As the leader of the Stackelberg game, the ISO updates its Q-values according to the following Q-learning algorithm:
$$Q^0_{\tilde t + 1}(s_{\tilde t}, (a^0)_{\tilde t}) = Q^0_{\tilde t}(s_{\tilde t}, (a^0)_{\tilde t}) + \epsilon_{\tilde t}\Big[ r_{\tilde t}(s_{\tilde t}, (a^0)_{\tilde t}) + \gamma \max_{(a^0)_{\tilde t + 1}} Q^0(s_{\tilde t + 1}, (a^0)_{\tilde t + 1}) - Q^0_{\tilde t}(s_{\tilde t}, (a^0)_{\tilde t}) \Big] \qquad (62)$$

4.3. Solution Methodology for Independent System Operator’s Initial Q Value

For problems with multiple decision variables, chaos search is more capable of hill-climbing and escaping local optima than random search [35]. Hence a chaos optimization algorithm is proposed to solve the problem.
The detailed procedure of the algorithm is given as follows:
Step 1:
Set the initial parameters, including the GenCos' bidding coefficients and their power capacities.
Step 2:
Set $\nu = 1$.
Step 3:
Generate a non-zero chaos variable $\eta_{\nu+1}$ using the cube mapping method shown below:
$$\eta_{\nu + 1} = 4\eta_\nu^3 - 3\eta_\nu \qquad (63)$$
Step 4:
Decode the chaos variable into a binary variable that represents a candidate set of target GenCos.
Step 5:
Calculate the tailored values of $(\alpha_i, \beta_i, \pi_i)$ for all target GenCos using Equation (18).
Step 6:
For the designed menu of incentive contracts, check each GenCo's optimal reaction by solving Equation (17).
Step 7:
Calculate the corresponding objective value and the state $s_{\tilde t}$ for the ISO's Q-learning; the latter includes the mean electricity price during the period and the menu of incentive contracts. If the obtained objective value is larger than the incumbent, replace the incumbent.
Step 8:
Substitute the chaos variables into Equation (64) to yield new chaos variables:
$$x_\nu = c_\nu [d_\nu \eta_{\nu + 1}] \qquad (64)$$
where $c_\nu$ and $d_\nu$ are two constant vectors, and $[d_\nu \eta_{\nu+1}]$ is the integer part of $d_\nu \eta_{\nu+1}$.
Step 9:
Set $\nu = \nu + 1$, $k = k + 1$.
Step 10:
If $\nu > \nu_{\max}$, stop searching; otherwise go to Step 4.
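A minimal sketch of the chaos iteration in Equations (63) and (64) follows. The initial value $\eta_0$ and the constants c and d are placeholders, and the decoding of the resulting integers into a binary vector of target GenCos (Step 4) is omitted.

```python
def chaos_sequence(eta0=0.323, c=1.0, d=31.0, steps=5):
    """Iterate the cube map eta <- 4*eta^3 - 3*eta, which keeps eta in [-1, 1],
    and decode each value as x = c * [d * eta], with [.] the integer part."""
    eta, xs = eta0, []
    for _ in range(steps):
        eta = 4 * eta ** 3 - 3 * eta
        xs.append(c * int(d * eta))  # placeholder decode per Equation (64)
    return xs

print(chaos_sequence())  # e.g. a sequence of integers in [-31, 31]
```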

5. Simulations and Analysis

The simulation is performed in a day-ahead electricity market with five participating GenCos. The initial probability of a GenCo bidding high or normal is 0.5. The hourly electricity demand varies between 170 MW and 230 MW. The probability of high-demand scenarios, also termed bad scenarios, is less than 0.2. The length of a contract can range from several days to a couple of months; for computational convenience, it is first assumed that each period consists of seven days (one week).
The parameters of the GenCos' bidding curves are listed in Table 2, the GenCos' cost parameters in Table 3, and the weights of the two objectives in Table 4. Three cases are investigated for comparative analysis.
Case 1:
no menu of incentive contracts or Q-learning.
Case 2:
menu of incentive contracts without Q-learning in one period. Eight sub-cases are further analyzed, and the comparative results are listed in Table 5 and Table 6.
Case 3:
menu of incentive contracts with Q-learning in multiple periods. Note that load demand over the multiple periods varies between 170 MW and 230 MW.
Table 2. Parameters of GenCos' bidding curves (unit for αi,tc, αi,th: $/MW per hour; unit for βi,tc, βi,th: $/(MW)² per hour).

GenCo No. | Case 1, Cases 2.1–2.5, Case 3 (αi,tc, βi,tc, αi,th, βi,th) | Cases 2.6–2.8 (αi,tc, βi,tc, αi,th, βi,th)
1 | 10, 0.5, 10.5, 0.525 | 10, 0.5, 10.5, 0.525
2 | 11, 0.8, 12, 0.84 | 11, 0.8, 12, 0.84
3 | 8, 0.6, 8.4, 0.63 | 8, 0.9, 8.4, 0.945
4 | 15, 0.5, 15.75, 0.525 | 15, 0.5, 15.75, 0.525
5 | 20, 0.9, 21, 0.945 | 20, 0.6, 21, 0.63
Table 3. GenCos' cost parameters (unit for ci1: $/MW; unit for ci2: $/(MW)²).

GenCo No. | Cases 2.1–2.4 and 2.6–2.8 (ci1, ci2) | Case 2.5 (ci1, ci2)
1 | 10, 0.5 | 5, 0.25
2 | 11, 0.55 | 6, 0.35
3 | 8, 0.5 | 4, 0.25
4 | 15, 0.8 | 7, 0.4
5 | 20, 0.9 | 10, 0.45
Table 4. Objectives of Cases 2.1–2.8 (weights on cost and EPM).

Cases | Cost | EPM (0.5 × EP + 0.5 × BP)
2.1, 2.5, 2.6 | 1 | 0
2.2, 2.7 | 0 | 1
2.3, 2.8, 3 | 0.5 | 0.5
2.4 | Without menu of incentive contracts
Table 5. Comparative results for Cases 2.1–2.3 and 2.5 (unit: $). (Note: Y = saying "yes" to the offer of the incentive contract menu and N = saying "no".)

Items | Case 2.1 | Cases 2.2 and 2.3 | Case 2.5
Target GenCos | 10001 | 10011 | 01011
GenCos' response | YYYNY | YYYYY | YYYYY
Expected reward | 208,750 | 543,530 | 123,070
Expected cost saving (compared with Case 2.4) | 597,830 | 512,120 | 932,580
Expected price (EP) | 37.17 | 36.94 | 36.94
BP (mean price variance in bad scenarios) | 2.78 | 2.53 | 2.53
EPM (0.5 × EP + 0.5 × BP) | 19.97 | 19.74 | 19.74
Table 6. Comparative results for Cases 2.6–2.8 (unit: $; N/A = not applicable).

Items | Case 2.6 | Cases 2.7 and 2.8
Target GenCos | 00101 | 00111
GenCos' response | YYYNY | YYYYY
Expected reward | 215,980 | 581,130
Expected cost saving (compared with Case 2.9) | 607,780 | 497,270
EP | 38 | 36.94
BP (mean price variance in bad scenarios) | 2.78 | 2.53
EPM (0.5 × EP + 0.5 × BP) | 20.39 | 20.15
Threshold for GenCo 1's power output | 6558 | 6477
Threshold for GenCo 2's power output | 7145 | 7091
Threshold for GenCo 3's power output | 7145 | 7091
Threshold for GenCo 4's power output | N/A | 9852
Threshold for GenCo 5's power output | 6558 | 6477
Case 2.1 aims at minimizing cost. The cost in Case 2.1 is lower than in Cases 2.2 and 2.3, but EP and BP in Case 2.1 are larger than in Cases 2.2 and 2.3.
Case 2.5 also aims at minimizing cost, but because the GenCos have smaller cost coefficients, they can earn more profit than in Cases 2.1–2.4. The GenCos in Case 2.5 therefore prefer normal bids, since these yield more power output and hence more profit, so both the cost and the EPM can be minimized.
Since GenCo 3 has higher bidding coefficients in Cases 2.6–2.8, it has more influence on the MCP than in Cases 2.1–2.5. Hence GenCo 3 is more likely to be a target GenCo in Cases 2.6–2.8 than in Cases 2.1–2.5.
Figure 3 and Figure 4 show the simulation results of variations in price and cost across 112 days in Cases 1 and 3.
Figure 3. Comparative results of price variation for Cases 1 and 3 (for 112 days).
It can be seen that the variations of electricity price and cost are reduced in the long term provided the incentive contract is adopted. In the early phases, the effect of the incentive contract is not obvious, as the daily price and the daily electricity purchase cost do not decrease significantly in Case 3. However, as time evolves, the GenCos enhance their bidding experience by learning from past bidding processes and realize that accepting the incentive contract can improve their profitability; they become more interested in participating in the incentive program. As a result, the electricity price is kept at a low and stable level.
Figure 4. Comparative results of cost variation for Cases 1 and 3 (for 112 days).
Figure 5 and Figure 6 show the simulated variations of price and cost across 112 days in Cases 2 and 3. In the early periods, the cost in Case 3 may be higher than in Case 2 over a certain number of days, and the price in Case 3 is higher than in Case 2. As the GenCos and the ISO accumulate bidding experience through Q-learning, both players make better decisions in Case 3, and the cost and the price fall below those of Case 2.
Figure 5. Comparative results of price variation for Cases 2 and 3 (for 112 days).
Figure 6. Comparative results of cost variation for Cases 2 and 3 (for 112 days).
Figure 7 and Figure 8 show the comparative results of average price variation and average cost variation for the three cases, respectively. Based on Case 3, it can be seen that both the cost and the price can be reduced and remain stable in the long run.
Extending the contract period to 14 days, the comparative results for a duration of 224 days (i.e., 16 periods) are shown in Figure 9; the price variation in the electricity market with the incentive mechanism is smaller than in the market without it. In fact, the price variance is 0.242 in the former market versus 0.270 in the latter, and the average price is 38.33 versus 38.50 (price unit: $/MWh).
Figure 7. Comparative results of average price variation for 3 Cases (for 16 periods).
Figure 8. Comparative results of average cost variation (for 16 periods).
Figure 9. Comparative results of price variation (for 224 days).
Extending the contract period to 28 days, the comparative results for 448 days (i.e., 16 periods) are shown in Figure 10. Again, the price variation in the market with the incentive mechanism is smaller than in the market without it: the price variance is 0.252 in the former versus 0.310 in the latter, and the average price is 37.86 versus 38.43 (price unit: $/MWh).
Figure 10. Comparative results of price variation (for 448 days).

6. Conclusions

In this paper, a menu of incentive contracts is presented within a Stackelberg game model, with the aim of finding an incentive bidding mechanism that keeps the electricity price at a low and stable level. To ensure market equilibrium, a Stackelberg game based Q-learning is proposed for the ISO to analyze the responses of GenCos to the market and to search for the optimal menu of incentive contracts. For the GenCos, a periodic Q-learning method is adopted to determine whether the incentive menu should be accepted, and a multi-step Q-learning method is adopted to decide the daily bidding policy. Based on the multi-agent platform, the long-term effectiveness of the incentive program is validated over horizons of up to 14 months using simulation methods. Numerical results show that an incentive menu, suitably designed by the ISO from the perspective of a central planner, can lead to desirable bidding behavior of GenCos and hence help sustain the market. When multiple types of Q-learning methods are adopted by the ISO and the GenCos for decision making, both the electricity price and the purchasing cost can be reduced, and a desirable trade-off between price variation and purchasing cost can be reached at equilibrium. Future efforts could be directed to analyzing GenCos' reactions to the menu of incentive contracts under different risk preferences or generation uncertainties with wind and solar power integration.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (Grant No. 71201097) and the Action Plan for Scientific and Technological Innovation Program of the Science and Technology Commission of Shanghai (Grant No. 15511109700). We would like to thank the anonymous reviewers and the editor for their valuable time and constructive comments on the original manuscript.

Author Contributions

This paper designs an incentive contract menu to achieve long-term stability for electricity prices in a day-ahead electricity market. A bi-level Stackelberg game model is proposed to search for the optimal incentive mechanism under a one-leader and multi-followers gaming framework. A multi-agent simulation platform was developed to investigate the effectiveness of the incentive mechanism using an independent system operator (ISO) and multiple power generating companies (GenCos). Further, a Q-learning approach was implemented to analyze and assess the responses of GenCos to the incentive menu.

Conflicts of Interest

The authors declare no conflict of interest.

Notations

N1. Set Parameters

A0
Set of all possible a0, which denotes a combination of target GenCos.
M
Set of all serial numbers of GenCos.
I
Set of target GenCos, $I = \{k \mid a_k^0 = 1, k \in M\}$.
Λ
Set of uncertain scenarios during a bidding period.
ΛB
Set of all bad scenarios.
(AL, B, π)
Data set for the menu of incentive contracts.

N2. Decision Variables

$a_i^0$
Whether GenCo i is chosen as a target GenCo: $a_i^0 = 1$ is "yes" and $a_i^0 = 0$ is "no".
$a^0$
$a^0 = (a_1^0, a_2^0, \ldots, a_m^0)$, $a^0 \in A^0$.
$a_i$
Whether GenCo i accepts the incentive menu: $a_i = 0$ means "no", and $a_i \neq 0$ means "yes". Moreover, if $a_i \neq 0$, then $a_i = k$, $k \in I$, meaning GenCo i accepts the incentive contract with GenCo k as the target GenCo.
$a$
Combination of the GenCos' strategies, $a = (a_1, a_2, \ldots, a_m)$, $i \in M$.

N3. Model Parameters

C(a0)
Total electricity purchasing cost when the combination of the target GenCos is a0.
π (ai)
Award received by GenCo i.
t
Time interval in one period.
$\tilde t$
Length of the contract period.
λt
A certain scenario in Λ at time t.
ρt)
Probability of λt.
EP(a0)
Expected electricity price when the combination of the target GenCos is a0.
EP*
The best expected price.
δ
Balance parameter.
w
Weight of the objective functions.
BP(a0)
Variance of mean price in bad scenarios versus EP* when the set of the target GenCos is a0.
EPM
A representative symbol of the model objective which combines the expected price (EP) and robustness of the price (BP) with a balance factor.
$P(\lambda_t, a)$
In scenario $\lambda_t$, the expected market price when the set of the GenCos' decisions on the incentive contracts is $a$.
m
Number of GenCos.
Ki,t
Power output of GenCo i when the bidding combination of GenCos is bt(a) in scenario λt.
αit, βit
Parameters of GenCo i’s bidding curve at time t.
bit
Bidding strategy of GenCo i at time t. bit = 1 means a high bidding; and bit = 0 implies a normal bidding.
posi(bit)
Probability for GenCo i to make a bid bit at time t.
Пi(ai)
Expected profit of GenCo i when its contract decision is ai.
Пi(λt, ai)
Expected profit of GenCo i in scenario $\lambda_t$. When $a_i = 0$, GenCo i does not accept the menu of incentive contracts; when $a_i = k \neq 0$, GenCo i accepts the incentive contract tailored to the target GenCo k ($k = a_i$).
$b_t^{i,C}$
$b_t^{i,C} = \{b_{1t}, b_{2t}, \ldots, b_{i-1,t}, 0, b_{i+1,t}, \ldots, b_{mt}\}$.
$b_t^{i,H}$
$b_t^{i,H} = \{b_{1t}, b_{2t}, \ldots, b_{i-1,t}, 1, b_{i+1,t}, \ldots, b_{mt}\}$.
$b_t(a)$
Combination of the GenCos' bidding strategies, $b_t(a) = (b_{1,t}(a_1), b_{2,t}(a_2), \ldots, b_{m,t}(a_m))$.
p(λt, bt(a))
The expected electricity price when the bidding combination of GenCos is bt(a). The value of bt(a) could be bti,c or bti,H.
D(λt)
Electricity demand in scenario λt.
ci1, ci2
Cost coefficients of GenCo i.
i, βi, πi)
Parameters of an incentive contract. Note αi and βi represent the bidding coefficients of a target GenCo, and πi denotes per-period reward.
pit
Bidding price of GenCo i at time t.
pt
MCP at time t.
qit
Bidding power output of GenCo i at time t.
Dt
Electricity demand at time t.
αic, βic
Parameters of the normal bidding curve for GenCo i.
αih, βih
Parameters of the high bidding curve for GenCo i.

N4. Q-Learning Parameters

$s_{\tilde t}$
State for the GenCo's periodic Q-learning method.
$s_t$
State for the GenCo's daily Q-learning method.
$a_{i,\tilde t}$
Periodic action selection of GenCo i at the starting point of a period for the GenCo's Q-learning method.
$a_{i,t}$
Daily action selection of GenCo i for the GenCo's Q-learning method.
$r(s_{\tilde t}, a_{i,\tilde t})$
Periodic reward function for the Q-learning method.
$R(a_{i,\tilde t})$
Reward obtained by GenCo i over period $\tilde t$.
$r(s_t, a_{i,t})$
Daily reward function for the Q-learning method.
$R(a_{i,t})$
Reward obtained by GenCo i on a given day.
$\varphi$
Discount factor.
T
Number of days in the contract period.
$T_s$
Number of days elapsed over a contract period $\tilde t$.
$Q_{\tilde t+1}(s_{\tilde t}, a_{i,\tilde t})$
Periodic Q-value function defined for GenCo i.
$Q_{t+1}(s_t, a_{i,t})$
Daily Q-value function defined for GenCo i.
$\epsilon_{\tilde t}$
Positive learning rate for the periodic Q-learning function.
$\epsilon_t$
Positive learning rate for the daily Q-learning function.
$\gamma_{\tilde t}$
Discount parameter for the periodic Q-learning function.
$\gamma_t$
Discount parameter for the daily Q-learning function.
$s_{\tilde t} = \{(s_{\tilde t}, a^0)\}$
State for the ISO's Q-learning function.
$(a^0)_{\tilde t}$
Periodic action selection of the ISO.
$r(s_{\tilde t}, (a^0)_{\tilde t})$
Periodic reward calculation for the ISO's Q-learning function.
$Q^0_{\tilde t+1}(s_{\tilde t}, (a^0)_{\tilde t})$
Periodic Q-value function defined for the ISO.

N5. Algorithms Parameters

$\nu$
Iteration counter.
$\eta_\nu$
Non-zero chaos variable.
$c_\nu$, $d_\nu$
Constant vectors.
$\nu_{\max}$
Maximum number of iterations.

References

  1. Zhang, D.; Wang, Y.; Luh, P.B. Optimization based bidding strategies in the deregulated market. IEEE Trans. Power Syst. 2000, 15, 981–986. [Google Scholar] [CrossRef]
  2. Kian, A.R.; Cruz, J.B. Bidding strategies in dynamic electricity markets. Decis. Support Syst. 2005, 40, 543–551. [Google Scholar] [CrossRef]
  3. Swider, D.J.; Weber, C. Bidding under price uncertainty in multi-unit pay-as-bid procurement auctions for power systems reserve. Eur. J. Oper. Res. 2007, 181, 1297–1308. [Google Scholar] [CrossRef]
  4. Centeno, E.; Renese, J.; Barquin, J. Strategic analysis of electricity markets under uncertainty: A conjectured-price-response approach. IEEE Trans. Power Syst. 2007, 22, 423–432. [Google Scholar] [CrossRef]
  5. Sahraei-Ardakani, M.; Rahimi-Kian, A. A dynamic replicator model of the players’ bid in an oligopolistic electricity market. Electr. Power Syst. Res. 2009, 79, 781–788. [Google Scholar] [CrossRef]
  6. Li, G.; Shi, J. Agent-based modeling for trading wind power with uncertainty in the day-ahead wholesale electricity markets of single-sided auctions. Appl. Energy 2012, 99, 13–22. [Google Scholar] [CrossRef]
  7. Nojavan, S.; Zare, K. Risk-based optimal bidding strategy of generation company in day-head electricity market using information gap decision theory. Int. J. Electr. Power Energy Syst. 2013, 48, 83–92. [Google Scholar] [CrossRef]
  8. Qiu, Z.; Gui, N.; Deconick, G. Analysis of equilibrium-oriented bidding strategies with inaccurate electricity market models. Int. J. Electr. Power Energy Syst. 2013, 46, 306–314. [Google Scholar] [CrossRef]
  9. Kardakos, E.G.; Simoglou, C.K.; Bakirtzis, A.G. Optimal bidding strategy in transmission-constrained electricity markets. Electr. Power Syst. Res. 2014, 109, 141–149. [Google Scholar] [CrossRef]
  10. Anderson, E.J.; Cau, T.D.H. Implicit collusion and individual market power in electricity markets. Eur. J. Oper. Res. 2011, 211, 403–414. [Google Scholar] [CrossRef]
  11. Nam, Y.W.; Yoon, Y.T.; Hur, D.; Park, J.; Kim, S. Effects of long-term contracts on firms exercising market power in transmission constrained electricity markets. Electr. Power Syst. Res. 2006, 76, 435–444. [Google Scholar] [CrossRef]
  12. David, A.K.; Wem, F.S. Market power in electricity supply. IEEE Trans. Energy Convers. 2001, 16, 352–360. [Google Scholar] [CrossRef]
  13. Oh, S.; Hildreth, A.J. Decisions on energy demand response option contracts in smart grids based on activity-based costing and stochastic programming. Energies 2013, 6, 425–443. [Google Scholar] [CrossRef]
  14. Faria, P.; Vale, Z.; Baptista, J. Demand response programs design and use considering intensive penetration of distributed generation. Energies 2015, 9, 6230–6246. [Google Scholar] [CrossRef]
  15. Ghazvini, M.A.F.; Soares, J.; Horta, N.; Neves, R.; Castro, R.; Vale, Z. A multi-objective model for scheduling of short-term incentive-based demand response programs offered by electricity retailers. Appl. Energy 2015, 151, 102–118. [Google Scholar] [CrossRef]
  16. Ghazvini, M.A.F.; Faria, P.; Ramos, S.; Morais, H.; Vale, Z. Incentive-based demand response programs designed by asset-light electricity providers for the day-ahead market. Energy 2015, 82, 786–799. [Google Scholar] [CrossRef]
  17. Zhong, H.; Xie, L.; Xia, Q. Coupon incentive-based demand response: Theory and case study. IEEE Trans. Power Syst. 2013, 28, 1266–1276. [Google Scholar] [CrossRef]
  18. Fakhrazari, A.; Vakilzadian, H.; Choobineh, F.F. Optimal energy scheduling for a smart entity. IEEE Trans. Smart Grid 2014, 5, 2919–2928. [Google Scholar] [CrossRef]
  19. Christopher, O.A.; Wang, L. Smart charging and appliance scheduling approaches to demand side management. Int. J. Electr. Power Energy Syst. 2014, 57, 232–240. [Google Scholar]
  20. Yousefi, S.; Moghaddam, M.P.; Majd, V.J. Optimal real time pricing in an agent-based retail market using a comprehensive demand response model. Energy 2011, 36, 5716–5727. [Google Scholar] [CrossRef]
  21. Shariatazadeh, F.; Mandal, P.; Srivastava, A.K. Demand response for sustainable energy systems: A review, application and implementation strategy. Renew. Sustain. Energy Rev. 2015, 45, 343–350. [Google Scholar] [CrossRef]
  22. Gu, W.; Yu, H.; Liu, W.; Zhu, J.; Xu, X. Demand response and economic dispatch of power systems considering large-scale plug-in hybrid electric vehicles/electric vehicles (PHEVs/EVs): A review. Energies 2013, 6, 4394–4417. [Google Scholar] [CrossRef]
  23. Bradley, P.; Leach, M.; Torriti, J. A review of the costs and benefits of demand response for electricity in the UK. Energy Policy 2013, 52, 312–327. [Google Scholar] [CrossRef]
  24. Silva, C.; Wollenberg, B.F.; Zheng, C.Z. Application of mechanism design to electric power markets. IEEE Trans. Power Syst. 2001, 16, 1–8. [Google Scholar] [CrossRef]
  25. Liu, Z.; Zhang, X.; Lieu, J.; Li, X.; He, J. Research on incentive bidding mechanism to coordinate the electric power and emission-reduction of the generator. Int. J. Electr. Power Energy Syst. 2010, 32, 946–955. [Google Scholar] [CrossRef]
  26. Cai, X.; Li, C.; Lu, Y. Price cap mechanism for electricity market based on constraints of incentive compatibility and balance accounts. Power Syst. Technol. 2011, 35, 143–148. [Google Scholar]
  27. Heine, K. Inside the black box: Incentive regulation and incentive channeling on energy markets. J. Manag. Gov. 2013, 17, 157–186. [Google Scholar] [CrossRef]
  28. Weber, J.D.; Overbye, T.J. A two-level optimization problem for analysis of market bidding strategies. In Proceedings of the IEEE Power Engineering Society Summer Meeting, Edmonton, AB, Canada, 18–22 July 1999; Volume 2, pp. 682–687.
  29. Lei, W.; Shahidehpour, M.; Zuyi, L. Comparison of scenario-based and interval optimization approaches to stochastic SCUC. IEEE Trans. Power Syst. 2012, 27, 913–921. [Google Scholar]
  30. Wang, B.; Yang, X.; Li, Q. Bad-scenario set risk-resisting robust scheduling model. Acta Autom. Sin. 2012, 38, 270–278. [Google Scholar] [CrossRef]
  31. North, M.J.; Collier, N.T.; Vos, J.R. Experiences creating three implementations of the repast agent modeling toolkit. ACM Trans. Model. Comput. Simul. 2006, 16, 1–25. [Google Scholar] [CrossRef]
  32. Rahimiyan, M.; Mashhadi, H.R. An adaptive Q-learning algorithm developed for agent-based computational modeling of electricity market. IEEE Trans. Syst. Man Cybern. C Appl. Rev. 2010, 40, 547–556. [Google Scholar] [CrossRef]
  33. Naghibi-Sistani, M.B.; Akbarzadeh-Tootoonchi, M.R.; Bayaz, M.H.J.D.; Rajabi-Mashhadi, H. Application of Q-learning with temperature variation for bidding strategies in market based power systems. Energy Convers. Manag. 2006, 47, 1529–1538. [Google Scholar] [CrossRef]
  34. Haddad, M.; Altmann, Z.; Elayoubi, S.E.; Altaman, E. A Nash-Stackelberg fuzzy Q-learning decision approach in heterogeneous cognitive networks. In Proceedings of the IEEE Global Telecommunications Conference, Miami, FL, USA, 6–10 December 2010.
  35. Zuo, X.Q.; Fan, Y.S. A chaos search immune algorithm with its application to neuro-fuzzy controller design. Chaos Solitons Fractals 2006, 30, 94–109. [Google Scholar] [CrossRef]
