A Stackelberg Game Approach for Price Response Coordination of Thermostatically Controlled Loads

In this paper, we study the demand response of the thermostatically controlled loads (TCLs) to control their set-point temperatures by considering the tradeoff between the electricity payment and TCL user’s comfort preference. Based upon the dynamics of the TCLs, we set up the relationship between the set-point temperature and the energy demand. Then, we define a discomfort function with respect to the associated energy demand which represents the discomfort level of the set-point temperature. More specifically, the system is equipped with a coordinator named electric energy control center (EECC) which can buy energy resources from the electricity market and sell them to TCL users. Due to the interaction between EECC and TCL users, we formulate the specific energy trading process as a one-leader multiple-follower Stackelberg game. As the main contributions of this work, we show the existence and uniqueness of the equilibrium for the underlying Stackelberg games, and develop a DR algorithm based on the so-called Backward Induction to achieve the equilibrium. Several numerical simulations are presented to verify the developed results in this work.


Introduction
Demand response (DR) can be defined as a program, which induces the end-users to adjust their energy usage in response to changes in the electricity price over time [1,2]. Rapid growth of energy demand has greatly increased the supply burden of the power system. In addition, reliable operation of the system necessitates a perfect balance between supply and demand in real time, which is not easy to achieve because both of them can change rapidly and unexpectedly. Based on the advanced information technologies, DR has been considered as a promising way to resolve these emerging challenges and achieve potential cost saving [1,3,4].
Thermostatically controlled loads (TCLs), which make up a large fraction of the flexible demand in the power grid, offer significant potential for DR [5,6]. They use local hysteresis control to maintain the internal temperature within a dead-band around the set-point temperature. Real-time pricing (RTP) is one of the most important DR programs, where the price rates vary continuously to reflect changes in wholesale market demand. Because of its high efficiency gains from a long-term perspective [7], many works have applied the RTP program to manage the flexible electric demand in the power grid, e.g., [8-11]. In this paper, we also specify an RTP-based DR program that coordinates the set-point temperatures of TCLs, balancing the electricity payment against the comfort preference of the TCL users.
To coordinate TCLs, a model of the TCLs must first be specified. Based upon the dynamics of a TCL [12,13], two different aggregated TCL models were proposed to mitigate the imbalance of the power grid, namely a homogeneous model [14] and a heterogeneous one [15]. In [5,16], modeling and control of aggregated TCLs were studied for different goals. However, these works do not reflect the preference of each TCL user, which is an important indicator of the user's comfort level. As stated in [17-20], discomfort functions can be defined to reflect the discomfort level w.r.t. the energy demand of TCL users. In this paper, we propose a discomfort function based on the dynamics of each TCL user.
We study the coordination of TCLs in a typical office or residential building. An electric energy control center (EECC) acts as a coordinator that buys energy from the wholesale market and sells it to TCL users. An energy trading process then takes place between EECC and the TCL users: EECC determines a selling price to maximize its utility benefits, and each TCL user adjusts its set-point temperature to minimize its own cost with respect to the selling price from EECC. Considering the dynamics of TCLs, we build a relationship between the set-point temperature of a TCL and the energy demand needed to reach this temperature. Based upon this relationship, the energy trading process between EECC and TCL users can be formulated w.r.t. the energy demand of TCLs. Moreover, since the decisions of EECC and the TCL users are interdependent, we apply a Stackelberg game, which is an effective method in power systems [11,21]. Specifically, in this paper, a one-leader N-follower Stackelberg game is established such that EECC serves as the leader and the TCL users are the N followers. We show that the Stackelberg equilibrium exists and is unique, and that it can be achieved by a backward induction method [22].
In summary, the main contributions of this work are as follows:

• We study the demand response of the TCLs to control their set-point temperatures by considering the tradeoff between the electricity payment and the TCL user's comfort preference;
• According to the dynamics of TCLs, we set up the relationship between the energy demand and the set-point temperature. Besides, we formulate the dissatisfaction function to represent the discomfort level of the set-point temperature;
• Based upon the interaction between EECC and TCL users, we formulate the specific energy trading process as a one-leader N-follower Stackelberg game;
• We show the existence and uniqueness of the equilibrium for the underlying Stackelberg game, and develop a DR algorithm based on the Backward Induction method to achieve the equilibrium.
The remainder of the paper is organized as follows. In Section 2, we specify the relationship between the energy demand and the set-point temperature and formulate the energy trading process as a DR problem under the RTP scheme. In Section 3, a one-leader N-follower Stackelberg game is established and the existence and uniqueness of the Stackelberg equilibrium are shown. Section 4 presents numerical simulations for the proposed method. In Section 5, we provide a conclusion for the developed work.
The key variables and parameters used in this paper are listed in Table 1.

Problem Formulation
In this paper, we consider a typical office or residential building equipped with a coordinator called EECC, whose role is to collect energy resources from the electricity market and allocate them to a group of TCL users $\mathcal{N} \equiv \{1, 2, \dots, N\}$. EECC buys at the market price, denoted by $P$, and sets its own selling price, denoted by $p$. Each TCL user $i \in \mathcal{N}$ chooses its set-point temperature, denoted by $\theta_i$, based on the price $p$ broadcast by EECC. Then, EECC provides the energy demand $u_i$ that TCL $i$ needs to reach the set-point temperature $\theta_i$. This energy trading process is shown in Figure 1. We suppose that each TCL user is a price-taker whose decision does not affect the market price $P$, which is a common assumption when the market involves a large population of users [23,24]. Denote the time horizon by $\mathcal{T} \equiv [t_k, t_{k+1}]$, where $t_k$ is the start time of the horizon and $T \equiv t_{k+1} - t_k$ is its length. Section 2.1 provides the model of the TCL dynamics, based on which the relationship between the set-point temperature of a TCL and its energy demand is established in Section 2.2. Then, in Section 2.3, the energy trading process is introduced together with the preferences of TCL users and EECC.

TCL Dynamics
As stated in [14,16,25], for each TCL $i \in \mathcal{N}$ at any time $t \in \mathcal{T}$, the evolution of the internal temperature can be expressed as a first-order differential equation,

$$\dot{\theta}_i(t) = -\frac{1}{R_i C_i}\left[\theta_i(t) - \theta_{a,i}(t) + W_i(t) R_i P_i\right], \qquad (1)$$

where the notation is specified below:

• $\theta_i(t)$ and $\theta_{a,i}(t)$ represent, respectively, the internal temperature (°C) and the ambient temperature (°C) of TCL $i$ at time $t$.
• $R_i$, $C_i$ and $P_i$ are thermal parameters which express the thermal resistance (°C/kW), thermal capacitance (kWh/°C) and cooling thermal power (kW) of TCL $i$, respectively. For notational simplicity, we denote the thermal time constant by $\tau_i \equiv R_i C_i$.
• The binary variable $W_i(t) \in \{0, 1\}$ represents the switch state of TCL $i$ at instant $t$.

Remark 1.
In (1), we consider TCLs in different houses or offices, where the evolution of the internal temperature is mainly affected by the heat exchange between inside and outside; hence, there is no heat exchange among the TCLs [26]. Besides, Equation (1) is formulated for cooling TCLs such as air conditioners, so P i in (1) is a positive constant.
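To keep TCL $i$ from switching frequently around its set-point temperature $\theta_i$, we adopt a temperature dead-band $[\theta_i^-, \theta_i^+]$ of width $\delta$, where $\theta_i^-$ and $\theta_i^+$ are the lower and upper limits of the dead-band, respectively, such that

$$\theta_i^- = \theta_i - \frac{\delta}{2}, \qquad (2a)$$

$$\theta_i^+ = \theta_i + \frac{\delta}{2}. \qquad (2b)$$

Then, the switch state function in (1) is defined as follows [27,28]:

$$W_i(t) = \begin{cases} 1, & \theta_i(t) \ge \theta_i^+, \\ 0, & \theta_i(t) \le \theta_i^-, \\ W_i(t - \Delta t), & \text{otherwise}, \end{cases} \qquad (3)$$

where $\Delta t$ is an arbitrarily small time interval. The temperature evolution procedure is shown in Figure 2, in which $T_{on,i}$ is the length of time that TCL $i$ stays in the "on" state during one time horizon $[t_k, t_{k+1}]$.

Figure 2. Temperature evolution procedure of thermostatically controlled loads (TCLs).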
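The hysteresis behaviour described above can be illustrated with a short simulation. The following Python sketch assumes the standard first-order cooling model dθ/dt = −(θ − θ_a + W·R·P)/(R·C) together with the dead-band switching rule, integrates it with a forward-Euler step, and holds the ambient temperature constant over the horizon. The function name `simulate_tcl`, the step size and all numerical values in the usage note are illustrative assumptions, not values taken from the paper's tables.

```python
def simulate_tcl(theta0, w0, setpoint, theta_a, R, C, P, delta, horizon_h, dt_h=1e-3):
    """Forward-Euler simulation of one cooling TCL over a horizon (in hours).

    Returns (final_temperature, on_time_hours, number_of_switches).
    """
    lo, hi = setpoint - delta / 2, setpoint + delta / 2  # dead-band limits
    theta, w = theta0, w0
    on_time, switches = 0.0, 0
    steps = int(round(horizon_h / dt_h))
    for _ in range(steps):
        # Hysteresis rule: switch on at the upper limit, off at the lower
        # limit, otherwise keep the previous switch state.
        new_w = 1 if theta >= hi else 0 if theta <= lo else w
        if new_w != w:
            switches += 1
            w = new_w
        # Assumed dynamics: d(theta)/dt = -(theta - theta_a + w*R*P) / (R*C)
        theta += dt_h * (-(theta - theta_a + w * R * P) / (R * C))
        on_time += dt_h * w
    return theta, on_time, switches
```

For instance, `simulate_tcl(27.0, 0, 26.0, 31.2, 2.0, 2.0, 9.1, 0.25, 0.25)` models a TCL that is "off" at 27 °C with a 26 °C set-point over a 15-minute horizon, using hypothetical thermal parameters R = 2 °C/kW, C = 2 kWh/°C and P = 9.1 kW. Because the initial temperature is above the upper limit, the unit switches on once and keeps cooling for the whole horizon, which is the single-switch pattern that the analysis relies on.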

Energy Demand of TCLs
At the start time t k of any time horizon, each TCL user chooses a set-point temperature θ i according to the broadcast price p. Then, each TCL needs to consume some energy to make its current internal temperature θ i (t k ) reach the set-point temperature θ i . Denote this energy demand by u i ≡ f i ( θ i ) which is a function of θ i . In this section, we derive the relationship between θ i and u i based on the dynamics of TCLs.

Remark 2.
In this paper, we consider a 15-minute time horizon (T = 15 min), which is small enough to neglect the variation of θ a,i (t) within T . That is, θ a,i (t) ≡ θ a,i (t k ), for all t ∈ T [29].
As the cooling thermal power $P_i$ is a known parameter of TCL $i$, the energy demand to reach $\theta_i$ from the current $\theta_i(t_k)$ within $\mathcal{T}$ can be expressed as

$$u_i = P_i T_{on,i}. \qquad (4)$$

Recall that $T_{on,i}$ is the length of time that TCL $i$ stays in the "on" state during one time horizon $[t_k, t_{k+1}]$, as shown in Figure 2.
Since the maximum value of $T_{on,i}$ is $T$, the maximum energy demand is $\beta_i^+ \equiv T P_i$. Then,

$$0 \le u_i \le \beta_i^+. \qquad (5)$$

The feasible set of $u_i$ is denoted by $\mathcal{U}_i$, such that

$$\mathcal{U}_i \equiv \{u_i \mid 0 \le u_i \le \beta_i^+\}. \qquad (6)$$

By (3), $W_i(t)$ remains unchanged over $\mathcal{T}$ if the internal temperature $\theta_i(t)$, $t \in \mathcal{T}$, always lies in the dead-band, and it changes only if $\theta_i(t)$ hits one of the limits $\theta_i^-$ or $\theta_i^+$ for some $t \in \mathcal{T}$. Then by (1), after the first change of $W_i(t)$, $\theta_i(t)$ needs a certain amount of time to reach the other limit. Therefore, given appropriate parameters in (1), $\theta_i(t)$ will not hit the boundary of the dead-band twice within $\mathcal{T}$, and we make the following assumption in this paper.

Assumption 1. The switch state $W_i(t)$ of each TCL $i$ changes no more than once over $\mathcal{T}$.
Based upon Assumption 1, the operation process of TCL i in T can be divided into two cases w.r.t. W i (t k ).
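• Case 1 ($W_i(t_k) = 1$): By (1), with the ambient temperature held constant over $\mathcal{T}$ as in Remark 2, the internal temperature at time $t_k + T_{on,i}$ is

$$\theta_i(t_k + T_{on,i}) = \theta_{a,i}(t_k) - R_i P_i + \left[\theta_i(t_k) - \theta_{a,i}(t_k) + R_i P_i\right] e^{-T_{on,i}/\tau_i}. \qquad (7)$$

Combining (2a) and (3), the TCL switches off when the internal temperature reaches the lower limit of the dead-band, which gives

$$\theta_i(t_k + T_{on,i}) = \theta_i - \frac{\delta}{2}. \qquad (8)$$

Then by (4), (7) and (8), the relationship between $u_i$ and $\theta_i$ is

$$u_i = P_i \tau_i \ln \frac{\theta_i(t_k) - \theta_{a,i}(t_k) + R_i P_i}{\theta_i - \delta/2 - \theta_{a,i}(t_k) + R_i P_i}. \qquad (9)$$

• Case 2 ($W_i(t_k) = 0$): Similarly to Case 1, (1)-(4) yield the corresponding expression (10) for a TCL that is off at $t_k$.

In summary, (9) and (10) together give the relationship (11) between the energy demand $u_i$ and the set-point temperature $\theta_i$, written $u_i = f_i(\theta_i)$, with one branch for each case.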
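The two cases above can be sketched as a single function that maps a set-point to an energy demand. This is a minimal Python sketch under stated assumptions, not the paper's own expressions: Case 1 uses the closed-form solution of the first-order model with constant ambient temperature, and Case 2 assumes that a TCL which is "off" at the start stays off until the upper dead-band limit is reached and then runs for the rest of the horizon, which is one reading of the single-switch assumption. The function name `energy_demand` and all numerical values are hypothetical.

```python
import math

def energy_demand(setpoint, theta_k, w_k, theta_a, R, C, P, delta, T):
    """Energy (kWh) a cooling TCL uses over one horizon of length T (hours)."""
    tau = R * C  # thermal time constant (h)
    if w_k == 1:
        # Case 1: "on" from t_k until the lower dead-band limit is reached.
        target = setpoint - delta / 2
        num = theta_k - theta_a + R * P
        den = target - theta_a + R * P
        if theta_k <= target:
            t_on = 0.0   # already at or below the lower limit
        elif den <= 0:
            t_on = T     # the limit lies below the "on" steady state
        else:
            t_on = tau * math.log(num / den)
    else:
        # Case 2 (assumption): "off" until the upper limit is reached,
        # then "on" for the remainder of the horizon.
        target = setpoint + delta / 2
        if theta_k >= target:
            t_off = 0.0
        elif target >= theta_a:
            t_off = T    # the room never warms up to the upper limit
        else:
            t_off = tau * math.log((theta_a - theta_k) / (theta_a - target))
        t_on = T - t_off
    t_on = min(max(t_on, 0.0), T)  # keep the demand inside [0, T * P]
    return P * t_on
```

With the hypothetical parameters R = 2 °C/kW, C = 2 kWh/°C, P = 9.1 kW, δ = 0.25 °C and T = 0.25 h, a TCL that is off at 27 °C with a 26 °C set-point is already above the upper limit, so the function returns the maximum demand T·P = 2.275 kWh. Lowering the set-point never decreases the returned demand, which matches the intuition that a cooler target costs more energy.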

Energy Trading Process
As shown in Figure 1, EECC first collects the energy from the wholesale market under the market price P, and then sells the energy to TCL users at a broadcasted price p. Each TCL user adjusts its set-point temperature based on the broadcasted price from EECC. Suppose that EECC and TCL users are strategic players, and all of them make decisions by optimizing their individual objectives. Next we will introduce the preference of EECC and TCL users.
For TCL user $i \in \mathcal{N}$, determining $\theta_i$ is equivalent to determining $u_i$, by the relationship between them in (11). Hence, each TCL user can optimize its energy demand by minimizing its individual cost, which consists of the electricity payment and the cost associated with its discomfort level. The individual cost of the $i$-th TCL user with respect to $u_i$ is

$$C_i(u_i; p) = p\,u_i + \omega\, d_i(u_i), \qquad (12)$$

where the first term represents the electricity payment, the second is the dissatisfaction cost, and $\omega$ is a weighting coefficient for the importance of the TCL user's discomfort during $\mathcal{T}$. For a rational TCL user, the discomfort level decreases continuously as the set-point temperature is reduced. By (11), the dissatisfaction cost is a function of $u_i$, say $d_i(u_i)$. At time $t_k$, before choosing the set-point temperature $\theta_i$, each TCL user has a reference temperature, denoted by $\theta_i^r$, representing its comfortable temperature. The corresponding reference demand, denoted by $q_i$, is computed from $\theta_i^r$ by (11), as specified in (13), where $\theta_{i,j}^+$ and $\theta_{i,j}^-$ represent the $i$-th TCL user's maximum and minimum set-point temperatures in Case $j$, respectively, with $j \in \{1, 2\}$.

Remark 3. By (13), the reference temperature $\theta_i^r$ is the threshold value of the comfortable temperature, which is related to each TCL user's preference and the external environment. It can be regarded as a criterion for the comfort level of TCL users.

Remark 4. By (11), the expression of $f_i(\theta_i)$ differs between the two cases. Hence, by the feasible set of energy demand in (6), $\theta_{i,j}^+$ and $\theta_{i,j}^-$ depend on the case $j$ for all $i \in \mathcal{N}$.
As specified in [18-20], the dissatisfaction cost function $d_i(u_i)$ is continuous and has the following properties: for $u_i < q_i$, i.e., $\theta_i > \theta_i^r$, the TCL user is dissatisfied with the current temperature, and the discomfort level increases rapidly as $\theta_i$ (demand $u_i$) moves away from the reference temperature $\theta_i^r$ (demand $q_i$). For $u_i > q_i$, i.e., $\theta_i < \theta_i^r$, the TCL user is satisfied with the current temperature, but the comfort level does not increase without bound and changes slowly as $\theta_i$ moves away from the reference temperature $\theta_i^r$. Based on these properties, we adopt the dissatisfaction cost function $d_i(u_i)$ of the form (14) given in [18], with $b_i > 0$, where $b_i$ represents the priority factor of TCL user $i$.

EECC obtains benefits by buying energy from the market and selling it to TCL users. Thus, for a rational EECC, the selling price should be no lower than the market price, i.e., $p \ge P$, and the feasible set $\mathcal{P}$ of the broadcast price $p$ is defined accordingly in (15). Besides, EECC should take the discomfort of all the TCL users into account; otherwise, it could set the selling price very high to obtain more benefits. Hence, the utility function $S_E$ of EECC in (16) is expressed in terms of $p$ and $u = [u_1, u_2, \dots, u_N]$, the energy demands of all the TCL users.

Stackelberg Game Coordination
As stated in the previous section, EECC buys the total energy that all the TCL users demand from the wholesale market under the market price P and broadcasts a selling price p to each TCL user. Then, based on the broadcast price p, each TCL user determines the energy demand u i i.e., setting its set-point temperature θ i .
Note that the decisions between EECC and the TCL users are actually interdependent. We establish a Stackelberg game to describe the interplay of TCL users and EECC in Section 3.1. Furthermore, the existence and uniqueness of the Stackelberg equilibrium are specified in Section 3.2.

Stackelberg Game
Since there exists a hierarchy among players in Stackelberg games, leaders are in a position to enforce their strategies on the followers. In this leader-follower competition, the followers first find their best response functions, i.e., they determine how they will respond once they observe the strategies of the leaders. The leaders are aware that each follower will choose its best response to the leaders' strategies. Hence, the leaders maximize their payoffs by anticipating the response of the followers. The followers then observe the leaders' strategies and adopt their best responses accordingly.
We introduce a one-leader, N-follower Stackelberg game to characterize the electricity transaction process between EECC and TCL users, where EECC serves as the leader and TCL users act as followers. Thus, the system proceeds by the following two stages:

• Stage I: Each TCL user $i$ determines its best response function with respect to the price $p$ broadcast by EECC.
• Stage II: EECC optimizes the broadcast price $p^*$ considering the TCL users' best response $u^*(p)$ from Stage I.

Then, observing EECC's best strategy, each TCL user $i$ determines its optimal energy demand $u_i^*$ under the broadcast price $p^*$ from Stage II. Based on the above set-up, the optimization problem can be formally formulated as follows:

• Leader level:

$$\max_{p \in \mathcal{P}} S_E(p; u^*(p)), \qquad (17)$$

where $\mathcal{P}$ is the feasible set of the broadcast price in (15).

• Follower level:

$$\min_{u_i \in \mathcal{U}_i} C_i(u_i; p), \quad \forall i \in \mathcal{N}. \qquad (18)$$

The optimal strategies of the game take the form of the Stackelberg equilibrium [30,31]. At the equilibrium, the leader's strategy $p^*$ is a solution to the optimization problem specified in (17) based on the best strategy trajectories $u^*(p)$ of the followers. Each follower's strategy is also a solution to (18) when it is informed of the equilibrium strategy of the leader. The optimal strategies $u_i^*(p^*)$, $i \in \mathcal{N}$, therefore constitute the equilibrium for all the followers.
Then, we have the following definition of the Stackelberg equilibrium [18,22].

Existence and Uniqueness of Stackelberg Equilibrium
Based on the above analysis of the game process, we can deduce the Stackelberg equilibrium by backward induction method [22]. Firstly, each follower determines its best strategy trajectory by solving (18) with respect to a strategy p from the leader. Then, combining the best strategy trajectory u * (p) with (17), the leader obtains its best strategy p * . Subsequently, each follower determines its best strategy u * i (p * ) when it is informed of the best strategy p * of the leader.

Lemma 1.
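Given a broadcast price $p$ from EECC, each follower has a unique optimal strategy $u_i^*(p)$, such that

$$u_i^*(p) = \begin{cases} \beta_i^+, & p \le \dfrac{\omega b_i}{q_i} e^{b_i(1 - \beta_i^+/q_i)}, \\[2mm] q_i - \dfrac{q_i}{b_i} \ln \dfrac{q_i p}{\omega b_i}, & \dfrac{\omega b_i}{q_i} e^{b_i(1 - \beta_i^+/q_i)} < p < \dfrac{\omega b_i}{q_i} e^{b_i}, \\[2mm] 0, & p \ge \dfrac{\omega b_i}{q_i} e^{b_i}. \end{cases} \qquad (21)$$

Proof of Lemma 1. By (12) and (14), each follower's cost function $C_i(u_i; p)$ is continuous and differentiable over the convex set $\mathcal{U}_i$. Then, by (12), we have

$$\frac{\partial^2 C_i(u_i; p)}{\partial u_i^2} = \omega\, d_i''(u_i) > 0.$$

Hence, $C_i(u_i; p)$ is a strictly convex function w.r.t. $u_i$. By $\partial C_i(u_i; p)/\partial u_i = 0$, we obtain the optimal trajectory w.r.t. $p$ as below:

$$u_i^*(p) = q_i - \frac{q_i}{b_i} \ln \frac{q_i p}{\omega b_i}.$$

Furthermore, because the feasible set $\mathcal{U}_i$ defined in (6) is bounded, the boundary conditions of the optimal strategy in (21) are determined by the limits of $\mathcal{U}_i$ in (6).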
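The follower's best response can be sketched numerically as follows. This Python sketch assumes a discomfort function of the form d_i(u) = exp(b_i(1 − u/q_i)), chosen because its first-order condition reproduces the interior strategy q_i − (q_i/b_i) ln(q_i p/(ω b_i)) quoted in Appendix A; the exact form used in the paper may differ by terms that do not change the minimizer. The function names and the numerical values in the usage note (in particular b = 1.1 and β⁺ = 2.275 kWh) are illustrative assumptions.

```python
import math

def discomfort(u, q, b):
    """ASSUMED discomfort function: exp(b * (1 - u / q)), with q > 0, b > 0."""
    return math.exp(b * (1.0 - u / q))

def follower_cost(u, p, q, b, omega):
    """Individual cost: electricity payment plus weighted discomfort."""
    return p * u + omega * discomfort(u, q, b)

def best_response(p, q, b, omega, beta_max):
    """Demand in [0, beta_max] that minimizes follower_cost for price p."""
    low_price = omega * b / q * math.exp(b * (1.0 - beta_max / q))
    high_price = omega * b / q * math.exp(b)
    if p <= low_price:
        return beta_max   # price so low that the upper bound binds
    if p >= high_price:
        return 0.0        # price so high that the user buys nothing
    # Interior solution from the first-order condition p = -omega * d'(u)
    return q - (q / b) * math.log(q * p / (omega * b))
```

As a usage example, `best_response(0.219, 0.691, 1.1, 0.2, 2.275)` evaluates the response to a price of 0.219 $/kWh for a user with reference demand 0.691 kWh and weight ω = 0.2. Under the assumed discomfort function the result lies strictly between 0 and β⁺, and a brute-force scan of `follower_cost` over the feasible interval is a simple way to confirm that it is the minimizer.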
Based on the best strategies u * (p), EECC determines the best electricity prices p * by maximizing its utility function (16).

Lemma 2.
The leader has a unique optimal strategy $p^*$ over $[P, p_{\max})$, given by (22), where $p_{\max} \equiv \max_{i \in \mathcal{N}} \frac{\omega b_i}{q_i} e^{b_i}$.

Proof of Lemma 2. The proof of Lemma 2 is given in Appendix A.

Remark 5. From Lemma 2, there exists a unique optimal strategy (22) when $p \in [P, p_{\max})$. For $p \ge p_{\max}$, we have $u_i^*(p) = 0$ by (21), for all $i \in \mathcal{N}$. In addition, by (16), the utility function is constant for all $p \ge p_{\max}$. Therefore, the optimal strategy is not unique in that range of $p$.

Theorem 1. Considering $p_{\max} > P$, there exists a unique Stackelberg equilibrium $(p^*, u^*)$ for the proposed game.

Proof of Theorem 1. By Lemma 1, (20) holds. Then, by Lemma 2, (19) is satisfied. Therefore, according to Definition 1, $(p^*, u^*)$ is the unique equilibrium of the Stackelberg game.

Remark 6. If $p_{\max} \le P$, then by (21) we have $u_i^*(p) = 0$ for all $i \in \mathcal{N}$ and all $p \ge P$, so EECC's utility function $S_E$ is constant in $p$. Thus, for $p_{\max} \le P$, EECC cannot find a unique optimal strategy.
Based upon Theorem 1, we specify Algorithm 1 to achieve the Stackelberg equilibrium of the game.

Require:
Initialize the time horizon $\mathcal{T} \equiv [t_k, t_k + T]$;
Initialize the switch state $W_i(t_k)$ of TCL user $i$;
Initialize the market price $P$;
Initialize the reference temperature $\theta_i^r$ of TCL user $i$;
Set the reference demand $q_i$ of TCL user $i$ by (13) w.r.t. $\theta_i^r$.

Ensure:
EECC's optimal broadcast price $p^*$;
Each TCL user's adjusted set-point temperature $\theta_i^*$.
1: By (22), EECC determines the optimal broadcast price $p^*$ w.r.t. the best strategy trajectory $u_i^*(p)$ in (21);
2: Each TCL user $i \in \mathcal{N}$ determines the optimal strategy $u_i^*$ w.r.t. $p^*$ by (21);
3: Each TCL user $i \in \mathcal{N}$ computes its optimal set-point temperature $\theta_i^* = f_i^{-1}(u_i^*)$.

As stated in [18,19], other methods usually involve an iterative interaction between the leader and the followers, which are EECC and the TCL users, respectively, in the underlying game. Compared with these methods, the DR algorithm based on Backward Induction proposed in our work can significantly reduce the computational time needed to reach the equilibrium of the underlying Stackelberg game.
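Steps 1 and 2 of the algorithm can be sketched in a few lines of Python. The sketch rests on two assumptions that are not confirmed by the text above: the discomfort function is taken as exp(b_i(1 − u/q_i)), and EECC's utility is taken as the resale margin minus the weighted total discomfort, S_E = Σ_i [(p − P) u_i − ω d_i(u_i)]. The price is found by a plain grid search over [P, p_max) instead of the interior-point solver mentioned in the simulations, and step 3 (inverting the demand relationship to recover the set-point) is left out. All function names and numbers are hypothetical.

```python
import math

def best_response(p, q, b, omega, beta_max):
    """Follower's demand for price p (interior solution clipped to [0, beta_max])."""
    if p <= omega * b / q * math.exp(b * (1.0 - beta_max / q)):
        return beta_max
    if p >= omega * b / q * math.exp(b):
        return 0.0
    return q - (q / b) * math.log(q * p / (omega * b))

def leader_utility(p, market_price, users, omega):
    """ASSUMED utility: resale margin minus weighted total discomfort."""
    total = 0.0
    for q, b, beta_max in users:
        u = best_response(p, q, b, omega, beta_max)
        total += (p - market_price) * u - omega * math.exp(b * (1.0 - u / q))
    return total

def solve_equilibrium(market_price, users, omega, n_grid=20000):
    """Backward induction: choose the price first, then read off the demands.

    users is a list of (q_i, b_i, beta_max_i) tuples.
    """
    p_max = max(omega * b / q * math.exp(b) for q, b, _ in users)
    if p_max <= market_price:
        raise ValueError("p_max <= P: no unique optimal price")
    step = (p_max - market_price) / n_grid
    # Candidate prices cover the half-open interval [P, p_max).
    prices = (market_price + k * step for k in range(n_grid))
    p_star = max(prices, key=lambda p: leader_utility(p, market_price, users, omega))
    demands = [best_response(p_star, q, b, omega, bm) for q, b, bm in users]
    return p_star, demands
```

A call such as `solve_equilibrium(0.12, [(0.691, 1.1, 2.275)] * 4, 0.2)` returns one price and one demand per user without any leader-follower iteration, which is the computational advantage attributed to backward induction. The grid search is only accurate to the grid spacing; a one-dimensional concave maximizer could replace it when the utility is known to be concave on the whole interval.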

Simulation
In this part, some case studies are analyzed to demonstrate the price response coordination of TCLs. The proposed Stackelberg game model and control scheme are validated by simulations in MATLAB 2014a. We use the interior-point method to solve the optimization problems, and the computational time in all cases is within 2.0 s. We adopt a typical 15-min based pricing scheme by dividing the 9-h period into 36 equal time intervals [19], as shown in Figure 3. An ambient temperature profile from 11:00 to 20:00 in a typical summer day is shown in

Homogeneous Case
We first consider N = 100 homogeneous TCLs, and the parameters of the TCLs are specified in Table 2 [32]. We set the weighting factor of the importance of the discomfort level as ω = 0.2. Without loss of generality, assume that W i (t k ) = 0 with t k = 11 : 00, for all i ∈ N , i.e., the switch state of each TCL is "off" at 11:00. As specified in Section 2.2, the "off" state implies that j = 2.
We also consider the internal temperature θ i (t k ) = 27 °C for all i ∈ N and the temperature dead-band δ = 0.25 °C.
Then, according to the reference temperature θ r i of TCL user i shown in Figure 4, we obtain the reference demand energy q i by (13), which is displayed by the blue dash-dot line in Figure 5. More specifically, taking one time horizon [11:00, 11:15] as an example and given θ a,i (t k ) = 31.2 • C, Then by (13), we have q i (t k ) = β + i = 2.275 kWh. By applying Algorithm 1, EECC implements the optimal price p * w.r.t u * (p) by (22), which is displayed by the red line in Figure 6.  The broadcast price p * satisfies p * ∈ ωb i q i Then, by (21), the optimal energy demand of each TCL user u * i increases as p * decreases from 11:00 to 20:00, which is displayed in Figure 5.
Subsequently, according to the relationship between the set-point temperature and the energy demand specified in Section 2.2, each TCL user adjusts its set-point temperature by θ * i = f i −1 (u * i ), which is displayed by the red line in Figure 7. Consider the time horizon [13:00, 13:15] as an example. Based upon the reference demand q i = 0.691 kWh, the market price P = 0.12 $/kWh and the optimal reaction curve u * (p) given in Lemma 1, we obtain the optimal broadcast price p * = 0.219 $/kWh by (22). Afterwards, TCL users observe the best strategy of EECC and compute their best strategies u * i (p * ) by (21). The corresponding set-point temperature is θ * i = 25.98 °C. Because of the dead-band δ, the internal temperature evolves by (1) in [13:00, 13:15], and the switch state changes when the internal temperature hits the upper limit θ * i + δ/2 = 26.11 °C. Moreover, after 13:00, to keep the internal temperature around the reference temperature of 26 °C, the switch state changes once within each time horizon and the optimal set-point temperature stays around 26 °C, as illustrated in Figure 7. However, given the same ambient temperature θ a,i , by (9) and (10), the associated reference energy demands q i differ between the two cases. This causes the fluctuation of the energy demand trajectory displayed in Figure 5.

Heterogeneous Case
In general, the switch states of the aggregated TCLs differ [14]. For the purpose of demonstration, we suppose that the 100 TCLs are partitioned into two categories: 50 TCLs with W i (t k ) = 1 and the other 50 TCLs with W i (t k ) = 0. As a consequence, the profile of the aggregated energy demand of the 100 TCLs is displayed by the black line in Figure 8.
As observed in Figure 8, aggregating TCLs with different W i (t k ) alleviates the fluctuations of the individual TCLs. Thus, by broadcasting different prices to the groups TCL1 and TCL2, we may induce the TCL users to adjust their set-point temperatures and thereby mitigate the fluctuation of the power grid. Furthermore, because of the different characteristics of the TCL users, the reference temperature changes with the external environment, such as the ambient temperature and human activity in the room. Therefore, in Figure 9, we consider a scenario with a time-varying reference temperature. EECC broadcasts the price p * (displayed by the red line) and TCL user i implements the set-point temperature θ * i accordingly (displayed by the purple line) at each instant, so as to maximize the utility benefit of EECC and minimize the individual cost of each TCL user.
In reality, the TCLs' properties vary according to the different preferences of TCL users. Thus, besides the above study for homogeneous TCLs, here we also apply Algorithm 1 for the heterogeneous cases.
We first consider different priority factors of TCL users. By (14), we obtain that the TCL user with higher b will have more discomfort when the set-point temperature exceeds the reference temperature. Therefore, the set-point temperature of TCL2 with b 2 = 1.2 decreases faster than TCL1 with b 1 = 1.1, which is displayed in Figure 10. Furthermore, we consider the heterogeneous case with variational reference temperature and different properties of TCLs. The parameters of heterogeneous TCLs are shown in Table 2. Figures 11 and 12

Conclusions and Ongoing Research
We have studied the coordination of TCLs under a Stackelberg game based price response scheme. Based upon the dynamics of the TCLs, we first establish the relationship between the set-point temperature and the energy consumed to reach the set-point temperature. Then, a discomfort function is defined to represent the discomfort level of the set-point temperature. Based upon the interplay of TCL users and EECC during the electricity trading process, a one-leader N-follower Stackelberg game is established. EECC optimizes its selling price considering the tradeoff of its electricity gross benefit and the dissatisfaction cost of TCL users, while TCL users make decisions by minimizing the electricity payments and the dissatisfaction cost. Compared with other iteration methods in the literature, a more effective DR algorithm by backward induction method is proposed to achieve the unique Stackelberg equilibrium. At the equilibrium, EECC maximizes its utility function and each TCL user adjusts its set-point temperature to minimize its cost.
In the future, we will extend the model considered in the current work by taking into account the heat exchange among TCLs that interact with each other. Besides, we would like to design different electricity price schemes to satisfy different users' preferences and maximize the utility benefits.
Author Contributions: This paper is a result of the collaboration of all authors. P.W. and Z.M. conceived and designed this work. X.W. performed the experiments. P.W. and S.Z. wrote the paper.
Funding: This research was funded by International S&T Cooperation Program of Beijing Institute of Technology grant number GZ2016065101.

Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

TCL     Thermostatically controlled loads
EECC    Electric energy control center
DR      Demand response
RTP     Real-time pricing

Appendix A. Proof of Lemma 2
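According to the value of the market price $P$, we have the following two cases.

Case 1: $P \le \min_{i \in \mathcal{N}} \frac{\omega b_i}{q_i} e^{b_i(1 - \beta_i^+/q_i)}$

Based on the boundary conditions in (21), the value of the leader's strategy $p$ in Case 1 can be divided into two subcases.

Case (1A): $P \le p \le \min_{i \in \mathcal{N}} \frac{\omega b_i}{q_i} e^{b_i(1 - \beta_i^+/q_i)}$

By (21), we have $u_i^*(p) = \beta_i^+$, $\forall i \in \mathcal{N}$. Then by (17), the leader's optimal strategy $p^*$ is obtained by maximizing $S_E(p; u^*(p))$ over $\mathcal{P}_1$, where $\mathcal{P}_1 \equiv \left\{p \mid P \le p \le \min_{i \in \mathcal{N}} \frac{\omega b_i}{q_i} e^{b_i(1 - \beta_i^+/q_i)}\right\}$.

Case (1B): $\min_{i \in \mathcal{N}} \frac{\omega b_i}{q_i} e^{b_i(1 - \beta_i^+/q_i)} < p < p_{\max}$

We denote the feasible set of $p$ in Case (1B) by $\mathcal{P}_2$. In addition, we specify three sets of TCL users, $\mathcal{N}_1$, $\mathcal{N}_2$ and $\mathcal{N}_3$, according to which of the three branches of (21) applies. Then, together with (17), the leader's problem $\max_{p \in \mathcal{P}_2} S_E(p; u^*(p))$ takes the form (A6). Taking the second derivative of the utility function $S_E(p; u^*(p))$ with respect to $p$, we find that it is negative on $\mathcal{P}_2$. Hence, the optimization problem (A6) has a unique optimal strategy $p^*$.

Case 2: $\min_{i \in \mathcal{N}} \frac{\omega b_i}{q_i} e^{b_i(1 - \beta_i^+/q_i)} < P < p_{\max}$

By (15), we have $p \in [P, p_{\max})$. In addition, by Lemma 1, TCL users have different optimal strategies, namely $\beta_i^+$, $q_i - \frac{q_i}{b_i} \ln \frac{q_i p}{\omega b_i}$, or $0$. Similarly to Case (1B), there exists a unique solution of the optimization problem (A6).

In sum, for $p \in [P, p_{\max})$, there exists a unique optimal strategy $p^*$ in (22).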
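The concavity argument used in Case (1B) can be made concrete under two assumptions that are not confirmed by the text above: a discomfort function $d_i(u_i) = e^{b_i(1 - u_i/q_i)}$, and a leader utility equal to the resale margin minus the weighted total discomfort. The following sketch shows why the second derivative is negative in that setting; the per-user term $s_i$ is notation introduced here for illustration only.

$$S_E(p; u^*(p)) = \sum_{i \in \mathcal{N}} s_i(p), \qquad s_i(p) \equiv (p - P)\, u_i^*(p) - \omega\, d_i(u_i^*(p)).$$

For a user on the interior branch, $u_i^*(p) = q_i - \frac{q_i}{b_i} \ln \frac{q_i p}{\omega b_i}$, so that $\omega\, d_i(u_i^*(p)) = \frac{q_i p}{b_i}$ and $\frac{d u_i^*}{dp} = -\frac{q_i}{b_i p}$. Hence

$$\frac{d s_i}{dp} = u_i^*(p) - \frac{q_i}{b_i}\left(1 - \frac{P}{p}\right) - \frac{q_i}{b_i}, \qquad \frac{d^2 s_i}{dp^2} = -\frac{q_i}{b_i p} - \frac{q_i P}{b_i p^2} < 0.$$

For a user whose demand is fixed at $\beta_i^+$ or at $0$, $s_i(p)$ is affine in $p$ and contributes nothing to the second derivative. Under these assumptions, $S_E$ is therefore strictly concave on any price interval where at least one user is on the interior branch and no user changes branch, which is consistent with the uniqueness claim in Case (1B).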