On Optimistic and Pessimistic Bilevel Optimization Models for Demand Response Management

This paper investigates bilevel optimization models for demand response management, and highlights the often overlooked consequences of a common modeling assumption in the field. That is, the overwhelming majority of existing research deals with the so-called optimistic variant of the problem where, in case of multiple optimal consumption schedules for a consumer (follower), the consumer chooses an optimal schedule that is the most favorable for the electricity retailer (leader). However, this assumption is usually illegitimate in practice; as a result, consumers may easily deviate from their expected behavior during realization, and the retailer suffers significant losses. One way out is to solve the pessimistic variant instead, where the retailer prepares for the least favorable optimal responses from the consumers. The main contribution of the paper is an exact procedure for solving the pessimistic variant of the problem. First, key properties of optimal solutions are formally proven and efficiently solvable special cases are identified. Then, a detailed investigation of the optimistic and pessimistic variants of the problem is presented. It is demonstrated that the set of optimal consumption schedules typically contains various responses that are equal for the follower, but bring radically different profits for the leader. The main procedure for solving the pessimistic variant reduces the problem to solving the optimistic variant with slightly perturbed problem data. A numerical case study shows that the optimistic solution may perform poorly in practice, while the pessimistic solution gives very close to the highest profit that can be achieved theoretically. To the best of the authors’ knowledge, this paper is the first to propose an exact solution approach for the pessimistic variant of the problem.


Introduction
Stackelberg game models and the corresponding bilevel programming solution approaches for demand response management have received considerable attention recently. When focusing on the operational level, most models capture the interplay of an electricity retailer, who is the leader in the Stackelberg game, and its multiple consumers, who act as the followers. In the sequential game, the leader decides first on the electricity tariff or some other incentives, whereas the followers respond to the tariff by scheduling their loads accordingly. Stackelberg game models assume that the load response is calculated by solving an optimization problem, with the tariff as the parameter, to optimality. Numerous such approaches have been published, including models with deferrable or curtailable loads, batteries, electric vehicles (EVs), etc. [1][2][3][4][5].
In this paper, it is argued that despite the remarkable results, an important detail is frequently overlooked: the followers often have a large set of optimal solutions to their problems, and the selection of the response from this set is not defined properly. Moreover, different optimal solutions for the follower may bring radically different benefits for the leader. In bilevel optimization, the optimistic assumption states that the followers select their optimal solution that is the most favorable for the leader. In contrast, the pessimistic variant deals with the case where the followers return their least favorable optimal response to the leader, and hence, it safeguards from potential losses due to an unexpected selection. Most previous approaches in the literature implicitly make the optimistic assumption, although this assumption can hardly be enforced in practice. In this paper, the difference between the two variants of the bilevel problem is highlighted. Then, to the best of the authors' knowledge, the first efficient exact solution approach for the pessimistic variant of a bilevel electricity tariff optimization problem in the literature is introduced. Moreover, it is shown that in most cases, the profit of the leader in the pessimistic variant can approach the profit that can be achieved in the optimistic variant.
It is important to emphasize at this point the difference between bilevel optimization as a mathematical modeling and solution approach, and hierarchical modeling techniques in general. Bilevel optimization [6] applies formal mathematical techniques to characterize and find the equilibrium in game-theoretical decision situations with two fully rational parties with given constraints and objectives, and a well-defined serial decision workflow. This is different from generic hierarchical modeling techniques that analyze the interplay of two or more decision makers, typically by solving the problems faced by the individual parties one by one, and constructing the overall outcome by assuming some coordination mechanism between them, often using simulation techniques; see, e.g., [7]. This paper considers bilevel optimization strictly in the former sense.
Main results. On the one hand, this paper proves formal properties of the optimal solutions of a bilevel tariff optimization problem, both for the easily tractable single-consumer special case and for the computationally hard general case with an arbitrary number of consumers. The main implication of these results is the reduction of the pessimistic variant to the optimistic one by perturbing the problem data and also the optimal price vector, which also yields the first efficient solution approach for the pessimistic variant. On the other hand, a numerical case study is presented, which demonstrates that solving the optimistic problem may directly cause a significant loss of profit for the retailer if the consumers do not choose their optimal solution as expected.
Structure of the paper. After a brief literature review in Section 2, the bilevel electricity tariff optimization problem is defined formally in Section 3. Some general observations are presented in Section 4. In Section 5, a special case with one consumer only is studied. The general optimistic variant is treated in Section 6.1, and the pessimistic one in Section 6.2. An experimental evaluation is presented in Section 7, and the paper concludes in Section 8.

Literature Review
Introduced by the seminal paper of Bracken and McGill [8], bilevel optimization has become a field rich in deep theoretical results and with many practical applications; see, e.g., Bard [9], Dempe [6], and Colson et al. [10]. One of the central questions is expressing the optimality of the followers' solutions in mathematical programming formulations. To this end, Karush-Kuhn-Tucker (KKT) necessary optimality conditions, or the Fritz John necessary optimality conditions, value functions, or penalty functions can be used [11][12][13][14][15][16].
As for the methods, the optimistic or strong bilevel optimization problem appears to be easier to solve than the pessimistic or weak bilevel problem in general; see, e.g., [17]. Most approaches reduce the strong bilevel optimization problem to a single-level problem by expressing the optimality of the lower-level solution by using one of the techniques mentioned above and then applying some non-linear programming methods for solving the resulting formulation [18]. There are many results for special cases. The linear bilevel programming problem, in which all constraints and objective functions are linear, is well understood; see, e.g., Ben-Ayed [19] and Dempe [6]. Lozano and Smith [20] described an exact method for nonlinear bilevel optimization problems, in which the leader has only integer variables, while the follower can have both integer and continuous variables. The constraints and the objective functions can be nonlinear, but all constraints are separable in terms of the leader and follower variables. They used a value function-based problem formulation in their enumeration procedure, and they extended it to the pessimistic problem as well. Since the leader's variables can take only discrete values, an optimal solution always exists, provided the problem is feasible. Brotcorne et al. [21] studied a concrete application in which the leader sets freight tariffs on the arcs of a traffic network, and the follower aims at minimizing its transportation costs while satisfying transportation demands. Both objective functions are non-linear, but all constraints are linear. The authors proposed heuristics to obtain good solutions.
As for the weak variant, Loridan and Morgan [22] approximated the optimal solution by solving a sequence of strong bilevel optimization problems. Wiesemann et al. [23] provided an in-depth study of the pessimistic bilevel optimization problem under some restrictions: the follower's feasible set must be independent of the leader's solution, and both the leader and the follower must have a compact set of feasible solutions. However, integrality of the variables at both levels is permitted. Under these assumptions, the optimal solution is approximated by solving a sequence of problems obtained by relaxing the value function of the follower by a decreasing sequence of additive constants. A new relaxation, based on the value function approach, was proposed by Zeng [17]. The approach works if an optimal solution exists, but the author also discussed some remedies for the case when it does not, which may help in some instances. The crux of the method is to reduce the pessimistic bilevel optimization problem to solving one or two optimistic problems with a few additional constraints.
The electricity tariff optimization problem in scope and its various extensions have been investigated extensively in the electrical engineering community for demand response management in smart grids. This formulation is named the simple multi-period energy tariff optimization problem (SMETOP) in [24], where its NP-hardness is proved in case of multiple followers. Various papers address the extensions of the SMETOP, including generic piecewise linear, quadratic, or other non-linear follower utility functions [3,4]; battery storage at the follower [2], multi-energy systems [25]; or the heating, ventilation, and air-conditioning (HVAC) of buildings using a dedicated thermal model [26]. The typical solution approach is reformulating the bilevel problem into an equivalent single-level problem using the KKT conditions, eliminating non-linear terms, and then solving the model as a mixed-integer linear program (MILP) [3,5,27,28]. The possible alternatives include exploiting strong duality for the follower's linear problem to convert the problem into a single-level quadratic program [2], or to use custom (meta-)heuristics to keep the computational load at bay [4,29,30]. A more detailed review on the solution approaches to bilevel programming models of demand response management can be found, e.g., in [2]. However, these approaches are applicable only to the optimistic (strong) variant of the bilevel problem. Applications of bilevel programming to energy networks are reviewed in [18]. Extensions to the stochastic case are presented, e.g., in [31,32].
The difference between the optimistic and the pessimistic variants of the bilevel optimization problem specifically in energy management is emphasized in [33]. The paper introduces the notion of a deceiving solution to denote the worst possible outcome for the leader if it applies the optimistic assumption but the follower deviates from the expected response, and similarly, the rewarding solution for the best possible outcome if the leader applies the pessimistic assumption but the follower decides for an unexpectedly favorable response. The paper applies a hybrid solution approach by combining a genetic algorithm and a MILP solver to find close-to-optimal solutions for both the optimistic and the pessimistic variants of the semivectorial bilevel problem in which the follower addresses the minimization of the bi-criteria composed of electricity cost and discomfort. However, the authors are not aware of efficient exact solution approaches to pessimistic bilevel optimization problems applicable to energy management.

Problem Definition
The paper investigates a bilevel electricity tariff optimization problem for demand response management as follows. In the bilevel problem, the leader is an electricity retailer who controls the electricity tariff (unit price) over a finite time horizon divided into $T$ time periods, e.g., the 24 hours of a day. For each $t \in \{1, \ldots, T\}$, let $c_t$ be the wholesale market price of electricity, $Q$ the average unit price, and $q^l_t$ and $q^u_t$ the lower and upper bounds, respectively, on the unit price $q_t$ of electricity (to be determined by the leader) in time period $t$. There are $m$ followers, the consumers, who buy electricity at the given prices over the time horizon in order to meet their demands. Follower $i$ attributes some utility $u_{it}$ to consuming one unit of electricity in each time period $t$, its total demand over the time horizon is at least $D^l_i$ and at most $D^u_i$, and its consumption will be between $x^l_{it}$ and $x^u_{it}$ in each time period $t$.
It is noted that different loads of a household, which are scheduled independently (e.g., air conditioning, a washing machine, an EV charger, and other, inflexible loads) can be captured as separate consumers (followers) in the model. Likewise, a single consumer in the model can capture the ensemble of consumers with similar parameters in reality.
The profit of the leader, for a given price vector $q$ and consumption vector $x$ of the followers, is
$$\sum_{t=1}^{T} \sum_{i=1}^{m} (q_t - c_t)\, x_{it}.$$
The leader wants to determine the unit prices $q_t$ in order to maximize its profit, that is,
$$\max_{q} \; F(q) \qquad (1)$$
subject to the constraints
$$\sum_{t=1}^{T} q_t \le Q \cdot T, \qquad (2)$$
$$q^l_t \le q_t \le q^u_t, \quad t = 1, \ldots, T, \qquad (3)$$
where $F$ is a function mapping the price vector to a profit value, and it can be evaluated after the followers solve their own optimization problems. $F$ is defined after presenting the followers' optimization problems. In fact, each follower $i$ solves a continuous knapsack problem in which the objective function is parameterized by the price vector set by the leader:
$$\max_{x_i} \; \sum_{t=1}^{T} (u_{it} - q_t)\, x_{it} \qquad (4)$$
subject to
$$D^l_i \le \sum_{t=1}^{T} x_{it} \le D^u_i, \qquad (5)$$
$$x^l_{it} \le x_{it} \le x^u_{it}, \quad t = 1, \ldots, T. \qquad (6)$$
Assuming that (5)-(6) has a solution, each follower $i$ has at least one optimal solution for any price vector $q$. Let $\Omega(q) \subset \mathbb{R}^{m \times T}$ denote the set of all optimal solutions of the followers. Observe that $\Omega(q)$ is never empty. If the followers have a unique optimal solution for $q$, i.e., $\Omega(q)$ contains only one element, then the leader's profit is well defined for $q$. However, if $\Omega(q)$ has two or more members, then it is not clear in advance which optimal solution would be returned by the followers. For instance, if $u_{it} - q_t = u_{it'} - q_{t'}$, $x^u_{it} = x^u_{it'}$, and $x^l_{it} = x^l_{it'}$ for some $t \ne t'$, and either $x_{it}$ or $x_{it'}$ can be set to its upper bound in an optimal solution, then it is up to follower $i$ which one to choose. However, its decision can significantly impact the profit of the leader.

In the optimistic or strong variant of the bilevel problem, it is assumed that in cases with multiple optimal solutions, the followers return the one most favorable for the leader, i.e.,
$$F^o(q) = \max \Big\{ \sum_{t=1}^{T} \sum_{i=1}^{m} (q_t - c_t)\, x_{it} \; : \; x \in \Omega(q) \Big\}.$$
In contrast, in the pessimistic or weak variant, the leader prepares for the worst case; thus $F(q)$ is computed using the least favorable optimal solution of the followers, i.e.,
$$F^p(q) = \min \Big\{ \sum_{t=1}^{T} \sum_{i=1}^{m} (q_t - c_t)\, x_{it} \; : \; x \in \Omega(q) \Big\}.$$
As will be shown shortly, the maximum of $F^p(q)$ may not be attained by any price vector $q$; hence, in the pessimistic variant, (1) is replaced by
$$\sup_{q} \; F^p(q). \qquad (7)$$
The difference between the optimistic and pessimistic variants is illustrated by a small example.
Example 1 (Difference between the optimistic and the pessimistic variants). Suppose $T = 2$, there is only one follower, the leader's average tariff is $Q = 30$, and the follower's desired consumption is $D^l = D^u = 1$. Further data are given in Table 1. In the optimistic variant of the problem, the optimal tariff vector is $q = (20, 40)$, for which the best response is $x^o = (1, 0)$, giving an objective function value of 10. In contrast, in the pessimistic variant of the problem, if $u_2 - q_2 \ge u_1 - q_1$, the follower will load the second period with 1 unit of consumption, i.e., $x^p = (0, 1)$, for which the leader's objective function value is $q_2 - 50$. Since $20 \le q_t \le 40$ and $q_1 + q_2 \le 60$, the inequality $u_1 - q_1 \le u_2 - q_2$ holds for any feasible $q$. Thus the best option for the leader is $q_2 = 40$ (with arbitrary $q_1$), and its objective function value is $-10$ on $x^p = (0, 1)$. Observe that with higher values of $c_2$, the loss of the leader can increase arbitrarily.

The next example shows that the pessimistic variant may not have an optimal solution, which justifies the supremum in (7).

Example 2 (No optimal solution for the pessimistic variant). Suppose $T = 2$, there is only one follower, the leader's average tariff is $Q = 40$, and $D^l = D^u = 1$. Further data are given in Table 2. The optimistic solution is $q = (40, 40)$ and $x^o = (1, 0)$, resulting in a profit of 30 for the leader. For $q = (40, 40)$, however, the pessimistic answer would be $x^p = (0, 1)$, for which the leader's objective function value is $-10$. The leader can do much better by setting $q = (40 - \varepsilon, 40)$ for some small $\varepsilon > 0$. Then $u - q = (\varepsilon, 0)$; thus, the follower's unique optimal solution is $x^p = (1, 0)$, for which the leader's objective function value is $30 - \varepsilon$. Clearly, the supremum of the leader's objective function value is 30, but it cannot be attained by any feasible solution.
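The non-attainment of the supremum in Example 2 can be checked numerically. The following is a minimal sketch, not the authors' implementation. The values u = (40, 40) and c = (10, 50) are inferred from the utilities and profits quoted in the example (Table 2 itself is not reproduced here), while the per-period consumption bounds [0, 1] and the function names `follower_response` and `leader_profit` are assumptions made purely for illustration.

```python
def follower_response(u, q, c, demand, pessimistic):
    """Response of one follower with fixed total demand and assumed bounds 0 <= x_t <= 1."""
    T = len(u)
    # Periods are loaded in non-increasing order of u_t - q_t. Ties are broken by the
    # leader's margin q_t - c_t: smallest first (pessimistic) or largest first (optimistic).
    sign = 1.0 if pessimistic else -1.0
    order = sorted(range(T), key=lambda t: (-(u[t] - q[t]), sign * (q[t] - c[t])))
    x = [0.0] * T
    remaining = demand
    for t in order:
        x[t] = min(1.0, remaining)
        remaining -= x[t]
    return x


def leader_profit(q, c, x):
    """Leader's profit: sum over t of (q_t - c_t) * x_t."""
    return sum((qt - ct) * xt for qt, ct, xt in zip(q, c, x))


u, c, demand = (40.0, 40.0), (10.0, 50.0), 1.0  # inferred / assumed data

q_tie = (40.0, 40.0)
profit_opt = leader_profit(q_tie, c, follower_response(u, q_tie, c, demand, False))
profit_pes = leader_profit(q_tie, c, follower_response(u, q_tie, c, demand, True))
print("q = (40, 40): optimistic", profit_opt, "pessimistic", profit_pes)

# Lowering q_1 by eps breaks the tie; the pessimistic profit is 30 - eps for every eps > 0.
for eps in (1.0, 0.1, 0.001):
    q_eps = (40.0 - eps, 40.0)
    x_eps = follower_response(u, q_eps, c, demand, True)
    print("eps =", eps, "response", x_eps, "profit", leader_profit(q_eps, c, x_eps))
```

Under these assumed data, the pessimistic profit jumps from $-10$ at $\varepsilon = 0$ to $30 - \varepsilon$ for any $\varepsilon > 0$, which is why the value 30 is approached but never reached.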

The Continuous Knapsack Problem
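This section briefly overviews the key properties of optimal solutions of the continuous knapsack problem, which is stated as follows:
$$\max \; \sum_{t \in [T]} w_t x_t \quad \text{subject to} \quad D^l \le \sum_{t \in [T]} x_t \le D^u, \quad x^l_t \le x_t \le x^u_t, \; t \in [T], \qquad (8)$$
where $[T] = \{1, \ldots, T\}$. Note that the $w_t$ are not restricted in sign.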

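The continuous knapsack problem (8) admits a feasible solution if and only if
$$\sum_{t \in [T]} x^l_t \le D^u \quad \text{and} \quad D^l \le \sum_{t \in [T]} x^u_t,$$
given that $x^l_t \le x^u_t$ for all $t$ and $D^l \le D^u$. When feasible, it always has a finite optimum, since all variables are bounded. Without loss of generality, $D^l \ge \sum_{t \in [T]} x^l_t$.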

Proposition 1.
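Suppose the continuous knapsack problem (8) admits a feasible solution. Then it has an optimal solution $x^*$ of the following structure: there is a partitioning $L \cup P \cup U$ of $[T]$, where $P$ contains at most one time period $p$, such that
1. $x^*_t = x^l_t$ for $t \in L$, $x^*_t = x^u_t$ for $t \in U$, and $x^l_p \le x^*_p \le x^u_p$ for $p \in P$;
2. $w_t \ge w_p$ for $t \in U$, and $w_t \le w_p$ for $t \in L$.
Moreover, such a partitioning can be computed in $O(T \log T)$ time by determining a permutation $\pi$ such that $w_{\pi(t)} \ge w_{\pi(t+1)}$ for $t = 1, \ldots, T - 1$.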
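The sorting-based computation can be sketched as a greedy procedure. The code below is a minimal illustration of the standard greedy method for problem (8), written under the assumption that the problem has the box and total-consumption constraints described in this section; the function name `continuous_knapsack` and its argument names are hypothetical.

```python
def continuous_knapsack(w, x_lo, x_hi, d_lo, d_hi):
    """Greedy solution of: max sum(w_t * x_t)
    s.t. d_lo <= sum(x) <= d_hi and x_lo[t] <= x[t] <= x_hi[t].
    Weights may have any sign. Runs in O(T log T) due to sorting."""
    T = len(w)
    if sum(x_lo) > d_hi or sum(x_hi) < d_lo:
        raise ValueError("infeasible instance")
    x = list(x_lo)          # start from the lower bounds
    total = sum(x)
    # Process periods in non-increasing order of weight.
    for t in sorted(range(T), key=lambda t: -w[t]):
        room = x_hi[t] - x_lo[t]
        if w[t] > 0:
            # Profitable period: load as much as the upper limit d_hi allows.
            add = min(room, max(0.0, d_hi - total))
        else:
            # Non-profitable period: load only what is needed to reach d_lo.
            add = min(room, max(0.0, d_lo - total))
        x[t] += add
        total += add
    return x


print(continuous_knapsack([3, 1, -2], [0, 0, 0], [1, 1, 1], 1.5, 2.5))
print(continuous_knapsack([-1, -2], [0, 0], [1, 1], 1.5, 2.0))
```

In the returned solution, at most one period is strictly between its bounds, which corresponds to the set $P$ of the partitioning; periods processed before it are at their upper bounds and those after it at their lower bounds.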

General Properties of Optimal Solutions
Firstly, observe that without loss of generality, the lower bounds on the prices can be assumed to be 0.

Proposition 2.
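If $q^l_t > 0$ for some $t$, then an equivalent problem with zero lower bounds on the prices can be derived by setting
$$\tilde q^l_t = 0, \quad \tilde q^u_t = q^u_t - q^l_t, \quad \tilde c_t = c_t - q^l_t, \quad \tilde u_{it} = u_{it} - q^l_t, \quad \tilde Q = Q - \frac{1}{T} \sum_{t=1}^{T} q^l_t.$$
Proof. Substituting $q_t$ with $\tilde q_t + q^l_t$ in (1)-(3) + (4)-(6) yields a formulation satisfying the properties of the statement: the coefficients $q_t - c_t$ and $u_{it} - q_t$ become $\tilde q_t - \tilde c_t$ and $\tilde u_{it} - \tilde q_t$, respectively, while (2) and (3) turn into $\sum_{t=1}^{T} \tilde q_t \le \tilde Q \cdot T$ and $0 \le \tilde q_t \le \tilde q^u_t$.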
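From now on, the following assumption is made:

Assumption 1. $q^l_t = 0$ for all $t \in [T]$.

The minimum consumption of each follower $i$ is at least $\sum_{t=1}^{T} x^l_{it}$, while the maximum consumption is at most $\sum_{t=1}^{T} x^u_{it}$. Moreover, if $D^u_i = \sum_{t=1}^{T} x^l_{it}$, then follower $i$ has a unique optimal solution, which is independent of $q$. Hence, without loss of generality, the following assumption also holds:

Assumption 2. $\sum_{t=1}^{T} x^l_{it} \le D^l_i \le D^u_i \le \sum_{t=1}^{T} x^u_{it}$ and $\sum_{t=1}^{T} x^l_{it} < D^u_i$ for each follower $i$.

Now, an easy observation can be made about the leader's optimal price vector, which is valid in both the optimistic and the pessimistic variants of the bilevel tariff optimization problem.

Proposition 3. If $(q^*, x^*)$ is an optimal solution with $\sum_{i=1}^{m} \sum_{t=1}^{T} x^*_{it} > 0$, then $\sum_{t=1}^{T} q^*_t = Q \cdot T$, or $q^*_t = q^u_t$ for some $t \in [T]$.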
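Proof. Suppose $q^*$ does not satisfy the conditions of the statement. Then for $\varepsilon > 0$ sufficiently small, the price vector $\tilde q = (q^*_1 + \varepsilon, \ldots, q^*_T + \varepsilon)$ is feasible, and induces the same partitioning of the time periods as $q^*$ for each follower $i$; cf. Proposition 1. Hence $(\tilde q, x^*)$ constitutes a feasible solution for (1)-(3) + (4)-(6), and
$$\sum_{t=1}^{T} \sum_{i=1}^{m} (\tilde q_t - c_t)\, x^*_{it} = \sum_{t=1}^{T} \sum_{i=1}^{m} (q^*_t - c_t)\, x^*_{it} + \varepsilon \sum_{t=1}^{T} \sum_{i=1}^{m} x^*_{it} > \sum_{t=1}^{T} \sum_{i=1}^{m} (q^*_t - c_t)\, x^*_{it},$$
where the last inequality follows from the assumption of the statement. It follows that $(q^*, x^*)$ is not an optimal solution, a contradiction.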

Polynomially Solvable Special Cases with One Consumer Only
This section investigates the one-consumer special case (m = 1), and under some further restrictions, polynomial time algorithms are provided for solving the optimistic and pessimistic variants as well.
Throughout this section, it is assumed that the prices are unbounded; i.e., $q^u_t = \infty$ for all $t$. In fact, by (2), it may equivalently be assumed that $q^u_t = QT$ for all $t$. Further on, some assumptions on regularity are introduced in the next section.
Firstly, the optimistic variant is discussed in Section 5.1, and the pessimistic one in Section 5.2.

The Optimistic Variant
Let us assume that D l = D u , and let D denote the common value. The case with D l < D u will be discussed later. (2) and (3).

Observation 1.
If there is at least one regular time period, then D > 0.
Definition 3. From now on, an optimal solution (q , x ) of (1)- In the following results, it is assumed that x is an optimal solution of the continuous knapsack problem (8) for weights w t = u t − q t , and it respects the conditions of Proposition 1 for some partitioning L ∪ P ∪ U of [T], where P = {p}. Lemma 1. Assume (1)-(3) + (4)-(6) admits a non-degenerate optimal solution (q , x ), and suppose t ∈ L is a regular time period. Then Define a new price vector Thenq is a non-degenerate feasible solution of (1)-(3). Moreover, x is an optimal solution of the continuous knapsack problem (8) with weightsw t = u t −q t , as it satisfies the conditions of Proposition 1 for the same partitioning L ∪ P ∪ U of [T]. However, the objective value (1) changes by Due to the regularity assumption the change in the objective value is positive: which contradicts the optimality of (q , x ).

Lemma 2.
Assume (1)-(3) + (4)-(6) admits a non-degenerate optimal solution (q , x ), and suppose t ∈ U is a regular time period. Then Proof. (sketch) Analogous to that of Lemma 1. It is only mentioned that in this casẽ The rest follows from the regularity assumption, i.e., D/T < x u t .
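Lemma 3. Suppose all time periods are regular, and (1)-(3) + (4)-(6) admits a non-degenerate optimal solution $(q^*, x^*)$. Then
$$q^*_t = u_t - \frac{1}{T} \sum_{\tau \in [T]} u_\tau + Q \quad \text{for all } t \in [T].$$
Proof. Since $u_t - q^*_t = u_p - q^*_p$ for all $t \in [T]$ by Lemmas 1 and 2, it holds for any $t \in [T]$ that
$$T\,(u_t - q^*_t) = \sum_{\tau \in [T]} (u_\tau - q^*_\tau) = \sum_{\tau \in [T]} u_\tau - Q T,$$
where the second equation follows from Proposition 3, since $\sum_{t=1}^{T} x^*_t = D > 0$, as each time period is regular.

Now, a necessary and sufficient condition is provided for the existence of a non-degenerate optimal solution. Let $u_{\min} = \min_{t \in [T]} u_t$.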
In order to prove the converse direction, let us relax the bound constraints (3) for the $q_t$ variables, i.e., $-\infty < q_t < \infty$. Note that this relaxation could, in principle, permit an unbounded optimum value for the leader. However, as shown below, this is not the case. Fix some feasible price vector $q'$, and let $x'$ be the corresponding optimal solution of the follower respecting the partitioning $L \cup P \cup U$ of $[T]$ given by Proposition 1. If $\sum_{t=1}^{T} q'_t < QT$, then while increasing all coordinates of $q'$ by the same value, the follower's solution $x'$ remains optimal, and the profit of the leader increases. Thus, without loss of generality, $\sum_{t=1}^{T} q'_t = QT$. Suppose $P = \{p\}$ in the partitioning. If $q'$ fails to satisfy $u_t - q'_t = u_p - q'_p$ for some $t \in [T]$, then almost the same transformations can be applied as in Lemmas 1 and 2, depending on whether $t \in L$ or $t \in U$, to conclude that the leader's objective function value can be improved. In either case, $x'$ remains optimal for the resulting price vector, and the leader's objective function value strictly increases.
By repeating this transformation, a solutionq of the relaxed problem is derived such that u t −q t = u p −q p for all t ∈ [T], while x remains optimal forq. However,q satisfieŝ Hence,q is a non-degenerate feasible solution for the leader and (x ,q) has a strictly greater objective function value than (x , q ). Since the above argument applies to any vector q with finite coordinates only, it can be deduced that u min − (1/T) ∑ τ∈[T] u τ + Q > 0 implies that there exists a non-degenerate optimal solution of the bilevel tariff optimization problem. Theorem 2. Suppose D l = D u , all time periods are regular, and (1)-(3) + (4)-(6) admits a non-degenerate optimal solution (q , x ). Then Moreover, the optimal consumptions x can be obtained by solving the continuous knapsack problem: Proof. The first part of the statement follows from Lemma 3, and the second part from the optimality of q .
Note that Theorem 2 yields an optimal solution for the optimistic variant of the bilevel tariff optimization problem. Then, x can be computed by using Proposition 1. Now, consider the more general case when D l < D u .
Theorem 3. Suppose D l < D u , all time periods are regular, and (1)-(3) + (4)-(6) admits a non-degenerate optimal solution (q , x ). Then x satisfies Proof. The first inequality follows from the feasibility of x . Define q t as in Theorem 1.
i.e., the objective function coefficient of the follower is the same in all time periods. Hence, the follower's optimal solution is chosen based on the leader's objective function ∑ t∈[T] (q t − c t )x t . That is, the follower solves (8) with then in any optimal solution of the follower, the periods are loaded to the least possible extent until D l is reached, since all objective function coefficients (of the follower) are negative. Therefore, since the follower chooses an optimal solution which maximizes the leader's objective function value, the follower solves (8) with cost vector w t := q t − c t for t ∈ [T], while D u is replaced with D l . Analogously, if (1/T) ∑ τ∈ [T] u τ − Q > 0, then in any optimal solution of the follower, the periods are loaded to the maximal possible amount until D u is reached. Therefore, the follower solves (8) with cost vector w t := q t − c t for t ∈ [T], while D l is replaced with D u .
Finally, if (1/T) ∑ τ∈ [T] u τ − Q = 0, then the follower chooses its optimal solution solely by considering the objective function of the leader-namely, it solves the fractional knapsack problem (8) with weights w t := q t − c t . The result follows from Proposition 1.

The Pessimistic Variant
Under the conditions of Theorem 2, the optimum value of the pessimistic variant (where (1) is replaced with (7)) can be approximated by a slight perturbation of the optimal price vector for the optimistic variant.
Theorem 4. Suppose that all time periods are regular, and (1)-(3) + (4)-(6) admits a non-degenerate optimal solution $(q^*, x^*)$ such that $\sum_{t=1}^{T} x^*_t > 0$. Then for any $\varepsilon > 0$, there exists $\delta > 0$ such that for the price vector $(q^*)^\delta$ obtained by the $\delta$-perturbation of $q^*$, the follower has a unique optimum $x^\delta$ and
$$\sum_{t=1}^{T} \big((q^*)^\delta_t - c_t\big)\, x^\delta_t \ge \sum_{t=1}^{T} (q^*_t - c_t)\, x^*_t - \varepsilon.$$

Proof. By assumption, the conditions of Proposition 3 are satisfied, so $\sum_{t=1}^{T} q^*_t = Q \cdot T$. Then, by Lemma 3, it holds that $u_t - q^*_t = \frac{1}{T} \sum_{\tau \in [T]} u_\tau - Q$ for all $t \in [T]$. For a sufficiently small $\delta$, $(q^*)^\delta \ge 0$, since $q^*_t > 0$ for all $t$ by assumption. Moreover, all the values $u_t - (q^*)^\delta_t$ are different, and $u_t - (q^*)^\delta_t > u_k - (q^*)^\delta_k$ if and only if $q^*_t - c_t > q^*_k - c_k$. It follows that for the price vector $(q^*)^\delta$, the follower will prefer the time periods with higher $q^*_t - c_t$ values. On the one hand, $x^*$ is an optimal solution of the follower for the price vector $(q^*)^\delta$. On the other hand, since $(q^*)^\delta_t \ge q^*_t - (T - 1)\delta$, the decrease of the objective function value of the leader is at most
$$(T - 1)\,\delta \sum_{t=1}^{T} x^*_t = (T - 1)\,\delta\, D.$$
Therefore, for $\delta = \varepsilon / ((T - 1) D)$, the leader's objective function value decreases by at most $\varepsilon$, as claimed.
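The effect of the perturbation can be illustrated on a small instance. The sketch below rests on several assumptions that are not taken from the text: the exact definition of the δ-perturbation is not reproduced here, so the code uses one form that is merely consistent with the two properties used in the proof (each price drops by at most (T − 1)δ, and periods with a larger margin q*_t − c_t get a strictly larger decrease); margins are assumed to be pairwise distinct; the optimistic price vector is computed from the equal-preference condition u_t − q*_t = const together with Σ q*_t = QT; and all data and function names are hypothetical.

```python
def response(u, q, c, x_lo, x_hi, demand, pessimistic):
    """Follower's response for fixed total demand (D^l = D^u = demand)."""
    T = len(u)
    sign = 1.0 if pessimistic else -1.0
    # Non-increasing u_t - q_t; ties broken by the leader's margin q_t - c_t.
    order = sorted(range(T), key=lambda t: (-(u[t] - q[t]), sign * (q[t] - c[t])))
    x = list(x_lo)
    remaining = demand - sum(x_lo)
    for t in order:
        add = min(x_hi[t] - x_lo[t], remaining)
        x[t] += add
        remaining -= add
    return x


def profit(q, c, x):
    return sum((qt - ct) * xt for qt, ct, xt in zip(q, c, x))


def delta_perturbation(q, c, delta):
    """Assumed form: the price with the r-th largest margin is lowered by (T - r) * delta."""
    T = len(q)
    by_margin = sorted(range(T), key=lambda t: -(q[t] - c[t]))
    q_new = list(q)
    for rank, t in enumerate(by_margin, start=1):
        q_new[t] = q[t] - (T - rank) * delta
    return q_new


# Hypothetical instance with T = 3 regular periods (0 < D/T < 1).
u, c, Q, D = [30.0, 40.0, 50.0], [10.0, 25.0, 20.0], 30.0, 1.5
x_lo, x_hi = [0.0] * 3, [1.0] * 3
T = len(u)
q_star = [ut - sum(u) / T + Q for ut in u]     # equalizes u_t - q_t, sums to Q * T
eps = 0.3
delta = eps / ((T - 1) * D)
q_delta = delta_perturbation(q_star, c, delta)

x_opt = response(u, q_star, c, x_lo, x_hi, D, pessimistic=False)
x_pes = response(u, q_star, c, x_lo, x_hi, D, pessimistic=True)
x_delta = response(u, q_delta, c, x_lo, x_hi, D, pessimistic=True)

print("optimistic profit at q*:      ", profit(q_star, c, x_opt))
print("pessimistic profit at q*:     ", profit(q_star, c, x_pes))
print("pessimistic profit at (q*)^d: ", profit(q_delta, c, x_delta))
```

On this instance, the pessimistic response to the perturbed prices coincides with the optimistic response to the unperturbed ones, and the leader's loss relative to the optimistic profit stays within the chosen ε, whereas the pessimistic response to the unperturbed prices is considerably worse for the leader.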
If the conditions of Theorem 4 are not satisfied, then the more general Theorem 5 can be applied to obtain a suboptimal solution of the pessimistic variant of the bilevel tariff optimization problem; see Section 6.2.

Solution of the General Optimistic Variant
This section presents an equivalent single-level MILP formulation for the optimistic variant of the bilevel tariff optimization problem, for arbitrary number of followers. No restrictions are imposed on the problem data, except Assumptions 1 and 2.
The MILP is derived from a reformulation of the followers' problems using the familiar complementary slackness conditions of linear programming at the expense of using new binary indicator variables. Moreover, the quadratic term ∑ T t=1 ∑ m i=1 q t x it that appears in both the leader's and the followers' objective functions is substituted with an equivalent linear expression from the equivalence of the followers' primal and dual objective functions.
Let us start by formalizing the dual of the linear program (4)-(6) of follower i ∈ {1, . . . , m} using dual variables α + i and α − i for the lower and upper bounds on ∑ T t=1 x it in constraint (5), respectively, and β + it and β − it for the lower and upper bounds on x it in constraint (6): subject to Now, strong duality of linear programs (LP) is exploited-that is, the optimum objective function values of the primal and the corresponding dual LP are equal, provided a finite optimum exists for either of them. Since the primal LP of each follower always admits a finite optimum, the following holds: Consequently, the optimistic variant of the bilevel tariff optimization problem can be equivalently described by the following mathematical problem with complementarity constraints: subject to where 0 ≤ L ⊥ R ≥ 0 denotes that L ≥ 0, R ≥ 0, and either L = 0, or R = 0. The latter complementarity constraint can be described by two linear constraints using an extra binary variable and some big M constant, which is a standard rewriting technique. One issue with this transformation is the choice of the big M constant. In the above mathematical program, the D u i , ∑ T t=1 x u it − D l i , x u it , and x u it − x l it will do for the corresponding R expressions. However, in the L expressions, the maximum values of the α + i , α − i , β + it , and β − it variables have to bound in the optimal solutions. Since the primal program (4)-(6) always has a finite optimum for each follower i, the dual LP (9) always admits a basic optimum solution. It is not hard to see that the values of α + i and β + it are bounded by max t u it , provided this quantity is non-negative; otherwise, they are 0. For α − i and β − it , the upper bound is max t (q u t − u it ), provided this quantity is positive, and otherwise 0.
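For concreteness, the standard big-M rewriting of a single complementarity constraint $0 \le L \perp R \ge 0$ can be written out as follows. The binary variable $z$ and the constants $M_L$ and $M_R$, denoting any valid upper bounds on the expressions $L$ and $R$, are notation introduced here for illustration only.

```latex
\begin{aligned}
0 \le L &\le M_L \, z, \\
0 \le R &\le M_R \, (1 - z), \\
z &\in \{0, 1\}.
\end{aligned}
```

With $z = 0$ the first constraint forces $L = 0$, and with $z = 1$ the second one forces $R = 0$, so at least one of the two expressions vanishes in any feasible solution. The rewriting is exact only if $M_L$ and $M_R$ do not cut off an optimal solution, which is why the bounds on the dual variables discussed above matter.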

Solution of the General Pessimistic Variant
In the pessimistic variant of the bilevel tariff optimization problem, the followers are adversarial toward the leader. Suppose the tariff vector $q$ is fixed by the leader. By Proposition 1, each follower $i$ loads the periods in non-increasing $u_{it} - q_t$ order. In case of ties, the period with the smaller $q_t - c_t$ value must be loaded first. Hence, follower $i$ loads the time periods in the order given by a permutation $\pi_i$ of $\{1, \ldots, T\}$ satisfying the following conditions for $t = 1, \ldots, T - 1$:
$$u_{i\pi_i(t)} - q_{\pi_i(t)} \ge u_{i\pi_i(t+1)} - q_{\pi_i(t+1)},$$
and, whenever this holds with equality,
$$q_{\pi_i(t)} - c_{\pi_i(t)} \le q_{\pi_i(t+1)} - c_{\pi_i(t+1)}.$$
The next goal is to characterize the optimal solution of the followers. First suppose that $D^l_i = D^u_i$:

Proposition 4.
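For a fixed price vector $q$, let permutation $\pi_i$ be defined as above. If $D^l_i = D^u_i$, then the optimal solution of follower $i$ has the following structure: there exists an index $k$ such that $x_{i\pi_i(t)} = x^u_{i\pi_i(t)}$ for $t < k$, $x_{i\pi_i(t)} = x^l_{i\pi_i(t)}$ for $t > k$, and $x_{i\pi_i(k)}$ takes the remaining amount needed to reach the prescribed total consumption $D^l_i = D^u_i$. Hence, the follower maximizes its profit by saturating the $x_{it}$ in the order given by $\pi_i$, which at the same time minimizes the objective function of the leader for the fixed price vector $q$.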
Now, consider the case when $D^l_i < D^u_i$. Let the vectors $\check x_i \in \mathbb{R}^T$ and $\hat x_i \in \mathbb{R}^T$ be the optimal solutions of follower $i$ when the total consumption must be equal to $D^l_i$ or $D^u_i$, respectively.

Proposition 5. For a fixed price vector $q$, let permutation $\pi_i$ be defined as above. If $D^l_i < D^u_i$, then the optimal solution of follower $i$ has the following structure: there exists an index $k$ such that $x_{i\pi_i(t)} = \hat x_{i\pi_i(t)}$ for $t \in [1, k]$, and $x_{i\pi_i(t)} = \check x_{i\pi_i(t)}$ for $t \in [k + 1, T]$. Moreover, $k = T$ unless there exists an index $t$ such that either $u_{i\pi_i(t)} - q_{\pi_i(t)} = 0$ and $q_{\pi_i(t)} - c_{\pi_i(t)} > 0$, or $u_{i\pi_i(t)} - q_{\pi_i(t)} < 0$, in which case $k + 1$ is the smallest index with this property.

Proof. First, suppose that $k = T$. Then follower $i$ will certainly assign the largest possible consumption to $x_{i\pi_i(t)}$ if $u_{i\pi_i(t)} - q_{\pi_i(t)} > 0$. Moreover, in all the positions $t$ with $u_{i\pi_i(t)} - q_{\pi_i(t)} = 0$, if any, it holds that $q_{\pi_i(t)} - c_{\pi_i(t)} \le 0$, since $k = T$, and then again, follower $i$ will maximize the $x_{i\pi_i(t)}$. In both cases, the maximum consumption is reached by setting $x_{i\pi_i(t)} = \hat x_{i\pi_i(t)}$ for all $t$. Now suppose $k < T$. The same argument applies to the positions $1, \ldots, k$. However, to maximize its utility and minimize the leader's profit, from position $k + 1$ on, follower $i$ has to assign the least possible amount needed to get a feasible solution, and accordingly, $x_{i\pi_i(t)} = \check x_{i\pi_i(t)}$ for $t = k + 1, \ldots, T$.
The above propositions can easily be turned into algorithms; the details are omitted.
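One possible way to turn Propositions 4 and 5 into an algorithm is sketched below. It is an illustrative reading of the propositions rather than the authors' procedure: periods are processed in the order $\pi_i$, the leading positions (positive follower margin, or zero follower margin with non-positive leader margin) are loaded as far as $D^u_i$ permits, and all later positions receive only what is needed to reach $D^l_i$. The function name, argument names, and the test data are hypothetical, and exact comparisons with zero are used, which presumes exactly representable input data.

```python
def pessimistic_response(u, q, c, x_lo, x_hi, d_lo, d_hi):
    """Least favorable optimal response of one follower for a fixed price vector q."""
    T = len(u)
    w = [u[t] - q[t] for t in range(T)]        # follower's net utility per unit
    margin = [q[t] - c[t] for t in range(T)]   # leader's profit per unit
    # Permutation pi_i: non-increasing w, ties broken by the smaller leader margin first.
    order = sorted(range(T), key=lambda t: (-w[t], margin[t]))
    x = list(x_lo)
    total = sum(x)
    for t in order:
        room = x_hi[t] - x_lo[t]
        if w[t] > 0 or (w[t] == 0 and margin[t] <= 0):
            # Positions 1..k: consume as much as the upper limit D^u allows.
            add = min(room, max(0.0, d_hi - total))
        else:
            # Positions k+1..T: consume only what is needed to reach D^l.
            add = min(room, max(0.0, d_lo - total))
        x[t] += add
        total += add
    return x


# Hypothetical data: net utilities (10, 0, 0, -10), leader margins (30, -10, 10, 20).
u = [50, 40, 40, 30]
q = [40, 40, 40, 40]
c = [10, 50, 30, 20]
print(pessimistic_response(u, q, c, [0] * 4, [1] * 4, 2.5, 3.5))
print(pessimistic_response(u, q, c, [0] * 4, [1] * 4, 1.0, 3.5))
```

In the first call, the third period is loaded only partially because it is indifferent for the follower but profitable for the leader, so an adversarial follower uses it just to reach the minimum total consumption.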

Lemma 4. Fix some $\varepsilon > 0$. If an optimal solution $(x^*, q^*)$ of the optimistic variant of the bilevel tariff optimization problem is such that either $q^*_t < q^u_t$ for all $t$, or $q^*_t > 0$ for all $t$, then the price vector $q^*$ can be slightly perturbed such that $x^*$ becomes the unique optimal solution of the followers for the modified price vector, and the objective function value of the leader decreases by less than $\varepsilon$.
Proof. First suppose $q^*_t < q^u_t$ for all $t$. A perturbed price vector $\tilde{q}$ is obtained from $q^*$ by a transformation governed by a parameter $\delta > 0$ and a scaling factor $s$ chosen such that $\sum_{t=1}^{T} \tilde{q}_t = QT$. For $\delta$ sufficiently small, $\tilde{q}$ is feasible for the leader, and if $u_{it} - q^*_t > u_{ik} - q^*_k$, then $u_{it} - \tilde{q}_t > u_{ik} - \tilde{q}_k$. Moreover, if $u_{it} - q^*_t = u_{it'} - q^*_{t'}$, and $q^*_t - c_t > q^*_{t'} - c_{t'}$ for some $t \ne t'$, then $u_{it} - \tilde{q}_t > u_{it'} - \tilde{q}_{t'}$. Hence, for $\tilde{q}$, both the optimistic and the pessimistic answers of the followers are equal to $x^*$. Finally, the objective function value of the leader decreases by less than $\varepsilon$ for a sufficiently small $\delta$. Now suppose $q^*_t > 0$ for all $t$. Then a very similar transformation can be applied, but this time the prices are decreased by some power of a sufficiently small $\delta > 0$; the details are omitted.
Let $S$ be the supremum of the leader's objective function value (7) over all feasible solutions.
Theorem 5. Suppose $q^u_t > 0$ for all $t \in [T]$, and $Q > 0$. For any $\varepsilon > 0$, the pessimistic problem admits a solution $(\tilde{q}, \tilde{x})$ whose value for the leader is within a small multiple of $\varepsilon$ of $S$, and $\tilde{x}$ is the unique answer of the followers for $\tilde{q}$.
Proof. Take any solution $(q, x)$ such that $\sum_{t=1}^{T} (q_t - c_t) x_t \ge S - \varepsilon$, and $(q, x)$ respects the optimality conditions. Let $UB := \{t \in [T] \mid q_t > 0 \text{ and } q^u_t - q_t = \min\{q^u_\tau - q_\tau : \tau \in [T],\ q_\tau > 0\}\}$. Let $\delta_1$ be a small positive number. A new price vector $q'$ is defined from $q$ as follows.
If $UB = \emptyset$, then $q_t = 0$ for all $t \in [T]$, and $\sum_{t \in [T]} q_t = T \cdot Q$ must hold, since $q$ satisfies the optimality conditions by assumption. However, this contradicts the previous general assumptions, namely, $T \cdot Q > 0$ and $q^u_t > 0$ for all $t \in [T]$. Observe that $q'_t \le q_t$ for all $t$. Then, $\delta_1$ is chosen small enough to satisfy a set of conditions for each $t \in [T]$, from which it follows immediately that $q'$ is feasible for the leader. Consider a particular follower $i$. Without loss of generality, the optimal ordering of the time periods for follower $i$ is given by the identity permutation defined by $\pi_i(t) = t$. Let us examine how this ordering changes for the updated vector $q'$. Suppose $1 \le t_1 < t_2 \le T$.
• If $u_{i,t_1} - q_{t_1} > u_{i,t_2} - q_{t_2}$, then $u_{i,t_1} - q'_{t_1} > u_{i,t_2} - q'_{t_2}$ and the order of the two time periods does not change.
• If $u_{i,t_1} - q_{t_1} = u_{i,t_2} - q_{t_2}$ and $q_{t_1} - c_{t_1} \le q_{t_2} - c_{t_2}$, then three cases can be distinguished:
- If $q'_{t_1} = q_{t_1}$ and $q'_{t_2} < q_{t_2}$, then $u_{i,t_1} - q'_{t_1} < u_{i,t_2} - q'_{t_2}$. Hence, the order of periods $t_1$ and $t_2$ will change for $q'$ in order to satisfy the optimality conditions.
- If $q'_{t_1} < q_{t_1}$ and $q'_{t_2} = q_{t_2}$, then $u_{i,t_1} - q'_{t_1} > u_{i,t_2} - q'_{t_2}$. Hence, the order of $t_1$ and $t_2$ will not change for $q'$.
- If $q'_{t_1} - q'_{t_2} = q_{t_1} - q_{t_2}$, then the order of $t_1$ and $t_2$ will not change for $q'$.
Consider any follower $i$, and let $x'_i$ be its pessimistic response for $q'$, and $\pi'_i$ the corresponding permutation of the time periods. Clearly, $\sum_{t=1}^{T} (q'_t - c_t) \sum_{i=1}^{m} x'_{it} \le S$ by the definition of $S$. Let $\ell$ be the largest index such that $x_{i,\ell} > x^l_{i,\ell}$. Then clearly, $x_{i,t} = x^u_{i,t}$ for all $t < \ell$, and $x_{i,t} = x^l_{i,t}$ for all $t > \ell$ by the optimality conditions. Analogously, let $\ell'$ be the unique index such that $x'_{i,\pi'_i(t)} = x^u_{i,\pi'_i(t)}$ for $t < \ell'$, $x'_{i,\pi'_i(t)} = x^l_{i,\pi'_i(t)}$ for $t > \ell'$, and $x'_{i,\pi'_i(\ell')} > x^l_{i,\pi'_i(\ell')}$. Let $t_1$ and $t_2$ be the smallest and the largest indices, respectively, such that $u_{i,t_1} - q_{t_1} = u_{i,\ell} - q_{\ell} = u_{i,t_2} - q_{t_2}$. Relation (11) then holds by the choice of $\delta_1$. This implies $\ell' \ge t_1$. It is argued that (12) holds.
Two cases can be distinguished. First suppose $\ell' \le t_2$. Then (11) implies that in $\pi'_i$, the time periods $t_1, \ldots, t_2$ can be in any order. Since $q_t - c_t \le q_{t+1} - c_{t+1}$ for $t \in [t_1, t_2 - 1]$ (because $x_i$ is the pessimistic answer of follower $i$ for $q$), it follows that any permutation of $t_1, \ldots, t_2$ is more beneficial for the leader for the price vector $q$. However, if two or more indices are swapped in $\pi'_i$, then the corresponding $q_t$ variables are decreased by $\delta_1$ each, whence the objective function decreases by at most $\delta_1 (x^u_{it} - x^l_{it})$ in these time periods, and (12) follows for a sufficiently small $\delta_1$.
Now suppose $\ell' > t_2$. Then (11) implies $D^l_i \le \sum_{t=1}^{\ell} x_t < \sum_{t=1}^{\ell'} x'_{\pi'_i(t)} \le D^u_i$. Hence, $u_{i\pi'_i(\ell')} - q'_{\pi'_i(\ell')} \ge 0$, and thus $u_{i\pi'_i(\ell')} - q_{\pi'_i(\ell')} \ge 0$ by the choice of $\delta_1$. Moreover, $\ell \le t_2 < \pi'_i(\ell')$ implies $q_{\ell+1} - c_{\ell+1} \ge 0$; otherwise, for $q$, follower $i$ could use the period $\ell + 1$ to decrease the objective function value of the leader. An index $t_3 > t_2$ is then chosen as the largest index satisfying a further condition, which implies (12) for a sufficiently small $\delta_1$.
To finish the proof of the Theorem, let $\sigma$ be a permutation of the time periods, used to define a perturbed price vector $\tilde{q}$ that depends on a parameter $\delta_2 > 0$. For a sufficiently small $\delta_2$, $\tilde{q}$ is feasible for the leader; it preserves the permutation $\pi'_i$ for each follower $i$; and for $\tilde{q}$ the optimistic and the pessimistic solutions coincide. Hence, the followers have a unique answer $\tilde{x}$, and the theorem is proved.
The main idea of the following algorithm is to exploit that there is a pessimistic solution $(\tilde{x}, \tilde{q})$ whose value is very close to the pessimistic optimum, while no $\tilde{q}_t$ is at its upper bound. Thus, all upper bounds are slightly decreased, and then the optimistic variant is solved with the perturbed data. Finally, the prices are modified such that the solution value decreases only by a small amount, but the followers' solution is unique.
Notice that in the first step of the algorithm, the pessimistic answer $x$ is computed based on the permutations $\pi_i$, $i \in [m]$, corresponding to the vector $q$. In Step 2, the MILP (10) is solved by using a general mixed-integer linear programming solver. The computed optimal solution, $(q^*, x^*)$, is the best solution for the leader with the decreased upper bounds on the prices. Hence, it cannot be worse than $(\tilde{q}, \tilde{x})$. Finally, Lemma 4 can be applied to conclude that, after perturbation, the resulting price vector along with $x^*$ constitutes a solution only slightly worse than $(q^*, x^*)$.

Numerical Example
This section demonstrates the proposed approach, and emphasizes the importance of being explicit about the assumptions made, potentially implicitly, regarding the way the followers select their response to the decision of the leader (e.g., the optimistic or the pessimistic assumption). In the example, the problem faced by an electricity retailer (leader) and its residential consumers (followers) is investigated on a daily time horizon divided into 24 hourly time units. Two types of consumers are distinguished, with household appliances and EV charging modeled as deferrable loads, respectively. Loads considered for the first consumer type were a 1.5 kW dishwasher and a 0.5 kW washing machine, both with a one-hour washing cycle. The consumers had a slight preference for scheduling their load as early as possible, modeled with monotonically decreasing utility values. One thousand such individual consumers were considered, organized into eight groups with different time windows for these loads. Each homogeneous consumer group was modeled as a separate follower, resulting in eight followers, each with $D^l_i = D^u_i = 250$ kWh, $i = 1, \ldots, 8$.
The other type of consumers wished to charge their EVs, equipped with a 75 kWh battery, from 20% to 100% using an 11 kW wall charger (which corresponds to the battery capacity of the most popular EV worldwide in 2019 and the power output of the corresponding charger). The EV was connected to the grid from 20:00 to 06:00 the next morning. These consumers had a stronger preference for scheduling their load as early as possible, in order to have their vehicles fully charged, even if they had to leave home earlier than usual. The ensemble of 20 such consumers is the 9th follower in the problem, with $D^l_9 = D^u_9 = 1200$ kWh. For the sake of simplicity, other, inflexible loads are disregarded.
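The demand figures quoted for the nine followers can be checked with a few lines of arithmetic. The sketch below assumes that the eight household groups are of equal size and that the ensemble's hourly charging limit is simply the number of EVs times the charger power; neither assumption is stated explicitly in the text, and all variable names are illustrative.

```python
import math

# Household followers (first consumer type), figures from the text.
households = 1000
appliance_kwh = 1.5 * 1 + 0.5 * 1  # dishwasher + washing machine, one-hour cycles
groups = 8                         # assumption: groups of equal size
household_demand = households * appliance_kwh / groups  # kWh per follower

# EV follower (second consumer type), figures from the text.
battery_kwh = 75
charge_kwh = battery_kwh * (100 - 20) / 100  # energy needed from 20% to 100%
evs = 20
ev_demand = evs * charge_kwh                 # kWh for follower 9

charger_kw = 11
min_hours = math.ceil(charge_kwh / charger_kw)  # hourly periods needed per EV
hourly_cap = evs * charger_kw  # assumed ensemble limit per hourly period (kWh)

print(household_demand, ev_demand, min_hours, hourly_cap)
```

Under these assumptions each household follower demands 250 kWh and the EV follower 1200 kWh, in agreement with the values above; each EV needs at least six of the ten hourly periods in which it is connected.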
Market prices reflect the hourly prices recorded on the Hungarian power exchange (HUPX) on 1 January 2020, from 08:00, varying between 2.771 and 5.047 ct/kWh. The retailer must set an electricity tariff subject to $q^l_t = 2$ ct/kWh and $q^u_t = 6$ ct/kWh, for all $t$, with $Q = 4$ ct/kWh. Figure 1 displays the solution of this problem subject to the commonly applied optimistic assumption. The diagram shows the wholesale market price, the calculated tariff offered to consumers, the leader's net benefit $(q_t - c_t)$ (ct/kWh, left vertical axis), and the grid-level load resulting from the followers' demand response (kW, right vertical axis). With the appropriate tariff, the electricity retailer could motivate its consumers to schedule all their deferrable loads into periods when electricity is cheap, while still realizing a substantial positive profit of 4983 cents.
A closer look into the sub-problem faced by follower 1 with household appliances (Figure 2) explains that the retailer achieved the above by compensating for the decreasing utility of followers 1-8 with a similar, decreasing tariff between 08:00 and 20:00, which resulted in a constant net benefit $(u_{i,t} - q_t)$ of 5.155 ct/kWh for followers 1-8 in this time interval. Since the followers were indifferent about the choice between these time periods, by the optimistic assumption, they decided on the period which was the most favorable for the leader: the period 08:00-09:00 in the case of follower 1. In a similar fashion, the net benefit of follower 9 with EV charging (see Figure 3) was a constant 3.755 ct/kWh for 21:00-02:00 and 03:00-05:00. Hence, consumers charged their EVs in the period 21:00-02:00, to the benefit of the leader. However, this choice comes purely from an unnatural assumption of the mathematical model, which cannot be enforced in reality.
Given that various time periods bring identical net benefits for the followers, they may equivalently schedule their loads in other periods. Figure 4 depicts a solution in which the leader applies the same tariff, calculated using the optimistic assumption, but among periods that bring identical net benefits for the followers, they select the one that is the least favorable for the leader. In this case, followers 1-8 with household appliances schedule all their deferrable load into period 19:00-20:00 (see Figure 5), where the wholesale market price is higher than the tariff announced by the leader. Similarly, follower 9 charges the EVs partly in periods 03:00-05:00, resulting in further loss for the leader (see Figure 6). Hence, given that the retailer cannot enforce its optimistic assumption, its assumed positive profit can easily turn into a considerable loss: −220 cents for the solution depicted.
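The effect of tie-breaking on the leader's profit can be illustrated on a toy instance. The sketch below is not the paper's case study: the numbers are made up, a single follower places its whole load in one period, and the function name `leader_profit` and the tolerance `tol` are hypothetical.

```python
def leader_profit(u, q, c, load, pessimistic, tol=1e-9):
    """Leader's profit when one follower puts `load` into a single best period.

    Among the periods that maximize the follower's net benefit u - q, the
    follower picks the one with the largest (optimistic) or the smallest
    (pessimistic) leader margin q - c.
    """
    best = max(ut - qt for ut, qt in zip(u, q))
    tied = [t for t in range(len(q)) if abs((u[t] - q[t]) - best) <= tol]
    margins = [q[t] - c[t] for t in tied]
    margin = min(margins) if pessimistic else max(margins)
    return margin * load


# Made-up data: a decreasing tariff compensates a decreasing utility, so the
# follower's net benefit is 5 in every period and all three periods tie.
u = [9, 8, 7]
q = [4, 3, 2]
c = [3, 2, 3]   # the last period costs the leader more than the tariff
load = 250

optimistic = leader_profit(u, q, c, load, pessimistic=False)
pessimistic = leader_profit(u, q, c, load, pessimistic=True)
print(optimistic, pessimistic)
```

With these numbers the same tariff yields a profit of 250 under the optimistic assumption and a loss of 250 under the pessimistic one, mirroring the sign change observed in the example above.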
At the same time, by Proposition 6, the leader can slightly modify the tariff to ensure that the followers have a unique optimal response, with loads identical to the optimistic solution and a tariff arbitrarily close to the optimistic tariff. Consequently, the profit of the leader is also arbitrarily close to the value calculated using the optimistic assumption. This pessimistic solution is not depicted in separate diagrams, since it is arbitrarily close to the optimistic solution displayed in Figures 1-3.

Computational Experiments
Computational experiments investigated the efficiency and the scalability of the proposed approach on randomly generated problem instances of various sizes. Namely, the number of consumers (consumer groups), m, was taken from {5, 10, 15, 20, 25}, and the number of time periods, T, from {12, 24, 36, 48}. Ten random instances were generated for each combination of m and T, resulting in 200 instances altogether.
The instances were similar in their structure to the numerical example presented above: half of the consumers captured different groups of households with deferrable loads (e.g., washing machines) that can be scheduled into a single time period. Other consumers modeled EV charging, where the load had to be distributed over 4-8 periods due to the upper bound x u it . In both cases, the load amounts, the time windows, and the utility values were generated randomly.
The proposed approach was implemented in FICO Xpress 8.8 in the Mosel programming language. During the experiments, the computational time required for solving the proposed MILP formulation (10) of the optimistic variant of the bilevel tariff optimization problem was measured. Given this optimistic solution, the pessimistic solution can be derived in negligible time using the pessimistic solution algorithm. The time limit was set to 300 s. All experiments were run on a personal computer with Intel i7-10510U 1.80 GHz CPU and 16 GB RAM.
The computational results are displayed in Table 3, where each row contains aggregated results over the 10 instances for a given problem size. Column Opt shows the number of instances solved to optimality out of 10; Time contains the average computation time in seconds; columns Gap/avg. and Gap/max. display the average and the maximum optimality gap for the given problem size. For each instance, the gap is computed as (UB − LB)/UB, where UB and LB are the upper and lower bounds, respectively. The results show that the proposed approach could solve all instances with moderate sizes, i.e., with m ≤ 15 or T = 12 to optimality in less than a minute. An increase of m has a stronger influence on the computational time than an increase of T. For larger problems, the solver often hit the time limit (for 10-30% of instances with m = 20, and 50% of instances with m = 25). In such cases, the optimality gap was reasonable, below 15% for all instances, except for a single instance with m = 25 and T = 48 for which the solver could not find an integer solution within the time limit; this is accounted for as a gap of 100%. For even larger instances, the development of more efficient solution algorithms is recommended.
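The aggregation described above can be sketched as follows. This is an illustrative reading of the gap definition, assuming a maximization problem with a positive upper bound; the function names `optimality_gap` and `summarize`, the tolerance `tol`, and the sample bounds are hypothetical and do not reproduce the paper's results.

```python
from typing import List, Optional, Tuple


def optimality_gap(ub: float, lb: Optional[float]) -> float:
    """Relative gap (UB - LB) / UB; counted as 1.0 if no integer solution was found."""
    if lb is None:
        return 1.0
    return (ub - lb) / ub


def summarize(bounds: List[Tuple[float, Optional[float]]], tol: float = 1e-9):
    """Return (number solved to optimality, average gap, maximum gap)."""
    gaps = [optimality_gap(ub, lb) for ub, lb in bounds]
    solved = sum(1 for g in gaps if g <= tol)
    return solved, sum(gaps) / len(gaps), max(gaps)


# Made-up bounds for three instances: solved, 10% gap, no integer solution.
sample = [(100.0, 100.0), (100.0, 90.0), (100.0, None)]
solved, avg_gap, max_gap = summarize(sample)
print(solved, avg_gap, max_gap)
```

Counting an instance without an integer solution as a 100% gap keeps the average well defined, at the cost of letting a single such instance dominate the Gap/max. column.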

Conclusions and Managerial Implications
This paper gave a detailed analysis of a simple bilevel tariff optimization problem for demand response management. Key properties of the optimal solutions were proven formally. It was shown that in some special cases with a single follower (e.g., when the electricity retailer can offer a dedicated tariff for an individual consumer) the optimal solution can be calculated analytically. For the general case with multiple followers, efficient solution approaches were proposed both for the optimistic and the pessimistic variants, based on a MILP formulation that exploited complementarity for the follower's LP sub-problem. Hence, to the best of the authors' knowledge, this paper proposed the first efficient exact solution approach for the pessimistic variant of the problem. Moreover, it was shown that in most cases, the supremum of the pessimistic variant equals the optimum of the optimistic variant, which means that with fully rational followers, the leader can attain a similar profit without the impracticable optimistic assumption.

Managerial Implications
The main finding of the research is related to the importance of defining clearly the assumption of how the followers select their response to the decision of the leader: almost all previous studies in the literature implicitly make the optimistic assumption that followers select the most favorable response for the leader, but this assumption cannot be enforced in practice. Instead, the followers typically have many optimal responses, and they may easily select another response that dramatically decreases the profit of the leader. This problem is addressed by the pessimistic variant of the bilevel problem, which assumes that the followers may select the optimal response that is the least favorable for the leader, and hence safeguards the leader from the consequences of an unexpected response. While the pessimistic variant of bilevel optimization problems is often harder to solve than the optimistic variant, this paper showed that for the studied bilevel tariff optimization problem, the pessimistic variant can also be solved efficiently.

Directions for Future Research
Future research should focus on generalizing the proposed approach to the pessimistic variant of richer bilevel models for energy management, including batteries and generators controlled by the leader or the followers, or specialized applications such as HVAC. Moreover, the applicability of robust optimization approaches to these bilevel problems should be investigated, for instance, with uncertain consumer parameters or sub-optimal responses from followers.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.