Mechanism Design for Demand Management in Energy Communities

We consider a demand management problem of an energy community, in which several users obtain energy from an external organization such as an energy company, and pay for the energy according to pre-specified prices that consist of a time-dependent price per unit of energy, as well as a separate price for peak demand. Since users' utilities are their private information, which they may not be willing to share, a mediator, known as the planner, is introduced to help optimize the overall satisfaction of the community (total utility minus total payments) by mechanism design. A mechanism consists of a message space, a tax/subsidy and an allocation function for each user. Each user reports a message chosen from her own message space, and then receives some amount of energy determined by the allocation function and pays the tax specified by the tax function. A desirable mechanism induces a game, the Nash equilibria (NE) of which result in an allocation that coincides with the optimal allocation for the community. As a starting point, we design a mechanism for the energy community with desirable properties such as full implementation, strong budget balance and individual rationality for both users and the planner. We then modify this baseline mechanism for communities where message exchanges are allowed only within neighborhoods, and consequently, the tax/subsidy and allocation functions of each user are only determined by the messages from her neighbors. All the desirable properties of the baseline mechanism are preserved in the distributed mechanism. Finally, we present a learning algorithm for the baseline mechanism, based on projected gradient descent, that is guaranteed to converge to the NE of the induced game.


I. INTRODUCTION
Resource allocation is an essential task in networked systems such as communication networks and energy/power networks [1]-[3]. In such systems, there are usually one or more kinds of limited, divisible resources to be allocated among several agents. When full information regarding the agents' interests is available, solving the optimal resource allocation problem reduces to a standard optimization problem. However, in many interesting scenarios, strategic agents may choose to conceal or misreport their interests in order to obtain more resources. In such cases, appropriate incentives can be designed so that selfish agents truthfully report their private information, thus enabling optimal resource allocation [4].
In existing works related to resource allocation problems, mechanism design [5], [6] is frequently used to address the agents' strategic behavior mentioned above. In the framework of mechanism design, the participants reach an agreement regarding how they exchange messages, how they share the resources, and how much they pay (or get paid). Such agreements are designed to incentivize the agents to provide the information needed to solve the optimization problem.
In this paper, we develop mechanisms to solve a demand management problem in energy communities. In an energy community, users obtain energy from an energy company and pay for it. The pre-specified prices dictated by the energy company consist of a time-dependent price per unit of energy, as well as a separate price for peak demand. Users' demand is subject to constraints relating to equipment capacity and minimum comfort level. Each user possesses a utility as a function of their own demand. Utilities are private information for users. The welfare of the community is the sum of utilities minus energy cost. If users were willing to report truthfully their utilities, one could easily optimize energy allocation to maximize social welfare. However, since users are strategic and might not be willing to report utilities directly, to maximize the welfare, we need to find an appropriate mechanism that incentivizes them to reveal some information about their utilities, so that optimal allocation is reached even in the presence of strategic behaviors. These mechanisms are usually required to possess several interesting properties, among which, full implementation in Nash equilibria (NE), individual rationality and budget balance [6]- [8]. Moreover, in environments with communication constraints, it is desirable to have "distributed" mechanisms, whereby energy allocation and tax/subsidies for each user can be evaluated using only local messages in the user's neighborhood. Finally, for actual deployment of practical mechanisms we hope that the designed mechanism has convergence properties that guarantee that NE is reached by the agents by means of a provably convergent learning algorithm.
Mechanisms for resource allocation have been designed using penalty functions to incentivize feasibility, and in [12]-[14] using proportional allocation, or its generalization, radial allocation [15], [52]. All the aforementioned works on mechanism design can be categorized as "centralized" mechanisms, meaning that agents' messages are broadcast to a central planner who evaluates the allocation and taxation of all users. The first attempts at designing decentralized mechanisms were reported in [21], [22], where mechanisms are designed with the additional property that the allocation and tax functions of each agent depend only on the messages emitted by neighboring agents. As such, allocation and taxation can be evaluated locally.
Finally, learning in games is motivated by the fact that NE is, in principle, a complete-information solution concept. Since users do not know each other's utilities, they cannot evaluate the designed NE offline; instead, a (learning) process is needed during which the NE is learnt by the community. The classic works [23]-[25] adopt fictitious play, while [26] connects supermodularity with the convergence of learning dynamics within an adaptive-dynamics class, a result further specialized in [53] to the Lindahl allocation problem. A general class of learning dynamics named adaptive best response is discussed in [54] in connection with contractive games. Learning in monotone games [27], [55] is investigated in [56]-[61], with further applications in network optimization [62], [63]. Recently, learning of NE via reinforcement learning has been reported in [64]-[68].

A. Demand Management in Energy Communities
Consider an energy community consisting of N users and a given time horizon T, where T can be viewed as the number of days in one billing period. Each user i in the user set N has her own prediction of her usage over one billing period, denoted by x^i = (x^i_1, . . . , x^i_T), where x^i_t is the predicted usage of user i in the t-th time slot of the billing period.¹ Note that x^i_t can be negative, since users in the electrical grid may generate power through renewable technologies (e.g., photovoltaic) and return the surplus to the grid. Each user is characterized by a utility function of the form

v^i(x^i) = Σ_{t=1}^T v^i_t(x^i_t).

The energy community, as a whole, pays for the energy. The unit prices, denoted p_t, are given separately for every time slot t. These prices are considered given and fixed (e.g., by the local utility company). In addition, the utility company imposes a unit peak price p_0 in order to incentivize load balancing and lessen the burden of demand peaks. In summary, the cost of energy to the community is

J(x) = Σ_{t=1}^T p_t Σ_{i∈N} x^i_t + p_0 max_t Σ_{i∈N} x^i_t,   (1)

where x is the concatenation of the demand vectors x^1, . . . , x^N.
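As a small numerical illustration of the cost function (1), the following Python sketch computes J(x) for a toy community; the two-user, two-slot demands and prices are hypothetical values of our own choosing.

```python
import numpy as np

def energy_cost(x, p, p0):
    """Community energy cost J(x): time-of-use charges plus a peak charge.

    x  -- demand matrix of shape (N, T), x[i, t] = user i's demand in slot t
    p  -- per-unit price vector of length T
    p0 -- unit price applied to the peak aggregate demand
    """
    total = x.sum(axis=0)                  # aggregate demand per time slot
    return float(p @ total + p0 * total.max())

# Hypothetical example: 2 users, 2 time slots.
x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
p = np.array([0.1, 0.2])
print(energy_cost(x, p, 0.05))  # 0.1*4 + 0.2*6 + 0.05*6 = 1.9
```

Note that the peak term charges only the largest aggregate slot, which is what makes the overall cost non-smooth and motivates the auxiliary variable w introduced below.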
The centralized demand management problem for the energy community can be formulated as

max_{x ∈ X} Σ_{i∈N} v^i(x^i) − J(x).   (2)

The feasible set X incorporates possible lower bounds on each user's demand (e.g., minimal indoor heating or AC) and/or upper bounds due to the capacities of the facilities, as well as transmission line capacities.
In order to solve the optimization problem (2) using convex optimization methods, the following assumptions are made.
Assumption 1. All the utility functions v i t (·)'s are twice differentiable and strictly concave.
Assumption 2. The feasible set X is a polytope formed by finitely many linear inequality constraints, and X is coordinate convex, i.e., if x ∈ X, then setting any component of x to 0 keeps x inside X.
By Assumption 2, X can be written as {x | Ax ≤ b} for some A ∈ R^{L×NT} and b ∈ R^L_+, where L is the number of linear inequality constraints defining X and a^{lT} denotes the l-th row of A, so that each constraint reads a^{lT} x ≤ b^l. The coordinate convexity in Assumption 2 is mainly used for the outside option required by individual rationality: under this assumption, if any user i changes her mind and chooses not to participate in the mechanism, fixing x^i = 0 in a feasible allocation x still yields a feasible allocation.
With Assumptions 1 and 2, the energy community faces an optimization problem with a strictly concave and continuous objective function over a nonempty compact convex feasible set. Therefore, by convex optimization theory, the optimal solution of this problem exists and is unique [28].
Substituting the max function in (1) with a new variable w, the optimization problem in (2) can be equivalently restated as

max_{x, w} Σ_{i∈N} Σ_{t=1}^T v^i_t(x^i_t) − Σ_{t=1}^T p_t Σ_{i∈N} x^i_t − p_0 w   (3a)
s.t. x ∈ X,   (3b)
Σ_{i∈N} x^i_t ≤ w, ∀t.   (3c)

The proof of this equivalence can be found in Appendix A. The new optimization problem has a differentiable concave objective function and a convex feasible set, so it is still a convex program; therefore, the KKT conditions are necessary and sufficient for a tuple (x, λ, µ) to be the optimal solution, where λ = [λ_1, . . . , λ_L]^T are the Lagrange multipliers for the linear constraints a^{lT} x ≤ b^l in x ∈ X, and µ = [µ_1, . . . , µ_T]^T are the Lagrange multipliers for (3c). The KKT conditions are as follows:

1) Primal feasibility:
a^{lT} x ≤ b^l, ∀l,   (4a)
Σ_{i∈N} x^i_t ≤ w, ∀t.   (4b)

2) Dual feasibility:
λ_l ≥ 0, ∀l;  µ_t ≥ 0, ∀t.   (4c)

3) Complementary slackness:
λ_l (a^{lT} x − b^l) = 0, ∀l,   (4d)
µ_t (Σ_{i∈N} x^i_t − w) = 0, ∀t.   (4e)

4) Stationarity:
Σ_{t=1}^T µ_t = p_0,   (4f)
v̇^i_t(x^i_t) − p_t − Σ_l λ_l a^{i,l}_t − µ_t = 0, ∀i, ∀t,   (4g)

where v̇^i_t(·) is the first-order derivative of v^i_t(·) and a^{i,l}_t is the entry of a^l multiplying x^i_t. We conclude this section by pointing out once more that our objective is not to solve (3) or (4) in a centralized or decentralized fashion. Such a methodology is well established and falls under the research area of centralized or decentralized (non-strategic) optimization. Furthermore, such a task can be accomplished only under the assumption that users report their utilities (or related quantities, such as derivatives of utilities at specific points) truthfully, i.e., they do not act strategically. Instead, our objective is to design a mechanism (i.e., messages and incentives) so that strategic users are presented with a game, the NE of which is designed to correspond to the optimal solution of (3) or (4).
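To make the restated problem concrete, the following sketch solves a small instance of (3) numerically with SciPy's SLSQP solver. The quadratic utilities v^i_t(x) = −(x − c^i_t)², the box feasible set X = [0, 3]^{NT}, and all numerical values are our own assumptions for illustration, not part of the model above.

```python
import numpy as np
from scipy.optimize import minimize

N, T = 2, 2
c = np.array([[1.0, 2.0], [2.0, 1.0]])   # hypothetical bliss points
p = np.array([0.1, 0.2])                 # per-slot prices
p0 = 0.05                                # unit peak price

def neg_welfare(z):
    # z = [x (N*T entries), w]; maximize welfare <=> minimize its negative.
    x, w = z[:-1].reshape(N, T), z[-1]
    utility = -((x - c) ** 2).sum()       # strictly concave utilities
    cost = p @ x.sum(axis=0) + p0 * w     # peak term replaced by variable w
    return -(utility - cost)

# Constraint (3c): sum_i x^i_t <= w for every t.
cons = [{'type': 'ineq',
         'fun': lambda z, t=t: z[-1] - z[:-1].reshape(N, T)[:, t].sum()}
        for t in range(T)]
bounds = [(0.0, 3.0)] * (N * T) + [(None, None)]   # X is a box here
res = minimize(neg_welfare, np.zeros(N * T + 1), bounds=bounds, constraints=cons)
x_opt, w_opt = res.x[:-1].reshape(N, T), res.x[-1]
# Since p0 > 0, at the optimum w equals the peak aggregate demand.
```

At the solution, at least one constraint in (3c) binds, in line with complementary slackness (4e): pushing w below the peak would violate feasibility, and pushing it above only increases the cost p_0 w.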

B. Mechanism Design Preliminaries
In an energy community, utilities are users' private information. Due to privacy and strategic concerns, users might not be willing to report their utilities. As a result, (3) or (4) cannot be solved directly. In order to solve (3), (4) under the settings stated above, we introduce a planner as an intermediary between the community and the energy company. To incentivize users to provide necessary information for optimization, the planner signs a contract with users, which prespecifies the messages needed from users and rules for determining the allocation and taxes/subsidies from/to the users. The planner commits to the contract. Informally speaking, the design of such contract is referred to as mechanism design.
More formally, a mechanism is a collection of message sets and an outcome function [7].
Specifically, in resource allocation problems, a mechanism can be defined as a tuple (M, x̂(·), t̂(·)), where M = M^1 × . . . × M^N is the space of message profiles; x̂ : M → X is an allocation function determining the allocation according to the received message profile m ∈ M; and t̂ : M → R^N is a tax function defining the payments (or subsidies) of the users based on m (specifically, t̂ = {t̂^i}_{i∈N}, with t̂^i : M → R the tax/subsidy function of user i). Once defined, the mechanism induces a game G = (N, M, {u^i}_{i∈N}). In this game, each user i chooses her message m^i from the message space M^i with the objective of maximizing her payoff

u^i(m) = v^i(x̂^i(m)) − t̂^i(m).

The planner collects the taxes and pays the energy cost to the company, so the planner's payoff is Σ_{i∈N} t̂^i(m) − J(x̂(m)) (the net income of the planner).
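Schematically, the objects defined above can be captured in a few lines of code. The allocation and tax rules below are placeholders of our own for illustration, not the mechanism designed in this paper; only the payoff structure u^i(m) = v^i(x̂^i(m)) − t̂^i(m) mirrors the text.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Mechanism:
    allocate: Callable[[Sequence[float]], Sequence[float]]  # x_hat: M -> X
    tax: Callable[[Sequence[float]], Sequence[float]]       # t_hat: M -> R^N

def payoff(i, m, utility_i, mech):
    """User i's payoff in the induced game: v^i(x_hat^i(m)) - t_hat^i(m)."""
    return utility_i(mech.allocate(m)[i]) - mech.tax(m)[i]

# Placeholder rules: each user receives what she announces, at a unit price 0.1.
mech = Mechanism(allocate=lambda m: list(m),
                 tax=lambda m: [0.1 * y for y in m])
u0 = payoff(0, [2.0, 1.0], lambda x: 4 * x - x ** 2, mech)
print(u0)  # v(2) - 0.1*2 = 4 - 0.2 = 3.8
```

The design problem of the paper is precisely to choose `allocate` and `tax` so that the NE of the induced game implements the optimum of (3).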
For the mechanism-induced game G, NE is an appropriate solution concept. At an equilibrium point m*, if x̂(m*) coincides with the optimal allocation (i.e., the solution of (3)), we say that the mechanism implements the optimal allocation at m*. A mechanism has the property of full implementation if all NE m* implement the optimal allocation.
There are other desirable properties of a mechanism. Individual rationality is the property that everyone volunteers to participate in the mechanism-induced game rather than quitting. For the planner, this means that the sum of taxes Σ_{i∈N} t̂^i(m*) collected at NE is at least the cost J(x̂(m*)) paid to the energy company. In the context of this paper, strong budget balance is the property that the sum of taxes is exactly equal to the cost paid to the energy company, so that no additional funds are required by the planner or the community to run the mechanism beyond the true energy cost. In addition, if we use the solution concept of NE, one significant problem is how the users find the NE without full information.
Therefore, a learning algorithm is needed to help users learn the NE. If, under a specific class of learning algorithms, the message profile m converges to the NE m*, then we say that the mechanism has learning guarantees for that class.

III. THE BASELINE "CENTRALIZED" MECHANISM
In this section we temporarily assume there are no communication constraints, i.e., all the message components are accessible for the calculations of the allocation and taxation. The mechanism designed under this assumption is called a "centralized" mechanism. In the next section we will extend this mechanism to an environment with communication constraints.
In the proposed centralized mechanism we define user i's message m^i as

m^i = (y^i, q^i, s^i, β^i),

where y^i = (y^i_1, . . . , y^i_T), q^i = (q^{i,1}, . . . , q^{i,L}), s^i = (s^i_1, . . . , s^i_T) and β^i = (β^i_1, . . . , β^i_T). Each message component has an intuitive meaning. Message y^i_t ∈ R can be regarded as the demand for time slot t announced by user i. Message q^{i,l} ∈ R_+ is the additional price that user i expects to pay for constraint l, which corresponds to the Lagrange multiplier λ_l.
Message s^i_t ∈ R_+ is proportional to the peak price that user i expects to pay at time t. Intuitively, setting s^i_t greater than s^i_{t'} means user i thinks day t is more likely than day t' to be the day with the peak demand. This component corresponds to the Lagrange multiplier µ_t. Message β^i_t ∈ R is user i's prediction of user (i + 1)'s usage at time t. This message is included for technical reasons that will become clear in what follows (for a user index i ∈ N, i − 1 and i + 1 denote modulo-N operations).
The allocation functions are x̂^i_t(m) = y^i_t, i.e., users get exactly what they request.
Prior to defining the tax functions, we want to find a variable that acts like µ_t at NE. Although s^i_t is designed to be proportional to µ_t, nothing guarantees Σ_t s^i_t = p_0, which is KKT condition (4f). To solve this problem, we use a technique similar to the proportional/radial allocation in [12]-[15], [52] to shape a suggested peak-price vector s into a form that satisfies (4f). For a generic T-dimensional peak-price vector s̃ = (s̃_1, . . . , s̃_T) and a generic T-dimensional total-demand vector ỹ = (ỹ_1, . . . , ỹ_T), define the radial pricing operator RP(s̃, ỹ) with entries

RP_t(s̃, ỹ) = p_0 s̃_t / Σ_{τ=1}^T s̃_τ, if s̃ ≠ 0,
RP_t(s̃, ỹ) = p_0 1{ỹ_t = max_{t'} ỹ_{t'}} / #(arg max_{t'} ỹ_{t'}), if s̃ = 0,

where 1{·} is the indicator function and #(arg max_t ỹ_t) represents the number of elements of ỹ equal to the maximum value.
The output of the radial pricing operator RP(·, ·) will serve as the peak price in the subsequent tax functions. When the suggested price vector s̃ is nonzero, the unit peak price is allocated to each day proportionally to s̃_t. If the suggested price vector s̃ = 0, then p_0 is divided equally among the days with peak demand.
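The radial pricing operator, as described above, admits a direct transcription; the function and variable names below are ours.

```python
import numpy as np

def radial_pricing(s, y, p0):
    """Split the unit peak price p0 across time slots.

    If the suggested price vector s is nonzero, p0 is allocated
    proportionally to s; otherwise it is split equally among the
    slots attaining the peak of the total-demand vector y.
    """
    s, y = np.asarray(s, float), np.asarray(y, float)
    if s.any():
        return p0 * s / s.sum()
    peak = np.isclose(y, y.max())          # slots attaining the peak demand
    return p0 * peak / peak.sum()

print(radial_pricing([1.0, 3.0], [0.0, 0.0], 0.05))  # [0.0125 0.0375]
print(radial_pricing([0.0, 0.0], [1.0, 2.0], 0.05))  # [0.   0.05]
```

In both branches the output sums to p_0, which is exactly the normalization that KKT condition (4f) requires of the multipliers µ.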
The tax functions are defined as

t̂^i(m) = cost^i(m) + prβ^i(m) + Σ_l con^{i,l}(m) + Σ_t con^i_t(m),

where

cost^i(m) = Σ_t ( p_t + RP^i_t(s^{−i}, ζ^{−i}) ) y^i_t,    prβ^i(m) = Σ_t ( β^i_t − y^{i+1}_t )²,

con^{i,l}(m) and con^i_t(m) are penalty terms associated with the l-th linear constraint and the peak constraint at time t, and a^{i,l} is defined as a^{i,l} = [a^{i,l}_1, . . . , a^{i,l}_T]. The tax function for user i consists of three parts. The first part, cost^i(m), is the cost of the demand: according to this part, user i pays the fixed price and the peak price for her demand.
Note that the peak price at time t, RP^i_t(s^{−i}, ζ^{−i}), is generated from the peak-price vectors of all other agents, s^{−i}, and the total demand of all other agents, ζ^{−i} (agent i's demand at time t is approximated by β^{i−1}_t). As a result, the peak price is not controlled by user i at all. The second part, prβ^i(m) ("prβ" stands for "proxy-β"), is a penalty term for the imprecision of the prediction β^i, which incentivizes β^i to align with y^{i+1} at NE. The third part consists of the two penalty terms con^{i,l}(m) and con^i_t(m), which enforce primal feasibility and complementary slackness at NE.

Lemma 1. At any NE m of the mechanism-induced game, β^i = y^{i+1} for all i ∈ N.

Proof: Suppose β^i ≠ y^{i+1} at some message profile m. User i can construct a profitable deviation m̃ that keeps everything other than β^i the same as in m, but sets β̃^i_t = y^{i+1}_t. Comparing the payoff values u^i before and after the deviation, the deviation removes the penalty prβ^i while leaving all other terms of user i's payoff unchanged. Thus, whenever β^i ≠ y^{i+1}, user i can construct another announcement m̃^i that gives her a strictly better payoff.
It can be seen from Lemma 1 that the messages β play an important role in the mechanism.
They appear in two places in the tax functions. First, in the expression

ζ^{−i}_t = Σ_{j≠i} y^j_t + β^{i−1}_t,

which is the total demand at time t used in user i's tax function. Second, in the expression for the excess demand b^l − Σ_{j≠i} a^{j,l} y^j − a^{i,l} β^{i−1} of the l-th constraint. Note that we do not want user i to control these terms through her own messages (specifically y^i_t), because she already controls her allocation directly, and such control would create technical difficulties. Indeed, quoting the self-announced demand in the tax function would open the possibility of unexpected strategic moves by user i to obtain extra profit. Using the proxy β^{i−1} instead of y^i eliminates user i's control over her own slackness terms, while Lemma 1 guarantees that at NE these quantities become equal.
With the introduction of these proxies, we show in the following lemmas that, at NE, all KKT conditions required for the optimal solution are satisfied. First we prove that primal feasibility (KKT 1) and complementary slackness (KKT 3) are ensured by the design of the penalty terms ("pr"s) and the constraint-related terms ("con"s), if we treat q and RP(s, ζ) as the Lagrange multipliers.
Lemma 2. At any NE, the users' suggested prices are equal:

q^i = q^j =: q, s^i = s^j =: s, ∀i, j ∈ N.

Furthermore, the users' announced demand profile satisfies y ∈ X, and the common prices, together with the demand profile, satisfy complementary slackness:

q_l (a^{lT} y − b^l) = 0, ∀l,
s_t (ζ_t − z) = 0, ∀t,

which implies

RP_t(s, ζ)(ζ_t − z) = 0, ∀t,

where ζ_t = Σ_{i∈N} y^i_t and z = max_t ζ_t is the peak demand during the billing period.

Proof: The proof can be found in Appendix B.
Dual feasibility (KKT 2) holds trivially by definition. We now show that the stationarity condition (KKT 4) holds at NE by imposing first-order conditions on the partial derivatives of user i's payoff with respect to her message components y^i_t.

Lemma 3. At any NE, for every user i and every time slot t,

v̇^i_t(y^i_t) − p_t − Σ_l q_l a^{i,l}_t − RP_t(s, ζ) = 0,

where q and s are the common prices of Lemma 2 and ζ_t = Σ_{j∈N} y^j_t.
Proof: The proof is in Appendix C.
With Lemmas 1, 2 and 3, it is straightforward to derive the first part of our result, i.e., the efficiency of the allocation at any NE.
Theorem 1. For the mechanism-induced game G, if NE exist, then the NE result in the same allocation as the optimal solution to the centralized problem (3).
Proof: If m* is a NE, from Lemmas 1 and 2 we know that at NE β^{i*} = y^{(i+1)*}, and all the prices q^{i*}, s^{i*}, and all the ζ^{−i*} are the same among all users i ∈ N. We denote these common quantities by y*, q*, s* and ζ*.
Therefore, (y*, q*, RP(s*, ζ*)) satisfies all four KKT conditions, which means the allocation x̂(m*) is the optimal allocation.
The following theorem shows the existence of NE.
Theorem 2. For the mechanism-induced game G, there exists at least one NE.
Proof: From convex optimization theory, we know that the optimal solution of (3) exists. Based on this solution, one can construct a message profile that satisfies all the properties presented in Lemmas 1, 2 and 3, and prove that no user has a profitable unilateral deviation. The details are presented in Appendix D.
Full implementation indicates that, if all users are willing to participate in the mechanism, the equilibrium outcome is precisely the optimal allocation. For each user i, the payoff at NE is

u^i(m*) = v^i(x^{i*}) − Σ_t ( p_t + Σ_l q*_l a^{i,l}_t + RP_t(s*, ζ*) ) x^{i*}_t.

In other words, the users pay for their own demands at the aggregated unit prices given by the consensus at NE. Counting the planner as a participant of the mechanism with utility Σ_{i∈N} t̂^i(m*) − J(x̂(m*)), strong budget balance is automatically achieved. However, two questions remain. Are the users willing to follow this mechanism, or would they rather not participate? Will the planner have to pay extra money to implement the mechanism?
The two theorems below answer these questions.
Theorem 3 (Individual Rationality for Users). Assume user i receives x^i = 0 and pays nothing if she chooses not to participate in the mechanism. Then, at NE, participating in the mechanism is weakly better than not participating, i.e.,

u^i(m*) ≥ v^i(0), ∀i ∈ N.

Proof: The main idea of the proof of Theorem 3 is to find a message profile (m̃^i, m^{−i*}) in which user i's payoff is v^i(0); we can then argue that playing the NE cannot be worse, since m^{i*} is a best response to m^{−i*}. The details of the proof can be found in Appendix E.
Theorem 4 (Individual Rationality for the Planner). At NE, the planner does not need to pay extra money to run the mechanism:

Σ_{i∈N} t̂^i(m*) ≥ J(x̂(m*)).

Moreover, by a slight modification of the tax functions defined in (7), the total payment of the users and the energy cost balance exactly at NE:

Σ_{i∈N} t̂^i(m*) = J(x̂(m*)).

Proof: The individual rationality of the planner can be verified by direct substitution of (14). By redistributing the planner's surplus back to the users in an appropriate way, the total payment of the users becomes exactly J(x̂(m*)), and consequently no money is left over after paying the energy company. The details are left to Appendix F.

IV. DISTRIBUTED MECHANISM
In the previous mechanism, the allocation and tax functions of the users depend on the global message profile m: to compute the tax t̂^i of a certain user i, the messages m^j of all users j ∈ N are needed. Such mechanisms are not suitable for environments with communication constraints, where global message exchange is restricted. To tackle this problem, we provide a distributed mechanism, in which the calculation of the allocation and tax of a given user depends only on the messages of the "available" users, and therefore respects the communication constraints. In this section, we first introduce the communication constraints through a message exchange network model. We then develop a distributed mechanism that accommodates the communication constraints while preserving the desirable properties of the baseline centralized mechanism.

A. Message Exchange Network
In an environment with communication constraints, the users are organized in an undirected graph GR = (N, E), where the set of nodes N is the set of users, and the set of edges E indicates which messages are accessible to each user. If (i, j) ∈ E, user i can access the message of user j, i.e., the message of j is available when computing the allocation and tax of user i, and vice versa. We state a mild requirement for the message exchange network: the graph GR is connected.
In fact, the mechanism we are going to present works for the cases where GR is a tree. Although an undirected connected graph is not necessarily a tree, one can always find a spanning tree of such a graph, so it is safe to consider the mechanism under the assumption that the given network has a tree structure. If that is not the case, the mechanism designer can select a spanning tree of the original message exchange network and design the mechanism based only on this tree instead of the whole graph (essentially, some connections of the original graph will never be used for message exchange).
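The reduction from a connected graph to a spanning tree can be done with a standard breadth-first search; a minimal sketch on a hypothetical 4-user network with one redundant link:

```python
from collections import deque

def spanning_tree(nodes, edges):
    """Return a spanning tree (as an edge list) of a connected undirected graph."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    root = next(iter(nodes))
    seen, tree, queue = {root}, [], deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:          # first-discovery edges form the tree
                seen.add(v)
                tree.append((u, v))
                queue.append(v)
    return tree

# A 4-user ring community: one of the four links is redundant.
tree = spanning_tree([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)])
print(len(tree))  # a spanning tree of 4 nodes has 3 edges
```

The edges dropped by the search are exactly the connections that, per the text above, will never be used for message exchange.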
The basic idea behind the decentralized modification of the baseline mechanism is intuitively straightforward. Looking at the tax function of user i in the centralized mechanism, we observe that several of the required messages do not come from i's immediate neighbors. For this reason we define new "summary" messages, quoted by i's neighbors, that represent the missing messages. For this to work, we add penalty terms that guarantee that the summary messages will indeed represent the needed quantities at NE. Notice that in the previous mechanism, user i is expected to announce a β^i_t equal to the demand of the next user (i + 1), but here we might have (i, i + 1) ∉ E, and owing to the communication constraints, we are not able to compare β^i_t with y^{i+1}_t. Instead, β^i_t should be a proxy of the demand of one of user i's direct neighbors. This motivates us to define the function φ(i), where φ(i) ∈ N(i), N(i) is the set of user i's neighbors (excluding i), and φ(i) = j denotes that the proxy variable β used in user i's con^{i,l}(m) terms is provided by user j. In other words, φ(i) is a "helper" for user i, who quotes a proxy of her demand whenever needed.
In the next part we use summaries of the demands to deal with the communication constraints. For convenience, we define n(i, k) as the user nearest to user k among the neighbors of user i and user i herself; n(i, k) is well defined because of the tree structure. The proof is omitted here; the details can be found in [69, Ch. 4, Sec. 7.1].

B. The Message Space
In the distributed mechanism, the message m^i in user i's message space M^i is defined as

m^i = (y^i, q^i, s^i, β^i, n^i, ν^i),

where y^i, q^i and s^i play the same roles as in the centralized mechanism, β^i collects the demand proxies that user i provides as a helper, n^i = {n^{i,j,l}}_{j∈N(i), l} and ν^i = {ν^{i,j}_t}_{j∈N(i), t}. Here n^{i,j,l} is a summary of the demands of the users related to constraint l and connected to user i via j, as depicted in Figure 1. Message ν^{i,j}_t serves a similar role for the peak demand.

C. The Allocation and Tax Functions
The allocation functions x̂^i_t(m) = y^i_t are still straightforward. The tax functions are modified in three respects: adjustments to the prices, consensus terms for the new variables, and terms for complementary slackness. The terms con^{i,l}(m) and con^i_t(m) play the same roles as their centralized counterparts for the l-th linear constraint and the peak constraint, respectively. The second term in each of these expressions involves the proxy β^{i−1}, which in the decentralized version is substituted by the proxy β^{φ(i),i}, since the proxy for y^i is no longer provided by user i − 1 but by user i's helper φ(i). The first term, Σ_{j≠i} a^{j,l} y^j = Σ_{j∈N(i)} a^{j,l} y^j + Σ_{j∉N(i)∪{i}} a^{j,l} y^j, which cannot be evaluated directly in the decentralized version (since it depends on messages outside the neighborhood of i), is now evaluated as

Σ_{j∈N(i)} f^{i,j,l} = Σ_{j∈N(i)} a^{j,l} y^j + Σ_{j∈N(i)} Σ_{h∈N(j)\{i}} n^{j,h,l}.

It should now be clear that the role of the new messages n^{j,h,l}, quoted by the neighbors j ∈ N(i) of i, is to summarize the total demands of the other users. Furthermore, the additional quadratic penalty terms enforce this equality at NE. This idea is made precise in the next subsection.

D. Properties
It is clear that this mechanism is distributed, since all the messages needed for the allocation and tax functions of user i come from her neighborhood N(i) and herself. Due to the way the messages and taxes are designed, the proposed mechanism satisfies properties similar to those in Lemmas 2 and 3, and consequently Theorems 1 and 2. The reason is that, at NE, the components n and ν behave exactly like the absent y^h, h ∉ N(i), in user i's functions, so the proofs of the properties of the previous mechanism still apply. We elaborate on these properties in the following.
Lemma 4. At any NE, we have the following results regarding the proxy messages:

β^{φ(i),i} = y^i, ∀i ∈ N,   (19)
n^{i,j,l} = a^{j,l} y^j + Σ_{h∈N(j)\{i}} n^{j,h,l}, ∀i, ∀j ∈ N(i), ∀l ∈ L,   (20)
ν^{i,j}_t = y^j_t + Σ_{h∈N(j)\{i}} ν^{j,h}_t, ∀i, ∀j ∈ N(i), ∀t.   (21)

Proof: β^{i,j}, n^{i,j,l} and ν^{i,j}_t appear only in the quadratic penalty terms of user i's tax function. Therefore, for any user i, the only way to minimize her tax is to bid β^{i,j}, n^{i,j,l} and ν^{i,j}_t according to (19)-(21).

Now, based on the structure of the message exchange network, we have

Lemma 5. At any NE, n^{i,j,l} and ν^{i,j}_t summarize the corresponding quantities of all the users whose path to user i passes through j:

n^{i,j,l} = Σ_{h∈D(i,j)} a^{h,l} y^h,  ν^{i,j}_t = Σ_{h∈D(i,j)} y^h_t,

where D(i,j) denotes the set of users (including j itself) whose path to i in the tree goes through j.

Proof: The proof is presented in Appendix G.
With Lemma 5, we immediately obtain the following results.
Lemma 6. At any NE, for all users i, we have

Σ_{j∈N(i)} f^{i,j,l} = Σ_{j≠i} a^{j,l} y^j, ∀l ∈ L,   (24)
Σ_{j∈N(i)} ( y^j_t + Σ_{h∈N(j)\{i}} ν^{j,h}_t ) = Σ_{j≠i} y^j_t, ∀t.   (25)

Proof: At NE, by directly substituting the recursions (20) and (21) and the results of Lemma 5, the left-hand sides reduce to the corresponding global sums. Hence the quantities used in the centralized tax functions can be reproduced in the distributed mechanism. We then obtain the following theorem.
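The recursive summaries can be checked numerically on a small tree. The sketch below is our own illustration on a hypothetical 5-user tree, with a scalar `value[j]` standing in for the per-user quantity a^{j,l} y^j; it computes the summaries by the same recursion as in Lemma 4 and verifies the aggregate identity of Lemma 6.

```python
def summary(i, j, adj, value):
    """n^{i,j}: total value of the users whose path to i passes through j."""
    return value[j] + sum(summary(j, h, adj, value) for h in adj[j] if h != i)

# Hypothetical tree: edges 1-2, 2-3, 2-4, 4-5, with per-user quantities value[j].
adj = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2, 5], 5: [4]}
value = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0}

for i in adj:
    # Summing the summaries of i's neighbors recovers the total of all other users.
    local = sum(summary(i, j, adj, value) for j in adj[i])
    assert local == sum(value.values()) - value[i]
print("aggregate identity verified")
```

Each user thus reconstructs the global sum from purely local messages, which is the essence of the distributed tax functions above.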
Theorem 5. For the mechanism-induced game G, NE exist. Furthermore, any NE of game G induces the optimal allocation.
Proof: By substituting (24) and (25) into (16), we obtain, at equilibrium, exactly the same form of the tax function as in the centralized mechanism, which yields the desired results as shown in Lemmas 2 and 3. We conclude that any NE induces the optimal allocation. The existence of NE can be proved by a construction similar to that of Theorem 2.
As was true in the baseline centralized mechanism, in the distributed case the planner may also be concerned whether the users have an incentive to participate, and whether the mechanism requires external funds to maintain balance. As it turns out, Theorems 3 and 4 still hold here. As a result, the users are better off joining the mechanism, and the market has a balanced budget. The proofs and the construction of the subsidies can be done in a manner similar to the centralized case and are therefore omitted.

A remaining issue, for both mechanisms, is that NE is a complete-information solution concept, while the users do not know each other's utilities. To settle this issue, one can design a learning algorithm that helps the participants learn the NE in an online fashion. In the next section, we present such a learning algorithm for the centralized mechanism discussed in Section III. Instead of Assumption 1, we make a stronger assumption in order to obtain a convergent algorithm: each utility function v^i_t(·) is δ-strongly concave (Assumption 4). Here δ-strong concavity of a function g(·) is defined by the δ-strong convexity of −g(·): a function f(·) is strongly convex with parameter δ if

f(y) ≥ f(x) + ∇f(x)^T (y − x) + (δ/2) ‖y − x‖², ∀x, y.

V. A LEARNING ALGORITHM FOR THE CENTRALIZED MECHANISM
The design of the learning algorithm involves three steps. First, we find the relation between NE and the optimal solution of the original optimization problem. This step was carried out in the proof of Theorem 1: at NE, y* coincides with x* in the optimal allocation, q^{i*} equals λ*, and the components of s^{i*} are proportional to the components of µ*. Then, since Slater's condition holds, strong duality connects the Lagrange multipliers λ*, µ* with the optimal solution of the dual problem. Due to the strong concavity of the utilities and stationarity, given λ* and µ*, the optimal allocation x* is uniquely determined. Finally, if we can find an algorithm that solves the dual problem, the design is complete.
The first two steps are straightforward. For the third, note that the dual problem is also a convex optimization problem, so projected gradient descent (PGD) is a natural choice of learning algorithm. The proof of convergence of PGD is not trivial. In the proof developed in [28], PGD converges when (a) the objective function is β-smooth, and (b) the feasible set is closed and convex. In Appendix H we show that (a) follows from Assumption 4. To verify (b), we need to specify a feasible set for the dual variables. Since, in PGD for the dual problem, the gradient of the dual function is a combination of functions of the form (v̇^i_t)^{−1}(·), the feasible set should satisfy two requirements: first, all its elements must lie in the domain of the dual function's gradient, so that every iteration is valid; second, (λ*, µ*) must belong to the feasible set, so that the optimum is not excluded. With these requirements in mind, we make Assumption 5 and construct a feasible set for the dual problem accordingly.

Assumption 5. For each utility v^i_t(·), the domain of the inverse derivative (v̇^i_t)^{−1}(·) contains all the aggregated prices generated by dual variables in the set of proper prices P defined next. Let λ̃ = (λ, µ) collect the dual variables, let Ã denote the constraint matrix of the restated problem (3), and let p̃ = 1_N ⊗ p, where ⊗ represents the Kronecker product of matrices; the set of proper prices P then serves as the feasible set for the dual problem. Observe that, by stationarity, the ((i − 1)T + t)-th entry of Ã^T λ̃ + p̃ equals v̇^i_t(x^{i*}_t) at the optimal solution. Consequently, Assumption 5 implies two things: first, λ̃* ∈ P; second, for any λ̃ ∈ P, Ã^T λ̃ + p̃ is a vector of values v̇^i_t evaluated at some x ∈ R^{NT}. Hence, under Assumption 5, it is safe to restrict the feasible set of the dual problem to P without changing the optimal solution.
Furthermore, for all the price vectors in P, (v i t ) −1 (·) in PGD can be evaluated. Back to condition (b) stated above, since P is closed and convex, PGD is convergent in this case.
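The two conditions can be illustrated with a minimal sketch of PGD itself: a smooth convex toy objective minimized over a closed convex box that stands in for the proper-price set $\mathcal P$. All names and numbers here are illustrative, not part of the mechanism.

```python
import numpy as np

def pgd(grad, project, x0, alpha, iters):
    """Projected gradient descent: x <- Proj(x - alpha * grad(x))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = project(x - alpha * grad(x))
    return x

# Toy dual-style objective D(x) = 0.5 * ||x - c||^2, minimized over a box
# that stands in for the closed, convex proper-price set P.
c = np.array([1.5, -0.5])
grad_D = lambda x: x - c                      # gradient of D
proj_P = lambda x: np.clip(x, 0.0, 1.0)       # projection onto the box [0,1]^2

x_star = pgd(grad_D, proj_P, x0=np.zeros(2), alpha=0.5, iters=100)
# The constrained minimizer is the projection of c onto the box: (1.0, 0.0)
```

Because the feasible set is closed and convex, the projection is well defined at every step, which is exactly what condition (b) buys us.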
Based on all the assumptions and the PGD method, we propose Algorithm 1 as a learning algorithm for the NE of the centralized mechanism.
Algorithm 1: The learning algorithm for the centralized mechanism. Data: time index $k$, a set of proper prices $\mathcal P$, a vector of initial prices $(q_0, s_0) \in \mathcal P$, message profiles $m(k)$, iteration step size $\alpha$, number of iterations $K$.
The convergence of PGD yields the convergence of the proposed learning algorithm. Theorem 6. Choose a step size $\alpha \le \delta/\|\tilde A\|$, where $\|\tilde A\|$ is the spectral norm of $\tilde A$ and $\delta$ is the strong-concavity parameter of the centralized objective function. As the number of iterations $K$ grows, the distance between the computed price vector $(q(K), s(K))$ and the optimal price vector $(q^*, s^*)$ is non-increasing. Furthermore, $\lim_{K\to\infty} m(K) = m^*$, where $m^*$ is the NE.
Proof: See Appendix H.

VI. A CONCRETE EXAMPLE
To give a sense of how the two mechanisms and the learning algorithm work, we provide a simple non-trivial example here. We will first present the original centralized problem for the example, and then identify the NE of the centralized mechanism based on the properties we found. For the distributed mechanism, we will illustrate how the proxy variables at NE are determined with a simple example of a message exchange network. Lastly, we implement the learning algorithm for the centralized mechanism.

A. The Demand Management Optimization Problem
In the energy community, assume there are three users in the user set $\mathcal N = \{1, 2, 3\}$ and $T = 2$ days in a billing period. Suppose user $i$ on day $t$ has the utility function $v^i_t(x^i_t) = it\,\ln(2 + x^i_t)$. Set $p_1 = 0.1$, $p_2 = 0.2$, and the peak price $p_0 = 0.05$. We adopt the following centralized problem as a concrete example: The solution to this problem is given approximately below. The lower-bound constraint for $x^1_1$ and the upper-bound constraint for the total demand are active. Thus, according to the KKT conditions, $\lambda^{l*} = 0$ for $l = 2, \ldots, 6$, and, by stationarity, $\lambda^{1*} = 0.2056$ and $\lambda^{7*} = 1.1056$. The total demands of Day 1 and Day 2 are $-0.8525$ and $2.8525$, respectively, so Day 2 has the peak demand $w^* = 2.8525$; Day 1 incurs no peak price ($\mu^*_1 = 0$), and Day 2 carries the full unit peak price $\mu^*_2 = 0.05$.
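The reported numbers can be sanity-checked directly from the stationarity conditions. The sketch below assumes the model pieced together from this section and Section VI-D: utilities $v^i_t(x) = it\ln(2+x)$, lower bounds $x^i_t \ge -1$ (constraints $1$ through $6$, ordered by $l = 2(i-1)+t$), and a total-demand cap $\sum_{i,t} x^i_t \le 2$; given the stated multipliers it recovers the demands and the day totals.

```python
# KKT check for the example (assumed model: v_t^i(x) = i*t*ln(2+x),
# bounds x_t^i >= -1 as constraints 1-6 with l = 2*(i-1)+t, and the 7-th
# constraint sum_{i,t} x_t^i <= 2; inferred from Sections VI-A and VI-D).
p = {1: 0.1, 2: 0.2}                      # unit prices per day
lam7, mu = 1.1056, {1: 0.0, 2: 0.05}      # stated multipliers

# Stationarity: i*t/(2 + x) = p_t + lam7 + mu_t at interior points, hence
# x = i*t/(p_t + lam7 + mu_t) - 2, truncated at the lower bound -1.
x = {(i, t): max(-1.0, i * t / (p[t] + lam7 + mu[t]) - 2.0)
     for i in (1, 2, 3) for t in (1, 2)}

day1 = sum(x[i, 1] for i in (1, 2, 3))    # ~ -0.8525
day2 = sum(x[i, 2] for i in (1, 2, 3))    # ~  2.8525 (the peak demand w*)
total = day1 + day2                        # ~ 2, so the sum constraint is active
lam1 = p[1] + lam7 + mu[1] - 1.0 / (2 + x[1, 1])   # multiplier of x_1^1 >= -1
```

Only $x^1_1$ hits its lower bound, which is why only $\lambda^{1*}$ among the bound multipliers is nonzero.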

C. The Distributed Mechanism
In this subsection we first describe the modifications to the message spaces relative to the centralized mechanism, and then show how the newly introduced components $n$ and $\nu$ work.
The specific NE can be determined in a similar way to that of the centralized mechanism and is therefore omitted. For this message exchange network, the message components for each user are as follows. As in the centralized mechanism, users are still required to provide their demands $y$, suggested unit prices $q$, and suggested unit peak prices $s$. Unlike in the centralized mechanism, there is no $\beta$ among user 3's message components, while user 2 needs to provide two $\beta$'s, namely $\beta^{2,1}$ and $\beta^{2,3}$. In addition, for each constraint $l$, every user announces a variable $n$ to each of her neighbors; for each day $t$, every user also provides a variable $\nu$ to each of her neighbors.

D. The Learning Algorithm
Before implementing the algorithm, one should check whether the problem setting satisfies Assumptions 4 and 5.
First we check Assumption 5. Suppose that for this specific environment we take $r^i_t = it/9$ and $\bar r^i_t = it$. For the first condition in Assumption 5, each $x^i_t$ has the lower bound $-1$. Also, every $x^i_t$ is upper bounded by $7$, because the 7-th constraint gives $\sum_{i,t} x^i_t \le 2$, and since every other variable is at least $-1$, we get $x^i_t \le 2 + 5 = 7$. For the second condition in Assumption 5, for all $p \in [it/9,\, it]$ we have $(\dot v^i_t)^{-1}(p) = it/p - 2 \in [-1, 7]$, so Assumption 5 is verified.
With the $r^i_t$'s and $\bar r^i_t$'s chosen above, a dual feasible set $\mathcal P$ is constructed. Within this price set $\mathcal P$, Algorithm 1 evaluates the function $(\dot v^i_t)^{-1}(\cdot)$ only on the interval $[it/9,\, it]$. Consequently, in running Algorithm 1 we only need to define $v^i_t(\cdot)$ on the interval $[-1, 7]$. Regarding Assumption 4, we need to show that $v^i_t(\cdot)$ is strongly concave on $[-1, 7]$. Since one can verify that the function $-it\ln(2+x) - ax^2/2$ is convex on $[-1, 7]$ for $0 \le a \le it/81$, every $v^i_t(\cdot)$ is strongly concave, and thus Assumption 4 holds. To choose an appropriate step size $\alpha$ for the algorithm, we investigate the parameter $\delta$ further. In our environment, the sum of utility functions $f(x) = \sum_{i\in\mathcal N}\sum_{t=1}^{2}\big(v^i_t(x^i_t) - p_t x^i_t\big)$ is strongly concave on $[-1, 7]^6$ with parameter $\delta = 18/81$, because each component $v^i_t$ of $f$ is strongly concave with parameter $it/81$ and the parameter is additive: $\sum_{i,t} it/81 = 18/81$. By calculation, $\|\tilde A\| \approx 3.1623$, so one possible step size is $\alpha = 0.1 < 2\delta/\|\tilde A\|$. According to Algorithm 1, the required updates are (define $\eta(i,t) = 2(i-1)+t$ for convenience): $q^{i,\eta(j,t)}(k+1) = q^{i,\eta(j,t)}(k) - \alpha\big(1 + y^j_t(k)\big)$ for $j = 1, 2, 3$, $t = 1, 2$; the analogous updates of $q^{i,7}(k)$ and $s^i_t(k)$, $t = 1, 2$, driven by the announced demands $y^j_t(k)$; and finally the projection $(q^i(k+1), s^i(k+1)) = \mathrm{Proj}_{\mathcal P}\big(\tilde q^i(k+1), \tilde s^i(k+1)\big)$. To verify the convergence of the learning algorithm we run it with the initial price vector $(q(0), s(0)) = \mathrm{Proj}_{\mathcal P}(0_{9\times 1})$. After $K = 100$ iterations, we observe convergence of both the suggested prices $q, s$ and the corresponding announced demands $y$. Figure 3 shows the convergence process and confirms that the convergence rate is exponential, as expected.
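The dual updates can also be simulated end to end. The sketch below uses the same assumed model as before (utilities $it\ln(2+x)$, bounds $x^i_t \ge -1$, total demand $\le 2$) and a simplified projection: the aggregate price is clipped into $[it/9, it]$ before inverting the marginal utility, $\lambda$ is clipped at $0$, and $\mu$ is projected onto $\{\mu \ge 0,\ \mu_1 + \mu_2 = p_0\}$; the step size and horizon are illustrative, not the paper's.

```python
import numpy as np

# Simulation of Algorithm-1-style dual updates on the Section VI example
# (assumed model: v_t^i(x) = i*t*ln(2+x), x_t^i >= -1, sum of demands <= 2).
p, p0 = {1: 0.1, 2: 0.2}, 0.05
alpha, K = 0.02, 5000                     # illustrative step size / horizon
lam = np.zeros(7)                         # lam[l-1], l = 2*(i-1)+t; lam[6]: sum constraint
mu = np.array([p0 / 2, p0 / 2])

def demand(i, t):
    """User (i,t)'s reply: (v')^{-1} evaluated at the clipped aggregate price."""
    price = p[t] + mu[t - 1] + lam[6] - lam[2 * (i - 1) + t - 1]
    price = min(max(price, i * t / 9), i * t)
    return i * t / price - 2.0

def proj_mu(m):
    """Euclidean projection of a 2-vector onto {m >= 0, m_1 + m_2 = p0}."""
    m = m + (p0 - m.sum()) / 2
    if m[0] < 0:
        return np.array([0.0, p0])
    if m[1] < 0:
        return np.array([p0, 0.0])
    return m

for _ in range(K):
    x = {(i, t): demand(i, t) for i in (1, 2, 3) for t in (1, 2)}
    g = np.empty(7)
    for i in (1, 2, 3):
        for t in (1, 2):
            g[2 * (i - 1) + t - 1] = 1.0 + x[i, t]      # slack of x_t^i >= -1
    g[6] = 2.0 - sum(x.values())                         # slack of the sum constraint
    lam = np.maximum(lam - alpha * g, 0.0)
    mu = proj_mu(mu - alpha * np.array([-sum(x[i, 1] for i in (1, 2, 3)),
                                        -sum(x[i, 2] for i in (1, 2, 3))]))

x = {(i, t): demand(i, t) for i in (1, 2, 3) for t in (1, 2)}
# lam[6] approaches ~1.1056 and mu approaches (0, 0.05), matching Section VI-A
```

The iterates reproduce the multipliers of the concrete example: the sum-constraint price converges near $1.1056$ and the peak price concentrates on Day 2.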

VII. CONCLUSIONS
Motivated by work on mechanism design for NUM problems, we proposed a new class of (indirect) mechanisms, with application to demand management in energy communities. The proposed mechanisms possess desirable properties including full implementation, individual rationality, and budget balance, and can easily be generalized to different environments with peak shaving and convex constraints. We showed how the original "centralized" mechanism can be modified in a systematic way to account for environments with communication constraints.
This modification leads to a new type of mechanism that we call "decentralized" mechanisms, which can be thought of as the analog, for environments with strategic users, of decentralized optimization (developed for optimization problems with non-strategic agents). Finally, motivated by the need for practical deployment of these mechanisms, we introduced a PGD-based learning algorithm by which users can learn the NE of the mechanism-induced game.
Possible future research directions include learning algorithms for the distributed mechanism, as well as the co-design of a (distributed) mechanism together with a characterization of the class of learning algorithms that converge for it.

APPENDIX
A. Equivalence of Centralized Optimization Problem (2) and Original Problem (3) We first prove sufficiency by showing that the optimal solution of (2) can always be derived from the optimal solution of the newly constructed problem (3). Suppose the optimal solution of (3) is $(x^*, w^*)$. We claim that $x^*$ is an optimal solution of (2). First, the feasibility of $x^*$ in (2) is ensured by constraint (3b) of the new problem.
Now check optimality. Suppose, for contradiction, that $x^*$ is not optimal for (2) and that $x'$ is optimal instead.
In the new problem, construct $\tilde x = x'$ and $\tilde w = \max_{1\le t\le T} \sum_{i=1}^N \tilde x^i_t$; it is easy to verify that $(\tilde x, \tilde w)$ is feasible for the new problem. Consider the resulting inequality chain: the first inequality follows from the optimality of $x'$ in (2); the second inequality comes from constraint (3c) of the new problem. By this chain, we have found a pair $(\tilde x, \tilde w)$ with a better objective value in (3) than $(x^*, w^*)$, which contradicts the assumption that $(x^*, w^*)$ is an optimal solution of (3).
Therefore, by contradiction, we have shown that if $(x^*, w^*)$ is an optimal solution of (3), then $x^*$ must be an optimal solution of (2).
For the other direction, we need to show that if $x'$ is an optimal solution of (2), then we can construct an optimal solution of (3) from $x'$. We construct $\tilde x = x'$ and $\tilde w = \max_{1\le t\le T} \sum_{i=1}^N x'^i_t$ and argue that this $(\tilde x, \tilde w)$ is optimal for (3). Let $(x^*, w^*)$ be optimal for (3); we then obtain the same inequality chain as (28) (except that in the second line the inequality is reversed), where the equality and inequalities hold for the same reasons as stated above. This shows that $(\tilde x, \tilde w)$ has the same objective value as the optimal solution of (3), and therefore the pair $(\tilde x, \tilde w)$ constructed from the optimal $x'$ of problem (2) is also optimal for the new problem.
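The choice of $\tilde w$ in this construction can be checked numerically: for fixed $x$, the $w$-part of (3) is to maximize $-p_0 w$ subject to $\sum_i x^i_t \le w$ for all $t$, so the optimum is the smallest feasible $w$, namely the largest daily total. The day totals below are taken from the Section VI example.

```python
# For fixed x, the inner problem over w in (3) is:
#   maximize -p0 * w  subject to  sum_i x_t^i <= w for all t.
# Since p0 > 0, the optimum is the smallest feasible w, i.e. the largest
# daily total -- exactly the w-tilde used in the equivalence argument.
p0 = 0.05
day_totals = [-0.8525, 2.8525]        # day sums from the Section VI example
w_opt = max(day_totals)                # optimal w given x
peak_payment = p0 * w_opt              # any larger w only pays more peak charge
```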

B. Proof of Lemma 2
Proof: At NE $m^*$, for each constraint $l \in \mathcal L$, consider the message components $q^{i,l}$ of each user $i$. In user $i$'s tax function, denote by $\hat t^{i,l}_q$ the part that depends on $q^{i,l}$. For any user $i$, there is no unilateral profitable deviation from $m^{i*}$. Hence, if we fix $m^{-i*}$ and all components of $m^{i*}$ except $q^{i,l}$, it is a necessary condition that user $i$ cannot find a better response than $q^{i,l*}$.
Consider the best response in $q^{i,l}$ for the different cases of $e_l(y^*)$.
Case 1. $e_l(y^*) > 0$, i.e., constraint $l$ is inactive at NE. Note that $\hat t^{i,l}_q$ is a quadratic function of $q^{i,l}$ of the form
$$\hat t^{i,l}_q = (q^{i,l})^2 - \big(2 q^{-i,l*} - e_l(y^*)\big)\, q^{i,l} + (q^{-i,l*})^2.$$
Its unconstrained minimizer is $q^{-i,l*} - e_l(y^*)/2 < q^{-i,l*}$, so the best response is either $0$ or strictly smaller than $q^{-i,l*}$, the average of the other users' prices. Notice that $q^{i,l*} < q^{-i,l*}$ implies that $q^{i,l*}$ is smaller than some $q^{j,l*}$ with $j \ne i$, i.e., $q^{i,l*}$ is not the largest. If $q^{i,l*} < q^{-i,l*}$ held for all $i$, then no $q^{i,l*}$ could be the largest among $\{q^{i,l*}\}_{i\in\mathcal N}$; but this finite set must have a maximum, a contradiction. As a result, there must exist at least one $i$ with $q^{i,l*} \ge q^{-i,l*}$; since the best response is $\max\{0,\ q^{-i,l*} - e_l(y^*)/2\}$, this is possible only if $q^{i,l*} = 0$ and $q^{-i,l*} = 0$, which implies that all the $q^{i,l*}$ equal $0$.
Case 2. $e_l(y^*) = 0$, i.e., constraint $l$ is active at NE. In this case $\hat t^{i,l}_q = (q^{i,l} - q^{-i,l*})^2$, so every user's best response is to set her own price equal to the average of the others'.
Notice that if $q^{i,l*} = q^{-i,l*}$, then $q^{i,l*}$ is also equal to the average of all the prices $\{q^{j,l*}\}_{j\in\mathcal N}$. Since this holds for every $i$, we conclude $q^{i,l*} = q^{j,l*}$ for all $i, j \in \mathcal N$.
Case 3. $e_l(y^*) < 0$, i.e., constraint $l$ is violated at NE. In this case the minimizer of the quadratic is $q^{-i,l*} - e_l(y^*)/2 > q^{-i,l*}$, which leads to the condition $q^{i,l*} > q^{-i,l*}$ for every user $i$. In a finite set, if one number is strictly larger than the average of the others, it is not the smallest number in the set. If this held for every user $i$, there would be no smallest number in the set, which is impossible. Therefore, Case 3 cannot occur at NE.
In summary, at NE we always have $e_l(y^*) \ge 0$, the $q^{i,l*}$'s are equal, and moreover $q^{i,l*} e_l(y^*) = 0$.
This proves primal feasibility, equal prices, and complementary slackness for the prices $q$ in Lemma 2. For the peak prices, consider the message components $s^i$ for each day $t$; an analogous argument yields exactly equation (11).
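The three cases can be illustrated numerically with the quadratic tax term from Case 1; the numbers are illustrative, and prices are constrained to be nonnegative as in the mechanism.

```python
# Best response to the quadratic tax term
#   t_hat(q) = q^2 - (2*qbar - e)*q + qbar^2,
# where qbar = q^{-i,l*} is the average of the others' prices and e = e_l(y*).
# The unconstrained minimizer is qbar - e/2; prices must be nonnegative.
def best_response(qbar, e):
    return max(0.0, qbar - e / 2)

qbar = 1.0
br_slack = best_response(qbar, e=0.5)      # Case 1: e > 0 -> undercut the average
br_active = best_response(qbar, e=0.0)     # Case 2: e = 0 -> match the average
br_violated = best_response(qbar, e=-0.5)  # Case 3: e < 0 -> exceed the average
```

In Case 3 every user wants to exceed the average of the others, which is exactly the configuration shown above to be impossible at NE.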
Equation (12) follows directly from the definition of RP operator.

D. Proof of Theorem 2
Proof: By assumption, the centralized problem is a convex optimization problem with a nonempty feasible set, so there exist an optimal solution $(x^*, w^*)$ and corresponding Lagrange multipliers $\lambda^{l*}, \mu^*_t$ satisfying the KKT conditions (4a)-(4g). Consider the message profile $m^*$ constructed from $(x^*, w^*, \lambda^*, \mu^*)$. If no user $i$ has a profitable unilateral deviation, i.e., there exists no $\tilde m^i$ with $u^i(\tilde m^i, m^{-i*}) > u^i(m^*)$, then $m^*$ is a NE of the game $G$. We can therefore examine $u^i(m)$ to see whether user $i$ has a profitable deviation given $m^{-i*}$. In her own interest, user $i$ wants to maximize the expression in (29). The last term of (29) is the only term involving $\beta^i$, and it is a quadratic penalty; as a strategic agent, user $i$ will not deviate from $\beta^i = x^{(i+1)*}$, since otherwise she pays this penalty.
The second and third terms of (29) are quite similar: each consists of a quadratic term and a complementary-slackness term. For the second term, consider constraint $l$. If $l$ is active at the optimal solution, the complementary-slackness term vanishes; to avoid extra payment, user $i$ will not deviate $q^{i,l}$ from the price $\lambda^{l*}$ suggested by the optimal solution. If $l$ is inactive, the price suggested by the optimal solution is $\lambda^{l*} = 0$, and the penalty of constraint $l$ for user $i$ is a quadratic in $q^{i,l}$, where user $i$ can only select a nonnegative price; no choice is better than $q^{i,l} = 0 = \lambda^{l*}$. A similar analysis applies to the third term of (29). As a result, there are no unilateral profitable deviations in $q^{i,l}$ for any $l$, or in $s^i_t$ for any $t$. Now denote by $f^i_t(y^i_t)$ the terms in the parentheses in the first part of (29). Since these terms involve disjoint sets of variables, $u^i(m^i, m^{-i*})$ achieves its maximum if and only if every $f^i_t(y^i_t)$ achieves its maximum and the remaining three terms achieve their minima. As for the first part, by the strict concavity of $v^i_t(\cdot)$, the second derivative of $f^i_t(y^i_t)$ is negative for each $t$, which indicates that $f^i_t(y^i_t)$ is strictly concave as well. We can find the maximizer of $f^i_t(y^i_t)$ from the first-order condition (30). By (4g) in the KKT conditions, the only $y^i_t$ satisfying (30) is $y^i_t = x^{i*}_t$ for all $t$: by the strict concavity of $v^i_t(\cdot)$, the first derivative $\dot v^i_t$ is strictly decreasing, so for a given aggregate price there is at most one demand $x$ at which $\dot v^i_t(x)$ equals that price. Therefore, if the others send $m^{-i*}$, the unique best response of agent $i$ is to announce $m^{i*}$; sending any other message cannot increase her payoff $u^i(m)$. Consequently, $m^*$ is a NE of the induced game $G$.

E. Proof of Theorem 3
Proof: For any user $i$ who participates, when everyone anticipates the NE, user $i$'s payoff takes the form (13) if she modifies only $y^i$ and keeps her other message components unchanged. Thus, user $i$ faces the following optimization problem. By the definition of NE, $y^{i*}$ is one of its optimal solutions, yielding payoff $u^i(m^*)$. User $i$ can instead choose $\tilde y^i = 0$; denote the corresponding message by $\tilde m^i$. The payoff then becomes $u^i(\tilde m^i, m^{-i*}) = v^i(0)$, which coincides with the payoff from not participating. Since $m^{i*}$ is a best response to $m^{-i*}$, we have $u^i(m^*) \ge u^i(\tilde m^i, m^{-i*}) = v^i(0)$. In other words, if everyone anticipates the NE as the outcome, participating is no worse than not participating.

F. Proof of Theorem 4
Proof: Suppose the optimal solution and multipliers recovered from the NE are $(x^*, \lambda^*, \mu^*)$; then user $i$'s tax is $\hat t^i(m^*)$, and the total tax is obtained by summing over the users. For each constraint $l$, complementary slackness gives $\lambda^{l*} e_l(y^*) = 0$, which shows that at NE the planner's payoff is nonnegative.
Furthermore, to avoid an unnecessary surplus accruing to the planner, the energy community can adopt the mechanism with the modified tax function $\tilde t^i(m)$ instead. Note that user $i$ has no control over the additional term, because no component of $m^i$ appears in it, and thus the additional term does not change the NE. Since the prices are equal at NE, the planner effectively returns $\sum_{l\in\mathcal L} \lambda^{l*} b_l$ to the users. Hence, the planner's payoff at NE is exactly zero. As a side comment, the choice of $\tilde t^i(m)$ is not unique: any adjustment works as long as the additional term in each $\tilde t^i(\cdot)$ does not depend on $m^i$ and the adjustments sum to $\sum_{l\in\mathcal L} \lambda^{l*} b_l$ at NE.

G. Proof of Lemma 5
Proof: Here we provide a non-rigorous proof of (23); the proof of (22) is quite similar.
For a detailed version of the proof, we refer the interested reader to Section 7.1, Chapter 4 of [69].
For convenience, define $n(i,k)$ as the user nearest to user $k$ among user $i$ and user $i$'s neighbors. The map $n(i,\cdot)$ is well defined, and one can show that the sets $\{k : n(i,k) = j\}$ partition the users. Equation (23) can be shown by applying (21) iteratively. Recall that the message exchange network is assumed to be an undirected acyclic graph (i.e., a tree). First consider a user $j$ at a leaf (a node of degree one). Suppose the neighbor of user $j$ is $i$, so $\mathcal N(j) = \{i\}$. By (21), we have $\nu^{i,j}_t = y^j_t$. Since no $k$ other than $j$ herself satisfies $n(i,k) = j$, (23) holds for $\nu^{i,j}_t$ when $j$ is a leaf node.
For the general case, to compute $\nu^{i,j}_t$ it suffices to consider the subgraph $GR_i$ containing only node $i$ and the nodes $k$ with $n(i,k) = j$. When applying (21), no node $l \in GR_i^C$ can become involved: if it did while expanding the "$\nu$" term of some $j'$, then $l$ would be a neighbor of $j'$. There is a route from $i$ to $j'$, say $i L j'$. Since $l \in GR_i^C$, we have $n(i,l) \ne j$, so there exists a route $L'$ from $l$ to $i$ that does not pass through the branch starting at node $j$; concatenating the routes yields the cycle $l L' i L j' l$, contradicting acyclicity.
Then, by applying (21) iteratively, we see that: (1) every node in $GR_i$ is visited at least once and contributes its demand "$y$"; (2) each $y^j_t$ is contributed exactly once (except by the root $i$, which does not contribute $y^i_t$ in this procedure); (3) once the procedure reaches the leaf nodes, the iteration terminates because there are no more "$\nu$" terms to expand. Hence, $\nu^{i,j}_t = \sum_{h \in GR_i \setminus \{i\}} y^h_t$, and one can easily verify that $GR_i \setminus \{i\}$ is nothing but $\{h : n(i,h) = j\}$.
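The argument can be checked mechanically on a small tree: iterating the local rule (21) from the leaves reproduces the branch totals claimed in (23). The tree, demands, and helper names below are illustrative.

```python
# Verify on a small tree that the proxy nu[i][j] computed by the local rule
# (21) equals the total demand of the branch hanging off neighbor j away
# from i, i.e. the sum of y over {h : n(i,h) = j}.
tree = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2, 5], 5: [4]}   # adjacency list
y = {1: 1.0, 2: 2.0, 3: 4.0, 4: 8.0, 5: 16.0}              # announced demands

def nu(i, j):
    """Rule (21): nu^{i,j} = y_j + sum of nu^{j,k} over neighbors k of j, k != i."""
    return y[j] + sum(nu(j, k) for k in tree[j] if k != i)

def branch(i, j):
    """Nodes h with n(i,h) = j: the subtree reached from i through j."""
    seen, stack = {i, j}, [j]
    while stack:
        u = stack.pop()
        for k in tree[u]:
            if k not in seen:
                seen.add(k)
                stack.append(k)
    return seen - {i}

# e.g. nu(1, 2) should equal y_2 + y_3 + y_4 + y_5 = 30
ok = all(abs(nu(i, j) - sum(y[h] for h in branch(i, j))) < 1e-12
         for i in tree for j in tree[i])
```

The recursion terminates precisely because the graph is a tree, mirroring the acyclicity argument above.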

H. Convergence of the Learning Algorithm for Centralized Mechanism
The convergence of the proposed learning algorithm is shown in the three steps mentioned in Section V. The first step establishes the connection between $m^*$ and the optimal primal-dual solution $(x^*, \lambda^*, \mu^*)$ of the original optimization, which has already been clarified in Section V; as a result, learning the NE is equivalent to learning the optimal solution of the original optimization problem. For the second step, since the problem is a convex optimization problem with a non-empty feasible set defined by linear inequalities, Slater's condition is easy to check. Therefore strong duality holds, which means we can recover the optimal solution of the original problem once we solve the dual problem. The last step is to identify the dual problem and find a convergent algorithm for it. This part of the appendix explains how to pin down the dual function and the dual feasible set, and shows the convergence of the PGD algorithm on this dual problem.
Before identifying the dual function of the original problem, for convenience, move $w$ to the left-hand side of constraint (3c) and rewrite (3b) and (3c) in the single matrix form $\tilde A x + \tilde 1 w \le \tilde b$, where $\tilde A$ is defined in (26); the objective function can then be written as $f(x) - p_0 w$. Observe that by Assumption 4 the functions $v^i_t(x^i_t) - p_t x^i_t$ are also strongly concave, without cross terms. Consequently, one can show directly from the definition of strong concavity that, as the sum of these strongly concave functions, $f(x)$ is strongly concave as well. Let $h(x) = -f(x)$; then $h(x)$ is strongly convex with parameter $\delta$. Denote by $h^*(\cdot)$ the conjugate function of $h(x)$.
With these notations in mind, the dual function of the original problem can be written down. Here we should be cautious about the domain of $D(\lambda)$: in the second line, $\sup_w \{\mathbf 1_T^\top \mu\, w - p_0 w\}$ is finite only when the coefficient $\mathbf 1_T^\top \mu - p_0 = 0$, i.e., $\sum_t \mu_t = p_0$. Therefore, we obtain the following dual problem.
Now that we have derived the dual problem for the original optimization, a natural approach to finding the optimal solution is projected gradient descent. Fortunately, we have the following theorem, which ensures the convergence of the PGD algorithm.
Theorem 7. For a minimization problem over a closed and convex feasible set $\mathcal X$ with objective function $f(x)$, let $\mathcal X^*$ be the set of optimal solutions. If $f$ is convex and $\beta$-smooth on $\mathcal X$, then PGD with step size $\alpha < 2/\beta$ converges to some $x^* \in \mathcal X^*$. In our setting, the dual objective function is convex. Since $h(x)$ is strongly convex with parameter $\delta$, by the result mentioned in [70], $\delta$-strong convexity of $h(\cdot)$ implies $1/\delta$-smoothness of its conjugate $h^*(\cdot)$.
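The conjugate-smoothness fact can be sanity-checked numerically in one dimension. Here $h(x) = x^4/4 + (\delta/2)x^2$ is $\delta$-strongly convex, and the empirical Lipschitz constant of $\nabla h^* = (h')^{-1}$ stays below $1/\delta$; the function and grid are illustrative.

```python
import numpy as np

delta = 0.5
h_prime = lambda x: x**3 + delta * x       # h(x) = x^4/4 + (delta/2) x^2

def grad_h_star(s, lo=-10.0, hi=10.0):
    """grad h*(s) = (h')^{-1}(s), by bisection (h' is strictly increasing)."""
    for _ in range(80):
        mid = (lo + hi) / 2
        if h_prime(mid) < s:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

s = np.linspace(-3.0, 3.0, 601)
g = np.array([grad_h_star(v) for v in s])
max_slope = np.max(np.diff(g) / np.diff(s))   # empirical Lipschitz constant
# max_slope stays below 1/delta = 2; the bound is nearly tight at s = 0,
# where h''(x) = 3x^2 + delta attains its minimum delta
```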
where $[\cdot]_j$ denotes the $j$-th entry of its vector argument. To turn these update rules into a learning algorithm for the centralized mechanism, given the relation between $m^*$ and the optimal solutions of the original problem and its dual, one substitutes $\lambda, \mu$ with $q^i, s^i$ for each user $i$.
However, $\nabla h^*$ is not tractable for the users, as they do not know the utilities of the others. Fortunately, users can obtain the values of the $\nabla h^*$-related terms by cooperation, without revealing their entire utility functions: each user is simply queried for her demand under given prices. A key point of this implementation is the connection between $\nabla h^*$ and the marginal utility $\dot v^i_t$ for each demand $x^i_t$. A useful result on the subgradients of a function $f$ and its conjugate $f^*$ applies here; it is quoted below as Theorem 8.
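How such a query works can be sketched as follows, again using the example utilities $v^i_t(x) = it\ln(2+x)$ from Section VI (an assumption for illustration): the user's surplus-maximizing demand at a posted price is exactly the inverse marginal utility that the gradient step needs, and she can compute it privately.

```python
# Each user replies to a price query with her surplus-maximizing demand.
# For v_t^i(x) = i*t*ln(2+x) (the Section VI example, assumed here), the
# marginal utility is v'(x) = i*t/(2+x), so the reply in closed form is
# (v')^{-1}(price) = i*t/price - 2 -- the quantity the dual gradient needs.
def query_demand(i, t, price):
    return i * t / price - 2.0

# The user can produce the same reply without exposing v itself, by
# bisecting on the first-order condition v'(x) = price over [-1, 7]:
def query_demand_private(i, t, price, lo=-1.0, hi=7.0, iters=60):
    for _ in range(iters):
        mid = (lo + hi) / 2
        if i * t / (2 + mid) > price:   # marginal utility still above the price
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Either way, the planner only ever observes a demand, never the utility function itself, which is the privacy-preserving point of the implementation.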