Power Loss-Aware Transactive Microgrid Coalitions under Uncertainty

Abstract: Peer-to-peer energy trading within microgrid (MG) communities is emerging as a key enabler of the future transactive distribution system and the transactive electricity market. Energy trading among MGs refers to the idea that the surplus energy of one MG can be used to satisfy the demand of another MG or of a group of MGs that form an MG community. These communities can be established dynamically over time, based on the variations of demand and supply of the interconnected MGs. In many modern MGs, Electric Vehicles (EVs) are considered a viable storage option due to their ease of use (plug-and-play) and their growing adoption by drivers. On the other hand, the dynamic nature of EVs increases the uncertainty in the transactive distribution system. In this paper, we study the problem of energy trading among MGs and EVs with the aim of minimizing power loss under uncertainty. We propose a novel Bayesian Coalitional Game (BCG) based algorithm that allows the MGs and EVs to reduce the overall power loss by forming coalitions intelligently. The proposed scheme is compared with a conventional coalitional game theory-based approach and a Q-learning based approach. Our results show a significant reduction in power loss compared with both techniques.


Introduction
The power grid has experienced a major transformation since the mid-2000s, and it continues to evolve [1]. As a part of the future smart grid, transactive energy systems are gaining particular emphasis. A transactive distribution system is composed of several microgrids (MGs) (e.g., buildings, homes, solar farms, etc.) with generation capacity that can satisfy some portion of the demand of other MGs through peer-to-peer energy trading over a transactive energy market, supported by an underlying communication technology [2]. An MG is a small-scale electricity distribution system with loads, generation capacity, storage, and islanding capability. As such, MGs can form communities [3] or coalitions for a specific time interval in which some MGs have surplus energy and are willing to supply energy to others, while others generate less than their demand and are willing to buy energy (see Figure 1). The early form of MGs emerged from the military [4]. A network of MGs was initially proposed in [5], and, more recently, community MGs, which focus on collaboration rather than market-based interactions, have emerged to serve communities during disasters and to increase the reliability of the smart grid [6]. Meanwhile, MG coalitions based on coalitional game theory, with the objective of minimizing power loss, were first explored in [7]. Power loss is caused by heat dissipation in the power lines of the transmission and distribution system, and it grows with the amount of power transferred and with the distance. Therefore, shortening the distance between interconnected parties reduces power loss, which can be realized by coalitions formed among nearby MGs.
Besides the advances towards a smart power grid, the emergence of Electric Vehicles (EVs), which can both charge from and discharge to the distribution system, adds to the novelties and challenges on the smart grid side. High penetration of EVs can affect the operation of MGs in several ways due to their uncertain behavior: they may impose unexpected load that accumulates in peak hours, it may be uncertain which MG they will charge from or discharge to when using public charging stations, and even the type of an entity, whether it is a microgrid (fixed) or an EV (mobile), may be uncertain and affect decisions. Therefore, considering the uncertainty introduced by EVs is crucial.
In the literature, energy trading has been modeled using various game-theoretic techniques. However, to the best of our knowledge, none of these studies considered the effect of uncertainty arising from the integration of EVs. In this paper, we propose a Bayesian Coalitional Game (BCG) approach [8] to form coalitions that effectively address the uncertainty in the varying charging/discharging locations of EVs. In our approach, each MG and EV agent holds a prior belief over the type of the other agents (either fixed MG or moving EV). As the agents interact through the iterations of the BCG, they update their belief estimates and, in the end, reach coalitions that minimize power loss. The contribution of this work is formulating uncertainty with respect to agent types and including observation error in the Bayesian coalitional game.
The rest of this paper is organized as follows. In Section 2, related work is summarized. In Section 3, the system model is described. In Section 4, the BCG scheme is explained. Numerical results are provided in Section 5, and, finally, the conclusions are presented in Section 6.

Related Work
In the literature, various game-theoretic techniques have been utilized for energy trading in MGs. In [9], a two-level continuous-kernel Stackelberg game is proposed for distributed energy trading between MGs, in which seller and buyer MGs are classified as leader and follower agents, respectively. In [10], a priority-based energy trading game is proposed in which buyers are prioritized according to their contribution history and their current energy shortage. In [11][12][13], joint learning and game-theoretic approaches are proposed to investigate the energy trading problem. In [11], the problem of energy trading between microgrids is studied while protecting their private strategies. To overcome the problem of incomplete information, a scheme is proposed that combines a non-cooperative repeated Stackelberg game with reinforcement learning algorithms. In [12], a set of connected microgrids is considered that can exchange energy with each other and with the macrogrid. Each microgrid is equipped with a battery to store energy and with local energy generators such as wind turbines and photovoltaic panels. The output of these renewable energy sources is not constant; it varies over time and therefore has to be estimated based on the generation history. A hot-booting Q-learning based approach is implemented to achieve the Nash equilibrium of the dynamic repeated game. The results show that the hot-booting method significantly shortens the convergence time and also increases the overall profit of the players. In [13], the authors of [12] extend their work by implementing a deep Q-network based approach. In [7], the problem of energy trading among MGs is considered with the aim of minimizing the power loss over the lines. The authors propose a coalitional game theory approach that allows MGs to transfer energy inside a coalition. Furthermore, in [14], the authors propose a coalitional game theory scheme to solve the problem of energy management in local energy communities. The problem of energy management of community MGs is addressed in [15] using coalitional game theory. Unlike [7], which focuses only on energy loss, the authors expand the objective functions to maximize the expected profit of MGs and the usage of renewable energy while minimizing power loss and consumer discomfort. In our previous work [16], we aimed to minimize power loss while addressing the uncertainty in the energy level of agents in the system using Bayesian learning.
Different from prior works, in this paper we address the uncertainty resulting from the type of agents (moving or fixed), and we use a novel BCG method that helps agents form a belief about the type of other agents in the presence of observation error.

System Model
We consider a network of interconnected MGs and EVs as illustrated in Figure 1. The network under consideration includes $N$ MGs, $E$ EVs, and $S$ charging stations. Each MG is individually connected to a utility grid (macrogrid), and EVs are connected to the network through charging stations. The MGs and EVs trade energy among themselves and/or with the macrogrid. For a given time slot, each MG $i \in \{1, \dots, N\}$ may generate power $g_i$ and may have demand $d_i$. Each EV $i$ is equipped with a battery of capacity $b_i$ and may also have demand $d_i$. The surplus energy of MG $i$ (or its shortage, if the load exceeds generation) is therefore $q_i = g_i - d_i$, which denotes the amount of energy that the MG would potentially export to or import from the network. Similarly, for EVs, $q_i = b_i - d_i$ represents the surplus (or demanded) energy. We consider that, in a given time period $T$, some MGs and EVs have a surplus (or shortage) of energy and wish to trade energy within the transactive distribution system. EVs need to drive to a charging station to participate in a coalition. We assume that EVs rationally drive to the closest charging station to minimize driving cost. Depending on the generation and demand levels, at each epoch an MG or an EV may move from the seller group to the buyer group or vice versa. Energy trading among the MGs and EVs (through the charging stations) results in power loss over the distribution lines. For a line with resistance $R_{ij}$ per km and voltage $U_i$, this power loss, $P^{loss}_{ij}$, can be expressed by Equation (1) [7,17,18]. Here, $R_{ij}$ ties the power loss to the distance between the seller $i$ and the buyer $j$, each of which can be an MG, an EV, or the utility grid (macrogrid), and $\rho$ is the fraction of power lost in the transformer at the macrostation, which sits at the interconnection of the MGs (or charging stations) and the macrogrid. Therefore, $\rho = 0$ for power transfer among MGs.
$P_i(q_i)$ is the power flowing over the power line of seller or buyer $i$, computed according to Equation (2). $L^*_i$ expresses the power that is carried [7] and is obtained by solving Equation (3). This equation may have zero, one, or two solutions for a given set of parameters. In the case of two positive solutions, the smallest root is used. Whenever (3) does not have a solution, we consider that the energy of
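As an illustration of the loss model described in this section, the following Python sketch evaluates a resistive line loss and solves the associated quadratic for the power that has to be sent so that a buyer receives its full demand. The functional form is an assumption based on the standard model of [7] (a resistive term $R d P^2 / U^2$, with $d$ the line length, plus a transformer share $\rho P$); it is not taken verbatim from Equations (1)-(3). The names `line_loss` and `required_power` and all numeric values are hypothetical.

```python
import math


def line_loss(power, r_per_km, dist_km, voltage, rho=0.0):
    """Assumed loss model: resistive term R*d*P^2/U^2 plus transformer share rho*P."""
    return r_per_km * dist_km * power ** 2 / voltage ** 2 + rho * power


def required_power(demand, r_per_km, dist_km, voltage, rho=0.0):
    """Smallest positive L with L - line_loss(L) == demand, or None if no real root.

    Rearranged as a quadratic: a*L^2 + (rho - 1)*L + demand = 0, with a = R*d/U^2.
    """
    a = r_per_km * dist_km / voltage ** 2
    b = rho - 1.0
    if a == 0.0:
        # No resistive loss: only the transformer share remains.
        return demand / (1.0 - rho) if rho < 1.0 else None
    disc = b * b - 4.0 * a * demand
    if disc < 0.0:
        return None  # zero solutions: the demand cannot be met over this line
    sqrt_disc = math.sqrt(disc)
    roots = [(-b - sqrt_disc) / (2.0 * a), (-b + sqrt_disc) / (2.0 * a)]
    positive = [root for root in roots if root > 0.0]
    # With two positive roots, the smallest one is used, as stated in the text.
    return min(positive) if positive else None
```

With watts, volts, and ohms per km as units, `required_power(1e6, 0.2, 5.0, 22000.0, rho=0.02)` returns a value slightly above the 1 MW demand, and the function returns `None` when the discriminant is negative, which corresponds to the zero-solution case mentioned above.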

Bayesian Coalitional Game (BCG) with Transferable Utility
To reduce the transmission power loss incurred by drawing power from a distant macrogrid, we consider that MGs and EVs can participate in energy trading by joining a coalition. Forming cooperative groups (coalitions) and trading energy among nearby MGs, i.e., peer-to-peer energy trading, is a promising approach to reduce the transmission power loss, since the line loss depends on the distance and the power. Note that, in most of the previous game-theoretic frameworks applied to energy trading problems, each player's goal is to increase its revenue through efficient selling and buying of energy. Unlike those studies, in this paper, the goal of each rational player is to minimize its objective function, which is its share of the power loss, thereby decreasing the total coalitional power loss. A coalition is determined by a pair $(C, v)$, where $C$ denotes a set of agents that agree to form a coalition, and $v$ represents the value function, which calculates the total payoff of the coalition as described in [7]. In this scheme, a coalition can have several different coalition values $v$, depending on the internal energy trading policy. Arranging energy trading within a coalition in an optimized way yields the optimal coalition value, $v_{max}(C)$, defined in Equation (4), where $P^{loss}_{ij}$ is the loss due to energy trading between seller $i$ and buyer $j$, while $P^{loss}_{i0}$ and $P^{loss}_{j0}$ cover the cases in which the coalition has an overall surplus (agent $i$ transfers energy to the macrogrid) or an overall shortage (agent $j$ receives energy from the macrogrid), respectively. Algorithm 1 is used to ensure that the least possible power loss occurs in a coalition transaction in each time slot. The output of this algorithm is used in Algorithm 2 when calculating the utility in Equation (5).
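Based on the term definitions above and on the statement in Algorithm 1 that the optimal value equals the negative of the total power loss, the optimal coalition value can plausibly be written as below. This form follows the value function of [7] and should be read as an assumption rather than as the exact Equation (4); the seller set $C_s$, the buyer set $C_b$, and the internal trading policy $\Pi$ are notation introduced here for illustration.

```latex
v_{\max}(C) = -\min_{\Pi} \left(
    \sum_{i \in C_s,\; j \in C_b} P^{loss}_{ij}
  + \sum_{i \in C_s} P^{loss}_{i0}
  + \sum_{j \in C_b} P^{loss}_{j0}
\right)
```

Under this reading, each sum only counts the transactions that the policy $\Pi$ actually schedules, so a coalition whose surplus and demand balance exactly contributes no macrogrid terms.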
In this paper, to preserve the privacy of agents, we assume that coalition members do not have any information regarding the location and type of other agents (EVs and MGs), and that a coalition leader is responsible for computing and sharing payoffs. To form coalitions, the agents need a technique to overcome this lack of information, which introduces uncertainty into the system and, accordingly, into the coalition values. The uncertainty arises at two levels. First, the type of each coalition member is unknown to the other members. Second, the locations of the agents are unknown. Therefore, computing the payoff is not straightforward: EVs in a given coalition charge and discharge at different charging stations, which makes the power loss in the coalition vary. Employing a method that allows agents to learn about each coalition, reduce these uncertainties, and refine the coalition formation process is therefore critical. The proposed Bayesian Coalitional Game (BCG) helps agents overcome the uncertainty about other agents' types. We assume that two types of agents are involved in the game, i.e., fixed and moving agents.
• Fixed: We consider MGs as fixed agents. For these agents, the power loss resulting from selling or buying unit power from a specific MG is always constant.
• Moving: Since the EVs are driving, they might not always charge or discharge at the same charging station, which results in a varying power loss when transferring power to a specific destination. EVs are assumed to utilize the closest charging station when they participate in coalitions.
Algorithm 1 Optimal coalitional energy transaction to achieve $v_{max}(C)$.
1: Initialization: Divide the members of coalition $C$ into two groups, Buyers and Sellers.
2: Main loop: find the closest seller $i$ and buyer $j$ with $q_i \neq 0$ and $q_j \neq 0$.
Do the transaction of amount $\min\{q_i, q_j\}$ between seller $i$ and the macrogrid.
7: Do the transaction of amount $\min\{q_i, q_j\}$ between buyer $j$ and the macrogrid.
9: else
10: Do the transaction of amount $\min\{q_i, q_j\}$ between seller $i$ and buyer $j$.
11: Update $q_i$, $q_j$, and the total power loss.
12: $v_{max}(C)$ is equal to the negative of the total power loss.
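The greedy matching in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the branch condition is assumed to compare the loss of a direct trade with the loss of routing the same amount through the macrogrid, the loss model is the assumed resistive form with a transformer share `rho` on macrogrid links, and the names `loss`, `coalition_value`, `macro_xy`, `u_low`, `u_med`, and all default values are hypothetical.

```python
import math


def loss(amount, dist_km, voltage, r_per_km=0.2, rho=0.0):
    """Assumed loss model: R*d*P^2/U^2 plus a transformer share rho*P."""
    return r_per_km * dist_km * amount ** 2 / voltage ** 2 + rho * amount


def coalition_value(agents, macro_xy, u_low=22.0, u_med=50.0, rho=0.02):
    """agents maps a name to ((x_km, y_km), q); q > 0 is surplus, q < 0 is shortage."""
    pos = {name: xy for name, (xy, _) in agents.items()}
    sellers = {name: q for name, (_, q) in agents.items() if q > 0}
    buyers = {name: -q for name, (_, q) in agents.items() if q < 0}

    def to_macro(name, amount):
        # Loss of exchanging `amount` with the macrogrid at medium voltage.
        return loss(amount, math.dist(pos[name], macro_xy), u_med, rho=rho)

    total = 0.0
    while sellers and buyers:
        # Closest seller-buyer pair that still has energy to trade.
        i, j = min(((s, b) for s in sellers for b in buyers),
                   key=lambda pair: math.dist(pos[pair[0]], pos[pair[1]]))
        amount = min(sellers[i], buyers[j])
        direct = loss(amount, math.dist(pos[i], pos[j]), u_low)
        # Assumed branch: take whichever route loses less power.
        total += min(direct, to_macro(i, amount) + to_macro(j, amount))
        for book, name in ((sellers, i), (buyers, j)):
            book[name] -= amount
            if book[name] <= 0.0:
                del book[name]
    # Any remaining surplus or shortage is settled with the macrogrid.
    for book in (sellers, buyers):
        for name, amount in book.items():
            total += to_macro(name, amount)
    return -total
```

Because the loss is quadratic in the traded amount, the order in which pairs are matched affects the total, so this greedy pass should be viewed as a heuristic for the minimum rather than a guaranteed optimum.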
Without loss of generality and for the sake of simplicity, we assume that there are two charging stations in our network. In addition, we assume that EV $i$ travels to station 1 with probability $1 - \epsilon_i$ and to the other station with probability $\epsilon_i$. MGs and EVs have no information about the type of the other agents. We consider two observation errors: $p_e$ is the probability that a fixed behavior of an agent is observed as moving, while $p_c$ is the probability that a moving behavior of an agent is observed as fixed. We assume that $p_e$ and $p_c$ are close to zero and arise for reasons such as communication errors or GPS inaccuracy. We formulate the BCG as follows:
• agents: The set of $N$ interconnected rational MGs and $E$ rational EVs.
• type: We assume that the system includes two types of agents, the fixed type $T_{fixed}$ and the moving type $T_{moving}$. Agents have full information regarding their own type, while the types of others cannot be observed at the beginning. $T_{fixed}$ agents transfer energy from the same location with probability 1. On the other hand, $T_{moving}$ agents may use the default station '1' or station '2' with probabilities $1 - \epsilon_i$ and $\epsilon_i$, respectively. We assume that an agent's type remains constant for a long period.
• probability distribution: We consider a common prior probability, $\wp$, over the types. Note that the choice of prior does not affect the long-term learning process of each MG since, after a large number of iterations, the effect of the assigned prior becomes insignificant [8].
• expected payoff: Equation (4) formulates the total benefit of the coalition. However, each agent in the coalition should receive its own share of this benefit. To share the total benefit of a coalition, we use the proportional fair division algorithm, which divides the total value among the members of a coalition according to each member's loss relative to exchanging energy with the macrogrid, as expressed in Equation (5). This is also the immediate payoff of the $i$-th MG, known as the transfer function [8], in which $\zeta_i$ is the relative ratio of the agent's contribution. In the BCG, we use the expected payoff rather than the immediate payoff, since the system is dynamic and the payoff of a coalition changes with the types of its members. This helps to overcome the uncertainty in finding the optimal payoff and, consequently, to reach a stable coalition formation. Since we consider a discrete space, the expected payoff is a sum over type combinations, where $t^i_{C \setminus \{i\}}$ denotes the belief vector of agent $i$ about the types of the other agents in coalition $C$, which ranges over all possible combinations of the other agents' types. Since in our scenario each agent has two possible types, the total number of possible combinations of the other agents' types believed by agent $i$ is $2^{N+E-1}$. $u_i(C, t^i_C)$ is the immediate payoff that agent $i$ believes it will achieve. $p^k_i(t^i_{C \setminus \{i\}})$ is the joint belief probability of agent $i$ about the other agents, where the index $k$ denotes the $k$-th possible combination of the other agents' types believed by agent $i$; it is computed from the individual beliefs, where $t^i_j$ denotes the type of agent $j$ believed by agent $i$.
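The expected payoff over type combinations can be illustrated with a short Python sketch. It enumerates every assignment of the two types to the other coalition members and weights the immediate payoff by the joint belief. The joint belief is assumed here to factor into a product of per-agent beliefs (an independence assumption made for this sketch), and `expected_payoff`, `belief_fixed`, and the `payoff` callback are hypothetical names standing in for $u_i(C, t^i_C)$.

```python
from itertools import product

TYPES = ("fixed", "moving")


def expected_payoff(others, belief_fixed, payoff):
    """Expected payoff of one agent over all type combinations of the other agents.

    others       -- ids of the other agents in the coalition
    belief_fixed -- dict id -> believed probability that the agent is of fixed type
    payoff       -- callable taking a dict id -> type, returning the immediate payoff
    """
    total = 0.0
    for combo in product(TYPES, repeat=len(others)):
        # Joint belief for this combination, assumed to be a product of
        # independent per-agent beliefs.
        prob = 1.0
        for agent, agent_type in zip(others, combo):
            p_fixed = belief_fixed[agent]
            prob *= p_fixed if agent_type == "fixed" else 1.0 - p_fixed
        total += prob * payoff(dict(zip(others, combo)))
    return total
```

The loop visits $2^{n}$ combinations for $n$ other agents, which matches the exponential count noted in the text and is only practical for the small community sizes considered here.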
In the proposed BCG, actions and preferences are defined as follows:
• action: Each agent chooses which coalition to join based on its own expected payoff.
• preference: $C_1 \succeq_i C_2$ denotes that agent $i$ prefers to be in coalition $C_1$ at least as much as in coalition $C_2$.
The agents build their preferences according to their expected payoffs. The expected payoff captures the beliefs about the types of other agents. However, these beliefs should change over time with observations so that they come to represent the correct types of the other agents. Therefore, an algorithm to update beliefs is essential. We propose Algorithm 2 to help each agent update its beliefs. We employ Bayes' theorem to derive the belief update. Consider an agent $i$ that has sold (or purchased) energy; the location of agent $i$ for the traded energy yields one of two observations, denoted $O_{fixed}$ and $O_{moving}$. Consequently, we are interested in calculating $p_{ij}(T_{fixed} | O_{fixed})$ and $p_{ij}(T_{fixed} | O_{moving})$. According to Bayes' rule, agent $j$ can update its belief about agent $i$ from these observations, where $\tau_{ij}$ denotes the number of iterations in which MG $i$ has observed MG $j$. If agent $j$ is a moving agent, then the belief probability of agent $i$ that agent $j$ will transfer energy from any location other than the promised location is denoted by $\epsilon_{ij}$. $\epsilon_{ij}$ can be updated as the weighted sum of its previous value and its current value using an exponential moving average [19], where $\omega$ is an adjustable constant and the current value is found using Bayes' rule from the number of actual fixed observations and the total number of observations, $|\chi|^{\tau_{ij}+1}$. We use these updated probabilities in Algorithm 2 to form beliefs about agents. In each iteration, we assume that one agent is randomly chosen and proposes to join a new coalition. The new coalition accepts the new member if at least one agent achieves a higher expected payoff in the new coalition while the expected payoffs of the other agents remain unchanged or improve (known as the Pareto order [20]). At the end of each iteration, coalition members use the observation information to update their beliefs about the other members of their coalition according to (8)-(12). Consecutive merge and split iterations continue until the system of agents reaches a coalition formation from which no agent has any incentive to move to a new coalition.
Algorithm 2 BCG formation for energy trading among MGs and EVs.
1: Initialization: Randomly assign all agents to the coalitions.
2: Main loop:
3: for each time slot $t = 1$ to $T$ do
4: for agent $i = 1$ to $N + E$ do
5: Update the current payoff $u_i(t)$.
For agent $i$:
9: Send the joining proposal to all agents $m$, $m \in C_k \setminus \{i\}$.
10: If $u\{i \in C_k\} \geq u\{i \notin C_k\}$ for all $m \in C_k \setminus \{i\}$, then set $i \in C_k$ and update $u$ for $m \in C_k \setminus \{i\}$.
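The belief update described above can be sketched in Python as a single Bayes step followed by an exponential moving average. The likelihoods are an assumption built from the definitions of $p_e$ and $p_c$: a fixed agent is seen as fixed with probability $1 - p_e$, and a moving agent is seen as fixed either when it uses its default station and is observed correctly or when it moves and is misread. The weighting convention of the moving average (`omega` on the previous value) is likewise assumed, and the names `posterior_fixed` and `ema_update` are hypothetical.

```python
def posterior_fixed(prior_fixed, observed_fixed, eps, p_e=0.01, p_c=0.01):
    """One Bayes step for the belief that another agent is of fixed type.

    prior_fixed    -- current belief P(T_fixed)
    observed_fixed -- True for an O_fixed observation, False for O_moving
    eps            -- believed probability that a moving agent leaves its default station
    """
    like_fixed = 1.0 - p_e                               # P(O_fixed | T_fixed)
    like_moving = (1.0 - eps) * (1.0 - p_e) + eps * p_c  # P(O_fixed | T_moving)
    if not observed_fixed:
        # Switch to the likelihoods of an O_moving observation.
        like_fixed, like_moving = 1.0 - like_fixed, 1.0 - like_moving
    numerator = like_fixed * prior_fixed
    evidence = numerator + like_moving * (1.0 - prior_fixed)
    return numerator / evidence if evidence > 0.0 else prior_fixed


def ema_update(eps_prev, eps_current, omega=0.9):
    """Exponential moving average of the station-switching belief."""
    return omega * eps_prev + (1.0 - omega) * eps_current
```

Starting from a prior of 0.5, repeated `O_fixed` observations push the belief toward 1, while a single `O_moving` observation pulls it sharply down because a fixed agent is only seen as moving through the small error $p_e$.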

Baseline I-Q-Learning Based Coalition Formation:
To compare the proposed technique with a well-known machine learning based solution, we introduce a Q-learning based approach [21]. The objective of Q-learning is to reach a near-optimal policy by choosing actions that maximize the expected current and future rewards. The Q-value is updated according to the Bellman equation, where $\alpha$ denotes the learning rate and $\gamma$ is a discount factor that weights the significance of future rewards. $r$ represents the immediate reward and is calculated in the same way as the immediate payoff in (5). We assume that MGs and EVs are agents and that an agent's action is to accept or refuse a proposal from the proposer agent to join its coalition, while the state is the vector of coalition memberships. The $\epsilon$-greedy method is employed for action exploration.
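For reference, the standard tabular Q-learning update and $\epsilon$-greedy action selection used by this kind of baseline can be sketched as follows. This is the textbook rule, $Q(s,a) \leftarrow Q(s,a) + \alpha [r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$, not code from the paper; the state encoding as a tuple of coalition memberships, the `ACTIONS` tuple, and the function names are assumptions for illustration.

```python
import random
from collections import defaultdict

ACTIONS = ("accept", "refuse")  # response to a joining proposal


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning step on a table keyed by (state, action)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def epsilon_greedy(q, state, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise pick the best known action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


# Q-table with a default value of 0.0; states are tuples of coalition memberships.
q_table = defaultdict(float)
q_update(q_table, (0, 1), "accept", -2.0, (0, 0), alpha=0.5)
```

Since the reward is a negative power loss, an action that led to a lossy coalition ends up with a lower Q-value than an untried one, which is why some exploration through $\epsilon$ is needed.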
Baseline II-Game Theory Based Coalition Formation: We implement the coalition formation approach based on coalitional game theory proposed in [7]. In addition to the Q-learning and game theory based baselines, we also compare the proposed approach with the case in which there are no coalitions.

Performance Evaluation
For the numerical evaluation, we consider that the number of MGs and EVs ($N + E$) varies between 4 and 10, which is a practical assumption considering real cases of community MGs. MGs and macrogrid interconnections are placed at random locations in a 10 km by 10 km area. EVs have varying levels of energy as surplus or demand. A day is divided into 24 time slots, where load and generation patterns are generated randomly according to a Gaussian random variable and repeat periodically after a day with slight variations, as in [7]. We assume that the resistance $R_{ij}$ is the same for all lines. Energy transfer with the macrogrid takes place at medium voltage $U_0$, energy transfer among MGs takes place at low voltage $U_i$, and the losses inside the MGs are not considered. The simulation parameters are summarized in Table 1. We compare the proposed BCG method with a coalitional game theory (CG) based method and a Q-learning based method, as well as with the case of no coalitions. The results are obtained over 10 runs, and each run includes at least 5000 iterations.
In Figure 2, we present the average power loss per user versus the number of MGs and EVs. The average power loss per user (MG or EV) is computed by dividing the total power loss (calculated with Equation (1)) by the number of agents (MGs and EVs). The number of MGs ranges from 4 to 10, and there are two EVs. The BCG scheme has approximately 15% and 40% less power loss than the Q-learning and CG based schemes, respectively, and provides a significant advantage over not having coalitions. As expected, when the number of MGs increases, the power loss decreases for all schemes, since the distance between agents becomes shorter.
In Figure 3, we show the average power loss per user versus the percentage of EVs, ranging from 0 to 75 percent. In these simulations, the total number of agents is set to 10, and the number of MGs changes from 2 to 8. When the percentage of EVs increases, the average power loss increases, as expected. This is due to the increased uncertainty introduced by the EVs. In addition, the BCG method outperforms the other methods as the uncertainty in the system increases.
In Figure 4, we show the convergence of the belief of the MG agent over the type of EVs. To demonstrate this, we consider a system with one MG and two EVs. We set values of
In Figure 5, we increase the number of charging stations that EVs can visit. We show the power loss for the BCG, Q-learning, and CG schemes with two and with four charging stations. In this scenario, two of the eight agents are EVs. The results demonstrate that, as the number of charging stations increases, the power loss increases as well. The reason is that, with more charging stations, the uncertainty about which station will be used grows, and the agents have more difficulty reaching optimal coalitions, which results in more power loss. However, BCG still incurs less loss than the other schemes.

Conclusions
Future transactive energy systems will be built on peer-to-peer energy trading and MG communities. As EV penetration grows dramatically, EVs will become fundamental elements of the energy system. However, the mobility of EVs and their high penetration impose various uncertainties on the system, such as unexpected load that may accumulate in peak hours or hot spots, or uncertainty about which charging station an EV will charge from or discharge to when using public charging stations. In this paper, we revisited the problem of energy trading with the aim of power loss minimization, considering the uncertainties introduced by EVs as dynamic elements in the system. A novel BCG approach has been proposed to form optimized coalitions of MGs and EVs, which results in less energy transfer from the macrogrid or distant MGs while overcoming the uncertainty introduced by EVs. In this method, players/agents (MGs and EVs) refine their beliefs about the types of other players through iterative observation of the environment until their beliefs converge. Whereas the conventional coalitional game aims to maximize the instantaneous payoff, in this method the goal of each agent is to optimize its expected payoff over the iterations. Compared with a conventional coalitional game and a Q-learning based approach, the proposed technique achieves a significant reduction in power loss. In our future work, we plan to extend our experiments by including IEEE standard power system models while increasing the scalability of the system.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:

MG    Microgrid
EV    Electric Vehicle
CG    Coalitional Game Theory
BCG   Bayesian Coalitional Game Theory