Distributed Optimization in Low Voltage Distribution Networks via Broadcast Signals †

Abstract: With the development of distributed energy resources, the low voltage distribution network (LVDN) is expected to become the integrator of small distributed energy sources. This makes the users in LVDNs more diverse, which leads to more complex modeling. Additionally, data acquisition can be difficult due to rising privacy concerns. These factors impose severe demands on control schemes in LVDNs that classical centralized control might not be able to fulfill. To tackle this, a model-free control approach with a distributed decision-making architecture is proposed in this paper. Employing statistical methods and game theory, individual users in LVDNs achieve a local optimum autonomously. Compared to conventional approaches applied in LVDNs, the proposed approach achieves active control with a lower communication burden and fewer computational resources. The paper proves the convergence to the Nash Equilibrium (NE) and uses player compatible relations to form the specific equilibrium. A variant of the log-linear trial and error learning process is applied in a novel "suggest-convince" mechanism to implement the proposed approach. In the case study, a 103-node test network based on a real Belgian semiurban LVDN is presented. The proposed approach is validated and analyzed with practical load profiles on the 103-node network. In addition, centralized control is implemented as a benchmark, so that the performance of the proposed approach can be compared with the classical optimization result. The results demonstrate that the proposed approach achieves a player compatible equilibrium as expected, resulting in a good approximation to the local optimum.


Introduction
The rising penetration level of distributed energy resources (DER) has become a clear trend in modern distribution networks. This makes the control of low voltage distribution networks (LVDNs) especially challenging, as LVDNs are expected to facilitate the penetration of DER. On the one hand, higher DER penetration levels in LVDNs are synonymous with greater variability, which requires a larger flexibility reserve and active control. On the other hand, consumer privacy becomes an important concern when controlling users with the deployment and adoption of smart grid technologies, which sets severe barriers to data collection from users [1]. User uncertainty and privacy awareness make modeling and information acquisition more difficult, both of which are critical in conventional active control approaches [2]. Thus, there is a demand for active control schemes that work with limited information or even without specific models (model-free).
Quite a few existing approaches are deemed able to fulfill such a demand to a certain degree. Droop control [3,4] is a classic and reliable way to implement distributed control without relying on communication. Nevertheless, although drawbacks such as load sharing can be solved by modified schemes [5,6], some issues remain. For instance, most DERs, if not all, have no inherent droop characteristic, and the control performance is affected by cable impedance [7]. Moreover, as droop control is a passive approach, some notable features of the smart grid, such as active management and dynamic optimization, are not easy to implement with it. Most existing active control is centralized and comes with a high communication demand, aiming at offering ancillary services [8,9]. Additionally, the modeling of users remains a concern, especially when they are diverse.
All of these limits and restrictions suggest that attempts to achieve distributed model-free control are needed. As the decision-making process of control is decentralized, information is collected and used locally in a distributed manner. This eases both the communication burden and the privacy concerns. Furthermore, with the ability to perform parallel computations, distributed algorithms have the potential to be computationally superior to centralized algorithms, both in terms of solution speed and the maximum problem size that can be addressed [10]. Distributed control, including optimization, has found applications in power systems, especially in electric vehicle charging and demand response. Readers are referred to the surveys [11,12] for more instances. Most of these works are based on computational models or techniques, such as Augmented Lagrangian Decomposition [13] and the decentralized solution of the Karush-Kuhn-Tucker (KKT) necessary conditions for local optimality [14]. Besides these computational approaches, game theory, as the study of strategic interaction among rational decision-makers, has found firm application in distributed control structures. Ref. [15] proposed a strategy for distributed energy allocation between providers and consumers, demand response among residential consumers considering irrational behavior is studied in [16], and a set of distributed robust adaptive computation algorithms for a class of generalized convex games, based on computing the Nash Equilibrium, is proposed in [17]. In this paper, a control scheme under a distributed decision-making architecture is studied, and the Nash Equilibrium (NE) is used to drive users toward local optimality in flexibility management in a given LVDN.
As one of the most famous concepts in game theory, if not the most famous, NE has attracted research interest for several decades in various fields. Generally, there are two major threads in seeking NE in practical applications: the so-called "mathematical approaches", which employ a mathematical framework and solve a class of generalized convex games locally or globally, and "trial and error learning", which designs rules of learning or evolving that can strike a dynamic equilibrium within finite iterations. The "mathematical approaches" are widely studied in the control field, for instance, ODE-based generalized NE computation [18], a nonlinear Gauss-Seidel-type approach [19], best-response dynamics [20], generalized convex games over unreliable networks [17], and Newton-type methods that find NE at a super-linear convergence rate [21]. "Trial and error learning" has myriad applications in economics and has been employed in engineering fields over time. The research of Hart, Foster, and Young [22][23][24] has proved that decentralized rules can be devised that converge to NE or correlated equilibrium in general n-person games. Ref. [25] uses the classic rational Bayesian model to maximize the discounted expected utility under the belief that the environment is constant. Based on NE, the Player-Compatible Equilibrium (PCE) was proposed in [26], which extends the consideration of "trembles" in the NE by imposing cross-player restrictions on the game, in a way that is invariant to the utility representations of players' preferences over game outcomes. A trembling hand perfect equilibrium is an equilibrium that takes the possibility of off-the-equilibrium play into account by assuming that the players, through a "tremble", may choose unintended strategies, albeit with negligible probability. Readers are referred to [27] for more explanations.
Although quite a few existing works involve game theory in power systems, most of them focus on electricity market issues [28][29][30][31][32] or employ game theory as an auxiliary to the main control algorithm [33]. For instance, Ref. [34] employs game theory in a hybrid energy system for voltage and frequency control, but the game-theoretic algorithm is only used to decide which energy source should be used at a certain point. Meanwhile, most of the works use one of the Shapley [32], Aumann-Shapley [35], or Nucleolus-based algorithms [36].
This paper attempts to use novel game-theoretic algorithms to tackle technical, non-economic issues in LVDNs. The contribution of this paper is three-fold. Firstly, a control scheme that can be implemented by LVDN users in a distributed manner, under a broadcaster-users architecture, is proposed. To be clear, the term "users" in this paper denotes all the households and individual devices connected to the LVDN, including small distributed generators, PV, small wind turbines, and so on. Without massive communication and complicated modeling, users employ local information and simple public information from the broadcaster to decide their own strategies independently. As no specific information is required in the control scheme, hot-plugging is feasible, which allows users to join or quit the network freely and makes LVDNs flexible. Secondly, the paper proves that the proposed scheme is able to drive users to converge to PCE within a limited period to achieve the specific control objectives. Thirdly, a benchmark with centralized optimization is presented, with remarks on the performance comparison. The paper is organized as follows: the specific problem is elaborated in Section 2; Section 3 introduces the necessary concepts and then illustrates the scheme of the proposed approach; a simulation study based on a practical network and a benchmark are presented in Section 4; and Section 5 concludes the paper.

Notations
Assume a strategic-form game with $N$ players, where each player $i$ has a finite strategy set $S_i$ with strategies $s_i \in S_i$. The set of mixed strategies $\sigma_i$ is $M_i$, and the set of strictly mixed strategies, in which every pure strategy in $S_i$ has nonzero probability, is denoted by $M_i^{\circ}$. For $s_i$, the correlated strategies of the other $N - 1$ users are $s_{-i}$. The cost of user $i$ is indicated by $J_i$, and the utility of user $i$ is $U_i = -J_i$.

Problem Statement
Consider an LVDN with $N$ users, each connected at a different location. The users are not necessarily homogeneous but consume or provide power of a comparable order of magnitude. Each user $i$ has a power consumption profile $\hat{p}^i_t$, where generation is regarded as negative consumption in this paper. Assume the given LVDN has a limited flexibility reserve. This means that if all the users operate as they wish, there will be a power imbalance, which leads to voltage issues. In LVDNs, due to the high R/X ratio, voltage is more sensitive to the active power balance, but it is still related to reactive power as well [37,38]. A distributed control is needed to optimize the operations among users to strike a balance between user comfort and voltage regulation. Poor user comfort leads to dissatisfaction: users become less satisfied if they cannot operate as they wish, and partial operation or stepless adjustment of power is not feasible. For instance, if user $i$ wants to increase its power from 1.5 to 2 kW, increasing its power to 1.8 kW instead will not satisfy it, and there is no guarantee that 1.8 kW is a feasible working state for user $i$. This is very common for most household appliances and distributed generators. Meanwhile, the communication and computation should not be too intensive, for ease of implementation.
Essentially, power regulation in LVDNs can be abstracted as a flexibility allocation game. If user $i$ changes its working status against the power balance, it consumes flexibility. If the flexibility reserve is not sufficient, this leads to a voltage problem. Conventionally, the LVDN is supported by a backbone distribution network, which is supposed to provide sufficient flexibility. However, the flexibility allocation can be optimized to achieve a more robust and independent LVDN, which is in line with the concept of the active distribution network (ADN) [39]. In this game, each user $i$ has to implement its strategy $s_i \in S_i$ in every control cycle; in conventional control schemes this strategy is decided by a centralized optimization algorithm. According to $\hat{p}^i_t$, each user $i$ has an initial plan $\delta^i_t$, which denotes the original profile change tendency of user $i$. In the given problem of this paper, $S_i$ is a finite set, and $p^i_{t+\Delta t}$ is determined by $s_i$ as stated in (1), where $p^i_t$ is the actual power consumption of user $i$, $t + \Delta t \in (t, t+1)$, and $\Delta p_i$ is the admissible regulation proposed by the control. Thus, the cost of user $i$ is given by $J_i$, where $v^i_{t+\Delta t}$ is the corresponding local nodal voltage of $i$ after $s_i$ is implemented, the coefficients $a$ and $b$ satisfy $0 < 2a < b$, and $\gamma \in [0, 0.99]$ is the boundary coefficient. The objective of each user $i$ is to minimize its $J_i$. Although $p^i_{t+\Delta t}$ is controlled by $s_i$, user $i$ cannot always choose $s_i = 1$ to minimize $J_i$: as $0 < 2a < b$, $J_i$ grows if $v^i_{t+\Delta t}$ turns out to be far away from the set point $\xi$.

Concepts Preparation
Proposition 1. The resource allocation game [40] is a congestion game.
Congestion games are a general model for resource allocation games and are a special class of potential games [41].
Theorem 1. A potential game converges to a Nash Equilibrium (NE). Games that are close (in terms of payoffs of players) to potential games have similar limiting dynamics to those in potential games.
For simplicity, readers are referred to Theorem 3.1 in [42] for definitions and corresponding proofs.
Definition 1. For users $i \neq j$ and the corresponding strategies $s_i^* \in S_i$, $s_j^* \in S_j$, if the compatibility condition of [26] holds for every correlated strategy $\sigma_{-j} \in M_{-j}^{\circ}$, then we say $i$ is more player compatible with $s_i^*$ than $j$ is with $s_j^*$, which is denoted as $(s_i^*|i) \succsim (s_j^*|j)$.
This compatibility relation is transitive and asymmetric (Propositions 2 and 3). Readers are referred to the appendix in [26] for the corresponding proofs.
Proposition 4. If the game only has two players $i \neq j$, then $(s_i^*|i) \succsim (s_j^*|j) \succsim (s_l^*|l)$ never holds, as this concept considers third parties, whose best response is affected by the relative tremble probabilities of $i$ and $j$.
Definition 2. Assume there is a tremble profile $\varepsilon$, which assigns a positive number (no matter how small) $\varepsilon(s_i|i) > 0$ to every strategy $s_i$ of every player $i$. Use $\Pi_i$ for the set of these strategies of player $i$; $\Pi_i$ is convex and compact. Whenever $\varepsilon$ is small enough so that $\Pi_i$ is non-empty for every $i$, the existence of an $\varepsilon$-equilibrium holds.
Definition 3. A tremble profile $\varepsilon$ is player compatible if $\varepsilon(s_i^*|i) \geq \varepsilon(s_j^*|j)$ for all $i$, $j$, and $s_i^*$, $s_j^*$ such that $(s_i^*|i) \succsim (s_j^*|j)$. Then an $\varepsilon$-equilibrium where $\varepsilon$ is player compatible is called a player compatible $\varepsilon$-equilibrium ($\varepsilon$-PCE).
Theorem 2. PCE exists in every finite strategic-form game [26].
Definition 4. State $z(t) \in Z^*$ is a stochastically stable state if, for every $i$ and given any small $\varphi > 0$, the stability condition holds for at least the fraction $1 - \varphi$ of all periods $\tau$.

Architecture Setup
Besides the users in the LVDN, we assume that there is a global broadcaster that periodically monitors the voltages $v^k_t$ at $K$ key points over the network at time $t$, and then calculates the general situation parameter $g_t$ from $\bar{v}_t$ and $v_r$, where $\bar{v}_t$ is the average of $v^k_t$ and $v_r$ is the rated voltage, so that $g_t$ indicates the general voltage situation of the whole LVDN. Then $g_t$ is broadcast to every user as public information, and users decide and implement their own strategies independently within $S_i$. No communication other than the broadcast is needed.
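The broadcaster's role can be illustrated with a minimal Python sketch. The formula for the general situation parameter is not reproduced in this section, so the sketch assumes, purely for illustration, that it is the relative deviation of the average key-point voltage from the rated voltage. The function name `general_situation` and the sample voltages are hypothetical.

```python
from statistics import mean

V_RATED = 230.0  # rated voltage v_r in volts, as used in the case study


def general_situation(key_voltages, v_rated=V_RATED):
    """Return (v_bar, g_t) for one broadcast cycle.

    ASSUMPTION: g_t is the relative deviation of the average key-point
    voltage from the rated voltage. The exact formula of the paper may
    differ; only its inputs (average voltage, rated voltage) are known.
    """
    if not key_voltages:
        raise ValueError("at least one key-point measurement is required")
    v_bar = mean(key_voltages)          # average over the K key points
    g_t = (v_bar - v_rated) / v_rated   # hypothetical normalisation
    return v_bar, g_t


# Hypothetical measurements from four key points (volts).
v_bar, g_t = general_situation([236.0, 241.0, 238.0, 233.0])
print(v_bar, g_t)
```

Only the single scalar returned as `g_t` would be broadcast, which is what keeps the communication one-way and anonymous.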

Control Scheme
As there is no communication besides the broadcast, the decision-making process of users is a non-cooperative game. The public information provided by the broadcast promotes coherence among users, since the decision-making process is decentralized, in contrast to centralized control approaches, whose coherence is guaranteed by centralized decision making and bi-directional communication.
As stated in the previous section, the control can be regarded as a flexibility allocation game. According to Proposition 1, this turns out to be a congestion game, which is a special class of potential game. Theorem 1 suggests that a potential game converges to an NE. An NE is a stable state of a system involving the interaction of different participants, in which no participant can gain by a unilateral change of strategy if the strategies of the others remain unchanged. According to the definition of NE, it is reasonable to conclude that $J_i$ has reached its minimum if the game converges to an NE. However, just like local optima in numerical optimization, there might be multiple NE in one potential game. Therefore, the proposed control scheme has to fulfill two requirements: firstly, the game should converge to an NE within an acceptable time; secondly, there should be a guarantee that the game converges to a specific admissible NE.
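The convergence of a congestion game to an NE can be illustrated with a toy example. The sketch below is not the proposed control scheme: it uses plain best-response dynamics on a hypothetical game in which each player picks one resource and pays the resource weight times the number of players on it. All names and numbers are illustrative assumptions.

```python
def loads_of(choices, n_resources):
    """Number of players currently using each resource."""
    return [choices.count(r) for r in range(n_resources)]


def is_nash(choices, weights):
    """True if no player can lower its cost by switching alone."""
    loads = loads_of(choices, len(weights))
    for current in choices:
        cost_now = weights[current] * loads[current]
        for r in range(len(weights)):
            # Moving to r would raise the load on r by one.
            if r != current and weights[r] * (loads[r] + 1) < cost_now:
                return False
    return True


def best_response_dynamics(choices, weights, max_rounds=1000):
    """Let players switch one at a time until nobody wants to move."""
    choices = list(choices)
    for _ in range(max_rounds):
        changed = False
        for i, current in enumerate(choices):
            loads = loads_of(choices, len(weights))
            best, best_cost = current, weights[current] * loads[current]
            for r in range(len(weights)):
                if r == current:
                    continue
                cost = weights[r] * (loads[r] + 1)
                if cost < best_cost:  # strict improvement only
                    best, best_cost = r, cost
            if best != current:
                choices[i] = best
                changed = True
        if not changed:
            return choices
    raise RuntimeError("no equilibrium reached within max_rounds")


# Six players start on resource 0; resource 1 is twice as costly per user.
final = best_response_dynamics([0] * 6, [1.0, 2.0])
print(final, is_nash(final, [1.0, 2.0]))
```

Because every switch strictly lowers the mover's cost, the Rosenthal potential of the game decreases with each move, so the loop terminates. Starting from a different initial assignment can end in a different equilibrium, which is why the scheme needs the second requirement above.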
One can figure out that the minimum of $J_i$ is obtained where $s_i = -1$ and $v^i_{t+\Delta t} \to \xi$. In other words, user $i$ reaches its minimum cost when it follows its own plan and the local voltage is close to the set point. Practically speaking, this is the idea of user deregulation. In this paper, a so-called "suggest-convince" mechanism is proposed to configure the decision making of user $i$ so as to minimize its cost over the period $\tau$. $g_t$ is the incentive in the proposed control. Whenever user $i$ receives $g_t$, it first runs Algorithm 1 independently to figure out the suggestion parameter $\zeta^i_t$ using Equation (8). Step 5 in Algorithm 1 works in a seesaw manner: when $v^i_t$ has a large deviation from the rated voltage, it dominates the computation of $\zeta^i_t$; otherwise, $g_t$ dominates. This means that user $i$ always regards its own situation as a priority. Depending on $s_i$, the corresponding control is implemented according to (1). Nevertheless, $\Delta p_i$ needs to be specified if $s_i = 1$, as $\zeta^i_t$ only indicates the direction. The specific $\Delta p_i$ is figured out in Algorithm 3.
One can find that it is possible that $\Delta p_i = \delta^i_t$ even when $s_i = 1$; this differs from the situation when $s_i = -1$, as will be explained later on. The last admissible status $p^i$ is the last $p^i_t$ for which $v^i_t$ was within ±10% of the rated voltage. In order to make the control robust and stable, users are encouraged to trace back to their $p^i$ if a regulation needs to be applied. Different from centralized control, there is no explicit set point given to users. If $p^i$ is not applicable, user $i$ can keep the current working status and add 1 to its counter $c_i$. This gives user $i$ some time to wait for other users to contribute to the necessary regulation. However, if the situation remains when $c_i$ reaches 2, user $i$ will perform an experiment by adjusting its $p^i_t$ by 2% in the direction suggested by $\zeta^i_t$ and reset $c_i$ to zero. For users who cannot adjust their working status by such a step, the closest working state will be chosen. Performing experiments is the least preferred operation, and it will seldom be needed once the game has converged to a proper NE.
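The trace-back, wait, and experiment logic described above can be sketched in Python. This is one possible reading of the text, not a transcription of Algorithm 3. It assumes that a positive suggestion asks for more consumption, that the last admissible status is "applicable" only when it lies in the suggested direction, and that the counter is left unchanged after a trace-back. The function name `regulation_step` and its parameters are hypothetical.

```python
def regulation_step(p_now, zeta, p_last_ok, counter,
                    feasible_states=None, step=0.02):
    """Return (delta_p, new_counter) for a user that chose s_i = 1."""
    if zeta == 0:
        return 0.0, counter  # no direction suggested, nothing to regulate
    # ASSUMPTION: zeta > 0 suggests increasing consumption.
    direction = 1.0 if zeta > 0 else -1.0

    # 1) Prefer tracing back to the last admissible status, provided it
    #    lies in the suggested direction (our reading of "applicable").
    if p_last_ok is not None and (p_last_ok - p_now) * direction > 0:
        return p_last_ok - p_now, counter

    # 2) Otherwise keep the current status and wait for other users.
    if counter < 2:
        return 0.0, counter + 1

    # 3) The counter has reached 2: experiment and reset the counter.
    if feasible_states is None:
        target = p_now + direction * step * abs(p_now)  # 2% step
    else:
        # Discrete devices: nearest working state in the suggested direction.
        ahead = [s for s in feasible_states if (s - p_now) * direction > 0]
        target = min(ahead, key=lambda s: abs(s - p_now)) if ahead else p_now
    return target - p_now, 0


# A user at 2.0 kW is asked to reduce; it was last admissible at 1.5 kW.
print(regulation_step(2.0, -0.4, 1.5, 0))
# Without a usable admissible status it waits twice, then experiments.
print(regulation_step(2.0, -0.4, None, 0))
print(regulation_step(2.0, -0.4, None, 2))
```

Returning the counter alongside the adjustment keeps the function free of hidden state, so each user can store its own counter locally.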

As shown in Algorithm 2, $\Lambda_i$ is a vector of $K = T/\tau$ scalar pairs $[\lambda^i_m]$ that can affect the distribution of $s_i$. $T$ is the timescale of $\Lambda_i$, while $\tau$ is the timescale of $[\lambda^i_m]$. In this paper, $T$ is 24 h and $\tau$ is 0.5 h, therefore $K = 48$. For each period $m$, there is a scalar pair $[\lambda^i_m]$, in which $\lambda^i_{1m}$ and $\lambda^i_{2m}$ are two thresholds for $s_i$, as illustrated in Algorithm 2. Namely, $\lambda^i_{1m}$ is the threshold between staying with the current working status and making a change in the direction that $\zeta^i_t$ suggests, while $\lambda^i_{2m}$ is the threshold between staying with the current working status and implementing the change according to $\hat{p}^i_{t+1}$. The whole LVDN benefits most if user $i$ makes a change towards $\zeta^i_t$, while implementing the change according to $\hat{p}^i_{t+1}$ results in perfect user comfort. The scalar pair $[\lambda^i]$ indicates the control policy of user $i$, as it configures the corresponding probability distribution of $\sigma_i \in M_i^{\circ}$. Given that users have different behavioral characteristics during one day, a different $[\lambda^i]$ is needed to seek the NE in the different games played during different periods of one day. This is why $[\lambda^i_m]$ applies. In this paper, the optimal values $[\lambda^i_m]^*$ are figured out by a trial and error learning approach illustrated by Algorithm 4, where $\mathrm{sign}_{bino}$ stands for the Bernoulli trial.
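The way a scalar pair splits the suggestion into three zones can be sketched in Python. The listing of Algorithm 2 is not available in this section, so the comparison rule below is an assumption: the magnitude of the suggestion is compared with the two thresholds, a strong suggestion gives $s_i = 1$, a weak one gives $s_i = -1$, and anything in between gives $s_i = 0$. The names `choose_strategy` and `period_index` are hypothetical.

```python
T_HOURS = 24.0    # timescale T of the threshold vector
TAU_HOURS = 0.5   # timescale tau of one scalar pair
K = int(T_HOURS / TAU_HOURS)  # number of periods per day


def period_index(hour_of_day):
    """Map a time of day in hours to its period m in 0..K-1."""
    return int(hour_of_day // TAU_HOURS) % K


def choose_strategy(zeta, lam1, lam2):
    """Pick s_i from the suggestion zeta and the thresholds (lam1, lam2).

    ASSUMPTION: the zones are defined on |zeta| with lam2 <= lam1.
      |zeta| >= lam1 ->  1  (regulate in the suggested direction)
      |zeta| <= lam2 -> -1  (follow the user's own plan)
      otherwise      ->  0  (keep the current working status)
    """
    if not 0.0 <= lam2 <= lam1:
        raise ValueError("expected 0 <= lam2 <= lam1")
    strength = abs(zeta)
    if strength >= lam1:
        return 1
    if strength <= lam2:
        return -1
    return 0


# One pair per period, initialised uniformly as in the simulation setup.
thresholds = [(0.5, 0.2)] * K
lam1, lam2 = thresholds[period_index(13.75)]
print(K, period_index(13.75), choose_strategy(0.3, lam1, lam2))
```

Under this reading, raising both thresholds widens the zone in which the user follows its own plan, which matches the description of users increasing their scalar pair to protect their comfort.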
In Algorithm 4, a variant of the so-called log-linear trial and error learning [43] is implemented. Note that $\zeta^i_t$ is given by a linear combination of two log models, and $[\lambda^i_m]$ is adjusted with probabilities in proportion to $J_i$ when $v^i_{t+\Delta t}$ approaches the median between $\gamma$ and the technical boundaries (±10% of $v_r$). Algorithm 4 branches on the voltage deviation $\Delta v^i_{t+\Delta t}$: line 5 tests $\Delta v^i_{t+\Delta t} < \gamma$, line 8 tests $\Delta v^i_{t+\Delta t} > 0.99$, and line 10 tests $\Delta v^i_{t+\Delta t} > 0.1$.

Remarks
The proposed "suggest-convince" mechanism is essentially a simulation of the negotiation process in games. User $i$ employs the public information $g_t$ and its local information $v^i_t$ to generate $\zeta^i_t$, which indicates the power change preferred by the circumstances. It then compares $\zeta^i_t$ with the three zones divided by the corresponding stubborn scalar pair $[\lambda^i_m] \in \Lambda_i$ to figure out $s_i \in S_i$. Eventually, it reviews $v^i_{t+\Delta t}$ to see whether there is potential to improve $E(J_i)$, and then adjusts the corresponding $[\lambda^i_m]$ in order to approach the best mixed strategy $\sigma_i^* \in M_i^{\circ}$ during the given period $m$. Within a given period $m$, assume that user $i$ has a more moderate profile that leads to a milder $\delta^i_t$ than user $j$, $i \neq j$, or that the neighboring users of $i$ are more supportive of grid regulation. Then $v^i_{t+\Delta t}$ is more likely to stay within or closer to $(1 \pm \gamma)v_r$ than $v^j_{t+\Delta t}$, which results in $(s_i^*|i) \succsim (s_j^*|j)$ or $(\sigma_i^*|i) \succsim (\sigma_j^*|j)$, depending on the network configuration. According to Propositions 2 and 3, this relationship is transitive and asymmetric, and it spreads through the whole LVDN via the coupling among the users. It is important to point out that this relationship is not necessarily uniform in the LVDN: as the coupling among users depends on the network topology, it is possible to have several player compatible relations in a given LVDN. One of the objectives of Algorithm 4 is to discover such a relation, encouraging users who have the upper hand to maximize their probabilities on strategies $s_i = 1$ and $s_i = 0$ within restrictions. Meanwhile, the boundary check on $[\lambda^i_m]$ in Algorithm 4 (lines 40-49) guarantees the existence of $\Pi_i$, and therefore an $\varepsilon$-equilibrium exists. Moreover, as stated in (1), user $i$ has a finite strategy set $S_i = \{-1, 0, 1\}$. Therefore, this is a finite strategic-form game and PCE exists according to Theorem 2.
Ref. [43] suggests that in an interdependent N-person game with a finite strategy set, if all players use log-linear trial and error learning and the acceptance probabilities are fairly large relative to the probability of conducting an experiment, then the stochastically stable state will be either a pure NE or, if a pure NE does not exist, a mixed strategy that maximizes $\sum_{i=1}^{N} U_i$. For user $i$ in this paper, the "acceptance probabilities" are the probabilities that $v^i_{t+\Delta t}$ stays within or closer to $(1 \pm \gamma)v_r$, and the "probability of conducting an experiment" is its tremble profile $\varepsilon$. The conditions are satisfied. Therefore, the stochastically stable state of user $i$ will be PCE. Besides, Definition 2 essentially supposes that user $i$ considers the set of all mixed correlated strategies of other users, $\sigma_{-i} \in M^{\circ}(S_{-i})$. If the players can learn some prior knowledge about their counterparts' player compatibility, user $i$ might be able to deduce that the counterparts will only play a subset $\hat{M}_{-i}$ of $M^{\circ}(S_{-i})$. This prior knowledge can be obtained by Algorithm 4 in this paper, so that the convergence of $\sigma_i$ can be expected.
Additionally, some parameters are tunable. $\gamma$ is set to 0.5 in this paper, which means the control tries to make voltages converge to within ±5% of $v_r$ instead of to $v_r$ itself. This makes more sense for the proposed approach, as it allows users to take advantage of more flexibility to improve comfort while still keeping a certain margin. It is necessary to point out that a larger $\gamma$ does not lead to more available flexibility; the optimal $\gamma$ still needs further study. $\tau$, as the timescale of $[\lambda^i_m]$, affects the convergence speed and quality. Although the behavior of users in LVDNs is changing, it is relatively stable hourly and has daily characteristics. This is why $\tau$ is 0.5 h and $T$ is 24 h in the paper. It allows user $i$ to employ $[\lambda^i_m]$ and $\Lambda_i$ to learn the hourly statistical characteristics and fit the daily dynamic patterns, respectively, from their counterparts.
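The relation between the boundary coefficient and the ±5% target can be checked with a short calculation. The sketch assumes that the coefficient scales the ±10% technical boundary, which is the interpretation that reproduces the ±5% figure quoted above. The helper names are hypothetical.

```python
TECHNICAL_LIMIT = 0.10  # technical boundary, +/-10% of the rated voltage


def target_band(v_rated, gamma, technical_limit=TECHNICAL_LIMIT):
    """Voltage band (low, high) the control aims for.

    ASSUMPTION: gamma is the fraction of the technical boundary that the
    control tries to stay within.
    """
    margin = gamma * technical_limit
    return v_rated * (1.0 - margin), v_rated * (1.0 + margin)


def normalised_deviation(v, v_rated, technical_limit=TECHNICAL_LIMIT):
    """Deviation from rated voltage, scaled so that 1.0 is the technical boundary."""
    return abs(v - v_rated) / (technical_limit * v_rated)


low, high = target_band(230.0, 0.5)
print(low, high)
print(normalised_deviation(241.5, 230.0))
```

For a rated voltage of 230 V and a coefficient of 0.5, the band is about 218.5 V to 241.5 V, half of the 207 V to 253 V range allowed by the ±10% technical boundary.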

Grid Topology
The schematic diagram is illustrated in Figure 1. It is a three-phase 230/400 V reference grid based on the topology of a real semiurban feeder in the region of Flanders, Belgium [44]. To make the network more diverse and less well designed, new users such as residential wind turbines and small PV farms are added, increasing the number of nodes from 62 to 103. As listed in Table 1, the impedance values are calculated according to the Belgian standard for underground distribution cables with an assumed operating temperature of 45 °C. All of the main feeder cables are of type EAXVB 1 kV 4 × 150 mm². A 250 kVA 10/0.4 kV transformer is assumed with an impedance of 0.013 + 0.038j pu. From the feeders to each individual user, an EXVB 1 kV 4 × 16 mm² cable with a length of 15 m is used. To simplify, the three-phase system is assumed to be symmetrical, so the analysis can focus on a single phase. The $v_r$ of all the users is 230 V.

Profiles and Conditions
A model of domestic electricity use, which is based upon a combination of patterns of active occupancy [45], is used to generate the high-resolution household consumption profiles. Lighting and appliances, occupants' behavior, month of the year, and weekday or weekend have been taken into account for the profile generation. Household profiles in this paper are assumed for a weekend day in June; the source database is based on realistic measurements of 22 domestic dwellings around the town of Loughborough in the East Midlands, UK. It is assumed that normal households randomly have one to four family members, and four to six family members are assumed in high-consuming households. Profiles of PV panels are taken from a 368 kWp PV system on the rooftop of EnergyVille in Genk, Belgium [46]. The system has 24 strings of PV panels, and each string can be recorded individually. The profiles are scaled, randomly selected, and combined from the raw data of the week of 6-12 June 2016. As all the PV strings are located close to each other, the data are well correlated, which makes them suitable for use as profiles in the LVDN. Regarding residential wind turbines, their profiles are obtained from Elia, a Belgian transmission system operator. The data were measured from Belgian onshore wind turbines on 6 and 7 June 2016, the same time period as the profiles of PV and households. They are then scaled and assigned randomly to the residential wind turbines in the test network. Four small residential generators are connected to the test network to represent other distributed generation users, such as CHP generators. Their profiles are generated randomly, taking the occupants' behavior into account. The operation ranges of users are given in Table 2.

In LVDNs, due to the high R/X ratio, voltage is more related to the active power distribution [37]. Moreover, users in LVDNs, especially households, do not have any devices whose reactive power output can be controlled. To simplify the case, assume that users only consume active power or that the users' power factor is at least 0.9. This does not mean that reactive power is ignored in the control, as users consider the actual voltage changes. The influence of other factors, such as reactive power changes due to the control, is already reflected in the actual voltage changes. For instance, although a user turns on the washing machine for its active power, the associated reactive power changes as well.

Simulation Setup
The algorithm runs once per minute, which means 30 times during each period $m$. The initial $[\lambda^i_m]$ is set to [0.5, 0.2] uniformly. The power flow simulation is implemented in MATLAB, based on the backward-forward sweep method [47]. Although the control is simulated step by step, its effect is treated as continuous. Namely, to make the simulation more realistic, user behaviors are extracted from the original profile. In every simulation step, we assume that the users restore half of the regulations (if any) imposed on them in the previous step, to simulate the decaying continuous effect of the control implemented before. Therefore, the reference profile in each step is the combination of the corresponding user behaviors and the decayed previous status, instead of being taken rigidly from the original profile. Meanwhile, to simulate the daily characteristics of different days, we use the same original profiles for all the days, but small randomly generated variations are added to $p^i$ so that the operating conditions in the simulation do not remain exactly the same from day to day, which makes the simulation more realistic.
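The decaying effect of earlier regulations can be sketched as follows. This is a minimal Python illustration rather than the MATLAB implementation used in the paper. It assumes that the regulation carried by a user is the gap between its actual power and its original profile, and that half of this gap is carried into the next step's reference. The names `simulate_with_decay` and `regulate` are hypothetical.

```python
PERIOD_MINUTES = 30   # length of one period m (tau = 0.5 h)
STEP_MINUTES = 1      # the algorithm runs once per minute
STEPS_PER_PERIOD = PERIOD_MINUTES // STEP_MINUTES


def simulate_with_decay(original, regulate, decay=0.5):
    """Return the actual power trajectory for one user.

    original : original profile values, one per simulation step
    regulate : callable giving the new adjustment for a reference value
               (stands in for the user's control decision)
    decay    : share of the previous regulation that is still in effect
    """
    actual = []
    offset = 0.0  # regulation carried over from the previous step
    for p_orig in original:
        reference = p_orig + decay * offset   # decayed previous status
        p_actual = reference + regulate(reference)
        actual.append(p_actual)
        offset = p_actual - p_orig            # what is being regulated now
    return actual


# A user with a flat 2 kW profile is curtailed by 1 kW once, then left alone.
adjustments = iter([-1.0, 0.0, 0.0, 0.0])
trajectory = simulate_with_decay([2.0] * 4, lambda ref: next(adjustments))
print(STEPS_PER_PERIOD, trajectory)
```

In this example the curtailment halves at every step, so the user drifts back toward its original profile unless the control keeps intervening.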

Simulation Results and Remarks
To present a basic view of the convergence process, the average voltage of all the users is shown in Figure 2a. The test profile has three typical scenarios. From 0-8 h, the original profile is moderate, and the users gradually adapt themselves to form an equilibrium. From 10-16 h, the system suffers from severe overvoltage due to the generation by PV modules, and users finally reach an equilibrium that employs the 5% margin to avoid bothering the users as much as possible. From 18-24 h, the mild voltage variation does not motivate most of the users, so they increase their corresponding $[\lambda^i_m]$ to guarantee their comfort. To study this in numbers, the discord rate $D_m$ is defined through $\Psi$, the operation that scores the difference between the actual change $p^i_{t+\Delta t} - p^i_t$ and $\delta^i_t$: it counts 0 if $\delta^i_t$ is eventually implemented, 0.5 if it is not, and 1 if the actual adjustment is opposite to what $\delta^i_t$ demands. We do not use $J_i$ here, as $J_i$ is a combination of user comfort and technical demands, which will be shown later on. The hourly $D_m$ is illustrated in Figure 2b. From day 1 to day 4, one can see that as the PCE forms, the discord rate decreases, which suggests better user comfort and smoother control.
A centralized control that derives the optimal adjustment for each participating user is implemented [48,49] as the benchmark. The controller solves an optimization problem using the full non-linear network model (Alternating Current Optimal Power Flow, ACOPF). It employs two-way communication: each participating user sends the central controller a range within which it can adjust its power consumption or generation. Besides, perfect information and instant computation, communication, and implementation are assumed for this control, to obtain the theoretical best result for comparison. The objective is to minimize $(p^i_{t+\Delta t} - \hat{p}^i_{t+1})^2$, with the nodal voltages bounded to the preserved 5% margin, $0.95\,v_r \le v^i_{t+\Delta t} \le 1.05\,v_r$. The general average voltage is shown in Figure 3a, while the comparison of the discord rate is illustrated in Figure 3b. The statistical results of the general voltage and nodal voltages are shown in Table 3. From Figure 3 it can be concluded that the proposed approach achieves a good approximation to the centralized optimization solution. Meanwhile, the proposed approach comes with a relatively lower discord rate. This does not mean that the proposed approach is better than a centralized approach in all aspects, as the centralized approach has a rigid boundary, while the proposed approach converges to the boundary statistically. This can be observed in Figure 4, which shows all 103 nodal voltages for the proposed approach and the centralized approach. It is clear that the proposed approach does not have an absolute hard boundary: it stochastically allows users to use the preserved margin to some extent, while the conventional ACOPF uses an absolute hard boundary to guarantee the preserved margin, at the price of a higher discord rate. Besides, a few node overvoltages can be observed in Figure 4a, which are caused by drastic power changes. As it is model-free, the proposed approach cannot accurately predict voltage changes. When the general condition is mild at time $t$, it is very likely to result in a correspondingly mild $\zeta^i_t$, which gives users a higher probability of implementing $\hat{p}^i_{t+1}$ to maximize their comfort. If $\hat{p}^i_{t+1}$ indicates a dramatic change compared to $p^i_t$, a slight overvoltage might happen. Nevertheless, it is fixed immediately in the next control period.
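The scoring rule behind the discord rate can be sketched in Python. The scores 0, 0.5, and 1 follow the description of the counting operation given earlier in this section. The normalization is an assumption, since the defining formula is not reproduced here: the sketch simply averages the scores over all observations in a period. The function names are hypothetical.

```python
def psi(actual_change, planned_change, tol=1e-9):
    """Score one observation of a user's behavior.

    0.0 : the planned change was implemented
    0.5 : the planned change was not implemented
    1.0 : the actual change went against the planned change
    """
    if abs(actual_change - planned_change) <= tol:
        return 0.0
    if actual_change * planned_change < 0:
        return 1.0
    return 0.5


def discord_rate(actual_changes, planned_changes):
    """Average score over a period (ASSUMED normalization)."""
    if len(actual_changes) != len(planned_changes):
        raise ValueError("both sequences must have the same length")
    if not actual_changes:
        raise ValueError("at least one observation is required")
    scores = [psi(a, d) for a, d in zip(actual_changes, planned_changes)]
    return sum(scores) / len(scores)


# Four observations: implemented, held back, reversed, implemented.
actual = [0.5, 0.0, -0.2, 0.3]
planned = [0.5, 0.5, 0.4, 0.3]
print(discord_rate(actual, planned))
```

A value near 0 means users mostly follow their own plans, while a value near 1 means the control frequently forces them in the opposite direction.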
To show how the PCE is formed and how an individual user evolves, Figure 5 illustrates the changes in $[\lambda^i_m]$ over all four days continuously; user $i = 36$ is selected randomly as an example. The panels for $\lambda^i_{1m}$ and $\lambda^i_{2m}$ together show how the two thresholds for the different $s^i$ in $S^i$ evolve. During some $m$, due to flexible neighbors or a mild user profile, user $i = 36$ retains the upper hand all the time, so that both scalars in its $[\lambda^i_m]$ reach their maximum to take advantage of the deregulation as much as possible. For other $m$, user $i = 36$ increases its $[\lambda^i_m]$ at the beginning to feel out the other users, and then has to restrain its scalars as it has no actual superiority over the others.
The global convergence can be observed in Figure 6. The gross discord rate is given by $\sum_{m=1}^{24} D_m$ for every day, while the nodal voltage deviations are computed in the same manner as in Table 3 but for 7 days. It can be observed that, with the current configuration, the whole system reaches a statistically steady state from the fourth day, which is supposed to be the PCE according to the derivations in Sections 2 and 3. Users achieve their approximate local optimum in the non-cooperative game. Four days is not a short time; nevertheless, although the gross discord rate is high at first, the system is well controlled from the first day, and the equilibrium is then gradually formed via the trial and error learning process. Once the equilibrium is formed, it is robust unless the whole LVDN changes completely, which makes the LVDN flexible and capable of hot-plugging. These features are in line with the concept of ADN as well.
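The daily aggregation used for Figure 6 can be sketched in Python as follows. The gross discord rate is the sum of the 24 hourly values, as stated above. The steady-state check is only an illustrative assumption: it reports the first day after which all day-to-day changes stay within a chosen tolerance, which is not necessarily the criterion used in the paper. The numbers in the example are made up and are not the paper's results.

```python
from typing import List, Optional, Sequence


def gross_discord(hourly: Sequence[float]) -> float:
    """Sum of the 24 hourly discord rates of one day."""
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    return sum(hourly)


def first_steady_day(daily: Sequence[float], tol: float) -> Optional[int]:
    """Return the 1-based day from which all later day-to-day changes
    are within tol, or None if no such day exists (assumed criterion)."""
    for start in range(len(daily) - 1):
        tail = daily[start:]
        if all(abs(b - a) <= tol for a, b in zip(tail, tail[1:])):
            return start + 1
    return None


# Made-up gross discord rates for 7 days, for illustration only.
daily_values: List[float] = [9.0, 6.5, 4.0, 2.1, 2.0, 2.2, 2.1]
print(first_steady_day(daily_values, tol=0.3))
```

With these illustrative values the check returns day 4, mirroring the qualitative behavior described above, where the gross discord rate falls over the first days and then fluctuates around a steady level.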

Conclusions
A model-free control approach with a distributed decision-making architecture is proposed in this paper. Using statistical methods and game theory, it achieves a good approximation to the local optimum among individual users in the LVDN. Compared to conventional approaches applied in the LVDN, the proposed approach achieves active control with a low communication burden and few computational resources. Users and broadcasters are double-blind to each other, which allows users to enter or quit the network freely (i.e., hot-plugging). Moreover, only anonymous general information is delivered through the whole network, which addresses the privacy concern. These features bring the proposed approach in line with the trend toward privacy protection and decentralization in the LVDN.
Although not many, some existing works have proposed voltage control by game-theoretical algorithms. For instance, Zhou et al. [50] employ Volt/VAr control dynamics with a nonlinear power flow model to formulate a voltage control game. Because [50] works with an explicit model of the influence that users can exert on the voltage by changing their consumption, a more accurate result than that of the proposed approach could be expected if the model is well designed. Nevertheless, if the gradient of its piecewise linear Volt/VAr control curve is too large, the algorithm may fail to converge. This is not a problem for the proposed approach, as its convergence is guaranteed by the log-linear trial and error learning process and the approach itself is model-free. Nassaj and Shahrtash [51] employ the Shapley-Shubik index to implement dynamic voltage control in the distribution network. Normally this approach needs to determine the Shapley values by communication before starting the game; Nassaj and Shahrtash [51] calculate the Shapley-Shubik indices and then distribute them to implement the control, which is essentially not a completely distributed control.
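The remark on steep Volt/VAr curves can be illustrated with a toy Python sketch. It is not the model of [50]. It assumes a single bus with a linearized voltage response $v = v_0 + x q$ and an unsaturated droop rule $q = -k (v - v_{\mathrm{ref}})$ that is applied iteratively; the values of `x`, `v0`, and `v_ref` are hypothetical. In this toy model the deviation from the fixed point is multiplied by $-kx$ at every step, so the iteration settles only if $kx < 1$.

```python
from typing import List


def iterate_droop(k: float, x: float = 0.1, v0: float = 1.06,
                  v_ref: float = 1.0, steps: int = 50) -> List[float]:
    """Iterate a linear droop rule on a one-bus linearized model.

    k is the droop slope, x the assumed voltage sensitivity to
    reactive power; returns the voltage seen at every step.
    """
    q = 0.0
    history = []
    for _ in range(steps):
        v = v0 + x * q          # linearized voltage response
        history.append(v)
        q = -k * (v - v_ref)    # droop update without saturation
    return history


def settles(history: List[float], tol: float = 1e-6) -> bool:
    """True if the last two voltages differ by less than tol."""
    return abs(history[-1] - history[-2]) < tol


print(settles(iterate_droop(k=5.0)))    # k * x = 0.5, contraction
print(settles(iterate_droop(k=15.0)))   # k * x = 1.5, growing oscillation
```

A practical Volt/VAr curve saturates, so a steep slope would more likely produce a bounded oscillation than the unbounded growth of this unsaturated sketch; the qualitative point about slope and convergence is the same.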
Theoretically, the proposed approach can be applied in other scenarios as well, as long as the control can be formulated as a game that meets two prerequisites. Firstly, it should be a potential game, which in the power system mainly means a congestion game. Secondly, there should be a hierarchical relationship in terms of priority among the interdependent controlled agents, no matter whether this relationship is inherent or imposed on purpose. Examples include the control of OLTC and elastic loads in [52], smart EV charging in [53], the demand side management of large populations of thermostatically controlled loads [54], and frequency control with energy storage systems in the distribution network [55].
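The first prerequisite can be made concrete with a generic textbook example in Python; it is not the game formulated in this paper. In a congestion game, each player picks one resource and pays a cost that depends on how many players share it. Such a game admits the Rosenthal potential, which strictly decreases with every unilateral improving move, so repeated improving moves must stop at a Nash Equilibrium. The resources, the linear cost, and the function names are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence

Cost = Callable[[str, int], float]


def rosenthal_potential(choices: Sequence[str], cost: Cost) -> float:
    """Sum over resources of cost(r, 1) + ... + cost(r, load of r)."""
    loads = Counter(choices)
    return sum(cost(r, k) for r, n in loads.items() for k in range(1, n + 1))


def better_response_dynamics(choices: Sequence[str],
                             resources: Sequence[str],
                             cost: Cost) -> List[str]:
    """Let players switch one at a time to any cheaper resource until
    nobody can improve; the result is a Nash Equilibrium."""
    choices = list(choices)
    improved = True
    while improved:
        improved = False
        for i in range(len(choices)):
            loads = Counter(choices)
            current = cost(choices[i], loads[choices[i]])
            for r in resources:
                if r != choices[i] and cost(r, loads[r] + 1) < current:
                    choices[i] = r   # improving move lowers the potential
                    improved = True
                    break
    return choices


def linear_cost(resource: str, load: int) -> float:
    """Hypothetical cost: each user pays the number of users sharing."""
    return float(load)


start = ["A", "A", "A", "A"]
final = better_response_dynamics(start, ["A", "B"], linear_cost)
print(final, rosenthal_potential(start, linear_cost),
      rosenthal_potential(final, linear_cost))
```

The four players spread evenly over the two resources and the potential drops, which is the mechanism that lets learning dynamics in potential games settle at an equilibrium.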
Although the proposed approach is able to minimize $J^i$ by reaching the PCE, it does not guarantee the global optimum, as the NE is an equilibrium among users in which every user $i$ is close to its local optimum with limitations. The global optimum could be achieved via peer-to-peer communication, with sophisticated algorithms and configuration, according to the Fundamental Theorems of Welfare Economics [56]. Such communication has the potential to promote Pareto improvement on the achieved NE and eventually reach a statistically steady point on the Pareto Frontier. This will be the focus of future research, as seeking the Pareto Frontier is one of the classic ways to solve multi-objective optimization problems.

Figure 1. The topology of the test network. Cable lengths are drawn to scale.

Figure 2. General voltage (a) and discord rate (b) from day 1 to day 4.

Figure 5. The evolution of $[\lambda^i_m]$ in four days, $i = 36$.

Figure 6. The evolution of gross discord rate and nodal voltage deviations.

Table 2. Operating ranges of users (negative values mean production, positive values consumption of electricity).

Table 3. Performance statistics of individual users.