Management of Voltage Flexibility from Inverter-Based Distributed Generation Using Multi-Agent Reinforcement Learning

Abstract: The growing use of converter-interfaced generators (CIGs) in today's electrical grids will require these generators both to supply power and to participate in voltage control and the provision of grid stability. At the same time, new possibilities of secondary QU droop control in power grids with a large proportion of CIGs (PV panels, wind generators, micro-turbines, fuel cells, and others) open new ways for DSOs to increase energy flexibility and maximize hosting capacity. This study extends the existing secondary QU droop control models to enhance the efficiency of CIG integration into electrical networks. The paper presents an approach to decentralized secondary voltage control through converters based on a multi-agent reinforcement learning (MARL) algorithm. A procedure is also proposed for analyzing hosting capacity and voltage flexibility in a power grid in terms of secondary voltage control. The effectiveness of the proposed static MARL control is demonstrated on a modified IEEE 34-bus test feeder containing CIGs. Experiments have shown that the decentralized approach is effective in stabilizing nodal voltage and preventing overcurrent in lines under various heavy load conditions, which are often caused by active power injections from the CIGs themselves and by power exchange processes within the TSO/DSO market interaction. In the context of TSO/DSO market interaction, such operating conditions can result from the power exchange between systems of different voltage levels while flexibility services are rendered. Findings indicate that the proposed approach to secondary QU control increases the voltage flexibility of the network at the DSO level and maximizes the hosting capacity.


Introduction
Maintaining busbar voltage within specified levels throughout the entire power system is essential for stability and power quality. Voltage stability is largely related to the reactive power capabilities of generating units and reactive power compensators and their location. Demand behavior and the presence of distributed energy resources (DERs) have a considerable impact on voltage. In this case, an increase in the number of DERs creates completely new power flows, alters the voltage profiles in the distribution system, and can diminish power quality. In this context, there is an elevated need for flexible solutions to maintain the required voltage level at busbars. Ancillary services from distributed inverter-based generation and energy storage systems can be instrumental solutions for voltage stabilization in systems with DERs [1].
It is worth noting that, for example, European draft network codes prohibit or restrict the DSO's ability to export reactive power to transmission networks. There are also physical operating limits in terms of voltage to be met [1]. As the number of distributed energy resources connected to distribution networks increases, active power injections lead to changes in voltage profile. However, with effective coordination, DSOs can employ converter-interfaced generation (CIG) to monitor voltage and manage power losses. Moreover, DSOs, by using flexibility services, could better control voltage profiles in areas with a large number of renewable energy sources (RESs).

The growing share of distributed generation (DG) in distribution networks gives rise to the following problems:
• overvoltage at nodes where DG "delivers" a significant amount of active power to the network;
• overload of distribution transformers and lines;
• other voltage problems such as imbalance and power quality problems (flicker, voltage waveform quality);
• mis-operation of relay protection systems due to bidirectional power flows.
In light of CIG integration, the development of microgrids, which can be considered localized distribution sub-networks, is a particular issue, as they perform independent control, including after disconnection from the main network at the point of common coupling (PCC) (Figure 1). At the same time, urban distribution networks can include several microgrids. In residential feeders, where the daytime load is minimal but solar generation is significant, solar generation can cause a reverse power flow into the network. Because of the relatively high resistance of the distribution network, such a reverse power flow can cause overvoltage, which limits the use of existing renewable energy sources and the integration of new photovoltaic plants. All these factors determine the problem of maximizing hosting capacity. In this regard, the primary purpose of calculating PV hosting capacity is to inform energy suppliers about the limits on integrating PV modules into their feeder without the need to upgrade the network [3]. Consequently, given the recent advances in voltage control techniques and the more extensive deployment of CIGs, above all PV modules, it is necessary to include voltage control in the analysis of the placement of such modules.

Related Work
Classical control methods in such cases have limitations; for example, reactive power compensation can lead to overloads. Partly for this reason, the existing methods for placing PV modules do not factor in voltage control devices when calculating PV hosting capacity [4,5]. A further reason is that legacy voltage regulation devices, such as load tap changers (LTCs) and capacitor banks, are not fast enough, which is why transient overvoltage occurs. By contrast, modern intelligent inverters can support the reactive power of the feeder while operating in several control modes. They are faster than traditional controllers and are, therefore, potential candidates for eliminating the voltage quality problems that arise from the variability of DERs.
According to [6–10], providing optimal operating conditions and further improving CIG performance (for example, maximizing hosting capacity in distribution networks and reducing network losses) require enhanced state estimation and coordinated operation of various control means. At the same time, even a low level of communication between CIGs allows better control settings and increased performance [11]. In the future, in addition to P- and Q-control-based flexibility services, CIGs may also provide other local power quality improvement services for DSOs.
One of the promising solutions to the above problems is secondary voltage control through the coordination of available QU-droop-based regulators. A new approach to integrating CIGs into distribution grids and microgrids suggests harmonized voltage control using the inverters through which solar and/or wind generation is connected to the grid and which are located at the end users. In the context of considerable DER-based generation and a plunge in consumption from the grid, voltage stabilization and loss reduction are provided by remotely controlled inverters standing "behind the meter". Various flexibility services related to active power P and reactive power Q from CIG units can be provided by different modes of primary control of the inverter through which different types of DERs are connected [12]. More specifically (Figure 2), the primary controller of each DG i, i = 1, …, N, receives the reference voltage from the secondary controller and regulates the output voltage to the required setting, which is usually achieved using reactive droop control (Q/V strategy) methods without data exchange between CIGs [13,14].
Existing methods of secondary control can be divided into two main classes: centralized and distributed. The centralized controller collects information from all CIGs and makes a decision on collective management of the electrical network operation, which is then sent to the appropriate CIGs. A vivid practical example of such management is that of the California Independent System Operator (CAISO), which is considered to be a bold experiment in energy flexibility.
Based on the 131 MW Tule wind farm in San Diego, CAISO and Avangrid Renewables have proved that powerful wind turbines connected to the grid through inverters are sources of energy flexibility and can successfully provide services to control frequency and active power flows, and maintain voltage [15]. In the proposed solution, a centralized PCC controller is responsible for the function of critical control of all inverters in a wind farm, and it continuously monitors the state of inverters and controls them to ensure that they produce the active and reactive power needed to provide the desired voltage curve on the high side of the transformer.
Although centralized control methods for CIG-based systems show promising results, they are constrained by the bandwidth limitations of communication lines and often suffer from a single point of failure as well as the "curse of dimensionality", which makes them impractical to deploy in today's large power systems [16]. Alternatively, one can employ distributed methods, where each CIG interacts with neighboring CIGs and decides on its control action based on its own state and the states of its neighbors shared through local communication networks.
At the same time, the basic principle of such approaches is the exchange of information through neighboring communication using a distributed protocol and reaching a consensus, for example, an average value of the measured voltages. In contrast to frequency, voltages are local variables, which means that they can be restored either on selected critical buses or at a system level [16]. In the latter case, the distributed methods can be used to generate a common signal, which is compared with a reference one and passes through a local PI controller that generates an appropriate control signal to be sent to the primary level to eliminate associated steady-state errors. Traditional distributed secondary controllers were based on the principle of normal averaging [17,18]. These papers defined the interaction between CIGs as a key component in achieving the control aims while avoiding a centralized architecture. The published works also present several distributed control methods, of which the algorithms of gossip [19] and consensus [20] have recently drawn considerable attention, mainly due to their robustness for distributed information exchange over networks. Given the specific features of the distributed nature of control, decentralized approaches often use a multi-agent systems (MAS) framework [21].
Thus, traditional principles of distribution network operation and control limit the CIG's capabilities to provide system-wide ancillary services in certain situations. Overcoming these limitations requires new principles of active and adaptive control [2]. Therefore, with recent advances in voltage control methods and large-scale deployment of DERs with intelligent inverters, it is necessary to include voltage control in the analysis of CIG placement. With this approach, it is possible to simultaneously achieve better hosting capacity of the distribution network and the flexibility of CIG services, even in very low-load situations.

Paper Contribution
The aim of this paper is to extend the existing multi-agent systems (MAS) models of decentralized inverter-based secondary voltage control in order to mitigate CIG-associated integration problems (overvoltages, limited voltage flexibility, and limited hosting capacity) in active distribution networks and microgrids. The paper proposes a new approach to decentralized inverter-based secondary voltage control based on a multi-agent deep reinforcement learning (MARL) algorithm. The proposed approach can help better maintain voltage, maximize hosting capacity in distribution networks, and improve the availability of distribution network-connected DERs for TSO flexibility services. We adopt the centralized training and decentralized execution scheme, where each agent has its own actor and critic networks and the policies are updated independently, rather than through a consensus algorithm, which may reduce the convergence speed.
The remainder of the paper is organized as follows. Section 2 describes the proposed MARL-based distributed voltage droop control framework as well as the approach to estimating hosting capacity and voltage flexibility. Section 3 presents a case study based on a modified MV IEEE 34-bus test feeder to demonstrate the main features of the proposed approach.

Voltage Droop Control for Inverters
Inspired by the droop control used for synchronous generators, researchers have proposed a similar control scheme for inverters [21–24]. The primary motivation is that droop control implements decentralized proportional control and therefore represents a plug-and-play-like scheme that is modular and simple to implement, in the sense that no centrally coordinated network control is needed. In large high-voltage transmission systems, droop control is usually used only to obtain the desired active power distribution, while the voltage amplitude on the generator bus is regulated to the nominal voltage setpoint (usually in the range of 0.95 to 1.05 p.u.) using a power system stabilizer. However, unlike in high-voltage transmission systems (hundreds to thousands of kilometers), the lines in microgrids are usually relatively short (a few tens of kilometers), which is why droop control is employed there to control voltage and achieve the desired reactive power distribution.
The rationale for using voltage droop controllers is as follows [25]. For small angular deviations $\delta_{ik}$, it follows that $\sin\delta_{ik} \approx \delta_{ik}$ and $\cos\delta_{ik} \approx 1$. Consequently, reactive power in predominantly inductive networks, i.e., where $G_{ik} \approx 0$, is most affected by voltage changes. Therefore, the amplitude of the inverter voltage $V_i$ varies with the deviation of reactive power from its desired value according to:
$$V_i = V_i^d - k_{Q_i}\left(Q_i^m - Q_i^d\right),$$
where $V_i^d$ is the desired voltage amplitude, $k_{Q_i}$ is the voltage droop gain, $Q_i^m : \mathbb{R}_{\geq 0} \to \mathbb{R}$ is the measured reactive power, and $Q_i^d \in \mathbb{R}$ is its desired setpoint. For predominantly inductive networks and small angular deviations (for instance, microgrids in islanded mode with sudden switching of reactive load), the reactive power flow of the i-th node reduces to $Q_i : \mathbb{R}^n_{\geq 0} \to \mathbb{R}$:
$$Q_i = |B_{ii}|\, V_i^2 - \sum_{k \sim N_i} |B_{ik}|\, V_i V_k .$$
In this case, reactive power $Q_i$ can be controlled through the voltage amplitudes $V_i$ and $V_k$, $k \sim N_i$.
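The droop mechanism can be illustrated numerically. The following is a minimal sketch, assuming the common linear Q-V droop form (voltage setpoint reduced in proportion to the reactive power deviation) and a lossless network without shunt elements under the small-angle approximation. The function names, the droop gain, and the susceptance value are illustrative assumptions, not values from the paper.

```python
def droop_voltage(v_desired, k_q, q_measured, q_desired):
    """Voltage amplitude set by linear Q-V droop (all values in p.u.)."""
    return v_desired - k_q * (q_measured - q_desired)


def reactive_power(i, v, b):
    """Reactive power injected at node i (small angles, G ~ 0, no shunts).

    v: list of voltage amplitudes in p.u.
    b: dict {(i, k): |B_ik|} of line susceptance magnitudes,
       stored once per undirected line.
    """
    total = 0.0
    for (a, c), b_mag in b.items():
        if i in (a, c):
            k = c if a == i else a
            # Contribution of line i-k: |B_ik| * V_i * (V_i - V_k)
            total += b_mag * v[i] * (v[i] - v[k])
    return total


# Hypothetical two-node example: node 0 sits at a higher voltage than node 1.
voltages = [1.02, 1.00]
lines = {(0, 1): 10.0}
q0 = reactive_power(0, voltages, lines)
v0_new = droop_voltage(v_desired=1.0, k_q=0.05, q_measured=q0, q_desired=0.0)
```

In this toy case the node with the higher voltage exports reactive power, and the droop law lowers its voltage setpoint in response, which is the coupling the paragraph above relies on.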

Multi-Agent Reinforcement Learning (MARL)-Based Distributed Voltage Control for Inverters
Reinforcement learning is a machine learning method in which the system under test (the agent) learns by interacting with an environment. Reinforcement signals are the response of the environment to the decisions made. The environment is usually formulated as a Markov decision process (MDP) with a finite set of states. Formally, the simplest reinforcement learning model consists of a set of environment states $S$, a set of actions $A$, and a set of scalar rewards ("gains"). At any time instant $t$, the agent is characterized by a state $s_t \in S$ and a set of potential actions $A(s_t)$; after choosing an action $a \in A(s_t)$, it transitions to state $s_{t+1}$ and gains a reward $r_t$. Based on this interaction with the environment, the reinforcement learning agent must develop a strategy (policy) $\pi : S \times A \to [0, 1]$, where $\pi(s, a)$ is the probability of choosing action $a \in A(s)$ in state $s$, that maximizes the value $R = r_0 + r_1 + \cdots + r_n$ in the Markov decision process [26]. MARL is an extension of the single-agent model and refers to multi-agent/player systems. In recent years, several MARL-based approaches have been proposed for autonomous voltage control in microgrids [27–30]. Agents can find optimal policies by interacting with the environment and can learn offline to cooperate with other agents by simulating their policies. After training is complete, agents can make real-time decisions that adapt well to unknown power grid or microgrid dynamics. This provides a strong motivation for developing MARL-based voltage control applications for isolated microgrids and energy communities with RESs and power flexibility services. Based on the analysis of these works, in this paper we have developed a MARL-based model-free approach to decentralized inverter-based secondary voltage control in order to manage flexibility services and increase hosting capacity. Model-free algorithms are those that do not use an explicit model of the environment (its transition probabilities) associated with the Markov decision process. In fact, this type of reinforcement learning method can be thought of as a trial-and-error algorithm.
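To make the notions of a stochastic policy and a cumulative reward concrete, here is a minimal sketch. The state and action names are hypothetical, and the optional discount factor `gamma` is an addition for illustration (with `gamma = 1` the function reduces to a plain sum of rewards).

```python
import random


def sample_action(policy, state, rng):
    """Draw an action from pi(s, .), given as {action: probability}."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def episode_return(rewards, gamma=1.0):
    """Cumulative reward r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# Hypothetical policy with two voltage states and two actions.
policy = {
    "low_voltage": {"raise_setpoint": 0.9, "hold": 0.1},
    "normal": {"raise_setpoint": 0.2, "hold": 0.8},
}
rng = random.Random(0)
action = sample_action(policy, "low_voltage", rng)
total_reward = episode_return([0.05, 0.04, 0.05])
```

A model-free learner would adjust the probabilities in `policy` from observed returns alone, without ever consulting transition probabilities.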
Multi-agent networks can be represented as graphs in which vertices represent physical or virtual items (agents) and edges represent the interactions between them. Specifically, we model the electrical network with CIGs as a multi-agent network $G = (V, E)$, where each agent $i \in V$ interacts with its neighbors $N_i := \{ j \mid \varepsilon_{ij} \in E \}$. We can then consider $S$ and $A$ as the global state and action spaces, which represent the aggregated sets of states and controls of all CIGs, respectively. The underlying dynamics of the microgrid can be described by the state transition probability $P : S \times A \times S \to [0, 1]$. We consider a decentralized MARL framework to achieve scalable inverter-based secondary voltage control. Each CIG communicates only with its neighbors and makes control decisions based on these observations. Since each agent i (CIG i) observes only part of the environment (its own state and the states of its neighbors), we have a partially observable Markov decision process (POMDP) [31].
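The graph view of the agent network can be sketched in a few lines. The example below is only an illustration, assuming an undirected communication graph given as an edge list; the four-agent chain and the scalar "state" per agent are hypothetical placeholders for the richer CIG state.

```python
def neighbors(edges, i):
    """Return the sorted neighbor set N_i of agent i in an undirected graph."""
    out = set()
    for a, b in edges:
        if a == i:
            out.add(b)
        elif b == i:
            out.add(a)
    return sorted(out)


def local_observation(states, edges, i):
    """Own state plus the states of direct neighbors, keyed by agent id."""
    return {j: states[j] for j in [i] + neighbors(edges, i)}


# Hypothetical chain of four CIGs: 0 - 1 - 2 - 3
edges = [(0, 1), (1, 2), (2, 3)]
states = {0: 1.01, 1: 0.99, 2: 0.97, 3: 0.96}  # e.g. bus voltages in p.u.
obs_1 = local_observation(states, edges, 1)
```

Agent 1 sees agents 0 and 2 but not agent 3, which is what makes the decision problem partially observable.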
We solve the above problem with MARL and define the key elements of the POMDP in question as follows:
• Action space: the control action for each CIG is the secondary voltage control setpoint $\upsilon_{ni}$. By analogy with [30], we use 10 discrete actions evenly distributed between 1.00 and 1.14 p.u. The overall action of a microgrid or active distribution network is the joint action of all CIGs, i.e., $a = \upsilon_{n1} \times \upsilon_{n2} \times \cdots \times \upsilon_{nN}$.
• State space: the state of each CIG i is chosen as $s_{i,t} = (\delta_i, P_i, Q_i, i_{odi}, i_{oqi}, i_{bdi}, i_{bqi}, \upsilon_{bdi}, \upsilon_{bqi})$ to characterize the operating parameters of the CIG, where $\delta_i$ is the measured reference angle (phase); $P_i$ and $Q_i$ are active and reactive power, respectively; $i_{odi}, i_{oqi}$ and $i_{bdi}, i_{bqi}$ [A] are the d-q components of the output currents of CIG i and of the directly connected busbar, respectively; and $\upsilon_{bdi}, \upsilon_{bqi}$ [kV] are the d-q components of the voltage of the connected busbar.
• Space of observations: it is assumed that each CIG can observe only its local state and the messages from its neighbors, i.e., $o_{i,t} = s_{i,t} \cup m_{i,t}$, where $m_{i,t}$ is the communication message received from neighboring agents $j \in N_i$, which is considered in more detail below.
• Transition probabilities: the transition probability $T(s' \mid s, a)$ characterizes the dynamics of the electrical network with CIGs. We follow the models from [32] to build a platform for simulating the operating conditions of a microgrid or active distribution network without using any prior knowledge of the transition probability, since the MARL used is model-free.
• Reward function: we apply a reward function, with $r_{i,t}$ denoting the reward of agent i at time step t, that drives the generators to converge quickly to the reference voltage (for example, 1 p.u.). We split the voltage range into three working areas, similarly to [30]; these include the area of normal operating conditions ([0.95, 1.05] p.u.). With the reward formulated in this way, CIGs with "emergency" voltages receive a high penalty, while CIGs with voltages close to 1 p.u. receive a positive reward.
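The action grid and a reward of this three-zone shape can be sketched as follows. Only the number of actions, the 1.00 to 1.14 p.u. range, and the normal band [0.95, 1.05] p.u. come from the text; the outer zone boundaries (`hard_lo`, `hard_hi`), the bonus, and the penalty value are illustrative assumptions and should not be read as the reward actually used in the study.

```python
def action_grid(v_min=1.00, v_max=1.14, n=10):
    """n evenly spaced secondary voltage setpoints in p.u."""
    step = (v_max - v_min) / (n - 1)
    return [round(v_min + k * step, 6) for k in range(n)]


def zone_reward(v, bonus=0.05, hard_lo=0.8, hard_hi=1.25, penalty=-10.0):
    """Three-zone reward; only the [0.95, 1.05] band comes from the text."""
    deviation = abs(1.0 - v)
    if 0.95 <= v <= 1.05:
        return bonus - deviation   # normal zone: small positive reward
    if hard_lo <= v <= hard_hi:
        return -deviation          # intermediate zone: mild penalty (assumed)
    return penalty                 # "emergency" zone: high penalty (assumed)


setpoints = action_grid()
rewards = [zone_reward(v) for v in (1.00, 1.04, 1.10, 1.30)]
```

The reward decreases monotonically with the distance from 1 p.u. inside each zone and drops sharply in the outermost zone, so an agent is pushed first out of the emergency region and then toward the reference voltage.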
The proposed voltage control is distributed and requires communication among the CIGs in the network. We consider a decentralized MARL structure in which each agent (CIG) can communicate with its neighbors and exchange necessary information, for example, states. Information from neighboring agents is used to enhance the efficiency of training. Thus, based on the structure proposed in [30], agent i updates its hidden state $h_{i,t}$ at each step t:
$$h_{i,t} = f_i\left(h_{i,t-1},\; q_o\left(e_s(o_{i,t})\right),\; q_h\left(h_{N_i,t-1}\right)\right), \quad (5)$$
where $h_{i,t-1}$ is the hidden state from the previous time step; $o_{i,t}$ is the observation of agent i made at time t, i.e., its internal state and the states of its neighbors; $h_{N_i,t-1}$ is the aggregated hidden state of the neighbors; $e_s$, $q_o$, and $q_h$ are differentiable message encoding and extraction functions implemented as single fully connected neural network layers with 64 neurons; and $f_i$ is the function that encodes hidden states and communication information, for which we use an LSTM network. In this article, the deep neural structure was chosen based on the results of [31], where the authors introduced the deep recurrent Q-network (DRQN), a combination of an LSTM and a deep Q-network. This approach has shown better results in solving POMDPs than comparable (non-LSTM) neural networks. Instead of low-dimensional indicators, as in [33], we include the neighbors' complete states in the local observation, $o_{i,t} = s_{i,t} \cup s_{N_i,t}$, to improve the observability of the agent, and we let the network learn the corresponding representation automatically. In this case, the communication message $m_{i,t}$ received by the i-th agent is a combination of the internal states and hidden states of its neighbors.
The hidden state $h_{i,t}$ obtained from (5) is then used in the actor and critic networks to sample actions and predict value functions, respectively, i.e., $\pi_{\theta_i}(\cdot \mid h_{i,t})$ and $V_{\omega_i}(h_{i,t})$ (Figure 3). We use a centralized training scheme with decentralized execution [34,35], where each agent has its own actor-critic networks and the policies are updated independently rather than through consensus [36], which can reduce the convergence rate of the solution. Cooperative MARL aims to maximize the total global reward $R_{g,t} = \sum_{i \in V} R_{i,t}$, where $R_{i,t} = \sum_{k=0}^{T} \gamma^k r_{i,t+k}$ denotes the cumulative reward of agent i. Such a mathematical formulation, however, is associated with typical problems of multi-agent training [26]: bandwidth limitations, a possible decrease in training efficiency, restrictions on the number of agents, and slow convergence of the global solution. To solve these problems, a spatial discounting factor was proposed in [30], with which each agent i uses the following reward:
$$\tilde{r}_{i,t} = \sum_{j \in V} \alpha(d_{i,j})\, r_{j,t},$$
where $\alpha(d_{i,j}) \in [0, 1]$ is a spatial discounting function and $d_{i,j}$ is the distance between agents i and j. The distance can be the Euclidean distance, which characterizes the physical distance between two agents (generators), or the distance between two vertices on the graph (i.e., the number of edges on the shortest connecting path).
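Spatial discounting with a graph distance can be sketched as follows. This is an illustration under stated assumptions: the discounting function is taken to be geometric, alpha raised to the hop distance, which is one common choice but is not confirmed by the text; the value `alpha = 0.5`, the three-agent chain, and the reward values are hypothetical.

```python
from collections import deque


def hop_distances(edges, source, n_agents):
    """Breadth-first search: edges on the shortest path from source to each agent."""
    adj = {i: [] for i in range(n_agents)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def spatially_discounted_reward(rewards, edges, i, alpha=0.5):
    """Sum of all reachable agents' rewards weighted by alpha ** (hops to i)."""
    dist = hop_distances(edges, i, len(rewards))
    return sum(alpha ** d * rewards[j] for j, d in dist.items())


# Hypothetical chain 0 - 1 - 2 with per-agent rewards at one time step.
edges = [(0, 1), (1, 2)]
rewards = [0.04, -0.10, 0.02]
r0 = spatially_discounted_reward(rewards, edges, 0)
```

With `alpha = 1` every agent optimizes the global sum, and with `alpha = 0` each agent sees only its own reward; intermediate values trade cooperation against the scaling problems listed above.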

Estimating Photovoltaic (PV) Hosting Capacity and Voltage Flexibility
MV and LV networks have limited hosting capacity, which depends on load conditions, the capacity of components, and network topology. Exceeding this limit manifests itself through overvoltage or undervoltage (voltage limitation), or through line or transformer overload (current limitation). In networks with voltage-limited hosting capacity, intelligent inverters can provide additional network flexibility and increase hosting capacity [37].
Although hosting capacity now generally refers to various types of CIGs, in this paper, we focus on the classical PV hosting capacity analysis to assess the performance of a decentralized MARL-based inverter control method. The basic idea of PV hosting capacity calculation, in this case, is to increase the number of PV plants in the distribution network or microgrids until any scheduling principle or limitation is violated. We assess the feeder's PV hosting capacity, which means the largest capacity of a PV plant that can be placed without violating operational restrictions. In this case, we focus on overvoltage and overload in lines and MV-LV transformers.
To factor in the stochastic nature of PV module placement, we use Monte Carlo simulation to generate a large number of different future PV installation scenarios [4]. We modify the algorithm proposed in [37] by simulating k scenarios of placing PV modules, each of which represents one Monte Carlo run for the investigated distribution network or microgrid. The PV hosting capacity, H, is then the largest PV penetration that causes no limit violation, and it is defined in terms of the following quantities: $S = \{1, 2, \ldots, i, \ldots, 100\}$ is the set of discrete PV customer penetration levels, indexed by i; $PV_{pen} = \{PV^1_{pen}, PV^2_{pen}, \ldots, PV^i_{pen}, \ldots, PV^{100}_{pen}\}$ is the set of all PV penetration levels, indexed by customer penetration level i; and $V^i_{max,k}$, $I^i_{Tmax,k}$, $I^i_{Lmax,k}$ are the maximum primary voltage, transformer loading, and line loading recorded at level i for the k-th PV deployment scenario.
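The Monte Carlo procedure can be sketched as a loop over random deployment scenarios, each of which steps through the penetration levels until the first limit violation. The sketch below replaces the power flow calculation with a toy linear model; the sensitivity ranges, the 0.2 MW step, and the limits of 1.05 p.u. and 100% loading are illustrative assumptions, not parameters of the study.

```python
import random


def scenario_hosting_capacity(levels, v_sens, load_sens, v_max=1.05, i_max=100.0):
    """Largest penetration level (MW) reached before the first limit violation."""
    hosted = 0.0
    for pv in levels:
        voltage = 1.0 + v_sens * pv   # toy linear voltage rise, p.u.
        loading = load_sens * pv      # toy transformer loading, %
        if voltage > v_max or loading > i_max:
            break
        hosted = pv
    return hosted


def monte_carlo_hosting(n_scenarios=50, seed=0):
    """Sorted hosting capacities, one per random PV placement scenario."""
    rng = random.Random(seed)
    levels = [0.2 * s for s in range(1, 101)]  # 100 penetration levels, MW
    results = []
    for _ in range(n_scenarios):
        v_sens = rng.uniform(0.002, 0.004)     # hypothetical p.u. per MW
        load_sens = rng.uniform(5.5, 7.5)      # hypothetical % per MW
        results.append(scenario_hosting_capacity(levels, v_sens, load_sens))
    return sorted(results)


results = monte_carlo_hosting()
median = results[len(results) // 2]
```

In a real study the two toy lines inside `scenario_hosting_capacity` would be replaced by a power flow solution for the scenario, and the sorted results would feed a box plot of the hosting capacity distribution.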
Based on the above, to relate the concepts of flexibility and hosting capacity for active distribution networks, we numerically estimate the voltage flexibility by comparing two hosting capacity values: $H_{base}$, the PV hosting capacity calculated for the base case without voltage regulation, and $H_{cont}$, the PV hosting capacity calculated for the case in which voltage control is available. It is worth noting that $H_{cont}$ implies calculating the PV hosting capacity with the control instruments available in the distribution network or microgrid, including QU-droop control, regulation of the transformer tap, control of compensating devices, and others.

Results
We applied the proposed MARL-based decentralized voltage control approach to a modified MV IEEE 34-bus test feeder with six CIGs. The original system is an actual feeder located in Arizona with a nominal voltage of 24.9 kV. It is characterized by long and lightly loaded lines, two in-line regulators, an in-line transformer for a short 4.16 kV section, unbalanced loading, and shunt capacitors. The system was designed to evaluate and benchmark algorithms for solving unbalanced three-phase radial systems and thus represents a reduced-order model of an actual distribution circuit. In our modification, the network includes six CIGs (PV systems, 20 kW each) and a somewhat simplified topology sufficient to demonstrate the proposed secondary QU control approach (Figure 4). The main parameters of CIGs, lines, and loads are summarized in Table 1.

MARL-Based QU Droop Control
The MARL approach is implemented in Python using open-source tools for power system modeling (pandapower and PowerNet). The simulation platform is based on the technical characteristics of lines and loads described in [35,38]. Heavy load conditions were simulated by adding random load changes throughout the network, with deviations of ±20% from the nominal values and random disturbances in the range of ±5% for each load. All CIGs in the considered schemes were monitored with a sampling time of 0.05 s, and each CIG could communicate with its neighbors over local communication links. The primary control of the lower level is implemented by analogy with [32].
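One way to generate such load conditions is sketched below. The interpretation is an assumption: the ±20% deviation is treated as a single network-wide multiplicative factor and the ±5% disturbance as an independent factor per load, both drawn uniformly; the function name and the nominal load values are hypothetical.

```python
import random


def perturb_loads(nominal, rng, network_dev=0.20, local_dev=0.05):
    """Scale nominal loads by a network-wide factor and a per-load factor."""
    common = 1.0 + rng.uniform(-network_dev, network_dev)
    return [
        p * common * (1.0 + rng.uniform(-local_dev, local_dev))
        for p in nominal
    ]


rng = random.Random(1)
nominal_kw = [100.0, 50.0, 75.0]  # hypothetical nominal loads, kW
loads_kw = perturb_loads(nominal_kw, rng)
```

Seeding the generator per episode, as in `random.Random(1)`, makes it possible to replay the same load disturbance for every algorithm being compared.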
We compare the used MARL approach with several state-of-the-art benchmark MARL algorithms: IA2L [39] and CommNet [40], to demonstrate its effectiveness. We train each model over 10,000 episodes, with γ = 0.99, minibatch size N = 20, actor learning rate η θ = 5 × 10 −4 , and critic learning rate η ω = 2.5 × 10 −4 . To ensure fair comparison, each episode generates different random seeds and in each episode the same random seed is shared across different algorithms to guarantee the same training/testing environment. We control the agents every (a simulation time) ∆T = 0.05 s and one episode lasts for T = 20 steps. Figure 5a shows the training curve of the MARL algorithm for the modified IEEE 34-bus feeder. It is clear that the used MARL outperforms these state-of-the-art MARL algorithms in terms of convergence speed. After 5000 training episodes, the obtained strategy was assessed 20 times for various load disruptions with the same random seed for each agent in each episode. The results of this testing are presented in Table 2 and Figure 5b that show the voltage profiles for nodes with inverter generators for simulation of one of the heavy load conditions of the system (load increase by 25%). As noted above, the secondary QU control aims to bring all DGs voltages to a reference value of 1 p.u. As seen in Figure 5b, in the case of a voltage drop, MARL-control in 0.4 s after the disturbance starts restores voltage to its nominal values. Additionally, the effects of decentralized voltage control are demonstrated by representing the results of operating parameters calculation on the IEEE 34-bus feeder graph for various experimental cases ( Figure 6). Comparison of Figure 6b,c indicates that the secondary QU control not only stabilizes voltage at nodes but also reduces overcurrent in system lines. For example, under heavy load conditions, the overloads in Line 0 and Line 1 are 122.68% and 44.36%, respectively (Figure 6b). 
Secondary QU control leads to a decrease in the current overload in these lines to 91.08% and 32.83% (Figure 6c), respectively.
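The training setup and the seed-sharing protocol used for the comparison can be summarized in a short sketch. The hyperparameter values are those stated above; the class, field, and function names are hypothetical, and `disturbance_trace` is only a stand-in for the randomness of the actual simulation environment.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    episodes: int = 10_000
    gamma: float = 0.99        # discount factor
    minibatch: int = 20
    lr_actor: float = 5e-4
    lr_critic: float = 2.5e-4
    dt: float = 0.05           # control interval, s
    steps: int = 20            # steps per episode

def episode_seeds(n_episodes, master_seed=0):
    """Draw one distinct seed per episode, to be reused by every algorithm."""
    rng = random.Random(master_seed)
    return rng.sample(range(2**31), n_episodes)

def disturbance_trace(seed, steps):
    """Hypothetical stand-in for the random load disturbance of one episode."""
    rng = random.Random(seed)
    return [rng.uniform(-0.05, 0.05) for _ in range(steps)]

cfg = TrainConfig()
seeds = episode_seeds(cfg.episodes)
algorithms = ["MARL", "IA2L", "CommNet"]

# Within one episode, every algorithm faces the same disturbance sequence.
traces_ep0 = {name: disturbance_trace(seeds[0], cfg.steps) for name in algorithms}

# One episode covers steps * dt seconds of simulated time.
episode_duration = cfg.steps * cfg.dt
print(f"episode duration: {episode_duration:.2f} s")
```

With these settings one episode spans 1 s of simulated time, which covers the 0.4 s voltage recovery reported for Figure 5b.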

Voltage Flexibility and Hosting Capacity Analysis
The PV hosting capacity, defined according to (7), was analyzed with the Monte Carlo method, which was used for probabilistic modeling of a large number of scenarios for the installation of future PV plants in the IEEE 34-bus feeder. Each scenario consisted of PV systems of a certain capacity connected to specific nodes of the system. The resulting statistical distribution of the maximum installed PV capacity gives the additional hosting capacity of the network. The effect of the secondary QU control on hosting capacity was assessed in two experiments: with and without QU control of inverters. An approximate distribution of the maximum number of installed PV plants for these two experiments is shown in Figure 7. The results were obtained by simulating 50 different possible scenarios for the future installation of PV modules.
The results for the scenario without voltage control show that the hosting capacity of the IEEE 34-bus feeder ranges from 12.8 MW to 17.6 MW overall (Figure 7a). For 50% of the runs it lies between 14.1 and 15.3 MW (the median equals 4.8 MW of additional PV capacity). The results also show that the potential problems caused by connecting additional PV plants arise from transformer overloading (in 86% of cases) and violation of the voltage band (in 14% of cases). In the scenario with voltage droop control, the hosting capacity ranges from 13.3 MW to 17.9 MW overall, and the median increases to about 15.3 MW (Figure 7b). As a result, the figure shows that with intelligent MARL control of inverters, the hosting capacity of the considered electrical network increases from H = 14.6 MW to H = 15.7 MW. The box plot of the resulting distribution helps to understand the behavior of the network as more PV systems are installed. It shows the minimum, maximum, and average number of additionally installed PV systems at which the first violations are expected. With intelligent control of inverters, the potential violations (when the maximum hosting capacity is exceeded) are reduced to transformer overloads only (which we do not control), while voltage violations are eliminated (Figure 7b).
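The Monte Carlo procedure can be illustrated with a minimal sketch. It does not reproduce the IEEE 34-bus study: the power flow is replaced by a hypothetical linear surrogate, and the sensitivities `SENS`, the limits, the plant sizes, and the `v_relief` factor standing in for QU control are all assumptions chosen for illustration.

```python
import random
import statistics

# Hypothetical surrogate feeder: voltage rise per MW of PV at each candidate
# node (p.u./MW, growing with distance from the substation) and two limits.
SENS = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006]
TRAFO_LIMIT_MW = 16.0
DV_MAX = 0.05

def first_violation(seed, v_relief):
    """Add PV plants at random nodes until a limit is hit.

    Returns (installed MW before the violating plant, cause of violation).
    v_relief < 1 scales down the voltage rise, a crude stand-in for QU control.
    """
    rng = random.Random(seed)
    installed = [0.0] * len(SENS)
    total = 0.0
    while True:
        prev_total = total
        node = rng.randrange(len(SENS))
        size = rng.uniform(0.05, 0.5)            # plant size, MW
        installed[node] += size
        total += size
        dv = v_relief * sum(s * p for s, p in zip(SENS, installed))
        if total > TRAFO_LIMIT_MW:
            return prev_total, "transformer"
        if dv > DV_MAX:
            return prev_total, "voltage"

def monte_carlo(n_scenarios, v_relief):
    # Scenario k uses seed k in both experiments (common random numbers).
    return [first_violation(k, v_relief) for k in range(n_scenarios)]

def summarize(results):
    caps = [cap for cap, _ in results]
    q1, median, q3 = statistics.quantiles(caps, n=4)
    voltage_share = sum(cause == "voltage" for _, cause in results) / len(results)
    return {"min": min(caps), "q1": q1, "median": median, "q3": q3,
            "max": max(caps), "voltage_share": voltage_share}

base = monte_carlo(50, v_relief=1.0)   # without QU control
cont = monte_carlo(50, v_relief=0.8)   # with QU control (assumed 20% relief)
print(summarize(base))
print(summarize(cont))
```

Because both experiments replay the same placement sequence for each scenario, the difference between the two summaries reflects the control effect alone, and the quartiles correspond to the box-plot statistics of Figure 7.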

Expression (8) was used to evaluate the increase in voltage flexibility obtained with MARL control of inverters. For the considered series of experiments with the modified IEEE 34-bus feeder, $FLEX_V = 7.53\%$. This means that the obtained difference $H_{cont} - H_{base} = 1.1$ MW determines the additional power that can be used to optimize the operating conditions of the distribution network (within the framework of tertiary control), for example, when selling electricity at the TSO level, without the risk of voltage problems and possible overcurrent in lines.
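The reported value can be checked with a short calculation. The form used below is an assumption: the reported figures are consistent with reading the flexibility index as the relative gain in hosting capacity with respect to the uncontrolled case. The hosting capacities are those reported for Figure 7, and the function name is hypothetical.

```python
def voltage_flexibility(h_base_mw, h_cont_mw):
    """Relative hosting-capacity gain in percent (assumed form of the index)."""
    return (h_cont_mw - h_base_mw) / h_base_mw * 100.0

h_base = 14.6   # MW, hosting capacity without QU control
h_cont = 15.7   # MW, hosting capacity with MARL control
gain_mw = h_cont - h_base
flex_v = voltage_flexibility(h_base, h_cont)
print(f"gain = {gain_mw:.1f} MW, FLEX_V = {flex_v:.2f} %")
# prints: gain = 1.1 MW, FLEX_V = 7.53 %
```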

Conclusions
The future will require active use of flexible energy resources connected to the distribution network, including those connected to the low-voltage 0.4 kV grid, to provide flexible DSO and TSO services across new markets. New possible instruments of secondary control in distribution networks and microgrids with a large share of CIGs, including the coordination of adaptive droop Q/U controllers, will further increase operational flexibility, hosting capacity, and the degrees of freedom in TSO/DSO market interaction.
We have proposed an approach to decentralized secondary voltage control through inverter-based generation using the MARL algorithm, in order to assess new opportunities for increasing voltage flexibility in distribution networks with a considerable proportion of CIGs. With this approach, the electrical network is treated as a multi-agent system, where each agent (CIG) learns a control policy based on a (sub-)global reward, local states, and encoded communication messages from its neighbors (other CIGs). Experimental studies based on a modified IEEE 34-bus feeder have shown that this approach can effectively stabilize the voltage at network nodes and prevent overcurrent in lines under heavy load conditions. In the context of TSO/DSO market interaction, such operating conditions can result from the power exchange between systems of different voltage levels when flexibility services are rendered. The findings indicate that the proposed approach to secondary QU control increases the voltage flexibility of the network at the DSO level and maximizes the hosting capacity.