Strategies for Controlling Microgrid Networks with Energy Storage Systems: A Review

Abstract: Distributed energy storage systems are considered key enablers in the transition from the traditional centralized power system to a smarter, autonomous, and decentralized system operating mostly on renewable energy. The control of distributed energy storage involves the coordinated management of many smaller energy storage units, typically embedded within microgrids. As such, there has been much recent interest in control aspects such as supporting power-sharing balance and sustainability, increasing system resilience and reliability, and balancing distributed state of charge. This paper presents a comprehensive review of decentralized, centralized, multiagent, and intelligent control strategies that have been proposed to control and manage distributed energy storage. It also highlights the potential range of services that can be provided by these storage units, their control complications, and proposed solutions. A specific focus is placed on control strategies based upon multiagent communication and reinforcement learning, reflecting recent advancements in digitalization and AI. The paper concludes with a summary of emerging areas and promising future directions.


Introduction
Whereas traditional electricity utility grids operated in a centralized, top-down fashion, climate change action and the pressing need for decarbonization have driven trends towards decentralization, digitalization, and increasing deployment of artificial intelligence (AI) and automation. Smart grids can accomplish better generation and more efficient transmission and distribution of the generated power [1]. They provide comprehensive digitalization and automation of an electricity network and can be formed of a hierarchy of interconnected microgrids that together compose a large smart grid [2]. The typical main objectives of a smart grid are grid supervision and situational awareness; enhanced system performance; improved reliability, resilience, and security; improved economic operation; and distributed real-time intelligent control and protection of system components [1]. Furthermore, smart grids support enhanced penetration of renewable energy, which is a main aim of both European and non-European countries for a clean energy environment [3,4]. Within this smarter, autonomous, and decentralized system of microgrids, operating mostly on renewable energy sources, the Energy Storage System (ESS) is considered a key enabler in providing effective buffering against the inherent intermittency of renewable sources [5]. Developments in controlling microgrids that include ESSs are a vital branch of the field of intelligent energy distribution systems, arising from the need for optimized power distribution management.
The focus of this paper is on distributed ESSs, specifically to provide a thorough and up-to-date review of the literature related to their decentralized management and control. Decentralized control strategies form the starting point of this review. Specifically, the focus is placed upon strategies that can perform localized ESS control tasks with no urgent need for supervisory control. Droop control is the conventional foundation of decentralized control and can provide balanced load sharing with no need for communication with other system components or a centralized controller. State of charge (SOC)-based droop control can improve the accuracy and balance of the ESS, and virtual impedance droop control is a further development that can also help to balance reactive power when line impedances are mismatched. Droop control is technology agnostic and can deal with heterogeneous distributed ESSs, and AI-enhanced droop control has been proposed to achieve improved accuracy and balance of the storage system voltage, current, and SOC. Both secondary and tertiary centralized control strategies are then presented; these perform many control and enhancement functions by supervising the decentralized control strategies and correcting the load sharing balance through trimming of voltage and current references. Multiagent-based control strategies combine decentralization with partial centralization by providing neighbor-to-neighbor communication between decentralized agents, and have been introduced for both secondary and tertiary services to enhance autonomy while reducing communication overhead. To accomplish distributed intelligent power distribution management, intelligent strategies based on reinforcement learning (RL) have been presented (Q-learning, batch RL, deep Q-learning, and actor-critic). Each has differing features depending on the control objective and level of system complexity. Emerging intelligent strategies based on RL (synchronous/asynchronous; actor-critic; multiagent; priority experience; extrinsic/intrinsic) have been introduced for cases where traditional intelligent strategies are insufficient for the high complexity of the system. This review aims to present a comprehensive and rigorous reference for researchers working in the field of distributed energy storage in microgrids, categorizing each approach and comparing advantages and disadvantages in each case, as well as describing the underlying logic and mathematical background of their operation. To further facilitate the exposition and discussion, a brief overview of methods and architectures is now given to aid the subsequent classification of schemes, along with an introductory overview of the role of storage in microgrids to aid the subsequent classification of services.

Energy Storage Systems Overview, Main Techniques, Classifications, and Control Architecture
The typical main objective of an ESS in a microgrid is to store energy that is generated in excess of current consumer demand, e.g., in off-peak hours, and then re-inject it to enhance energy balance and sustainability when generation falls short of demand, e.g., in peak hours [1]. However, many challenges remain, the major ones being charging/discharging balance, safety, reliability, size, lifecycle, and cost, in addition to overall control and management [6]. The traditional main ESS techniques are as follows:
1. Lithium-ion: The typical lithium-ion battery energy storage consists of four main components: a cathode, an anode, an electrolyte, and a separator. All the components work together to store excess energy. The growing demand of the energy storage market encourages continued development of commercial lithium-ion technology to achieve batteries with higher energy densities, better safety, lower cost, and longer life [7].
2. Fuel cell: An energy storage technique that converts stored chemical energy to electrical energy via an electrochemical process. Polymer Electrolyte Membrane (PEM) fuel cells are a major fuel-cell technology and have recently become widely desired because of their low operating temperature, high power density, high efficiency, and low emissions [8].
3. Flow battery: A fully rechargeable, electrolyte-based electrical energy storage technique in which fluids are pumped through a cell to promote reduction/oxidation at the ion exchange layer. A redox flow battery is considered a distinguished storage unit because of its high capability of storing electricity, which makes it more desirable than traditional batteries [9].
4. Compressed air: A technique that stores energy as compressed air during low-demand times; this air can be used later to drive a motor-generator and generate electricity [10].
5. Flywheel: A technique that stores energy in the form of kinetic energy in a vacuum, and then uses it to drive a motor-generator and generate electricity [11].
ESSs in a microgrid network can likewise be classified, depending on location and storage technology, into three configurations:
1. Aggregated: All ESSs are in one location of a given microgrid network, which simplifies modelling [12].
2. Distributed: The ESSs are scattered around different locations within a given microgrid network [13].
3. Hybrid: A combined application of ESSs with different storage technologies, which is necessary because no single ESS technology can provide all the required characteristics [14].
All the benefits provided by ESSs serve a major objective, which is the transition from the traditional network of centralized generation and control to a smart, decentralized network of distributed sources and storage that is mostly based on renewable energy [15]. These benefits have been aided by the accelerating introduction of renewable energy, as explained in the following points:
1. The urgent necessity to increase the introduction of renewable energy resources, such as photovoltaics and wind generators, has stimulated the movement toward decentralized distributed ESSs [16]; consequently, it has paved the way for a successful and beneficial transition to smart microgrid networks and reduced pollution [17].
2. The gradual decline in ESS cost has prompted an increase in their use, for the purpose of storing excess energy and for other purposes [18].
The standard hierarchical control architecture of a microgrid network is classified into three levels, as demonstrated in Figure 1, which shows the control levels and their specific roles for an AC-connected microgrid. These levels are explained below:
1. Primary Decentralized Control: The objective of this level is to regulate the load sharing of distributed energy resources and storage, via control of the output voltage and frequency of their linked converters, to attain balanced and autonomous operation of these distributed systems [19,20]. The most typical strategy is droop control, which is responsible for implementing balanced load sharing for the distributed resources and storage, with no necessity for time-critical communication links [21]. As demonstrated in the AC microgrid of Figure 1, droop control is present at each distributed ESS as a primary control. It receives the measured active and reactive power and creates voltage and frequency offsets for the local controller. This, in turn, implements load participation that accomplishes the overall balance of load sharing in the microgrid.
2. Secondary Centralized Control: Centralized secondary control is responsible for correcting the voltage and frequency offsets that are produced by the primary control. It therefore plays the role of an observer for the primary control. Moreover, it offers some additional roles, such as reactive power-sharing, accurate frequency regulation, and PQ compensation [22,23]. The AC microgrid in Figure 1 illustrates the role of secondary control in correcting droop control offsets to the nominal microgrid references provided by tertiary control. The correction is based on the measured output voltage and frequency of each ESS.
3. Tertiary Centralized Control: This is the highest level of the control hierarchy. Typically, it is responsible for two major objectives: first, adjusting voltage setpoints, or providing optimal voltage references; second, managing the power entering or leaving the microgrid, or solving the optimal power flow (OPF) problem [24]. In addition, it operates in conjunction with other entities to implement the overall objectives of providing balanced and sustainable load sharing [25,26]. Figure 1 clarifies how tertiary control in an AC microgrid receives power flow management constraints and objectives, and then creates voltage magnitude and angle references that implement optimal power flow management.

Energy Storage Systems Roles and Objectives of Microgrid
ESSs in general, and specifically when distributed within a given microgrid network, provide several fundamental roles and services. An ESS typically operates together with its associated power electronic converter to support power-sharing optimization and reliable autonomous operation. These beneficial roles can be classified as follows [27,28]:
1. Grid voltage support: Power provided by the ESSs of a microgrid network with the objective of maintaining voltage at a required level or within an acceptable range. This can be accomplished through the control of distributed ESS reactive power based on the real energy generated.
2. Grid frequency support: Active power that can be delivered by distributed ESSs in a microgrid network to compensate for any frequency imbalance caused by a sudden increase in load or generation.
3. Grid stability: ESSs offer the opportunity to decrease the oscillations that follow the rapid onset of an event during microgrid operation.
4. Peak shaving: Typically, the energy generated while generation is available or during off-peak times is stored in the ESS and shifted to support the system during high-demand periods or in the absence of generation. Furthermore, distributed ESSs can meet short-term demand independently, without requiring generation. This, in turn, provides excellent support to distributed renewable energy resources, such as photovoltaics and wind turbines.
5. Spinning reserve: ESSs offer backup power to support islanding.
6. Enhancing power quality: ESSs participate in improving power quality by reducing typical issues related to it, such as limiting voltage and frequency offsets, reducing harmonics, maintaining voltage balance, and improving power factor.
7. Support reliability: ESSs contribute to enhancing system reliability in meeting consumer demand.
8. Ride-through support: ESSs can supply essential energy during a disturbance or voltage sag that affects system reliability. This, in turn, helps to keep electric units connected for the duration of these disturbances.
9. Compensation of unbalanced load: The individual injection/absorption of power by ESSs supports the compensation of an unbalanced load.

Contribution and Paper Structure
Given the historic and more recent developments in this research area, this paper aims to provide a comprehensive review of control strategies for energy storage in microgrids. Existing research challenges are presented and areas for further research are subsequently identified. The methodology employed to search the literature and select relevant works in each category was as follows. Works that aim to provide a distinctive, clear, and comprehensive implementation of the control strategies, and that address them as a main objective of the paper or article, were first selected. Out-of-scope, incremental, or similar work was then removed. In total, 131 of the most relevant and non-incremental works have been selected for this review, distributed across each of the categories in Figure 2. The remainder of this review paper is organized as follows. Section 2 covers decentralized control strategies and concepts, while centralized control strategies are presented in Section 3. Section 4 covers multiagent-based control methods. Intelligent control strategies are presented in Section 5, along with an explanation of the most promising directions of further research. A summary is presented for each section, which highlights the major strengths and weaknesses of each strategy. Section 6 is reserved for the emerging intelligent techniques, and a final summary and conclusion are presented in Section 7.

Decentralized Control Strategies of Distributed ESSs
To achieve sustainable and balanced power-sharing while meeting load demand, distributed ESSs in a microgrid are typically controlled locally by decentralized control strategies, that is, strategies that can perform the task with no urgent need for supervisory control and can operate with only local information. The block diagram in Figure 3 illustrates the standard decentralized control of an AC microgrid consisting of five distributed ESSs, in which each ESS is controlled locally, with no central or supervisory controller. Decentralized droop control is the standard strategy for this role; it operates together with the local controller to regulate the output voltage and load sharing current of the ESS. A conventional power electronic converter serves as the interface to the microgrid bus. The major valuable feature of droop control is decentralization, together with the absence of any need for a communication link between distributed ESSs. Its weakness is that unmodified droop control provides only an approximate balance of the output parameters. For this reason, the strategy has progressed through several stages of development in order to achieve more accuracy and stability. Some developments introduce new parameters that contribute to greater reliability, and others integrate droop control with other strategies so that it serves as one stage of a comprehensive strategy. This paper presents the major features of these developments, each of which has provided a successful solution to a specifically diagnosed drawback or weakness. Achieving State of Charge (SOC) balance among ESSs is fundamental to overall load sharing balance, in addition to maintaining safety and supporting prolonged storage life. The introduction of virtual line impedance is an effective solution to mismatched transmission line impedances, which, in turn, supports improved stability as well as reducing losses and maintaining infrastructure. Adapting droop control to distributed ESSs of different technologies is equally important, and specifically aims to make droop control capable of balancing load sharing for heterogeneous ESSs [29].

Traditional Droop Control
Droop control is the standard decentralized strategy for controlling distributed ESSs, which are interfaced to the microgrid bus through conventional power electronic converters. It mimics the governor and exciter action of synchronous generators, in which frequency is regulated through the control of speed and fuel. This illustrates the objective of providing balanced output voltage and frequency through the control of active and reactive power. The idea behind it is to add a virtual resistance, which differs from a physical resistance in being unaffected by operating conditions. An example of such conditions is temperature, which causes power losses. This virtual resistance is typically named the droop gain or droop coefficient [30]. For a low-voltage AC microgrid, the balance of the output frequency is achieved depending on active power (f-P), and the magnitude of the output voltage depends on reactive power (V-Q). The typical voltage and frequency droop characteristics are demonstrated in Figure 4. As given in (1), the active power droop coefficient ($m_P$) is multiplied by the measured active power ($P$) and then subtracted from the reference angular velocity ($\omega^{*}$) to obtain the desired angular velocity ($\omega$). Similarly, the reactive power droop coefficient ($n_Q$), as presented in (2), is multiplied by the measured reactive power ($Q$) and subtracted from the output voltage reference ($V^{*}$) to obtain the output voltage ($V$). Therefore, frequency decreases linearly with the measured active power, and voltage magnitude decreases linearly with the measured reactive power [31].

$$\omega = \omega^{*} - m_P\,P \tag{1}$$

$$V = V^{*} - n_Q\,Q \tag{2}$$

Decentralized droop control is also implemented on distributed ESSs in a DC microgrid, where power-sharing follows from the output voltage and current (V-I) characteristic. Standard droop control with no modification cannot provide fully balanced power-sharing among distributed ESSs, because SOC is not considered.
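As a minimal sketch of the droop relations described above, the following Python example computes the frequency and voltage references from measured powers, and the steady-state active power split between parallel units that share one frequency reference. The function names, the nominal values (50 Hz, 230 V), and the droop gains are illustrative assumptions, not values taken from the reviewed works.

```python
import math


def droop_setpoints(p_meas, q_meas, omega_ref, v_ref, m_p, n_q):
    """Conventional P-f / Q-V droop: subtract gain times measured power."""
    omega = omega_ref - m_p * p_meas  # angular frequency reference (rad/s)
    v = v_ref - n_q * q_meas          # voltage magnitude reference (V)
    return omega, v


def steady_state_active_power_share(p_load, m_p_list):
    """Split a load among units that settle at one common frequency.

    With identical frequency references, m_i * P_i is equal for all units,
    so each unit's share is proportional to 1 / m_i.
    """
    inverse_gains = [1.0 / m for m in m_p_list]
    total = sum(inverse_gains)
    return [p_load * g / total for g in inverse_gains]


# Hypothetical nominal values and gains, chosen only for illustration.
OMEGA_REF = 2.0 * math.pi * 50.0  # rad/s
V_REF = 230.0                     # V

omega, v = droop_setpoints(p_meas=2000.0, q_meas=500.0,
                           omega_ref=OMEGA_REF, v_ref=V_REF,
                           m_p=1e-4, n_q=1e-3)
shares = steady_state_active_power_share(3000.0, [1e-4, 2e-4])
print(omega, v, shares)
```

In this sketch the unit with the smaller droop gain takes the larger share of the load, which is why equal sharing among identical units requires equal gains.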

Virtual Impedance Droop Control
Balancing reactive power-sharing among parallel droop-controlled inverters in an AC microgrid remains difficult, especially when line impedances are mismatched. Virtual impedance droop control is an updated version of traditional droop control that adopts virtual impedance theory to compensate for this mismatch, which would otherwise degrade the reactive power balance. The theory involves a modification of the inverter output voltage droop control, as proposed in [32], to achieve an equivalent model and eventually a balanced output voltage. Reactive power is balanced when the voltage drops of the parallel inverters ($\Delta V_1$, $\Delta V_2$) are equal (see Equation (3)). The virtual impedance of the first inverter is then set to zero ($Z_{v1} = 0$), which allows the virtual impedance of the other inverter ($Z_{v2}$), given in (4), to eliminate the mismatch of line impedances. The voltage drop across the resulting virtual impedance is subtracted from the droop control voltage reference to obtain a reference voltage that balances reactive power between the inverter-interfaced distributed units, as demonstrated in Figure 5, which shows the introduction of virtual impedance into droop control for two inverters connected to the same AC bus.
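The idea can be sketched in Python for two inverters of equal rating. This is an illustration under stated assumptions rather than the formulation of [32]: the first inverter is assumed to have the larger line impedance, its virtual impedance is set to zero, and the second inverter's virtual impedance is chosen as the difference of the line impedances so that both see the same total impedance. All names and numeric values are hypothetical.

```python
def matching_virtual_impedances(z_line_1, z_line_2):
    """Return (z_v1, z_v2) so that line plus virtual impedance is equal.

    Assumes equal inverter ratings and that inverter 1 has the larger
    line impedance, so z_v2 has non-negative real and imaginary parts.
    """
    z_v1 = 0j
    z_v2 = z_line_1 - z_line_2
    return z_v1, z_v2


def voltage_reference(v_droop, z_virtual, i_out):
    """Subtract the virtual-impedance voltage drop from the droop reference."""
    return v_droop - z_virtual * i_out


# Hypothetical line impedances in ohms (resistance + j reactance).
Z_LINE_1 = complex(0.30, 0.20)
Z_LINE_2 = complex(0.10, 0.10)

z_v1, z_v2 = matching_virtual_impedances(Z_LINE_1, Z_LINE_2)
z_total_1 = Z_LINE_1 + z_v1
z_total_2 = Z_LINE_2 + z_v2

# Phasor example: droop reference 230 V, output current 10 - 2j A.
v_ref_2 = voltage_reference(complex(230.0, 0.0), z_v2, complex(10.0, -2.0))
print(z_v2, z_total_1, z_total_2, v_ref_2)
```

Because the virtual impedance exists only in the controller, it equalizes the apparent impedances without adding physical losses to the line.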

Droop Control-Based SOC
SOC-based droop control is a modified version of traditional droop control that includes the SOC in calculating or weighting the droop coefficients [33,34]. The objective is to balance the SOC of the distributed ESSs, in addition to extending their life [35]. It is therefore also named SOC-weighted droop control, and it is achieved by introducing an exponential SOC term into the weighted droop coefficient [33,35]. In [36], a modified weighted droop control has been proposed to regulate the bus voltage when power changes, which clearly shows how traditional droop control is modified to accomplish the required SOC balancing. As demonstrated in (5), the SOC-based droop action for the discharge/charge modes is obtained by multiplying the discharge/charge droop coefficient ($m_d$, $m_c$) by the energy storage output power ($P$) and by the exponential term of the computed SOC, and then subtracting the product from the control reference.
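The weighting principle can be sketched in Python. The sketch assumes one commonly used form, in which the droop coefficient is divided by SOC^n when discharging and multiplied by SOC^n when charging; this form, the exponent n, and all names are assumptions for illustration and are not claimed to reproduce Equation (5) of [36].

```python
def soc_weighted_coefficient(m0, soc, n, discharging):
    """Scale a base droop coefficient m0 by the unit's state of charge.

    Discharging: a higher SOC gives a smaller coefficient (more power out).
    Charging: a higher SOC gives a larger coefficient (less power in).
    """
    if not 0.0 < soc <= 1.0:
        raise ValueError("soc must be in (0, 1]")
    return m0 / soc ** n if discharging else m0 * soc ** n


def share_power(p_total, coefficients):
    """Steady-state split among droop units: share proportional to 1/m."""
    inverse = [1.0 / m for m in coefficients]
    total = sum(inverse)
    return [p_total * g / total for g in inverse]


# Two hypothetical units at 80% and 40% SOC, exponent n = 2.
socs = [0.8, 0.4]
m_discharge = [soc_weighted_coefficient(1e-4, s, 2, True) for s in socs]
m_charge = [soc_weighted_coefficient(1e-4, s, 2, False) for s in socs]

p_out = share_power(1000.0, m_discharge)   # supplying 1000 W to the load
p_in = share_power(-1000.0, m_charge)      # absorbing 1000 W of surplus
print(p_out, p_in)
```

In this example the fuller unit supplies most of the discharge power and the emptier unit absorbs most of the charge power, so the two SOC values move toward each other over time.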
A comparison between this developed SOC-based droop strategy and traditional droop control has clarified that the SOC-based droop strategy significantly enhances the SOC balance of distributed ESS and improves the balance of sharing current during load fluctuations. Furthermore, it supports a prolonged life for the storage. The objectives of SOC-based droop control have been expanded by C. Gavriluta et al. [37] to include the determination of microgrid voltage and frequency offsets, through its effect of adjusting microgrid voltage and frequency when included within droop control.
A more recent dynamic SOC-based droop control strategy has been proposed in [38], to control battery-based distributed energy storage systems (BESSs) in a DC microgrid network including constant power loads (CPLs). The aim was to recover and stabilize microgrid DC bus voltage and power distribution in the case of a time-varying droop coefficient. The major contribution of this strategy was that local information of BESSs SOC can be shared by a dynamic consensus algorithm and the introduction of a nonlinear disturbance observer (NDO). Implementation has shown optimized system stability and rapidity. Furthermore, DC bus voltage has been maintained with appropriate sustainable power distribution.

Fuzzy Logic Droop Control
SOC-based droop control suffers from two specific weaknesses. The first is the overloading of high-SOC distributed ESSs, caused by the lack of participation of low-SOC storage units. The second is the instability of voltage and frequency, caused by the increase in the droop coefficient when all distributed ESSs reach a low SOC level. For these reasons, fuzzy droop control has been developed [39,40], a modified version of standard droop control that schedules the droop coefficients, in particular by using the output voltage and SOC to weight them. The objective is to balance the output voltage when all distributed ESSs are at a low SOC level: the microgrid output voltage ($V_o$), given in (6), is balanced by regulating the droop control virtual resistance ($R_d$) based on the SOC estimate and the output current ($I_o$). Fuzzy logic droop control has the beneficial feature of serving more than one control objective. It reduces the voltage deviation ($\Delta V$) between the microgrid bus voltage ($V_{bus}$) and the reference voltage ($V_{ref}$), as clarified in (7). Furthermore, the virtual resistance is adjusted based on the fuzzy SOC estimation.
A decentralized control strategy based on fuzzy logic has been proposed for an islanded AC microgrid to balance the SOC of distributed energy storage [41]. Figure 6 highlights the methodology and how the fuzzy inference system (FIS) has been integrated into droop control. A constant voltage charger exists to prevent the battery current from falling below a certain level; thus, the distributed ESS is kept operating in current control mode (CCM). A new weighting factor (see Equation (9)) has been suggested and is estimated by the FIS for each distributed ESS based on its SOC, to obtain the correct value of the droop coefficient. The resulting estimate is then applied to the droop control, which balances the SOC of each energy storage unit through the correct power injection/extraction at the common bus [41].
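To illustrate how a fuzzy inference system can schedule a droop gain from the SOC, the following Python sketch implements a very small Sugeno-type inference with three SOC sets and one rule per set. The membership shapes, the rule outputs, and the use of SOC as the only input are assumptions made for brevity; the strategies in [39-41] use their own rule bases, and [39,40] also take the voltage into account.

```python
def clamp01(x):
    """Limit a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def soc_memberships(soc):
    """Degrees of membership in hypothetical low / medium / high SOC sets."""
    low = clamp01((0.5 - soc) / 0.3)    # 1 below 20% SOC, 0 above 50%
    high = clamp01((soc - 0.5) / 0.3)   # 0 below 50% SOC, 1 above 80%
    medium = 1.0 - low - high           # remainder, peaks at 50% SOC
    return {"low": low, "medium": medium, "high": high}


# Hypothetical rule outputs for the discharging mode: a low SOC raises the
# virtual resistance (less discharge), a high SOC lowers it (more discharge).
RULE_SCALE = {"low": 2.0, "medium": 1.0, "high": 0.5}


def fuzzy_resistance_scale(soc):
    """Weighted average of the rule outputs (zero-order Sugeno inference)."""
    mu = soc_memberships(soc)
    weighted = sum(mu[name] * RULE_SCALE[name] for name in mu)
    return weighted / sum(mu.values())


def droop_output_voltage(v_ref, r_d_nominal, soc, i_out):
    """V-I droop with a virtual resistance scheduled by the fuzzy scale."""
    r_d = r_d_nominal * fuzzy_resistance_scale(soc)
    return v_ref - r_d * i_out


scale_mid = fuzzy_resistance_scale(0.5)
scale_low = fuzzy_resistance_scale(0.1)
scale_between = fuzzy_resistance_scale(0.65)
v_out = droop_output_voltage(v_ref=48.0, r_d_nominal=0.2, soc=0.9, i_out=10.0)
print(scale_mid, scale_low, scale_between, v_out)
```

The smooth interpolation between rules is the main design point: the droop gain changes gradually with SOC instead of switching abruptly between fixed values.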

Droop Control of Different Technology-Distributed ESSs
Droop control of different technology-distributed ESSs comprises droop strategies that are modified to control distributed ESSs of different, or heterogeneous, storage technologies. These technologies are typically classified into two groups: (1) peak shaving and power quality regulation; (2) energy shifting and spinning reserve. Ultracapacitors are a very common example of these technologies and can significantly influence energy balancing for an ESS. Specifically, their long lifecycle, low power cost (USD/kWh), and high rate (kWh/kg) make them well suited to serving high-frequency load demands with optimized quality. An ESS that comprises several storage technologies is known as a hybrid system. Droop control has been implemented to control the primary frequency of an ESS with two different storage technologies, a BESS and a superconducting magnetic energy storage (SMES) system, in a hybrid standalone AC microgrid [42].
Here, $K_{B}$ and $K_{SMES}$ are the battery and SMES droop coefficients, and $\Delta P_{B}$ and $\Delta P_{SMES}$ are the battery and SMES power contributions. As explained in (10) and (11), when the microgrid frequency lies within the non-critical band, as chosen according to the UK grid code [43], droop control takes no action on the storage. In contrast, when the frequency exceeds the critical-up frequency ($f_{c\_up}$), the storage units are charged to absorb the excess, whereas if it falls below the critical-low frequency ($f_{c\_low}$), they are discharged to compensate. The introduced power-sharing method is an optimized droop control strategy for controlling the primary frequency of a heterogeneous ESS consisting of a battery and an SMES. Frequency stability has been improved as a result. Moreover, the optimal output power is achieved in different power situations because the strategy can adjust the droop gain of both storage units.
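The dead-band behaviour described above can be sketched in Python as follows. The critical frequencies, droop gains, and power limits are placeholder values chosen for illustration only; they are not the settings of [42] or of the UK grid code, and the sign convention (positive for discharge) is likewise an assumption.

```python
def storage_power_command(freq_hz, f_crit_low, f_crit_up,
                          droop_hz_per_kw, p_max_kw):
    """Dead-band frequency droop for one storage unit.

    Returns the power command in kW: positive means discharge,
    negative means charge, zero means no action inside the band.
    """
    if freq_hz > f_crit_up:
        p_kw = -(freq_hz - f_crit_up) / droop_hz_per_kw   # absorb excess
    elif freq_hz < f_crit_low:
        p_kw = (f_crit_low - freq_hz) / droop_hz_per_kw   # cover deficit
    else:
        p_kw = 0.0
    # Respect the converter rating of the unit.
    return max(-p_max_kw, min(p_max_kw, p_kw))


# Placeholder band and unit parameters (hypothetical).
F_CRIT_LOW, F_CRIT_UP = 49.8, 50.2
BATTERY = {"droop_hz_per_kw": 0.02, "p_max_kw": 20.0}
SMES = {"droop_hz_per_kw": 0.01, "p_max_kw": 25.0}

for f in (50.0, 50.4, 49.5):
    p_batt = storage_power_command(f, F_CRIT_LOW, F_CRIT_UP, **BATTERY)
    p_smes = storage_power_command(f, F_CRIT_LOW, F_CRIT_UP, **SMES)
    print(f, p_batt, p_smes)
```

Giving each technology its own gain and limit is what lets a heterogeneous pair contribute according to its capability while both remain idle inside the non-critical band.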
In [44], a composite droop control strategy has been designed to control a heterogeneous ESS consisting of a battery and a supercapacitor (SC) in a DC microgrid. In particular, the strategy proposes a high-pass filter-based droop (HPFD) controller for the battery converter and a virtual capacitance droop (VCD) controller for the SC. A collaboration of two control strategies has therefore been demonstrated, and several control objectives have been accomplished. Fast fluctuations were buffered by the SC, while the low-frequency power mismatch was compensated. Meanwhile, the bus voltage was regulated and the supercapacitor SOC was recovered. As given in (12), the deviation of the battery output voltage ($V_o$) is corrected by a compensation voltage ($\Delta V$) that is added to it: the reference, or nominal, voltage ($V_{ref}$) is increased by $\Delta V$ to compensate for the deviation. Here, $R_v$ is the virtual resistance and $I_o$ is the battery output current.
A recent fuzzy logic-based control strategy has been proposed by G. Bharathi et al. [45] for a DC microgrid network consisting of a photovoltaic system, a fuel cell (FC), and a BESS. The strategy presents a fuzzy solution for the heterogeneous energy storage system to stabilize power distribution and regulate the bus voltage. The role of the heterogeneous energy storage system here is to maintain the DC bus voltage under the control of the proposed strategy. Specifically, droop control mitigates DC bus voltage fluctuations, while fuzzy logic control improves power exchange under different dynamic situations. Simulation of the system with the proposed strategy has verified optimized performance and balanced power under different dynamics.

SOC Balancing of Modular Multilevel Converter Energy Storage System
The modular multilevel converter (MMC) has been adopted in many high-voltage, high-power applications as an alternative to the conventional converter because of its favorable properties. In particular, it can be interfaced with an ESS to form a modular multilevel energy storage system (MMC-ESS), which can provide excellent support to the performance of grid applications when connected to the grid [46]. To achieve the necessary performance support, it is vital to ensure a smooth connection to the grid under a properly designed control system [47]. One of the crucial control drawbacks that has been identified is the imbalance in the SOC of the ESSs, which is due to different charge and discharge speeds and might cause drawbacks beyond the overcharge or overdischarge of any of the storage units [48]. Unbalanced SOC might lead to two further defects: (1) unequal battery voltages, which, in turn, can induce DC components in the injected grid current; (2) an internal circulating current [48].
Many successful solutions have been proposed in the literature to solve this drawback and attain a balanced SOC. A distinctive one has been proposed by F. Geo et al. [49], who suggested a novel control strategy to optimize the performance of the MMC-ESS. SOC balance was one of the objectives, in addition to the suppression of the circulating current and the grid DC current. The strategy adjusts the real power of each half-bridge according to the difference in SOC. The results have indicated the validity of the proposed strategy, with an effective balance of SOC for all batteries and suppression of the circulating current and grid DC current. A three-level SOC equilibrium method has been designed by H. Laing et al. [50] for a BESS interfaced to an MMC, to balance the energy of the batteries through the balance of their SOC. The development was an attempt towards extending the life of batteries or reusing second-life batteries from electric vehicles. The strategy introduces power regulation based on the battery capacity proportion to attain balanced SOC at three levels: among the three phase legs, between the upper and lower arms in each phase, and among the submodules in each arm. Implementation of the developed strategy has verified an effective overall SOC balance.
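The principle of adjusting each module's real power according to its SOC difference can be sketched in Python. The proportional rule below, the gain `k_balance`, and the sign convention (positive for discharge) are illustrative assumptions and do not reproduce the controllers of [49] or [50].

```python
def submodule_power_references(p_total, socs, k_balance):
    """Distribute p_total among modules, biased by SOC deviation.

    Positive p_total means discharge: modules above the mean SOC supply
    more. Negative p_total means charge: modules below the mean absorb
    more. The deviations sum to zero, so the references sum to p_total.
    """
    n = len(socs)
    p_avg = p_total / n
    soc_mean = sum(socs) / n
    direction = 1.0 if p_total >= 0 else -1.0
    return [p_avg * (1.0 + direction * k_balance * (s - soc_mean))
            for s in socs]


# Three hypothetical battery modules with unequal SOC.
socs = [0.6, 0.5, 0.4]
p_discharge = submodule_power_references(300.0, socs, k_balance=2.0)
p_charge = submodule_power_references(-300.0, socs, k_balance=2.0)
print(p_discharge, p_charge)
```

Because the bias terms cancel, the total power exchanged with the grid is unchanged while the individual modules converge toward a common SOC.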
A summary of the reviewed decentralized strategies has been established in Table 1, which highlights the major strengths and weaknesses for each of these strategies. An existing drop of bus voltage but within the acceptable limit.
[38] DC An achieved system stability and rapidity. Balanced SOC. A disconnection of BESS with low SOC (less than 10%).
A constant power loads (CPLs) (load fluctuation has not been considered).
[39] DC A developed fuzzy control strategy that can do more than one control objective. Reduced voltage deviation.
An existing minor deviation of voltage but within an acceptable limit.
The same controller can be selected for different values of Rd (virtual resistance loop). Good balance of SOC (faster than fixed virtual resistance strategy). Decentralization.
[40] DC Self-controlled (based on local parameters) with good SOC balance. Modular and expandable. Decentralization. Multi-objective control.
Faster charge.
A prioritization of SOC balancing over the regulation of voltage deviation. An existing minor voltage deviation, but within an acceptable limit.
[41] AC Decentralization. Good SOC balance. No modification required for the introduction of a new active generator. Asymptotic approximation to SOC during storage operation. Fast charge compared with the traditional method. Furthermore, a reduced depth of discharge. FIS is applicable for DC microgrid.
An existing minor voltage deviation between batteries. Good SOC balance, but only to an extent (some unbalance still exists).
[42] AC Good frequency balance. An improved SOC balance of battery.
Longer battery life.
A minor fluctuation of battery SOC. No consideration to bus voltage.
Bus voltage restoration. Supercapacitor SOC recovery. Transient power-sharing accomplished with no effect on voltage and SOC.
Still, a sharp spike of supercapacitor current when the load fluctuates. Slight instability of battery current.
[45] DC Quick power balance. Maintained system performance with different dynamic situations.
No consideration to temperature effect.
Suppression of grid DC current.
High total simulation time (17.5 s) due to computer limitation.
[50] MMC-BESS SOC balance of all battery modules. Improvement to utilization of second-life BESS.
An existing minor deviation of SOC estimation, which is affecting the final SOC convergence.

Centralized Control Strategies of Distributed ESSs
Centralized control strategies offer direct control and individual monitoring of distributed ESS in a microgrid. The block diagram in Figure 7 demonstrates the standard centralized control of distributed ESSs in an AC microgrid, consisting of five distributed ESSs. Here, direct control exists between the central controller and any of the ESSs. The centralized control strategies are classified depending on the role of control action into two types. The first (secondary) aims to regulate power quality, such as correcting voltage/frequency, while the second (tertiary) optimizes the power flow dynamic.

Centralized Secondary Control
The secondary control system is classified, within the standard hierarchical architecture of microgrid control, as the regulator of the voltage and frequency offsets left by the primary level [23]. However, the control objectives were extended in [51] to include the correction of voltage unbalance at the point of common coupling (PCC) of an AC microgrid. In particular, the power exchange is adjusted on request of a central controller, which regulates the output voltage according to the secondary control. A further secondary centralized control objective was accomplished by M.H. Andishgar et al. [52], who proposed a secondary control strategy to reduce the total harmonic distortion (THD) at the sensitive load bus (SLB).
In Equations (13)-(16), the controller uses integral and proportional gains together with the voltage of each harmonic expressed in its own reference frame. The fifth, seventh, and eleventh harmonic distortions (13) are extracted from the voltage harmonic components, which are obtained by a multiple second-order generalized integrator with frequency-locked loop (MSOGI-FLL). The THD is calculated and compared with a reference THD value to obtain the total harmonic compensation signal (see Equations (14) and (15)). The modified total harmonic compensation signal (refer to Equation (16)) is then applied, and the total harmonic distortion at the SLB was improved.
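As a rough illustration of the compensation loop described above, the sketch below computes a THD value from harmonic amplitudes and feeds the THD error to a proportional-integral (PI) regulator. The THD definition (root-sum-square of the harmonic amplitudes divided by the fundamental) is the usual one; the class `PIRegulator`, the gain values, and the way the error is formed are assumptions for illustration and are not the Equations (13)-(16) of [52].

```python
import math


def thd(fundamental, harmonics):
    """Total harmonic distortion as a ratio (0.05 means 5 %)."""
    return math.sqrt(sum(v * v for v in harmonics)) / fundamental


class PIRegulator:
    """Discrete PI regulator; gains and time step are illustrative assumptions."""

    def __init__(self, kp, ki, dt):
        self.kp = kp
        self.ki = ki
        self.dt = dt
        self.integral = 0.0

    def step(self, error):
        # Accumulate the error (rectangular integration), then combine terms.
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral


if __name__ == "__main__":
    # Hypothetical amplitudes of the 5th, 7th, and 11th harmonics (volts).
    measured = thd(100.0, [3.0, 4.0, 0.0])
    reference = 0.03
    regulator = PIRegulator(kp=2.0, ki=10.0, dt=0.1)
    compensation = regulator.step(measured - reference)
    print(measured, compensation)
```

In a real controller the compensation signal would be split per harmonic and transformed back to the converter reference frame; the sketch only shows the scalar error path.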
With advances in technology, the objectives of the secondary centralized control system have been expanded to include SOC balancing. Y. Guan et al. [53] suggested a secondary control strategy to balance the discharge rate of the ESSs in an AC microgrid. It eliminated the voltage and frequency deviations created by droop control, which are due to the SOC imbalance of the storages. A secondary SOC-based control was added to enhance the primary control strategy of a BESS in a standalone microgrid [54]. The objective was to restore the SOC deviation that appeared at the primary level. This deviation arises when the load varies, which requires more active power from the battery energy storage; the demanded active power in turn causes a deviation of SOC. The deviation is then sent to the secondary control, after a small communication time delay, to be restored. A secondary central control layer was proposed in [55], on top of an adaptive-droop-regulated primary level, as part of a control strategy for an autonomous DC microgrid consisting of several distributed generators (DGs) and two distributed batteries. The role of the supervisory control here is to monitor and regulate the charging and discharging of the distributed batteries to support a prolonged lifecycle and to maintain the voltage balance. According to the developed supervision protocol, the SOC of the distributed batteries is forced by a virtual resistance to be balanced in normal operation, and the battery with the higher SOC is the first to be fully charged. If the energy production of the system is disturbed, the batteries discharge, with the battery that was fully charged first discharging first.
As shown in (17), the state of charge of each distributed battery at time t is obtained by subtracting from its initial state of charge a term that accumulates the battery current and is inversely proportional to the battery rated capacity; this term is scaled by the charge/discharge efficiency. Under the supervision of the secondary control, the batteries alternate between charging and discharging, with none falling below 90% of its full capacity.
Other applications of centralized secondary control for balancing the SOC of distributed ESSs have been presented in [56,57]. Here, the secondary control manages the energy and power of the distributed ESSs, which is essential for mitigating generation instabilities, balancing load demand, improving power quality, and enhancing backup power. Furthermore, SOC balancing achieves the fundamental objective of reducing the maximum depth of discharge, which supports a longer life of the distributed storage. Ultimately, system sustainability and overall efficiency are improved.
The control strategy proposed by Z. Jin et al. [58] is an advanced application of secondary centralized control in DC distribution, which is regarded as one of the current trends for future mobile power systems. A secondary centralized control acts within a hierarchical control to accomplish the objectives of a large-scale shipboard mobile power system that consists of a DC network and ESSs. The management control system works with a primary adaptive inverse droop control to achieve two main objectives: (1) management-level control, which, in turn, supports the collaboration of the BESSs to coordinate the number and fuel consumption of the running agents; (2) restoration of the voltage level through compensation of the voltage drops introduced by droop control.
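Returning to the SOC relation cited as Equation (17) in this subsection, a plausible form consistent with the verbal description is the usual Coulomb-counting expression below. The symbols are assumptions, since the original notation is not recoverable here: SOC_i(0) is the initial state of charge of battery i, C_i its rated capacity, eta_i its charge/discharge efficiency, and i_i the battery current, taken as positive when discharging.

```latex
\[
  \mathrm{SOC}_i(t) \;=\; \mathrm{SOC}_i(0) \;-\; \frac{\eta_i}{C_i} \int_{0}^{t} i_i(\tau)\, \mathrm{d}\tau
\]
```

Under this sign convention a positive (discharge) current lowers the SOC, and a larger rated capacity makes the SOC change more slowly for the same current.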
Rule-based control (RBC) is an outer, or secondary, control that has been applied to a BESS to control charge/discharge on an hourly basis and to generate a suitable controlled current reference. Additionally, the proposed controller considers SOC balance and charging constraints [59]. As shown in Figure 8, the controller takes as inputs the renewable power generated from wind or solar, the hourly dispatched power set point, the battery SOC, and the battery voltage. The output is a current reference bounded by the maximum charge and discharge currents, as illustrated in (19). The SOC is kept within its prescribed lower and upper limits (refer to Equation (18)).
The rule-based secondary control strategy was applied in [60] to a distributed energy storage system consisting of a vanadium redox battery and a supercapacitor, fed by photovoltaic generation. The aim was to manage charge/discharge on an hourly basis under mandatory constraints. It is worth explaining why a supercapacitor is used with the vanadium redox battery despite the battery's high storage capacity. The main characteristics of the vanadium redox battery are the independence of its energy and power densities, a long lifecycle, no limitation on discharge depth, and good efficiency. However, its response time is limited because the electrolyte is circulated by a pump, so the flow rate needs to be maintained. The supercapacitor, on the other hand, stores energy directly as electricity with no need for conversion to other forms, and offers very high efficiency and power density, deep discharge, and a long lifecycle. Despite this, it has a very low energy density and cannot be used for long-term storage. The benefit of using a supercapacitor in parallel with the vanadium redox battery is a reduced rating of the redox battery. Furthermore, combining the good features of both storage technologies yields an ESS that satisfies its purpose when connected to a PV system [61,62]. Another effective application was proposed by C. Wang et al. [63], involving RBC as a secondary controller in a combined control strategy of a central controller and local controllers, with the aim of controlling ESS power flow on an hourly basis.
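The rule-based logic described for [59] (four inputs, one bounded current reference, SOC kept inside limits) can be sketched as follows. This is a simplified reading of the text, not the published controller: the function name, the default limits, and the rule of returning zero current at an SOC bound are assumptions.

```python
def rbc_current_reference(p_renewable, p_setpoint, soc, v_batt,
                          soc_min=0.2, soc_max=0.9,
                          i_charge_max=50.0, i_discharge_max=50.0):
    """Return a battery current reference in amperes (positive = discharge).

    All default limits are illustrative assumptions.
    """
    # Power the battery must supply so that the plant meets the set point.
    p_batt = p_setpoint - p_renewable
    i_ref = p_batt / v_batt
    # Keep the reference inside the charge/discharge current limits.
    i_ref = max(-i_charge_max, min(i_discharge_max, i_ref))
    # Block discharging at the lower SOC bound and charging at the upper one.
    if i_ref > 0 and soc <= soc_min:
        return 0.0
    if i_ref < 0 and soc >= soc_max:
        return 0.0
    return i_ref


if __name__ == "__main__":
    # 1 kW of renewable power against a 3 kW set point on a 100 V battery.
    print(rbc_current_reference(1000.0, 3000.0, soc=0.5, v_batt=100.0))
```

The rules are evaluated once per dispatch interval; no tariff or battery degradation cost enters the decision, which is the main contrast with the economic MPC discussed next.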
Economic model predictive control (EMPC) was implemented in [64] for a residential distributed ESS fed by photovoltaic generation. The aim was to optimize the power flow (charge/discharge) under a time-varying tariff. Equation (20) gives the overall power supplied to and drawn from the ESS at time step K: the EMPC controls this power, which is the sum of the charge and discharge powers (refer to Figure 9), subject to the SOC remaining between its minimum and maximum limits, as expressed in (21). The block diagram in Figure 9 clarifies the control objectives of this strategy. The controller takes as inputs the net load power, the renewable generated power, the tariff for importing power from the grid, the tariff for exporting power to the grid, and the tariff for using power from the battery storage; the measured SOC at time K is fed back. The control objective was to set the charge and discharge power at every K, which, in turn, optimized the economic cost of meeting residential demand under the time-of-use tariff.
It is clear from the objectives of the RBC and EMPC strategies that both manage the charge/discharge of the storage in a constrained manner, which, in turn, lowers power consumption through energy storage support. The major difference is that RBC does not consider the cost of electricity or the cost of battery lifecycle degradation when controlling charge and discharge. For comparison, both strategies were applied to systems of identical characteristics, operated independently with no grid supply, to store excess PV generation for later use at peak times. The results for the aggregated demand of 30 consumers show that EMPC achieved a greater reduction in peak demand than RBC during peak times (between 17:00 and 20:00). In contrast, the reduction under RBC is higher at off-peak times (between 0:00 and 7:00). This shows that EMPC predicted the peak demand and shifted it to off-peak times, when the energy storage is recharged. Therefore, EMPC was more successful than RBC in improving the 1-day load profile of the selected group of consumers.
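The tariff-driven behaviour attributed to EMPC above can be illustrated with a small planning sketch. It is not the formulation of [64]: instead of a continuous optimization it uses backward dynamic programming over integer energy levels with a lossless battery and 1 h steps, and every name and default value (`plan_battery_schedule`, `wear_cost`, and so on) is an assumption made for the example.

```python
def plan_battery_schedule(net_load, import_price, export_price,
                          capacity=2, initial_energy=0, max_step=1,
                          wear_cost=0.0):
    """Plan charge/discharge over a horizon by backward dynamic programming.

    Actions are integers in kWh per step: positive = discharge, negative =
    charge. Returns (schedule, total_cost). Illustrative assumptions only.
    """
    horizon = len(net_load)
    levels = range(capacity + 1)
    cost_to_go = [[0.0 for _ in levels] for _ in range(horizon + 1)]
    best_action = [[0 for _ in levels] for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for energy in levels:
            best = float("inf")
            for action in range(-max_step, max_step + 1):
                next_energy = energy - action
                if not 0 <= next_energy <= capacity:
                    continue  # would violate the energy (SOC) limits
                grid = net_load[t] - action  # > 0 import, < 0 export
                price = import_price[t] if grid >= 0 else export_price[t]
                total = (grid * price + wear_cost * abs(action)
                         + cost_to_go[t + 1][next_energy])
                if total < best:
                    best = total
                    best_action[t][energy] = action
            cost_to_go[t][energy] = best
    # Roll the optimal policy forward from the initial energy level.
    schedule, energy = [], initial_energy
    for t in range(horizon):
        action = best_action[t][energy]
        schedule.append(action)
        energy -= action
    return schedule, cost_to_go[0][initial_energy]


if __name__ == "__main__":
    load = [1, 1, 1, 1]                  # kWh per hour
    tariff = [0.1, 0.1, 0.5, 0.5]        # cheap hours first, then peak
    print(plan_battery_schedule(load, tariff, [0.0] * 4))
```

In a receding-horizon implementation only the first action of the schedule would be applied, after which the plan is recomputed with the newly measured SOC. A non-zero `wear_cost` penalizes cycling, which is one simple way to represent the battery degradation cost that RBC ignores.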

Centralized Tertiary Control of AC Microgrid
The major objective of tertiary centralized control is to provide optimal voltage references or offsets. Moreover, it manages the power flow into and out of the microgrid's predetermined network [24,25]. The AC optimal power flow (OPF) problem can be defined as a nonlinear, non-convex problem of optimizing generation dispatch so as to achieve the lowest cost acceptable to consumers, while accounting for the availability of active and reactive power [65]. The non-convexity adds complexity to the computation, and only approximate solutions are obtained; these are the major drawbacks of the microgrid tertiary level. The execution of OPF focuses on managing the power flow from the main grid to the microgrid, and vice versa. In addition, the optimal use of the available generation and storage units reduces power consumption. Tertiary control strategies in AC microgrids can be divided into four classes depending on the approximation used and the ESS power management.

Single/Aggregated Distributed ESSs
This class comprises dynamic optimal power flow (DOPF) solutions in which the distributed ESSs are represented as a single or aggregated capacity. The tertiary centralized strategy controls the power flow between the single/aggregated distributed ESSs and the main utility grid; it does not manage the power flow among the distributed storages themselves. The control strategy proposed in [66] is an effective application. In it, a tertiary energy management system (EMS) is applied to achieve balanced power within the microgrid network. The overall management comprises two parts: the first manages the power flow of each converter, and the second manages the power in the microgrid network under different generation and load conditions. As given in (22), the power management aims to balance the PV generation, the sum of the AC and DC load powers, and the battery power. The balance is therefore achieved when the power loss is reduced. Another application of single/aggregated tertiary central control of distributed ESSs was introduced in [67].
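The power balance described for Equation (22) can be written, as a hedged reconstruction, in the form below. The symbol names and the sign convention are assumptions (the original notation is not recoverable here): P_PV is the PV generation, P_L,AC and P_L,DC are the AC and DC load powers, P_bat is the battery power taken as positive when charging, and P_loss is the residual that the energy management tries to keep small.

```latex
\[
  P_{\mathrm{loss}} \;=\; P_{\mathrm{PV}} \;-\; \bigl( P_{L,\mathrm{AC}} + P_{L,\mathrm{DC}} \bigr) \;-\; P_{\mathrm{bat}}
\]
```

Read this way, surplus PV power that is neither consumed by the loads nor absorbed by the battery shows up as loss, and a PV shortfall must be covered by battery discharge (negative P_bat).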

Ideal Real Power Transfer
Tertiary DOPF solutions of this class consider the management of real power transfer between distributed ESSs. The energy management system proposed by A. Ouammi et al. [68] illustrates this clearly. A central controller and an energy management unit (EMU) were combined with a model predictive controller to manage the scheduling of power exchange among a group of interconnected smart microgrids. Several fundamental roles were outlined. The most important was supporting the autonomous operation of each microgrid through the management of its components, such as distributed generation control and charge/discharge scheduling, while also providing information on power production and prices to the model predictive controller that controls the power exchange. Another important role was interfacing the microgrid components to the central, or global, controller (GCC). Consequently, the microgrid exchanges power with other microgrids, or disconnects in the case of network failure, while the ESSs compensate for power shortages via charge/discharge, depending on the operating task.
F. Garcia-Torres et al. [69] proposed a tertiary central control as part of an optimized energy storage management system for two hybrid ESSs distributed in an AC-connected microgrid. The main aim of the MPC-based strategy was to address the lack of competitiveness in the electricity market caused by the unpredictability and deviations of renewable energy. The strategy takes advantage of the high storage density of hydrogen as one of the distributed energy storage units, and the MPC-based energy management system was designed to deliver greater economic benefits and to reduce the degradation of the distributed ESSs. Another application of tertiary control, aimed at minimizing the expected operating cost of the microgrid, was suggested in [70]. Here, stochastic dynamic programming was proposed as a solution to the optimal microgrid operation problem, which is determined by unit commitment (UC) and economic dispatch (ED). UC is performed over a horizon of one day to one week and accounts for start-up, shut-down, and operating costs. It is followed by ED, performed over a few minutes to one hour, for the economic online allocation of units, considering all system units and constraints.

Convex Approximation
Convex approximation, or convex optimization, of OPF means relaxing some constraints of the original problem to obtain a convex model. For networks with a high reactance-to-resistance ratio (X/R), this leads to the DC power flow approximation, under the assumptions of purely reactive line impedance and small voltage angle differences [71]. An advanced application of convex approximation to solve DOPF problems was presented in [72]. Here, a strategy was proposed as an EMS for single-phase or three-phase AC microgrids with distributed generation and storage units. Robust convex optimization was employed over a limited time horizon to minimize the cost of energy import, export, and DG dispatch, in addition to the operation of the ESSs. Moreover, it considers the self-discharge rate and SOC of the distributed energy storage. The developed EMS was assessed via Monte Carlo simulation, which verified that, in grid-connected mode, the power balance in the microgrid network is maintained by the main utility grid and the ESSs, which together make up the difference between local consumption and local production. K. Garifi et al. [73] proposed a convex relaxation that drops the constraints enforced on the charge and discharge of ESSs in a grid-connected microgrid network. The solution introduces an MPC-based DC OPF penalty approach: the cost function is modified to include a penalty function that makes the charge/discharge constraints unnecessary. Furthermore, the Kuhn-Tucker conditions were used to confirm that the convex relaxation satisfies the constraints. The proposed system was simulated on multiple IEEE test systems and achieved a reduced computation time compared with the previous approach with constrained ESSs.
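The two assumptions behind the DC power flow approximation mentioned above (high X/R ratio and small angle difference) can be checked numerically on a single line. The sketch compares the standard AC expression for the active power leaving bus i with the linear DC expression; the function names and the per-unit values are illustrative assumptions.

```python
import math


def ac_line_flow(v_i, v_j, theta_ij, r, x):
    """Active power leaving bus i on a series line r + jx (per unit)."""
    z_squared = r * r + x * x
    g = r / z_squared    # series conductance
    b = -x / z_squared   # series susceptance (negative for an inductive line)
    return v_i * v_i * g - v_i * v_j * (g * math.cos(theta_ij)
                                        + b * math.sin(theta_ij))


def dc_line_flow(theta_ij, x):
    """DC approximation: flat voltages, r neglected, sin(theta) ~ theta."""
    return theta_ij / x


if __name__ == "__main__":
    theta = 0.05  # rad, small angle difference
    exact = ac_line_flow(1.0, 1.0, theta, r=0.01, x=0.1)  # X/R = 10
    approx = dc_line_flow(theta, x=0.1)
    print(exact, approx, abs(exact - approx) / approx)
```

For this line the two values differ by less than one percent; lowering the X/R ratio or increasing the angle difference makes the linear model progressively less accurate, which is why the approximation is tied to those assumptions.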

Non-Convex Approximation
Non-convex strategies provide approximate solutions when the objective or any of the constraints is non-convex. They combine mixed-integer linear programming and nonlinear programming to solve the DOPF in a microgrid that includes ESSs within its predetermined network. Furthermore, unbalanced phases are handled by the nonlinear programming [74]. One solution, based on stochastic gradient descent optimization of parameters, was applied in [75] to the non-convex problem. The microgrid used for the experiment consisted of distributed ESSs, micro-CHP units as controllable DG, and uncontrollable DG. A developed version of a central EMS was proposed in [76] to optimize the power dispatch of distributed ESSs in an isolated microgrid. The EMS is formulated as a centralized mathematical programming problem with the help of MPC. Subject to generation and operational limits, it was proposed to manage generation, balance the power flow, provide the operating settings of the system, and balance the distributed ESSs. Moreover, it was proposed to support backup power during islanding. The decomposition of the mixed-integer nonlinear programming (MINLP) problem into a mixed-integer linear programming (MILP) problem and UC suggested that this solution might be superior to previously presented solutions, and simulation demonstrated a shorter computation time than other solutions [76]. D.E. Olivares et al. [77] extended the objectives of the previous strategy to include a stochastic mixed-integer programming formulation in addition to a second-stage OPF based on a nonlinear programming formulation. Both stages thus cooperate in addressing the uncertainty of the same isolated microgrid. Decisions are made via the proposed two-stage process, in which commitment is decided by the linear stochastic unit commitment (SUC), while the final dispatch is accomplished by the shrinking-horizon optimal power flow (SHOPF). Since the SUC is responsible for the commitments, it provides a fixed SOC boundary for the distributed ESSs.

Centralized Tertiary Control of DC Microgrid
Many solutions to the DC dynamic optimal power flow problem (DC-DOPF) in DC microgrids have been suggested in the literature. One distinctive solution is an MPC-based power flow management scheme proposed to solve the DOPF problem in a DC microgrid network [78]. The objective was to manage the power flow of the DG units based on renewable generation predictions. Moreover, the capacity for controlling the power flow of the distributed ESSs depends on their SOC. Another proposition was made by M. Gulin et al. [79], in which a stochastic optimization problem was formulated and solved through the design of a tertiary management system; specifically, a two-stage programming solution incorporating an MPC whose feedback mechanism compensates for uncertainty. One of the valuable achievements was a successful integration between the ESS and the grid, in addition to optimized energy management and minimized operating costs.
In a recent study by J. Zhang et al. [80], a centralized tertiary control acts within a hierarchical control approach in a DC microgrid supplied by distributed BESSs. The tertiary control evaluates current-sharing weights depending on the SOC of the batteries, while the secondary control includes a unit control error (UCE) that restores the DC voltage of the microgrid and achieves accurate load sharing among the batteries according to the weights provided by the tertiary level. The main aim of the developed strategy was to attain an optimized battery discharge management, which leads to a balanced sharing of the demand. Simulation of the strategy demonstrated its validity and effectiveness.
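One simple way to picture SOC-dependent current-sharing weights is to make each battery's share of the load current proportional to its SOC. The sketch below does exactly that; it is an assumed, minimal weighting rule for illustration and not the weight evaluation of [80].

```python
def soc_sharing_weights(socs):
    """Normalized weights proportional to SOC (assumed rule, sums to 1)."""
    total = sum(socs)
    if total == 0:
        # All batteries empty: fall back to equal weights.
        return [1.0 / len(socs)] * len(socs)
    return [soc / total for soc in socs]


def share_load_current(i_load, socs):
    """Split the load current among batteries according to the SOC weights."""
    return [weight * i_load for weight in soc_sharing_weights(socs)]


if __name__ == "__main__":
    # Two batteries at 80 % and 40 % SOC sharing a 30 A load.
    print(share_load_current(30.0, [0.8, 0.4]))
```

With this rule the fuller battery carries twice the current of the other one, so it discharges faster and the absolute SOC gap narrows over time; a secondary layer would then track these weighted current references while restoring the bus voltage.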
A summary of the reviewed centralized strategies is presented in Table 2, which explains the major strengths and weaknesses of each strategy.
Life and protection of batteries have not been mentioned.
[58] AC Balanced power-sharing of different resources depending on their actual power and energy rather than the rated capacity. Efficient fuel consumption to a limited extent.
Accomplished power-sharing of onboard sources independent of load conditions.
Degraded fuel efficiency under load variations.
[59] DC Balanced SOC within the mandatory levels.
The battery current has been kept within the predetermined limits. The injected power closely tracks the set points.
Better performance than the wind case method.
[60] AC The capability of dealing with solar forecasting error of up to 60% without breaking the dispatch rules. Better dispatch performance than basic control (delta-balance). Active, less complex computation, and easy practical application. Adaptable to other electricity markets.
Thresholds and HESS are required to be reevaluated.
[63] AC There was an overshoot of current when transferring from autonomous to grid-connected mode.
[67] AC Optimized economical operation compared to other existing strategies. An advantageous tradeoff between peak demands and consumers has been achieved through an acceptable restriction.
No consideration for heat recovery.
No demonstration for reactive power exploitation.
[68] Smart Future planning of the power exchange and the charge/discharge profile of each ESS among different microgrid networks has been predicted via an MPC-based algorithm.
Real-time communication failure with some grids of the system.
[69] AC Optimized operational scheduling (because of price variation with time). The effect of ESS towards a dispatchable generation. MPC has minimized the cost function.
Formulation complexity under the use of MPC.
[70] AC Economic dispatch has been achieved without the introduction of a further objective function. An optimized operation with day-ahead scheduling and high probability.
Due to the limitation of batteries' discharge depth, they barely discharge for only some hours at night. Electricity price has not been considered.
[72] AC More flexible and faster than stochastic. The applicability of offline solutions, since implementation has proved that execution times are consistent with expectations of mixed-integer programming.
Convenient for short-term energy management scenarios (this could be an advantage or disadvantage based on the purpose).
[73] DC Satisfied ESS charge/discharge in the situation of a taken relaxation. Reduction in computation time. Negligible influence of penalty time.
Unsatisfied charging/discharging in the case of non-simultaneous ESS. No participation of load in demand response.
[75] AC Improvement of performance. Uncomplicated implementation. Less computation time. Active in implementing large system solutions. Reduction of the cost compared to MPC.
Convergence was after 1000 iterations. However, stability value was 0.4 < 1, which improves the activity of large system solutions.
[76] AC Reduced computation time. The decomposition of the MINLP problem has enabled the solution to be solved within the desired time.
The quality of solutions needs to be improved. Specifically, the impact of primary controllers. More robustness against uncertainties is mandatory.
[77] AC Improved total operational cost. Qualified accounting of uncertainty in power predictions. Improved computation time.
[78] DC An optimized operation, because of the accomplished scalable solution. Good interaction between the proposed controller and the local controllers. Lower average solution time.
[79] DC Allows more flexibility, through an advantageous tradeoff between constraint violations and the achieved revenue. Minimized microgrid operation cost.
Penalizing any constraint with higher/lower prices for importing/exporting energy will not prevent constraints violation but keep it as small as possible (violation is still existing).
[80] DC Balanced SOC. Successful restoration of DC voltage. Accurate sharing of batteries current. System effectiveness has been accomplished.
Tested only for two batteries.

Distributed Control Strategies Based on Multiagent Communication of Controlling Distributed ESSs
Decentralized control strategies are incapable of exploiting the full capacity of distributed ESSs, since they depend only on local information. Centralized control strategies require an adequate infrastructure for maintaining communication between the distributed ESSs. Therefore, both have a weakness in optimizing the combined energy and power of the storage system. This results in an urgent need for strategies that combine decentralization with communication between units. Distributed multiagent systems have been developed for this purpose, as presented in Figure 10, which shows a multiagent neighbor-to-neighbor communication network applied to an AC microgrid consisting of five distributed ESSs, each representing an independent agent. These strategies fall into two main categories: secondary and tertiary.

Secondary Multiagent of Controlling Distributed ESSs
Under this category, each distributed ESS agent operates autonomously in the presence of neighbor-to-neighbor communication. Accordingly, the agents share information, such as SOC level, load current, output voltage, and power consumed, with the aim of meeting the load demand in a balanced way. This addresses the problem of cooperative consensus among distributed ESSs under multiagent neighbor-to-neighbor information sharing [81]. The distributed secondary multiagent approach was then developed further to include an optimal controller [82]: the classical theory was extended to a networked system through the design of a linear quadratic regulator based on an optimized control strategy at each node. S. Mondal et al. [83] recently proposed an application that highlights the impact of secondary multiagent control in the form of an integral consensus protocol, which synchronizes the combined energy and power of a distributed BESS over a multiagent neighbor-to-neighbor network. This strategy achieves an independent energy and power consensus that is unaffected by load variations and battery scenarios.
SOC balancing of distributed ESSs has been incorporated into distributed secondary multiagent control as one of the vital objectives for both AC and DC microgrids. There have been notable attempts in the literature to balance SOC based on distributed secondary control in AC microgrids [84,85]. The theory was clearly explained in [84]. Here, the SOC balance of distributed ESSs in an AC microgrid was achieved via a multiagent-based control algorithm designed for each agent. The average SOC of the neighboring ESSs at time K is received via multiagent communication. Then, the average SOC of the specific distributed ESS at the next time step, K + 1, is determined through dynamic average consensus. Furthermore, the frequency that enforces the balanced SOC is scheduled and applied to the primary control.
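The neighbor-to-neighbor averaging idea can be sketched with a plain discrete-time average consensus iteration on a ring of five agents, matching the five-ESS example network. This is the static form of the protocol and only an illustration: the dynamic average consensus of [84] additionally tracks changing local measurements, and the step size `epsilon`, the ring topology, and the function names are assumptions.

```python
def consensus_step(estimates, neighbors, epsilon=0.3):
    """One synchronous update: each agent moves towards its neighbors."""
    return [
        x_i + epsilon * sum(estimates[j] - x_i for j in neighbors[i])
        for i, x_i in enumerate(estimates)
    ]


def run_consensus(initial_socs, neighbors, steps=60, epsilon=0.3):
    """Iterate the update; every agent uses only its neighbors' values."""
    estimates = list(initial_socs)
    for _ in range(steps):
        estimates = consensus_step(estimates, neighbors, epsilon)
    return estimates


# Ring of five agents: agent i talks to agents i-1 and i+1 only.
RING_OF_FIVE = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}

if __name__ == "__main__":
    socs = [0.9, 0.7, 0.5, 0.3, 0.6]
    print(run_consensus(socs, RING_OF_FIVE))
```

Because the communication graph is undirected, the sum of the estimates is preserved at every step, so all agents converge to the network-wide average SOC without any central node. The step size must stay below the reciprocal of the largest number of neighbors (0.5 for this ring) for the iteration to converge.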
The success of this development lies in the active role of the dynamic consensus, which is illustrated in Figure 11 and builds on the proposed multiagent communication. The SOC produced by the consensus is compared with the measured state of charge to obtain a balanced SOC. The balanced SOC is then compared with the nominal frequency to schedule the frequency that enforces the required SOC balance. Finally, the scheduled frequency reference is applied to the primary control, together with a voltage reference, to generate the PWM control signal. Simulation of the developed secondary, multiagent-based distributed frequency scheduling showed valuable features: robustness against communication failure and the capacity for expansion. Furthermore, any of the distributed ESSs was able to join at any point of the operation. C. Yu et al. [86] recently suggested an application of the theory to distributed BESSs in an islanded AC microgrid with multiagent communication. A control algorithm was designed with the aim of restoring frequency in addition to balancing SOC. The steady-state frequency was maintained at its nominal value by compensating for the power difference in the microgrid system. Beyond frequency and SOC, the simulation showed a further gain: the developed event-triggered method synchronizes faster than the conventional one.
DC microgrid networks have also been an application field of secondary multiagent control for balancing the SOC of distributed ESSs. An innovative application is demonstrated in [87]. Here, a distributed multiagent secondary control was applied to a DC-connected microgrid with distributed ESSs, and SOC balancing was one of the valuable objectives of the system. The key aspects of this strategy focused on two main tasks: (1) the distributed secondary control generates a voltage control action and an average energy control action, which are added to the droop calculation to create the reference voltage used to balance the output voltage of the agent with the connected DC bus (see Equation (23)); (2) the developed control system is applied to the AC/DC grid rectifier to manage the power flow and the operating modes of the DC microgrid so as to provide a balanced energy level (balanced SOC) for the distributed ESSs. The grid rectifier receives information from neighboring ESSs regarding the voltage and energy situation. Therefore, the need for a central controller or control mechanism to handle the transition from one mode to another is eliminated.
SOC balancing of distributed heterogeneous ESSs in DC microgrids has also been addressed by distributed secondary multiagent-based control. The strategy in [88] was one such proposition, in which multiagent-based energy coordination control was applied to control a hybrid microgrid consisting of BESSs and ultracapacitors with no need for a central controller. Distributing heterogeneous storages with complementary benefits over several levels helped to achieve an enhanced control optimization and allowed more control objectives to be met. The developed strategy followed a pattern of four control scenarios: the microgrid bus voltage was maintained by leader ultracapacitors, while the ultracapacitor voltage was maintained by leader batteries. The other ultracapacitors were followers and were responsible for meeting urgent local load demands. The main objective of the follower batteries, on the other hand, was to balance the SOC. Among the many objectives achieved by the strategy, the main ones were to balance the microgrid power and to maintain the SOC balance.
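For the voltage reference described above in connection with Equation (23), a plausible form is a droop law shifted by the two secondary control actions. The expression below is a hedged reconstruction under assumed notation, not the equation of [87]: v_nom is the nominal bus voltage, R_d the droop (virtual) resistance, i_i the output current of agent i, and the two delta terms are the voltage control action and the average energy control action produced by the distributed secondary control.

```latex
\[
  v_i^{*} \;=\; v_{\mathrm{nom}} \;-\; R_d\, i_i \;+\; \delta v_i \;+\; \delta e_i
\]
```

In this reading the droop term provides decentralized current sharing, the voltage action removes the steady-state bus voltage deviation that droop introduces, and the energy action biases each unit's output so that the SOC values converge.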
A distributed secondary multiagent strategy was developed as a solution to a limitation of the linear consensus protocol of distributed BESSs in a microgrid. The limitation occurred in previously proposed strategies for balancing the dynamic energy level of the distributed BESSs [87][88][89]: an undesired tradeoff between dynamic energy balancing and the equilibrium of SOC caused circulating current between the distributed BESSs. T. Morstyn et al. [90] designed a strategy to overcome this limitation of the linear consensus protocol while balancing the SOC. To this end, sliding mode control was integrated into a secondary multiagent-based control of distributed BESSs in a DC microgrid. The resulting sliding mode control action, as given in (24), succeeded in controlling the level of participation of the distributed BESSs in droop control for both charging and discharging, based on information from multiagent communication regarding the average SOC of the neighboring storages, the measured SOC, and the measured per-unit participation current of each storage.
The initial implementation of the theory achieved a balanced SOC but revealed two defects: chattering, caused by the many rapid switches of the sliding mode control needed to keep the SOC of each distributed BESS equal to the average SOC of its neighboring BESSs, and overloading of some participating BESSs with a higher storage level, caused by the wide range of participation current levels. To overcome these weaknesses, an updated sliding mode surface was introduced (see Equation (25)), as was a new maximum per-unit current limit. This limit was determined by dividing the maximum discharge current of the distributed BESS by the maximum battery capacity. The new sliding mode control prioritized solving the overloading problem over guaranteeing accurate SOC synchronization, in order to reduce chattering.
The proposed strategy gained some features over the conventional strategy, including the elimination of circulating current between the participating distributed BESSs and plug-and-play capability.
A more recent development of secondary distributed multiagent-based control was applied to introduce time-oriented SOC balancing in [91]. The idea behind the developed consensus protocol was to achieve the required SOC balance through time management of the charging/discharging modes of the distributed BESSs, as shown in (26) and (27).
The strategy proposed by J. Almada et al. [92] is more recent; in it, a secondary multiagent-based control strategy was designed to operate a microgrid in both grid-connected and standalone modes in order to optimize the overall system performance. The control strategy consisted of a modified droop controller at the primary level to accurately share reactive power, and a secondary centralized multiagent-based controller. The success of the designed approach came from the adoption of an intelligent agent, which is autonomous and can decide, detect, and operate in the given environment with high responsibility. Therefore, it can cooperatively solve complex and distributed problems with other intelligent agents. The system was tested, and the results showed an optimized balance of power and system stability.

Tertiary Cooperative Multiagent Based Strategies of Distributed ESSs
The main objective of tertiary cooperative multiagent control in a microgrid is typically to attain DOPF of distributed ESSs. Despite this common objective, the controls differ according to their specific control objective; some of them regulate the microgrid parameters, while others track these parameters. Strategies that accomplish economic optimization are generally preferred. These economic strategies are classified, based on multiagent communication architecture, into three categories.

Hierarchical Tertiary Multiagent Strategies of Implementing DOPF of Distributed ESSs
The DOPF solutions of these strategies are achieved via the collaboration of a central controller with autonomous distributed generation and storage agents. Each of these agents works independently with its local controller and under specific constraints, while full information on the power topology is provided by the central controller. K. Worthmann et al. [93] proposed a strategy that illustrates the concept of distributed centralized tertiary control: an MPC-based market maker strategy acts alongside three other control levels to flatten the aggregated power consumption, and communication is available between each distributed agent and the central economic optimization manager.
The agent at each node exchanges information with a market manager controller at each time k, for a price sequence of length N. The information relates to the price to buy power from the main grid (see Equation (29)), the price to sell power to the main grid (see Equation (30)), the power supplied by the main grid at time k (refer to Equation (30)), and the power injected into the main grid at time k (refer to Equation (31)). The objective was to manage the cost. Under this cost management scenario, the selling and buying electricity prices were increased when demand exceeded the predicted average, and vice versa.
A more recent effective implementation of the hierarchy-based tertiary multiagent distributed control was proposed in [94] to control an AC microgrid network with distributed energy resources, distributed ESSs, and loads. Three hierarchical control levels were used to achieve an optimized distributed power system. Tertiary control, in partnership with all distributed agents, was responsible for solving the OPF problem. As demonstrated in (32), the mathematical formulation of the AC OPF problem as a function of the tertiary control variable ($x$) was intended to produce economic generation, with convex reduction applied to the power flow constraints ($h$) and the generation limit constraints ($g$) (see Equation (33)). Implementation of the developed control strategy demonstrated optimized scalability in solving AC OPF based on multiagent communication.
$$\min_{x} f(x) \qquad (32)$$
$$\text{s.t.} \quad h(x) = 0, \quad g(x) \le 0 \qquad (33)$$
The exact diffusion strategy has been one of the most recently developed strategies for implementing an optimized economic dispatch of distributed agents in a microgrid that consists of distributed generation, storage, and loads [95]. Tertiary centralized control was the highest level of the proposed hierarchical control and acted as a power distribution optimizer, rather than a central controller, in accomplishing the economic operation of the microgrid. A microgrid global central controller (MGCC) agent transmits schedules to the distributed agents to optimize their power dispatch. Additionally, it uses an optimized consensus algorithm for quicker convergence and increased stability and expandability.

Topology-Based Multiagent DOPF Solutions
Topology-based solutions consider sparse multiagent communication between the distributed agents that reflects the power network topology of a predetermined distributed microgrid network. Each distributed agent has a bidirectional information exchange with all its neighbors. A comprehensive application of the theory is the decentralized control of a distributed multi-smart-microgrid power network proposed in [96]. The idea behind the development was to take advantage of the distributed ESS agent at each smart microgrid network in order to meet the demand internally. Furthermore, the network exchanges power locally with neighboring smart grids and the main utility grid.
The main objective of the proposed control strategy in [96] was to accomplish distributed cooperative control for any of the distributed smart grids according to the topology-based multiagent communication shown in Figure 12. Each smart microgrid (SMG) was considered an agent and communicated with neighbor agents through a power link to optimize its power exchange. Neighbor agents exchanged information regarding their current and expected power availability. Effective computation for many microgrid systems was one of the signs that all decentralization features had been achieved. W. Kang et al. [97] proposed a strategy with a topology-based multiagent communication layer of distributed BESSs, DGs, and loads in a microgrid. A systematic method was designed that uses multiagent information to accomplish SOC and reactive power balancing.

Fully Distributed Tertiary Multiagent DOPF Solutions
Fully distributed DOPF solutions are based on topology-free communication, in which only communication between close neighbor agents is mandatory. The requirement is therefore met if each distributed agent has bidirectional communication with at least one neighbor. The strategy proposed in [98] illustrates the application of the fully distributed solution and its effectiveness in coordinating distributed energy units. The energy management employed a consensus + innovations method to coordinate all energy units of the microgrid network. Each of these units, including storage systems, acted as an agent connected to a specific node. Therefore, fully distributed sparse multiagent communication was implemented. Furthermore, the agents exchanged cost and load demand information with their neighbors in order to ensure that the bulk energy of the microgrid was sufficient for the load demand. The optimized operation of the distributed ESSs and the inclusion of ramp rate constraints were behind the successful solution to the DOPF problem. T. Morstyn et al. [99] applied a fully distributed DOPF to a microgrid network that included distributed ESSs, with the aim of achieving a scalable solution that accommodates the increase in distributed ESSs in future power networks. The work also eliminated the requirement for a central controller. The development in [99] divided the DOPF problem among the distributed agents, each of which solved its part based on local information provided by the autonomous agent. Thus, enhanced flexibility was achieved, in addition to more robustness.
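To make the consensus + innovations idea concrete, the following minimal sketch shows one common form of the update for a simplified economic dispatch. It is an illustration under stated assumptions rather than the method of [98]: quadratic generation costs, constant gains `alpha` and `beta`, no ramp-rate or storage constraints, and a hypothetical three-agent line network with invented cost and demand values.

```python
import numpy as np

def consensus_innovations(a, b, demand, adjacency,
                          alpha=0.01, beta=0.2, iterations=2000):
    """Simplified consensus + innovations price update.

    Agent i has assumed cost a_i * P^2 + b_i * P and a local demand d_i.
    Each agent updates a local price estimate from (1) disagreement with
    neighbor prices (consensus) and (2) its local power surplus (innovation).
    """
    a, b, demand = (np.asarray(v, dtype=float) for v in (a, b, demand))
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    price = b.copy()  # initial local price estimates
    for _ in range(iterations):
        power = (price - b) / (2.0 * a)          # local cost-optimal output
        consensus = degree * price - A @ price   # sum_j a_ij (price_i - price_j)
        innovation = power - demand              # local generation minus load
        price = price - beta * consensus - alpha * innovation
    power = (price - b) / (2.0 * a)
    return price, power

# Hypothetical three-agent line network (all numbers are assumptions).
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
cost_a = [0.05, 0.08, 0.10]
cost_b = [2.0, 1.5, 1.0]
local_demand = [30.0, 20.0, 10.0]
prices, outputs = consensus_innovations(cost_a, cost_b, local_demand, adjacency)
```

With constant gains, the iteration settles where total generation matches total demand, but a small disagreement between local prices remains; diminishing gain sequences are typically used when exact price agreement is required.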

Tertiary Competitive Multiagent Solutions
In cooperative multiagent control, distributed ESSs are involved in implementing DOPF optimization. With this objective alone, however, it can be difficult to implement further specific roles, such as the independent sale of energy and the increase in overall microgrid profit. To illustrate the theory, a complete description of market-based microgrid networks with competitive ESS agents was presented in [100]. A multiagent communication-based competitive game theory was employed for an AC microgrid of renewable energy distributed agents. The distributed agent was committed to hour-ahead information of the market for a whole day. Figure 13 [101] shows how the agent was updated with the environment through a multistage platform, which enabled it to perceive the environment through the sensors and make decisions. These decisions were sent to the actuators. One of the significant advantages of ESSs here is that the price of energy was inversely related to the SOC, so the price is low whenever the SOC is high. Multi-microgrid multi-consumer systems are of fundamental importance; they have therefore been a field of competitive multiagent application, and a great deal of work has been directed towards managing energy distribution for such systems [102]. To this end, a multilevel Stackelberg game solution was established in which the multi-microgrids act as leaders that decide the required level of generation. Furthermore, support from central energy management was available for an optimum energy tariff and to earn more profit, in addition to the participation of consumers, or followers, in deciding the optimal consumption. Therefore, ESSs at the follower agents decided the optimal demand, which in turn resulted in more profit for the specific smart grid [102].

Combined Cooperative Competitive Multiagent Solutions
A combination of cooperative and competitive solutions can be used to attain more intelligent power distribution management in smart grids, for example by providing multilevel energy trading and marketing, that is, consumer-level trading in addition to marketing of the individual model as a whole. One example is the multi-objective power management solution that was recently proposed in [103]. The new idea behind the development was to model the power management problem so that the distributed agents were involved in a bargaining game. This was attained by introducing a Nash bargaining solution. Furthermore, the implemented agent decision-based computation eliminated the need for a central controller. The use of the Nash bargaining solution for the power management problem was extended to a multi-microgrid power distribution network in order to obtain a cooperative, agent-based, Pareto-optimal treatment of power management [104]. Furthermore, a utility supplier is the common factor that couples all the agent microgrids supporting the power exchange of a multi-microgrid network, and it represents the main market for accomplishing the necessary cost reduction.
The multilevel energy market demonstrated success in the multiagent distributed power system in [105]. The operation of the smart multi-microgrids was enhanced through hierarchical, three-level marketing propositions. Here, the double-auction, day-ahead marketing mechanism was at the first level, while the other two levels were hour-ahead and real-time marketing. The concept of the hierarchical multilevel solution was to accomplish a system of multiple decision-makers in which the upper-level decision-makers are leaders, while the lower levels are followers. The multiagent-based communication, which uses the data distribution service (DDS) with real-time publish-subscribe (RTPS), provided fast, reliable, and scalable communication. Furthermore, microgrids within power systems were capable of increasing system flexibility and accomplishing economic operations. Table 3 summarizes the reviewed multiagent strategies, including the major strengths and weaknesses of each.

Strategy Application Strengths Weaknesses
[83] 2nd order multiagent system Synchronization of energy and power levels of the battery during the charging/discharging mode. Achieved energy and power fixed time consensus regardless of charging/discharging. Robustness.
Neglect of the dynamics of the system, inner loops, control of secondary frequency, and control of power, to simplify the design.
A negative influence on frequency control if the neighbors' tracking error changes suddenly.
[86] AC microgrid/multiagent system Balanced SOC. Active restoration of frequency to the nominal value at steady state.
Short oscillation of the controller before steady state.
[87] DC microgrid/multiagent system Robust, extensible, and flexible. Regulated DC voltage in all operational modes. Accurate demand implementation.
Intermittency of renewable generation.
Communication delay.
[88] DC microgrid/multiagent system Decentralized coordination of heterogeneous storage systems. The high-frequency load was supplied by the ultracapacitor, while the low-frequency load was supplied by the battery.
The minimum voltage achieved (367.5 V) is still above the minimum microgrid rating (360 V).
[89] DC microgrid/multiagent system Balanced SOC. Regulated bus voltage to a specific range.
Less communication dependency.
Communication failure (the multiagent communication topology is changeable).
[90] DC microgrid/multiagent system Balanced SOC. No circulating current between the participated BESSs. No overloading of any of the participated BESS and plug and play.
Despite balanced SOC, the overloading problem has been prioritized over the accuracy of SOC synchronization, to reduce sliding mode chattering.
[91] DC Microgrid/multiagent system Elimination of average voltage deviation based on nominal voltage. Balanced SOC at the limits. Smooth transfer from charge to discharge and vice versa.
A difference in node voltages exists, but the average voltage of the microgrid is balanced. SOC is only balanced when it reaches the limits (90% upper, 20% lower).
Robust against load fluctuations and different ESS capacities.
[92] Multiagent-based microgrid network Active for both connected and standalone modes. More realistic environment for microgrid control and management. Optimized power flow.
Frequency deviation when switching from one operating mode to the other.
The optimized performance level provided by MPC does not last, because the optimization problem grows quickly as the network expands.
Current sharing cannot be implemented, as the failed link will isolate the agent.
[95] Multiagent-based microgrid network Faster convergence. Economic and stable operation. Robust to the changes in communication topology. Plug and play of distributed systems. Guarantee optimal operation of resources with economic optimization.
Transmission losses are not considered.
[96] Smart Microgrids (SMGN) Effective computation, even if the number of Smart grids is increased. Individual cooperation over Smart grid operation (no need for information for other Smart grids status). Scalable and robust.
An increase in PLAs iterations as the number of involved smart grids increases; thus, computation time increases.
[97] Multiagent-based microgrid network
[99] Multiagent-based microgrid network Expandable to many distributed ESSs networks (network topology was not an effective parameter of control). More flexibility and enhanced robustness.
Reduction in power quality due to the VSC references. Slight increase in power consumption, compared to central (0.13%).
[100] Multiagent based grid connected Microgrid Compatible with industry applications. The possibility of large-scale implementations for the proposed microgrid has been proven.
A voltage drop at PCC is recognizable but small.
[102] Multi-microgrid multi-consumer network Enhancement of energy adequacy. A delay in energy supply.
Economical operation of each microgrid of the network.
The multi-objective power management (MOPM) problem has been solved for a grid connected and islanded microgrids.
Limitation of exported power to 4.2 MW, lower than the congestion limit of 7.5 MW.
[104] Multiagent-based multi-microgrid network Lower computational load per iteration. Suitable for wide-scale applications in real time.
[105] Multiagent multi-microgrid network Minimized implementation time. Capable of using all of the agents' capacity to minimize cost, in addition to raising system flexibility for deregulated fields. Less dependence on utility.
A minor real-time mismatch.

Intelligent Control-Based Reinforcement Learning
One of the influencing factors that enhances system reliability is the integration of renewable resources with ESSs, so that excess renewable generation can be stored. Multiagent communication is the gateway towards decentralization and the accomplishment of this integration. Above all, the main objective of decentralization is the transition towards smart, decentralized microgrid networks. Reinforcement learning (RL) is one of the leading approaches for intelligent power distribution management, especially with the trend towards a clean and economic environment and with the increase in electric vehicles (EVs). The aim of the RL agent, as illustrated in Figure 14, is to maximize the total reward through sequential interaction with an environment that includes power distribution management [106]. The best action is learned for every state through the design of a suitable reward [107]. As given in (34), the state value function expresses that when a state ($s$) is visited, an action ($a$) is taken to move to the next state ($s'$) while receiving an immediate reward ($r$). Furthermore, future returns are weighted in the current state by a discount factor ($\gamma \in [0, 1]$). Here, $P$ is the transition probability from the current state to the next state [107].
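For reference, a common textbook form of the state value (Bellman optimality) equation is given below. It is offered as an orientation aid under the usual assumptions of a Markov decision process; its notation ($V$, $s$, $a$, $s'$, $r$, $\gamma$, $P$) is assumed here and may differ from that of Equation (34) in the source.

```latex
V(s) = \max_{a} \Big[ r(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Big],
\qquad \gamma \in [0, 1]
```

A discount factor close to 0 makes the agent favor immediate rewards, whereas a value close to 1 gives future returns nearly the same weight as immediate ones.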

Balance of Exploration and Exploitation
The aim of a learning scenario is to explore states and exploit rewards in order to decide the RL action. Both exploration and exploitation need to be balanced to avoid becoming stuck in local optima. Balancing exploitation and exploration is not an easy task because of the experience needed to optimize the actions and the large amount of data that must be handled. The agent therefore seeks actions that yield as much reward as possible; the action that delivers the maximum estimated reward is named the greedy action. Solution policies for the balance problem between exploration and exploitation were presented in [106].

ε-Greedy Policy
A model-free, ε-greedy reinforcement learning method was proposed as a lower-level control for managing the energy of the battery pack storage and two driving motors of a hybrid tracked electric vehicle system [108]. It succeeded, with its Q-learning algorithm optimizing control based on online computation of the transition probability matrix (TPM). ε-greedy policy-based Q-learning has also been put forward as a solution to the difficulty of obtaining EV mobility and charge/discharge profiles, which are needed by the mobility-aware control algorithm (MACA) that was proposed to optimize charge/discharge scenarios [109]. Since an EV in a Vehicle-to-Grid (V2G) system can consume power when charging and supply power when discharging, it can represent an autonomous microgrid with a storage unit. Z. Tan et al. [110] used a model-free, ε-greedy Q-learning solution as the non-convex top layer of a fast-learning optimizer in order to implement real-time optimal energy management (OEM) of a connected microgrid. The proposed strategy combined classical control with intelligent model-free reinforcement learning, which in turn enhanced both the speed and the quality of the optimization.
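The action selection rule behind these strategies is simple to state in code. The sketch below is a generic illustration, not taken from any of the cited works; the function name `epsilon_greedy` and the example Q-values are assumptions.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit.

    q_values: estimated values of each action in the current state.
    Returns the index of the selected action.
    """
    if rng.random() < epsilon:
        # Exploration: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploitation: pick the greedy (highest-valued) action.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical Q-values for three battery actions: discharge, idle, charge.
example_q = [0.2, 1.5, 0.7]
chosen = epsilon_greedy(example_q, epsilon=0.1, rng=random.Random(0))
```

A larger epsilon gives more exploration early in training; it is commonly reduced over time so that the agent increasingly exploits what it has learned.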

Q-Learning
Q-learning is a model-free reinforcement learning method for learning the policy of the decision-maker. It measures how successful the decision to take an action ($a$) is in the current state ($s$) (see Equation (35)). Therefore, the value function of the desired action can be represented as the quality of taking that action [106]. Q-learning has been involved in microgrid power management with the aim of achieving fast and high-quality optimization. The three proposed strategies [108][109][110] presented in the previous section all followed a Q-learning method in their model-free reinforcement learning. The introduction of Q-learning for optimizing power flow at an EV charging station highlighted its improvement over classical, programming-based optimization [111]. This is due to the capacity of RL to compute and save solutions offline. Model-free RL has been recommended to enhance energy consumption scheduling by gaining more information about power consumers and suppliers. The strategy highlighted the impact of involving RL in multiagent-based energy management through the acquisition of information about the agents (consumers and suppliers).
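As an illustration of how a Q-table is learned, the following sketch applies the standard tabular Q-learning update to a deliberately tiny battery arbitrage problem. Everything about the environment is an assumption made for the example (two alternating tariff levels, a three-level battery, uniform random exploration); it is not the setup of any cited work.

```python
import random

PRICES = (1.0, 3.0)   # assumed cheap and expensive tariff, alternating each step
MAX_SOC = 2           # battery holds 0, 1, or 2 energy units
ACTIONS = (-1, 0, 1)  # discharge, idle, charge

def step(soc, hour, action):
    """Toy environment: returns (next_soc, next_hour, reward)."""
    new_soc = soc + action
    if new_soc < 0 or new_soc > MAX_SOC:
        new_soc, action = soc, 0          # infeasible actions act as idle
    reward = -PRICES[hour] * action       # pay to charge, earn when discharging
    return new_soc, 1 - hour, reward

def train(steps=50000, alpha=0.2, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, h): [0.0] * len(ACTIONS)
         for s in range(MAX_SOC + 1) for h in (0, 1)}
    soc, hour = 0, 0
    for _ in range(steps):
        a = rng.randrange(len(ACTIONS))   # uniform exploration (off-policy)
        new_soc, new_hour, reward = step(soc, hour, ACTIONS[a])
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        target = reward + gamma * max(q[(new_soc, new_hour)])
        q[(soc, hour)][a] += alpha * (target - q[(soc, hour)][a])
        soc, hour = new_soc, new_hour
    return q

def greedy_action(q, soc, hour):
    values = q[(soc, hour)]
    return ACTIONS[values.index(max(values))]

q_table = train()
```

In this toy setting, the learned greedy policy charges an empty battery when the tariff is cheap and discharges a stored unit when the tariff is expensive, which is the arbitrage behavior one would expect.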

Batch Reinforcement Learning
Despite the wide range of applications of model-free learning, it is still not robust enough for some policies and is limited in its use of data. Therefore, there is a need for a more efficient RL methodology. Batch reinforcement learning can provide more efficient and stable solutions by having full knowledge of the experiences in the environment prior to an update, as illustrated in Figure 15, which shows that batches of experience are saved and applied before the action is taken; this differs from Q-learning, which updates Q-values at the time of the action [112]. The application of batch learning to scheduling power management in a microgrid, specifically the power flow of ESSs, was highlighted in [113]. A combination of Q-learning and batch learning was proposed to optimize battery operation. The RL agent decided the optimized operation of the battery based on the storage SOC, the load demand, the inverter efficiency, and the PV generation. A developed batch Q-learning was proposed by G. Shi et al. [114] to manage the energy of an eco-based microgrid network that consisted of an office as the demand, photovoltaic generation as the renewable supply, and a battery storage unit. The system used full knowledge of the optimized performance over a period of time to prepare for real-time electricity rates and demand, which, in turn, accomplished the objective of the developed Echo-RL-based strategy: optimizing the charge/discharge of the battery to reduce the total cost.

Deep Q-Learning
Deep Q-learning is a combined solution of supervised and reinforcement learning, which combines deep learning and batch-based Q-learning [115]. Figure 16 shows that deep Q-learning comprises two neural networks. One network estimates the current Q-value, while the other uses an older estimate to compute the next, or target, Q-value.
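The role of the two networks can be illustrated by how the training target is formed. The sketch below omits the networks themselves and works directly on their outputs; it is a generic illustration of the usual deep Q-learning target and loss, with assumed function names and invented example numbers.

```python
import numpy as np

def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Targets r + gamma * max_a' Q_target(s', a') for a batch of transitions.

    next_q_target: outputs of the (older) target network for the next states,
    shape (batch, n_actions). Terminal transitions get no bootstrap term.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q_target = np.asarray(next_q_target, dtype=float)
    not_done = 1.0 - np.asarray(dones, dtype=float)
    return rewards + gamma * not_done * next_q_target.max(axis=1)

def td_loss(q_online, actions, targets):
    """Mean squared error between the online network's Q(s, a) and the targets."""
    q_online = np.asarray(q_online, dtype=float)
    chosen = q_online[np.arange(len(actions)), actions]
    return float(np.mean((chosen - targets) ** 2))

# Hypothetical batch of two transitions with two actions each.
batch_targets = dqn_targets(rewards=[1.0, 0.0],
                            next_q_target=[[0.5, 2.0], [1.0, 0.0]],
                            dones=[0, 1],
                            gamma=0.9)
batch_loss = td_loss([[2.0, 0.0], [0.5, 5.0]], [0, 0], batch_targets)
```

In practice the online network is trained to reduce this loss, and the target network is refreshed from the online network only periodically, which tends to stabilize learning.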
Deep Q-learning was applied to energy management in a microgrid consisting of DGs, BESSs, and PV in [116], as a solution to the uncertainty in renewable generation, demand, and their prices. The development was based on a Markov decision process formulation for scheduling the microgrid operation in real time. RL-based deep Q-learning was introduced to solve the Markov decision process [116]. The action was then approximated via the proposed deep Q-learning and the designed deep feedforward neural network. Implementation showed that deep Q-learning-based scheduling predicted the uncertainty of operation with no explicit model, unlike traditional RL that requires a specific model. The application of deep Q-learning in a multi-microgrid smart grid was proposed by X. Lu et al. [117] to balance supply with the load demand. Here, deep Q-learning serves to achieve an energy trading policy via the intelligent prediction of renewable generation, future demand, and the level of storage in the battery. Simulation verified the system's success in managing the mismatch between generation and demand, which reduced plant scheduling by 12% and raised the microgrid renewable generation utility by 22.3%. A new deep RL control approach was recently suggested by L. Desportes et al. [118] for a power distribution network consisting of a hybrid ESS of lead battery and hydrogen storage, a photovoltaic system as the renewable resource, and a consumer, represented by a partially islanded building. The main aim of the designed approach was to accomplish a 35% long-term renewable supply for the building and to reduce the emissions impact of fuel-based generation. To achieve this goal, a control strategy based on a new deep deterministic policy gradient ($\alpha$) algorithm was suggested. In particular, the problem was reformulated to reduce the components of the action to a single component ($\alpha$), with the state of the hydrogen storage entering the formulation.
Simulation showed that the newly suggested strategy learned the policy ($\pi : S \rightarrow A$). Additionally, the main goal of reducing carbon impact was achieved when the efficiency of the hydrogen storage was adequately large. The smart deep RL-based strategy proposed in [119] was the most recent successful attempt to control a complex hybrid electrical and thermal storage system fed by the PV system of a residential building. The main aim of the new strategy was to reduce the energy required for heating, cooling, and providing hot water. The developed RL-based strategy demonstrated success in dealing with the complexity of the thermal system. The new strategy was compared with rule-based control and demonstrated better system management as well as significant cost and energy savings.

Actor-Critic Algorithms
The actor-critic algorithm combines two deep neural networks, an actor (policy) network and a critic (value) network, in order to maximize the total reward. They operate cooperatively: the actor network delivers an action for a state from the environment, while the critic network observes the state and reward that result from the actor's action, and its evaluation of that action is fed back to the actor [120]. As an application of actor-critic to stored power management, a deep deterministic policy gradient (DDPG) algorithm has been employed within a battery power management system in a microgrid to minimize consumption cost and stabilize the battery SOC [121]. Extensive training was conducted with DDPG, with measures to avoid over-fitting, to optimize battery power flow for consumers with different preferences. Results showed an increase in profit of 55% and a decrease in SOC instability of 67.5%. A more recent intelligent application of the actor-critic deterministic deep learning policy was designed by L. Yu et al. [122] to manage the energy scheduling of energy storage systems, in addition to other requirements, for a smart home. The challenges of uncertainty in renewable generation, non-shiftable high demands, outside temperature, and consumption tariffs encouraged the authors to design intelligent power management. A deterministic deep learning-based system was then designed, which demonstrated effectiveness and robustness in accomplishing the desired energy management.
In a more recent study, A. Joshi et al. [123] proposed a new actor-critic RL-based method named polynomial deterministic policy gradient (PDPG) to control a residential household fed by a photovoltaic-battery renewable system. The objectives were to reduce consumption cost, enhance battery scheduling, and strengthen the role of consumers in the management policy. The proposed design is a model-free Q-learning method capable of accounting for continuous actions and learning a deterministic policy by introducing an actor-critic that depends on a deterministic policy gradient. Implementation of the policy showed progress over state-of-the-art designs in terms of reduced computational time and electricity cost. Table 4 summarizes the major strengths and weaknesses of the reviewed intelligent strategies.
An increase in battery usage rate.
[114] Office distribution network Optimal battery charge/discharge. Significant reduction in cost.
An existing error of modeling includes renewable energy errors, errors of energy demand, and the error of electricity ratings.
[116] Microgrid network The capability of predicting uncertainty without a specific model. Economical operation.
A side effect exists in reducing batch size, which is that the full advantage of samples was not taken and leads to an ineffective Q-network. This reduces the performance.
[117] Smart grid consisting of N microgrids Convergence has been achieved. Effective strategy (decrease power plant introduction and increase microgrids utility). Less computation time.
A tradeoff between computational complexity and power plant scheduling.
[118] Distribution network (storage, PV, and building) Successful monitoring of hybrid storage. Long-term minimization of carbon impact at least 35% (minimum is one year). Active with storage models comprising a non-linearity.
Cannot assess a day or an hour because of similar or near consumptions; for example, cannot distinguish between 1 am and 4 am at night because these times are showing the same consumption. Or between cloudy summer and clear winter because of the similar consumption.
[119] Residential building A successful deal with system complexity. Better system management. Significant cost reduction and energy saving.
Lack of accuracy. More identification of project feasibility is needed.
Only daily consideration to the reduction in cost and improvement of SOC. Furthermore, they were limited.
User favoriting is not considered. Power losses are existing.
[122] Smart home Effective and robust. Increase and stabile average reward.
Reduction of energy cost. Reduced temperature deviation. An effective tradeoff between thermal comfort and cost of energy.
A variation of performance for the same system parameters.
Active battery scheduling. Less computational tame compared to the existing strategies. Reduced electricity cost. Enhanced role of the consumer in the management policy.
Limited to one consumer (as consumer role is enhanced, it is advantageous in reducing electricity cost to achieve the decision of several consumers).

Emerging Reinforcement Learning Techniques for Power Management in Micro and Smart Grids
Reinforcement learning techniques have become a smart solution to many problems that were previously difficult to solve, and they have added further intelligence to existing solutions. However, traditional techniques are not always sufficient; therefore, emerging reinforcement learning techniques, which are developed versions of the traditional ones, have been introduced to solve power management issues in complex power distribution applications that traditional strategies cannot handle. Research in this sector is still in its early stages; therefore, a comprehensive study of these emerging techniques, specifically for energy management optimization, is planned as future work. Synchronous and asynchronous learning were developed because of the instability of Q-learning in some complicated applications [106]. The asynchronous advantage actor-critic (A3C) was developed first, as an extension of actor-critic. Specifically, it combines several neural network agents that are trained asynchronously, each in a different environment [124]. It was subsequently noticed that, despite the intelligence of the asynchronous approach, its complexity is a drawback. Therefore, a simpler synchronous actor-critic version (A2C) was designed to provide comparable intelligence with less complexity [125]. Multiagent reinforcement learning (MARLA) strategies use more than one reinforcement learning agent, each of which interacts with the environment to learn the desired optimization of the control system [126]. They are therefore introduced when a single reinforcement learning agent is insufficient for the purpose [126].
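To make the actor-critic family more concrete, the following is a minimal Python sketch of the advantage estimate that an A2C-style update typically relies on: discounted returns bootstrapped from the critic, minus the critic's value estimates. The function names, the discount factor, and the toy numbers are illustrative assumptions and are not taken from [124] or [125].

```python
from typing import List


def discounted_returns(rewards: List[float], bootstrap_value: float,
                       gamma: float = 0.99) -> List[float]:
    """Compute R_t = r_t + gamma * R_{t+1}, seeded with the critic's
    estimate of the state that follows the last reward."""
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards: List[float], values: List[float],
               bootstrap_value: float, gamma: float = 0.99) -> List[float]:
    """Advantage A_t = R_t - V(s_t): how much better the rollout was
    than the critic expected. The actor is pushed towards actions with
    positive advantage; the critic is regressed towards R_t."""
    returns = discounted_returns(rewards, bootstrap_value, gamma)
    return [ret - val for ret, val in zip(returns, values)]


if __name__ == "__main__":
    # Hypothetical three-step rollout from one worker.
    print(advantages([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], 0.0, gamma=0.5))
```

In a synchronous scheme, several workers would each produce such a rollout, and their gradients would be averaged in a single update step; an asynchronous scheme would instead let each worker apply its update independently.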
The need for multiagent reinforcement learning strategies is increasing with the trend towards decentralization of power distribution in micro- and smart grids, especially when the distribution network is complicated and consists of a group of decentralized energy agents that are located far from each other within a microgrid network. The key idea of transfer in reinforcement learning is to use the knowledge gained by solving one specific problem to solve another; in other words, to transfer the knowledge or solution [127]. Prioritized experience can be defined as the sampling of an RL agent's past experiences to accomplish a learning objective [128]. Traditional RL techniques rely on an immediate extrinsic motivation for the agent to implement the objective of the learning process. Because the rewards designed for traditional RL scale poorly, and because some complex environments such as modern power networks require RL actions to have an immediate impact, intrinsic motivation methods have been introduced in [129], so that the reward is produced by the RL agent itself. In RL, curiosity is defined as the prediction error of the state transition. Furthermore, actions that provide a higher intrinsic reward reduce curiosity [130].
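The sampling and curiosity ideas above can be illustrated with a short Python sketch. It assumes proportional prioritization, in which an experience is drawn with probability proportional to a power of its temporal-difference error, and a curiosity bonus equal to the squared error of a forward model's next-state prediction. The function names, the exponent `alpha`, and the scaling factor are hypothetical choices for illustration, not the formulations of [128] to [130].

```python
import random
from typing import List, Optional, Sequence


def sampling_probabilities(td_errors: Sequence[float], alpha: float = 0.6,
                           eps: float = 1e-6) -> List[float]:
    """P(i) = p_i**alpha / sum_k p_k**alpha with p_i = |TD error_i| + eps.
    alpha = 0 gives uniform sampling; larger alpha favors surprising samples."""
    priorities = [(abs(err) + eps) ** alpha for err in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(td_errors: Sequence[float], batch_size: int,
                   alpha: float = 0.6,
                   rng: Optional[random.Random] = None) -> List[int]:
    """Draw replay-buffer indices (with replacement) according to priority."""
    rng = rng or random.Random()
    probs = sampling_probabilities(td_errors, alpha)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)


def curiosity_bonus(predicted_next_state: Sequence[float],
                    actual_next_state: Sequence[float],
                    scale: float = 1.0) -> float:
    """Intrinsic reward: scaled squared error between the state the agent
    predicted and the state it actually reached."""
    error = sum((p - a) ** 2
                for p, a in zip(predicted_next_state, actual_next_state))
    return scale * error
```

As the forward model improves on frequently visited transitions, the bonus for those transitions shrinks, which steers the agent towards parts of the state space it cannot yet predict.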

Conclusions and Recommendations
This paper has presented a comprehensive review of historic and state-of-the-art control strategies for distributed energy storage systems in microgrids, smart grids, and intelligent power distribution networks. The importance of ESSs in providing balancing services and in helping to buffer against intermittent renewable supply is well agreed upon; therefore, it is imperative that research related to their control and management is up to date and succinctly summarized. This paper has set out to provide such a review. A total of 130 research works in the area have been examined, and a distinctive summary has been presented for each control strategy to highlight the major strengths and weaknesses related to design, implementation, and service provision. Highlights are summarized in the following paragraphs.
Droop control is the traditional strategy for primary decentralized control of similar or heterogeneous ESSs in microgrid networks. However, applying droop control without adaptation or consideration of SOC cannot simultaneously provide full voltage balance and load sharing services. Fuzzy logic is one such adaptation to overcome SOC overloading and instability of both voltage and frequency, while virtual impedance can be deployed as a solution to unbalanced reactive power when there is a mismatch of transmission line impedances. Centralized strategies implement direct control of ESSs and enable individual monitoring and trimming. Secondary control corrects or supervises primary control, in addition to participating in regulating load sharing and balancing SOC. Rule-based control and EMPC seem to be the most successful applications, with the main objective of optimizing ESS power flow on an hourly or sub-hourly basis. The strength of EMPC over rule-based control lies in its consideration of a time-varying electricity tariff to potentially yield an economic profit. The main objective of tertiary centralized control is to provide an optimal voltage reference. Furthermore, it helps to solve the OPF problem. Solutions for OPF differ depending on the ESS category. Aggregated solutions manage power flow by treating the ESSs as a single capacity, while ideal real solutions consider real power management between the distributed ESSs. The different approximations affect OPF solutions, whether they are convex or non-convex.
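As an illustration of why SOC adaptation matters for load sharing, the sketch below computes the steady-state operating point of several discharging ESSs under a frequency droop whose slope is scaled by SOC. It assumes one commonly used form, f = f_nominal - m_i * P_i with m_i = m0 / SOC_i^n, so that units with more stored energy take a larger share of the load. The function name, the base slope, the exponent, and the 50 Hz nominal frequency are hypothetical values chosen for the example.

```python
from typing import List, Tuple


def soc_adaptive_droop(load_kw: float, socs: List[float],
                       base_droop_hz_per_kw: float = 0.01,
                       exponent: float = 1.0,
                       nominal_hz: float = 50.0) -> Tuple[float, List[float]]:
    """Return (steady-state frequency, power per ESS) for discharging units.

    Each unit follows f = nominal_hz - m_i * P_i with
    m_i = base_droop_hz_per_kw / SOC_i**exponent (assumed adaptation law).
    """
    if any(not 0.0 < soc <= 1.0 for soc in socs):
        raise ValueError("SOC values must lie in (0, 1]")
    slopes = [base_droop_hz_per_kw / soc ** exponent for soc in socs]
    # All units see the same frequency deviation delta_f, and their
    # outputs must add up to the load: sum_i (delta_f / m_i) = load_kw.
    delta_f = load_kw / sum(1.0 / m for m in slopes)
    powers = [delta_f / m for m in slopes]
    return nominal_hz - delta_f, powers


if __name__ == "__main__":
    frequency, powers = soc_adaptive_droop(6.0, [0.8, 0.4])
    print(frequency, powers)
```

With a fixed slope, both units would supply the same power regardless of SOC, and the unit with less stored energy would be depleted first; with the SOC-scaled slope, power is shared in proportion to SOC raised to the chosen exponent.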
The introduction of a central controller to many small and distributed ESSs in a microgrid network comes with many challenges. Each of the distributed ESSs must be controlled individually and precisely by the central controller. Therefore, expanding the infrastructure and making provision for real-time communications is mandatory. In turn, communication disturbances can be introduced, in addition to privacy and security concerns. This encourages decentralized control that is based merely on local information, which comes at the cost of imprecision in achieving the required balance and performance. These fundamental challenges and complications have paved the way towards multiagent control, in which a cooperative balance can be achieved by ESSs based on neighbor-to-neighbor local information exchange only. Secondary multiagent strategies ensure autonomous operation of distributed ESSs through multiagent sharing of SOC, output voltage, and load current information. Tertiary cooperative multiagent strategies are classified according to communication architecture and include hierarchical tertiary control, which is accomplished via direct multiagent communication between the central controller and the storage agents. Meanwhile, topology-based multiagent control reflects the underlying power network topology with no requirement for a central controller. Topology-free multiagent control, on the other hand, does not reflect the power network topology, and control can be achieved provided that at least bidirectional communication with the neighboring ESS is available. Competitive tertiary strategies differ from cooperative strategies and are required in competitive situations (such as one featuring an independent seller of energy), where consideration of microgrid and/or agent profit cannot be neglected. Furthermore, cooperative and competitive strategies can be combined for more flexible solutions.
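The neighbor-to-neighbor exchange described above is often formalized as an average-consensus iteration. The following Python sketch assumes an undirected communication graph and a fixed step size smaller than the reciprocal of the largest node degree; the four-node line topology, the step size, and the initial SOC values are hypothetical. It shows only how each agent could estimate the network-wide average SOC, which it could then use as a balancing reference, and is not the algorithm of any specific reviewed work.

```python
from typing import Dict, List


def consensus_step(values: List[float], neighbors: Dict[int, List[int]],
                   step: float) -> List[float]:
    """One synchronous update x_i <- x_i + step * sum_j (x_j - x_i),
    where j ranges over the communication neighbors of agent i."""
    return [
        x_i + step * sum(values[j] - x_i for j in neighbors[i])
        for i, x_i in enumerate(values)
    ]


def run_consensus(values: List[float], neighbors: Dict[int, List[int]],
                  step: float = 0.3, iterations: int = 200) -> List[float]:
    """Repeat the local update; no agent ever sees the whole network."""
    for _ in range(iterations):
        values = consensus_step(values, neighbors, step)
    return values


if __name__ == "__main__":
    # Hypothetical line topology: ESS 0 - ESS 1 - ESS 2 - ESS 3.
    line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(run_consensus([0.9, 0.6, 0.4, 0.3], line))
```

Because the links are bidirectional, every update leaves the sum of the estimates unchanged, so all agents converge to the true average without a central controller.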
As mentioned previously, multiagent control is the gateway from autonomy to intelligence, and reinforcement learning, through its application to power management and control, is one of the shortest paths to reach it. It is considered a powerful tool for scheduling and managing power in complicated power systems. The ε-greedy policy balances exploration and exploitation by selecting the action with the highest estimated reward most of the time and a random action with a small probability ε, while Q-learning is a model-free reinforcement learning method that has wide applications in microgrid power management. Despite that, Q-learning can still lack robustness and is limited in its data for some policies. Therefore, more efficient batch reinforcement techniques have been introduced, and combined deep/batch reinforcement learning has also been applied for more accurate estimation and prediction. A further RL architecture is the actor-critic, which consists of two deep networks, an actor that selects actions and a critic that evaluates them, with the aim of maximizing the total reward.
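A minimal tabular sketch of the standard ε-greedy rule and the Q-learning update follows. The toy battery environment (three SOC levels, alternating off-peak and peak tariffs, and charge/idle/discharge actions) and all numerical values are hypothetical and serve only to make the update rule runnable; they do not reproduce any of the reviewed studies.

```python
import random
from typing import Dict, List, Tuple

State = Tuple[int, int]          # (SOC level 0..2, tariff period 0 or 1)
PRICES = (0.10, 0.30)            # assumed off-peak and peak prices per unit


def epsilon_greedy(q_row: List[float], epsilon: float,
                   rng: random.Random) -> int:
    """Random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q: Dict, state, action: int, reward: float, next_state,
             alpha: float = 0.1, gamma: float = 0.95) -> None:
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def step(soc: int, period: int, action: int) -> Tuple[int, int, float]:
    """Toy dynamics: 0 = charge (pay price), 1 = idle, 2 = discharge (save price)."""
    price = PRICES[period]
    reward = 0.0
    if action == 0 and soc < 2:
        soc, reward = soc + 1, -price
    elif action == 2 and soc > 0:
        soc, reward = soc - 1, price
    return soc, (period + 1) % 2, reward


def train(episodes: int = 500, steps: int = 24, epsilon: float = 0.1,
          seed: int = 0) -> Dict[State, List[float]]:
    rng = random.Random(seed)
    q = {(soc, p): [0.0, 0.0, 0.0] for soc in range(3) for p in range(2)}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(steps):
            action = epsilon_greedy(q[state], epsilon, rng)
            soc, period, reward = step(state[0], state[1], action)
            q_update(q, state, action, reward, (soc, period))
            state = (soc, period)
    return q


if __name__ == "__main__":
    for state, row in sorted(train().items()):
        print(state, [round(v, 3) for v in row])
```

No model of the tariff or the battery is given to the learner; it only observes states, actions, and rewards, which is what makes the method model-free.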
Despite the intelligence of traditional reinforcement learning techniques, they prove insufficient in some complicated power management applications; therefore, further RL-based techniques are still emerging. Synchronous and asynchronous techniques address the instability of Q-learning in some complicated applications. Meanwhile, multiagent reinforcement strategies become necessary when a single RL agent is insufficient, and they are widely applied to power management in microgrids. Transfer RL is a principle in which knowledge for solving a problem is transferred from one domain to another; this differs from the prioritized experience technique, which samples past experiences to implement learning objectives. Traditional RL relies on an extrinsic motivation for the agent, but some complex environments, such as modern power networks, require the immediate impact of RL actions. Therefore, intrinsic motivation methods were developed to provide a solution. Such emerging techniques have been applied to solve more complicated applications of power distribution management, or to follow the envisioned future trend of decentralization and autonomy in the design of power distribution systems; however, research is still in the early stages. The principal finding of this comprehensive review is that research gaps remain in relation to emerging decentralized intelligent strategies based on RL and their applications to renewable energy control, management, and optimization in the context of microgrid energy storage. This review has provided a clear taxonomy and description of each control strategy, its methodology, applications, and major strengths and weaknesses. This in turn fosters clarity of understanding of the topic, providing insight into the nature of these research gaps and indicating how knowledge in this field may be extended effectively by future scholarly works.

Conflicts of Interest:
The authors declare no conflict of interest.