Strategies for Controlling Microgrid Networks with Energy Storage Systems: A Review

Mudhafar Al-Saadi; Maher Al-Greer; Michael Short

doi:10.3390/en14217234

School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough TS1 3BX, UK

* Author to whom correspondence should be addressed.

Energies 2021, 14(21), 7234; https://doi.org/10.3390/en14217234

This article belongs to the Special Issue Sustainable Energy Reviews II

Abstract

Distributed Energy Storage Systems are considered key enablers in the transition from the traditional centralized power system to a smarter, autonomous, and decentralized system operating mostly on renewable energy. The control of distributed energy storage involves the coordinated management of many smaller energy storages, typically embedded within microgrids. As such, there has been much recent interest related to controlling aspects of supporting power-sharing balance and sustainability, increasing system resilience and reliability, and balancing distributed state of charge. This paper presents a comprehensive review of decentralized, centralized, multiagent, and intelligent control strategies that have been proposed to control and manage distributed energy storage. It also highlights the potential range of services that can be provided by these storages, their control complications, and proposed solutions. Specific focus on control strategies based upon multiagent communication and reinforcement learning is a main objective of this paper, reflecting recent advancements in digitalization and AI. The paper concludes with a summary of emerging areas and presents a summary of promising future directions.

Keywords:

microgrid; smart grid; control optimization; energy consumption reduction; decentralization; centralization; multiagent; energy management; energy storage

1. Introduction

Whereas traditional electricity utility grids operated in a centralized, top-down fashion, climate change action and the pressing need for decarbonization have driven trends towards decentralization, digitalization, and increasing deployment of artificial intelligence (AI) and automation. Smart grids can accomplish better generation and more efficient transmission and distribution of the generated power [1]. Smart grids provide comprehensive digitalization and automation of an electricity network and can be formed of a hierarchy of microgrids connected to each other to compose a large smart grid [2]. The typical main objectives of a smart grid are grid supervision and situation awareness; enhanced system performance; improved reliability, resilience, and security; improved economic operation; and distributed real-time intelligent control and protection of system components [1]. Furthermore, smart grids support enhanced penetration of renewable energy, which is a main aim of European and non-European countries alike in pursuit of a clean energy environment [3,4]. Within this smarter, autonomous, and decentralized system of microgrids, operating mostly on renewable energy sources, the Energy Storage System (ESS) is considered a key enabler in providing effective buffering against the inherent intermittency of renewable sources [5]. Developments in controlling microgrids that include ESSs are a vital branch of the field of intelligent energy distribution systems, arising from the need for optimized power distribution management.

The focus of this paper is on distributed ESSs, specifically to provide a thorough and up-to-date review of the literature related to their decentralized management and control. Decentralized control strategies form the starting point of this comprehensive review. Specifically, the focus is placed upon strategies that can perform localized ESS control tasks with no urgent need for supervising control. Droop control is the conventional foundation of decentralized control, which can provide balanced load sharing with no need for communication with other system components or a centralized controller. SOC-based droop control can provide accuracy and balance to the ESS, and virtual impedance droop control is a developed version that can also help to balance reactive power due to mismatch in line impedance. Droop control is technology agnostic and can deal with heterogeneous distributed ESSs, and AI-enhanced droop control has been proposed to achieve improved accuracy and balance of the storage system voltage, current, and SOC. Both secondary and tertiary centralized control strategies have been presented, which perform many control and enhancement functions via supervision of decentralized control strategies and correction of load sharing balance through trimming of voltage and current references. Multiagent-based control strategies combine decentralization with partial centralization by providing neighbor-to-neighbor communication between decentralized agents. Multiagent-based control solutions have been introduced for both secondary and tertiary services to enhance autonomy while reducing communication overhead. To accomplish distributed intelligent power distribution management, intelligent strategies have been presented (Q-learning, batch RL, Deep-Q-learning, and actor-critic). Each has differing features depending on the control objective and level of system complexity. 
The emerging intelligent strategies based on RL (synchronous/asynchronous; actor-critic; multiagent; priority experience; extrinsic/intrinsic) have been introduced for cases where traditional intelligent strategies are insufficient for the high complexity of the system.

This review aims to present a comprehensive and rigorous reference for researchers working in the field of distributed energy storage in microgrids, categorizing each approach and comparing advantages and disadvantages in each case, as well as describing the underlying logic and mathematical background of their operation. To further facilitate the exposition and discussions, a brief overview of methods and architectures is now given to aid subsequent classification of schemas, along with an introductory overview of the role of storage in microgrids to aid subsequent classification of services.

1.1. Energy Storage Systems Overview, Main Techniques, Classifications, and Control Architecture

The typical main objective of an ESS in a microgrid is to store energy that is generated in excess of current consumer demand, e.g., in off-peak hours, and then re-inject it to enhance energy balance and sustainability when generation falls short of demand, e.g., in peak hours [1]. However, many challenges remain, the major ones being charging/discharging balance, safety, reliability, size, lifecycle, and cost, in addition to overall control and management [6]. The traditional main ESS techniques are explained through the following points:

Lithium-ion: The typical lithium-ion battery energy storage consists of four main components: a cathode, an anode, an electrolyte, and a separator. All the components work together to store excess energy. The growing demand of the energy storage market encourages continued development of commercial lithium-ion batteries with higher energy densities, better safety, lower cost, and longer life [7].
Fuel cell: A fuel cell converts stored chemical energy to electrical energy via an electrochemical process. Polymer Electrolyte Membrane (PEM) fuel cells are the major application of fuel cells and have recently become widely sought after because of their low operating temperature, high power density, high efficiency, and low emissions [8].
Flow battery: The flow battery is a fully rechargeable electrolyte-based electrical energy storage technique, in which fluids are pumped through a cell to drive reduction/oxidation at the ion exchange layer. A redox flow battery is considered a distinguished storage unit because of its high capacity for storing electricity, which makes it more desirable than traditional batteries [9].
Compressed air: This technique stores energy by compressing air during low-demand times; the air is later used to drive a motor-generator and generate electricity [10].
Flywheel: This technique stores energy as kinetic energy in a rotor spinning in a vacuum, which is later used to drive a motor-generator and generate electricity [11].

ESS of microgrid network can likewise be classified depending on location and storage technology into three configurations:

Aggregated: Modelling is simplified when all ESSs are in one location of a predetermined microgrid network [12].
Distributed: Distributed ESSs are scattered around different locations within a predetermined microgrid network [13].
Hybrid: A combined application of ESSs with different storage technologies, which is necessary due to the lack of any ESS technologies which can individually provide all the mandatory characteristics [14].

All the benefits that are accomplished by ESS serve a major objective, which is the transition from the traditional microgrid network of centralized generation and control, to a smart, decentralized network of distributed sources and storage which is mostly based on renewable energy [15]. These benefits were aided by the accelerated trend in the field of renewable energy introduction as explained in the following points:

The urgent necessity to increase the introduction of renewable energy resources, such as photovoltaics and wind generators, has stimulated the movement toward decentralized distributed ESS [16]; consequently, it has paved the way for a successful and beneficial transition to smart microgrid networks and reduced pollution [17].
The gradual decline in ESS cost has prompted an increase in their use for storing excess energy and for other purposes [18].

The standard hierarchical control architecture of a microgrid network is classified into three levels, as demonstrated in Figure 1, which shows the control levels and their specific roles in an AC microgrid. These levels are explained below:

Figure 1. Hierarchical control architecture of AC microgrid.

Primary Decentralized Control: The objective of this level is to regulate the load sharing of distributed energy resources and storage, via control of the output voltage and frequency of their linked converters, to attain balanced and autonomous operation of these distributed systems [19,20]. The most typical strategy is droop control, which implements balanced load sharing among the distributed resources and storage with no need for time-critical communication links [21]. As demonstrated in the AC microgrid of Figure 1, droop control is present at each distributed ESS as a primary control. It receives the measured active and reactive power and creates voltage and frequency offsets for the local controller. This, in turn, implements load participation that accomplishes the overall balance of load sharing in the microgrid.
Secondary Centralized Control: Centralized secondary control is responsible for correcting the voltage and frequency offsets introduced by the primary control. It therefore plays the role of an observer for the primary control. Moreover, it offers additional functions, such as reactive power-sharing, accurate frequency regulation, and PQ compensation [22,23]. The AC microgrid in Figure 1 illustrates the role of secondary control in correcting droop control offsets towards the nominal microgrid references provided by tertiary control. The correction is based on the measured output voltage and frequency of each ESS.
Tertiary Centralized Control: This is the highest level of the control hierarchy. Typically, it is responsible for two major objectives: first, adjusting voltage setpoints or providing optimal voltage references; second, managing the power entering or leaving the microgrid, i.e., solving the optimal power flow (OPF) problem [24]. In addition, it operates in conjunction with other entities to implement the overall objectives of balanced and sustainable load sharing [25,26]. Figure 1 shows how tertiary control in an AC microgrid receives power flow management constraints and objectives, and then creates voltage magnitude and angle references that implement optimal power flow management.
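
The interplay between the primary and secondary levels described above can be illustrated with a minimal Python sketch. It assumes a single droop-controlled unit feeding a constant load and an integral-type secondary correction; the integral law, the gains, and all numerical values are illustrative assumptions and are not taken from the reviewed works.

```python
import math

OMEGA_NOM = 2 * math.pi * 50.0  # nominal angular frequency in rad/s (assumed 50 Hz system)
K_P = 1e-4                      # droop coefficient in rad/s per W (assumed value)
ALPHA = 0.1                     # secondary integral gain times sample time (assumed value)


def primary_droop(offset, p_load):
    """Primary level: droop law whose reference is shifted by the secondary offset."""
    return OMEGA_NOM + offset - K_P * p_load


def run_secondary(p_load, steps=200):
    """Secondary level: integrate the frequency error to trim the droop reference."""
    offset = 0.0
    for _ in range(steps):
        omega = primary_droop(offset, p_load)
        offset += ALPHA * (OMEGA_NOM - omega)
    return primary_droop(offset, p_load), offset


# Droop alone leaves a steady-state deviation of -K_P * P from nominal.
omega_primary_only = primary_droop(0.0, 5000.0)
# With the secondary loop, the deviation is driven back towards zero.
omega_restored, correction = run_secondary(5000.0)
print(omega_primary_only - OMEGA_NOM, omega_restored - OMEGA_NOM, correction)
```

In this sketch the secondary offset converges to the droop deviation itself, which is why the frequency returns to its nominal value while the primary law remains unchanged.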

1.2. Energy Storage Systems Roles and Objectives of Microgrid

ESSs in general, and specifically when distributed within a predetermined microgrid network, provide several fundamental roles and services. An ESS typically operates together with its power electronic converter to support power-sharing optimization and reliable autonomous operation. These beneficial roles can be classified as follows [27,28]:

Grid voltage support: Power provided by the ESS of a microgrid network to maintain voltage at a mandatory level or within an acceptable range. This can be accomplished through the control of distributed ESS reactive power based on the real energy generated.
Grid frequency support: Active power delivered by distributed ESS in a microgrid network to compensate for any frequency deviation caused by a sudden increase in load or generation.
Grid stability: ESS offers the opportunity to damp the oscillations that follow a sudden event during microgrid operation.
Peak shaving: Typically, the energy generated while generation is available or during off-peak times is stored in the ESS and shifted to support the grid during high-demand times or in the absence of generation. Furthermore, distributed ESS can meet a short-term demand independently, without relying on generation. This, in turn, provides excellent support to distributed renewable energy resources, such as photovoltaics and wind turbines.
Spinning reserve: ESS offers backup power to support islanding.
Enhancing quality of power: ESS participates in improving power quality by mitigating typical issues, such as voltage and frequency offsets, harmonics, voltage unbalance, and poor power factor.
Support reliability: ESS contributes to enhancing system reliability in meeting consumer demand.
Ride through support: ESS can offer essential energy during a disturbance or voltage sag that affects system reliability. This, in turn, helps to keep electric units connected for the duration of these disturbances.
Compensation of unbalanced load: The individual injection/absorption of power by each ESS supports the compensation of an unbalanced load.
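
As an illustration of the peak shaving service listed above, the following Python sketch applies a simple threshold rule to a load profile. The rule, the lossless battery model, and all numbers are hypothetical assumptions chosen for clarity; they do not come from the reviewed literature.

```python
def peak_shave(load_kw, threshold_kw, capacity_kwh, soc_kwh, dt_h=1.0):
    """Threshold-based peak shaving with an ideal (lossless) storage unit.

    Above the threshold the storage discharges to cap the grid import;
    below it the storage recharges using the available headroom.
    Returns the grid import profile and the final stored energy.
    """
    grid_kw = []
    for load in load_kw:
        if load > threshold_kw:
            # Discharge is limited by the energy currently stored.
            discharge = min(load - threshold_kw, soc_kwh / dt_h)
            soc_kwh -= discharge * dt_h
            grid_kw.append(load - discharge)
        else:
            # Charging is limited by the remaining storage capacity.
            charge = min(threshold_kw - load, (capacity_kwh - soc_kwh) / dt_h)
            soc_kwh += charge * dt_h
            grid_kw.append(load + charge)
    return grid_kw, soc_kwh


# Hypothetical hourly load (kW), 5 kW threshold, 10 kWh storage that starts empty.
grid_profile, final_energy = peak_shave([2.0, 2.0, 6.0, 8.0, 3.0], 5.0, 10.0, 0.0)
print(grid_profile, final_energy)
```

With these assumed numbers the grid import is flattened to the threshold in every hour, because the energy stored off-peak covers the two peak hours.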

Figure 2 summarizes the strategies for controlling ESSs in the microgrid, providing the relevant taxonomy for later reference.

Figure 2. Control strategies of ESS in microgrid with the relevant reviewed research works.

1.3. Contribution and Paper Structure

Given the historic and more recent developments in this research area, this paper aims to provide a comprehensive review of control strategies for energy storage in microgrids. Existing research challenges are presented, and areas for further research are subsequently identified. The methodology employed to search the literature and select relevant works in each category was as follows. Research works that provide a distinctive, clear, and comprehensive implementation of the control strategies, and that address them as a main objective of the paper or article, were first selected. Out-of-scope, incremental, or similar work was then removed. In total, 131 of the most relevant and non-incremental works have been selected for this review, distributed across the categories in Figure 2. The remainder of this review paper is organized as follows. Section 2 covers decentralized control strategies and concepts, while centralized control strategies are presented in Section 3. Section 4 covers multiagent-based control methods. Intelligent control strategies are presented in Section 5, along with an explanation of the most promising directions of further research. A summary is presented for each section, which highlights the major strengths and weaknesses of each strategy. Section 6 is reserved for the emerging intelligent techniques, and a final summary and conclusion are presented in Section 7.

2. Decentralized Control Strategies of Distributed ESSs

To achieve sustainable and balanced power-sharing in meeting load demand, distributed ESS in a microgrid is typically controlled locally by decentralized control strategies, that is, strategies that can perform the task with no urgent need for supervisory control and can operate with only local information. The block diagram in Figure 3 illustrates the standard decentralized control of an AC microgrid consisting of five distributed ESSs, in which each ESS is controlled locally, with no central or supervisory controller. Decentralized droop control is the standard strategy for this role; it operates together with the local controller to regulate the output voltage and load sharing current of the ESS. A conventional power electronic converter serves as the interface to the microgrid bus. The major features of droop control are decentralization and the lack of need for a communication link between distributed ESSs. Its weakness is that unmodified droop control provides only an approximate balance of the output parameters. For this reason, the strategy has progressed through several stages of development in order to achieve more accuracy and stability. Some developments introduce new parameters that contribute to greater reliability, while others integrate droop control with other strategies so that it serves as one stage of a comprehensive strategy. This paper presents the major features of these developments, each of which has provided a successful solution to a specific, identified drawback or weakness. Achieving State of Charge (SOC) balance of the ESS is fundamental to accomplishing overall load sharing balance, in addition to maintaining safety and supporting prolonged life of the storage. The introduction of virtual line impedance is an effective solution to mismatched transmission line impedances, which, in turn, supports improved stability as well as reduced losses and protection of the infrastructure. The control of distributed ESSs of different technologies is equally important; it specifically aims to adapt droop control so that it can balance load sharing for heterogeneous ESSs [29].

Figure 3. Decentralized control of AC microgrid.

2.1. Traditional Droop Control

Droop control is the standard decentralized strategy to control the distributed ESS and to interface it to the microgrid bus through conventional power electronic converters. It mimics the governor and exciter operation of synchronous generators, in which frequency is controlled through the control of speed and fuel. This illustrates the objective of providing balanced output voltage and frequency through the control of active and reactive power. The idea behind it is to add a virtual resistance, which differs from a genuine resistance by being unaffected by operating conditions. An example of these conditions is temperature, which causes power losses. This virtual resistance is typically named the droop gain or droop coefficient [30]. For a low-voltage AC microgrid, the balance of the output frequency is achieved depending on active power (f–P), and the magnitude of the output voltage is dependent upon reactive power (V–Q). The typical voltage and frequency droop characteristics are demonstrated in Figure 4.

Figure 4. Conventional droop control characteristics.

As given in (1), the active power droop coefficient ($K_{p}$) is multiplied by the measured active power ($P$) and then subtracted from the reference angular frequency ($\omega^{*}$) to obtain the desired angular frequency ($\omega$). Similarly, as presented in (2), the reactive power droop coefficient ($K_{q}$) is multiplied by the measured reactive power ($Q$) and subtracted from the output voltage reference ($E^{*}$) to obtain the output voltage ($E$). Therefore, frequency decreases linearly with the measured active power, and voltage magnitude decreases linearly with the measured reactive power [31]. Decentralized droop control is also implemented on distributed ESS in DC microgrids, where power-sharing is governed by the relationship between output voltage and current (V–I). In fact, standard droop control with no modification cannot provide fully balanced power-sharing of distributed ESS, because SOC is not considered.

$$\omega = \omega^{*} - K_{p} \times P \qquad (1)$$

$$E = E^{*} - K_{q} \times Q \qquad (2)$$
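
Equations (1) and (2) can be sketched in a few lines of Python. The numerical values below are illustrative assumptions, and the helper `steady_state_sharing` is a hypothetical addition that follows from (1) when all units share the same reference and settle at a common frequency.

```python
import math


def droop_frequency(omega_ref, k_p, p_meas):
    """Eq. (1): omega = omega* - K_p * P."""
    return omega_ref - k_p * p_meas


def droop_voltage(e_ref, k_q, q_meas):
    """Eq. (2): E = E* - K_q * Q."""
    return e_ref - k_q * q_meas


def steady_state_sharing(p_total, k_p_values):
    """Active power split among units that share omega* and one bus frequency.

    Equal frequency implies K_p,i * P_i is identical for every unit, so each
    unit's share is proportional to 1 / K_p,i.
    """
    inverse = [1.0 / k for k in k_p_values]
    return [p_total * inv / sum(inverse) for inv in inverse]


OMEGA_REF = 2 * math.pi * 50.0  # rad/s, assumed 50 Hz reference
E_REF = 230.0                   # V, assumed voltage reference

omega = droop_frequency(OMEGA_REF, 1e-4, 5000.0)   # 5 kW measured active power
e_out = droop_voltage(E_REF, 1e-3, 2000.0)         # 2 kvar measured reactive power
shares = steady_state_sharing(9000.0, [1e-4, 2e-4])
print(omega, e_out, shares)
```

The unit with the smaller droop coefficient takes the larger share of the load, which is how droop gains are used to set the sharing ratio without communication.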

2.2. Virtual Impedance Droop Control

Balancing reactive power-sharing among parallel droop-controlled inverters in an AC microgrid remains difficult, especially when there is a mismatch of line impedances. Virtual impedance droop control is an updated version of traditional droop control that adopts virtual impedance theory to compensate for the mismatch of line impedances, which is considered a drawback because of its effect on reactive power balance. The theory involves a modification of inverter output voltage droop control, as proposed in [32], to achieve an equivalent model and eventually accomplish balanced output voltage. Reactive power is balanced when the voltage drops of the parallel inverters ($V_{drop1}$, $V_{drop2}$) are equal (see Equation (3)). The virtual impedance of the first inverter is then set to zero ($Z_{v1} = 0$), which allows the virtual impedance of the other inverter ($Z_{v2}$), given in (4), to eliminate the mismatch of line impedances ($Z_{1} - Z_{2}$). The voltage drops across the resulting virtual impedances are subtracted from the droop control voltage reference to obtain a reference voltage that balances reactive power between the inverter-interfaced distributed units, as demonstrated in Figure 5, which shows virtual impedance added to droop control for two inverters connected to the same AC bus.

$$V_{drop1} = I_{L1} (Z_{1} + Z_{v1}) = V_{drop2} = I_{L2} (Z_{2} + Z_{v2}) \qquad (3)$$

$$Z_{v2} = Z_{1} - Z_{2} \qquad (4)$$

Figure 5. Virtual impedance droop control of parallel inverters in AC microgrid.
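
A minimal Python sketch of Equations (3) and (4) is given below. It assumes that both inverters carry the same line current (the desired sharing condition) and uses hypothetical impedance and current values; complex numbers stand in for phasors.

```python
def virtual_impedance_unit2(z1, z2):
    """Eq. (4): with Z_v1 = 0, choose Z_v2 = Z_1 - Z_2."""
    return z1 - z2


def voltage_drop(i_line, z_line, z_virtual):
    """One side of Eq. (3): V_drop = I_L * (Z + Z_v)."""
    return i_line * (z_line + z_virtual)


def compensated_reference(v_droop, i_line, z_virtual):
    """Subtract the virtual-impedance drop from the droop voltage reference."""
    return v_droop - i_line * z_virtual


# Hypothetical line impedances (ohm) and a common line current phasor (A).
Z1 = complex(0.4, 0.3)
Z2 = complex(0.2, 0.1)
I_LINE = complex(10.0, -2.0)

Z_V1 = 0j
Z_V2 = virtual_impedance_unit2(Z1, Z2)

drop1 = voltage_drop(I_LINE, Z1, Z_V1)
drop2 = voltage_drop(I_LINE, Z2, Z_V2)
v_ref2 = compensated_reference(complex(230.0, 0.0), I_LINE, Z_V2)
print(Z_V2, drop1, drop2, v_ref2)
```

Because the second inverter's total impedance now equals that of the first, the two voltage drops coincide for equal currents, which is the balancing condition stated in (3).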

2.3. Droop Control-Based SOC

Droop control-based SOC is a modified version of traditional or standard droop control that includes SOC in calculating or weighting the droop coefficients [33,34]. The objective is to accomplish balanced SOC of the distributed ESS, in addition to extending its life [35]. It is therefore named SOC weighted droop control, and it is achieved by adding SOC as an exponent to the weighted droop coefficient [33,35]. In [36], a modified weighted droop control has been proposed to regulate bus voltage when power changes. It clearly shows how traditional droop control is modified to accomplish the required SOC balancing. As demonstrated in (5), the SOC-based droop control action for both discharge and charge modes ($u_{dci}$) is obtained by multiplying the discharge/charge droop coefficient ($K_{d}$, $K_{c}$) by the energy storage output power ($P_{iout}$) and by the reciprocal of the exponential of the computed SOC ($1 / e^{SOC_{i}^{n}}$), and then subtracting the result from the reference control ($u_{dcref}$).

$$u_{dci} = \begin{cases} u_{dcref} - \dfrac{K_{d} \times P_{iout}}{e^{SOC_{i}^{n}}} & \text{(discharge)} \\ u_{dcref} - \dfrac{K_{c} \times P_{iout}}{e^{SOC_{i}^{n}}} & \text{(charge)} \end{cases} \qquad (5)$$
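
The effect of the SOC weighting in (5) can be seen in a short Python sketch. The coefficient values, the exponent `n`, and the 48 V reference are assumptions chosen for illustration; only the form of the expression follows (5).

```python
import math


def soc_weighted_droop(u_ref, k, p_out, soc, n):
    """Eq. (5): u = u_ref - K * P_out / exp(SOC ** n).

    Pass K_d as `k` in discharge mode and K_c in charge mode.
    """
    return u_ref - k * p_out / math.exp(soc ** n)


def effective_coefficient(k, soc, n):
    """Droop coefficient after SOC weighting, K / exp(SOC ** n)."""
    return k / math.exp(soc ** n)


U_REF = 48.0   # V, assumed DC reference
K_D = 0.002    # V per W, assumed discharge coefficient
N = 2          # assumed SOC exponent

u_high_soc = soc_weighted_droop(U_REF, K_D, 500.0, 0.9, N)
u_low_soc = soc_weighted_droop(U_REF, K_D, 500.0, 0.3, N)
print(u_high_soc, u_low_soc)
print(effective_coefficient(K_D, 0.9, N), effective_coefficient(K_D, 0.3, N))
```

For the same output power, the unit with the higher SOC sees a smaller effective droop coefficient, so on a common bus it delivers more power and its SOC falls towards that of the other units.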

A comparison between this developed SOC-based droop strategy and traditional droop control has clarified that the SOC-based droop strategy significantly enhances the SOC balance of distributed ESS and improves the balance of sharing current during load fluctuations. Furthermore, it supports a prolonged life for the storage. The objectives of SOC-based droop control have been expanded by C. Gavriluta et al. [37] to include the determination of microgrid voltage and frequency offsets, through its effect of adjusting microgrid voltage and frequency when included within droop control.

A more recent dynamic SOC-based droop control strategy has been proposed in [38], to control battery-based distributed energy storage systems (BESSs) in a DC microgrid network including constant power loads (CPLs). The aim was to recover and stabilize microgrid DC bus voltage and power distribution in the case of a time-varying droop coefficient. The major contribution of this strategy was that local information of BESSs SOC can be shared by a dynamic consensus algorithm and the introduction of a nonlinear disturbance observer (NDO). Implementation has shown optimized system stability and rapidity. Furthermore, DC bus voltage has been maintained with appropriate sustainable power distribution.

2.4. Fuzzy Logic Droop Control

Droop control-based SOC suffers from two specific weaknesses. The first is the overloading of high-SOC distributed ESSs, which is due to the lack of participation of low-SOC distributed storage. The second is the instability of voltage and frequency caused by the increase in droop coefficient when all distributed ESSs reach a low SOC level. For these reasons, fuzzy droop control has been developed [39,40]; it is a modified version of standard droop control that schedules the droop coefficient gains, using output voltage and SOC to weight these coefficients. The objective is to accomplish balanced output voltage when all distributed ESSs are at a low SOC level. The microgrid output voltage ($V_{DC}$), given in (6), is balanced by regulating the droop control virtual resistance ($R_{d}$) based on the SOC estimate and the output current ($I_{L}$). Fuzzy logic droop control has the beneficial feature of implementing more than one control objective: it reduces the voltage deviation ($VE$) between the microgrid bus voltage ($V_{DC}$) and the reference voltage ($V_{ref}$), as given in (7), and it adjusts $R_{d}$ based on the fuzzy SOC estimate.

$$V_{DC} = V_{ref} - (I_{L} \times R_{d}) \qquad (6)$$

$$VE = V_{DC} - V_{ref} \qquad (7)$$

A decentralized control strategy based on fuzzy logic has been proposed for an islanded AC microgrid to balance the SOC of distributed energy storage [41]. Figure 6 highlights the methodology and shows how the fuzzy inference system (FIS) has been integrated into droop control. A constant voltage charger exists to prevent the battery current from falling below a certain level; thus, the distributed ESS is kept operating in current control mode (CCM). A new weighting factor ($W(SOC_{Batt})$) (see Equation (9)) has been suggested and is estimated by the FIS for each distributed ESS based on its SOC, to obtain the correct value of the droop coefficient ($m$). The resulting estimate is then applied to the $P$–$\omega$ droop control, which balances the SOC of each energy storage through the correct power injection/extraction at the common bus [41].

$$\omega = \omega^{*} - (m \times P_{Batt}) \quad \text{(no fuzzy)} \qquad (8)$$

$$\omega = \omega^{*} - (m \times P_{Batt} \times W(SOC_{Batt})) \quad \text{(fuzzy)} \qquad (9)$$

Figure 6. FIS-based fuzzy droop Control.
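
To make Equation (9) concrete, the following Python sketch computes a weighting factor with a small zero-order Sugeno-type fuzzy system and applies it to the droop law. The membership functions, the rule outputs, and the choice of Sugeno inference are hypothetical assumptions for a discharging unit; the FIS design used in [41] is not reproduced here.

```python
import math


def triangular(x, left, peak, right):
    """Triangular membership function with feet at left/right and apex at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Assumed fuzzy sets on SOC in [0, 1] and assumed crisp rule outputs:
# low SOC -> large weight (deliver less), high SOC -> small weight (deliver more).
SOC_SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}
RULE_OUTPUT = {"low": 2.0, "medium": 1.0, "high": 0.5}


def soc_weight(soc):
    """Weighted average of the rule outputs (zero-order Sugeno defuzzification)."""
    soc = min(max(soc, 0.0), 1.0)
    degrees = {name: triangular(soc, *params) for name, params in SOC_SETS.items()}
    total = sum(degrees.values())
    return sum(degrees[name] * RULE_OUTPUT[name] for name in degrees) / total


def fuzzy_droop_frequency(omega_ref, m, p_batt, soc):
    """Eq. (9): omega = omega* - m * P_Batt * W(SOC_Batt)."""
    return omega_ref - m * p_batt * soc_weight(soc)


OMEGA_REF = 2 * math.pi * 50.0  # rad/s, assumed reference
print(soc_weight(0.25), soc_weight(0.5), soc_weight(0.75))
print(fuzzy_droop_frequency(OMEGA_REF, 1e-4, 3000.0, 0.8))
```

Under these assumptions the weight falls as SOC rises, so a fuller battery has a flatter droop slope and picks up more of the load.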

2.5. Droop Control of Different Technology-Distributed ESSs

Droop control of different technology-distributed ESSs comprises droop strategies that are modified to control distributed ESSs of different or heterogeneous storage technologies. These technologies are typically classified into two groups: (1) peak shaving and regulating power quality; (2) energy shifting and spinning reserve. Ultracapacitors are a very common example of these technologies and can significantly influence energy balancing for ESS. Specifically, their long lifecycle, low power cost (USD/kWh), and high rate (kWh/kg) make them suitable for meeting high-frequency load demands with optimized quality. An ESS that comprises several storage technologies is known as a hybrid system. Droop control has been implemented to control the primary frequency of an ESS with two different storage technologies, a BESS and a superconducting magnetic energy storage (SMES) system, in a hybrid standalone AC microgrid [42].

In (10) and (11), $K_{smes}$ and $K_{batt}$ are the SMES and battery droop coefficients, and $\Delta P_{SMES}$ and $\Delta P_{batt}$ are the SMES and battery power contributions. When the microgrid frequency lies within the non-critical band, as chosen according to the UK grid code [43], droop control takes no action on the storage. In contrast, when the frequency exceeds the upper critical frequency ($f > f_{non\_up}$), the storage units are charged to absorb the excess, whereas if it falls below the lower critical frequency ($f < f_{non\_low}$), they are discharged to compensate. The introduced power-sharing method is an optimized droop control strategy to control the primary frequency of a heterogeneous ESS consisting of a battery and SMES. It improved frequency stability. Moreover, the optimal output power is achieved in different power situations due to its capability of adjusting the droop gain for both storage units.

$$\Delta P_{SMES} = \begin{cases} -\dfrac{1}{K_{smes}} (f - f_{non\_up}) & \text{(charging mode)} \\ -\dfrac{1}{K_{smes}} (f - f_{non\_low}) & \text{(discharging mode)} \end{cases} \qquad (10)$$

$$\Delta P_{batt} = \begin{cases} -\dfrac{1}{K_{batt}} (f - f_{non\_up}) & \text{(charging mode)} \\ -\dfrac{1}{K_{batt}} (f - f_{non\_low}) & \text{(discharging mode)} \end{cases} \qquad (11)$$
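
The dead-band behaviour of (10) and (11) can be sketched in Python as follows. The threshold frequencies and droop coefficients are assumed values for illustration only (they are not the figures used in [42] or in the grid code), and the sign convention is that negative power means charging.

```python
import math

F_NON_UP = 50.2    # Hz, assumed upper critical frequency
F_NON_LOW = 49.8   # Hz, assumed lower critical frequency


def storage_power(f, k):
    """Eqs. (10)-(11) for one storage unit with droop coefficient k (Hz per kW).

    Returns the power contribution in kW: negative when charging,
    positive when discharging, zero inside the non-critical band.
    """
    if f > F_NON_UP:
        return -(f - F_NON_UP) / k    # charging mode
    if f < F_NON_LOW:
        return -(f - F_NON_LOW) / k   # discharging mode
    return 0.0


K_SMES = 0.005   # assumed: smaller coefficient, so the SMES reacts more strongly
K_BATT = 0.02    # assumed battery coefficient

for freq in (50.0, 50.3, 49.7):
    print(freq, storage_power(freq, K_SMES), storage_power(freq, K_BATT))
```

With these assumed coefficients the SMES contributes four times the battery power for the same frequency excursion, which shows how the two droop gains divide the response between the storage units.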

In [44], a composite droop control strategy has been proposed to control a heterogeneous ESS consisting of a battery and a supercapacitor in a DC microgrid. In particular, the strategy uses a high pass filter-based droop (HPFD) for the battery converter and a virtual capacitance droop (VCD) controller for the supercapacitor (SC). A collaboration of two control strategies has therefore been demonstrated, and several control objectives have been accomplished. Fast power fluctuations were buffered by the SC, while the low-frequency power mismatch was compensated. Meanwhile, the bus voltage was regulated and the supercapacitor SOC was recovered. As given in (12), the deviation ($R_{V} \times i_{oB}$) of the battery output voltage ($V_{oB}$) is offset by a compensation voltage ($\Delta V_{R}$) that is added to it; that is, the reference or nominal voltage ($V_{nom}$) is increased by $\Delta V_{R}$ to compensate for the deviation. Here, $R_{V}$ is the virtual resistance, and $i_{oB}$ is the battery output current.

$$V_{oB} = V_{nom} - (R_{V} \times i_{oB}) + \Delta V_{R} \qquad (12)$$
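
A brief numerical illustration of (12) in Python is given below. All values are assumed, and the way the compensation term is generated inside the controller of [44] is not modelled; the sketch only shows how the added term offsets the virtual-resistance drop.

```python
def battery_output_voltage(v_nom, r_v, i_ob, delta_v_r=0.0):
    """Eq. (12): V_oB = V_nom - R_V * i_oB + Delta V_R."""
    return v_nom - r_v * i_ob + delta_v_r


V_NOM = 48.0   # V, assumed nominal bus voltage
R_V = 0.2      # ohm, assumed virtual resistance
I_OB = 10.0    # A, assumed battery output current

v_without = battery_output_voltage(V_NOM, R_V, I_OB)
v_with = battery_output_voltage(V_NOM, R_V, I_OB, delta_v_r=R_V * I_OB)
print(v_without, v_with)
```

When the compensation equals the droop-induced drop, the output voltage returns to its nominal value while the virtual resistance still shapes the sharing behaviour.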

A successful recent Fuzzy logic-based control strategy has been proposed by G. Bharathi et al. [45] for a DC microgrid network consisting of a photovoltaic system, fuel cell (FC), and BESS. The strategy has presented a Fuzzy solution to the heterogeneous energy storage system to stabilize power distribution and regulate bus voltage. The role of the heterogeneous energy storage system here is to retain DC bus voltage under the control of the new proposed strategy. Specifically, droop control is mitigating DC bus voltage fluctuations while fuzzy logic control is enriching power exchange under different dynamic situations. Simulation of the system with the new proposed strategy has verified an optimized performance and balanced power for different dynamics.

2.6. SOC Balancing of Modular Multilevel Converter Energy Storage System

A modular multilevel converter (MMC) has existed in many high voltage, high power applications as an alternative to the conventional converter because of its brilliant properties; in particular, when interfaced with an ESS to attain a modular multilevel energy storage system (MMC-ESS) that can provide excellent support to the performance of grid applications when connected to the grid [46]. To achieve the necessary performance support, it is vital to ensure a smooth connection to the grid under the design of a properly qualified control system [47]. One of the crucial diagnosed control drawbacks is the unbalance in SOC of ESSs, which is due to different charge and discharge speeds, and might cause more excessive drawbacks rather than the overcharge or over-discharge of any of the storage units [48]. Unbalanced SOC might lead to two more defects: (1) unequal battery voltages, which, in turn, can induce DC components of the injected grid current; (2) an internal circulating current [48].

Many solutions have been proposed in the literature to overcome this drawback and attain a balanced SOC. A distinctive one was proposed by F. Geo et al. [49], who suggested a novel control strategy to optimize the performance of the MMC-ESS. SOC balance was one of the objectives, in addition to the suppression of the circulating current and the grid DC current. The strategy adjusts the real power of each half-bridge according to the difference in SOC. The results indicated the validity of the proposed strategy, with an effective SOC balance across all batteries and suppression of the circulating current and grid DC current. A three-level SOC equilibrium method was designed by H. Laing et al. [50] for a BESS interfaced to an MMC, balancing the energy of the batteries through the balance of their SOC. The development was an attempt towards extending the life, or enabling the second-life reuse, of electric vehicle batteries. The strategy introduces power regulation based on battery capacity proportion to attain balanced SOC at three levels: among the three phase legs, between the upper and lower arms in each phase, and among the submodules in each arm. Implementation of the developed strategy verified an effective overall SOC balance.

A summary of the reviewed decentralized strategies is given in Table 1, which highlights the major strengths and weaknesses of each strategy.

Table 1. Summary of decentralized control strategies.

3. Centralized Control Strategies of Distributed ESSs

Centralized control strategies offer direct control and individual monitoring of distributed ESS in a microgrid. The block diagram in Figure 7 demonstrates the standard centralized control of distributed ESSs in an AC microgrid, consisting of five distributed ESSs. Here, direct control exists between the central controller and any of the ESSs. The centralized control strategies are classified depending on the role of control action into two types. The first (secondary) aims to regulate power quality, such as correcting voltage/frequency, while the second (tertiary) optimizes the power flow dynamic.

Figure 7. Centralized control of AC microgrid.

3.1. Centralized Secondary Control

Within the standard hierarchical architecture of microgrid control, the secondary control system is classified as the regulator of the voltage and frequency offsets left by the primary level [23]. However, the control objectives were extended in [51] to include the correction of voltage balance at the point of common coupling (PCC) of an AC microgrid. In particular, the adjustment of power exchange depends on a request from the central controller to regulate the output voltage according to the secondary control. A further secondary centralized control objective was accomplished by M.H. Andishgar et al. [52], who proposed a secondary control strategy to improve the total harmonic distortion (THD) at the sensitive load bus (SLB).

Where

(K_{P}, K_{i})

are the proportional and integral gains, respectively,

(V_{d q h})

is the voltage in

d q

frame of each harmonic. Based on the proposed development, fifth, seventh, and eleventh harmonic distortions

(H D_{h})

(13) have been extracted depending on the

d q

voltage harmonic components of each voltage

(V_{d, q}^{+ 1} / V_{d, q}^{h})

, which are obtained by the multiple second-order generalized integrator frequency-locked loop (MSOGI-FLL). The THD is calculated and compared with a reference value of the THD to obtain the total harmonic compensation signal (

{c^{'}}_{d q}^{h}

) (see Equations (14) and (15)). The modified total harmonic compensation signal (

c_{d q}^{h})

(refer to Equation (16)) was then obtained, and the total harmonic distortion at the SLB was improved.

H D_{h} = \sqrt{\frac{{(V_{d}^{h})}^{2} + {(V_{q}^{h})}^{2}}{{(V_{d}^{+ 1})}^{2} + {(V_{q}^{+ 1})}^{2}}}

(13)

where

h = 5, 7, 11

T H D = \sqrt{H D_{5}^{2} + H D_{7}^{2} + H D_{11}^{2}}

(14)

{c^{'}}_{d q}^{h} = V_{d q h} \left(\frac{K_{i, H}}{s} + K_{P, H}\right) (T H D_{r e f} - T H D)

(15)

c_{d q}^{h} = {c^{'}}_{d q}^{h} \times \left(\frac{3 \times H D_{h}}{\sum_{K = 5, 7, 11} H D_{K}}\right)

(16)
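As a rough numerical illustration of Equations (13), (14), and the scaling factor in (16), the following Python sketch computes the per-harmonic distortion, the THD, and the share assigned to each harmonic. The function names and the dq voltage values are hypothetical and chosen only for illustration; they are not taken from [52].

```python
import math

def harmonic_distortion(v_dq_h, v_dq_1):
    """HD_h as in Eq. (13): harmonic dq magnitude over fundamental dq magnitude."""
    return math.hypot(*v_dq_h) / math.hypot(*v_dq_1)

def thd_and_scaling(v_dq_harmonics, v_dq_1):
    """Return the THD of Eq. (14) and the bracketed scaling factor of Eq. (16)."""
    hd = {h: harmonic_distortion(v, v_dq_1) for h, v in v_dq_harmonics.items()}
    thd = math.sqrt(sum(x * x for x in hd.values()))
    total = sum(hd.values())
    # Factor 3 as written in Eq. (16); it equals the number of harmonics here
    scaling = {h: 3 * x / total for h, x in hd.items()}
    return thd, scaling

# Made-up dq voltages (V) for the fundamental and the 5th, 7th, 11th harmonics
fundamental = (300.0, 400.0)
harmonics = {5: (30.0, 40.0), 7: (0.0, 25.0), 11: (10.0, 0.0)}
thd, scaling = thd_and_scaling(harmonics, fundamental)
```

With this weighting, the harmonic with the largest distortion receives the largest share of the compensation signal, and the three scaling factors always sum to 3.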

With advances in technology, the objectives of secondary centralized control have been expanded to include SOC balancing. Y. Guan et al. [53] suggested a secondary control strategy to balance the discharge rate of the ESSs in an AC microgrid. It eliminated the voltage and frequency deviations that arise in droop control because of unbalanced storage SOC. A secondary SOC-based control was added to enhance the primary control strategy of a BESS in a standalone microgrid [54]. The objective was to correct the SOC deviation that appears at the primary level. This deviation arises when the load varies, which, in turn, requires more active power from the battery energy storage; the demanded active power causes the SOC to deviate. The deviation is then sent to the secondary control, after a small communication delay, to be corrected. A secondary central control layer was proposed in [55], on top of an adaptive droop-regulated primary level, as part of a control strategy for an autonomous DC microgrid consisting of several distributed generators (DGs) and two distributed batteries. The role of the supervisory control is to monitor and regulate the charging and discharging of the distributed batteries to prolong their lifecycle and to maintain the voltage balance. According to the supervision protocol, the SOC of the distributed batteries is balanced in normal operation by means of a virtual resistance, and the battery with the higher SOC is the first to become fully charged. If the energy production of the system is disturbed, the batteries discharge, starting with the battery that was fully charged first.

As shown in (17), the state of charge of any of the distributed batteries at time t (

S O C_{i} (t)

) equals its initial state of charge

(S O C_{i} (0))

minus the time integral of the battery current, scaled by the charge/discharge efficiency and divided by the battery rated capacity (

C_{B A T, i})

. Here,

(η_{i})

is charge/discharge efficiency, and

(I_{B A T, i})

is the battery current. Under secondary control supervision, the batteries alternate between charging and discharging, with none falling below 90% of its full capacity.

S O C_{i} (t) = S O C_{i} (0) - \int_{0}^{t} η_{i} \frac{I_{B A T, i} (τ)}{C_{B A T, i}} d τ

(17)
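Equation (17) can be evaluated numerically by replacing the integral with a sum over sampled currents. The sketch below is a minimal illustration under stated assumptions: a fixed sampling step, a constant efficiency, and a sign convention in which positive current means discharge (consistent with the minus sign in (17)). The function name and the sample values are hypothetical.

```python
def soc_trajectory(soc0, currents_a, capacity_ah, efficiency=1.0, dt_h=1.0):
    """Discrete form of Eq. (17): SOC(t) = SOC(0) - sum(eta * I * dt / C).

    currents_a: sampled battery currents in A (positive = discharge, assumed)
    capacity_ah: rated capacity in Ah; dt_h: sampling step in hours
    """
    soc = soc0
    history = [soc]
    for i_bat in currents_a:
        soc -= efficiency * i_bat * dt_h / capacity_ah
        history.append(soc)
    return history

# Two hours of 10 A discharge followed by one hour of 5 A charge on a 100 Ah battery
trace = soc_trajectory(0.8, [10.0, 10.0, -5.0], capacity_ah=100.0)
```

Each 10 A hour removes 10% of the rated capacity in this example, so the SOC falls from 0.8 towards 0.6 before the charging step raises it again.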

Other applications of centralized secondary control for balancing the SOC of distributed ESSs were presented in [56,57]. Here, secondary control manages the energy and power of the distributed ESSs, which is essential for mitigating generation instabilities, balancing load demand, improving power quality, and enhancing backup power. Furthermore, balancing the SOC reduces the maximum depth of discharge, which supports a longer life of the distributed storage. Ultimately, system sustainability and overall efficiency are improved. The control strategy proposed by Z. Jin et al. [58] is an advanced application of secondary centralized control in DC distribution, which is regarded as one of the current trends for future mobile power systems. The secondary centralized control acts within a hierarchical control scheme to accomplish the objectives of a large-scale shipboard mobile power system that consists of a DC network and ESSs. The management control system works together with a primary adaptive inverse droop control to achieve two main objectives: (1) management at the control level, which, in turn, supports the collaboration of BESSs to coordinate the number and fuel consumption of running agents; (2) restoration of the voltage level through compensation of the voltage drops caused by droop control.

Rule-based control (RBC) is an outer, or secondary, control that has been applied to a BESS to control charge/discharge on an hourly basis and to create a controlled current reference suited to this role. Additionally, the proposed controller considers SOC balance and charging constraints [59]. As shown in Figure 8, the controller takes the renewable power generated from wind or solar, the hourly dispatched power set point, the battery SOC, and the battery voltage as inputs. The output is a current reference (

i_{b e s s}),

which is kept within the limits of the maximum charge and discharge currents (

i_{m a x, c h}, i_{m a x, d i s}),

as illustrated in (19). The SOC is kept within the required lower and upper limits (

S O C_{L L}

,

S O C_{U L})

(refer to Equation (18)).

S O C_{L L} \leq S O C (t) \leq S O C_{U L}

(18)

i_{m a x, c h} \leq i_{b e s s} (t) \leq i_{m a x, d i s}

(19)

Figure 8. Rule-based control.
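The structure of such a rule-based controller can be sketched in a few lines of Python. This is only an illustrative sketch, not the rule set of [59]: the specific rules, the function name, and all numerical defaults are assumptions. Following the ordering in (19), charging current is taken as negative and discharging current as positive.

```python
def rbc_current_reference(p_renewable_w, p_setpoint_w, soc, v_batt,
                          i_max_ch=-50.0, i_max_dis=50.0,
                          soc_ll=0.2, soc_ul=0.9):
    """Return a battery current reference i_bess in A (positive = discharge).

    Hypothetical rules: cover the gap between the hourly dispatch set point
    and the renewable power, then enforce constraints (18) and (19).
    """
    # Current needed to cover the power mismatch at the present battery voltage
    i_ref = (p_setpoint_w - p_renewable_w) / v_batt
    # Constraint (19): clamp between the charge and discharge current limits
    i_ref = min(max(i_ref, i_max_ch), i_max_dis)
    # Constraint (18): stop charging at the upper limit, discharging at the lower
    if i_ref < 0 and soc >= soc_ul:
        return 0.0
    if i_ref > 0 and soc <= soc_ll:
        return 0.0
    return i_ref
```

For example, with 8 kW of renewable generation, a 5 kW set point, and a 400 V battery at mid SOC, the sketch requests 7.5 A of charging current; the same call with the SOC already at its upper limit returns zero.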

The rule-based secondary control strategy was applied in [60] to a distributed energy storage system consisting of a vanadium redox battery and a supercapacitor, fed by photovoltaic generation. The aim was to manage charge/discharge on an hourly basis under mandatory constraints. It is worth explaining why a supercapacitor is used alongside the vanadium redox battery despite the battery's high storage capacity. The main characteristics of the vanadium redox battery are independent energy and power densities, a long lifecycle, no limitation on depth of discharge, and good efficiency. However, its response time is limited because the electrolyte is circulated by a pump, so the flow rate needs to be maintained. The supercapacitor, on the other hand, stores energy directly in electrical form with no need for conversion to another form, and offers very high efficiency and power density, deep discharge, and a long lifecycle. Despite this, it has a very low energy density and cannot be used for long-term storage. Using a supercapacitor in parallel with the vanadium redox battery reduces the required rating of the redox battery. Furthermore, combining the strengths of both storage technologies yields an ESS that is well suited to connection to a PV system [61,62]. Another application was proposed by C. Wang et al. [63], in which RBC serves as a secondary controller in a combined control strategy of a central controller and local controllers, with the aim of controlling the power flow of the ESSs on an hourly basis.

Economic Model Predictive Control (EMPC) was implemented in [64] for a residential distributed ESS fed by photovoltaic generation. The aim was to optimize its power flow (charge/discharge) based on a time-varying tariff. Equation (20) gives the overall power supplied to and drawn from the ESS at time K. The EMPC controls the power flow of the distributed ESS at time interval K (

P_{B} (K)

), which is the summation of charge and discharge power (

P_{c h}, P_{d i s}

) (refer to Figure 9), subject to the SOC remaining within its minimum and maximum limits (

S O C_{m i n}, S O C_{m a x}

), as expressed in (21). The block diagram in Figure 9 clarifies the control objectives of this strategy. The controller takes as inputs the net load power, the renewably generated power, the tariff for importing power from the grid, the tariff for exporting power to the grid, and the tariff for using power from the battery storage, with the measured SOC at time K as feedback. The control objective was to set the charge and discharge power at every K, which, in turn, minimized the economic cost of meeting residential demand under a time-varying tariff.

P_{B} (K) = P_{c h} (K) + P_{d i s} (K)

(20)

S O C_{m i n} \leq S O C (K) \leq S O C_{m a x}

(21)

Figure 9. Economic model predictive control.
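The receding-horizon idea behind EMPC can be illustrated with a deliberately small Python sketch. It is not the optimizer used in [64]: a brute-force search over three discrete battery powers stands in for the optimization, only the import tariff is priced, and the function name, the sign convention (positive battery power = charging), and all numbers are assumptions made for illustration. The SOC limits play the role of constraint (21).

```python
from itertools import product

def empc_first_action(soc, net_load_kw, tariff, p_max=1.0, capacity_kwh=4.0,
                      soc_min=0.2, soc_max=0.9, dt_h=1.0):
    """Return (first battery power in kW, plan cost) of the cheapest feasible plan.

    Every sequence of battery powers in {-p_max, 0, +p_max} over the horizon is
    enumerated; sequences that violate the SOC limits of (21) are discarded.
    """
    best_cost, best_plan = float("inf"), None
    for plan in product((-p_max, 0.0, p_max), repeat=len(net_load_kw)):
        s, cost, feasible = soc, 0.0, True
        for p_b, load, price in zip(plan, net_load_kw, tariff):
            s += p_b * dt_h / capacity_kwh           # SOC update
            if not (soc_min - 1e-9 <= s <= soc_max + 1e-9):
                feasible = False
                break
            grid_import = max(load + p_b, 0.0)       # export revenue ignored
            cost += price * grid_import * dt_h
        if feasible and cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_plan[0], best_cost

# Cheap early hours, expensive late hours: the plan charges first, discharges later
action, cost = empc_first_action(0.5, [1.0, 1.0, 1.0, 1.0], [0.1, 0.2, 0.5, 0.5])
```

Only the first action of the plan is applied; at the next interval the search is repeated with the newly measured SOC, which is what distinguishes the predictive approach from a fixed rule set.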

The objectives of the RBC and EMPC strategies make clear that both manage the charge/discharge of the storage in a constrained manner, which, in turn, lowers power consumption through energy storage support. The major difference is that RBC does not consider the cost of electricity or the cost of battery lifecycle degradation when controlling charge and discharge. For comparison, both strategies were applied to systems of identical characteristics, operated independently with no grid supply, storing excess PV generation for later use at peak times. The results for the aggregated demand of 30 consumers show that EMPC reduced demand during peak times (between 17:00 and 20:00) more than RBC did. In contrast, the reduction achieved by RBC was higher at off-peak times (between 00:00 and 07:00). This indicates that EMPC predicted the peak demand and shifted it to off-peak times, which is when the energy storage is recharged. Therefore, EMPC was more successful than RBC in improving the 1-day load profile of this group of consumers.

3.2. Centralized Tertiary Control of AC Microgrid

The major objective of tertiary centralized control is to provide optimal voltage references or offsets. Moreover, it manages power flow into and out of the predetermined microgrid network [24,25]. The AC optimal power flow (OPF) problem can be defined as a nonlinear and non-convex problem of optimizing generation dispatch so as to achieve the lowest cost acceptable to consumers, while considering the availability of active and reactive power [65]. The non-convexity adds complexity to the computation, and only approximate solutions are obtained. These are the major drawbacks of the microgrid tertiary level. The execution of OPF focuses on managing power flow from the main grid to the microgrid, and vice versa. In addition, the optimum use of the available generation and storage units reduces power consumption. Tertiary control strategies in AC microgrids can be divided into four classes depending on the approximation used and the ESS power management.

3.2.1. Single/Aggregated Distributed ESSs

This class comprises dynamic optimal power flow (DOPF) solutions in which the distributed ESSs are represented as a single or aggregated capacity. The tertiary centralized strategy controls the power flow between the single/aggregated distributed ESSs and the main utility grid; it does not manage the power flow between the distributed storages themselves. The control strategy proposed in [66] is an effective application. In this, a tertiary energy management system (EMS) was applied to achieve balanced power within the microgrid network. The overall management comprises two parts: the first manages the power flow of each converter, and the second manages the power in the microgrid network under different generation and load conditions. As given in (22), the power management aims to achieve a power balance between PV generation (

P_{p v})

, the summation of AC/DC load power (

P_{a c L}, P_{D C L})

, and battery power (

P_{b a t})

. Therefore, the balance is achieved whenever the power loss (

P_{l o s s}

) is reduced. Another application of single/aggregated tertiary central control of distributed ESS has been introduced in [67].

P_{p v} - P_{l o s s} = P_{a c L} + P_{D C L} + P_{b a t}

(22)
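Rearranged for the battery term, Equation (22) gives the power the battery must absorb or supply to close the balance. The short Python sketch below shows this arithmetic; the function name and the example values are hypothetical, and reading a positive result as charging is an interpretation of the sign in (22) rather than a statement from [66].

```python
def battery_power_kw(p_pv, p_loss, p_ac_load, p_dc_load):
    """Solve Eq. (22) for P_bat: P_bat = P_pv - P_loss - P_acL - P_DCL.

    A positive result is read here as surplus PV power charging the battery;
    a negative result as the battery discharging to cover the loads.
    """
    return p_pv - p_loss - p_ac_load - p_dc_load

surplus = battery_power_kw(p_pv=5.0, p_loss=0.2, p_ac_load=2.0, p_dc_load=1.0)
deficit = battery_power_kw(p_pv=1.0, p_loss=0.1, p_ac_load=2.0, p_dc_load=1.0)
```

In the first case the PV output exceeds the loads and losses by about 1.8 kW, which the battery absorbs; in the second the battery has to supply about 2.1 kW.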

3.2.2. Ideal Real Power Transfer

Tertiary DOPF solutions in this class consider the management of real power transfer between distributed ESSs. The energy management system proposed by A. Ouammi et al. [68] illustrates this clearly. A central controller and an energy management unit (EMU) were combined with a model predictive controller to schedule power exchange among a group of interconnected smart microgrids. Several fundamental roles were outlined. The most important was supporting the autonomous operation of each microgrid through the management of its components, such as distributed generation control and the charge/discharge schedule, while also providing information on power production and prices to the model predictive controller that controls the power exchange. Another important role was interfacing the microgrid components to the central or global controller (GCC). Consequently, the microgrid exchanges power with other microgrids, or disconnects in the case of network failure, while the ESSs compensate for power shortages via charge/discharge, depending on the task being performed.

F. Garcia-Torres et al. [69] proposed a tertiary central control as part of an optimized energy storage management system for two hybrid ESSs distributed in an AC-connected microgrid. The main aim of the MPC-based strategy was to address the lack of competitiveness in the electricity market caused by the unpredictability and deviations of renewable energy. The strategy takes advantage of the high storage density of hydrogen as one of the distributed energy storage units, and an optimized MPC-based energy management system was designed to increase economic benefit and reduce degradation of the distributed ESS. Another application of tertiary control, aimed at minimizing the expected operating cost of a microgrid, was suggested in [70]. Here, stochastic dynamic programming was proposed as a solution to the optimal microgrid operation problem, which is determined by unit commitment (UC) and economic dispatch (ED). The UC was performed over a one-day to one-week horizon, accounting for start-up, shut-down, and operating costs. It was followed by an ED performed over a few minutes to one hour for the economic online allocation of units, taking all system units and constraints into account.

3.2.3. Convex Approximation

Convex approximation or optimization of OPF means relaxing some constraints of the original problem to obtain a convex model. For networks with a high reactance-to-resistance (X/R) ratio, this approximation reduces to the DC power flow under the assumptions of purely reactive line impedance and small voltage angle differences [71]. An advanced application of convex approximation to solve DOPF problems was presented in [72]. Here, a strategy was proposed as an EMS for a single-phase or three-phase AC microgrid with distributed generation and storage units. Robust convex optimization was employed over a limited time horizon to minimize the cost of energy import, export, and DG dispatch, in addition to the cost of operating the ESSs. Moreover, it considers the self-discharge rate and SOC of the distributed energy storage. The EMS was assessed via Monte Carlo simulation, which verified that, in grid-connected mode, the power balance in the microgrid network is determined jointly by the main utility grid and the ESSs, which collaborate to account for the difference between local consumption and local production. K. Garifi et al. [73] proposed a convex relaxation that drops the constraints enforced on the charging and discharging of ESSs in a grid-connected microgrid network. The solution introduces an MPC-based DC OPF formulation with a penalty term: the cost function is modified to include a penalty function that makes the charge/discharge constraints unnecessary. Furthermore, the Karush–Kuhn–Tucker conditions were used to confirm that the convex relaxation satisfies the constraints. The proposed formulation was simulated on multiple IEEE test systems and achieved a reduced computation time compared to the previous approach with constrained ESSs.
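The DC power flow approximation mentioned at the start of this subsection can be checked numerically on a single line. The Python sketch below compares the lossless AC line flow, V1 V2 sin(δ)/X, with its linearized DC counterpart, δ/X, in per-unit quantities. The function names and the line data are hypothetical; the formulas are the standard textbook expressions for a purely reactive line and are not taken from [71].

```python
import math

def ac_line_flow(v1, v2, delta_rad, x_pu):
    """Active power over a purely reactive line (per unit)."""
    return v1 * v2 * math.sin(delta_rad) / x_pu

def dc_line_flow(delta_rad, x_pu):
    """DC approximation: voltages fixed at 1 p.u. and sin(delta) ~ delta."""
    return delta_rad / x_pu

# Hypothetical line: X = 0.2 p.u., flat voltages, 0.1 rad angle difference
p_ac = ac_line_flow(1.0, 1.0, 0.1, 0.2)
p_dc = dc_line_flow(0.1, 0.2)
relative_error = abs(p_dc - p_ac) / p_ac
```

For an angle difference of 0.1 rad the two results differ by well under 1%, which is why the linear model is attractive when the stated assumptions hold; the error grows as the angle difference or the line resistance increases.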

3.2.4. Non-Convex Approximation

Non-convex strategies provide approximate solutions when the objective or any of the constraints are non-convex. They combine mixed-integer linear programming and nonlinear programming to solve the DOPF in a microgrid that includes ESSs within its predetermined network. Furthermore, unbalanced phases are handled by the nonlinear programming [74]. One solution, based on stochastic gradient descent optimization of parameters, was applied in [75] to the non-convex problem. The microgrid used in the experiment consisted of distributed ESSs, micro-CHPs as controllable DG, and uncontrollable DG. An improved centralized EMS was proposed in [76] to optimize the power dispatch of distributed ESSs in an isolated microgrid. The EMS was formulated as a mathematical programming problem and solved with the help of MPC. Subject to generation and operational limits, it was designed to manage generation, balance power flow, provide system operating settings, and balance the distributed ESSs. Moreover, it was designed to support backup power during islanding. The decomposition of the mixed-integer nonlinear programming (MINLP) problem into a mixed-integer linear programming (MILP) problem and UC suggested that this solution might be superior to previously presented solutions, and simulation demonstrated less computation time than other solutions [76]. D.E. Olivares et al. [77] extended the objectives of the previous strategy to include a stochastic mixed-integer programming formulation, in addition to a second-stage OPF based on a nonlinear programming formulation. Both stages therefore cooperate in addressing the uncertainty of the same isolated microgrid. Decisions were made via the proposed two-stage process, in which commitment was decided by the linear stochastic unit commitment (SUC), while final dispatch was accomplished by the shrinking-horizon optimal power flow (SHOPF). Since the SUC was responsible for commitments, it supports a fixed SOC boundary for the distributed ESSs.

3.3. Centralized Tertiary Control of DC Microgrid

Many solutions to the DC dynamic optimal power flow (DC-DOPF) problem in DC microgrids have been suggested in the literature. One distinctive solution is an MPC-based power flow management scheme proposed to solve the DOPF problem in a DC microgrid network [78]. The objective was to manage the power flow of DG units based on renewable generation predictions. Moreover, the capacity for controlling the power flow of the distributed ESSs depends on their SOC. Another approach was suggested by M. Gulin et al. [79], in which a stochastic optimization problem was identified and solved through the design of a tertiary management system; specifically, a two-stage programming solution incorporating MPC as a feedback mechanism to compensate for uncertainty. Among its achievements were a successful integration between the ESS and the grid, optimized energy management, and minimized operating costs.

In a more recent study by J. Zhang et al. [80], a centralized tertiary control acts within a hierarchical control approach in a DC microgrid supplied by distributed BESSs. The tertiary control evaluates current-sharing weights depending on the SOC of the batteries, while the secondary control includes a unit control error (UCE) for restoring the DC voltage of the microgrid and achieving accurate load sharing among the batteries according to the weights set by the tertiary level. The main aim of the strategy was to optimize battery discharge management, which leads to a balanced sharing of the demand. Simulation of the strategy demonstrated its validity and effectiveness.

A summary of the reviewed centralized strategies has been presented in Table 2, which explains the major strengths and weaknesses of each strategy.

Table 2. Summary of centralized control strategies.

4. Distributed Control Strategies Based on Multiagent Communication of Controlling Distributed ESSs

Decentralized control strategies cannot exploit the full capacity of distributed ESSs, since they depend only on local information. Centralized control strategies require an adequate infrastructure for maintaining communication between the distributed ESSs. Therefore, both have weaknesses in optimizing the combined energy and power of the storage system. This creates a need for strategies that combine decentralized operation with communication between units. Distributed multiagent systems have been developed for this purpose, as presented in Figure 10, which shows a multiagent neighbor-to-neighbor communication network applied to an AC microgrid consisting of five distributed ESSs, each representing an independent agent. These strategies fall into two main categories: secondary and tertiary.

Figure 10. Distributed multiagent based control of AC Microgrid.

4.1. Secondary Multiagent of Controlling Distributed ESSs

Under this category, each distributed ESS agent operates autonomously with neighbor-to-neighbor communication. The agents share information, such as SOC level, load current, output voltage, and power consumed, so that the load demand is met in a balanced way. This addresses the problem of cooperative consensus of distributed ESSs under multiagent neighbor-to-neighbor information sharing [81]. Distributed secondary multiagent control was subsequently extended to include an optimal controller [82]: the classical theory was extended to a networked system through the design of a linear quadratic regulator based on an optimized control strategy at each node. S. Mondal et al. [83] recently proposed an application that highlights the impact of secondary multiagent control in the form of an integral consensus protocol, synchronizing the combined energy and power of a distributed BESS over a multiagent neighbor-to-neighbor network. The strategy achieved an independent energy and power consensus that is unaffected by load variations and battery scenarios.

SOC balancing of distributed ESSs has been incorporated into distributed secondary multiagent control as one of the vital objectives for both AC and DC microgrids. Several studies have aimed at balancing SOC based on distributed secondary control in AC microgrids [84,85]. The theory is explained clearly in [84]. Here, SOC balance of the distributed ESSs in the AC microgrid was achieved via a multiagent-based control algorithm designed for each agent. The average SOC of neighbor ESS at time K (

S O C_{m e a n_{j}} (K))

was received via multiagent communication. Then, the average SOC of the specific distributed ESS at the next time K + 1,

(S O C_{m e a n_{i}} (K + 1))

was determined through dynamic average consensus information. Furthermore, the frequency that implements balanced SOC has been scheduled and applied to primary control.
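The neighbor-to-neighbor averaging behind such schemes can be sketched with a basic discrete-time consensus update. Note that this is the plain (static) average consensus iteration, used here as a simpler stand-in for the dynamic average consensus of [84]; the ring topology, the step size, and the SOC values are assumptions chosen for illustration.

```python
def consensus_step(values, neighbors, eps=0.3):
    """One update x_i <- x_i + eps * sum_j (x_j - x_i) over each agent's neighbors."""
    return [x + eps * sum(values[j] - x for j in neighbors[i])
            for i, x in enumerate(values)]

# Five ESS agents on a ring; each exchanges its SOC estimate with two neighbors
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
soc_estimates = [0.9, 0.6, 0.75, 0.5, 0.8]
target = sum(soc_estimates) / len(soc_estimates)

for _ in range(100):
    soc_estimates = consensus_step(soc_estimates, ring)
```

Because every link is bidirectional, the sum of the estimates is preserved at each step, so all agents converge to the network-wide average SOC without any agent seeing more than its two neighbors. The step size must stay below the reciprocal of the largest neighbor count for the iteration to remain stable.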

The development relies on the active role of the dynamic consensus, which is illustrated in Figure 11 and based on the proposed multiagent communication. The SOC produced by the consensus was compared with the measured state of charge (

S O C_{i})

to accomplish a balanced SOC. Then, the balanced SOC was compared with nominal frequency (

W_{°}^{*}

) to schedule the frequency that implements the obliged SOC balance. Finally, the scheduled frequency reference (

W_{°})

was applied to primary control with a voltage reference (

E_{°})

to generate the PWM control signal. Simulation of the developed secondary multiagent-based distributed frequency scheduling demonstrated robustness against communication failure, in addition to its capacity for expansion. Furthermore, any of the distributed ESSs could join at any point of the operation.

Figure 11. Secondary multiagent frequency scheduling of an independent storage agent, to balance SOC.

C. Yu et al. [86] recently applied the theory to distributed BESSs in an islanded AC microgrid with multiagent communication. A control algorithm was designed with the aim of restoring frequency in addition to balancing SOC. The steady-state frequency was restored to its nominal value by compensating for the power difference in the microgrid system. Beyond frequency restoration and SOC balancing, the simulation also showed that the developed event-triggered method synchronizes faster than the conventional one.

DC microgrid networks have also been an application field of secondary multiagent control for balancing the SOC of distributed ESSs. An innovative application is demonstrated in [87]. Here, distributed multiagent secondary control was applied to a DC-connected microgrid with distributed ESSs. SOC balancing was one of the main objectives of the system. The development focused on two main tasks: (1) the distributed secondary control created a voltage control action (

u_{i}^{v}

) and an average energy control action (

u_{i}^{e}

), which were then added to the droop calculation to create the reference voltage

(V^{*})

used to balance the output voltage of the agent with the connected DC bus (see Equation (23)); (2) the developed control system was applied to the AC/DC grid rectifier to manage the power flow and modes of the DC microgrid in a form that provided a balanced energy level (balanced SOC) for the distributed ESSs. The grid rectifier received information from neighboring ESSs regarding the voltage and energy situation. Therefore, the need for a central controller or control mechanism to control the transition from one mode to another was eliminated.

V^{*} = V^{m g} - F_{i} r_{i} (i_{i} - u_{i}^{v} - u_{i}^{e})

(23)

SOC balancing of distributed heterogeneous ESSs in DC microgrids has also been addressed by distributed secondary multiagent-based control. In the strategy in [88], multiagent-based energy coordination control was applied to control a hybrid microgrid consisting of BESSs and ultracapacitors with no need for a central controller. Distributing the heterogeneous storages, each with its own benefits, across different levels helped to improve control optimization and allowed more control objectives to be met. The strategy follows a pattern of four control roles: the microgrid bus voltage was maintained by leader ultracapacitors, while the ultracapacitor voltage was maintained by leader batteries. The other ultracapacitors were followers, responsible for meeting urgent local load demands, and the main objective of the follower batteries was to balance the SOC. Although the strategy achieved many objectives, the main ones were to balance the microgrid power and maintain the SOC balance.

Distributed secondary multiagent strategy was developed as a solution to a limitation of the linear consensus protocol for distributed BESSs in a microgrid. The limitation occurred in previously proposed strategies for balancing the dynamic energy level of the distributed BESSs [87,88,89]: an undesired tradeoff between dynamic energy balancing and the equilibrium of SOC caused circulating current between the distributed BESSs. T. Morstyn et al. [90] designed a strategy to overcome this limitation of the linear consensus protocol while balancing the SOC. To this end, sliding mode control was integrated into a secondary multiagent-based control of distributed BESSs in a DC microgrid. The resulting sliding mode control action (

u_{i} (t))

, as given in (24), succeeded in controlling the level of participation of the distributed BESSs in droop control for both charging and discharging, based on information from multiagent communication regarding the average SOC of the neighboring storages

(A_{i} (t))

, the measured SOC

(S_{i} (t))

, and the measured participation current per unit storage (

i_{L i}^{- p u})

.

u_{i}(t) = \begin{cases} 1, & S_{i}(t) \geq A_{i}(t) \text{ and } \bar{i}_{Li}^{pu}(t) > 0 \\ 1, & S_{i}(t) \leq A_{i}(t) \text{ and } \bar{i}_{Li}^{pu}(t) < 0 \\ 0, & \text{otherwise} \end{cases}

(24)
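The switching rule in (24) can be sketched in a few lines of Python. This is only an illustrative reading of the equation, not the implementation of [90]; the function and argument names are hypothetical, and the sign convention (positive per-unit current meaning discharge) is an assumption.

```python
def sliding_mode_action(soc: float, neighbor_avg_soc: float, current_pu: float) -> int:
    """Return u_i(t) following the switching rule of Equation (24).

    soc              -- measured state of charge S_i(t)
    neighbor_avg_soc -- average SOC of the neighbouring storages A_i(t)
    current_pu       -- per-unit storage current (assumed positive when discharging)
    """
    if soc >= neighbor_avg_soc and current_pu > 0:
        return 1  # storage at or above the neighbour average while discharging
    if soc <= neighbor_avg_soc and current_pu < 0:
        return 1  # storage at or below the neighbour average while charging
    return 0      # otherwise the storage does not participate


# A storage above the neighbour average participates while discharging ...
u_discharge = sliding_mode_action(soc=0.8, neighbor_avg_soc=0.6, current_pu=0.5)
# ... but a storage below the average does not.
u_hold = sliding_mode_action(soc=0.4, neighbor_avg_soc=0.6, current_pu=0.5)
```

Because the output switches between 0 and 1 whenever the SOC crosses the neighbor average, a rule of this form can toggle rapidly near the balance point, which is the chattering discussed next.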

The initial implementation of the theory achieved a balanced SOC, but two defects appeared. The first was chattering, caused by the rapid switching of the sliding mode control to keep the SOC of each distributed BESS equal to the average SOC of its neighboring BESSs. The second was overloading of some participating BESSs with a higher storage level, caused by the wide range of current at which they participated. To overcome these weaknesses, an updated sliding mode surface was introduced (see Equation (25)), together with a new maximum per-unit current limit i_{L}^{pu,max}. This limit was obtained by dividing the maximum discharge current i_{Li}^{max} of the distributed BESS by the battery maximum capacity C_{batt,i}. The new sliding mode control prioritized avoiding overloading and reducing chattering over guaranteeing accurate SOC synchronization.

u_{i}(t) = \begin{cases} 1, & S_{i}(t) > A_{i}(t) \text{ and } \bar{i}_{Li}^{pu}(t) > 0, \; |\bar{i}_{Li}^{pu}| > i_{L}^{pu,max} \\ 1, & S_{i}(t) < A_{i}(t) \text{ and } \bar{i}_{Li}^{pu}(t) < 0, \; |\bar{i}_{Li}^{pu}| > i_{L}^{pu,max} \\ 0, & \text{otherwise} \end{cases}

(25)

The proposed strategy gained some features over the conventional strategy, including the elimination of circulating current between the participating distributed BESSs and plug-and-play capability.

A more recent development of secondary distributed multiagent-based control introduced time-oriented SOC balancing [91]. The idea behind the developed consensus protocol was to achieve the required SOC balance by managing the timing of the charging/discharging modes of the distributed BESSs. As shown in (26) and (27), the average discharging/charging time estimates \bar{t}_{i}^{d}(t), \bar{t}_{i}^{C}(t) at node i were determined by integrating the weighted differences between the neighbors' estimates \bar{t}_{j}^{d}(τ), \bar{t}_{j}^{C}(τ) and the node's own estimates \bar{t}_{i}^{d}(τ), \bar{t}_{i}^{C}(τ), and adding the result to the measured discharge/charge times t_{i}^{d}(t), t_{i}^{C}(t) at that node. The new development succeeded in balancing the SOC within a range from 20% (minimum) to 90% (maximum). Furthermore, it played an important role in regulating the estimated secondary voltage to the nominated reference.

\bar{t}_{i}^{d}(t) = t_{i}^{d}(t) + \int_{0}^{t} \sum_{j=0}^{n} a_{ij} \left( \bar{t}_{j}^{d}(τ) - \bar{t}_{i}^{d}(τ) \right) dτ

(26)

\bar{t}_{i}^{C}(t) = t_{i}^{C}(t) + \int_{0}^{t} \sum_{j=0}^{n} a_{ij} \left( \bar{t}_{j}^{C}(τ) - \bar{t}_{i}^{C}(τ) \right) dτ

(27)
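The behavior of the consensus update in (26) and (27) can be illustrated numerically. The following is a minimal sketch, assuming a forward-Euler discretization of the integral, constant local measurements, and a symmetric adjacency matrix; the three-node line graph and the time values are hypothetical and are not taken from [91].

```python
def dynamic_consensus(measured, adjacency, dt=0.05, steps=2000):
    """Euler discretisation of the consensus estimator in Equations (26)-(27).

    measured  -- local measured times t_i (held constant in this sketch)
    adjacency -- symmetric matrix of communication weights a_ij
    """
    n = len(measured)
    integral = [0.0] * n       # running value of the integral term for each node
    estimate = list(measured)  # average-time estimate, started at the local measurement
    for _ in range(steps):
        # Weighted disagreement with the neighbours, from the current estimates.
        disagreement = [
            sum(adjacency[i][j] * (estimate[j] - estimate[i]) for j in range(n))
            for i in range(n)
        ]
        for i in range(n):
            integral[i] += dt * disagreement[i]
            estimate[i] = measured[i] + integral[i]
    return estimate


# Three storages communicating along a line 0 - 1 - 2 (hypothetical values).
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
measured_discharge_time = [2.0, 4.0, 9.0]
estimates = dynamic_consensus(measured_discharge_time, adjacency)
```

Under these assumptions every node's estimate settles at the network-wide mean of the measured times (5.0 here), although each node only ever exchanges information with its direct neighbors.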

A more recent strategy was proposed by J. Almada et al. [92], in which a secondary multiagent-based control strategy was designed to operate a microgrid in both connected and standalone modes, in order to optimize the overall system performance. The control strategy consisted of a modified droop controller at the primary level to share reactive power accurately, and a secondary centralized multiagent-based controller. The success of the approach rested on the adoption of intelligent agents, each of which is autonomous and can detect, decide, and operate in its environment, and can therefore cooperate with other intelligent agents to solve complex, distributed problems. The system was tested, and the results showed an optimized power balance and system stability.

4.2. Tertiary Cooperative Multiagent Based Strategies of Distributed ESSs

The main typical objective of tertiary cooperative multiagent control in a microgrid is to attain DOPF of distributed ESSs. Despite this common objective, the controls differ according to their specific control objective: some regulate the microgrid parameters, while others track them. Strategies that accomplish economic optimization are generally preferred, and these are classified into three categories based on their multiagent communication architecture.

4.2.1. Hierarchical Tertiary Multiagent Strategies of Implementing DOPF of Distributed ESSs

The DOPF solutions of these strategies are achieved through the collaboration of a central controller with autonomous distributed generation and storage agents. Each agent works independently with its local controller under specific constraints, while full information on the power topology is provided by the central controller. K. Worthmann et al. [93] proposed a strategy that illustrates the concept of a distributed, centrally coordinated tertiary level: an MPC-based market maker strategy acts within three other control levels to flatten the aggregate power consumption, and communication is available between any of the distributed agents and the central economic optimization control management.

The agent at node i exchanges information with a market maker controller at each time k, for a sequence of prices of length N. The information comprises the prices to buy power from the main grid P (see Equation (28)), the prices to sell power to the main grid q (see Equation (29)), the power supplied by the main grid at time k, y_{i}^{+}(k) (see Equation (30)), and the power injected into the main grid at time k, y_{i}^{-}(k) (see Equation (31)). The objective was to manage cost: the buying and selling prices of electricity were raised when demand exceeded the predicted average, and lowered when it fell below it.

P = (p(k), \dots, p(k + N - 1))^{T}

(28)

q = (q(k), \dots, q(k + N - 1))^{T}

(29)

y_{i}^{+}(k) := \max\{y_{i}(k), 0\}

(30)

y_{i}^{-}(k) := \max\{-y_{i}(k), 0\}

(31)
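The split of the net grid exchange in (30) and (31) is simple to express in code. The sketch below also attaches an illustrative cost (imports paid at the buying price, exports credited at the selling price); that cost form, the function names, and the numbers are assumptions for illustration and are not quoted from [93].

```python
def split_grid_exchange(y):
    """Split the net exchange y_i(k) into (y_plus, y_minus) as in Equations (30)-(31)."""
    return max(y, 0.0), max(-y, 0.0)


def energy_cost(net_exchange, buy_prices, sell_prices):
    """Assumed cost form: pay p(k) for imported power, earn q(k) for exported power."""
    total = 0.0
    for y, p, q in zip(net_exchange, buy_prices, sell_prices):
        y_plus, y_minus = split_grid_exchange(y)
        total += p * y_plus - q * y_minus
    return total


# Hypothetical three-step horizon: import 3 units, export 2 units, then idle.
cost = energy_cost(
    net_exchange=[3.0, -2.0, 0.0],
    buy_prices=[0.25, 0.25, 0.5],
    sell_prices=[0.125, 0.125, 0.25],
)
```

At most one of the two parts is nonzero at any time step, which is what allows different buying and selling prices to be applied to the same agent.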

A more recent implementation of hierarchy-based tertiary multiagent distributed control was proposed in [94] to control an AC microgrid network with distributed energy resources, distributed ESSs, and loads. Three hierarchical control levels were used to achieve an optimized distributed power system. Tertiary control, in partnership with all distributed agents, was responsible for solving the OPF problem. As shown in (32) and (33), the AC OPF problem was formulated in terms of the tertiary control variable x^{t} to produce economic generation, with convex reduction applied to the power flow constraints h(x^{t}) and subject to the generation limit constraints g(x^{t}). Implementation of the developed control strategy demonstrated that AC OPF can be solved scalably on the basis of multiagent communication.

\min_{x^{t}} J(x^{t})

(32)

\text{s.t.} \quad h(x^{t}) = 0, \quad g(x^{t}) \leq 0

(33)
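To make the structure of (32) and (33) concrete, the sketch below solves a much simpler problem of the same form: a single-bus economic dispatch with quadratic costs, one power-balance equality, and box limits on each unit. It is not the AC OPF of [94]; the cost coefficients, limits, and the bisection on the marginal cost are assumptions chosen only to show an objective J, an equality constraint h, and inequality constraints g working together.

```python
def dispatch(a, b, lower, upper, demand, tol=1e-9):
    """Minimise sum(a_i*x_i**2 + b_i*x_i) s.t. sum(x) == demand, lower <= x <= upper.

    Assumes a_i > 0 and sum(lower) <= demand <= sum(upper).
    """
    def outputs(lam):
        # Each unit runs where its marginal cost 2*a_i*x_i + b_i equals lam,
        # clipped to its generation limits (the inequality constraints g).
        return [min(max((lam - bi) / (2 * ai), lo), hi)
                for ai, bi, lo, hi in zip(a, b, lower, upper)]

    # Marginal costs at the limits bracket the multiplier of the balance constraint h.
    lam_lo = min(2 * ai * lo + bi for ai, bi, lo in zip(a, b, lower))
    lam_hi = max(2 * ai * hi + bi for ai, bi, hi in zip(a, b, upper))
    while lam_hi - lam_lo > tol:
        lam = 0.5 * (lam_lo + lam_hi)
        if sum(outputs(lam)) < demand:
            lam_lo = lam
        else:
            lam_hi = lam
    return outputs(0.5 * (lam_lo + lam_hi))


# Two hypothetical units sharing a demand of 12 units.
x = dispatch(a=[0.1, 0.2], b=[2.0, 1.0], lower=[0.0, 0.0], upper=[10.0, 10.0], demand=12.0)
```

In this toy case both units end up at the same marginal cost, which is the usual optimality condition when no limit is active.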

The exact diffusion strategy is one of the most recently developed strategies for implementing optimized economic dispatch among distributed agents in a microgrid consisting of distributed generation, storage, and loads [95]. Tertiary centralized control was the highest level of the proposed hierarchical control and acted as a power distribution optimizer, rather than a central controller, in accomplishing the economic operation of the microgrid. A microgrid global central controller (MGCC) agent transmits schedules to the distributed agents to optimize their power dispatch. Additionally, it uses an optimized consensus algorithm for quicker convergence and greater stability and expandability.

4.2.2. Topology-Based Multiagent DOPF Solutions

Topology-based solutions consider sparse multiagent communication between the distributed agents that reflects the power network topology of a predetermined distributed microgrid network. Each distributed agent has a bidirectional information exchange with all its neighbors. A comprehensive application of the theory was achieved through the decentralized control of a distributed multi-smart-microgrid power network proposed in [96]. The idea behind the development was to take advantage of the distributed ESS agent in each smart microgrid network in order to meet demand internally. Furthermore, each network exchanges power locally with neighboring smart microgrids and the main utility grid.

The main objective of the control strategy proposed in [96] was to accomplish distributed cooperative control for each of the distributed smart microgrids according to the topology-based multiagent communication shown in Figure 12. Each smart microgrid (SMG) was considered an agent and communicated with neighbor agents through a power link to optimize its power exchange. Neighbor agents exchanged information on current and expected power availability. Computation that remained effective for many microgrid systems was one sign of the strategy's success in achieving the features of decentralization. W. Kang et al. [97] proposed a strategy with a topology-based multiagent communication layer of distributed BESSs, DGs, and loads in a microgrid. A systematic method was designed that uses multiagent information to accomplish SOC and reactive power balancing.

Figure 12. Topology-based multiagent communication of multi-smart microgrids power network.

4.2.3. Fully Distributed Tertiary Multiagent DOPF Solutions

Fully distributed DOPF solutions are based on topology-free communication, in which only communication between close neighbor agents is mandatory. A solution can therefore be achieved if each distributed agent has bidirectional communication with at least one neighbor. The strategy proposed in [98] illustrates the application of the fully distributed solution and its effectiveness in coordinating distributed energy units. The energy management employed a (consensus + innovations) method to organize all energy units of the microgrid network. Each of these units, including the storage systems, acted as an agent connected to a specific node, so fully distributed sparse multiagent communication was implemented. Furthermore, cost and load demand information was exchanged between neighboring agents in order to ensure that the bulk energy of the microgrid is sufficient for the load demand. The optimized operation of the distributed ESSs and the inclusion of ramp rate constraints were behind the successful solution to the DOPF problem. T. Morstyn et al. [99] applied a fully distributed DOPF to a microgrid network that included distributed ESSs, with the aim of achieving a scalable solution that can accommodate the increase in distributed ESSs in future power networks. The work also eliminated the requirement for a central controller. The development in [99] divided the DOPF problem over the distributed agents, to be solved based on local information provided by each autonomous agent. Thus, enhanced flexibility was achieved, in addition to greater robustness.

4.3. Tertiary Competitive Multiagent Solutions

In cooperative multiagent control, distributed ESSs are involved in implementing DOPF optimization. With this objective alone, however, it can be difficult to implement further specific roles, such as the independent sale of energy and the increase of the overall microgrid profit. To illustrate the theory, a complete description of market-based microgrid networks with competitive ESS agents was presented in [100]. A multiagent communication-based competitive game theory was employed for an AC microgrid of renewable energy distributed agents. Each distributed agent was committed to hour-ahead market information for a whole day. Figure 13 [101] shows how the agent was updated with the environment through a multistage platform, which enabled it to perceive the environment through the sensors and make decisions. These decisions were sent to the actuators. One of the significant advantages of ESSs here is that the price of energy was inversely related to the SOC, so the price is low whenever the SOC is high.

Figure 13. Competitive agent behavior.

Multi-microgrid, multi-consumer systems are a fundamental setting and have therefore been a field of competitive multiagent application, with a great deal of work directed towards managing energy distribution in such systems [102]. To this end, a multilevel Stackelberg gaming solution was established that treats the multi-microgrids as leaders that decide the mandatory level of generation. Furthermore, support from central energy management was available to set an optimum energy tariff and earn more profit, while the consumers, or followers, participated in deciding the optimal consumption. The ESSs at the follower agents therefore decided the optimal demand, which in turn resulted in more profit for the specific smart grid [102].

4.4. Combined Cooperative Competitive Multiagent Solutions

A combination of cooperative and competitive solutions can be used to attain more intelligent power distribution management in smart grids, for example by providing multilevel energy trading and marketing, that is, trading at the consumer level in addition to marketing at the level of the whole model. One example is the multi-objective power management solution recently proposed in [103]. The new idea behind the development was to model the power management problem so that the distributed agents were involved in a bargaining game, which was attained by introducing a Nash bargaining solution. Furthermore, the implemented agent decision-based computation eliminated the need for a central controller. The Nash bargaining solution for the power management problem was extended to a multi-microgrid power distribution network in order to obtain a cooperative, agent-based, Pareto-optimal treatment of power management [104]. There, a utility supplier is the common factor coupling all the agent microgrids that support the power exchange of the multi-microgrid network, and represents the main market for accomplishing the necessary cost reduction.

The multilevel energy market demonstrated success in the multiagent distributed power system in [105]. The operation of the smart multi-microgrids was enhanced through a hierarchical, three-level market proposal. Here, the double-auction, day-ahead marketing mechanism was at the first level, while the other two levels were hour-ahead, real-time marketing. The concept of the hierarchical multilevel solution was to accomplish a system of multiple decision-makers in which the upper-level decision-makers are leaders and the lower-level ones are followers. Multiagent-based communication using the data distribution service (DDS) with the real-time publish-subscribe (RTPS) protocol provided fast, reliable, and scalable communication. Furthermore, microgrids within power systems were capable of increasing system flexibility and accomplishing economic operation.

Table 3 demonstrates a summary of the reviewed multiagent strategies, which comprises the major strengths and weaknesses for each of these strategies.

Table 3. Summary of distributed multiagent control strategies.

5. Intelligent Control-Based Reinforcement Learning

One factor that enhances system reliability is the integration of renewable resources with ESSs, so that excess renewable generation can be stored. Multiagent communication is the gateway towards decentralization and the accomplishment of this integration, and the main objective of decentralization is the transition towards smart, decentralized microgrid networks. Reinforcement learning (RL) is one of the leading approaches to intelligent power distribution management, especially given the trend towards a clean and economic environment and the increase in electric vehicles (EVs). The aim of the RL agent, as illustrated in Figure 14, is to maximize the total reward through sequential interaction with the environment, which here includes power distribution management [106]. The best action for every state is learned through the design of a suitable reward [107].

Figure 14. Reinforcement Learning Agent.

As given in (34), the state value function V^{π}(x) states that when the agent is in a state x, the action π(x) is taken to move to the next state y, yielding an immediate reward r(π(x)). Future returns are weighted by a discount factor γ \in (0, 1). Here, P_{xy}[π(x)] is the probability of moving from the current state x to the next state y under that action [107].

V^{π}(x) = r(π(x)) + γ \sum_{y} P_{xy}[π(x)] V^{π}(y)

(34)
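Equation (34) can be solved numerically by applying it repeatedly until the values stop changing (iterative policy evaluation). The sketch below assumes a hypothetical two-state storage model with "low" and "high" SOC states; the rewards, transition probabilities, and the fixed policy are invented for illustration only.

```python
def evaluate_policy(policy, reward, transition, gamma=0.9, sweeps=500):
    """Iterate V(x) <- r(pi(x)) + gamma * sum_y P_xy[pi(x)] * V(y), as in Equation (34)."""
    states = list(policy)
    value = {x: 0.0 for x in states}
    for _ in range(sweeps):
        value = {
            x: reward[x][policy[x]]
            + gamma * sum(p * value[y] for y, p in transition[x][policy[x]].items())
            for x in states
        }
    return value


# Hypothetical storage MDP: charging costs 1, discharging earns 2.
policy = {"low": "charge", "high": "discharge"}
reward = {"low": {"charge": -1.0}, "high": {"discharge": 2.0}}
transition = {
    "low": {"charge": {"high": 1.0}},
    "high": {"discharge": {"low": 1.0}},
}
values = evaluate_policy(policy, reward, transition)
```

For this deterministic example the fixed point can be checked by hand: V(low) = -1 + 0.9 V(high) and V(high) = 2 + 0.9 V(low), which gives V(low) = 0.8 / 0.19.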

5.1. Balance of Exploration and Exploitation

The aim of a learning scenario is to explore the states and to exploit the rewards already observed in order to decide the RL action. Exploration and exploitation need to be balanced to avoid becoming trapped in local optima. Balancing them is not an easy task, because optimizing the actions requires experience and a large amount of data must be handled. The agent seeks the action that provides the highest reward; this action is named the greedy action a_{g} if it delivers the maximum reward. Solution policies for the balance problem between exploration and exploitation were presented in [106].

E-Greedy Policy

A model-free, e-greedy reinforcement learning method was proposed as a lower-level control for managing the energy of the battery pack storage and two driving motors of a hybrid tracked electric vehicle system [108]. Its Q-learning algorithm successfully optimized control based on online computation of the transition probability matrix (TPM). E-greedy policy-based Q-learning RL has also been nominated as a solution to the difficulty of obtaining EV mobility and charge/discharge profiles, which are needed by the mobility-aware control algorithm (MACA) that was proposed to optimize charge/discharge scenarios [109]. Since an EV in a Vehicle-to-Grid (V2G) system can consume power when charging and supply power when discharging, it can represent an autonomous microgrid with a storage unit. Z. Tan et al. [110] used a model-free, e-greedy Q-learning solution as the non-convex top layer of a fast-learning optimizer, in order to implement real-time optimal energy management (OEM) of a connected microgrid. The proposed strategy combined classical control with intelligent model-free reinforcement learning, which in turn enhanced the speed and the quality of the optimization.
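The e-greedy rule itself is short enough to state in code. The following is a generic sketch of the selection step, not the implementation used in [108,109,110]; the function name and the example Q-values are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


# With epsilon = 0 the choice is always the greedy action (index 1 here).
greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

A common design choice is to start with a large epsilon and reduce it as learning progresses, so that the agent explores early and exploits later.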

5.2. Q-Learning

Q-learning is a model-free reinforcement learning method for learning the policy π of the decision-maker. The Q-value expresses the quality of taking an action a in the current state x and following π thereafter (see Equation (35)): the immediate reward for a plus the discounted value of the states that may follow. The value of a state under the policy can therefore be represented as the quality of taking the policy's action in that state, V^{π}(x) = Q^{π}(x, π(x)) [106].

Q^{π}(x, a) = r(a) + γ \sum_{y} P_{xy}[a] V^{π}(y)

(35)

Q-learning has been applied to microgrid power management with the aim of achieving fast, high-quality optimization. The three strategies [108,109,110] presented in the previous section all used Q-learning within their model-free reinforcement learning. The introduction of Q-learning RL for optimizing power flow at an EV charging station highlighted its advantage over classical, programming-based optimization [111], owing to the capacity of RL to compute and save solutions offline. Model-free RL has also been recommended to enhance energy consumption scheduling by gaining more information about power consumers and suppliers. The strategy highlighted the impact of involving RL in multiagent-based energy management through the acquisition of information about the agents (consumers and suppliers).
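For reference, the standard tabular Q-learning update can be written as a single function. This is the textbook rule rather than the specific variant of any reviewed work; the state and action names, the learning rate alpha, and the numbers are hypothetical.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Tabular update: Q(x,a) += alpha * (r + gamma * max_b Q(y,b) - Q(x,a))."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


# Hypothetical two-state storage example.
q_table = {
    "low": {"charge": 0.0, "idle": 0.0},
    "high": {"discharge": 1.0, "idle": 0.0},
}
updated = q_update(q_table, "low", "charge", reward=1.0, next_state="high",
                   alpha=0.5, gamma=0.5)
```

The update needs only the observed transition (state, action, reward, next state), which is why no model of the transition probabilities is required.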

5.3. Batch Reinforcement Learning

Despite the wide range of applications of model-free learning, it is still not robust enough for some policies and is limited in its data. Therefore, there is a need for a more efficient RL methodology. Batch reinforcement learning can provide more efficient and stable solutions by having full knowledge of the experiences in the environment prior to an update, as illustrated in Figure 15, which clarifies that batch experiences are saved and applied before the action is taken; this differs from Q-learning, which updates Q-values at the time of the action [112].

Figure 15. Batch Reinforcement Learning.

The application of batch learning to scheduling power management in a microgrid, specifically the power flow of an ESS, was highlighted in [113]. A combination of Q-learning and batch learning was proposed to optimize battery operation. The optimized operation of the battery was decided by the nominated RL agent based on the storage SOC, the demanded load, the inverter efficiency, and the PV generation. A developed batch Q-learning was proposed by G. Shi et al. [114] to manage the energy of an eco-based microgrid network that consisted of an office as the demand, photovoltaic generation as the renewable supply, and a battery storage unit. The system used full knowledge of the optimized performance over a period of time to prepare for the real-time electricity rate and demand, which in turn accomplished the objectives of the developed Echo-RL-based strategy: optimizing the charge/discharge of the battery so as to reduce the total cost.
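The contrast between batch learning and step-by-step Q-learning can be illustrated with a small tabular example in which the Q-table is refitted from a fixed set of stored transitions and only replaced after each full pass over the batch. This is a simplified tabular analogue of batch (fitted) Q-iteration that assumes deterministic transitions; it is not the algorithm of [113] or [114], and the transitions are invented.

```python
def batch_q_learning(batch, actions, gamma=0.9, sweeps=200):
    """Fit a tabular Q from a fixed batch of (state, action, reward, next_state) tuples."""
    q = {}
    for _ in range(sweeps):
        new_q = dict(q)
        for s, a, r, s_next in batch:
            best_next = max(q.get((s_next, b), 0.0) for b in actions)
            new_q[(s, a)] = r + gamma * best_next
        q = new_q  # the table changes only after the whole batch has been processed
    return q


# Hypothetical stored experience for a two-state storage model.
stored = [
    ("low", "charge", -1.0, "high"),
    ("high", "discharge", 2.0, "low"),
    ("low", "idle", 0.0, "low"),
    ("high", "idle", 0.0, "high"),
]
q_batch = batch_q_learning(stored, actions=["charge", "discharge", "idle"])
```

Because the batch is fixed, the same experience is reused in every sweep, which is the source of the data efficiency attributed to batch methods above.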

5.4. Deep Q-Learning

Deep Q-learning combines supervised and reinforcement learning by bringing together deep learning and batch-based Q-learning [115]. Figure 16 shows that deep Q-learning comprises two neural networks: one estimates the current Q, while the other uses an older estimate to produce the next, or target, Q.

Figure 16. Deep Q-learning network.
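The role of the second network can be sketched by computing the regression targets for a batch of stored transitions. In the sketch below the target network is replaced by a plain Python function standing in for a frozen copy of the Q-network; that stand-in, the transition format, and the numbers are assumptions for illustration.

```python
def td_targets(batch, target_q, gamma=0.99):
    """Compute y = r + gamma * max_a Q_target(s', a) for each stored transition.

    batch    -- iterable of (state, action, reward, next_state, done)
    target_q -- callable returning a list of Q-values for a state (frozen network stand-in)
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets


def frozen_q(state):
    """Hypothetical stand-in for the target network's output on a scalar state."""
    return [0.5 * state, 1.0]


transitions = [(0, 1, 1.0, 4, False), (4, 0, 2.0, 0, True)]
targets = td_targets(transitions, frozen_q, gamma=0.5)
```

The online network would then be trained to match these targets, while the frozen copy is refreshed from the online network only occasionally, which keeps the targets from shifting at every update.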

Deep Q-learning was applied in [116] to manage energy in a microgrid consisting of DGs, BESSs, and PV, as a solution to the uncertainty in renewable generation, demand, and their prices. The real-time scheduling of the microgrid operation was formulated as a Markov decision process, and RL-based deep Q-learning was introduced to solve it [116]. The action was then approximated via the proposed deep Q-learning and the designed deep feedforward neural network. Implementation showed that deep Q-learning-based scheduling handled the uncertainty of operation with no explicit model, unlike traditional RL that requires a specific model. X. Lu et al. [117] proposed applying deep Q-learning in a multi-microgrid smart grid to balance supply with the demanded load. Deep Q-learning there serves to derive an energy trading policy via the intelligent prediction of renewable generation, future demand, and the level of storage in the battery. Simulation verified the system's success in managing the mismatch between generation and demand, giving a 12% reduction in plant scheduling and a 22.3% rise in microgrid renewable generation utility.

A new deep RL control approach was recently suggested by L. Desportes et al. [118] for a power distribution network consisting of a hybrid ESS of lead battery and hydrogen storage, a photovoltaic system as a renewable resource, and a consumer represented by a partially islanded building. The main aim of the designed approach was to accomplish 35% long-term renewable feeding for the building and to reduce the emission impacts of fuel-based generation. To achieve this goal, a control strategy based on a new deep deterministic policy gradient algorithm, DDPG_{α_{rep}}, was suggested. In particular, the problem was reformulated to reduce the action to a single component, α_{rep}(t), with S_{t} denoting the state of hydrogen storage. Simulation showed that the newly suggested strategy learned the policy π_{θ} : S_{t} \to α_{rep}(t). Additionally, the main goal of reducing carbon impact was achieved when the efficiency of the hydrogen storage was adequately large. The deep RL-based strategy proposed in [119] was the most recent distinctly successful attempt to control a complex hybrid electrical and thermal storage system fed by the PV system of a residential building. The main aim of the new strategy was to reduce the energy required for heating, cooling, and providing hot water. The developed RL-based strategy demonstrated success in dealing with the complexity of the thermal system. Compared with a rule-based control, the new strategy demonstrated better system management as well as significant cost and energy savings.

5.5. Actor-Critic Algorithms

The actor-critic algorithm combines two deep Q-learning networks in order to maximize the total reward. They operate cooperatively: the actor policy network delivers an action for a state observed from the environment, while the critic network monitors two inputs from the environment, the state and the reward produced by the actor's action, and its evaluation of that action is fed back to the actor [120]. As an application of actor-critic methods to optimizing stored power management, a deep deterministic policy gradient (DDPG) algorithm has been employed within a battery power management system in a microgrid to minimize consumption cost and stabilize the battery SOC [121]. Extensive training was conducted with DDPG, with over-fitting avoided, to optimize battery power flow for consumers with different preferences. Results showed a 55% increase in profit and a 67.5% decrease in SOC instability. A more recent application of the actor-critic deterministic deep learning policy was designed by L. Yu et al. [122] to manage the scheduling of energy storage systems, in addition to other requirements, for a smart home. The challenge of uncertainty in renewable generation, unshifted high demands, outside temperature, and consumption tariffs encouraged the authors to design intelligent power management. A deterministic deep learning-based system was then designed, which demonstrated effectiveness and robustness in accomplishing the desired energy management.

In a more recent study, A. Joshi et al. [123] proposed a new actor-critic RL-based method, named polynomial deterministic policy gradient (PDPG), for controlling a residential household fed by a photovoltaic-battery renewable system. The objectives were to reduce consumption cost, enhance battery scheduling, and boost the role of consumers in the management policy. The proposed design is a model-free Q-learning approach capable of handling continuous actions and learning a deterministic policy, using an actor-critic based on a deterministic policy gradient. Implementation of the policy showed progress over state-of-the-art designs in terms of reduced computational time and electricity cost.
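The interplay between actor and critic can be shown with a deliberately small example. The sketch below uses a tabular one-step actor-critic with a softmax policy instead of the deep networks and deterministic policy gradients of [121,122,123]; this substitution, the state and action indices, and the step sizes are all assumptions made to keep the example self-contained.

```python
import math


def softmax(prefs):
    """Convert action preferences into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(theta, value, s, a, r, s_next,
                      gamma=0.9, alpha_actor=0.1, alpha_critic=0.1):
    """One-step actor-critic update; returns the TD error shared by both parts."""
    td_error = r + gamma * value[s_next] - value[s]   # critic's evaluation of the action
    value[s] += alpha_critic * td_error               # critic update
    probs = softmax(theta[s])
    for b in range(len(theta[s])):
        grad = (1.0 if b == a else 0.0) - probs[b]    # gradient of log pi(a|s) w.r.t. theta[s][b]
        theta[s][b] += alpha_actor * td_error * grad  # actor update
    return td_error


# Two states, two actions, all preferences and values initialised to zero.
theta = {0: [0.0, 0.0], 1: [0.0, 0.0]}
value = {0: 0.0, 1: 0.0}
delta = actor_critic_step(theta, value, s=0, a=1, r=1.0, s_next=1)
```

After a positive TD error the chosen action becomes more likely in that state, which is the sense in which the critic's evaluation steers the actor.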

Table 4 summarizes the major strengths and weaknesses of the reviewed intelligent strategies.

Table 4. Summary of intelligent strategy-based reinforcement learning.

6. Emerging Reinforcement Learning Techniques of Power Management in Micro and Smart Grids

Reinforcement learning techniques have become a smart solution to many problems that were dilemmas in the past, and they have added further capability and intelligence to existing solutions. However, traditional techniques are not always sufficient. Emerging reinforcement learning techniques, which are developed versions of the traditional ones, have therefore been introduced to solve power management issues in complex power distribution applications that the traditional strategies cannot handle. Research in this sector is still at an early stage; future work is therefore planned as a comprehensive study of these emerging techniques, specifically for energy management optimization. Synchronous and asynchronous learning have been developed because of the instability of Q-learning in some complicated applications [106]. The asynchronous advantage actor-critic (A3C) was developed first as an extension of actor-critic; it combines several neural network agents trained asynchronously in different environments [124]. Despite the intelligence of the asynchronous approach, its complexity is a drawback. A simpler synchronous version (A2C) has therefore been designed to provide the same intelligence with less complexity [125]. Multiagent reinforcement learning (MARLA) strategies use more than a single reinforcement learning agent, each of which interacts with the environment to learn the desired optimization of the control system [126]. They are introduced when a single reinforcement learning agent is insufficient for the purpose [126].
The need for multiagent reinforcement learning strategies is increasing with the trend towards decentralization of power distribution in micro- and smart grids, especially when the distribution network is more complicated and consists of a group of decentralized energy agents located far from each other within a microgrid network. The key idea of transfer in reinforcement learning is to use the knowledge gained by solving one problem to solve another; in other words, to transfer the knowledge or solution [127]. Priority experience can be defined as the sampling of an RL agent's past experiences to accomplish a learning objective [128]. Traditional RL techniques rely on an immediate extrinsic motivation for the agent to implement the objective of the learning process. Because the reward developed in traditional RL does not scale well, and because RL actions must take immediate effect in some complex environments such as modern power networks, intrinsic motivation methods have been introduced in [129], in which the reward is produced by the RL agent independently of the state transition. In RL, curiosity is defined as the prediction error of the state transition. Furthermore, actions that provide higher intrinsic reward reduce curiosity [130].

7. Conclusions and Recommendations

This paper has presented a comprehensive review of historic and state-of-the-art control strategies for distributed energy storage systems in microgrids, smart grids, and intelligent power distribution networks. The importance of ESSs in providing balancing services and to help buffer against intermittent renewable supply is well agreed upon; therefore, it is imperative that research related to their control and management is up to date and succinctly summarized. This paper has set out to provide such a review. 130 research works in the area have been dissected, and a distinctive summary has been presented for each control strategy to highlight the major strengths and weaknesses related to design, implementation, and service provision. Highlights are summarized in the following paragraphs.

Droop control is the traditional strategy for primary decentralized control of similar or heterogeneous ESSs in microgrid networks. However, droop control applied without adaptation or consideration of SOC cannot simultaneously provide full voltage balance and load sharing services. Fuzzy logic is one such adaptation to overcome SOC overloading and instability of both voltage and frequency, while virtual impedance can be deployed as a solution to unbalanced reactive power when there is a mismatch of transmission line impedances. Centralized strategies implement direct control of ESSs and enable individual monitoring and trimming. The secondary control corrects or supervises primary control, in addition to participating in regulating load sharing and balancing SOC. Rule-based control and EMPC seem the most successful applications, with the main objective of optimizing ESS power flow on an hourly or sub-hourly basis. The strength of EMPC over rule-based control lies in its consideration of a time-varying electricity tariff to potentially yield an economic profit. The main objective of tertiary centralized control is to provide an optimal voltage reference. Furthermore, it helps to solve the OPF problem. Solutions for OPF differ depending on the ESS category. Aggregated solutions manage power flow by treating the ESSs as a single capacity, while ideal real solutions consider real power management between the distributed ESSs. The different approximations affect OPF solutions, whether they are convex or non-convex.

The introduction of a central controller to many small and distributed ESSs in a microgrid network comes with many challenges. Each of the distributed ESSs must be controlled individually and precisely by the central controller. Therefore, expanding the infrastructure and providing real-time communications is mandatory. In turn, communication disturbances can be introduced, in addition to privacy and security concerns. This motivates decentralized control based solely on local information, which, however, cannot precisely achieve the required balance and performance. These fundamental challenges and complications have paved the way towards multiagent control, in which a cooperative balance can be achieved by ESSs based on neighbor-to-neighbor local information exchange only. Secondary multiagent strategies ensure autonomous operation of distributed ESSs through multiagent sharing of SOC, output voltage, and load current information. Tertiary cooperative multiagent strategies are classified according to communication architecture and include hierarchical tertiary control, which is accomplished via direct multiagent communication between the central controller and the storage agents. Meanwhile, topology-based multiagent control reflects the underlying power network topology with no requirement for a central controller. Topology-free multiagent control, on the other hand, does not reflect the power network topology, and control can be achieved provided that at least bidirectional communication with the neighboring ESS is available. Competitive tertiary strategies differ from cooperative strategies and are required in competitive situations (such as one featuring an independent seller of energy), in which consideration of microgrid and/or agent profit cannot be neglected. Furthermore, cooperative and competitive strategies can be combined for more flexible solutions.
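
The neighbor-to-neighbor exchange described above can be sketched as a discrete-time consensus update. In the minimal Python example below, each agent adjusts its share of the load using only the SOC values of its communication neighbors, weighted by $a_{ij}$. The line-shaped communication graph, the gain, the capacities, and the function names are all assumptions made for illustration rather than a reproduction of any reviewed controller.

```python
# Symmetric communication weights a_ij for four agents on a line: 0-1-2-3.
ADJACENCY = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]


def consensus_step(soc, adjacency, load_power, capacity, gain, dt):
    """Advance every agent's SOC by one time step.

    Each agent supplies an equal base share of the load plus a correction
    proportional to its local SOC disagreement with its neighbors, so units
    with more stored energy discharge faster.
    """
    n = len(soc)
    base = load_power / n
    new_soc = []
    for i in range(n):
        # Local error: built only from information received from neighbors.
        error = sum(adjacency[i][j] * (soc[i] - soc[j]) for j in range(n))
        power = base + gain * error            # kW, positive means discharge
        new_soc.append(soc[i] - power * dt / capacity)
    return new_soc


def simulate(soc, steps, **kwargs):
    """Repeat the consensus step and return the final SOC vector."""
    for _ in range(steps):
        soc = consensus_step(soc, ADJACENCY, **kwargs)
    return soc


# Hypothetical case: 10 kWh units, 2 kW total load, 0.01 h time step.
balanced = simulate([0.9, 0.7, 0.6, 0.4], steps=500,
                    load_power=2.0, capacity=10.0, gain=20.0, dt=0.01)
print(balanced)
```

Because the weights are symmetric, the corrections cancel across the network and the total load is still met, while the spread between the SOC values shrinks at every step. No agent needs a central controller or knowledge of the full network.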

As mentioned previously, multiagent control is the gateway from autonomy to intelligence, and reinforcement learning, applied to power management and control, is one of the shortest paths to reach it. It is considered a powerful tool for scheduling and managing power in complicated power systems. The ε-greedy policy balances exploitation and exploration: it selects the action with the highest estimated reward most of the time and a random action with a small probability ε. Q-learning is a model-free reinforcement learning method with wide application to power management in microgrids. Despite that, it can still lack robustness and can be limited by the data available for some policies. Therefore, more efficient batch reinforcement techniques have been introduced, and combined deep/batch reinforcement learning has also been applied for more accurate estimation and prediction. A further RL architecture is the actor-critic, which combines two deep networks, an actor that selects actions and a critic that evaluates them, with the aim of maximizing the total reward.
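
To make the roles of the ε-greedy policy and the value function $Q(x, a)$ concrete, the following Python sketch trains a tabular Q-learning agent on a toy storage arbitrage problem. The environment (five SOC levels, a tariff that alternates between a low and a high price, one energy unit moved per step) and all parameter values are hypothetical and chosen only to keep the example small.

```python
import random

ACTIONS = (-1, 0, 1)      # discharge, idle, charge (one energy unit per step)
PRICES = (1.0, 3.0)       # hypothetical low/high tariff, alternating each step
MAX_SOC = 4               # SOC is discretized into levels 0..4


def step(soc, price_idx, action):
    """Apply an action; infeasible charge/discharge requests leave SOC unchanged."""
    new_soc = min(max(soc + action, 0), MAX_SOC)
    moved = new_soc - soc                    # +1 bought, -1 sold, 0 otherwise
    reward = -moved * PRICES[price_idx]      # pay when buying, earn when selling
    return new_soc, 1 - price_idx, reward


def train(episodes=2000, horizon=24, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {(s, p): [0.0] * len(ACTIONS)
         for s in range(MAX_SOC + 1) for p in (0, 1)}
    for _episode in range(episodes):
        soc, price_idx = rng.randint(0, MAX_SOC), rng.randint(0, 1)
        for _step in range(horizon):
            state = (soc, price_idx)
            if rng.random() < epsilon:       # explore with probability epsilon
                a = rng.randrange(len(ACTIONS))
            else:                            # otherwise exploit current estimate
                a = max(range(len(ACTIONS)), key=lambda k: q[state][k])
            soc, price_idx, reward = step(soc, price_idx, ACTIONS[a])
            # Model-free update towards reward plus discounted best next value.
            target = reward + gamma * max(q[(soc, price_idx)])
            q[state][a] += alpha * (target - q[state][a])
    return q


def greedy_action(q, state):
    """Return the action with the highest learned value in the given state."""
    return ACTIONS[max(range(len(ACTIONS)), key=lambda k: q[state][k])]


q_table = train()
print(greedy_action(q_table, (0, 0)), greedy_action(q_table, (MAX_SOC, 1)))
```

The agent is never given the tariff model; it learns from sampled transitions alone that an empty store should charge at the low price and a full store should discharge at the high price.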

Despite the intelligence of traditional reinforcement learning techniques, they prove insufficient in some complicated power management applications; therefore, further RL-based techniques are still emerging. Synchronous and asynchronous techniques address the instability of Q-learning in some complicated applications. Meanwhile, multiagent reinforcement strategies are required when a single RL network is insufficient, and they are widely applied to power management in microgrids. Transfer RL is a principle in which knowledge gained in solving a problem is transferred from one domain to another; this differs from the priority technique, which samples past experiences to implement learning objectives. Traditional RL relies on the extrinsic motivation of the agent, but some complex environments, such as modern power networks, require the immediate impact of an RL action to be captured. Therefore, intrinsic motivation methods were developed to provide a solution. Such emerging techniques have been applied to solve more complicated applications of power distribution management, or to follow the envisioned future trend of decentralization and autonomy in the design of power distribution systems; however, research is still in the early stages. The principal finding of this comprehensive review is that research gaps remain in emerging decentralized intelligent strategies based on RL and in their application to renewable energy control, management, and optimization in the context of microgrid energy storage. This review has provided a clear taxonomy and description of each control strategy, its methodology, applications, and major strengths and weaknesses. This in turn fosters a clear understanding of the topic, provides insight into the nature of these research gaps, and indicates how knowledge in this field may be extended effectively by future scholarly works.
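
The priority technique mentioned above, in which past experiences are sampled according to how informative they are, can be sketched as follows. This minimal Python example assumes the common proportional scheme in which a stored transition is drawn with probability proportional to a power of its absolute temporal-difference error; the function names, the exponent, and the sample data are illustrative assumptions.

```python
import random


def sampling_probabilities(td_errors, alpha=0.6, eps=1e-3):
    """Turn temporal-difference errors into sampling probabilities.

    alpha = 0 gives uniform sampling; larger alpha focuses on surprising
    transitions. eps keeps zero-error transitions from never being replayed.
    """
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_batch(buffer, td_errors, batch_size, rng, alpha=0.6):
    """Draw a replay batch (with replacement) according to the priorities."""
    probs = sampling_probabilities(td_errors, alpha)
    return rng.choices(buffer, weights=probs, k=batch_size)


# Hypothetical buffer of three stored transitions, labeled by name only.
buffer = ["routine_a", "routine_b", "price_spike"]
td_errors = [0.1, 0.1, 5.0]
batch = sample_batch(buffer, td_errors, batch_size=8, rng=random.Random(0))
print(batch)
```

Transitions with a large error, such as an unexpected price or load event, are replayed more often than routine ones, which is the intended effect of prioritization.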

Author Contributions

Conceptualization, M.A.-S. and M.A.-G.; methodology, M.A.-S., M.A.-G. and M.S.; formal analysis, M.A.-S. and M.A.-G.; investigation, M.A.-S. and M.A.-G.; writing—original draft preparation, M.A.-S.; writing—review and editing, M.A.-G. and M.S.; supervision, M.A.-G. and M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

SOC	State of charge
ESS/ESSs	Energy storage system, systems
FIS	Fuzzy inference system
MPC	Model predictive control
DG	Distributed Generation
BESS/BESSs	Battery energy storage system, systems
OPF	Optimal power flow
DOPF	Dynamic optimal power flow
RL	Reinforcement learning
EV/EVs	Electric vehicle, vehicles
SMES	Superconducting magnetic energy storage
HPFD	High pass filter-based droop
VCD	Virtual capacitance droop
SC	Super capacitor
PCC	Point of common coupling
THD	Total harmonic distortion
SLB	Sensitive load bus
MSOGI-FLL	Multiple second-order generalized integrators with frequency-locked loop
RBC	Rule-based control
EMPC	Economic model predictive control
EMS	Energy management system
EMU	Energy management unit
GCC	Global central controller
UC	Unit commitment
ED	Economic dispatch
MINLP	Mixed-integer nonlinear programming
MILP	Mixed-integer linear programming
SUC	Stochastic unit commitment
SHOPF	Shrinking horizon optimal power flow
HESS	Hybrid energy storage system
PV	Photovoltaic
MGCC	Microgrid global central controller
SMG	Smart Microgrid
PLA	Power Link
DDS	Data distribution service
RTPS	Real time publish subscribe
TPM	Transition probability matrix
MACA	Mobility-aware control algorithm
V2G	Vehicle to grid
OEM	Optimal energy management
DDPG	Deep deterministic policy gradient
A3C	Asynchronous advantage actor-critic
A2C	Synchronous advantage actor-critic
MARLA	Multiagent reinforcement learning
DP	Dynamic programming
TRPO	Trust-Region Policy Optimization
PWM	Pulse width modulation
PEM	Polymer electrolyte membrane
NDO	Nonlinear disturbance observer
FC	Fuel cell
Tariff	How energy provider charges consumers for using energy
UCE	Unit control error
MMC	Modular multi-level converter
MMC-ESS	Modular multi-level converter energy storage system
X/R	Reactance to resistance ratio
AI	Artificial Intelligence
$K_{P}$	Active power droop coefficient
$K_{q}$	Reactive power coefficient
$Z_{v 1}, Z_{v 2}$	First and second inverter virtual impedances
$Z_{1}, Z_{2}$	First and second line impedances
$K_{d}, K_{c}$	Discharge/Charge droop coefficients
$e^{1/SOC_{i}^{n}}$	Exponential of the computed SOC
$u_{d c i}$	Discharge/Charge Droop control action
$R_{d}$	Fuzzy logic droop control virtual resistance
$m$	Fuzzy droop control correction
$K_{smes}$, $K_{batt}$	SMES and battery droop coefficients
$f_{non\_up}$	Critical-up frequency
$f_{non\_low}$	Critical-low frequency
$R_{V}$	Deviation resistance
$V_{dqh}$	Voltage in dq frame
$c_{dq}^{,h}$	Total harmonic compensation signal
$c_{dq}^{h}$	Modified total harmonic compensation signal
$C_{BAT,i}$	Battery rated capacity
$\eta_{i}$	Charge/Discharge efficiency
$P_{B} (K)$	Power flow of the distributed ESS at time K
$P_{ch} (K), P_{dis} (K)$	Charge, discharge power at time K
$V^{m g}$	Microgrid reference voltage
$u_{i}^{\bar{v}}$	Voltage control action
$u_{i}^{e}$	Energy control action
$S_{i} (t)$	SOC of agent i at time t
$A_{i} (t)$	Average SOC of the neighbors of agent i
$i_{Li}^{-pu}$	Participation current of agent i per unit energy storage
$i_{L}^{pu\,max}$	Maximum current of agent i per unit energy storage
$a_{ij}$	Communication weight from node j to node i
$y_{i}^{+} (k)$	Power supplied by the main grid at time k
$y_{i}^{-} (k)$	Power injected to the main grid at time k
$π$	Rule or policy
$V^{π} (y)$	Value of state $y$
$r (π (x))$	The immediate reward
$Q (x, a)$	Value function (degree of goodness of taking an action a)
$a$	Action
CPLs	Constant power loads

References

Butt, O.M.; Zulqarnain, M.; Butt, T.M. Recent advancement in smart grid technology: Future prospects in the electrical power network. Ain Shams Eng. J. 2021, 12, 687–695.
Faisal, M.; Hannan, M.A.; Ker, P.J.; Hussain, A.; Mansor, M.B.; Blaabjerg, F. Review of energy storage system technologies in microgrid applications: Issues and challenges. IEEE Access 2018, 6, 35143–35164.
Ceglia, F.; Marrasso, E.; Roselli, C.; Sasso, M. Small renewable energy community: The role of energy and environmental indicators for power grid. Sustainability 2021, 13, 2137.
Fichera, A.; Marrasso, E.; Sasso, M.; Volpe, R. Energy, environmental and economic performance of an urban community hybrid distributed energy system. Energies 2020, 13, 2545.
Huang, P.; Sun, Y.; Lovati, M.; Zhang, X. Solar-photovoltaic-power-sharing-based design optimization of distributed energy storage systems for performance improvements. Energy 2021, 222, 119931.
Jing, W.; Lai, C.H.; Wong, W.S.; Wong, M.D. Dynamic power allocation of battery-supercapacitor hybrid energy storage for standalone PV microgrid applications. Sustain. Energy Technol. Assess. 2017, 22, 55–64.
Wu, F.; Maier, J.; Yu, Y. Guidelines and trends for next-generation rechargeable lithium and lithium-ion batteries. Chem. Soc. Rev. 2020, 49, 1569–1614.
Wang, Y.; Diaz, D.F.R.; Chen, K.S.; Wang, Z.; Adroher, X.C. Materials, technological status, and fundamentals of PEM fuel cells–a review. Mater. Today 2020, 32, 178–203.
Zhang, H.; Sun, C. Cost-effective iron-based aqueous redox flow batteries for large-scale energy storage application: A review. J. Power Sour. 2021, 493, 229445.
Alirahmi, S.M.; Mousavi, S.B.; Razmi, A.R.; Ahmadi, P. A comprehensive techno-economic analysis and multi-criteria optimization of a compressed air energy storage (CAES) hybridized with solar and desalination units. Energy Convers. Manag. 2021, 236, 114053.
Wang, Y.; Wang, C.; Xue, H. A novel capacity configuration method of flywheel energy storage system in electric vehicles fast charging station. Electr. Power Syst. Res. 2021, 195, 107185.
Miao, Z.; Xu, L.; Disfani, V.R.; Fan, L. An SOC-based battery management system for microgrids. IEEE Trans. Smart Grid 2013, 5, 966–973.
Toledo, O.M.; Oliveira Filho, D.; Diniz, A.S.A.C. Distributed photovoltaic generation and energy storage systems: A review. Renew. Sustain. Energy Rev. 2010, 14, 506–511.
Xu, Q.; Hu, X.; Wang, P.; Xiao, J.; Tu, P.; Wen, C.; Lee, M.Y. A decentralized dynamic power sharing strategy for hybrid energy storage system in autonomous DC microgrid. IEEE Trans. Ind. Electron. 2016, 64, 5930–5941.
Rehmani, M.H.; Reisslein, M.; Rachedi, A.; Erol-Kantarci, M.; Radenkovic, M. Integrating renewable energy resources into the smart grid: Recent developments in information and communication technologies. IEEE Trans. Ind. Inform. 2018, 14, 2814–2825.
Carli, R.; Dotoli, M. Decentralized control for residential energy management of a smart users’ microgrid with renewable energy exchange. IEEE/CAA J. Autom. Sin. 2019, 6, 641–656.
Olsen, D.J.; Kirschen, D.S. Profitable emissions-reducing energy storage. IEEE Trans. Power Syst. 2019, 35, 1509–1519.
Dinh, H.T.; Yun, J.; Kim, D.M.; Lee, K.-H.; Kim, D. A home energy management system with renewable energy and energy storage utilizing main grid and electricity selling. IEEE Access 2020, 8, 49436–49450.
Muhtadi, A.; Pandit, D.; Nguyen, N.; Mitra, J. Distributed energy resources based microgrid: Review of architecture, control, and reliability. IEEE Trans. Ind. Appl. 2021, 57, 2223–2235.
Legry, M.; Dieulot, J.-Y.; Colas, F.; Saudemont, C.; Ducarme, O. Non-linear primary control mapping for droop-like behavior of microgrid systems. IEEE Trans. Smart Grid 2020, 11, 4604–4613.
Guerrero, J.M.; Chandorkar, M.; Lee, T.-L.; Loh, P.C. Advanced control architectures for intelligent microgrids—Part I: Decentralized and hierarchical control. IEEE Trans. Ind. Electron. 2012, 60, 1254–1262.
Liu, J.; Li, J.; Song, H.; Nawaz, A.; Qu, Y. Nonlinear secondary voltage control of islanded microgrid via distributed consistency. IEEE Trans. Energy Convers. 2020, 35, 1964–1972.
Simpson-Porco, J.W.; Shafiee, Q.; Dörfler, F.; Vasquez, J.C.; Guerrero, J.M.; Bullo, F. Secondary frequency and voltage control of islanded microgrids via distributed averaging. IEEE Trans. Ind. Electron. 2015, 62, 7025–7038.
Zhang, R.; Savkin, A.V.; Hredzak, B. Centralized nonlinear switching control strategy for distributed energy storage systems communicating via a network with large time delays. J. Energy Storage 2021, 41, 102834.
Sahoo, S.K.; Sinha, A.K.; Kishore, N. Control techniques in AC, DC, and hybrid AC–DC microgrid: A review. IEEE J. Emerg. Sel. Top. Power Electron. 2017, 6, 738–759.
Mohammed, A.; Refaat, S.S.; Bayhan, S.; Abu-Rub, H. Ac microgrid control and management strategies: Evaluation and review. IEEE Power Electron. Mag. 2019, 6, 18–31.
Khan, K.A.; Khalid, M. Improving the transient response of hybrid energy storage system for voltage stability in DC microgrids using an autonomous control strategy. IEEE Access 2021, 9, 10460–10472.
Mohd, A.; Ortjohann, E.; Schmelter, A.; Hamsic, N.; Morton, D. Challenges in Integrating distributed energy storage systems into future smart grid. In Proceedings of the 2008 IEEE International Symposium on Industrial Electronics, Cambridge, UK, 30 June–2 July 2008; pp. 1627–1632.
Morstyn, T.; Hredzak, B.; Agelidis, V.G. Control strategies for microgrids with distributed energy storage systems: An overview. IEEE Trans. Smart Grid 2016, 9, 3652–3666.
Gao, F.; Kang, R.; Cao, J.; Yang, T. Primary and secondary control in DC microgrids: A review. J. Mod. Power Syst. Clean Energy 2019, 7, 227–242.
Moslemi, R.; Mohammadpour, J. Accurate reactive power control of autonomous microgrids using an adaptive virtual inductance loop. Electr. Power Syst. Res. 2015, 129, 142–149.
Tayab, U.B.; Roslan, M.A.B.; Hwai, L.J.; Kashif, M. A review of droop control techniques for microgrid. Renew. Sustain. Energy Rev. 2017, 76, 717–727.
Lu, X.; Sun, K.; Guerrero, J.M.; Vasquez, J.C.; Huang, L. State-of-charge balance using adaptive droop control for distributed energy storage systems in DC microgrid applications. IEEE Trans. Ind. Electron. 2013, 61, 2804–2815.
Alam, M.; Kumar, K.; Dutta, V. Droop based control strategy for balancing the level of hydrogen storage in direct current microgrid application. J. Energy Storage 2021, 33, 102106.
Bi, K.; Yang, W.; Xu, D.; Yan, W. Dynamic SOC balance strategy for modular energy storage system based on adaptive droop control. IEEE Access 2020, 8, 41418–41431.
Wang, W.; Zhou, M.; Jiang, H.; Chen, Z.; Wang, Q. Improved droop control based on State-of-Charge in DC microgrid. In Proceedings of the 2020 IEEE 29th International Symposium on Industrial Electronics (ISIE), Delft, The Netherlands, 17–19 June 2020; pp. 1509–1513.
Gavriluta, C.; Candela, J.I.; Citro, C.; Rocabert, J.; Luna, A.; Rodríguez, P. Decentralized primary control of MTDC networks with energy storage and distributed generation. IEEE Trans. Ind. Appl. 2014, 50, 4122–4131.
Wang, J. SoC-based dynamic droop control for battery energy storage systems in DC microgrids feeding CPLs. J. Phys. Conf. Ser. 2021, 1754, 012060.
Diaz, N.L.; Dragičević, T.; Vasquez, J.C.; Guerrero, J.M. Fuzzy-logic-based gain-scheduling control for state-of-charge balance of distributed energy storage systems for DC microgrids. In Proceedings of the 2014 IEEE Applied Power Electronics Conference and Exposition-APEC, Fort Worth, TX, USA, 16–20 March 2014; pp. 2171–2176.
Diaz, N.L.; Dragičević, T.; Vasquez, J.C.; Guerrero, J.M. Intelligent distributed generation and storage units for DC microgrids—A new concept on cooperative control without communications beyond droop control. IEEE Trans. Smart Grid 2014, 5, 2476–2485.
Díaz, N.L.; Wu, D.; Dragičević, T.; Vásquez, J.C.; Guerrero, J.M. Fuzzy droop control loops adjustment for stored energy balance in distributed energy storage system. In Proceedings of the 2015 9th International Conference on Power Electronics and ECCE Asia (ICPE-ECCE Asia), Seoul, Korea, 1–5 June 2015; pp. 728–735.
Li, J.; Xiong, R.; Yang, Q.; Liang, F.; Zhang, M.; Yuan, W. Design/test of a hybrid energy storage system for primary frequency control using a dynamic droop method in an isolated microgrid power system. Appl. Energy 2017, 201, 257–269.
Díaz-González, F.; Hau, M.; Sumper, A.; Gomis-Bellmunt, O. Participation of wind power plants in system frequency control: Review of grid code requirements and control methods. Renew. Sustain. Energy Rev. 2014, 34, 551–564.
Xu, Q.; Xiao, J.; Hu, X.; Wang, P.; Lee, M.Y. A decentralized power management strategy for hybrid energy storage system with autonomous bus voltage restoration and state-of-charge recovery. IEEE Trans. Ind. Electron. 2017, 64, 7098–7108.
Bharathi, G.; Kantharao, P.; Srinivasarao, R. Fuzzy logic control (FLC)-based coordination control of DC microgrid with energy storage system and hybrid distributed generation. Int. J. Ambient Energy 2021, 1–17.
Liu, J.; Dong, D.; Zhang, D. A hybrid modular multilevel converter family with higher power density and efficiency. IEEE Trans. Power Electron. 2021, 36, 9001–9014.
Zhang, D.; Jiang, J.; Zhang, L.; Zhou, Z. Grid-connected control strategy of modular multilevel converter–battery energy storage system based on VSG. J. Eng. 2019, 2019, 1502–1505.
Yuan, Q.; Yang, F.; Li, A.; Ma, T. A novel hybrid control strategy for the energy storage modular multilevel converters. IEEE Access 2021, 9, 59466–59474.
Gao, F.; Zhang, L.; Zhou, Q.; Chen, M.; Xu, T.; Hu, S. State-of-charge balancing control strategy of battery energy storage system based on modular multilevel converter. In Proceedings of the 2014 IEEE Energy Conversion Congress and Exposition (ECCE), Pittsburgh, PA, USA, 15–18 September 2014; pp. 2567–2574.
Liang, H.; Guo, L.; Song, J.; Yang, Y.; Zhang, W.; Qi, H. State-of-charge balancing control of a modular multilevel converter with an integrated battery energy storage. Energies 2018, 11, 873.
Golsorkhi, M.S.; Hill, D.J.; Baharizadeh, M. A secondary control method for voltage unbalance compensation and accurate Load sharing in networked microgrids. IEEE Trans. Smart Grid 2021, 4, 2822–2833.
Andishgar, M.H.; Gholipour, E.; Hooshmand, R.-A. Improved secondary control for optimal total harmonic distortion compensation of parallel connected DGs in islanded microgrids. IET Smart Grid 2019, 2, 115–122.
Guan, Y.; Vasquez, J.C.; Guerrero, J.M. Coordinated secondary control for balanced discharge rate of energy storage system in islanded AC microgrids. IEEE Trans. Ind. Appl. 2016, 52, 5019–5028.
Kim, Y.-S.; Hwang, C.-S.; Kim, E.-S.; Cho, C. State of charge-based active power sharing method in a standalone microgrid with high penetration level of renewable energy sources. Energies 2016, 9, 480.
Dragičević, T.; Guerrero, J.M.; Vasquez, J.C.; Škrlec, D. Supervisory control of an adaptive droop regulated DC microgrid with battery management capability. IEEE Trans. Power Electron. 2013, 29, 695–706.
Meng, T.; Lin, Z.; Shamash, Y.A. Distributed cooperative control of battery energy Storage systems in DC microgrids. IEEE/CAA J. Autom. Sin. 2021, 8, 606–616.
Palizban, O.; Kauhaniemi, K. Distributed cooperative control of battery energy storage system in AC microgrid applications. J. Energy Storage 2015, 3, 43–51.
Jin, Z.; Meng, L.; Guerrero, J.M.; Han, R. Hierarchical control design for a shipboard power system with DC distribution and energy storage aboard future more-electric ships. IEEE Trans. Ind. Inform. 2017, 14, 703–714.
Teleke, S.; Baran, M.E.; Bhattacharya, S.; Huang, A.Q. Rule-based control of battery energy storage for dispatching intermittent renewable sources. IEEE Trans. Sustain. Energy 2010, 1, 117–124.
Wang, G.; Ciobotaru, M.; Agelidis, V.G. Power management for improved dispatch of utility-scale PV plants. IEEE Trans. Power Syst. 2015, 31, 2297–2306.
Sun, C.; Negro, E.; Vezzù, K.; Pagot, G.; Cavinato, G.; Nale, A.; Bang, Y.H.; Di Noto, V. Hybrid inorganic-organic proton-conducting membranes based on SPEEK doped with WO3 nanoparticles for application in vanadium redox flow batteries. Electrochim. Acta 2019, 309, 311–325.
Etxeberria, A.; Vechiu, I.; Baudoin, S.; Camblong, H.; Kreckelbergh, S. Control of a vanadium redox battery and supercapacitor using a three-level neutral point clamped converter. J. Power Sour. 2014, 248, 1170–1176.
Wang, C.; Zhang, T.; Ma, F. A multi-agent based hierarchical control system for DERs management in islanded micro-grid. In Proceedings of the 2015 Chinese Automation Congress (CAC), Wuhan, China, 27–29 November 2015; pp. 1371–1376.
Banfield, B.; Robinson, D.A.; Agalgaonkar, A.P. Comparison of economic model predictive control and rule-based control for residential energy storage systems. IET Smart Grid 2020, 3, 722–729.
Halilbašić, L.; Pinson, P.; Chatzivasileiadis, S. Convex relaxations and approximations of chance-constrained AC-OPF problems. IEEE Trans. Power Syst. 2018, 34, 1459–1470.
Hu, J.; Xu, Y.; Cheng, K.W.; Guerrero, J.M. A model predictive control strategy of PV-Battery microgrid under variable power generations and load conditions. Appl. Energy 2018, 221, 195–203.
Parisio, A.; Rikos, E.; Glielmo, L. A model predictive control approach to microgrid operation optimization. IEEE Trans. Control Syst. Technol. 2014, 22, 1813–1827.
Ouammi, A.; Dagdougui, H.; Dessaint, L.; Sacile, R. Coordinated model predictive-based power flows control in a cooperative network of smart microgrids. IEEE Trans. Smart Grid 2015, 6, 2233–2244.
Garcia-Torres, F.; Bordons, C. Optimal economical schedule of hydrogen-based microgrids with hybrid storage using model predictive control. IEEE Trans. Ind. Electron. 2015, 62, 5195–5207.
Nguyen, T.A.; Crow, M. Stochastic optimization of renewable-based microgrid operation incorporating battery operating cost. IEEE Trans. Power Syst. 2015, 31, 2289–2296.
Montoya, O.D.; Gil-González, W.; Garces, A. Optimal power flow on DC microgrids: A quadratic convex approximation. IEEE Trans. Circuits Syst. II Express Br. 2018, 66, 1018–1022.
Giraldo, J.S.; Castrillon, J.A.; López, J.C.; Rider, M.J.; Castro, C.A. Microgrids energy management using robust convex programming. IEEE Trans. Smart Grid 2018, 10, 4520–4530.
Garifi, K.; Baker, K.; Christensen, D.; Touri, B. Convex relaxation of grid-connected energy storage system models with complementarity constraints in DC OPF. IEEE Trans. Smart Grid 2020, 11, 4070–4079.
Bai, W.; Zhu, X.; Lee, K.Y. Dynamic optimal power flow based on a spatio-temporal wind speed forecast model. In Proceedings of the 2021 IEEE Congress on Evolutionary Computation (CEC), Kraków, Poland, 28 June–1 July 2021; pp. 136–143.
Shuai, H.; Fang, J.; Ai, X.; Yao, W.; Wen, J.; He, H. On-line energy management of microgrid via parametric cost function approximation. IEEE Trans. Power Syst. 2019, 34, 3300–3302.
Olivares, D.E.; Cañizares, C.A.; Kazerani, M. A centralized energy management system for isolated microgrids. IEEE Trans. Smart Grid 2014, 5, 1864–1875.
Olivares, D.E.; Lara, J.D.; Cañizares, C.A.; Kazerani, M. Stochastic-predictive energy management system for isolated microgrids. IEEE Trans. Smart Grid 2015, 6, 2681–2693.
Morstyn, T.; Hredzak, B.; Agelidis, V.G. Dynamic optimal power flow for DC microgrids with distributed battery energy storage systems. In Proceedings of the 2016 IEEE Energy Conversion Congress and Exposition (ECCE), Milwaukee, WI, USA, 18–22 September 2016; pp. 1–6.
Gulin, M.; Matuško, J.; Vašak, M. Stochastic model predictive control for optimal economic operation of a residential DC microgrid. In Proceedings of the 2015 IEEE International Conference on Industrial Technology (ICIT), Seville, Spain, 17–19 March 2015; pp. 505–510.
Zhang, J.; Csank, J.T.; Soeder, J.F. Hierarchical control of distributed battery energy storage system in a DC microgrid. In Proceedings of the 2021 IEEE Fourth International Conference on DC Microgrids (ICDCM), Virtual Conference, Arlington, VA, USA, 18–21 July 2021; pp. 1–8.
Fax, J.A.; Murray, R.M. Information flow and cooperative control of vehicle formations. IEEE Trans. Autom. Control 2004, 49, 1465–1476.
Zhang, H.; Lewis, F.L.; Das, A. Optimal design for synchronization of cooperative systems: State feedback, observer and output feedback. IEEE Trans. Autom. Control 2011, 56, 1948–1952.
Mondal, S.; Srivastava, A.; Maji, A.; Chakraborty, R.; Roy, D.S.; Mukherjee, S. A distributed fixed-time consensus for battery storage systems. In Proceedings of the 2021 Innovations in Energy Management and Renewable Resources (52042), Kolkata, India, 5–7 February 2021; pp. 1–5.
Morstyn, T.; Hredzak, B.; Agelidis, V.G. Communication delay robustness for multi-agent state of charge balancing between distributed AC microgrid storage systems. In Proceedings of the 2015 IEEE Conference on Control Applications (CCA), Sydney, NSW, Australia, 21–23 September 2015; pp. 181–186.
Li, C.; Coelho, E.A.A.; Dragicevic, T.; Guerrero, J.M.; Vasquez, J.C. Multiagent-based distributed state of charge balancing control for distributed energy storage units in AC microgrids. IEEE Trans. Ind. Appl. 2016, 53, 2369–2381.
Yu, C.; Zhou, H.; Yao, R.; Chen, S. Frequency synchronization and soc balancing control in AC microgrids. In Proceedings of the IECON 2020 the 46th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 18–21 October 2020; pp. 3365–3370.
Morstyn, T.; Hredzak, B.; Demetriades, G.D.; Agelidis, V.G. Unified distributed control for DC microgrid operating modes. IEEE Trans. Power Syst. 2015, 31, 802–812.
Morstyn, T.; Hredzak, B.; Agelidis, V.G. Cooperative multi-agent control of heterogeneous storage devices distributed in a DC microgrid. IEEE Trans. Power Syst. 2015, 31, 2974–2986.
Li, C.; Dragicevic, T.; Plaza, M.G.; Andrade, F.; Vasquez, J.C.; Guerrero, J.M. Multiagent based distributed control for state-of-charge balance of distributed energy storage in DC microgrids. In Proceedings of the IECON 2014—40th Annual Conference of the IEEE Industrial Electronics Society, Dallas, TX, USA, 29 October–1 November 2014; pp. 2180–2184.
Morstyn, T.; Savkin, A.V.; Hredzak, B.; Agelidis, V.G. Multi-agent sliding mode control for state of charge balancing between battery energy storage systems distributed in a DC microgrid. IEEE Trans. Smart Grid 2017, 9, 4735–4743.
Wu, T.; Xia, Y.; Wang, L.; Wei, W. Multiagent based distributed control with time-oriented SoC balancing method for DC microgrid. Energies 2020, 13, 2793.
Almada, J.B.; Leão, R.P.; Almeida, R.G.; Sampaio, R.F. Microgrid distributed secondary control and energy management using multi-agent system. Int. Trans. Electr. Energy Syst. 2021, 31, e12886.
Worthmann, K.; Kellett, C.M.; Braun, P.; Grüne, L.; Weller, S.R. Distributed and decentralized control of residential energy systems incorporating battery storage. IEEE Trans. Smart Grid 2015, 6, 1914–1923.
Liu, M.; Cheng, Z.; Zhang, Z.; Sun, M.; Deng, R.; Cheng, P.; Chow, M.-Y. A multi-agent system based hierarchical control framework for microgrids. In Proceedings of the 2021 IEEE PES General Meeting, Washington, DC, USA, 16 February 2021.
He, Y.; Wang, W.; Wu, X. Multi-agent based fully distributed economic dispatch in microgrid using exact diffusion strategy. IEEE Access 2019, 8, 7020–7031.
Dagdougui, H.; Sacile, R. Decentralized control of the power flows in a network of smart microgrids modeled as a team of cooperative agents. IEEE Trans. Control Syst. Technol. 2013, 22, 510–519.
Kang, W.; Chen, M.; Li, B.; Chen, F.; Lai, W.; Lin, H.; Zhao, B. Distributed reactive power control and SOC sharing method for battery energy storage system in microgrids. IEEE Access 2019, 7, 60707–60720.
Hug, G.; Kar, S.; Wu, C. Consensus+ innovations approach for distributed multiagent coordination in a microgrid. IEEE Trans. Smart Grid 2015, 6, 1893–1903.
Morstyn, T.; Hredzak, B.; Agelidis, V.G. Network topology independent multi-agent dynamic optimal power flow for microgrids with distributed energy storage systems. IEEE Trans. Smart Grid 2016, 9, 3419–3429.
Cintuglu, M.H.; Martin, H.; Mohammed, O.A. Real-time implementation of multiagent-based game theory reverse auction model for microgrid market operation. IEEE Trans. Smart Grid 2015, 6, 1064–1072.
Russell, S.; Norvig, P. Artificial Intelligence—A Modern Approach, 2nd ed.; Pearson Education: Bergen, NJ, USA, 2003.
Mondal, A.; Misra, S.; Obaidat, M.S. Distributed home energy management system with storage in smart grid using game theory. IEEE Syst. J. 2015, 11, 1857–1866.
Dehghanpour, K.; Nehrir, H. Real-time multiobjective microgrid power management using distributed optimization in an agent-based bargaining framework. IEEE Trans. Smart Grid 2017, 9, 6318–6327.
Dehghanpour, K.; Nehrir, H. An agent-based hierarchical bargaining framework for power management of multiple cooperative microgrids. IEEE Trans. Smart Grid 2017, 10, 514–522.
Esfahani, M.M.; Hariri, A.; Mohammed, O.A. A multiagent-based game-theoretic and optimization approach for market operation of multimicrogrid systems. IEEE Trans. Ind. Inform. 2018, 15, 280–292.
Arwa, E.O.; Folly, K.A. Reinforcement learning techniques for optimal power control in grid-connected microgrids: A comprehensive review. IEEE Access 2020, 8, 208992–209007.
Erick, A.O.; Folly, K.A. Energy trading in grid-connected PV-battery electric vehicle charging station. In Proceedings of the 2020 International SAUPEC/RobMech/PRASA Conference, Cape Town, South Africa, 29–31 January 2020; pp. 1–6.
Liu, T.; Hu, X. A bi-level control for energy efficiency improvement of a hybrid tracked vehicle. IEEE Trans. Ind. Inform. 2018, 14, 1616–1625.
Ko, H.; Pack, S.; Leung, V.C. Mobility-aware vehicle-to-grid control algorithm in microgrids. IEEE Trans. Intell. Transp. Syst. 2018, 19, 2165–2174.
Tan, Z.; Zhang, X.; Xie, B.; Wang, D.; Liu, B.; Yu, T. Fast learning optimiser for real-time optimal energy management of a grid-connected microgrid. IET Gener. Transm. Distrib. 2018, 12, 2977–2987.
Erick, A.O.; Folly, K.A. Power flow management in electric vehicles charging station using reinforcement learning. In Proceedings of the 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
Lange, S.; Gabel, T.; Riedmiller, M. Batch reinforcement learning. In Reinforcement Learning; Springer: Berlin, Germany, 2012; pp. 45–73. [Google Scholar]
Mbuwir, B.V.; Ruelens, F.; Spiessens, F.; Deconinck, G. Battery energy management in a microgrid using batch reinforcement learning. Energies 2017, 10, 1846. [Google Scholar] [CrossRef] [Green Version]
Shi, G.; Liu, D.; Wei, Q. Echo state network-based Q-learning method for optimal battery control of offices combined with renewable energy. IET Control Theory Appl. 2017, 11, 915–922. [Google Scholar] [CrossRef] [Green Version]
Vrancois, V. Contributions to Deep Reinforcement Learning and Its Applications in Smart Grids. Ph.D. Thesis, University of Liège, Liège, Belgium, 2017. [Google Scholar]
Ji, Y.; Wang, J.; Xu, J.; Fang, X.; Zhang, H. Real-time energy management of a microgrid using deep reinforcement learning. Energies 2019, 12, 2291. [Google Scholar] [CrossRef] [Green Version]
Lu, X.; Xiao, X.; Xiao, L.; Dai, C.; Peng, M.; Poor, H.V. Reinforcement learning-based microgrid energy trading with a reduced power plant schedule. IEEE Internet Things J. 2019, 6, 10728–10737. [Google Scholar] [CrossRef]
Desportes, L.; Fijalkow, I.; Andry, P. Deep reinforcement learning for hybrid energy storage systems: Balancing lead and hydrogen storage. Energies 2021, 14, 4706. [Google Scholar] [CrossRef]
Zsembinszki, G.; Fernández, C.; Vérez, D.; Cabeza, L.F. Deep learning optimal control for a complex hybrid energy storage system. Buildings 2021, 11, 194. [Google Scholar] [CrossRef]
Awate, Y.P. Policy-gradient based actor-critic algorithms. In Proceedings of the 2009 WRI Global Congress on Intelligent Systems, Washington, DCM, USA, 19–21 May 2009; pp. 505–509. [Google Scholar]
Chen, P.; Liu, M.; Chen, C.; Shang, X. A battery management strategy in microgrid for personalized customer requirements. Energy 2019, 189, 116245. [Google Scholar] [CrossRef]
Yu, L.; Xie, W.; Xie, D.; Zou, Y.; Zhang, D.; Sun, Z.; Zhang, L.; Zhang, Y.; Jiang, T. Deep reinforcement learning for smart home energy management. IEEE Internet Things J. 2019, 7, 2751–2762. [Google Scholar] [CrossRef] [Green Version]
Joshi, A.; Tipaldi, M.; Glielmo, L. An actor-critic approach for control of residential photovoltaic-battery systems. IFAC-PapersOnLine 2021, 54, 222–227. [Google Scholar] [CrossRef]
Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1928–1937. [Google Scholar]
Kyriakides, G.; Margaritis, K.G. Neural architecture search with synchronous advantage actor-critic methods and partial training. In Proceedings of the 10th Hellenic Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2018; pp. 1–7. [Google Scholar]
Zhang, K.; Yang, Z.; Başar, T. Multi-agent reinforcement learning: A selective overview of theories and algorithms. In Handbook of Reinforcement Learning and Control; Springer: Berlin, Germany, 2021; pp. 321–384. [Google Scholar]
Zhou, S.; Zhou, L.; Mao, M.; Xi, X. Transfer learning for photovoltaic power forecasting with long short-term memory neural network. In Proceedings of the 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), Busan, Korea, 19–22 February 2020; pp. 125–132. [Google Scholar]
Cao, X.; Wan, H.; Lin, Y.; Han, S. High-value prioritized experience replay for off-policy reinforcement learning. In Proceedings of the 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), Portland, OR, USA, 4–6 November 2019; pp. 1510–1514. [Google Scholar]
Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. A brief survey of deep reinforcement learning. arXiv 2017, arXiv:1708.05866 2017. [Google Scholar] [CrossRef] [Green Version]
De Abril, I.M.; Kanai, R. Curiosity-driven reinforcement learning with homeostatic regulation. In Proceedings of the 2018 International Joint Conference on Neural Networks, Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–6. [Google Scholar]