Review

Voltage Control for DC Microgrids: A Review and Comparative Evaluation of Deep Reinforcement Learning

1 LUSAC Laboratory, University of Caen Normandy, 50130 Cherbourg-en-Cotentin, France
2 Department of Mechanical and Industrial Engineering, College of Engineering, Sultan Qaboos University, Muscat 123, Oman
3 LRGP, CNRS, University of Lorraine, 54000 Nancy, France
* Author to whom correspondence should be addressed.
Energies 2025, 18(21), 5706; https://doi.org/10.3390/en18215706
Submission received: 24 September 2025 / Revised: 23 October 2025 / Accepted: 27 October 2025 / Published: 30 October 2025

Abstract

Voltage stability in DC microgrids (DC MG) is crucial for ensuring reliable operation and component safety. This paper surveys voltage control techniques for DC MG, classifying them into model-based, model-free, and hybrid approaches. It analyzes their fundamental principles and evaluates their strengths and limitations. In addition to the survey, the study investigates the voltage control problem in a critical scenario involving a DC/DC buck converter with an input LC filter. Two model-free deep reinforcement learning (DRL) control strategies are proposed: twin-delayed deep deterministic policy gradient (TD3) and proximal policy optimization (PPO) agents. Bayesian optimization (BO) is employed to enhance the performance of the agents by tuning their critical hyperparameters. Simulation results demonstrate the effectiveness of the DRL-based approaches: compared to benchmark methods, BO-TD3 achieves the lowest error metrics, reducing root mean square error (RMSE) by up to 5.6%, and mean absolute percentage error (MAPE) by 7.8%. Lastly, the study outlines future research directions for DRL-based voltage control aimed at improving voltage stability in DC MG.

1. Introduction

The 21st century has witnessed a rapid increase in distributed energy resources (DER), driven by advances in power electronics, the widespread adoption of renewable energy sources (RES), and the growing use of energy storage devices (ESD) [1]. RES such as solar photovoltaic (PV), fuel cells, wind, and tidal energy systems are increasingly favored for their sustainability, abundance, and environmental benefits. They are widely used in residential, industrial, and remote applications where access to conventional grid power is limited or economically impractical. This transition supports global efforts to mitigate climate change and ensure clean, sustainable energy for future generations [2], forming the foundation for the evolution from conventional centralized grids to distributed microgrid (MG) architectures.
MG can be defined as an electrically bounded area of a distributed network that integrates local DER, ESD, and loads to form a self-sufficient energy system [3]. MG are generally classified into three types, based on the current in their main bus: AC, DC, and hybrid. AC MG are widely used to reduce dependence on the main grid and supply power to target loads [4]. Despite challenges, such as grid synchronization, power sharing among coupled DER, and frequency control, AC MG remain prevalent due to their grid compatibility and the maturity of their control and protection technologies [5].
Over the past few decades, AC MG have received primary attention due to the dominance of AC networks. However, the growing integration of RES, ESD, and DC loads has shifted interest toward DC MG [6]. In particular, DC MG offers a simpler structure with zero reactive power flow and provides reduced power losses with improved efficiency. Moreover, they exhibit superior compatibility with the increasing number of modern DC-based consumer devices [7].
Despite numerous advantages, voltage instability remains a significant challenge that affects the performance of DC MG [8]. Voltage and power fluctuations, often exacerbated by the intermittent nature of RES and other uncertainties, can degrade power quality and overall efficiency. Prolonged instability may even cause equipment malfunction or system failure. Consequently, voltage stability has remained a critical area of research, with extensive efforts devoted to developing various model-based, model-free, and hybrid control techniques.
In model-based control, a mathematical model of the system is required to design the control algorithms. These methods rely on accurate system parameters and are commonly applied in optimal and predictive control [9], with examples including sliding mode control, predictive control, passivity-based control, and the classical proportional-integral (PI) controller. They offer analytical stability guarantees and can generate control actions with high accuracy [10], but are often sensitive to parameter variations and unmodeled dynamics. In contrast, model-free techniques do not require an explicit system model and are typically experience-based or rely on data-driven approaches [11], such as fuzzy logic control, artificial neural networks, and reinforcement learning. These methods offer flexibility and adaptability, especially in complex systems. However, their reliance on data can be challenging, particularly when system dynamics change rapidly or training data is insufficient [11]. Hybrid control techniques, on the other hand, combine aspects of both model-based and model-free schemes; examples include physics-informed neural networks and metaheuristic-optimized control techniques.
Several studies have reviewed voltage control methods for DC MG applications. For example, ref. [12] discusses various topologies and control strategies, focusing on centralized, decentralized, and hierarchical control structures. Similarly, ref. [13] analyzes the strengths and weaknesses of various control topologies, with particular emphasis on droop control. In [14], the performance of advanced control technologies is evaluated, specifically for their application in bidirectional DC/DC converters to address instability issues caused by constant power loads (CPL). Another survey, ref. [15], explores voltage control techniques that incorporate different levels of communication for standalone MG applications.
This study provides a comprehensive survey of various advanced voltage control techniques, which are categorized into model-based, model-free, and hybrid approaches. In addition to this, the paper proposes model-free DRL-based control strategies for a DC/DC converter with an input LC filter: an off-policy TD3 agent and an on-policy PPO agent. The proposed agents are designed, trained, and tuned using Bayesian optimization (BO) to address instability issues arising in the cascaded system. Furthermore, the proposed approaches are benchmarked against other control strategies through simulations under different operating conditions. Lastly, future research directions for voltage control in DC MG are identified. The main contributions of this work are summarized as follows:
  • It provides a survey of various state-of-the-art model-based, model-free, and hybrid voltage control techniques.
  • It proposes BO-DRL-based solutions for voltage control of a DC/DC buck converter with an input LC filter.
  • It proposes recommendations for future research, particularly those that employ machine learning and data-driven control algorithms.
The organization of the rest of the paper is as follows: Section 2 provides an overview of DC MG control. Detailed reviews of model-based, model-free, and hybrid control techniques are discussed in Section 3, Section 4, and Section 5, respectively. Section 6 presents the design and analysis of a DRL control strategy based on TD3 and PPO, incorporating BO for optimal tuning of the agents’ hyperparameters. Section 7 outlines future research directions, and finally, the conclusion is given in Section 8.

2. Background of the Study

DC MG have gained significant attention for enabling more reliable and sustainable energy systems. In general, their operation can be classified into two modes: grid-connected or standalone (autonomous) [12]. When the DC MG exchanges power with the main grid, it operates in grid-connected mode. On the other hand, when it operates independently, relying on local generation to meet load demand, it is in standalone mode. In this mode, ESD must absorb or release the power difference between generation and load.
Depending on the application, a DC MG incorporates various types of power electronic converters to integrate distributed energy sources, energy storage systems, and loads. The converters are typically connected to a common DC bus, which serves as an interface for achieving high efficiency and straightforward control. Notably, DC MG have seen widespread adoption across industrial, residential, and military sectors, including electric vehicles, data centers, naval ships, submarines, spacecraft, and net-zero energy buildings. A typical architecture of DC MG is shown in Figure 1.

2.1. Cause of Voltage Instability

The pressing operational challenges of DC MG, particularly in maintaining voltage stability, arise from several key factors, including the following:
  • Intermittent generation: The fluctuating nature of RES causes mismatch between generation and demand, leading to voltage instability [16]. For example, PV output depends on weather conditions like temperature and solar irradiance.
  • CPL are nonlinear loads characterized by negative incremental impedance [17]. They maintain constant power consumption even as the DC bus voltage fluctuates. DC loads regulated by DC/DC converters often exhibit CPL behavior, which can destabilize the system. Voltage instability caused by CPL has been extensively studied [18,19]. Figure 2 illustrates this negative incremental impedance behavior.
  • Pulse power loads (PPL) draw large currents over short durations, potentially causing voltage instability due to their high-power characteristics [20]. They are common in onboard MG of electric ships, particularly for systems such as sonar and radar.
  • Faults and aging can alter system dynamics, posing the risk of instability and poor performance. Additional challenges related to the operation, control, and protection of DC MGs are reported in [21].
  • Filters, such as commonly used LC types, improve power quality, but can reduce the damping ratio of DC MG, increasing the risk of instability [22].
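The CPL behavior listed above is easy to verify numerically: for an ideal CPL, V = P/I, so the incremental impedance dV/dI = −P/I² is always negative. A minimal sketch (the values are illustrative, not taken from the cited studies):

```python
def cpl_incremental_impedance(power_w: float, current_a: float) -> float:
    """Incremental impedance dV/dI of an ideal constant power load.

    For a CPL, V = P / I, so dV/dI = -P / I**2, which is always negative.
    """
    return -power_w / current_a**2

# A 1 kW CPL drawing 10 A presents a -10 ohm incremental impedance:
z_inc = cpl_incremental_impedance(1000.0, 10.0)
```

Any positive operating point yields a negative result, which is precisely the destabilizing property shown in Figure 2.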

2.2. DC Microgrid Control

To ensure stable and efficient operation, DC MG commonly employ a hierarchical control structure comprising three levels: primary, secondary, and tertiary [23]. The primary controller acts locally to regulate the DC bus voltage and maintain power sharing at the lowest level. It operates at the fastest timescale, and voltage deviations at this level are corrected by the slower secondary controller [24]. The tertiary controller, operating at the highest and slowest level, optimizes the overall system performance and power flow among units. This study specifically focuses on recent advances in primary control, emphasizing model-based, model-free, and hybrid approaches.

3. Model-Based Techniques

Model-based control, which relies on knowledge of the system model, forms the foundation of control theory for designing effective controllers. In DC MG, model-based voltage control employs both conventional and advanced controllers. Traditional linear techniques, such as PI and the linear quadratic regulator (LQR) [25], have been successfully implemented but often struggle with system uncertainties. A review of advanced control techniques is presented in the following sub-sections.

3.1. Sliding Mode Control

Sliding mode control (SMC) is a nonlinear technique well suited for systems with uncertainties and disturbances [26,27]. It operates by driving system states onto a predefined sliding surface and maintaining them there. In DC MG, SMC is widely used for voltage regulation [28,29]. However, it can suffer from chattering, a phenomenon that, if not properly addressed, may cause excessive wear of power components.
Various sliding surfaces have been proposed to reduce chattering. The authors in [30] introduce an approach incorporating a hysteresis band in the switching function. To improve the voltage stability of a boost converter, ref. [31] proposes a fully decentralized second-order sliding mode (SOSM) control capable of handling unknown loads without a state observer. A distinguishing feature of this method is the inclusion of an auxiliary integral control using an appropriate sliding function.
The investigation in [32] presents a particle swarm optimization (PSO)-tuned SMC. The evolutionary algorithm optimizes controller parameters based on stability conditions. To further reduce chattering and provide an adjustable sliding coefficient, ref. [33] introduces an adaptive SMC technique. In [34], a global fast terminal sliding mode controller with hysteresis modulation (HM-GFTSMC) is proposed to enhance voltage stability in a buck converter. The study in [35] proposes a robust higher-order PID sliding mode controller (PID-HOSMC) using a double power reaching law. The PID-HOSMC stabilizes the bus voltage via battery control signals. To achieve proportional load power sharing and DC bus voltage regulation, ref. [36] proposes a decentralized control approach combining SMC based on a higher-order finite-time observer (HOFTO) with droop control.
Moreover, SMC has been combined with a radial basis function neural network (RBFNN) to enhance performance. RBFNNs are three-layer feedforward neural networks with a single hidden layer that can approximate continuous functions with arbitrary precision [37,38]. In [39], an RBFNN adaptively tunes the switching gain of a sliding mode observer, ensuring asymptotic error convergence in the state of charge (SOC) estimation of a lithium-polymer battery. Similarly, ref. [40] employs an adaptive SMC with RBFNN estimation and backstepping control to improve the performance of a proton exchange membrane fuel cell (PEMFC).
Although SMC has been successfully applied for voltage regulation, its implementation faces several challenges. Increasing system complexity can make designing effective SMC difficult. Moreover, its performance strongly depends on model accuracy, sliding surface selection, and control law formulation. Despite efforts to reduce chattering, it remains a persistent challenge in many applications.
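To make the sliding-surface idea above concrete, the sketch below implements a first-order surface s = de/dt + λe on the voltage error of a generic buck converter, replacing the discontinuous sign() term with a tanh boundary layer, one common chattering-mitigation measure. All gains and the function name are illustrative assumptions, not taken from the cited works.

```python
import math

def smc_duty(v_ref: float, v_out: float, dv_out: float,
             lam: float = 2000.0, phi: float = 0.5) -> float:
    """Duty cycle from a first-order sliding surface s = de/dt + lam*e.

    The reference is assumed constant, so de/dt = -dv_out. A tanh
    boundary layer of width phi replaces sign(s) to soften chattering,
    and the output is clamped to a valid duty cycle in [0, 1].
    """
    e = v_ref - v_out
    s = -dv_out + lam * e
    u = 0.5 + 0.5 * math.tanh(s / phi)
    return min(max(u, 0.0), 1.0)
```

With the output below the reference the duty saturates high; with it above, low; on the surface (s = 0) the control sits at mid-range.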

3.2. Adaptive Droop Control

Unlike conventional droop control with fixed coefficients, adaptive droop control adapts to variations in power, enabling more precise control [41]. Several droop-based methods have been developed to enhance voltage stability and power sharing accuracy. Reference [42] proposes a hierarchical two-stage controller. The first stage involves an adaptive droop control with voltage feedback compensation, while the second provides supervisory control. The study in [43] presents an adaptive droop and voltage shifting technique for balanced current sharing. Additionally, ref. [44] presents a unified control strategy for supercapacitor systems. The strategy rejects disturbances and regulates terminal voltage, demonstrating satisfactory plug-and-play capability using local measurements.
Another study in [45] presents a nonlinear droop control algorithm for a hybrid islanded MG. The method achieves good tracking performance through the integration of SMC. To further enhance voltage stability, ref. [46] introduces a resiliency-driven droop index control strategy, in which the droop coefficient is tuned using PSO.
Overall, adaptive droop control presents a promising solution for improving both voltage stability and power-sharing accuracy, although its performance may degrade in complex and fast-dynamic systems.
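As a minimal illustration of the adaptive-coefficient idea, the sketch below computes a V-I droop setpoint whose coefficient shrinks as loading rises, one simple way the coefficient could adapt to power variations. The scaling law, names, and values are hypothetical, not drawn from the cited papers.

```python
def adaptive_droop_setpoint(v_nom: float, i_out: float, i_rated: float,
                            r_d0: float, dv_max: float) -> float:
    """Voltage setpoint V = v_nom - r_d * i_out with an adaptive r_d.

    r_d starts from r_d0 (capped so the sag at rated current stays
    within dv_max) and is reduced by up to half at full load, tightening
    regulation when the bus is heavily loaded (hypothetical law).
    """
    loading = min(abs(i_out) / i_rated, 1.0)
    r_d = min(r_d0, dv_max / i_rated) * (1.0 - 0.5 * loading)
    return v_nom - r_d * i_out
```

At no load the setpoint equals the nominal bus voltage; at rated current the sag remains inside the dv_max design band.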

3.3. Model Predictive Control

Model Predictive Control (MPC) uses a system model to predict future behavior and optimize control actions over a finite time horizon [47]. At each sampling instant, an optimization problem is evaluated using a defined cost function while explicitly considering constraints [48].
MPC approaches are generally categorized as continuous control set (CCS-MPC) and finite control set (FCS-MPC) [49]. In CCS-MPC, converter switching is governed by PWM at fixed frequency, whereas FCS-MPC operates with a variable switching frequency [50]. It is worth noting that FCS-MPC offers simple implementation and high dynamic response [51]. However, its online computational burden increases with prediction horizon length. A comprehensive analysis of MPC is discussed in [52].
Recent studies have introduced different schemes of MPC-based control strategies. In [53], an MPC-based scheme for hybrid energy storage systems (HESS) is developed to enhance MG resilience under diverse disturbances. The authors adopt FCS-MPC to control a superconducting magnetic energy storage unit and a backup battery. Reference [54] proposes a fast-distributed MPC method that improves steady-state performance using the alternating-direction method of multipliers. To strengthen dynamic stability, ref. [55] investigates a hybrid MPC capable of operating in both continuous and discontinuous current modes. The simulation results showed improvements in voltage stability and faster response compared to CCS. Moreover, ref. [56] introduces an FCS-MPC algorithm integrated with a Kalman observer. In this framework, the FCS-MPC tracks the reference current, while the Kalman observer provides feedforward compensation.
Despite its advantages, MPC performance remains dependent on model accuracy, and its computational requirements increase with prediction horizon, necessitating a powerful processor.
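The FCS-MPC principle described above can be sketched in a few lines: enumerate the finite set of switch states, predict the converter state one sample ahead, and apply the state with the lowest cost. The buck-converter model, parameter values, and cost function below are illustrative assumptions, not the schemes of the cited references.

```python
def fcs_mpc_step(i_l: float, v_c: float, v_in: float, v_ref: float,
                 L: float = 1e-3, C: float = 1e-4,
                 R: float = 10.0, Ts: float = 1e-5) -> int:
    """One-step FCS-MPC for an ideal buck converter with resistive load.

    Tries both switch states u in {0, 1}, predicts (i_l, v_c) one sample
    ahead with a semi-implicit Euler step, and returns the state that
    minimizes the squared voltage-tracking error.
    """
    best_u, best_cost = 0, float("inf")
    for u in (0, 1):
        i_next = i_l + Ts / L * (u * v_in - v_c)          # inductor dynamics
        v_next = v_c + Ts / C * (i_next - v_c / R)        # capacitor dynamics
        cost = (v_ref - v_next) ** 2
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u
```

Because the candidate set is finite, no PWM modulator is needed, which is why FCS-MPC runs at a variable switching frequency; the loop length grows with both the number of switch states and the prediction horizon.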

3.4. Passivity-Based Control

Passivity-based control (PBC) ensures system stability by modeling the system as an energy-conserving (passive) structure [57]. It relies on the concept of passivity to shape the energy of the system and stabilize it [58]. In DC MG, PBC is employed to regulate bus voltage, manage power flows, and maintain stability under varying operating conditions. This approach is particularly effective for controlling DC/DC converters with ESDs, where energy management is critical. The main advantage of PBC lies in its robustness and inherent stability; however, its performance can be limited when dealing with unmodeled disturbances.
Several studies have proposed PBC-based control strategies. For example, ref. [10] presents a passivity-based approach using a Krasovskii-type storage function. In [59], a passivity-based design is introduced for a non-affine, nonlinear system. The study proposes a low-gain controller using a dynamical model described by a non-affine system with three inputs, three outputs, and six states. To improve voltage stability in an isolated DC MG with hydrogen energy storage, ref. [60] proposes an interconnection and damping assignment passivity-based controller (IDA-PBC). A sliding mode reference conditioning (SMRC) loop is incorporated to prevent overvoltage. Additionally, ref. [61] proposes an adaptive IDA-PBC for a buck-boost converter supplying unknown CPLs. The authors incorporated an online parameter estimator to track load power, which is challenging to measure in practical applications.

3.5. Active Disturbance Rejection Control

Active Disturbance Rejection Control (ADRC) is a model-assisted technique that estimates and compensates for internal and external disturbances in real time [62]. It typically employs an extended state observer to reject disturbances. This makes it effective in situations where the system model is not completely known [63]. Its primary advantage is its ability to handle uncertainties and disturbances. However, the performance depends on the accuracy of the model and the observer.
In recent years, ADRC has been widely applied across various fields. Reference [64] implements ADRC for a flywheel energy storage system. The results showed superior disturbance rejection and dynamic performance compared with PI. Reference [65] proposes a time-scale voltage control technique based on ADRC. This approach offers improved robustness against system uncertainties and external disturbances. By developing a reduced-order model, the authors simplified the controller design and achieved better performance.
The study in [66] evaluates the performance of various ADRC techniques (linear, nonlinear, higher order, and generalized predictive integral). The study also proposes a modified model-assisted ADRC that uses pole information. Results show that incorporating system dynamics (pole information) as a known disturbance enhances stability compared with other model-assisted techniques.
In summary, ADRC offers distinct advantages in handling model uncertainties and suppressing disturbances and can be combined with other control methods to enhance stability. Its design, however, assumes knowledge of the system’s state-space model and order, and its tuning complexity can make implementation challenging, particularly for higher-order systems.
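The extended state observer at the heart of ADRC can be sketched for a first-order plant dy/dt = f + b0·u, where z1 estimates the output and z2 estimates the lumped disturbance f. The gains below place both observer poles at −ω_o (a standard linear-ESO parameterization); the names and values are illustrative, not taken from the cited references.

```python
def leso_step(z1: float, z2: float, y: float, u: float,
              b0: float, omega_o: float, Ts: float) -> tuple[float, float]:
    """One forward-Euler step of a second-order linear ESO.

    Plant model: dy/dt = f + b0*u. z1 tracks the measured output y,
    z2 tracks the lumped (internal + external) disturbance f. The
    correction gains 2*omega_o and omega_o**2 place both observer
    poles at -omega_o.
    """
    e = y - z1
    z1 = z1 + Ts * (z2 + b0 * u + 2.0 * omega_o * e)
    z2 = z2 + Ts * (omega_o ** 2) * e
    return z1, z2

# The ADRC law then cancels the estimate: u = (kp * (r - z1) - z2) / b0
```

Fed the output of a plant with a constant unknown disturbance, z2 converges to that disturbance, which the control law can then cancel in real time.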

3.6. H-Infinity Control

H-infinity (H∞) control is used to design controllers that handle external disturbances and model uncertainties by minimizing the worst-case system gain, ensuring stability under adverse conditions. The control problem is formulated as a mathematical optimization task, and the controller is synthesized through an iterative approach [67].
Reference [68] presents an H∞ control technique for DC MG stabilization. A robust H∞-based approach is proposed in [69] to improve voltage regulation under uncertainties. The authors use a Lebesgue-measurable matrix to determine the upper and lower bounds of the uncertainties. In [70], a decentralized H∞ loop-shaping control design is proposed for a small, isolated DC MG. The parametric uncertainties are modeled using an upper linear fractional transformation. This technique shows good dynamic performance; however, it has not been benchmarked against other robust controllers. A summary of model-based voltage control strategies is provided in Table A1 in Appendix A.

4. Model-Free Techniques

While model-based control strategies can achieve high performance when accurate system dynamics are available, their effectiveness is often limited by modeling complexity, parameter uncertainties, and unmodeled nonlinearities. In many practical DC MG applications, deriving an exact model is frequently difficult or impractical. These challenges have motivated the development of model-free techniques, which regulate system behavior without relying on explicit models. This section reviews representative model-free methods, with a focus on their operating principles, practical applications, and inherent challenges.

4.1. Fuzzy Logic Control

Fuzzy logic control (FLC) is a rule-based mathematical concept that extends Boolean logic to a multi-valued framework [71]. It is widely applied in DC MG due to its simple design and strong robustness [72]. An FLC typically comprises three main blocks: fuzzification, a rule-base lookup table, and defuzzification [73]. The two most commonly used inference engines are Mamdani and Takagi-Sugeno. The key difference between these models lies in the consequent parts of their rule bases and in their aggregation and defuzzification procedures [74].
Several studies have applied FLC for voltage regulation. Reference [75] proposes an FLC that determines the reference current of an ultracapacitor converter. The FLC is employed in the outer voltage control loop, and a K-type regulator is used in the inner current control loop. To improve voltage stability in a DC/DC converter feeding CPLs, ref. [76] presents an intelligence-based Type-II FLC. The proposed method provides a powerful tool for handling systems with uncertainties. To stabilize the voltage in a full-electric ferry boat, ref. [77] proposes an intelligent single-input interval Type-II fuzzy PI controller (iSIT2-FPI). The proposed technique combines SMC with a model-free iSIT2-FPI to compensate for destructive impedance instability.
The study in [78] proposes a voltage stabilization technique using a Fuzzy-PI dual-mode controller. The PI ensures good steady-state response, and the FLC improves transient response. In [79], the stability of an isolated MG is enhanced by tuning fuzzy membership functions using PSO and Cuckoo search algorithms. Simulation results have shown that energy management using PSO outperformed the Cuckoo search algorithm. Another study in [80] applies an adaptive neuro-fuzzy inference system (ANFIS) to mitigate input voltage and load current perturbations. It is worth noting that ANFIS combines the learning ability of artificial neural networks with the structured intuition of FLC. The conclusion drawn from this survey is that FLC performance is sensitive to membership function selection, input/output tuning, and fuzzy rule-base design, which often lacks rigorous mathematical rules.
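In its zero-order form, the Takagi-Sugeno inference mentioned above reduces to a weighted average of constant rule outputs. The toy controller below maps a voltage error to a duty-cycle correction through three triangular membership functions; the rule base, ranges, and outputs are illustrative assumptions, not those of the cited studies.

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_duty_correction(error: float) -> float:
    """Zero-order Takagi-Sugeno sketch with three rules on the voltage error.

    Each rule pairs a membership degree (fuzzification) with a constant
    consequent; the crisp output is the weighted average (defuzzification).
    """
    rules = [                                     # (degree, rule output)
        (tri(error, -10.0, -5.0, 0.0), -0.1),     # error negative -> lower duty
        (tri(error,  -5.0,  0.0, 5.0),  0.0),     # error near zero -> hold
        (tri(error,   0.0,  5.0, 10.0), 0.1),     # error positive -> raise duty
    ]
    num = sum(w * out for w, out in rules)
    den = sum(w for w, _ in rules)
    return num / den if den > 0 else 0.0
```

Intermediate errors activate two overlapping rules at once, so the output blends smoothly between the rule consequents rather than switching abruptly.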

4.2. Data-Driven Control

Model-free data-driven control (DDC) relies entirely on input/output (I/O) data for training and tuning, eliminating the need for explicit system models. This section surveys recent studies on the application of DDC for voltage regulation in DC MG.

4.2.1. Artificial Neural Network

An artificial neural network (ANN) uses data to learn complex relationships between system parameters and voltage fluctuations, enabling adaptation to changing conditions and improved stability [81]. Neural network-based control is widely adopted due to its ability to learn control strategies without an explicit system model [82]. However, its performance depends on training data and proper selection of network structures and activation functions.
Neural network architectures use distinctive principles to determine rules for diverse applications. Common types include feedforward, convolutional, radial basis function, recurrent, generative adversarial, and hybrid neural networks [83]. Backpropagation is a widely used training algorithm that optimizes ANN parameters by minimizing the error between the desired and actual outputs [84].
The research community has proposed numerous ANN-based control strategies. For example, ref. [85] proposes a neural network predictive controller (NNPC) for voltage control in a buck converter. The proposed method combines the advantages of MPC with the high estimation power of ANN. A similar approach was employed in [86] to enhance boost converter stability, where MPC is applied for data generation, and a feedforward network is subsequently trained to generate the converter’s duty cycle. Moreover, ref. [87] integrates the cuckoo search (CS) algorithm with an ANN to regulate voltage in a system with a composite energy storage system (CESS).
To regulate voltage in a low-voltage DC MG, ref. [88] proposes a hybrid Bat search and ANN (HBSANN) approach. Similarly, ref. [89] introduces a supervised deep-learning controller that eliminates the need for sensor data. Another study in [90] presents an ANN-based control approach using approximate dynamic programming. The ANN is trained offline to minimize a cost function over a long-time horizon.
Based on the analysis of ANN-based voltage control, several conclusions can be drawn. The application of ANN improves stability in DC MG even with inaccurate model parameters. However, performance depends on the quantity and quality of the training data. Incomplete or inaccurate training data can lead to biased or unpredictable behavior. Furthermore, as system complexity increases, ANN become more susceptible to overfitting, reducing their ability to adapt to real-world disturbances and leading to inaccurate predictions and suboptimal performance.

4.2.2. Local Model Networks

Local model networks (LMNs) are a neural network architecture that combines the advantages of both local and global modeling approaches. They are particularly effective for handling complex, high-dimensional data, which can be challenging for conventional neural networks [91]. The LMN concept involves creating local models (LM) that collectively cover the entire operating range of the controlled process. These LM are then integrated using validity or weighting functions to form a non-linear global model of the plant [92]. Each LM within LMN is responsible for capturing the system dynamics within a specific operating regime [93].
Based on this concept, the study in [94] presents two methods for designing LM structures: the linearization-based approach and the LM-based approach. In the linearization-based method, the LM network is linearized around an initial operating point, and a linear controller is designed accordingly. In contrast, the LM-based approach considers a separate controller for each LM, with the overall control output determined by interpolating the responses of the local controllers according to the current operating point.
To improve the performance of conventional NN, ref. [91] proposes a data-driven strategy based on a LMN for the identification and voltage control of DC/DC converters. The LMN consists of local linear models (LLM), each responsible for modeling the dynamics of the converter within a specific operating regime, as determined by its validity function. The structure of the LMN is determined directly from the input-output measurements using a hierarchical binary tree learning algorithm. The identified LMN, employed in the local linear controller (LLC), is designed based on inverse error dynamics control. The results show that the identified model captures the actual dynamic behavior of the converters.

4.2.3. Model-Free Adaptive Control

Model-free adaptive control (MFAC) is a data-driven control approach for a class of unknown non-affine, nonlinear, discrete-time systems [95]. This approach relies only on system I/O data, eliminating the need for explicit physical models and traditional stability theory. The controller establishes a dynamic linearization data model of the controlled system at each operating point using a pseudo partial derivative, pseudo gradient, or pseudo-Jacobian matrix [96]. It is one of the most promising DDC approaches, owing to its computational efficiency and ease of operation [97]. The application of MFAC has been explored in various areas, including power distribution systems, automobiles, robotics, and MG. Notably, MFAC demonstrates strong potential for maintaining voltage stability in systems where the exact model is unavailable.
Recent studies show that MFAC provides strong control performance with high robustness and effective disturbance rejection. For example, ref. [98] presents an MFAC scheme for active stabilization in DC MG. The control approach based on compact form dynamic linearization shows a satisfactory result. To realize a robust control, ref. [99] presents an intelligent model-free SMC for regulating charging/discharging of ESD. Similarly, ref. [96] proposes a control approach for an ultracapacitor-based three-phase interleaved bidirectional DC/DC converter. The results demonstrate fast transient response and enhanced steady-state performance.
In [100], a model-free composite disturbance rejection control (MFC) is developed to regulate the voltage of a dynamic wireless charging system. A switching-gain-based MFC law is introduced to generate the desired duty cycle without requiring prior model knowledge. An online parameter identification law is then developed to recover the unknown control input gain. To improve the voltage stability, ref. [101] proposes a data-driven model-free disturbance rejection control for a DC/DC converter. In this study, the authors considered a data-driven adaptive extended observer (DAESO) for voltage regulation in the presence of disturbances. DAESO is designed by incorporating a data memory stack and a parameter learning law to estimate the unknown control input gain and the lumped disturbance.
The literature shows that the error convergence and internal stability of systems controlled by MFAC approaches are theoretically guaranteed through contraction-mapping analysis. Design assumptions, such as bounded system states, are typically imposed; however, these assumptions may introduce bias into the learning process. Moreover, performance can be influenced by the quality of the data used for learning.
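To make the dynamic-linearization idea concrete, the sketch below implements a compact-form MFAC loop on a hypothetical scalar plant. The controller sees only input/output data; the plant model, gains, and reset threshold are illustrative assumptions, not values from the cited studies.

```python
import numpy as np

def mfac_cfdl(y_ref, y0=0.0, steps=200, eta=0.5, mu=1.0, rho=0.6, lam=1.0):
    """Compact-form dynamic-linearization MFAC on an unknown scalar plant."""
    def plant(y, u):
        # Hypothetical nonaffine discrete-time plant (illustration only).
        return 0.6 * y + 0.5 * np.tanh(u) + 0.2 * u

    y, u, u_prev, y_prev = y0, 0.0, 0.0, y0
    phi = 1.0                                # pseudo-partial-derivative estimate
    outputs = []
    for _ in range(steps):
        du = u - u_prev
        # Projection-type estimation law: update the PPD from I/O increments.
        phi += eta * du / (mu + du**2) * ((y - y_prev) - phi * du)
        phi = 1.0 if abs(phi) < 1e-5 else phi    # reset keeps the estimate alive
        # Control law driven only by the tracking error, no plant model used.
        u_next = u + rho * phi / (lam + phi**2) * (y_ref - y)
        u_prev, u = u, u_next
        y_prev, y = y, plant(y, u)
        outputs.append(y)
    return np.array(outputs)

traj = mfac_cfdl(y_ref=1.0)
```

Despite never seeing the plant equations, the controller drives the output to the reference using only the estimated pseudo partial derivative.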

4.2.4. Deep Reinforcement Learning

DRL is an emerging machine learning technique in which an agent learns to perform actions through trial-and-error interaction with a controlled system [24]. In the context of DRL, the controlled system is referred to as an environment. DRL finds applications in control and optimization in DC MG and many other fields. The study in [102] identified three major factors driving the growing adoption of DRL: data, computing resources, and learning algorithms. A schematic representation of a typical DRL agent is shown in Figure 3.
Unlike other classes of machine learning, DRL involves sequential decision-making [103], where an agent learns to navigate a potentially stochastic environment, balancing exploration and exploitation. This balancing is crucial, particularly in a system where the agent must decide between exploiting known rewards and exploring new possibilities to maximize long-term reward. A comprehensive review of the applications of DRL is discussed in [92].
DRL models are classified into two main types: model-based and model-free DRL. In this context, the model refers to the agent’s perception of its environment. Model-based DRL explicitly requires a model of system dynamics to find a near-optimal policy [104]. Here, the agent has a clear understanding of the environment in which it operates. The environment can be either partially observable or fully observable. In contrast, a model-free agent has no prior knowledge of the environment it interacts with. The agent learning process mainly relies upon experience.
The Markov decision process (MDP) provides a mathematical framework that describes environment-agent interaction at each sequence of time steps [105]. MDP is used for modeling decisions in uncertain conditions using a trajectory shown in Figure 4 [106].
After formulating the voltage control problem as an MDP, it can be trained and solved using various model-free DRL algorithms, either on-policy or off-policy. On-policy agents improve the same policy they use to make decisions, whereas off-policy agents improve a policy that can differ from the one used to make decisions. The basic classification of model-free DRL algorithms is shown in Table 1.
DRL agents are optimized using value-based, policy-based, or actor-critic methods, the last combining both policy and value functions. In a value-based approach, a value function is initially selected for each state, and a new value function is then estimated from the learning data; this process is iterated until the optimal value function is reached. The policy-based method, by contrast, directly optimizes the policy function, mapping states to actions without explicitly estimating a value function [107].
Recent studies have investigated the effectiveness of DRL for voltage control in DC MG. Reference [108] proposes an approach combining a PID controller with a DRL algorithm. The PID serves as the base controller, while a DDPG agent is employed as an adaptive compensator to improve the system’s dynamic response. In [109], the application of DQN for voltage control of a buck converter feeding CPLs is investigated. To overcome the limited exploration capability of DQN, ref. [110] adopts an improved version based on DDPG. The study in [111] presents a nonlinear control scheme for voltage stabilization of a boost converter. The authors use a regression-based optimization algorithm to improve system efficiency. To overcome the overestimation of value functions that often occurs in DDPG, ref. [112] proposes a TD3 algorithm that uses two separate critic networks and delayed policy updates. Similarly, ref. [113] presents a TD3-based algorithm for a buck converter feeding a CPL. The distinctive feature of this study is its framework, which is designed for direct control of the power switches. Moreover, the study in [114] modified and trained a TD3 agent to improve the dynamic stability of a boost converter.
In [115], a DDPG-PI based on a sliding mode observer is proposed to address the effects of unknown and non-ideal CPLs. The proposed scheme eliminates dependence on an accurate model, as the agent learns to accommodate unmodeled dynamics. In [116], an adaptive disturbance rejection controller employing DDPG and Internet of Things (IoT) is proposed for smart grid systems. The DDPG agent offers online tuning through neural network learning. To address the challenges of offline training, ref. [117] presents an integral DRL-based voltage control approach for an interleaved DC/DC boost converter that autonomously approximates the controller gains online.
Overall, DRL-based control algorithms demonstrate strong adaptability in a completely unknown environment; however, they require significant computational effort for agent training. Additionally, sparse rewards and excessive exploration may lead to unstable or oscillatory control actions. A summary of the reviewed model-free strategies is provided in Table A2 in Appendix A.

5. Hybrid Control Techniques

Hybrid control techniques combine aspects of both model-based and model-free control schemes. This section examines the recent advances in hybrid control techniques for voltage control in DC MG.

5.1. Metaheuristic Optimization Algorithms

Metaheuristic optimization algorithms (MOA) are nature-inspired algorithms designed to find optimal solutions to complex optimization problems. The main advantages of MOAs are their versatility and flexibility [118], as they can be easily modified to meet the specific requirements of a given problem. These algorithms are widely used to optimize the parameters of both classical and advanced control techniques. For example, ref. [119] proposes an intelligent binary and real-code genetic algorithm tuned PID controller for regulating fuel cell voltage. The binary genetic algorithm (GA) is a well-established optimization method for identifying near-optimal solutions. It consists of binary-encoded chromosomes that undergo roulette wheel selection, multipoint crossover, and uniform mutation to produce offspring [119]. Each chromosome in the offspring population is evaluated using a defined objective function.
To improve stability in a buck-boost converter, ref. [120] proposes a PSO-tuned PI controller. The PSO algorithm, inspired by the social behavior of bird flocks searching for food, iteratively adjusts the positions of particles (potential solutions) based on their individual best-known positions and the global best-known position of the swarm. The study in [121] introduces a quasi-oppositional Archimedes optimization algorithm (QOAOA) for tuning a fractional-order PID (FOPID) controller to regulate the output voltage of a cascaded DC/DC boost converter. To enhance voltage stability in PV-powered MGs, ref. [122] combines direct SMC with a hybrid salp swarm algorithm and particle swarm optimization (SSA-PSO). Moreover, ref. [123] proposes a control strategy for a buck converter by optimizing PID parameters using an improved sine-cosine algorithm.
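As an illustration of how such algorithms tune controller gains, the sketch below applies a minimal PSO to the $(k_p, k_i)$ gains of a PI loop on a toy first-order plant. The plant, bounds, and swarm settings are illustrative assumptions rather than the setups of the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)

def ise(gains, steps=200, dt=0.01):
    """Integral-squared-error of a PI loop on a toy first-order plant.

    The plant dy/dt = (-y + u)/0.05 is a stand-in surrogate, not the
    converter model of the referenced papers.
    """
    kp, ki = gains
    y, integ, cost = 0.0, 0.0, 0.0
    for _ in range(steps):
        e = 1.0 - y                      # unit-step reference
        integ += e * dt
        u = kp * e + ki * integ
        y += dt * (-y + u) / 0.05        # forward-Euler step of the plant
        cost += e * e * dt
    return cost

# Minimal PSO: particle positions are (kp, ki) pairs inside fixed bounds.
n, iters, w, c1, c2 = 15, 40, 0.7, 1.5, 1.5
lo, hi = np.array([0.0, 0.0]), np.array([10.0, 50.0])
x = rng.uniform(lo, hi, size=(n, 2))
v = np.zeros_like(x)
pbest, pcost = x.copy(), np.array([ise(p) for p in x])
gbest = pbest[pcost.argmin()].copy()

for _ in range(iters):
    r1, r2 = rng.random((n, 2)), rng.random((n, 2))
    # Velocity update: inertia + cognitive pull + social pull.
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x = np.clip(x + v, lo, hi)
    cost = np.array([ise(p) for p in x])
    better = cost < pcost
    pbest[better], pcost[better] = x[better], cost[better]
    gbest = pbest[pcost.argmin()].copy()

kp_opt, ki_opt = gbest
```

The swarm converges toward gains that minimize the step-response error; the same pattern applies to GA, SSA, or sine-cosine variants by swapping the position-update rule.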
Overall, MOAs offer a powerful framework for controller parameter optimization. They have demonstrated significant improvements in performance across both linear and nonlinear control applications. Nonetheless, their effectiveness may be constrained by sensitivity to local minima and the need for careful parameter tuning.

5.2. Physics-Informed Neural Networks

Physics-Informed Neural Networks (PINN) have recently gained attention as a promising approach for accelerating computations in nonlinear dynamical systems. PINNs integrate physics-informed laws (model-based) into neural network methodologies (model-free) to learn complex physical behaviors [124]. This approach ensures that the learned solutions adhere to the underlying physics of the system. Unlike the conventional neural network paradigm, PINNs use scientific knowledge or physical laws to guide the optimization, design, and implementation of deep neural networks [125]. The four core paradigms of PINN, as outlined in [126], are physics-informed loss functions, physics-informed initialization, physics-informed design of architecture, and hybrid physics-deep learning models.
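The physics-informed loss paradigm can be illustrated with a short numpy sketch: the total loss combines a data-misfit term with the residual of a governing ODE. Here a hypothetical RC discharge serves as the physics, and a finite difference stands in for the automatic differentiation a real PINN framework would use.

```python
import numpy as np

def physics_informed_loss(t, v_pred, v_data, rc=0.05, w_phys=1.0):
    """Composite PINN-style loss for an RC discharge, dv/dt = -v/(RC).

    data_loss penalizes mismatch with measurements; phys_loss penalizes
    violation of the governing ODE, evaluated by finite differences.
    """
    data_loss = np.mean((v_pred - v_data) ** 2)
    dvdt = np.gradient(v_pred, t)            # surrogate for autodiff
    residual = dvdt + v_pred / rc            # zero when the physics holds
    phys_loss = np.mean(residual ** 2)
    return data_loss + w_phys * phys_loss

t = np.linspace(0.0, 0.2, 201)
v_true = 12.0 * np.exp(-t / 0.05)            # exact solution, V0 = 12 V
v_wrong = 12.0 * (1.0 - t / 0.2)             # linear guess, violates the ODE

loss_true = physics_informed_loss(t, v_true, v_true)
loss_wrong = physics_informed_loss(t, v_wrong, v_true)
```

A candidate that fits the data but violates the physics is heavily penalized, which is precisely the regularizing effect that lets PINNs generalize from sparse data.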
In recent years, the adoption of PINN in power systems has expanded to applications such as state/parameter estimation, optimal power flow, and dynamics analysis. Reference [127] presents a PINN approach for estimating the state of health (SOH) of lithium-ion batteries. The authors model the attributes influencing battery degradation through empirical degradation and state-space equations and use neural networks to capture system dynamics. The proposed network comprises two components: a solution NN F(.) that builds feature-to-SOH mapping and a NN G(.) that models battery degradation dynamics.
To enhance adaptability and minimize reliance on extensive high-quality datasets and high computational power, ref. [128] proposes a PINN-based control framework for improving the stability of a buck converter. The framework employs a parallel architecture integrating data-driven, physics, control, and loss function modules. The proposed method performs well in identifying and rejecting unknown load disturbances and internal circuit parameter variations. Notably, PINNs can achieve high accuracy and strong generalization compared to traditional neural networks due to their integration of physical laws. However, they often require high-quality supervised data for training and validation, which can limit their applications. Table A3 in Appendix A summarizes recent research on hybrid voltage control techniques.
The voltage control techniques reviewed in Section 3, Section 4 and Section 5 exhibit distinct characteristics in terms of modeling requirements and performance. A summarized comparison is provided in Table 2. It is important to note that hybrid methods are not included in this comparative summary to maintain a clear distinction between model-based and model-free control techniques. These methods typically combine model knowledge with optimization-based (e.g., MOA) or data-driven (e.g., PINN) adaptation and often function as tuning or augmentation layers rather than standalone controllers. As a result, their performance largely depends on the main controller (model-based or model-free) and may inherit the strengths and limitations of the underlying control strategy.

6. A Case Study of DC/DC Converter with LC Filter

The preceding sections critically reviewed voltage control strategies, highlighting their respective strengths and limitations. While this survey provides a comprehensive view of the state of the art, it is equally important to evaluate the effectiveness of these strategies in a representative and challenging scenario. To this end, this study presents a case study of a DC/DC buck converter cascaded with an input LC filter. This configuration is particularly relevant, as it captures one of the critical instability issues in DC MG: the adverse interaction between LC filters and converters, which substantially reduces system damping and compromises voltage stability.
In practical DC MG, as shown in Figure 1, an input LC filter is commonly included at the DC/DC converter stage to suppress harmonics injected into the DC bus and to mitigate instantaneous impulse voltages and electromagnetic interference [129]. However, its inclusion can lower the damping ratio of the system and exacerbate instability risks through resonance with the converter dynamics [22]. When this effect is combined with the negative incremental impedance of CPL, the resulting system stability is severely degraded.
To address such challenges, researchers have proposed a wide range of strategies, including linear feedback control [130], passivity-based control [131], and feed-forward virtual impedance combined with proportional–resonant control [129]. More recently, however, attention has shifted toward model-free approaches, whose flexibility and adaptability to disturbances and parameter variations make them attractive for DC MGs without the need for continuous model updates [95].
The present case study is designed with two objectives: (i) to benchmark conventional and advanced controllers within a unified framework, and (ii) to demonstrate the potential of DRL strategies enhanced through BO for achieving robust voltage regulation under uncertain and nonlinear operating conditions.
It should be noted that hybrid approaches are not included in this comparative study. This decision is motivated by two considerations: first, hybrid methods such as MOA and PINN generally serve as tuning or augmentation layers for existing controllers rather than as standalone real-time control schemes; second, incorporating them would require extensive data or embedding additional optimization modules in the control loop, which is beyond the scope of this benchmark investigation. Instead, this work focuses on a fair comparison between widely adopted conventional controllers and the proposed DRL-based strategies, while recognizing hybrid approaches as a promising direction for future research.

6.1. System Model

To formalize this benchmark scenario, Figure 5 illustrates the studied system. It consists of a buck converter feeding a CPL through an input LC filter connected to a common DC bus. The DC bus voltage and the converter’s output voltage are denoted $E$ and $v_c$, respectively. Similarly, $L_f$, $L$, $C_f$, and $C$ represent the inductances and capacitances of the LC filter and the converter. Additionally, $r_f$ and $r_L$ are the series resistances of $L_f$ and $L$, respectively, while $S$ denotes a controllable switch. These parameters are selected such that the converter operates in continuous conduction mode.
The system’s model is represented by the following differential equations:
$$L_f \frac{di_f(t)}{dt} = E - r_f\, i_f(t) - v_f(t) \tag{1}$$
$$C_f \frac{dv_f(t)}{dt} = i_f(t) - u(t)\, i_L(t) \tag{2}$$
$$L \frac{di_L(t)}{dt} = u(t)\, v_f(t) - r_L\, i_L(t) - v_c(t) \tag{3}$$
$$C \frac{dv_c(t)}{dt} = i_L(t) - i_{CPL}(t) \tag{4}$$
The control signal $u(t)$ takes the value 1 when the switch is ON and 0 when it is OFF.
The complete model is a four-state model with state vector $x(t) = [\,i_f,\ v_f,\ i_L,\ v_c\,]^T$. The state-space averaged model is given by
$$\dot{x}(t) = A x(t) + B u(t), \qquad y(t) = C x(t) \tag{5}$$
where
$$A = \begin{bmatrix} -\frac{r_f}{L_f} & -\frac{1}{L_f} & 0 & 0 \\ \frac{1}{C_f} & 0 & -\frac{S}{C_f} & 0 \\ 0 & \frac{S}{L} & -\frac{r_L}{L} & -\frac{1}{L} \\ 0 & 0 & \frac{1}{C} & -\frac{1}{RC} \end{bmatrix}, \qquad B = \begin{bmatrix} \frac{1}{L_f} \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad C = \begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}$$
and $R$ denotes the equivalent load resistance.
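For intuition, the averaged model above can be integrated numerically. The sketch below uses forward Euler with a fixed duty cycle and a resistive load; all component values are illustrative assumptions, not the parameters used later in the case study.

```python
import numpy as np

# Illustrative parameters (not the paper's values).
E, Lf, Cf, L, C = 100.0, 1e-3, 100e-6, 1e-3, 470e-6
rf, rL, R = 0.1, 0.1, 10.0

def simulate(duty, t_end=0.05, dt=1e-6):
    """Forward-Euler integration of the averaged model with S = duty."""
    x = np.zeros(4)                              # [i_f, v_f, i_L, v_c]
    for _ in range(int(t_end / dt)):
        i_f, v_f, i_L, v_c = x
        dx = np.array([
            (E - rf * i_f - v_f) / Lf,           # filter inductor
            (i_f - duty * i_L) / Cf,             # filter capacitor
            (duty * v_f - rL * i_L - v_c) / L,   # converter inductor
            (i_L - v_c / R) / C,                 # output cap., resistive load
        ])
        x = x + dt * dx
    return x

x_final = simulate(duty=0.5)
v_c_final = x_final[3]
```

With $E = 100$ V and a duty cycle of 0.5, the output settles near $E \cdot d \approx 50$ V minus the small drop across the series resistances, and the lightly damped LC transients visible during the run illustrate the resonance problem motivating this case study.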

6.2. DRL Algorithms

Since the objective is to ensure that the output voltage remains fully regulated even under supply and load perturbations, this study proposes a model-free control strategy based on DRL and BO. To provide a balanced evaluation, the article adopts two representative DRL agents: PPO, an on-policy algorithm known for its stable policy updates and robustness across a range of control problems, and TD3, an off-policy algorithm that addresses overestimation bias often found in DDPG.

6.2.1. PPO

PPO is an on-policy DRL algorithm that uses a clipped surrogate function to prevent large policy updates and ensure stable learning [132]. It directly optimizes the policy through interactions with the environment, aiming to maximize the expected cumulative long-term reward. By constraining policy updates from deviating significantly from the current policy, PPO establishes a robust framework for optimizing control policies in a complex dynamic environment [133]. It can be applied in an environment with either continuous or discrete control actions [134]. To estimate the policy and value function, PPO is typically implemented using actor-critic function approximators [135]. The actor estimates the conditional probability of taking an action based on the current state, while the critic evaluates the chosen action and returns the corresponding expectation of the discounted long-term reward.
Policy optimization algorithms maximize the expected return $G(\theta)$ of a policy $\pi_\theta(a \mid s)$, parametrized by $\theta$, over MDP trajectories:
$$G(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{\infty} \gamma^t r_t \right] \tag{6}$$
Applying the policy gradient theorem, the gradient of (6) is given by:
$$\nabla_\theta G(\theta) = \mathbb{E}_{s_t, a_t \sim \pi_\theta}\left[ \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, A_t \right] \tag{7}$$
where $A_t$ is the advantage function, defined as the discounted sum of temporal-difference errors.
In general, PPO optimizes a clipped surrogate objective function using stochastic gradient descent [135]:
$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\hat{A}_t,\ \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\hat{A}_t \right) \right] \tag{8}$$
where $r_t(\theta)$ is the probability ratio between the new policy $\pi_\theta$ and the old policy $\pi_{\theta_{\text{old}}}$, $\operatorname{clip}(\cdot)$ restricts $r_t(\theta)$ to the range $[1-\epsilon,\ 1+\epsilon]$, and $\epsilon$ is a hyperparameter controlling the allowed policy change.
To encourage exploration and prevent premature convergence of the learned policy, PPO often adds an entropy loss term:
$$L^{ENTROPY}(\theta) = \mathbb{E}\left[ H\big(\pi_\theta(\cdot \mid s_t)\big) \right] \tag{9}$$
Therefore, the final loss is given in (10):
$$L_t(\theta) = \mathbb{E}_t\left[ L_t^{CLIP}(\theta) - c_1 L_t^{VF}(\theta) + c_2 L_t^{ENTROPY}(\theta) \right] \tag{10}$$
where $L_t^{VF} = \big(V(s_t) - V_t^{\text{target}}\big)^2$ is the value-function loss, and $c_1$ and $c_2$ are weighting coefficients.
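A minimal numpy sketch of the clipped surrogate term (omitting the value-function and entropy terms) shows how clipping removes the incentive to push the probability ratio beyond the trust region; the batch values below are illustrative.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Clipped surrogate objective for a batch of transitions.

    Returns the loss to *minimize*: the negative of the surrogate objective.
    """
    ratio = np.exp(logp_new - logp_old)              # r_t(theta)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * adv
    # Taking the min makes the objective a pessimistic (lower) bound.
    return -np.mean(np.minimum(unclipped, clipped))

# A positive-advantage action whose probability grew past 1+eps receives
# no extra objective credit beyond the clip boundary:
adv = np.array([2.0])
loss_at_clip = ppo_clip_loss(np.log([1.2]), np.log([1.0]), adv)
loss_beyond = ppo_clip_loss(np.log([1.5]), np.log([1.0]), adv)
```

Both calls return the same loss: beyond the clip boundary the objective is flat, so the gradient incentive to enlarge the update vanishes, which is exactly the mechanism that stabilizes PPO's policy updates.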

6.2.2. TD3

TD3 is an off-policy actor-critic DRL agent that employs deterministic policies for continuous tasks [136]. It addresses the Q-value overestimation issue commonly observed in DDPG by introducing clipped double Q-learning, delayed policy updates, and target policy smoothing [137]. To estimate the policy and value functions, TD3 uses an actor network with a parametrized policy. The actor takes the observation as input and outputs the control action that maximizes the long-term reward. TD3 uses two identical critic networks to estimate the values of state-action pairs.
In general, TD3 trains two critic networks, $Q_{\phi_1}(s,a)$ and $Q_{\phi_2}(s,a)$, and the minimum of their two value estimates is used in the target. The target Q-value is obtained using (11) [137]:
$$y = r + \gamma \min_{k=1,2} Q_{\phi_k'}(s', \tilde{a}) \tag{11}$$
where $r$ is the immediate reward, $\gamma$ is the discount factor, $Q_{\phi_k'}$ denotes the target Q-functions, and $\tilde{a} = \pi_{\theta'}(s') + \epsilon$ represents the target action with added noise. It is important to note that, if $s'$ is a terminal state, the target Q-value is set to $r$ only.
$$\epsilon \sim \operatorname{clip}\big(\mathcal{N}(0,\sigma),\ -c,\ c\big) \tag{12}$$
where $\sigma$ is the noise standard deviation, and $c$ is the clipping range.
After every training step, the parameters of each critic are updated by minimizing the loss $L_k$ across a minibatch of $M$ sampled experiences:
$$L_k = \frac{1}{2M} \sum_{i=1}^{M} \big( y_i - Q_{\phi_k}(s_i, a_i) \big)^2 \tag{13}$$
Similarly, the actor parameters are updated using the following sampled policy gradient:
$$\nabla_\theta J = \frac{1}{M} \sum_{i=1}^{M} G_{\pi i}\, G_{a i} \tag{14}$$
where $G_{\pi i}$ is the gradient of the actor output with respect to the actor parameters, and $G_{a i}$ is the gradient of the minimum critic output with respect to the action estimated by the actor network.
TD3 employs a target actor ($\theta'$) and two target critics ($\phi_k'$). The target networks are updated periodically, using a soft update rule to avoid overly fast convergence [112]:
$$\theta' = \tau \theta + (1-\tau)\theta', \qquad \phi_k' = \tau \phi_k + (1-\tau)\phi_k' \tag{15}$$
where $\tau \ll 1$ is the smoothing factor.
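The three TD3 ingredients above, the clipped double-Q target of (11), clipped target-policy noise, and soft target updates, can be sketched in a few numpy functions; all shapes and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def td3_target(r, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: min of the two target critics,
    and r alone at terminal states."""
    return r + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)

def smoothed_action(mu_next, sigma=0.1, c=0.5, low=-1.0, high=1.0):
    """Target-policy smoothing: clipped Gaussian noise on the target
    actor's action, then clipping to the valid action range."""
    noise = np.clip(rng.normal(0.0, sigma, size=np.shape(mu_next)), -c, c)
    return np.clip(mu_next + noise, low, high)

def soft_update(target, online, tau=0.005):
    """Polyak averaging applied to flat parameter vectors."""
    return tau * online + (1.0 - tau) * target

# Two transitions: the second one is terminal, so its target is r alone.
y = td3_target(r=np.array([1.0, 1.0]),
               done=np.array([0.0, 1.0]),
               q1_next=np.array([10.0, 10.0]),
               q2_next=np.array([8.0, 12.0]))
```

Taking the element-wise minimum of the two critics (8 rather than 10 in the first transition) is what counteracts the overestimation bias inherited from DDPG.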

6.3. DRL Controller Design

The complete block diagram illustrating the interaction between the proposed approach and the system is shown in Figure 6. The design of the state space, action space, reward function, and hyperparameter optimization is described in the following steps:

6.3.1. State Space

The state space contains sufficient information from the environment for the agent to make control decisions. For the objective of voltage regulation, it includes the voltage tracking error, the integral of the error, the delayed output voltage, and the reference voltage. The random variation of the reference voltage during training necessitates its inclusion to facilitate the learning of dynamic control policies. Thus, the state space is defined as
$$s_t = \left[\, e_v(t),\ \textstyle\int e_v(t)\,dt,\ v_o(t-1),\ V_{ref} \,\right] \tag{16}$$
where $e_v(t) = V_{ref} - v_o(t)$.
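A possible implementation of this observation vector, assuming a fixed sampling period $T_s$ and rectangular integration of the error (both illustrative choices not specified in the text), is as follows.

```python
class StateBuilder:
    """Assembles [e_v, integral of e_v, delayed v_o, V_ref] for the agent."""

    def __init__(self, ts):
        self.ts = ts              # sampling period (illustrative)
        self.integral = 0.0
        self.v_o_prev = 0.0       # one-step-delayed output voltage

    def step(self, v_o, v_ref):
        e = v_ref - v_o
        self.integral += e * self.ts          # rectangular integration
        state = [e, self.integral, self.v_o_prev, v_ref]
        self.v_o_prev = v_o                   # store for the next call
        return state

sb = StateBuilder(ts=1e-4)
s1 = sb.step(v_o=0.0, v_ref=48.0)
s2 = sb.step(v_o=40.0, v_ref=48.0)
```

The integral term gives the agent the memory a PI controller would have, while the delayed output lets it infer the voltage slope.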

6.3.2. Action Space

The action space represents the set of all valid control actions that the agent can execute within the environment. In this study, the converter’s duty cycle serves as the control signal generated by the agent to regulate the voltage. Therefore, the action space is defined as
$$a_t = d \in [\,0.0,\ 0.99\,] \tag{17}$$

6.3.3. Reward Function

The reward function is structured to reward the agent for favorable actions and penalize it for poor decisions. The reward function used in this study is based on the tracking error and is defined in (18).
$$R_t = r_1 + r_2 + r_3 \tag{18}$$
$$r_1 = -e_v^2, \qquad r_2 = \begin{cases} 1 & \text{if } |e_v| \le 0.01 \\ 0.1 & \text{if } 0.01 < |e_v| < 0.1 \\ 0 & \text{otherwise} \end{cases} \qquad r_3 = \begin{cases} -100 & \text{if } v_o \ge 1.3\,V_{ref} \text{ or } v_o \le 0.7\,V_{ref} \\ 0 & \text{otherwise} \end{cases}$$
where $r_1$ denotes a parametric reward proportional to the negative squared voltage error, $r_2$ is a non-parametric discrete reward that incentivizes favorable actions, and $r_3$ penalizes violations of the voltage limits. The scaling gains of the reward function are tuned using a trial-and-error approach.
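The reward defined in (18) translates directly into code. The thresholds follow the text; the exact case boundaries are this sketch's reading of the equations.

```python
def reward(v_o, v_ref):
    """Reward of (18): dense error term, discrete accuracy bonus,
    and a large penalty for leaving the safe voltage band."""
    e = v_ref - v_o
    r1 = -e * e                               # parametric tracking term
    if abs(e) <= 0.01:
        r2 = 1.0                              # tight-regulation bonus
    elif abs(e) < 0.1:
        r2 = 0.1                              # near-regulation bonus
    else:
        r2 = 0.0
    # Safety penalty outside +/-30% of the reference voltage.
    r3 = -100.0 if (v_o >= 1.3 * v_ref or v_o <= 0.7 * v_ref) else 0.0
    return r1 + r2 + r3
```

The discrete bonus counteracts the sparse-gradient problem of the pure quadratic term near the reference, while the large penalty discourages exploration outside the safe band.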

6.3.4. Hyperparameter Optimization

DRL algorithms are making significant strides in the field of control engineering; however, their performance is challenged by hyperparameter sensitivity. Hyperparameters such as the discount factor, learning rate, and minibatch size affect both the learning process and the learned policy [138]. A popular tuning method is trial-and-error, or manual, search, in which different hyperparameter values are tried and their impact on performance is observed. It is noteworthy that manual search requires intuition and sufficient expertise to identify good hyperparameter sets. Furthermore, as the number and range of the hyperparameters increase, optimization becomes increasingly tedious, making it unlikely to find the best possible set.
To overcome the limitations of manual tuning, automatic search methods, such as grid search and random search, have been proposed. Grid search performs an exhaustive search of the entire space [139]. It trains an agent by trying all possible combinations of the hyperparameters. This method is straightforward to implement and widely adopted; however, it suffers from the curse of dimensionality and can be time-consuming, especially as the number and range of hyperparameters increase [139]. To address these drawbacks, random search selects hyperparameters randomly from a defined distributed search space [140]. While this approach reduces training time, it may not fully cover the parameter space.
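The difference between the two automatic search strategies can be sketched with a toy objective standing in for the expensive agent-training run; the score function, axes, and budgets below are illustrative assumptions.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

def score(lr, gamma):
    """Stand-in for the (expensive) training objective; smaller is better."""
    return (np.log10(lr) + 3.0) ** 2 + (gamma - 0.99) ** 2 * 1e4

# Grid search: every combination of the discretized axes is evaluated.
lrs = [1e-4, 1e-3, 1e-2]
gammas = [0.90, 0.95, 0.99]
grid_best = min(score(lr, g) for lr, g in itertools.product(lrs, gammas))

# Random search: the same budget of 9 trials, sampled from distributions
# over the full ranges instead of a fixed grid.
random_best = min(
    score(10 ** rng.uniform(-4, -2), rng.uniform(0.9, 0.999))
    for _ in range(9)
)
```

Grid search covers only the chosen grid points, so its cost explodes as axes are added; random search covers each axis more densely at the same budget but may still miss narrow optima, which motivates the model-guided search of BO discussed next.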
To reduce computational effort and improve control performance, researchers employ BO to optimize the hyperparameters of DRL agents. BO attempts to minimize a scalar objective function $f(\lambda)$ over $\lambda \in \Lambda$ [141]. It iteratively builds a Gaussian process model of the function from the observed data and uses the model to predict performance. Hyperparameter optimization with BO relies on three key components: a Gaussian process model of the objective function, a Bayesian update that refines the model with each new evaluation of $f(\lambda)$, and an acquisition function [142].
In this study, BO is employed to optimize the hyperparameters by minimizing the cost function defined in (19).
$$f(\lambda) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T_i} r(\lambda)_{i,t} \tag{19}$$
where $r(\lambda)_{i,t}$ is the reward received with candidate hyperparameters $\lambda$ at time step $t$ of evaluation episode $i$, $N$ is the number of evaluation episodes, and $T_i$ is the maximum number of time steps in the $i$-th evaluation episode. The negative sign converts reward maximization into the minimization form required by BO.
The process involves computing four evaluations consecutively after every 20 training episodes. The mean value of the evaluation rewards used in the objective function provides insight into the agent’s performance in the environment, thereby facilitating more effective hyperparameter tuning.
Following the guidelines in [143], the hyperparameters most critical to learning performance, exploration, and stability were selected for the optimization. These hyperparameters and their respective bounds are provided in Table 3 and the complete algorithm is presented in pseudocode in Algorithm 1.
Algorithm 1 Bayesian optimization
  • Input: Objective function f ( λ ) , maximum number of iterations n max = 30 , maximum number of training episodes p max = 300 , hyperparameter sets λ TD 3 = [ m , γ , α a , α c , k , τ s d ] , λ PPO = [ m , γ , α a , α c , k , ω ] , bounds Λ .
  • Initialize: Train the agent using randomly sampled hyperparameters λ 0 Λ .
  • Collect initial dataset D 0 = { λ 0 , f ( λ 0 ) } .
  • for n = 1 to n max  do
  •       Fit Gaussian model with observed data D n 1 .
  •       Select new hyperparameter set λ n Λ by optimizing acquisition function.
  •       for  p = 1 to p max  do
  •             Train the agent and store mean evaluation rewards after every N episodes; r i .
  •       end for
  •       Evaluate $f(\lambda_n) = -\frac{1}{N}\sum_{i=1}^{N} r_i$.
  •       Augment dataset D n D n 1 { λ n , f ( λ n ) } .
  • end for
  • Return: Best solution λ * in observed dataset D.
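A compact, self-contained sketch of this loop is given below. It assumes a toy one-dimensional objective in place of agent training, an RBF-kernel Gaussian process, and expected improvement as the acquisition function; the paper does not specify its kernel or acquisition choice, so these are illustrative.

```python
import numpy as np
from math import erf, sqrt, pi

def f(lam):
    """Toy stand-in for the negative mean-reward objective; the real
    objective would require training the agent at each lam."""
    return (lam - 0.3) ** 2 + 0.05 * np.sin(25 * lam)

def gp_posterior(X, y, Xs, length=0.1, noise=1e-6):
    """GP regression with an RBF kernel: posterior mean/std at points Xs."""
    def k(a, b):
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)
    K = k(X, X) + noise * np.eye(len(X))         # jitter for stability
    Ks, Kss = k(X, Xs), k(Xs, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = np.diag(Kss) - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI acquisition for minimization: trades off low mean vs. high std."""
    z = (best - mu) / sigma
    cdf = np.array([0.5 * (1 + erf(v / sqrt(2))) for v in z])
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2 * pi)
    return (best - mu) * cdf + sigma * pdf

candidates = np.linspace(0.0, 1.0, 201)
X = np.array([0.05, 0.5, 0.95])                  # small initial design
y = f(X)
for _ in range(12):                              # BO iterations
    mu, sigma = gp_posterior(X, y, candidates)
    lam_next = candidates[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X, y = np.append(X, lam_next), np.append(y, f(lam_next))

lam_best = X[np.argmin(y)]
```

Each iteration fits the surrogate to all observations, picks the candidate maximizing expected improvement, and evaluates the objective there, mirroring the fit/select/evaluate steps of Algorithm 1 at a fraction of an exhaustive search's cost.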

6.4. Simulation Results

Simulations in MATLAB/Simulink (R2024a) are carried out to verify the performance of the proposed methods. A comparative evaluation with model-based (PI, SMC) and model-free FLC controllers is also presented.

6.4.1. Training and Optimization Results

The control agents’ hyperparameters were optimized through multiple training experiments to minimize the objective function defined in (19). During each experiment, the agent interacted with the system model to track a series of randomly generated reference voltages. Thirty (30) evaluations of BO were performed to identify the optimal hyperparameter combination within the defined search space (Table 3). Each evaluation involved 300 training episodes. It is important to note that high computational power was required only during the training and optimization phases. Once trained, the agent maps system states to corresponding control actions according to the learned policy with negligible computational demand.
The results of hyperparameter optimization are presented in Figure 7. Figure 7a,b illustrate the objective function minimization over 30 iterations for the PPO and TD3 agents, respectively. The results show that the objective function attained its global minimum at the 4th evaluation for the TD3 agent, whereas the PPO agent reached its minimum at the 9th iteration. This observation highlights the faster convergence of TD3 compared to PPO. Moreover, the trajectories demonstrate that the estimated minimum objective closely corresponds to the observed minimum, further validating the effectiveness of the BO process.
Figure 8 illustrates the trend of the episodic and average rewards for the two agents during training. The results were obtained over 1000 training episodes using two hyperparameter configurations: the optimized set obtained through BO and a manually tuned baseline. Figure 8a,b show that the agents trained with optimized hyperparameters converge faster and achieve higher rewards compared to the baseline agents.

6.4.2. Comparative Evaluation Under Dynamic Conditions

To assess the performance of the proposed methods, simulations were performed under different dynamic operating conditions. Three scenarios were considered: (i) varying reference voltage, (ii) supply disturbances, and (iii) load disturbances.

Scenario 1–Varying Reference Voltage

In this scenario, the reference voltage was varied as shown in Figure 9a. Based on the output voltage waveforms in Figure 9b, a detailed transient performance analysis was conducted and summarized in Table 4. It can be seen that all the control methods maintained global stability, with the proposed BO-TD3 demonstrating the most balanced response, achieving a fast rise time of 0.0016 s and the lowest settling time of 0.0032 s. Although the BO-PPO achieved the fastest rise time (0.0012 s), it exhibited a relatively high overshoot compared to BO-TD3. It can also be seen that the SMC achieved the lowest overshoot, but exhibited a slower response compared to the proposed methods. Furthermore, steady-state performance analysis in terms of root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and integral absolute error (IAE) is presented in Table 5. This analysis clearly indicates that the BO-TD3 consistently achieves the lowest steady-state error.

Scenario 2–Supply Disturbance

Figure 10 represents a scenario with perturbations in supply voltage as illustrated in Figure 10a. The corresponding results, shown in Figure 10b, compare the responses of the proposed and benchmark methods. The results demonstrate that all the methods achieved global stability despite disturbances in the supply. From the analysis presented in Table 6, it can be concluded that the BO-TD3 achieved the lowest steady state error metrics.

Scenario 3–Load Disturbance

Figure 11 shows the system response when the reference voltage is set to 48 V and the load power varies. Based on the output voltage regulation shown in Figure 11b, a detailed performance analysis is summarized in Table 7. The results indicate that, among the five control methods, the BO-TD3 achieves the lowest error metrics.

7. Challenges and Future Works

Although the existing literature has focused on developing various advanced voltage control techniques, several challenges still need to be further considered to improve adaptability and robustness. On the one hand, system complexity increases with the integration of more components and loads, along with the potential interaction between DC MG, which amplifies uncertainties and external disturbances. On the other hand, the degradation of ESD needs to be considered. Additionally, validating and implementing the proposed approaches in real-world scenarios, such as using realistic test benches, remain crucial. Therefore, future research directions are summarized below:
  • Traditional control approaches often struggle under system disturbances, such as the integration of new DER/ESD or system reconfiguration. These events frequently alter DC MG dynamics. Advanced control strategies like DRL can be trained to continuously adapt their control actions, maintaining stability under high perturbations.
  • Data-driven control methods, while offering flexibility and reduced model dependency, often lack interpretability due to their black-box nature. Hybrid control strategies that combine data-driven and physics-based models, such as model-based reinforcement learning (MBRL) and physics-informed reinforcement learning (PIRL), should be further explored to balance flexibility, interpretability, and explainability while enhancing voltage stability under uncertainty.
  • As MG continue to increase in scale and complexity (multiple DER, diverse loads, and interconnection with other MG), the need for more adaptable and flexible control schemes becomes apparent. It is recommended to investigate and develop multi-agent deep reinforcement learning (MADRL) frameworks, enabling scalable and coordinated control across large-scale MG systems and allowing robust autonomous decision-making and enhanced operational resilience.
  • To ensure the practical applicability and reliability of the proposed control algorithms, future research should incorporate real-time experimental validation. This could be achieved by establishing Hardware-in-the-loop (HIL) test environments, for initial system integration and controller testing, and Power hardware-in-the-loop (PHIL) to assess performance with actual power-level components.

8. Conclusions

This study conducted a comprehensive review of voltage control techniques in DC MG. The reviewed methods were categorized based on their dependence on explicit models for controller design, resulting in a structured analysis of model-based, model-free, and hybrid strategies. Each category was critically examined, highlighting its respective strengths and limitations.
Based on the survey findings, DRL-based control frameworks using the BO-TD3 and BO-PPO algorithms were proposed for voltage regulation in a buck converter with an input LC filter. Three simulation scenarios were evaluated: reference voltage variation, input voltage disturbance, and load fluctuation. The BO-TD3 agent achieved the lowest error metrics, reducing root mean square error (RMSE) by up to 5.6% and mean absolute percentage error (MAPE) by 7.8% relative to benchmark methods. These findings highlight the potential of DRL-based methods to enhance the voltage stability of DC MGs, with the main limitation being their high computational cost during training.
Future research should focus on extending these control methods to complex, large-scale DC MGs and on ensuring their practical applicability and reliability through real-time experimental validation.
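For reproducibility, the two comparison metrics reported above can be computed as in the sketch below. The voltage traces used here are made-up placeholder numbers, not the paper's simulation data.

```python
def rmse(ref, meas):
    """Root mean square error between reference and measured voltage traces."""
    return (sum((r - m) ** 2 for r, m in zip(ref, meas)) / len(ref)) ** 0.5


def mape(ref, meas):
    """Mean absolute percentage error (%); assumes the reference never crosses zero."""
    return 100.0 * sum(abs(r - m) / abs(r) for r, m in zip(ref, meas)) / len(ref)


vref = [10.0, 10.0, 10.0, 10.0]  # placeholder reference trace (V)
vout = [10.2, 9.9, 10.1, 10.0]   # placeholder measured trace (V)
print(round(rmse(vref, vout), 3), round(mape(vref, vout), 2))  # → 0.122 1.0
```

RMSE emphasizes large transient deviations (errors enter squared), while MAPE normalizes by the reference and so is comparable across different voltage setpoints; reporting both, as this study does, captures complementary aspects of tracking quality.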

Author Contributions

Conceptualization, S.M., H.O., A.H., M.H. and H.G.; methodology, S.M., H.O., A.H., M.H. and H.G.; software, S.M.; validation, S.M., H.O., A.H., M.H. and H.G.; formal analysis, S.M.; investigation, S.M.; writing—original draft preparation, S.M.; writing—review and editing, H.O., A.H., M.H. and H.G.; visualization, S.M., H.O. and A.H.; supervision, H.O., A.H., M.H. and H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ADRC: Active Disturbance Rejection Control
ANFIS: Adaptive Neuro-Fuzzy Inference System
ANN: Artificial Neural Network
BO: Bayesian Optimization
CPL: Constant Power Load
DDC: Data-Driven Control
DDPG: Deep Deterministic Policy Gradient
DER: Distributed Energy Resources
DQN: Deep Q-Network
DRL: Deep Reinforcement Learning
ESD: Energy Storage Devices
HESS: Hybrid Energy Storage Systems
HIL: Hardware-in-the-Loop
IoT: Internet of Things
LLC: Local Linear Controller
LLM: Local Linear Model
MADRL: Multi-Agent Deep Reinforcement Learning
MDP: Markov Decision Process
MFAC: Model-Free Adaptive Control
MOA: Metaheuristic Optimization Algorithm
MPC: Model Predictive Control
PBC: Passivity-Based Control
PEMFC: Proton Exchange Membrane Fuel Cell
PHIL: Power Hardware-in-the-Loop
PI: Proportional-Integral Controller
PIRL: Physics-Informed Reinforcement Learning
PPL: Pulse Power Load
PPO: Proximal Policy Optimization
PSO: Particle Swarm Optimization
RES: Renewable Energy Sources
SMC: Sliding Mode Control
SOC: State of Charge
TRPO: Trust Region Policy Optimization

Appendix A. Additional Tables

Table A1. Summarized review of model-based voltage control techniques.
Control Method | Reference | Year | Proposed Method | Main Contribution | Limitation
SMC | [33] | 2023 | Adaptive SMC | Improving voltage stability in a buck converter feeding CPLs. | Complexity in design.
SMC | [34] | 2023 | HM-GFTSMC | Improving voltage stability in a buck converter. | Proving stability can be challenging in a complex MG scenario.
SMC | [35] | 2024 | HOSMC-PID | Improving large-signal stability in a DC MG. | Proving stability can be challenging in a complex MG scenario.
SMC | [40] | 2022 | RBFNN estimation-based adaptive SMC | Improving voltage stability in a PEMFC. | Increased complexity.
Adaptive droop control | [42] | 2019 | Hierarchical adaptive droop and supervisory control | Improving voltage stability and load power sharing in a DC MG with multi-energy storage devices. | Effectiveness has not been validated against existing approaches.
Adaptive droop control | [43] | 2020 | Adaptive distributed droop | Improving DC bus voltage stability. | Stability challenges in a large-scale system.
Adaptive droop control | [44] | 2023 | Adaptive droop + consensus control | DC MG power smoothing and voltage control. | Difficulty in tuning parameters.
Adaptive droop control | [46] | 2022 | Droop index control | Improving voltage stability. | Performance may be sensitive to the droop index.
MPC | [53] | 2021 | FCS-MPC | Voltage control and power allocation optimization for a DC MG with HESS. | Prediction at each control cycle can be computationally intensive.
MPC | [54] | 2020 | Fast distributed MPC | Improving voltage stability. | High computational cost.
MPC | [55] | 2021 | Hybrid MPC | Improving voltage stability of a boost converter interfaced with CPLs. | Prediction at each control cycle can be computationally intensive.
MPC | [56] | 2022 | MPC combined with Kalman observer | Enhancing voltage stability of an interleaved boost converter. | Increased sensitivity to model accuracy.
PBC | [59] | 2019 | PBC | DC MG voltage regulation. | Performance under varying load conditions has not been investigated.
PBC | [10] | 2019 | Decentralized PBC | Improving voltage stability. | Performance depends on model accuracy.
PBC | [60] | 2024 | IDA-PBC + SMRC | Improving voltage stability. | Parameter uncertainties have not been considered.
PBC | [61] | 2021 | Adaptive PBC | Voltage regulation in a buck-boost converter. | Performance depends on the accuracy of the system model.
ADRC | [64] | 2015 | ADRC | Improving performance of a flywheel energy storage system. | Not specified.
ADRC | [65] | 2017 | Time-scale droop control based on ADRC | Time-scale voltage droop control robust to uncertainties and external disturbances. | Performance depends on model accuracy.
ADRC | [66] | 2019 | Modified ADRC | Comparison of ADRC techniques for suppressing disturbances in a boost converter. | Evaluation is based on an average model.
H∞ | [68] | 2019 | H∞ control | Enhancing voltage stability. | Choosing appropriate weighting functions is challenging.
H∞ | [70] | 2023 | Loop-shaping H∞ | Robust voltage control of a DC MG. | Performance not validated against other methods.
Table A2. Summarized review of model-free voltage control techniques.
Control Method | Reference | Year | Proposed Method | Main Contribution | Limitation
FLC | [76] | 2020 | SCA-HS-tuned Type-II fuzzy | Enhancing voltage stability in a boost converter feeding CPLs. | Performance is often sensitive to the choice of optimization parameters and fuzzy rule base.
FLC | [77] | 2020 | iSIT2-FPI + SMC | An SMC-based model-free FLC. | Relatively complex to implement.
FLC | [78] | 2020 | Fuzzy-PI dual mode | Combining FLC with PI to enhance dynamic response and restrain bus voltage fluctuations. | Scaling gains must be retuned whenever the system dynamics change.
FLC | [80] | 2019 | ANFIS | Improving transient and steady-state responses of a flyback converter using FLC and a neural network. | Training ANFIS requires high-quality data; high computational cost.
ANN | [87] | 2022 | CCSNN | An EMS to enhance power sharing among CESS and maintain bus voltage stability. | Computationally intensive to tune CC hyperparameters and train a neural network.
ANN | [88] | 2020 | HBSANN | An HBSANN-based power management strategy improving the voltage regulation of a DC MG. | Requires high-quality training data.
ANN | [89] | 2021 | DNN | A supervised deep-learning-aided sensorless controller. | Risk of overfitting.
ANN | [90] | 2021 | ANN + approximate dynamic programming | Improving voltage stability under variable load and input voltage conditions. | Requires high-quality data.
LMN | [91] | 2019 | LMN + LLC | Identification of a DC/DC converter's dynamics directly from measured data; a voltage controller based on the identified model. | Performance was not evaluated against robust control methods.
MFAC | [96] | 2023 | MFAC | A pseudo-gradient estimation algorithm based on I/O data; improving voltage stability in a BDC. | Pseudo-gradient estimation may introduce systematic errors due to the approximation process.
MFAC | [99] | 2021 | Model-free iSIT2-FPI | Improving voltage regulation in a stand-alone shipboard DC MG. | Increased complexity.
DRL | [108] | 2023 | PID + DDPG | Enhancing the voltage stability of a buck converter. | Performance is partially dependent on model accuracy.
DRL | [109] | 2022 | DQN | Improving the voltage stability of a buck converter. | Handles only discrete actions.
DRL | [114] | 2024 | TD3 | Optimizing parameters of a PI controller. | Performance was tested only under light load conditions.
DRL | [116] | 2020 | DDPG | Voltage stabilization of an IoT-based buck converter feeding CPLs. | Training and simulation results are not discussed.
DRL | [117] | 2023 | Integral RL | Improving voltage stability in an interleaved boost converter. | Training and simulation results are not discussed.
Table A3. Summarized review of hybrid control strategies.
Control Method | Reference | Year | Proposed Method | Main Contribution | Limitation
MOAs | [119] | 2020 | GA-tuned PID | Improving voltage stability and performance of a fuel cell. | Poor parameter tuning can reduce the effectiveness of the algorithm.
MOAs | [120] | 2021 | PSO-tuned PI | Improving voltage stability in a buck-boost converter. | Performance in the presence of disturbances has not been evaluated.
MOAs | [121] | 2023 | QOAOA | Improving efficiency of a cascaded boost converter. | Performance under varying load conditions is not discussed.
MOAs | [122] | 2023 | SSA-PSO | Improving voltage stability in a PV-powered MG. | Not compared with other established methods.
PINNs | [127] | 2024 | PINN | Estimating the SOH of a lithium-ion battery. | Difficulty in handling high-dimensional nonlinear models.
PINNs | [128] | 2024 | PINN | Enhancing stability in a buck converter. | Difficulty in handling high-dimensional converter dynamics.

References

  1. Arshad, R.; Mininni, G.M.; De Rosa, R.; Khan, H.A. Enhancing climate resilience of vulnerable women in the Global South through power sharing in DC microgrids. Renew. Energy 2024, 237, 121495.
  2. Xu, X.; Xia, J.; Hong, C.; Sun, P.; Xi, P.; Li, J. Optimization of cooperative operation of multiple microgrids considering green certificates and carbon trading. Energies 2025, 18, 4083.
  3. Cagnano, A.; De Tuglie, E.; Mancarella, P. Microgrids: Overview and guidelines for practical implementations and operation. Appl. Energy 2020, 258, 114039.
  4. e Ammara, U.; Zehra, S.S.; Nazir, S.; Ahmad, I. Artificial neural network-based nonlinear control and modeling of a DC microgrid incorporating regenerative FC/HPEV and energy storage system. Renew. Energy Focus 2024, 49, 100565.
  5. Ojo, K.E.; Saha, A.K.; Srivastava, V.M. Microgrids’ control strategies and real-time monitoring systems: A comprehensive review. Energies 2025, 18, 3576.
  6. Eyimaya, S.E.; Altin, N.; Nasiri, A. Optimization of photovoltaic and battery storage sizing in a DC microgrid using LSTM networks based on load forecasting. Energies 2025, 18, 3676.
  7. Derakhshan, S.; Shafiee-Rad, M.; Shafiee, Q.; Jahed-Motlagh, M.R.; Sahoo, S.; Blaabjerg, F. Decentralized voltage control of autonomous DC microgrids with robust performance approach. IEEE J. Emerg. Sel. Top. Power Electron. 2021, 9, 5508–5520.
  8. Eydi, M.; Ghazi, R.; Buygi, M.O. A decentralized control method for proportional current-sharing, voltage restoration, and SOCs balancing of widespread DC microgrids. Int. J. Electr. Power Energy Syst. 2024, 155, 109645.
  9. Rizk, H.; Chaibet, A.; Kribèche, A. Model-based control and model-free control techniques for autonomous vehicles: A technical survey. Appl. Sci. 2023, 13, 6700.
  10. Cucuzzella, M.; Lazzari, R.; Kawano, Y.; Kosaraju, K.C.; Scherpen, J.M.A. Robust passivity-based control of boost converters in DC microgrids. In Proceedings of the 2019 IEEE 58th Conference on Decision and Control (CDC), Nice, France, 11–13 December 2019; pp. 8435–8440.
  11. Santoni, C.; Zhang, Z.; Sotiropoulos, F.; Khosronejad, A. A data-driven machine learning approach for yaw control applications of wind farms. Theor. Appl. Mech. Lett. 2023, 13, 100471.
  12. Modu, B.; Abdullah, M.P.; Sanusi, M.A.; Hamza, M.F. DC-based microgrid: Topologies, control schemes, and implementations. Alex. Eng. J. 2023, 70, 61–92.
  13. Ashok Kumar, A.; Amutha Prabha, N. A comprehensive review of DC microgrid in market segments and control technique. Heliyon 2022, 8, e11694.
  14. Xu, Q.; Vafamand, N.; Chen, L.; Dragicevic, T.; Xie, L.; Blaabjerg, F. Review on advanced control technologies for bidirectional DC/DC converters in DC microgrids. IEEE J. Emerg. Sel. Top. Power Electron. 2021, 9, 1205–1221.
  15. Ekanayake, U.N.; Navaratne, U.S. A survey on microgrid control techniques in islanded mode. J. Electr. Comput. Eng. 2020, 2020, 6275460.
  16. Bukar, A.L.; Modu, B.; Abdullah, M.P.; Hamza, M.F.; Almutairi, S.Z. Peer-to-peer energy trading framework for an autonomous DC microgrid using game theoretic approach. Renew. Energy Focus 2024, 51, 100636.
  17. Patel, R.; Chudamani, R. Stability analysis of the main converter supplying a constant power load in a multi-converter system considering various parasitic elements. Eng. Sci. Technol. Int. J. 2020, 23, 1118–1125.
  18. Rahimian, M.M.; Mohammadi, H.R.; Guerrero, J.M. Constant power load issue in DC/DC multi-converter systems: Past studies and recent trends. Electr. Power Syst. Res. 2024, 235, 110851.
  19. Gheisarnejad, M.; Akhbari, A.; Rahimi, M.; Andresen, B.; Khooban, M.H. Reducing impact of constant power loads on DC energy systems by artificial intelligence. IEEE Trans. Circuits Syst. II Express Briefs 2022, 69, 4974–4978.
  20. Liu, Z.; Liu, Y.; Yu, Y.; Yang, R. Coordinated control and optimal flow of shipboard MVDC system for adapting to large pulsed power load. Electr. Power Syst. Res. 2023, 221, 109354.
  21. Kumar, K.; Kumar, P.; Kar, S. A review of microgrid protection for addressing challenges and solutions. Renew. Energy Focus 2024, 49, 100572.
  22. Tavagnutti, A.A.; Bosich, D.; Pastore, S.; Sulligoi, G. A reduced order model for the stable LC-filter design on shipboard DC microgrids. In Proceedings of the 2023 IEEE International Conference on Electrical Systems for Aircraft, Railway, Ship Propulsion and Road Vehicles & International Transportation Electrification Conference (ESARS-ITEC), Venice, Italy, 29–31 March 2023; pp. 1–6.
  23. Al-Ismail, F.S. A critical review on DC microgrids voltage control and power management. IEEE Access 2024, 12, 30345–30361.
  24. Li, F.; Tu, W.; Zhou, Y.; Li, H.; Zhou, F.; Liu, W.; Hu, C. Distributed secondary control for DC microgrids using two-stage multi-agent reinforcement learning. Int. J. Electr. Power Energy Syst. 2025, 164, 110335.
  25. Saleem, O.; Rizwan, M. Performance optimization of LQR-based PID controller for DC-DC buck converter via iterative-learning-tuning of state-weighting matrix. Int. J. Numer. Model. Electron. Netw. Devices Fields 2019, 32, e2572.
  26. Sheikhi Jouybary, H.; Arab Khaburi, D.; El Hajjaji, A.; Mpanda Mabwe, A. Optimal sliding mode control of modular multilevel converters considering control input constraints. Energies 2025, 18, 2757.
  27. Peng, C.; Xie, C.; Zou, J.; Jiang, X.; Zhu, Y. A feedback linearization sliding mode decoupling and fuzzy anti-surge compensation based coordinated control approach for PEMFC air supply system. Renew. Energy 2024, 237, 121760.
  28. Ullah, Q.; Busarello, T.D.C.; Brandao, D.I.; Simões, M.G. Design and performance evaluation of SMC-based DC–DC converters for microgrid applications. Energies 2023, 16, 4212.
  29. Obeid, H.; Petrone, R.; Chaoui, H.; Gualous, H. Higher order sliding-mode observers for state-of-charge and state-of-health estimation of lithium-ion batteries. IEEE Trans. Veh. Technol. 2023, 72, 4482–4492.
  30. Muhammad, R.; Muhammad, A.; Bhatti, A.I.; Minhas, D.M.; Ahmed, B.A. Mathematical modeling and stability analysis of DC microgrid using SM hysteresis controller. Int. J. Electr. Power Energy Syst. 2018, 95, 507–522.
  31. Cucuzzella, M.; Lazzari, R.; Trip, S.; Rosti, S.; Sandroni, C.; Ferrara, A. Sliding mode voltage control of boost converters in DC microgrids. Control Eng. Pract. 2018, 73, 161–170.
  32. Mathew, K.K.; Abraham, D.M. Particle swarm optimization based sliding mode controllers for electric vehicle onboard charger. Comput. Electr. Eng. 2021, 96, 107502.
  33. Mustafa, G.; Ahmad, F.; Zhang, R.; Haq, E.U.; Hussain, M. Adaptive sliding mode control of buck converter feeding resistive and constant power load in DC microgrid. Energy Rep. 2023, 9, 1026–1035.
  34. Balta, G.; Güler, N.; Altin, N. Global fast terminal sliding mode control with fixed switching frequency for voltage control of DC–DC buck converters. ISA Trans. 2023, 143, 582–595.
  35. Roy, T.K.; Oo, A.M.T.; Ghosh, S.K. Designing a high-order sliding mode controller for photovoltaic- and battery energy storage system-based DC microgrids with ANN-MPPT. Energies 2024, 17, 532.
  36. Li, X.; Wang, M.; Dong, C.; Jiang, W.; Xu, Z.; Wu, X.; Jia, H. A robust autonomous sliding-mode control of renewable DC microgrids for decentralized power sharing considering large-signal stability. Appl. Energy 2023, 339, 121019.
  37. Derakhshannia, M.; Moosapour, S.S. RBFNN based fixed time sliding mode control for PEMFC air supply system with input delay. Renew. Energy 2024, 237, 121772.
  38. Zhang, L.; Chen, K.; Chi, S.; Lyu, L.; Ma, H.; Wang, K. The bidirectional DC/DC converter operation mode control algorithm based on RBF neural network. In Proceedings of the 2019 IEEE PES International Conference on Innovative Smart Grid Technologies Asia (ISGT Asia), Chengdu, China, 21–24 May 2019.
  39. Chen, X.; Shen, W.; Dai, M.; Cao, Z.; Jin, J.; Kapoor, A. Robust adaptive sliding-mode observer using RBF neural network for lithium-ion battery state of charge estimation in electric vehicles. IEEE Trans. Veh. Technol. 2016, 65, 1936–1947.
  40. Xiao, X.; Lv, J.; Chang, Y.; Chen, J.; He, H. Adaptive sliding mode control integrating with RBFNN for proton exchange membrane fuel cell power conditioning. Appl. Sci. 2022, 12, 3132.
  41. Zhang, H.; Sinha, R.; Golmohamadi, H.; Chaudhary, S.K.; Bak-Jensen, B. Autonomous control of electric vehicles using voltage droop. Energies 2025, 18, 2824.
  42. Yuan, M.; Fu, Y.; Mi, Y.; Li, Z.; Wang, C. Hierarchical control of DC microgrid with dynamical load power sharing. Appl. Energy 2019, 239, 1–11.
  43. Kumar, R.; Pathak, M.K. Distributed droop control of DC microgrid for improved voltage regulation and current sharing. IET Renew. Power Gener. 2020, 14, 2499–2506.
  44. Li, X.; Li, P.; Ge, L.; Wang, X.; Li, Z.; Zhu, L.; Guo, L.; Wang, C. A unified control of super-capacitor system based on bi-directional DC-DC converter for power smoothing in DC microgrid. J. Mod. Power Syst. Clean Energy 2023, 11, 938–949.
  45. Rehmat, A.; Alam, F.; Nasir, M.; Zaidi, S.S. Robust hierarchical non-linear droop control design for the PV based islanded microgrid. In Proceedings of the 19th International Bhurban Conference on Applied Sciences and Technology (IBCAST), Islamabad, Pakistan, 11–15 January 2022; pp. 620–628.
  46. Thogaru, R.B.; Naware, D.; Mitra, A.; Chaudhary, J. Resiliency-driven approach of DC microgrid voltage regulation based on droop index control for high step-up DC-DC converter. Int. Trans. Electr. Energy Syst. 2022, 2022, 3676438.
  47. Guo, Z.; Zuo, D.; Liu, X.; Zhang, Z.; Ma, J.; Meng, F.; Fang, Y. Dual tracking model-free predictive control for three-level neutral-point clamped inverters. Electr. Power Syst. Res. 2025, 249, 112054.
  48. Murillo-Yarce, D.; Riffo, S.; Restrepo, C.; González-Castaño, C.; Garcés, A. Model predictive control for stabilization of DC microgrids in island mode operation. Mathematics 2022, 10, 3384.
  49. Chen, L.; Zhou, J.; Zhai, J.; Yang, L.; Qian, X.; Tao, Z. Continuous-control-set model predictive control strategy for MMC-UPQC under non-ideal conditions. Energies 2025, 18, 2946.
  50. Babqi, A.J.; Alamri, B. A comprehensive comparison between finite control set model predictive control and classical proportional-integral control for grid-tied power electronics devices. Acta Polytech. Hung. 2021, 18, 67–87.
  51. Diaz Franco, F.; Vu, T.V.; Gonsulin, D.; Vahedi, H.; Edrington, C.S. Enhanced performance of PV power control using model predictive control. Sol. Energy 2017, 158, 679–686.
  52. Villalón, A.; Rivera, M.; Salgueiro, Y.; Muñoz, J.; Dragičević, T.; Blaabjerg, F. Predictive control for microgrid applications: A review study. Energies 2020, 13, 2454.
  53. Ni, F.; Zheng, Z.; Xie, Q.; Xiao, X.; Zong, Y.; Huang, C. Enhancing resilience of DC microgrids with model predictive control based hybrid energy storage system. Int. J. Electr. Power Energy Syst. 2021, 128, 106738.
  54. Marepalli, L.K.; Gajula, K.; Herrera, L. Fast distributed model predictive control for DC microgrids. In Proceedings of the 21st IEEE Workshop on Control and Modeling for Power Electronics (COMPEL 2020), Aalborg, Denmark, 9–12 November 2020.
  55. Karami, Z.; Shafiee, Q.; Sahoo, S.; Yaribeygi, M.; Bevrani, H.; Dragičević, T. Hybrid model predictive control of DC-DC boost converters with constant power load. IEEE Trans. Energy Convers. 2021, 36, 1347–1356.
  56. Tan, B.; Li, H.; Zhao, D.; Liang, Z.; Ma, R.; Huangfu, Y. Finite-control-set model predictive control of interleaved DC-DC boost converter based on Kalman observer. eTransportation 2022, 11, 100158.
  57. Kao, C.Y.; Khong, S.Z.; van der Schaft, A. On the converse passivity theorems for LTI systems. In Proceedings of the IFAC World Congress, IFAC-PapersOnLine, Berlin, Germany, 11–17 July 2020; pp. 6422–6427.
  58. Acevedo, D.M.; Parraguez-Garrido, I.; Gil-Gonzalez, W.; Montoya, O.D.; Gonzalez-Castano, C. Adaptive passivity-based control for DC motor speed regulation in DC-DC converter-fed systems. IEEE Access 2025, 13, 131957–131966.
  59. Sun, J.; Lin, W.; Hong, M.; Loparo, K.A. Voltage regulation of DC-microgrid with PV and battery: A passivity method. In Proceedings of the IFAC Workshop on Control of Smart Grid and Renewable Energy Systems (CSGRES), IFAC-PapersOnLine, Jeju, Republic of Korea, 10–12 June 2019; pp. 753–758.
  60. Martínez, L.; Fernández, D.; Mantz, R. Passivity-based control for an isolated DC microgrid with hydrogen energy storage system. Int. J. Hydrogen Energy 2024, 67, 1262–1269.
  61. Soriano-Rangel, C.A.; He, W.; Mancilla-David, F.; Ortega, R. Voltage regulation in buck-boost converters feeding an unknown constant power load: An adaptive passivity-based control. IEEE Trans. Control Syst. Technol. 2021, 29, 395–402.
  62. Han, J. From PID to active disturbance rejection control. IEEE Trans. Ind. Electron. 2009, 56, 900–906.
  63. Hao, F.; Guo, J.; Yu, Z.; Ye, J. Output voltage control of LLC resonant converter based on improved linear active disturbance rejection control. IEEE J. Emerg. Sel. Top. Power Electron. 2025, 13, 3555–3564.
  64. Chang, X.; Li, Y.; Zhang, W.; Wang, N.; Xue, W. Active disturbance rejection control for a flywheel energy storage system. IEEE Trans. Ind. Electron. 2015, 62, 991–1001.
  65. Yang, N.; Gao, F.; Paire, D.; Miraoui, A.; Liu, W. Distributed control of multi-time scale DC microgrid based on ADRC. IET Power Electron. 2017, 10, 329–337.
  66. Ahmad, S.; Ali, A. Active disturbance rejection control of DC–DC boost converter: A review with modifications for improved performance. IET Power Electron. 2019, 12, 2095–2107.
  67. Zhao, Y.; Yang, X.; Zhang, Y.; Zhang, Q. Event-triggered H-infinity pitch control for floating offshore wind turbines. IEEE Trans. Sustain. Energy 2025, 16, 1329–1339.
  68. Rigatos, G.; Zervos, N.; Siano, P.; Abbaszadeh, M.; Wira, P.; Onose, B. Nonlinear optimal control for DC industrial microgrids. Cyber-Phys. Syst. 2019, 5, 231–253.
  69. Mehdi, M.; Jamali, S.Z.; Khan, M.O.; Baloch, S.; Kim, C.H. Robust control of a DC microgrid under parametric uncertainty and disturbances. Electr. Power Syst. Res. 2020, 179, 106074.
  70. Ruchi, S.; Avirup, M.; Shyam, K. Robust control of an islanded DC microgrid using H-infinity loop-shaping design considering parametric uncertainties. In Proceedings of the TENCON 2023—IEEE Region 10 Conference, Chiang Mai, Thailand, 31 October–3 November 2023; pp. 1082–1087.
  71. Al Sumarmad, K.A.; Sulaiman, N.; Wahab, N.I.A.; Hizam, H. Energy management and voltage control in microgrids using artificial neural networks, PID, and fuzzy logic controllers. Energies 2022, 15, 303.
  72. Chandrasekaran, S.; Durairaj, S.; Padmavathi, S. A performance evaluation of a fuzzy logic controller-based photovoltaic-fed multi-level inverter for a three-phase induction motor. J. Frankl. Inst. 2021, 358, 7394–7412.
  73. Dumitrescu, C.; Ciotirnae, P.; Vizitiu, C. Fuzzy logic for intelligent control system using soft computing applications. Sensors 2021, 21, 2617.
  74. Belman-Flores, J.M.; Rodríguez-Valderrama, D.A.; Ledesma, S.; García-Pabón, J.J.; Hernández, D.; Pardo-Cely, D.M. A review on applications of fuzzy logic control for refrigeration systems. Appl. Sci. 2022, 12, 1302.
  75. Bhosale, R.; Agarwal, V. Fuzzy logic control of the ultracapacitor interface for enhanced transient response and voltage stability of a DC microgrid. IEEE Trans. Ind. Appl. 2019, 55, 712–720.
  76. Farsizadeh, H.; Gheisarnejad, M.; Mosayebi, M.; Rafiei, M.; Khooban, M.H. An intelligent and fast controller for DC/DC converter feeding CPL in a DC microgrid. IEEE Trans. Circuits Syst. II Express Briefs 2020, 67, 1104–1108.
  77. Khooban, M.H.; Gheisarnejad, M.; Farsizadeh, H.; Masoudian, A.; Boudjadar, J. A new intelligent hybrid control approach for DC-DC converters in zero-emission ferry ships. IEEE Trans. Power Electron. 2020, 35, 5832–5841.
  78. Zhang, Y.; Wei, S.; Wang, J.; Zhang, L. Bus voltage stabilization control of photovoltaic DC microgrid based on fuzzy-PI dual-mode controller. J. Electr. Comput. Eng. 2020, 2020, 2683052.
  79. Rodriguez, M.; Arcos-Aviles, D.; Martinez, W. Fuzzy logic-based energy management for isolated microgrid using meta-heuristic optimization algorithms. Appl. Energy 2023, 335, 120771.
  80. Shahid, M.A.; Abbas, G.; Hussain, M.R.; Asad, M.U.; Farooq, U.; Gu, J.; Balas, V.E.; Uzair, M.; Awan, A.B.; Yazdan, T. Artificial intelligence-based controller for DC-DC flyback converter. Appl. Sci. 2019, 9, 5108.
  81. Al-Hitmi, M.A.; Islam, S.; Muyeen, S.M.; Iqbal, A.; Thomas, K.; Abdullah, A.K.M.; Ben-brahim, L. An ANN-based distributed secondary controller used to ensure accurate current sharing in DC microgrid. AEU Int. J. Electron. Commun. 2025, 201, 156002.
  82. Zhao, S.; Blaabjerg, F.; Wang, H. An overview of artificial intelligence applications for power electronics. IEEE Trans. Power Electron. 2021, 36, 4633–4658. [Google Scholar] [CrossRef]
  83. Han, Y.; Liao, Y.; Ma, X.; Guo, X.; Li, C.; Liu, X. Analysis and prediction of the penetration of renewable energy in power systems using artificial neural network. Renew. Energy 2023, 215, 118914. [Google Scholar] [CrossRef]
  84. Lopez-Garcia, T.B.; Coronado-Mendoza, A.; Domínguez-Navarro, J.A. Artificial neural networks in microgrids: A review. Eng. Appl. Artif. Intell. 2020, 95, 103894. [Google Scholar] [CrossRef]
  85. Saadatmand, S.; Shamsi, P.; Ferdowsi, M. The voltage regulation of a buck converter using a neural network predictive controller. In Proceedings of the 2020 IEEE Texas Power and Energy Conference (TPEC), College Station, TX, USA, 6–7 February 2020; pp. 1–6. [Google Scholar] [CrossRef]
  86. Khan, H.S.; Mohamed, I.S.; Kauhaniemi, K.; Liu, L. Artificial neural network-based voltage control of DC/DC converter for DC microgrid applications. In Proceedings of the 2021 6th IEEE Workshop on the Electronic Grid (eGRID), New Orleans, LA, USA, 8–10 November 2021; pp. 1–6. [Google Scholar] [CrossRef]
  87. Singh, P.; Anwer, N.; Lather, J.S. Energy management and control for direct current microgrid with composite energy storage system using combined cuckoo search algorithm and neural network. J. Energy Storage 2022, 55, 105689. [Google Scholar] [CrossRef]
  88. Singh, P.; Lather, J.S. Dynamic power management and control for low voltage DC microgrid with hybrid energy storage system using hybrid bat search algorithm and artificial neural network. J. Energy Storage 2020, 32, 101974. [Google Scholar] [CrossRef]
  89. Akpolat, A.N.; Dursun, E.; Kuzucuoglu, A.E. Deep learning-aided sensorless control approach for PV converters in DC nanogrids. IEEE Access 2021, 9, 106641–106654. [Google Scholar] [CrossRef]
  90. Dong, W.; Li, S.; Fu, X.; Li, Z.; Fairbank, M.; Gao, Y. Control of a buck DC/DC converter using approximate dynamic programming and artificial neural networks. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 68, 1760–1768. [Google Scholar] [CrossRef]
  91. Rouzbehi, K.; Miranian, A.; Escaño, J.M.; Rakhshani, E.; Shariati, N.; Pouresmaeil, E. A data-driven based voltage control strategy for DC-DC converters: Application to DC microgrid. Electronics 2019, 8, 493. [Google Scholar] [CrossRef]
  92. She, B.; Li, F.; Cui, H.; Zhang, J.; Bo, R. Fusion of model-free reinforcement learning with microgrid control: Review and vision. IEEE Trans. Smart Grid 2023, 14, 3232–3245. [Google Scholar] [CrossRef]
  93. Hartmann, B.; Nelle, O. On the smoothness in local model networks. In Proceedings of the American Control Conference, St. Louis, MO, USA, 10–12 June 2009; pp. 3573–3578. [Google Scholar] [CrossRef]
  94. Novak, J.; Chalupa, P.; Bobal, V. Local model networks for modelling and predictive control of nonlinear systems. In Proceedings of the 23rd European Conference on Modelling and Simulation (ECMS 2009), Madrid, Spain, 9–12 June 2009; European Council for Modelling and Simulation: Kingston upon Thames, UK, 2009; pp. 557–562. [Google Scholar] [CrossRef]
  95. Hou, Z.; Xiong, S. On model-free adaptive control and its stability analysis. IEEE Trans. Autom. Control 2019, 64, 4555–4569. [Google Scholar] [CrossRef]
  96. Wang, Z.; Wang, D.; Peng, Z.; Liu, L. Model-free adaptive control for ultracapacitor based three-phase interleaved bidirectional DC–DC converter. IET Power Electron. 2023, 16, 2696–2707. [Google Scholar] [CrossRef]
  97. Yu, W.; Wang, R.; Bu, X.; Hou, Z. Model free adaptive control for a class of nonlinear systems with fading measurements. J. Frankl. Inst. 2020, 357, 7743–7760. [Google Scholar] [CrossRef]
  98. Saeid, A.H.; Babak, N.-M. Active stabilization of a microgrid using model free adaptive control. In Proceedings of the 2017 IEEE Industry Applications Society Annual Meeting, Cincinnati, OH, USA, 1–5 October 2017; pp. 1–8. [Google Scholar] [CrossRef]
  99. Mosayebi, M.; Sadeghzadeh, S.M.; Gheisarnejad, M.; Khooban, M.H. Intelligent and fast model-free sliding mode control for shipboard DC microgrids. IEEE Trans. Transp. Electrif. 2021, 7, 1662–1671. [Google Scholar] [CrossRef]
  100. Yue, J.; Liu, Z.; Su, H. Model-free composite disturbance rejection control for dynamic wireless charging system based on online parameter identification. IEEE Trans. Ind. Electron. 2024, 71, 7777–7785. [Google Scholar] [CrossRef]
  101. Yue, J.; Liu, Z.; Su, H. Data-driven adaptive extended state observer-based model-free disturbance rejection control for DC–DC converters. IEEE Trans. Ind. Electron. 2024, 71, 7745–7755. [Google Scholar] [CrossRef]
  102. Cai, Q.; Luo, X.Q.; Wang, P.; Gao, C.; Zhao, P. Hybrid model-driven and data-driven control method based on machine learning algorithm in energy hub and application. Appl. Energy 2022, 305, 117913. [Google Scholar] [CrossRef]
  103. Dan, Y.; Zhong, H.; Wang, C.; Wang, J.; Fei, Y.; Yu, L. A graph deep reinforcement learning-based fault restoration method for active distribution networks. Energies 2025, 18, 4420. [Google Scholar] [CrossRef]
  104. Wolgast, T.; Nieße, A. Approximating energy market clearing and bidding with model-based reinforcement learning. IEEE Access 2024, 12, 145106–145117. [Google Scholar] [CrossRef]
  105. Bachiri, K.; Yahyaouy, A.; Gualous, H.; Malek, M.; Bennani, Y.; Makany, P.; Rogovschi, N. Multi-agent DDPG based electric vehicles charging station recommendation. Energies 2023, 16, 6067. [Google Scholar] [CrossRef]
  106. Zandi, O.; Poshtan, J. Voltage control of a quasi Z-source converter under constant power load condition using reinforcement learning. Control Eng. Pract. 2023, 135, 105499. [Google Scholar] [CrossRef]
  107. Shakya, A.K.; Pillai, G.; Chakrabarty, S. Reinforcement learning algorithms: A brief survey. Expert Syst. Appl. 2023, 231, 120495. [Google Scholar] [CrossRef]
  108. Hu, K.; Zhang, X.; Ma, H. A novel proportion-integral-differential controller based on deep reinforcement learning for DC/DC power buck converters. In Proceedings of the 2021 IEEE 1st International Power Electronics and Application Symposium (PEAS), Shanghai, China, 13–15 November 2021. [Google Scholar] [CrossRef]
  109. Cui, C.; Yan, N.; Huangfu, B.; Yang, T.; Zhang, C. Voltage regulation of DC-DC buck converters feeding CPLs via deep reinforcement learning. IEEE Trans. Circuits Syst. II Express Briefs 2022, 69, 1777–1781. [Google Scholar] [CrossRef]
  110. Shi, X.; Chen, N.; Wei, T.; Wu, J.; Xiao, P. A reinforcement learning-based online-training AI controller for DC-DC switching converters. In Proceedings of the 2021 6th International Conference on Integrated Circuits and Microsystems (ICICM 2021), Nanjing, China, 22–24 October 2021; pp. 435–438. [Google Scholar] [CrossRef]
  111. Marahatta, A.; Rajbhandari, Y.; Shrestha, A.; Phuyal, S.; Thapa, A.; Korba, P. Model predictive control of DC/DC boost converter with reinforcement learning. Heliyon 2022, 8, e11416. [Google Scholar] [CrossRef]
  112. Ye, J.; Guo, H.; Zhao, D.; Wang, B.; Zhang, X. TD3 algorithm based reinforcement learning control for multiple-input multiple-output DC–DC converters. IEEE Trans. Power Electron. 2024, 39, 12729–12742. [Google Scholar] [CrossRef]
  113. Rajamallaiah, A.; Karri, S.P.K.; Shankar, Y.R. Deep reinforcement learning based control strategy for voltage regulation of DC-DC buck converter feeding CPLs in DC microgrid. IEEE Access 2024, 12, 17419–17430. [Google Scholar] [CrossRef]
  114. Muktiadji, R.F.; Ramli, M.A.M.; Milyani, A.H. Twin-delayed deep deterministic policy gradient algorithm to control a boost converter in a DC microgrid. Electronics 2024, 13, 433. [Google Scholar] [CrossRef]
  115. Gheisarnejad, M.; Farsizadeh, H.; Khooban, M.H. A novel nonlinear deep reinforcement learning controller for DC-DC power buck converters. IEEE Trans. Ind. Electron. 2021, 68, 6849–6858. [Google Scholar] [CrossRef]
  116. Gheisarnejad, M.; Khooban, M.H. IoT-based DC/DC deep learning power converter control: Real-time implementation. IEEE Trans. Power Electron. 2020, 35, 13621–13630. [Google Scholar] [CrossRef]
  117. Qie, T.; Zhang, X.; Xiang, C.; Yu, Y.; Iu, H.H.C.; Fernando, T. A new robust integral reinforcement learning based control algorithm for interleaved DC/DC boost converter. IEEE Trans. Ind. Electron. 2023, 70, 3729–3739. [Google Scholar] [CrossRef]
  118. Tomar, V.; Bansal, M.; Singh, P. Metaheuristic algorithms for optimization: A brief review. Eng. Proc. 2023, 59, 238. [Google Scholar] [CrossRef]
  119. Kumar, S.; Krishnasamy, V.; Neeli, S.; Kaur, R. Artificial intelligence power controller of fuel cell based DC nanogrid. Renew. Energy Focus 2020, 34, 120–128. [Google Scholar] [CrossRef]
  120. Vadi, S.; Gurbuz, F.B.; Bayindir, R.; Sagiroglu, S. Optimization of PI based buck-boost converter by particle swarm optimization algorithm. In Proceedings of the 9th International Conference on Smart Grid (icSmartGrid), Setubal, Portugal, 29 June–1 July 2021; pp. 295–301. [Google Scholar] [CrossRef]
  121. Hema, S.; Sukhi, Y. Deep learning-based FOPID controller for cascaded DC-DC converters. Comput. Syst. Sci. Eng. 2023, 46, 1503–1519. [Google Scholar] [CrossRef]
  122. AL-Wesabi, I.; Fang, Z.; Farh, H.M.H.; Dagal, I.; Al-Shamma’a, A.A.; Al-Shaalan, A.M.; Yang, K. Hybrid SSA-PSO based intelligent direct sliding-mode control for extracting maximum photovoltaic output power and regulating the DC-bus voltage. Int. J. Hydrogen Energy 2024, 51, 348–370. [Google Scholar] [CrossRef]
  123. Nanyan, N.F.; Ahmad, M.A.; Hekimoğlu, B. Optimal PID controller for the DC-DC buck converter using the improved sine cosine algorithm. Results Control Optim. 2024, 14, 100352. [Google Scholar] [CrossRef]
  124. Farea, A.; Yli-Harja, O.; Emmert-Streib, F. Understanding physics-informed neural networks: Techniques, applications, trends, and challenges. AI 2024, 5, 1534–1557. [Google Scholar] [CrossRef]
  125. Stiasny, J.; Chatzivasileiadis, S. Physics-informed neural networks for time-domain simulations: Accuracy, computational cost, and flexibility. Electr. Power Syst. Res. 2023, 224, 109748. [Google Scholar] [CrossRef]
  126. Huang, B.; Wang, J. Applications of physics-informed neural networks in power systems—A review. IEEE Trans. Power Syst. 2023, 38, 572–588. [Google Scholar] [CrossRef]
  127. Wang, F.; Zhai, Z.; Zhao, Z.; Di, Y.; Chen, X. Physics-informed neural network for lithium-ion battery degradation stable modeling and prognosis. Nat. Commun. 2024, 15, 48779. [Google Scholar] [CrossRef]
  128. Hui, P.; Cui, C.; Lin, P.; Ghias, A.M.Y.M.; Niu, X.; Zhang, C. On physics-informed neural network control for power electronics. arXiv 2024, arXiv:2406.15787. [Google Scholar] [CrossRef]
  129. Lee, J.S.; Lee, G.Y.; Park, S.S.; Kim, R.Y. Impedance-based modeling and common bus stability enhancement control algorithm in DC microgrid. IEEE Access 2020, 8, 211224–211234. [Google Scholar] [CrossRef]
  130. Hao, W.; Han, H.; Liu, Z.; Sun, Y.; Su, M.; Hou, X.; Yang, P. A stabilization method of LC input filter in DC microgrids feeding constant power loads. In Proceedings of the 2017 IEEE Energy Conversion Congress and Exposition (ECCE), Cincinnati, OH, USA, 1–5 October 2017. [Google Scholar]
  131. Pang, S.; Nahid-Mobarakeh, B.; Pierfederici, S.; Huangfu, Y.; Luo, G.; Gao, F. Research on LC filter cascaded with buck converter supplying constant power load based on IDA-passivity-based control. In Proceedings of the IECON 2018—44th Annual Conference of the IEEE Industrial Electronics Society, Washington, DC, USA, 21–23 October 2018; pp. 4992–4997. [Google Scholar] [CrossRef]
  132. Firdous, N.; Din, N.M.U.; Assad, A. An imbalanced classification approach for establishment of cause-effect relationship between heart-failure and pulmonary embolism using deep reinforcement learning. Eng. Appl. Artif. Intell. 2023, 126, 107004. [Google Scholar] [CrossRef]
  133. Rehman, A.U.; Ullah, Z.; Qazi, H.S.; Hasanien, H.M.; Khalid, H.M. Reinforcement learning-driven proximal policy optimization-based voltage control for PV and WT integrated power system. Renew. Energy 2024, 227, 120590. [Google Scholar] [CrossRef]
  134. Del Rio, A.; Jimenez, D.; Serrano, J. Comparative analysis of A3C and PPO algorithms in reinforcement learning: A survey on general environments. IEEE Access 2024, 12, 146795–146806. [Google Scholar] [CrossRef]
  135. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar] [CrossRef]
  136. Zhang, F.; Li, J.; Li, Z. A TD3-based multi-agent deep reinforcement learning method in mixed cooperation-competition environment. Neurocomputing 2020, 411, 206–215. [Google Scholar] [CrossRef]
  137. Fujimoto, S.; van Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. arXiv 2018, arXiv:1802.09477. [Google Scholar] [CrossRef]
  138. Wang, S.; Duan, J.; Shi, D.; Xu, C.; Li, H.; Diao, R.; Wang, Z. A data-driven multi-agent autonomous voltage control framework using deep reinforcement learning. IEEE Trans. Power Syst. 2020, 35, 4644–4654. [Google Scholar] [CrossRef]
  139. Liessner, R.; Schmitt, J.; Dietermann, A.; Bäker, B. Hyperparameter optimization for deep reinforcement learning in vehicle energy management. In Proceedings of the ICAART 2019-Proceedings of the 11th International Conference on Agents and Artificial Intelligence, Prague, Czech Republic, 19–21 February 2019; SciTePress: Setúbal, Portugal, 2019; pp. 134–144. [Google Scholar] [CrossRef]
  140. Victoria, A.H.; Maragatham, G. Automatic tuning of hyperparameters using Bayesian optimization. Evolving Syst. 2021, 12, 217–223. [Google Scholar] [CrossRef]
  141. Wu, J.; Chen, X.Y.; Zhang, H.; Xiong, L.D.; Lei, H.; Deng, S.H. Hyperparameter optimization for machine learning models based on Bayesian optimization. J. Electron. Sci. Technol. 2019, 17, 26–40. [Google Scholar]
  142. Injadat, M.; Salo, F.; Nassif, A.B.; Essex, A.; Shami, A. Bayesian optimization with machine learning algorithms towards anomaly detection. In Proceedings of the IEEE Global Communications Conference, Abu Dhabi, United Arab Emirates, 9–13 December 2018; p. 90. [Google Scholar] [CrossRef]
  143. Eimer, T.; Lindauer, M.; Raileanu, R. Hyperparameters in Reinforcement Learning and How to Tune Them. arXiv 2023, arXiv:2306.01324. [Google Scholar] [CrossRef]
Figure 1. Typical configuration of a DC microgrid.
Figure 2. Negative incremental impedance characteristic of a CPL.
Figure 3. Architecture of the DRL controller.
Figure 4. Typical MDP trajectory.
Figure 5. Schematic of the buck converter with LC filter.
Figure 6. Block diagram of the BO-DRL framework.
Figure 7. Bayesian optimization results: (a) evaluation progress for PPO; (b) evaluation progress for TD3.
Figure 8. Agents' learning curves: (a) episodic; (b) average.
Figure 9. Simulation results with varying reference voltage: (a) reference voltage waveform; (b) output voltage waveforms.
Figure 10. Simulation results with varying input voltage: (a) filter voltage waveform; (b) output voltage waveforms.
Figure 11. Simulation results with varying load: (a) load power waveform; (b) output voltage waveforms.
Table 1. Classification of model-free DRL agents.

Learning Policy | Key Features | Agent | Type | Action
--- | --- | --- | --- | ---
On-policy | Learning follows the current policy, so exploration is limited to it. Suitable when the environment is relatively stable and the agent can explore safely. | State-action-reward-state-action (SARSA) | value-based | discrete
 | | Policy gradient (PG) | policy-based | discrete or continuous
 | | Actor-critic (AC) | actor-critic | discrete or continuous
 | | Trust region policy optimization (TRPO) | actor-critic | discrete or continuous
 | | Proximal policy optimization (PPO) | actor-critic | discrete or continuous
Off-policy | Learns from a behaviour policy different from the one being improved, allowing broader exploration. Suitable for complex environments. | Q-learning | value-based | discrete
 | | Deep Q-network (DQN) | value-based | discrete
 | | Double DQN (DDQN) | value-based | discrete
 | | Deep deterministic policy gradient (DDPG) | actor-critic | continuous
 | | Twin-delayed deep deterministic policy gradient (TD3) | actor-critic | continuous
 | | Soft actor-critic (SAC) | actor-critic | continuous
 | | Asynchronous advantage actor-critic (A3C) | actor-critic | discrete or continuous
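The on-policy/off-policy split in Table 1 reduces to which next action the temporal-difference update bootstraps on. A minimal tabular sketch makes the contrast concrete (the toy 2-state, 2-action problem, rewards, and the epsilon value are illustrative assumptions, not from the paper):

```python
import random

ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1  # illustrative learning rate, discount, exploration

def eps_greedy(Q, s):
    """Behaviour policy: random action with probability EPS, else greedy."""
    if random.random() < EPS:
        return random.choice([0, 1])
    return max((0, 1), key=lambda a: Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next):
    # On-policy (SARSA): bootstrap on the action the current policy
    # actually takes in s_next.
    a_next = eps_greedy(Q, s_next)
    Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s_next, a_next)] - Q[(s, a)])
    return a_next

def q_learning_update(Q, s, a, r, s_next):
    # Off-policy (Q-learning): bootstrap on the greedy action, regardless
    # of what the behaviour policy will do next.
    best = max(Q[(s_next, 0)], Q[(s_next, 1)])
    Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])

Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
print(round(Q[(0, 1)], 3))  # 0.1 after one update of a zero-initialised table
```

The deep agents in the table (DQN, DDPG, TD3, PPO) replace the Q-table with neural approximators but inherit the same bootstrapping distinction.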
Table 2. Comparative summary of voltage control techniques.

Category | Strengths | Limitations
--- | --- | ---
Model-based | High accuracy and fast response when the system model is known; stability can be guaranteed analytically; facilitates optimal and predictive control. | Requires the system model and parameters; limited adaptability to uncertainties; requires re-modeling when the system changes.
Model-free | Does not require an explicit system model; high adaptability and flexibility; robust to parameter variations and uncertainties. | Lacks analytical stability guarantees; high computational cost during training; requires high-quality data.
Table 3. Agents' hyperparameter bounds.

Hyperparameter | PPO | TD3
--- | --- | ---
Minibatch size (m) | [50, 400] | [50, 400]
Discount factor (γ) | [0.9, 1] | [0.9, 1]
Actor learning rate (α_a) | [1 × 10^-6, 1 × 10^-2] | [1 × 10^-6, 1 × 10^-2]
Critic learning rate (α_c) | [1 × 10^-6, 1 × 10^-2] | [1 × 10^-6, 1 × 10^-2]
Number of epochs (k) | [1, 10] | [1, 10]
Target smoothing standard deviation (σ_τ) | — | [0.1, 0.5]
Entropy loss weight (w) | [0.01, 0.1] | —
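For orientation, the search space of Table 3 can be encoded directly. The sketch below draws one TD3 candidate with a stand-in random sampler; log-uniform sampling of the learning rates is our assumption, and in the paper's BO loop the sampler would be replaced by an acquisition-guided choice over the same bounds:

```python
import math
import random

def log_uniform(lo, hi):
    """Sample log-uniformly between lo and hi (common for learning rates)."""
    return math.exp(random.uniform(math.log(lo), math.log(hi)))

def sample_td3_candidate():
    # Bounds taken from Table 3 (TD3 column).
    return {
        "minibatch_size": random.randint(50, 400),
        "discount_factor": random.uniform(0.9, 1.0),
        "actor_lr": log_uniform(1e-6, 1e-2),
        "critic_lr": log_uniform(1e-6, 1e-2),
        "epochs": random.randint(1, 10),
        "target_smooth_std": random.uniform(0.1, 0.5),
    }

cand = sample_td3_candidate()
print(sorted(cand))
```

A PPO candidate would swap the target-smoothing bound for the entropy loss weight in [0.01, 0.1].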
Table 4. Transient performance metrics.

Metric | PI | FLC | SMC | BO-PPO | BO-TD3
--- | --- | --- | --- | --- | ---
Rise time (s) | 0.0016 | 0.0021 | 0.0090 | 0.0012 | 0.0016
Settling time (s) | 0.0070 | 0.1398 | 0.0161 | 0.0033 | 0.0032
Overshoot (%) | 11.27 | 5.41 | 0.03 | 7.42 | 5.95
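The quantities in Table 4 can be recovered from a sampled step response as follows. The underdamped second-order waveform used here is synthetic test data (ζ = 0.5, ωn = 100 rad/s are our assumptions), not the converter output of the paper:

```python
import math

def transient_metrics(t, y, y_final, band=0.02):
    """10-90% rise time, +/-band settling time, and percent overshoot."""
    t10 = next(ti for ti, yi in zip(t, y) if yi >= 0.1 * y_final)
    t90 = next(ti for ti, yi in zip(t, y) if yi >= 0.9 * y_final)
    # Settling time: last sample that still lies outside the tolerance band.
    t_settle = max((ti for ti, yi in zip(t, y)
                    if abs(yi - y_final) > band * y_final), default=t[0])
    overshoot = max(0.0, (max(y) - y_final) / y_final * 100.0)
    return t90 - t10, t_settle, overshoot

# Synthetic unit-step response of a second-order system.
zeta, wn = 0.5, 100.0
wd, phi = wn * math.sqrt(1 - zeta**2), math.acos(zeta)
t = [i * 1e-4 for i in range(2000)]
y = [1 - math.exp(-zeta * wn * ti) / math.sqrt(1 - zeta**2)
     * math.sin(wd * ti + phi) for ti in t]

tr, ts, os_pct = transient_metrics(t, y, y_final=1.0)
```

For ζ = 0.5 the analytical overshoot is exp(-πζ/√(1-ζ²)) ≈ 16.3%, which the sampled estimate should reproduce closely.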
Table 5. Comparison of control methods based on error metrics under varying reference voltage.

Metric | PI | FLC | SMC | BO-PPO | BO-TD3
--- | --- | --- | --- | --- | ---
RMSE | 0.0826 | 0.3462 | 0.1754 | 0.2798 | 0.0780
MAE | 0.0706 | 0.2907 | 0.0889 | 0.2707 | 0.0650
MAPE | 0.1470 | 0.6057 | 0.1853 | 0.5639 | 0.1355
IAE | 0.0095 | 0.0392 | 0.0120 | 0.0365 | 0.0088
Table 6. Comparison of control methods based on error metrics under varying input voltage.

Metric | PI | FLC | SMC | BO-PPO | BO-TD3
--- | --- | --- | --- | --- | ---
RMSE | 0.8363 | 0.9832 | 1.2592 | 0.8676 | 0.7775
MAE | 0.1039 | 0.3408 | 0.1038 | 0.4680 | 0.0750
MAPE | 0.2165 | 0.7100 | 0.2163 | 0.9749 | 0.1562
IAE | 0.3117 | 1.0223 | 0.3115 | 1.4039 | 0.2248
Table 7. Comparison of control methods based on error metrics under varying load.

Metric | PI | FLC | SMC | BO-PPO | BO-TD3
--- | --- | --- | --- | --- | ---
RMSE | 0.8437 | 0.9671 | 1.2679 | 0.7627 | 0.7942
MAE | 0.1013 | 0.2980 | 0.1249 | 0.2356 | 0.0620
MAPE | 0.2111 | 0.6208 | 0.2603 | 0.4909 | 0.1292
IAE | 0.3040 | 0.8939 | 0.3748 | 0.7069 | 0.1860
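The error metrics of Tables 5–7 can be computed from the reference and output voltage traces as below. The two short traces are made-up numbers for illustration, and MAPE is returned in percent (drop the factor of 100 for a fractional MAPE):

```python
import math

def error_metrics(v_ref, v_out, dt):
    """RMSE, MAE, MAPE (%), and IAE between reference and output traces."""
    e = [r - o for r, o in zip(v_ref, v_out)]
    n = len(e)
    rmse = math.sqrt(sum(x * x for x in e) / n)
    mae = sum(abs(x) for x in e) / n
    mape = 100.0 * sum(abs(x) / abs(r) for x, r in zip(e, v_ref)) / n
    iae = sum(abs(x) for x in e) * dt  # rectangle-rule integral of |e(t)|
    return rmse, mae, mape, iae

# Illustrative 48 V reference and a measured output trace (synthetic numbers).
v_ref = [48.0] * 5
v_out = [47.5, 48.2, 48.0, 47.9, 48.1]
rmse, mae, mape, iae = error_metrics(v_ref, v_out, dt=1e-3)
```

IAE depends on the sampling interval dt, so comparisons across methods should use the same simulation step, as the paper's single test bench implies.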
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Muhammad, S.; Obeid, H.; Hammou, A.; Hinaje, M.; Gualous, H. Voltage Control for DC Microgrids: A Review and Comparative Evaluation of Deep Reinforcement Learning. Energies 2025, 18, 5706. https://doi.org/10.3390/en18215706
