Perspective

Reinforcement Learning-Driven Control Strategies for DC Flexible Microgrids: Challenges and Future

School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
*
Author to whom correspondence should be addressed.
Energies 2026, 19(3), 648; https://doi.org/10.3390/en19030648
Submission received: 30 December 2025 / Revised: 22 January 2026 / Accepted: 24 January 2026 / Published: 27 January 2026

Abstract

The increasing penetration of photovoltaic (PV) generation, energy storage systems, and flexible loads within modern buildings demands advanced control strategies capable of harnessing dynamic assets while maintaining grid reliability. This Perspective article presents a comprehensive overview of reinforcement learning-driven (RL-driven) control methods for DC flexible microgrids—focusing in particular on building-integrated systems that shift from AC microgrid architectures to photovoltaic–energy storage–DC–flexibility (PEDF) systems. We examine the structural evolution from traditional AC microgrids through DC microgrids to PEDF architectures, highlight core system components (PV arrays, battery storage, DC bus networks, and flexible demand interfaces), and elucidate their coupling within building clusters and urban energy networks. We then identify key challenges for RL applications in this domain—including high-dimensional state and action spaces, safety-critical constraints, sample efficiency, and real-time deployment in building energy systems—and propose future research directions, such as multi-agent deep RL, transfer learning across building portfolios, and real-time safety assurance frameworks. By synthesizing recent developments and mapping open research avenues, this work aims to guide researchers and practitioners toward robust, scalable control solutions for next-generation DC flexible microgrids.

1. Introduction

The accelerating penetration of PV generation, battery energy storage systems (BESS), and flexible loads within modern buildings is reshaping the operational paradigm of distribution-level energy systems. Traditional alternating-current (AC) microgrids, although effective in coordinating distributed energy resources (DERs), increasingly reveal structural limitations in scenarios with high DC-native renewable penetration [1]. These constraints—such as repetitive AC/DC conversion stages, synchronization requirements among inverter-dominated sources, and conversion-induced efficiency losses—have stimulated a transition toward direct-current (DC) microgrids, which inherently align with the electrical characteristics of PV modules, electrochemical storage, and an expanding portfolio of DC loads [2].

Building on this technological shift, recent developments have introduced the PEDF system as an integrated building- or community-level architecture that unifies PV generation, hierarchical storage, multi-level DC distribution, and flexible demand-side resources within a coordinated cyber–physical framework. Unlike earlier AC or standalone DC microgrids, PEDF systems emphasize intelligent flexibility, fine-grained controllability, and multi-building interoperability, thereby laying the groundwork for DC-based urban energy clusters capable of high renewable self-consumption, resilience enhancement, and coordinated energy sharing.

In parallel with this architectural evolution, RL has emerged as a promising paradigm for managing the increasing operational complexity of PEDF systems. Recent studies across microgrid control, heating, ventilation and air-conditioning (HVAC) optimization, and building energy management demonstrate RL’s capability to learn optimal or near-optimal operational strategies under uncertainty [3,4,5]. Yet, despite notable advances, challenges remain in ensuring safety-critical operation, sample efficiency, interoperability across multi-agent environments, and reliable real-world deployment [6,7].

Given this convergence of DC microgrid development and AI-enabled control, a systematic examination of RL-driven strategies for PEDF systems is both timely and necessary. This Perspective aims to (i) elucidate the technological evolution from AC microgrids to PEDF architectures, (ii) synthesize recent progress in RL-based control for PEDF systems, (iii) identify the central challenges limiting deployment, and (iv) outline future research directions spanning safety-assured RL, hierarchical and multi-agent control structures, and standardization pathways. Through this analysis, we seek to provide researchers and practitioners with a forward-looking understanding of how RL can unlock scalable, intelligent, and resilient control for next-generation DC energy systems.

2. From AC Microgrids to PEDF Systems—Definition and System Architecture

2.1. Limitations of AC Microgrids Under High Renewable Penetration

The evolution of building-integrated energy systems has followed a gradual yet well-defined trajectory toward improved energy efficiency, enhanced operational flexibility, and deeper penetration of renewable energy resources. Early AC microgrids were developed as decentralized electrical networks that interconnected distributed energy resources (DERs) and end-use loads through conventional AC infrastructure. Such systems enabled islanded operation, enhanced local resilience, and benefited from mature protection and control schemes inherited from legacy power systems [8]. Consequently, AC microgrids long served as the dominant paradigm for integrating distributed generation at building and community scales.
With the rapid proliferation of PV generation and BESS, the inherent limitations of AC-centric architectures have become increasingly evident. The integration of intrinsically DC-based sources and loads into AC microgrids necessitates multiple stages of AC/DC and DC/AC power conversion, which unavoidably introduces additional conversion losses and degrades overall system efficiency [9,10]. Moreover, the extensive use of power electronic interfaces intensifies harmonic distortion and imposes strict synchronization and control requirements on inverter-dominated networks [11,12]. As the proportion of inverter-interfaced distributed energy resources continues to increase, the effective system inertia is progressively reduced, thereby heightening vulnerability to voltage and frequency instability under high renewable penetration conditions [13,14].
In response to these challenges, DC microgrids have gained increasing attention due to their intrinsic compatibility with both generation and consumption subsystems. PV arrays and electrochemical energy storage are inherently DC-based, while a substantial proportion of modern building loads, including LED lighting, information and communication technology equipment, data-center loads, and electric vehicle (EV) charging infrastructure, also rely on internal DC interfaces. By eliminating unnecessary conversion stages, DC microgrids reduce energy losses, mitigate harmonic issues, and simplify local control strategies for integrating renewable energy and managing storage. However, most early DC microgrid implementations remain primarily focused on efficient power conversion and source–load balancing at the electrical layer, offering limited support for system-level coordination of demand-side flexibility, multi-time-scale scheduling, and integrated energy management.
Against this technological backdrop, the PEDF building energy system—first proposed in 2020 and subsequently validated through multiple standardization-oriented demonstration projects—represents a further architectural evolution beyond conventional DC microgrids [15]. As also summarized in Table 1, PEDF systems integrate PV generation, multi-scale energy storage, DC distribution networks, and flexible demand-side resources within a unified cyber–physical energy management framework rather than a mere DC power delivery architecture. This integrated architecture enables coordinated sensing, communication, and control across generation, storage, and loads, thereby explicitly elevating demand-side flexibility to a first-class system resource, transforming buildings from passive energy consumers into active prosumers capable of adaptive consumption, local balancing, and bidirectional interaction with the wider power system through intelligent control strategies.

2.2. PEDF Architecture

The PEDF architecture integrates PV generation, hierarchical energy storage, DC distribution, and demand-side flexibility into a unified, flexibility-oriented energy supply framework that emphasizes low carbon intensity and high operational stability. Unlike conventional DC or hybrid microgrids that are primarily defined by electrical topology, PEDF systems are functionally characterized by the explicit modeling, aggregation, and coordinated control of demand-side flexibility as a first-class system resource. As illustrated in Figure 1, the overall system is organized around a common DC backbone and can be conceptually decomposed into several tightly coupled functional subsystems that collectively enable efficient renewable energy utilization and flexible power management at the building or community level. From a control perspective, these subsystems are coordinated through a hierarchical framework with explicit time-scale separation, ensuring that learning-based decisions do not interfere with fast electrical dynamics.
Figure 1 presents a representative schematic of a PEDF microgrid structured around a shared DC bus. In this DC-centric configuration, PV generation, battery energy storage, flexible DC loads, and the utility grid are interconnected through multiple power-electronic interfaces, forming a highly coupled energy network with bidirectional power flows. This configuration enables the PEDF system to treat generation, storage, and flexible demand within a unified supervisory control framework, rather than as loosely coordinated subsystems. Specifically, fast converter-level dynamics are governed by local, model-based controllers, while supervisory coordination is performed at slower time scales through the energy management system (EMS). PV arrays are connected to the DC bus via DC/DC converters equipped with maximum power point tracking (MPPT), ensuring efficient solar energy harvesting under varying irradiance and temperature conditions. Battery energy storage systems are interfaced through bidirectional DC/DC converters, enabling energy balancing, DC voltage regulation, and flexibility provision [16]. Grid interaction is achieved via an AC/DC converter that enables bidirectional power exchange in both grid-connected and islanded operating modes.
At the generation and storage levels, power conditioning is primarily handled by DC/DC conversion stages. The PV subsystem interfaces with the DC bus via a unidirectional DC/DC converter with MPPT functionality, enabling continuous, real-time optimization of renewable energy extraction. The energy storage subsystem consists mainly of electrochemical batteries (most commonly lithium-ion or lead-acid technologies) connected to the DC network via bidirectional DC/DC converters, typically based on Buck/Boost topologies. These converters support multiple operational functions, including charge-discharge regulation, transient power compensation, peak shaving, and DC voltage stabilization, thereby mitigating the inherent variability of PV generation. Within PEDF systems, these functions are further coordinated at the system level to support flexibility-aware scheduling and long-term energy management objectives. Importantly, such coordination is realized through reference-setting and envelope constraints issued by the EMS, rather than direct manipulation of converter switching actions.
The principal power conversion interfaces within the PEDF architecture are further detailed in Figure 2a,b. As shown in Figure 2a, the bidirectional DC/DC converter governs charging and discharging processes of the storage subsystem while stabilizing the DC bus voltage through adaptive duty-cycle modulation. This bidirectional capability enables seamless energy exchange between the storage module and the DC network across different operating conditions. Figure 2b illustrates the bidirectional DC/AC inverter that forms the external interconnection point between the DC network and the utility grid. This interface ensures voltage and frequency compatibility, supports smooth transitions between grid-connected and islanded modes, and enables ancillary service provision when permitted by system and regulatory constraints. Together, these converters constitute the core power-conditioning layer of the PEDF system, facilitating coordinated interaction among PV generation, energy storage, and flexible demand-side resources. It should be noted that the converter topologies illustrated in Figure 2a,b represent typical candidate implementations commonly adopted in PEDF systems, rather than fixed or mandatory designs. These power-electronic interfaces form a dedicated power-conditioning layer, upon which higher-level flexibility coordination and intelligent energy management functions are built. This layered separation ensures that millisecond-level voltage and current regulation remains fully decoupled from learning-based supervisory control decisions.
The DC bus and flexible load subsystem constitute the physical and functional backbone of the PEDF architecture. By interconnecting PV generation, storage units, and controllable loads (such as HVAC systems, lighting, consumer electronics, and EV chargers [17,18,19]), the DC bus enables efficient power routing and coordinated utilization of demand-side flexibility. It should be emphasized that HVAC systems are classified within the flexible load subsystem based on their role as controllable flexibility resources at the EMS level, rather than their underlying AC or DC electrical interfaces. In PEDF systems, flexible loads are not treated as passive consumers but are explicitly incorporated into the energy management decision space as dispatchable and schedulable resources. To accommodate heterogeneous load requirements, the DC distribution platform is typically implemented using standardized voltage levels, including 48 V, 220 V, 375 V, and 750 V, supporting scalable deployment at both building and cluster scales.
A supervisory EMS coordinates the operation of all subsystems by scheduling power flows and control actions to maximize on-site renewable energy consumption while maintaining supply-demand balance under uncertainty. Serving as the functional core of PEDF systems, the EMS enables system-level coordination across generation, storage, and flexible demand, thereby distinguishing PEDF architectures from conventional DC microgrids that rely primarily on local or rule-based control. In practical PEDF deployments, EMS decisions are typically executed at minute-level or multi-minute time resolutions, making them well-suited for optimization- or learning-based scheduling while preserving system stability. Acting as the primary interface between physical components and higher-level optimization or learning-based control strategies, the EMS enables the PEDF architecture to support advanced data-driven energy management paradigms [20]. For clarity, the main subsystems, key components, and their functional roles within the PEDF architecture are summarized in Table 2. Taken as an integrated whole, the PEDF architecture represents a functional evolution beyond conventional AC and DC microgrids, characterized by bidirectional energy flow, explicit flexibility integration, multi-level DC coordination, and compatibility with intelligent control frameworks, thereby facilitating deeper integration with building clusters and urban energy networks. The explicit separation between fast electrical control layers and slow supervisory decision layers further ensures that advanced learning-based methods can be incorporated without compromising converter-level stability or protection requirements.

2.3. Coupling with Building Energy Systems

In the broader context of urban decarbonisation and the ongoing transition toward digitalised and electrified energy systems, PEDF architectures are increasingly recognised as a key enabling layer for next-generation DC-based buildings. Rather than operating as isolated building-level solutions, PEDF systems provide a scalable, interoperable framework that supports the seamless integration of distributed energy resources, flexible loads, and intelligent control across individual buildings and building clusters.
Figure 3 illustrates a representative application of a PEDF microgrid within a building energy system, organised around a shared DC bus. In this configuration, photovoltaic generation, battery energy storage, grid interfaces, and a wide range of DC loads—including HVAC systems, lighting, office equipment, and EV chargers—are directly interconnected through power electronic converters. By adopting a DC-centric architecture, the PEDF system reduces redundant AC/DC conversion stages, improves overall energy efficiency, and enables direct coupling between renewable generation and storage and end-use devices.
When extended to district- or campus-scale deployments, multiple PEDF-enabled buildings can be interconnected via DC feeders to form multi-building microgrids or local energy-sharing communities. Such configurations enable coordinated operation of distributed PV systems, storage units, and flexible demands, yielding several functional advantages:
  • DC intra- and inter-building distribution, which minimizes conversion losses, simplifies system architecture, and enhances compatibility with native DC sources and loads.
  • Integrated PV-storage coordination, allowing higher levels of renewable self-consumption, improved operational autonomy, and enhanced resilience through collective energy balancing.
  • Hierarchical and flexible control, in which a supervisory EMS orchestrates energy flows across buildings, supports peak shaving and valley filling, and enables participation in demand response and ancillary service markets.
From a system-level perspective, the PEDF paradigm represents an evolution beyond conventional AC microgrids or standalone DC systems. By embedding controllable flexibility, unified DC distribution, and data-driven coordination into the building energy layer, PEDF architectures form a building-centric intelligent energy ecosystem. This ecosystem is inherently compatible with advanced optimisation and learning-based control strategies, such as RL, thereby positioning PEDF systems as a critical bridge between building energy automation, microgrid operation, and future autonomous urban power systems.

3. RL for Energy Management in PEDF Systems

3.1. RL: Definition and Basic Framework

RL is a class of machine learning methodologies tailored for sequential decision-making in stochastic and dynamically evolving environments [21,22,23,24]. Unlike supervised learning, RL does not rely on labelled input-output datasets; instead, an autonomous agent learns optimal control strategies through direct interaction with the environment by continuously observing system states, executing actions, and receiving scalar reward signals that encode long-term performance objectives [25,26]. This trial-and-error learning paradigm renders RL particularly suitable for complex cyber-physical energy systems, where uncertainty, delayed responses, and strong coupling among decision variables are intrinsic [27,28,29].
Formally, an RL problem is commonly modelled as a Markov decision process (MDP), defined by a state space $\mathcal{S}$, an action space $\mathcal{A}$, a state transition kernel $P(s_{t+1} \mid s_t, a_t)$, and a scalar reward function $\rho(s_t, a_t)$. In PEDF applications, this formulation provides a structured abstraction for explicitly mapping system-level flexibility objectives and operational constraints into the reward design, forming the theoretical basis for learning-based control in cyber-physical energy systems.
As illustrated in Figure 4, the agent-environment interaction follows a closed-loop decision framework. At each discrete time step $t$, the RL agent observes the system state $s_t \in \mathcal{S}$, selects a control action $a_t \in \mathcal{A}$ according to a parameterized policy $\pi_\theta(a_t \mid s_t)$, and receives a scalar reward:
$$r_t = \rho(s_t, a_t), \quad (1)$$
which evaluates the immediate operational consequence of the selected action. In practical PEDF systems, the reward function is designed to operationalize physically meaningful performance metrics rather than serving as an abstract numerical signal. The environment then transitions to a new state s t + 1 in accordance with the underlying system dynamics, and the interaction process iterates over time.
The resulting agent-environment interaction generates a state-action-reward trajectory $(s_0, a_0, r_0, s_1, a_1, r_1, \ldots)$, from which the learning objective is to identify an optimal policy that maximizes the expected cumulative discounted return:
$$J(\pi) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t r_t\right], \quad (2)$$
where $\gamma \in (0, 1]$ is the discount factor that balances short-term performance against long-term operational objectives.
From an algorithmic perspective, the closed-loop structure depicted in Figure 4 highlights the core computational components of RL-based control: state perception, policy evaluation, action execution, and reward feedback. Value-based methods focus on estimating the state-action value function
$$Q^{\pi}(s,a) = \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty} \gamma^k r_{t+k} \,\middle|\, s_t = s,\ a_t = a\right], \quad (3)$$
whereas policy-based approaches directly optimize the policy parameters $\theta$ by maximizing the objective in (2). Actor–critic algorithms integrate both mechanisms by simultaneously learning a policy (actor) and a value estimator (critic), thereby improving convergence stability and scalability in high-dimensional and continuous control problems.
Together, the MDP formulation and the algorithmic workflow summarized in Figure 4 establish a unified theoretical foundation for applying RL to building-scale microgrids and PEDF systems, where control actions influence system evolution over time and long-term performance emerges from sequential decision-making under uncertainty [30,31,32].
Figure 4. Schematic diagram of RL principles [33].
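To make the closed-loop interaction of Figure 4 and the discounted return in (2) concrete, the following minimal Python sketch rolls out a placeholder policy in a toy single-bus storage environment. The environment model, battery parameters, and penalty weights are illustrative assumptions introduced here for exposition, not elements of any cited implementation.

```python
import random

class ToyBatteryEnv:
    """Single-bus toy PEDF environment: state = (SOC, net load in kW)."""
    CAPACITY_KWH = 10.0   # assumed battery capacity
    DT_H = 0.25           # 15-min EMS decision interval

    def reset(self):
        self.soc = 0.5
        self.net_load = random.uniform(-3.0, 3.0)   # load minus PV, kW
        return (self.soc, self.net_load)

    def step(self, p_batt_kw):
        """Apply a storage setpoint (positive = discharge); return (next state, reward)."""
        self.soc = min(1.0, max(0.0, self.soc - p_batt_kw * self.DT_H / self.CAPACITY_KWH))
        grid_kw = self.net_load - p_batt_kw                       # residual grid exchange
        reward = -abs(grid_kw) - 5.0 * (self.soc in (0.0, 1.0))   # grid use + SOC saturation penalty
        self.net_load = random.uniform(-3.0, 3.0)                 # stochastic next-step net demand
        return (self.soc, self.net_load), reward

def discounted_return(rewards, gamma=0.95):
    """Cumulative discounted return, cf. J(pi) in Eq. (2)."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

env = ToyBatteryEnv()
state, rewards = env.reset(), []
for t in range(96):                        # one day of 15-min supervisory decisions
    action = random.uniform(-2.0, 2.0)     # placeholder policy; an RL agent would learn this mapping
    state, reward = env.step(action)
    rewards.append(reward)
print("Episode return J:", round(discounted_return(rewards), 2))
```

In an actual PEDF deployment, the random placeholder policy would be replaced by a learned policy $\pi_\theta$ operating at the supervisory EMS decision interval.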

3.2. Potential and Advantages of RL for PEDF Systems

In the context of PEDF systems, the state vector $s_t$ typically encapsulates key operational variables such as the battery state of charge (SOC), DC bus voltage, power flow levels, renewable generation output, and aggregated load demand. Correspondingly, the action space may include continuous control signals for power electronic converters, charging and discharging commands for energy storage systems, and scheduling decisions for flexible loads. To avoid ambiguity across control layers, it should be emphasized that in practical PEDF implementations the RL action space is restricted to supervisory EMS-level variables—reference values, operating envelopes, or scheduling decisions—rather than direct switching or inner-loop control commands for power electronic converters.
The reward function is commonly structured as a weighted aggregation of operationalized flexibility objectives, including economic cost minimization, renewable energy utilization, DC bus voltage regulation, battery SOC constraint enforcement, and flexible load comfort bounds [34,35,36]. Reward terms are typically normalized with respect to rated power, voltage limits, or comfort bounds, and the associated weights are selected to encode relative system-level priorities at the EMS layer, reflecting planning and operational preferences rather than controller-level tuning. As a result, reward weighting is treated as a system modeling choice, and sensitivity analysis with respect to reward weights is required in practical implementations to assess robustness and performance trade-offs, rather than to optimize numerical performance alone.
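As a minimal sketch of such a weighted, normalized reward, the following Python function aggregates cost, voltage, SOC, comfort, and self-consumption terms. The weights, rated values, and normalization scales are illustrative assumptions rather than recommended settings, and would be subject to the sensitivity analysis noted above.

```python
# Illustrative weighted-sum reward for one EMS decision step; all scales are assumptions.
def pedf_reward(cost_eur, v_bus, soc, comfort_dev_c, pv_used_kwh, pv_avail_kwh,
                v_nom=375.0, v_tol=0.05, soc_min=0.1, soc_max=0.9,
                weights=(1.0, 0.5, 1.0, 0.3, 0.5)):
    w_cost, w_volt, w_soc, w_comfort, w_pv = weights
    # Normalized penalty and incentive terms (each roughly within [0, 1])
    cost_term = cost_eur / 10.0                                   # assumed cost scale per step
    volt_term = min(abs(v_bus - v_nom) / (v_tol * v_nom), 1.0)    # deviation from nominal bus voltage
    soc_term = max(soc_min - soc, 0.0, soc - soc_max) / soc_min   # SOC bound violation, assumed 0.1 margin
    comfort_term = min(comfort_dev_c / 2.0, 1.0)                  # deviation from comfort band, degC
    pv_term = pv_used_kwh / max(pv_avail_kwh, 1e-6)               # renewable self-consumption share
    return (-w_cost * cost_term - w_volt * volt_term - w_soc * soc_term
            - w_comfort * comfort_term + w_pv * pv_term)

# Example step: slight over-voltage, SOC within bounds, most available PV consumed locally
print(pedf_reward(cost_eur=1.2, v_bus=380.0, soc=0.55,
                  comfort_dev_c=0.4, pv_used_kwh=3.1, pv_avail_kwh=3.5))
```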
The rapid proliferation of PEDF systems, together with the widespread deployment of power-electronics-interfaced end-use devices, has significantly increased the structural and operational complexity of modern building microgrids. As renewable penetration rises and flexibility resources become more diverse, energy management decisions increasingly involve tightly coupled dynamics, nonlinear interactions, and multi-directional power flows. To manage this complexity without compromising stability, PEDF control architectures explicitly separate fast electrical dynamics from slower energy management decision layers.
Conventional control and optimization approaches, including rule-based heuristics, deterministic optimization, and model predictive control (MPC), typically depend on accurate system models and reliable forecasts [37,38,39]. In practice, however, these assumptions are difficult to maintain under stochastic renewable generation, volatile electricity prices, and uncertain occupant-driven load behavior [40,41,42,43]. As a result, model mismatch and forecast errors can degrade control performance and limit scalability as system complexity grows. It should be noted that MPC remains a strong and widely adopted baseline for EMS-level control in building microgrids and DC systems, particularly when sufficiently accurate system models and short- to medium-term forecasts are available. In the PEDF context, MPC is well-suited for handling explicit constraints and short-horizon optimization problems; however, its performance may degrade when system dynamics, occupant-driven flexible loads, or long-term objectives such as comfort–energy trade-offs and storage degradation are difficult to model accurately. Consequently, RL is not positioned as a replacement for MPC, but rather as a complementary supervisory control option under conditions where model uncertainty, flexibility heterogeneity, or long-term operational adaptation play a dominant role.
Against this background, RL has emerged as an attractive control paradigm for DC flexible microgrids and PEDF systems [44,45,46,47]. As illustrated in Figure 5, an RL agent interacts with the physical PEDF system through a supervisory energy management and safety interface. Within the adopted hierarchical control framework, RL operates exclusively at the EMS layer with minute-level decision intervals [48], while primary (millisecond-level) and secondary (second-level) control functions remain fully model-based and deterministic. System states—including PV generation and forecasts, battery state of charge, DC bus voltage, load demand, and electricity price signals—are continuously collected and used as inputs to a deep RL policy, which generates control actions such as storage charge and discharge setpoints, grid power exchange decisions, and flexible load scheduling commands. The resulting system response is evaluated through a reward function that aggregates operating cost, voltage deviation penalties, hard constraint violation penalties, and renewable energy utilization, with reward weights selected according to system priorities and subjected to sensitivity analysis in practice, enabling adaptive and autonomous energy management through repeated closed-loop interaction.
By reducing reliance on high-fidelity component models, RL alleviates the modeling burden associated with power-electronic-dominated systems while allowing control policies to adapt to changing operating conditions, component ageing, and newly integrated flexibility resources. To mitigate sensitivity to reward weighting, constrained and multi-objective RL formulations are increasingly preferred in PEDF-related studies, providing a principled mechanism for balancing competing objectives without excessive manual tuning. It should be emphasized that such learning-based adaptability is confined to supervisory scheduling and dispatch decisions and does not alter the stability margins ensured by certified low-level controllers. Although RL-based control architectures have been explored across multiple temporal and spatial scales in the literature, in the PEDF context considered here RL is applied only at the supervisory energy management and coordination layers, and fast converter-level regulation is intentionally excluded from the RL action space and remains handled by conventional controllers.
A growing body of literature demonstrates that RL-driven controllers can outperform conventional PID, droop, and optimization-based methods in terms of voltage regulation, disturbance rejection, and operational cost reduction, particularly under high renewable penetration and limited system observability [49,50,51,52]. Collectively, these characteristics position RL as a promising enabler for intelligent, resilient, and scalable control of next-generation PEDF infrastructures [53,54].

3.3. Categories of RL Algorithms

Depending on how value functions and control policies are represented and optimized, RL algorithms can be broadly categorized into three classes: value-based methods, policy-based methods, and actor–critic architectures. This classification provides a systematic perspective for understanding algorithmic trade-offs in terms of stability, convergence behavior, and action-space compatibility, and it serves as a practical guideline for selecting suitable RL techniques for different control objectives in DC flexible microgrids [55,56,57,58].
Value-based methods focus on learning the action-value function $Q(s, a)$ and implicitly derive a control policy by maximizing the estimated Q-values. A representative algorithm in this category is the Deep Q-Network (DQN), which employs deep neural networks to approximate the Q-function in high-dimensional and nonlinear state spaces [23,59]. As illustrated in Figure 6, the RL agent observes the PEDF system state, including PV power or forecasts, battery SOC, DC bus voltage, load demand, and electricity price signals. Based on these observations, the online Q-network selects discrete control actions—such as battery charging/discharging modes, grid import/export decisions, or load on/off scheduling—according to the greedy action selection rule $\arg\max_a Q(s, a)$. To enhance learning stability and data efficiency, experience tuples $(s_t, a_t, r_t, s_{t+1})$ collected from interactions with the PEDF physical system are stored in a replay buffer and randomly sampled in mini-batches for network training. In addition, a target Q-network is employed to decouple target value estimation from online parameter updates, mitigating oscillations during learning. The reward function integrates multiple operational objectives, including minimizing operating costs, penalizing DC voltage deviations, enforcing SOC constraints, and incentivizing renewable energy utilization, thereby guiding the agent toward coordinated and economically efficient control strategies [60,61,62,63].
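A minimal DQN sketch for discrete EMS-level actions is given below, assuming a five-dimensional PEDF state (PV power, SOC, bus voltage, load, price) and three actions (charge, idle, discharge). The network sizes, hyperparameters, and random stand-in transitions are illustrative assumptions, not a reference implementation.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

STATE_DIM, N_ACTIONS = 5, 3     # [PV, SOC, V_bus, load, price] -> {charge, idle, discharge}

def make_qnet():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

q_net, target_net = make_qnet(), make_qnet()
target_net.load_state_dict(q_net.state_dict())            # target network mirrors the online net
optimizer = optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                              # experience replay buffer
gamma, eps = 0.99, 0.1

def select_action(state):
    """Epsilon-greedy selection over the online Q-network."""
    if random.random() < eps:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(torch.tensor(state)).argmax().item())

def train_step(batch_size=64):
    """One mini-batch temporal-difference update of the online Q-network."""
    if len(replay) < batch_size:
        return
    s, a, r, s_next = map(torch.tensor, zip(*random.sample(replay, batch_size)))
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a) for the actions taken
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Toy interaction loop: random transitions stand in for the PEDF environment
for _ in range(500):
    s = [random.random() for _ in range(STATE_DIM)]
    a = select_action(s)
    r = random.uniform(-1.0, 0.0)
    s_next = [random.random() for _ in range(STATE_DIM)]
    replay.append((s, a, r, s_next))
    train_step()
target_net.load_state_dict(q_net.state_dict())             # periodic target synchronization
```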
In contrast to value-based approaches, policy-based methods directly parameterize the control policy and optimize it via gradient-based updates, making them particularly suitable for continuous action spaces commonly encountered in power-electronic and energy-storage control. Actor–critic architectures combine the strengths of value-based and policy-based methods by introducing separate function approximators for policy execution (Actor) and value evaluation (Critic), resulting in improved convergence and robustness in complex environments. A prominent actor–critic algorithm is the Deep Deterministic Policy Gradient (DDPG), which extends the framework to deterministic policies in continuous control domains. As depicted in Figure 7, the Actor network maps the observed system state to continuous control actions, such as battery power setpoints or converter control signals, while the Critic network evaluates the corresponding Q-value. The Critic is trained by minimizing the temporal-difference error through the Critic loss, whereas the Actor is updated by maximizing the expected Q-value via the Actor loss. Observations from the PEDF physical system are continuously fed back to update the state representation, forming a closed-loop learning and control process. This architecture enables DDPG to effectively address control tasks such as DC bus voltage regulation, energy storage dispatch, and coordinated power flow management under uncertainty.
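The sketch below illustrates this actor–critic structure in a DDPG-style setting with a continuous, normalized storage setpoint. Layer sizes, the stand-in TD targets, and the omission of target networks, replay, and exploration noise are simplifying assumptions for exposition only.

```python
import torch
import torch.nn as nn
import torch.optim as optim

STATE_DIM, ACTION_DIM = 5, 1      # state [PV, SOC, V_bus, load, price]; action: normalized battery power

actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, ACTION_DIM), nn.Tanh())          # state -> bounded action in [-1, 1]
critic = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                       nn.Linear(64, 1))                              # (state, action) -> Q-value
actor_opt = optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = optim.Adam(critic.parameters(), lr=1e-3)

# One illustrative update on a random mini-batch; a full DDPG loop would also
# maintain target actor/critic networks and a replay buffer.
s = torch.rand(32, STATE_DIM)
a = torch.rand(32, ACTION_DIM) * 2 - 1
q_target = torch.rand(32, 1)                                          # stand-in TD targets

critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), q_target)
critic_opt.zero_grad()
critic_loss.backward()
critic_opt.step()

actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()          # ascend Q under the current policy
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()
```

The separation of the two losses mirrors the Critic loss and Actor loss shown in Figure 7: the Critic fits the value estimate, while the Actor is driven toward actions the Critic rates highly.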
Table 3 summarizes six representative RL algorithms, along with their action-space characteristics and typical applications in DC flexible microgrid control. This structured overview not only clarifies the algorithmic landscape but also lays the foundation for subsequent discussions on implementation challenges and future research directions in applying RL to PEDF systems.

4. Key Challenges

Despite the rapid progress reported in recent studies, the large-scale deployment of RL-driven control in PEDF microgrids remains constrained by a set of unresolved technical and practical challenges [85]. These challenges stem from the intrinsic safety-critical nature of power systems, the data-intensive characteristics of modern RL algorithms, and the increasing system complexity associated with multi-agent and cyber-physical integration [86,87]. As summarized in Table 4, the existing limitations can be broadly classified into five interrelated domains: (i) safety-critical exploration and operational risk, (ii) sample efficiency and training cost, (iii) uncertainty propagation and forecasting error coupling, (iv) scalability and interoperability in distributed multi-agent settings, and (v) real-time deployment, interpretability, and cybersecurity. All of these challenges are discussed under the assumption of a hierarchical PEDF control architecture with explicit time-scale separation, where RL is confined to supervisory energy management decisions.
(1)
Safety-critical exploration and constraint enforcement.
Unlike simulated environments, practical PEDF systems operate under stringent voltage, SOC, thermal, and power quality constraints. Any violation of these limits may lead to inverter instability, converter saturation, accelerated battery degradation, or even load interruption. Theoretical frameworks that explicitly incorporate operational constraints into optimal control formulations, such as those developed by Duan et al. [88] and extended in subsequent studies [89], provide valuable references for safe control design. However, most conventional RL algorithms rely on exploratory trial-and-error learning, which renders unconstrained exploration fundamentally incompatible with safety-critical PEDF operation. In practical PEDF architectures, RL is therefore not permitted to directly manipulate safety-critical control variables, and all physical and comfort constraints are enforced by deterministic protection and control layers. These protection and control layers correspond to primary (millisecond-level) and secondary (second-level) controllers, which remain entirely outside the RL action space. Although recent research has proposed safe RL, constrained optimization, and hybrid RL-rule-based architectures, the reliable enforcement of hard operational limits during both training and online execution remains an open challenge [90,91]. To mitigate this risk, exploration is typically conducted offline using high-fidelity digital twins or restricted to hard-constrained action spaces during online deployment, while certified low-level controllers retain full responsibility for fast electrical dynamics and protection functions. This issue is further exacerbated in DC microgrids, where fast converter dynamics and constant-power load characteristics impose millisecond-level stability requirements [92], reinforcing the necessity of strict control separation between learning-based EMS decisions and real-time converter control.
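As a minimal sketch of the hard-constrained action space mentioned above, the following function projects an RL-proposed storage setpoint onto the feasible set implied by converter ratings and SOC headroom before it is forwarded by the EMS. All limits and the 15-minute interval are illustrative assumptions.

```python
# Deterministic safety projection applied between the RL policy output and the EMS dispatch;
# battery capacity, power rating, SOC bounds, and interval length are illustrative assumptions.
def project_battery_setpoint(p_rl_kw, soc, capacity_kwh=10.0, dt_h=0.25,
                             p_max_kw=3.0, soc_min=0.1, soc_max=0.9):
    # Converter power rating
    p = max(-p_max_kw, min(p_max_kw, p_rl_kw))
    # Energy headroom: discharging (p > 0) must not push SOC below soc_min,
    # charging (p < 0) must not push SOC above soc_max over the coming interval
    p_dis_max = (soc - soc_min) * capacity_kwh / dt_h
    p_chg_max = (soc_max - soc) * capacity_kwh / dt_h
    return max(-p_chg_max, min(p_dis_max, p))

# The learned policy may request 4 kW of discharge at low SOC; the safety layer
# forwards the largest feasible setpoint instead of the raw action.
print(project_battery_setpoint(p_rl_kw=4.0, soc=0.15))   # -> 2.0 kW, limited by SOC headroom
```

Because this projection is deterministic and runs outside the learning loop, it preserves the time-scale separation between exploratory EMS decisions and the protection functions retained by the low-level controllers.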
(2)
Sample efficiency and data availability.
State-of-the-art deep RL algorithms such as soft actor–critic (SAC), proximal policy optimization (PPO), and twin-delayed deep deterministic policy gradient (TD3) typically require extensive interaction data to converge, often ranging from tens of thousands to millions of samples [93,94]. Such data requirements are impractical for real PEDF systems, where prolonged exploratory operation is neither economical nor safe. This limitation is particularly pronounced at the EMS level, where learning episodes correspond to long operational horizons measured in minutes or hours rather than fast electrical transients. While high-fidelity digital twins can partially mitigate this limitation, discrepancies between the simulated and physical environments often lead to performance degradation after deployment. Emerging paradigms, including offline RL, model-based RL, and meta-learning, offer promising avenues to improve sample efficiency, yet their application to PEDF systems remains limited and largely exploratory [95,96].
(3)
Forecast uncertainty and dynamic coupling effects.
PEDF operation is inherently influenced by multiple sources of uncertainty, including stochastic PV generation, volatile electricity prices, building thermal inertia, EV charging behavior, and occupant-driven load variability. Although fast electrical dynamics are regulated by lower-level controllers, forecast uncertainty primarily affects supervisory scheduling and dispatch decisions made at the EMS time scale. RL agents trained on imperfect forecasts may converge to brittle or short-sighted policies, particularly when uncertainty is not explicitly encoded in the state representation or reward structure [97]. Existing studies indicate that temporal window selection, feature engineering, and error-aware forecasting significantly affect policy robustness. Nevertheless, a lack of standardized state representations and uncertainty modeling frameworks continues to hinder reproducibility and cross-study comparison.
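One lightweight way to encode forecast uncertainty in the state, as suggested above, is to augment the EMS observation with probabilistic forecast quantiles and a rolling forecast-error statistic. The sketch below is illustrative only; the field names and window length are assumptions.

```python
import statistics

def build_state(soc, v_bus, price, pv_forecast_quantiles, recent_pv_errors, window=24):
    """Assemble an uncertainty-aware EMS state vector (all fields are illustrative)."""
    p10, p50, p90 = pv_forecast_quantiles                    # probabilistic PV forecast, kW
    err_std = statistics.pstdev(recent_pv_errors[-window:]) if recent_pv_errors else 0.0
    return [soc, v_bus, price, p10, p50, p90, err_std]       # quantile spread + error statistic

state = build_state(soc=0.62, v_bus=374.0, price=0.21,
                    pv_forecast_quantiles=(1.2, 2.4, 3.3),
                    recent_pv_errors=[0.3, -0.1, 0.4, -0.2])
print(state)
```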
(4)
Distributed coordination and communication constraints.
As PEDF systems evolve toward interconnected building clusters and local energy communities, multi-agent RL (MARL) becomes a natural modeling choice for distributed energy management across multiple buildings or subsystems, rather than a monolithic control paradigm. Within the hierarchical PEDF framework, MARL coordination is typically implemented at the EMS level, where agents exchange low-rate scheduling information rather than real-time control signals. Communication latency, bandwidth limitations, and privacy concerns further complicate real-world deployment, highlighting the need for benchmark-driven evaluation and scalable coordination mechanisms in future PEDF research.
(5)
Real-time deployment, interpretability, and cybersecurity.
Practical PEDF deployment requires RL controllers to deliver millisecond-level inference, transparent decision logic, and robust resilience against cyber threats. In the considered PEDF architecture, millisecond-level real-time requirements apply exclusively to the primary control layers, whereas RL-based EMS decisions operate at slower time scales compatible with current inference capabilities. Deep RL policies, however, are typically implemented as opaque neural networks, which complicates certification, debugging, and compliance with building energy regulations. Moreover, adversarial manipulation of sensor data, reward signals, or communication channels may distort learned policies, raising critical concerns regarding trustworthiness and system resilience. Addressing interpretability and cybersecurity, therefore, represents a prerequisite for the safe adoption of RL in real PEDF infrastructures [98].

5. Future Perspectives

RL is widely regarded as a promising enabler of autonomous and adaptive control for next-generation PEDF systems. Although recent studies have reported encouraging proof-of-concept results, the transition from laboratory-scale demonstrations to reliable large-scale deployment remains challenging [99]. Progress beyond experimental settings will require coordinated advances in algorithm design, system integration, and regulatory alignment. Building on the current state of PEDF development [100,101] and broader progress in RL for cyber–physical energy systems [8], several key research directions can be identified, as summarized in Table 5.
(1)
Safety-by-design and certifiable RL control.
Operational safety must form the foundation of future RL-based PEDF control. PEDF systems are subject to strict constraints on voltage, battery state of charge, and thermal stress that must be adhered to during learning and operation. Future research should therefore focus on constrained RL formulations, safety filters such as control barrier functions, and hybrid architectures that combine certified low-level controllers with high-level RL agents for supervisory optimization [64]. Ensuring provable constraint satisfaction under fast converter dynamics remains a central challenge [102].
(2)
Sample efficiency and uncertainty-aware optimization.
The data-intensive nature of modern RL algorithms poses a major barrier to real-world PEDF deployment. Approaches such as model-based RL, offline RL using historical data, meta-learning, and uncertainty-aware policy design offer promising pathways to reduce training requirements and improve robustness [103]. These methods are particularly important under stochastic PV generation, volatile electricity prices, and occupant-driven demand variability [104,105].
(3)
Hierarchical and multi-agent RL.
PEDF systems operate across heterogeneous temporal and spatial scales, from fast converter-level control to building- and community-level energy management. Hierarchical RL aligns naturally with this layered structure, while multi-agent RL enables coordination among distributed assets and buildings. In practical PEDF deployments, MARL agents are typically defined at coarse granularities, such as the subsystem or building level, in order to limit coordination complexity and communication overhead. Although centralized training with decentralized execution (CTDE) has shown potential, further work is needed to address scalability, communication efficiency, and reward coordination in large systems [106,107]. Moreover, publicly available and standardized benchmark environments specifically tailored to PEDF systems are currently lacking, which limits systematic comparisons of RL strategies and practical guidance for practitioners. The development of open, scalable PEDF benchmark platforms should therefore be regarded as an important future research direction rather than an existing capability.
(4)
Real-time implementation and system integration.
Bridging the gap between algorithm development and operational deployment remains a practical priority. Real-world PEDF applications impose strict requirements on inference latency, computational efficiency, and compatibility with existing energy management systems. Future efforts should emphasize lightweight policy representations and seamless integration of RL controllers into established supervisory control architectures [108].
(5)
Standardization, benchmarking, and socio-technical integration.
Beyond algorithmic advances, large-scale adoption of RL-driven PEDF systems depends on standardized DC interfaces, interoperable data schemas, and openly available benchmark datasets. Regulatory frameworks, tariff structures, and user acceptance must also be incorporated into control design to ensure compliance and long-term viability.
In summary, RL-driven PEDF control is evolving from conceptual exploration toward scalable engineering practice. Addressing safety certification, data efficiency, hierarchical coordination, real-time integration, and standardization—as outlined in Table 5—will be essential for realizing intelligent, resilient, and deployable DC flexible energy systems.

6. Conclusions

The growing convergence of DC-based building energy systems and learning-enabled control is reshaping the technological trajectory of future urban energy infrastructures. The PEDF architecture represents a decisive evolution beyond conventional AC or standalone DC microgrids, providing a unified platform that integrates renewable generation, hierarchical storage, flexible loads, and multi-building energy interaction. As demonstrated in recent pilot projects and modeling studies, PEDF systems offer clear advantages in conversion efficiency, controllability, and renewable self-consumption, laying the groundwork for scalable DC building clusters and energy-sharing communities.
At the same time, RL has emerged as a compelling approach for optimizing PEDF operation under uncertainty, enabling adaptive decision-making across converter-level control, building-level scheduling, and multi-agent coordination. While early results confirm RL’s capability to enhance operational efficiency, respond to stochastic variability, and support distributed flexibility management, widespread deployment remains constrained by several fundamental challenges. These include safety guarantees in power-electronic environments, limitations on sample efficiency, difficulty in coordinating multiple agents across heterogeneous time scales, gaps between simulation and practical deployment, and a lack of unified standards and benchmarking frameworks for DC building systems.
Looking ahead, future research must focus on developing safety-certified RL architectures, uncertainty-aware and data-efficient algorithms, scalable hierarchical and multi-agent structures, and high-fidelity digital twins that enable reliable sim-to-real transfer. Equally important is the advancement of PEDF system standardization—including DC interface protocols, data schemas, and cybersecurity frameworks—as well as the integration of human–building interaction, market design, and regulatory considerations into learning-enabled control strategies.
Taken together, these developments chart a pathway toward autonomous, resilient, and interoperable PEDF systems capable of functioning as the foundational infrastructure for next-generation urban energy ecosystems. By combining the architectural strengths of DC distribution with the adaptive intelligence of RL, PEDF systems hold the potential to deliver transformative gains in efficiency, flexibility, and carbon reduction across buildings and communities.

Author Contributions

All authors, J.S., W.X., and K.L., contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Research and Development Program in Zhenjiang City under Grant number SH2023108.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sheida, K.; Seyedi, M.; Afridi, M.A.; Ferdowsi, F.; Khattak, M.J.; Gopu, V.K.; Rupnow, T. Resilient reinforcement learning for voltage control in an islanded dc microgrid integrating data-driven piezoelectric. Machines 2024, 12, 694. [Google Scholar] [CrossRef]
  2. Muriithi, G.; Chowdhury, S. Optimal energy management of a grid-tied solar pv-battery microgrid: A reinforcement learning approach. Energies 2021, 14, 2700. [Google Scholar] [CrossRef]
  3. Zhou, X.; Lin, W.; Kumar, R.; Cui, P.; Ma, Z. A data-driven strategy using long short term memory models and reinforcement learning to predict building electricity consumption. Appl. Energy 2022, 306, 118078. [Google Scholar] [CrossRef]
  4. Liu, X.; Ren, M.; Yang, Z.; Yan, G.; Guo, Y.; Cheng, L.; Wu, C. A multi-step predictive deep reinforcement learning algorithm for HVAC control systems in smart buildings. Energy 2022, 259, 124857. [Google Scholar] [CrossRef]
  5. Li, K.; Luo, Y.; Shen, Y.; Xue, W. Towards personalized HVAC: A non-contact human thermal sensation monitoring and regulation system. Energy Build. 2025, 350, 116649. [Google Scholar] [CrossRef]
  6. Meng, Q.; Hussain, S.; Luo, F.; Wang, Z.; Jin, X. An online reinforcement learning-based energy management strategy for microgrids with centralized control. IEEE Trans. Ind. Appl. 2024, 61, 1501–1510. [Google Scholar] [CrossRef]
  7. Sivamayil, K.; Rajasekar, E.; Aljafari, B.; Nikolovski, S.; Vairavasundaram, S.; Vairavasundaram, I. A systematic study on reinforcement learning based applications. Energies 2023, 16, 1512. [Google Scholar] [CrossRef]
  8. Vázquez-Canteli, J.R.; Nagy, Z. Reinforcement learning for demand response: A review of algorithms and modeling techniques. Appl. Energy 2019, 235, 1072–1089. [Google Scholar] [CrossRef]
  9. Gao, Y.; Matsunami, Y.; Miyata, S.; Akashi, Y. Operational optimization for off-grid renewable building energy system using deep reinforcement learning. Appl. Energy 2022, 325, 119783. [Google Scholar] [CrossRef]
  10. Michailidis, P.; Michailidis, I.; Kosmatopoulos, E. Reinforcement learning for optimizing renewable energy utilization in buildings: A review on applications and innovations. Energies 2025, 18, 1724. [Google Scholar] [CrossRef]
  11. Wan, Y. Advancing Intelligent DC Microgrids: AI-Enabled Control, Cyber Security, and Energy Management. Ph.D. Thesis, Technical University of Denmark (DTU), Lyngby, Denmark, 2023. [Google Scholar] [CrossRef]
  12. Lai, H.; Xiong, K.; Zhang, Z.; Chen, Z. Droop control strategy for microgrid inverters: A deep reinforcement learning enhanced approach. Energy Rep. 2023, 9, 567–575. [Google Scholar] [CrossRef]
  13. Duan, J.; Wang, C.; Xu, H.; Liu, W.; Xue, Y.; Peng, J.C.; Jiang, H. Distributed control of inverter-interfaced microgrids based on consensus algorithm with improved transient performance. IEEE Trans. Smart Grid 2017, 10, 1303–1312. [Google Scholar] [CrossRef]
  14. Akbulut, O.; Cavus, M.; Cengiz, M.; Allahham, A.; Giaouris, D.; Forshaw, M. Hybrid Intelligent Control System for Adaptive Microgrid optimization: Integration of rule-based control and deep learning techniques. Energies 2024, 17, 2260. [Google Scholar] [CrossRef]
  15. Liwei, Z. Analysis of PEDF Systems and Their Application Challenges and Countermeasures in Buildings; Sichuan Cement: Chengdu, China, 2022; pp. 26–28. [Google Scholar] [CrossRef]
  16. Carkhuff, B.G.; Demirev, P.A.; Srinivasan, R. Impedance-based battery management system for safety monitoring of lithium-ion batteries. IEEE Trans. Ind. Electron. 2018, 65, 6497–6504. [Google Scholar] [CrossRef]
  17. Nguyen, A.T.; Pham, D.H.; Oo, B.L.; Santamouris, M.; Ahn, Y.; Lim, B.T. modeling building HVAC control strategies using a deep reinforcement learning approach. Energy Build. 2024, 310, 114065. [Google Scholar] [CrossRef]
  18. Fu, Q.; Chen, X.; Ma, S.; Fang, N.; Xing, B.; Chen, J. Optimal control method of HVAC based on multi-agent deep reinforcement learning. Energy Build. 2022, 270, 112284. [Google Scholar] [CrossRef]
  19. Deng, X.; Zhang, Y.; Qi, H. Towards optimal HVAC control in non-stationary building environments combining active change detection and deep reinforcement learning. Build. Environ. 2022, 211, 108680. [Google Scholar] [CrossRef]
  20. Sabzalian, M.H.; Pirouzi, S.; Aredes, M.; Wanderley Franca, B.; Carolina Cunha, A. Two-layer coordinated energy management method in the smart distribution network including multi-microgrid based on the hybrid flexible and securable operation strategy. Int. Trans. Electr. Energy Syst. 2022, 2022, 3378538. [Google Scholar] [CrossRef]
  21. Chen, Y.; Yu, Z.; Han, Z.; Sun, W.; He, L. A decision-making system for cotton irrigation based on reinforcement learning strategy. Agronomy 2023, 14, 11. [Google Scholar] [CrossRef]
  22. Xie, F.; Guo, Z.; Li, T.; Feng, Q.; Zhao, C. Dynamic Task Planning for Multi-Arm Harvesting Robots Under Multiple Constraints Using Deep Reinforcement Learning. Horticulturae 2025, 11, 88. [Google Scholar] [CrossRef]
  23. Akbari, E.; Faraji Naghibi, A.; Veisi, M.; Shahparnia, A.; Pirouzi, S. Multi-objective economic operation of smart distribution network with renewable-flexible virtual power plants considering voltage security index. Sci. Rep. 2024, 14, 19136. [Google Scholar] [CrossRef] [PubMed]
  24. Arroyo, J.; Manna, C.; Spiessens, F.; Helsen, L. Reinforced model predictive control (RL-MPC) for building energy management. Appl. Energy 2022, 309, 118346. [Google Scholar] [CrossRef]
  25. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998; Volume 22447. [Google Scholar]
  26. Waghmare, A.; Singh, V.; Varshney, T.; Sanjeevikumar, P. A systematic review of reinforcement learning-based control for microgrids: Trends, challenges, and emerging algorithms. Discov. Appl. Sci. 2025, 7, 939. [Google Scholar] [CrossRef]
  27. Chen, Y.; Lin, M.; Yu, Z.; Sun, W.; Fu, W.; He, L. Enhancing cotton irrigation with distributional actor–critic reinforcement learning. Agric. Water Manag. 2025, 307, 109194. [Google Scholar] [CrossRef]
  28. Zhou, X.; Sun, J.; Tian, Y.; Lu, B.; Hang, Y.; Chen, Q. Hyperspectral technique combined with deep learning algorithm for detection of compound heavy metals in lettuce. Food Chem. 2020, 321, 126503. [Google Scholar] [CrossRef] [PubMed]
  29. Li, Y.; Wang, R.; Yang, Z. Optimal scheduling of isolated microgrids using automated reinforcement learning-based multi-period forecasting. IEEE Trans. Sustain. Energy 2021, 13, 159–169. [Google Scholar] [CrossRef]
  30. Zhao, J.; Fan, S.; Zhang, B.; Wang, A.; Zhang, L.; Zhu, Q. Research Status and Development Trends of Deep Reinforcement Learning in the Intelligent Transformation of Agricultural Machinery. Agriculture 2025, 15, 1223. [Google Scholar] [CrossRef]
  31. Cai, W.; Kordabad, A.B.; Gros, S. Energy management in residential microgrid using model predictive control-based reinforcement learning and Shapley value. Eng. Appl. Artif. Intell. 2023, 119, 105793. [Google Scholar] [CrossRef]
  32. Guo, C.; Wang, X.; Zheng, Y.; Zhang, F. Real-time optimal energy management of microgrid with uncertainties based on deep reinforcement learning. Energy 2022, 238, 121873. [Google Scholar] [CrossRef]
  33. Li, K.; Shi, J.; Hu, C.; Xue, W. The Intelligentization Process of Agricultural Greenhouse: A Review of Control Strategies and Modeling Techniques. Agriculture 2025, 15, 2135. [Google Scholar] [CrossRef]
  34. Wang, D.; Zheng, W.; Wang, Z.; Wang, Y.; Pang, X.; Wang, W. Comparison of reinforcement learning and model predictive control for building energy system optimization. Appl. Therm. Eng. 2023, 228, 120430. [Google Scholar] [CrossRef]
  35. Alotaibi, B.S. Context-aware smart energy management system: A reinforcement learning and IoT-based framework for enhancing energy efficiency and thermal comfort in sustainable buildings. Energy Build. 2025, 340, 115804. [Google Scholar] [CrossRef]
  36. Liu, S.; Han, S.; Zhu, S. Reinforcement learning-based energy trading and management of regional interconnected microgrids. IEEE Trans. Smart Grid 2022, 14, 2047–2059. [Google Scholar] [CrossRef]
  37. Li, K.; Sha, Z.; Xue, W.; Chen, X.; Mao, H.; Tan, G. A fast modeling and optimization scheme for greenhouse environmental system using proper orthogonal decomposition and multi-objective genetic algorithm. Comput. Electron. Agric. 2020, 168, 105096. [Google Scholar] [CrossRef]
  38. Li, K.; Mi, Y.; Zheng, W. An optimal control method for greenhouse climate management considering crop growth’s spatial distribution and energy consumption. Energies 2023, 16, 3925. [Google Scholar] [CrossRef]
  39. Chen, C.; Zhu, W.; Steibel, J.; Siegford, J.; Han, J.; Norton, T. Classification of drinking and drinker-playing in pigs by a video-based deep learning method. Biosyst. Eng. 2020, 196, 1–14. [Google Scholar] [CrossRef]
  40. Manjavacas, A.; Campoy-Nieves, A.; Jiménez-Raboso, J.; Molina-Solana, M.; Gómez-Romero, J. An experimental evaluation of deep reinforcement learning algorithms for HVAC control. Artif. Intell. Rev. 2024, 57, 173. [Google Scholar] [CrossRef]
  41. Kurte, K.; Amasyali, K.; Munk, J.; Zandi, H. Deep Reinforcement Learning based HVAC Control for Reducing Carbon Footprint of Buildings. In Proceedings of the 2023 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), Washington, DC, USA, 16–19 January 2023; pp. 1–5. [Google Scholar] [CrossRef]
  42. Gao, Y.; Shi, S.; Miyata, S.; Akashi, Y. Successful application of predictive information in deep reinforcement learning control: A case study based on an office building HVAC system. Energy 2024, 291, 130344. [Google Scholar] [CrossRef]
  43. Li, K.; Xue, W.; Mao, H.; Chen, X.; Jiang, H.; Tan, G. Optimizing the 3D distributed climate inside greenhouses using multi-objective optimization algorithms and computer fluid dynamics. Energies 2019, 12, 2873. [Google Scholar] [CrossRef]
  44. Kumar, P.P.; Nuvvula, R.S.; Shezan, S.A.; JM, B.; Ahammed, S.R.; Ali, A. Intelligent Energy Management System for Microgrids using Reinforcement Learning. In Proceedings of the 2024 12th International Conference on Smart Grid (icSmartGrid), Setubal, Portugal, 27–29 May 2024; pp. 329–335. [Google Scholar] [CrossRef]
  45. Pang, K.; Zhou, J.; Tsianikas, S.; Coit, D.W.; Ma, Y. Long-term microgrid expansion planning with resilience and environmental benefits using deep reinforcement learning. Renew. Sustain. Energy Rev. 2024, 191, 114068. [Google Scholar] [CrossRef]
  46. Forootan, M.M.; Larki, I.; Zahedi, R.; Ahmadi, A. Machine learning and deep learning in energy systems: A review. Sustainability 2022, 14, 4832. [Google Scholar] [CrossRef]
  47. Syamala, M.; Komala, C.; Pramila, P.; Dash, S.; Meenakshi, S.; Boopathi, S. Machine learning-integrated IoT-based smart home energy management system. In Handbook of Research on Deep Learning Techniques for Cloud-Based Industrial IoT; IGI Global: Hershey, PA, USA, 2023; pp. 219–235. [Google Scholar] [CrossRef]
  48. Deng, X.; Zhang, Y.; Jiang, Y.; Qi, H. A novel operation method for renewable building by combining distributed DC energy system and deep reinforcement learning. Appl. Energy 2024, 353, 122188. [Google Scholar] [CrossRef]
  49. Al Sayed, K.; Boodi, A.; Broujeny, R.S.; Beddiar, K. Reinforcement learning for HVAC control in intelligent buildings: A technical and conceptual review. J. Build. Eng. 2024, 95, 110085. [Google Scholar] [CrossRef]
  50. Gokhale, G.; Tiben, N.; Verwee, M.S.; Lahariya, M.; Claessens, B.; Develder, C. Real-world implementation of reinforcement learning based energy coordination for a cluster of households. In Proceedings of the 10th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, Istanbul, Turkey, 15–16 November 2023; pp. 347–351. [Google Scholar] [CrossRef]
  51. Yang, L.; Nagy, Z.; Goffin, P.; Schlueter, A. Reinforcement learning for optimal control of low exergy buildings. Appl. Energy 2015, 156, 577–586. [Google Scholar] [CrossRef]
  52. Liu, J.; Abbas, I.; Noor, R.S. Development of deep learning-based variable rate agrochemical spraying system for targeted weeds control in strawberry crop. Agronomy 2021, 11, 1480. [Google Scholar] [CrossRef]
  53. Yu, L.; Xu, Z.; Zhang, T.; Guan, X.; Yue, D. Energy-efficient personalized thermal comfort control in office buildings based on multi-agent deep reinforcement learning. Build. Environ. 2022, 223, 109458. [Google Scholar] [CrossRef]
  54. Shen, R.; Zhong, S.; Wen, X.; An, Q.; Zheng, R.; Li, Y.; Zhao, J. Multi-agent deep reinforcement learning optimization framework for building energy system with renewable energy. Appl. Energy 2022, 312, 118724. [Google Scholar] [CrossRef]
  55. Wilk, P.; Wang, N.; Li, J. Multi-Agent Reinforcement Learning for Smart Community Energy Management. Energies 2024, 17, 5211. [Google Scholar] [CrossRef]
  56. Coraci, D.; Brandi, S.; Hong, T.; Capozzoli, A. Online transfer learning strategy for enhancing the scalability and deployment of deep reinforcement learning control in smart buildings. Appl. Energy 2023, 333, 120598. [Google Scholar] [CrossRef]
  57. Ye, Y.; Wang, H.; Chen, P.; Tang, Y.; Strbac, G. Safe deep reinforcement learning for microgrid energy management in distribution networks with leveraged spatial–temporal perception. IEEE Trans. Smart Grid 2023, 14, 3759–3775. [Google Scholar] [CrossRef]
  58. Zareef, M.; Chen, Q.; Hassan, M.M.; Arslan, M.; Hashim, M.M.; Ahmad, W.; Kutsanedzie, F.Y.; Agyekum, A.A. An overview on the applications of typical non-linear algorithms coupled with NIR spectroscopy in food analysis. Food Eng. Rev. 2020, 12, 173–190. [Google Scholar] [CrossRef]
  59. Alabdullah, M.H.; Abido, M.A. Microgrid energy management using deep Q-network reinforcement learning. Alex. Eng. J. 2022, 61, 9069–9078. [Google Scholar] [CrossRef]
  60. Ifaei, P.; Nazari-Heris, M.; Charmchi, A.S.T.; Asadi, S.; Yoo, C. Sustainable energies and machine learning: An organized review of recent applications and challenges. Energy 2023, 266, 126432. [Google Scholar] [CrossRef]
  61. Shah, S.F.A.; Iqbal, M.; Aziz, Z.; Rana, T.A.; Khalid, A.; Cheah, Y.N.; Arif, M. The role of machine learning and the internet of things in smart buildings for energy efficiency. Appl. Sci. 2022, 12, 7882. [Google Scholar] [CrossRef]
  62. Chang, X.; Huang, X.; Xu, W.; Tian, X.; Wang, C.; Wang, L.; Yu, S. Monitoring of dough fermentation during Chinese steamed bread processing by near-infrared spectroscopy combined with spectra selection and supervised learning algorithm. J. Food Process Eng. 2021, 44, e13783. [Google Scholar] [CrossRef]
  63. Zhou, X.; Zhao, C.; Sun, J.; Cao, Y.; Yao, K.; Xu, M. A deep learning method for predicting lead content in oilseed rape leaves using fluorescence hyperspectral imaging. Food Chem. 2023, 409, 135251. [Google Scholar] [CrossRef] [PubMed]
  64. Chaturvedi, S.; Bui, V.H.; Su, W.; Wang, M. Reinforcement learning-based integrated control to improve the efficiency of dc microgrids. IEEE Trans. Smart Grid 2023, 15, 149–159. [Google Scholar] [CrossRef]
  65. Domínguez-Barbero, D.; García-González, J.; Sanz-Bobi, M.A.; Sánchez-Úbeda, E.F. Optimising a microgrid system by deep reinforcement learning techniques. Energies 2020, 13, 2830. [Google Scholar] [CrossRef]
  66. Guo, C.; Wang, X.; Zheng, Y.; Zhang, F. Optimal energy management of multi-microgrids connected to distribution system based on deep reinforcement learning. Int. J. Electr. Power Energy Syst. 2021, 131, 107048. [Google Scholar] [CrossRef]
  67. Gutiérrez-Escalona, J.; Roncero-Clemente, C.; Husev, O.; Matiushkin, O.; Barrero-González, F.; González-Romera, E. Reinforcement Learning-based Energy Management Strategy for Flexible Hybrid ac/dc Microgrid. In Proceedings of the IECON 2024-50th Annual Conference of the IEEE Industrial Electronics Society, Chicago, IL, USA, 3–6 November 2024; pp. 1–6. [Google Scholar] [CrossRef]
  68. Xue, W.; Jia, N.; Zhao, M. Multi-agent deep reinforcement learning based HVAC control for multi-zone buildings considering zone-energy-allocation optimization. Energy Build. 2025, 329, 115241. [Google Scholar] [CrossRef]
  69. Sabahi, K.; Jamil, M.; Shokri-Kalandaragh, Y.; Tavan, M.; Arya, Y. Deep deterministic policy gradient reinforcement learning based adaptive PID load frequency control of an AC micro-grid. IEEE Can. J. Electr. Comput. Eng. 2024, 47, 15–21. [Google Scholar] [CrossRef]
  70. Xiong, B.; Zhang, L.; Hu, Y.; Fang, F.; Liu, Q.; Cheng, L. Deep reinforcement learning for optimal microgrid energy management with renewable energy and electric vehicle integration. Appl. Soft Comput. 2025, 176, 113180. [Google Scholar] [CrossRef]
  71. Zhang, Z.; Shi, J.; Yang, W.; Song, Z.; Chen, Z.; Lin, D. Deep reinforcement learning based Bi-layer optimal scheduling for microgrids considering flexible load control. CSEE J. Power Energy Syst. 2022, 9, 949–962. [Google Scholar] [CrossRef]
  72. Lee, S.; Seon, J.; Sun, Y.G.; Kim, S.H.; Kyeong, C.; Kim, D.I.; Kim, J.Y. Novel architecture of energy management systems based on deep reinforcement learning in microgrid. IEEE Trans. Smart Grid 2023, 15, 1646–1658. [Google Scholar] [CrossRef]
  73. Huang, B.; Wang, J. Deep-reinforcement-learning-based capacity scheduling for PV-battery storage system. IEEE Trans. Smart Grid 2020, 12, 2272–2283. [Google Scholar] [CrossRef]
  74. Hu, C.; Cai, Z.; Zhang, Y.; Yan, R.; Cai, Y.; Cen, B. A soft actor–critic deep reinforcement learning method for multi-timescale coordinated operation of microgrids. Prot. Control Mod. Power Syst. 2022, 7, 29. [Google Scholar] [CrossRef]
  75. Du, W.; Huang, X.; Zhu, Y.; Wang, L.; Deng, W. Deep reinforcement learning for adaptive frequency control of island microgrid considering control performance and economy. Front. Energy Res. 2024, 12, 1361869. [Google Scholar] [CrossRef]
  76. Sepehrzad, R.; Langeroudi, A.S.G.; Khodadadi, A.; Adinehpour, S.; Al-Durra, A.; Anvari-Moghaddam, A. An applied deep reinforcement learning approach to control active networked microgrids in smart cities with multi-level participation of battery energy storage system and electric vehicles. Sustain. Cities Soc. 2024, 107, 105352. [Google Scholar] [CrossRef]
  77. Sang, J.; Sun, H.; Kou, L. Deep reinforcement learning microgrid optimization strategy considering priority flexible demand side. Sensors 2022, 22, 2256. [Google Scholar] [CrossRef]
  78. Jones, G.; Li, X.; Sun, Y. Robust energy management policies for solar microgrids via reinforcement learning. Energies 2024, 17, 2821. [Google Scholar] [CrossRef]
  79. Hosseini, E.; Horrillo-Quintero, P.; Carrasco-Gonzalez, D.; Garcia-Trivino, P.; Sarrias-Mena, R.; Garcia-Vazquez, C.A.; Fernandez-Ramirez, L.M. Reinforcement learning-based energy management system for lithium-ion battery storage in multilevel microgrid. J. Energy Storage 2025, 109, 115114. [Google Scholar] [CrossRef]
  80. Harrold, D.J.; Cao, J.; Fan, Z. Renewable energy integration and microgrid energy trading using multi-agent deep reinforcement learning. Appl. Energy 2022, 318, 119151. [Google Scholar] [CrossRef]
  81. Fan, Z.; Zhang, W.; Liu, W. Multi-agent deep reinforcement learning-based distributed optimal generation control of DC microgrids. IEEE Trans. Smart Grid 2023, 14, 3337–3351. [Google Scholar] [CrossRef]
  82. Liu, Y.; Qie, T.; Yu, Y.; Wang, Y.; Chau, T.K.; Zhang, X.; Manandhar, U.; Li, S.; Iu, H.H.; Fernando, T. A novel integral reinforcement learning-based control method assisted by twin delayed deep deterministic policy gradient for solid oxide fuel cell in DC microgrid. IEEE Trans. Sustain. Energy 2022, 14, 688–703. [Google Scholar] [CrossRef]
  83. Cui, Y.; Xu, Y.; Li, Y.; Wang, Y.; Zou, X. Deep reinforcement learning based optimal energy management of multi-energy microgrids with uncertainties. CSEE J. Power Energy Syst. 2024. Available online: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10609308 (accessed on 20 January 2026).
  84. Rajamallaiah, A.; Karri, S.P.K.; Shankar, Y.R. Deep reinforcement learning based control strategy for voltage regulation of DC-DC Buck converter feeding CPLs in DC microgrid. IEEE Access 2024, 12, 17419–17430. [Google Scholar] [CrossRef]
  85. Stavrev, S.; Ginchev, D. Reinforcement learning techniques in optimizing energy systems. Electronics 2024, 13, 1459. [Google Scholar] [CrossRef]
  86. Zhou, X.; Xue, S.; Du, H.; Ma, Z. Optimization of building demand flexibility using reinforcement learning and rule-based expert systems. Appl. Energy 2023, 350, 121792. [Google Scholar] [CrossRef]
  87. Al-Saadi, M.; Al-Greer, M.; Short, M. Reinforcement learning-based intelligent control strategies for optimal power management in advanced power distribution systems: A survey. Energies 2023, 16, 1608. [Google Scholar] [CrossRef]
  88. Duan, J.; Yi, Z.; Shi, D.; Lin, C.; Lu, X.; Wang, Z. Reinforcement-learning-based optimal control of hybrid energy storage systems in hybrid AC–DC microgrids. IEEE Trans. Ind. Inform. 2019, 15, 5355–5364. [Google Scholar] [CrossRef]
  89. Dey, S.; Marzullo, T.; Henze, G. Inverse reinforcement learning control for building energy management. Energy Build. 2023, 286, 112941. [Google Scholar] [CrossRef]
  90. Zhang, H.; Seal, S.; Wu, D.; Bouffard, F.; Boulet, B. Building energy management with reinforcement learning and model predictive control: A survey. IEEE Access 2022, 10, 27853–27862. [Google Scholar] [CrossRef]
  91. Wang, X.; Wang, P.; Huang, R.; Zhu, X.; Arroyo, J.; Li, N. Safe deep reinforcement learning for building energy management. Appl. Energy 2025, 377, 124328. [Google Scholar] [CrossRef]
  92. Dey, S.; Henze, G.P. Reinforcement Learning Building Control: An Online Approach with Guided Exploration Using Surrogate Models. ASME J. Eng. Sustain. Build. Cities 2024, 5, 011005. [Google Scholar] [CrossRef]
  93. Qin, Y.; Ke, J.; Wang, B.; Filaretov, G.F. Energy optimization for regional buildings based on distributed reinforcement learning. Sustain. Cities Soc. 2022, 78, 103625. [Google Scholar] [CrossRef]
  94. Vamvakas, D.; Michailidis, P.; Korkas, C.; Kosmatopoulos, E. Review and evaluation of reinforcement learning frameworks on smart grid applications. Energies 2023, 16, 5326. [Google Scholar] [CrossRef]
  95. Li, Z.; Sun, Z.; Meng, Q.; Wang, Y.; Li, Y. Reinforcement learning of room temperature set-point of thermal storage air-conditioning system with demand response. Energy Build. 2022, 259, 111903. [Google Scholar] [CrossRef]
  96. Shaqour, A.; Hagishima, A. Systematic review on deep reinforcement learning-based energy management for different building types. Energies 2022, 15, 8663. [Google Scholar] [CrossRef]
  97. Nagy, Z.; Henze, G.; Dey, S.; Arroyo, J.; Helsen, L.; Zhang, X.; Chen, B.; Amasyali, K.; Kurte, K.; Zamzam, A.; et al. Ten questions concerning reinforcement learning for building energy management. Build. Environ. 2023, 241, 110435. [Google Scholar] [CrossRef]
  98. Zhang, T.; Sun, M.; Qiu, D.; Zhang, X.; Strbac, G.; Kang, C. A Bayesian deep reinforcement learning-based resilient control for multi-energy micro-gird. IEEE Trans. Power Syst. 2023, 38, 5057–5072. [Google Scholar] [CrossRef]
  99. Zhang, S.; Jia, R.; Pan, H.; Cao, Y. A safe reinforcement learning-based charging strategy for electric vehicles in residential microgrid. Appl. Energy 2023, 348, 121490. [Google Scholar] [CrossRef]
  100. Liu, X.; Liu, X.; Jiang, Y.; Zhang, T.; Hao, B. Photovoltaics and energy storage integrated flexible direct current distribution systems of buildings: Definition, technology review, and application. CSEE J. Power Energy Syst. 2022, 9, 829–845. [Google Scholar] [CrossRef]
  101. Ukoba, K.; Olatunji, K.O.; Adeoye, E.; Jen, T.C.; Madyira, D.M. Optimizing renewable energy systems through artificial intelligence: Review and future prospects. Energy Environ. 2024, 35, 3833–3879. [Google Scholar] [CrossRef]
  102. Li, J.; Cheng, Y. Deep meta-reinforcement learning-based data-driven active fault tolerance load frequency control for islanded microgrids considering Internet of Things. IEEE Internet Things J. 2023, 11, 10295–10303. [Google Scholar] [CrossRef]
  103. Tomin, N.; Zhukov, A.; Domyshev, A. Deep reinforcement learning for energy microgrids management considering flexible energy sources. In Proceedings of the EPJ Web of Conferences; EDP Sciences: Les Ulis Cedex, France, 2019; Volume 217, p. 01016. [Google Scholar] [CrossRef]
  104. Zhao, J.; Li, F.; Mukherjee, S.; Sticht, C. Deep reinforcement learning-based model-free on-line dynamic multi-microgrid formation to enhance resilience. IEEE Trans. Smart Grid 2022, 13, 2557–2567. [Google Scholar] [CrossRef]
  105. Lu, Y.; Xiang, Y.; Huang, Y.; Yu, B.; Weng, L.; Liu, J. Deep reinforcement learning based optimal scheduling of active distribution system considering distributed generation, energy storage and flexible load. Energy 2023, 271, 127087. [Google Scholar] [CrossRef]
  106. Dai, X.; Chen, R.; Guan, S.; Li, W.T.; Yuen, C. BuildingGym: An open-source toolbox for AI-based building energy management using reinforcement learning. In Proceedings of the Building Simulation; Springer: Berlin/Heidelberg, Germany, 2025; Volume 18, pp. 1909–1927. [Google Scholar] [CrossRef]
  107. Khalafian, F.; Iliaee, N.; Diakina, E.; Parsa, P.; Alhaider, M.M.; Masali, M.H.; Pirouzi, S.; Zhu, M. Capabilities of compressed air energy storage in the economic design of renewable off-grid system to supply electricity and heat costumers and smart charging-based electric vehicles. J. Energy Storage 2024, 78, 109888. [Google Scholar] [CrossRef]
  108. Du, Y.; Wu, D. Deep reinforcement learning from demonstrations to assist service restoration in islanded microgrids. IEEE Trans. Sustain. Energy 2022, 13, 1062–1072. [Google Scholar] [CrossRef]
Figure 1. Overall framework of a PEDF building energy system.
Figure 2. Candidate power electronic interfaces in a PEDF system: (a) bidirectional DC/DC converter for energy storage interfacing, providing controlled charging/discharging and DC bus voltage stabilization; (b) bidirectional DC/AC inverter for grid interconnection, ensuring voltage and frequency compatibility in both grid-connected and islanded modes. The illustrated topologies represent typical candidate implementations rather than fixed or mandatory designs.
Figure 3. A representative application of a PEDF microgrid in a building energy system with a shared DC bus.
Figure 5. A representative RL-based PEDF control framework.
Figure 6. DQN framework for discrete control in PEDF microgrids.
Figure 7. DDPG actor–critic architecture for continuous control.
Table 1. Evolution of building-integrated power systems from AC microgrids to DC microgrids and PEDF architectures.

System Type | Advantages | Limitations | Technical Drivers for Evolution | Adaptability to High Renewable Penetration in Buildings
AC Microgrid | Mature protection, strong legacy compatibility; suitable for traditional loads. | Requires repeated AC/DC conversions; synchronization overhead; higher harmonic distortion. | Increasing penetration of DC-native loads and distributed PV; rising conversion losses. | Medium—effective but limited efficiency in high-PV scenarios.
DC Microgrid | Fewer conversions; lower harmonics; natural integration with PV/BESS. | Limited flexibility resources; weak support for demand-side interaction; control standardization lacking. | Proliferation of DC appliances, EVs, and DC chargers; local PV–storage demand. | High—efficient renewable utilization but insufficient flexibility.
PEDF System | Unified PV–storage–DC flexible control; high renewable consumption; supports demand response. | Higher EMS complexity; requires unified DC interface standards. | Building-level carbon neutrality; intelligent demand response; flexibility-centric system coordination. | Very high—supports deep decarbonization and flexible operation.
Table 2. Main components and functional roles of subsystems in a PEDF architecture.

Subsystem | Key Components | Primary Functions | Control Interfaces | Notes
PV Generation Subsystem | PV arrays, unidirectional DC/DC converters with MPPT | Solar energy harvesting and controlled injection into the shared DC backbone | MPPT algorithms, converter duty-cycle control | Power output is intermittent and strongly dependent on irradiance and temperature
Energy Storage Subsystem | Electrochemical batteries, bidirectional DC/DC converters | Energy buffering, DC voltage stabilization, peak shaving, and flexibility provision | SOC regulation, charge–discharge scheduling, voltage support | Buck/Boost-based topologies enable fast bidirectional power exchange
Grid Interface Subsystem | Bidirectional AC/DC inverter, protection, and synchronization units | Bidirectional grid interaction, islanded operation, and ancillary service support | Voltage–frequency control, grid current regulation | Forms the external coupling point between the DC microgrid and the utility grid
DC Bus and Flexible Load Subsystem | Multi-level DC bus (48/220/375/750 V), lighting, ICT loads, EV chargers | DC power distribution, flexible demand integration, and local power routing | Load scheduling, demand response signals, DC voltage coordination | Flexible loads (e.g., HVAC) are grouped based on EMS-level controllability and flexibility abstraction, independent of their physical AC or DC electrical interfaces
Energy Management System (EMS) | Supervisory controller, sensing, communication, and monitoring modules | System-wide coordination of generation, storage, loads, and grid power flows | Set-point dispatch, optimization, and learning-based control interfaces | Provides the interface between physical assets and advanced data-driven control strategies
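To make the subsystem roles in Table 2 concrete, the following minimal Python sketch shows one way the listed measurements and EMS set-point interfaces could be exposed to a learning-based controller. All names and numerical values (PEDFState, dispatch_setpoints, the 10 kW battery limit, the example bus voltage) are illustrative assumptions for this sketch, not specifications drawn from the PEDF architecture itself.

```python
# Illustrative sketch (not the authors' implementation) of a PEDF observation
# vector and a set-point dispatch interface matching the subsystems in Table 2.
from dataclasses import dataclass
import numpy as np


@dataclass
class PEDFState:
    pv_power_kw: float        # PV subsystem: current MPPT output
    battery_soc: float        # storage subsystem: state of charge in [0, 1]
    bus_voltage_v: float      # DC bus subsystem: measured bus voltage
    flexible_load_kw: float   # aggregated controllable (flexible) demand
    grid_price: float         # tariff signal seen by the grid interface

    def to_vector(self) -> np.ndarray:
        """Flatten subsystem measurements into the EMS observation vector."""
        return np.array([self.pv_power_kw, self.battery_soc,
                         self.bus_voltage_v, self.flexible_load_kw,
                         self.grid_price], dtype=np.float32)


def dispatch_setpoints(action: np.ndarray,
                       batt_power_limit_kw: float = 10.0,
                       load_shed_limit_kw: float = 5.0) -> dict:
    """Map a normalized RL action to physical EMS set-points:
    battery charging(+)/discharging(-) power and flexible-load curtailment."""
    batt_kw = float(np.clip(action[0], -1.0, 1.0)) * batt_power_limit_kw
    shed_kw = float(np.clip(action[1], 0.0, 1.0)) * load_shed_limit_kw
    return {"battery_power_kw": batt_kw, "load_curtailment_kw": shed_kw}


if __name__ == "__main__":
    state = PEDFState(pv_power_kw=6.2, battery_soc=0.55, bus_voltage_v=374.0,
                      flexible_load_kw=4.1, grid_price=0.18)
    print(state.to_vector().shape, dispatch_setpoints(np.array([0.4, 0.2])))
```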
Table 3. Representative RL algorithms for DC flexible microgrid control.

Algorithm | Policy Type | Action Space | Key Characteristics | Typical Applications
DQN [64,65,66,67] | Off-policy | Discrete | Value-based learning with deep Q approximation | Mode selection, generator on/off scheduling, basic demand response
DDPG [12,66,68,69] | Off-policy | Continuous | Deterministic actor–critic for continuous control | Battery charge–discharge control, inverter power regulation
PPO [70,71,72,73] | On-policy | Continuous | Stable policy updates via clipped objective | Real-time EMS, adaptive load coordination
SAC [74,75,76] | Off-policy | Continuous | Entropy-regularized stochastic policy | PV–storage coordination, DC bus voltage regulation
A2C/A3C [77,78] | On-policy | Discrete/Continuous | Parallel actor–critic learning | Distributed EMS and multi-agent control
TD3 [79,80,81,82,83,84] | Off-policy | Continuous | Twin critics mitigate value overestimation | Optimal power flow and storage control in AC/DC microgrids
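The action-space distinction in Table 3 is the main practical driver of algorithm choice. The sketch below, assuming a generic observation vector and linear stand-ins for the deep networks, contrasts DQN-style epsilon-greedy selection over discrete EMS modes with DDPG-style deterministic continuous control of a battery set-point; the mode names and dimensions are hypothetical.

```python
# Minimal sketch contrasting the two action-space families in Table 3.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM = 5
MODES = ["charge", "discharge", "idle", "export"]  # assumed discrete EMS modes

# --- DQN-style: value-based selection over discrete modes -------------------
W_q = rng.normal(scale=0.1, size=(len(MODES), OBS_DIM))   # Q(s, a) ~= W_q @ s

def select_mode(obs: np.ndarray, epsilon: float = 0.1) -> int:
    """Epsilon-greedy choice over discrete modes, as in value-based control."""
    if rng.random() < epsilon:
        return int(rng.integers(len(MODES)))
    return int(np.argmax(W_q @ obs))

# --- DDPG-style: deterministic actor with a continuous action ---------------
W_a = rng.normal(scale=0.1, size=(1, OBS_DIM))            # mu(s) ~= tanh(W_a @ s)

def select_power_kw(obs: np.ndarray, max_kw: float = 10.0,
                    noise_std: float = 0.05) -> float:
    """Deterministic policy plus Gaussian exploration noise, scaled to max_kw."""
    a = np.tanh(W_a @ obs)[0] + rng.normal(scale=noise_std)
    return float(np.clip(a, -1.0, 1.0)) * max_kw

obs = rng.normal(size=OBS_DIM)
print(MODES[select_mode(obs)], select_power_kw(obs))
```

PPO, SAC, and TD3 refine these two interaction patterns (clipped updates, entropy regularization, twin critics) rather than changing the interface shape shown here.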
Table 4. Key challenges for scaling RL in PEDF systems.

Challenge Domain | System Context | RL Limitations | Required Advances | Open Gaps
Safety-Critical Exploration | Voltage, SOC, thermal, and power quality constraints | Unsafe trial-and-error; no formal guarantees | Safe RL, constrained learning, hybrid control | Real-time safety certification under fast dynamics
Sample Efficiency | Limited safe operational data | Long training cycles and high data demand | Offline RL, model-based RL, meta-learning | Lack of validated PEDF datasets
Forecast Uncertainty | Stochastic PV, prices, EVs, and loads | Policy brittleness under prediction errors | Uncertainty-aware states and rewards | No standard uncertainty modeling framework
Distributed Coordination | Multi-building and energy community operation | Non-stationarity and communication overhead | CTDE-based MARL, scalable coordination | Scalability and privacy constraints
Real-Time Deployment | Fast control, regulation, and cybersecurity | Black-box policies and cyber vulnerability | Explainable, lightweight, secure RL | Certification and trustworthiness
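One concrete reading of the "Safety-Critical Exploration" row in Table 4 is a post-hoc safety filter that projects the policy's proposed command onto a feasible set before actuation. The snippet below is a minimal sketch under assumed limits (SOC bounds, converter rating, a one-step integrator SOC model); it is only an illustration of the constrained-action idea, not a certified safety layer.

```python
# Minimal sketch of a post-hoc safety filter for a battery power command.
# All limits and the simple SOC model are illustrative assumptions.
import numpy as np


def safe_battery_command(proposed_kw: float,
                         soc: float,
                         capacity_kwh: float = 20.0,
                         dt_h: float = 0.25,
                         soc_min: float = 0.1,
                         soc_max: float = 0.9,
                         rated_kw: float = 10.0) -> float:
    """Project a proposed charging(+)/discharging(-) power onto the feasible set."""
    # Converter rating: hard limit on |P|.
    p = float(np.clip(proposed_kw, -rated_kw, rated_kw))
    # Headroom implied by SOC bounds over one interval (integrator SOC model).
    max_charge_kw = (soc_max - soc) * capacity_kwh / dt_h
    max_discharge_kw = (soc - soc_min) * capacity_kwh / dt_h
    return float(np.clip(p, -max_discharge_kw, max_charge_kw))


# Example: the agent requests 9 kW of charging near a full battery; the filter
# trims the command to 4 kW so the SOC ceiling is not violated in the next step.
print(safe_battery_command(proposed_kw=9.0, soc=0.85))
```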
Table 5. Future research directions for RL-driven PEDF systems.

Direction | Core Focus | Key Outcome | Time Horizon | Remarks
Safety-by-Design RL | Constrained learning and safety filters | Certified control under physical limits | Short-term | Fundamental requirement for deployment
Sample-Efficient RL | Offline, model-based, and uncertainty-aware RL | Reduced data demand and improved robustness | Medium-term | Addresses limited real-world data
Hierarchical & Multi-Agent RL | Layered control and CTDE-based coordination | Scalable operation across assets and buildings | Medium-term | Key enabler for PEDF clusters
Real-Time Integration | Lightweight policies and EMS compatibility | Reliable online execution | Medium-term | Focus on latency and computation limits
Standardization & Benchmarking | DC interfaces, datasets, and policy alignment | Interoperable and reproducible deployment | Long-term | Requires socio-technical coordination
