Reinforcement Learning-Driven Control Strategies for DC Flexible Microgrids: Challenges and Future
Abstract
1. Introduction
2. From AC Microgrids to PEDF Systems—Definition and System Architecture
2.1. Limitations of AC Microgrids Under High Renewable Penetration
2.2. PEDF Architecture
2.3. Coupling with Building Energy Systems
- DC intra- and inter-building distribution, which minimizes conversion losses, simplifies system architecture, and enhances compatibility with native DC sources and loads.
- Integrated PV-storage coordination, allowing higher levels of renewable self-consumption, improved operational autonomy, and enhanced resilience through collective energy balancing.
- Hierarchical and flexible control, in which a supervisory EMS orchestrates energy flows across buildings, supports peak shaving and valley filling, and enables participation in demand response and ancillary service markets (a minimal dispatch sketch follows this list).
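
To make the supervisory logic concrete, the following minimal sketch implements rule-based peak shaving and valley filling for a battery coordinated by a building-cluster EMS. It is an illustration only: the function name, thresholds, and ratings are assumptions of this sketch, not values taken from the PEDF literature surveyed here.

```python
import numpy as np

def supervisory_dispatch(load_kw, pv_kw, soc, p_peak=80.0, p_valley=30.0,
                         p_batt_max=50.0, soc_min=0.1, soc_max=0.9):
    """Rule-based peak shaving / valley filling (illustrative limits).

    Positive battery power = discharge; negative = charge.
    """
    net_kw = load_kw - pv_kw  # residual demand on the DC bus after PV
    if net_kw > p_peak and soc > soc_min:
        # Peak shaving: discharge to clip demand above the peak threshold.
        p_batt = min(net_kw - p_peak, p_batt_max)
    elif net_kw < p_valley and soc < soc_max:
        # Valley filling: charge during low-demand periods.
        p_batt = -min(p_valley - net_kw, p_batt_max)
    else:
        p_batt = 0.0
    grid_kw = net_kw - p_batt  # remaining exchange with the utility grid
    return p_batt, grid_kw

# Example: a 100 kW load with 15 kW PV exceeds the 80 kW peak threshold,
# so 5 kW of battery discharge clips the grid import back to 80 kW.
print(supervisory_dispatch(load_kw=100.0, pv_kw=15.0, soc=0.6))
```
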
3. RL for Energy Management in PEDF Systems
3.1. RL: Definition and Basic Framework

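As a concrete illustration of the basic RL framework, the toy sketch below applies tabular Q-learning to a simplified battery-dispatch Markov decision process, using the standard temporal-difference update Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. The state, action, and reward definitions are illustrative assumptions of this sketch; realistic PEDF problems have continuous, high-dimensional states and call for the deep RL methods categorized in Section 3.3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: state = battery SOC level (0..4); action = {0: discharge, 1: idle, 2: charge}.
N_SOC, N_ACT = 5, 3
Q = np.zeros((N_SOC, N_ACT))
alpha, gamma, eps = 0.1, 0.95, 0.1  # learning rate, discount factor, exploration rate

def step(soc, act):
    """Illustrative transition and reward: paid to discharge, pay to charge."""
    delta = act - 1                          # SOC change: -1, 0, +1
    next_soc = int(np.clip(soc + delta, 0, N_SOC - 1))
    price = rng.uniform(0.5, 1.5)            # stochastic electricity price
    reward = price * (soc != next_soc) * (-delta)
    return next_soc, reward

soc = 2
for t in range(20000):
    # Epsilon-greedy exploration over the current value estimates.
    act = int(rng.integers(N_ACT)) if rng.random() < eps else int(np.argmax(Q[soc]))
    next_soc, r = step(soc, act)
    # Q-learning temporal-difference update.
    Q[soc, act] += alpha * (r + gamma * Q[next_soc].max() - Q[soc, act])
    soc = next_soc

print(np.round(Q, 2))  # learned state-action value table
```
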
3.2. Potential and Advantages of RL for PEDF Systems
3.3. Categories of RL Algorithms
4. Key Challenges
- (1) Safety-critical exploration and constraint enforcement.
- (2) Sample efficiency and data availability.
- (3) Forecast uncertainty and dynamic coupling effects.
- (4) Distributed coordination and communication constraints.
- (5) Real-time deployment, interpretability, and cybersecurity.
5. Future Perspectives
- (1) Safety-by-design and certifiable RL control.
- (2) Sample efficiency and uncertainty-aware optimization.
- (3) Hierarchical and multi-agent RL (MARL).
- (4) Real-time implementation and system integration.
- (5) Standardization, benchmarking, and socio-technical integration.
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Sheida, K.; Seyedi, M.; Afridi, M.A.; Ferdowsi, F.; Khattak, M.J.; Gopu, V.K.; Rupnow, T. Resilient reinforcement learning for voltage control in an islanded dc microgrid integrating data-driven piezoelectric. Machines 2024, 12, 694. [Google Scholar] [CrossRef]
- Muriithi, G.; Chowdhury, S. Optimal energy management of a grid-tied solar pv-battery microgrid: A reinforcement learning approach. Energies 2021, 14, 2700. [Google Scholar] [CrossRef]
- Zhou, X.; Lin, W.; Kumar, R.; Cui, P.; Ma, Z. A data-driven strategy using long short term memory models and reinforcement learning to predict building electricity consumption. Appl. Energy 2022, 306, 118078. [Google Scholar] [CrossRef]
- Liu, X.; Ren, M.; Yang, Z.; Yan, G.; Guo, Y.; Cheng, L.; Wu, C. A multi-step predictive deep reinforcement learning algorithm for HVAC control systems in smart buildings. Energy 2022, 259, 124857. [Google Scholar] [CrossRef]
- Li, K.; Luo, Y.; Shen, Y.; Xue, W. Towards personalized HVAC: A non-contact human thermal sensation monitoring and regulation system. Energy Build. 2025, 350, 116649. [Google Scholar] [CrossRef]
- Meng, Q.; Hussain, S.; Luo, F.; Wang, Z.; Jin, X. An online reinforcement learning-based energy management strategy for microgrids with centralized control. IEEE Trans. Ind. Appl. 2024, 61, 1501–1510. [Google Scholar] [CrossRef]
- Sivamayil, K.; Rajasekar, E.; Aljafari, B.; Nikolovski, S.; Vairavasundaram, S.; Vairavasundaram, I. A systematic study on reinforcement learning based applications. Energies 2023, 16, 1512. [Google Scholar] [CrossRef]
- Vázquez-Canteli, J.R.; Nagy, Z. Reinforcement learning for demand response: A review of algorithms and modeling techniques. Appl. Energy 2019, 235, 1072–1089. [Google Scholar] [CrossRef]
- Gao, Y.; Matsunami, Y.; Miyata, S.; Akashi, Y. Operational optimization for off-grid renewable building energy system using deep reinforcement learning. Appl. Energy 2022, 325, 119783. [Google Scholar] [CrossRef]
- Michailidis, P.; Michailidis, I.; Kosmatopoulos, E. Reinforcement learning for optimizing renewable energy utilization in buildings: A review on applications and innovations. Energies 2025, 18, 1724. [Google Scholar] [CrossRef]
- Wan, Y. Advancing Intelligent DC Microgrids: AI-Enabled Control, Cyber Security, and Energy Management. Ph.D. Thesis, Technical University of Denmark (DTU), Lyngby, Denmark, 2023. [Google Scholar] [CrossRef]
- Lai, H.; Xiong, K.; Zhang, Z.; Chen, Z. Droop control strategy for microgrid inverters: A deep reinforcement learning enhanced approach. Energy Rep. 2023, 9, 567–575. [Google Scholar] [CrossRef]
- Duan, J.; Wang, C.; Xu, H.; Liu, W.; Xue, Y.; Peng, J.C.; Jiang, H. Distributed control of inverter-interfaced microgrids based on consensus algorithm with improved transient performance. IEEE Trans. Smart Grid 2017, 10, 1303–1312. [Google Scholar] [CrossRef]
- Akbulut, O.; Cavus, M.; Cengiz, M.; Allahham, A.; Giaouris, D.; Forshaw, M. Hybrid Intelligent Control System for Adaptive Microgrid optimization: Integration of rule-based control and deep learning techniques. Energies 2024, 17, 2260. [Google Scholar] [CrossRef]
- Liwei, Z. Analysis of PEDF Systems and Their Application Challenges and Countermeasures in Buildings; Sichuan Cement: Chengdu, China, 2022; pp. 26–28. [Google Scholar] [CrossRef]
- Carkhuff, B.G.; Demirev, P.A.; Srinivasan, R. Impedance-based battery management system for safety monitoring of lithium-ion batteries. IEEE Trans. Ind. Electron. 2018, 65, 6497–6504. [Google Scholar] [CrossRef]
- Nguyen, A.T.; Pham, D.H.; Oo, B.L.; Santamouris, M.; Ahn, Y.; Lim, B.T. Modeling building HVAC control strategies using a deep reinforcement learning approach. Energy Build. 2024, 310, 114065. [Google Scholar] [CrossRef]
- Fu, Q.; Chen, X.; Ma, S.; Fang, N.; Xing, B.; Chen, J. Optimal control method of HVAC based on multi-agent deep reinforcement learning. Energy Build. 2022, 270, 112284. [Google Scholar] [CrossRef]
- Deng, X.; Zhang, Y.; Qi, H. Towards optimal HVAC control in non-stationary building environments combining active change detection and deep reinforcement learning. Build. Environ. 2022, 211, 108680. [Google Scholar] [CrossRef]
- Sabzalian, M.H.; Pirouzi, S.; Aredes, M.; Wanderley Franca, B.; Carolina Cunha, A. Two-layer coordinated energy management method in the smart distribution network including multi-microgrid based on the hybrid flexible and securable operation strategy. Int. Trans. Electr. Energy Syst. 2022, 2022, 3378538. [Google Scholar] [CrossRef]
- Chen, Y.; Yu, Z.; Han, Z.; Sun, W.; He, L. A decision-making system for cotton irrigation based on reinforcement learning strategy. Agronomy 2023, 14, 11. [Google Scholar] [CrossRef]
- Xie, F.; Guo, Z.; Li, T.; Feng, Q.; Zhao, C. Dynamic Task Planning for Multi-Arm Harvesting Robots Under Multiple Constraints Using Deep Reinforcement Learning. Horticulturae 2025, 11, 88. [Google Scholar] [CrossRef]
- Akbari, E.; Faraji Naghibi, A.; Veisi, M.; Shahparnia, A.; Pirouzi, S. Multi-objective economic operation of smart distribution network with renewable-flexible virtual power plants considering voltage security index. Sci. Rep. 2024, 14, 19136. [Google Scholar] [CrossRef] [PubMed]
- Arroyo, J.; Manna, C.; Spiessens, F.; Helsen, L. Reinforced model predictive control (RL-MPC) for building energy management. Appl. Energy 2022, 309, 118346. [Google Scholar] [CrossRef]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998. [Google Scholar]
- Waghmare, A.; Singh, V.; Varshney, T.; Sanjeevikumar, P. A systematic review of reinforcement learning-based control for microgrids: Trends, challenges, and emerging algorithms. Discov. Appl. Sci. 2025, 7, 939. [Google Scholar] [CrossRef]
- Chen, Y.; Lin, M.; Yu, Z.; Sun, W.; Fu, W.; He, L. Enhancing cotton irrigation with distributional actor–critic reinforcement learning. Agric. Water Manag. 2025, 307, 109194. [Google Scholar] [CrossRef]
- Zhou, X.; Sun, J.; Tian, Y.; Lu, B.; Hang, Y.; Chen, Q. Hyperspectral technique combined with deep learning algorithm for detection of compound heavy metals in lettuce. Food Chem. 2020, 321, 126503. [Google Scholar] [CrossRef] [PubMed]
- Li, Y.; Wang, R.; Yang, Z. Optimal scheduling of isolated microgrids using automated reinforcement learning-based multi-period forecasting. IEEE Trans. Sustain. Energy 2021, 13, 159–169. [Google Scholar] [CrossRef]
- Zhao, J.; Fan, S.; Zhang, B.; Wang, A.; Zhang, L.; Zhu, Q. Research Status and Development Trends of Deep Reinforcement Learning in the Intelligent Transformation of Agricultural Machinery. Agriculture 2025, 15, 1223. [Google Scholar] [CrossRef]
- Cai, W.; Kordabad, A.B.; Gros, S. Energy management in residential microgrid using model predictive control-based reinforcement learning and Shapley value. Eng. Appl. Artif. Intell. 2023, 119, 105793. [Google Scholar] [CrossRef]
- Guo, C.; Wang, X.; Zheng, Y.; Zhang, F. Real-time optimal energy management of microgrid with uncertainties based on deep reinforcement learning. Energy 2022, 238, 121873. [Google Scholar] [CrossRef]
- Li, K.; Shi, J.; Hu, C.; Xue, W. The Intelligentization Process of Agricultural Greenhouse: A Review of Control Strategies and Modeling Techniques. Agriculture 2025, 15, 2135. [Google Scholar] [CrossRef]
- Wang, D.; Zheng, W.; Wang, Z.; Wang, Y.; Pang, X.; Wang, W. Comparison of reinforcement learning and model predictive control for building energy system optimization. Appl. Therm. Eng. 2023, 228, 120430. [Google Scholar] [CrossRef]
- Alotaibi, B.S. Context-aware smart energy management system: A reinforcement learning and IoT-based framework for enhancing energy efficiency and thermal comfort in sustainable buildings. Energy Build. 2025, 340, 115804. [Google Scholar] [CrossRef]
- Liu, S.; Han, S.; Zhu, S. Reinforcement learning-based energy trading and management of regional interconnected microgrids. IEEE Trans. Smart Grid 2022, 14, 2047–2059. [Google Scholar] [CrossRef]
- Li, K.; Sha, Z.; Xue, W.; Chen, X.; Mao, H.; Tan, G. A fast modeling and optimization scheme for greenhouse environmental system using proper orthogonal decomposition and multi-objective genetic algorithm. Comput. Electron. Agric. 2020, 168, 105096. [Google Scholar] [CrossRef]
- Li, K.; Mi, Y.; Zheng, W. An optimal control method for greenhouse climate management considering crop growth’s spatial distribution and energy consumption. Energies 2023, 16, 3925. [Google Scholar] [CrossRef]
- Chen, C.; Zhu, W.; Steibel, J.; Siegford, J.; Han, J.; Norton, T. Classification of drinking and drinker-playing in pigs by a video-based deep learning method. Biosyst. Eng. 2020, 196, 1–14. [Google Scholar] [CrossRef]
- Manjavacas, A.; Campoy-Nieves, A.; Jiménez-Raboso, J.; Molina-Solana, M.; Gómez-Romero, J. An experimental evaluation of deep reinforcement learning algorithms for HVAC control. Artif. Intell. Rev. 2024, 57, 173. [Google Scholar] [CrossRef]
- Kurte, K.; Amasyali, K.; Munk, J.; Zandi, H. Deep Reinforcement Learning based HVAC Control for Reducing Carbon Footprint of Buildings. In Proceedings of the 2023 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), Washington, DC, USA, 16–19 January 2023; pp. 1–5. [Google Scholar] [CrossRef]
- Gao, Y.; Shi, S.; Miyata, S.; Akashi, Y. Successful application of predictive information in deep reinforcement learning control: A case study based on an office building HVAC system. Energy 2024, 291, 130344. [Google Scholar] [CrossRef]
- Li, K.; Xue, W.; Mao, H.; Chen, X.; Jiang, H.; Tan, G. Optimizing the 3D distributed climate inside greenhouses using multi-objective optimization algorithms and computer fluid dynamics. Energies 2019, 12, 2873. [Google Scholar] [CrossRef]
- Kumar, P.P.; Nuvvula, R.S.; Shezan, S.A.; JM, B.; Ahammed, S.R.; Ali, A. Intelligent Energy Management System for Microgrids using Reinforcement Learning. In Proceedings of the 2024 12th International Conference on Smart Grid (icSmartGrid), Setubal, Portugal, 27–29 May 2024; pp. 329–335. [Google Scholar] [CrossRef]
- Pang, K.; Zhou, J.; Tsianikas, S.; Coit, D.W.; Ma, Y. Long-term microgrid expansion planning with resilience and environmental benefits using deep reinforcement learning. Renew. Sustain. Energy Rev. 2024, 191, 114068. [Google Scholar] [CrossRef]
- Forootan, M.M.; Larki, I.; Zahedi, R.; Ahmadi, A. Machine learning and deep learning in energy systems: A review. Sustainability 2022, 14, 4832. [Google Scholar] [CrossRef]
- Syamala, M.; Komala, C.; Pramila, P.; Dash, S.; Meenakshi, S.; Boopathi, S. Machine learning-integrated IoT-based smart home energy management system. In Handbook of Research on Deep Learning Techniques for Cloud-Based Industrial IoT; IGI Global: Hershey, PA, USA, 2023; pp. 219–235. [Google Scholar] [CrossRef]
- Deng, X.; Zhang, Y.; Jiang, Y.; Qi, H. A novel operation method for renewable building by combining distributed DC energy system and deep reinforcement learning. Appl. Energy 2024, 353, 122188. [Google Scholar] [CrossRef]
- Al Sayed, K.; Boodi, A.; Broujeny, R.S.; Beddiar, K. Reinforcement learning for HVAC control in intelligent buildings: A technical and conceptual review. J. Build. Eng. 2024, 95, 110085. [Google Scholar] [CrossRef]
- Gokhale, G.; Tiben, N.; Verwee, M.S.; Lahariya, M.; Claessens, B.; Develder, C. Real-world implementation of reinforcement learning based energy coordination for a cluster of households. In Proceedings of the 10th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, Istanbul, Turkey, 15–16 November 2023; pp. 347–351. [Google Scholar] [CrossRef]
- Yang, L.; Nagy, Z.; Goffin, P.; Schlueter, A. Reinforcement learning for optimal control of low exergy buildings. Appl. Energy 2015, 156, 577–586. [Google Scholar] [CrossRef]
- Liu, J.; Abbas, I.; Noor, R.S. Development of deep learning-based variable rate agrochemical spraying system for targeted weeds control in strawberry crop. Agronomy 2021, 11, 1480. [Google Scholar] [CrossRef]
- Yu, L.; Xu, Z.; Zhang, T.; Guan, X.; Yue, D. Energy-efficient personalized thermal comfort control in office buildings based on multi-agent deep reinforcement learning. Build. Environ. 2022, 223, 109458. [Google Scholar] [CrossRef]
- Shen, R.; Zhong, S.; Wen, X.; An, Q.; Zheng, R.; Li, Y.; Zhao, J. Multi-agent deep reinforcement learning optimization framework for building energy system with renewable energy. Appl. Energy 2022, 312, 118724. [Google Scholar] [CrossRef]
- Wilk, P.; Wang, N.; Li, J. Multi-Agent Reinforcement Learning for Smart Community Energy Management. Energies 2024, 17, 5211. [Google Scholar] [CrossRef]
- Coraci, D.; Brandi, S.; Hong, T.; Capozzoli, A. Online transfer learning strategy for enhancing the scalability and deployment of deep reinforcement learning control in smart buildings. Appl. Energy 2023, 333, 120598. [Google Scholar] [CrossRef]
- Ye, Y.; Wang, H.; Chen, P.; Tang, Y.; Strbac, G. Safe deep reinforcement learning for microgrid energy management in distribution networks with leveraged spatial–temporal perception. IEEE Trans. Smart Grid 2023, 14, 3759–3775. [Google Scholar] [CrossRef]
- Zareef, M.; Chen, Q.; Hassan, M.M.; Arslan, M.; Hashim, M.M.; Ahmad, W.; Kutsanedzie, F.Y.; Agyekum, A.A. An overview on the applications of typical non-linear algorithms coupled with NIR spectroscopy in food analysis. Food Eng. Rev. 2020, 12, 173–190. [Google Scholar] [CrossRef]
- Alabdullah, M.H.; Abido, M.A. Microgrid energy management using deep Q-network reinforcement learning. Alex. Eng. J. 2022, 61, 9069–9078. [Google Scholar] [CrossRef]
- Ifaei, P.; Nazari-Heris, M.; Charmchi, A.S.T.; Asadi, S.; Yoo, C. Sustainable energies and machine learning: An organized review of recent applications and challenges. Energy 2023, 266, 126432. [Google Scholar] [CrossRef]
- Shah, S.F.A.; Iqbal, M.; Aziz, Z.; Rana, T.A.; Khalid, A.; Cheah, Y.N.; Arif, M. The role of machine learning and the internet of things in smart buildings for energy efficiency. Appl. Sci. 2022, 12, 7882. [Google Scholar] [CrossRef]
- Chang, X.; Huang, X.; Xu, W.; Tian, X.; Wang, C.; Wang, L.; Yu, S. Monitoring of dough fermentation during Chinese steamed bread processing by near-infrared spectroscopy combined with spectra selection and supervised learning algorithm. J. Food Process Eng. 2021, 44, e13783. [Google Scholar] [CrossRef]
- Zhou, X.; Zhao, C.; Sun, J.; Cao, Y.; Yao, K.; Xu, M. A deep learning method for predicting lead content in oilseed rape leaves using fluorescence hyperspectral imaging. Food Chem. 2023, 409, 135251. [Google Scholar] [CrossRef] [PubMed]
- Chaturvedi, S.; Bui, V.H.; Su, W.; Wang, M. Reinforcement learning-based integrated control to improve the efficiency of dc microgrids. IEEE Trans. Smart Grid 2023, 15, 149–159. [Google Scholar] [CrossRef]
- Domínguez-Barbero, D.; García-González, J.; Sanz-Bobi, M.A.; Sánchez-Úbeda, E.F. Optimising a microgrid system by deep reinforcement learning techniques. Energies 2020, 13, 2830. [Google Scholar] [CrossRef]
- Guo, C.; Wang, X.; Zheng, Y.; Zhang, F. Optimal energy management of multi-microgrids connected to distribution system based on deep reinforcement learning. Int. J. Electr. Power Energy Syst. 2021, 131, 107048. [Google Scholar] [CrossRef]
- Gutiérrez-Escalona, J.; Roncero-Clemente, C.; Husev, O.; Matiushkin, O.; Barrero-González, F.; González-Romera, E. Reinforcement Learning-based Energy Management Strategy for Flexible Hybrid ac/dc Microgrid. In Proceedings of the IECON 2024-50th Annual Conference of the IEEE Industrial Electronics Society, Chicago, IL, USA, 3–6 November 2024; pp. 1–6. [Google Scholar] [CrossRef]
- Xue, W.; Jia, N.; Zhao, M. Multi-agent deep reinforcement learning based HVAC control for multi-zone buildings considering zone-energy-allocation optimization. Energy Build. 2025, 329, 115241. [Google Scholar] [CrossRef]
- Sabahi, K.; Jamil, M.; Shokri-Kalandaragh, Y.; Tavan, M.; Arya, Y. Deep deterministic policy gradient reinforcement learning based adaptive PID load frequency control of an AC micro-grid. IEEE Can. J. Electr. Comput. Eng. 2024, 47, 15–21. [Google Scholar] [CrossRef]
- Xiong, B.; Zhang, L.; Hu, Y.; Fang, F.; Liu, Q.; Cheng, L. Deep reinforcement learning for optimal microgrid energy management with renewable energy and electric vehicle integration. Appl. Soft Comput. 2025, 176, 113180. [Google Scholar] [CrossRef]
- Zhang, Z.; Shi, J.; Yang, W.; Song, Z.; Chen, Z.; Lin, D. Deep reinforcement learning based Bi-layer optimal scheduling for microgrids considering flexible load control. CSEE J. Power Energy Syst. 2022, 9, 949–962. [Google Scholar] [CrossRef]
- Lee, S.; Seon, J.; Sun, Y.G.; Kim, S.H.; Kyeong, C.; Kim, D.I.; Kim, J.Y. Novel architecture of energy management systems based on deep reinforcement learning in microgrid. IEEE Trans. Smart Grid 2023, 15, 1646–1658. [Google Scholar] [CrossRef]
- Huang, B.; Wang, J. Deep-reinforcement-learning-based capacity scheduling for PV-battery storage system. IEEE Trans. Smart Grid 2020, 12, 2272–2283. [Google Scholar] [CrossRef]
- Hu, C.; Cai, Z.; Zhang, Y.; Yan, R.; Cai, Y.; Cen, B. A soft actor–critic deep reinforcement learning method for multi-timescale coordinated operation of microgrids. Prot. Control Mod. Power Syst. 2022, 7, 29. [Google Scholar] [CrossRef]
- Du, W.; Huang, X.; Zhu, Y.; Wang, L.; Deng, W. Deep reinforcement learning for adaptive frequency control of island microgrid considering control performance and economy. Front. Energy Res. 2024, 12, 1361869. [Google Scholar] [CrossRef]
- Sepehrzad, R.; Langeroudi, A.S.G.; Khodadadi, A.; Adinehpour, S.; Al-Durra, A.; Anvari-Moghaddam, A. An applied deep reinforcement learning approach to control active networked microgrids in smart cities with multi-level participation of battery energy storage system and electric vehicles. Sustain. Cities Soc. 2024, 107, 105352. [Google Scholar] [CrossRef]
- Sang, J.; Sun, H.; Kou, L. Deep reinforcement learning microgrid optimization strategy considering priority flexible demand side. Sensors 2022, 22, 2256. [Google Scholar] [CrossRef]
- Jones, G.; Li, X.; Sun, Y. Robust energy management policies for solar microgrids via reinforcement learning. Energies 2024, 17, 2821. [Google Scholar] [CrossRef]
- Hosseini, E.; Horrillo-Quintero, P.; Carrasco-Gonzalez, D.; Garcia-Trivino, P.; Sarrias-Mena, R.; Garcia-Vazquez, C.A.; Fernandez-Ramirez, L.M. Reinforcement learning-based energy management system for lithium-ion battery storage in multilevel microgrid. J. Energy Storage 2025, 109, 115114. [Google Scholar] [CrossRef]
- Harrold, D.J.; Cao, J.; Fan, Z. Renewable energy integration and microgrid energy trading using multi-agent deep reinforcement learning. Appl. Energy 2022, 318, 119151. [Google Scholar] [CrossRef]
- Fan, Z.; Zhang, W.; Liu, W. Multi-agent deep reinforcement learning-based distributed optimal generation control of DC microgrids. IEEE Trans. Smart Grid 2023, 14, 3337–3351. [Google Scholar] [CrossRef]
- Liu, Y.; Qie, T.; Yu, Y.; Wang, Y.; Chau, T.K.; Zhang, X.; Manandhar, U.; Li, S.; Iu, H.H.; Fernando, T. A novel integral reinforcement learning-based control method assisted by twin delayed deep deterministic policy gradient for solid oxide fuel cell in DC microgrid. IEEE Trans. Sustain. Energy 2022, 14, 688–703. [Google Scholar] [CrossRef]
- Cui, Y.; Xu, Y.; Li, Y.; Wang, Y.; Zou, X. Deep reinforcement learning based optimal energy management of multi-energy microgrids with uncertainties. CSEE J. Power Energy Syst. 2024. Available online: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10609308 (accessed on 20 January 2026).
- Rajamallaiah, A.; Karri, S.P.K.; Shankar, Y.R. Deep reinforcement learning based control strategy for voltage regulation of DC-DC Buck converter feeding CPLs in DC microgrid. IEEE Access 2024, 12, 17419–17430. [Google Scholar] [CrossRef]
- Stavrev, S.; Ginchev, D. Reinforcement learning techniques in optimizing energy systems. Electronics 2024, 13, 1459. [Google Scholar] [CrossRef]
- Zhou, X.; Xue, S.; Du, H.; Ma, Z. Optimization of building demand flexibility using reinforcement learning and rule-based expert systems. Appl. Energy 2023, 350, 121792. [Google Scholar] [CrossRef]
- Al-Saadi, M.; Al-Greer, M.; Short, M. Reinforcement learning-based intelligent control strategies for optimal power management in advanced power distribution systems: A survey. Energies 2023, 16, 1608. [Google Scholar] [CrossRef]
- Duan, J.; Yi, Z.; Shi, D.; Lin, C.; Lu, X.; Wang, Z. Reinforcement-learning-based optimal control of hybrid energy storage systems in hybrid AC–DC microgrids. IEEE Trans. Ind. Inform. 2019, 15, 5355–5364. [Google Scholar] [CrossRef]
- Dey, S.; Marzullo, T.; Henze, G. Inverse reinforcement learning control for building energy management. Energy Build. 2023, 286, 112941. [Google Scholar] [CrossRef]
- Zhang, H.; Seal, S.; Wu, D.; Bouffard, F.; Boulet, B. Building energy management with reinforcement learning and model predictive control: A survey. IEEE Access 2022, 10, 27853–27862. [Google Scholar] [CrossRef]
- Wang, X.; Wang, P.; Huang, R.; Zhu, X.; Arroyo, J.; Li, N. Safe deep reinforcement learning for building energy management. Appl. Energy 2025, 377, 124328. [Google Scholar] [CrossRef]
- Dey, S.; Henze, G.P. Reinforcement Learning Building Control: An Online Approach with Guided Exploration Using Surrogate Models. ASME J. Eng. Sustain. Build. Cities 2024, 5, 011005. [Google Scholar] [CrossRef]
- Qin, Y.; Ke, J.; Wang, B.; Filaretov, G.F. Energy optimization for regional buildings based on distributed reinforcement learning. Sustain. Cities Soc. 2022, 78, 103625. [Google Scholar] [CrossRef]
- Vamvakas, D.; Michailidis, P.; Korkas, C.; Kosmatopoulos, E. Review and evaluation of reinforcement learning frameworks on smart grid applications. Energies 2023, 16, 5326. [Google Scholar] [CrossRef]
- Li, Z.; Sun, Z.; Meng, Q.; Wang, Y.; Li, Y. Reinforcement learning of room temperature set-point of thermal storage air-conditioning system with demand response. Energy Build. 2022, 259, 111903. [Google Scholar] [CrossRef]
- Shaqour, A.; Hagishima, A. Systematic review on deep reinforcement learning-based energy management for different building types. Energies 2022, 15, 8663. [Google Scholar] [CrossRef]
- Nagy, Z.; Henze, G.; Dey, S.; Arroyo, J.; Helsen, L.; Zhang, X.; Chen, B.; Amasyali, K.; Kurte, K.; Zamzam, A.; et al. Ten questions concerning reinforcement learning for building energy management. Build. Environ. 2023, 241, 110435. [Google Scholar] [CrossRef]
- Zhang, T.; Sun, M.; Qiu, D.; Zhang, X.; Strbac, G.; Kang, C. A Bayesian deep reinforcement learning-based resilient control for multi-energy micro-gird. IEEE Trans. Power Syst. 2023, 38, 5057–5072. [Google Scholar] [CrossRef]
- Zhang, S.; Jia, R.; Pan, H.; Cao, Y. A safe reinforcement learning-based charging strategy for electric vehicles in residential microgrid. Appl. Energy 2023, 348, 121490. [Google Scholar] [CrossRef]
- Liu, X.; Liu, X.; Jiang, Y.; Zhang, T.; Hao, B. Photovoltaics and energy storage integrated flexible direct current distribution systems of buildings: Definition, technology review, and application. CSEE J. Power Energy Syst. 2022, 9, 829–845. [Google Scholar] [CrossRef]
- Ukoba, K.; Olatunji, K.O.; Adeoye, E.; Jen, T.C.; Madyira, D.M. Optimizing renewable energy systems through artificial intelligence: Review and future prospects. Energy Environ. 2024, 35, 3833–3879. [Google Scholar] [CrossRef]
- Li, J.; Cheng, Y. Deep meta-reinforcement learning-based data-driven active fault tolerance load frequency control for islanded microgrids considering Internet of Things. IEEE Internet Things J. 2023, 11, 10295–10303. [Google Scholar] [CrossRef]
- Tomin, N.; Zhukov, A.; Domyshev, A. Deep reinforcement learning for energy microgrids management considering flexible energy sources. In Proceedings of the EPJ Web of Conferences; EDP Sciences: Les Ulis Cedex, France, 2019; Volume 217, p. 01016. [Google Scholar] [CrossRef]
- Zhao, J.; Li, F.; Mukherjee, S.; Sticht, C. Deep reinforcement learning-based model-free on-line dynamic multi-microgrid formation to enhance resilience. IEEE Trans. Smart Grid 2022, 13, 2557–2567. [Google Scholar] [CrossRef]
- Lu, Y.; Xiang, Y.; Huang, Y.; Yu, B.; Weng, L.; Liu, J. Deep reinforcement learning based optimal scheduling of active distribution system considering distributed generation, energy storage and flexible load. Energy 2023, 271, 127087. [Google Scholar] [CrossRef]
- Dai, X.; Chen, R.; Guan, S.; Li, W.T.; Yuen, C. BuildingGym: An open-source toolbox for AI-based building energy management using reinforcement learning. In Proceedings of the Building Simulation; Springer: Berlin/Heidelberg, Germany, 2025; Volume 18, pp. 1909–1927. [Google Scholar] [CrossRef]
- Khalafian, F.; Iliaee, N.; Diakina, E.; Parsa, P.; Alhaider, M.M.; Masali, M.H.; Pirouzi, S.; Zhu, M. Capabilities of compressed air energy storage in the economic design of renewable off-grid system to supply electricity and heat costumers and smart charging-based electric vehicles. J. Energy Storage 2024, 78, 109888. [Google Scholar] [CrossRef]
- Du, Y.; Wu, D. Deep reinforcement learning from demonstrations to assist service restoration in islanded microgrids. IEEE Trans. Sustain. Energy 2022, 13, 1062–1072. [Google Scholar] [CrossRef]
Table 1. Comparison of AC microgrids, DC microgrids, and PEDF systems under high renewable penetration in buildings.

| System Type | Advantages | Limitations | Technical Drivers for Evolution | Adaptability to High Renewable Penetration in Buildings |
|---|---|---|---|---|
| AC Microgrid | Mature protection, strong legacy compatibility; suitable for traditional loads. | Requires repeated AC/DC conversions; synchronization overhead; higher harmonic distortion. | Increasing penetration of DC-native loads and distributed PV; rising conversion losses. | Medium—effective but limited efficiency in high-PV scenarios. |
| DC Microgrid | Fewer conversions; lower harmonics; natural integration with PV/BESS. | Limited flexibility resources; weak support for demand-side interaction; control standardization lacking. | Proliferation of DC appliances, EVs, and DC chargers; local PV–storage demand. | High—efficient renewable utilization but insufficient flexibility. |
| PEDF System | Unified PV–storage–DC flexible control; high renewable consumption; supports demand response. | Higher EMS complexity; requires unified DC interface standards. | Building-level carbon neutrality; intelligent demand response; flexibility-centric system coordination. | Very high—supports deep decarbonization and flexible operation. |

Table 2. Typical PEDF system architecture: subsystems, key components, primary functions, and control interfaces.

| Subsystem | Key Components | Primary Functions | Control Interfaces | Notes |
|---|---|---|---|---|
| PV Generation Subsystem | PV arrays, unidirectional DC/DC converters with MPPT | Solar energy harvesting and controlled injection into the shared DC backbone | MPPT algorithms, converter duty-cycle control | Power output is intermittent and strongly dependent on irradiance and temperature |
| Energy Storage Subsystem | Electrochemical batteries, bidirectional DC/DC converters | Energy buffering, DC voltage stabilization, peak shaving, and flexibility provision | SOC regulation, charge–discharge scheduling, voltage support | Buck/Boost-based topologies enable fast bidirectional power exchange |
| Grid Interface Subsystem | Bidirectional AC/DC inverter, protection, and synchronization units | Bidirectional grid interaction, islanded operation, and ancillary service support | Voltage–frequency control, grid current regulation | Forms the external coupling point between the DC microgrid and the utility grid |
| DC Bus and Flexible Load Subsystem | Multi-level DC bus (48/220/375/750 V), lighting, ICT loads, EV chargers | DC power distribution, flexible demand integration, and local power routing | Load scheduling, demand response signals, DC voltage coordination | Flexible loads (e.g., HVAC) are grouped based on EMS-level controllability and flexibility abstraction, independent of their physical AC or DC electrical interfaces. |
| Energy Management System (EMS) | Supervisory controller, sensing, communication, and monitoring modules | System-wide coordination of generation, storage, loads, and grid power flows | Set-point dispatch, optimization, and learning-based control interfaces | Provides the interface between physical assets and advanced data-driven control strategies |
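
The EMS row above exposes "learning-based control interfaces"; a common way to realize them is to wrap the PEDF plant as an RL environment. The following minimal sketch, assuming the Gymnasium API and toy PV, load, and tariff profiles, shows one possible state-action-reward formulation for battery dispatch. The class name, profiles, and numeric limits are illustrative assumptions, not a formulation from any specific surveyed study.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class PEDFEnv(gym.Env):
    """Minimal PEDF microgrid environment: the agent sets battery power each hour."""

    def __init__(self, batt_kwh=100.0, p_max_kw=50.0):
        self.batt_kwh, self.p_max_kw = batt_kwh, p_max_kw
        # Observation: [hour/24, PV power, load power, SOC, price].
        self.observation_space = spaces.Box(low=0.0, high=np.inf, shape=(5,), dtype=np.float32)
        # Action: normalized battery power in [-1, 1] (negative = charge).
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.soc = 0, 0.5
        return self._obs(), {}

    def _profiles(self, t):
        hour = t % 24
        pv = max(0.0, 40.0 * np.sin(np.pi * (hour - 6) / 12))  # toy irradiance curve
        load = 30.0 + 20.0 * (9 <= hour <= 20)                 # toy occupancy profile
        price = 0.3 if 9 <= hour <= 20 else 0.1                # toy time-of-use tariff
        return hour, pv, load, price

    def _obs(self):
        hour, pv, load, price = self._profiles(self.t)
        return np.array([hour / 24, pv, load, self.soc, price], dtype=np.float32)

    def step(self, action):
        hour, pv, load, price = self._profiles(self.t)
        p_batt = float(action[0]) * self.p_max_kw              # + discharge, - charge
        # Enforce SOC limits by clipping feasible battery power (1 h control step).
        p_batt = float(np.clip(p_batt, -(0.9 - self.soc) * self.batt_kwh,
                               (self.soc - 0.1) * self.batt_kwh))
        self.soc -= p_batt / self.batt_kwh
        grid = load - pv - p_batt                              # > 0 means grid import
        reward = -price * max(grid, 0.0)                       # minimize energy cost
        self.t += 1
        return self._obs(), reward, self.t >= 24, False, {}
```
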

Table 3. Representative RL algorithms for PEDF energy management.

| Algorithm | Policy Type | Action Space | Key Characteristics | Typical Applications |
|---|---|---|---|---|
| DQN [64,65,66,67] | Off-policy | Discrete | Value-based learning with deep Q approximation | Mode selection, generator on/off scheduling, basic demand response |
| DDPG [12,66,68,69] | Off-policy | Continuous | Deterministic actor–critic for continuous control | Battery charge–discharge control, inverter power regulation |
| PPO [70,71,72,73] | On-policy | Discrete/Continuous | Stable policy updates via clipped surrogate objective | Real-time EMS, adaptive load coordination |
| SAC [74,75,76] | Off-policy | Continuous | Entropy-regularized stochastic policy | PV–storage coordination, DC bus voltage regulation |
| A2C/A3C [77,78] | On-policy | Discrete/Continuous | Parallel actor–critic learning | Distributed EMS and multi-agent control |
| TD3 [79,80,81,82,83,84] | Off-policy | Continuous | Twin critics mitigate value overestimation | Optimal power flow and storage control in AC/DC microgrids |
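
As a usage illustration of the algorithm families in Table 3, the sketch below trains a soft actor-critic (SAC) agent on the illustrative PEDFEnv class defined earlier. The choice of the stable-baselines3 library is an assumption of this sketch (any Gymnasium-compatible continuous-action environment and SAC implementation would do); it does not reproduce the workflow of any specific surveyed study.

```python
# Assumes the stable-baselines3 library and the PEDFEnv sketch defined above.
from stable_baselines3 import SAC

env = PEDFEnv()  # continuous actions, so SAC/DDPG/TD3/PPO apply; DQN would need discretization

# Entropy-regularized off-policy learning (see the SAC row in Table 3).
model = SAC("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)

# Roll out the learned deterministic policy for one simulated day.
obs, _ = env.reset()
done, total_reward = False, 0.0
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, _, _ = env.step(action)
    total_reward += reward
print(f"one-day return (negative energy cost): {total_reward:.2f}")
```
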

Table 4. Key challenges for RL-based control of PEDF systems.

| Challenge Domain | System Context | RL Limitations | Required Advances | Open Gaps |
|---|---|---|---|---|
| Safety-Critical Exploration | Voltage, SOC, thermal, and power quality constraints | Unsafe trial-and-error; no formal guarantees | Safe RL, constrained learning, hybrid control | Real-time safety certification under fast dynamics |
| Sample Efficiency | Limited safe operational data | Long training cycles and high data demand | Offline RL, model-based RL, meta-learning | Lack of validated PEDF datasets |
| Forecast Uncertainty | Stochastic PV, prices, EVs, and loads | Policy brittleness under prediction errors | Uncertainty-aware states and rewards | No standard uncertainty modeling framework |
| Distributed Coordination | Multi-building and energy community operation | Non-stationarity and communication overhead | CTDE-based MARL, scalable coordination | Scalability and privacy constraints |
| Real-Time Deployment | Fast control, regulation, and cybersecurity | Black-box policies and cyber vulnerability | Explainable, lightweight, secure RL | Certification and trustworthiness |
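
For the safety-critical exploration row, one widely used hybrid-control pattern is an action-projection safety filter placed between the RL policy and the power converters, so that every commanded action is clipped to the feasible set before execution. The sketch below is a minimal illustration; all constraint limits (SOC window, bus-voltage band, converter rating) are placeholder values, not certified settings.

```python
import numpy as np

def safety_filter(p_batt_kw, soc, v_bus, p_max_kw=50.0,
                  soc_min=0.1, soc_max=0.9, v_min=355.0, v_max=395.0,
                  dt_h=1.0, batt_kwh=100.0):
    """Project an RL action onto the feasible set before it reaches the converters.

    Positive power = discharge. All numeric limits are illustrative placeholders.
    """
    # 1) Converter power rating.
    p = float(np.clip(p_batt_kw, -p_max_kw, p_max_kw))
    # 2) SOC bounds over the next control interval.
    p = min(p, (soc - soc_min) * batt_kwh / dt_h)    # cannot over-discharge
    p = max(p, -(soc_max - soc) * batt_kwh / dt_h)   # cannot over-charge
    # 3) Bus-voltage support on a nominal 375 V DC bus.
    if v_bus > v_max:        # surplus on the bus: block additional injection
        p = min(p, 0.0)
    elif v_bus < v_min:      # deficit on the bus: block additional absorption
        p = max(p, 0.0)
    return p

# Example: an aggressive 80 kW discharge request at low SOC is cut to the
# 10 kW that the SOC window actually permits.
print(safety_filter(p_batt_kw=80.0, soc=0.2, v_bus=375.0))
```
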

Table 5. Future research directions for RL-driven PEDF control.

| Direction | Core Focus | Key Outcome | Time Horizon | Remarks |
|---|---|---|---|---|
| Safety-by-Design RL | Constrained learning and safety filters | Certified control under physical limits | Short-term | Fundamental requirement for deployment |
| Sample-Efficient RL | Offline, model-based, and uncertainty-aware RL | Reduced data demand and improved robustness | Medium-term | Addresses limited real-world data |
| Hierarchical & Multi-Agent RL | Layered control and CTDE-based coordination | Scalable operation across assets and buildings | Medium-term | Key enabler for PEDF clusters |
| Real-Time Integration | Lightweight policies and EMS compatibility | Reliable online execution | Medium-term | Focus on latency and computation limits |
| Standardization & Benchmarking | DC interfaces, datasets, and policy alignment | Interoperable and reproducible deployment | Long-term | Requires socio-technical coordination |
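
For the hierarchical and multi-agent direction, centralized training with decentralized execution (CTDE) is the coordination pattern named in Tables 4 and 5. The following skeleton shows the structural idea only: per-building actors act on local observations, while a centralized critic, used only during training, sees the joint observation-action vector. The use of PyTorch and all dimensions are assumptions of this sketch, not prescriptions from the surveyed papers.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, N_AGENTS = 5, 1, 3  # e.g., three PEDF buildings (illustrative)

class Actor(nn.Module):
    """Per-building policy: local observation -> local action (runs decentralized)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Training-time critic: joint observations and actions -> joint value."""
    def __init__(self):
        super().__init__()
        joint_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(nn.Linear(joint_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

# Forward pass for a batch of 8 joint samples: each actor sees only its own
# observation slice, while the critic consumes the concatenated joint vector.
obs = torch.randn(8, N_AGENTS, OBS_DIM)
acts = torch.stack([a(obs[:, i]) for i, a in enumerate(actors)], dim=1)
q = critic(obs.flatten(1), acts.flatten(1))
print(q.shape)  # torch.Size([8, 1])
```
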
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.