DR-RQL: A Sustainable Demand Response-Based Learning System for Energy Scheduling and Battery Health Estimation
Abstract
1. Introduction
- An energy management system is proposed for a microgrid scenario that includes a renewable energy generator, an ESS, and bidirectional energy flow with the main grid. Unlike previous works, a battery health monitoring module is designed to calculate the ESS degradation cost.
- A novel DR-based reinforcement learning scheme is proposed for the above energy scheduling problem. To overcome future price and demand uncertainty in the presence of rapidly updated RTPs, an LSTM is adopted to forecast the unknown electricity price and demand.
- To improve profits through adequate exploration, a random greedy strategy-based Q-learning variant (RQL) is proposed to derive the optimized ESS control policy. RQL is model-free, enabling the SP to determine control actions in real time without knowledge of the system dynamics.
- An examination of the SP’s operating profit against three representative energy-scheduling baselines reveals that the proposed algorithm improves profit by 5.04–17.31% over the alternatives, indicating that the DR-based RQL algorithm enables the SP to earn more profit while keeping the ESS healthy.
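The forecasting contribution above feeds windowed histories of price and demand into an LSTM. As an illustrative sketch only (the paper's LSTM architecture and window length are not reproduced here, so the 48-slot lookback, i.e., one day of half-hour slots, is an assumption), the sliding-window input construction such a module needs might look like:

```python
import numpy as np

def make_windows(series, lookback):
    """Build (X, y) pairs: each X row holds `lookback` past values, y the next value."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

prices = np.sin(np.linspace(0, 6 * np.pi, 200))  # toy RTP curve, 200 half-hour slots
X, y = make_windows(prices, lookback=48)         # 48 half-hour slots = one day
print(X.shape, y.shape)                          # (152, 48) (152,)
```

Each row of `X` would then be fed to the forecaster to predict the next slot's price; the same construction applies to the demand series.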
2. Related Work
3. System Model
3.1. Renewable Energy Generator
3.2. Service Provider
3.3. Battery Dynamic Model
3.4. Battery Health Model
3.5. Pricing Model
3.6. Objective Function
4. RL-Based Demand Response Energy Scheduling Algorithm
4.1. MDP Reformulation
4.1.1. State
4.1.2. Action
4.1.3. Reward
4.2. Demand Response Forecasting with LSTM
4.3. Decision Making with Random Q-Learning
| Algorithm 1 RQL-Based DR Energy Scheduling Algorithm for Microgrid EMS |
| Require: , W, , , , the parameter for RQL |
| Ensure: , , the optimal Q-value table |
|
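Algorithm 1's internals are not reproduced above, but its core is a tabular Q-learning update driven by an ε-greedy action rule. The following is a minimal generic sketch using α = 0.001, γ = 0.95, and ε = 0.1 from the paper's parameter table; the segmented modified ε-greedy of (31) is not reproduced, plain ε-greedy stands in for it, and the state/action-space sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 10, 3
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.001, 0.95, 0.1   # values from the paper's parameter table

def select_action(s):
    # epsilon-greedy: explore with probability eps, otherwise exploit current Q
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def q_update(s, a, r, s_next):
    # standard tabular Q-learning target: r + gamma * max_a' Q(s', a')
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
```

In the paper's setting the state would encode the forecasted DR signals and the battery SoE, and the action would be the charge/discharge decision.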
5. Case Study
5.1. Experiment Setup
5.2. Performance of DR Forecasting Model
- BiLSTM [52]: Bi-directional LSTM (BiLSTM) is a modified version of LSTM. In this work, the input layer of BiLSTM contains one forward layer with two LSTM neurons and one backward layer with two LSTM neurons. The remaining structure is the same as the proposed LSTM module.
- ARIMA [53]: The Auto-Regressive Integrated Moving Average (ARIMA) model is a widely used method for analyzing time-series data. In this work, the auto-regression order is set at p = 50, the moving-average order is q = 1, and the order of differencing is d = 5.
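For orientation, the ARIMA baseline's two main ingredients, differencing and autoregression, can be sketched in plain NumPy. This is not the paper's implementation (which uses p = 50, d = 5, q = 1, and would normally rely on a library such as statsmodels); the tiny orders below are for illustration only:

```python
import numpy as np

def difference(series, d):
    # apply d-th order differencing (the "I" in ARIMA)
    for _ in range(d):
        series = np.diff(series)
    return series

def fit_ar(series, p):
    # least-squares fit of an AR(p) model: x_t ~ sum_i phi_i * x_{t-p+i-1}
    X = np.stack([series[i:len(series) - p + i] for i in range(p)])
    y = series[p:]
    phi, *_ = np.linalg.lstsq(X.T, y, rcond=None)
    return phi
```

Fitting `fit_ar` on a series generated by x_t = 0.5 x_{t-1} recovers a coefficient of 0.5, which is the sanity check one would run before scaling up the orders.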
5.3. Performance Evaluation for DR-RQL
- RQL: This scheme utilizes the segmented modified ε-greedy strategy in (31) to update the Q-value table, trained on the summer-pattern dataset, i.e., 1–27 June 2019, for energy scheduling.
- QL [54]: This scheme is the vanilla version of the proposed method, utilizing the original ε-greedy strategy in (30) for action selection and trained on the same dataset as RQL.
- Myopic [55]: This scheme focuses on the current return and ignores future impact: the EMS prefers to empty the battery storage and never recharges it, maximizing only the immediate reward.
- PSO [56]: Particle Swarm Optimization (PSO) is a heuristic optimization scheme widely used in microgrid scenarios for energy management. In this work, 100 random particles are generated in the action space to evaluate the objective function P1. During each iteration, every particle adjusts its velocity and position to search for the optimal solution. The maximum number of iterations is set at 1500, the inertia weight is w = 0.9, and both learning factors are set to 2.
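The PSO baseline described above can be sketched as follows, with the stated w = 0.9, 100 particles, and both learning factors equal to 2. A sphere function stands in for the actual EMS objective P1 (not reproduced here), and the iteration count is shortened for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
w, c1, c2 = 0.9, 2.0, 2.0               # inertia weight and learning factors from the paper
n_particles, dim, iters = 100, 2, 200   # iteration count reduced from the paper's 1500

def objective(x):
    # stand-in for the EMS objective P1 (here: minimize the sphere function)
    return np.sum(x ** 2, axis=-1)

pos = rng.uniform(-5, 5, (n_particles, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()                      # each particle's best-seen position
pbest_val = objective(pbest)
gbest = pbest[np.argmin(pbest_val)].copy()
init_best = pbest_val.min()

for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, dim))
    # velocity update: inertia + cognitive pull (pbest) + social pull (gbest)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    val = objective(pos)
    better = val < pbest_val
    pbest[better], pbest_val[better] = pos[better], val[better]
    gbest = pbest[np.argmin(pbest_val)].copy()
```

By construction the global best is monotonically non-worsening, which is the property the EMS comparison relies on.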
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Diagnostic Experiment with LSTM Prediction Module

Appendix B. Performance Evaluation with Error Bars

Appendix C. Parameter Experiments

References
- Zhang, H.; Zhang, G.; Zhao, M.; Liu, Y. Load Forecasting-Based Learning System for Energy Management With Battery Degradation Estimation: A Deep Reinforcement Learning Approach. IEEE Trans. Consum. Electron. 2024, 70, 2342–2352. [Google Scholar] [CrossRef]
- Paulsamy, K.; Karuvelam, S. Modeling and Design of a Grid-Tied Renewable Energy System Exploiting Re-Lift Luo Converter and RNN Based Energy Management. Sustainability 2025, 17, 187. [Google Scholar] [CrossRef]
- Wang, B.; Zha, Z.; Zhang, L.; Liu, L.; Fan, H. Deep Reinforcement Learning-Based Security-Constrained Battery Scheduling in Home Energy System. IEEE Trans. Consum. Electron. 2024, 70, 3548–3561. [Google Scholar] [CrossRef]
- Tan, C.; Liu, H.; Chen, L.; Wang, J.; Chen, X.; Wang, G. Characteristic analysis and model predictive-improved active disturbance rejection control of direct-drive electro-hydrostatic actuators. Expert Syst. Appl. 2026, 301, 130565. [Google Scholar] [CrossRef]
- Liu, H.; Zhou, S.; Gu, W.; Zhuang, W.; Gao, M.; Chan, C.C.; Zhang, X. Coordinated planning model for multi-regional ammonia industries leveraging hydrogen supply chain and power grid integration: A case study of Shandong. Appl. Energy 2025, 377, 124456. [Google Scholar] [CrossRef]
- Angelis, G.F.; Timplalexis, C.; Salamanis, A.I.; Krinidis, S.; Ioannidis, D.; Kehagias, D.; Tzovaras, D. Energformer: A New Transformer Model for Energy Disaggregation. IEEE Trans. Consum. Electron. 2023, 69, 308–320. [Google Scholar] [CrossRef]
- Bharatee, A.; Ray, P.K.; Ghosh, A. Hardware Design for Implementation of Energy Management in a Solar-Interfaced DC Microgrid. IEEE Trans. Consum. Electron. 2023, 69, 343–352. [Google Scholar] [CrossRef]
- Behera, P.K.; Pattnaik, M. Coordinated Power Management of a Laboratory Scale Wind Energy Assisted LVDC Microgrid With Hybrid Energy Storage System. IEEE Trans. Consum. Electron. 2023, 69, 467–477. [Google Scholar] [CrossRef]
- Yi, Y.; Zhang, G.; Jiang, H. Online Digital Twin-Empowered Content Resale Mechanism in Age of Information-Aware Edge Caching Networks. IEEE Trans. Commun. 2025, 73, 4990–5004. [Google Scholar] [CrossRef]
- Cui, Z.; Deng, K.; Zhang, H.; Zha, Z.; Jobaer, S. Deep Reinforcement Learning-Based Multi-Agent System with Advanced Actor–Critic Framework for Complex Environment. Mathematics 2025, 13, 754. [Google Scholar] [CrossRef]
- Das, D.; Singh, B.; Mishra, S. Grid Interactive Solar PV and Battery Operated Air Conditioning System: Energy Management and Power Quality Improvement. IEEE Trans. Consum. Electron. 2023, 69, 109–117. [Google Scholar] [CrossRef]
- Becchi, L.; Belloni, E.; Bindi, M.; Intravaia, M.; Grasso, F.; Lozito, G.M.; Piccirilli, M.C. A computationally efficient rule-based scheduling algorithm for battery energy storage systems. Sustainability 2024, 16, 10313. [Google Scholar] [CrossRef]
- Yi, Y.; Zhang, G.; Jiang, H. Mobile Edge Computing Networks: Online Low-Latency and Fresh Service Provisioning. IEEE Trans. Commun. 2025, 73, 11463–11479. [Google Scholar] [CrossRef]
- Song, L.; Hu, X.; Zhang, G.; Spachos, P.; Plataniotis, K.; Wu, H. Networking Systems of AI: On the Convergence of Computing and Communications. IEEE Internet Things J. 2022, 9, 20352–20381. [Google Scholar] [CrossRef]
- Harrold, D.J.; Cao, J.; Fan, Z. Renewable energy integration and microgrid energy trading using multi-agent deep reinforcement learning. Appl. Energy 2022, 318, 119151. [Google Scholar] [CrossRef]
- Kim, J.; Oh, H.; Choi, J.K. Learning based cost optimal energy management model for campus microgrid systems. Appl. Energy 2022, 311, 118630. [Google Scholar] [CrossRef]
- Dai, R.; Esmaeilbeigi, R.; Charkhgard, H. The utilization of shared energy storage in energy systems: A comprehensive review. IEEE Trans. Smart Grid 2021, 12, 3163–3174. [Google Scholar] [CrossRef]
- Gianvincenzi, M.; Marconi, M.; Mosconi, E.M.; Favi, C.; Tola, F. Systematic review of battery life cycle management: A framework for European regulation compliance. Sustainability 2024, 16, 10026. [Google Scholar] [CrossRef]
- Jo, J.; Park, J. Demand-side management with shared energy storage system in smart grid. IEEE Trans. Smart Grid 2020, 11, 4466–4476. [Google Scholar] [CrossRef]
- Luo, Y.; Hao, H.; Yang, D.; Yin, Z.; Zhou, B. Optimal Operation Strategy of Combined Heat and Power System Considering Demand Response and Household Thermal Inertia. IEEE Trans. Consum. Electron. 2023, 69, 366–376. [Google Scholar] [CrossRef]
- Tong, X.; Ma, D.; Wang, R.; Xie, X.; Zhang, H. Dynamic Event-Triggered-Based Integral Reinforcement Learning Algorithm for Frequency Control of Microgrid With Stochastic Uncertainty. IEEE Trans. Consum. Electron. 2023, 69, 321–330. [Google Scholar] [CrossRef]
- Nawaz, A.; Zhou, M.; Wu, J.; Long, C. A comprehensive review on energy management, demand response, and coordination schemes utilization in multi-microgrids network. Appl. Energy 2022, 323, 119596. [Google Scholar] [CrossRef]
- Gupta, J.; Singh, B. A Cost Effective High Power Factor General Purpose Battery Charger for Electric Two-Wheelers and Three Wheelers. IEEE Trans. Consum. Electron. 2023, 69, 1114–1123. [Google Scholar] [CrossRef]
- Razmjoo, A.; Ghazanfari, A.; Østergaard, P.A.; Jahangiri, M.; Sumper, A.; Ahmadzadeh, S.; Eslamipoor, R. Moving toward the expansion of energy storage systems in renewable energy systems—A techno-institutional investigation with artificial intelligence consideration. Sustainability 2024, 16, 9926. [Google Scholar]
- Kumar, N.; Saxena, V.; Singh, B.; Panigrahi, B.K. Power Quality Improved Grid-Interfaced PV-Assisted Onboard EV Charging Infrastructure for Smart Households Consumers. IEEE Trans. Consum. Electron. 2023, 69, 1091–1100. [Google Scholar] [CrossRef]
- Zhang, G.; Shen, Z.; Wang, L. Online energy management for microgrids with CHP co-generation and energy storage. IEEE Trans. Control Syst. Technol. 2018, 28, 533–541. [Google Scholar]
- Shen, Z.; Wu, C.; Wang, L.; Zhang, G. Real-time energy management for microgrid with EV station and CHP generation. IEEE Trans. Netw. Sci. Eng. 2021, 8, 1492–1501. [Google Scholar] [CrossRef]
- Guo, C.; Wang, X.; Zheng, Y.; Zhang, F. Real-time optimal energy management of microgrid with uncertainties based on deep reinforcement learning. Energy 2022, 238, 121873. [Google Scholar] [CrossRef]
- Xu, B.; Oudalov, A.; Ulbig, A.; Andersson, G.; Kirschen, D.S. Modeling of lithium-ion battery degradation for cell life assessment. IEEE Trans. Smart Grid 2016, 9, 1131–1140. [Google Scholar] [CrossRef]
- Shi, Y.; Xu, B.; Tan, Y.; Kirschen, D.; Zhang, B. Optimal battery control under cycle aging mechanisms in pay for performance settings. IEEE Trans. Autom. Control 2018, 64, 2324–2339. [Google Scholar] [CrossRef]
- Xu, B. Dynamic valuation of battery lifetime. IEEE Trans. Power Syst. 2021, 37, 2177–2186. [Google Scholar] [CrossRef]
- Yu, M.; Hong, S.H.; Ding, Y.; Ye, X. An incentive-based demand response (DR) model considering composited DR resources. IEEE Trans. Ind. Electron. 2018, 66, 1488–1498. [Google Scholar] [CrossRef]
- Huang, J.; Zhang, H.; Zhao, M.; Wu, Z.; Liu, Y. Instance-Aware Visual Language Grounding for Consumer Robot Navigation. IEEE Trans. Consum. Electron. 2025; Early Access. [Google Scholar] [CrossRef]
- Shao, C.; Ding, Y.; Siano, P.; Song, Y. Optimal scheduling of the integrated electricity and natural gas systems considering the integrated demand response of energy hubs. IEEE Syst. J. 2020, 15, 4545–4553. [Google Scholar] [CrossRef]
- Li, H.; Wan, Z.; He, H. Real-Time Residential Demand Response. IEEE Trans. Smart Grid 2020, 11, 4144–4154. [Google Scholar] [CrossRef]
- Li, H.; He, H. Learning to Operate Distribution Networks With Safe Deep Reinforcement Learning. IEEE Trans. Smart Grid 2022, 13, 1860–1872. [Google Scholar] [CrossRef]
- Lu, R.; Hong, S.H.; Yu, M. Demand response for home energy management using reinforcement learning and artificial neural network. IEEE Trans. Smart Grid 2019, 10, 6629–6639. [Google Scholar] [CrossRef]
- Lu, R.; Hong, S.H. Incentive-based demand response for smart grid with reinforcement learning and deep neural network. Appl. Energy 2019, 236, 937–949. [Google Scholar] [CrossRef]
- Ruelens, F.; Claessens, B.J.; Quaiyum, S.; De Schutter, B.; Babuška, R.; Belmans, R. Reinforcement learning applied to an electric water heater: From theory to practice. IEEE Trans. Smart Grid 2016, 9, 3792–3800. [Google Scholar] [CrossRef]
- Ruelens, F.; Claessens, B.J.; Vandael, S.; De Schutter, B.; Babuška, R.; Belmans, R. Residential demand response of thermostatically controlled loads using batch reinforcement learning. IEEE Trans. Smart Grid 2016, 8, 2149–2159. [Google Scholar] [CrossRef]
- Li, L.L.; Lou, J.L.; Tseng, M.L.; Lim, M.K.; Tan, R.R. A hybrid dynamic economic environmental dispatch model for balancing operating costs and pollutant emissions in renewable energy: A novel improved mayfly algorithm. Expert Syst. Appl. 2022, 203, 117411. [Google Scholar] [CrossRef]
- Li, T.; Dong, M. Residential energy storage management with bidirectional energy control. IEEE Trans. Smart Grid 2018, 10, 3596–3611. [Google Scholar] [CrossRef]
- Alzahrani, A.; Sajjad, K.; Hafeez, G.; Murawwat, S.; Khan, S.; Khan, F.A. Real-time energy optimization and scheduling of buildings integrated with renewable microgrid. Appl. Energy 2023, 335, 120640. [Google Scholar] [CrossRef]
- Su, H.; Feng, D.; Zhou, Y.; Hao, X.; Yi, Y.; Li, K. Impact of uncertainty on optimal battery operation for price arbitrage and peak shaving: From perspectives of analytical solutions and examples. J. Energy Storage 2023, 62, 106909. [Google Scholar] [CrossRef]
- Nguyen, T.T.; Nguyen, T.T.; Le, B. Artificial ecosystem optimization for optimizing of position and operational power of battery energy storage system on the distribution network considering distributed generations. Expert Syst. Appl. 2022, 208, 118127. [Google Scholar] [CrossRef]
- Ma, L.; Hu, C.; Cheng, F. State of charge and state of energy estimation for lithium-ion batteries based on a long short-term memory neural network. J. Energy Storage 2021, 37, 102440. [Google Scholar] [CrossRef]
- Guo, C.; Xu, Y.; Deng, N.; Huang, X. Efficient degradation of organophosphorus pesticides and in situ phosphate recovery via NiFe-LDH activated peroxymonosulfate. Chem. Eng. J. 2025, 524, 169107. [Google Scholar] [CrossRef]
- Shen, Z.; Zhang, G. Two-Timescale Mobile User Association and Hybrid Generator On/Off Control for Green Cellular Networks With Energy Storage. IEEE Trans. Veh. Technol. 2022, 71, 11047–11059. [Google Scholar] [CrossRef]
- Zhang, L.; Zhang, Y.; Lu, J.; Xiao, Y.; Zhang, G. Deep Reinforcement Learning Based Trajectory Design for Customized UAV-Aided NOMA Data Collection. IEEE Wirel. Commun. Lett. 2024, 13, 3365–3369. [Google Scholar] [CrossRef]
- Zhang, Y.; Lu, J.; Zhang, H.; Huang, Z.; Briso-Rodríguez, C.; Zhang, L. Experimental study on low-altitude UAV-to-ground propagation characteristics in campus environment. Comput. Netw. 2023, 237, 110055. [Google Scholar] [CrossRef]
- Zha, Z.; Wang, B.; Tang, X. Evaluate, explain, and explore the state more exactly: An improved Actor-Critic algorithm for complex environment. Neural Comput. Appl. 2021, 35, 12271–12282. [Google Scholar] [CrossRef]
- Liu, B.; Yu, Z.; Wang, Q.; Du, P.; Zhang, X. Prediction of SSE Shanghai Enterprises index based on bidirectional LSTM model of air pollutants. Expert Syst. Appl. 2022, 204, 117600. [Google Scholar] [CrossRef]
- Lee, C.M.; Ko, C.N. Short-term load forecasting using lifting scheme and ARIMA models. Expert Syst. Appl. 2011, 38, 5902–5911. [Google Scholar] [CrossRef]
- Ding, D.; Fan, X.; Zhao, Y.; Kang, K.; Yin, Q.; Zeng, J. Q-learning based dynamic task scheduling for energy-efficient cloud computing. Future Gener. Comput. Syst. 2020, 108, 361–371. [Google Scholar] [CrossRef]
- Chen, M.; Shen, Z.; Wang, L.; Zhang, G. Intelligent energy scheduling in renewable integrated microgrid with bidirectional electricity-to-hydrogen conversion. IEEE Trans. Netw. Sci. Eng. 2022, 9, 2212–2223. [Google Scholar] [CrossRef]
- Hossain, M.A.; Pota, H.R.; Squartini, S.; Abdou, A.F. Modified PSO algorithm for real-time energy management in grid-connected microgrids. Renew. Energy 2019, 136, 746–757. [Google Scholar] [CrossRef]






| Symbol | Definition |
|---|---|
|  | Energy demand at t |
|  | Renewable generation at t |
|  | Portion of renewable energy serving demand at t |
|  | Portion of renewable energy sold to grid at t |
|  | Energy-purchasing price from conventional grid at t |
|  | Energy-selling price to the customer at t |
|  | Charging energy stored into battery at t |
|  | Discharging energy from battery serving demand at t |
|  | State of energy of battery at t |
|  | Cycle depth of battery at t |
|  | State of health of battery at t |
|  | Battery degradation cost at t |
|  | Energy purchased from conventional grid at t |
|  | EMS operation profit at t |
|  | Charging efficiency of battery |
|  | Discharging efficiency of battery |
|  | Energy capacity of battery |
|  | Battery pack replacement cost per kWh |
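The quantities in the table combine into a per-slot operating profit. Since the paper's objective function (Section 3.6) is not reproduced here, the following is only a hedged reconstruction with hypothetical symbol names: $\lambda^{\mathrm{s}}_t$ the selling price, $\lambda^{\mathrm{b}}_t$ the purchasing price, $d_t$ the demand, $g_t$ the grid purchase, and $C^{\mathrm{deg}}_t$ the battery degradation cost.

```latex
% Hedged reconstruction, not the paper's exact equation; all symbol names are hypothetical.
% Per-slot profit: selling revenue minus grid-purchase cost minus battery degradation cost.
\[
  U_t = \lambda^{\mathrm{s}}_t\, d_t \;-\; \lambda^{\mathrm{b}}_t\, g_t \;-\; C^{\mathrm{deg}}_t .
\]
```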
| Parameter | Description | Value |
|---|---|---|
|  | The lower bound of battery SoE | 100 kWh |
|  | The upper bound of battery SoE | 2000 kWh |
|  | Battery pack replacement cost | 500 $/kWh |
| t | Time slot | 0.5 h |
|  | Learning rate | 0.001 |
|  | Discount factor | 0.95 |
|  | Greedy value | 0.1 |
|  | Training episodes | 5000 |
|  | Episodes for exploration | 400 |
| Model | | | Computing Time (min) |
|---|---|---|---|
| LSTM | 2.34 | 8.69 | 6.5 |
| BiLSTM | 2.59 | 8.47 | 11.2 |
| ARIMA | 8.90 | 12.94 | 10.7 |
Share and Cite
Deng, K.; Zhang, H.; Cui, Z.; Zha, Z.; Gao, S.; Yan, S.; Hua, Y.; Liu, X.; Xu, S.; Wei, F.; et al. DR-RQL: A Sustainable Demand Response-Based Learning System for Energy Scheduling and Battery Health Estimation. Sustainability 2025, 17, 10970. https://doi.org/10.3390/su172410970

