Comparative Analysis of Reinforcement Learning Approaches for Multi-Objective Optimization in Residential Hybrid Energy Systems
Abstract
1. Introduction
1.1. Background
1.2. Related Work
1.3. Contributions
- Using measured data from a zero energy house, we proposed a novel multi-objective optimization algorithm for residential hybrid energy system operation, formulated as an MDP with a high-dimensional action space, and evaluated the optimization performance of several DRL algorithms, including TD3, DDPG, SAC, and PPO.
- Regarding system constraints, we developed a new multi-objective reward function that enforces the optimization goals, namely reducing system energy costs and increasing the PV self-consumption ratio; a minimal sketch of such a reward follows this list. Furthermore, we refined the environment model and reward function by incorporating expert experience, which enhanced data utilization and improved the model's adaptability to small-sample data.
- All cases in this study used simulated dynamic COP and RTP as experimental conditions, and we additionally tested the DRL models under a floating FiT scenario. These findings offer practical insights into integrating dynamic pricing mechanisms and renewable energy incentives into real energy systems.
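The list above refers to a reward that jointly targets energy cost and PV self-consumption under system constraints. As a minimal sketch of what such a reward could look like, assuming half-hourly time slots; the weights `w_cost`, `w_pv`, and `penalty` are illustrative assumptions, not values from the paper:

```python
def reward(grid_import_kwh, grid_export_kwh, buy_price, sell_price,
           pv_used_kwh, pv_gen_kwh, constraint_violated,
           w_cost=1.0, w_pv=0.5, penalty=10.0):
    """Multi-objective reward sketch: negative energy cost plus a PV
    self-consumption bonus, minus a constraint-violation penalty.
    All weights are illustrative assumptions, not the paper's values."""
    # Net cost of the slot: purchases at RTP minus export revenue at FiT.
    cost = grid_import_kwh * buy_price - grid_export_kwh * sell_price
    # Share of this slot's PV generation consumed on site.
    scr = pv_used_kwh / pv_gen_kwh if pv_gen_kwh > 0 else 0.0
    r = -w_cost * cost + w_pv * scr
    if constraint_violated:  # e.g., tank overflow or battery SoC bound breach
        r -= penalty
    return r
```

In the paper's formulation, the reward and environment model are further refined with expert experience, as noted above.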
2. Model Formulation
2.1. Energy Management Optimization Model
2.1.1. Objective Function
2.1.2. Energy Balance Constraints
2.1.3. Battery Constraints
2.1.4. Heat Pump Constraints
2.2. Markov Decision Process
2.2.1. State Space
- Energy features: latent pattern analysis of the data (see Section 4.1) showed that residential users' PV generation, electricity demand, heat demand, and electricity prices exhibit periodicity in their time series. To help the agent capture these patterns and learn the underlying rules, we designed a sliding time window of 24 steps (12 h); a state-construction sketch follows this list.
- Time series features: the time of day and the day of the month.
- Environmental features: outdoor temperature and illumination.
- Episode step: the position of the present time slot within the optimization window.
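A minimal sketch of assembling this state, assuming half-hourly NumPy series (hence 48 slots per day); the function name and the normalizations are assumptions, while the 24-step (12 h) window follows the description above:

```python
import numpy as np

WINDOW = 24  # sliding window of 24 half-hour steps = 12 h

def build_state(t, pv, load, heat, price, temp_out, illum, episode_step):
    """Concatenate windowed energy features with scalar time, environment,
    and episode-progress features (illustrative sketch)."""
    w = slice(t - WINDOW, t)
    energy = np.concatenate([pv[w], load[w], heat[w], price[w]])
    time_feats = np.array([(t % 48) / 48.0,           # time of day
                           ((t // 48) % 31) / 31.0])  # day of month
    env_feats = np.array([temp_out[t], illum[t]])
    step_feat = np.array([episode_step / 48.0])       # slot position in window
    return np.concatenate([energy, time_feats, env_feats,
                           step_feat]).astype(np.float32)
```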
2.2.2. Action Space
2.2.3. Reward Function
3. Algorithm
3.1. The Selection of the Algorithms
3.2. Deep Deterministic Policy Gradient (DDPG)
3.3. Twin Delayed Deep Deterministic Policy Gradient (TD3)
3.4. Soft Actor-Critic (SAC) Method
4. Case Study
4.1. Data Source
4.2. Environment Setup
- For the battery, the agent controls the charging and discharging power. When PV generation exceeds the combined electricity and thermal demand and the battery has available charging capacity, the system prioritizes storing the surplus PV in the battery; any remainder is then exported to the public grid for profit. Conversely, when PV generation falls short of the electricity and thermal demands and the battery has available discharge capacity, the system prioritizes discharging the battery, and any remaining shortfall is purchased from the public grid. If neither condition is met, the battery remains idle.
- For the heat pump, the agent controls its power. When the heat pump output exceeds the thermal demand, the excess heat is stored in the hot water tank if the tank is below its maximum thermal capacity; otherwise, the excess hot water is discarded and the agent is penalized for exceeding the limit. When the heat pump output is below the thermal demand, the tank releases hot water if the stored heat can cover the remaining demand; if it cannot, the electric water heater is activated to supply the shortfall, and the agent again incurs a penalty for exceeding the limit. Both priority rules are summarized in the sketch after this list.
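A simplified restatement of these two priority rules in code; this is a readability sketch, not the authors' implementation, and the variable names, the half-hour step `dt_h`, and the omission of charge/discharge efficiencies are assumptions:

```python
def battery_rule(pv_kw, elec_demand_kw, soc, soc_min, soc_max, p_max_kw):
    """Battery priority rule: charge from PV surplus, discharge to cover
    a shortfall, otherwise stay idle. Returns signed battery power (kW)."""
    surplus = pv_kw - elec_demand_kw
    if surplus > 0 and soc < soc_max:
        return min(surplus, p_max_kw)    # charge; any remainder is exported
    if surplus < 0 and soc > soc_min:
        return -min(-surplus, p_max_kw)  # discharge; any shortfall is imported
    return 0.0                           # neither condition met: stay idle

def heat_pump_rule(hp_kw, heat_demand_kw, tank_kwh, tank_max_kwh, dt_h=0.5):
    """Heat pump / hot water tank priority rule (sketch).
    Returns (new tank level in kWh, limit-violation flag)."""
    surplus_kwh = (hp_kw - heat_demand_kw) * dt_h
    if surplus_kwh >= 0:
        if tank_kwh + surplus_kwh <= tank_max_kwh:
            return tank_kwh + surplus_kwh, False  # store the excess heat
        return tank_max_kwh, True                 # overflow discarded: penalize
    if tank_kwh + surplus_kwh >= 0:
        return tank_kwh + surplus_kwh, False      # tank covers the shortfall
    return 0.0, True                              # water heater tops up: penalize
```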
4.3. Model Setup
- M.0: It serves as the baseline, reflecting the real-world operation of the target house. The HEMS currently deployed there uses rule-based control: the battery and the heat pump operate at full power without real-time power control, following the constraints outlined in Section 4.2. The heat pump's schedule is based on the user's actual usage, with fixed full-power operation from 4 a.m. to 7 a.m. daily to refill the hot water tank; during the remaining hours, the heat pump meets thermal demand through real-time heating.
- M.1: It utilizes PPO as the optimization approach, representing an on-policy DRL method for comparison.
- M.2: It adopts SAC as the optimization approach.
- M.3: It adopts DDPG as the optimization approach.
- M.4: It adopts TD3 as the optimization approach. Notably, M.3 and M.4 use identical hyperparameters for comparative purposes; a configuration sketch for M.1–M.4 follows this list.
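Given the hyperparameters tabulated below (learning rate 2 × 10⁻⁴, batch size 256, replay capacity 10⁶, discount factor 0.99), M.1–M.4 could be configured in Stable-Baselines3 roughly as follows. This is a plausible sketch, not the authors' training script; `make_hybrid_energy_env` is a hypothetical constructor for the Gym-style environment of Section 4.2:

```python
import torch
from stable_baselines3 import PPO, SAC, DDPG, TD3

env = make_hybrid_energy_env()  # hypothetical Gym-style env (Section 4.2)

common = dict(learning_rate=2e-4, batch_size=256, gamma=0.99, verbose=0)
off_policy = dict(buffer_size=1_000_000, **common)
relu_net = dict(net_arch=[256, 256], activation_fn=torch.nn.ReLU)

m1 = PPO("MlpPolicy", env, gae_lambda=0.95,
         policy_kwargs=dict(net_arch=[64, 64], activation_fn=torch.nn.Tanh),
         **common)
m2 = SAC("MlpPolicy", env, tau=5e-3, policy_kwargs=relu_net, **off_policy)
m3 = DDPG("MlpPolicy", env, policy_kwargs=relu_net, **off_policy)
m4 = TD3("MlpPolicy", env, policy_delay=2, policy_kwargs=relu_net, **off_policy)
```

Each model is then trained with `model.learn(total_timesteps=...)`; keeping M.3 and M.4 on identical settings isolates the effect of TD3's twin critics and delayed policy updates relative to DDPG.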
4.4. Experimental Setup
5. Results and Discussion
5.1. Training Process Analysis
5.2. Performance Evaluation
5.2.1. Energy Cost Optimization
5.2.2. PV Self-Consumption Ratio Optimization
5.2.3. Comparison of Operation Strategy
5.2.4. Effects of FiT on Optimization
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Algorithm | DQN | DDQN | SAC | A3C | DDPG | PPO | TD3 |
---|---|---|---|---|---|---|---|
Category | Value-based | Value-based | Actor-critic | Actor-critic | Actor-critic | Actor-critic | Actor-critic |
Data Utilization | Off-policy | Off-policy | Off-policy | On-policy | Off-policy | On-policy | Off-policy |
Action Space | Discrete | Discrete | Continuous | Discrete/Continuous | Continuous | Discrete/Continuous | Continuous |
Parameter | Value
---|---
Power dissipation rate | 0.01
Power charging efficiency | 0.95
Power discharging efficiency | 0.95
Energy storage capacity of the battery | 6.0 kWh
Peak charging power rate | 0.75 kW
Peak discharging power rate | 0.75 kW
Minimum battery SoC | 0.20
Maximum battery SoC | 0.90
Thermal energy dissipation rate | 0.20
Charging efficiency of thermal energy | 0.90
Discharging efficiency of thermal energy | 0.90
Thermal energy storage capacity | 20.0 kWh
Minimum thermal energy generation | 0 kW
Maximum thermal energy generation | 3.0 kW
Maximum thermal energy input | 3.0 kW
Minimum thermal energy input | 0 kW
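As a worked illustration of how these parameters enter the storage dynamics, a common discrete-time battery update (an assumed formulation, not an equation quoted from the paper) is:

```latex
% Battery energy update per step \Delta t, using the table values
% \sigma = 0.01, \eta_{ch} = \eta_{dis} = 0.95:
E_{t+1} = (1 - \sigma)\,E_t
        + \eta_{\mathrm{ch}}\, P_{\mathrm{ch},t}\,\Delta t
        - \frac{P_{\mathrm{dis},t}\,\Delta t}{\eta_{\mathrm{dis}}},
\qquad 0.20 \le \frac{E_t}{6.0\ \mathrm{kWh}} \le 0.90
```

The hot water tank would follow the same form with σ = 0.20, η = 0.90, and a 20.0 kWh capacity.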
Category | M.1 (PPO) | M.2 (SAC) | M.3 (DDPG) | M.4 (TD3)
---|---|---|---|---
Activation function | Tanh | ReLU | ReLU | ReLU
Learning rate | 2 × 10⁻⁴ | 2 × 10⁻⁴ | 2 × 10⁻⁴ | 2 × 10⁻⁴
Batch size | 256 | 256 | 256 | 256
Replay memory capacity | 10⁶ | 10⁶ | 10⁶ | 10⁶
Discount factor | 0.99 | 0.99 | 0.99 | 0.99
Hidden layer dimensions | 64 | 256 | 256 | 256
Trace-decay parameter | 0.95 | None | None | None
Polyak averaging | None | 5 × 10⁻³ | None | None
Delay steps in TD3 | None | None | None | 2
Month | Metric | Baseline | PPO | SAC | DDPG | TD3
---|---|---|---|---|---|---
January | Cost (JPY) | 23,003.61 | 21,253.36 | 20,775.69 | 20,737.13 | 20,591.87
January | vs. Baseline | | 7.61% | 9.69% | 9.85% | 10.48%
April | Cost (JPY) | 14,565.58 | 12,474.97 | 12,670.92 | 12,164.27 | 12,370.42
April | vs. Baseline | | 14.35% | 13.01% | 16.49% | 15.07%
July | Cost (JPY) | 10,201.38 | 9,074.35 | 8,709.36 | 8,531.93 | 8,219.69
July | vs. Baseline | | 11.05% | 14.63% | 16.37% | 19.43%
Total | Cost (JPY) | 47,770.57 | 42,802.68 | 42,155.97 | 41,433.32 | 41,181.98
Total | vs. Baseline | | 10.40% | 11.75% | 13.27% | 13.79%
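For reference, the "vs. Baseline" figures follow the standard relative-savings formula; taking the January TD3 entry as a worked check:

```latex
\mathrm{Saving} = \frac{C_{\mathrm{baseline}} - C_{\mathrm{DRL}}}{C_{\mathrm{baseline}}}
= \frac{23{,}003.61 - 20{,}591.87}{23{,}003.61} \approx 10.48\%
```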