Design and Improvement of SD3-Based Energy Management Strategy for a Hybrid Electric Urban Bus
Abstract
1. Introduction
- The energy management strategy (EMS) literature currently lacks studies of the softmax deep double deterministic policy gradients (SD3) algorithm, and, to the authors’ knowledge, this is pioneering work on an SD3-based strategy;
- To prevent the SD3-based strategy from outputting torque assignments that violate the physical limits of the powertrain system during stochastic exploration, an action masking method that is consistent with the algorithmic concept is proposed. This work can move deep reinforcement learning (DRL)-based strategies toward engineering applications and inspire improvements to other DRL algorithms;
- The possibility of using transfer learning (TL) methods to accelerate SD3 learning is investigated. Specifically, part of the prior knowledge of an already converged strategy is transferred to a new driving cycle to initialize the new strategy, which avoids a cold start of the new strategy in the new environment.
2. Powertrain Modeling and Energy Management Problem
2.1. Vehicle Dynamics Model
2.2. Power Units Model
2.3. Energy Management Problem
3. Methods and Design of SD3-Based EMS
3.1. Preliminary Formulation of SD3-Based EMS
3.1.1. Brief Review of SD3
| Algorithm 1: SD3 |
|---|
| Initialize critic networks $Q_1$, $Q_2$ and actor networks $\pi_1$, $\pi_2$ with random parameters $\theta_1$, $\theta_2$, $\phi_1$, $\phi_2$ |
| Initialize target networks $\theta_1' \leftarrow \theta_1$, $\theta_2' \leftarrow \theta_2$, $\phi_1' \leftarrow \phi_1$, and $\phi_2' \leftarrow \phi_2$ |
| Initialize replay buffer $\mathcal{B}$ |
| for episode = 1 to E do |
| for t = 1 to T do |
| Observe state s and select action a with exploration noise according to the dual actors |
| Execute action a, observe reward r, next state s’, and done d |
| Store transition tuple (s, a, r, s’, d) in $\mathcal{B}$ |
| for i = 1, 2 do |
| Randomly sample a mini-batch of N transitions from $\mathcal{B}$ |
| Update the critic $\theta_i$ according to the Bellman loss |
| Update the actor $\phi_i$ by policy gradient |
| Update the target networks $\theta_i'$ and $\phi_i'$ |
| end for |
| end for |
| end for |
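
The critic update in Algorithm 1 bootstraps from a softmax-weighted value of the next state rather than from a single next action. The following is a minimal sketch of that target computation, assuming the importance-sampled softmax operator and the clipped double-critic minimum used by SD3; the function names, the sampled values, and the densities are hypothetical and chosen only for illustration.

```python
import math


def softmax_value(q_values, densities, beta):
    """Importance-sampled softmax of Q over K sampled actions.

    q_values[k]  : Q estimate for the k-th sampled action
    densities[k] : sampling density p(a_k) of that action
    beta         : inverse temperature (0 gives an importance-weighted
                   mean, large values approach the max)
    """
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - q_max)) / p
               for q, p in zip(q_values, densities)]
    return sum(w * q for w, q in zip(weights, q_values)) / sum(weights)


def sd3_target(reward, done, gamma, beta, next_q1, next_q2, densities):
    """Bootstrapped target y = r + gamma * (1 - d) * softmax(min(Q1', Q2'))."""
    q_hat = [min(a, b) for a, b in zip(next_q1, next_q2)]
    return reward + gamma * (1.0 - done) * softmax_value(q_hat, densities, beta)


# Hypothetical target-critic outputs for K = 3 sampled next actions.
y = sd3_target(reward=1.0, done=0.0, gamma=0.99, beta=0.0,
               next_q1=[1.0, 2.0, 3.0], next_q2=[3.0, 2.0, 1.0],
               densities=[1.0, 1.0, 1.0])
print(y)
```

With `beta = 0` and uniform densities the operator reduces to the arithmetic mean of the sampled values, and as `beta` grows it approaches their maximum, so `beta` trades off between averaging and greedy bootstrapping.
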
3.1.2. Reward, Observation, Action, and Parameters Setting
3.2. Tips for Improving SD3-Based Strategy
3.2.1. Action Masking Technology
- At each time step t, the feasible range of the ICE power is first calculated in three steps: (1) the action space is discretized into a set of candidate actions; (2) the candidates are traversed and the corresponding powertrain operating points are calculated from the dynamics of the powertrain system (similar to ECMS); and (3) the feasible range is obtained from the candidates that satisfy the physical limits;
- Then, the ICE power output by the actor network of the SD3-based strategy is restricted to this feasible range by a clip operation. Because the clip operation does not change the original action space, it does not affect the policy.
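
The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the feasibility check is a hypothetical toy stand-in for the powertrain model (the motor must cover the gap between demanded power and ICE power), the demanded power is made up, and the feasible candidates are assumed to form a single interval so that a minimum and maximum describe them. The 160 kW and 147 kW limits are the ICE and MG2 maximum powers from the parameter table.

```python
def discretize(low, high, n):
    """Return n evenly spaced candidate actions in [low, high]."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def feasible_range(candidates, is_feasible):
    """Traverse the candidates and return (min, max) of the feasible ones."""
    feasible = [p for p in candidates if is_feasible(p)]
    if not feasible:
        raise ValueError("no feasible action at this time step")
    return min(feasible), max(feasible)


def mask_action(raw_action, bounds):
    """Clip the actor output into the feasible range."""
    low, high = bounds
    return min(max(raw_action, low), high)


# Toy stand-in for the powertrain model (hypothetical): the motor must
# cover the gap between demanded power and ICE power within its limit.
P_DEMAND_KW = 200.0      # hypothetical demanded power
P_MOTOR_MAX_KW = 147.0   # MG2 maximum power from the parameter table


def within_motor_limit(p_ice_kw):
    return abs(P_DEMAND_KW - p_ice_kw) <= P_MOTOR_MAX_KW


candidates = discretize(0.0, 160.0, 17)  # 0, 10, ..., 160 kW
bounds = feasible_range(candidates, within_motor_limit)
print(bounds, mask_action(20.0, bounds))
```

An actor output that already lies inside the range passes through unchanged, so the mask only intervenes when exploration proposes an infeasible ICE power.
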
3.2.2. Transfer Learning Technology
- First, the parameters of the actor and critic networks of the SD3-based strategy that has sufficiently converged on the source driving cycle are extracted;
- Then, the extracted parameters are used to initialize the corresponding networks of the SD3-based strategy on the new driving cycle, and the input and intermediate layers of the networks are frozen;
- Finally, the output layers of the networks are randomly initialized, and the networks are fine-tuned with a small amount of training.
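
The three steps above can be sketched without any deep learning framework by treating a network as a dictionary of per-layer weight lists. The layer names, the initialization range, and the plain gradient step below are assumptions made for illustration and are not taken from the paper.

```python
import copy
import random


def transfer(source_params, output_layer, rng):
    """Copy converged parameters, re-initialize the output layer randomly,
    and report which layers must stay frozen during fine-tuning."""
    target = copy.deepcopy(source_params)
    target[output_layer] = [rng.uniform(-0.1, 0.1)
                            for _ in source_params[output_layer]]
    frozen = {name for name in target if name != output_layer}
    return target, frozen


def fine_tune_step(params, grads, frozen, lr):
    """Apply one gradient step, skipping every frozen layer."""
    for name, grad in grads.items():
        if name in frozen:
            continue
        params[name] = [w - lr * g for w, g in zip(params[name], grad)]


# Hypothetical converged actor from the source driving cycle.
source_actor = {"input": [1.0, 2.0], "hidden": [3.0], "output": [4.0, 5.0]}
target_actor, frozen_layers = transfer(source_actor, "output", random.Random(0))

before = list(target_actor["output"])
grads = {"input": [1.0, 1.0], "hidden": [1.0], "output": [1.0, 1.0]}
fine_tune_step(target_actor, grads, frozen_layers, lr=0.1)
print(frozen_layers, before, target_actor["output"])
```

Only the output layer moves during fine-tuning, which is why a small amount of training on the new cycle can be enough: the frozen layers already encode the features learned on the source cycle.
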
4. Results and Discussion
4.1. Performance of SD3-Based Strategy
4.1.1. Convergence Performance
4.1.2. Control Performance
4.2. Impact of Action Masking
4.3. Impact of Transfer Learning on SD3-Based Strategy
5. Conclusions
- The proposed AM technique effectively filters invalid actions without affecting the learning performance or stability of SD3. Under the CHTC-B and WVUCITY cycles, SD3 with AM converges faster than SD3 without AM, and its fuel economy reaches at least 98.94% of that of DP.
- The TL technique considerably accelerates the learning of SD3. Under the CHTC-B and WVUCITY cycles, the learning time of SD3 with TL is at least 67.61% shorter than that of SD3 without TL. Moreover, TL has almost no impact on the final control performance and economic performance of SD3.
Author Contributions
Funding
Acknowledgments
Conflicts of Interest




















| Component | Parameter (Unit) | Value |
|---|---|---|
| Vehicle | Vehicle curb mass (kg) | 12,780 |
| | Wheel radius (m) | 0.464 |
| | Frontal area (m²) | 7.5 |
| | Wind resistance coefficient | 0.52 |
| | Rolling resistance coefficient | 0.0076 |
| ICE | Maximum power (kW) | 160 |
| | Maximum torque (Nm) | 851 |
| | Maximum speed (rpm) | 2300 |
| MG1/MG2 | Maximum power (kW) | 71.2/147 |
| | Maximum torque (Nm) | 400/700 |
| | Maximum speed (rpm) | 5000/8500 |
| Transmission final drive | PG1/PG2 characteristic parameter | 2.11/2.11 |
| | Final drive | 4.85 |
| Battery | Nominal voltage (V) | 432 |
| | Nominal capacity (Ah) | 52 |


| Parameters | Value | Parameters | Value |
|---|---|---|---|
| Critic learning rate | $1 \times 10^{-4}$ | Soft update factor $\tau$ | 0.005 |
| Actor learning rate | $5 \times 10^{-5}$ | Experience buffer size | $1 \times 10^{6}$ |
| Discount factor $\gamma$ | 0.99 | Mini-batch size | 128 |

| Driving Cycle | Mean | Variance | 
|---|---|---|
| CHTC-B | 0.5998 | 0.0017 | 
| WVUCITY | 0.5989 | 0.0016 | 

| Driving Cycle | Method | Terminal SOC | Fuel Consumption (L/100 km) | Relative Increase (%) |
|---|---|---|---|---|
| CHTC-B | DP | 0.5997 | 15.11 | - |
| | SD3 | 0.5998 | 15.27 | 1.06 |
| WVUCITY | DP | 0.5994 | 15.39 | - |
| | SD3 | 0.5991 | 15.49 | 0.65 |

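
The relative increase values in this table can be reproduced from the fuel consumption figures. The short check below assumes that the column is defined as the fuel consumption of SD3 relative to DP, and that "fuel economy as a percentage of DP" means 100 minus that increase; both definitions are assumptions, since the table itself does not state them.

```python
def relative_increase(fuel, fuel_dp):
    """Percentage increase in fuel consumption relative to the DP benchmark."""
    return (fuel - fuel_dp) / fuel_dp * 100.0


# (DP, SD3) fuel consumption in L/100 km, taken from the table above.
rows = {"CHTC-B": (15.11, 15.27), "WVUCITY": (15.39, 15.49)}

for cycle, (dp, sd3) in rows.items():
    inc = relative_increase(sd3, dp)
    print(f"{cycle}: +{inc:.2f}% vs DP, {100.0 - inc:.2f}% of DP")
```
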

| Driving Cycle | Method | Terminal SOC | Fuel Consumption (L/100 km) | Relative Increase (%) |
|---|---|---|---|---|
| CHTC-B | DP-AM | 0.5997 | 14.87 | - |
| | SD3-AM | 0.5998 | 14.97 | 0.67 |
| WVUCITY | DP-AM | 0.5994 | 14.95 | - |
| | SD3-AM | 0.5991 | 15.03 | 0.54 |


| Driving Cycle | Method | Terminal SOC | Fuel Consumption (L/100 km) | Relative Increase (%) |
|---|---|---|---|---|
| CHTC-B | DP | 0.5997 | 15.11 | - |
| | SD3 | 0.5998 | 15.27 | 1.06 |
| | SD3+TL | 0.5996 | 15.18 | 0.05 |
| WVUCITY | DP | 0.5994 | 15.39 | - |
| | SD3 | 0.5991 | 15.49 | 0.65 |
| | SD3+TL | 0.6003 | 15.61 | 1.43 |

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, K.; Yang, R.; Zhou, Y.; Huang, W.; Zhang, S. Design and Improvement of SD3-Based Energy Management Strategy for a Hybrid Electric Urban Bus. Energies 2022, 15, 5878. https://doi.org/10.3390/en15165878