# Enhancing Energy Management Strategies for Extended-Range Electric Vehicles through Deep Q-Learning and Continuous State Representation


## Abstract


## 1. Introduction

- Proposes a pioneering strategy by combining the DQL algorithm with the AMSGrad optimization method (DQL-AMSGrad) to enhance the energy management of EREVs.
- Effectively addresses the “curse of dimensionality” associated with discrete state variables in EREV environments, presenting an innovative solution to efficiently manage these variables by combining DQL with AMSGrad.
- Highlights the ability of the DQL-AMSGrad strategy to adapt continuously to changing environmental conditions, improving real-time energy management efficiency and sustainability.
- Addresses performance gaps, such as Q-value overestimation, instabilities, and difficulties in parameter tuning, by integrating AMSGrad, improving convergence speed and the effectiveness of neural network weight updates associated with DQL.
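To make the continuous-state idea concrete, here is a minimal sketch (our illustration, not the authors' exact network): the continuous EREV state (SOC, generator speed $n_g$, power demand $P_{dem}$) is fed directly to a small Q-network that outputs one value per discrete action, so no discretized state table is needed and its exponential growth with resolution is avoided. The action count and the example state values are hypothetical; the 64 hidden neurons match the configuration reported later.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 3   # [SOC, n_g, P_dem] -- the continuous state, no discretization
N_ACTIONS = 5   # hypothetical discrete generator power commands
HIDDEN = 64     # hidden-layer width, as in the configuration table

# Randomly initialized two-layer network; training would adjust these weights.
W1 = rng.standard_normal((STATE_DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, N_ACTIONS)) * 0.1
b2 = np.zeros(N_ACTIONS)

def q_values(state):
    """Q(s, .): continuous state in, one Q-value per discrete action out."""
    h = np.maximum(0.0, state @ W1 + b1)   # ReLU hidden layer
    return h @ W2 + b2

state = np.array([0.65, 2400.0, 35.0])     # SOC [-], n_g [rpm], P_dem [kW]
q = q_values(state)
greedy_action = int(np.argmax(q))          # greedy choice over the actions
```

A tabular Q-function over the same three variables at, say, 100 levels each would need 10^6 rows per action; the network replaces that table with a fixed number of weights.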

## 2. Methodology

#### 2.1. Deep Reinforcement Learning-Based Energy Management Strategy

#### 2.1.1. Deep Reinforcement Learning Feature

#### 2.1.2. Neural Network Structure and AMSGrad Optimization Method

#### 2.1.3. Structure of the DQL Algorithm

#### 2.2. Proposed Method (DQL-AMSGrad)

**Algorithm 1.** Pseudocode for energy management optimization with DQL-AMSGrad.

**Initialization**

- $\omega_1$: initial weights of the neural network.
- $Q$: neural network that evaluates the policy in deep reinforcement learning.
- $\widehat{Q}$: target neural network in DQL.
- $\epsilon$: exploration rate in DQL.
- $\gamma$: discount factor in DQL.
- $\alpha$: learning rate in DQL.
- $\{\theta_t\}_{t=1}^{N}$: learning rates in AMSGrad.
- $\{\mu_{1t}\}_{t=1}^{N}$, $\{\mu_{2t}\}_{t=1}^{N}$: momentum parameters for $m_t$ and $v_t$ in AMSGrad.
- $B$: replay buffer that stores experiences in DQL.
- $M$: total number of episodes in DQL.
- $T$: maximum number of steps per episode in DQL.

**AMSGrad**

for $t = 1$ to $N$ do
  $g_t = \nabla L_t(\omega_t)$
  $m_t = \mu_{1t} m_{t-1} + (1 - \mu_{1t}) g_t$
  $v_t = \mu_{2t} v_{t-1} + (1 - \mu_{2t}) g_t^2$
  $\widehat{v}_t = \max(\widehat{v}_{t-1}, v_t)$
  $\omega_{t+1} = \omega_t - \dfrac{\theta_t m_t}{\sqrt{\widehat{v}_t}}$
end for
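The AMSGrad routine above can be written directly in code. This is a minimal NumPy sketch under our own simplifications (constant $\theta_t$, $\mu_{1t}$, $\mu_{2t}$, and a small eps guarding the division), verified on a simple quadratic loss:

```python
import numpy as np

def amsgrad(grad, w0, theta=0.1, mu1=0.9, mu2=0.999, n_steps=1000, eps=1e-8):
    """AMSGrad as in Algorithm 1: m_t and v_t are exponential moving averages
    of the gradient and its square; v_hat_t = max(v_hat_{t-1}, v_t) keeps the
    effective step size non-increasing, which is AMSGrad's fix to Adam."""
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    v_hat = np.zeros_like(w)
    for _ in range(n_steps):
        g = grad(w)
        m = mu1 * m + (1 - mu1) * g
        v = mu2 * v + (1 - mu2) * g**2
        v_hat = np.maximum(v_hat, v)
        w = w - theta * m / (np.sqrt(v_hat) + eps)  # eps avoids division by 0
    return w

# Sanity check on the convex loss L(w) = ||w - 3||^2, whose gradient is 2(w - 3):
w_star = amsgrad(lambda w: 2.0 * (w - 3.0), w0=np.zeros(2))
```

On this loss the iterates settle at the minimizer $w = 3$, illustrating the stable convergence behavior the integration with DQL relies on.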

**DQL**

for Episode $= 1$ to $M$ do
  Reset the initial state $s_1 = (\mathrm{SOC}^1, n_g^1, P_{\mathrm{dem}}^1)$
  for $t = 1$ to $T$ do
    $a_t \leftarrow \epsilon\text{-greedy}(s_t, Q)$
    Execute $a_t$; observe $s_{t+1}$ and $r_t$
    Store the tuple $(s_t, a_t, r_t, s_{t+1})$ in replay buffer $B$
    Sample a random batch of $(s_j, a_j, r_j, s_{j+1})$ from $B$
    if $s_{j+1}$ is terminal then
      Set $y_j = r_j$
    else
      Set $y_j = r_j + \gamma \max_{a_{j+1}} \widehat{Q}(s_{j+1}, a_{j+1}, \omega')$
    end if
    Calculate the loss function $L(\omega) = \mathrm{E}\left[\left(y_j - Q(s_j, a_j, \omega)\right)^2\right]$
    Perform one AMSGrad optimization step on $L(\omega)$
    Reset $\widehat{Q}$ with weights $\omega' = \omega$
  end for
end for
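The bookkeeping inside the DQL loop, i.e. the replay buffer, the random minibatch, and the target $y_j$, can be sketched as follows. The toy target network and random transitions are placeholders for the real EREV environment, which is not modeled here:

```python
import random
from collections import deque
import numpy as np

GAMMA = 0.99                   # discount factor, as in the configuration table
BATCH = 4
buffer = deque(maxlen=10_000)  # replay buffer B

def q_target(s):
    """Stand-in for Q_hat(s, .; w'): returns one value per discrete action."""
    return np.array([0.0, 1.0, 0.5])

# Fill B with toy transitions (s, a, r, s', terminal).
rng = np.random.default_rng(1)
for _ in range(20):
    s, s_next = rng.random(3), rng.random(3)
    buffer.append((s, int(rng.integers(3)), float(rng.random()),
                   s_next, bool(rng.random() < 0.2)))

# Sample a random minibatch and form the targets y_j exactly as in Algorithm 1.
batch = random.Random(0).sample(list(buffer), BATCH)
targets = []
for s, a, r, s_next, terminal in batch:
    if terminal:
        y = r                                      # y_j = r_j at episode end
    else:
        y = r + GAMMA * np.max(q_target(s_next))   # y_j = r_j + gamma * max Q_hat
    targets.append(y)
# The loss E[(y_j - Q(s_j, a_j; w))^2] would then be minimized with AMSGrad.
```

Sampling transitions uniformly from $B$ breaks the temporal correlation of consecutive driving states, which is what stabilizes the network updates.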

## 3. Case Study

#### 3.1. Driving Cycles

#### 3.2. Configuration Parameters of the DQL-AMSGrad Method

## 4. Results and Discussion

#### 4.1. Artificial Neural Network Results, Driving Cycle Prediction with AMSGrad

[…] of 10^{−5} and a validation-check fail limit of 6 are presented. Exploring how varying these parameters affects the model's performance, and whether they are appropriately tuned to the dataset characteristics, is essential for optimizing model performance.

[…] coefficient of determination (R^{2}). Assessing the model's ability to generalize across different datasets, and comparing these results with linear-regression baselines, provides valuable insight into the model's versatility.
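A minimal sketch of the two evaluation ingredients discussed here, assuming the "validation checks" criterion works like MATLAB's `max_fail` early stopping (stop once the validation error fails to improve 6 epochs in a row) and that prediction quality is scored with R^2 (function names are ours):

```python
import numpy as np

def should_stop(val_errors, max_fail=6):
    """True once the validation error has worsened max_fail consecutive epochs."""
    best, fails = np.inf, 0
    for e in val_errors:
        if e < best:
            best, fails = e, 0   # improvement resets the counter
        else:
            fails += 1
            if fails >= max_fail:
                return True
    return False

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

errors = [1.0, 0.8, 0.7] + [0.7] * 6           # six non-improving epochs
stop = should_stop(errors)                      # triggers the stop criterion
r2_perfect = r_squared([1, 2, 3], [1, 2, 3])    # perfect fit gives R^2 = 1
```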

#### 4.2. Optimality of DQL-AMSGrad Strategy

#### 4.3. Comparison with Traditional Strategies

#### 4.3.1. Fuel Efficiency

#### 4.3.2. Adaptability

#### 4.3.3. Fuel Saving

#### 4.3.4. Control Action

#### 4.4. Sensitivity Analysis

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Jeon, S.-I.; Jo, S.T.; Park, Y.-I.; Lee, J.M. Multi-Mode Driving Control of a Parallel Hybrid Electric Vehicle Using Driving Pattern Recognition. J. Dyn. Syst. Meas. Control **2002**, 124, 141–149.
- Zheng, B.; Gao, X.; Li, X. Diagnosis of Sucker Rod Pump Based on Generating Dynamometer Cards. J. Process Control **2019**, 77, 76–88.
- Yang, C.; Zha, M.; Wang, W.; Liu, K.; Xiang, C. Efficient Energy Management Strategy for Hybrid Electric Vehicles/Plug-in Hybrid Electric Vehicles: Review and Recent Advances under Intelligent Transportation System. IET Intell. Trans. Syst. **2020**, 14, 702–711.
- Corinaldesi, C.; Lettner, G.; Schwabeneder, D.; Ajanovic, A.; Auer, H. Impact of Different Charging Strategies for Electric Vehicles in an Austrian Office Site. Energies **2020**, 13, 5858.
- Yue, M.; Jemei, S.; Gouriveau, R.; Zerhouni, N. Review on Health-Conscious Energy Management Strategies for Fuel Cell Hybrid Electric Vehicles: Degradation Models and Strategies. Int. J. Hydrogen Energy **2019**, 44, 6844–6861.
- Corinaldesi, C.; Lettner, G.; Auer, H. On the Characterization and Evaluation of Residential On-Site E-Car-Sharing. Energy **2022**, 246, 123400.
- Xu, B.; Rathod, D.; Zhang, D.; Yebi, A.; Zhang, X.; Li, X.; Filipi, Z. Parametric Study on Reinforcement Learning Optimized Energy Management Strategy for a Hybrid Electric Vehicle. Appl. Energy **2020**, 259, 114200.
- Hu, X.; Murgovski, N.; Johannesson, L.M.; Egardt, B. Comparison of Three Electrochemical Energy Buffers Applied to a Hybrid Bus Powertrain with Simultaneous Optimal Sizing and Energy Management. IEEE Trans. Intell. Transp. Syst. **2014**, 15, 1193–1205.
- Duan, B.M.; Wang, Q.N.; Wang, J.N.; Li, X.N.; Ba, T. Calibration Efficiency Improvement of Rule-Based Energy Management System for a Plug-in Hybrid Electric Vehicle. Int. J. Automot. Technol. **2017**, 18, 335–344.
- Katrašnik, T. Analytical Method to Evaluate Fuel Consumption of Hybrid Electric Vehicles at Balanced Energy Content of the Electric Storage Devices. Appl. Energy **2010**, 87, 3330–3339.
- Zheng, C.; Li, W.; Liang, Q. An Energy Management Strategy of Hybrid Energy Storage Systems for Electric Vehicle Applications. IEEE Trans. Sustain. Energy **2018**, 9, 1880–1888.
- Zou, Y.; Kong, Z.; Liu, T.; Liu, D. A Real-Time Markov Chain Driver Model for Tracked Vehicles and Its Validation: Its Adaptability via Stochastic Dynamic Programming. IEEE Trans. Veh. Technol. **2017**, 66, 3571–3582.
- Wu, Y.; Tan, H.; Peng, J.; Zhang, H.; He, H. Deep Reinforcement Learning of Energy Management with Continuous Control Strategy and Traffic Information for a Series-Parallel Plug-in Hybrid Electric Bus. Appl. Energy **2019**, 247, 454–466.
- Sabri, M.F.M.; Danapalasingam, K.A.; Rahmat, M.F. A Review on Hybrid Electric Vehicles Architecture and Energy Management Strategies. Renew. Sustain. Energy Rev. **2016**, 53, 1433–1442.
- Zhou, Y.; Ravey, A.; Péra, M.C. A Survey on Driving Prediction Techniques for Predictive Energy Management of Plug-in Hybrid Electric Vehicles. J. Power Sources **2019**, 412, 480–495.
- Hofman, T.; Steinbuch, M.; Van Druten, R.M.; Serrarens, A.F.A. Rule-Based Energy Management Strategies for Hybrid Vehicle Drivetrains: A Fundamental Approach in Reducing Computation Time. IFAC Proc. Vol. **2006**, 39, 740–745.
- Liu, J.; Chen, Y.; Li, W.; Shang, F.; Zhan, J. Hybrid-Trip-Model-Based Energy Management of a PHEV with Computation-Optimized Dynamic Programming. IEEE Trans. Veh. Technol. **2018**, 67, 338–353.
- Peng, J.; He, H.; Xiong, R. Rule Based Energy Management Strategy for a Series–Parallel Plug-in Hybrid Electric Bus Optimized by Dynamic Programming. Appl. Energy **2017**, 185, 1633–1643.
- Li, Y.; Jiao, X.; Jing, Y. A Real-Time Energy Management Strategy Combining Rule-Based Control and ECMS with Optimization Equivalent Factor for HEVs. In Proceedings of the 2017 Chinese Automation Congress (CAC 2017), Jinan, China, 20–22 October 2017; pp. 5988–5992.
- Chen, Z.; Xiong, R.; Wang, K.; Jiao, B. Optimal Energy Management Strategy of a Plug-in Hybrid Electric Vehicle Based on a Particle Swarm Optimization Algorithm. Energies **2015**, 8, 3661–3678.
- Haskara, I.; Hegde, B.; Chang, C.F. Reinforcement Learning Based EV Energy Management for Integrated Traction and Cabin Thermal Management Considering Battery Aging. IFAC-PapersOnLine **2022**, 55, 348–353.
- Wu, J.; He, H.; Peng, J.; Li, Y.; Li, Z. Continuous Reinforcement Learning of Energy Management with Deep Q Network for a Power Split Hybrid Electric Bus. Appl. Energy **2018**, 222, 799–811.
- Beerel, P.A.; Pedram, M. Opportunities for Machine Learning in Electronic Design Automation. In Proceedings of the IEEE International Symposium on Circuits and Systems, Florence, Italy, 27–30 May 2018.
- Xu, B.; Tang, X.; Hu, X.; Lin, X.; Li, H.; Rathod, D.; Wang, Z. Q-Learning-Based Supervisory Control Adaptability Investigation for Hybrid Electric Vehicles. IEEE Trans. Intell. Transp. Syst. **2022**, 23, 6797–6806.
- Deng, R.; Liu, Y.; Chen, W.; Liang, H. A Survey on Electric Buses—Energy Storage, Power Management, and Charging Scheduling. IEEE Trans. Intell. Transp. Syst. **2021**, 22, 9–22.
- Du, G.; Zou, Y.; Zhang, X.; Liu, T.; Wu, J.; He, D. Deep Reinforcement Learning Based Energy Management for a Hybrid Electric Vehicle. Energy **2020**, 201, 117591.
- Zou, Y.; Liu, T.; Liu, D.; Sun, F. Reinforcement Learning-Based Real-Time Energy Management for a Hybrid Tracked Vehicle. Appl. Energy **2016**, 171, 372–382.
- Qi, X.; Luo, Y.; Wu, G.; Boriboonsomsin, K.; Barth, M. Deep Reinforcement Learning Enabled Self-Learning Control for Energy Efficient Driving. Transp. Res. Part C Emerg. Technol. **2019**, 99, 67–81.
- Campoverde, A.S.B. Análisis de La Isla de Calor Urbana En El Entorno Andino de Cuenca-Ecuador. Investig. Geográficas **2018**, 70, 167–179.
- Putrus, G.A.; Suwanapingkarl, P.; Johnston, D.; Bentley, E.C.; Narayana, M. Impacto de Las Estaciones de Carga Para Vehículo Eléctrico En La Curva de Carga de La Ciudad de Cuenca. Maskana **2017**, 8, 239–246.
- Guo, L.; Zhang, X.; Zou, Y.; Han, L.; Du, G.; Guo, N.; Xiang, C. Co-Optimization Strategy of Unmanned Hybrid Electric Tracked Vehicle Combining Eco-Driving and Simultaneous Energy Management. Energy **2022**, 246, 123309.
- Chemali, E.; Kollmeyer, P.J.; Preindl, M.; Emadi, A. State-of-Charge Estimation of Li-Ion Batteries Using Deep Neural Networks: A Machine Learning Approach. J. Power Sources **2018**, 400, 242–255.
- Fahmy, Y.A.; Wang, W.; West, A.C.; Preindl, M. Snapshot SoC Identification with Pulse Injection Aided Machine Learning. J. Energy Storage **2021**, 41, 102891.
- Braganza, D.; Dawson, D.M.; Walker, I.D.; Nath, N. A Neural Network Controller for Continuum Robots. IEEE Trans. Robot. **2007**, 23, 1270–1277.
- Ramsami, P.; Oree, V. A Hybrid Method for Forecasting the Energy Output of Photovoltaic Systems. Energy Convers. Manag. **2015**, 95, 406–413.
- Dahunsi, O.A.; Pedro, J.O.; Nyandoro, O.T. System Identification and Neural Network Based PID Control of Servo-Hydraulic Vehicle Suspension System. SAIEE Afr. Res. J. **2010**, 101, 93–105.
- Tran, P.T.; Phong, L.T. On the Convergence Proof of AMSGrad and a New Version. IEEE Access **2019**, 7, 61706–61716.
- Zhong, H.; Chen, Z.; Qin, C.; Huang, Z.; Zheng, V.W.; Xu, T.; Chen, E. Adam Revisited: A Weighted Past Gradients Perspective. Front. Comput. Sci. **2020**, 14, 145309.
- Iiduka, H. Appropriate Learning Rates of Adaptive Learning Rate Optimization Algorithms for Training Deep Neural Networks. IEEE Trans. Cybern. **2022**, 52, 13250–13261.
- Yu, Y.; Liu, F. Effective Neural Network Training with a New Weighting Mechanism-Based Optimization Algorithm. IEEE Access **2019**, 7, 72403–72410.
- He, H.; Wang, Y.; Li, J.; Dou, J.; Lian, R.; Li, Y. An Improved Energy Management Strategy for Hybrid Electric Vehicles Integrating Multistates of Vehicle-Traffic Information. IEEE Trans. Transp. Electrif. **2021**, 7, 1161–1172.
- Elbaz, K.; Zhou, A.; Shen, S.L. Deep Reinforcement Learning Approach to Optimize the Driving Performance of Shield Tunnelling Machines. Tunn. Undergr. Space Technol. **2023**, 136, 105104.
- Qi, C.; Zhu, Y.; Song, C.; Yan, G.; Xiao, F.; Zhang, X.; Cao, J.; Song, S. Hierarchical Reinforcement Learning Based Energy Management Strategy for Hybrid Electric Vehicle. Energy **2022**, 238, 121703.
- Nissan España. Coches Eléctricos, Crossovers, 4x4 y Furgonetas. Available online: https://www.nissan.es/ (accessed on 30 November 2023).

**Figure 5.** Driving profiles with 100 experiment cycles: (**a**) EREV speed; (**b**) distance traveled in each driving profile; (**c**) course altitude in each driving profile.

**Table 1.** Parameters of the Nissan X-Trail e-POWER [44].

| Symbol | Parameter | Value |
|---|---|---|
| Vehicle | Empty weight | 1800 kg |
| | Air resistance coefficient | 0.26 |
| | Rolling resistance coefficient | 0.03 |
| | Frontal area | 2.65 m^{2} |
| Electric motor | Maximum power | 150 kW at 5000 rpm |
| | Maximum torque | 330 Nm at 3505 rpm |
| Combustion engine (generator) | Maximum power | 116 kW at 4600 rpm |
| | Maximum torque | 250 Nm at 2400 rpm |
| Battery | Capacity | 1.73 kWh |
| | Voltage | 200 V |
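As an illustration of how the parameters in Table 1 feed the control problem, the instantaneous traction power demand can be estimated with a standard longitudinal road-load model (our assumption; the paper's exact demand model is not reproduced in this excerpt):

```python
import numpy as np

# Nissan X-Trail e-POWER parameters from Table 1, plus standard constants.
m   = 1800.0    # empty weight [kg]
c_d = 0.26      # air resistance coefficient
c_r = 0.03      # rolling resistance coefficient
A   = 2.65      # frontal area [m^2]
rho = 1.225     # air density [kg/m^3] (assumed, sea level)
g   = 9.81      # gravitational acceleration [m/s^2]

def power_demand(v, accel=0.0, grade=0.0):
    """Traction power demand P_dem [kW] at speed v [m/s],
    acceleration accel [m/s^2], and road grade [rad]."""
    f_aero    = 0.5 * rho * c_d * A * v**2      # aerodynamic drag
    f_roll    = c_r * m * g * np.cos(grade)     # rolling resistance
    f_grade   = m * g * np.sin(grade)           # grade resistance
    f_inertia = m * accel                       # acceleration force
    return (f_aero + f_roll + f_grade + f_inertia) * v / 1000.0

p_cruise = power_demand(v=100 / 3.6)  # steady 100 km/h on a flat road
```

At steady cruise the demand is a few tens of kW, comfortably below the 150 kW motor limit in Table 1, which is why generator scheduling rather than peak power dominates the energy management problem.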

**Table 2.** Configuration parameters of the DQL-AMSGrad method.

| Main Parameters | Value |
|---|---|
| Num_Episodes ($M$) | 1000 |
| Max_Steps ($T$) | 200 |
| Learning_Rate ($\alpha$) | 0.001 |
| Discount_Factor ($\gamma$) | 0.99 |
| Exploration_Prob. ($\epsilon$) | 0.2 |
| Batch Size | 32 |
| Replay Buffer Size | 10,000 |
| τ | 0.001 |
| Neurons | 64 |
| AMSGrad (SquaredGradientDecayFactor) | 0.99 |
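The τ = 0.001 entry above suggests a soft (Polyak) target-network update, in which the target weights track the online weights slowly, rather than the hard copy $\omega' = \omega$ written in Algorithm 1. A minimal sketch under that assumption:

```python
import numpy as np

TAU = 0.001  # tau from the configuration table

def soft_update(target_w, online_w, tau=TAU):
    """Polyak averaging: w' <- tau * w + (1 - tau) * w' per weight array,
    so the target network drifts toward the online network by 0.1% per step."""
    return [tau * w + (1.0 - tau) * wt for wt, w in zip(target_w, online_w)]

online = [np.ones((2, 2)), np.zeros(3)]   # stand-ins for the network weights
target = [np.zeros((2, 2)), np.ones(3)]
target = soft_update(target, online)      # target moves slightly toward online
```

The slowly moving target keeps the bootstrap values $y_j$ nearly stationary between updates, which damps the instabilities mentioned in the Introduction.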


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Montaleza, C.; Arévalo, P.; Gallegos, J.; Jurado, F.
Enhancing Energy Management Strategies for Extended-Range Electric Vehicles through Deep Q-Learning and Continuous State Representation. *Energies* **2024**, *17*, 514.
https://doi.org/10.3390/en17020514
