# TD3-Based EMS Using Action Mask and Considering Battery Aging for Hybrid Electric Dump Trucks


## Abstract


## 1. Introduction

- (1) A TD3-based EMS is proposed to extend battery life and reduce the total usage cost. Battery aging shortens the vehicle range, and a costly battery replacement is required once the battery reaches the end of its life.
- (2) Most EMSs ignore safety issues during the exploration stage, such as MG1 overloading, which can cause serious problems in automotive control and is unacceptable in industrial applications. Action masks are therefore used to eliminate invalid actions that exceed the physical limits and to improve the training efficiency of the policy.
- (3) Because the TD3 algorithm reduces the overestimation bias of DDPG, it is applied as the EMS for hybrid electric dump trucks and trained through the self-learning capability of DRL. A comparison with a DDPG-based EMS is also presented.
- (4) The reward function, which includes the battery aging cost and the fuel consumption cost, is designed to extend battery life and reduce fuel consumption.

## 2. Vehicle Modeling and Optimization Problem

#### 2.1. Vehicle Model

$C_d$ is the aerodynamic drag coefficient, $A$ is the frontal area, $v$ is the vehicle velocity, $i$ is the road slope angle, and $\delta$ is the rotational mass conversion coefficient.
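The quantities defined above are the usual inputs of a longitudinal force balance (rolling, aerodynamic, grade, and acceleration resistance). The following is a minimal sketch of that standard balance, not the authors' exact equation. The mass, drag coefficient, frontal area, and rolling resistance formula are taken from Table 2; the air density, the value of `delta`, and the assumption that the rolling resistance formula expects the velocity in km/h are illustrative assumptions.

```python
import math

G = 9.81        # gravitational acceleration, m/s^2
RHO_AIR = 1.2   # assumed air density, kg/m^3 (not given in the text)


def traction_force(v, accel, slope_rad, m=31000.0, c_d=0.56, area=8.24, delta=1.1):
    """Required traction force in N for speed v (m/s) and acceleration accel (m/s^2).

    m, c_d and area come from Table 2; delta=1.1 is an assumed placeholder.
    """
    v_kmh = v * 3.6
    f_roll = 0.0041 + 0.0000256 * v_kmh          # Table 2 formula, km/h assumed
    rolling = m * G * f_roll * math.cos(slope_rad)
    aero = 0.5 * RHO_AIR * c_d * area * v ** 2
    grade = m * G * math.sin(slope_rad)
    inertia = delta * m * accel
    return rolling + aero + grade + inertia


force = traction_force(v=10.0, accel=0.0, slope_rad=0.0)
power_kw = force * 10.0 / 1000.0   # power at the wheels at 10 m/s, before efficiencies
```

On a flat road at constant speed only the rolling and aerodynamic terms remain, which makes the sketch easy to sanity-check against hand calculations.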

$P_{req}$ is the vehicle required power, $P_{ICE}$ is the ICE power, $P_{batt}$ is the battery power, $\eta_{batt}$ is the battery efficiency, and $\eta_i$ is the transmission efficiency.

$\omega_{ICE}$, $\omega_{MG1}$, $\omega_{MG2}$, and $\omega_{R1}$ are the rotation speeds of the ICE, MG1, MG2, and planetary gear ring, respectively; $T_{ICE}$, $T_{MG1}$, $T_{MG2}$, and $T_{R1}$ are the torques of the ICE, MG1, MG2, and planetary gear ring, respectively; $k_1$ is the transmission ratio of the planetary gear (PG); and $k_2$ is the gear ratio of MG2.

_{fuel} is the instantaneous fuel consumption function, and $T$ is the total time.

$P_{motor\_req}$ is the motor required power; $T_m$, $\omega_m$, and $\eta_m$ are the torque, rotation speed, and efficiency in drive mode, respectively; and $T_g$, $\omega_g$, and $\eta_g$ are the torque, rotation speed, and efficiency in generation mode, respectively.
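The split between drive and generation mode matters because the efficiency acts in opposite directions in the two cases. Below is a minimal sketch of the common convention (divide by the efficiency when driving, multiply when generating). The sign convention and the function name are assumptions for illustration; the source equation is not reproduced here.

```python
def motor_electrical_power(torque, speed, eta):
    """Electrical power in W for a motor/generator.

    torque in N*m, speed in rad/s, eta in (0, 1].
    Positive mechanical power means drive mode, negative means generation mode.
    """
    mechanical = torque * speed
    if mechanical >= 0.0:
        return mechanical / eta   # drive: the battery supplies more than the shaft gets
    return mechanical * eta       # generation: the battery receives less than the shaft gives


drive_w = motor_electrical_power(torque=100.0, speed=100.0, eta=0.9)
regen_w = motor_electrical_power(torque=-100.0, speed=100.0, eta=0.9)
```

In both modes the losses reduce the energy available to the vehicle: driving costs more electrical power than the shaft delivers, and braking recovers less than the shaft absorbs.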

$I_{batt}$ is the battery current, $U_{oc}$ is the open-circuit voltage, and $R_{batt}$ is the internal resistance of the battery. The state of charge (SOC) is defined as

_{batt} is the nominal battery capacity.
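The open-circuit voltage, internal resistance, and nominal capacity defined above are the ingredients of the widely used internal-resistance battery model with coulomb counting. The sketch below assumes that standard model; it is not a reproduction of the paper's equations. The 70 Ah capacity is from Table 2, while using 576 V as the open-circuit voltage and 0.1 Ω as the internal resistance are illustrative assumptions, as are the function names.

```python
import math


def battery_current(p_batt, u_oc, r_batt):
    """Battery current in A for a terminal power p_batt in W.

    Solves p_batt = u_oc * i - r_batt * i**2 and keeps the smaller root.
    Positive current means discharge.
    """
    disc = u_oc ** 2 - 4.0 * r_batt * p_batt
    if disc < 0.0:
        raise ValueError("requested power exceeds what the battery can deliver")
    return (u_oc - math.sqrt(disc)) / (2.0 * r_batt)


def soc_step(soc, i_batt, dt, capacity_ah=70.0):
    """Advance the SOC by coulomb counting over dt seconds."""
    return soc - i_batt * dt / (capacity_ah * 3600.0)


i_batt = battery_current(p_batt=50_000.0, u_oc=576.0, r_batt=0.1)  # r_batt assumed
soc_next = soc_step(soc=0.6, i_batt=i_batt, dt=1.0)
```

In a full model the open-circuit voltage and resistance would depend on the SOC; constants are used here only to keep the sketch short.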

#### 2.2. Battery Aging Model and Optimization Problem for EMS

$Q_{loss}$ is the percentage of Li-ion battery capacity loss, $B$ is the pre-exponential factor, $E_a$ is the activation energy, $R_{gas}$ is the gas constant ($R_{gas}$ = 8.314 J/(mol·K)), $T_k$ is the battery temperature, $Ah$ is the battery Ah-throughput, and $z$ is the power law factor.
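The symbols defined above (a pre-exponential factor, an activation energy, the gas constant, a temperature, an Ah-throughput, and a power law factor) point to an Arrhenius-type power law of the form capacity loss = B · exp(−E_a / (R_gas · T_k)) · Ah^z. The sketch below assumes that form. The numerical values of `b`, `e_a`, and `z` are illustrative placeholders, not the parameters used in the paper.

```python
import math

R_GAS = 8.314  # J/(mol*K), as given in the text


def capacity_loss_percent(ah, temp_k, b, e_a, z):
    """Capacity loss in percent after an Ah-throughput `ah` at temperature `temp_k` (K)."""
    return b * math.exp(-e_a / (R_GAS * temp_k)) * ah ** z


# Placeholder parameters chosen only to exercise the function.
loss_cool = capacity_loss_percent(ah=1000.0, temp_k=298.15, b=30000.0, e_a=31500.0, z=0.55)
loss_warm = capacity_loss_percent(ah=1000.0, temp_k=318.15, b=30000.0, e_a=31500.0, z=0.55)
loss_more = capacity_loss_percent(ah=2000.0, temp_k=298.15, b=30000.0, e_a=31500.0, z=0.55)
```

Two properties follow directly from this form: the loss grows with temperature, and doubling the Ah-throughput multiplies the loss by 2^z rather than by 2.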

_{c} is the current rate.

$w_1$ is the diesel price, which is set to 7.2 CNY/L, and $w_2$ is the Li-ion battery price, which is set to 1700 CNY/kWh (CNY: Chinese Yuan).
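A short worked example shows how the two prices turn fuel use and capacity loss into a single cost. The sketch assumes the aging cost equals the battery price times the pack energy times the fractional capacity loss, with the pack energy taken as 70 Ah × 576 V from Table 2. That reading is an assumption, but it reproduces the costs listed in the results table for the TD3-based EMS without battery aging (24.81 L/100 km and 0.0423% capacity loss).

```python
W1_DIESEL_CNY_PER_L = 7.2          # from the text
W2_BATTERY_CNY_PER_KWH = 1700.0    # from the text
PACK_KWH = 70.0 * 576.0 / 1000.0   # 70 Ah x 576 V (Table 2) -> 40.32 kWh


def fuel_cost(fuel_l):
    """Fuel cost in CNY for `fuel_l` litres of diesel."""
    return W1_DIESEL_CNY_PER_L * fuel_l


def aging_cost(q_loss_percent):
    """Battery aging cost in CNY for a capacity loss given in percent."""
    return W2_BATTERY_CNY_PER_KWH * PACK_KWH * q_loss_percent / 100.0


def total_cost(fuel_l, q_loss_percent):
    return fuel_cost(fuel_l) + aging_cost(q_loss_percent)


example = total_cost(fuel_l=24.81, q_loss_percent=0.0423)
```

With these inputs the fuel cost is about 178.63 CNY and the aging cost about 28.99 CNY, for a total of roughly 207.6 CNY.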

## 3. Method and Design of TD3-Based EMS

#### 3.1. Reinforcement Learning

$r(s_t, a_t)$ is the reward at each time step.

$Q_\pi(s_t, a_t)$ to evaluate the policy function. The Bellman equation is given by

$\mathbb{E}_\pi$ is the expectation.

_{π}($s_t$, $a'$) when the optimal Q-value function $Q^*_\pi(s_t, a_t)$ is known. In the Q-learning algorithm, $Q^*_\pi(s_t, a_t)$ can be solved by temporal difference learning. However, Q-learning suffers from the curse of dimensionality when the state and action spaces are high-dimensional; deep reinforcement learning, which approximates the value function with neural networks, addresses this problem.
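To make the temporal difference idea concrete, here is a minimal sketch of the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α · (r + γ · max Q(s′, ·) − Q(s, a)). The table layout, the learning rate `alpha`, and the toy transition are illustrative assumptions. The table has one entry per state-action pair, which is why this approach stops scaling once the state and action spaces become large.

```python
def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Apply one temporal-difference update in place and return the TD error.

    q is a list of rows, one row per state, one entry per action.
    """
    td_target = r + gamma * max(q[s_next])
    td_error = td_target - q[s][a]
    q[s][a] += alpha * td_error
    return td_error


# Toy problem: 3 states, 2 actions, all Q-values start at zero.
q_table = [[0.0, 0.0] for _ in range(3)]
first_error = q_learning_update(q_table, s=0, a=1, r=1.0, s_next=2)
```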

#### 3.2. TD3 Algorithm

$y_t$ is the Q-target value, calculated by temporal difference, and $Q_{\theta'}(s_{t+1}, a_{t+1})$ uses the smaller of the two target critic network outputs.
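The target computation can be sketched in a few lines. The example below follows the form used in Algorithm 1: the target action is the target actor's output plus clipped Gaussian noise, and the target value bootstraps from the smaller of the two target critic estimates. The noise scale `sigma`, the clip bound `c`, the action bounds, and the scalar stand-ins for network outputs are illustrative assumptions; terminal-state handling is omitted.

```python
import random


def smoothed_target_action(mu_next, sigma=0.2, c=0.5, a_low=-1.0, a_high=1.0, rng=random):
    """Target actor output plus clipped noise, kept inside the action bounds."""
    noise = max(-c, min(c, rng.gauss(0.0, sigma)))
    return max(a_low, min(a_high, mu_next + noise))


def td3_target(r, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller target critic estimate."""
    return r + gamma * min(q1_next, q2_next)


a_next = smoothed_target_action(mu_next=0.3, rng=random.Random(0))
y = td3_target(r=-1.0, q1_next=-10.0, q2_next=-12.0)
```

Taking the minimum means that an overestimate by one critic is ignored whenever the other critic is lower, which is how the bias is reduced relative to a single-critic target.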

$Q_{\theta'}$ is the Q-value of the target critic network and $\pi_{\varphi'}$ is the target actor policy.

**Algorithm 1:** TD3

- Initialize the critic networks $Q_{\theta_1}$, $Q_{\theta_2}$ with random parameters $\theta_1$, $\theta_2$ and the actor network $\pi_\varphi$ with random parameters $\varphi$.
- Initialize the target critic networks $\theta'_1 \leftarrow \theta_1$, $\theta'_2 \leftarrow \theta_2$ and the target actor network $\varphi' \leftarrow \varphi$.
- Initialize the replay buffer $\mathcal{B}$.
- **for** $t = 1$ **to** $T$ **do**
  - Observe state $s$ and choose an action with exploration noise, $a \sim \pi_\varphi(s) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$; observe reward $r$ and new state $s'$.
  - Store the transition tuple $(s, a, r, s', d)$ in $\mathcal{B}$.
  - Randomly sample a mini-batch of $N$ transitions $\{(s, a, r, s', d)\}$ from $\mathcal{B}$.
  - $a' \leftarrow \pi_{\varphi'}(s') + \epsilon$, $\epsilon \sim \mathrm{clip}(\mathcal{N}(0, \sigma), -c, c)$
  - $y_t = r(s_t, a_t) + \gamma \min_{i=1,2} Q_{\theta'_i}(s_{t+1}, a_{t+1})$
  - Update the critic networks: $\theta_i \leftarrow \arg\min_{\theta_i} N^{-1} \sum (y - Q_{\theta_i}(s, a))^2$
  - **if** $t \bmod d$ **then**
    - Update $\varphi$ by the deterministic policy gradient: $\nabla_\varphi J(\varphi) = N^{-1} \sum \nabla_a Q_{\theta_1}(s, a)|_{a = \pi_\varphi(s)} \nabla_\varphi \pi_\varphi(s)$
    - Update the target networks by soft update: $\theta'_i \leftarrow \tau \theta_i + (1 - \tau)\theta'_i$, $\varphi' \leftarrow \tau \varphi + (1 - \tau)\varphi'$
  - **end if**
- **end for**
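Two steps of Algorithm 1 are easy to get wrong in practice: the delayed actor update and the soft target update. The sketch below illustrates both on plain Python lists standing in for network parameters. The values of `tau` and `policy_delay` are illustrative assumptions, and the delay condition is read as "update when the step index is a multiple of the delay".

```python
def soft_update(target, online, tau=0.005):
    """In-place soft update: target <- tau * online + (1 - tau) * target."""
    for i, weight in enumerate(online):
        target[i] = tau * weight + (1.0 - tau) * target[i]


def should_update_actor(step, policy_delay=2):
    """Delayed policy update: the actor and targets change every `policy_delay` steps."""
    return step % policy_delay == 0


target_params = [0.0, 1.0]
online_params = [1.0, 1.0]
soft_update(target_params, online_params, tau=0.1)

actor_steps = [t for t in range(1, 6) if should_update_actor(t)]
```

With a small `tau` the target networks trail the online networks slowly, which keeps the Q-target from chasing its own updates.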

#### 3.3. Action Mask
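As stated in the Introduction, the action mask removes actions that exceed the physical limits, such as those that would overload MG1. One common way to do this for a continuous action is to map the actor output onto the currently feasible interval, so that every action the agent can emit is valid by construction. The sketch below illustrates that idea only; it is not the authors' implementation. The ICE power limit of 243 kW is from Table 2, while the MG1-derived bound is a hypothetical input that a real system would compute from the powertrain state.

```python
ICE_MAX_KW = 243.0  # maximum ICE power from Table 2


def feasible_ice_power(ice_limit_from_mg1_kw):
    """Feasible ICE power interval in kW.

    `ice_limit_from_mg1_kw` is a hypothetical, state-dependent upper bound that
    keeps MG1 within its limits; how to compute it is outside this sketch.
    """
    return 0.0, min(ICE_MAX_KW, ice_limit_from_mg1_kw)


def mask_action(raw, low, high):
    """Map an actor output `raw` in [-1, 1] onto the feasible interval [low, high]."""
    raw = max(-1.0, min(1.0, raw))
    return low + 0.5 * (raw + 1.0) * (high - low)


low, high = feasible_ice_power(ice_limit_from_mg1_kw=180.0)
p_ice = mask_action(0.5, low, high)
```

Because invalid actions are never executed, the agent does not spend exploration steps on them, which is consistent with the improved training efficiency claimed for the mask.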

#### 3.4. Design of TD3-Based EMS

$w_3$ is the SOC sustainability penalty coefficient and $SOC_{init}$ is the initial SOC.
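A reward built from these terms can be sketched as the negative of the fuel cost, the battery aging cost, and an SOC sustainability penalty. The quadratic penalty form, the value of `w3`, the initial SOC of 0.6, and the 40.32 kWh pack energy (70 Ah × 576 V from Table 2) are assumptions made for illustration; only the prices of 7.2 CNY/L and 1700 CNY/kWh come from the text.

```python
def reward(fuel_l, q_loss_percent, soc, soc_init=0.6,
           w1=7.2, w2=1700.0, pack_kwh=40.32, w3=1000.0):
    """Negative per-step cost: fuel, battery aging and SOC deviation.

    fuel_l: fuel used in this step (L); q_loss_percent: capacity loss in this step (%).
    w3 and the squared SOC deviation are assumed, not taken from the paper.
    """
    fuel_cost = w1 * fuel_l
    aging_cost = w2 * pack_kwh * q_loss_percent / 100.0
    soc_penalty = w3 * (soc - soc_init) ** 2
    return -(fuel_cost + aging_cost + soc_penalty)


r_balanced = reward(fuel_l=0.01, q_loss_percent=0.0, soc=0.6)
r_depleted = reward(fuel_l=0.01, q_loss_percent=0.0, soc=0.5)
```

Setting `w2` to zero recovers a strategy that ignores battery aging, which corresponds to the "No" rows in the result tables.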

When the discount factor takes a larger value, $\gamma^n$ is larger, the agent takes more future steps into account, and the training difficulty increases. When the discount factor takes a smaller value, $\gamma^n$ is smaller, the agent focuses more on the current reward, and the training difficulty decreases. The discount factor should be as large as possible so that the agent considers as much of the long-term return as possible, provided the algorithm still converges; it is therefore set to 0.99. The mini-batch size is the number of samples used in a single training update. The experience buffer size is the maximum number of stored transitions; when this capacity is exceeded, the earliest samples are removed.
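Both ideas in this paragraph are easy to check numerically. The sketch below compares how much weight a reward 100 steps ahead receives under two discount factors, and shows a fixed-capacity buffer that drops its oldest entries. The class and function names are illustrative, and the tiny capacity is used only to make the eviction visible; the paper's buffer holds 1 × 10^6 transitions.

```python
import random
from collections import deque


def discount_weight(gamma, n):
    """Weight gamma**n applied to a reward received n steps in the future."""
    return gamma ** n


class ReplayBuffer:
    """Fixed-capacity experience buffer that discards the oldest transitions first."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)

    def add(self, transition):
        self._data.append(transition)

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self._data), batch_size)

    def __len__(self):
        return len(self._data)


far_sighted = discount_weight(0.99, 100)
short_sighted = discount_weight(0.9, 100)

buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(step)
batch = buffer.sample(2, rng=random.Random(0))
```

With a discount factor of 0.99 a reward 100 steps away still carries roughly a third of its value, whereas with 0.9 it is effectively invisible to the agent.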

## 4. Results

#### 4.1. The Impact of Action Mask

#### 4.2. Battery Capacity Loss and Fuel Consumption

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

1. Ali, A.; Söffker, D. Towards Optimal Power Management of Hybrid Electric Vehicles in Real-Time: A Review on Methods, Challenges, and State-Of-The-Art Solutions. Energies **2018**, 11, 476.
2. Saiteja, P.; Ashok, B. Critical Review on Structural Architecture, Energy Control Strategies and Development Process towards Optimal Energy Management in Hybrid Vehicles. Renew. Sust. Energ. Rev. **2022**, 157, 112038.
3. Tran, D.; Vafaeipour, M.; El Baghdadi, M.; Barrero, R.; Van Mierlo, J.; Hegazy, O. Thorough State-of-the-Art Analysis of Electric and Hybrid Vehicle Powertrains: Topologies and Integrated Energy Management Strategies. Renew. Sust. Energ. Rev. **2020**, 119, 109596.
4. Padmarajan, B.; McGordon, A.; Jennings, P. Blended Rule-Based Energy Management for PHEV: System Structure and Strategy. IEEE Trans. Veh. Technol. **2016**, 65, 8757–8762.
5. Zhou, W.; Yang, L.; Cai, Y.; Ying, T. Dynamic Programming for New Energy Vehicles Based on Their Work Modes Part I: Electric Vehicles and Hybrid Electric Vehicles. J. Power Sources **2018**, 406, 151–166.
6. Rezaei, A.; Burl, J.; Zhou, B.; Rezaei, M. A New Real-Time Optimal Energy Management Strategy for Parallel Hybrid Electric Vehicles. IEEE Trans. Control Syst. Technol. **2019**, 27, 830–837.
7. East, S.; Cannon, M. Scenario Model Predictive Control for Data-Based Energy Management in Plug-In Hybrid Electric Vehicles. IEEE Trans. Control Syst. Technol. **2022**, 30, 2522–2533.
8. Yu, P.; Li, M.; Wang, Y.; Chen, Z. Fuel Cell Hybrid Electric Vehicles: A Review of Topologies and Energy Management Strategies. World Electr. Veh. J. **2022**, 13, 172.
9. Zhang, F.; Wang, L.; Coskun, S.; Pang, H.; Cui, Y.; Xi, J. Energy Management Strategies for Hybrid Electric Vehicles: Review, Classification, Comparison, and Outlook. Energies **2020**, 13, 3352.
10. Hu, Y.; Li, W.; Xu, K.; Zahid, T.; Qin, F.; Li, C. Energy Management Strategy for a Hybrid Electric Vehicle Based on Deep Reinforcement Learning. Appl. Sci. **2018**, 8, 187.
11. Zou, Y.; Liu, T.; Liu, D.; Sun, F. Reinforcement Learning-Based Real-Time Energy Management for a Hybrid Tracked Vehicle. Appl. Energy **2016**, 171, 372–382.
12. Xiong, R.; Cao, J.; Yu, Q. Reinforcement Learning-Based Real-Time Power Management for Hybrid Energy Storage System in the Plug-in Hybrid Electric Vehicle. Appl. Energy **2018**, 211, 538–548.
13. Liu, T.; Zou, Y.; Liu, D.; Sun, F. Reinforcement Learning of Adaptive Energy Management with Transition Probability for a Hybrid Electric Tracked Vehicle. IEEE Trans. Ind. Electron. **2015**, 62, 7837–7846.
14. Li, Y.; He, H.; Peng, J.; Wang, H. Deep Reinforcement Learning-Based Energy Management for a Series Hybrid Electric Vehicle Enabled by History Cumulative Trip Information. IEEE Trans. Veh. Technol. **2019**, 68, 7416–7430.
15. Wu, J.; He, H.; Peng, J.; Li, Y.; Li, Z. Continuous Reinforcement Learning of Energy Management with Deep Q Network for a Power Split Hybrid Electric Bus. Appl. Energy **2018**, 222, 799–811.
16. Han, X.; He, H.; Wu, J.; Peng, J.; Li, Y. Energy Management Based on Reinforcement Learning with Double Deep Q-Learning for a Hybrid Electric Tracked Vehicle. Appl. Energy **2019**, 254, 113708.
17. Li, Y.; He, H.; Khajepour, A.; Wang, H.; Peng, J. Energy Management for a Power-Split Hybrid Electric Bus via Deep Reinforcement Learning with Terrain Information. Appl. Energy **2019**, 255, 113762.
18. Tan, H.; Zhang, H.; Peng, J.; Jiang, Z.; Wu, Y. Energy Management of Hybrid Electric Bus Based on Deep Reinforcement Learning in Continuous State and Action Space. Energy Conv. Manag. **2019**, 195, 548–560.
19. Wu, Y.; Tan, H.; Peng, J.; Zhang, H.; He, H. Deep Reinforcement Learning of Energy Management with Continuous Control Strategy and Traffic Information for a Series-Parallel Plug-in Hybrid Electric Bus. Appl. Energy **2019**, 247, 454–466.
20. Zhou, J.; Xue, S.; Xue, Y.; Liao, Y.; Liu, J.; Zhao, W. A Novel Energy Management Strategy of Hybrid Electric Vehicle via an Improved TD3 Deep Reinforcement Learning. Energy **2021**, 224, 120118.
21. Li, T.; Cui, W.; Cui, N. Soft Actor-Critic Algorithm-Based Energy Management Strategy for Plug-In Hybrid Electric Vehicle. World Electr. Veh. J. **2022**, 13, 193.
22. Cheng, Y.; Xu, G.; Chen, Q. Research on Energy Management Strategy of Electric Vehicle Hybrid System Based on Reinforcement Learning. Electronics **2022**, 11, 1933.
23. Wang, J.; Liu, P.; Hicks-Garner, J.; Sherman, E.; Soukiazian, S.; Verbrugge, M.; Tataria, H.; Musser, J.; Finamore, P. Cycle-Life Model for Graphite-LiFePO4 Cells. J. Power Sources **2011**, 196, 3942–3948.
24. Tang, L.; Rizzoni, G.; Onori, S. Energy management strategy for HEVs including battery aging optimization. IEEE Trans. Transp. Electrif. **2015**, 1, 211–222.
25. Xu, D.; Cui, Y.; Ye, J.; Cha, S.W.; Li, A.; Zheng, C. A Soft Actor-Critic-Based Energy Management Strategy for Electric Vehicles with Hybrid Energy Storage Systems. J. Power Sources **2022**, 524, 231099.
26. Fujimoto, S.; Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the PMLR/35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; pp. 1587–1596.
27. Zhou, G.; Huang, F.; Liu, W.; Zhao, C.; Xiang, Y.; Wei, H. Comprehensive Control Strategy of Fuel Consumption and Emissions Incorporating the Catalyst Temperature for PHEVs Based on DRL. Energies **2022**, 15, 7523.
28. Nam, H.; Kim, Y.; Bae, J.; Lee, J. GateRL: Automated Circuit Design Framework of CMOS Logic Gates Using Reinforcement Learning. Electronics **2021**, 10, 1032.
29. Wu, Y.; Tseng, B.; Rasmussen, C. Improving Sample-Efficiency in Reinforcement Learning for Dialogue Systems by Using Trainable-Action-Mask. In Proceedings of the ICASSP/2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 1 May 2020; pp. 8024–8028.
30. Tang, C.; Liu, C.; Chen, W.; You, S.D. Implementing Action Mask in Proximal Policy Optimization (PPO) Algorithm. ICT Express **2020**, 6, 200–203.

**Figure 10.** Rewards for the episode. (**a**) TD3-based EMSs. (**b**) DDPG-based EMSs. (**c**) Convergence speed for different EMSs.

**Figure 11.** The ICE power of the DRL-based EMS for one of the episodes. (**a**) TD3-based EMS without action mask. (**b**) TD3-based EMS with action mask.

**Figure 12.** SOC of the hybrid electric dump truck under the CHTC-D driving cycle. (**a**) TD3-based EMSs. (**b**) DDPG-based EMSs.

**Figure 13.** The loss of battery capacity. (**a**) Battery capacity loss at each time. (**b**) Total battery capacity loss.

**Figure 15.** Distribution of operating points on the battery for different EMSs. (**a**) TD3-based EMS, battery aging not considered. (**b**) TD3-based EMS, battery aging considered. (**c**) DDPG-based EMS, battery aging not considered. (**d**) DDPG-based EMS, battery aging considered.

**Figure 16.** The fuel consumption of the ICE. (**a**) Fuel consumption at each time. (**b**) Fuel consumption for 100 km.

**Figure 17.** The distribution of ICE power. (**a**) TD3-based EMS, battery aging not considered. (**b**) TD3-based EMS, battery aging considered. (**c**) DDPG-based EMS, battery aging not considered. (**d**) DDPG-based EMS, battery aging considered.

| Ref. | Author | Year | Category | Continuous State | Continuous Action | Main Topic |
|---|---|---|---|---|---|---|
| [4] | Padmarajan et al. | 2016 | Rule-based | | | System structure and strategy |
| [5] | Zhou et al. | 2018 | Optimization-based | | | Improvement of DP-based EMS for different HEVs |
| [7] | East et al. | 2022 | Optimization-based | | | Scenario MPC for data-based EMS |
| [13] | Liu et al. | 2015 | Learning-based | | | Reinforcement learning of adaptive EMS |
| [15] | Wu et al. | 2018 | Learning-based | x | | Continuous RL-based EMS |
| [16] | Han et al. | 2019 | Learning-based | x | | DDQL-based EMS avoids falling into policy value overestimation |
| [17] | Li et al. | 2019 | Learning-based | x | | EMS with terrain information |
| [18] | Tan et al. | 2019 | Learning-based | x | x | Continuous state and action spaces |
| [19] | Wu et al. | 2019 | Learning-based | x | x | Continuous control and traffic information |
| [21] | Li et al. | 2022 | Learning-based | x | x | SAC-AET-based EMS to improve the control effects |

**Table 2.** The parameters of the vehicle structure and the main parameters of the power and transmission systems.

| Parts | Parameter Name | Value |
|---|---|---|
| Vehicle | Gross weight (kg) | 31,000 |
| | Dimension (mm) | 9662 × 2495 × 3450 |
| | Dimension of cargo box (mm) | 6800 × 2350 × 1500 |
| | Drive form | 8 × 4 |
| | Drag coefficient | 0.56 |
| | Frontal area (m^{2}) | 8.24 |
| | Rolling resistance coefficient | 0.0041 + 0.0000256v |
| ICE | Max. power (kW) | 243 |
| | Max. torque (Nm) | 1400 |
| | Max. speed (rpm) | 2200 |
| MG1 | Max. power (kW) | 110 |
| | Max. torque (Nm) | 340 |
| | Max. speed (rpm) | 7500 |
| MG2 | Max. power (kW) | 196 |
| | Max. torque (Nm) | 375 |
| | Max. speed (rpm) | 15,000 |
| | Gear ratio | 6.7 |
| Transmission | Transmission ratio of PG | 4.4 |
| | AMT gear ratios | 6.3/2.1/1/0.86 |
| | Final drive ratio | 5.1 |
| Battery | Capacity (Ah) | 70 |
| | Voltage (V) | 576 |

| Parameters | Value |
|---|---|
| Actor network learning rate | 0.0001 |
| Critic network learning rate | 0.0002 |
| Discount factor | 0.99 |
| Mini-batch size | 256 |
| Experience buffer size | 1 × 10^{6} |

| EMS | Consider Battery Aging | F.C. (L/100 km) | F.C. Cost (CNY) | Battery Capacity Loss (%) | Battery Aging Cost (CNY) | Total Cost (CNY) | Performance |
|---|---|---|---|---|---|---|---|
| TD3-based | No | 24.81 | 178.63 | 0.0423 | 28.99 | 207.62 | 95.82% |
| | Yes | 25.06 | 180.43 | 0.0270 | 18.51 | 198.94 | 100% |
| DDPG-based | No | 24.74 | 178.13 | 0.0434 | 29.75 | 207.88 | 95.70% |
| | Yes | 25.27 | 181.94 | 0.0286 | 19.60 | 201.54 | 98.71% |
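The Performance column appears to be the lowest total cost in the table divided by each strategy's total cost. That reading is an inference from the numbers rather than a definition given in the text, but the sketch below reproduces the tabulated percentages from the Total Cost column. The dictionary keys are labels chosen for illustration.

```python
def relative_performance(total_costs):
    """Return each strategy's performance in percent relative to the cheapest one."""
    best = min(total_costs.values())
    return {name: 100.0 * best / cost for name, cost in total_costs.items()}


total_costs_cny = {
    "TD3, aging not considered": 207.62,
    "TD3, aging considered": 198.94,
    "DDPG, aging not considered": 207.88,
    "DDPG, aging considered": 201.54,
}
performance = relative_performance(total_costs_cny)
```

Under this definition the cheapest strategy always scores 100%, and a lower percentage means a proportionally higher total cost.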

| EMS | Consider Battery Aging | F.C. (L/100 km) | F.C. Cost (CNY) | Battery Capacity Loss (%) | Battery Aging Cost (CNY) | Total Cost (CNY) | Performance |
|---|---|---|---|---|---|---|---|
| TD3-based | No | 30.14 | 217.01 | 0.0293 | 20.08 | 237.09 | 97.51% |
| | Yes | 30.31 | 218.23 | 0.0189 | 12.95 | 231.18 | 100% |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Mo, J.; Yang, R.; Zhang, S.; Zhou, Y.; Huang, W.
TD3-Based EMS Using Action Mask and Considering Battery Aging for Hybrid Electric Dump Trucks. *World Electr. Veh. J.* **2023**, *14*, 74.
https://doi.org/10.3390/wevj14030074
