Research on Speed Planning and Energy Management Strategy for Distributed-Drive Electric Vehicles Based on Deep Deterministic Policy Gradient Algorithm
Abstract
1. Introduction
2. Establishment of a Simulation Model for a DDEV
2.1. Overall Architecture of the Hybrid Braking System for a DDEV
2.2. Construction of the DDEV Model
2.2.1. Modeling of the In-Wheel Motor
2.2.2. Modeling of the Power Battery
2.2.3. Construction of the DDEV Model
3. Design of the Hybrid Braking System Control Strategy for DDEVs
- (1) Battery SOC: The battery SOC must not be too high. Above a certain threshold, continued charging can overcharge the battery, which shortens its service life and, in severe cases, damages it. Therefore, when the SOC exceeds 95%, only hydraulic braking is used and the motor does not participate in braking. This limit is expressed by the constraint coefficient $k_1$.
- (2) Maximum motor torque: The braking torque of the in-wheel motor installed on each wheel cannot exceed the maximum motor torque. This limit is expressed by the constraint coefficient $k_2$.
- (3) Vehicle speed: Below 5 km/h, the current generated by the motor during braking is small, so energy recovery efficiency is low. Motor braking is also unstable at low speeds and can easily jolt the vehicle. Under such conditions, only hydraulic braking is applied and the motor does not participate. As the speed falls through the range between 10 and 5 km/h, the motor braking torque is reduced gradually until it is completely withdrawn. This behavior is expressed by the constraint coefficient $k_3$.
- (4) Braking intensity: When the braking intensity exceeds 0.7, continued motor braking poses a safety risk. Motor braking is therefore withdrawn and only hydraulic braking is applied. This limit is expressed by the constraint coefficient $k_4$.
- (5) Battery charging power: The total regenerative braking power generated by the four wheels of a DDEV must not exceed the maximum charging power of the power battery; otherwise, battery service life may be reduced or the battery may be damaged. This limit is expressed by the constraint coefficient $k_5$.
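
The five constraints above can be sketched in code. The following is a minimal illustration under stated assumptions, not the authors' formulation: the on/off gates for SOC and braking intensity, the linear ramp between 5 and 10 km/h, the multiplication of the gates, and the proportional scaling under the charging-power limit are all assumed forms, and every function and parameter name is hypothetical. Conversion efficiency is ignored.

```python
def k1_soc(soc):
    """Gate for battery SOC: 0 above 95 %, otherwise 1 (assumed on/off form)."""
    return 0.0 if soc > 0.95 else 1.0


def k3_speed(v_kmh):
    """Gate for vehicle speed: 0 below 5 km/h, 1 above 10 km/h.

    The linear ramp in between is an assumption; the text only says the
    motor torque is reduced gradually.
    """
    if v_kmh < 5.0:
        return 0.0
    if v_kmh < 10.0:
        return (v_kmh - 5.0) / 5.0
    return 1.0


def k4_intensity(z):
    """Gate for braking intensity: 0 above 0.7, otherwise 1."""
    return 0.0 if z > 0.7 else 1.0


def motor_brake_torques(t_req, omega, soc, v_kmh, z, t_max, p_chg_max):
    """Return per-wheel motor braking torques (N*m, positive magnitudes).

    t_req:     requested braking torque per wheel (N*m)
    omega:     wheel angular speed per wheel (rad/s)
    t_max:     maximum motor torque (N*m), the limit behind k2
    p_chg_max: maximum battery charging power (W), the limit behind k5
    """
    gate = k1_soc(soc) * k3_speed(v_kmh) * k4_intensity(z)
    # Torque limit: no wheel may exceed the maximum motor torque.
    torques = [min(t, t_max) * gate for t in t_req]
    # Power limit: scale all wheels equally if total power is too high.
    power = sum(t * w for t, w in zip(torques, omega))
    if power > p_chg_max and power > 0.0:
        scale = p_chg_max / power
        torques = [t * scale for t in torques]
    return torques
```

Whatever torque the motors do not deliver under these limits would be supplied by hydraulic braking.
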
4. Construction of a Signalized Intersection Road Model and Study of Traffic Operation Status
4.1. Construction of a Signalized Intersection Road Model
4.2. Study of Traffic Operation Status at Signalized Intersections
- (1) Deceleration passage condition at a red light: When the test vehicle approaches the intersection, the traffic signal is red, with 20 s remaining in the red phase. At its current constant speed, the vehicle would reach the stop line before the red phase ends and would have to stop and wait for the green light. To avoid stopping at the stop line, the test vehicle adopts a deceleration coasting strategy, so that it arrives after the green phase has begun and passes through the intersection smoothly. This improves traffic efficiency, as shown in Figure 6.
- (2) Acceleration passage condition at a green light: When the test vehicle approaches the intersection, the traffic signal is green, with 15 s remaining in the green phase. At its current speed, the vehicle would not clear the intersection before the green phase ends. The test vehicle therefore accelerates so that it clears the intersection before the light changes, as shown in Figure 7.
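
The two conditions reduce to comparing the constant-speed arrival time with the time remaining in the current phase. The sketch below illustrates that comparison only; the function name, the single-phase lookahead, and the distances and speeds in the usage note are assumptions for illustration and do not come from the paper.

```python
def plan_passage(distance_m, speed_ms, phase, remaining_s):
    """Classify the approach and return (strategy, mean speed bound in m/s).

    distance_m:  distance to the stop line
    speed_ms:    current (constant) speed
    phase:       "red" or "green"
    remaining_s: time left in the current phase

    Only the current phase is considered; later cycles are ignored.
    """
    arrival_s = distance_m / speed_ms
    bound = distance_m / remaining_s
    if phase == "red":
        # Arriving before the red phase ends would force a stop, so the
        # mean approach speed must not exceed `bound`.
        if arrival_s < remaining_s:
            return "decelerate", bound
        return "cruise", speed_ms
    if phase == "green":
        # Arriving after the green phase ends would miss it, so the
        # mean approach speed must be at least `bound`.
        if arrival_s > remaining_s:
            return "accelerate", bound
        return "cruise", speed_ms
    raise ValueError("phase must be 'red' or 'green'")
```

With a hypothetical 200 m approach, `plan_passage(200.0, 15.0, "red", 20.0)` returns `("decelerate", 10.0)`: the vehicle must average no more than 10 m/s to arrive once the 20 s of red have elapsed.
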
5. Speed Planning Algorithm for DDEV
5.1. Speed Planning Algorithm Based on DDPG
5.1.1. Principles of the DDPG Algorithm
- (1) Initialization: Initialize the Actor network and the Critic network, as well as their corresponding target networks. The parameters of the Actor and Critic networks are initialized randomly, while the parameters of the target networks are copied directly from the corresponding online networks. Initialize the experience replay buffer, which stores state–action–reward–next-state (SARS) tuples.
- (2) Exploration and learning: At each time step $t$, the agent selects an action $a_t$ from the current state $s_t$ through the Actor network. To encourage exploration, DDPG adds noise (typically Ornstein–Uhlenbeck or Gaussian noise) during action selection, balancing exploration and exploitation. The agent then executes the action $a_t$, interacts with the environment, and obtains the next state $s_{t+1}$ and the immediate reward $r_t$.
- (3) Experience storage: The current transition $(s_t, a_t, r_t, s_{t+1})$ is stored in the experience replay buffer.
- (4) Sampling and training: A mini-batch is sampled at random from the experience replay buffer to update the Critic and Actor networks. The Critic network fits the Q-value function by minimizing the temporal-difference (TD) error, where the target Q-value is computed jointly by the target Actor network and the target Critic network. The Actor network improves the policy by maximizing the Q-value output by the Critic network.
- (5) Target network update: A soft update moves the target network parameters toward the online parameters with a small step size $\tau$: $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$, where $\theta$ and $\theta'$ denote the online and target parameters. This update stabilizes training and prevents drastic changes in the target values.
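| Algorithm 1: Pseudocode of the DDPG algorithm |
| 1. Input: Initialize policy parameters $\theta$, action-value function parameters $\phi$, and experience replay buffer $\mathcal{D}$ |
| 2. Initialize target network parameters: $\theta' \leftarrow \theta$, $\phi' \leftarrow \phi$ |
| 3. For each episode: |
| 4. Observe state $s$, select action: $a = \mathrm{clip}(\mu_{\theta}(s) + \epsilon,\ a_{\min},\ a_{\max})$, where $\epsilon \sim \mathcal{N}$ is the exploration noise |
| 5. Execute action $a$ in the environment |
| 6. Observe next state $s'$, reward $r$, and termination flag $d$ |
| 7. Store transition $(s, a, r, s', d)$ in the experience replay buffer $\mathcal{D}$ |
| 8. If the termination condition for $s'$ is met, reset the environment state |
| 9. If the experience replay buffer is full: |
| 10. For each update step: |
| 11. Randomly sample a mini-batch of transitions $B = \{(s, a, r, s', d)\}$ from $\mathcal{D}$ |
| 12. Compute the target Q-values: $y = r + \gamma (1 - d)\, Q_{\phi'}(s', \mu_{\theta'}(s'))$ |
| 13. Update the action-value function using gradient descent: $\nabla_{\phi} \frac{1}{\lvert B \rvert} \sum_{(s,a,r,s',d) \in B} \left(Q_{\phi}(s, a) - y\right)^2$ |
| 14. Update the policy using gradient ascent: $\nabla_{\theta} \frac{1}{\lvert B \rvert} \sum_{s \in B} Q_{\phi}(s, \mu_{\theta}(s))$ |
| 15. Update the target network parameters: $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$, $\phi' \leftarrow \tau\phi + (1-\tau)\phi'$ |
| 16. End For |
| 17. Repeat until convergence |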
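
Two arithmetic steps of the algorithm, the TD target used by the Critic and the soft update of the target networks, can be sketched without any deep-learning library. This is a minimal illustration, not the authors' implementation: parameters are represented as plain lists of floats, the function names are hypothetical, and the default discount factor (0.95) and soft-update coefficient (0.005) are taken from the paper's hyperparameter table.

```python
def td_target(reward, done, q_next, gamma=0.95):
    """Target Q-value: y = r + gamma * (1 - d) * Q_target(s', mu_target(s')).

    q_next is the target Critic's value for the next state and the
    target Actor's action; done is 1.0 for a terminal transition.
    """
    return reward + gamma * (1.0 - done) * q_next


def critic_loss(q_values, targets):
    """Mean squared TD error over a mini-batch."""
    errors = [(q - y) ** 2 for q, y in zip(q_values, targets)]
    return sum(errors) / len(errors)


def soft_update(online, target, tau=0.005):
    """Move each target parameter a fraction tau toward its online value."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]
```

With $\tau = 0.005$, a target parameter closes only 0.5% of the gap to its online counterpart per update, which is what keeps the target Q-values from shifting abruptly.
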
5.1.2. Speed Planning Algorithm Based on DDPG
- (1) Traffic-light model
- (2) Initial state of the vehicle
- (3) Vehicle state space
- (4) Vehicle action space
- (5) Training logic
- (6) Reward function and constraint conditions
5.1.3. Training Results of the DDPG Algorithm
5.2. Speed Planning Algorithm Based on Dynamic Programming
- (1) Determine the state variable and the control variable of the problem to be solved, and discretize them within their feasible ranges.
- (2) Determine the state transition equation and design the single-step cost function according to the control objective.
- (3) Perform the backward pass: starting from the final stage, traverse every state at each stage and every admissible control for that state. Use the state transition equation to determine the corresponding state at the previous stage and update the cost function, until the initial state of the initial stage is reached.
- (4) Perform the forward pass: starting from the initial state of the initial stage, select at each stage the control that minimizes the cost function, which yields the optimal control sequence. Then use the state transition equation to obtain the state at each stage under the optimal control sequence.
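
The four steps above can be sketched on a toy problem. The example below is an illustration under assumptions, not the paper's speed-planning model: the state is a speed on a uniform grid, the control shifts the speed by one grid index per stage, and the cost functions passed in are placeholders chosen by the caller. All names are hypothetical.

```python
def dp_speed_plan(speeds, controls, n_stages, v0_idx, stage_cost, terminal_cost):
    """Backward-forward dynamic programming on a discretized speed grid.

    speeds:        list of admissible speeds (the discretized state)
    controls:      admissible index shifts per stage, e.g. (-1, 0, 1)
    stage_cost:    function (v, v_next) -> cost of one transition
    terminal_cost: function (v) -> cost of ending at speed v
    """
    n = len(speeds)
    inf = float("inf")
    # cost_to_go[k][i]: minimum cost from stage k in state i to the end.
    cost_to_go = [[inf] * n for _ in range(n_stages + 1)]
    best_u = [[None] * n for _ in range(n_stages)]
    cost_to_go[n_stages] = [terminal_cost(v) for v in speeds]

    # Backward pass: from the final stage toward the initial stage.
    for k in range(n_stages - 1, -1, -1):
        for i in range(n):
            for u in controls:
                j = i + u  # state transition on the grid
                if 0 <= j < n:
                    c = stage_cost(speeds[i], speeds[j]) + cost_to_go[k + 1][j]
                    if c < cost_to_go[k][i]:
                        cost_to_go[k][i] = c
                        best_u[k][i] = u

    # Forward pass: follow the stored optimal controls from the start.
    i = v0_idx
    trajectory = [speeds[i]]
    for k in range(n_stages):
        i += best_u[k][i]
        trajectory.append(speeds[i])
    return trajectory, cost_to_go[0][v0_idx]
```

For instance, with `speeds = [0, 2, 4, 6, 8, 10]`, `controls = (-1, 0, 1)`, four stages, a start at 10 m/s, a stage cost of `(v_next - v) ** 2`, and a terminal cost of `10 * (v - 4) ** 2`, the planner slows the vehicle to 4 m/s in three steps at a total cost of 12. The control set must include 0 so that every grid state has at least one feasible transition.
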
5.3. Rule-Based Speed Planning Algorithm
6. Analysis of Speed Planning Results for DDEVs at Signalized Intersections
6.1. Traffic Efficiency and Comfort Analysis
6.1.1. Analysis of Red-Light Deceleration Passage Condition at Signalized Intersections
6.1.2. Analysis of Green-Light Acceleration Passage Condition at Signalized Intersections
6.2. Energy Consumption Economy Analysis of Speed Planning at Signalized Intersections
7. Conclusions
- (1) A hierarchical control framework for the hybrid braking system of DDEVs was constructed that accounts for the slip ratio of each wheel. The upper-layer control strategy selects the braking control mode and distributes braking torque between the front and rear axles based on wheel speed, vehicle speed, and brake pedal information. The lower-layer control strategy splits the braking torque of each wheel between motor braking and hydraulic braking according to the front and rear axle torques passed down from the upper layer, subject to constraints such as battery state and motor torque, achieving safe and efficient braking torque distribution.
- (2) A multi-objective Markov decision process model was established that integrates vehicle state, road geometry information, and traffic signal phase. With traffic efficiency, energy consumption economy, and ride comfort combined in the reward function, an integrated intelligent speed planning and energy management strategy based on DDPG was designed. It addresses the difficulty of balancing multiple objectives jointly and achieves adaptive speed regulation under dynamic intersection conditions.
- (3) Simulations of two typical signalized-intersection conditions, deceleration at a red light and acceleration at a green light, show that the proposed DDPG strategy achieves better traffic efficiency than the traditional rule-based control strategy and the classic DP algorithm while keeping energy consumption low. It also suppresses the acceleration fluctuations caused by rapid acceleration and deceleration, preserving ride comfort. The strategy thus balances traffic efficiency, energy consumption, and comfort, which supports its effectiveness.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| DDEV | Distributed-Drive Electric Vehicle |
| DDPG | Deep Deterministic Policy Gradient |
| SOC | State of Charge |
| DP | Dynamic Programming |
| EV | Electric Vehicle |

| Road Number | Length |
|---|---|
| Section1 | 538 m |
| Section2 | 417 m |
| Section3 | 697 m |
| Section4 | 363 m |
| Section5 | 323 m |
| Section6 | 400 m |
| Section7 | 377 m |
| Section8 | 660 m |
| Section9 | 220 m |
| Section10 | 497 m |
| Section11 | 465 m |
| Section12 | 245 m |
| Section13 | 300 m |
| Traffic-Light Number | Green Phase | Red Phase | Cycle Length |
|---|---|---|---|
| TL1 | 50 s | 60 s | 110 s |
| TL2 | 54 s | 57 s | 111 s |
| TL3 | 70 s | 40 s | 110 s |
| TL4 | 43 s | 67 s | 110 s |
| TL5 | 65 s | 45 s | 110 s |
| TL6 | 34 s | 22 s | 56 s |
| TL7 | 56 s | 53 s | 109 s |
| TL8 | 57 s | 51 s | 108 s |
| TL9 | 66 s | 42 s | 108 s |
| TL10 | 42 s | 68 s | 110 s |
| TL11 | 47 s | 63 s | 110 s |
| TL12 | 70 s | 38 s | 108 s |
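
The phase durations above are enough to look up a signal's state at any time once a cycle start is fixed. The sketch below assumes that each cycle begins with the green phase at t = 0 and that there is no amber phase or offset between signals; neither assumption comes from the table, and the function name is hypothetical. Only two of the twelve signals are copied in for brevity.

```python
# (green_s, red_s) durations copied from the table for two signals.
TRAFFIC_LIGHTS = {
    "TL1": (50, 60),
    "TL6": (34, 22),
}


def signal_state(t_s, green_s, red_s):
    """Return (phase, seconds remaining in that phase) at time t_s.

    Assumes the cycle starts with green at t = 0 and repeats forever.
    """
    cycle = green_s + red_s
    t_in_cycle = t_s % cycle
    if t_in_cycle < green_s:
        return "green", green_s - t_in_cycle
    return "red", cycle - t_in_cycle
```

For TL1, `signal_state(70, *TRAFFIC_LIGHTS["TL1"])` returns `("red", 40)`: 20 s into the 60 s red phase of a 110 s cycle.
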
| Parameter Name | Numerical Value | Parameter Name | Numerical Value |
|---|---|---|---|
| Simulation step (s) | 1 | Exploration noise type | Ornstein–Uhlenbeck |
| Actor learning rate | 0.001 | OU noise mean | 0 |
| Soft-update coefficient | 0.005 | OU noise initial standard deviation | 0.3 |
| Experience replay buffer size | 1,000,000 | OU noise decay rate | $1 \times 10^{-5}$ |
| Batch size for training | 256 | Actor network hidden-layer architecture | [128, 200] |
| Reward discount factor | 0.95 | Critic network hidden-layer architecture | [128, 200] |
| Gradient threshold | 1 | Hidden-layer activation function | ReLU |
| Maximum simulation steps per episode | 2000 | Actor output layer activation function | Tanh |
| Maximum training episodes | 1000 | Critic output layer activation function | Linear |
| Weight coefficients of the three reward components | 0.4, 0.4, 0.2 | Target network update method | Soft update |
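
The exploration-noise settings in the table (mean 0, initial standard deviation 0.3, decay rate $1 \times 10^{-5}$, step of 1 s) can be illustrated with a discretized Ornstein–Uhlenbeck process. This is a sketch under assumptions: the mean-attraction constant `theta = 0.15` is not given in the table, and the multiplicative decay of the standard deviation at every step is an assumed form. The class name is hypothetical.

```python
import math
import random


class DecayingOUNoise:
    """Ornstein-Uhlenbeck noise whose standard deviation decays over time."""

    def __init__(self, mean=0.0, std0=0.3, decay=1e-5, theta=0.15, dt=1.0, seed=None):
        self.mean = mean      # long-run mean of the process
        self.std = std0       # current standard deviation
        self.decay = decay    # per-step decay rate of the standard deviation
        self.theta = theta    # mean-attraction constant (assumed value)
        self.dt = dt          # sample time in seconds
        self.x = mean         # current noise value
        self.rng = random.Random(seed)

    def sample(self):
        """Advance the process by one step and return the new noise value."""
        drift = self.theta * (self.mean - self.x) * self.dt
        diffusion = self.std * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        # Assumed decay law: shrink the standard deviation a little each step.
        self.std *= 1.0 - self.decay
        return self.x
```

Under this decay law the standard deviation halves after roughly 69,000 steps, since $(1 - 10^{-5})^{n} = 0.5$ gives $n \approx 6.9 \times 10^{4}$, so exploration fades slowly over training.
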
| Operating Condition | Unit | DDPG | Rule-Based Control Strategy | DP |
|---|---|---|---|---|
| Deceleration passage at a red light | kWh/km | 0.0315 | 0.0309 | 0.0359 |
| Acceleration passage at a green light | kWh/km | 0.0446 | 0.0475 | 0.0534 |
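
The relative differences implied by the table can be computed directly. The snippet below uses only the table's values; the dictionary layout and function name are illustrative.

```python
# Energy consumption in kWh/km, copied from the table above.
ENERGY = {
    "red": {"DDPG": 0.0315, "Rule": 0.0309, "DP": 0.0359},
    "green": {"DDPG": 0.0446, "Rule": 0.0475, "DP": 0.0534},
}


def relative_change(value, baseline):
    """Percentage change of value with respect to baseline."""
    return 100.0 * (value - baseline) / baseline


for condition, row in ENERGY.items():
    for baseline in ("Rule", "DP"):
        change = relative_change(row["DDPG"], row[baseline])
        print(f"{condition}: DDPG vs {baseline}: {change:+.2f} %")
```

By these figures, DDPG uses about 12.3% less energy than DP at the red light and about 16.5% less at the green light. Against the rule-based strategy it uses about 6.1% less at the green light and about 1.9% more at the red light.
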
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Li, N.; Lin, Y.; Huang, Z.; Hong, Y.; Ning, X. Research on Speed Planning and Energy Management Strategy for Distributed-Drive Electric Vehicles Based on Deep Deterministic Policy Gradient Algorithm. Actuators 2026, 15, 248. https://doi.org/10.3390/act15050248

