Electric Vehicle Cluster Charging Scheduling Optimization: A Forecast-Driven Multi-Objective Reinforcement Learning Method
Abstract
1. Introduction
- (1) EV charging control is modeled as an MDP. Random-variable distributions are used to approximate the actual vehicle state of charge (SoC) and owner travel-behavior information, and historical load data together with a sequence neural network model are used to predict the charging demand of a residential community. The set of EV charging piles is treated as a single agent interacting with the environment, allocating charging power to maximize cumulative reward.
- (2) Importance-sampling logic is introduced into the experience replay buffer of the TD3 reinforcement learning method (a minimal sketch of such a buffer follows this list). The aim is to find a globally optimal charging strategy in continuous action scenarios, minimizing electricity purchasing costs while meeting the distribution grid’s requirements on long-term load fluctuation magnitude and short-term growth rate.
- (3) A GRU model is utilized to forecast 48 h of base-load information, which informs the current charging action. Gaussian noise is added to the actor network output to ensure effective exploration by the agent.
- (4) Simulation results demonstrate that, compared with the baseline methods, the proposed method balances low charging costs with the distribution grid’s load requirements.
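As a concrete illustration of the importance-sampling experience replay mentioned in contribution (2), the following is a minimal proportional prioritized replay buffer in Python. The class and parameter names (PrioritizedReplayBuffer, alpha, beta) are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay with importance-sampling weights."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha              # how strongly TD error shapes the sampling priority
        self.storage = []               # list of (s, a, r, s_next) transitions
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are replayed at least once.
        max_prio = self.priorities.max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[: len(self.storage)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.storage), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by non-uniform sampling.
        weights = (len(self.storage) * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.storage[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Priorities track the magnitude of the latest TD error of each sampled transition.
        self.priorities[idx] = np.abs(td_errors) + eps
```

During training, the weights returned by sample() would multiply the squared TD errors in the critic loss, and update_priorities() would then be called with the absolute TD errors of the sampled batch.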
2. EV Charging Scheduling Model
2.1. Scenario Description
2.2. Charging Scheduling Model
2.2.1. Optimization Objectives
2.2.2. Constraints
3. EV Scheduling Based on GRU-TD3 Method
3.1. Charging Scheduling MDP Model
3.1.1. System State
3.1.2. Action Mapping
3.1.3. State Transition
3.1.4. Reward Function
3.2. GRU-TD3 Algorithm
3.2.1. GRU-Based Load Forecasting Model
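No implementation details are reproduced here; as a minimal sketch of a GRU predictor that maps the previous three days of load history to a 48-h forecast, one possible PyTorch formulation is shown below. The input resolution (hourly), hidden size, and absence of normalization are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class GRULoadForecaster(nn.Module):
    """Maps the previous 3 days of load (hourly: 72 steps) to a 48-step forecast."""

    def __init__(self, input_len=72, horizon=48, hidden=64):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, history):            # history: (batch, input_len)
        x = history.unsqueeze(-1)          # (batch, input_len, 1)
        _, h = self.gru(x)                 # h: (1, batch, hidden), last hidden state
        return self.head(h.squeeze(0))     # (batch, horizon) load forecast

# Usage: feed a 72-step history, obtain the 48-step base-load forecast
model = GRULoadForecaster()
hist = torch.randn(1, 72)
forecast = model(hist)                     # shape (1, 48)
```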
3.2.2. TD3-Based EV Scheduling Method
- (1) Clipped Double Q-Learning (the standard form is recalled after this list)
- (2) Experience Replay with Importance Sampling
- (3) Policy Update
- (4) Network Architecture
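For reference, the clipped double-Q target behind item (1) and the delayed soft target update behind item (3) take the standard TD3 form below; the symbols θ1, θ2 (critics) and φ (actor) are generic notation and need not match the paper's.

```latex
% Clipped double-Q target with target-policy smoothing
y_t = r_t + \gamma \min_{i=1,2} Q_{\theta_i'}\!\bigl(s_{t+1},\, \pi_{\phi'}(s_{t+1}) + \mathrm{clip}(\epsilon, -c, c)\bigr),
\qquad \epsilon \sim \mathcal{N}(0, \sigma^2)

% Soft update of the target parameters at rate \tau
\theta_i' \leftarrow \tau \theta_i + (1-\tau)\theta_i', \qquad
\phi' \leftarrow \tau \phi + (1-\tau)\phi'
```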
3.3. Algorithm Flow
3.3.1. Initialization
3.3.2. Training
| Algorithm 1: GRU-TD3 Initial Training |
| Inputs: horizon T; episodes N; replay buffer 𝔇; batch size M; discount factor γ; soft-update rate τ; exploration coefficient ε; prioritized-replay exponents α, β; learning rates ρ1, ρ2 |
| Outputs: trained actor parameters η |
| 1 | Initialize the exploration coefficient ε and importance-sampling coefficients α, β, as well as the actor and critic network parameters |
| 2 | Set the target actor and critic parameters equal to the corresponding online parameters |
| 3 | For episode = 1 to N do |
| 4 | Use the GRU to predict the 48-h base load from the previous 3-day history |
| 5 | Initialize the episode state s1 with the information of 200 EVs, the predicted base load, and the time-of-use price |
| 6 | For t = 1 to T do |
| 7 | Obtain the policy output at from the actor; add Gaussian exploration noise scaled by ε |
| 8 | Execute action at to obtain rt and st+1 |
| 9 | Store the transition (st, at, rt, st+1) into the replay buffer 𝔇 |
| 10 | End |
| 11 | Sample a mini-batch of M transitions from 𝔇 by priority (exponent α) and compute the importance-sampling weights (exponent β) |
| 12 | Compute the evaluated Q-values of the sampled state-action pairs with both critics |
| 13 | Generate the next action for st+1 with the target actor (with clipped target-policy noise) |
| 14 | Compute the target Q-value with the target critics: the reward plus γ times the minimum of the two target Q-values |
| 15 | Compute the importance-sampling-weighted critic loss from the residuals between the target and evaluated Q-values |
| 16 | Update the critic parameters by gradient descent on this loss with learning rate ρ2 |
| 17 | Update the actor parameters η along the deterministic policy gradient with learning rate ρ1 (delayed relative to the critic updates) |
| 18 | Soft-update the target networks with rate τ |
| 19 | Anneal the exploration coefficient ε and the importance-sampling exponent β |
| 20 | End |
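The sketch below illustrates one training update of Algorithm 1 (steps 11-19) in PyTorch style, building on the prioritized replay sketch in the Introduction. Network sizes, learning rates, noise scales, and the policy delay are placeholder values rather than the paper's hyperparameters, and the GRU forecast is assumed to already be embedded in the state.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 1      # placeholder dimensions, not the paper's values
GAMMA, TAU, POLICY_NOISE, NOISE_CLIP, POLICY_DELAY = 0.99, 0.005, 0.2, 0.5, 2

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

actor, actor_t = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM, ACTION_DIM)
critic1, critic1_t = mlp(STATE_DIM + ACTION_DIM, 1), mlp(STATE_DIM + ACTION_DIM, 1)
critic2, critic2_t = mlp(STATE_DIM + ACTION_DIM, 1), mlp(STATE_DIM + ACTION_DIM, 1)
for tgt, src in [(actor_t, actor), (critic1_t, critic1), (critic2_t, critic2)]:
    tgt.load_state_dict(src.state_dict())                    # step 2 of Algorithm 1

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)    # rho_1 (assumed value)
critic_opt = torch.optim.Adam(list(critic1.parameters()) + list(critic2.parameters()),
                              lr=1e-3)                       # rho_2 (assumed value)

def td3_update(batch, weights, step):
    """One gradient step; `batch` holds (s, a, r, s_next) tensors, `weights` the IS weights."""
    s, a, r, s_next = batch
    with torch.no_grad():
        # Target-policy smoothing: clipped Gaussian noise on the target action (step 13).
        noise = (torch.randn_like(a) * POLICY_NOISE).clamp(-NOISE_CLIP, NOISE_CLIP)
        a_next = (torch.tanh(actor_t(s_next)) + noise).clamp(-1.0, 1.0)
        # Clipped double-Q target (step 14).
        q_next = torch.min(critic1_t(torch.cat([s_next, a_next], 1)),
                           critic2_t(torch.cat([s_next, a_next], 1)))
        y = r + GAMMA * q_next
    q1 = critic1(torch.cat([s, a], 1))
    q2 = critic2(torch.cat([s, a], 1))
    td_err = (y - q1).detach()                               # reused to refresh replay priorities
    # Importance-sampling-weighted critic loss and critic update (steps 15-16).
    critic_loss = (weights * ((q1 - y) ** 2 + (q2 - y) ** 2)).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    if step % POLICY_DELAY == 0:
        # Delayed deterministic policy-gradient update of the actor (step 17).
        actor_loss = -critic1(torch.cat([s, torch.tanh(actor(s))], 1)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        # Soft update of all target networks (step 18).
        for tgt, src in [(actor_t, actor), (critic1_t, critic1), (critic2_t, critic2)]:
            for pt, p in zip(tgt.parameters(), src.parameters()):
                pt.data.mul_(1 - TAU).add_(TAU * p.data)
    return td_err.abs()
```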
3.3.3. Testing
| Algorithm 2: EV Charging Control Strategy Evaluation |
| Inputs: trained actor parameters η; horizon T; number of days N; GRU prediction network |
| Outputs: mean values of the three evaluation metrics; average 24-h load curve |
| 1 | Load the trained actor parameters η |
| 2 | For episode = 1 to N do |
| 3 | Use the GRU prediction network with the previous three-day history to obtain the 48-h base-load forecast |
| 4 | Initialize the episode state s1 with the information of 200 EVs, the predicted base load, and the time-of-use price |
| 5 | For t = 1 to T do |
| 6 | Obtain the policy output at from the trained actor |
| 7 | Map at to a feasible action according to the action-mapping relationship and charging priority |
| 8 | Execute at, observe st+1, compute the reward, and record the load trajectory |
| 9 | End |
| 10 | Calculate the daily load standard deviation, average charging cost, and short-term load growth rate |
| 11 | End |
| 12 | Report the mean values of the three evaluation metrics over the N days and plot the average 24-h load variation curve |
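To make step 10 of Algorithm 2 concrete, a minimal Python sketch of the three evaluation metrics is given below; the exact definitions, in particular the window of the short-term load growth rate, are assumptions since only the metric names appear in the algorithm.

```python
import numpy as np

def evaluate_day(load_kw, charge_cost_per_ev):
    """Metrics for one simulated day.

    load_kw            : total community load per control step (kW)
    charge_cost_per_ev : per-vehicle charging cost for the day (yuan)
    """
    load = np.asarray(load_kw, dtype=float)
    load_std = load.std()                                 # daily load standard deviation
    avg_cost = float(np.mean(charge_cost_per_ev))         # average charging cost per vehicle
    # Short-term load growth rate: largest relative increase between consecutive steps (assumed definition).
    growth = float(np.max((load[1:] - load[:-1]) / np.maximum(load[:-1], 1e-9)))
    return load_std, avg_cost, growth
```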
4. Simulation Results and Evaluation
4.1. Simulation Setup
4.1.1. Dataset and Parameters
4.1.2. Benchmark Setup
- (1) Algorithmic comparisons
- GRU-TD3 (proposed): Forecast-augmented DRL model combining GRU prediction and TD3 decision-making for adaptive and stable scheduling.
- DDPG: Classical DRL algorithm for continuous control, used to verify the improvement brought by the proposed GRU-TD3.
- PSO: Representative model-based optimization method; each particle encodes the EV charging powers, minimizing a weighted sum of cost and load variance (a minimal weighted-sum sketch is given after this list).
- (2) Multi-strategy scheduling comparison
- Base Load: No electric vehicle charging load.
- Uncoordinated Charging: Vehicles charge immediately at maximum power, significantly increasing load pressure during peak hours.
- Cost-Only Optimization: Only charging cost is considered, leading to a pronounced “midnight peak” phenomenon.
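For the PSO baseline in comparison (1), a minimal weighted-sum formulation could look as follows; the fitness weights, swarm parameters, and the absence of per-vehicle energy constraints are simplifications, not the paper's settings.

```python
import numpy as np

def pso_schedule(base_load, prices, n_evs, p_max, w_cost=1.0, w_var=0.05,
                 n_particles=30, n_iter=200, seed=0):
    """Each particle encodes the charging power of every EV at every time step."""
    rng = np.random.default_rng(seed)
    T = len(base_load)
    x = rng.uniform(0.0, p_max, (n_particles, n_evs * T))   # positions: charging powers
    v = np.zeros_like(x)

    def fitness(flat):
        p = flat.reshape(n_evs, T)
        total = base_load + p.sum(axis=0)
        cost = float((p.sum(axis=0) * prices).sum())         # electricity purchase cost
        return w_cost * cost + w_var * float(total.var())    # weighted cost + load variance

    pbest, pbest_f = x.copy(), np.array([fitness(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, 0.0, p_max)                       # respect charger power limits
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest.reshape(n_evs, T)
```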
4.2. Load Forecasting Results and Validation
4.3. Scheduling Result Analysis
4.3.1. Scheduling Algorithms Results Comparison
4.3.2. Multi-Strategy Scheduling Results Comparison
4.4. Parameter Ablation Simulation
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
|  | remaining state of charge of the i-th vehicle upon arrival |
| Q | Q-value function |
| Rt | reward function |
| at | action at time t |
| Pi,j | charging power of the j-th vehicle at time i |
| rt | reward at time t |
| st | state at time t |
| ti,arr | arrival time of the i-th vehicle |
| ti,dep | estimated departure time of the i-th vehicle |
| γ | discount factor |
| ε | exploration coefficient |
|  | charging efficiency |
| ρ1 | learning rate of the actor network |
| ρ2 | learning rate of the critic network |
| τ | soft-update factor |
References
- Salman, M.; Arslan, M.; Khan, S.A.; Fahad, S.; Imran, M.; Ullah, S. Chapter 14—Policies for the future: Promoting electric vehicle deployment. In Handbook on New Paradigms in Smart Charging for E-Mobility; Kumar, A., Bansal, R.C., Kumar, P., HE, X., Eds.; Elsevier: Amsterdam, The Netherlands, 2025; pp. 481–507. [Google Scholar]
- Salman, M.; Arslan, M.; Khan, S.A.; Fahad, S.; Imran, M.; Ullah, S. Chapter 11—Demand-side management and managing electric vehicles and their optimal charging locations and scheduling in smart grids. In Handbook on New Paradigms in Smart Charging for E-Mobility; Kumar, A., Bansal, R.C., Kumar, P., HE, X., Eds.; Elsevier: Amsterdam, The Netherlands, 2025; pp. 375–403. [Google Scholar]
- Nottrott, A.; Kleissl, J.; Washom, B. Storage Dispatch Optimization for Grid-Connected Combined Photovoltaic-Battery Storage Systems. In Proceedings of the 2012 IEEE Power and Energy Society General Meeting, San Diego, CA, USA, 22–26 July 2012; pp. 1–7. [Google Scholar]
- Koyanagi, F.; Uriu, Y. A Strategy of Load Leveling by Charging and Discharging Time Control of Electric Vehicles. IEEE Trans. Power Syst. 2002, 13, 1179–1184. [Google Scholar] [CrossRef]
- Flath, C.M.; Ilg, J.P.; Gottwalt, S.; Schmeck, H.; Weinhardt, C. Improving Electric Vehicle Charging Coordination Through Area Pricing. Transp. Sci. 2014, 48, 619–634. [Google Scholar] [CrossRef]
- Leemput, N.; Geth, F.; Claessens, B.; Van Roy, J.; Ponnette, R.; Driesen, J. A Case Study of Coordinated Electric Vehicle Charging for Peak Shaving on A Low Voltage Grid. In Proceedings of the 2012 3rd IEEE PES Innovative Smart Grid Technologies Europe (ISGT Europe), Berlin, Germany, 14–17 October 2012; pp. 1–7. [Google Scholar]
- Li, C.; Zhu, Y.; Lee, K.Y. Route Optimization of Electric Vehicles Based on Reinsertion Genetic Algorithm. IEEE Trans. Transp. Electrif. 2023, 9, 3753–3768. [Google Scholar] [CrossRef]
- Korkas, C.D.; Baldi, S.; Yuan, S.; Kosmatopoulos, E.B. An Adaptive Learning-Based Approach for Nearly Optimal Dynamic Charging of Electric Vehicle Fleets. IEEE Trans. Intell. Transp. Syst. 2017, 19, 2066–2075. [Google Scholar] [CrossRef]
- Zhang, L.; Li, Y. Optimal Management for Parking-Lot Electric Vehicle Charging by Two-Stage Approximate Dynamic Programming. IEEE Trans. Smart Grid 2015, 8, 1722–1730. [Google Scholar] [CrossRef]
- Yang, L.; Zhang, J.; Poor, H.V. Risk-Aware Day-Ahead Scheduling and Real-time Dispatch for Electric Vehicle Charging. IEEE Trans. Smart Grid 2017, 5, 693–702. [Google Scholar] [CrossRef]
- Frendo, O.; Gaertner, N.; Stuckenschmidt, H. Real-Time Smart Charging Based on Precomputed Schedules. IEEE Trans. Smart Grid 2019, 10, 6921–6932. [Google Scholar] [CrossRef]
- Sarabi, S.; Kefsi, L. Electric Vehicle Charging Strategy Based on A Dynamic Programming Algorithm. In Proceedings of the IEEE International Conference on Intelligent Energy and Power Systems (IEPS), Kyiv, Ukraine, 2–6 June 2014; pp. 1–5. [Google Scholar]
- Liu, Z.F.; Zhang, W.; Ji, X.; Li, K. Optimal Planning of Charging Station for Electric Vehicle Based on Particle Swarm Optimization. In Proceedings of the IEEE PES Innovative Smart Grid Technologies, Tianjin, China, 21–24 May 2012; pp. 1–5. [Google Scholar]
- Ihekwaba, A.; Kim, C. Analysis of Electric Vehicle Charging Impact on Grid Voltage Regulation. In Proceedings of the 2017 North American Power Symposium (NAPS), Morgantown, WV, USA, 17–19 September 2017; pp. 1–6. [Google Scholar]
- Lample, G.; Chaplot, D.S. Playing FPS Games with Deep Reinforcement Learning. In AAAI’17: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; AAAI Press: San Francisco, CA, USA, 2017; Volume 31. [Google Scholar]
- Waltz, M.; Fu, K.S. A Heuristic Approach to Reinforcement Learning Control Systems. IEEE Trans. Autom. Control 1965, 10, 390–398. [Google Scholar] [CrossRef]
- Keneshloo, Y.; Shi, T.; Ramakrishnan, N.; Reddy, C.K. Deep Reinforcement Learning for Sequence-to-Sequence Models. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 2469–2489. [Google Scholar] [CrossRef] [PubMed]
- Najafi, S.; Livani, H. Robust Day-Ahead Voltage Support and Building Demand Response Scheduling Under Gaussian Mixture Model Uncertainty. IEEE Trans. Ind. Appl. 2025, in press. [CrossRef]
- Ding, T.; Zeng, Z.; Bai, J.; Qin, B.; Yang, Y.; Shahidehpour, M. Optimal Electric Vehicle Charging Strategy With Markov Decision Process and Reinforcement Learning Technique. IEEE Trans. Ind. Appl. 2020, 56, 5811–5823. [Google Scholar] [CrossRef]
- Ji, Y.; Wang, Y.; Zhao, H.; Gui, G.; Gacanin, H.; Sari, H.; Adachi, F. Multi-Agent Reinforcement Learning Resources Allocation Method Using Dueling Double Deep Q-Network in Vehicular Networks. IEEE Trans. Veh. Technol. 2023, 72, 13447–13460. [Google Scholar] [CrossRef]
- Huang, J.; Zhou, X. Optimizing EV Charging Station Placement in New South Wales: A Soft Actor-Critic Reinforcement Learning Approach. In Proceedings of the 5th International Conference on Computer Engineering and Application (ICCEA), Hangzhou, China, 12–14 April 2024; pp. 1790–1794. [Google Scholar]
- Lotfy, A.; Chaoui, H.; Kandidayeni, M.; Boulon, L. Enhancing Energy Management Strategy for Battery Electric Vehicles: Incorporating Cell Balancing and Multi-Agent Twin Delayed Deep Deterministic Policy Gradient Architecture. IEEE Trans. Veh. Technol. 2024, 73, 16593–16607. [Google Scholar] [CrossRef]
- Bi, X.; Gao, D.; Yang, M. A Reinforcement Learning-Based Routing Protocol for Clustered EV-VANET. In Proceedings of the 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 12–14 June 2020; pp. 1769–1773. [Google Scholar]
- Suresh Kumar, S.; Margala, M.; Siva Shankar, S.; Chakrabarti, P. A Novel Weight-Optimized LSTM for Dynamic Pricing Solutions in E-commerce Platforms Based on Customer Buying Behaviour. Soft Comput. 2023, 6, 1–13. [Google Scholar] [CrossRef]
- Cao, J.; Crozier, C.; McCulloch, M.; Fan, Z. Optimal Design and Operation of a Low Carbon Community Based Multi-Energy Systems Considering EV Integration. IEEE Trans. Sustain. Energy 2018, 10, 1217–1226. [Google Scholar] [CrossRef]
- Marasciuolo, F.; Orozco, C.; Dicorato, M.; Borghetti, A.; Forte, G. Chance-Constrained Calculation of the Reserve Service Provided by EV Charging Station Clusters in Energy Communities. IEEE Trans. Ind. Appl. 2023, 59, 4700–4709. [Google Scholar] [CrossRef]
| Scenario | EV Charging Cost (¥/Vehicle) | Daily Load Standard Deviation |
|---|---|---|
| Base Load | - | 784.8 |
| Uncoordinated Charging | 42.79 | 915.6 |
| Cost-Only Optimization | 24.40 | 788.4 |
| Proposed Method | 24.41 | 729.8 |