  • This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
  • Article
  • Open Access

31 January 2026

EVformer: A Spatio-Temporal Decoupled Transformer for Citywide EV Charging Load Forecasting

School of Computer Science, Xi’an Polytechnic University, Xi’an 710600, China
World Electr. Veh. J. 2026, 17(2), 71; https://doi.org/10.3390/wevj17020071
(registering DOI)
This article belongs to the Section Vehicle Management

Abstract

Accurate forecasting of citywide electric vehicle (EV) charging load is critical for alleviating station-level congestion, improving energy dispatch, and supporting the stability of intelligent transportation systems. However, large-scale EV charging networks exhibit complex and heterogeneous spatio-temporal dependencies, and existing approaches often struggle to scale with increasing station density or longer forecasting horizons. To address these challenges, we develop a modular spatio-temporal prediction framework that decouples temporal sequence modeling from spatial dependency learning under an encoder–decoder paradigm. For temporal representation, we introduce a global aggregation mechanism that compresses multi-station time-series signals into a shared latent context, enabling efficient modeling of long-range interactions while mitigating the computational burden of cross-channel correlation learning. For spatial representation, we design a dynamic multi-scale attention module that integrates graph topology with data-driven neighbor selection, allowing the model to adaptively capture both localized charging dynamics and broader regional propagation patterns. In addition, a cross-step transition bridge and a gated fusion unit are incorporated to improve stability in multi-horizon forecasting. The cross-step transition bridge maps historical information to future time steps, reducing error propagation. The gated fusion unit adaptively merges the temporal and spatial features, adjusting their relative contributions according to the forecast horizon so that the two remain balanced and prediction accuracy improves across multiple time steps. Extensive experiments on a real-world dataset of 18,061 charging piles in Shenzhen demonstrate that the proposed framework outperforms state-of-the-art baselines in terms of MAE, RMSE, and MAPE. Ablation and sensitivity analyses verify the effectiveness of each module, while efficiency evaluations indicate significantly reduced computational overhead compared with existing attention-based spatio-temporal models.
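The abstract does not give the equations of the gated fusion unit, so the following is only a minimal sketch of a generic sigmoid-gated fusion of two feature vectors, not the formulation used in EVformer. The function names (`gated_fusion`, `fuse_over_horizons`), the scalar weights `w_t` and `w_s`, and the idea of making the gate horizon-dependent through one bias per forecast step are all assumptions introduced for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(temporal, spatial, w_t, w_s, bias):
    """Fuse two equally sized feature vectors with an element-wise gate.

    Hypothetical formulation (not taken from the paper):
        gate  = sigmoid(w_t * h_t + w_s * h_s + bias)
        fused = gate * h_t + (1 - gate) * h_s
    A gate near 1 favours the temporal feature, near 0 the spatial one.
    """
    if len(temporal) != len(spatial):
        raise ValueError("temporal and spatial features must have equal length")
    fused = []
    for h_t, h_s in zip(temporal, spatial):
        gate = sigmoid(w_t * h_t + w_s * h_s + bias)
        fused.append(gate * h_t + (1.0 - gate) * h_s)
    return fused


def fuse_over_horizons(temporal, spatial, horizon_biases, w_t=1.0, w_s=1.0):
    """Apply the gate once per forecast step.

    Assumption: horizon dependence is modelled by one bias per step,
    which shifts the temporal/spatial balance as the horizon grows.
    """
    return [
        gated_fusion(temporal, spatial, w_t, w_s, b) for b in horizon_biases
    ]


if __name__ == "__main__":
    h_temporal = [2.0, 4.0]
    h_spatial = [0.0, 0.0]
    # Three forecast steps whose biases move from temporal-heavy to spatial-heavy.
    for step, fused in enumerate(
        fuse_over_horizons(h_temporal, h_spatial, [2.0, 0.0, -2.0]), start=1
    ):
        print(step, fused)
```

Because the output is a convex combination, each fused value stays between the corresponding temporal and spatial inputs. In a trained model the gate parameters would be learned rather than set by hand.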
