Abstract
Accurate multivariate traffic flow forecasting is critical for intelligent transportation systems, yet it remains challenging due to the complex interplay of temporal dynamics and spatial interactions. While Transformer-based models have shown promise in capturing long-range temporal dependencies, most existing approaches compress multidimensional observations into flattened sequences, thereby neglecting explicit modeling of cross-dimensional (i.e., spatial or inter-variable) relationships, which are essential for capturing traffic propagation, network-wide congestion, and node-specific behaviors. To address this limitation, we propose TSAformer, a novel Transformer architecture that explicitly preserves and jointly models time and dimension as dual structural axes. TSAformer begins with a multimodal input embedding layer that encodes raw traffic values alongside temporal context (time-of-day and day-of-week) and node-specific positional features, so that each input token carries value, temporal, and node identity information. The core of TSAformer is the Two-Stage Attention (TSA) module, which first models intra-dimensional temporal evolution via time-axis self-attention and then captures inter-dimensional spatial interactions through a lightweight routing mechanism, avoiding quadratic complexity in the number of nodes while still enabling all-to-all cross-node communication. Built upon TSA, a hierarchical encoder–decoder (HED) structure further enhances forecasting by modeling traffic patterns across multiple temporal scales, from fine-grained fluctuations to macroscopic trends, and fusing predictions via cross-scale attention. Extensive experiments on three real-world traffic datasets, including urban road networks and highway systems, demonstrate that TSAformer consistently outperforms state-of-the-art baselines across short-term and long-term forecasting horizons. Notably, it achieves top-ranked performance in 36 out of 58 critical evaluation scenarios, including peak-hour and event-driven congestion prediction.
By explicitly modeling both temporal and dimensional dependencies without structural compromise, TSAformer provides a scalable, interpretable, and high-performance solution for spatiotemporal traffic forecasting.
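To make the two-stage idea concrete, the following NumPy sketch shows one way such a module could be arranged. It is an illustration under stated assumptions, not the TSAformer implementation: the abstract does not specify the routing mechanism, so the sketch assumes a small set of `c` router vectors per time step that first gather information from all `D` nodes and then distribute it back, which costs O(cD) rather than O(D^2). The function name `two_stage_attention`, the `routers` array, and all shapes are hypothetical, and learned projections, multiple heads, normalization, and feed-forward layers are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Plain scaled dot-product attention, batched over the leading axis.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def two_stage_attention(x, routers):
    """Hypothetical two-stage attention sketch.

    x:       (D, T, d) embeddings for D nodes, T time steps, width d
    routers: (T, c, d) assumed router vectors, with c much smaller than D
    returns: (D, T, d)
    """
    # Stage 1: time-axis self-attention, applied to each node independently.
    z_time = x + attention(x, x, x)                  # (D, T, d)

    # Stage 2: cross-node interaction at each time step via routers.
    z = np.swapaxes(z_time, 0, 1)                    # (T, D, d)
    gathered = attention(routers, z, z)              # (T, c, d): routers read all nodes
    distributed = attention(z, gathered, gathered)   # (T, D, d): nodes read routers
    out = z + distributed                            # residual connection
    return np.swapaxes(out, 0, 1)                    # (D, T, d)

# Toy usage with random inputs (illustrative sizes only).
rng = np.random.default_rng(0)
D, T, d, c = 4, 6, 8, 2
x = rng.standard_normal((D, T, d))
routers = rng.standard_normal((T, c, d))
out = two_stage_attention(x, routers)
print(out.shape)  # (4, 6, 8)
```

In this sketch every node can influence every other node through the routers, even though no D-by-D attention matrix is ever formed; the score matrices in stage 2 have shapes (c, D) and (D, c) per time step.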