Adaptive Cross-Scale Graph Fusion with Spatio-Temporal Attention for Traffic Prediction

Zhao, Zihao; Zhu, Xingzheng; Ye, Ziyun

doi:10.3390/electronics14173399

Open AccessArticle

Adaptive Cross-Scale Graph Fusion with Spatio-Temporal Attention for Traffic Prediction

by

Zihao Zhao

^1,2,†

,

Xingzheng Zhu

^2,*,†

and

Ziyun Ye

²

¹

College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China

²

Institute of Applied Artificial Intelligence of the Guangdong-Hong Kong-Macao Greater Bay Area, Undergraduate School of Artificial Intelligence, Shenzhen Polytechnic University, Shenzhen 518055, China

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Electronics 2025, 14(17), 3399; https://doi.org/10.3390/electronics14173399

Submission received: 28 July 2025 / Revised: 24 August 2025 / Accepted: 24 August 2025 / Published: 26 August 2025

(This article belongs to the Special Issue Graph-Based Learning Methods in Intelligent Transportation Systems)

Download

Browse Figures

Versions Notes

Abstract

Traffic flow prediction is a critical component of intelligent transportation systems, playing a vital role in alleviating congestion, improving road resource utilization, and supporting traffic management decisions. Although deep learning methods have made remarkable progress in this field in recent years, current studies still face challenges in modeling complex spatio-temporal dependencies, adapting to anomalous events, and generalizing to large-scale real-world scenarios. To address these issues, this paper proposes a novel traffic flow prediction model. The proposed approach simultaneously leverages temporal and frequency domain information and introduces adaptive graph convolutional layers to replace traditional graph convolutions, enabling dynamic capture of traffic network structural features. Furthermore, we design a frequency–temporal multi-head attention mechanism for effective multi-scale spatio-temporal feature extraction and develop a cross-multi-scale graph fusion strategy to enhance predictive performance. Extensive experiments on real-world datasets, PeMS and Beijing, demonstrate that our method significantly outperforms state-of-the-art (SOTA) baselines. For example, on the PeMS20 dataset, our model achieves a 53.6% lower MAE, a 12.3% lower NRMSE, and a 3.2% lower MAPE than the best existing method (STFGNN). Moreover, the proposed model achieves competitive computational efficiency and inference speed, making it well-suited for practical deployment.

Keywords:

traffic flow prediction; frequency–time domain attention; adaptive graph convolution; cross-scale fusion

1. Introduction

In the rapidly advancing fields of urban development and intelligent transportation management, the ability to analyze and predict traffic data is essential for sustainable infrastructure planning [1,2,3]. Accurate traffic forecasting enables planners to take proactive measures, optimizing the management of transit systems, networks, and operations [4,5]. Reliable predictions assist infrastructure operators in making informed decisions and refining rerouting strategies, ultimately improving overall system performance [6,7]. In recent years, spatio-temporal modeling using graph neural networks (GNNs) has gained significant attention in traffic data analysis [8,9,10]. These approaches model transportation networks as graphs, where nodes represent intersections and edges denote road segments. By combining GNNs with temporal sequence models, such frameworks can effectively capture complex spatial and temporal dependencies, leading to substantial performance improvements on real-world traffic datasets [2].

However, existing graph-based forecasting models continue to face several persistent challenges [11]. A primary issue is the construction of spatial dependencies [2]. Many approaches rely on fixed graph structures based on geographic distances or road connectivity [12,13,14], which fail to capture the dynamic and context-sensitive nature of traffic interactions. In real-world scenarios, spatial correlations among traffic nodes evolve over time due to fluctuations in traffic flow, changes in signal control, and other external factors. While adaptive graph learning methods have been proposed to address this limitation, they often involve a large number of trainable parameters and rely on implicit optimization objectives, making them difficult to train and highly sensitive to data scarcity.

A second limitation stems from the coupling between spatial graph learning and temporal modeling [11,15]. Most existing models incorporate graph structure learning into the training process of the prediction model, leading to a bi-level optimization procedure that increases model complexity and reduces training stability. Furthermore, this coupling often lacks explicit supervision for the learned graph structures, making it challenging to ensure interpretability or convergence. The issue becomes even more pronounced when modeling longer temporal horizons, as the interaction between spatial topology and temporal dynamics grows increasingly complex and intertwined.

Moreover, traffic data inherently exhibit complex multi-scale temporal patterns—such as hourly fluctuations, daily commuting cycles, and weekly trends—that are challenging to fully capture with single-resolution temporal models [16]. While some recent studies have incorporated periodic decomposition or attention mechanisms, they often treat periodicity as a fixed prior and fail to explicitly model its interactions with spatial structures.

To address the challenges in traffic forecasting, we propose a novel framework that decouples spatial topology learning from temporal modeling while integrating frequency-aware, multi-scale representation learning. The framework consists of four key stages: First, we introduce a frequency domain analysis module that utilizes Short-Time Fourier Transform (STFT) or Fast Fourier Transform (FFT) to uncover dominant periodic patterns in the traffic time series. By extracting the top-k frequencies, this module guides temporal segmentation and multi-scale processing, enabling more accurate temporal modeling. Second, to capture the dynamic nature of traffic interactions, we design the Adaptive Graph Convolution Block (GraphBlock). This module constructs spatial graphs based on latent node representations, allowing the model to learn time-varying spatial dependencies in a data-driven manner. Third, we introduce a hybrid attention mechanism that integrates temporal self-attention with frequency domain attention, enabling the simultaneous modeling of short-term fluctuations and long-term periodic trends across multiple temporal scales. Finally, the spatio-temporal representations extracted at different temporal scales are fused, allowing the model to effectively capture both fine-grained variations and long-term trends.

This modular architecture offers a principled solution to the limitations of existing approaches, particularly in modeling dynamic spatial structures and multi-scale temporal dependencies. Our contributions can be summarized as follows:

We propose a novel framework that explicitly decouples spatial topology learning from temporal modeling, which alleviates optimization difficulties and improves both stability and interpretability.
The framework incorporates frequency domain analysis to extract dominant periodic components and short-term fluctuations, which are then integrated into adaptive graph convolution layers to dynamically capture spatio-temporal dependencies.
A frequency–temporal multi-head attention mechanism is further designed for effective multi-scale spatio-temporal feature extraction combined with a cross-scale graph fusion strategy to enhance prediction accuracy.
From a physical perspective, the model aligns with fundamental traffic flow properties by simultaneously capturing high-frequency short-term fluctuations and low-frequency long-term periodicity, thereby providing interpretable insights into traffic dynamics.
Extensive experiments on real-world datasets (PeMS and Beijing) demonstrate that the proposed model outperforms various state-of-the-art (SOTA) models in terms of MAE, NRMSE, and MAPE while maintaining high computational efficiency, making it suitable for practical deployment.

The paper is organized as follows: Section 2 reviews related work, Section 3 presents our proposed method, Section 4 details the experiments and results, and Section 5 concludes the paper.

2. Related Work

In this section, we briefly review the related work on spatial dependency modeling, coupled and decoupled spatio-temporal modeling, and multi-scale and frequency-aware temporal modeling in graph learning-based traffic predictors.

2.1. Spatial Dependency Modeling in Traffic Forecasting

Graph-based methods have become a cornerstone in traffic forecasting due to their ability to model complex topologies of road networks. Early methods typically relied on static graph constructions based on physical connectivity or Euclidean distances between sensors or intersections, such as DCRNN [12], STGCN [13], and ASTGCN [14]. While these approaches demonstrated effectiveness in capturing spatial locality, they were limited in their capacity to model the dynamic and time-varying spatial relationships inherent in real-world traffic systems.

To address this, several adaptive and dynamic graph learning methods have been proposed. For example, Graph WaveNet [11] and AGCRN [17] introduced learnable adjacency matrices that are optimized jointly with the forecasting objective. More recent works such as DGCRN [18] and STFGNN [19] further enhance the capability by explicitly learning temporally evolving graph structures, enabling the model to better reflect dynamic traffic interactions. These methods show improved flexibility and performance, but often come with increased model complexity and training instability due to the implicit optimization of graph parameters.

2.2. Coupled and Decoupled Spatio-Temporal Modeling

Several studies tightly couple spatial and temporal modeling within unified neural architectures. For instance, STG2Seq [15] and Graph WaveNet [11] adopt a sequential combination of graph convolutions and temporal modeling blocks. While effective in capturing joint dependencies, this coupling typically leads to a bi-level optimization problem, where the graph structure and temporal dynamics are learned simultaneously, potentially hindering convergence and generalization.

In response to these challenges, recent studies have proposed decoupled architectures that separate spatial and temporal learning components. STNorm [20] introduces decoupled normalization for spatial and temporal dimensions, while STGFormer [21] and STPT [22] leverage modular Transformer-based designs to independently model spatial topologies and temporal trends. This separation not only reduces model complexity but also enhances interpretability and training stability, making it easier to impose supervision or inject domain knowledge into either component.

2.3. Multi-Scale and Frequency-Aware Temporal Modeling

Traffic time series inherently exhibit multi-scale periodic patterns, including intra-day variations, weekday–weekend cycles, and seasonal trends. To capture such complex temporal structures, several works have adopted multi-resolution or multi-branch architectures. MSTGCN [23], STMeta [24], and HiFormer [25] incorporate short-term and long-term sequences into separate modeling paths, enabling the networks to learn both fine-grained and coarse-grained temporal dependencies.

In addition, frequency domain methods have gained popularity for uncovering dominant periodic components. FEDformer [26] leverages Fourier-based decomposition for long-term forecasting, while [27] and FEDAF [28] integrate spectral representations into attention mechanisms, enabling frequency-aware sequence learning. These methods provide complementary perspectives for time domain modeling and have shown strong performance in long-range and periodic forecasting tasks.

3. Approach

Our proposed method is composed of four main modules: enhanced frequency domain modeling, an adaptive graph convolution module, frequency–time domain dual attention, and a cross-scale fusion mechanism. The overall architecture is illustrated in Figure 1.

3.1. Enhanced Frequency Domain Modeling

To capture prominent periodic patterns in time series, we introduce a frequency domain analysis module that automatically detects the dominant periodic components. This module provides key references for the subsequent multi-scale graph convolutional networks. It supports two implementation strategies: Short-Time Fourier Transform (STFT) and Fast Fourier Transform (FFT). Both approaches analyze the amplitude spectrum and extract the top-k significant frequencies, from which the corresponding period lengths are calculated.

The input is a 3D tensor

x \in R^{B \times T \times C}

, where B, T, and C represent batch size, time steps, and number of variables, respectively. The data are reshaped to

x_{r} \in R^{(B \cdot T) \times C},

(1)

to allow simultaneous analysis across channels. Applying STFT or FFT yields the frequency representation:

S = S T F T (x_{r}, n_{F F T} = T / 2),

(2)

S = F F T (x_{r}),

(3)

The amplitude spectrum is then computed as follows:

A (f) = \frac{1}{B \cdot C} \sum_{b = 1}^{B} \sum_{c = 1}^{C} |S| .

(4)

with the DC component

A (0)

zeroed to avoid bias. Using a top-k strategy, the k frequencies with the largest amplitudes are selected, and their periods calculated:

P = {P_{1}, P_{2}, \dots, P_{k}},

(5)

P_{i} = \frac{T}{f_{i}}, i = 1, \dots, k .

(6)

The amplitudes of these selected frequencies are expanded to match batch dimensions for subsequent fusion. This module provides essential multi-scale period information for downstream processing.

3.2. Adaptive Graph Convolution Module

The core spatial feature extractor in our model is the GraphBlock, designed using adaptive graph convolutional networks (AGCN). Unlike traditional graph convolutions with fixed adjacency matrices, this module dynamically learns the graph topology based on node features.

For the spatio-temporal input

X \in R^{B \times T \times N}

, the learned adjacency matrix

A \in R^{N \times N}

is computed as follows:

A = softmax (ReLU (X W_{1}) \cdot {(X W_{2})}^{T}),

(7)

where

W_{1}, W_{2} \in R^{N \times d}

are learnable parameters. This dynamic adjacency matrix captures the evolving relationships among nodes over time. A multi-graph convolution is then applied to propagate information across different neighborhoods:

H^{(k)} = σ (\sum_{i = 0}^{k} {\tilde{A}}^{i} X W_{i}),

(8)

where

\tilde{A}

denotes the i-th power of A,

W_{i}

is the weight matrix for order i, and

σ

is a nonlinear activation function. This design enables GraphBlock to adaptively capture both local and global spatial dependencies.

3.3. Frequency–Time Domain Dual Attention

To model both temporal dependencies and periodic characteristics in time series, we introduce a dual attention mechanism that integrates temporal multi-head attention and frequency domain multi-head attention.

Temporal attention captures dependencies among time steps using the standard scaled dot-product attention:

A t t e n t i o n_{t} (Q, K, V) = softmax (\frac{Q K^{t}}{\sqrt{d_{k}}}) V,

(9)

where

d_{k}

is the key dimension. For frequency modeling, FFT is applied to the input along the time axis, yielding frequency domain features

X_{f}

. Multi-head attention is then applied to

X_{f}

:

H_{f} = MultiHeadAttention (X_{f}) .

(10)

Since the temporal and frequency representations may differ in sequence length, linear interpolation aligns them. Finally, both features are concatenated and passed through a linear layer for fusion, producing a rich and expressive feature representation.

3.4. Cross-Scale Fusion Mechanism

The cross-scale fusion module integrates spatio-temporal features extracted at multiple time scales. This is crucial for capturing both short-term high-frequency fluctuations and the long-term trends inherent in complex time series data.

The outputs from different scales are stacked along the scale dimension:

H_{s} \in R^{B \times T \times N \times k} .

(11)

A fully connected layer learns fusion weights

w_{i}

for each scale via softmax normalization:

H_{f u s e d} (b, t, n) = \sum_{i = 1}^{k} w_{i} \cdot H_{i} (b, t, n),

(12)

where

w_{i}

reflects the relative importance of scale i. Residual connections and layer normalization are applied to stabilize training and prevent gradient vanishing:

H_{o u t p u t} = LayerNorm (H_{f u s e d} + X) .

(13)

This design allows the model to dynamically emphasize features from relevant scales and achieve more comprehensive spatio-temporal representations.

Compared to single-scale models, the cross-scale fusion mechanism excels in adaptively combining short- and long-term dependencies while preserving information flow and training stability.

3.5. Theoretical Breakthrough

The proposed framework introduces three key innovations that collectively enhance predictive performance: First, it explicitly decouples spatial topology learning from temporal modeling, thereby reducing training complexity and improving stability and interpretability. Second, it integrates a frequency–time dual attention mechanism to jointly capture short-term fluctuations and long-term periodic patterns, yielding richer temporal representations. Third, it employs a cross-scale graph fusion strategy, guided by FFT/STFT-derived periodic information, to adaptively integrate multi-resolution dependencies. Beyond the contributions of each module individually, the framework also benefits from their complementary interactions. In particular, frequency domain analysis provides structured temporal features by extracting dominant periodic components and short-term fluctuations, which are then incorporated into the adaptive graph convolution layers. This interaction ensures that spatial dependencies are modeled on frequency-enhanced temporal signals, rather than raw sequences alone, enabling the propagation of richer and more robust spatio-temporal representations. Together, these advances address the critical limitations in existing approaches and establish a principled foundation for accurate and reliable traffic forecasting.

4. Experiments

4.1. Datasets

PeMS [29]: The Performance Measurement System provides a unified database of traffic data collected by Caltrans on California’s motorways, as well as datasets from other Caltrans and partner agencies. These data allow users to provide a unified, comprehensive assessment of motorway performance, make operational decisions based on knowledge of the current state of the motorway network, analyze congestion bottlenecks to identify potential remedies, and make better overall decisions. This dataset encompasses multiple years, including 2004, 2007, 2008, etc. The dataset includes input from hundreds of detectors which collect information at a frequency of five minutes, recording data on speed, traffic flow, occupancy, and more. The dataset used for our experiments is PeMS20, which contains more than 60,000 data, each with a feature length of 325.

Beijing [30]: The Beijing datasets are collected from the traffic speeds of major roads in Beijing. This dataset has the same sampling frequency of 5 min, but the feature length is 3126 and there are more than 20,000 data.

In contrast to the PeMS dataset, the Beijing dataset presents a more challenging scenario; it covers a dense urban road network with a very high feature dimensionality, complex congestion dynamics, and strong non-stationary fluctuations. By jointly evaluating our proposed model on these two datasets, we demonstrate that the proposed model is not only effective under standard benchmark settings but also robust in handling high-dimensional, complex, and noisy real-world traffic scenarios.

4.2. Implementation Details

All experiments are conducted on a Dell Precision 5820 Tower workstation with a Windows 10 operating system. The workstation is equipped with an Intel(R) Xeon(R) W-2255 CPU @ 3.70 GHz processor and an NVIDIA GeForce RTX 3080 (10 GB) graphics card.

In the experiments, all the datasets are divided into training, testing, and validation sets at a ratio of 7:2:1. During training, the batch_size is set to 16, the initial epoch is set to 20 times, and the training is terminated early if there is no enhancement of the validation results for three consecutive epochs. In addition, the model is trained with an initial learning rate of 0.0005 and a dropout of 0.05.

In order to assess the time series prediction ability of the model, we select three evaluation metrics: the Mean Absolute Error (MAE), the Normalized Root Mean Squared Error (NRMSE), and the Mean Absolute Percentage Error (MAPE). MAE results are more robust and intuitive, and are not affected by extreme values; NRMSE results focus on large errors, highlighting the potential for model optimization; and MAPE results can be expressed as a percentage, making it easy to compare and interpret in different scenarios. The MAE, NRMSE, and MAPE are specifically calculated as follows:

M A E = \frac{1}{T} \sum_{t = 1}^{T} |y_{t} - {\hat{y}}_{t}|,

(14)

N R M S E = \frac{\sqrt{\frac{1}{T} \sum_{t = 1}^{T} {(y_{t} - {\hat{y}}_{t})}^{2}}}{y_{max} - y_{min}},

(15)

M A P E = \frac{100 %}{T} \sum_{t = 1}^{T} |\frac{y_{t} - {\hat{y}}_{t}}{y_{t}}|,

(16)

where

y_{t}

is the true value,

{\hat{y}}_{t}

is the predicted value, and T is the total number of samples.

4.3. Baselines

In order to obtain a more precise depiction of the performance of the model, ten excellent time series prediction models are selected as benchmarks for the subsequent comparison experiments.

Autoformer [31] is a novel deep learning architecture designed for long-term time series forecasting which introduces progressive decomposition blocks and autocorrelation mechanisms based on the periodicity of the series to efficiently model complex temporal patterns. Compared to traditional autocorrelation methods, Autoformer has significant advantages in information utilization and computational efficiency, achieving SOTA results (38% relative improvement) for long-term forecasting in multiple domains.

Informer [32] is an efficient Transformer model for long time series prediction which significantly reduces computational complexity and improves long-term dependency modeling by introducing the ProbSparse self-attention mechanism and autoregressive decoding. It achieves high-precision prediction in a variety of real-world scenarios while supporting longer prediction ranges and larger data sizes.

Decomposition-Linear (DLinear) [33] is a lightweight time series forecasting model that achieves very low computational complexity and an excellent long-term series forecasting capability by modeling the long-term trend and short-term residuals with a linear layer, respectively, after decomposition of the series (separating the trend and residual terms). The approach avoids complex attention structures, making the model more efficient and easier to deploy.

TimesNet [34] is a general-purpose time series modeling framework which is designed to extract local and global features at different time scales by designing multi-scale blocks and residual connections to effectively model complex time series dependencies. The model performs well in multiple forecasting tasks, balancing long-term dependency capture and computational efficiency, and is suitable for a variety of time series application scenarios.

Crossformer [35] is a model designed for long-term time series forecasting which introduces the Cross-Dimension Block and a dynamic-length sampling strategy to capture both local time dependence and the global trend of time series. By modeling the interaction between time and variable dimensions, Crossformer shows excellent forecasting performance and scalability when dealing with high-dimensional, multivariate time series.

Spatio-Temporal Graph Convolutional Networks (STGCN) [13] is a model that combines a graph convolutional network (GCN) and a gated temporal convolutional network (Gated TCN). It is specially designed to capture spatial and temporal dependencies, and is especially suitable for the task of predicting time series of graph structures, such as traffic flows. The model incorporates a spatial dependency between nodes through spatial graph convolution and temporal dynamics through time-sequence convolution, thus achieving efficient spatio-temporal feature extraction and prediction.

AGCRN [17] is a spatio-temporal prediction method that combines adaptive graph convolution and recurrent neural networks to dynamically generate the adjacency matrix by learning node embeddings without the need for a predefined fixed graph structure. It is able to model dynamic spatial relationships and temporal dependencies simultaneously, and is suitable for the prediction of complex spatio-temporal data such as traffic flows.

Attention-Based Spatial-Temporal Graph Convolutional Networks (ASTGCN) [14] is a model that combines graph convolutional networks and attention-based temporal convolution for simultaneous modeling of spatial and temporal dependencies. The model dynamically captures the weights of different nodes and time steps through spatial attention, temporal attention, and convolutional layers, which improves the accuracy and interpretability of spatio-temporal data prediction.

Spatial-Temporal Fusion Graph Neural Networks (STFGNN) [19] can efficiently capture hidden spatial dependencies using temporal data-driven correlation graphs and their further fusion with given spatial distance graphs to learn both local spatio-temporal heterogeneity and global spatio-temporal homogeneity.

Reversible Instance Normalization (RevIN) [36] improves the generalization ability of the model by normalizing the input time series by instances, eliminating non-stationarity and distributional differences in the data. The normalization process is reversible, ensuring that the original data scale can be restored after model prediction to ensure the accuracy and stability of the results.

4.4. Comparative Experiments Based on the PeMS20 Dataset

The performance of all models is evaluated on the PeMS20 dataset by training them with identical parameter settings and assessing their MAE, MAPE, and NRMSE on the test set. The results are presented in Table 1.

As shown in Table 1, our proposed model significantly outperforms existing SOTA time series prediction methods on the PeMS20 dataset across all evaluation metrics. Specifically, it achieves the lowest MAE of 0.2930, an NRMSE of 1.4486, and an MAPE of 2.8955%, outperforming the second-best model (STFGNN) by margins of 53.6% in MAE, 12.3% in NRMSE, and 3.2% in MAPE. Traditional Transformer-based models such as Autoformer and Informer exhibit higher errors, highlighting their limited ability to capture complex spatio-temporal dependencies in traffic data. Although STFGNN and TimesNet achieve competitive performance, their results are still inferior to our model, which benefits from the integration of dynamic spatio-temporal fusion and enhanced temporal receptive fields. These results demonstrate the effectiveness of our framework in learning both local spatio-temporal heterogeneity and global temporal homogeneity, leading to superior predictive accuracy.

4.5. Comparative Experiments Based on the Beijing Dataset

To further explore the model prediction performance and robustness, we conduct experiments on the more complex Beijing dataset. In the Beijing dataset, the feature length of each datum is 3126, which results in a high memory requirement for model training. To reduce the computational resources required for training the model, we randomly select 500, 1000, or 1563 (1/2) features for each batch of training. The corresponding MAE, NRMSE, and MAPE results are recorded in Table 2.

As shown in Table 2, our model consistently achieves the best performance across all feature settings (500, 1000, and 1563 features), with the lowest MAE, NRMSE, and MAPE values. Notably, when using 1563 features, our model attains an MAE of 1.1310, an NRMSE of 1.2238, and an MAPE of 10.1381%, significantly outperforming all baselines. Compared to the second-best model (STFGNN), our model reduces the MAE by 57.9%, the NRMSE by 50.3%, and the MAPE by 31.5%. Even under reduced feature settings (500 and 1000), our model maintains superior performance, demonstrating strong robustness to feature dimensionality reduction. These results highlight the efficiency of our framework in handling high-dimensional and complex spatio-temporal data while achieving remarkable predictive accuracy with limited computational resources.

4.6. Ablation Experiments

To further understand the impact of each module on the performance of traffic flow prediction, we perform ablation experiments. In the first group, the enhanced frequency domain modeling is removed, and the data are processed directly in the time domain. In the second group, the adaptive graph convolution is removed and replaced with a standard GCN. In the third group, the frequency–time domain dual attention is removed, and only a single attention mechanism is used. In the fourth group, the cross-scale fusion is removed, and no multi-scale feature fusion wais applied, with only single-scale information being used. Tests based on the above changes are conducted on the PeMS20 dataset, and the results are summarized in Table 3.

As shown in Table 3, different modules contribute unequally to the overall performance. Notably, removing the enhanced frequency domain modeling (EFDM) or the frequency–time dual attention (FTDDA) results in the most significant performance drops. This can be explained by the intrinsic characteristics of traffic data; traffic flows typically exhibit strong periodic patterns, such as daily commuting cycles and weekly repetitions, while also containing short-term fluctuations caused by sudden congestion or signal changes. Frequency domain modeling explicitly extracts these periodic components, providing a more structured representation that time domain modeling alone cannot capture. The dual attention mechanism further aligns temporal fluctuations with periodic signals, which is essential for robust prediction. In contrast, modules such as the adaptive graph convolution (AGCM) and cross-scale fusion (CSFM) modules mainly enhance the modeling of spatial dependencies and multi-scale integration. While important, their impact is relatively smaller because spatial correlations in traffic networks are often more stable than temporal periodicity, and multi-scale fusion acts as a complementary refinement rather than a dominant factor. This analysis demonstrates that capturing frequency-aware temporal patterns is the most critical aspect for achieving accurate traffic forecasting.

4.7. Sensitivity Analysis

In addition, we perform parameter sensitivity analyses to filter out the optimal combination of model hyperparameters, covering input sequence length, model hidden dimension, number of attention heads, depth of graph convolutional network, number of cross-scale convolutional channels, and learning rate. Specifically, the input sequence length affects the extraction of frequency domain features and the modeling of global temporal dependencies; the model hiding dimension determines the expressive capacity of the attention mechanism and the overall model capacity; the number of attention heads has a significant impact on the accuracy of spatio-temporal dependency modeling; the depth of graph convolution relates to the ability to capture spatial dependencies; the number of cross-scale convolutional channels controls the representational capacity of the local spatio-temporal features; and the learning rate influences the convergence speed, stability, and generalization performance of the model. For these hyperparameters, we design different value groups for the experiments, and their specific value ranges and corresponding index trends are shown in Figure 2.

As shown in Figure 2, each hyperparameter exhibits a distinct influence on model performance, closely reflecting its functional role within the architecture. The input sequence length serves as a temporal sampling window, where a value of 96 effectively captures frequency domain features and global temporal dependencies while avoiding excessive noise and computational overhead from longer sequences. The hidden dimension size determines the model’s representational capacity; an intermediate setting (512) enhances the attention mechanism’s effectiveness without incurring unnecessary computational cost or diminishing returns. The number of attention heads influences the granularity of spatio-temporal dependency modeling, with four heads achieving optimal performance and too few or too many leading to slight degradation. The GCN depth balances the ability to learn spatial dependencies and the risk of overfitting, with two layers yielding the best results. The number of cross-scale convolution channels affects the capacity to capture local spatio-temporal features, where a moderate increase (16 channels) improves performance, but further growth increases resource demands significantly. Finally, the learning rate plays a critical role in training stability and convergence speed; a value of 5 × 10⁻⁴ offers a balanced trade-off between fast convergence and avoiding instability or overfitting. Considering prediction accuracy, computational resources, and training time, we select the following optimal hyperparameter configuration: an input sequence length of 96, a hidden dimension size of 512, 4 attention heads, a GCN depth of 2, 16 cross-scale convolution channels, and a learning rate of 5 × 10⁻⁴.

4.8. Model Complexity Analysis

In practical applications, the comprehensive evaluation of a model should not only focus on its predictive accuracy but also consider factors such as the number of parameters, computational resource requirements, and inference time, which are equally important. Therefore, we compare the proposed model with several baseline models in terms of these aspects to assess its overall complexity. The comparison results are summarized in Table 4.

We evaluate the above models on the PeMS20 dataset, with all experiments conducted on the same RTX 3080 GPU using an identical test set. As shown in Table 4, although our model has a relatively larger number of parameters (33.8 M) compared to several baselines, the increases in GPU memory usage, FLOPs, inference latency, and training time remain within 10%, which we believe is acceptable for most practical scenarios. This demonstrates that our model achieves significant improvements in prediction accuracy with only a modest increase in computational cost, and its computational resource usage and inference latency remain well within acceptable limits for real-world applications. Nevertheless, we acknowledge that resource-constrained deployments (e.g., edge devices or real-time systems with limited memory) may require lighter solutions. To this end, potential extensions include reducing the hidden dimension size, pruning less critical frequency components, or designing a distilled variant of the proposed model. Such lightweight adaptations could significantly reduce computational overhead while retaining most of the predictive performance, which we plan to explore in future work.

To further clarify the trade-off between accuracy and efficiency, we additionally compare our framework with simpler baselines, including traditional statistical forecasting models and lightweight graph-based approaches. As expected, these simpler models require significantly fewer parameters and achieve faster training and inference. However, their predictive accuracy is substantially lower, particularly in capturing nonlinear spatio-temporal dependencies and long-term periodic patterns. This comparison highlights that, while our model incurs moderately higher computational costs, it provides a clear accuracy–efficiency balance by achieving substantial performance gains that simpler models cannot deliver in complex traffic forecasting scenarios.

4.9. Real-World Application

To further demonstrate the practical utility of our approach, we consider a potential deployment scenario in urban traffic management. Traditional systems often rely on pre-timed or rule-based signal plans, which cannot adapt quickly to sudden congestion or fluctuations in traffic demand. By integrating our prediction model into the traffic control center, accurate short-term forecasts can be used to dynamically adjust signal timings at major intersections. For example, if the model predicts a rapid build-up of vehicles on a main arterial road during peak hours, the system can proactively extend green-light durations to alleviate bottlenecks and reduce queue lengths. Likewise, long-term periodic predictions can support scheduling and resource allocation, such as lane management or bus priority strategies. This case study illustrates that, beyond benchmark performance, our model offers clear potential to improve the efficiency and reliability of real-world intelligent transportation systems.

In practice, however, urban traffic networks are highly dynamic, and an important consideration is how well the model generalizes to unseen or irregular conditions. Our framework is designed with this challenge in mind. The frequency–time dual attention mechanism captures both stable long-term periodic patterns and sudden short-term fluctuations; the adaptive graph convolution dynamically adjusts spatial dependencies when traffic patterns shift; and the cross-scale fusion strategy enhances flexibility in handling multi-resolution dynamics, making the model better suited for special events or seasonal variations. Together, these mechanisms improve robustness in diverse traffic scenarios.

Nevertheless, certain limitations remain. Extreme or highly irregular disruptions (e.g., accidents, severe weather, or sudden policy interventions) may still challenge the model, as such events deviate sharply from historical patterns. In addition, the reliance on dense and high-quality sensor coverage may limit applicability in regions with sparse or noisy data. To address these issues, future work will explore incorporating external signals (e.g., weather reports, incident records, and social events) and developing robust domain adaptation strategies. These directions will further strengthen the model’s adaptability and facilitate its large-scale deployment in real-world intelligent transportation systems.

5. Conclusions

In this paper, we propose a novel traffic flow prediction model that integrates temporal and frequency domain information to better capture complex spatio-temporal dependencies. The model incorporates adaptive graph convolutional layers, a frequency–temporal multi-head attention mechanism, and a cross-multi-scale graph fusion strategy, which, together, enable more accurate and robust traffic predictions. Extensive experiments conducted on real-world datasets, PeMS and Beijing, demonstrate that the proposed method outperforms SOTA baselines in terms of MAE, NRMSE, and MAPE while maintaining a computational efficiency suitable for practical deployment.

For future work, we plan to extend our approach to address more complex traffic scenarios, particularly under severe congestion conditions, where traffic dynamics become highly nonlinear and unpredictable. This will involve exploring advanced anomaly detection and dynamic graph adaptation strategies to further enhance the model robustness and applicability in real-world intelligent transportation systems.

Author Contributions

Conceptualization, X.Z.; methodology, X.Z.; software, Z.Z.; validation, Z.Z.; formal analysis, Z.Z.; investigation, X.Z.; resources, Z.Y.; data curation, X.Z.; writing—original draft preparation, Z.Z. and X.Z.; writing—review and editing, Z.Z. and X.Z.; visualization, Z.Z.; supervision, X.Z.; project administration, X.Z.; funding acquisition, Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Shenzhen Polytechnic University Research Fund (grant no. 6025310044K) and in part by the Scientific Research Startup Fund for Shenzhen High-Caliber Personnel of SZPU (grant no. 6022310052k).

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Alvi, M.; Minerva, R.; Rajapaksha, P.; Crespi, N.; Alvi, U. Traffic Flow Prediction in Sensor-Limited Areas Through Synthetic Sensing and Data Fusion. IEEE Sens. Lett. 2024, 8, 6003904. [Google Scholar] [CrossRef]
Yu, J.J.; Fang, X.; Zhang, S.; Ma, Y. CLEAR: Spatial-Temporal Traffic Data Representation Learning for Traffic Prediction. IEEE Trans. Knowl. Data Eng. 2025, 37, 1672–1687. [Google Scholar] [CrossRef]
Xu, C.; Shao, Y.; Ma, C.; Han, M.; Tong, H.; Peng, C. A Geometric Deep Learning Approach to Traffic Flow Shockwave Prediction on Freeways Using Vehicle Trajectory Data and HD Map. IEEE Trans. Intell. Transp. Syst. 2025, 26, 9907–9917. [Google Scholar] [CrossRef]
Guo, S.; Lin, Y.; Wan, H.; Li, X.; Cong, G. Learning dynamics and heterogeneity of spatial-temporal graph data for traffic forecasting. IEEE Trans. Knowl. Data Eng. 2021, 34, 5415–5428. [Google Scholar] [CrossRef]
Liu, J.; Di, J.; Hu, X.; Liu, Y. Understanding and Assessing the Impact of Hazardous Weather on Railway Operations. In Proceedings of the 27th IEEE International Conference on Intelligent Transportation Systems, ITSC 2024, Edmonton, AB, Canada, 24–27 September 2024; IEEE: New York, NY, USA, 2024; pp. 236–241. [Google Scholar] [CrossRef]
Liang, J.; Yang, K.; Tan, C.; Wang, J.; Yin, G. Enhancing High-Speed Cruising Performance of Autonomous Vehicles Through Integrated Deep Reinforcement Learning Framework. IEEE Trans. Intell. Transp. Syst. 2025, 26, 835–848. [Google Scholar] [CrossRef]
Liang, J.; Tian, Q.; Feng, J.; Pi, D.; Yin, G. A Polytopic Model-Based Robust Predictive Control Scheme for Path Tracking of Autonomous Vehicles. IEEE Trans. Intell. Veh. 2024, 9, 3928–3939. [Google Scholar] [CrossRef]
Wang, Z.; Zhang, J.; Chen, J.; Zhang, H. Spatio-Temporal Context Graph Transformer Design for Map-Free Multi-Agent Trajectory Prediction. IEEE Trans. Intell. Veh. 2024, 9, 1369–1381. [Google Scholar] [CrossRef]
Sun, T.; Fan, L.H.; Yu, X.M.; Ma, S. FST-GNN: Feedback Space-Time Graph Neural Network Model for Networked Multi-Agent Formation Prediction With Noise Interference. IEEE Trans. Netw. Sci. Eng. 2024, 11, 5983–5994. [Google Scholar] [CrossRef]
Zhang, Y.; Wang, Y.; Gao, S.; Raubal, M. Context-Aware Knowledge Graph Framework for Traffic Speed Forecasting Using Graph Neural Network. IEEE Trans. Intell. Transp. Syst. 2025, 26, 3885–3902. [Google Scholar] [CrossRef]
Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Zhang, C. Graph WaveNet for Deep Spatial-Temporal Graph Modeling. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), Macao, China, 10–16 August 2019; International Joint Conferences on Artificial Intelligence Organization: Montreal, QC, Canada, 2019. [Google Scholar]
Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
Yu, B.; Yin, H.; Zhu, Z. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden, 13–19 July 2018; Lang, J., Ed.; ijcai.org: Montreal, QC, Canada, 2018; pp. 3634–3640. [Google Scholar] [CrossRef]
Guo, S.; Lin, Y.; Feng, N.; Song, C.; Wan, H. Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, HI, USA, 27 January–1 February 2019; AAAI Press: Washington, DC, USA, 2019; pp. 922–929. [Google Scholar] [CrossRef]
Bai, L.; Yao, L.; Kanhere, S.S.; Wang, X.; Sheng, Q.Z. STG2seq: Spatial-temporal graph to sequence model for multi-step passenger demand forecasting. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 1981–1987. [Google Scholar]
Cai, W.; Liang, Y.; Liu, X.; Feng, J.; Wu, Y. Msgnet: Learning multi-scale inter-series correlations for multivariate time series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–28 February 2024; Volume 38, pp. 11141–11149. [Google Scholar]
Bai, L.; Yao, L.; Li, C.; Wang, X.; Wang, C. Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting. In Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, Virtual, 6–12 December 2020; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; 2020. [Google Scholar]
Li, F.; Feng, J.; Yan, H.; Jin, G.; Yang, F.; Sun, F.; Jin, D.; Li, Y. Dynamic graph convolutional recurrent network for traffic prediction: Benchmark and solution. ACM Trans. Knowl. Discov. Data 2023, 17, 1–21. [Google Scholar] [CrossRef]
Li, M.; Zhu, Z. Spatial-Temporal Fusion Graph Neural Networks for Traffic Flow Forecasting. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, 2–9 February 2021; AAAI Press: Washington, DC, USA, 2021; pp. 4189–4196. [Google Scholar] [CrossRef]
Cao, D.; Li, J.; Ma, H.; Tomizuka, M. Spectral temporal graph neural network for trajectory prediction. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: New York, NY, USA, 2021; pp. 1839–1845. [Google Scholar]
Wang, H.; Chen, J.; Pan, T.; Dong, Z.; Zhang, L.; Jiang, R.; Song, X. Stgformer: Efficient spatiotemporal graph transformer for traffic forecasting. arXiv 2024, arXiv:2410.00385. [Google Scholar] [CrossRef]
Kumar, R.; Mendes-Moreira, J.; Chandra, J. Spatio-temporal parallel transformer based model for traffic prediction. ACM Trans. Knowl. Discov. Data 2024, 18, 1–25. [Google Scholar] [CrossRef]
Guo, S.; Lin, Y.; Li, S.; Chen, Z.; Wan, H. Deep Spatial–Temporal 3D Convolutional Neural Networks for Traffic Data Forecasting; IEEE: New York, NY, USA, 2019; Volume 20, pp. 3913–3926. [Google Scholar]
Pan, Z.; Zhang, W.; Liang, Y.; Zhang, W.; Yu, Y.; Zhang, J.; Zheng, Y. Spatio-Temporal Meta Learning for Urban Traffic Prediction; IEEE: New York, NY, USA, 2020; Volume 34, pp. 1462–1476. [Google Scholar]
Wu, X.; Lu, H.; Li, K.; Wu, Z.; Liu, X.; Meng, H. Hiformer: Sequence modeling networks with hierarchical attention mechanisms. IEEE/ACM Trans. Audio Speech Lang. Process. 2023, 31, 3993–4003. [Google Scholar] [CrossRef]
Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning. PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 27268–27286. [Google Scholar]
Yu, C.; Li, Y.; Zhai, G. Time-frequency attention mechanism-based model for enhancing wind speed prediction accuracy. Expert Syst. Appl. 2025, 265, 126038. [Google Scholar] [CrossRef]
Yang, X.; Li, H.; Huang, X.; Feng, X. FEDAF: Frequency enhanced decomposed attention free transformer for long time series forecasting. Neural Comput. Appl. 2024, 36, 16271–16288. [Google Scholar] [CrossRef]
Liu, M.; Zeng, A.; Lai, Q.; Xu, Q. Time Series is a Special Sequence: Forecasting with Sample Convolution and Interaction. arXiv 2021, arXiv:2106.09305. [Google Scholar] [CrossRef]
Cai, Z.; Jiang, R.; Yang, X.; Wang, Z.; Guo, D.; Kobayashi, H.H.; Song, X.; Shibasaki, R. MemDA: Forecasting Urban Time Series with Memory-based Drift Adaptation. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM 2023, Birmingham, UK, 21–25 October 2023; Frommholz, I., Hopfgartner, F., Lee, M., Oakes, M., Lalmas, M., Zhang, M., Santos, R.L.T., Eds.; ACM: New York, NY, USA, 2023; pp. 193–202. [Google Scholar] [CrossRef]
Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, Virtual, 6–14 December 2021; Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W., Eds.; 2021; pp. 22419–22430. [Google Scholar]
Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, 2–9 February 2021; AAAI Press: Washington, DC, USA, 2021; pp. 11106–11115. [Google Scholar] [CrossRef]
Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are Transformers Effective for Time Series Forecasting? In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2023, Washington, DC, USA, 7–14 February 2023; Williams, B., Chen, Y., Neville, J., Eds.; AAAI Press: Washington, DC, USA, 2023; pp. 11121–11128. [Google Scholar] [CrossRef]
Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In Proceedings of the The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, 1–5 May 2023; OpenReview.net: Alameda, CA, USA, 2023. [Google Scholar]
Zhang, Y.; Yan, J. Crossformer: Transformer Utilizing Cross-Dimension Dependency for Multivariate Time Series Forecasting. In Proceedings of the The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, 1–5 May 2023; OpenReview.net: Alameda, CA, USA, 2023. [Google Scholar]
Kim, T.; Kim, J.; Tae, Y.; Park, C.; Choi, J.; Choo, J. Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift. In Proceedings of the The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, 25–29 April 2022; OpenReview.net: Alameda, CA, USA, 2022. [Google Scholar]

Figure 1. The architecture of the proposed method.

Figure 2. Parameter sensitivity analysis experimental results.

Table 1. Performance comparisons of SOTA time series prediction models on the PeMS20 dataset.

Model	MAE	NRMSE	MAPE
Autoformer	1.2180	2.2742	3.1596%
Informer	1.1594	2.1419	3.0862%
DLinear	1.3014	2.1538	3.2271%
TimesNet	0.9441	1.9715	3.0992%
Crossformer	0.8317	1.7295	2.9840%
STGCN	1.5036	2.4429	3.4344%
AGCRN	1.7193	2.5876	4.4028%
ASTGCN	1.6075	2.8812	3.5104%
STFGNN	0.6310	1.6527	2.9903%
RevIN	0.8292	1.7533	2.9065%
Ours	0.2930	1.4486	2.8955%

Table 2. Performance comparisons of SOTA time series prediction models on the Beijing dataset.

Model	MAE-500	NRMSE-500	MAPE-500	MAE-1000	NRMSE-1000	MAPE-1000	MAE-1563	NRMSE-1563	MAPE-1563
Autoformer	6.9413	8.1719	19.3246%	6.5028	8.0015	19.5410%	6.2216	8.5633	18.5682%
Informer	6.7324	7.3260	18.9619%	6.2525	7.5814	18.5176%	6.0837	7.4800	18.3381%
DLinear	6.3215	8.1220	17.8437%	6.1520	7.3602	17.5960%	6.0036	7.0098	17.7719%
TimesNet	4.8022	6.5580	16.5812%	4.6001	6.3719	16.1114%	4.3744	6.0117	15.4769%
Crossformer	4.4937	6.0125	15.0909%	3.9915	6.7942	16.4381%	4.1600	5.8401	15.6910%
STGCN	7.1826	13.0408	20.0326%	5.9518	12.1920	18.6910%	6.0614	11.2847	16.8284%
AGCRN	6.1405	12.0318	17.0050%	5.7942	11.1163	18.1937%	5.2238	9.7150	18.2776%
ASTGCN	6.1373	15.4364	20.5515%	6.3605	13.1900	15.3602%	5.9250	14.1896	18.5503%
STFGNN	2.8604	2.5832	14.0110%	2.7195	2.3900	15.6824%	2.6933	2.4612	14.7881%
RevIN	3.9825	5.0300	13.0355%	4.2780	4.8144	12.3150%	4.1317	5.1516	12.7204%
Ours	1.2226	1.3266	10.5487%	1.2032	1.3106	11.1357%	1.1310	1.2238	10.1381%

Table 3. Ablation experiment results on the PeMS20 dataset.

Model	MAE	NRMSE	MAPE
Full Model	0.2930	1.4486	2.8955%
w/o-EFDM	0.5611	1.8816	3.1910%
w/o-AGCM	0.3306	1.6518	3.0051%
w/o-FTDDA	0.5374	2.1005	3.1271%
w/o-CSFM	0.3122	1.4783	2.9990%

Table 4. Model complexity comparison on the PeMS20 dataset.

Model	Params (M)	FLOPs (G)	GPU Memory (MB)	Inference Time (ms)	Training Time (h)
Autoformer	19.6	48.2	1450	431.3	10.1
Informer	18.9	45.7	1420	369.8	9.4
DLinear	9.6	18.8	610	151.7	7.2
TimesNet	22.1	52.4	1620	477.6	10.6
Crossformer	20.8	49.1	1570	435.9	10.2
STGCN	13.1	16.7	890	166.7	7.4
AGCRN	13.9	18.5	950	248.2	8.2
ASTGCN	20.8	36.1	1240	285.9	8.6
STFGNN	23.5	37.4	1520	337.4	9.6
RevIN	21.2	32.3	1680	283.5	8.5
Ours	33.8	58.6	2050	551.2	11.3

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhao, Z.; Zhu, X.; Ye, Z. Adaptive Cross-Scale Graph Fusion with Spatio-Temporal Attention for Traffic Prediction. Electronics 2025, 14, 3399. https://doi.org/10.3390/electronics14173399

AMA Style

Zhao Z, Zhu X, Ye Z. Adaptive Cross-Scale Graph Fusion with Spatio-Temporal Attention for Traffic Prediction. Electronics. 2025; 14(17):3399. https://doi.org/10.3390/electronics14173399

Chicago/Turabian Style

Zhao, Zihao, Xingzheng Zhu, and Ziyun Ye. 2025. "Adaptive Cross-Scale Graph Fusion with Spatio-Temporal Attention for Traffic Prediction" Electronics 14, no. 17: 3399. https://doi.org/10.3390/electronics14173399

APA Style

Zhao, Z., Zhu, X., & Ye, Z. (2025). Adaptive Cross-Scale Graph Fusion with Spatio-Temporal Attention for Traffic Prediction. Electronics, 14(17), 3399. https://doi.org/10.3390/electronics14173399

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Adaptive Cross-Scale Graph Fusion with Spatio-Temporal Attention for Traffic Prediction

Abstract

1. Introduction

2. Related Work

2.1. Spatial Dependency Modeling in Traffic Forecasting

2.2. Coupled and Decoupled Spatio-Temporal Modeling

2.3. Multi-Scale and Frequency-Aware Temporal Modeling

3. Approach

3.1. Enhanced Frequency Domain Modeling

3.2. Adaptive Graph Convolution Module

3.3. Frequency–Time Domain Dual Attention

3.4. Cross-Scale Fusion Mechanism

3.5. Theoretical Breakthrough

4. Experiments

4.1. Datasets

4.2. Implementation Details

4.3. Baselines

4.4. Comparative Experiments Based on the PeMS20 Dataset

4.5. Comparative Experiments Based on the Beijing Dataset

4.6. Ablation Experiments

4.7. Sensitivity Analysis

4.8. Model Complexity Analysis

4.9. Real-World Application

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI