1. Introduction
In the rapidly advancing fields of urban development and intelligent transportation management, the ability to analyze and predict traffic data is essential for sustainable infrastructure planning [1,2,3]. Accurate traffic forecasting enables planners to take proactive measures, optimizing the management of transit systems, networks, and operations [4,5]. Reliable predictions assist infrastructure operators in making informed decisions and refining rerouting strategies, ultimately improving overall system performance [6,7]. In recent years, spatio-temporal modeling using graph neural networks (GNNs) has gained significant attention in traffic data analysis [8,9,10]. These approaches model transportation networks as graphs, where nodes represent intersections and edges denote road segments. By combining GNNs with temporal sequence models, such frameworks can effectively capture complex spatial and temporal dependencies, leading to substantial performance improvements on real-world traffic datasets [2].
However, existing graph-based forecasting models continue to face several persistent challenges [11]. A primary issue is the construction of spatial dependencies [2]. Many approaches rely on fixed graph structures based on geographic distances or road connectivity [12,13,14], which fail to capture the dynamic and context-sensitive nature of traffic interactions. In real-world scenarios, spatial correlations among traffic nodes evolve over time due to fluctuations in traffic flow, changes in signal control, and other external factors. While adaptive graph learning methods have been proposed to address this limitation, they often involve a large number of trainable parameters and rely on implicit optimization objectives, making them difficult to train and highly sensitive to data scarcity.
A second limitation stems from the coupling between spatial graph learning and temporal modeling [11,15]. Most existing models incorporate graph structure learning into the training process of the prediction model, leading to a bi-level optimization procedure that increases model complexity and reduces training stability. Furthermore, this coupling often lacks explicit supervision for the learned graph structures, making it challenging to ensure interpretability or convergence. The issue becomes even more pronounced when modeling longer temporal horizons, as the interaction between spatial topology and temporal dynamics grows increasingly complex and intertwined.
Moreover, traffic data inherently exhibit complex multi-scale temporal patterns—such as hourly fluctuations, daily commuting cycles, and weekly trends—that are challenging to fully capture with single-resolution temporal models [16]. While some recent studies have incorporated periodic decomposition or attention mechanisms, they often treat periodicity as a fixed prior and fail to explicitly model its interactions with spatial structures.
To address the challenges in traffic forecasting, we propose a novel framework that decouples spatial topology learning from temporal modeling while integrating frequency-aware, multi-scale representation learning. The framework consists of four key stages: First, we introduce a frequency domain analysis module that utilizes Short-Time Fourier Transform (STFT) or Fast Fourier Transform (FFT) to uncover dominant periodic patterns in the traffic time series. By extracting the top-k frequencies, this module guides temporal segmentation and multi-scale processing, enabling more accurate temporal modeling. Second, to capture the dynamic nature of traffic interactions, we design the Adaptive Graph Convolution Block (GraphBlock). This module constructs spatial graphs based on latent node representations, allowing the model to learn time-varying spatial dependencies in a data-driven manner. Third, we introduce a hybrid attention mechanism that integrates temporal self-attention with frequency domain attention, enabling the simultaneous modeling of short-term fluctuations and long-term periodic trends across multiple temporal scales. Finally, the spatio-temporal representations extracted at different temporal scales are fused, allowing the model to effectively capture both fine-grained variations and long-term trends.
This modular architecture offers a principled solution to the limitations of existing approaches, particularly in modeling dynamic spatial structures and multi-scale temporal dependencies. Our contributions can be summarized as follows:
We propose a novel framework that explicitly decouples spatial topology learning from temporal modeling, which alleviates optimization difficulties and improves both stability and interpretability.
The framework incorporates frequency domain analysis to extract dominant periodic components and short-term fluctuations, which are then integrated into adaptive graph convolution layers to dynamically capture spatio-temporal dependencies.
A frequency–temporal multi-head attention mechanism is further designed for effective multi-scale spatio-temporal feature extraction combined with a cross-scale graph fusion strategy to enhance prediction accuracy.
From a physical perspective, the model aligns with fundamental traffic flow properties by simultaneously capturing high-frequency short-term fluctuations and low-frequency long-term periodicity, thereby providing interpretable insights into traffic dynamics.
Extensive experiments on real-world datasets (PeMS and Beijing) demonstrate that the proposed model outperforms various state-of-the-art (SOTA) models in terms of MAE, NRMSE, and MAPE while maintaining high computational efficiency, making it suitable for practical deployment.
The paper is organized as follows: Section 2 reviews related work, Section 3 presents our proposed method, Section 4 details the experiments and results, and Section 5 concludes the paper.
3. Approach
Our proposed method is composed of four main modules: enhanced frequency domain modeling, an adaptive graph convolution module, frequency–time domain dual attention, and a cross-scale fusion mechanism. The overall architecture is illustrated in Figure 1.
3.1. Enhanced Frequency Domain Modeling
To capture prominent periodic patterns in time series, we introduce a frequency domain analysis module that automatically detects the dominant periodic components. This module provides key references for the subsequent multi-scale graph convolutional networks. It supports two implementation strategies: Short-Time Fourier Transform (STFT) and Fast Fourier Transform (FFT). Both approaches analyze the amplitude spectrum and extract the top-k significant frequencies, from which the corresponding period lengths are calculated.
The input is a 3D tensor $X \in \mathbb{R}^{B \times T \times C}$, where $B$, $T$, and $C$ represent the batch size, the number of time steps, and the number of variables, respectively. The data are reshaped to $X' \in \mathbb{R}^{(B \cdot C) \times T}$ to allow simultaneous analysis across channels. Applying STFT or FFT yields the frequency representation
$$\mathcal{F} = \mathrm{FFT}(X').$$
The amplitude spectrum is then computed as follows:
$$A(f) = \lvert \mathcal{F}(f) \rvert,$$
with the DC component $A(0)$ zeroed to avoid bias. Using a top-$k$ strategy, the $k$ frequencies with the largest amplitudes are selected, and their periods calculated:
$$f_1, \dots, f_k = \operatorname*{arg\,topk}_{f} A(f), \qquad p_i = \left\lceil \frac{T}{f_i} \right\rceil, \quad i = 1, \dots, k.$$
The amplitudes of these selected frequencies are expanded to match the batch dimension for subsequent fusion. This module provides essential multi-scale period information for downstream processing.
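The sketch below illustrates the FFT-based dominant-period extraction described above in PyTorch. The tensor shapes and the helper name `extract_periods` are our own assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of FFT-based top-k period extraction (assumed shapes, not the paper's code).
import torch


def extract_periods(x: torch.Tensor, k: int = 3):
    """x: [B, T, C] -> (list of k period lengths, per-batch amplitudes [B, k])."""
    B, T, C = x.shape
    spec = torch.fft.rfft(x, dim=1)                  # frequency representation, [B, T//2+1, C]
    amp = spec.abs().mean(dim=(0, 2))                # amplitude spectrum averaged over batch/channels
    amp[0] = 0.0                                     # zero the DC component to avoid bias
    top_amp, top_freq = torch.topk(amp, k)           # k frequencies with the largest amplitudes
    periods = (T // top_freq.clamp(min=1)).tolist()  # period length = T / frequency index
    batch_amp = spec.abs().mean(dim=2)[:, top_freq]  # expand amplitudes to the batch dimension
    return periods, batch_amp


if __name__ == "__main__":
    x = torch.randn(16, 96, 325)                     # e.g., 16 samples, 96 steps, 325 sensors
    periods, amps = extract_periods(x, k=3)
    print(periods, amps.shape)
```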
3.2. Adaptive Graph Convolution Module
The core spatial feature extractor in our model is the GraphBlock, designed using adaptive graph convolutional networks (AGCN). Unlike traditional graph convolutions with fixed adjacency matrices, this module dynamically learns the graph topology based on node features.
For the spatio-temporal input $X \in \mathbb{R}^{B \times N \times d}$, the learned adjacency matrix $A \in \mathbb{R}^{N \times N}$ is computed as follows:
$$A = \operatorname{softmax}\!\big(\operatorname{ReLU}\big((X W_1)(X W_2)^{\top}\big)\big),$$
where $W_1$ and $W_2$ are learnable parameters. This dynamic adjacency matrix captures the evolving relationships among nodes over time. A multi-graph convolution is then applied to propagate information across different neighborhoods:
$$H = \sigma\!\left(\sum_{i=0}^{K} A^{i} X \Theta_i\right),$$
where $A^{i}$ denotes the $i$-th power of $A$, $\Theta_i$ is the weight matrix for order $i$, and $\sigma(\cdot)$ is a nonlinear activation function. This design enables GraphBlock to adaptively capture both local and global spatial dependencies.
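A minimal sketch of such an adaptive graph convolution block is given below, assuming the adjacency is built from projected node features; the class name, layer choices, and dimensions are illustrative rather than the authors' code.

```python
# Sketch of an adaptive graph convolution block (GraphBlock-style), under assumed shapes.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphBlock(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, order: int = 2, proj_dim: int = 32):
        super().__init__()
        self.w1 = nn.Linear(in_dim, proj_dim)   # learnable projections used to
        self.w2 = nn.Linear(in_dim, proj_dim)   # build the adaptive adjacency
        self.order = order
        self.weights = nn.ModuleList([nn.Linear(in_dim, out_dim) for _ in range(order + 1)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: [B, N, in_dim] node features -> [B, N, out_dim]."""
        # Adaptive adjacency from latent node representations.
        a = torch.softmax(F.relu(self.w1(x) @ self.w2(x).transpose(1, 2)), dim=-1)  # [B, N, N]
        out, h = 0.0, x
        for i, lin in enumerate(self.weights):
            out = out + lin(h)        # contribution of the i-th propagation order
            if i < self.order:
                h = a @ h             # propagate one more hop through the learned graph
        return F.relu(out)


if __name__ == "__main__":
    block = GraphBlock(in_dim=64, out_dim=64, order=2)
    print(block(torch.randn(8, 325, 64)).shape)   # torch.Size([8, 325, 64])
```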
3.3. Frequency–Time Domain Dual Attention
To model both temporal dependencies and periodic characteristics in time series, we introduce a dual attention mechanism that integrates temporal multi-head attention and frequency domain multi-head attention.
Temporal attention captures dependencies among time steps using the standard scaled dot-product attention:
$$\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,$$
where $d_k$ is the key dimension. For frequency modeling, FFT is applied to the input along the time axis, yielding frequency domain features $X_f$. Multi-head attention is then applied to $X_f$:
$$H_f = \operatorname{MultiHead}(X_f, X_f, X_f).$$
Since the temporal and frequency representations may differ in sequence length, linear interpolation aligns them. Finally, both features are concatenated and passed through a linear layer for fusion, producing a rich and expressive feature representation.
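The following sketch shows one way to realize the dual attention, using PyTorch's `nn.MultiheadAttention` for both branches; the interpolation and fusion details are our own simplification of the description above, not the authors' implementation.

```python
# Sketch of frequency-time dual attention (assumed module/parameter names).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualAttention(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.freq_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: [B, T, d_model] -> fused features [B, T, d_model]."""
        h_t, _ = self.time_attn(x, x, x)                    # temporal self-attention
        x_f = torch.fft.rfft(x, dim=1).abs()                # frequency features, [B, T//2+1, d]
        h_f, _ = self.freq_attn(x_f, x_f, x_f)              # attention over frequencies
        # Align the shorter frequency sequence to length T via linear interpolation.
        h_f = F.interpolate(h_f.transpose(1, 2), size=x.size(1),
                            mode="linear", align_corners=False).transpose(1, 2)
        return self.fuse(torch.cat([h_t, h_f], dim=-1))     # concatenate and fuse


if __name__ == "__main__":
    m = DualAttention(d_model=64, n_heads=4)
    print(m(torch.randn(8, 96, 64)).shape)                  # torch.Size([8, 96, 64])
```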
3.4. Cross-Scale Fusion Mechanism
The cross-scale fusion module integrates spatio-temporal features extracted at multiple time scales. This is crucial for capturing both short-term high-frequency fluctuations and the long-term trends inherent in complex time series data.
The outputs from different scales are stacked along the scale dimension:
$$H = \operatorname{Stack}(H_1, H_2, \dots, H_S).$$
A fully connected layer learns fusion weights $\alpha_i$ for each scale via softmax normalization:
$$\alpha_i = \frac{\exp(w_i)}{\sum_{j=1}^{S} \exp(w_j)},$$
where $\alpha_i$ reflects the relative importance of scale $i$. Residual connections and layer normalization are applied to stabilize training and prevent gradient vanishing:
$$Y = \operatorname{LayerNorm}\!\left(X + \sum_{i=1}^{S} \alpha_i H_i\right).$$
This design allows the model to dynamically emphasize features from relevant scales and achieve more comprehensive spatio-temporal representations.
Compared to single-scale models, the cross-scale fusion mechanism excels in adaptively combining short- and long-term dependencies while preserving information flow and training stability.
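A compact sketch of this fusion step is shown below, assuming each scale has already produced a representation of shape [B, T, d]; the weight-scoring layer and residual handling are simplified illustrations of the mechanism described above.

```python
# Sketch of cross-scale fusion with softmax weights, residual connection, and layer norm.
import torch
import torch.nn as nn


class CrossScaleFusion(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.score = nn.Linear(d_model, 1)     # one fusion score per scale
        self.norm = nn.LayerNorm(d_model)

    def forward(self, scale_feats, residual: torch.Tensor) -> torch.Tensor:
        h = torch.stack(scale_feats, dim=1)                  # [B, S, T, d]
        w = torch.softmax(self.score(h.mean(dim=2)), dim=1)  # softmax weights over scales, [B, S, 1]
        fused = (w.unsqueeze(2) * h).sum(dim=1)              # weighted sum over scales -> [B, T, d]
        return self.norm(fused + residual)                   # residual connection + layer norm


if __name__ == "__main__":
    feats = [torch.randn(8, 96, 64) for _ in range(3)]       # three temporal scales
    fusion = CrossScaleFusion(d_model=64)
    print(fusion(feats, torch.randn(8, 96, 64)).shape)       # torch.Size([8, 96, 64])
```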
3.5. Theoretical Breakthrough
The proposed framework introduces three key innovations that collectively enhance predictive performance: First, it explicitly decouples spatial topology learning from temporal modeling, thereby reducing training complexity and improving stability and interpretability. Second, it integrates a frequency–time dual attention mechanism to jointly capture short-term fluctuations and long-term periodic patterns, yielding richer temporal representations. Third, it employs a cross-scale graph fusion strategy, guided by FFT/STFT-derived periodic information, to adaptively integrate multi-resolution dependencies. Beyond the contributions of each module individually, the framework also benefits from their complementary interactions. In particular, frequency domain analysis provides structured temporal features by extracting dominant periodic components and short-term fluctuations, which are then incorporated into the adaptive graph convolution layers. This interaction ensures that spatial dependencies are modeled on frequency-enhanced temporal signals, rather than raw sequences alone, enabling the propagation of richer and more robust spatio-temporal representations. Together, these advances address the critical limitations in existing approaches and establish a principled foundation for accurate and reliable traffic forecasting.
4. Experiments
4.1. Datasets
PeMS [29]: The Performance Measurement System (PeMS) provides a unified database of traffic data collected by Caltrans on California's motorways, together with datasets from other Caltrans and partner agencies. These data allow users to obtain a unified, comprehensive assessment of motorway performance, make operational decisions based on knowledge of the current state of the motorway network, analyze congestion bottlenecks to identify potential remedies, and make better overall decisions. The dataset spans multiple years, including 2004, 2007, and 2008. It is collected by hundreds of detectors at a five-minute sampling interval, recording speed, traffic flow, occupancy, and more. The dataset used in our experiments is PeMS20, which contains more than 60,000 samples, each with a feature dimension of 325.
Beijing [30]: The Beijing dataset is collected from the traffic speeds of major roads in Beijing. It has the same 5 min sampling interval, but the feature dimension is 3126 and it contains more than 20,000 samples.
In contrast to the PeMS dataset, the Beijing dataset presents a more challenging scenario; it covers a dense urban road network with a very high feature dimensionality, complex congestion dynamics, and strong non-stationary fluctuations. By jointly evaluating our proposed model on these two datasets, we demonstrate that the proposed model is not only effective under standard benchmark settings but also robust in handling high-dimensional, complex, and noisy real-world traffic scenarios.
4.2. Implementation Details
All experiments are conducted on a Dell Precision 5820 Tower workstation with a Windows 10 operating system. The workstation is equipped with an Intel(R) Xeon(R) W-2255 CPU @ 3.70 GHz processor and an NVIDIA GeForce RTX 3080 (10 GB) graphics card.
In the experiments, all the datasets are divided into training, testing, and validation sets at a ratio of 7:2:1. During training, the batch size is set to 16, the number of epochs is set to 20, and training is terminated early if the validation results do not improve for three consecutive epochs. In addition, the model is trained with an initial learning rate of 0.0005 and a dropout rate of 0.05.
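A hedged sketch of the training loop implied by this setup (batch size 16, at most 20 epochs, early stopping after 3 epochs without validation improvement, initial learning rate 5e-4) is given below. The names `model`, `train_loader`, `val_loader`, and `evaluate`, as well as the choice of an L1 (MAE) training loss, are our own assumptions, not the authors' code.

```python
# Sketch of the training procedure with early stopping (placeholder model/data objects).
import torch


def train(model, train_loader, val_loader, evaluate, max_epochs=20, patience=3, lr=5e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.L1Loss()                  # MAE-style objective (assumption)
    best_val, wait = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        val_err = evaluate(model, val_loader)      # e.g., validation MAE
        if val_err < best_val:
            best_val, wait = val_err, 0
        else:
            wait += 1
            if wait >= patience:                   # stop after 3 epochs with no improvement
                break
    return model
```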
In order to assess the time series prediction ability of the model, we select three evaluation metrics: the Mean Absolute Error (MAE), the Normalized Root Mean Squared Error (NRMSE), and the Mean Absolute Percentage Error (MAPE). MAE is robust and intuitive and is less affected by extreme values; NRMSE emphasizes large errors, highlighting the potential for model optimization; and MAPE is expressed as a percentage, making it easy to compare and interpret across different scenarios. The MAE, NRMSE, and MAPE are calculated as follows:
$$\mathrm{MAE} = \frac{1}{T} \sum_{t=1}^{T} \lvert y_t - \hat{y}_t \rvert,$$
$$\mathrm{NRMSE} = \frac{1}{\bar{y}} \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( y_t - \hat{y}_t \right)^2},$$
$$\mathrm{MAPE} = \frac{100\%}{T} \sum_{t=1}^{T} \left\lvert \frac{y_t - \hat{y}_t}{y_t} \right\rvert,$$
where $y_t$ is the true value, $\hat{y}_t$ is the predicted value, $\bar{y}$ is the mean of the true values, and $T$ is the total number of samples.
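For concreteness, direct NumPy implementations of the three metrics are sketched below; normalizing the RMSE by the mean of the ground truth is our assumption, and other normalizations (e.g., by the value range) are equally common.

```python
# Sketch implementations of MAE, NRMSE, and MAPE (NRMSE normalization is an assumption).
import numpy as np


def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))


def nrmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return float(rmse / np.mean(np.abs(y_true)))   # RMSE normalized by the mean magnitude


def mape(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-8) -> float:
    return float(np.mean(np.abs((y_true - y_pred) / (np.abs(y_true) + eps))) * 100.0)


if __name__ == "__main__":
    y, yhat = np.array([1.0, 2.0, 4.0]), np.array([1.1, 1.9, 4.3])
    print(mae(y, yhat), nrmse(y, yhat), mape(y, yhat))
```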
4.3. Baselines
To more precisely characterize the performance of the model, ten representative time series prediction models are selected as baselines for the subsequent comparison experiments.
Autoformer [31] is a deep learning architecture designed for long-term time series forecasting which introduces progressive decomposition blocks and an auto-correlation mechanism based on the periodicity of the series to efficiently model complex temporal patterns. Compared to standard self-attention, the auto-correlation mechanism has significant advantages in information utilization and computational efficiency, and Autoformer achieves SOTA results (a 38% relative improvement) for long-term forecasting in multiple domains.
Informer [32] is an efficient Transformer model for long time series prediction which significantly reduces computational complexity and improves long-term dependency modeling by introducing the ProbSparse self-attention mechanism and a generative-style decoder. It achieves high-precision prediction in a variety of real-world scenarios while supporting longer prediction horizons and larger data sizes.
Decomposition-Linear (DLinear) [33] is a lightweight time series forecasting model that first decomposes the series into trend and residual components and then models each with a separate linear layer, achieving very low computational complexity and an excellent long-term forecasting capability. The approach avoids complex attention structures, making the model more efficient and easier to deploy.
TimesNet [34] is a general-purpose time series modeling framework that extracts local and global features at different time scales through multi-scale blocks and residual connections, effectively modeling complex temporal dependencies. The model performs well in multiple forecasting tasks, balancing long-term dependency capture and computational efficiency, and is suitable for a variety of time series application scenarios.
Crossformer [35] is a model designed for long-term time series forecasting which introduces the Cross-Dimension Block and a dynamic-length sampling strategy to capture both local temporal dependence and the global trend of time series. By modeling the interaction between the time and variable dimensions, Crossformer shows excellent forecasting performance and scalability when dealing with high-dimensional, multivariate time series.
Spatio-Temporal Graph Convolutional Networks (STGCN) [13] is a model that combines a graph convolutional network (GCN) and a gated temporal convolutional network (Gated TCN). It is specially designed to capture spatial and temporal dependencies and is especially suitable for forecasting time series defined on graph structures, such as traffic flows. The model captures spatial dependencies between nodes through graph convolution and temporal dynamics through temporal convolution, thus achieving efficient spatio-temporal feature extraction and prediction.
AGCRN [17] is a spatio-temporal prediction method that combines adaptive graph convolution and recurrent neural networks, dynamically generating the adjacency matrix by learning node embeddings without requiring a predefined fixed graph structure. It models dynamic spatial relationships and temporal dependencies simultaneously and is suitable for predicting complex spatio-temporal data such as traffic flows.
Attention-Based Spatial-Temporal Graph Convolutional Networks (ASTGCN) [14] is a model that combines graph convolutional networks and attention-based temporal convolution for simultaneous modeling of spatial and temporal dependencies. The model dynamically captures the weights of different nodes and time steps through spatial attention, temporal attention, and convolutional layers, which improves the accuracy and interpretability of spatio-temporal data prediction.
Spatial-Temporal Fusion Graph Neural Networks (STFGNN) [19] efficiently capture hidden spatial dependencies by constructing data-driven temporal correlation graphs and fusing them with given spatial distance graphs, learning both local spatio-temporal heterogeneity and global spatio-temporal homogeneity.
Reversible Instance Normalization (RevIN) [36] improves the generalization ability of a model by normalizing the input time series per instance, eliminating non-stationarity and distributional differences in the data. The normalization is reversible, ensuring that the original data scale can be restored after model prediction to preserve the accuracy and stability of the results.
4.4. Comparative Experiments Based on the PeMS20 Dataset
The performance of all models is evaluated on the PeMS20 dataset by training them with identical parameter settings and assessing their MAE, MAPE, and NRMSE on the test set. The results are presented in Table 1.
As shown in Table 1, our proposed model significantly outperforms existing SOTA time series prediction methods on the PeMS20 dataset across all evaluation metrics. Specifically, it achieves the lowest MAE of 0.2930, an NRMSE of 1.4486, and an MAPE of 2.8955%, outperforming the second-best model (STFGNN) by margins of 53.6% in MAE, 12.3% in NRMSE, and 3.2% in MAPE. Traditional Transformer-based models such as Autoformer and Informer exhibit higher errors, highlighting their limited ability to capture complex spatio-temporal dependencies in traffic data. Although STFGNN and TimesNet achieve competitive performance, their results are still inferior to our model, which benefits from the integration of dynamic spatio-temporal fusion and enhanced temporal receptive fields. These results demonstrate the effectiveness of our framework in learning both local spatio-temporal heterogeneity and global temporal homogeneity, leading to superior predictive accuracy.
4.5. Comparative Experiments Based on the Beijing Dataset
To further explore the model prediction performance and robustness, we conduct experiments on the more complex Beijing dataset. In the Beijing dataset, the feature length of each sample is 3126, which results in a high memory requirement for model training. To reduce the computational resources required for training the model, we randomly select 500, 1000, or 1563 (half of the 3126) features for each training batch. The corresponding MAE, NRMSE, and MAPE results are recorded in Table 2.
As shown in Table 2, our model consistently achieves the best performance across all feature settings (500, 1000, and 1563 features), with the lowest MAE, NRMSE, and MAPE values. Notably, when using 1563 features, our model attains an MAE of 1.1310, an NRMSE of 1.2238, and an MAPE of 10.1381%, significantly outperforming all baselines. Compared to the second-best model (STFGNN), our model reduces the MAE by 57.9%, the NRMSE by 50.3%, and the MAPE by 31.5%. Even under reduced feature settings (500 and 1000), our model maintains superior performance, demonstrating strong robustness to feature dimensionality reduction. These results highlight the efficiency of our framework in handling high-dimensional and complex spatio-temporal data while achieving remarkable predictive accuracy with limited computational resources.
4.6. Ablation Experiments
To further understand the impact of each module on traffic flow prediction performance, we perform ablation experiments. In the first group, the enhanced frequency domain modeling is removed, and the data are processed directly in the time domain. In the second group, the adaptive graph convolution is removed and replaced with a standard GCN. In the third group, the frequency–time domain dual attention is removed, and only a single attention mechanism is used. In the fourth group, the cross-scale fusion is removed and no multi-scale feature fusion is applied, with only single-scale information being used. Tests based on the above changes are conducted on the PeMS20 dataset, and the results are summarized in Table 3.
As shown in Table 3, different modules contribute unequally to the overall performance. Notably, removing the enhanced frequency domain modeling (EFDM) or the frequency–time dual attention (FTDDA) results in the most significant performance drops. This can be explained by the intrinsic characteristics of traffic data; traffic flows typically exhibit strong periodic patterns, such as daily commuting cycles and weekly repetitions, while also containing short-term fluctuations caused by sudden congestion or signal changes. Frequency domain modeling explicitly extracts these periodic components, providing a more structured representation that time domain modeling alone cannot capture. The dual attention mechanism further aligns temporal fluctuations with periodic signals, which is essential for robust prediction. In contrast, the adaptive graph convolution (AGCM) and cross-scale fusion (CSFM) modules mainly enhance the modeling of spatial dependencies and multi-scale integration. While important, their impact is relatively smaller because spatial correlations in traffic networks are often more stable than temporal periodicity, and multi-scale fusion acts as a complementary refinement rather than a dominant factor. This analysis demonstrates that capturing frequency-aware temporal patterns is the most critical aspect for achieving accurate traffic forecasting.
4.7. Sensitivity Analysis
In addition, we perform parameter sensitivity analyses to identify the optimal combination of model hyperparameters, covering the input sequence length, model hidden dimension, number of attention heads, depth of the graph convolutional network, number of cross-scale convolutional channels, and learning rate. Specifically, the input sequence length affects the extraction of frequency domain features and the modeling of global temporal dependencies; the model hidden dimension determines the expressive capacity of the attention mechanism and the overall model capacity; the number of attention heads has a significant impact on the accuracy of spatio-temporal dependency modeling; the depth of graph convolution relates to the ability to capture spatial dependencies; the number of cross-scale convolutional channels controls the representational capacity of the local spatio-temporal features; and the learning rate influences the convergence speed, stability, and generalization performance of the model. For these hyperparameters, we design different groups of values for the experiments, and their specific value ranges and the corresponding metric trends are shown in Figure 2.
As shown in Figure 2, each hyperparameter exhibits a distinct influence on model performance, closely reflecting its functional role within the architecture. The input sequence length serves as a temporal sampling window, where a value of 96 effectively captures frequency domain features and global temporal dependencies while avoiding excessive noise and computational overhead from longer sequences. The hidden dimension size determines the model’s representational capacity; an intermediate setting (512) enhances the attention mechanism’s effectiveness without incurring unnecessary computational cost or diminishing returns. The number of attention heads influences the granularity of spatio-temporal dependency modeling, with four heads achieving optimal performance and too few or too many leading to slight degradation. The GCN depth balances the ability to learn spatial dependencies and the risk of overfitting, with two layers yielding the best results. The number of cross-scale convolution channels affects the capacity to capture local spatio-temporal features, where a moderate increase (16 channels) improves performance, but further growth increases resource demands significantly. Finally, the learning rate plays a critical role in training stability and convergence speed; a value of 5 × 10⁻⁴ offers a balanced trade-off between fast convergence and avoiding instability or overfitting. Considering prediction accuracy, computational resources, and training time, we select the following optimal hyperparameter configuration: an input sequence length of 96, a hidden dimension size of 512, 4 attention heads, a GCN depth of 2, 16 cross-scale convolution channels, and a learning rate of 5 × 10⁻⁴.
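For reference, the selected configuration can be collected in a plain dictionary as sketched below; the key names are our own and merely mirror the values reported above.

```python
# The selected hyperparameters from the sensitivity analysis (key names are illustrative).
best_config = {
    "input_length": 96,           # input sequence length
    "hidden_dim": 512,            # model hidden dimension
    "n_heads": 4,                 # attention heads
    "gcn_depth": 2,               # graph convolution layers
    "cross_scale_channels": 16,   # cross-scale convolution channels
    "learning_rate": 5e-4,
}
```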
4.8. Model Complexity Analysis
In practical applications, the comprehensive evaluation of a model should not only focus on its predictive accuracy but also consider factors such as the number of parameters, computational resource requirements, and inference time, which are equally important. Therefore, we compare the proposed model with several baseline models in terms of these aspects to assess its overall complexity. The comparison results are summarized in Table 4.
We evaluate the above models on the PeMS20 dataset, with all experiments conducted on the same RTX 3080 GPU using an identical test set. As shown in Table 4, although our model has a relatively larger number of parameters (33.8 M) compared to several baselines, the increases in GPU memory usage, FLOPs, inference latency, and training time remain within 10%, which we believe is acceptable for most practical scenarios. This demonstrates that our model achieves significant improvements in prediction accuracy with only a modest increase in computational cost, and its computational resource usage and inference latency remain well within acceptable limits for real-world applications. Nevertheless, we acknowledge that resource-constrained deployments (e.g., edge devices or real-time systems with limited memory) may require lighter solutions. To this end, potential extensions include reducing the hidden dimension size, pruning less critical frequency components, or designing a distilled variant of the proposed model. Such lightweight adaptations could significantly reduce computational overhead while retaining most of the predictive performance, which we plan to explore in future work.
To further clarify the trade-off between accuracy and efficiency, we additionally compare our framework with simpler baselines, including traditional statistical forecasting models and lightweight graph-based approaches. As expected, these simpler models require significantly fewer parameters and achieve faster training and inference. However, their predictive accuracy is substantially lower, particularly in capturing nonlinear spatio-temporal dependencies and long-term periodic patterns. This comparison highlights that, while our model incurs moderately higher computational costs, it provides a clear accuracy–efficiency balance by achieving substantial performance gains that simpler models cannot deliver in complex traffic forecasting scenarios.
4.9. Real-World Application
To further demonstrate the practical utility of our approach, we consider a potential deployment scenario in urban traffic management. Traditional systems often rely on pre-timed or rule-based signal plans, which cannot adapt quickly to sudden congestion or fluctuations in traffic demand. By integrating our prediction model into the traffic control center, accurate short-term forecasts can be used to dynamically adjust signal timings at major intersections. For example, if the model predicts a rapid build-up of vehicles on a main arterial road during peak hours, the system can proactively extend green-light durations to alleviate bottlenecks and reduce queue lengths. Likewise, long-term periodic predictions can support scheduling and resource allocation, such as lane management or bus priority strategies. This case study illustrates that, beyond benchmark performance, our model offers clear potential to improve the efficiency and reliability of real-world intelligent transportation systems.
In practice, however, urban traffic networks are highly dynamic, and an important consideration is how well the model generalizes to unseen or irregular conditions. Our framework is designed with this challenge in mind. The frequency–time dual attention mechanism captures both stable long-term periodic patterns and sudden short-term fluctuations; the adaptive graph convolution dynamically adjusts spatial dependencies when traffic patterns shift; and the cross-scale fusion strategy enhances flexibility in handling multi-resolution dynamics, making the model better suited for special events or seasonal variations. Together, these mechanisms improve robustness in diverse traffic scenarios.
Nevertheless, certain limitations remain. Extreme or highly irregular disruptions (e.g., accidents, severe weather, or sudden policy interventions) may still challenge the model, as such events deviate sharply from historical patterns. In addition, the reliance on dense and high-quality sensor coverage may limit applicability in regions with sparse or noisy data. To address these issues, future work will explore incorporating external signals (e.g., weather reports, incident records, and social events) and developing robust domain adaptation strategies. These directions will further strengthen the model’s adaptability and facilitate its large-scale deployment in real-world intelligent transportation systems.
5. Conclusions
In this paper, we propose a novel traffic flow prediction model that integrates temporal and frequency domain information to better capture complex spatio-temporal dependencies. The model incorporates adaptive graph convolutional layers, a frequency–temporal multi-head attention mechanism, and a cross-multi-scale graph fusion strategy, which, together, enable more accurate and robust traffic predictions. Extensive experiments conducted on real-world datasets, PeMS and Beijing, demonstrate that the proposed method outperforms SOTA baselines in terms of MAE, NRMSE, and MAPE while maintaining a computational efficiency suitable for practical deployment.
For future work, we plan to extend our approach to address more complex traffic scenarios, particularly under severe congestion conditions, where traffic dynamics become highly nonlinear and unpredictable. This will involve exploring advanced anomaly detection and dynamic graph adaptation strategies to further enhance the model robustness and applicability in real-world intelligent transportation systems.