Traffic Flow Prediction Research Based on an Interactive Dynamic Spatial–Temporal Graph Convolutional Probabilistic Sparse Attention Mechanism (IDG-PSAtt)

Abstract: Accurate traffic flow prediction is highly important for relieving road congestion. To address the intricate spatial–temporal dependence of traffic flows, especially the hidden dynamic correlations among road nodes, and the dynamic spatial–temporal characteristics of traffic flows, a traffic flow prediction model based on an interactive dynamic spatial–temporal graph convolutional probabilistic sparse attention mechanism (IDG-PSAtt) is proposed. Specifically, the IDG-PSAtt model consists of an interactive dynamic graph convolutional network (IL-DGCN) with a spatial–temporal convolution (ST-Conv) block and a probabilistic sparse self-attention (ProbSSAtt) mechanism. The IL-DGCN divides the time series of a traffic flow into intervals and synchronously and interactively shares the captured dynamic spatial–temporal features. The ST-Conv block is utilized to capture the complex dynamic spatial–temporal characteristics of the traffic flow, and the ProbSSAtt block is utilized for medium-to-long-term forecasting. In addition, a dynamic GCN is generated by fusing adaptive and learnable adjacency matrices to learn the hidden dynamic associations among road network nodes. Experimental results demonstrate that the IDG-PSAtt model outperforms the baseline methods in terms of prediction accuracy. Specifically, on METR-LA, the mean absolute error (MAE) and root mean square error (RMSE) achieved by IDG-PSAtt for a 60 min forecasting scenario are reduced by 0.75 and 1.31, respectively, compared to those of the state-of-the-art models. This traffic flow prediction improvement will lead to more precise estimates of the emissions produced by mobile sources, resulting in more accurate air quality forecasts. Consequently, this research will greatly support local environmental management efforts.


Introduction
With the rapid development of urbanization and the growing complexity of road traffic networks, accurate traffic flow prediction has become an indispensable part of intelligent transportation systems (ITSs). Traffic flow prediction aims to predict future traffic flows based on a given historical traffic time series, and accurate traffic flow prediction is essential for transportation services, including route planning, congestion mitigation, and complex traffic network management. Traffic planning can improve the traffic environment, enhance the quality of residents' travel, and promote the sustainable development of cities. Through scientific planning, the rational use of land resources can be promoted, land use efficiency can be improved, and land waste can be reduced. Moreover, reasonable transportation system planning can reduce vehicle emissions and noise pollution, improve air quality, and improve residents' travel comfort and safety.
Traffic flow prediction has always been a research hotspot in the field of ITSs, but due to the complex, non-Euclidean, and dynamic nature of road traffic networks, achieving accurate and efficient traffic flow prediction is still a significant challenge. Traffic flow data are time series data that have complex temporal dependencies and dynamic spatial correlations, as well as special periodicity, such as morning and evening peaks. Accurate traffic flow prediction requires a prediction model that can adequately capture long-term dependencies and dynamic spatial-temporal correlations.
First, in traffic flow prediction, the prediction error usually accumulates as the number of time steps increases. Even if the prediction in the first few time steps is relatively accurate, the error may gradually accumulate and expand with the passage of time. It can be challenging to estimate long-term traffic flows due to the complexity of temporal dependencies. For instance, when traffic conditions are predicted for the next 12 time steps by considering the traffic flows of the previous 12 time steps, the traffic flow predictions for the 9th-12th time steps are generally more difficult to obtain than those for the 1st-3rd time steps. Further complicating the task of precisely predicting traffic flows are the underlying dynamic spatial-temporal characteristics that result from the geographical heterogeneity, dynamic correlations, and uncertainty of transportation networks.
The complexity of a transportation network results in complex dynamic spatial-temporal interactions between traffic flows, as illustrated in Figure 1a, which demonstrates that spatial traffic conditions can influence each other and change over time. For instance, a traffic accident may impact the flows of traffic on surrounding roads, and the traffic flows in various directions on the same road might also be different. Figure 1b shows the hidden dynamic spatial characteristics of traffic flows, such as spatial heterogeneity, dynamic correlations, and uncertainty. Spatial heterogeneity refers to the presence of distinct traffic patterns and spatial-temporal characteristics in different areas, such as residential and industrial zones [1]. Dynamic correlations refer to the associations between road network nodes over time, given historical traffic data and the topology of the associated road network. Uncertainty pertains to how a traffic flow is affected by accidents, holidays, severe weather, etc.
The ability of traditional machine learning techniques to anticipate traffic flows is severely constrained by their excessive reliance on feature engineering. Instead, deep learning-based approaches, which are frequently employed in traffic flow prediction applications, can efficiently and automatically extract features that describe traffic flows. For instance, a spatial-temporal graph convolutional network (STGCN) [2] utilizes GCNs and gated linear units (GLUs) to capture the spatial-temporal correlations among traffic flows. To capture spatial-temporal correlations, Graph WaveNet [3] adopts both temporal convolutional networks (TCNs) and adaptive graph convolution. SLCNN [4] adopts dynamic graph convolution and a 1D convolutional neural network (CNN) [5] for exploring the spatial-temporal features between traffic flow nodes. STFGNN [6] utilizes multiple GCNs and 1D CNNs [7] to simultaneously extract spatial and temporal correlations. Several of the above studies employed serial or parallel structures to extract dynamic spatial-temporal features; however, these structures can weaken the captured spatial-temporal correlations and even amplify some irrelevant information, resulting in poor traffic flow prediction results. Therefore, the ASTGCN [8] employs a spatial attention mechanism and a temporal attention mechanism to further enhance the prediction performance of the model. The ASTGNN [9] adopts dynamic graph convolutions to extract spatial features and learns the temporal dependencies of traffic flows through an attention mechanism. Additionally, the AGCRN [10] embeds a spatial module into its temporal module, modifying an adaptive GCN and a gated recurrent unit (GRU) [11] to enable the simultaneous capture of complex temporal dependencies and dynamic spatial features. MRA-BGCN [12] integrates an attention mechanism into its embedded structure to further extract dynamic spatial-temporal features. Although these methods have improved the ability to capture the dynamic spatial-temporal characteristics of traffic flows, their spatial and temporal modules interact poorly when learning and extracting dynamic spatial-temporal features, which limits the sensitivity of the prediction model to the periodicity and changing trends of the given time series. This also results in the inability to adequately capture the dynamic spatial-temporal features of traffic flows [13].
Furthermore, many current studies represent the deep structure of a traffic flow by defining various adjacency matrices to capture hidden dynamic spatial features [14]. For instance, the MTGNN [15] employs GCNs, 1D CNNs, and adaptive adjacency matrices to learn hidden spatial features. The STSGCN [16] combines multiple adjacency matrices and employs an embedded structure for traffic flow prediction purposes. An adaptive adjacency matrix can explore the hidden relationships among road network nodes to improve the ability of the utilized model to learn about the spatial heterogeneity of traffic flows; however, once the model training process ends, the adaptive adjacency matrix is fixed and cannot learn the dynamic associations among graph nodes over time, which prevents the model from making full use of historical traffic flow information [17]. Therefore, the above methods still cannot fully and effectively capture the hidden spatial features of traffic flows.
To address the above research challenges, a traffic flow prediction model based on an interactive dynamic spatial-temporal graph convolution probabilistic sparse attention mechanism (IDG-PSAtt) is proposed; this model adequately extracts dynamic spatial-temporal features from traffic flow time series. A dynamic GCN (DGCN) that can fully utilize a priori knowledge to generate dynamic graphs for capturing the hidden spatial features of traffic flows is adopted. The DGCN is embedded into an interactive learning structure to form an interactive dynamic GCN (IL-DGCN), which analyses the periodicity of the traffic flows, divides the sequences into intervals, and then captures their deep dynamic spatial-temporal features through interactive learning among the divided subsequences. The IDG-PSAtt model combines multiple IL-DGCN modules through an interactive learning strategy to fully extract the dynamic spatial-temporal features of traffic flows. In addition, an adaptive adjacency matrix and a dynamic adjacency matrix are used to further explore the dynamic associations between nodes over time. Finally, IDG-PSAtt captures the complex temporal dependence at the same location and the dynamic spatial correlations among traffic flows at neighboring locations at the same time step via a spatial-temporal convolution (ST-Conv) block; it also employs a probabilistic sparse self-attention (ProbSSAtt) mechanism to incorporate dynamic spatial-temporal features and reduce the computational complexity of the model.
Traditional convolutional networks are only applicable to extracting the local features of Euclidean data, whereas traffic flows are non-Euclidean data. A GCN extends the traditional convolution process to graph-structured data and learns the neighbor information of nodes and edges to process non-Euclidean data and capture their dynamic spatial features [18]. Currently, GCNs are mainly categorized into two types of methods: spatial domain and spectral domain methods. GCNs based on spatial domain methods perform convolution by aggregating node information from neighbors to capture the features of the nodes, but the node neighborhood selection process of this method is extremely difficult [19]. A combination of spatial domain-based GCNs and an attention mechanism was used to dynamically adjust the weights of neighboring nodes for determining the importance of these nodes in [20]. A GCN based on the spectral domain aggregates the neighbor information of each node through a spectral analysis conducted over the entire graph; because the entire graph must be processed at once, this approach is computationally complex. Deng et al. [21] mapped the structure of a topological graph in the spatial domain to the spectral domain through the Fourier transform to perform a convolution operation and then used an inverse transformation back to the spatial domain to complete the computation. Yan et al. [22] adopted ChebNet [23] to decrease the computational complexity of the Laplacian and enhance the performance of traditional GCNs. Based on the graph convolution framework, Li et al. [2] utilized gated GCNs to capture the dynamic features of traffic flows; however, their model does not consider the dynamic spatial-temporal dependencies of traffic flows.
Two main STGCN approaches are available: recurrent neural network (RNN)-based and CNN-based methods [24]. An RNN-based STGCN can learn the temporal dependencies of traffic flows, but the iterative RNN training process leads to problems such as error accumulation, a slow training speed, and the inability to handle long time series [25]. For example, the spatial-temporal features required for traffic flow prediction can be captured by using an RNN-based graph convolutional recurrent unit network that simultaneously filters inputs and hidden states, but this method is not able to effectively capture dynamic spatial-temporal correlations and has poor long-term prediction capabilities [26]. In contrast, CNN-based STGCNs can process data in parallel and consume less memory, significantly improving the training speed of these models; however, when an STGCN has too many layers, data feature extraction difficulties are encountered. In addition, the incorporation of a long short-term memory (LSTM) network into a CNN-based STGCN can enable it to efficiently process complex time series. Therefore, Chen et al. [27] processed the dynamic spatial-temporal information of road networks and, thus, predicted future traffic flows through a CNN and LSTM. The common STGCN structure is shown in Figure 2. An STGCN consists of a graph convolution in the spatial dimension and a one-dimensional standard convolution in the temporal dimension, which capture the hidden spatial features of neighborhood locations and the complex temporal dependencies at different times, respectively.
In addition, with the rapid development of attention mechanisms, these mechanisms have been widely used in many fields, such as image processing, speech recognition, and natural language processing. They are especially utilized in the field of ITSs to assist in the long-term prediction of traffic flows [28]. For example, a spatial-temporal graph convolution prediction model can be constructed by means of a spatial-temporal graph convolution and a self-attention mechanism to capture the dynamic spatial-temporal features of traffic flows. Zheng et al. [29] proposed a GCN based on a self-attention mechanism; this GCN inherits the advantages of the self-attention mechanism and can capture the dynamic spatial-temporal dependencies of traffic flows. Zheng et al. [30] utilized an encoder-decoder structure composed of spatial-temporal attention modules to capture the spatial-temporal features of traffic flows. Guo et al. [9] presented an attention-based spatial-temporal GNN (ASTGNN), which captures complex temporal dependencies through trend-aware self-attention modules and utilizes a DGCN to extract dynamic spatial features. Although the above methods enhance the prediction performance achieved by the associated models by using self-attention mechanisms, most of these studies tended to ignore the stacked implicit relationships and hidden spatial-temporal correlations in the channel dimensions, weakening the ability of their models to capture dynamic spatial-temporal features.
Inspired by the above studies, by combining the spatial heterogeneity, dynamic correlations, and uncertainty of road traffic networks, as well as the non-Euclidean data [31] characteristics of traffic flows, a traffic flow prediction model called IDG-PSAtt is proposed. The combination of an interactive dynamic convolution structure with spatial-temporal convolution and a ProbSSAtt block fully captures the dynamic spatial-temporal features of the input traffic flow time series. The IL-DGCN adopts interactive learning to divide traffic flow data into segments at intervals and then synchronously extracts the spatial-temporal dependencies of the segmented sequences and shares the learned spatial-temporal features between the sequences. The ProbSSAtt block improves the computational efficiency of the model by adjusting its attention coefficients so that a small number of key points in the traffic flow provide the main attention, which reduces the computational complexity. The interactive learning strategy and the ProbSSAtt block enable the IDG-PSAtt model to effectively perform long-term prediction. In addition, this paper constructs a dynamic GCN through an unusual dynamic graph generation approach to capture the hidden dynamic correlations among traffic flow nodes and thus capture the dynamic spatial correlations of the traffic network. Finally, the dynamic spatial-temporal characteristics extracted by the multihead ProbSSAtt block are adaptively fused by employing the gated fusion technique, which mitigates the propagation of errors and increases the resulting prediction accuracy.
The main contributions of this paper are summarized below.

1. A traffic flow prediction model based on IDG-PSAtt is proposed; this model embeds a DGCN into an interactive learning structure and inherits the advantages of spatial-temporal convolution, as well as a ProbSSAtt block to capture long-range dynamic spatial-temporal features.
2. A DGCN is constructed to capture spatial-temporal features; this network is generated via the fusion of an adaptive adjacency matrix and a learnable adjacency matrix, where the adaptive adjacency matrix captures the heterogeneity of the given traffic flow time series and the learnable adjacency matrix learns the dynamic correlations among the nodes of the road network.
3. An ST-Conv block is designed, and the ProbSSAtt block is introduced; these blocks learn the hidden spatial features among various nodes and the complex spatial-temporal dependencies to improve the computational efficiency of the model.
4. Several comparative experiments are conducted on two datasets, and the results show that the IDG-PSAtt model achieves the best prediction performance in both cases when compared to the existing baseline methods.
5. The traffic flow prediction model proposed in this paper can guide the transportation planning process, thus improving the transportation environment, enhancing the quality of residents' travel, and promoting the sustainable development of cities.

Problem Definition
This paper represents a road traffic network as a graph $G = (V, E, A)$, where $V$ is the set of nodes with $|V| = N$; $E$ is the set of edges between the nodes, whose weights are represented by the distances between the nodes; and $A \in \mathbb{R}^{N \times N}$ denotes the initial adjacency matrix generated by the graph $G$: if $v_i, v_j \in V$ and $(v_i, v_j) \in E$, then $A_{ij} = 1$; otherwise, $A_{ij} = 0$. The traffic flow prediction task aims to predict future traffic flows based on the given historical information. The adjacency matrix $A$ obtained from the original traffic network is used as a priori knowledge to predict future traffic flows. $X_G^{(t)} \in \mathbb{R}^{N \times C}$ denotes the observation value of graph $G$ at time $t$, $C$ denotes the number of feature channels, $T'$ denotes the length of the given historical time series, and $T$ denotes the length of the predicted future traffic series. The mapping relationship in the traffic flow prediction problem can be expressed as follows:

$$\left[X_G^{(t-T'+1)}, \ldots, X_G^{(t)}; G\right] \xrightarrow{\;f\;} \left[X_G^{(t+1)}, \ldots, X_G^{(t+T)}\right]$$

where $f$ denotes a prediction function that is capable of predicting future traffic flows from a given historical time series.
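To make the input and output of the prediction function concrete, the following minimal sketch slices a series of graph observations into (history, future) pairs and builds a binary adjacency matrix from an edge list. The helper names `make_windows` and `binary_adjacency`, the toy values, and the window lengths are illustrative assumptions, not part of the original paper.

```python
from typing import List, Tuple

Snapshot = List[List[float]]  # one observation: N nodes x C channels


def make_windows(series: List[Snapshot], t_hist: int, t_pred: int
                 ) -> List[Tuple[List[Snapshot], List[Snapshot]]]:
    """Slice a series of snapshots into (history, future) pairs.

    t_hist plays the role of T' (history length) and t_pred the role of T
    (prediction length) in the problem definition.
    """
    samples = []
    last_start = len(series) - t_hist - t_pred
    for start in range(last_start + 1):
        history = series[start:start + t_hist]
        future = series[start + t_hist:start + t_hist + t_pred]
        samples.append((history, future))
    return samples


def binary_adjacency(n_nodes: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """A[i][j] = 1 if (v_i, v_j) is an edge, otherwise 0 (directed)."""
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        adj[i][j] = 1
    return adj


# Toy data (made up): 6 time steps, N = 3 nodes, C = 1 channel.
toy_series = [[[float(10 * t + n)] for n in range(3)] for t in range(6)]
toy_samples = make_windows(toy_series, t_hist=3, t_pred=2)
toy_adj = binary_adjacency(3, [(0, 1), (1, 2)])
```

With 12 historical and 12 future steps, as in the example given in the Introduction, the same helper would be called with `t_hist=12, t_pred=12`.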

Framework of IDG-PSAtt
The IDG-PSAtt model is proposed for simultaneously capturing the dynamic spatial-temporal correlations of traffic flows. The overall framework of IDG-PSAtt is shown in Figure 3; this framework consists of an IL-DGCN, a tandem fusion module, an ST-Conv block, and a ProbSSAtt block. Among them, the IL-DGCN can extract the hidden dynamic spatial features and dynamic relationships between nodes over time.
First, this paper feeds the raw data into a Start Conv layer to obtain a high-dimensional spatial representation of the data for capturing deeper dependencies; then, the IL-DGCN processes the features extracted from the Start Conv layer with an interactive learning strategy implemented on top of the DGCN. In the data division phase, the original input of the IL-DGCN is recursively split by interleaved sampling into two subsequences of equal size (each half the original length); then, the IL-DGCN interactively learns these two subsequences and shares the features learned by each of them. By embedding the DGCN into the interactive learning structure, the dynamic spatial features of traffic flows are interactively acquired while their temporal dependencies are captured. After the IL-DGCN extracts spatial-temporal features, two subsequences are output. Through the tandem fusion module, both output subsequences are reorganized in time order and then fed to diffusion graph convolution and ST-Conv blocks to extract the global dynamic spatial-temporal features of the traffic flows. Finally, the captured dynamic spatial-temporal features are fed to the ProbSSAtt block and a multilayer perceptron (MLP) to output the predicted sequence.

Interactive Learning
In this paper, an interactive learning module is implemented with a CNN and a GCN, which can efficiently process non-Euclidean data, better learn the spatial-temporal dependencies of traffic flows, and more adequately capture complex temporal features and dynamic spatial features than CNN- and TCN-based methods can. Moreover, since traffic flows are periodic, trending, and similar, the interleaved sampled subsequences still retain most of the information of the original sequence; therefore, this paper employs the interleaved sampling method to process the original data for performing multiresolution analyses and expanding the receptive field. The interactive learning framework in this paper consists of three identical IL-DGCNs, the core of which is the IL-DGCN module. In the IL-DGCN, two subsequences interactively learn their respective dynamic spatial-temporal features, and each subsequence preprocesses the features via convolution to expand the receptive field. Moreover, the two subsequences share parameter weights in the DGCN and capture dynamic spatial-temporal features from each other.
In this paper, $X \in \mathbb{R}^{C \times N \times T}$ denotes the input of the IL-DGCN, and interleaved sampling of $X$ yields two subsequences: an odd sequence $X_{odd} \in \mathbb{R}^{C \times N \times \frac{T}{2}}$ and an even sequence $X_{even} \in \mathbb{R}^{C \times N \times \frac{T}{2}}$. Moreover, Conv1, Conv2, Conv3, and Conv4 in the IL-DGCN denote 1D convolution operations. The outputs of the first interactive learning process of the IL-DGCN are $X'_{odd} \in \mathbb{R}^{C \times N \times \frac{T}{2}}$ and $X'_{even} \in \mathbb{R}^{C \times N \times \frac{T}{2}}$. Through additional interactive learning, $X'_{odd}$ and $X'_{even}$ yield the final output sequences. The interactive dynamic graph convolution process begins with the split $X_{even}, X_{odd} = \mathrm{Split}(X)$; in the subsequent update steps, $\odot$ denotes the Hadamard product and tanh denotes the activation function.
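The split, interact, and merge steps can be sketched for a single node and a single channel as follows. The two-step update rule in `interact` (a tanh-gated Hadamard product followed by an additive update) is an assumed form chosen only because it uses the operators named in the text; the callables `f1` to `f4` are hypothetical stand-ins for "DGCN applied after a 1D convolution", and the sketch assumes an even sequence length.

```python
import math
from typing import Callable, List, Tuple

Seq = List[float]
Op = Callable[[Seq], Seq]


def split(x: Seq) -> Tuple[Seq, Seq]:
    """Interleaved sampling into positions 0, 2, 4, ... and 1, 3, 5, ...

    Assumption: 0-based even positions form x_even; the paper does not
    state which half it calls "even" and which "odd".
    """
    return x[0::2], x[1::2]


def merge(x_even: Seq, x_odd: Seq) -> Seq:
    """Re-interleave two half-length sequences back into time order."""
    out: Seq = []
    for e, o in zip(x_even, x_odd):
        out.extend([e, o])
    return out


def interact(x_even: Seq, x_odd: Seq, f1: Op, f2: Op, f3: Op, f4: Op
             ) -> Tuple[Seq, Seq]:
    """One round of interactive learning (hypothetical update rule)."""
    # Step 1: each half is gated (Hadamard product) by features of the other.
    odd1 = [o * math.tanh(g) for o, g in zip(x_odd, f1(x_even))]
    even1 = [e * math.tanh(g) for e, g in zip(x_even, f2(x_odd))]
    # Step 2: each half receives an additive update from the other half.
    odd2 = [o + math.tanh(g) for o, g in zip(odd1, f3(even1))]
    even2 = [e + math.tanh(g) for e, g in zip(even1, f4(odd1))]
    return even2, odd2


def identity(s: Seq) -> Seq:
    """Placeholder operator; a real model would use learned layers here."""
    return list(s)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x_even, x_odd = split(x)
y_even, y_odd = interact(x_even, x_odd, identity, identity, identity, identity)
y = merge(y_even, y_odd)  # same length and time order as x
```

Because `merge` exactly inverts `split`, the two half-length outputs can be reorganized in time order without losing any time step.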

Dynamic Graph Convolution
The DGCN in this paper mainly consists of a diffusion GCN and a graph generation module, which learn deep dynamic spatial features and enhance the ability of the IDG-PSAtt approach to capture spatial heterogeneity. The DGCN feeds the hidden features $H \in \mathbb{R}^{C \times N \times T}$ and the predefined initial adjacency matrix $A \in \mathbb{R}^{N \times N}$ as inputs to the diffusion GCN, whose output is subsequently fed to the generator and MLP layers to generate a discrete matrix $A' \in \mathbb{R}^{N \times N}$ containing spatial-temporal information. In the expression for $A'$, GCN denotes the diffusion convolution and graph generation operations and MLP denotes the multilayer perceptron. Gumbel reparameterization is used in this paper to keep the sampling process differentiable during training; here, $g \sim \mathrm{Gumbel}(0, 1)$ denotes a random variable, $\tau$ is the softmax temperature and has a value of 0.5, and $A_{learn}$ denotes the adjacency matrix generated by a graph generator that can simulate the dynamic dependencies between nodes. Moreover, this paper constructs an adaptive adjacency matrix $A_{apt} \in \mathbb{R}^{N \times N}$ from the learnable parameters $E_1 \in \mathbb{R}^{N \times c}$ and $E_2 \in \mathbb{R}^{N \times c}$ (the latter used in transposed form, $E_2^{T}$); the initial value of $A_{apt}$ is the predefined adjacency matrix $A \in \mathbb{R}^{N \times N}$, based on the original graph data.
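The Gumbel reparameterization mentioned above can be illustrated with a minimal sketch of the standard Gumbel-softmax relaxation applied row by row. The exact expression used by the authors is not recoverable from the text, so the form `softmax((logits + g) / tau)`, the function names, and the example scores are assumptions; only the temperature of 0.5 comes from the paper.

```python
import math
import random
from typing import List


def sample_gumbel(rng: random.Random) -> float:
    """Draw g ~ Gumbel(0, 1) via the inverse CDF: g = -ln(-ln(u))."""
    u = rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep both logarithms finite
    return -math.log(-math.log(u))


def gumbel_softmax_row(logits: List[float], tau: float,
                       rng: random.Random) -> List[float]:
    """Relaxed categorical sample: softmax((logits + g) / tau)."""
    noisy = [(value + sample_gumbel(rng)) / tau for value in logits]
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
# Made-up scores standing in for the generated matrix A' (N = 3 nodes).
scores = [[2.0, 0.1, -1.0], [0.0, 0.0, 3.0], [1.0, 1.0, 1.0]]
a_learn = [gumbel_softmax_row(row, tau=0.5, rng=rng) for row in scores]
```

A lower temperature pushes each row closer to a one-hot vector, while the softmax keeps the sample differentiable with respect to the scores.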
In this paper, we extract the hidden dynamic spatial-temporal correlations among road traffic flows by fusing $A_{learn}$ and $A_{apt}$ with an adaptive fusion module and then feeding the resulting dynamic adjacency matrix $A_{dyn} \in \mathbb{R}^{N \times N}$ to a diffusion GCN. In this fusion module, $\alpha$ denotes the learnable adaptive parameter. This paper utilizes diffusion graph convolution and fusion graph convolution and uniformly defines the diffusion graph convolution input as $X_{in} \in \mathbb{R}^{C \times N \times T}$.
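As a rough illustration of how the two matrices might be built and fused, the sketch below uses two assumed forms that are not confirmed by the text: a Graph WaveNet-style adaptive matrix, softmax(ReLU(E1 E2^T)), and a convex combination weighted by α for the fusion. It also omits the initialization of the adaptive matrix from the predefined adjacency matrix; all values are made up.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_adjacency(e1: Matrix, e2: Matrix) -> Matrix:
    """Assumed form: A_apt = softmax(ReLU(E1 @ E2^T)), applied row-wise."""
    logits = [[max(0.0, sum(a * b for a, b in zip(r1, r2))) for r2 in e2]
              for r1 in e1]
    return [softmax(row) for row in logits]


def fuse(a_learn: Matrix, a_apt: Matrix, alpha: float) -> Matrix:
    """Assumed fusion: A_dyn = alpha * A_learn + (1 - alpha) * A_apt."""
    return [[alpha * l + (1.0 - alpha) * a for l, a in zip(row_l, row_a)]
            for row_l, row_a in zip(a_learn, a_apt)]


# Made-up node embeddings: N = 3 nodes, c = 2 embedding dimensions.
e1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
e2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
a_apt = adaptive_adjacency(e1, e2)
# Made-up sampled matrix standing in for A_learn.
a_learn = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
a_dyn = fuse(a_learn, a_apt, alpha=0.5)
```

In a trained model, `e1`, `e2`, and `alpha` would be learned parameters; under this assumed fusion, rows of `a_dyn` still sum to one whenever the rows of both inputs do.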
In the diffusion graph convolution process, $k$ is the diffusion step, $K$ is the maximum number of diffusion steps, and $W$ is the parameter matrix.
The fusion graph convolution module takes the dynamic adjacency matrix $A_{dyn}$ as its input adjacency matrix. The IDG-PSAtt model recombines the dynamic spatial-temporal characteristics extracted from the interactive learning structure in time order in the concatenation module and feeds them to the diffusion graph convolution module to capture and correct all the time series features.
Different from previous studies, this paper uses both the predefined initial adjacency matrix $A \in \mathbb{R}^{N \times N}$ and the dynamic adjacency matrix $A_{dyn} \in \mathbb{R}^{N \times N}$ obtained by the interactive learning structure in the diffusion graph convolution module. For the initial adjacency matrix $A$, this paper uses directed graphs, with $P_f = A / \mathrm{rowsum}(A)$ and $P_b = A^{T} / \mathrm{rowsum}(A^{T})$ denoting the forward and backward transition matrices of $A$, respectively. The diffusion graph convolution process in the concatenation fusion module operates on these transition matrices together with $A_{dyn}$. The DGCN module is capable of extracting deep hidden spatial features by exploring the invisible dependencies between nodes in the traffic network and generating dynamic correlations between data based on the simulation of the input traffic flow time series. In addition, by embedding a DGCN into the interactive learning framework, it is possible to make full use of the dynamic spatial information captured by the DGCN to more effectively capture the complex temporal dependencies of traffic flows during the training process.
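The forward and backward transition matrices and a K-step diffusion over them can be sketched as follows. The row normalization follows the definitions of the two transition matrices in the text; the form of `diffusion_conv`, with one scalar weight per support and step instead of a full parameter matrix, is a simplifying assumption, and the graph and signal values are made up.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def row_normalize(a: Matrix) -> Matrix:
    """P = A / rowsum(A); all-zero rows are left as zeros."""
    out = []
    for row in a:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def diffusion_conv(x: Matrix, supports: List[Matrix], w_self: float,
                   w_steps: List[List[float]]) -> Matrix:
    """Z = w_self * X + sum over supports P and steps k of w[P][k] * P^k X.

    The number of diffusion steps K is the length of each inner weight list.
    """
    z = [[w_self * v for v in row] for row in x]
    for p, weights in zip(supports, w_steps):
        h = x
        for w in weights:
            h = matmul(p, h)  # one more diffusion step
            z = [[zi + w * hi for zi, hi in zip(zr, hr)]
                 for zr, hr in zip(z, h)]
    return z


# Made-up directed graph with N = 3 nodes and a one-feature signal.
adj = [[0.0, 1.0, 1.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
p_f = row_normalize(adj)             # forward transition matrix
p_b = row_normalize(transpose(adj))  # backward transition matrix
x = [[1.0], [2.0], [4.0]]
z = diffusion_conv(x, [p_f, p_b], w_self=0.0, w_steps=[[1.0], [0.0]])
```

A dynamic adjacency matrix could be appended to `supports` as a third entry, which matches the description of using both the predefined and the dynamic matrices in one module.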

ST-Convolution Block
In a road traffic network, the data detected by each sensor exhibit a certain degree of periodicity. For example, during the morning and evening peak phases on weekdays, the traffic flow increases significantly and its speed is generally low. The hidden spatial features of traffic flows are related to the distances between different sensors, and these spatial features are not affected by temporal dependence.
In this paper, a spatial-temporal convolution module consisting of three kernels is designed, as shown in Figure 4. The three kernels correspond to the temporal, spatial, and spatial-temporal perspectives for capturing the spatial-temporal features extracted from the diffusion graph convolution module and the influences of multiple node features on a single node feature in the topological graph structure of the traffic flow. The temporal kernel captures the dependencies of traffic flows at different times at the same location, and the spatial kernel captures the spatial correlations of traffic flows at neighboring locations during the same time step. The output of the prior spatial-temporal attention block serves as the input for each subsequent spatial-temporal convolutional block, whose output can be calculated from Equations (14) and (15).
In these equations, $\varpi_t^{[l+1]}$ is the temporal kernel with a size of $f \times 1$, $\varpi_s^{[l+1]}$ is the spatial kernel with a size of $1 \times f$, and $\varpi_{st}^{[l+1]}$ is the spatial-temporal kernel with a size of $f \times f$. LeakyReLU(·) denotes the leaky rectified linear unit function and $*$ denotes the convolution operation. Finally, the outputs of the three convolution kernels are concatenated, and the $1 \times 1$ convolution $\varpi_o^{[l+1]}$ is used to compress the features and limit the number of channels.
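The three-kernel design can be illustrated on a single-channel grid whose rows are time steps and whose columns are nodes. This is a sketch under stated assumptions: zero padding, LeakyReLU applied to each branch before fusion, and a 1 × 1 convolution reduced to a weighted sum of the three branches are choices made here, since Equations (14) and (15) are not reproduced in the text. The kernel values are made-up averaging filters.

```python
from typing import List

Grid = List[List[float]]  # rows = time steps, columns = nodes


def leaky_relu(v: float, slope: float = 0.01) -> float:
    return v if v > 0 else slope * v


def conv2d_same(x: Grid, kernel: Grid) -> Grid:
    """Single-channel 2D cross-correlation with zero padding (odd sizes)."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    rows, cols = len(x), len(x[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += kernel[i][j] * x[rr][cc]
            out[r][c] = acc
    return out


def st_conv_block(x: Grid, k_t: Grid, k_s: Grid, k_st: Grid,
                  w_out: List[float]) -> Grid:
    """Temporal, spatial, and spatial-temporal branches fused by a 1x1 conv."""
    branches = [conv2d_same(x, k) for k in (k_t, k_s, k_st)]
    activated = [[[leaky_relu(v) for v in row] for row in b] for b in branches]
    rows, cols = len(x), len(x[0])
    # A 1x1 convolution over 3 concatenated channels is a per-cell weighted sum.
    return [[sum(w * b[r][c] for w, b in zip(w_out, activated))
             for c in range(cols)] for r in range(rows)]


f = 3
k_t = [[1.0 / f] for _ in range(f)]              # f x 1: same node, nearby times
k_s = [[1.0 / f] * f]                            # 1 x f: same time, nearby nodes
k_st = [[1.0 / (f * f)] * f for _ in range(f)]   # f x f: joint neighborhood
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
y = st_conv_block(x, k_t, k_s, k_st, w_out=[1.0, 1.0, 1.0])
```

Sliding a kernel along the node axis presumes a meaningful node ordering; in the model, this step follows the graph convolution, which has already mixed information between neighboring nodes.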

Subsubsection
A typical input of a self-attention mechanism takes the form $(Q, K, V)$, and the scaled dot-product operation is computed as follows:

$$A(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^{\top}}{\sqrt{d}}\right)V,$$

where $Q \in \mathbb{R}^{L_Q \times d}$, $K \in \mathbb{R}^{L_K \times d}$, $V \in \mathbb{R}^{L_V \times d}$, and $d$ denote the input queries, keys, values, and dimensionality, respectively. The attention factor $A(q_i, K, V)$ for the $i$-th query is as follows:

$$A(q_i, K, V) = \sum_{j} \frac{k(q_i, k_j)}{\sum_{l} k(q_i, k_l)}\, v_j = \mathbb{E}_{p(k_j \mid q_i)}\left[v_j\right],$$

where $q_i$, $k_i$, and $v_i$ are the $i$-th rows in $Q$, $K$, and $V$, respectively, $p(k_j \mid q_i) = k(q_i, k_j) / \sum_{l} k(q_i, k_l)$, and $k(q_i, k_j)$ uses the asymmetric exponential kernel $\exp\left(q_i k_j^{\top} / \sqrt{d}\right)$.

The spatial complexity of the self-attention mechanism for computing the dot product $p(k_j \mid q_i)$ is $O(L_Q L_K)$. However, in the computation of the ProbSSAtt block, the input lengths of the query and the key are usually equivalent, i.e., $L_Q = L_K = L$, resulting in a total temporal and spatial complexity of $O(L \ln L)$. In addition, the ProbSSAtt block combines probabilistic sparsity and a self-attention mechanism by adjusting the attention coefficients on top of the self-attention mechanism so that, for each query, only some of the keys are important, i.e., a few key dot products provide the main attention and the remaining dot products are neglected. This approach can indirectly combine complex time-dependent and dynamic spatial features to save computational resources without affecting the accuracy of the model. STC-ProbSSAtt uses $M(q_i, K)$ to denote the sparsity of the $i$-th query and uses the Kullback-Leibler (KL) divergence to measure the sparsity of the query as follows:

$$M(q_i, K) = \ln \sum_{j=1}^{L_K} \exp\left(\frac{q_i k_j^{\top}}{\sqrt{d}}\right) - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{\top}}{\sqrt{d}},$$

where the first term is the log-sum-exp of $q_i$ over all keys and the second term is the arithmetic mean over all keys. Based on this measurement, the ProbSparse self-attention mechanism is obtained:

$$A(Q, K, V) = \mathrm{Softmax}\left(\frac{\bar{Q}K^{\top}}{\sqrt{d}}\right)V,$$

where $\bar{Q}$ is a sparse matrix of the same size as $Q$ that includes only the top-$u$ queries under the sparsity measurement $M(q, K)$. The constant sampling factor $c$ controls $u = c \cdot \ln L_Q$. As a result, each query-key lookup in the ProbSparse self-attention mechanism requires only $O(\ln L_Q)$ dot products. To prevent major information losses, we adopt the multihead ProbSparse self-attention technique in this study. This mechanism can generate different sparse query-key pairs for each head.
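The query selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ProbSSAtt block itself: the sparsity measurement is evaluated exactly over all keys (no key sampling), u is rounded up and clipped to the range [1, L_Q], and queries that are not selected receive the mean of the values as their output, which is one common convention and is not specified in the text. The function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sparsity_measure(q, K):
    """M(q, K): log-sum-exp of the scaled scores minus their arithmetic mean."""
    d = len(q)
    s = [dot(q, k) / math.sqrt(d) for k in K]
    m = max(s)  # subtract the maximum for numerical stability
    lse = m + math.log(sum(math.exp(v - m) for v in s))
    return lse - sum(s) / len(s)


def softmax(s):
    m = max(s)
    e = [math.exp(v - m) for v in s]
    z = sum(e)
    return [v / z for v in e]


def prob_sparse_attention(Q, K, V, c=1.0):
    """Single-head ProbSparse-style attention on lists of row vectors.

    Returns the outputs and the sorted indices of the u selected queries.
    """
    L_Q, d = len(Q), len(Q[0])
    # u = c * ln(L_Q), rounded up and clipped (the rounding rule is an assumption).
    u = max(1, min(L_Q, math.ceil(c * math.log(L_Q))))
    scores = [sparsity_measure(q, K) for q in Q]
    top = set(sorted(range(L_Q), key=lambda i: scores[i], reverse=True)[:u])
    # Assumed fallback for unselected queries: the mean of the value rows.
    mean_v = [sum(col) / len(V) for col in zip(*V)]
    out = []
    for i, q in enumerate(Q):
        if i in top:
            p = softmax([dot(q, k) / math.sqrt(d) for k in K])
            out.append([sum(p[j] * V[j][t] for j in range(len(V)))
                        for t in range(len(V[0]))])
        else:
            out.append(list(mean_v))
    return out, sorted(top)


if __name__ == "__main__":
    K = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    V = [[1.0], [2.0], [3.0], [4.0]]
    Q = [[0.0, 0.0], [5.0, 0.0], [0.0, 3.0], [0.1, 0.0]]
    outputs, selected = prob_sparse_attention(Q, K, V)
    print("selected queries:", selected)
    print("outputs:", outputs)
```

A query whose scores are uniform over the keys attains the minimum value ln L_K of the measurement, so it is the first to be dropped, whereas a query dominated by a few keys scores high and is retained.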

Data Description
The prediction performance achieved using the IDG-PSAtt model is validated on the METR-LA and PEMS-BAY [32] public transportation datasets. The METR-LA dataset contains traffic speed statistics collected over a four-month period by 207 sensors placed on the freeways of Los Angeles County. The PEMS-BAY dataset covers the San Francisco Bay Area and contains six months of traffic speed data collected by 325 sensors. The detection site, detection date, and data type are all recorded by METR-LA and PEMS-BAY. Table 1 displays specific information about the experimental datasets.

(11) STSGCN [16]: This model individually captures localized spatial and temporal correlations; the number of STSG layers is set to 3 and the hidden dimension is set to 64.

Results and Discussion
This paper compares the performance of the IDG-PSAtt model with that of 11 common baseline models for 15, 30, and 60 min predictions. On both datasets, the IDG-PSAtt model achieves the best prediction results in terms of all the evaluated metrics.
The experimental results in Table 2 indicate that the statistical approaches (HA, VAR, and ARIMA) and the traditional machine learning approaches (SVR and FC-LSTM) perform poorly because these models only consider temporal dependencies and ignore the dynamic spatial characteristics of traffic flows. GCN-based models can handle non-Euclidean traffic data and capture the hidden relationships among road network nodes more effectively; thus, the STGCN and STSGCN models with spatial-temporal GCNs perform better. Although the STSGCN model is capable of concurrently capturing spatial and temporal data, it performs inadequately since it only emphasizes capturing temporal dependencies and uses a straightforward sliding window to capture temporal correlations. Additionally, since attention mechanisms capture the temporal dependencies of sequences, models based on attention mechanisms (e.g., the ASTGCN) also perform well. Graph WaveNet embeds a GCN into a TCN, which makes its performance better than that of the ASTGCN, but Graph WaveNet does not incorporate a self-attention mechanism to further capture the hidden spatial-temporal features. The IDG-PSAtt model significantly improves upon the state-of-the-art models on the METR-LA and PEMS-BAY datasets for 15, 30, and 60 min predictions because it adequately captures the dynamic spatial-temporal features of traffic flows through interactive learning structures, its DGCN, and its ST-Conv block; it also utilizes the ProbSSAtt block to produce effective long-range predictions. For instance, the IDG-PSAtt model outperforms the state-of-the-art methods in terms of the MAE, RMSE, and MAPE metrics by 29.4% and 32.2%, respectively, as well as by 9.4%/8.3% and 15.5%/16.1%, respectively, for 15 min and 60 min predictions on PEMS-BAY.
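For reference, the three evaluation metrics and the percentage improvements quoted above can be computed as in the following sketch. The definitions are the standard ones; skipping zero-valued ground-truth readings in the MAPE is an assumption (a common convention for sensor data with missing values) and is not stated in the text. All numbers in the usage example are toy values, not results from Table 2.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent.

    Zero-valued ground-truth entries are skipped (assumed convention).
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)


def relative_improvement(baseline_error, model_error):
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error


if __name__ == "__main__":
    # Toy speeds in mph; the zero entry mimics a missing sensor reading.
    y_true = [60.0, 50.0, 0.0, 40.0]
    y_pred = [58.0, 53.0, 5.0, 40.0]
    print("MAE :", mae(y_true, y_pred))
    print("RMSE:", rmse(y_true, y_pred))
    print("MAPE:", mape(y_true, y_pred))
    print("Improvement:", relative_improvement(4.0, 3.0))
```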
The IDG-PSAtt model combines an interactive learning strategy with ST-Conv and ProbSSAtt blocks to effectively capture dynamic spatial-temporal correlations in a synchronized manner. As a result, the IDG-PSAtt model captures the dynamic spatial-temporal correlations during each period of a traffic flow better than the baseline models and achieves the best prediction results at 15, 30, and 60 min. The IDG-PSAtt model can also explore the invisible dynamic correlations among road network nodes. As the prediction period increases, the prediction difficulty increases; however, as shown in Table 2, the long-term prediction performance of the IDG-PSAtt model remains strong, which further validates the effectiveness of the interactive learning strategy employed by the IDG-PSAtt model.

(1) Ablation Experiment

To further investigate the performance of the various modules in the IDG-PSAtt model proposed in this paper, eight variants of the IDG-PSAtt model are designed to verify the effect of each module. These eight variants are compared with the full IDG-PSAtt model in terms of the mean values of the MAE, RMSE, and MAPE metrics produced on the METR-LA and PEMS-BAY datasets, and the results of the ablation experiments are shown in Table 3. The differences between these eight model variants and the IDG-PSAtt model are as follows:
(1) GCN w/o: Based on the IDG-PSAtt model, the GCN is removed.
(2) DGCN w/o: Based on the IDG-PSAtt model, the DGCN is removed.
(3) Conv w/o: The one-dimensional convolutional modules are removed from the interactive learning structures based on the IDG-PSAtt model.
(4) Interaction w/o: Based on the IDG-PSAtt model, the interactive learning structures are removed.
(5) Apt Adj w/o: Based on the IDG-PSAtt model, the adaptive adjacency matrix in the DGCN is removed.
(6) Learned Adj w/o: Based on the IDG-PSAtt model, the graph generation structure is removed, and the adaptive adjacency matrix is retained.
(7) ProbSSAtt w/o: Based on the IDG-PSAtt model, the ProbSSAtt block module is removed.
(8) ST-Conv Block w/o: Based on the IDG-PSAtt model, the ST-Conv module is removed.

The ProbSSAtt and IL-DGCN modules used in this research are essential for improving the performance of the model. As a crucial part of the interactive learning framework, the receptive field is extended using one-dimensional convolution, and the ablation tests show that one-dimensional convolution can greatly boost the performance of the model. Along with the ablation of the two adjacency matrices defined within the DGCN module, the validity of the adaptive adjacency matrix is also investigated in the IDG-PSAtt model, as shown in Figure 5. The dynamic adjacency matrix is created by combining a learnable adjacency matrix with an adaptive adjacency matrix. The dynamic adjacency matrix enables the graph convolution process to more accurately depict the hidden spatial correlations in traffic data, as shown in Table 2 and Figures 5-7, demonstrating the effectiveness of the two vital structures proposed in this paper, namely, interactive learning and dynamic graph convolution.

(2) Visual Analysis

To better explain the proposed IDG-PSAtt model, the experimental outcomes yielded by the IDG-PSAtt, FNN, FC-LSTM, Graph WaveNet, and STGCN models on the PEMS-BAY dataset are visualized in Figure 8. It is obvious from the three subfigures that the prediction performance of the IDG-PSAtt model far exceeds that of the FNN, FC-LSTM, Graph WaveNet, and STGCN models, demonstrating that the proposed model can more adequately extract the dynamic spatial-temporal characteristics of traffic flows. Moreover, as the prediction duration increases, the growth rate of the prediction error decreases, and when the prediction duration is longer than 15 min, the prediction errors of IDG-PSAtt are all significantly lower than those of the other comparative models, which indicates that the long-term prediction performance of this model is superior to that of the other models. The above study reveals that the IDG-PSAtt model yields the best prediction results at different prediction time points. The IDG-PSAtt model accurately predicts traffic congestion, captures the trends of traffic flows, and identifies the starting and ending times of the peak traffic flow period, which demonstrates the excellent prediction performance of the IDG-PSAtt model in the traffic flow prediction task, as well as its effectiveness in real-time traffic prediction. The traffic flow prediction model proposed in this paper can guide transportation planning, thus improving the transportation environment, enhancing the quality of residents' travel, and promoting the sustainable development of cities.
By accurately predicting traffic flow and congestion, traffic planners can formulate more effective traffic management strategies, such as adjusting signal timing, optimizing route planning, and providing real-time traffic information. These measures not only reduce traffic congestion and improve traffic efficiency but also enhance the traffic environment. Furthermore, the improved accuracy of real-time traffic flow prediction enabled by the model facilitates the forecasting of mobile source emissions, significantly enhancing the precision of local air quality predictions. This comprehensive approach contributes to both smoother traffic flow and a better environmental outcome [39,40]. This, in turn, can offer valuable data and technical support to environmental management departments to develop various control measures, such as implementing truck bans around a city, restricting the use of National III emission standard vehicles, and granting preferential road rights to new energy vehicles. Furthermore, traffic flow simulations can provide early warnings of potential traffic congestion, enabling the timely implementation of diversionary measures to mitigate emissions resulting from idling vehicles.

Conclusions
This paper proposes an efficient and accurate traffic prediction model called IDG-PSAtt, which not only considers non-Euclidean traffic flows but also combines an interactive learning strategy with ST-Conv and ProbSSAtt blocks to fully capture the dynamic spatial-temporal features of traffic flows. This approach solves the problems faced by previously developed noninteractive models, which insufficiently capture spatial-temporal features and have difficulty making long-term predictions. Specifically, the IDG-PSAtt model creates a dynamic graph structure by adapting the input spatial-temporal information and employs a preset initial adjacency matrix to simulate the dynamic relationships between nodes, thereby exploring the dynamic associations between the invisible nodes in a traffic network and capturing their hidden spatial correlations. Moreover, an IL-DGCN is constructed by embedding a DGCN block into the interactive learning framework to learn the periodic characteristics and trends of traffic flows and simultaneously capture their spatial-temporal dependencies. Finally, ST-Conv and ProbSSAtt blocks are used to fully exploit the dynamic spatial-temporal features of traffic flows to achieve improved traffic flow prediction accuracy. The IDG-PSAtt model has significantly better prediction performance than the baseline models according to experiments conducted on two traffic datasets. On METR-LA, the MAE and RMSE of IDG-PSAtt for 60 min predictions are reduced by 0.75 and 1.31, respectively, compared with those of the state-of-the-art models. As the training time increases, the performance of the proposed model improves, increasing the accuracy of the predicted traffic flow and the predictability of the traffic flow over the medium-to-long term.
Different cities may have different data collection techniques and standards, so models need to be able to adapt to data from different sources and formats. For example, some cities may need to focus more on the impact of public transportation, while others may need to pay more attention to the flow of private vehicles. Therefore, the IDG-PSAtt model would need to incorporate more variable factors for discussion.
In practical scenarios, external variables such as weather conditions and current social events significantly impact traffic flow prediction tasks. By accounting for these external effects, we can enhance the accuracy and training performance of predictive models. Moreover, environmental protection is a crucial consideration in this context. Accurately

Figure 4. Diagram of the ST-Conv block framework.

Figure 5. Comparison between the MAE metrics produced on the two datasets.

Figure 6. Comparison between the MAPE metrics produced on the two datasets.

Figure 7. Comparison between the RMSE metrics produced on the two datasets.

Atmosphere 2024, 15.

Figure 8. Visualizations of the comparisons conducted between different models on the PEMS-BAY dataset.

Table 1. Description of the experimental datasets.

Table 2. Comparison between IDG-PSAtt and the baselines on two traffic datasets.

Table 3. Comparison between IDG-PSAtt and its variants on two traffic datasets.