Article

Mixed-Graph Neural Network for Traffic Flow Prediction by Capturing Dynamic Spatiotemporal Correlations

1 College of Computer Science, Beijing University of Technology, Beijing 100124, China
2 College of Urban Transportation, Beijing University of Technology, Beijing 100124, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2025, 14(10), 379; https://doi.org/10.3390/ijgi14100379
Submission received: 26 July 2025 / Revised: 23 September 2025 / Accepted: 26 September 2025 / Published: 27 September 2025
(This article belongs to the Topic Artificial Intelligence Models, Tools and Applications)

Abstract

Traffic flow prediction is a prominent research area in intelligent transportation systems, significantly contributing to urban traffic management and control. Existing methods or models for traffic flow prediction predominantly rely on a fixed-graph structure to capture spatial correlations within a road network. However, the fixed-graph structure can restrict the representation of spatial information due to varying conditions such as time and road changes. Drawing inspiration from the attention mechanism, a new prediction model based on the mixed-graph neural network is proposed to dynamically capture the spatial traffic flow correlations. This model uses graph convolution and attention networks to adapt to complex and changeable traffic and other conditions by learning the static and dynamic spatial traffic flow characteristics, respectively. Then, their outputs are fused by the gating mechanism to learn the spatial traffic flow correlations. The Transformer encoder layer is subsequently employed to model the learned spatial characteristics and capture the temporal traffic flow correlations. Evaluated on five real traffic flow datasets, the proposed model outperforms the state-of-the-art models in prediction accuracy. Furthermore, ablation experiments demonstrate the strong performance of the proposed model in long-term traffic flow prediction.

1. Introduction

Traffic prediction is a prominent research area within intelligent transportation systems, encompassing traffic flow prediction, traffic speed prediction, traffic density prediction, etc. Nowadays, it increasingly influences urban traffic control and management. With the expansion of urban population and space, various traffic problems, such as congestion and air and noise pollution, are degrading people's travel experience. By accurately capturing traffic characteristics, predicting traffic flow and speed, and managing urban traffic effectively, intelligent transportation systems can alleviate congestion and improve travel efficiency [1].
One of the classic problems of spatial–temporal data prediction is traffic flow prediction. Given the complexity of urban road networks, the traffic flow on a particular road segment will be influenced not only by its own historical traffic flow but also by the traffic flow on neighboring road segments. An illustration of a road network is presented in Figure 1. For a road segment M, the traffic flow at time t is influenced not only by its historical traffic flow (red dotted lines) but also by its adjacent road segments at the current time (blue dotted lines). In addition, the historical traffic flow of adjacent road segments of M will also have an indirect impact on the traffic flow of M at the current time (yellow dotted lines). The aforementioned effects or correlations can be classified into temporal or spatial correlations. In short, traffic flow shows strong correlations in both temporal and spatial dimensions. Therefore, comprehensively investigating the temporal and spatial traffic flow correlations in road segments and identifying their changing patterns is crucial for achieving precise and effective traffic flow prediction.
The traffic flow data of a road segment can be considered a time-correlated data sequence. To capture the temporal traffic flow correlations, regression-based methods and models were introduced to identify the linear and nonlinear change patterns of traffic flow data [2]. With the development of deep learning, traffic flow prediction models based on RNN [3] and LSTM [4] were proposed to handle long-term historical traffic data. LSTM can effectively address the gradient vanishing problem of RNNs and offers excellent prediction performance. However, the above methods and models only consider the temporal traffic flow correlations and ignore the spatial ones, so they are not suitable for traffic flow prediction in large-scale road networks. To capture the spatial traffic flow correlations, many studies have been put forward over the last decade. Some scholars combined CNN with RNN to capture spatial–temporal correlations simultaneously and achieved good prediction results [5]. CNN can extract correlations from grid-structured data and thereby performs well in spatial correlation capture. However, the road network's graph structure is non-Euclidean. To apply the convolution operation to such graph structures more effectively, the graph convolutional network (GCN), employing the Fourier transform, was introduced to describe the spatial correlations within graph-structured data [6].
Most of the existing methods rely on GCN for modeling and extracting the spatial–temporal traffic flow correlations. Suppose $D$ represents the degree matrix and $A$ represents the adjacency matrix. In GCN, the matrix $D^{-1/2} A D^{-1/2}$ is calculated and used as the weight matrix. For the traffic flow of the same road segments at different times, this weight matrix is fixed, which makes GCN unable to capture the dynamic correlations between road segments. Dynamic correlations refer to the fact that the distribution of traffic flow on the same road segment varies with time or environment. For example, if a traffic accident occurs in a road segment, the traffic flow of its adjacent road segments will be very different from the past. A prediction method that uses only GCN will ignore these dynamic correlations. Therefore, to realize traffic flow prediction within a road network, it is necessary to capture the dynamic correlations of road segments. The graph attention network (GAT), inspired by the attention mechanism, gives more weight to important adjacent road segments, and the self-attention mechanism is utilized to further model the impacts between road segments [7].
The calculation rules of GCN and GAT are studied in this paper, and the characteristics learned by them are integrated into the traffic flow prediction. GCN weights neighboring nodes with a fixed matrix, whereas GAT weights them with attention coefficients that are dynamically computed from the neighboring nodes' characteristics, which makes GAT more flexible than GCN. Based on the above analysis, a new traffic flow prediction model is therefore proposed, which combines GCN and GAT to learn the static and dynamic spatial traffic flow characteristics of road segments within a road network. In the proposed model, GCN is utilized to capture the static spatial characteristics of road segments, while GAT is utilized to capture the dynamic spatial characteristics of road segments over time. The captured characteristics are given different weights through the gating mechanism to obtain the road segments' spatial characteristics within the road network. Then, the spatial characteristics of road segments are input into the Transformer encoder layer to extract the temporal correlations and obtain the temporal characteristics. Finally, a classical convolution layer is utilized to predict the road segments' future traffic flow within the road network. The key contributions of this paper can be summarized as follows.
  • GCN and GAT are mixed to extract the static and dynamic spatial traffic flow characteristics of road segments within a road network.
  • A spatial–temporal block, involving GCN, GAT, and Transformer encoder, is proposed to extract the spatial and temporal traffic flow characteristics of road segments within a road network.
  • Experiments on five real datasets show that the proposed model outperforms the baseline models in traffic flow prediction accuracy. In addition, experiments also show that the proposed model has strong performance and extensibility in long-term traffic flow prediction.
The remainder of this paper is structured as follows. Section 2 introduces and analyzes the related work on traffic flow prediction. Section 3 presents the problem description and definitions. Section 4 details the design of the proposed model. Section 5 provides the experiment results and analysis. Finally, Section 6 concludes the paper and outlines the direction for future research.

2. Related Work

Traffic flow prediction, i.e., predicting a sequence of continuous traffic flow values at the same location, is one of the typical problems of spatial–temporal data prediction. In recent decades, much research on traffic flow prediction has been conducted from different perspectives. Exploiting the temporal correlation and the change characteristics of past and future traffic flow states, autoregressive integrated moving average (ARIMA)-based models were proposed, which smooth non-stationary traffic flow by differential processing [2,8,9]. To accommodate the traffic flow's nonlinear changes, nonparametric models, such as support vector regression (SVR), have been introduced and successfully employed in the field of traffic flow prediction [10,11]. The regression-based traffic flow prediction models are fast, but they struggle to handle data with complex correlations, making them unsuitable for predicting the overall traffic flow within large-scale road networks.
Deep learning models have more parameters to capture the complex correlations of data than the above regression-based traffic flow prediction models, so they have been employed in traffic flow prediction in recent years. Tawfeek et al. [12] used neural networks to improve safety performance functions based on linear regression, which significantly improves the robustness and performance of the model. To capture the temporal correlations of traffic data, methods and models based on recurrent neural networks (RNN) were proposed for traffic data prediction [3,13,14]. As variants of the RNN, the gated recurrent unit (GRU) [15,16,17,18,19] and long short-term memory (LSTM) [4,20,21,22] can overcome the RNN's gradient vanishing problem and achieve long-term traffic flow prediction. However, in the aforementioned research, only the temporal correlations are considered, and the spatial correlations are ignored.
The traffic conditions of a road segment are affected by its neighboring road segments because the traffic network is connected. Therefore, spatial correlation capture is important for traffic flow prediction. To capture spatial correlations, a convolutional neural network (CNN) was utilized to establish a spatial correlation model. In recent years, scholars have begun to combine the CNN with the RNN to capture the temporal and spatial traffic flow correlations of road segments simultaneously, which achieves good performance [5,23,24,25,26]. For example, the CNN and RNN are combined to solve the time series prediction problem in [5]. In their model, CNN is utilized to capture the characteristics of grid-structured data to obtain better prediction performance. However, this model tends to forget earlier time information, which diminishes the prediction accuracy. Han et al. proposed a hybrid model for traffic flow prediction by combining CNN and LSTM [27]. In their model, CNN and LSTM are utilized to extract the spatial and temporal traffic flow correlations of road segments, respectively. Since LSTM can effectively overcome the gradient vanishing problem of RNN, the model has good traffic flow prediction ability. Nevertheless, the aforementioned models are mostly applied to regular road network structures and require standard grid data as input. Because the road network's graph structure is non-Euclidean, the CNN restricts the expression of spatial correlations among road segments within a road network.
To perform a convolution operation on a graph structure, the Fourier transform-based graph convolutional network (GCN) was proposed [6,28,29,30,31,32,33,34]. This kind of GCN can process both Euclidean and non-Euclidean structures, such as road networks and social networks, which brings a new direction for traffic flow prediction. Zhao et al. introduced GCN to traffic flow prediction [6]. In their model, GCN and GRU were utilized to extract the spatial and temporal traffic flow characteristics, respectively, and achieved good prediction results. A deep learning model for traffic flow prediction is proposed in [35], which models the spatial characteristics of road segments as a directed graph and introduces the diffusion CNN to capture the spatial and temporal traffic flow correlations. Yu et al. proposed a spatial–temporal GCN model [31], which considers the temporal traffic flow correlations and combines GCN to solve the problem of spatial–temporal traffic flow prediction. However, their model still uses a fixed-graph structure to model and extract the temporal and spatial traffic flow characteristics, which limits the expression of spatial correlations of road segments within a road network. A graph attention network (GAT) inspired by the attention mechanism was proposed in [7], which gives greater weight to important adjacent road segments within a road network. Wu et al. combined the GAT with LSTM [36], establishing an end-to-end learnable encoder prediction model to solve the problem of multi-link traffic flow prediction. However, the static and dynamic spatial traffic flow correlations are not fully studied by that model. To overcome the aforementioned limitations, a spatial module integrating GCN and GAT is established in the proposed traffic flow prediction model to extract the static and dynamic spatial traffic flow correlations. Then, the gating mechanism is employed to combine the two spatial correlations. Finally, the combined characteristics are input into the Transformer encoder layer to predict traffic flow.

3. Problem Description and Definitions

In this section, the problem of traffic flow prediction is formalized and some important definitions are given.
Definition 1.
The definition of a road network ($G$) is as follows:
$$G = \{ V, E, A \},$$
where the set of $N$ nodes in $G$ is represented by $V = \{ v_1, v_2, \ldots, v_N \}$, and $v_i$ indicates the $i$-th road segment; $E = \{ w_{1,1}, w_{1,2}, \ldots, w_{N,N} \}$ is the set of graph edges, and $w_{i,j}$ indicates the road network weight between road segments $v_i$ and $v_j$; $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix, whose entry at position $(i, j)$ is $w_{i,j}$. The road network weights are the reciprocals of the min-max normalized Euclidean distances between adjacent road segments; the weights of non-adjacent road segments are zero.
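The weighting scheme of Definition 1 can be sketched as follows. This is a minimal NumPy illustration, not the authors' code; the function name `build_adjacency`, the coordinate input, and the small epsilon guarding against division by zero are our own assumptions.

```python
import numpy as np

def build_adjacency(coords, edges):
    """Build the weighted adjacency matrix A of Definition 1 (sketch).

    coords: (N, 2) array of road-segment positions (hypothetical input).
    edges: list of (i, j) index pairs of adjacent road segments.
    Weights are reciprocals of min-max normalized Euclidean distances;
    non-adjacent pairs keep weight zero, as stated in the paper.
    """
    N = len(coords)
    dists = np.array([np.linalg.norm(coords[i] - coords[j]) for i, j in edges])
    lo, hi = dists.min(), dists.max()
    # Min-max normalize; the 1e-8 terms (our assumption) avoid division by zero.
    norm = (dists - lo) / (hi - lo + 1e-8) + 1e-8
    A = np.zeros((N, N))
    for (i, j), d in zip(edges, norm):
        A[i, j] = A[j, i] = 1.0 / d   # shorter distance -> larger weight
    return A
```

Note that with this scheme, closer road segments receive larger weights, matching the intuition that nearby segments influence each other more strongly.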
The traffic flow data utilized in this paper were sourced from PeMS [37] and BJTaxi [38], which were collected by roadside facilities along road segments within a road network. The traffic flow data of the network with N nodes in the period T can be defined as follows:
Definition 2.
The matrix $D(N, T)$ can be utilized to represent the traffic flow data collected from the network of $N$ nodes in the period $T$:
$$D(N, T) = \begin{pmatrix} d(1,1) & d(1,2) & \cdots & d(1,T) \\ d(2,1) & d(2,2) & \cdots & d(2,T) \\ \vdots & \vdots & \ddots & \vdots \\ d(N,1) & d(N,2) & \cdots & d(N,T) \end{pmatrix},$$
where $d(n, t)$ indicates the traffic flow value of the $n$-th road segment at the $t$-th time point. Figure 2 represents the traffic flow data of $N$ nodes in the time interval $T$. Based on the roadside facilities, the nodes' traffic flow data within a road network can be continuously collected (i.e., $T$ could be very long).
The aim of traffic flow prediction for $N$ nodes within a road network is to develop a model ($model(\cdot)$) that predicts the nodes' future traffic flow values from the road network structure $G$ and the collected traffic flow data:
$$D(N, T+1) = model(G; D(N, T)).$$

4. The Basic Principle of the Proposed Model

As shown in Figure 1, both the historical traffic flow and the traffic flow of adjacent road segments within the road network will influence the traffic flow of a certain road segment. By capturing the spatial and temporal traffic flow correlations of road segments within the road network, the proposed model can obtain the static and dynamic traffic flow characteristics. The proposed model consists of the input layer, the spatial characteristics extraction layer, the temporal characteristics extraction layer, and the output layer, as shown in Figure 3. In the input layer, the collected traffic flow data are reorganized into overlapping traffic flow segments by a time window. Then, the traffic flow segments are input into the spatial characteristics extraction layer, where GCN and GAT are utilized to capture the static and dynamic spatial traffic flow characteristics of road segments, respectively. After that, the extracted spatial characteristics are input into the temporal characteristics extraction layer to extract the temporal traffic flow characteristics of road segments. Finally, the results of traffic flow prediction are output by the output layer. The detailed processes of the first three layers are introduced in the following subsections.

4.1. Input Layer

The first layer of the proposed model is the input layer, where a time window is used to extract traffic flow segments from the collected traffic flow data (see Figure 2). To keep as many spatial and temporal characteristics as possible, the time window moves by only one time interval after each traffic flow segment extraction. The collected traffic flow data $D(N, T)$ form an $N \times T$ matrix, where $N$ and $T$ represent the numbers of nodes and time intervals, respectively (see Definition 2). The time window size is $N \times F$, where $F$ is the number of extracted continuous time intervals. To capture both short- and long-term traffic flow correlations, $F$ should satisfy $1 \le F \le T$ and be set according to real applications.
After traffic flow segment extraction, $P = T - F + 1$ traffic flow segments can be obtained, represented by $\{ X_r^1, X_r^2, \ldots, X_r^P \}$, where $X_r^p \in \mathbb{R}^{N \times F}$ denotes the $p$-th traffic flow segment. These extracted segments are input into the spatial characteristics extraction layer in turn.
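The sliding-window extraction described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction under the paper's definitions ($P = T - F + 1$, stride 1); the function name `extract_segments` is our own.

```python
import numpy as np

def extract_segments(D, F):
    """Slice the N x T traffic flow matrix D into P = T - F + 1 overlapping
    segments of shape (N, F), sliding the time window one interval at a time."""
    N, T = D.shape
    P = T - F + 1
    return np.stack([D[:, p:p + F] for p in range(P)])  # shape (P, N, F)

D = np.arange(12).reshape(3, 4)   # toy data: N=3 nodes, T=4 intervals
segs = extract_segments(D, F=2)
print(segs.shape)                 # (3, 3, 2): P=3 overlapping segments
```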

4.2. Spatial Characteristics Extraction Layer

As introduced in Section 1, because the road network is extensively connected, the traffic flows of road segments within it exhibit complex correlations: the traffic condition of a road segment is affected by those of its neighboring road segments. The traditional convolution operation can efficiently extract the static characteristics of the road network, such as the adjacency and distance characteristics of road segments. However, under dynamic and uncontrollable conditions, such as time and traffic accidents, the dynamic spatial correlations are not fixed and are difficult to obtain directly from static road network information. To extract them, a spatial characteristics extraction module is proposed for accurate traffic flow prediction. As two mainstream graph neural networks, GCN and GAT both learn non-Euclidean data well. Therefore, GCN is used to learn the static spatial traffic flow correlations, and GAT is used to learn the dynamic spatial traffic flow correlations over time. The structure of the spatial characteristics extraction layer is illustrated as follows.
Figure 4 and Section 4.1 show that the input of the spatial characteristics extraction layer is a three-dimensional tensor $X \in \mathbb{R}^{N \times F \times P}$, where $N$ is the number of nodes, $F$ is the number of time intervals in a traffic flow segment (the window size), and $P$ is the number of traffic flow segments extracted from the continuous traffic flow data. The traffic flow segments (i.e., $X_r^p \in \mathbb{R}^{N \times F}$) are input into GCN and GAT in turn. To keep the proposed model stable, a fully connected layer (FC) is utilized to calculate the residual $X_{res}^p \in \mathbb{R}^{N \times \tau}$ from the traffic flow segments, where $\tau$ is the number of characteristics extracted per node. After passing through GCN and GAT, the static and dynamic spatial traffic flow characteristics are $X_{conv}^p \in \mathbb{R}^{N \times \tau}$ and $X_{attn}^p \in \mathbb{R}^{N \times \tau}$, respectively. The static characteristics $X_{conv}^p$ and dynamic characteristics $X_{attn}^p$ are then fused into $X_g^p \in \mathbb{R}^{N \times \tau}$ by the gating mechanism. After that, the fused spatial characteristics $X_g^p$ are added to the residual $X_{res}^p$ to obtain the spatial characteristics $X_s^p \in \mathbb{R}^{N \times \tau}$. Finally, to extract more spatial characteristics, $X_s^p$ is passed through a stack of $K$ spatial characteristics extraction layers, after which a three-dimensional tensor $X_s \in \mathbb{R}^{N \times \tau \times P}$ is obtained.

4.2.1. GCN

First, GCN is utilized to capture the static spatial traffic flow correlations. The inputs of GCN are the characteristic matrix $X_r^p$ and the adjacency matrix; the output is the static characteristics matrix $X_{conv}^p \in \mathbb{R}^{N \times \tau}$. GCN works by aggregating the characteristics of adjacent nodes (i.e., adjacent road segments) based on the adjacency matrix. The propagation rule of GCN is described as follows:
$$H^{(l+1)} = \mathrm{ReLU}\big( \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} \big),$$
where $\tilde{A} = A + I_N$, with $A$ and $I_N$ the adjacency and identity matrices, respectively, so $\tilde{A}$ is the adjacency matrix with self-connections; $\tilde{D}$ is the degree matrix of $\tilde{A}$, and $\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ is the symmetrically normalized $\tilde{A}$; $W^{(l)}$ is the learnable weight matrix of the $l$-th GCN layer; $H^{(l)}$ is the activated characteristic matrix of the $l$-th layer, with $H^{(0)} = X_r^p$. Because the graph structure is created according to the road segments' adjacency relationships, the static spatial traffic flow characteristics can be successfully extracted through GCN.
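One propagation step of this rule can be sketched with dense NumPy matrices (real implementations typically use sparse operations; the function name `gcn_layer` is our own):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step:
    H^{l+1} = ReLU(D~^{-1/2} A~ D~^{-1/2} H^l W^l)."""
    A_tilde = A + np.eye(A.shape[0])            # add self-connections
    d = A_tilde.sum(axis=1)                     # node degrees of A~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0)       # linear transform + ReLU
```

Because the normalized adjacency is fixed once the road network is given, stacking such layers extracts only the static spatial characteristics, which motivates the GAT branch below.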

4.2.2. GAT

However, in addition to the static characteristics determined by the adjacency relationships of road segments within the road network, the traffic conditions of the road network are also affected by time, traffic accidents, etc., which means that the correlations of adjacent road segments (nodes) are not fixed. GAT is utilized to extract the dynamic spatial traffic flow characteristics. GAT improves on GCN in the allocation of neighboring weights: the correlations of adjacent nodes are learned through the self-attention mechanism, which efficiently captures the dynamic spatial traffic flow characteristics. Compared with updating weights purely based on the graph structure, GAT is more expressive. Its input is a set of node characteristics $X_r^p \in \mathbb{R}^{N \times F}$, where $F$ is the number of traffic flow values of each node, and its output is a set of new node characteristics $X_{attn}^p \in \mathbb{R}^{N \times \tau}$. For a node $i$, GAT first computes the unnormalized attention coefficients between $i$ and each of its adjacent nodes in turn. The coefficient between node $i$ and an adjacent node $j$ is calculated as follows:
$$e_{ij}^{(l)} = \mathrm{LeakyReLU}\big( a^{(l)\top} \big[ W^{(l)} h_i \,\|\, W^{(l)} h_j \big] \big),$$
where $a^{(l)}$ and $W^{(l)}$ are the learnable shared weight vector and matrix of the $l$-th layer, respectively; $h_i$ and $h_j$ are the characteristics (i.e., $F$ continuous traffic flow values) of nodes $i$ and $j$; and $[\,\cdot \,\|\, \cdot\,]$ is the concatenation operation. After coefficient calculation, the softmax function is used to normalize all coefficients of node $i$:
$$\alpha_{ij}^{(l)} = \mathrm{Softmax}\big( e_{ij}^{(l)} \big) = \frac{\exp\big( e_{ij}^{(l)} \big)}{\sum_{k \in N_i} \exp\big( e_{ik}^{(l)} \big)},$$
where $N_i$ denotes the set of adjacent nodes of node $i$. Finally, the normalized coefficients $\alpha_{ij}^{(l)}$ are used to combine the characteristics of all adjacent nodes of node $i$ into its final characteristics:
$$h_i^{(l+1)} = \sigma\Big( \sum_{j \in N_i} \alpha_{ij}^{(l)} W^{(l)} h_j \Big),$$
where $\sigma(\cdot)$ represents the activation function. Because GAT dynamically sets the weight coefficients of adjacent nodes according to the traffic flow via the attention mechanism, it can capture the dynamic spatial traffic flow characteristics of road segments within the road network well.
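A single-head version of these three equations can be sketched as follows. This is an illustrative reconstruction: the split of $a^{(l)}$ into source and destination halves (a standard algebraic simplification, since $a^\top [z_i \| z_j] = a_{1}^\top z_i + a_{2}^\top z_j$), the use of tanh for $\sigma(\cdot)$, and the function names are our own assumptions. Self-connections are assumed present in `A` so every row has at least one neighbor.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def gat_layer(A, H, W, a):
    """One single-head GAT layer (sketch).
    A: (N, N) adjacency with self-connections; H: (N, F) node characteristics;
    W: (F, tau) shared projection; a: (2*tau,) attention vector."""
    tau = W.shape[1]
    Z = H @ W                                   # W h_i for every node
    s_src = Z @ a[:tau]                         # part of a^T acting on h_i
    s_dst = Z @ a[tau:]                         # part of a^T acting on h_j
    e = leaky_relu(s_src[:, None] + s_dst[None, :])   # e_ij for all pairs
    e = np.where(A > 0, e, -np.inf)             # attend only to neighbors N_i
    alpha_w = np.exp(e)
    alpha_w /= alpha_w.sum(axis=1, keepdims=True)     # softmax over neighbors
    return np.tanh(alpha_w @ Z)                 # sigma(sum_j alpha_ij W h_j)
```

Unlike the fixed GCN weight matrix, the coefficients here depend on the current node characteristics, so the same road network yields different weights at different times.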

4.2.3. The Gate Mechanism

After extracting the static and dynamic spatial traffic flow characteristics of the road segments, integrating the learned characteristics is also very important for traffic flow prediction. Therefore, the gate mechanism is utilized to integrate the static and dynamic spatial characteristics ($X_{conv}^p$ and $X_{attn}^p$) learned by GCN and GAT, respectively:
$$X_g^p = g \odot X_{conv}^p + (1 - g) \odot X_{attn}^p,$$
$$g = \sigma\big( X_{conv}^p W_g^{conv} + X_{attn}^p W_g^{attn} \big),$$
where the static spatial traffic flow characteristics $X_{conv}^p$ and the dynamic spatial traffic flow characteristics $X_{attn}^p$ are multiplied by two learnable matrices, $W_g^{conv} \in \mathbb{R}^{\tau \times \tau}$ and $W_g^{attn} \in \mathbb{R}^{\tau \times \tau}$, respectively. In this way, the gate mechanism adaptively adjusts the static and dynamic spatial traffic flow characteristics, making them more suitable for generating the gate coefficient matrix $g$. $\sigma(\cdot)$ is the sigmoid function, which normalizes the input values to the range of 0 to 1. Finally, the elementwise multiplication of $g$ with the static spatial characteristics and of $(1 - g)$ with the dynamic spatial characteristics is performed, and their sum is the fused spatial traffic flow characteristics $X_g^p$.
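The two gating equations above amount to an elementwise convex combination of the GCN and GAT outputs; a minimal NumPy sketch (function name `gated_fusion` is our own):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(X_conv, X_attn, W_gconv, W_gattn):
    """Fuse static (GCN) and dynamic (GAT) spatial characteristics:
    g = sigmoid(Xc Wc + Xa Wa), Xg = g * Xc + (1 - g) * Xa (elementwise)."""
    g = sigmoid(X_conv @ W_gconv + X_attn @ W_gattn)   # gate in (0, 1)
    return g * X_conv + (1.0 - g) * X_attn
```

Because each entry of $g$ lies in $(0, 1)$, every entry of $X_g^p$ lies between the corresponding static and dynamic characteristic values, so neither branch can be entirely suppressed.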

4.2.4. The Residual Calculation

To retain the characteristics of the raw input data, the residual connection between the raw input data and the output of the gate mechanism is calculated, and a fully connected neural network (FC) is utilized to keep the dimensions consistent:
$$X_{res}^p = \mathrm{FC}\big( X_r^p \big),$$
$$X_s^p = X_g^p + X_{res}^p,$$
where $X_g^p \in \mathbb{R}^{N \times \tau}$ is the output of the gate mechanism (see Equation (8)), $\mathrm{FC}$ represents the fully connected neural network, and $X_{res}^p$ is the residual characteristics. After passing through the spatial characteristics extraction layer, the output is $X_s^p \in \mathbb{R}^{N \times \tau}$.

4.3. Temporal Characteristics Extraction Layer

The above layer only considers the spatial correlations of traffic flow. However, traffic flow also shows strong temporal characteristics, so a time series model is needed to extract them. At present, the commonly used deep learning time series models are RNN, TCN [39], and Transformer [40]. RNN can only compute step by step, so its efficiency is low when the number of model parameters is large or the time series is long; it also suffers from gradient vanishing and explosion [41]. TCN stacks dilated convolutions to aggregate the characteristics of time series data, which allows it to model long time series. Thanks to the attention mechanism, the Transformer encoder can attend to the entire time series at once and thus extracts time series characteristics well. Therefore, the Transformer encoder is used here to capture the long-term temporal characteristics of traffic flow.
Similarly, in the temporal characteristics extraction layer, the input is the three-dimensional tensor $X_s \in \mathbb{R}^{N \times \tau \times P}$ produced by the spatial characteristics extraction layer. For the $p$-th traffic flow segment, the input is a two-dimensional characteristic matrix $X_s^p \in \mathbb{R}^{N \times \tau}$, where $\tau$ is the number of characteristics extracted per node and $N$ is the number of nodes. First, three matrices are generated from the input $X_s^p$: the query matrix $Q \in \mathbb{R}^{N \times d}$, the key matrix $K \in \mathbb{R}^{N \times d}$, and the value matrix $V \in \mathbb{R}^{N \times \tau}$, calculated as follows:
$$Q = X_s^p W_q,$$
$$K = X_s^p W_k,$$
$$V = X_s^p W_v,$$
where $W_q \in \mathbb{R}^{\tau \times d}$, $W_k \in \mathbb{R}^{\tau \times d}$, and $W_v \in \mathbb{R}^{\tau \times \tau}$ are the learnable weight matrices used to determine $Q$, $K$, and $V$. Then, the contribution of adjacent temporal characteristics to the current temporal characteristics $Y^p \in \mathbb{R}^{N \times \tau}$ is calculated as follows:
$$Y^p = \mathrm{Softmax}\Big( \frac{Q K^\top}{\sqrt{d}} \Big) V.$$
After calculating all $Y^p$, a three-dimensional tensor $Y \in \mathbb{R}^{N \times \tau \times P}$ is obtained. Finally, the learned characteristics are input into a classical convolution layer to obtain the traffic flow prediction results.
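The scaled dot-product attention above can be sketched for a single segment. This is a generic single-head illustration of the Transformer attention equation, not the authors' implementation; the max-subtraction before the exponential is a standard numerical-stability trick added by us.

```python
import numpy as np

def temporal_attention(X_s, W_q, W_k, W_v):
    """Single-head scaled dot-product attention:
    Q = X W_q, K = X W_k, V = X W_v, Y = softmax(Q K^T / sqrt(d)) V."""
    Q, K, V = X_s @ W_q, X_s @ W_k, X_s @ W_v
    d = Q.shape[1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)             # row-wise softmax
    return A @ V                                  # weighted value aggregation
```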

4.4. Complexity Analysis

The time complexity of deep learning models mainly depends on the number of matrix multiplications. Let $\tau$ be the latent space dimension of the traffic flow characteristics, $N$ the number of nodes (i.e., road segments), and $E$ the number of edges (i.e., edges with road network weights greater than 0). The proposed model uses GCN to extract the static spatial traffic flow characteristics. In GCN, the node characteristics must first be projected; supposing the dimension remains $\tau$ after projection, this step costs $O(N \tau^2)$. Each node then aggregates its neighbors' characteristics, which costs $O(E \tau)$. Supposing the number of GCN layers is $L$, the total time complexity is $O(L (N \tau^2 + E \tau))$. The proposed model uses GAT to extract the dynamic spatial traffic flow characteristics. Compared with GCN, GAT additionally calculates the attention weights between nodes at a cost of $O(N^2 \tau)$; similarly, its total time complexity is $O(L (N \tau^2 + E \tau + N^2 \tau))$. Supposing the spatial characteristics extraction layer is stacked $K$ times, and considering the gate mechanism and residual connection, the time complexity of the spatial characteristics extraction layer is $O(K L \tau (N \tau + E + N^2))$.
The time complexity of the temporal characteristics extraction layer comes from the Transformer encoder layers. For the multi-head attention mechanism, supposing $d = \tau$, computing the query, key, and value matrices costs $O(N \tau^2)$, and computing the attention weights (i.e., temporal characteristics) costs $O(N^2 \tau)$. The feed-forward network costs $O(N \tau^2)$. Supposing the Transformer encoder has $L_T$ layers, the total time complexity of the temporal characteristics extraction layer is $O(L_T N \tau (N + \tau))$. Therefore, for the $p$-th traffic flow segment, the total time complexity of the proposed model is $O(K L \tau (N \tau + E + N^2) + L_T N \tau (N + \tau))$.

5. Experimental Results and Analysis

To validate the performance of the proposed model, the experiments are conducted on five real datasets, and the experimental results are analyzed by comparing them with baseline models.

5.1. Experimental Settings

The datasets used in this paper are PEMS03, PEMS04, PEMS07, PEMS08, and BJTaxi. PeMS [37] uses roadside sensors on California highways to collect real-time traffic data and is widely used in traffic flow prediction. In PeMS, traffic flow values are collected every five minutes, so 288 traffic flow values are collected every day at each road segment (i.e., node). In BJTaxi [38], taxi GPS trajectories are collected from 290 taxicabs in Beijing; the time interval is half an hour, and 22,484 traffic flow values are generated from these trajectory data. In the experiments, the data are sorted by timestamp in ascending order; the first 60% of the traffic data is used as the training set, and the last 40% is equally divided into the validation set and the test set. The validation set is used to evaluate the model's performance in each training epoch, to check whether the model converges, and for hyperparameter tuning. The test set is used to evaluate the model's prediction accuracy. The information on the five datasets used in the experiments is shown in Table 1.
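The chronological split described above (60% training, then the remaining 40% halved into validation and test) can be sketched as follows; the function name `chronological_split` and the integer rounding of the split points are our own assumptions.

```python
import numpy as np

def chronological_split(D):
    """Split the N x T traffic flow matrix D along the time axis:
    first 60% -> train; remaining 40% halved -> validation and test.
    Timestamps stay in ascending order, avoiding temporal leakage."""
    T = D.shape[1]
    t_train = int(T * 0.6)
    t_val = t_train + (T - t_train) // 2
    return D[:, :t_train], D[:, t_train:t_val], D[:, t_val:]
```

Splitting chronologically rather than randomly ensures the model is always evaluated on traffic that occurs after its training period, which mirrors real deployment.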
In alignment with [32], the traffic flow values in the former 12 time intervals (1 h for PeMS, 6 h for BJTaxi) are utilized to predict the traffic flow values in the subsequent 12 time intervals.
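The 12-in/12-out setting can be illustrated with a simple sliding-window helper (an assumed utility, not the paper's implementation):

```python
import numpy as np

def make_windows(series, in_len=12, out_len=12):
    """Slide a window over a time series so that `in_len` past intervals
    predict the next `out_len` intervals (12-in/12-out here)."""
    X, Y = [], []
    for s in range(len(series) - in_len - out_len + 1):
        X.append(series[s:s + in_len])
        Y.append(series[s + in_len:s + in_len + out_len])
    return np.stack(X), np.stack(Y)

series = np.arange(100)          # a toy single-node flow series
X, Y = make_windows(series)
print(X.shape, Y.shape)  # (77, 12) (77, 12)
```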
To achieve the best prediction accuracy on each dataset, different hyperparameters are set for the proposed model on different datasets. To facilitate experimental comparison, the batch size is set to 32 for all datasets. Due to the over-smoothing problem of graph neural networks [42], the spatial characteristics extraction layers should not be too deep; after experimental verification, the number of layers is set to 2. The number of attention heads for both the graph attention network and the Transformer encoder is set to 8. A larger τ and more Transformer encoder layers increase the capacity of the model, and their values are determined experimentally. The values of τ and the numbers of Transformer encoder layers for the PEMS04 and PEMS07 datasets are relatively large, not only because the data scales of PEMS04 and PEMS07 are larger but also because their data variance is greater than that of PEMS08. The learning rate should vary inversely with the number of model parameters: if a large learning rate is set for a model with many parameters, the model will struggle to converge during training. Therefore, the learning rate of the model with the largest number of parameters (PEMS04) is set to 0.0001, and the learning rate of the models with the smallest numbers of parameters (PEMS03, PEMS08, and BJTaxi) is set to 0.001. The hyperparameters of the proposed model in the experiments are summarized in Table 2. Note that K in Table 2 is the number of spatial characteristics extraction layers (Section 4.2 and Figure 3).
In the experiments, the mean absolute error (MAE), mean absolute percentage error (MAPE) and root mean square error (RMSE) are utilized to indicate the prediction accuracy of the traffic flow prediction models, which are calculated as follows:
MAE = (1/N) ∑_{i=1}^{N} |v_t^i − v̂_t^i|,   MAPE = (100%/N) ∑_{i=1}^{N} |(v_t^i − v̂_t^i) / v_t^i|,   RMSE = √[(1/N) ∑_{i=1}^{N} (v_t^i − v̂_t^i)²],
where v_t^i is the observed real traffic flow value, v̂_t^i is the predicted traffic flow value, and N is the number of predicted traffic flow values.
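The three metrics can be computed directly from their definitions; the sketch below uses NumPy and small made-up values purely for illustration:

```python
import numpy as np

def mae(v, v_hat):
    """Mean absolute error."""
    return np.mean(np.abs(v - v_hat))

def mape(v, v_hat):
    """Mean absolute percentage error (observed values must be nonzero)."""
    return 100.0 * np.mean(np.abs((v - v_hat) / v))

def rmse(v, v_hat):
    """Root mean square error."""
    return np.sqrt(np.mean((v - v_hat) ** 2))

v = np.array([100.0, 200.0, 400.0])      # observed traffic flow values
v_hat = np.array([110.0, 190.0, 380.0])  # predicted values
print(mae(v, v_hat), mape(v, v_hat), rmse(v, v_hat))
# ≈ 13.33  6.67  14.14
```

Note that MAPE divides by the observed values, which is why, as discussed later, large peak-hour flows shrink MAPE even when absolute errors grow.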

5.2. The Traffic Flow Prediction of Different Models

The proposed model was compared with the following baseline models to verify its capability.
  • SVR [43]: Support vector regression is a machine learning algorithm that uses support vector machines for the regression tasks.
  • LSTM [44]: The long short-term memory network can extract the time series characteristics with a long time span. It can effectively solve the problems of gradient vanishing and gradient explosion and has good prediction ability.
  • STGNN [45]: The Transformer encoder layer and GRU are used to capture temporal correlations. GCN and the latent positional representation are merged and used to capture dynamic spatial correlations.
  • AGCRN [46]: The spatial relationship graph is dynamically constructed based on node embeddings. Moreover, the dynamic spatial and temporal correlations are captured by GCN and GRU, respectively.
  • GWN [47]: The temporal convolution layer, modified from the dilated causal convolution [48], captures temporal characteristics, and GCN is used to capture spatial characteristics for traffic data prediction.
  • STGCN [31]: The spatial–temporal graph convolutional network uses GCN to capture spatial information, where 1-D causal convolution and gated linear units (GLU) are used to capture temporal information.
  • ASTGCN [32]: The spatial–temporal graph convolutional network based on attention mechanism uses the attention mechanism and GCN to extract spatial and temporal characteristics for the traffic data prediction.
  • STSGCN [37]: The spatial–temporal synchronous graph convolutional network uses aggregate operation and GCN to extract the heterogeneities in localized spatial-temporal graphs.
  • STFGNN [49]: The spatial–temporal fusion graph neural network proposes the fusion graph module and the gated convolution network on spatial-temporal fusion graph for the traffic data prediction.
  • STCDN [50]: The spatial–temporal continuous dynamics network uses a neural ordinary differential equation to construct the traffic flow prediction model with encoder-decoder architecture.
The traffic flow prediction results of different models on the five datasets are presented in Table 3, which suggests that the proposed model achieves the highest prediction accuracy among the compared baseline models. Traditional traffic flow prediction models, such as SVR, exhibit constrained learning capabilities when dealing with complex and dynamic traffic flow patterns, resulting in suboptimal predictions. In contrast, LSTM, a specialized RNN model, is good at capturing temporal traffic flow correlations; nevertheless, it neglects the spatial correlations of traffic flow, which reduces its prediction accuracy.
STGNN assigns a latent positional representation vector to each node; a new adjacency matrix is constructed by computing the transpose multiplication of the vectors of adjacent nodes, and GCN is then used to capture correlations on the new adjacency matrix. The prediction accuracy of STGNN lags far behind that of the proposed model, which means that the ability of STGNN to capture dynamic correlations is weaker than that of GAT. It is worth noting that STGNN ran out of GPU memory during training on the large-scale PEMS07 dataset, so no experimental results were obtained for it. In AGCRN, the spatial relationship graph is dynamically constructed using learnable node embeddings; similar to STGNN, GCN is used to capture dynamic spatial correlations on the new graph. Since the ability of GCN to capture dynamic correlations is weaker than that of GAT, AGCRN is outperformed by the proposed model.
GWN, STGCN, ASTGCN, and STSGCN leverage CNN or GCN to capture the spatial correlations of traffic flow and achieve better prediction accuracy than SVR or LSTM. GWN employs a temporal convolution layer that aggregates the characteristics of two nodes in the temporal dimension by adjusting the skipping distance. However, this is limited compared to the Transformer in the proposed model, which aggregates the global characteristics of nodes concurrently. Additionally, using only 1D convolution makes it hard for GWN to extract long-term time series characteristics, resulting in suboptimal prediction accuracy. Similarly, STGCN utilizes 1D convolution to capture temporal characteristics; by using GLU, its prediction accuracy is slightly improved. However, since GWN and STGCN neglect the dynamic spatial traffic flow characteristics, their prediction accuracies are lower than that of the proposed model. ASTGCN divides spatial–temporal traffic data into hourly, daily, and weekly segments and attempts to extract long-term temporal characteristics from these different periods with a parameter-sharing module; however, the shared parameters also limit the expressiveness of ASTGCN. STSGCN utilizes a sliding window to simultaneously extract temporal and spatial traffic flow characteristics, yet it struggles to model global correlations effectively.
STFGNN incorporates both temporal and spatial graphs to capture dynamic spatial correlations to some extent. Nevertheless, as it treats each graph node with equal importance, its ability to model dynamic spatial correlations is constrained. In contrast, the proposed model introduces GAT with the attention mechanism, which assigns distinct weights to various adjacent nodes, thereby enhancing the effective capture of dynamic spatial characteristics.
STCDN uses a graph neural network to parameterize the network dynamics, which can be viewed as the rate of change of traffic flow, and simulates the changes in traffic flow characteristics over a certain period by the first-order definite integral of the network dynamics. Since only the first-order integral is considered and higher-order terms are not included, the traffic flow prediction accuracy of STCDN is worse than that of the proposed model on most metrics. Although the RMSE of the proposed model is slightly higher than that of STCDN on three datasets, its MAE and MAPE are lower than those of STCDN on all datasets, which indicates that the proposed model performs better on most of the data while having relatively large prediction errors on a small amount of data. Moreover, the RMSE gaps between the proposed model and STCDN are smaller than the corresponding MAE and MAPE gaps.
The proposed model effectively captures both spatial and temporal traffic flow characteristics compared with the aforementioned baseline models, leading to superior prediction accuracy.

5.3. The Long-Term Traffic Flow Prediction

To study the traffic flow prediction accuracy of the proposed model in different time intervals and on different datasets, the time intervals are divided into four groups, namely, 18, 24, 30, and 36. It should be noted that the PeMS dataset records data every 5 min, while BJTaxi records data every 30 min. Therefore, one “time interval” represents 5 min in PeMS and 30 min in BJTaxi. The traffic flow prediction results under these four time intervals are presented in Table 4.
Table 4 suggests that in the five traffic flow datasets, with the increase in time intervals, the values of three metrics increase slowly, and the prediction accuracy of the proposed model does not deteriorate sharply. Even for very long time intervals (36), the traffic flow prediction accuracy of the proposed model is acceptable. This fully demonstrates the suitability of the proposed model for long-term traffic flow prediction.

5.4. Robustness and Sensitivity Analysis

The prediction accuracy in different traffic scenarios: To evaluate the prediction accuracy of the proposed model in special traffic scenarios, the traffic flow data in a day are divided into peak and off-peak hours according to the traffic flow values. Peak hours refer to the period of a day during which traffic flow is consistently high. Specifically, for the PEMS04 dataset, the peak hours are from 5:00 AM to 6:00 PM, and for the PEMS08 dataset, the peak hours are from 7:30 AM to 8:30 PM. The model in Table 3 trained on the full training set (i.e., peak and off-peak hours) is used to predict the traffic flow in peak and off-peak hours, respectively. The prediction accuracy is shown in Table 5. The arrows in Table 5 indicate whether the metrics have increased or decreased compared to the benchmark metrics in Table 3.
As can be seen in Table 5, compared to the baseline metrics in Table 3, the metrics in peak and off-peak hours change in opposite directions. The variation amplitude and frequency of traffic flow in peak hours are larger than those in off-peak hours, so peak hours are harder to predict. In terms of metrics, MAE and RMSE increase in peak hours on both datasets, while they decrease in off-peak hours. MAPE is a percentage error, and the traffic flow values in peak hours are large; that is, the denominator in Equation (16) is large. When the increase in the error (i.e., the numerator) is smaller than that in the denominator, MAPE becomes smaller; thus, MAPE in peak hours decreases on both datasets. On the contrary, the traffic flow in off-peak hours is smaller, and the numerator and denominator are close in value, so MAPE increases in off-peak hours on both datasets. The changes in the metrics in peak hours are small, which indicates that the proposed model has good robustness. Because the amount of data in off-peak hours is small, the metric changes in off-peak hours are larger.
The prediction accuracy under different missing patterns: In real scenarios, problems such as sensor failure, network delay, and power outage may cause missing traffic flow data, and incomplete data affect the prediction accuracy of traffic flow models. Missing data are divided into two patterns: the point missing pattern and the block missing pattern. The point missing pattern refers to random, scattered missing data, usually caused by brief network delays. The block missing pattern refers to continuous missing data. For example, several sensor failures may lead to consecutive missing data in the spatial dimension, and long network delays may lead to consecutive missing data in the temporal dimension; the two cases may also occur simultaneously, producing large blocks of missing data. In this paper, 25% of the traffic flow data are randomly removed from the original data to simulate the point missing pattern. To simulate the block missing pattern, some points are selected with a probability of 0.15%. Starting from these points, 5 to 20 points are randomly extended along the spatial and temporal dimensions, and the data within the matrix formed by these points are removed. Since the block missing pattern is usually accompanied by the point missing pattern in real situations, an additional 5% of the data are randomly removed in the block missing pattern.
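The two missing patterns can be simulated roughly as follows. This is a sketch under the stated ratios; the exact seed selection and block-growth procedure used in the paper may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def point_missing(data, ratio=0.25):
    """Point missing pattern: randomly mask `ratio` of the entries."""
    out = data.astype(float)
    out[rng.random(data.shape) < ratio] = np.nan
    return out

def block_missing(data, seed_prob=0.0015, extra_point=0.05):
    """Block missing pattern: grow temporal-spatial blocks of 5-20 steps
    from randomly selected seed points, then add a small amount of extra
    point missing (data is a time-by-nodes matrix)."""
    out = data.astype(float)
    T, N = out.shape
    for t, n in np.argwhere(rng.random((T, N)) < seed_prob):
        dt = rng.integers(5, 21)   # temporal extent: 5-20 intervals
        dn = rng.integers(5, 21)   # spatial extent: 5-20 nodes
        out[t:t + dt, n:n + dn] = np.nan
    out[rng.random((T, N)) < extra_point] = np.nan
    return out

data = rng.random((288, 170))        # one day of 5-min data, 170 nodes
print(np.isnan(point_missing(data)).mean())  # close to 0.25
```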
The PEMS04 and PEMS08 datasets are selected to evaluate the prediction accuracy of the proposed model under different missing patterns. Consistent with the above experimental settings, the proportions of the training, validation, and test sets are 60%, 20%, and 20%, respectively. To keep the evaluation unbiased, the missing patterns are simulated only on the training data. It is worth noting that the original PEMS04 and PEMS08 datasets already have 1.59% and 0.35% missing data, respectively. For the PEMS04 dataset, 26.26% of the data is missing in the point missing pattern and 10.75% in the block missing pattern. For the PEMS08 dataset, 25.25% of the data is missing in the point missing pattern and 9.6% in the block missing pattern. The prediction accuracy of the proposed model under different missing patterns on both datasets is shown in Table 6. The percentages in parentheses in Table 6 indicate the increase in each metric relative to the benchmark in Table 3. As can be seen in Table 6, the maximum percentage increase in the metrics does not exceed 2.6% for the PEMS04 dataset and 2.8% for the PEMS08 dataset. This shows that the proposed model is robust and can be used for traffic flow prediction in real scenarios.

5.5. Hyperparameter Analysis and Time Costs

Deep learning models often have to make a trade-off between prediction accuracy and the number of parameters: more parameters lead to higher computational complexity and longer training time. When adding parameters does not bring a significant improvement in prediction accuracy, further increasing the number of parameters is not worthwhile. To evaluate the relationship between prediction accuracy and training time under different numbers of parameters, we set different hyperparameters and recorded the time to train the model on the PEMS08 dataset for one epoch. The experimental results are shown in Figure 5, where the x-axes represent the hyperparameters introduced in Table 2 (larger hyperparameter values mean more parameters), the blue lines represent the model prediction accuracy (MAE), and the orange lines represent the time (in seconds) required to train the model for one epoch. As can be seen in Figure 5, the training time increases as the hyperparameter values increase. The model achieved the best prediction accuracy when the number K of spatial characteristics extraction layers, the τ of GCN and GAT, and the number of heads of GNN and the Transformer encoder were set to 2, 64, and 8, respectively. As for the number of Transformer encoder layers, four layers show a slight improvement over three in terms of MAE, but the training time increases significantly; therefore, the number of layers is set to three. It is worth noting that when the number of Transformer encoder layers is set to five, the model's prediction accuracy deteriorates significantly. We attribute this to the model having too many parameters for the PEMS08 dataset, whose data are insufficient to train it fully.

5.6. Entropy Analysis of GAT

To analyze the effectiveness of GAT and its ability to capture dynamic correlations, the PEMS04 dataset was divided into peak and off-peak hours according to the method described in Section 5.4, and the prediction accuracy of G2T-noGAT and G2T-noGCN was evaluated on the complete dataset and in peak and off-peak hours, respectively. G2T is the complete model; G2T-noGAT and G2T-noGCN are models with the GAT and GCN modules removed from G2T, respectively. The experimental results are shown in Table 7, where the prediction accuracy is measured by MAE. Diff is the difference between the results of an incomplete model (i.e., G2T-noGAT or G2T-noGCN) and the results of G2T. Ratio is the ratio of the Diff in peak or off-peak hours to the Diff on the complete dataset. From another perspective, Diff 1 represents the accuracy improvement of G2T-noGAT after adding GAT, and Diff 2 represents the accuracy improvement of G2T-noGCN after adding GCN. The difference between Diff 1 and Diff 2 is smallest during peak hours, which indicates that GAT has a more obvious effect during peak hours than during off-peak hours or on the complete data. The Ratio 1 of G2T-noGAT in peak hours is much higher than that in off-peak hours, which indicates that GAT is more sensitive to peak-hour data. In summary, GAT and GCN are indispensable, and the best prediction accuracy of our model can only be achieved when both are present.
To further explore why GAT and GCN complement each other, the entropy of the attention weights of GAT is analyzed [51]. For node i, its entropy E i can be calculated as follows:
E_i = −∑_{j ∈ N(i)} α_ij log₂ α_ij,
where N(i) denotes the neighboring nodes of node i, and α_ij is the attention weight of node i to node j.
For GCN, the entropy of D^(−1/2) A D^(−1/2) in Equation (4) was calculated. In addition, we simulated the entropy when the weights follow a uniform distribution. The aggregated histogram of entropy is drawn in Figure 6, where the x-axis represents the entropy intervals, and the y-axis represents the number of nodes whose entropy values fall within the corresponding intervals.
A lower entropy means that the weights of some neighboring nodes are relatively large; that is, the weighting is more discriminative across nodes. The uniform distribution is randomly generated and carries no node-specific information, so it has the maximum entropy. GCN calculates weights based on the adjacency matrix A and the degree matrix D, which is discriminative to some extent. However, the weights of GCN are fixed and cannot be adjusted according to the characteristics of the nodes. In contrast, the weights of GAT take node characteristics into account. Figure 6 shows that the entropy of GAT differs between peak and off-peak hours and is lower during peak hours, which indicates that GAT is more sensitive to peak-hour data. In other words, the neighboring nodes that GAT focuses on at different hours are discriminative, complementing GCN. Only when both GAT and GCN are present can the proposed model achieve the best prediction accuracy.
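The entropy comparison can be reproduced on a toy graph. The sketch below computes the entropy of uniform weights, GCN's symmetrically normalized adjacency, and GAT-style softmax attention weights; random scores stand in for learned attention coefficients, and on this regular toy graph the GCN rows happen to coincide with the uniform case (real road networks have varying degrees):

```python
import numpy as np

def row_entropy(W):
    """Entropy E_i = -sum_j w_ij log2 w_ij over each node's nonzero
    weights (rows are assumed to sum to 1, as softmax weights do)."""
    ent = np.zeros(W.shape[0])
    for i, row in enumerate(W):
        p = row[row > 0]
        ent[i] = -np.sum(p * np.log2(p))
    return ent

# Toy 4-node ring graph with self-loops (every node has degree 3)
A = np.array([[1, 1, 0, 1],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]], dtype=float)
d = A.sum(axis=1)

# GCN weights: symmetrically normalized adjacency D^(-1/2) A D^(-1/2)
gcn_W = A / np.sqrt(np.outer(d, d))

# GAT-style weights: softmax of random scores over each node's neighbors
rng = np.random.default_rng(0)
scores = np.where(A > 0, rng.normal(size=A.shape), -np.inf)
gat_W = np.exp(scores - scores.max(axis=1, keepdims=True))
gat_W /= gat_W.sum(axis=1, keepdims=True)

# Uniform weights: the maximum-entropy reference
uni_W = A / d[:, None]

print(row_entropy(uni_W))  # log2(3) ≈ 1.585 for every node
print(row_entropy(gat_W))  # <= log2(3); lower = more discriminative
```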

5.7. The Ablation Study

In the proposed model, the static and dynamic spatial traffic flow characteristics are extracted by GCN and GAT, respectively, and the temporal traffic flow characteristics are captured by the Transformer encoder. These modules have a significant impact on the traffic flow prediction of the proposed model. To verify the effectiveness of GAT, GCN, and the Transformer encoder, comparative experiments are carried out. G2T-noGAT, G2T-noGCN, and G2T-noTransformer denote models with the GAT module, the GCN module, and the Transformer encoder removed, respectively, while G2T is the complete model with all three modules. Their experimental results on the PEMS08 dataset are presented in Figure 7.
As presented in Figure 7, there was a small decrease in the model prediction accuracy when the GAT was removed (i.e., G2T-noGAT). GAT is utilized to learn the dynamic spatial traffic flow characteristics of road segments, which gives different attention weights to adjacent road segments, to better describe the spatial traffic flow correlations. The task of the GCN is to model a fixed road network structure by extracting the adjacency and distance characteristics from the road network. After the removal of the GCN (i.e., G2T-noGCN), the model prediction accuracy also decreased slightly, but the decrease was greater than that after the removal of the GAT, which indicates that the static spatial characteristics captured by the GCN are more important.
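The gated fusion of the GCN and GAT branches (see the Abstract) can be sketched with a generic sigmoid gate; the weight shapes and the exact gating formula below are assumptions for illustration, not necessarily the paper's equations:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_gcn, h_gat, W, b):
    """Fuse static (GCN) and dynamic (GAT) spatial features with a gate:
    z = sigmoid([h_gcn || h_gat] W + b);  h = z * h_gcn + (1 - z) * h_gat.
    Elementwise, h is a convex combination of the two branches."""
    z = sigmoid(np.concatenate([h_gcn, h_gat], axis=-1) @ W + b)
    return z * h_gcn + (1.0 - z) * h_gat

rng = np.random.default_rng(0)
N, F = 170, 64                       # number of nodes, feature dimension
h_gcn = rng.normal(size=(N, F))      # static spatial features (illustrative)
h_gat = rng.normal(size=(N, F))      # dynamic spatial features (illustrative)
W = 0.1 * rng.normal(size=(2 * F, F))
b = np.zeros(F)

h = gated_fusion(h_gcn, h_gat, W, b)
print(h.shape)  # (170, 64)
```

A gate of this form lets the model lean on the static GCN branch where the road-network structure dominates and on the dynamic GAT branch where conditions change, which matches the complementary roles observed in the ablation.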
When the Transformer encoder was removed, the model prediction accuracy decreased greatly. As the prediction time increases, the prediction errors of the model without the Transformer encoder (i.e., G2T-noTransformer) increase sharply. This is because the Transformer encoder can learn the long-term traffic flow characteristics sequence and has certain advantages in learning time series data. After removing the Transformer encoder, the long-term temporal correlations are ignored, so the prediction accuracy is relatively poor.

6. Conclusions

A model is proposed in this paper to handle the traffic flow prediction problem in complex road networks. In the proposed model, GCN and GAT are used to extract static and dynamic spatial characteristics from traffic flow data, and a fully connected layer is used to calculate the residuals for the spatial characteristics extraction layer. The traffic flow data are then input into the Transformer encoder to capture their temporal characteristics. Experiments on five real traffic flow datasets indicate that the proposed model outperforms many current spatial–temporal traffic flow prediction models. However, only the characteristics of the traffic flow and road network are considered in the proposed model, while the functional characteristics of road segments are ignored; we will add such characteristics to the proposed model in the future. In addition, users' travel routes could be considered to represent the links between road segments to further improve traffic flow prediction accuracy.

Author Contributions

Conceptualization, Xing Su, Limin Guo, and Zhi Cai; methodology, Xing Su and Pengcheng Li; software, Pengcheng Li; validation, Pengcheng Li; formal analysis, Xing Su and Pengcheng Li; investigation, Limin Guo and Pengcheng Li; resources, Zhi Cai; writing—original draft preparation, Pengcheng Li; writing—review and editing, Xing Su; visualization, Boya Zhang; supervision, Zhi Cai. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grants No. 62276011, 62072016) and the Beijing Municipal Natural Science Foundation (Grants No.4244074).

Data Availability Statement

PeMS source data: https://pems.dot.ca.gov/ (accessed on 26 July 2025). BJTaxi source data: https://drive.google.com/file/d/1GOhxZmNwU9TuRWnSLvy0dw-XXIjZzJBx/view (accessed on 10 September 2025).

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
GCN   Graph Convolution Network
GAT   Graph Attention Network
RNN   Recurrent Neural Network
GLU   Gated Linear Units
MAE   Mean Absolute Error
MAPE  Mean Absolute Percentage Error
RMSE  Root Mean Square Error

References

  1. Jia, Y.; Wu, J.; Xu, M. Traffic flow prediction with rainfall impact using a deep learning method. J. Adv. Transp. 2017, 2017, 6575947.
  2. Wang, Y.; Li, L.; Xu, X. A piecewise hybrid of ARIMA and SVMs for short-term traffic flow prediction. In Proceedings of the International Conference on Neural Information Processing, Guangzhou, China, 14–18 November 2017.
  3. Zhu, H.; Xie, Y.; He, W.; Sun, C.; Zhu, K.; Zhou, G.; Ma, N. A Novel Traffic Flow Forecasting Method Based on RNN-GCN and BRB. J. Adv. Transp. 2020, 2020, 7586154.
  4. Ma, X.; Tao, Z.; Wang, Y.; Yu, H.; Wang, Y. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transp. Res. Part C 2015, 54, 187–197.
  5. Zonoozi, A.; Kim, J.J.; Li, X.L.; Cong, G. Periodic-CRN: A convolutional recurrent model for crowd density prediction with recurring periodic patterns. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence IJCAI-18, Stockholm, Sweden, 13–19 July 2018.
  6. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3848–3858.
  7. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
  8. Yao, R.; Zhang, W.; Zhang, L. Hybrid Methods for Short-Term Traffic Flow Prediction Based on ARIMA-GARCH Model and Wavelet Neural Network. J. Transp. Eng. Part A Syst. 2020, 146, 04020086.
  9. Lu, S.; Zhang, Q.; Chen, G.; Seng, D. A combined method for short-term traffic flow prediction based on recurrent neural network. Alex. Eng. J. 2021, 60, 87–94.
  10. Ge, W.; Cao, Y.; Ding, Z.; Guo, L. Forecasting Model of Traffic Flow Prediction Model Based on Multi-resolution SVR. In Proceedings of the 2019 3rd International Conference on Innovation in Artificial Intelligence, Suzhou, China, 15–18 March 2019; ICIAI ’19. pp. 1–5.
  11. Li, C.; Xu, P. Application on traffic flow prediction of machine learning in intelligent transportation. Neural Comput. Appl. 2021, 33, 613–624.
  12. Tawfeek, M.H.; El-Basyouny, K. Estimating Traffic Volume on Minor Roads at Rural Stop-Controlled Intersections using Deep Learning. Transp. Res. Rec. 2019, 2673, 108–116.
  13. Hu, H.; Lin, Z.; Hu, Q.; Zhang, Y. Attention Mechanism With Spatial-Temporal Joint Model for Traffic Flow Speed Prediction. IEEE Trans. Intell. Transp. Syst. 2022, 23, 16612–16621.
  14. Geng, X.; Li, Y.; Wang, L.; Zhang, L.; Yang, Q.; Ye, J.; Liu, Y. Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 3656–3663.
  15. Fu, R.; Zhang, Z.; Li, L. Using LSTM and GRU neural network methods for traffic flow prediction. In Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), Wuhan, China, 11–13 November 2016; pp. 324–328.
  16. Shu, W.; Cai, K.; Xiong, N.N. A Short-Term Traffic Flow Prediction Model Based on an Improved Gate Recurrent Unit Neural Network. IEEE Trans. Intell. Transp. Syst. 2022, 23, 16654–16665.
  17. Zhang, Z.; Li, M.; Lin, X.; Wang, Y.; He, F. Multistep speed prediction on traffic networks: A deep learning approach considering spatio-temporal dependencies. Transp. Res. Part C Emerg. Technol. 2019, 105, 297–322.
  18. Pan, Z.; Liang, Y.; Wang, W.; Yu, Y.; Zheng, Y.; Zhang, J. Urban traffic prediction from spatio-temporal data using deep meta learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 1720–1730.
  19. Chen, W.; Chen, L.; Xie, Y.; Cao, W.; Gao, Y.; Feng, X. Multi-range attentive bicomponent graph convolutional network for traffic forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 3529–3536.
  20. Li, Y.; Chai, S.; Ma, Z.; Wang, G. A Hybrid Deep Learning Framework for Long-Term Traffic Flow Prediction. IEEE Access 2021, 9, 11264–11271.
  21. Cui, Z.; Henrickson, K.; Ke, R.; Wang, Y. Traffic graph convolutional recurrent neural network: A deep learning framework for network-scale traffic learning and forecasting. IEEE Trans. Intell. Transp. Syst. 2019, 21, 4883–4894.
  22. Guo, K.; Hu, Y.; Qian, Z.; Sun, Y.; Gao, J.; Yin, B. Dynamic graph convolution network for traffic forecasting based on latent network of Laplace matrix estimation. IEEE Trans. Intell. Transp. Syst. 2020, 23, 1009–1018.
  23. Zhang, J.; Zheng, Y.; Qi, D. Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
  24. Lin, Z.; Feng, J.; Lu, Z.; Li, Y.; Jin, D. DeepSTN+: Context-Aware Spatial-Temporal Neural Network for Crowd Flow Prediction in Metropolis. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 1020–1027.
  25. Yao, H.; Wu, F.; Ke, J.; Tang, X.; Jia, Y.; Lu, S.; Gong, P.; Ye, J.; Li, Z. Deep multi-view spatial-temporal network for taxi demand prediction. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; AAAI Press: Washington, DC, USA, 2018. AAAI’18/IAAI’18/EAAI’18.
  26. Yao, H.; Tang, X.; Wei, H.; Zheng, G.; Li, Z. Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; AAAI Press: Washington, DC, USA, 2019. AAAI’19/IAAI’19/EAAI’19.
  27. Han, D.; Chen, J.; Sun, J. A parallel spatiotemporal deep learning network for highway traffic flow forecasting. Int. J. Distrib. Sens. Netw. 2019, 15, 15501477198.
  28. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; Zhang, C. Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; KDD ’20. pp. 753–763.
  29. Li, N.; Jia, S.; Li, Q. Traffic Message Channel Prediction Based on Graph Convolutional Network. IEEE Access 2021, 9, 135423–135431.
  30. Chen, Z.; Zhao, B.; Wang, Y.; Duan, Z.; Zhao, X. Multitask Learning and GCN-Based Taxi Demand Prediction for a Traffic Road Network. Sensors 2020, 20, 3776.
  31. Yu, B.; Yin, H.; Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; AAAI Press: Washington, DC, USA, 2018. IJCAI’18. pp. 3634–3640.
  32. Guo, S.; Lin, Y.; Feng, N.; Song, C.; Wan, H. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 922–929.
  33. Huang, R.; Huang, C.; Liu, Y.; Dai, G.; Kong, W. LSGCN: Long Short-Term Traffic Prediction with Graph Convolutional Networks. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, Yokohama, Japan, 11–17 July 2020; Bessiere, C., Ed.; International Joint Conferences on Artificial Intelligence Organization: Montreal, QC, Canada, 2020; Volume 7, pp. 2355–2361.
  34. Zhang, Q.; Chang, J.; Meng, G.; Xiang, S.; Pan, C. Spatio-Temporal Graph Structure Learning for Traffic Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 1177–1185.
  35. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
  36. Wu, T.; Chen, F.; Wan, Y. Graph Attention LSTM Network: A New Model for Traffic Flow Forecasting. In Proceedings of the 2018 5th International Conference on Information Science and Control Engineering (ICISCE), Zhengzhou, China, 20–22 July 2018; pp. 241–245.
  37. Song, C.; Lin, Y.; Guo, S.; Wan, H. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 914–921.
  38. Liu, L.; Zhen, J.; Li, G.; Zhan, G.; He, Z.; Du, B.; Lin, L. Dynamic spatial-temporal representation learning for traffic flow prediction. IEEE Trans. Intell. Transp. Syst. 2020, 22, 7169–7183.
  39. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar] [CrossRef]
40. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  41. Liu, M.; Huang, H.; Feng, H.; Sun, L.; Du, B.; Fu, Y. Pristi: A conditional diffusion framework for spatiotemporal imputation. In Proceedings of the 2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, CA, USA, 2–7 April 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1927–1939. [Google Scholar]
  42. Chen, D.; Lin, Y.; Li, W.; Li, P.; Zhou, J.; Sun, X. Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 3438–3445. [Google Scholar]
  43. Drucker, H.; Burges, C.J.; Kaufman, L.; Smola, A.; Vapnik, V. Support vector regression machines. Adv. Neural Inf. Process. Syst. 1996, 9, 155–161. [Google Scholar]
  44. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  45. Wang, X.; Ma, Y.; Wang, Y.; Jin, W.; Wang, X.; Tang, J.; Jia, C.; Yu, J. Traffic Flow Prediction via Spatial Temporal Graph Neural Network. In Proceedings of the Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; WWW ’20. pp. 1082–1092. [Google Scholar] [CrossRef]
46. Bai, L.; Yao, L.; Li, C.; Wang, X.; Wang, C. Adaptive graph convolutional recurrent network for traffic forecasting. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Virtual, 6–12 December 2020. NIPS ’20. [Google Scholar]
  47. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Zhang, C. Graph WaveNet for Deep Spatial-Temporal Graph Modeling. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, Macao, 10–16 August 2019; International Joint Conferences on Artificial Intelligence Organization: Montreal, QC, Canada, 2019; Volume 7, pp. 1907–1913. [Google Scholar] [CrossRef]
  48. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122. [Google Scholar]
  49. Li, M.; Zhu, Z. Spatial-temporal fusion graph neural networks for traffic flow forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 4189–4196. [Google Scholar]
  50. Xie, Y.; Xiong, Y.; Zhang, J.; Chen, C.; Zhang, Y.; Zhao, J.; Jiao, Y.; Zhao, J.; Zhu, Y. Temporal super-resolution traffic flow forecasting via continuous-time network dynamics. Knowl. Inf. Syst. 2023, 65, 4687–4712. [Google Scholar] [CrossRef]
  51. Li, X.; Cheng, Y. Understanding the message passing in graph neural networks via power iteration clustering. Neural Netw. 2021, 140, 130–135. [Google Scholar] [CrossRef] [PubMed]
Figure 1. An example of traffic flow on various road segments within a road network.
Figure 2. An example of collected traffic flow data.
Figure 3. The overall framework of the proposed model.
Figure 4. Structure of the spatial characteristics extraction layer.
Figure 5. The prediction accuracy and training time of the proposed model under different hyperparameters on PEMS08. The blue lines represent the model prediction accuracy (MAE), while the orange lines represent the time (in seconds) required to train the model for one epoch.
Figure 6. The entropy aggregation histogram of GAT, GCN, and uniform distribution.
Figure 7. Ablation study on each module in the proposed model on PEMS08.
Table 1. The information of datasets.

| Datasets | # of Nodes | # of Timesteps | Time Periods |
|---|---|---|---|
| PEMS03 | 358 | 26,208 | 1 September 2018–30 November 2018 |
| PEMS04 | 307 | 16,992 | 1 January 2018–28 February 2018 |
| PEMS07 | 883 | 28,224 | 1 May 2017–31 August 2017 |
| PEMS08 | 170 | 17,856 | 1 July 2016–31 August 2016 |
| BJTaxi | 290 | 22,484 | 1 February 2016–31 August 2016 |
Table 2. The hyperparameters of the proposed model for all datasets.

| Hyperparameters | PEMS03 | PEMS04 | PEMS07 | PEMS08 | BJTaxi |
|---|---|---|---|---|---|
| Batch size | 32 | 32 | 32 | 32 | 32 |
| K of spatial characteristics extraction layers | 2 | 2 | 2 | 2 | 2 |
| τ of GCN and GAT | 64 | 256 | 256 | 64 | 64 |
| Heads of graph attention network | 8 | 8 | 8 | 8 | 8 |
| Heads of Transformer encoder | 8 | 8 | 8 | 8 | 8 |
| Layers of Transformer encoder | 3 | 5 | 4 | 3 | 3 |
| Learning rate | 0.001 | 0.0001 | 0.0005 | 0.001 | 0.001 |
Table 3. The traffic flow prediction accuracy of different models. Bold font indicates the best prediction accuracy.

| Datasets | Metrics | SVR | LSTM | STGNN | AGCRN | GWN | STGCN | ASTGCN | STSGCN | STFGNN | STCDN | Proposed Model |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PEMS03 | MAE | 22.01 | 21.33 | 20.71 | 15.85 | 19.85 | 18.28 | 17.85 | 17.48 | 16.77 | 16.33 | **15.07** |
| | MAPE (%) | 22.93 | **13.33** | 19.16 | 15.74 | 19.31 | 17.52 | 17.65 | 16.78 | 16.30 | 15.87 | 15.26 |
| | RMSE | 35.28 | 35.11 | 32.62 | 28.18 | 32.94 | 30.73 | 29.88 | 29.21 | 28.34 | **26.14** | 26.44 |
| PEMS04 | MAE | 28.66 | 27.14 | 26.50 | 19.74 | 25.45 | 22.27 | 22.42 | 21.19 | 19.83 | 20.41 | **19.36** |
| | MAPE (%) | 19.15 | 18.20 | 17.51 | 13.10 | 17.29 | 14.36 | 15.87 | 13.90 | 13.02 | 13.85 | **12.83** |
| | RMSE | 44.59 | 41.59 | 39.38 | 32.22 | 39.70 | 35.02 | 34.75 | 33.65 | 31.88 | **31.24** | 31.48 |
| PEMS07 | MAE | 32.97 | 29.98 | - | 21.89 | 26.85 | 27.41 | 25.98 | 24.62 | 22.07 | 22.40 | **21.39** |
| | MAPE (%) | 15.43 | 13.20 | - | 9.38 | 12.12 | 12.23 | 11.84 | 10.21 | 9.21 | 10.10 | **9.17** |
| | RMSE | 50.15 | 45.84 | - | 35.76 | 42.78 | 41.02 | 39.65 | 39.03 | 35.80 | **35.22** | 35.26 |
| PEMS08 | MAE | 23.25 | 22.20 | 20.57 | 16.13 | 19.13 | 18.04 | 18.86 | 17.13 | 16.64 | 16.48 | **15.17** |
| | MAPE (%) | 14.71 | 14.20 | 13.37 | 10.17 | 12.68 | 11.16 | 12.49 | 10.96 | 10.60 | 10.51 | **9.65** |
| | RMSE | 36.15 | 34.06 | 31.22 | 25.59 | 31.05 | 27.94 | 28.55 | 26.81 | 26.22 | 24.90 | **24.59** |
| BJTaxi | MAE | 31.87 | 30.91 | 27.77 | 19.56 | 25.37 | 23.44 | 22.59 | 19.39 | 18.73 | 18.83 | **17.62** |
| | MAPE (%) | 22.46 | 21.12 | 20.97 | 17.26 | 19.29 | 18.87 | 18.57 | 16.78 | 15.11 | 15.32 | **14.20** |
| | RMSE | 49.74 | 47.51 | 44.13 | 33.47 | 38.66 | 34.23 | 34.13 | 34.98 | 30.47 | 28.93 | **28.01** |
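The error metrics reported in Tables 3–7 are the standard ones. A minimal NumPy sketch of how MAE, MAPE, and RMSE are typically computed (the epsilon guard against zero flows is an implementation detail of this sketch, not taken from the paper):

```python
import numpy as np

def mae(y_true, y_pred):
    # mean absolute error
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred, eps=1e-8):
    # mean absolute percentage error; eps guards against zero flow values
    return float(100.0 * np.mean(np.abs(y_true - y_pred)
                                 / np.maximum(np.abs(y_true), eps)))

def rmse(y_true, y_pred):
    # root mean squared error
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

y_true = np.array([100.0, 80.0, 120.0])  # observed flows
y_pred = np.array([90.0, 85.0, 110.0])   # predicted flows
print(round(mae(y_true, y_pred), 2))     # 8.33
print(round(rmse(y_true, y_pred), 2))    # 8.66
```

Lower values are better for all three metrics; MAPE normalizes by the observed flow, which is why peak and off-peak scenarios can rank differently under MAE and MAPE (as in Table 5).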
Table 4. The prediction accuracy of the long-term traffic flow prediction.

| Datasets | Metrics | 18 Intervals | 24 Intervals | 30 Intervals | 36 Intervals |
|---|---|---|---|---|---|
| PEMS03 | MAE | 16.22 | 16.92 | 17.59 | 18.41 |
| | MAPE (%) | 16.18 | 16.00 | 16.75 | 18.07 |
| | RMSE | 27.74 | 28.92 | 30.54 | 31.59 |
| PEMS04 | MAE | 20.09 | 20.84 | 21.42 | 21.65 |
| | MAPE (%) | 13.32 | 13.84 | 14.04 | 14.29 |
| | RMSE | 32.83 | 33.97 | 35.08 | 35.50 |
| PEMS07 | MAE | 22.57 | 23.78 | 24.31 | 25.08 |
| | MAPE (%) | 10.36 | 10.85 | 10.90 | 11.49 |
| | RMSE | 37.49 | 39.73 | 41.10 | 42.36 |
| PEMS08 | MAE | 16.28 | 16.73 | 17.37 | 18.01 |
| | MAPE (%) | 10.44 | 10.65 | 11.18 | 12.25 |
| | RMSE | 26.26 | 27.35 | 28.39 | 29.27 |
| BJTaxi | MAE | 18.31 | 18.98 | 19.73 | 20.87 |
| | MAPE (%) | 15.16 | 15.57 | 16.39 | 17.14 |
| | RMSE | 29.38 | 31.05 | 32.44 | 33.68 |
Table 5. The prediction accuracy of the proposed model under different traffic scenarios of PEMS04 and PEMS08. Upward/downward arrows indicate that the error metric has increased/decreased compared with the corresponding value in Table 3.

| Datasets | Metrics | Peak Hours | Off-Peak Hours |
|---|---|---|---|
| PEMS04 | MAE | 21.85 (↑) | 9.36 (↓) |
| | MAPE (%) | 11.05 (↓) | 20.26 (↑) |
| | RMSE | 34.19 (↑) | 16.63 (↓) |
| PEMS08 | MAE | 15.85 (↑) | 13.60 (↓) |
| | MAPE (%) | 8.47 (↓) | 12.41 (↑) |
| | RMSE | 24.63 (↑) | 24.50 (↓) |
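The peak/off-peak split in Table 5 requires labeling each 5-minute timestep by time of day. The paper's exact peak-hour definition is not restated here; the sketch below assumes hypothetical morning (07:00–09:00) and evening (17:00–19:00) windows:

```python
STEPS_PER_DAY = 288  # 5-minute sampling

def is_peak(step, peaks=((7, 9), (17, 19))):
    # Map a global timestep index to an hour-of-day, then test it against
    # the (assumed) peak windows; `peaks` holds [start, end) hours.
    hour = (step % STEPS_PER_DAY) / 12  # 12 five-minute steps per hour
    return any(lo <= hour < hi for lo, hi in peaks)

print(is_peak(96))  # 08:00 -> True
print(is_peak(0))   # 00:00 -> False
```

Evaluating the trained model separately on the two index sets produced by such a predicate yields per-scenario metrics like those in Table 5.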
Table 6. The prediction accuracy of the proposed model under different missing patterns for PEMS04 and PEMS08. Percentages and upward arrows indicate the relative increase in each error metric under the given missing pattern compared with the corresponding value in Table 3.

| Datasets | Metrics | Point Missing | Block Missing |
|---|---|---|---|
| PEMS04 | MAE | 19.88 (2.6% ↑) | 19.61 (1.3% ↑) |
| | MAPE (%) | 12.95 (0.9% ↑) | 12.89 (0.5% ↑) |
| | RMSE | 32.27 (2.5% ↑) | 32.30 (2.6% ↑) |
| PEMS08 | MAE | 15.33 (1.1% ↑) | 15.40 (1.5% ↑) |
| | MAPE (%) | 9.75 (1.0% ↑) | 9.92 (2.8% ↑) |
| | RMSE | 24.85 (1.1% ↑) | 24.84 (1.0% ↑) |
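The two missing patterns in Table 6 are commonly simulated as follows: point missing drops entries independently at random, while block missing drops contiguous temporal segments per sensor. A minimal sketch under that common convention (the 10% rate and one-hour block length are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

def point_missing(T, N, rate):
    # each (timestep, sensor) entry is dropped independently
    return rng.random((T, N)) < rate

def block_missing(T, N, rate, block_len=12):
    # drop contiguous blocks (12 x 5 min = one hour) at random sensors
    # until the overall missing rate is reached
    mask = np.zeros((T, N), dtype=bool)
    target = int(rate * T * N)
    while mask.sum() < target:
        t = rng.integers(0, T - block_len + 1)
        n = rng.integers(0, N)
        mask[t:t + block_len, n] = True
    return mask

pm = point_missing(288, 170, 0.10)  # one day of PEMS08-sized data
bm = block_missing(288, 170, 0.10)
```

Both masks remove roughly the same fraction of data, but block missing concentrates the gaps in time, which is why it typically degrades accuracy more for metrics sensitive to sustained outages.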
Table 7. The prediction accuracy (MAE) of G2T, G2T-noGAT, and G2T-noGCN under different traffic scenarios of PEMS04.

| Traffic Scenarios | G2T | G2T-noGAT | Diff 1 | Ratio 1 | G2T-noGCN | Diff 2 | Ratio 2 |
|---|---|---|---|---|---|---|---|
| Complete | 19.36 | 19.61 | 0.25 | - | 19.98 | 0.62 | - |
| Peak hours | 23.36 | 26.76 | 3.40 | 13.60 | 27.02 | 3.66 | 5.90 |
| Off-peak hours | 8.52 | 9.23 | 0.71 | 2.84 | 9.93 | 1.41 | 2.27 |

Share and Cite

MDPI and ACS Style

Su, X.; Li, P.; Cai, Z.; Guo, L.; Zhang, B. Mixed-Graph Neural Network for Traffic Flow Prediction by Capturing Dynamic Spatiotemporal Correlations. ISPRS Int. J. Geo-Inf. 2025, 14, 379. https://doi.org/10.3390/ijgi14100379


