Gated Recurrent Graph Convolutional Attention Network for Traffic Flow Prediction

Abstract: Traffic flow prediction is an important function of intelligent transportation systems. Accurate prediction results allow traffic management to issue early congestion warnings so that drivers can avoid congested roads, thus directly reducing the average driving time of vehicles, which means lower greenhouse gas emissions. However, traffic flow data has complex spatial and temporal correlations, which makes it challenging to predict traffic flow accurately. A Gated Recurrent Graph Convolutional Attention Network (GRGCAN) for traffic flow prediction is proposed to solve this problem. The model consists of three components with the same structure, each of which contains one temporal feature extractor and one spatial feature extractor. The temporal feature extractor first introduces a gated recurrent unit (GRU) and uses the hidden states of the GRU combined with an attention mechanism to adaptively assign weights to each time step. In the spatial feature extractor, a node attention mechanism is constructed to dynamically assign weights to each sensor node, and it is fused with the graph convolution operation. In addition, a residual connection is introduced into the network to reduce the loss of features in the deep network. Experimental results of 1-h traffic flow prediction on two real-world datasets (PeMSD4 and PeMSD8) show that the mean absolute percentage error (MAPE) of the GRGCAN model is as low as 15.97% and 12.13%, respectively, and that its prediction accuracy and computational efficiency are better than those of the baselines.


Introduction
In the urbanization process of countries all over the world, car ownership has been rising [1]. While private cars have brought convenience to the lives of residents, they have also created serious traffic congestion problems and contributed to higher greenhouse gas emissions [2]. To solve this problem, many countries have begun to promote the construction of intelligent transportation systems (ITS) [3].
ITS is an integrated system that applies advanced communication, control, sensing, and computer technologies to solve traffic management and control problems [4]. The primary goal of ITS is to provide a safe, efficient, and reliable transportation environment for traffic participants [5]. In addition, ITS also has important positive effects on the natural environment by promoting transportation technology innovation and reducing greenhouse gas emissions [6,7]. Take traffic flow prediction as an example: as one of the main tasks of ITS [8], accurate traffic flow prediction allows traffic management to release congestion warnings early so that drivers can avoid congested roads, thus directly reducing the average driving time of vehicles, which means lower greenhouse gas emissions [9].
The essence of traffic prediction is to extract the embedded characteristics of the region from the geographical information of the road network and historical traffic data and to predict the traffic flow in a future period accordingly [8]. With the growing emphasis on traffic data, many sensors have been deployed on the roads. The dataset consisting of time series data collected by the sensors and the geographic location of the sensors provides a solid data basis for the field of traffic prediction [10]. Traffic flow data has complex spatial and temporal characteristics. First, traffic flow data is clearly time series data, but its main difference from other time series data is that it is influenced by the spatial structure of the road network [11]. As shown in Figure 1, the traffic flow measured by a sensor at a particular time is related not only to the historical flow at that sensor but also to the relative spatial location of that sensor in the road network. For example, the traffic flow on a highway depends on the traffic flow on the merging ramps as well as the traffic flow on the exiting ramps. Therefore, accurate traffic flow prediction is a challenging problem. It is necessary to model and analyze both the temporal characteristics of traffic flow and the spatial characteristics of the road network in order to effectively improve the prediction accuracy.
Sustainability 2023, 15, 7696
Existing traffic flow prediction methods have yielded promising results, yet several challenges remain. Statistical methods [12–14], traditional machine learning methods [15–19], and early deep learning methods [20–25] tend to treat traffic flow data as time-series data and ignore the influence of the spatial structure of the road network [26]. Methods using convolutional neural networks (CNN) are able to capture spatial features but are only effective for grid structures [27–29]. Methods using advanced techniques such as graph neural networks (GNN) or attention mechanisms can effectively capture the spatial features of the road network, but they often apply only one or two separate techniques and are thus somewhat limited in their ability to extract spatio-temporal features [30–40].
In this article, a gated recurrent graph convolutional attention network (GRGCAN) for traffic flow prediction is proposed, which overcomes the above drawbacks. To capture the spatial and temporal features in traffic flow data, a gated recurrent unit (GRU) [23] combined with an attention mechanism is first used to learn temporal features in the data. An attention mechanism and a graph convolution [31] module are fused to extract spatial features among sensor nodes, and finally, feature loss in the network is reduced by a residual connection.
In brief, our main work is as follows:

• A temporal feature extractor is constructed, which introduces a GRU and uses its hidden states combined with an attention mechanism to adaptively assign weights to each time step.

• A node attention mechanism fused with the graph convolution operation is constructed, which can dynamically assign weights to each sensor node. A spatial feature extractor based on this method is used to synthetically extract spatial features of traffic flow data from a graph-based road network structure. In addition, a residual connection is introduced into the network to reduce the loss of features in the deep network.

• To test the effectiveness of the proposed GRGCAN, the model and several other baselines are applied to several real-world traffic flow datasets. The results show that the GRGCAN can make accurate predictions of traffic flows with higher prediction accuracy than the baselines. In addition, the GRGCAN does not require module reuse and thus has high training efficiency.
The rest of the article is organized as follows. Section 2 reviews the existing traffic flow prediction methods and their shortcomings. Section 3 introduces the definition of the traffic flow prediction problem, followed by a detailed description of the structure of the model by introducing three feature extractors. Section 4 gives the details of the experiment, including the datasets we used and how they were preprocessed, the experimental settings, the evaluation metrics, and the baselines for comparison, followed by an analysis of the experimental results. Section 5 concludes the article and discusses future work.
Early traffic flow prediction works generally use statistical methods. Hamed et al. [12] used the autoregressive integrated moving average (ARIMA) method to develop a time series model to predict the short-term traffic flow on urban arterials. Williams et al. [13] modeled univariate traffic flow data as a seasonal ARIMA process. Zivot et al. [14] used vector autoregressive (VAR) models for the prediction of multivariate time series. These statistical methods consider traffic flow data as mere time series data and make a large number of assumptions about the traffic flow system, and therefore have major limitations and poor prediction accuracy.
With the rise of machine learning, these algorithms have been applied to traffic flow prediction. Ding et al. [15] first applied a support vector machine (SVM) to traffic flow time series prediction and made the prediction of short-term traffic flow more effective. Sun et al. [16] proposed a Bayesian network-based traffic flow prediction method in which the traffic flow between adjacent roads in a traffic network is modeled as a Bayesian network. The joint probability distribution between the cause node (the data used for prediction) and the effect node (the data to be predicted) is described as a Gaussian mixture model (GMM), with its parameters estimated by the competitive expectation maximization (CEM) algorithm. Jeong et al. [17] proposed an online learning weighted support-vector regression (OLWSVR) model based on support-vector regression, which can make effective predictions of short-term traffic flow. Johansson et al. [18] used a random forest as a base model for time series prediction, which allows for determining the size of the prediction intervals by using out-of-bag estimates instead of requiring a separate calibration set. Zheng et al. [19] proposed a method based on the k-nearest neighbor (KNN) algorithm to predict short-term traffic flow, which has the advantage of being insensitive to extreme values. However, these methods have difficulty capturing non-linear features in the data.
Due to the significant development of computer performance in recent years, deep learning methods with the ability to process large-scale data and extract non-linear features are widely used in traffic flow prediction. Hua et al. [20] used a feedforward neural network for the first time to predict traffic flow, showing the great potential of deep learning methods in this field. Recurrent neural networks (RNN) [21] are a class of neural networks that process sequential data inputs, and RNNs and their variants, long short-term memory (LSTM) [22] networks and GRU [23] networks, are commonly used to process time series data. For example, Fu et al. [24] used LSTM and GRU to predict short-term traffic flow and showed that both LSTM and GRU achieved better accuracy compared to statistical methods.
The above machine learning and deep learning methods have improved the prediction accuracy of traffic flow compared with statistical methods, but they are still based on the analysis of temporal features of traffic flow data and ignore spatial features [25].
As the understanding of traffic flow has deepened, the complex spatial characteristics it contains, which derive from the spatial structure of the road network, have been recognized. CNNs [26] are models that are commonly used to extract local features of images. Ma et al. [27] converted traffic flow data into images and then applied CNN to them to extract features of the traffic flow for prediction. Yang et al. [28] combined CNN and LSTM to construct the ConvLSTM model, which can predict future traffic flow in the absence of data.
However, CNN can only extract features from grid-structured data, which makes it difficult to handle road network structures with non-Euclidean properties. This problem is solved by the advent of GNN, which can represent arbitrary graph structures by adjacency matrices to extract features of non-Euclidean data and is, therefore, more suitable for application to traffic networks. Graph convolutional neural networks (GCN) [29] apply convolutional operations to graph structures and can effectively extract features of graphs. Defferrard et al. [30] proposed ChebNet, which uses Chebyshev polynomial approximation to compute the graph convolution and substantially improves the computational efficiency of GCN. Within the field of traffic flow prediction, GCN is often fused with other deep learning methods to extract spatio-temporal features of the data simultaneously. Zhao et al. [31] combined GCN and GRU and proposed the temporal graph convolutional network (T-GCN), which can obtain the spatio-temporal correlation from traffic data. Yu et al. [32] proposed the spatio-temporal graph convolutional network (STGCN) consisting of ST-Conv blocks, which captures spatio-temporal correlations through GCN and CNN in each ST-Conv block. Geng et al. [33] proposed the spatio-temporal multigraph convolution network (ST-MGCN), which uses multigraph convolution to capture different types of correlations between regions. Ge et al. [34] designed the global spatial-temporal graph convolutional network (GSTGCN) for urban traffic prediction, in which temporal features are extracted using 1D CNN, and residual connectivity and spatial features are extracted using GCN, considering the influence of external factors. Wei et al. [35] proposed the novel spatial-temporal graph synchronous aggregation model (STGSA), which constructs the time dependency in time series as a graph with reference to the spatial graph and aggregates it with the spatial graph to extract spatio-temporal features. However, features may be lost in the process of graph construction and aggregation.
The attention mechanism is a method for extracting key information from data, which is widely used in the fields of image processing [36] and natural language processing [37] and has been used in recent years in the field of traffic flow prediction. The ST-MetaNet proposed by Liang et al. [38] has a meta-graph attention network to capture diverse spatial correlations and a meta-recurrent neural network to consider diverse temporal correlations. Attention-based spatial-temporal graph convolutional networks (ASTGCN) proposed by Guo et al. [39] use a spatio-temporal attention mechanism combined with spatio-temporal convolution, which allows dynamic learning of correlations between space and time. The spatial-temporal attention wavenet (STAWnet) proposed by Tian et al. [40] applies temporal convolution and self-attention networks to capture the spatio-temporal features of the data without prior knowledge of the graph.
Inspired by the above studies and considering the complex spatio-temporal characteristics of traffic flow data, we construct the model using GRU, an attention mechanism, GCN, and CNN together.

Problem Definition
In this study, the road network is defined by the graph $G = (V, E, A)$, where $V$ is a finite set of $|V| = N$ traffic flow sensor nodes; $E$ is the set of edges between nodes in graph $G$, representing the connectivity between nodes; and $A \in \mathbb{R}^{N \times N}$ is the normalized adjacency matrix of graph $G$, representing the direction and distance between nodes. In graph $G$, the graph signal at time step $t$ is $X_t = \{x_t^1, \ldots, x_t^N\} \in \mathbb{R}^{N \times F}$, where $x_t^n$ ($n \in \{1, \ldots, N\}$) are all the features collected by the $n$-th sensor at time step $t$, and $F$ is the number of features observed at each node.
The goal of traffic flow prediction is to find a model $f_\theta(\cdot)$, where $\theta$ are learnable parameters. The model takes the historical traffic flow sequence of length $T$ and the adjacency matrix $A$ as inputs and gives predictions for the next $T'$ time steps. The input sequence is denoted as $\chi = \{X_{t-T+1}, \ldots, X_t\} \in \mathbb{R}^{N \times F \times T}$ and the output sequence is denoted as $\{X_{t+1}, \ldots, X_{t+T'}\} \in \mathbb{R}^{N \times F \times T'}$:

$$\{X_{t+1}, \ldots, X_{t+T'}\} = f_\theta(X_{t-T+1}, \ldots, X_t; A) = f_\theta(\chi; A) \tag{1}$$

The Architecture of GRGCAN
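The model consumes input and output sequences of the form given in Equation (1). As an illustration only, the following sketch cuts a raw sensor recording into such windows. The function name `make_windows`, the `(steps, N, F)` storage layout, and the toy data are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def make_windows(series, T=12, T_out=12):
    """Split a (steps, N, F) array into (input, target) window pairs.

    Each input has shape (N, F, T) and each target has shape (N, F, T_out),
    matching the tensor layout used in the problem definition.
    """
    steps = series.shape[0]
    inputs, targets = [], []
    for t in range(T, steps - T_out + 1):
        # Move the time axis last: (T, N, F) -> (N, F, T).
        inputs.append(np.transpose(series[t - T:t], (1, 2, 0)))
        targets.append(np.transpose(series[t:t + T_out], (1, 2, 0)))
    return np.stack(inputs), np.stack(targets)

# Toy data (hypothetical): 40 time steps, N = 5 sensors, F = 1 feature.
series = np.arange(40 * 5 * 1, dtype=float).reshape(40, 5, 1)
X, Y = make_windows(series, T=12, T_out=3)
print(X.shape, Y.shape)  # (26, 5, 1, 12) (26, 5, 1, 3)
```

With a 5-min time step, `T=12` corresponds to one hour of history and `T_out=3` to a 15-min horizon.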

Time Feature Extractor
Recurrent neural networks are the most widely used models for extracting features from time series data, but traditional RNNs suffer from vanishing or exploding gradients when the sequence is too long. The advent of LSTM has solved these problems to some extent, but its structure is complex and requires a long computation time. GRU streamlines the unit structure while inheriting the ideas of LSTM, and the accuracy is also improved. Therefore, we choose GRU as the component of the temporal feature extractor in the model. Instead of using GRU directly to predict the time series, the hidden states of GRU are used to obtain the temporal features indirectly. The calculation process is as follows:

where $h_t$ is the output state at time step $t$; $\tilde{h}_t$ is the candidate hidden state at time step $t$; $z_t$ is the update gate, which determines how much information needs to be retained in the current state $h_t$ from the historical state $h_{t-1}$; and $r_t$ is the reset gate, which determines how much information needs to be retained in the candidate hidden state.

For traffic flow prediction, the impact of each historical time step on the future is not equal. To better capture the temporal features in traffic flow data, an attention mechanism is applied to the output states of GRU to adaptively assign weights to each historical time step. The calculation process is as follows:

where $A_{GRU}$ is the weighting matrix for historical time steps; $H = \{h_1, \ldots, h_t\} \in \mathbb{R}^{N \times F \times T}$ is the output state of GRU at $T$ historical time steps; and $W_{q1}$ and $W_{k1}$ are learnable parameters.
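The GRU equations themselves are not reproduced in this text. For reference, one common textbook formulation that agrees with the gate descriptions above is given below. The weight names $W_\ast$, $U_\ast$, $b_\ast$ and the input symbol $x_t$ are assumptions, and some references swap the roles of $z_t$ and $1 - z_t$ in the last line.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= z_t \odot h_{t-1} + \left(1 - z_t\right) \odot \tilde{h}_t
\end{aligned}
```

Here $\sigma(\cdot)$ is the sigmoid function and $\odot$ denotes the element-wise product.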
As shown in Equation (7), the output $\hat{H}_G = \{\hat{h}_{G1}, \ldots, \hat{h}_{GT}\} \in \mathbb{R}^{N \times F \times T}$ is obtained by weighted summation, which will be used as the input of the spatial feature extractor.
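The weighting of time steps followed by a weighted summation can be sketched as below. Because the attention formula is not reproduced in this text, the sketch assumes a scaled dot-product score between query and key projections of the GRU states, normalized by a softmax over historical time steps. The function names and the projection size `d` are hypothetical; only the names `W_q` and `W_k` echo the paper's $W_{q1}$ and $W_{k1}$.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(H, W_q, W_k):
    """Weight GRU output states over time.

    H: (N, F, T) GRU output states; W_q, W_k: (F, d) projections.
    Returns (H_hat, A): H_hat is (N, F, T), A is (N, T, T) with rows summing to 1.
    """
    Hs = np.transpose(H, (0, 2, 1))                                   # (N, T, F)
    Q = Hs @ W_q                                                      # (N, T, d)
    K = Hs @ W_k                                                      # (N, T, d)
    scores = Q @ np.transpose(K, (0, 2, 1)) / np.sqrt(W_q.shape[1])   # (N, T, T)
    A = softmax(scores, axis=-1)
    H_hat = A @ Hs                                                    # weighted sum over time
    return np.transpose(H_hat, (0, 2, 1)), A

rng = np.random.default_rng(0)
N, F, T, d = 4, 8, 12, 16
H = rng.normal(size=(N, F, T))
H_hat, A = temporal_attention(H, rng.normal(size=(F, d)), rng.normal(size=(F, d)))
print(H_hat.shape, A.shape)  # (4, 8, 12) (4, 12, 12)
```

With all-zero projections the scores are equal, so every time step receives weight $1/T$ and the output reduces to the temporal mean, which is a convenient sanity check.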

Spatial Feature Extractor
The extraction of spatial features of road networks has been the key to traffic flow prediction. In general, the spatial structure of the road network is represented by the adjacency matrix, which reflects the location of the sensor nodes, so extracting the spatial features of the road network amounts to extracting the location features of the sensor nodes. A node attention graph convolution operation is proposed to extract spatial features.
The attention mechanism is able to dynamically capture important information in the data. An attention mechanism is applied to the input to adaptively assign weights to each sensor node and capture the correlation between nodes. The calculation process is as follows:

where $A_{Node}$ is the weighting matrix for sensor nodes; $W_{q2}$, $W_{k2}$, $W$, $V$, and $b$ are learnable parameters; and $\sigma(\cdot)$ denotes the sigmoid function.
After that, the spatial features of the road network need to be extracted. The graph convolution based on spectral methods [30] is suitable for traffic flow data with non-Euclidean spatial structures. First, we calculate the normalized Laplacian matrix $L$ of the graph $G$ and perform an eigendecomposition of it:

$$L = I_N - D^{-\frac{1}{2}} A D^{-\frac{1}{2}} = U \Lambda U^{\mathsf{T}}$$

where $I_N$ is an identity matrix; $D$ is the degree matrix of the graph $G$; $U$ is the eigenvector matrix of $L$; and $\Lambda$ is the diagonal matrix consisting of the eigenvalues of $L$.
Based on that, the graph convolution operation $*_G$ of the graph signal $x$ with $C$ filters $g_\theta$ is defined as:

where $K$ is the order of the Chebyshev polynomial; $T_k(\cdot)$ is the Chebyshev polynomial of order $k$; $\beta_k$ denotes the polynomial coefficients, which are also the learnable parameters; $\tilde{\Lambda}$ is the diagonal matrix consisting of adjusted eigenvalues, which ensures that the inputs of the Chebyshev polynomial lie in the range $[-1, 1]$; $\lambda_{max}$ is the maximum eigenvalue of the Laplacian matrix $L$; and $\tilde{L}$ is the Laplacian matrix with adjusted eigenvalues.

In the model, $C$ and $K$ are hyperparameters that need to be set. Similar to convolutional neural networks, the number of filters $C$ is currently set mainly by experience. The value of $C$ is set to 64 following references [39,41]. $K$ is the order of the Chebyshev polynomial, which means that the range of information extraction in the graph convolution is from the 1st to the $K$-th order neighbors around each node [42]. As $K$ increases, the performance of the model improves slightly, but the computational cost also increases. Considering that extracting information from the 1st to 3rd order neighbors of each node provides good performance, and that it is difficult to significantly improve performance by further increasing $K$, the value of $K$ is set to 3.
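The Chebyshev approximation described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's implementation: the adjacency matrix is taken to be symmetric, the eigenvalues are rescaled with the usual $\tilde{L} = 2L/\lambda_{max} - I_N$, and the sum runs over the $K$ terms $T_0, \ldots, T_{K-1}$ as in ChebNet [30]. The function names and the toy path graph are hypothetical.

```python
import numpy as np

def scaled_laplacian(A):
    """Return L_tilde = 2 L / lambda_max - I for a symmetric adjacency matrix A."""
    N = A.shape[0]
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros(N)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    # Normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    L = np.eye(N) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()
    return 2.0 * L / lam_max - np.eye(N)

def chebyshev_polynomials(L_tilde, K):
    """T_0 = I, T_1 = L_tilde, T_k = 2 L_tilde T_{k-1} - T_{k-2}."""
    N = L_tilde.shape[0]
    polys = [np.eye(N), L_tilde.copy()]
    for _ in range(2, K):
        polys.append(2.0 * L_tilde @ polys[-1] - polys[-2])
    return polys[:K]

def cheb_conv(x, polys, beta):
    """x: (N, F_in) graph signal; beta: (K, F_in, C) coefficients. Returns (N, C)."""
    return sum(T_k @ x @ beta[k] for k, T_k in enumerate(polys))

# Toy path graph with 4 nodes (hypothetical), K = 3 and C = 64 as in the text.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L_tilde = scaled_laplacian(A)
polys = chebyshev_polynomials(L_tilde, K=3)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))
beta = rng.normal(size=(3, 2, 64))
out = cheb_conv(x, polys, beta)
print(out.shape)  # (4, 64)
```

Because the polynomials are evaluated by recurrence on $\tilde{L}$, no eigendecomposition is needed at convolution time; only $\lambda_{max}$ is computed once.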
In the above process, we replace the input signal $x$ with $\hat{H}_G$, multiply it with the weight matrix $A_{Node}$ of sensor nodes, and use the rectified linear unit (ReLU) as the activation function. The node attention graph convolution is then calculated as:

where $\hat{H}_N = \{\hat{h}_{N1}, \ldots, \hat{h}_{NT}\} \in \mathbb{R}^{N \times C \times T}$ is the output of this module.
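One way to read the step above is sketched below. How $A_{Node}$ enters the convolution is an assumption here: each Chebyshev term is multiplied element-wise by the attention matrix before it is applied to the signal, modeled on the attention-weighted convolution of ASTGCN [39]. The function name and argument layout are hypothetical.

```python
import numpy as np

def node_attention_cheb_conv(H_G, polys, A_node, beta):
    """Attention-weighted Chebyshev graph convolution followed by ReLU.

    H_G: (N, F, T) temporal extractor output; polys: list of K (N, N) Chebyshev terms;
    A_node: (N, N) node attention weights; beta: (K, F, C) coefficients.
    Returns an array of shape (N, C, T).
    """
    N, F, T = H_G.shape
    C = beta.shape[2]
    out = np.zeros((N, C, T))
    for t in range(T):
        x_t = H_G[:, :, t]                                    # (N, F) signal at step t
        for k, T_k in enumerate(polys):
            # Assumed combination: Hadamard product of T_k and the attention matrix.
            out[:, :, t] += (T_k * A_node) @ x_t @ beta[k]
    return np.maximum(out, 0.0)                               # ReLU

# Tiny check: one identity Chebyshev term, uniform attention, identity coefficients.
H_G = np.array([[[1.0], [-2.0]],
                [[3.0], [4.0]]])                              # (N=2, F=2, T=1)
H_N = node_attention_cheb_conv(H_G, [np.eye(2)], np.ones((2, 2)), np.eye(2)[None])
print(H_N[:, :, 0])
```

In this tiny check the convolution reduces to the identity, so the output is simply the ReLU of the input.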

Adaptive Residual Block
To reduce the loss of spatio-temporal features in the deep network, a residual connection is constructed, which projects the input into the feature space of the output of the spatial feature extractor by a 1 × 1 convolution. After summing with the adaptive residual output, the output $\hat{H} = \{\hat{h}_1, \ldots, \hat{h}_N\} \in \mathbb{R}^{N \times C \times T}$ is obtained by the ReLU function. The calculation process is as follows:

where $\Gamma_{\theta_r}(\cdot)$ denotes the 1 × 1 convolution operation with $\theta_r$ as the parameter, and $W_r$ is a learnable parameter.

Finally, $\hat{H}$ is normalized, and an output that matches the predicted target shape is subsequently obtained through the fully connected layer.
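A minimal sketch of the residual step follows. The exact formula is not reproduced in this text, so the sketch assumes the output is $\mathrm{ReLU}(\hat{H}_N + W_r \odot \Gamma_{\theta_r}(\chi))$, with $W_r$ acting as an element-wise scale. The function name, the shape of `W_r`, and the use of `einsum` to express the 1 × 1 convolution are assumptions.

```python
import numpy as np

def adaptive_residual(X, H_N, Theta_r, W_r):
    """Project the block input to C channels, scale it, add it, and apply ReLU.

    X: (N, F, T) block input; H_N: (N, C, T) spatial extractor output;
    Theta_r: (F, C) weights of the 1x1 convolution; W_r: scale broadcastable to (N, C, T).
    """
    # A 1x1 convolution mixes feature channels independently at every node and time step.
    residual = np.einsum("nft,fc->nct", X, Theta_r)
    return np.maximum(H_N + W_r * residual, 0.0)

X = np.ones((2, 3, 4))                    # N = 2, F = 3, T = 4
H_N = -5.0 * np.ones((2, 5, 4))           # C = 5
Theta_r = np.ones((3, 5))
H_weak = adaptive_residual(X, H_N, Theta_r, W_r=np.ones((2, 5, 4)))
H_strong = adaptive_residual(X, H_N, Theta_r, W_r=2.0 * np.ones((2, 5, 4)))
print(H_weak[0, 0, 0], H_strong[0, 0, 0])
```

The learnable scale lets the network decide how much of the projected input to carry forward.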

Multi-Component Fusion
The GRGCAN model contains three structurally identical components, whose outputs are $\hat{H}_h$, $\hat{H}_d$, and $\hat{H}_w$, respectively. These three outputs are of different importance to the prediction results [39]. For example, the importance of the day-period component and the week-period component will be higher when predicting traffic flow on weekday morning peaks compared to predicting traffic flow on suburban roads. Therefore, a learnable weight is assigned to each output to learn the fusion method from the historical traffic flow data. The calculation process is as follows:

where $W_h$, $W_d$, and $W_w$ are learnable parameters.
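The fusion can be sketched as a weighted sum of the three component outputs. The element-wise form below follows the fusion used in ASTGCN [39] and is an assumption here, since the formula is not reproduced in this text. The function name and the toy shapes are hypothetical.

```python
import numpy as np

def fuse(H_h, H_d, H_w, W_h, W_d, W_w):
    """Element-wise weighted sum of the three component outputs."""
    return W_h * H_h + W_d * H_d + W_w * H_w

# Toy outputs for N = 3 nodes and a 12-step horizon (hypothetical values).
shape = (3, 12)
H_h, H_d, H_w = np.ones(shape), 2.0 * np.ones(shape), 3.0 * np.ones(shape)
W_h, W_d, W_w = np.full(shape, 0.5), np.full(shape, 0.25), np.full(shape, 0.25)
Y_hat = fuse(H_h, H_d, H_w, W_h, W_d, W_w)
print(Y_hat[0, 0])  # 1.75
```

Because each weight has the same shape as the outputs, the importance of each component can differ by node and by predicted time step.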

Experiment

Datasets and Preprocessing
To test the performance of the GRGCAN model, we conducted experiments on two real-world traffic flow datasets, PeMSD4 and PeMSD8. These PeMS datasets [43] are collected by the Caltrans Performance Measurement System; they record traffic data for major freeways in California over a period, updated every 5 min, i.e., the time step is 5 min. The data collected by redundant sensors were removed according to the method of [39] to ensure that the distance between any adjacent sensors is larger than 3.5 miles. Processed PeMSD4 records traffic flow data from 307 sensors in the California Bay Area from 1 January 2018 to 28 February 2018. Processed PeMSD8 records traffic flow data from 170 sensors in San Bernardino, California, from 1 July 2016 to 31 August 2016.
The dataset is divided into a training set, a validation set, and a test set in the ratio of 6:2:2 in chronological order. In addition, to accelerate the convergence of the model during training, the data were transformed by zero-mean normalization so that their mean is zero. The calculation process is as follows:

$$x = \tilde{x} - \mathrm{mean}(\tilde{x})$$

where $x$ is the processed traffic flow data, $\tilde{x}$ is the raw traffic flow data, and $\mathrm{mean}(\cdot)$ denotes the mean value operation.
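The chronological 6:2:2 split and the zero-mean normalization can be sketched as follows. Computing the mean on the training split only and reusing it for the other splits is an assumption made here to avoid leaking information from later data; the text does not state which mean is used. The function names and the toy array are hypothetical.

```python
import numpy as np

def chronological_split(data, ratios=(6, 2, 2)):
    """Split along the first (time) axis without shuffling."""
    n = data.shape[0]
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]

def zero_mean(train, *others):
    """Subtract the training mean from every split (assumed convention)."""
    mu = train.mean()
    return mu, [train - mu] + [o - mu for o in others]

data = np.arange(10, dtype=float)         # stand-in for a flow series
train, val, test = chronological_split(data)
mu, (train_n, val_n, test_n) = zero_mean(train, val, test)
print(len(train), len(val), len(test), mu)  # 6 2 2 2.5
```

Predictions made on normalized data are mapped back to the original scale by adding `mu` before the evaluation metrics are computed.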

Experiment Settings
We built the GRGCAN model using the deep learning framework PyTorch and conducted experiments on a computer with a 12th Gen Intel(R) Core(TM) i7-12700H 2.30 GHz CPU, NVIDIA GeForce RTX3070 Laptop GPU, and 16G-DDR5 RAM.
We use 1 h of historical traffic flow data as input, i.e., the input sequence length T is 12, to predict the traffic flow over the next 15 min, 30 min, and 1 h, i.e., the output sequence length T′ is 3, 6, and 12, respectively. In the training process, we used the mean absolute error (MAE) as the loss function (L1 loss function) and the adaptive moment estimation (Adam) optimizer. To balance training efficiency and equipment limitations, the learning rate was set to 0.001, the batch size was set to 32, and the model was trained for 100 epochs.

Evaluation Metrics
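We used three common metrics for evaluating deep learning models to assess the performance of the GRGCAN model: mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). At time step $t$, they are calculated as follows:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$

$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{\hat{y}_i - y_i}{y_i}\right|$$

where $N$ denotes the number of nodes in graph $G$; $\hat{y}_i$ is the predicted value of traffic flow; and $y_i$ is the true value of traffic flow.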
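The three metrics can be computed with a few lines of NumPy. This sketch uses the standard definitions; the function names and the toy values are hypothetical, and the MAPE function assumes that no true value is zero.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_pred - y_true))

def rmse(y_true, y_pred):
    """Root mean square error."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; y_true must not contain zeros."""
    return np.mean(np.abs((y_pred - y_true) / y_true)) * 100.0

y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

For these toy values the MAE is 10 vehicles, while the RMSE is larger (about 12.9) because it penalizes the 20-vehicle error more heavily.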

Baselines
To test the performance of the proposed GRGCAN model, the following seven models were used as baselines.

• GRU [23]: Gated recurrent unit network, which treats traffic flow data as a simple time series.

• T-GCN [31]: Temporal graph convolutional network, a model that uses a two-layer GCN to extract spatial features and GRU to extract temporal features.

• MSTGCN [39]: Multi-component spatial-temporal graph convolutional networks, a model that uses GCN and CNN to extract spatial and temporal features of the data, respectively.

• ASTGCN [39]: Attention-based spatial-temporal graph convolutional networks, an MSTGCN-based model that uses a spatio-temporal attention mechanism and spatio-temporal convolution to extract features.

• STAWnet [40]: Spatial-temporal attention wavenet, a model that applies temporal convolution to capture temporal features and a self-attention network to capture dynamic spatial features without requiring prior knowledge of the graph.

Result Analysis
Table 1 shows the performance of the GRGCAN model and the baseline models on the two datasets. Considering that the warning period of traffic congestion is roughly 30 min [44], the time steps to be predicted are set to 3 (15 min), 6 (30 min), and 12 (1 h). Figure 3 shows the prediction results of the GRGCAN model for the traffic flow during 24 h on both datasets.
Based on the experimental results, the following were observed. (1) GRGCAN achieves excellent accuracy on both datasets and performs best on most metrics, especially when the number of time steps to be predicted is small. (2) All models that consider the spatial characteristics of the data outperform the GRU net, which implies that the spatial information of the traffic data is important for prediction. (3) The trends of traffic flow predicted by GRGCAN are generally consistent with the trends of the actual values. (4) As shown in Figure 3a, the predicted value at the outlier in the dataset is not disturbed by the outlier, which indicates the model's good robustness.
Bold represents the best performance.
Due to the real-time nature of traffic flow, the computational efficiency of traffic flow prediction models is important. As shown in Table 2, we compared the training efficiency of GRGCAN and the baselines on the PeMSD4 dataset (except GRU, which does not consider spatial features), measured by the average training time of one epoch for each model. GRGCAN achieves the best training efficiency, which reflects its streamlined and effective structure. In T-GCN, two repeated graph convolution operations are performed, which leads to a rise in computational effort. In MSTGCN and ASTGCN, the ST block needs to be reused twice to achieve better results, which leads to a rise in the number of parameters. STSGCN needs to construct the localized spatial-temporal graph first, and STAWnet uses a self-learning adjacency matrix, both of which lead to additional computations.
Owing to its computational efficiency and short-term prediction accuracy, GRGCAN is well suited to real-time traffic regulation and similar tasks.

Ablation Experiment
To verify the validity of each module in the GRGCAN model, the temporal feature extractor, the spatial feature extractor, and the adaptive residual block were each removed from the model in turn. Prediction experiments were then conducted on the PeMSD4 dataset for the future 1-h traffic flow (T = 12). We name the three degenerate models GRGCAN-1, GRGCAN-2, and GRGCAN-3, respectively. The experimental results are shown in Table 3. The original model outperforms the three degenerate models, so the temporal feature extractor, spatial feature extractor, and adaptive residual block all contribute positively to the model's performance. Among them, the spatial feature extractor has the most significant impact, which indicates that the node attention mechanism helps to effectively extract the spatial features of the traffic flow data.

Conclusions
To support the construction of intelligent transportation systems, relieve traffic pressure, and reduce greenhouse gas emissions, a GRGCAN model for traffic flow prediction is proposed. In this model, GRU and GCN are combined with attention mechanisms to adaptively extract the spatio-temporal features of traffic flow, and an adaptive residual connection reduces the loss of features in the deep network. The experimental results of 1-h traffic flow prediction on two real-world datasets, PeMSD4 and PeMSD8, show that the GRGCAN model achieves a MAPE as low as 15.97% and 12.13%, respectively, and outperforms the baseline models in terms of accuracy. Notably, the streamlined model does not reuse structures, which results in efficient computation: the average training time per epoch is as low as 19.71 s. In addition, the ablation experiment shows that the temporal feature extractor, the spatial feature extractor, and the adaptive residual connection each have a positive effect on the performance of the model. In conclusion, GRGCAN is a novel traffic prediction model that can effectively capture the spatio-temporal features in graph-structured traffic data and provide accurate prediction results.
In future research, we hope to construct more accurate models by considering factors that affect traffic flow, such as weather [45], epidemics [46], or drivers' driving styles [47]. In addition, we will further consider the impact of recurring holidays on traffic and explore techniques such as continual learning [48]. Predicting the greenhouse gas emissions generated by road traffic on this basis could contribute to a more environmentally friendly intelligent transportation system.

Figure 1. Spatial and temporal correlation of transportation networks.

Figure 2 demonstrates the structure of the GRGCAN model. The GRGCAN model consists of three independent components with the same structure, and their inputs are historical time series, day-period time series, and week-period time series, respectively. Each component consists of three main parts: (1) temporal feature extractor: for extracting temporal features of traffic flow data; (2) spatial feature extractor: for extracting spatial features of traffic flow data; (3) adaptive residual block: for adaptively reducing feature loss in deep networks.
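The temporal feature extractor weights the GRU hidden states at each time step. A minimal sketch of one common form of such attention is given below. The dot-product scoring against a learnable `query` vector, the function names, and the aggregation into a single context vector are assumptions made for illustration; the exact scoring function and output shape used in GRGCAN are not specified in this section.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(hidden_states, query):
    """Weight each time step's hidden state and return (weights, context).

    hidden_states: list of T vectors, one per time step.
    query: hypothetical learnable vector used to score each step.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Toy hidden states for T = 3 time steps with hidden size 2.
states = [[0.1, 0.2], [0.4, 0.1], [0.9, 0.7]]
weights, context = temporal_attention(states, query=[1.0, 1.0])
print("weights:", [round(w, 3) for w in weights])
print("context:", [round(c, 3) for c in context])
```

The weights sum to one, so time steps with higher scores contribute proportionally more to the aggregated representation.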

Figure 2. Structure of the GRGCAN model. (1) Temporal feature extractor: consists of a GRU and an attention module applied to the hidden state of the GRU. (2) Spatial feature extractor: consists of a GCN module fused with a node attention mechanism. (3) Adaptive residual block: consists of a 1 × 1 convolutional network and a fully connected layer.

b_h are learnable parameters; σ(·) denotes the sigmoid function; ⊙ denotes the Hadamard product.
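To illustrate how the sigmoid gates and the Hadamard product enter the recurrence, the sketch below implements one step of a standard GRU cell. The parameter names (`W_*`, `U_*`, `b_*`), the dictionary layout, and the convention that the update gate weights the candidate state are assumptions; the paper's own GRU equations are not reproduced in this section and may use a different convention.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gru_step(x, h_prev, params):
    """One GRU update; params holds W_*, U_*, b_* for gates z, r and candidate h."""

    def affine(name, hidden):
        wx = matvec(params["W_" + name], x)
        uh = matvec(params["U_" + name], hidden)
        return [a + b + c for a, b, c in zip(wx, uh, params["b_" + name])]

    z = [sigmoid(v) for v in affine("z", h_prev)]  # update gate
    r = [sigmoid(v) for v in affine("r", h_prev)]  # reset gate
    gated = [r_i * h_i for r_i, h_i in zip(r, h_prev)]  # Hadamard product
    h_cand = [math.tanh(v) for v in affine("h", gated)]  # candidate state
    # Convention assumed here: z weights the candidate state.
    return [(1 - z_i) * h_i + z_i * c_i for z_i, h_i, c_i in zip(z, h_prev, h_cand)]


# Toy parameters: input size 1, hidden size 2 (illustrative values only).
params = {
    "W_z": [[0.5], [-0.3]], "U_z": [[0.1, 0.0], [0.0, 0.1]], "b_z": [0.0, 0.0],
    "W_r": [[0.2], [0.4]], "U_r": [[0.0, 0.1], [0.1, 0.0]], "b_r": [0.0, 0.0],
    "W_h": [[0.7], [-0.6]], "U_h": [[0.2, 0.0], [0.0, 0.2]], "b_h": [0.0, 0.0],
}
h = [0.0, 0.0]
for x_t in ([0.2], [0.4], [0.6]):
    h = gru_step(x_t, h, params)
print("final hidden state:", [round(v, 4) for v in h])
```

In the model, the sequence of hidden states produced by such a recurrence is what the temporal attention module operates on.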

Figure 3 shows the prediction results of the GRGCAN model for the traffic flow over 24 h on both datasets.

Figure 3. Visualization of traffic flow prediction results of GRGCAN on the PeMSD4 and PeMSD8 datasets. (a) 24-h prediction results on the PeMSD4 dataset; (b) 24-h prediction results on the PeMSD8 dataset.

Table 1. Performance of each model at a given prediction time step.

Table 2. Average training time of each model.
Bold represents the best performance.

Table 3. Results of ablation experiments.