Augmented Multi-Component Recurrent Graph Convolutional Network for Traffic Flow Forecasting

Abstract: Due to the periodic and dynamic changes of traffic flow and the spatial-temporal coupling interaction of complex road networks, traffic flow forecasting is highly challenging and rarely yields satisfactory prediction results. In this paper, we propose a novel methodology named the Augmented Multi-component Recurrent Graph Convolutional Network (AM-RGCN) for traffic flow forecasting by addressing the problems above. We first introduce the augmented multi-component module to the traffic forecasting model to tackle the problem of the periodic temporal shift emerging in traffic series. Then, we propose an encoder-decoder architecture for spatial-temporal prediction. Specifically, we propose the Temporal Correlation Learner (TCL), which incorporates one-dimensional convolution into LSTM to utilize the intrinsic temporal characteristics of traffic flow. Moreover, we combine the TCL with a graph convolutional network to handle the spatial-temporal coupling interaction of the road network. Similarly, the decoder consists of the TCL and convolutional neural networks to obtain high-dimensional representations from multi-step predictions based on spatial-temporal sequences. Extensive experiments on two real-world road traffic datasets, PEMSD4 and PEMSD8, demonstrate that our AM-RGCN achieves the best results.


Introduction
Traffic flow forecasting plays a vital role in Intelligent Transportation Systems (ITSs) [1]. Given a road network, traffic flow forecasting aims to predict the trends of traffic flow in the near future based on historical flow data. As traffic congestion is becoming a serious problem in most cities, how to accurately predict traffic flow is of great significance to transportation management, environmental protection, and public safety.
Traffic flow forecasting is a typical spatial-temporal problem where both spatial features and temporal features should be considered thoroughly. Early forecasting approaches [2,3] mainly made use of traditional statistical models to mine the implicit rules hidden in the data. However, these models are too simple to capture the non-linearity of the traffic series. Contrary to statistical methods, classical machine learning-based approaches [4,5] were able to learn more complex relationships but required careful feature engineering which could be laborious and tedious for human beings. Inspired by advances in deep learning, some attempts [6,7] tried to make predictions based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). However, classical CNNs cannot exploit the non-Euclidean relationship of irregular traffic road networks because of their regular convolutional operations. From this perspective, Graph Convolutional Networks (GCNs) have been introduced to traffic flow forecasting considering their ability to deal with graph data. Moreover, some research works [8][9][10][11] combined GCNs with RNNs and CNNs in order to capture spatial and temporal characteristics, respectively.
Although promising advances have been made in traffic flow forecasting, it is still very challenging to achieve highly accurate predictions, mainly due to the two following reasons: (a) the characteristics of periodic temporal shifts in traffic flow are not taken into consideration and (b) the spatial-temporal correlations are not captured effectively.
For the former, most existing approaches [12][13][14][15][16][17] only paid attention to the periodicity of the traffic flow regardless of the periodic temporal shift, which resulted in a non-comprehensive capture of temporal characteristics. Thus, the robustness and accuracy achieved by these models cannot meet expectations. To illustrate, although daily periodicity and weekly periodicity are commonly recognized as strong contributors to traffic flow forecasting, the periodicity of the traffic series is dynamic rather than static because of various factors such as complicated weather or real-time traffic conditions. A typical example of the periodic temporal shift is shown in Figure 1 (data source: Didi's real-world traffic flow data ranging from 31 October 2019 to 30 November 2019 in Beijing). The daily peak hours in Figure 1a are usually between 6:00 p.m. and 7:00 p.m., but could vary from 5:00 p.m. to 9:00 p.m., depending on whether it is a workday and on other factors such as abnormal weather and traffic congestion. Similarly, in Figure 1b, fluctuation can be observed in the weekly numbers. Therefore, it is difficult for current methods, which model only the static characteristic of periodicity, to deal with the dynamic and complex situations in actual traffic networks.

For the latter, traffic data have tightly coupled spatial-temporal correlations, but recent studies [9,13,18] have not considered the mutual dependence between spatial features and temporal features in traffic flow. They adopted the GCN module to represent the spatial features of the whole traffic network at the same time step, and the CNN module to process the temporal features of each road at different time steps. This solution decouples the spatial-temporal correlations and loses some implicit factors, such as the influence of each road on its surrounding roads at different time steps. Thus, the accuracy of model prediction is lower than expected.
To address the two above-mentioned challenges, we propose a deep learning-based framework: the Augmented Multi-component Recurrent Graph Convolutional Network (AM-RGCN) for traffic flow forecasting. We first come up with an augmented multi-component module to capture the periodic temporal shift that emerges in the traffic flow series. Then, we present an encoder-decoder architecture where the encoder aims to capture the spatial-temporal correlations and the decoder obtains high-dimensional representations from multi-step predictions based on spatial-temporal sequences. A fusion module is finally developed to produce prediction results by incorporating the high-dimensional representations. In summary, this paper makes the following contributions:
• We propose an augmented multi-component module to capture the characteristics of the periodic temporal shift in traffic series by adding the temporal shift representations to the periodic representations.
• We propose the Temporal Correlation Learner (TCL), which incorporates one-dimensional convolution into LSTM, and combine it with graph convolution in an encoder-decoder architecture to handle the spatial-temporal correlations in the road network.
• Extensive experiments on two real-world traffic datasets, PEMSD4 and PEMSD8, verify that our AM-RGCN achieves state-of-the-art results compared with existing approaches.
The remainder of this article is organized as follows. The related works on traffic flow forecasting are discussed in Section 2. Section 3 introduces our proposed approaches in detail. In Section 4, we conduct comparable experiments with AM-RGCN on real-world traffic datasets and analyze the results. Finally, the conclusions of this study are provided in Section 5.

Traffic Flow Forecasting
Traffic flow forecasting has been extensively researched in ITSs. Existing approaches to traffic flow forecasting can be mainly divided into traditional approaches and deep learning approaches. Specifically, the traditional approaches can be further classified into two categories: parametric and non-parametric models [19]. Parametric models, such as Auto-Regressive Integrated Moving Average (ARIMA)-based approaches [2] and Kalman Filtering (KF)-based approaches [3], employ the historical traffic series to statistically mine the implicit rules in time series. However, these approaches are too simple to capture the non-linearity in traffic data. Non-parametric models, such as K-Nearest Neighbor (KNN)-based approaches [5], Support Vector Regression (SVR)-based approaches [4], and Gradient-Boosted Regression Tree (GBRT)-based approaches [20], employ the principle of empirical risk minimization, which may suffer from the overfitting problem.
Deep learning has made great achievements in recent years. Consequently, many researchers have applied deep learning approaches to traffic flow forecasting. Some attempts [6,21] applied Long Short-Term Memory (LSTM) [22] and Gated Recurrent Units (GRUs) [23] to traffic flow forecasting and achieved remarkable results. However, these approaches mainly consider the characteristics of time series and neglect the spatial features in traffic networks. In order to handle the spatial features simultaneously, Ma et al. [24] proposed a deep CNN model which exploits a two-dimensional spatial-temporal matrix to capture spatial-temporal information. Yu et al. [25] employed a CNN to obtain the spatial features and then applied LSTM for time series analysis. However, the CNN-based approaches mainly extract features in Euclidean space rather than non-Euclidean space. To tackle this problem, GCNs [26] were introduced into traffic flow forecasting. Spatial-Temporal GCNs (STGCNs) [9] and Temporal GCNs (T-GCNs) [27] both adopt GCNs to obtain spatial features and then capture temporal features via one-dimensional CNNs or GRUs, respectively. These approaches generally perform better in prediction than approaches without GCNs. Song et al. [28] proposed the Spatial-Temporal Synchronous GCN (STSGCN) to capture the heterogeneity in spatial-temporal networks and achieved notable results. Bai et al. [29] integrated GCNs into GRUs to capture spatial-temporal relations simultaneously. Some studies [30][31][32] attempted to introduce the attention mechanism to capture the spatial-temporal dependencies.
The GCN-based approaches above mainly use recent data to represent temporal information while ignoring the periodicity in traffic flow. Roy et al. [14] proposed the Simplified Spatio-temporal Traffic GNN (SST-GNN) to capture periodic traffic patterns by adopting a novel position encoding scheme. Chen et al. proposed the Temporal Directed GCN (T-DGCN) [15], which utilizes a novel global position encoding strategy to capture temporal dependence such as daily periodicity. Ou et al. [16] proposed Spatial-Temporal Parallel TrellisNets (STP-TrellisNets), which use the Periodicity TrellisNet (P-TrellisNet) module to capture periodicity in traffic series. These models considered daily periodicity while ignoring weekly periodicity. Although some scholars proposed the Multi-component STGCN (MSTGCN) [13], the Attention-based STGCN (ASTGCN) [18], and the Information Geometry and Attention-based GCN (IGAGCN) [17] to represent the daily and weekly periodicity, they left out the impacts of the periodic temporal shift. Yao et al. [33] proposed the Spatial-Temporal Dynamic Network (STDN) for the periodic temporal shift, which exploits an LSTM network with an attention mechanism to capture the long- and short-term dependencies in traffic series. However, the model is deployed in Euclidean space and focuses on the shift in daily periodicity. In addition, most of the above-mentioned approaches adopt different models to capture spatial and temporal features separately. Accordingly, they fail to acquire the spatial-temporal correlations effectively.

Graph Convolution Networks
Traditional approaches [24,25] mainly divided the traffic network into grids and employed CNNs to capture spatial features, which ignored the topological connectivity of traffic networks. Graph convolutional approaches can handle graph-structured data by aggregating the neighbors' information, which are effective approaches to extract complex spatial topological relationships in traffic networks.
Graph convolutional networks fall into two categories: spectral-based and spatial-based. Spatial-based approaches directly conduct convolution operations on the nodes of the graph. GraphSAGE [34] introduced an aggregation function to define graph convolution. The Graph Attention Network (GAT) [35] exploited attention layers to adjust the importance of each node when applying aggregation functions. Spectral-based approaches [26,36,37] employ a Laplacian matrix to perform convolution operations on graphs in the Fourier domain. According to the selection of convolution kernels, spectral-based approaches mainly include ChebNet [37] and GCNs [26]. GCNs employ ChebNet's first-order approximation to greatly simplify the parameters of graph convolution. By stacking multiple GCN layers, the receptive neighborhood range of GCNs can be enlarged.

Preliminaries
Given the historical traffic flow data recorded by sensors in the traffic network and the topological graph of the corresponding sensors, the purpose of traffic flow forecasting is to predict the future traffic flow in road networks.
In this study, we define the traffic road network as an undirected graph G = (V, E, A), where V is a finite set of |V| = N nodes denoting the sensors; E is a set of edges connecting different sensors; A ∈ R^{N * N} is the adjacency matrix of graph G, representing the connectivity of the whole road network. The adjacency matrix contains only elements of 0 and 1: an element is 1 if the two corresponding sensors are directly adjacent on the same road, and 0 otherwise. The traffic flow observed on G at time t is denoted as a graph signal X_t ∈ R^{N * F}, where F is the feature dimension of each node. The historical traffic flow data at time t can be defined as X = (X_{t−H+1}, X_{t−H+2}, . . . , X_t) ∈ R^{H * N * F}, where H is the length of the historical observation data. The forecasting traffic flow is denoted as Y = (X_{t+1}, X_{t+2}, . . . , X_{t+P}) ∈ R^{P * N * F}, where P is the length of the forecasting data. Traffic flow forecasting aims to learn a model φ that can accurately forecast the future P graph signals given the historical H graph signals of the whole road network:

(X_{t+1}, . . . , X_{t+P}) = φ(X_{t−H+1}, . . . , X_t; G).
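To make the notation concrete, the graph and signal definitions above can be sketched as follows. This is a toy setup with hypothetical sizes and random data (not the PeMS datasets); the connections are assumed for illustration:

```python
import numpy as np

# Toy setup with hypothetical sizes (not the PeMS datasets):
# N sensors, F features per sensor, H historical time slices.
N, F, H = 4, 3, 12

# Binary adjacency matrix A of the undirected road graph:
# A[i, j] = 1 if sensors i and j are directly adjacent on the
# same road, 0 otherwise; symmetric because G is undirected.
A = np.zeros((N, N))
for i, j in [(0, 1), (1, 2), (2, 3)]:  # assumed road connections
    A[i, j] = A[j, i] = 1.0

# Historical graph signals X = (X_{t-H+1}, ..., X_t).
X = np.random.rand(H, N, F)
```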

Methodology
We propose a general framework named AM-RGCN to address the problem of periodic temporal shift and exploit spatial-temporal correlations. As shown in Figure 2, the proposed AM-RGCN mainly consists of three modules: (1) an augmented multi-component module which intends to capture the characteristics of periodicity and periodic temporal shift synchronously; (2) an encoder module which aims to characterize the spatial-temporal correlations in traffic flow data; (3) a decoder module which performs multi-step predictions from spatial-temporal sequences.

Augmented Multi-Component Module for Periodic Temporal Shift
The idea of the multi-component was introduced by Guo et al. [18]. The proposed multi-component module incorporates the recent component, daily periodicity component, and weekly periodicity component of traffic flow data. As shown in Figure 3a, T p and t c refer to the predicting window (from 5:00 p.m. to 6:00 p.m. on Thursday) and the current time, respectively. T h , T d , and T w are numbers of time steps of the above three components describing traffic flow data in different time scales, respectively. Specifically, T h = N h * T p , where N h ∈ N + represents using the traffic series from the past N h hour(s). T d = N d * T p , where N d ∈ N + denotes using the traffic flow of the same period (from 5:00 p.m. to 6:00 p.m.) in the past N d day(s). T w = N w * T p , where N w ∈ N + indicates using the traffic records of the same period (from 5:00 p.m. to 6:00 p.m.) in the past N w Thursday(s).
The augmented multi-component introduces the daily augmented component and the weekly augmented component to model the daily and weekly shifts. As shown in Figure 3b, T_ds and T_ws are the lengths of the daily and weekly augmented components, respectively. S denotes the periodic offset, indicating that each periodic segment is extended by S * T_p time steps before and after the daily and weekly periodicity. Specifically, the relationship between the augmented multi-component and the multi-component can be expressed as Equation (1):

T_ds = (2S + 1) * T_d, T_ws = (2S + 1) * T_w. (1)

Let f denote the sampling frequency of the traffic series (f samples a day); the details of the augmented multi-component are then described as follows.

(1) Recent component. As shown in Figure 3b, the recent component is the golden part, which represents the time series closest to the prediction sequence. Owing to the continuity of traffic flow, we argue that there exist strong correlations within recent moments. The expression of the recent component is:

X_h = (X_{t_c−T_h+1}, X_{t_c−T_h+2}, . . . , X_{t_c}) ∈ R^{T_h * N * F}, (2)

where N represents the number of nodes in the road network, F is the dimension of each node representation, and X_t stands for the traffic flow at time t.

(2) Daily augmented component. As shown in Figure 3b, the daily component is the green part, which represents data in the same period as the prediction window in the last several days. The periodic shift is caused by abnormal weather, traffic congestion, and other factors in traffic flow. Consequently, we add offset series of the same length as the forecasting sequence before and after each daily component to form the daily augmented component. Using Equation (1), the component can be expressed as:

X_ds = (X_{t_c−N_d*f−S*T_p+1}, . . . , X_{t_c−N_d*f+S*T_p+T_p}, . . . , X_{t_c−f−S*T_p+1}, . . . , X_{t_c−f+S*T_p+T_p}) ∈ R^{T_ds * N * F}. (3)

Suppose the time step is 5 min and we wish to predict the traffic flow of the next hour (T_p = 12) from 5:00 p.m. to 6:00 p.m. on Thursday, so that f = 288 is the sampling frequency of a day. Let S = 1 and T_d = 24; then N_d = 2, and the last segment of Equation (3), (X_{t_c−f−S*T_p+1}, . . . , X_{t_c−f+S*T_p+T_p}), is the traffic flow from 4:00 p.m. to 7:00 p.m. on the most recent Wednesday.
(3) Weekly augmented component. As shown in Figure 3b, the weekly component is the red part, which represents data in the same period as the prediction window in the last several weeks. We add an offset series of the same length as the forecasting sequence before and after the weekly component to form the weekly augmented component. Similarly, it can be expressed as:

X_ws = (X_{t_c−N_w*f*7−S*T_p+1}, . . . , X_{t_c−N_w*f*7+S*T_p+T_p}, . . . , X_{t_c−f*7−S*T_p+1}, . . . , X_{t_c−f*7+S*T_p+T_p}) ∈ R^{T_ws * N * F}. (4)

Assume the same setting as described for the daily augmented component. Let S = 1 and T_w = 24; then N_w = 2, and Equation (4) indicates that we adopt the traffic records of 4:00 p.m. to 7:00 p.m. from the past two Thursdays. Accordingly, (X_{t_c−N_w*f*7−S*T_p+1}, . . . , X_{t_c−N_w*f*7+S*T_p+T_p}) can be taken as the traffic flow from 4:00 p.m. to 7:00 p.m. on the Thursday two weeks ago, while (X_{t_c−f*7−S*T_p+1}, . . . , X_{t_c−f*7+S*T_p+T_p}) is the traffic flow from 4:00 p.m. to 7:00 p.m. on the last Thursday.
The above three components jointly make up the augmented multi-component module, which takes into account the periodicity and periodic temporal shift in traffic forecasting. Let T = T h + T ds + T ws be the length of the augmented multi-component module, and the input data X am = (X h , X ds , X ws ) ∈ R T * N * F are passed to the encoder-decoder architecture.
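The index arithmetic behind the three components can be sketched as follows. This is a minimal illustration assuming 5-min sampling; `augmented_indices` and its arguments are our own names, not the paper's code:

```python
def augmented_indices(t_c, T_p, f, N_h, N_d, N_w, S):
    """Time-step indices of the three components (illustration only).

    t_c: current time-step index; f: samples per day; S: periodic
    offset, widening each periodic segment by S*T_p steps on both sides.
    """
    # Recent component: the T_h = N_h * T_p steps ending at t_c.
    recent = list(range(t_c - N_h * T_p + 1, t_c + 1))
    # Daily augmented component: the prediction-window slot in each of
    # the past N_d days, extended by S*T_p steps before and after.
    daily = []
    for d in range(N_d, 0, -1):
        start = t_c - d * f - S * T_p + 1
        daily += list(range(start, start + (2 * S + 1) * T_p))
    # Weekly augmented component: the same slot in the past N_w weeks.
    weekly = []
    for w in range(N_w, 0, -1):
        start = t_c - w * 7 * f - S * T_p + 1
        weekly += list(range(start, start + (2 * S + 1) * T_p))
    return recent, daily, weekly

# 5-min steps (f = 288), one-hour window (T_p = 12), T_h = 24,
# T_d = T_w = 12 (so N_d = N_w = 1), and S = 1, as in the experiments.
r, d, w = augmented_indices(t_c=10000, T_p=12, f=288, N_h=2, N_d=1, N_w=1, S=1)
assert len(r) + len(d) + len(w) == 96  # T = T_h + T_ds + T_ws
```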

Encoder for Spatial-Temporal Correlations
The encoder module is designed to exploit spatial-temporal correlations. As shown in Figure 2, it is composed of a GCN and TCL, both of which are employed to learn the spatial-temporal representations from the augmented multi-component series.

Graph Convolution in Spatial Dimension
As illustrated in Figure 4a, the traffic network is a typical graph structure, and the neighbors' traffic flow of each sensor is essential for forecasting. We choose a GCN to capture spatial topological relationships. As briefly illustrated in Figure 4b, the GCN model can obtain the topological relationship between the central sensor and its first-order surrounding sensors and embed the traffic flow attributes of the network. The two-layer GCN model can be expressed as:

Z_t = ReLU(Â ReLU(Â X_t W_0) W_1),

where X_t ∈ R^{N * F} denotes the characteristics of the road network at each time slice t ∈ {1, . . . , T}; Â = D̃^{−1/2} Ã D̃^{−1/2} ∈ R^{N * N} indicates the renormalization trick; Ã = A + I ∈ R^{N * N} means adding a self-loop to the adjacency matrix; D̃ ∈ R^{N * N} is the diagonal degree matrix with D̃_{ii} = Σ_j Ã_{ij}; W_0 ∈ R^{F * H} and W_1 ∈ R^{H * C} represent the parameter matrices from the input feature dimension F to the output feature dimensions H and C, respectively; and ReLU is the activation function.
To take full advantage of the topological information, we exploit a two-layer shared weight GCN to capture the spatial features of the traffic network at each time slice.
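The two-layer shared-weight GCN with the renormalization trick can be sketched as follows (a NumPy toy with hypothetical sizes and random weights; `gcn_two_layer` is our own name):

```python
import numpy as np

def gcn_two_layer(X_t, A, W0, W1):
    """Two-layer GCN with the renormalization trick (sketch)."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    d = A_tilde.sum(axis=1)                     # node degrees
    D_inv_sqrt = np.diag(d ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # D^-1/2 (A + I) D^-1/2
    relu = lambda x: np.maximum(x, 0.0)
    return relu(A_hat @ relu(A_hat @ X_t @ W0) @ W1)

# Toy sizes: N = 4 sensors, F = 3 input features, hidden 8, output 5
# (the paper uses 128 and 64 filters on real road graphs).
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Z = gcn_two_layer(rng.random((4, 3)), A, rng.random((3, 8)), rng.random((8, 5)))
assert Z.shape == (4, 5)
```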

Temporal Correlation Learner (TCL) in Temporal Dimension
The traffic flow data are mainly the three-dimensional input of nodes, sequences, and features. In the previous section, we exploited GCNs to represent the mutual spatial correlations among all sensors along the node dimension. In TCL blocks, we first append one-dimensional convolution to integrate the internal characteristics of each sensor in the feature dimension, and then adopt LSTM in the sequence dimension for temporal features. Our proposed TCL is partly inspired by Convolutional LSTM (ConvLSTM) [38]. As illustrated in Figure 5, we denote the spatial representations extracted from the GCN at each moment as G_t ∈ R^{N * C}. The TCL first exploits one-dimensional convolution to integrate the spatial characteristics of each sensor based on the previous hidden state H_{t−1} ∈ R^{N * H} and G_t, then passes the result to the LSTM together with the previous cell memory state C_{t−1} ∈ R^{N * H} to learn the temporal features, where H is the hidden size. Specifically, we initialize all the states of the LSTM to zero before the first input arrives. During the training process, we add zero-padding to the hidden states before applying the convolutional operations. The kernel size and padding size of the one-dimensional convolution are 3 and 1, respectively.
To ensure that spatial and temporal features can be learned simultaneously, we input the spatial features G_t as the source of the TCL at each moment of the traffic flow series, which helps the model learn the spatial-temporal correlations. Thus, at time step t, the computation process of the proposed TCL can be simplified as:

H_t, C_t = TCL(G_t, H_{t−1}, C_{t−1}),

where t ∈ {1, . . . , T}. We pass G_t into the TCL and update the cell memory state C_t using the input gate I_t, the forget gate F_t, and the previous hidden state H_{t−1}. Finally, we employ the output gate O_t to update the current hidden state H_t. We can summarize the computation process above as follows:

I_t = σ(W_gi * G_t + W_hi * H_{t−1} + W_ci ⊙ C_{t−1})
F_t = σ(W_gf * G_t + W_hf * H_{t−1} + W_cf ⊙ C_{t−1})
C_t = F_t ⊙ C_{t−1} + I_t ⊙ tanh(W_gc * G_t + W_hc * H_{t−1})
O_t = σ(W_go * G_t + W_ho * H_{t−1} + W_co ⊙ C_t)
H_t = O_t ⊙ tanh(C_t)

where W_αβ (α ∈ {g, h, c}, β ∈ {i, f, c, o}) denotes the learnable parameters in the TCL; σ and tanh denote the activation functions; * represents the one-dimensional convolution operation; ⊙ denotes the Hadamard product. The encoder module then passes the final output states H_T ∈ R^{N * H}, C_T ∈ R^{N * H} of the TCL, which include the spatial-temporal features of traffic flow, to the decoder.
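A simplified TCL step can be sketched in NumPy as follows. It applies a "same" 1-D convolution (kernel 3, padding 1) to the input and hidden state before LSTM-style gating; the peephole terms (W_c· ⊙ C) and full multi-channel filters are omitted for brevity, so this illustrates the recurrence, not the paper's implementation:

```python
import numpy as np

def conv1d(x, w):
    """'Same' 1-D convolution with a 3-tap kernel (pad 1), applied
    along the first axis of x with a shared scalar weight per tap."""
    pad = np.pad(x, ((1, 1), (0, 0)))
    return sum(w[k] * pad[k:k + x.shape[0]] for k in range(3))

def tcl_step(G_t, H_prev, C_prev, W):
    """One simplified TCL step: convolve, then LSTM-style gating."""
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    I = sig(conv1d(G_t, W['gi']) + conv1d(H_prev, W['hi']))
    F = sig(conv1d(G_t, W['gf']) + conv1d(H_prev, W['hf']))
    C = F * C_prev + I * np.tanh(conv1d(G_t, W['gc']) + conv1d(H_prev, W['hc']))
    O = sig(conv1d(G_t, W['go']) + conv1d(H_prev, W['ho']))
    return O * np.tanh(C), C

N, Ch = 4, 8  # sensors, hidden size
rng = np.random.default_rng(1)
W = {k: 0.1 * rng.standard_normal(3)
     for k in ['gi', 'hi', 'gf', 'hf', 'gc', 'hc', 'go', 'ho']}
# States are initialized to zero before the first input, as in the paper.
H, C = tcl_step(rng.random((N, Ch)), np.zeros((N, Ch)), np.zeros((N, Ch)), W)
assert H.shape == (N, Ch) and C.shape == (N, Ch)
```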

Decoder for Multi-Step Prediction
The decoder module is mainly used for multi-step prediction. As shown in Figure 2, it is composed of the TCL and CNN, employing the hidden states obtained from the encoder to produce high-dimensional feature representations from spatial-temporal sequences.
In our decoder, the TCL is adopted to unfold the hidden state H_T and the cell memory state C_T from the encoder. Additionally, since there are no input sequences for the TCL, we initialize an all-zero array with the same dimensions as the hidden state H_T as the input for simplification. Specifically, at each moment, we employ the hidden state and cell memory state from the previous moment together with the all-zero array to forecast the next moment, which ensures that the prediction at each moment is related to the previous one. Assuming that the size of the prediction window is P, the expression of the decoder is:

X_{t+1}, H_{t+1}, C_{t+1} = TCL(0, H_t, C_t),

where t ∈ {T, . . . , T + P − 1} and 0 denotes the all-zero array. We then concatenate {X_{T+1}, . . . , X_{T+P}} ∈ R^{P * N * H} and apply a convolutional operation to convert the multi-step predictions into high-dimensional representations.

The representations obtained from the above augmented multi-component module and encoder-decoder architecture are passed to the fusion module, which consists of a residual connection and a CNN, to produce the prediction results. Concretely, the fusion module utilizes a convolutional residual connection to integrate the residual information R from the augmented multi-component module with the high-dimensional representations F(X) of the decoder, aiming to speed up model training and mitigate the overfitting problem. Eventually, a CNN is adopted to guarantee that the predictions Y ∈ R^{P * N * F} have the same dimensions and shapes as expected.
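The decoder's unrolling with all-zero inputs can be sketched as follows. The `tcl_step` here is a trivial stand-in for the real TCL recurrence, kept only so the loop structure is runnable:

```python
import numpy as np

def tcl_step(X_in, H_prev, C_prev):
    """Trivial stand-in for the TCL recurrence (gating omitted)."""
    C = 0.5 * C_prev + 0.5 * np.tanh(X_in + H_prev)
    return np.tanh(C), C

def decode(H_T, C_T, P):
    """Unroll the decoder for P steps with all-zero inputs."""
    H, C, outs = H_T, C_T, []
    zero = np.zeros_like(H_T)          # the all-zero input array
    for _ in range(P):
        H, C = tcl_step(zero, H, C)    # each step conditions on the last
        outs.append(H)
    return np.stack(outs)              # multi-step predictions (P, N, hidden)

N, hidden, P = 4, 8, 12
rng = np.random.default_rng(2)
preds = decode(rng.random((N, hidden)), rng.random((N, hidden)), P)
assert preds.shape == (P, N, hidden)
```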

Experiment
To evaluate the performance of our model, comparative experiments on two real-world traffic datasets are carried out. Moreover, we carry out ablation studies to demonstrate the effectiveness of different modules.

Datasets
The public traffic datasets PEMSD4 and PEMSD8 are the real highway traffic datasets collected by the California Transportation Agency Performance Measurement System (PeMS). The system is displayed on a map and has more than 39,000 independent sensors deployed on the highway system across all major metropolitan areas of the state of California. The observations of the sensors are aggregated into 5-min windows and the geographic information of the sensors is also included.
We use the popular benchmarks of PEMSD4 and PEMSD8 released by Guo et al. [18], which remove redundant sensors whose distance is less than 3.5 miles and adopt linear interpolation for missing values. The details of the datasets are described in Table 1:
(1) PeMSD4 records two months of statistics on traffic flow in the San Francisco Bay Area, ranging from 1 January 2018 to 28 February 2018, including 307 sensors. We choose data from the first 50 days as the training set and validation set, and the remaining 9 days as the test set.
(2) PeMSD8 contains two months of statistics on traffic flow in the San Bernardino area, ranging from 1 July 2016 to 31 August 2016, including 170 sensors. We select data from the first 50 days as the training set and validation set, and the remaining 12 days as the test set.
In addition, we preprocess each dataset by calculating its maximum value Max(X) and normalizing the entire dataset by X = X / Max(X).
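The preprocessing described above (max-normalization and the 50-day split) can be sketched as follows, with random data standing in for PeMSD8 (the shapes are assumptions):

```python
import numpy as np

# Random data standing in for PeMSD8: 62 days of 5-min slices,
# 170 sensors, 3 features per sensor (shapes are assumptions).
data = np.random.rand(62 * 288, 170, 3)

# Max-normalize the whole dataset: X = X / Max(X).
data = data / data.max()

# First 50 days for training/validation, the remaining 12 for testing.
split = 50 * 288
train_val, test_set = data[:split], data[split:]
```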

Model Parameter
All experiments are compiled and tested on a Linux cluster (CPU: 6 Intel Core Processors (Broadwell), GPU: NVIDIA Tesla P40). The model parameters can be divided into three parts:
(1) Augmented multi-component: in this study, we focus on predicting the traffic flow of the next hour, namely, T_p = 12. For T_p = 6 or 3, we reuse the model parameters of T_p = 12 for training efficiency. As a trade-off between prediction accuracy and computational efficiency, we set the three component parameters as T_h = 24, T_d = 12, and T_w = 12 and the periodic offset as S = 1 for both datasets. Moreover, the augmented intervals cover the range of periodic temporal shifts of peak hours shown in Figure 1. Consequently, the length of the augmented multi-component sequence is T = 96.
(2) Network structure: the encoder uses two GCN layers, whose convolution filters are 128 and 64, respectively. The convolution filters of the TCL are consistent with the number of sensors, and there are 64 hidden units. In the decoder, the TCL has 64 hidden units, the output sequence length is T_p, and the convolution filters of the CNN are set as T. In the fusion module, the convolution filters are T_p.
(3) Training hyperparameters: we train our model using the Adam optimizer [39] with a learning rate of 0.001 and a weight decay of 5 × 10^−4. We set the dropout [40] rate as 0.8. We apply the mean square error (MSE) between the estimator and the ground truth as the loss function.

Evaluation Metric
We adopt the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) as evaluation metrics:

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (Y_i − Ŷ_i)^2 ),  MAE = (1/n) Σ_{i=1}^{n} |Y_i − Ŷ_i|,

where Y_i means the ground truth, Ŷ_i is the predicted traffic flow, and n is the number of all predicted values.
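A minimal implementation of the two metrics:

```python
import numpy as np

def rmse(y, y_hat):
    """Root Mean Square Error over all predicted values."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean Absolute Error over all predicted values."""
    return float(np.mean(np.abs(y - y_hat)))

y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.0, 2.0, 5.0])
assert np.isclose(mae(y, y_hat), 2.0 / 3.0)
assert np.isclose(rmse(y, y_hat), np.sqrt(4.0 / 3.0))
```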

Baseline
We compare our model with the following baselines:
• Historical Average (HA). We use the average of the past 12 time slices in the same period as a week ago to forecast the current time slice.
• ARIMA [2]. A typical traditional forecasting model for time series. We set the autoregressive coefficient p = 0, the difference coefficient d = 0, and the moving average coefficient q = 1.
• LSTM [22]. A special RNN model for time series prediction. We set the historical traffic flow T_h = 12 and the hidden size h = 64.
• Gated Recurrent Unit (GRU) network [23]. An improved RNN model for time series prediction. We set the historical traffic flow T_h = 12 and the hidden size as 64.
• STGCN [9]. The model employs one-dimensional convolution and graph convolution to extract spatial-temporal features and is widely used in traffic flow forecasting. Both the graph convolution kernel size K_s and the temporal convolution kernel size K_t are set to 3 in the experiments.
• MSTGCN [13]. A multi-component network for traffic flow forecasting. The best combination adopted in this paper is T_h = 36, T_d = 12, and T_w = 12.
• ASTGCN [18]. A traffic flow forecasting model which adds spatial-temporal attention to the MSTGCN. The best combination adopted in this paper is T_h = 24, T_d = 12, and T_w = 24.
• STSGCN [28]. A traffic forecasting model which attempts to capture the complex localized spatial-temporal correlations in spatial-temporal data. The best setting consists of 4 STSGCLs; each STSGCM contains 3 graph convolutional operations with 64, 64, and 64 filters, respectively.

Results and Analysis
Overall, in the baseline comparison, our model achieves the best performance on PEMSD4 and PEMSD8 compared to the existing traditional methods and deep learning methods. Then, the augmented multi-component method and the TCL module are evaluated and proved to be effective in the following aspects: (1) the augmented multi-component method outperforms the multi-component method and is conducive to capturing the periodic temporal shift in traffic flow; (2) our TCL module performs better than its variants and can learn the spatial-temporal correlations effectively.

Table 2 presents the performances of the AM-RGCN and baseline models for 15 min (three time slices), 30 min (six time slices), and 1 h (12 time slices) ahead predictions on the two datasets. As shown, our AM-RGCN performs the best on both datasets in terms of all evaluation metrics.

Baseline Comparison
The forecasting performances of the traditional baselines (HA and ARIMA) are the worst, limited by their inability to capture spatial-temporal characteristics from complex time series data. Comparatively, the deep learning approaches (GCN- and RNN-based models) outperform them by large margins thanks to their ability to learn from non-linear traffic data. In the one-hour traffic forecasting task on PEMSD8, even the deep learning model with the worst performance (LSTM) works better than the best traditional method (HA): the former reduces the RMSE and MAE by approximately 29.9% and 22.5% compared to the latter. Among the deep learning approaches, GCN-based models (STGCN, MSTGCN, ASTGCN, STSGCN, and AM-RGCN) generally perform better than RNN-based models (LSTM and GRU), since the former include graph convolution to extract spatial-temporal characteristics while the latter only consider temporal features.
Among the GCN-based models, STGCN is the worst since it only uses the recent component to capture temporal features and is restricted by its lack of periodicity characteristics. Better methods, such as MSTGCN and ASTGCN, are able to capture daily and weekly periodicity with the multi-component method. Thus, they improve RMSE and MAE by 12.6% and 9.5%, and by 13.5% and 10.2%, respectively, in one-hour prediction on PEMSD4. However, these GCN-based models are not efficient in correlation recognition, since they utilize GCN and 1D-CNN modules to model the spatial and temporal characteristics separately. Comparatively, STSGCN takes the localized spatial-temporal correlations into account and also outperforms STGCN, but its neglect of periodicity prevents it from completely surpassing MSTGCN and ASTGCN in all intervals and metrics. The above results of the four GCN-based models prove the significance of periodicity and spatial-temporal correlations in traffic flow forecasting. However, their performances are limited by their inability to synchronously handle the static periodicity characteristics and the spatial-temporal correlations, as well as to model the dynamic periodic shift.
In contrast, our AM-RGCN employs the augmented multi-component to grasp periodic offset characteristics, and it combines the TCL with the GCN at each moment to learn the spatial-temporal correlations. We compare it with ASTGCN (the previous state-of-the-art model) on next-hour prediction under the same model size (about 1.4M parameters). The results demonstrate that AM-RGCN decreases RMSE and MAE by 6.3% and 8.0% on PEMSD8, and by 8.0% and 9.2% on PEMSD4, although one forward iteration at inference takes more time (4.5 ms) than ASTGCN (2.4 ms) due to its recurrent structure. These comparisons show that our model is more capable of expressing the spatial and temporal characteristics of traffic series.

Effects of Augmented Multi-Component
We further compare the augmented multi-component with the original multi-component in the approaches (MSTGCN and ASTGCN) which only consider the periodicity in traffic series. To control variables, the ranges of the multi-component and the augmented multi-component are set as T_h = 24, T_d = 12, T_w = 12 and T_h = 24, T_ds = 36, T_ws = 36, respectively, with periodic offset S = 1, i.e., a periodic temporal offset of one hour. The experimental results are shown in Figure 6. Each approach using the augmented multi-component performs better than its counterpart with the multi-component: for one-hour prediction on PEMSD4 and PEMSD8, MSTGCN, ASTGCN, and AM-RGCN all achieve better performance when the multi-component is replaced with the augmented multi-component. We argue that the factors which actually lead to the worse predictions with the multi-component, such as weather and traffic conditions, fall outside its period interval range. The augmented multi-component, by enlarging the data range of each period module, covers these factors and captures the characteristics of periodicity and periodic temporal shifts synchronously.
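The window settings above (T_h = 24, T_ds = 36, T_ws = 36, with a one-hour offset, i.e., 12 five-minute slices) can be sketched as a slicing step over a flow series; the exact alignment of the augmented windows here is an assumption for illustration, not the paper's released implementation.

```python
import numpy as np

SLICES_PER_DAY = 288                 # 5-minute intervals per day
SLICES_PER_WEEK = 7 * SLICES_PER_DAY

def augmented_components(series, t, T_h=24, T_ds=36, T_ws=36):
    """Slice the recent, augmented-daily and augmented-weekly windows
    for prediction time t (illustrative indexing)."""
    # Recent component: the T_h slices immediately before t.
    x_h = series[t - T_h:t]
    # Augmented daily component: a widened window around the same
    # clock time one day earlier, covering the periodic offset.
    d = t - SLICES_PER_DAY
    x_ds = series[d - T_ds // 2 : d + T_ds // 2]
    # Augmented weekly component: likewise, one week earlier.
    w = t - SLICES_PER_WEEK
    x_ws = series[w - T_ws // 2 : w + T_ws // 2]
    return x_h, x_ds, x_ws

series = np.arange(3 * SLICES_PER_WEEK, dtype=float)  # toy flow series
x_h, x_ds, x_ws = augmented_components(series, t=2 * SLICES_PER_WEEK)
assert len(x_h) == 24 and len(x_ds) == 36 and len(x_ws) == 36
```

Note how the daily window grows from T_d = 12 to T_ds = 36 slices: the extra 24 slices cover a shift of up to one hour (12 slices) on either side of the nominal period.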
We then conduct ablation experiments on PEMSD8 to further explore the contribution of each component of the augmented multi-component (Table 3), and observe the following: (1) the model equipped with X_h alone significantly outperforms those with only X_ds or X_ws, improving RMSE and MAE by 31.6% and 32.2%, and by 28.1% and 23.9%, respectively. This indicates that, when only one component is considered, time series forecasting depends primarily on the recent time slices; (2) compared with X_h alone, performance improves when X_h is combined with X_ds or X_ws. Moreover, the best performance is achieved when the model is equipped with all components, because adding the daily and weekly augmented components helps model the periodicity and the periodic temporal shift beyond the short-term dependency captured by the recent component. Together, these two experiments support the superiority of our augmented multi-component method in handling the problem of periodic temporal shift.

Effects of Temporal Correlation Learner
To further verify the advantages of the TCL, we compare AM-RGCN with two variants that replace the TCL with a CNN or an LSTM, while all of them are equipped with the same augmented multi-component module. From the experimental results in Table 4, we can draw several conclusions. Firstly, AM-LSTM-GCN does not perform as well as AM-CNN-GCN. We suggest the underlying reason is that errors accumulate when the model generates multi-step predictions in a step-by-step manner: if the prediction length is P, the LSTM unit is looped P times, and each iteration compounds the error of the previous one. In contrast, AM-CNN-GCN avoids this error propagation by employing a CNN to map the encoder output directly to the P prediction steps.
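The accumulation argument can be illustrated with a deliberately simplified numerical sketch: we treat each decoding step as contributing a fixed error (`step_noise` is a stand-in quantity, not a measured value), and compare the two decoding styles.

```python
import numpy as np

P = 12              # prediction length (one hour of 5-minute steps)
step_noise = 0.1    # stand-in for the error introduced at one step

# Step-by-step (recurrent) decoding: each prediction feeds the next
# input, so per-step errors compound over the P iterations.
recurrent_error = np.cumsum(np.full(P, step_noise))

# Direct (CNN-style) decoding: all P horizons are mapped from the
# encoder state at once, so each horizon carries an independent error.
direct_error = np.full(P, step_noise)

# The recurrent variant's error grows with the horizon.
assert recurrent_error[-1] > direct_error[-1]
```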
Secondly, AM-RGCN is superior to AM-CNN-GCN, owing to its combination of the GCN and TCL in the encoder network to capture spatial-temporal correlations. Concretely, at each predicted moment it considers the spatial-temporal information of the previous time step to achieve a continuous forecast. In contrast, AM-CNN-GCN decouples the spatial and temporal features and ignores the influence of each sensor on its surrounding sensors at different time steps in the traffic network.
As for the comparison between AM-RGCN and AM-LSTM-GCN, our model still maintains a significant lead. This is because combining one-dimensional convolution with the LSTM allows the TCL to process the spatial topological features produced by the GCN directly, whereas AM-LSTM-GCN must flatten them: the flattening operation merges the spatial characteristics and all other characteristics into one dimension, discarding the spatial structure learned by the GCN. Accordingly, AM-RGCN is better than the other two variants at processing spatial-temporal correlations.
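The flattening point reduces to shape bookkeeping, sketched below with a moving average standing in for a learned 1-D convolution (the dimensions are illustrative, not the paper's exact configuration):

```python
import numpy as np

N, F, T = 307, 64, 12            # nodes, GCN channels, time steps
gcn_out = np.zeros((T, N, F))    # per-step spatial features from the GCN

# AM-LSTM-GCN: a plain LSTM expects one flat feature vector per step,
# so the node axis is merged away and spatial structure is lost.
flat_in = gcn_out.reshape(T, N * F)
assert flat_in.shape == (12, 307 * 64)

# TCL-style: a 1-D convolution along the node axis (here a 'valid'
# 3-tap moving average as a stand-in kernel) keeps the node axis
# intact, so per-node neighbourhood context survives the recurrence.
conv = (gcn_out[:, :-2, :] + gcn_out[:, 1:-1, :] + gcn_out[:, 2:, :]) / 3.0
assert conv.shape == (T, N - 2, F)
```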

Conclusions and Future Work
We propose the Augmented Multi-component Recurrent Graph Convolutional Network (AM-RGCN) for traffic flow forecasting. Specifically, we introduce the augmented multi-component module to capture the periodic temporal shift emerging in traffic series. We then implement an encoder-decoder architecture in which the encoder captures the spatial-temporal correlations and the decoder obtains high-dimensional representations of multi-step predictions.
The graph structure adopted in this paper is undirected, whereas in practice the road network is directed and traffic conditions change dynamically. In future work, we will therefore focus on improving traffic flow forecasting with a directed graph and an attention mechanism. Moreover, since AM-RGCN is a general spatial-temporal forecasting framework for graph-structured data, it can also be applied to other spatial-temporal prediction tasks, such as traffic speed forecasting.