Article

Traffic Flow Prediction Based on Dynamic Graph Spatial-Temporal Neural Network

1 School of Internet Economics and Business, Fujian University of Technology, Fuzhou 350014, China
2 School of Transportation, Fujian University of Technology, Fuzhou 350108, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(11), 2528; https://doi.org/10.3390/math11112528
Submission received: 25 April 2023 / Revised: 25 May 2023 / Accepted: 29 May 2023 / Published: 31 May 2023
(This article belongs to the Special Issue Data-Driven Decision Making: Models, Methods and Applications)

Abstract: More accurate traffic prediction can further improve the efficiency of intelligent transportation systems. However, the complex spatiotemporal correlations in transportation networks pose great challenges, and a great deal of research has been devoted to this problem. Most studies model the traffic graph with graph neural networks and attempt to use a fixed graph structure to obtain the relationships between nodes. However, because the spatial correlation of the transportation network varies over time, there is no stable node relationship. To address these issues, we propose a new traffic prediction framework called the Dynamic Graph Spatial-Temporal Neural Network (DGSTN). Unlike models that use predefined graphs, DGSTN represents stable node relationships and time-varying node relationships by constructing a static topology graph and dynamic information graphs during the training and testing stages, to capture hidden node relationships and time-varying spatial correlations. In terms of network architecture, we designed multi-scale causal convolution and an adaptive spatial self-attention mechanism to capture temporal and spatial features, respectively, with learning assisted by the static and dynamic graphs. The proposed framework has been tested on two real-world traffic datasets and achieves state-of-the-art performance.

1. Introduction

Increasing vehicle ownership and travel demand have put enormous pressure on traffic management, and effectively optimizing and allocating traffic resources has become a challenge. The emergence of intelligent transportation systems (ITS) makes it possible to address this challenge. Intelligent transportation systems comprise intelligent transportation infrastructure and data analysis algorithms, such as computer vision methods for real-time monitoring of the Internet of Vehicles [1], intelligent Internet of Vehicles solutions [2,3], traffic signal control [4], and reinforcement learning for autonomous driving [5,6].
As an essential part of intelligent transportation system technology, traffic forecasting plays a vital role in solving these problems: it uses historical traffic data to predict future traffic conditions and helps reduce congestion by enabling effective coordination between passengers, vehicles, and roads.
Early statistical models such as vector autoregression (VAR) [7], the autoregressive integrated moving average (ARIMA) [8], and the historical average (HA) were used for traffic prediction, but these models did not perform well in practice. Subsequently, machine-learning models capable of modeling nonlinear traffic data, such as support vector machines (SVM) [9] and the k-nearest neighbors algorithm (KNN) [10], began to emerge, but their accuracy was hindered by time-consuming and complex feature engineering. Thanks to the success of deep-learning networks in modeling time series [11,12], a large body of literature has applied deep-learning models to traffic prediction. The most widely used is the recurrent neural network (RNN) [13,14], but it is prone to vanishing gradients when modeling long sequences. For spatiotemporal data, exploring the spatial correlation between observations is also very important. Convolutional neural networks (CNN) extract hidden spatial information by dividing the study area into grids, but such grid partitioning differs greatly from the structure of the real road network [15,16,17,18]. Graph neural networks have achieved great success in processing graph topologies. Temporal convolutional networks (TCN) convolve along the time dimension [19,20], using dilated convolution to obtain exponentially growing receptive fields; however, because of this, they cannot effectively capture cyclic changes in traffic conditions [21]. Graph convolutional neural networks (GCN) can be applied because the road network is a non-Euclidean structure whose spatial-temporal correlations they can model [22].
Existing models, while effective at capturing spatial-temporal correlation, still face two major challenges. First, the spatial correlation between nodes changes dynamically over time rather than being static, and such time-varying relationships among the traffic conditions detected by individual sensors are difficult to capture. As can be seen in Figure 1a, for example, the spatial correlation between nodes A and B is very high in the morning hours, whereas during the afternoon the stronger correlation shifts to nodes C and D. Second, most models use a predefined static graph structure to describe the road network; an adjacency matrix constructed from Euclidean distance does not truly reflect the spatial relationships of the road network. For example, Figure 1b shows that nodes C and D are two nonadjacent sensor nodes that are nonetheless highly correlated spatially.
To solve the above problems, we propose a new framework, the Dynamic Graph Spatial-Temporal Neural Network (DGSTN), to predict the traffic flow at each sensor in the traffic network. We design a set of adaptive graphs, including a static graph that captures the real node correlations in the road network and dynamic graphs that capture dynamic spatial correlations. In the model, temporal characteristics are captured by multi-scale temporal convolutions, and time-varying spatial characteristics are captured by an adaptive spatial self-attention mechanism together with the proposed static topology and dynamic information graphs. The main contributions of this paper are as follows:
  • A multi-scale time-gated convolution is proposed to capture temporal features at different granularities, and an improved adaptive spatial self-attention mechanism computes node correlations that reflect the real spatial relationships.
  • We design a set of adaptive graphs: a static topology graph that combines the adjacency matrix as prior information with an adaptive embedding matrix to capture real node dependencies, and a set of dynamic information graphs, constructed by capturing the similarity of changes in flow information, to obtain dynamic spatial correlations.
  • Results on two real-world datasets show that the proposed framework achieves the best performance compared with a variety of baselines.

2. Literature Review

2.1. Space-Time Traffic Forecast

The prediction of traffic flow is a fundamental problem in intelligent transportation and has been extensively studied, with applications in a wide range of areas. Initial research focused on statistical methods including VAR [7], ARIMA [8], and HA. These models are underpinned by mathematical theory, but they rely on a linearity assumption that is inconsistent with the nonlinear nature of traffic data, leading to poor prediction results. Machine-learning models, e.g., SVM [9] and KNN [10], can address this problem, but good results rely on high-quality manual feature engineering, which makes modeling complex and time-consuming. Deep-learning models perform well in other domains, automatically extracting features from a given dataset, obviating the need for manual feature generation and alleviating modeling complexity. The success of convolutional neural networks (CNN) in computer vision has led some researchers to apply them to traffic prediction [13,14,23]. However, modeling methods that rely on grid partitioning for convolutional operations are not, on their own, sufficient to capture the topology of road networks. RNN-based models are well suited to sequence data, and combining Long Short-Term Memory (LSTM) networks with convolutional neural networks to capture temporal and spatial features is a widely used practice [14]. However, these methods use off-the-shelf components without exploring the correlations between different regions, and recursive sequence models such as RNN and LSTM struggle to preserve long-term information in long-sequence problems.

2.2. Graph Convolution

The recently emerged graph convolutional networks are well suited to traffic prediction tasks, given that traffic road networks are natural graph topologies. Much of the work models the time and space dimensions either separately or jointly. Since traffic data are correlated in both time and space, a common approach is to replace the matrix multiplications inside recurrent models such as RNNs with convolutions on a local spatial-temporal graph [10,18]. Most approaches to capturing spatial dependency use predefined graphs built from Euclidean distances, but in real-world traffic networks two sensors that are close in Euclidean distance may not exhibit strong spatial dependency, owing to factors such as intersections, roadway closures, opposite lanes, or differing roadway functions. To address this issue, refs. [16,24] use adaptively learnable graph structures to capture the real spatial dependencies.

2.3. Attention Mechanisms

Attention mechanisms first appeared in natural language processing and are now used in a wide range of domains [25,26], providing efficient improvements for many tasks [27,28]. The immediate goal of an attention mechanism is to score the various dimensions of the input and then weight the features according to these scores, highlighting the impact of important features on downstream models or modules. Numerous models in the field of traffic prediction also demonstrate the efficacy of attention, such as [29,30,31].
Attention becomes self-attention when the query, key, and value come from the same input; in sequential tasks it allows parallel processing and considers information from the whole sequence more efficiently. The emergence of multi-head attention [32], which learns correlations in different subspaces, gives the self-attention mechanism greater flexibility and modeling ease compared with CNNs and RNNs.

3. Materials and Methods

3.1. Problem Formulation

Definition 1.
(Road Network). The road network can be viewed as a graph $G = (V, E, A)$, where $V = \{v_1, \ldots, v_N\}$ is a set of $N$ nodes ($N = |V|$), $E$ is the edge set, and $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix of the traffic road network $G$.
Definition 2.
(Traffic Flow Tensor). We use $X_t \in \mathbb{R}^{N \times F}$ to represent the traffic flow of the $N$ nodes in the traffic network at time $t$, where $F$ is the feature dimension. We use $X = (X_1, X_2, \ldots, X_T) \in \mathbb{R}^{T \times N \times F}$ to represent the traffic flow of all nodes in the road network over $T$ time slices.
The goal of traffic flow forecasting is to predict the future flow of the entire traffic system from given historical flow information. Formally, our goal is to learn a function $f$ that predicts the traffic flow of the next $T'$ time steps given the traffic flow observations of the previous $T$ time steps:
$f: [X_{t-T+1}, \ldots, X_t; G] \rightarrow [X_{t+1}, \ldots, X_{t+T'}]$ (1)

3.2. Dynamic Graph Spatial-Temporal Neural Network

We show the framework of DGSTN, which consists of an embedding layer, spatial-temporal (S-T) blocks, and an output layer, in Figure 2a. The input to the model is historical traffic data $X = [X_{t-T+1}, \ldots, X_t] \in \mathbb{R}^{T \times N \times F}$, and the output is the predicted traffic flow $Y = [X_{t+1}, \ldots, X_{t+T'}] \in \mathbb{R}^{T' \times N \times C}$ over a future period. The input of each spatial-temporal block is $H^{(l-1)} \in \mathbb{R}^{T \times N \times D}$, and its output is $H^{(l)}$. The details are shown in Figure 2b.

3.2.1. Adaptive Graph

In this section, we describe how to leverage a given information graph $G = (V, E, A, X_{t-T:t})$ so that the static topology graph and dynamic information graphs can learn from the static topology and the dynamic traffic information, respectively. Figure 2c shows the details of the adaptive graph.

Static Topology Graph

Most current models have difficulty capturing globally valid information with predefined graphs. To address this, we propose a static topology graph that learns the adjacency matrix of an optimal graph: it learns the implicit information that predefined graphs cannot capture and then projects the hidden relationships onto the predefined adjacency matrix so that the two sources of information complement each other. First, the initialization of the predefined adjacency graph plays an important role in learning the adaptive graph, and we define it as $L = I + D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$, where $A$ is the adjacency matrix, $I$ is the identity matrix, and $D \in \mathbb{R}^{N \times N}$ is the degree matrix with $D_{ii} = \sum_j A_{ij}$. The learnable sparse graph is defined as follows:
$A_1 = \mathrm{ReLU}(E_1 E_2^{\top} + \mathrm{Diag}(\Lambda))$ (2)
Here, $E_1, E_2 \in \mathbb{R}^{N \times F}$ and $\Lambda \in \mathbb{R}^{N}$ are learnable parameters, $N$ is the number of nodes, and $F$ is the embedding dimension, with $F \ll N$. Multiplying $E_1$ and $E_2^{\top}$ generates a sparse graph, $\mathrm{Diag}(\Lambda)$ generates the weights of the diagonal positions, and $\mathrm{ReLU}$ eliminates weak connections after the two terms are added. Note that no prior knowledge is required for the sparse matrix $A_1$; all parameters are learned end-to-end by stochastic gradient descent. An adaptive module is then used to aggregate the predefined and learnable sparse matrices:
$S = \mathrm{Sigmoid}(\mathrm{Conv}(A_1 + L))$ (3)
$A_3 = S \odot A_1 + (1 - S) \odot L$ (4)
Here, we perform an adaptive aggregation of the learnable sparse matrix $A_1$ and the predefined matrix $L$; $\mathrm{Sigmoid}$ is a nonlinear activation function, $\mathrm{Conv}$ is a $1 \times 1$ convolutional layer, and $\odot$ denotes element-wise multiplication. Since the new matrix is obtained by adaptively aggregating the learnable sparse matrix $A_1$ and the predefined graph $L$, it not only retains the prior features of $L$ but also learns node features through training, ensuring faster convergence during iteration.
$A_s = D_3^{-\frac{1}{2}} A_3 D_3^{-\frac{1}{2}}$ (5)
Finally, a symmetric normalization is applied to the resulting matrix, where $D_3$ is the degree matrix of $A_3$.
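To make the construction concrete, the following is a minimal PyTorch sketch of the static topology graph described by Equations (2)-(5). The class and variable names, the initialization scale, and the numerical clamp are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class StaticTopologyGraph(nn.Module):
    """Minimal sketch of the static topology graph (Equations (2)-(5))."""

    def __init__(self, num_nodes, emb_dim):
        super().__init__()
        self.E1 = nn.Parameter(torch.randn(num_nodes, emb_dim) * 0.1)  # node embedding E1
        self.E2 = nn.Parameter(torch.randn(num_nodes, emb_dim) * 0.1)  # node embedding E2
        self.lam = nn.Parameter(torch.randn(num_nodes) * 0.1)          # diagonal weights Lambda
        self.gate_conv = nn.Conv2d(1, 1, kernel_size=1)                # 1x1 convolution for the gate

    def forward(self, L):
        # L: predefined normalized adjacency with self-loops, shape (N, N)
        A1 = torch.relu(self.E1 @ self.E2.T + torch.diag(self.lam))       # Eq. (2): learnable sparse graph
        gate_in = (A1 + L).unsqueeze(0).unsqueeze(0)                      # add batch and channel dims
        S = torch.sigmoid(self.gate_conv(gate_in)).squeeze(0).squeeze(0)  # Eq. (3): element-wise gate
        A3 = S * A1 + (1 - S) * L                                         # Eq. (4): adaptive aggregation
        d = A3.sum(dim=1).clamp(min=1e-6)                                 # degree of the fused graph
        D_inv_sqrt = torch.diag(d.pow(-0.5))
        return D_inv_sqrt @ A3 @ D_inv_sqrt                               # Eq. (5): symmetric normalization
```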

Dynamic Information Graph

The spatial correlation between different nodes changes over time, so node information must be mined to track this change. First, for the given node information $X \in \mathbb{R}^{T \times N \times F}$, we lift the feature dimension to $D$:
$S = FC(X) \in \mathbb{R}^{T \times N \times D}$ (6)
In this formula, $FC(\cdot)$ is a fully connected network and $S$ represents the node attributes after the linear mapping. To capture the dynamically changing spatial correlation over the $T$ time steps, we use a one-dimensional dilated convolution along the time dimension:
$DC(S) = \Big( \sum_{k=0}^{K-1} w_k \cdot S_{t - d \times k} \Big)_{t = d \times (K-1), \ldots, T}$ (7)
Equation (7) is a one-dimensional dilated convolution with kernel size $K$ and dilation rate $d$. We can stack multiple dilated convolutional layers and aggregate the time dimension:
$M = DC_3(DC_2(DC_1(S)))$ (8)
Through Equation (8), $S \in \mathbb{R}^{T \times N \times D}$ is converted into $M \in \mathbb{R}^{N \times D}$, where the overall convolution kernel parameter is $w \in \mathbb{R}^{T \times D \times D}$. In our model, cosine similarity is used to compute the spatial correlation between two nodes, so the relationship between two nodes can be expressed as:
$S_{ij} = \dfrac{M_i \cdot M_j}{\|M_i\| \|M_j\|}$ (9)
Here, $S_{ij}$ denotes the similarity between node $i$ and node $j$: the higher the similarity, the stronger the spatial dependence and the higher the spatial correlation. The spatial dynamic graph $A_d$ can then be expressed as:
$A_d = \mathrm{Softmax}(\mathrm{ReLU}(S_{ij}))$ (10)
The $\mathrm{ReLU}$ activation eliminates negative connections and enhances nonlinearity, while the $\mathrm{Softmax}$ function normalizes the dynamic information graph.
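A minimal PyTorch sketch of the dynamic information graph of Equations (6)-(10) is shown below. The dilation rates, the number of stacked layers, and the mean-pooling over the remaining time steps are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicInformationGraph(nn.Module):
    """Minimal sketch of the dynamic information graph (Equations (6)-(10))."""

    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, hidden_dim)          # Eq. (6): lift features to dimension D
        self.dilated = nn.ModuleList([                   # Eqs. (7)-(8): stacked dilated convolutions
            nn.Conv2d(hidden_dim, hidden_dim, kernel_size=(1, 2), dilation=(1, d))
            for d in (1, 2, 4)
        ])

    def forward(self, x):
        # x: (T, N, F) traffic observations for one sample
        s = self.fc(x)                                   # (T, N, D)
        h = s.permute(2, 1, 0).unsqueeze(0)              # (1, D, N, T) for convolution over time
        for conv in self.dilated:
            h = torch.relu(conv(h))
        m = h.mean(dim=-1).squeeze(0).transpose(0, 1)    # aggregate remaining time steps -> (N, D)
        sim = F.cosine_similarity(m.unsqueeze(1), m.unsqueeze(0), dim=-1)  # Eq. (9): pairwise similarity
        return F.softmax(torch.relu(sim), dim=-1)        # Eq. (10): dynamic graph A_d
```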

3.2.2. Multi-Scale Gated Time Convolution

Compared with recurrent units, convolutional operations require no sequential computation and therefore save a great deal of time; compared with self-attention mechanisms, which require a large number of parameters, they need only a few, making the model more lightweight. In contrast to the design of TCN, we design a multi-scale temporal convolution consisting of three gated tanh unit (GTU) convolutions with different receptive fields.
The input to the time-gated convolutional network is $Z^{(l)} \in \mathbb{R}^{T \times N \times D}$, where $l$ indexes the S-T layer. The convolutional kernel is $\theta \in \mathbb{R}^{K \times D \times 2D}$, and the convolution output is $\hat{Z}^{(l)} = \theta * Z^{(l)} \in \mathbb{R}^{N \times (T-(S-1)) \times 2D}$, where $S$ is the kernel width. The whole unit is defined as follows:
$Z^{(l)} = \sigma(\hat{Z}^{(l)}) \odot \mathrm{Tanh}(\hat{Z}^{(l)})$ (11)
Here $\sigma$ stands for the sigmoid activation function and $\odot$ for the Hadamard product; the $2D$ output channels of $\hat{Z}^{(l)}$ are split evenly between the sigmoid and tanh branches. We employ a multi-scale gated convolution module to capture the dynamic temporal information of the traffic, adjusting the size of the convolution kernel to obtain different receptive fields that capture long-term and short-term temporal dependencies. The multi-scale GTU can be represented as follows:
$Z_{out}^{(l)} = \mathrm{Norm}\big( (\theta_1 * Z^{(l)}) \,\Vert\, (\theta_2 * Z^{(l)}) \,\Vert\, (\theta_3 * Z^{(l)}) \big)$ (12)
Here $1 \times k_1$, $1 \times k_2$, and $1 \times k_3$ are the kernel sizes of $\theta_1$, $\theta_2$, and $\theta_3$, respectively, and $\Vert$ denotes concatenation along the time dimension. The operation fuses the features obtained from the three GTUs with different receptive fields, resulting in a feature of length $3T - (k_1 + k_2 + k_3 - 3)$.
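A minimal PyTorch sketch of the multi-scale gated temporal convolution of Equations (11)-(12) follows. The kernel sizes mirror the paper's settings (3, 5, 8); the class name and normalization placement are assumptions.

```python
import torch
import torch.nn as nn


class MultiScaleGatedTCN(nn.Module):
    """Minimal sketch of the multi-scale gated temporal convolution (Equations (11)-(12))."""

    def __init__(self, dim, kernel_sizes=(3, 5, 8)):
        super().__init__()
        # Each branch outputs 2*dim channels, split into a tanh path and a sigmoid gate path
        self.branches = nn.ModuleList([
            nn.Conv2d(dim, 2 * dim, kernel_size=(1, k)) for k in kernel_sizes
        ])
        self.norm = nn.LayerNorm(dim)

    def gtu(self, conv, z):
        out = conv(z)                                # (B, 2D, N, T-k+1)
        p, q = out.chunk(2, dim=1)                   # split channels into two halves
        return torch.tanh(p) * torch.sigmoid(q)      # Eq. (11): gated tanh unit

    def forward(self, z):
        # z: (B, D, N, T) hidden features
        feats = [self.gtu(conv, z) for conv in self.branches]
        fused = torch.cat(feats, dim=-1)             # Eq. (12): concatenate along the time axis
        out = self.norm(fused.permute(0, 3, 2, 1))   # normalize over the feature dimension
        return out.permute(0, 3, 2, 1)               # back to (B, D, N, 3T-(k1+k2+k3-3))
```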

3.2.3. Spatial Attention

When modeling spatial-temporal data, spatial correlations change dynamically at different time steps. The most direct way to capture this feature is to use a fully connected spatial attention mechanism (FSA) to obtain the attention of all nodes at different times. However, in real road networks, many nodes are not directly connected due to geographical location or weak correlation. To address this issue, we employ an adaptive spatial attention method that captures the dynamic spatial correlation between nodes with realistic relationships. Figure 3 demonstrates the difference between fully connected spatial attention and adaptive spatial attention.
For fully connected spatial attention, it is first necessary to obtain the query, key, and value vectors of the self-attention mechanism:
$Q_t^{(S)} = X_t W_Q^{(S)}, \quad K_t^{(S)} = X_t W_K^{(S)}, \quad V_t^{(S)} = X_t W_V^{(S)}$ (13)
Here, $W_Q^{(S)}$, $W_K^{(S)}$, and $W_V^{(S)} \in \mathbb{R}^{D \times D}$ are learnable parameters and $D$ is the dimension of the query, key, and value. Next, the dependencies between nodes are computed in the spatial dimension, and the attention scores of all nodes at time $t$ are obtained:
$A_t^{(S)} = \dfrac{Q_t^{(S)} (K_t^{(S)})^{\top}}{\sqrt{D}}$ (14)
The attention scores $A_t^{(S)}$ differ between node pairs at different time steps, which allows the spatial correlation to be captured dynamically. The output of spatial self-attention is then obtained by multiplying the attention scores $A_t^{(S)}$ with the value matrix:
$FSA(Q_t^{(S)}, K_t^{(S)}, V_t^{(S)}) = \mathrm{Softmax}(A_t^{(S)}) V_t^{(S)}$ (15)
The above formula is the standard fully connected spatial attention, in which every node attends to every other node. The adaptive spatial attention we adopt designs a mask matrix $M$ that masks node pairs with weak spatial correlation at each time step and only considers the correlations between nodes with a stronger real-world relationship. When the correlation is below a threshold, the correlation between the two nodes is considered small, so the attention between them is masked and the attention score is set to $-\infty$. Furthermore, the attention score $A_t^{(S)}$ can be tuned by multiplying it with the adaptive graph $A_{adap}$. Adaptive spatial attention can therefore be expressed as follows:
$ASA(Q_t^{(S)}, K_t^{(S)}, V_t^{(S)}, A_{adap}) = \mathrm{Softmax}\big( (A_t^{(S)} + M) \odot A_{adap} \big) V_t^{(S)}$ (16)
Here $\odot$ represents the Hadamard product. Adaptive spatial attention thus adaptively models the spatial correlations between real nodes. Multi-head spatial attention can be expressed as:
$MASA = \mathrm{Concat}(ASA_1, ASA_2, \ldots, ASA_h) W^{O}$ (17)
By introducing a multi-head spatial-attention mechanism, in which the parallel attention heads are concatenated, hidden spatial dependencies can be captured from various subspaces.
Since the multi-head attention mechanism completely discards convolution and recurrence, a marker must be added to each input to represent its temporal and positional relationships. To this end, we design a spatial embedding (SE) to better capture spatial dependencies. The adaptive graph $A_{adap}$ learned by the graph learning module is used to initialize the spatial embedding $SE \in \mathbb{R}^{N \times N}$, which captures the connectivity and distance relationships between nodes; a linear transformation along the temporal and spatial dimensions then generates $SE \in \mathbb{R}^{T \times N \times D}$.
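A minimal single-head PyTorch sketch of the adaptive spatial self-attention of Equations (13)-(16) is given below. The masking threshold is an illustrative hyperparameter, not a value from the paper, and multi-head stacking and the spatial embedding are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveSpatialAttention(nn.Module):
    """Minimal single-head sketch of adaptive spatial self-attention (Equations (13)-(16))."""

    def __init__(self, dim, threshold=0.0):
        super().__init__()
        self.q = nn.Linear(dim, dim)        # W_Q
        self.k = nn.Linear(dim, dim)        # W_K
        self.v = nn.Linear(dim, dim)        # W_V
        self.threshold = threshold

    def forward(self, x, a_adap):
        # x: (B, T, N, D) node features; a_adap: (N, N) adaptive graph
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)    # Eq. (14), per time step
        scores = scores * a_adap                                  # weight scores by the adaptive graph
        # Mask pairs whose adaptive-graph weight falls below the threshold; assumes each row
        # keeps at least one unmasked entry (e.g. the node's self-connection)
        scores = scores.masked_fill(a_adap <= self.threshold, float('-inf'))
        return F.softmax(scores, dim=-1) @ v                      # attention-weighted values
```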

3.2.4. Input and Output Layer

The input layer maps the input nodes to a high-dimensional space, using a $1 \times 1$ convolution to convert the data into $X \in \mathbb{R}^{T \times N \times D}$. To realize multi-step prediction, the output layer uses two $1 \times 1$ convolutions to convert the hidden dimension into the required dimension $\hat{X} \in \mathbb{R}^{T' \times N \times C}$. The loss function is the mean absolute error between the predicted values $\hat{Y} = [\hat{X}_{t+1}, \ldots, \hat{X}_{t+T'}]$ and the ground-truth values $Y = [X_{t+1}, \ldots, X_{t+T'}]$:
$L = \dfrac{1}{T'} \sum_{t'=t+1}^{t+T'} \| \hat{X}_{t'} - X_{t'} \|_1$ (18)
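For reference, a minimal sketch of this loss under the assumption that predictions and targets are tensors shaped (B, T', N, C):

```python
import torch


def mae_loss(pred, target):
    """Mean absolute error of Eq. (18), averaged over all predicted time steps,
    nodes, and features."""
    return torch.mean(torch.abs(pred - target))
```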

4. Results and Discussion

In this section, we present the experimental results of DGSTN and the baselines on two spatiotemporal datasets, using multiple evaluation metrics for a comprehensive evaluation. We also conducted ablation experiments to analyze the effectiveness of each component and of the adaptive graph.

4.1. Datasets

To evaluate the performance of the proposed model, we conducted experiments on two real-world datasets: PeMS04 and PeMS08. PeMS is a unified database of traffic data collected on California highways by the California Department of Transportation (Caltrans) and its partners, reporting data every 30 s. The dataset descriptions are as follows:
  • PeMS04: Traffic data collected by Caltrans Performance Measurement System (PeMS) from 307 detectors in the San Francisco Bay Area from 1 January 2018 to 28 February 2018.
  • PeMS08: Traffic information collected by the Caltrans Performance Measurement System (PeMS) from 170 detectors in the San Bernardino area from 1 July 2016 to 31 August 2016.
The traffic flow data of both datasets are aggregated every 5 min, giving 288 records per day. The missing values in the two datasets are filled in by linear interpolation, and training is stabilized by normalizing the data with standard (z-score) normalization, $x' = \frac{x - \mathrm{mean}(x)}{\mathrm{std}(x)}$. In the forecasting task, one hour of historical data is used to predict the next hour; that is, the historical data of 12 time steps are used to predict the data of the following 12 time steps. The two datasets are divided into training, validation, and test sets in chronological order with a split ratio of 6:2:2. Table 1 summarizes the key information of these two datasets. A sketch of this preprocessing is given below.
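The following is a minimal sketch of the normalization, sliding-window construction, and chronological split, assuming the raw data is an array of shape (time_steps, nodes, features); function and variable names are illustrative.

```python
import numpy as np


def make_windows(data, in_len=12, out_len=12):
    """z-score normalization followed by sliding windows in which
    12 historical steps predict the next 12 steps."""
    data = (data - data.mean()) / data.std()               # z-score normalization
    xs, ys = [], []
    for i in range(len(data) - in_len - out_len + 1):
        xs.append(data[i:i + in_len])                      # 12 historical steps
        ys.append(data[i + in_len:i + in_len + out_len])   # next 12 steps
    return np.stack(xs), np.stack(ys)


# Chronological 6:2:2 split (illustrative):
# n = len(data)
# train, val, test = data[:int(0.6 * n)], data[int(0.6 * n):int(0.8 * n)], data[int(0.8 * n):]
```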

4.2. Baseline Method

We compared the proposed framework with baseline methods, including classical methods and advanced neural network methods.
  • HA: Historical average, which predicts future traffic using the average of traffic flow data from past periods.
  • ARIMA [8]: The autoregressive integrated moving-average model with a Kalman filter, a classic time-series prediction model.
  • FNN: Feedforward neural network, a neural network with multiple hidden layers.
  • LSTM: Due to its memory function, LSTM can use long sequence information to construct learning models.
  • DCRNN [13]: Diffusion convolutional recurrent neural network combines a recurrent neural network with diffusion convolution to model the relationship between traffic inflow and outflow.
  • ASTGCN [15]: An attention-based spatiotemporal graph convolutional network for traffic flow prediction. By stacking attention layers and convolutional layers, it extracts more effective temporal and spatial features from the data.
  • STSGCN [18]: Spatiotemporal synchronous graph convolutional network. A spatiotemporal synchronous graph modeling mechanism is proposed to capture complex local spatiotemporal correlations more effectively.
  • GWN [16]: Graph WaveNet for deep spatiotemporal graph modeling. A graph convolutional architecture that proposes an adaptive graph and diffusion convolution to capture spatial correlations, with dilated causal convolution to capture temporal relationships.
  • AGCRN [24]: An adaptive graph convolutional recurrent network for traffic forecasting. It modifies the commonly used graph convolution through node-adaptive parameter learning and adaptive graph-generation modules, and combines graph convolution with a GRU to explore spatiotemporal correlations in the data.
  • ASTGNN [30]: Learning dynamics and heterogeneity of spatial-temporal graph data for traffic forecasting. This model adopts a self-attention mechanism to capture features in both the temporal and spatial dimensions.

4.3. Experiment Settings

All experiments in this paper were conducted on a machine equipped with an NVIDIA GeForce RTX 3060 Ti and 16 GB of RAM. The models were implemented with Windows 11, PyTorch 1.17, and Python 3.9. Training settings similar to those in [33,34] were used: an Adam optimizer with a learning rate of 0.001, a batch size of 32, and 100 epochs, with an early-stopping strategy to prevent overfitting, both for the baseline models and for the model proposed in this paper. The number of S-T layers of DGSTN was set to 3, and the embedding dimension was set to 128. The convolution kernels $k_1$, $k_2$, and $k_3$ of the gated temporal convolution were set to 3, 5, and 8, respectively, and the number of heads of the spatial multi-head attention was set to 8. A sketch of this training loop is shown below.
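The following is an illustrative training loop mirroring the reported settings (Adam, learning rate 0.001, batch size 32, up to 100 epochs, early stopping). The patience value and the `model`, `train_loader`, `val_loader`, `mae_loss`, and `evaluate` names are assumptions, not the authors' code.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # Adam with lr = 0.001
best_val, patience, bad_epochs = float('inf'), 10, 0         # patience is an assumption
for epoch in range(100):
    model.train()
    for x, y in train_loader:                                # batches of size 32
        optimizer.zero_grad()
        loss = mae_loss(model(x), y)
        loss.backward()
        optimizer.step()
    val_mae = evaluate(model, val_loader)                    # validation MAE (assumed helper)
    if val_mae < best_val:
        best_val, bad_epochs = val_mae, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                           # early stopping
            break
```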

4.4. Performance Comparison

We used three widely used metrics, mean absolute error (MAE), root-mean-square error (RMSE), and mean absolute percentage error (MAPE), to measure predictive performance, and compared the proposed model with the baselines on the PeMS04 and PeMS08 datasets. Table 2 shows the average performance of our model and the baselines over the next hour; our model achieves the best performance at every horizon. (1) First, both our model and the other deep-learning models outperform the traditional methods HA and ARIMA, which shows that deep learning is very effective for time-series forecasting. (2) Compared with LSTM, which models only the temporal dimension, our model and the deep-learning models that consider graph structure information are ahead, indicating a strong spatial-temporal dependency in the data. (3) Our model and the self-attention-based ASTGCN and ASTGNN outperform recurrent networks such as DCRNN and LSTM, suggesting that capturing dynamic spatial-temporal correlations is highly necessary. (4) Our model and GWN, which consider learned graph relationships, perform better than the graph models DCRNN and STSGCN, indicating that an adjacency graph constructed using only Euclidean distance cannot reflect the real spatial relationships and that exploring the real node relationships improves model performance. (5) Compared with GWN and AGCRN, which only consider static node relationships, our model also considers dynamic changes in node relationships and learns time-varying spatial characteristics, demonstrating stronger performance. (6) Compared with models such as ASTGCN and ASTGNN that use self-attention, our model lets spatial attention interact with the adaptive graph structure proposed in this paper and adaptively selects relevant nodes for the attention calculation, improving performance and further demonstrating the effectiveness of modeling node relationships. (7) Compared with the baselines, our model has a significant lead in long-term prediction. Table 3, Table 4, and Figure 4 show how the prediction performance of the various methods changes as the prediction interval increases on the two datasets. For reference, a minimal sketch of the three evaluation metrics is given below.
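The sketch below restates the standard definitions of MAE, RMSE, and MAPE used for evaluation; the small epsilon guarding against division by zero is an implementation assumption, not a value from the paper.

```python
import numpy as np


def metrics(pred, true, eps=1e-6):
    """MAE, RMSE, and MAPE over all predictions."""
    err = pred - true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = np.mean(np.abs(err) / np.maximum(np.abs(true), eps)) * 100
    return mae, rmse, mape
```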
To understand and evaluate the predictive performance more intuitively, we visualized the predictions. We chose sensors No. 7 and No. 157 in the PeMS08 dataset and took 24 h of real data from sensor No. 7 to visualize the predictions of STSGCN, ASTGCN, and the proposed model. As Figure 5a shows, STSGCN and ASTGCN fit the data less closely, while DGSTN's predictions are closer to the actual traffic flow. Compared with the other two models, DGSTN's predictions are more consistent with the actual situation during the periods from 3:00 p.m. to 6:00 p.m. and from 6:00 p.m. to 8:00 p.m. At the same time, while our model is good at capturing the inherent patterns of the time series, it also effectively avoids over-fitting. For example, in Figure 5b we visualize the actual and predicted traffic flow recorded by sensor No. 157 from 19 August 2016 to 23 August 2016. It can be seen that the sensor generally reaches the lowest point of the day at around 2:00 a.m. and its highest peak at around 2:00 p.m. At 2:00 p.m. on 19 August 2016, the traffic flow increased abnormally, and that night it decreased abnormally, resulting in a low at around 2:00 a.m. on 20 August 2016; compared with other days, both the peaks and valleys were lower. However, our model did not overfit the abnormal changes in that day's data. Our model achieves impressive predictions overall, although some local predictions may still be affected by random noise.

4.5. Ablation Experiment

For this section, we conducted ablation experiments on the adaptive graph structure on the PeMS04 dataset to verify the effectiveness of the model. First, we defined three graph structures, the static topology matrix (S), the dynamic information matrix (D), and the adjacency matrix (A), and tested different combinations of them.
Figure 6 shows the detailed average results for one hour and the predictions for each time slice under the different graph-structure combinations. On the PeMS04 dataset, the combination of the static topology matrix and the dynamic information matrix performs best, while using only the adjacency graph performs worst, which further shows that the distance between nodes alone is not enough to judge the strength of their relationship and that exploring effective node relationships is very beneficial. Adding the dynamic information graph on top of the adjacency graph greatly improves the results, which shows that constructing a dynamic graph from traffic similarity is very effective and further proves the necessity of capturing the dynamic characteristics of node traffic. The graph structure composed solely of the static graph also improves considerably over the adjacency matrix, showing the value of mining hidden relationships between nodes. The adaptive graph structure composed of the static and dynamic graphs works best, further proving the necessity of capturing the hidden spatial relationships and flow characteristics between nodes. In short, each graph design has a positive impact on performance.

4.6. Model Efficiency Study

In this section, we compare the computational efficiency of the models in terms of training time and inference time; the results are shown in Table 5. DGSTN obtains the best overall computational efficiency. Unlike DCRNN, which uses a recurrent network, DGSTN can generate all predictions directly, so it runs faster than DCRNN. STSGCN builds a space-time graph over adjacent time steps and must perform computations at each time step, whereas DGSTN uses adaptive node embeddings and learns the dynamic characteristics of node information, allowing it to learn faster. ASTGCN and ASTGNN use the self-attention mechanism, which substantially increases the computational cost. Compared with these two models, DGSTN has a better computational cost thanks to the improved adaptive spatial attention and multi-scale temporal convolution.

4.7. Research on the Validity of Static Topology Graph

To visualize the effectiveness of the static topology graph, we selected the first 50 sensors in the PeMS04 dataset as our focus. Figure 7a shows the sensor correlation heat map of the adjacency matrix, and Figure 7b shows the heat map of the static topology graph. Comparing the two, it can be seen that the static topology graph has been adjusted in many places relative to the adjacency matrix. The static topology graph learns from the predefined graph and thus preserves some of its basic characteristics. However, unlike the predefined graph, the adaptive graph learns some hidden relationships in the road network structure; for example, the static topology graph weakens the relationship between sensor 36 and sensor 47 and strengthens the influence of sensor 19 on sensor 36. Sensor 19 and sensor 36 are not directly connected geographically, but adaptive graph learning reveals a strong hidden correlation between them. We visualized the traffic flow curves of sensors 19 and 36 over one day (Figure 7c), and it can be seen that the two sensors have a high spatial correlation. This indicates that the adjacency matrix cannot express the true node dependencies, because two sensors that are geographically close may not always be strongly dependent, and it further shows that our static topology graph can find the hidden spatial dependencies in the road network.

5. Conclusions

This paper proposes a new traffic flow prediction model called DGSTN. DGSTN introduces a set of adaptive graphs, including a static topology graph that explores accurate spatial correlations and dynamic information graphs that explore dynamic traffic features; these graphs mine the features between nodes to characterize genuine traffic node relationships. The core of the model is a spatial-temporal module consisting of multi-scale gated convolution and adaptive spatial attention for exploring accurate spatial-temporal correlations. An empirical study on two traffic datasets shows that DGSTN achieves superior performance. The effectiveness of its components is demonstrated by ablation experiments and by the visualization of the static topology matrix, which indicate that the model has high potential for exploring fundamental spatial-temporal structures. In addition, the flexibility and extensibility of the graph structure indicate that the model is innovative, useful, and practical. We use multi-head attention to interact with the graph structure to capture the spatial structure; although this captures relevance over a global graph, multi-head attention relies on dot products, which require considerable computation, so developing a more lightweight model is necessary. In future work, we will apply the proposed framework to other spatial-temporal sequence prediction tasks, such as the evolution of social networks, weather, and air quality prediction.

Author Contributions

Writing—original draft preparation, Z.L.; Software, Z.L.; Writing—review and editing, M.J.; Funding acquisition, M.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Research Project of the Science and Technology Innovation Think Tank of the Fujian Society of Science and Technology under Grant No. FJKX-2022XKB023; the National Social Science Foundation of China under Grant No. 22BGL007; the Fujian Social Sciences Federation Planning Project under Grants No. FJ2021XZB089 and No. FJ2021Z006; the Project of the Science and Technology Innovation Think Tank of the Fujian Society of Science and Technology under Grant No. FJKX-A2113; and Fujian University of Technology under Grant No. GY-S20042.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article, and further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to thank the reviewers and editors for improving this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wan, S.; Ding, S.; Chen, C. Edge computing enabled video segmentation for real-time traffic monitoring in internet of vehicles. Pattern Recognit. 2022, 121, 108146. [Google Scholar] [CrossRef]
  2. Busacca, F.; Grasso, C.; Palazzo, S.; Schembra, G. A smart road side unit in a microeolic box to provide edge computing for vehicular applications. IEEE Trans. Green Commun. Netw. 2022, 7, 194–210. [Google Scholar] [CrossRef]
  3. Spandonidis, C.; Giannopoulos, F.; Sedikos, E.; Reppas, D.; Theodoropoulos, P. Development of a MEMS-based IoV system for augmenting road traffic survey. IEEE Trans. Instrum. Meas. 2022, 71, 1–8. [Google Scholar] [CrossRef]
  4. Chu, T.; Wang, J.; Codecà, L.; Li, Z. Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Trans. Intell. Transp. Syst. 2019, 21, 1086–1095. [Google Scholar] [CrossRef]
  5. Feng, J.; Li, Y.; Zhang, C.; Sun, F.; Meng, F.; Guo, A.; Jin, D. Deepmove: Predicting human mobility with attentional recurrent networks. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 1459–1468. [Google Scholar] [CrossRef]
  6. Gao, Q.; Zhou, F.; Trajcevski, G.; Zhang, K.; Zhong, T.; Zhang, F. Predicting human mobility via variational attention. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 2750–2756. [Google Scholar] [CrossRef]
  7. Chen, C.; Petty, K.; Skabardonis, A. Freeway performance measurement system: Mining loop detector data. Transp. Res. Rec. 2001, 1748, 96–102. [Google Scholar] [CrossRef]
  8. Williams, B.M.; Hoel, L.A. Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results. J. Transp. Eng. 2003, 129, 664–672. [Google Scholar] [CrossRef]
  9. Jeong, Y.S.; Byon, Y.J.; Castro-Neto, M.M.; Easa, S.M. Supervised weighting-online learning algorithm for short-term traffic flow prediction. IEEE Trans. Intell. Transp. Syst. 2013, 14, 1700–1707. [Google Scholar] [CrossRef]
  10. Van Lint, J.W.C.; Van Hinsbergen, C. Short-term traffic and travel time prediction models. Artif. Intell. Appl. Crit. Transp. Issues 2012, 22, 22–41. [Google Scholar]
  11. Bildirici, M.; Bayazit, N.G.; Ucan, Y. Modelling oil price with lie algebras and long short-term memory networks. Mathematics 2021, 9, 1708. [Google Scholar] [CrossRef]
  12. Ersin, Ö.Ö.; Bildirici, M. Financial Volatility Modeling with the GARCH-MIDAS-LSTM Approach: The Effects of Economic Expectations, Geopolitical Risks and Industrial Production during COVID-19. Mathematics 2023, 11, 1785. [Google Scholar] [CrossRef]
  13. Li, Y.; Yu, R.; Shahabi, C. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv 2017, arXiv:1707.01926. [Google Scholar] [CrossRef]
  14. Seo, Y.; Defferrard, M.; Vandergheynst, P.; Bresson, X. Structured sequence modeling with graph convolutional recurrent networks. In Proceedings of the Neural Information Processing: 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, 13–16 December 2018; pp. 362–373. [Google Scholar] [CrossRef]
  15. Yan, S.; Xiong, Y.; Lin, D. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar] [CrossRef]
  16. Wu, Z.; Pan, S.; Long, G. Graph wavenet for deep spatial-temporal graph modeling. arXiv 2019, arXiv:1906.00121. [Google Scholar] [CrossRef]
  17. Guo, S.; Lin, Y.; Feng, N.; Song, C.; Wan, H. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI conference on artificial intelligence, Honolulu, HI, USA, 27–28 January 2019; pp. 922–929. [Google Scholar] [CrossRef]
  18. Song, C.; Lin, Y.; Guo, S.; Wan, H. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 914–921. [Google Scholar] [CrossRef]
  19. Oord, A.; Dieleman, S.; Zen, H. Wavenet: A generative model for raw audio. arXiv 2016, arXiv:1609.03499. [Google Scholar] [CrossRef]
  20. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar] [CrossRef]
  21. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122. [Google Scholar] [CrossRef]
  22. Yu, B.; Yin, H.; Zhu, Z. Spatial-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv 2017, arXiv:1709.04875. [Google Scholar] [CrossRef]
  23. Chen, W.; Chen, L.; Xie, Y.; Cao, W.; Gao, Y.; Feng, X. Multi-range attentive bicomponent graph convolutional network for traffic forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 3529–3536. [Google Scholar] [CrossRef]
  24. Bai, L.; Yao, L.; Li, C.; Wang, C. Adaptive graph convolutional recurrent network for traffic forecasting. Adv. Neural Inf. Process. Syst. 2020, 33, 17804–17815. [Google Scholar] [CrossRef]
  25. Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The long-document transformer. arXiv 2020, arXiv:2004.05150. [Google Scholar] [CrossRef]
  26. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar] [CrossRef]
  27. Arnab, A.; Dehghani, M.; Heigold, G.; Sun, C.; Lučić, M.; Schmid, C. Vivit: A video vision transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 20–25 June 2021; pp. 6836–6846. [Google Scholar]
  28. Bao, H.; Dong, L.; Piao, S.; Wei, F. Beit: Bert pre-training of image transformers. arXiv 2021, arXiv:2106.08254. [Google Scholar] [CrossRef]
  29. Zheng, C.; Fan, X.; Wang, C.; Qi, J. Gman: A graph multi-attention network for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 1234–1241. [Google Scholar] [CrossRef]
  30. Guo, S.; Lin, Y.; Wan, H. Learning dynamics and heterogeneity of spatial-temporal graph data for traffic forecasting. IEEE Trans. Knowl. Data Eng. 2021, 34, 5415–5428. [Google Scholar] [CrossRef]
  31. Xu, M.; Dai, W.; Liu, C.; Gao, X.; Lin, W.; Qi, G.J.; Xiong, H. Spatial-temporal transformer networks for traffic flow forecasting. arXiv 2020, arXiv:2001.02908. [Google Scholar] [CrossRef]
  32. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar] [CrossRef]
  33. Shao, Z.; Zhang, Z.; Wei, W.; Wang, F.; Xu, Y.; Cao, X.; Jensen, C.S. Decoupled dynamic spatial-temporal graph neural network for traffic forecasting. arXiv 2022, arXiv:2206.09112. [Google Scholar] [CrossRef]
  34. Shin, Y.; Yoon, Y. Pgcn: Progressive graph convolutional networks for spatial-temporal traffic forecasting. arXiv 2022, arXiv:2202.08982. [Google Scholar] [CrossRef]
Figure 1. Findings on traffic prediction; (a) Spatial dependence changes dynamically over time. (b) Hidden spatial dependencies.
Figure 2. The framework of Dynamic Graph Spatial−Temporal Neural Network (DGSTN). (a) DGSTN architecture; (b) ST block; (c) Adaptive graph.
Figure 3. Different types of attention. (a) Fully connected spatial attention; (b) Adaptive spatial attention. The numbers in the upper panels in (a,b) represent the geographic nodes where they are located, and the figures in the lower panels represent the corresponding spatial attention matrices. Among them, green represents the self-connection of nodes, blue represents fully connected spatial attention, indicating that each node is related to each other, and yellow represents adaptive spatial attention, indicating that each node is adaptively associated.
Figure 4. Performance comparison of the tested models at each horizon on PeMS04 and PeMS08 dataset for traffic flow prediction, where one horizon denotes 5 min.
Figure 5. Visualization on the PeMS08 dataset. (a) Visualization of node 7 of the PeMS08 dataset; (b) Visualization of node 157 of the PeMS08 dataset.
Figure 6. Ablation study on the PeMS04 dataset.
Figure 7. Analyzing the effectiveness of static topology graph research on the PeMS04 dataset. (a) Predefined adjacency matrix heat map; (b) Heat map of the graph matrix obtained by static topology graph; (c) One-day traffic flow on sensors 19 and 36.
Table 1. Dataset description for experiments.

Datasets | Nodes | Edges | Time Steps | Time Range
PeMS04 | 307 | 340 | 16,992 | 1 January 2018–28 February 2018
PeMS08 | 170 | 295 | 17,856 | 1 July 2016–31 August 2016
Table 2. The average prediction performance for the next hour of DGSTN and other different methods on two real-world datasets.

Model | PeMS04 MAE | PeMS04 RMSE | PeMS04 MAPE (%) | PeMS08 MAE | PeMS08 RMSE | PeMS08 MAPE (%)
HA | 24.50 | 39.83 | 16.58 | 21.19 | 36.64 | 13.79
ARIMA | 31.55 | 47.57 | 21.40 | 25.27 | 37.77 | 15.539
FNN | 26.82 | 41.56 | 19.98 | 22.40 | 34.71 | 22.47
LSTM | 25.69 | 40.02 | 17.76 | 20.24 | 31.84 | 12.78
DCRNN | 23.05 | 35.72 | 15.97 | 18.29 | 28.61 | 11.62
ASTGCN | 21.99 | 34.97 | 14.49 | 18.53 | 28.69 | 11.21
STSGCN | 21.41 | 34.28 | 14.49 | 17.79 | 27.37 | 11.70
GWN | 20.82 | 32.35 | 14.70 | 15.86 | 24.97 | 10.13
AGCRN | 19.68 | 32.27 | 13.04 | 16.90 | 26.77 | 10.53
ASTGNN | 19.33 | 31.20 | 13.14 | 15.81 | 25.03 | 9.97
DGSTN | 18.88 | 30.86 | 12.47 | 15.27 | 24.33 | 9.82
Table 3. The prediction effect of DGSTN and other methods on the PeMS04 dataset with different step lengths for the next hour.

Model | MAE 15 min | MAE 30 min | MAE 60 min | RMSE 15 min | RMSE 30 min | RMSE 60 min | MAPE (%) 15 min | MAPE (%) 30 min | MAPE (%) 60 min
HA | 22.03 | 25.98 | 34.92 | 34.42 | 40.01 | 52.03 | 16.83 | 19.27 | 25.51
LSTM | 21.13 | 24.90 | 33.40 | 33.25 | 38.54 | 49.96 | 14.26 | 16.98 | 23.89
DCRNN | 19.95 | 22.64 | 28.15 | 31.30 | 34.97 | 42.29 | 13.60 | 15.57 | 19.97
ASTGCN | 19.84 | 21.62 | 25.99 | 31.62 | 34.27 | 40.60 | 13.20 | 14.34 | 16.87
STSGCN | 19.80 | 21.24 | 24.20 | 31.93 | 34.04 | 38.18 | 13.51 | 14.24 | 16.31
GWN | 19.03 | 20.68 | 23.88 | 29.87 | 32.15 | 36.53 | 13.03 | 14.82 | 17.38
AGCRN | 18.87 | 19.59 | 21.07 | 30.89 | 32.13 | 34.36 | 12.55 | 13.00 | 13.89
ASTGNN | 18.17 | 19.45 | 21.30 | 29.53 | 31.57 | 34.36 | 12.55 | 12.77 | 13.95
DGSTN | 18.04 | 19.09 | 20.55 | 29.59 | 31.53 | 33.81 | 12.11 | 12.74 | 13.72
Table 4. The prediction effect of DGSTN and other methods on the PeMS08 dataset with different step lengths for the next hour.

Model | MAE 15 min | MAE 30 min | MAE 60 min | RMSE 15 min | RMSE 30 min | RMSE 60 min | MAPE (%) 15 min | MAPE (%) 30 min | MAPE (%) 60 min
HA | 18.28 | 21.45 | 29.81 | 28.23 | 33.26 | 44.40 | 20.33 | 21.86 | 26.46
LSTM | 16.57 | 19.61 | 26.58 | 25.88 | 30.76 | 40.29 | 10.30 | 12.29 | 17.14
DCRNN | 15.79 | 18.02 | 22.49 | 24.45 | 28.08 | 34.48 | 10.01 | 11.42 | 14.31
ASTGCN | 16.35 | 18.40 | 22.25 | 25.25 | 28.43 | 33.77 | 10.11 | 11.06 | 13.10
STSGCN | 16.40 | 17.68 | 20.15 | 25.10 | 27.35 | 30.92 | 10.91 | 11.52 | 13.01
GWN | 14.49 | 15.85 | 17.91 | 22.75 | 25.10 | 28.21 | 9.18 | 10.10 | 11.31
AGCRN | 15.45 | 16.68 | 19.53 | 24.25 | 26.43 | 30.78 | 9.63 | 10.34 | 12.19
ASTGNN | 14.23 | 15.78 | 18.25 | 22.47 | 24.99 | 28.55 | 9.23 | 9.92 | 11.22
DGSTN | 14.26 | 15.30 | 16.92 | 22.46 | 24.41 | 26.89 | 9.20 | 9.76 | 10.84
Table 5. The computation cost on the PeMS04 dataset.

Model | Training (s/Epoch) | Inference (s)
DCRNN | 93.20 | 11.91
STSGCN | 196.98 | 26.69
ASTGCN | 84.39 | 9.45
ASTGNN | 101.37 | 47.91
DGSTN | 74.01 | 9.99
