Spatio-Temporal Multi-Graph Convolution Traffic Flow Prediction Model Based on Multi-Source Information Fusion and Attention Enhancement

Li, Wenjing; Sun, Zhongning; Wan, Yao

doi:10.3390/app152011295


by Wenjing Li *, Zhongning Sun and Yao Wan

School of Resources and Environmental Engineering, Wuhan University of Science and Technology, Wuhan 430081, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(20), 11295; https://doi.org/10.3390/app152011295

Submission received: 28 September 2025 / Revised: 19 October 2025 / Accepted: 20 October 2025 / Published: 21 October 2025

(This article belongs to the Special Issue Advanced Methods for Time Series Forecasting)


Featured Application

MIFA-ST-MGCN has significant potential applications in intelligent transportation systems (ITS), particularly for real-time traffic flow prediction and urban traffic management. By integrating multi-source data such as weather conditions and Points of Interest (POI), the model can provide more accurate and adaptive traffic forecasting, essential for optimizing traffic signal control, route planning, and congestion management. Additionally, its robustness to noisy data makes it ideal for deployment in dynamic, real-world environments where data quality can fluctuate. The ability to dynamically weight temporal and spatial features allows for more precise traffic predictions, which can directly improve the efficiency and safety of urban transportation systems, reduce congestion, and contribute to sustainable urban planning.

Abstract

Traffic flow prediction plays a vital role in intelligent transportation systems, directly affecting travel scheduling, road planning, and traffic management efficiency. However, traditional methods often struggle to capture complex spatiotemporal dependencies and integrate heterogeneous data sources. To overcome these challenges, we propose a Spatio-temporal Multi-graph Convolution Traffic Flow Prediction Model based on Multi-source Information Fusion and Attention Enhancement (MIFA-ST-MGCN). The model adopts adaptive data fusion strategies according to spatiotemporal characteristics, achieving effective integration through feature concatenation and multi-graph structure construction. A spatiotemporal attention mechanism is designed to dynamically capture the varying contributions of different adjacency relations and temporal dependencies, thereby enhancing feature representation. In addition, recurrent units are combined with graph convolutional networks to model spatiotemporal data and generate more accurate prediction results. Experiments conducted on a real-world traffic dataset demonstrate that the proposed model achieves superior performance, reducing the mean absolute error by 3.57% compared with mainstream traffic flow prediction models. These results confirm the effectiveness of multi-source fusion and attention enhancement in improving prediction accuracy.

Keywords:

traffic flow prediction; multimodal data; graph convolutional network; attention mechanism

1. Introduction

With the profound progress of urbanization, as of 2023, the total number of motor vehicles nationwide has reached 435 million, and the average annual number of newly registered vehicles exceeds 34 million. The rapid growth of the vehicle population has significantly exacerbated traffic congestion and accident risks. From 2015 to 2020, the average annual growth rate of the number of traffic accident fatalities was 3.35%. The peak congestion index in major cities generally exceeds 2.0, and in developed cities, the congestion index during the morning and evening rush hours even reaches 3.0. Meanwhile, the frequent starts and stops resulting from congestion not only decrease traffic efficiency but also indirectly trigger emotional fluctuations among drivers, thereby further increasing the likelihood of accidents [1,2,3,4]. In this context, Intelligent Transportation Systems (ITS) have emerged as the crucial solution to these challenges. Traffic flow prediction, an essential component of ITS, can offer scientific support for the management and planning of urban transportation systems. Therefore, enhancing the accuracy of traffic flow prediction and formulating effective traffic management strategies to alleviate congestion, reduce accident probabilities, and improve travel efficiency are of paramount importance for urban traffic management, enhancing network efficiency, and reducing energy consumption. Traditional traffic prediction methods typically focus on extracting spatiotemporal characteristics from historical traffic data to forecast future traffic conditions. However, traffic conditions are not solely determined by historical traffic data; they are also significantly influenced by external information, including weather conditions, Points of Interest (POI), road conditions, and other environmental factors [5,6,7,8,9].

The main contributions of this work are as follows: (1) We design a multi-source information fusion module that explores the external temporal relationships between traffic flow and environmental factors, as well as their inherent spatial correlations with the road network structure, through feature-level fusion and multi-graph convolution fusion, thereby enhancing the model’s ability to perceive complex scenarios. (2) We design a spatiotemporal attention module that dynamically adjusts the model’s focus on different time periods and regions through an attention mechanism, improving prediction accuracy. (3) We propose a Spatio-temporal Multi-graph Convolution Traffic Flow Prediction Model Based on Multi-source Information Fusion and Attention Enhancement (MIFA-ST-MGCN). This model integrates multi-source heterogeneous information, fusing external environmental data, regional similarity features, and traffic flow information, and incorporates the spatiotemporal attention module for feature enhancement. Finally, traffic flow is predicted through a spatiotemporal graph convolutional network, and the model’s performance is evaluated using real traffic flow datasets. Additionally, we design ablation experiments to validate the effectiveness of attribute data fusion, multi-graph convolution, and the spatiotemporal attention mechanism. We also carry out a perturbation analysis in which Gaussian and Poisson noise is added to the original data to test the model’s robustness and stability.

2. Related Works

Traffic flow prediction is a crucial component of Intelligent Transportation Systems (ITS) and plays an important role in urban traffic control and development. Research on traffic flow prediction has passed through several stages. In early work, owing to a limited understanding of the problem and to technical constraints, traffic flow prediction was treated simply as a statistical time-series task: mathematical statistics were used to extract linear or periodic patterns from historical traffic data to predict future traffic conditions. Among these methods, the Historical Average (HA) model [10] predicts future traffic flow states by using historical averages. This model is simple in principle and computationally fast, but it suffers from low prediction accuracy and is difficult to apply to complex traffic scenarios. Time series models such as the ARMA model and its variants [11,12,13] are based on autoregressive and moving average models. They predict based on the relationship between current and historical data while modeling periodicity and trends in the data. These models capture the linear dependencies of the data well and are suitable for short-term traffic flow prediction where the data exhibits clear periodicity. They are interpretable, computationally efficient, and cost-effective, making them widely used in simpler forecasting scenarios. However, they assume that the time series is stationary and cannot handle nonlinear traffic features, making them ill-suited to dynamically changing conditions.

With the continuous advancement of computer technology, machine learning-based models have gradually been applied in the field of traffic forecasting. Representative algorithms include K-Nearest Neighbors (KNN), Support Vector Machines (SVMs), and Bayesian Networks, among others. The K-Nearest Neighbors algorithm [14] selects from historical data the k samples most similar to the current state and predicts traffic flow through a weighted average. SVMs [15] use kernel functions to map low-dimensional nonlinear traffic data to high-dimensional spaces, constructing an optimal separating hyperplane to handle nonlinear features. Bayesian Networks [16] model conditional dependencies among variables by constructing a directed acyclic graph (DAG) and fit traffic flow using a Gaussian Mixture Model (GMM) with joint probability distributions, from which future traffic conditions are predicted. Compared to models based on mathematical statistics, these machine learning-based algorithms are capable of modeling more complex traffic flow features. However, they have limited ability to capture nonlinear traffic characteristics and long-term dependencies.

In recent years, deep learning has attracted significant attention from researchers due to its advantages in capturing nonlinear features and handling complex scenarios. Traffic flow data is a typical form of time-series data, and extracting its inherent temporal characteristics is one of the key challenges in traffic flow prediction. Recurrent Neural Networks (RNNs) have been widely applied in traffic prediction tasks due to their ability to effectively capture temporal dependencies in time-series data [17]. However, during the backpropagation process, the gradient can vanish, causing RNNs to be influenced only by short-term memory, thereby failing to capture long-term temporal features. To address this issue, researchers have designed Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU) to capture long-term temporal dependencies in traffic flow [18,19]. Although these models can capture the temporal dependencies of traffic flow, researchers have gradually recognized the importance of spatial dependencies in traffic prediction tasks. By incorporating Convolutional Neural Networks (CNNs) to extract spatial information from roads and combining them with LSTMs, the prediction accuracy can be improved [20]. However, since CNNs are primarily designed for regular grid data, they cannot directly handle irregular topological structures, thus failing to fully capture the spatial dependencies of traffic flow. Graph Convolutional Networks (GCNs), on the other hand, can directly process irregular graph network structures, enabling a better exploration of the spatial characteristics of traffic networks [21].

Beyond historical traffic data and spatial topology, traffic flow is also influenced by a variety of external factors. For instance, weather conditions, special events, and the spatial distribution of points of interest can all introduce disturbances to traffic patterns [22,23,24]. Moreover, regions with similar road network structures often exhibit comparable traffic behaviors, and interregional interactions further affect the evolution of traffic flows. Effectively integrating these external factors and capturing interregional dependencies within predictive models remains a significant challenge in contemporary traffic flow forecasting [25,26,27].

Notably, some recent research on fundamental traffic flow theory has provided new perspectives for understanding and predicting traffic congestion. On the one hand, studies from a traffic flow phase-transition perspective have proposed a “congestion boundary” approach to characterize the critical transformation between free-flow and congested states. This method leverages the bimodal distribution of traffic parameters (e.g., speed, density) to identify threshold values that separate free flow from congestion, thereby quantifying the critical conditions for traffic breakdown. For example, using real highway data, Lee et al. [28] estimated a congestion boundary at approximately 66.9 km/h (speed), 22.8 vehicles per kilometer (density), and about 341 vehicles per five-minute interval (flow). Based on these thresholds, it was observed that actual roadways can experience a phase transition into congestion when flow reaches only around 72–83% of the conventional theoretical capacity. Thus, the congestion boundary method provides a theoretical foundation for pinpointing the onset of congestion and enabling effective congestion management. On the other hand, another line of research has drawn inspiration from the theory of phase transitions in simple fluids, treating traffic flow as a fluid system analogous to a gas–liquid phase transition, in order to explore the scaling laws of urban congestion [29]. Laval et al. proposed that the fundamental diagram of traffic flow is analogous to the coexistence curve in gas–liquid phase transitions. Using this analogy, they demonstrated that urban traffic dynamics obey scaling relations characteristic of the Kardar–Parisi–Zhang (KPZ) universality class. Moreover, they found that the “costs” of congestion (such as travel delays and fuel consumption) scale superlinearly with city size (population), with growth even higher than predicted by conventional urban scaling theories [30]. 
Taken together, these macro-level theoretical studies provide new insights into the underlying mechanisms of traffic congestion and offer important guidance for alleviating congestion in large cities. In addition, in the related task of travel time prediction, spatiotemporal deep learning models have achieved remarkable progress. Lee et al. combined a Gated Recurrent Unit (GRU) network with spatiotemporal analysis, proposing a model for highway travel-time prediction. This model explicitly integrates spatial dependencies among road segments and temporal dependency features within the GRU, effectively reducing the lag of predicted travel times relative to actual conditions. Experiments demonstrated that a GRU model augmented with spatiotemporal features outperforms traditional RNNs, LSTMs, as well as a GRU baseline without spatial information, achieving the highest accuracy in travel time prediction at both the segment and route levels. This finding indicates that incorporating spatial correlations into time-series predictions can significantly enhance the accuracy of travel time estimates.

In summary, advanced models such as AST-GCN and ST-GRAT have achieved notable progress in spatiotemporal traffic forecasting. AST-GCN pioneered the integration of external (exogenous) information via an attribute-enhancement module, while ST-GRAT dynamically captures road-network dependencies through a carefully designed spatiotemporal attention mechanism. Nevertheless, these methods still have limitations: the fusion strategy in AST-GCN is relatively simple and does not fully exploit diverse spatial relations; and although ST-GRAT excels in attention modeling, its ability to fuse heterogeneous multi-source exogenous data (e.g., weather, POIs) remains underexplored. To address these limitations, we propose the MIFA-ST-MGCN model, whose core innovations are as follows:

Multi-level multi-source information fusion architecture: Unlike AST-GCN’s simple attribute concatenation, our model constructs three complementary graphs—a geographic adjacency graph, a POI functional-similarity graph, and a spatial-similarity graph—to enable deep, layer-wise fusion within the graph-convolutional hierarchy.
Adaptive multi-graph fusion mechanism: We devise a learnable weighted fusion scheme that dynamically adjusts the relative contributions of different graph structures to the forecasting objective, thereby addressing the weight-allocation challenge in multi-source information fusion.
Enhanced temporal modeling capacity: Building on conventional GRU-based sequence modeling, we incorporate Transformer-style multi-head self-attention to better capture long-range dependencies, thereby mitigating GRU’s limitations in long-sequence modeling.

The comparison of the characteristics of these models is shown in Table 1.

3. Model Design

MIFA-ST-MGCN captures spatiotemporal dependencies in traffic flow by integrating multi-graph convolution, spatiotemporal graph convolution, and spatiotemporal attention. The model can be divided into two branches: temporal and spatial. In the temporal branch, the model encodes the temporal attribute information and captures temporal features at different time scales through temporal attention. In the spatial branch, the model uses multi-graph convolution to capture various types of spatial dependencies, and combines spatial attention to better capture the influence of key road segments on traffic flow. Finally, the features from both the temporal and spatial branches are fused to generate the prediction results.

3.1. Problem Definition

The goal of traffic flow prediction is to forecast future traffic conditions based on historical states and both internal and external information. The traffic state of a road network is typically described using metrics such as traffic volume, average speed, and road occupancy.

Definition 1. Traffic Network Graph $G$. In the field of traffic prediction, the traffic network can be represented as a graph $G = (V, A, E)$, where $V = \{v_{1}, v_{2}, \dots, v_{n}\}$ denotes the set of sensors recording traffic-related information in the network, with $n$ being the number of sensors, i.e., the number of nodes. $E = \{e_{1}, e_{2}, \dots, e_{m}\}$ represents the set of road segments connecting pairs of sensors, where $m$ is the number of road segments, i.e., the number of edges. $A \in \mathbb{R}^{n \times n}$ is the adjacency matrix used to represent the connectivity between sensors, where $a_{ij} \in A$ is the element at the $i$-th row and $j$-th column of the adjacency matrix, indicating the connection status between nodes $v_{i}$ and $v_{j}$. If $a_{ij} = 1$, there is a road segment connecting nodes $v_{i}$ and $v_{j}$; if $a_{ij} = 0$, there is no direct road segment connecting nodes $v_{i}$ and $v_{j}$. Therefore, the adjacency matrix $A$ is a binary matrix composed of 0 and 1.
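To make Definition 1 concrete, the following is a minimal sketch of how a binary adjacency matrix could be built from a list of road segments. The function name `build_adjacency`, the four-sensor toy network, and the choice to treat every segment as undirected are illustrative assumptions, not details taken from the paper.

```python
def build_adjacency(n, edges):
    """Return an n x n binary adjacency matrix as nested lists.

    n     -- number of sensors (nodes)
    edges -- iterable of (i, j) index pairs, one per road segment
    """
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # assumption: segments are treated as undirected
    return A


# Hypothetical chain of four sensors: 0 - 1 - 2 - 3
edges = [(0, 1), (1, 2), (2, 3)]
A = build_adjacency(4, edges)
for row in A:
    print(row)
```

A directed network (for example, one-way streets) would drop the symmetric assignment and keep only `A[i][j] = 1`.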

Definition 2. Traffic Flow Feature Matrix $X$. We use traffic speed as the primary node feature on the road-network graph, forming a matrix $X$. $x_{i}^{t} \in X$ denotes the traffic speed at the $i$-th sensor node at time $t$.

Definition 3. Auxiliary Information $K$. Auxiliary information refers to environmental factors that influence the traffic flow state. We represent the environmental factors that affect traffic conditions as node-level auxiliary features, denoted by $K = \{K_{1}, K_{2}, \dots, K_{p}\}$, where $p$ is the number of categories of auxiliary features. The auxiliary feature information of category $q$ is represented as $K_{q} = \{q^{1}, q^{2}, \dots, q^{t}\}$, where $q_{i}^{t}$ denotes the auxiliary feature information of category $q$ at the $i$-th sensor node at time $t$.

To sum up, the traffic flow prediction problem can be viewed as learning the traffic flow information for the future time period $m$ by combining the traffic network graph $G$, the flow feature matrix $X$, and the auxiliary information $K$, through the establishment of a function $f$ that models the relationship between these components, i.e.,

$$x_{t + m} = f(G, X, K). \tag{1}$$
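One way to read Equation (1) in practice is as a supervised learning problem over sliding windows of the speed series. The sketch below builds such input/target pairs from a speed matrix only; the graph $G$ and auxiliary information $K$ are left out for brevity. The function name `make_samples`, the window length `history`, and the toy data are assumptions made for illustration.

```python
import numpy as np


def make_samples(X, history, horizon):
    """Build supervised pairs from a speed matrix X of shape (T, N).

    Each input holds `history` consecutive time steps ending at time t;
    the matching target is the speed vector at time t + horizon.
    """
    inputs, targets = [], []
    for t in range(history - 1, X.shape[0] - horizon):
        inputs.append(X[t - history + 1 : t + 1])
        targets.append(X[t + horizon])
    return np.stack(inputs), np.stack(targets)


# Toy data: T = 10 time steps, N = 2 sensors
X = np.arange(20, dtype=float).reshape(10, 2)
inputs, targets = make_samples(X, history=4, horizon=2)
print(inputs.shape, targets.shape)
```

A learned model then plays the role of $f$, mapping each window (together with the graph and auxiliary features) to its target.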

3.2. Overall Framework

We propose a Spatio-temporal Multi-graph Convolution Traffic Flow Prediction Model based on Multi-source Information Fusion and Attention Enhancement (MIFA-ST-MGCN). As shown in Figure 1, the model mainly consists of data preprocessing, a spatiotemporal attention module, and a spatiotemporal convolution module.

3.3. Spatio-Temporal Dependency Modeling

Traffic flow data exhibits dependencies not only in the spatial domain but also in the temporal domain. We choose spatiotemporal graph convolutional networks as the base prediction model. The Temporal Graph Convolutional Network (TGCN) [31] integrates graph convolutional networks with temporal forecasting, simultaneously modeling in both the spatial and temporal dimensions. This approach effectively captures the spatiotemporal features of traffic flow and addresses the spatiotemporal dependencies in traffic flow prediction.

3.3.1. Spatial Modeling: Graph Convolution Operation

Graph Convolutional Networks (GCN) capture the spatial dependencies of each road segment through neighborhood aggregation. The modeling process of GCN is illustrated in Figure 2 and can be divided into an input layer, hidden layers, and an output layer. The inputs to the GCN are twofold: the node feature matrix $X \in \mathbb{R}^{N \times F}$, which describes the traffic conditions of each road segment, and the adjacency matrix $A \in \mathbb{R}^{N \times N}$, which represents the connectivity between road segments, where $N$ is the number of road segments and $F$ is the feature dimension of each road segment. The hidden layers are the core components of the GCN, where convolution operations are defined on the graph structure to extract features. The hidden state of the $(l + 1)$-th layer can be expressed as:

$$H^{(l + 1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right), \tag{2}$$

where $\tilde{A} = A + I$, $\tilde{A} \in \mathbb{R}^{N \times N}$, is the adjacency matrix with self-loops added, $\tilde{D} \in \mathbb{R}^{N \times N}$ is the degree matrix of $\tilde{A}$, with $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$, $H^{(l)}$ is the hidden state output by the $l$-th layer, $W^{(l)}$ is the weight matrix of the $l$-th layer, and $\sigma(\cdot)$ is a nonlinear activation function. The output layer performs classification or regression on the hidden layer data through a fully connected layer.
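The propagation rule in Equation (2) can be sketched in a few lines of NumPy. This is an illustrative single layer only: the function name `gcn_layer`, the use of ReLU as the activation $\sigma$, and the toy three-node graph are assumptions, and a trained model would learn the weight matrix rather than fix it.

```python
import numpy as np


def gcn_layer(A, H, W):
    """One graph-convolution layer following Equation (2).

    A -- (N, N) adjacency matrix without self-loops
    H -- (N, F_in) hidden state of the previous layer
    W -- (F_in, F_out) weight matrix
    """
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    degree = A_tilde.sum(axis=1)                # diagonal of the degree matrix
    D_inv_sqrt = np.diag(degree ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0)       # assumption: ReLU as sigma


# Toy graph: three road segments in a chain, 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H0 = np.eye(3)            # one-hot node features
W0 = np.ones((3, 2))      # fixed weights, for illustration only
H1 = gcn_layer(A, H0, W0)
print(H1)
```

Because every node has a self-loop, all degrees are at least one and the inverse square root is always defined.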

3.3.2. Temporal Modeling: Gated Recurrent Unit

Traffic flow data is a typical form of time-series data, and effectively capturing the temporal features within the data is crucial for prediction accuracy. We use Gated Recurrent Units (GRUs) to model the temporal dynamics in the data. Compared to traditional Recurrent Neural Networks (RNNs), GRU effectively mitigates the vanishing gradient problem in long time series by introducing a gating mechanism. In contrast to Long Short-Term Memory (LSTM) networks, GRU optimizes the gating mechanism, resulting in a simpler structure and higher computational efficiency.

The operation process of the GRU in handling time-series data is shown in Figure 3. At time step $t$, the GRU receives the traffic flow feature $x_{t} \in \mathbb{R}^{B \times N \times (\mathit{gru\_units} + 1)}$ at the current time step and the hidden state $h_{t - 1}$ from the previous time step. Through the gating mechanism, it outputs the hidden state $h_{t}$ at the current time step and passes it as the input hidden state to the next time step. The GRU’s gating mechanism consists of two gates: the update gate and the reset gate. The update gate $z_{t}$ determines how much of the past traffic flow features $[x_{1}, x_{2}, \dots, x_{t - 1}]$ should be retained at the current time step and how much of the current input feature $x_{t}$ should be integrated into the new hidden state $\tilde{h} \in \mathbb{R}^{B \times N \times \mathit{gru\_units}}$. The reset gate $r_{t}$ controls how much of the past traffic flow information should be forgotten. This process can be represented by Equations (3)–(6). Here, $B$ represents the batch size, $N$ represents the number of nodes, and $\mathit{gru\_units}$ refers to the number of GRU units in the layer. We utilize GCN to extract spatial features at each time step, which are then input into a GRU to capture temporal dependencies, thereby constructing a fundamental spatiotemporal feature extraction framework.

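$$z_{t} = \sigma(W_{z} \cdot [h_{t - 1}, x_{t}]), \tag{3}$$

$$r_{t} = \sigma(W_{r} \cdot [h_{t - 1}, x_{t}]), \tag{4}$$

$$\tilde{h}_{t} = \tanh(W \cdot [r_{t} \odot h_{t - 1}, x_{t}]), \tag{5}$$

$$h_{t} = (1 - z_{t}) \odot h_{t - 1} + z_{t} \odot \tilde{h}_{t}. \tag{6}$$

Here $[\cdot, \cdot]$ denotes concatenation and $\odot$ denotes element-wise multiplication.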
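Equations (3)–(6) describe a single recurrent update, which the following NumPy sketch writes out step by step. Bias terms are omitted because the equations omit them. The function names, the random weights, and the tensor sizes are illustrative assumptions; in particular, the input here is a plain feature vector of size `F` rather than the GCN output used in the full model.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU update following Equations (3)-(6), without bias terms.

    x_t    -- (B, N, F) input at the current time step
    h_prev -- (B, N, U) hidden state from the previous time step
    W_*    -- (U + F, U) weight matrices
    """
    hx = np.concatenate([h_prev, x_t], axis=-1)
    z = sigmoid(hx @ W_z)                                 # update gate, Eq. (3)
    r = sigmoid(hx @ W_r)                                 # reset gate, Eq. (4)
    rhx = np.concatenate([r * h_prev, x_t], axis=-1)
    h_tilde = np.tanh(rhx @ W_h)                          # candidate state, Eq. (5)
    return (1.0 - z) * h_prev + z * h_tilde               # new hidden state, Eq. (6)


# Hypothetical sizes: batch 2, 3 nodes, 1 input feature, 4 hidden units
rng = np.random.default_rng(0)
B, N, F, U = 2, 3, 1, 4
W_z, W_r, W_h = (rng.normal(scale=0.1, size=(U + F, U)) for _ in range(3))

h = np.zeros((B, N, U))
for x_t in rng.normal(size=(5, B, N, F)):   # unroll over 5 time steps
    h = gru_step(x_t, h, W_z, W_r, W_h)
print(h.shape)
```

Since each new state is a convex combination of the previous state and a `tanh` output, the hidden values stay within (-1, 1) when the initial state is zero.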

3.4. Multi-Source Information Fusion Modeling

As a complex open system, the transportation system is influenced by various factors such as weather conditions and geographical location, which in turn affect traffic flow states. Single-source traffic flow data is insufficient to comprehensively capture these complex influencing factors. Therefore, this model considers different types of information and designs various data fusion strategies to enhance the model’s adaptability to external disturbances and environmental changes.

3.4.1. Graph Structure Construction

The changes in traffic conditions are closely related to the spatial correlations between regions. We model inter-regional spatial correlations by constructing spatial and functional similarity graphs.

Spatial Similarity Graph: According to Tobler’s First Law of Geography [32], spatial entities exhibit spatial autocorrelation, namely, closer entities are more strongly correlated, whereas entities farther apart tend to be more dissimilar. Based on this principle, we construct a spatial similarity graph $G_{ss} = (V, A_{ss}, E)$, where $A_{ss}$ denotes the spatial similarity matrix. The spatial similarity between nodes $v_{i}$ and $v_{j}$, denoted as $A_{ss}(v_{i}, v_{j})$, is calculated as shown in Equation (7):

$$A_{ss}(v_{i}, v_{j}) = \frac{1}{PL(v_{i}, v_{j}) + 1}, \tag{7}$$

where $PL(v_{i}, v_{j})$ represents the path length between nodes $v_{i}$ and $v_{j}$.
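Equation (7) can be illustrated with a small sketch that computes path lengths by breadth-first search. The text does not say whether path length is measured in hops or in road distance; the sketch assumes hop counts on the binary adjacency matrix and assigns a similarity of 0 to unreachable pairs. Both choices, and the function names, are assumptions.

```python
from collections import deque


def path_lengths(adj, source):
    """Hop-count shortest paths from `source` via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, connected in enumerate(adj[u]):
            if connected and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def spatial_similarity(adj):
    """Spatial similarity matrix following Equation (7)."""
    n = len(adj)
    A_ss = [[0.0] * n for _ in range(n)]  # unreachable pairs stay at 0 (assumption)
    for i in range(n):
        for j, pl in path_lengths(adj, i).items():
            A_ss[i][j] = 1.0 / (pl + 1)
    return A_ss


# Toy chain of four nodes: 0 - 1 - 2 - 3
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
A_ss = spatial_similarity(adj)
for row in A_ss:
    print([round(v, 3) for v in row])
```

With this definition each node has similarity 1 to itself, and similarity decays as 1/2, 1/3, 1/4 with each additional hop.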

Functional Similarity Graph: As the distance between regions increases, the spatial correlation between them gradually decreases. However, due to the potential similarity in the distribution of Points of Interest (POI), the traffic flow states between regions may exhibit similar patterns of change. To explore the correlation between POI distribution and regional traffic flow state changes, we use the Jensen–Shannon (JS) divergence to measure the functional similarity between two nodes and define the functional similarity graph $G_{fs} = (V, A_{fs}, E)$. $A_{fs}$ is the functional similarity matrix, and the functional similarity $A_{fs}(v_{i}, v_{j})$ between nodes $v_{i}$ and $v_{j}$ is calculated as shown in Equations (8)–(10):

$$A_{fs}(v_{i}, v_{j}) = \begin{cases} 1 - D_{JS}(v_{i} \| v_{j}), & D_{JS}(v_{i} \| v_{j}) \in [0, 1] \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

$$D_{JS}(v_{i} \| v_{j}) = \frac{1}{2} D_{KL}\left(v_{i} \,\Big\|\, \frac{v_{i} + v_{j}}{2}\right) + \frac{1}{2} D_{KL}\left(v_{j} \,\Big\|\, \frac{v_{i} + v_{j}}{2}\right), \tag{9}$$

$$D_{KL}(v_{i} \| v_{j}) = \sum_{p} v_{i}(p) \log \frac{v_{i}(p)}{v_{j}(p)}, \tag{10}$$

where $D_{KL}(v_{i} \| v_{j})$ represents the KL divergence between nodes $v_{i}$ and $v_{j}$, $D_{JS}(v_{i} \| v_{j})$ represents the JS divergence between nodes $v_{i}$ and $v_{j}$, $v_{i}(p)$ represents the POI distribution feature of node $v_{i}$, and the sum in Equation (10) runs over the entries $p$ of that distribution. As Equation (9) shows, the JS divergence is symmetric, i.e., $D_{JS}(v_{i} \| v_{j}) = D_{JS}(v_{j} \| v_{i})$.
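The functional similarity of Equations (8)–(10) can be sketched as follows for two POI distributions given as probability vectors. The sketch uses base-2 logarithms, under which the JS divergence is bounded by 1; the text does not state the logarithm base, so this is an assumption, as are the function names and the example distributions.

```python
import math


def kl_divergence(p, q):
    """KL divergence in bits; terms with p_k = 0 contribute nothing."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def js_divergence(p, q):
    """JS divergence as the average KL divergence to the mixture, Eq. (9)."""
    mix = [(pk + qk) / 2 for pk, qk in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)


def functional_similarity(p, q):
    """Functional similarity following the piecewise rule of Eq. (8)."""
    d = max(js_divergence(p, q), 0.0)  # guard against tiny negative round-off
    return 1.0 - d if d <= 1.0 else 0.0


# Hypothetical POI category shares for three road segments
residential = [0.7, 0.2, 0.1]
mixed_use = [0.4, 0.4, 0.2]
industrial = [0.0, 0.1, 0.9]

print(functional_similarity(residential, mixed_use))
print(functional_similarity(residential, industrial))
```

Comparing each distribution against the mixture rather than against the other distribution directly keeps every logarithm finite even when one distribution has zero entries.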

3.4.2. Multi Graph Convolution Fusion

The impact of spatial data of the same type on the traffic state varies across different road segments, and the influence of different types of spatial data on the traffic state of the same road segment also differs. We employ a multi-graph convolutional fusion strategy to model the spatial dependencies that shape traffic flow. The processing flow of multi-graph convolution is illustrated in Figure 4.

The multi-graph convolution fusion strategy can be represented as

$$A_{\mathrm{fused}} = \alpha_{\mathrm{geo}} A^{(\mathrm{geo})} + \alpha_{\mathrm{poi}} A^{(\mathrm{poi})} + \alpha_{\mathrm{spa}} A^{(\mathrm{spa})}, \quad \alpha_{\mathrm{geo}} + \alpha_{\mathrm{poi}} + \alpha_{\mathrm{spa}} = 1, \tag{11}$$

where $A^{(\mathrm{geo})}$ represents the geographic adjacency matrix, $A^{(\mathrm{poi})}$ is the POI functional-similarity matrix, and $A^{(\mathrm{spa})}$ is the spatial similarity matrix. As shown in Equation (11), we assign a learnable weight to each graph structure and optimize these weights end-to-end via gradient-based training. We apply a Softmax to normalize the weights so that $\alpha_{\mathrm{geo}}$, $\alpha_{\mathrm{poi}}$, and $\alpha_{\mathrm{spa}}$ are nonnegative and sum to one, ensuring a well-balanced contribution of each component to the fused adjacency matrix. This multi-graph convolution fusion strategy allows the model to adaptively emphasize different spatial relations based on the data: for example, traffic states may at certain times be driven more by physically adjacent links, whereas in other scenarios synchronous fluctuations among functionally similar regions may dominate. Compared with models that rely on a single adjacency, multi-graph convolution can dynamically capture heterogeneous types of spatial correlation, thereby enhancing the model’s ability to characterize complex traffic patterns.
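The weighted fusion of Equation (11) can be sketched as a Softmax over three scalar scores followed by a weighted sum of the adjacency matrices. In the model the scores would be trainable parameters updated by gradient descent; here they are fixed numbers, and the function names and toy matrices are assumptions for illustration.

```python
import numpy as np


def softmax(v):
    e = np.exp(v - np.max(v))   # subtract the max for numerical stability
    return e / e.sum()


def fuse_graphs(A_geo, A_poi, A_spa, logits):
    """Fuse three adjacency matrices with Softmax-normalized weights, Eq. (11)."""
    alpha = softmax(np.asarray(logits, dtype=float))
    A_fused = alpha[0] * A_geo + alpha[1] * A_poi + alpha[2] * A_spa
    return A_fused, alpha


# Toy two-node graphs (hypothetical values)
A_geo = np.array([[0.0, 1.0], [1.0, 0.0]])
A_poi = np.array([[1.0, 0.2], [0.2, 1.0]])
A_spa = np.array([[1.0, 0.5], [0.5, 1.0]])

# Equal scores give equal weights of 1/3 each
A_fused, alpha = fuse_graphs(A_geo, A_poi, A_spa, logits=[0.0, 0.0, 0.0])
print(alpha)
print(A_fused)
```

Parameterizing the weights through unconstrained scores lets an optimizer update them freely while the Softmax keeps the constraint in Equation (11) satisfied at every step.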

3.4.3. Feature Level Fusion

We integrate static attributes (e.g., POI) and dynamic exogenous variables (e.g., weather) with the traffic-flow inputs via feature concatenation. Compared to other methods, the feature concatenation operation is simple, and the model does not need to consider the spatial correlation effects between different nodes, allowing the model to focus more on the processing of temporal features.

Static attribute data are fixed and do not change over time. For example, the distribution and quantity of POI remain constant within a given time and spatial range. Therefore, the feature matrix $S$ of static attribute data can be represented as $S = [s_{1}, s_{2}, \dots, s_{n}]^{T}$, where $s_{i}$ represents the static attribute feature of the $i$-th node, and $n$ is the total number of nodes. The defining characteristic of dynamic attribute features is that their attribute information changes over time. For example, weather conditions change dynamically at different time points, and the weather at the current time is influenced by past weather conditions and affects future weather conditions. Therefore, the feature matrix $D$ of dynamic attribute data can be represented as $D = [d_{1}, d_{2}, \dots, d_{n}]$, and the dynamic feature information of node $i$, $d_{i}$, is represented as $[d_{i}^{t - m}, d_{i}^{t - m + 1}, \dots, d_{i}^{t}]$, where $t$ is the current time step, and $m$ is the historical time step length. The fused attribute feature matrix at time $t$ can be represented as Equation (12):

$$E^{t} = [X^{t}, S, D^{t - m, t}]. \tag{12}$$

$E^{t} \in \mathbb{R}^{N \times (F + F_{s} + F_{d} (m + 1))}$ represents the feature concatenation matrix at time $t$, where $X^{t} \in \mathbb{R}^{N \times F}$ denotes the traffic flow features (e.g., speed) of all $N$ road segments at time $t$, $N$ denotes the number of nodes, and $F$ denotes the feature dimensionality. $S = [s_{1}, s_{2}, \dots, s_{F_{s}}]$, $S \in \mathbb{R}^{N \times F_{s}}$, is the static attribute feature matrix, and $F_{s}$ is the number of static attribute types. Since static attribute information does not change over time, the static attributes are reused at every time step. $D^{t - m, t} = [d_{1}^{t - m, t}, d_{2}^{t - m, t}, \dots, d_{F_{d}}^{t - m, t}]$, $D^{t - m, t} \in \mathbb{R}^{N \times F_{d} (m + 1)}$, is the dynamic attribute feature matrix from $t - m$ to $t$, where $F_{d}$ is the number of dynamic attribute types. Since dynamic attribute features change over time, the information from time $t - m$ to $t$ is selected as the input for time $t$ when modeling the dynamic attribute matrix. Fusing attribute data by straightforward feature concatenation is simple and effective to implement. Compared with more complex fusion schemes, plain concatenation avoids introducing excessive parameters at the fusion stage, deferring importance weighting to subsequent attention modules and thereby mitigating overfitting risk.
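The concatenation in Equation (12) can be sketched in NumPy as below. With the shapes stated for the three blocks, the fused matrix has one row per node and $F + F_{s} + F_{d}(m + 1)$ columns. The function name `fuse_features`, the layout of the dynamic window as a `(m + 1, N, F_d)` array, and the constant toy values are assumptions chosen to make the shapes easy to check.

```python
import numpy as np


def fuse_features(X_t, S, D_window):
    """Concatenate traffic, static, and dynamic features along the feature axis.

    X_t      -- (N, F) traffic features at time t
    S        -- (N, F_s) static attributes, reused at every time step
    D_window -- (m + 1, N, F_d) dynamic attributes for steps t - m .. t
    """
    N = X_t.shape[0]
    # Move the node axis first, then flatten time and attribute axes together
    D_flat = np.transpose(D_window, (1, 0, 2)).reshape(N, -1)   # (N, F_d * (m + 1))
    return np.concatenate([X_t, S, D_flat], axis=1)


# Hypothetical sizes
N, F, F_s, F_d, m = 4, 1, 2, 3, 5
X_t = np.zeros((N, F))
S = np.ones((N, F_s))
D_window = np.full((m + 1, N, F_d), 2.0)

E_t = fuse_features(X_t, S, D_window)
print(E_t.shape)
```

No parameters are introduced at this stage, which matches the stated motivation of leaving importance weighting to the later attention modules.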

We use POI data as a semantically explicit static descriptor and weather data as a dynamic feature of traffic flow. Specifically, for POI we take, for each road segment, the dominant POI-category code as its POI feature, apply min–max normalization to scale it to [0, 1], and replicate it across time to fill the temporal dimension to length

T

. For weather, we likewise use the weather-category code as the weather feature, min–max normalize it to [0, 1], and replicate it across space to fill the spatial dimension to

N

segments. For traffic-flow variables, we directly apply min–max normalization to [0, 1]. Through this dimensional replication (broadcasting), all inputs are expanded to a unified shape of

N \times T

and then fed into subsequent modules for feature extraction and computation.
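The normalization and replication steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the helper names (`min_max`, `broadcast_poi`, `broadcast_weather`) are hypothetical, the category codes are invented, and each auxiliary input is stored here as a nested list with one row per segment and one column per time step.

```python
def min_max(values):
    """Scale a flat list of numbers to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def broadcast_poi(poi_codes, num_steps):
    """Repeat one static POI value per segment across T time steps."""
    scaled = min_max(poi_codes)
    return [[v] * num_steps for v in scaled]


def broadcast_weather(weather_codes, num_segments):
    """Repeat one weather value per time step across N segments."""
    scaled = min_max(weather_codes)
    return [list(scaled) for _ in range(num_segments)]


# Toy example (assumed codes): N = 3 segments, T = 4 time steps.
poi = broadcast_poi([0, 4, 8], num_steps=4)
weather = broadcast_weather([0, 1, 4, 2], num_segments=3)
```

Note that min–max scaling of category codes imposes an ordering on the categories; the sketch follows the description in the text and does not evaluate alternatives such as one-hot encoding.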

3.5. Attention-Enhancing Mechanism

Traffic flow data contain latent features that are difficult to capture in both the temporal and spatial dimensions. In the temporal dimension, traffic flow is nonlinear and dynamically changing, influenced by various external factors. In the spatial dimension, traffic flow exhibits complex spatial interactions, where the traffic state of other regions can directly or indirectly affect the traffic flow in the local region. We introduce a spatiotemporal attention module to capture these dependencies. The module consists of temporal and spatial attention mechanisms. Temporal attention dynamically adjusts the weights of different time steps based on the historical sequence, while spatial attention highlights the key road segments and graph structures that have a significant impact on traffic flow. Through a spatiotemporal multi-head self-attention mechanism, the model dynamically adjusts its attention to different time periods and regions, enhancing its ability to capture complex spatiotemporal dependencies.

The overall architecture of the spatiotemporal attention module is shown in Figure 5. This module employs a multi-head self-attention mechanism to capture the spatiotemporal dependencies in traffic flow data. The mechanism projects the traffic flow features onto multiple attention heads and computes the weights of all heads in parallel. The head outputs are then concatenated and combined through a weighted fusion. The computation of multi-head self-attention can be expressed as

\begin{cases} Q_{i} = X W_{Q_{i}} \\ K_{i} = X W_{K_{i}} \\ V_{i} = X W_{V_{i}} \end{cases} \quad i = 1, 2, \dots, h,

(13)

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) \cdot V,

(14)

\mathrm{Output} = \mathrm{concat}(\mathrm{Attention}_{1}, \mathrm{Attention}_{2}, \dots, \mathrm{Attention}_{h}) .

(15)

In the equations,

Q_{i}

represents the feature of the current time step or the current road segment;

K_{i}

denotes the traffic flow features of the historical time steps or other road segments;

V_{i}

represents the weighted features of each time step or road segment;

h

is the number of attention heads;

X

is the feature matrix. When processing temporal features,

X \in R^{B \times T \times F}

, where

B

is the batch size,

T

is the number of time steps, and

F

is the feature dimension. When using the attention mechanism to capture dynamic correlations between nodes,

X \in R^{B \times N \times F}

, where

N

is the number of nodes, i.e., the number of road segments.

W_{Q}, W_{K}, W_{V} \in R^{F \times d_{k}}

are learnable weight matrices, and

d_{k}

is the dimension of the Query and Key matrices.
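Equations (13)–(15) can be traced with a small pure-Python sketch. It is an illustration only: the batch axis B is dropped (one sample), random matrices stand in for the learned weights, the helper names are hypothetical, and the weighted fusion applied after concatenation is omitted because its form is not specified in the equations.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x, w_q, w_k, w_v):
    """Equations (13)-(14) for one head; x has one row per time step or node."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def multi_head(x, heads):
    """Equation (15): concatenate the per-head outputs along the feature axis."""
    outputs = [attention_head(x, *params)[0] for params in heads]
    return [sum((out[i] for out in outputs), []) for i in range(len(x))]


def random_matrix(rows, cols, rng):
    """Stand-in for a learned weight matrix (assumption for illustration)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, F, d_k, h = 5, 3, 2, 4            # toy sizes with h = 4 heads
x = random_matrix(T, F, rng)          # temporal case: one row per time step
heads = [tuple(random_matrix(F, d_k, rng) for _ in range(3)) for _ in range(h)]
output = multi_head(x, heads)         # T rows, each of width h * d_k
```

The same functions cover the spatial case by passing a matrix with one row per road segment, so that the attention weights form an N × N matrix instead of a T × T one.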

3.6. Spatio-Temporal Multi-Graph Convolution Model Based on Multi-Source Information Fusion and Attention Enhancement (MIFA-ST-MGCN)

Based on spatiotemporal graph convolutional networks, we have designed a spatiotemporal multi-graph convolution model with multi-source information fusion and attention enhancement to address the complex spatial dependencies and temporal dependencies in traffic flow prediction. As shown in Figure 6, the attribute feature fusion module concatenates the dynamic and static attribute matrices at each time step into the feature matrix

X^{t}

to expand its feature dimension. The expanded feature matrix is denoted as

E^{t'}

, and then, through temporal attention, the weights of each time step are adjusted to obtain

E^{t}

. The spatial feature fusion module adjusts the importance of different graphs and road segments using spatial attention mechanisms and multi-graph convolution, resulting in the fused adjacency matrix. The enhanced feature matrix

E^{t}

and the fused adjacency matrix

A_{\mathrm{fused}}

are then input into the model

f

to obtain the final prediction result

{\bar{y}}_{t}

:

{\bar{y}}_{t} = f (A_{\mathrm{fused}}, E^{t}) .

(16)

We construct the model

f

by combining a Graph Convolutional Network (GCN) with Gated Recurrent Units (GRU) to capture spatiotemporal dependencies in traffic-flow data. Specifically, the fused adjacency matrix

A_{\mathrm{fused}}

and the enhanced feature matrix

E^{t}

are first input into the GCN module, where multiple layers of graph convolution and non-linear activation are applied to extract spatial representations of each road segment at different time steps. These representations are subsequently fed into the GRU, which utilizes update and reset gates to regulate the preservation and adaptation of traffic information over time, thereby capturing temporal dynamics in the traffic flow.
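The spatial part of this pipeline can be sketched as a softmax-weighted fusion of several adjacency matrices followed by one graph-convolution layer. Everything below is an assumption made for illustration: the function names, the toy graphs, the use of the standard symmetric normalization with self-loops, and the single ReLU layer. The GRU stage that consumes the resulting representations is omitted.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of fusion logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_graphs(adjacencies, logits):
    """Weighted sum of adjacency matrices with softmax-normalized weights."""
    weights = softmax(logits)
    n = len(adjacencies[0])
    return [[sum(w * a[i][j] for w, a in zip(weights, adjacencies))
             for j in range(n)] for i in range(n)]


def normalize(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} for non-negative A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt[i] * a_hat[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def gcn_layer(adj_norm, features, weight):
    """One graph convolution: ReLU(adj_norm @ features @ weight)."""
    n, f_in, f_out = len(features), len(weight), len(weight[0])
    agg = [[sum(adj_norm[i][k] * features[k][f] for k in range(n))
            for f in range(f_in)] for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(f_in)))
             for o in range(f_out)] for i in range(n)]


road = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]      # connectivity graph (toy)
similar = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]   # hypothetical similarity graph
a_fused = fuse_graphs([road, similar], logits=[0.0, 0.0])  # equal weights
e_t = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # 3 nodes, 2 fused features
w = [[1.0], [1.0]]                            # maps 2 features to 1
hidden = gcn_layer(normalize(a_fused), e_t, w)
```

In a trained model the fusion logits and the layer weights would be learned; here they are fixed so that the data flow from the fused adjacency matrix to the spatial representation is easy to follow.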

The primary objective of traffic flow forecasting is to minimize prediction errors, ensuring that the predicted values closely approximate the actual observations. Accordingly, when designing the loss function, it is essential to reduce the prediction error to enhance the model’s forecasting accuracy:

\mathrm{Loss} = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\tilde{y}}_{i})}^{2} + λ \sum_{j} \| {\bar{ϖ}}_{j} \|^{2} .

(17)

Specifically,

\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\tilde{y}}_{i})}^{2}

denotes the L2 loss, which quantifies the discrepancy between the predicted and actual values, where

{\tilde{y}}_{i}

represents the predicted traffic flow of the

i\text{-th}

road segment and

y_{i}

denotes the corresponding ground truth. The term

λ \sum_{j} \| {\bar{ϖ}}_{j} \|^{2}

corresponds to L2 regularization, which is introduced to mitigate overfitting. Here,

λ

is the regularization coefficient that controls the strength of the regularization.
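As a worked instance of Equation (17), the following sketch evaluates the loss for two predictions and three weights. The function name, the grouping of weights into lists, and all numerical values are assumptions chosen for illustration.

```python
def regularized_loss(y_true, y_pred, weights, lam):
    """Equation (17): mean squared error plus lam times the sum of squared weights."""
    n = len(y_true)
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n
    l2 = sum(w ** 2 for group in weights for w in group)
    return mse + lam * l2


loss = regularized_loss(
    y_true=[30.0, 42.0],
    y_pred=[28.0, 45.0],
    weights=[[0.5, -0.5], [1.0]],   # hypothetical parameter groups
    lam=0.01,
)
# Error term: (4 + 9) / 2 = 6.5; penalty: 0.01 * 1.5 = 0.015.
```

With these numbers the error term dominates, which is the usual regime: the coefficient λ is kept small so that the penalty discourages large weights without overriding the fit to the data.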

4. Experiments

4.1. Datasets

To validate the effectiveness of the proposed model, we perform experiments on two real-world traffic datasets, SZ_taxi [33] and METR_LA.

SZ_taxi: This dataset consists of taxi trajectory data collected from 156 major road segments in Luohu District, Shenzhen, spanning the period from 1 January to 31 January 2015, with a sampling interval of 15 min. The dataset is composed of two parts: the speed feature matrix and the adjacency matrix. The speed feature matrix is organized with timestamps as row indices and road segments as column indices, resulting in a matrix of size $2976 \times 156$. The adjacency matrix is constructed to model the connectivity between road segments, with a size of $156 \times 156$.
SZ_POI: This dataset contains the distribution of Points of Interest (POIs), encompassing nine categories: catering services, enterprises, shopping facilities, transportation infrastructure, educational services, living services, medical services, and accommodations. For each road segment, the most prevalent POI category within its surrounding area is selected as its static feature, resulting in a POI static matrix of size $156 \times 1$.
SZ_Weather: This dataset records the weather conditions around the study area at 15 min intervals throughout January 2015. The data include five categories of weather: sunny, cloudy, foggy, light rain, and heavy rain. Based on these observations, a dynamic weather matrix of size $156 \times 2976$ is constructed.

4.2. Evaluation Indicators and Parameter Settings

4.2.1. Evaluation Indicators

We evaluate the predictive performance of the model using five commonly adopted metrics: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Accuracy, Coefficient of Determination (R²), and Explained Variance (VAR).

RMSE: A widely used metric for quantifying the discrepancy between predicted and actual values, where a smaller value indicates lower prediction error of the model.

$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\tilde{y}}_{i})}^{2}} .$

(18)
MAE: The average absolute difference between predicted and actual values, reflecting the model’s ability to control errors. A smaller MAE indicates that the model can provide more stable and reliable predictions.

$\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\tilde{y}}_{i} | .$

(19)
Accuracy: One minus the mean relative absolute error between predicted and actual values. A higher accuracy indicates stronger predictive capability of the model.

$\mathrm{Accuracy} = 1 - \frac{1}{n} \sum_{i = 1}^{n} \frac{\sqrt{{(y_{i} - {\tilde{y}}_{i})}^{2}}}{y_{i}} .$

(20)
R²: A metric used to evaluate the goodness of fit of the model, where values closer to 1 indicate better predictive performance and $\bar{y}$ denotes the mean of the observed values.

$R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\tilde{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}} .$

(21)
VAR: A metric that measures the extent to which the model accounts for the overall variance in the data, where values closer to 1 indicate stronger predictive capability.

$\mathrm{VAR} = 1 - \frac{\mathrm{Var} (y - \tilde{y})}{\mathrm{Var} (y)} .$

(22)
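The five metrics in Equations (18)–(22) can be computed together, as in the sketch below. It assumes two equal-length lists of speeds, reads the denominator of R² as the deviation from the mean of the observations, and uses the population variance for VAR; the function names and sample values are hypothetical.

```python
import math


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def evaluate(y_true, y_pred):
    """Compute Equations (18)-(22) for two equal-length lists of speeds."""
    n = len(y_true)
    errors = [y - p for y, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    sse = sum(e ** 2 for e in errors)
    return {
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in errors) / n,
        # Equation (20) divides by y_i, so zero observations must be excluded upstream.
        "accuracy": 1.0 - sum(abs(e) / y for e, y in zip(errors, y_true)) / n,
        "r2": 1.0 - sse / sum((y - mean_true) ** 2 for y in y_true),
        "var": 1.0 - variance(errors) / variance(y_true),
    }


metrics = evaluate([20.0, 40.0, 60.0], [22.0, 38.0, 63.0])
```

R² and VAR differ only when the errors have a non-zero mean: VAR subtracts the mean error before squaring, so a constant bias lowers R² but leaves VAR unchanged.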

4.2.2. Parameter Settings

All experiments were conducted on a computing platform equipped with an Intel Core i5-12600KF CPU and an NVIDIA RTX 4060 GPU with 8 GB memory. The model was implemented in Python (version 3.9.18) using the TensorFlow framework (version 2.10.0). After extensive parameter tuning, the number of hidden units in the GRU was set to 128, the number of heads in the multi-head attention mechanism was set to 4, and the Adam optimizer was adopted to accelerate model convergence. The learning rate of Adam was set to 0.001, the batch size to 64, and the total number of training epochs to 600.

4.3. Baseline Models

To evaluate the performance of the proposed MIFA-ST-MGCN model, comparative experiments were conducted against a variety of forecasting models, including both traditional approaches and deep learning-based methods.

Historical Average (HA): Predicts future traffic conditions by averaging the traffic flow observed during the same time intervals in historical data, under the assumption of periodic traffic patterns [10].
Support Vector Machine Regression (SVR): Formulates the time series forecasting task as a supervised learning problem, leveraging support vector machines to construct a regression function in high-dimensional space [34].
Long Short-Term Memory (LSTM): Employs gated mechanisms and memory cells to capture long-range temporal dependencies in sequential data [35].
Gated Recurrent Units (GRU): A simplified variant of LSTM with fewer gating mechanisms, resulting in reduced parameter complexity and improved computational efficiency [36].
Temporal Graph Convolutional Network (TGCN): Integrates graph convolution to capture spatial dependencies with recurrent units to model temporal dynamics, enabling joint spatio-temporal feature learning [31].
Spatio-Temporal Graph Convolutional Network (STGCN): Combines graph convolution with gated causal convolutions to simultaneously capture spatio-temporal dependencies, thereby improving both prediction accuracy and training efficiency [21].
A3T-GCN: Enhances temporal graph convolution by incorporating temporal attention, leading to improved prediction of traffic peaks [37].
AST-GCN: Introduces attribute augmentation units to explicitly integrate static and dynamic external factors, thereby improving traffic prediction accuracy and enhancing interpretability for sudden traffic fluctuations [8].
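
As a point of reference for the simplest baseline in the list above, the Historical Average idea can be sketched in a few lines. The function name, the period of four samples, and the toy series are assumptions for illustration; with 15 min sampling a daily period would correspond to 96 samples.

```python
def historical_average(series, period, target_index):
    """Average all earlier observations that share the target's slot in the period."""
    same_slot = series[target_index % period:target_index:period]
    if not same_slot:
        raise ValueError("no earlier observation falls in the same slot")
    return sum(same_slot) / len(same_slot)


# Toy series covering two periods of four samples each.
speeds = [30.0, 50.0, 40.0, 20.0,
          32.0, 48.0, 42.0, 22.0]
forecast = historical_average(speeds, period=4, target_index=9)
```

Index 9 falls in slot 1 of the period, so the forecast averages the observations at indices 1 and 5.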

4.4. Baseline Comparison Experiments

Based on the aforementioned experimental settings and baseline models, we conducted comparative experiments to evaluate the effectiveness of the proposed model. All experiments were performed on the SZ_taxi and METR_LA datasets, and the results for all models are presented in Table 2 and Table 3. As shown in the tables, our MIFA-ST-MGCN model demonstrates superior performance compared with other models, as evidenced by its leading results in the majority of evaluation metrics. Compared with the traditional HA and SVR models, the RMSE of our model decreases by approximately 4.10% and 2.74%, while the prediction accuracy improves by 1.70% and 1.11%, respectively. Relative to LSTM and GRU, which only capture temporal dependencies, MIFA-ST-MGCN achieves reductions of 17.54% and 41.24% in RMSE, along with improvements of 9.17% and 38.43% in prediction accuracy. Furthermore, when compared with TGCN, STGCN, A3T-GCN, and AST-GCN, which incorporate spatio-temporal modeling, graph convolution, attention mechanisms, and attribute augmentation, respectively, our model achieves RMSE reductions of 13.17%, 54.91%, 14.20%, and 17.43%, as well as accuracy improvements of 6.39%, 86.13%, 7.01%, and 9.09%. These results validate the effectiveness of the proposed model.

From Table 2 and Table 3, our model outperforms competing methods on both the SZ_taxi and METR_LA datasets. On SZ_taxi, because the margin over the LSTM baseline is relatively small, we conducted repeated trials comparing our model with LSTM. The results show that our model’s RMSE is consistently lower by 0.0059–0.0957 across runs, corroborating the reliability of the observed performance gains and validating the effectiveness of our improvements.

4.5. Comparison of Prediction Performance for Different Time Periods

To comprehensively evaluate the performance of the model under different prediction horizons, we conducted experiments on both the proposed model and the baseline models with forecasting intervals of 15, 30, 45, and 60 min. The results are presented in Figure 7. The experiments show that our model consistently outperforms the baselines across all horizons. Although performance declines as the prediction horizon increases, our model still achieves better forecasting results than the baseline methods and exhibits a considerably smaller performance drop compared to the other models.
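
To make the horizon settings concrete, the sketch below converts the four forecasting intervals into steps at the 15 min sampling interval of SZ_taxi and builds (history, target) pairs from one segment's series. The function name `make_windows` is hypothetical, the history length of 10 steps corresponds to the 2.5 h used in Section 4.8, and predicting a single value at the horizon (rather than the whole sequence up to it) is an assumption.

```python
def make_windows(series, history_steps, horizon_steps):
    """Split one segment's series into (history, target) pairs.

    The target is the single value `horizon_steps` after the end of each
    history window.
    """
    pairs = []
    last_start = len(series) - history_steps - horizon_steps
    for start in range(last_start + 1):
        history = series[start:start + history_steps]
        target = series[start + history_steps + horizon_steps - 1]
        pairs.append((history, target))
    return pairs


INTERVAL_MIN = 15                                            # SZ_taxi sampling interval
horizons = {m: m // INTERVAL_MIN for m in (15, 30, 45, 60)}  # minutes -> steps
series = list(range(20))                                     # placeholder values 0..19
pairs = make_windows(series, history_steps=10, horizon_steps=horizons[60])
```

Longer horizons leave fewer usable windows and place the target further from the last observed value, which is consistent with the decline in accuracy reported for all models as the horizon grows.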

4.6. Ablation Experiments

To evaluate the effectiveness of the key components in the proposed forecasting model, we designed a series of ablation experiments. These experiments can be broadly categorized into two types. The first type is based on TGCN, where POI data and weather data are incorporated to verify the effectiveness of attribute data fusion. The second type is based on our MIFA-ST-MGCN model, where the attribute data fusion module, multi-graph convolution module, and spatio-temporal attention module are individually removed to examine their contributions. The results of the ablation experiments are summarized in Table 4.

4.7. Perturbation Analysis

Given the inherent complexity and uncertainty of real-world scenarios, traffic flow data inevitably contain noise, missing values, and outliers. To evaluate the stability and reliability of the proposed model under such conditions, we conducted a perturbation analysis by injecting Gaussian noise and Poisson noise into the original data. The Gaussian noise follows a normal distribution

N (0, σ^{2})

with

σ \in \{0.2, 0.4, 0.6, 0.8, 1.0\}

, while the Poisson noise follows a distribution

P (λ)

with

λ \in \{1, 2, 4, 8, 16\}

. The experimental results, shown in Figure 8, indicate that the evaluation metrics remain relatively stable across different noise types and intensities. These findings demonstrate that our model maintains strong adaptability in complex real-world environments.
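
The noise injection can be sketched with the Python standard library. The sketch assumes that noise is added element-wise to the input values, which the text does not state explicitly; the function names, the seed, and the placeholder data are likewise hypothetical. Poisson samples are drawn with Knuth's multiplication method because the standard library has no Poisson generator.

```python
import math
import random


def add_gaussian_noise(values, sigma, rng):
    """Add independent N(0, sigma^2) noise to each value."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def add_poisson_noise(values, lam, rng):
    """Add an independent Poisson(lam) count to each value."""
    return [v + sample_poisson(lam, rng) for v in values]


rng = random.Random(42)
clean = [0.5] * 2000                      # placeholder input values
gaussian = add_gaussian_noise(clean, sigma=0.2, rng=rng)
poisson = add_poisson_noise(clean, lam=4, rng=rng)
```

Gaussian noise is zero-mean, whereas Poisson noise shifts the data upward by λ on average, so the two perturbations probe different kinds of corruption.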

4.8. Visualization of Predictions

To more clearly demonstrate the predictive capability of our model, we conducted a visualization analysis by comparing the predicted results of different models with the ground truth speeds in the test set. Furthermore, we provided a more in-depth interpretation and evaluation of the models from two perspectives:

(1): Predictions of Different Models

Based on 2.5 h of historical traffic flow data, we employed different models to predict traffic speed for the following hour. The prediction results are presented in Figure 9. As illustrated, our model produces predictions that are closer to the ground truth than those of other models, both in terms of overall traffic flow trends and peak prediction accuracy.

(2): Effectiveness of Different Modules

To further investigate the importance of different modules, we visualized the results of the ablation experiments. Figure 10, Figure 11 and Figure 12 present a comparison between the baseline TGCN model and the models incorporating external information fusion, spatiotemporal attention, and multi-graph convolution. Figure 10 illustrates the prediction results of the model enhanced with POI and weather data. Compared with TGCN, the inclusion of POI and weather information improves the model’s perception capability and prediction accuracy. For instance, on 5 January 2015, when heavy rainfall occurred, the model incorporating weather data produced predictions that were much closer to the ground truth, thereby demonstrating the effectiveness of external information fusion. Figure 11 and Figure 12 show the prediction results of the models with spatiotemporal attention and multi-graph convolution, respectively. Compared with TGCN, the model with spatiotemporal attention exhibits superior performance in capturing overall traffic flow trends, while the model with multi-graph convolution achieves better accuracy in peak prediction.

Results show that our model adheres more closely to the ground-truth curves during sharp peak periods such as the morning and evening rush hours. This advantage arises from two complementary mechanisms. First, the spatial branch employs multi-graph convolution with Softmax-normalized, learnable fusion weights, enabling the characterization of corridor-like inter-segment couplings and functional similarities—particularly effective in areas with simultaneous demand surges (e.g., commercial districts and transportation hubs). Second, the temporal branch incorporates multi-head self-attention to reweight the most predictive lags during the phases of queue formation and dissipation, thereby improving relative alignment and amplitude depiction around peaks. Moreover, by incorporating weather and POI information, the model can identify traffic-flow variations induced by meteorological conditions or POI types, allowing rapid adaptation to changing exogenous factors and enhancing both the realism and interpretability of the forecasts.

5. Conclusions

We propose a spatio-temporal multi-graph convolution traffic flow prediction model, termed MIFA-ST-MGCN, which integrates multi-source information fusion and attention enhancement to improve prediction accuracy and robustness. By combining Temporal Graph Convolutional Networks (TGCN), multi-source information fusion, and spatio-temporal attention mechanisms, we design a predictive framework capable of effectively capturing spatio-temporal dependencies in traffic flow while accounting for external environmental factors. Comparative experimental results demonstrate that MIFA-ST-MGCN outperforms existing baseline models across multiple evaluation metrics, validating its effectiveness in complex traffic scenarios. Ablation studies further confirm the contributions of feature fusion, multi-graph convolution, and spatio-temporal attention, while perturbation experiments show that the model maintains strong adaptability under noisy and uncertain conditions.

Despite the strong predictive performance demonstrated by the proposed model, several limitations remain. First, the model’s scalability is constrained by the computational complexity inherent in its key components. The spatio-temporal multi-head self-attention mechanism, while effective, incurs a computational cost of

O (T \times N^{2} \times F)

for spatial attention and

O (N \times T^{2} \times F)

for temporal attention, where

N

is the number of nodes,

T

is the sequence length, and

F

is the feature dimension. This quadratic dependency on

N

and

T

can become a bottleneck when applied to large-scale metropolitan networks with thousands of nodes. Future work will explore more efficient attention mechanisms, such as linearized or sparse attention, to mitigate this issue. Second, the model’s performance is sensitive to certain hyperparameters, such as the number of graph convolution layers, the number of attention heads, and the dimensionality of GRU hidden states. Although we conducted extensive parameter tuning for this study, a comprehensive sensitivity analysis was not included. The depth of the multi-graph convolution module, in particular, requires careful design; preliminary ablation experiments (varying layers from 1 to 3) indicated that while a 2-layer structure offered the best trade-off, deeper architectures risk over-smoothing and increased computational overhead without commensurate gains in accuracy. Third, the model’s reliance on multi-source data (e.g., POI, weather), while beneficial for accuracy, introduces practical deployment challenges in data-scarce environments. The requirement for high-quality, synchronized external data may limit the model’s applicability in regions where such data are incomplete or unavailable. Future iterations could investigate semi-supervised or self-supervised learning strategies to reduce dependency on extensive labeled and auxiliary data. Lastly, the current framework assumes a static graph topology, which restricts its ability to adapt to dynamic network changes caused by traffic incidents or temporary road closures. Integrating dynamic graph construction techniques or temporal graph networks could enhance the model’s responsiveness to real-time structural variations.
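
The quadratic scaling noted in the first limitation can be illustrated with a short calculation of the two dominant terms. The node count of 156 matches SZ_taxi, while the sequence length of 10 and the feature dimension of 64 are illustrative assumptions; the function name is hypothetical and constant factors are ignored.

```python
def attention_cost(num_nodes, seq_len, feat_dim):
    """Return the dominant operation counts T*N^2*F (spatial) and N*T^2*F (temporal)."""
    spatial = seq_len * num_nodes ** 2 * feat_dim
    temporal = num_nodes * seq_len ** 2 * feat_dim
    return spatial, temporal


small = attention_cost(num_nodes=156, seq_len=10, feat_dim=64)
large = attention_cost(num_nodes=1560, seq_len=10, feat_dim=64)
spatial_growth = large[0] / small[0]     # ten times more nodes, 100 times the cost
temporal_growth = large[1] / small[1]    # temporal attention grows only linearly in N
```

For a fixed input window, the spatial term is therefore the one that limits deployment on networks with thousands of segments.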

In summary, the proposed MIFA-ST-MGCN model provides an effective solution to the traffic flow forecasting problem, carrying both theoretical significance and practical value. With the increasing complexity of urban transportation systems and the growing volume of data, the model can be further optimized and extended in future work to better adapt to more dynamic and complex traffic environments.

Author Contributions

Conceptualization, W.L. and Z.S.; methodology, W.L.; software, Y.W.; validation, W.L., Z.S. and Y.W.; formal analysis, Z.S.; investigation, Z.S.; resources, W.L.; data curation, Z.S.; writing—original draft preparation, Z.S.; writing—review and editing, W.L.; visualization, Z.S.; supervision, W.L.; project administration, Z.S.; funding acquisition, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Wuhan Key Research and Development Program, project number 2024050702030122.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Dickerson, A.; Peirson, J.; Vickerman, R. Road accidents and traffic flows: An econometric investigation. Economica 2000, 67, 101–121.
2. Retallack, A.E.; Ostendorf, B. Relationship between traffic volume and accident frequency at intersections. Int. J. Environ. Res. Public Health 2020, 17, 1393.
3. Leich, A.; Nippold, R.; Schadschneider, A.; Wagner, P. Physical models of traffic safety at crossing streams. Phys. A Stat. Mech. Its Appl. 2024, 640, 129669.
4. Wang, X.; Zhuang, X.; Gao, Z.; Dong, Y. A Survey on Highway Traffic Flow Prediction: Methods and Advances. Comput. Technol. Dev. 2025, 35, 1–9.
5. Wang, Q.; Wu, Y.; Zhu, C.; Wang, Y. Short-term traffic flow prediction studies integrated with external properties. Appl. Res. Comput. 2022, 39, 2974–2978.
6. Zhang, Y.; Zhou, C.; Chen, Y. Highway Traffic Flow Prediction Based on a Feature Fused Spatio-Temporal Graph Mixed Networks. J. Dalian Jiaotong Univ. 2025, 46, 15–25.
7. Lei, B.; Li, J.; Zhang, P.; Li, W.; Chen, C. Long Term Prediction on Urban Traffic Flow Based on Multi-source Spatio-temporal Graph Convolutional Neural Network Model. J. Highw. Transp. Res. Dev. 2024, 41, 204–213.
8. Zhu, J.; Wang, Q.; Tao, C.; Deng, H.; Zhao, L.; Li, H. AST-GCN: Attribute-Augmented Spatiotemporal Graph Convolutional Network for Traffic Forecasting. IEEE Access 2021, 9, 35973–35983.
9. Zong, X.; Yan, H.; Qi, Y. Recent Advances in Multi-source Data Fusion for Traffic Flow Prediction: A Review. Arch. Comput. Methods Eng. 2025; Prepublish.
10. Sun, Y.; Zhang, G.; Yin, H. Passenger Flow Prediction of Subway Transfer Stations Based on Nonparametric Regression Model. Discret. Dyn. Nat. Soc. 2014, 2014, 1–8.
11. Klepsch, J.; Klüppelberg, C.; Wei, T. Prediction of functional ARMA processes with an application to traffic data. Econ. Stat. 2017, 1, 128–149.
12. Kumar, P.B.; Hariharan, K. Time series traffic flow prediction with hyper-parameter optimized ARIMA models for intelligent transportation system. J. Sci. Ind. Res. 2022, 81, 408–415.
13. Balawi, M.; Tenekeci, G. Time series traffic collision analysis of London hotspots: Patterns, predictions and prevention strategies. Heliyon 2024, 10, e25710.
14. Huang, Z.; Lin, P.; Lin, X.; Zhou, C.; Huang, T. Spatiotemporal attention mechanism-based multistep traffic volume prediction model for highway toll stations. Arch. Transp. 2022, 61, 21–38.
15. Feng, X.; Ling, X.; Zheng, H.; Chen, Z.; Xu, Y. Adaptive multi-kernel SVM with spatial–temporal correlation for short-term traffic flow prediction. IEEE Trans. Intell. Transp. Syst. 2018, 20, 2001–2013.
16. Xia, J.; Wang, S.; Wang, X.; Xia, M.; Xie, K.; Cao, J. Multi-view Bayesian spatio-temporal graph neural networks for reliable traffic flow prediction. Int. J. Mach. Learn. Cybern. 2024, 15, 65–78.
17. Zhu, H.; Xie, Y.; He, W.; Sun, C.; Zhu, K.; Zhou, G.; Ma, N. A novel traffic flow forecasting method based on RNN-GCN and BRB. J. Adv. Transp. 2020, 2020, 7586154.
18. Chaoura, C.; Lazar, H.; Jarir, Z. Traffic Flow Prediction at Intersections: Enhancing with a Hybrid LSTM-PSO Approach. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 494–501.
19. Zhang, W.; Li, X.; Li, A.; Huang, X.; Wang, T.; Gao, H. Sgru: A high-performance structured gated recurrent unit for traffic flow prediction. In Proceedings of the 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS), Ocean Flower Island, China, 17–21 December 2023; IEEE: Dallas, TX, USA, 2023; pp. 467–473.
20. Ma, X.; Dai, Z.; He, Z.; Ma, J.; Wang, Y.; Wang, Y. Learning traffic as images: A deep convolutional neural network for large-scale transportation network speed prediction. Sensors 2017, 17, 818.
21. Yu, B.; Yin, H.; Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv 2017, arXiv:1709.04875.
22. Li, W.; Sui, L.; Zhou, M.; Dong, H. Short-term passenger flow forecast for urban rail transit based on multi-source data. EURASIP J. Wirel. Commun. Netw. 2021, 2021, 9.
23. Xu, Z.; Yuan, J.; Yu, L.; Wang, G.; Zhu, M. Machine learning-based traffic flow prediction and intelligent traffic management. Int. J. Comput. Sci. Inf. Technol. 2024, 2, 18–27.
24. Wang, K.; Liu, L.; Liu, Y.; Li, G.; Zhou, F.; Lin, L. Urban regional function guided traffic flow prediction. Inf. Sci. 2023, 634, 308–320.
25. Du, M.; Yang, L.; Tu, J. A novel approach to calculate the spatial–temporal correlation for traffic flow based on the structure of urban road networks and traffic dynamic theory. Sensors 2021, 21, 4725.
26. Chen, M.; Yuan, H.; Jiang, N.; Bao, Z.; Wang, S. Urban traffic accident risk prediction revisited: Regionality, proximity, similarity and sparsity. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 21–25 October 2024; pp. 281–290.
27. Chu, L.; Hou, Z.; Jiang, J.; Yang, J.; Zhang, Y. Spatial-temporal feature extraction and evaluation network for citywide traffic condition prediction. IEEE Trans. Intell. Veh. 2023, 9, 5377–5391.
28. Lee, E.H.; Lee, E. Congestion boundary approach for phase transitions in traffic flow. Transp. B Transp. Dyn. 2024, 12, 2379377.
29. Laval, J.A. Traffic Flow as a Simple Fluid: Toward a Scaling Theory of Urban Congestion. Transp. Res. Rec. 2024, 2678, 376–386.
30. Lee, E.H.; Kho, S.-Y.; Kim, D.-K.; Cho, S.-H. Travel time prediction using gated recurrent unit and spatio-temporal algorithm. Proc. Inst. Civ. Eng. Munic. Eng. 2021, 174, 88–96.
31. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3848–3858.
32. Sui, D.Z. Tobler’s first law of geography: A big idea for a small world? Ann. Assoc. Am. Geogr. 2004, 94, 269–277.
33. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv 2017, arXiv:1707.01926.
34. Li, Y.; Xu, W. Short-term traffic flow forecasting based on SVR. In Proceedings of the 2018 3rd International Conference on Automation, Mechanical Control and Computational Engineering (AMCCE 2018), Dalian, China, 12–13 May 2018; Atlantis Press: Amsterdam, The Netherlands, 2018; pp. 57–61.
35. Shao, H.; Soong, B.H. Traffic flow prediction with long short-term memory networks (LSTMs). In Proceedings of the 2016 IEEE Region 10 Conference (TENCON), Singapore, 22–25 November 2016; IEEE: Dallas, TX, USA, 2016; pp. 2986–2989.
36. Zhang, D.; Kabuka, M.R. Combining weather condition data to predict traffic flow: A GRU-based deep learning approach. IET Intell. Transp. Syst. 2018, 12, 578–585.
37. Bai, J.; Zhu, J.; Song, Y.; Zhao, L.; Hou, Z.; Du, R.; Li, H. A3t-gcn: Attention temporal graph convolutional network for traffic forecasting. ISPRS Int. J. Geo-Inf. 2021, 10, 485.

Figure 1. Overall Framework.

Figure 2. Graph Convolutional Networks.

Figure 3. Gated Recurrent Units.

Figure 4. Multi-Graph Convolutional Networks.

Figure 5. Spatio-temporal attention module & multi-head self-attention.

Figure 6. MIFA-ST-MGCN at time t.

Figure 7. Performance of competing models across different forecasting horizons: (a) RMSE (km/h; lower is better), (b) MAE (km/h; lower is better), (c) Accuracy (higher is better; see Section 4.2 for definition), and (d) R² (higher is better). The x-axis lists horizons of 15, 30, 45, and 60 min. Results are reported on the SZ_taxi dataset (Jan-2015). Model names match Table 2 and Table 3 (HA, SVR, LSTM, GRU, TGCN, STGCN, A3T-GCN, AST-GCN, Ours).

Figure 8. Results of noise experiments. Sub-figures illustrate (a) Gaussian Noise, (b) Poisson Noise.

Figure 9. Visualization of predictions. Given 2.5 h of history to forecast the following 1 h, we plot the ground truth and predictions from all models (TGCN, GRU, LSTM, STGCN, A3T-GCN, AST-GCN, Ours). The y-axis reports speed in km/h.

Figure 10. Prediction results of the baseline model versus the model enhanced with external information. We plot the ground truth, TGCN (baseline), and TGCN + POI + Weather in the same time window (Speed in km/h). On 5 January 2015 with heavy rainfall, the externally enhanced model aligns more closely with the ground truth, evidencing the benefit of information fusion.

Figure 11. Prediction results of the baseline model versus the spatio-temporal attention-enhanced model. Compared with TGCN (baseline), the attention-enhanced model better captures overall traffic-flow trends in the same time window (Speed in km/h).

Figure 12. Prediction results of the baseline model versus the multi-graph convolution-enhanced model. Relative to TGCN (baseline), TGCN + MGCN reproduces traffic peaks more accurately. Curves are shown for the same time window (Speed in km/h).

Table 1. Comparison of Model Characteristics.

Feature	AST-GCN	ST-GRAT	MIFA-ST-MGCN
Data fusion mechanism	Static/dynamic attribute concatenation	No external information	Feature concatenation + multi-graph convolution
Graph structure	Single topology	Directed topological graph	Multiple graph structures
Attention mechanism	No attention mechanism	Self-attention mechanism	Spatio-temporal attention mechanism

Table 2. Results of comparative experiments (SZ-taxi dataset).

Model	RMSE	MAE	Accuracy	R²	Var
HA	4.2740	2.7985	0.7021	0.8325	0.8325
SVR	4.1966	2.7777	0.7075	0.8385	0.8400
LSTM	4.0665	2.7257	0.7166	0.8484	0.8486
GRU	4.2540	2.9616	0.7035	0.8341	0.8347
TGCN	4.3243	3.0398	0.6986	0.8287	0.8302
STGCN	8.3224	6.6436	0.4201	0.3652	0.3652
A3T-GCN	4.1070	2.7913	0.7138	0.8454	0.8456
AST-GCN	4.3240	2.9625	0.6987	0.8286	0.8286
Ours	4.0600	2.6951	0.7171	0.8489	0.8491

The best value for each indicator is in the Ours row.

Table 3. Results of comparative experiments (METR-LA dataset).

Model	RMSE	MAE	Accuracy	R²	Var
HA	7.4271	4.4556	0.8506	0.7442	0.7442
SVR	9.6734	6.1117	0.8397	0.6318	0.6326
LSTM	7.8820	5.5922	0.8658	0.6759	0.6877
GRU	8.4899	5.7926	0.8554	0.6240	0.6241
TGCN	6.8520	4.2676	0.8834	0.7551	0.7555
STGCN	10.3812	6.2751	0.7931	0.6536	0.6544
A3T-GCN	7.0729	4.5617	0.8555	0.7312	0.7317
AST-GCN	6.9356	4.8788	0.8663	0.7762	0.7762
Ours	6.2740	4.2419	0.8932	0.7946	0.7952

The best value for each indicator is in the Ours row.
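For readers who want to reproduce the five indicators reported in Table 2 and Table 3, the following is a minimal sketch in plain Python. The exact definitions are given in Section 4.2, which is not reproduced here, so the formulas below are an assumption: they follow the convention common in the T-GCN family of papers, where Accuracy is one minus the ratio of the error norm to the ground-truth norm and Var is the explained variance score. The function name `evaluate` and the toy values are hypothetical.

```python
import math


def evaluate(y_true, y_pred):
    """Compute the five indicators for two equal-length sequences of speeds."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sq_err = sum(e * e for e in errors)  # sum of squared errors
    mean_true = sum(y_true) / n
    mean_err = sum(errors) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    rmse = math.sqrt(sq_err / n)
    mae = sum(abs(e) for e in errors) / n
    # Assumed definition: 1 - ||y - y_hat|| / ||y|| (Euclidean norm).
    accuracy = 1.0 - math.sqrt(sq_err) / math.sqrt(sum(t * t for t in y_true))
    r2 = 1.0 - sq_err / ss_tot
    # Assumed definition (explained variance): 1 - Var(y - y_hat) / Var(y).
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    explained_var = 1.0 - var_err / (ss_tot / n)
    return {"RMSE": rmse, "MAE": mae, "Accuracy": accuracy,
            "R2": r2, "Var": explained_var}


# Toy data, made up for illustration (not taken from either dataset).
metrics = evaluate([30.0, 40.0, 50.0], [32.0, 38.0, 50.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed definitions, Var exceeds R² by the squared mean error divided by the variance of the ground truth, so Var can never fall below R² and the two coincide when the predictions are unbiased. Every row of Table 2 and Table 3 is consistent with that relationship.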

Table 4. Results of ablation experiments. Y denotes the added data or module.

Modules					Evaluation Indicators
POI	Weather	POI + Weather	ST-Attention	MGCN	RMSE	MAE	Acc	R²
					4.3243	3.0398	0.6986	0.8287
Y					4.2794	2.9656	0.6893	0.8076
	Y				4.2712	2.9702	0.6911	0.8089
		Y			4.2679	2.9467	0.6978	0.8107
		Y		Y	4.2082	2.8621	0.7123	0.8113
		Y	Y		4.0988	2.7776	0.7145	0.8464
			Y	Y	4.1070	2.7913	0.7138	0.8456
		Y	Y	Y	4.0600	2.6951	0.7171	0.8489
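To make the ablation results easier to compare, the short sketch below converts the MAE column of Table 4 into percentage changes relative to the first row (no added data or module, whose values coincide with the TGCN row of Table 2). The MAE values are copied from Table 4; the row labels and the helper `relative_change` are hypothetical shorthand introduced only for this illustration.

```python
def relative_change(baseline, value):
    """Percentage change of value relative to baseline (negative means a reduction)."""
    return (value - baseline) / baseline * 100.0


# MAE values copied from Table 4; the labels are shorthand for the rows.
ablation_mae = {
    "baseline (no modules)": 3.0398,
    "POI": 2.9656,
    "Weather": 2.9702,
    "POI + Weather": 2.9467,
    "POI + Weather, MGCN": 2.8621,
    "POI + Weather, ST-Attention": 2.7776,
    "ST-Attention, MGCN": 2.7913,
    "POI + Weather, ST-Attention, MGCN": 2.6951,
}

baseline_mae = ablation_mae["baseline (no modules)"]
for label, mae in ablation_mae.items():
    print(f"{label:36s} MAE={mae:.4f}  change={relative_change(baseline_mae, mae):+.2f}%")
```

Read this way, the full configuration lowers MAE by roughly 11.3% relative to the first row, and each partial configuration falls between the two.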

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

