Improved Heterogeneous Spatiotemporal Graph Network Model for Traffic Flow Prediction at Highway Toll Stations

Zhang, Yaofang; Chen, Jian; Chen, Fafu; Gao, Jianjie

doi:10.3390/su17177905


by Yaofang Zhang ¹, Jian Chen ¹, Fafu Chen ² and Jianjie Gao ³,*

¹ College of Traffic and Transportation, Chongqing Jiaotong University, Chongqing 400074, China

² College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China

³ Sichuan Provincial Key Laboratory of Intelligent Policing, Sichuan Police College, Luzhou 646000, China

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(17), 7905; https://doi.org/10.3390/su17177905

Submission received: 8 July 2025 / Revised: 28 August 2025 / Accepted: 30 August 2025 / Published: 2 September 2025

(This article belongs to the Special Issue Advances in Intelligent Transportation, Smart Grids and Electric Vehicles in the Context of Sustainability)


Abstract

This study aims to make highway management and services more efficient and intelligent, and to provide intelligent, green data support for achieving sustainable development goals. Forecasting traffic flow at highway stations is the cornerstone of spatiotemporal analysis and is vital for effective highway management and control. Despite considerable advances in data-driven traffic flow prediction, most existing models do not distinguish between traffic directions, even though entrance and exit flows serve different purposes: entrance flow prediction supports dynamic route guidance, the dissemination of real-time traffic conditions, and optimal entrance selection suggestions, whereas exit flow prediction supports congestion and accident alerts as well as road network optimization decisions. In light of these needs, this study introduces an improved heterogeneous spatiotemporal graph network model for predicting highway station traffic flow. To capture the dynamic impact of upstream toll stations on the target station's flow, we construct an influence probability matrix. Combined with the covariance matrix across toll stations, the updated graph structure data, and external weather conditions, this matrix allows the attention mechanism to assign different combination weights to the target toll station from temporal, spatial, and external perspectives, thereby improving prediction accuracy. We conducted a case study using traffic flow data from the Chengdu-Chengyu station on the Sichuan Highway to evaluate the proposed model. The experimental results indicate that our model outperforms the baseline models on the performance metrics. This study provides valuable insights for highway management and control, as well as for reducing traffic congestion.
Furthermore, this research highlights the importance of using data-driven approaches to reduce carbon emissions associated with transportation, enhance resource allocation at toll plazas, and promote sustainable highway transportation systems.

Keywords:

sustainable transportation; intelligent transportation; highway; traffic prediction; spatiotemporal characteristics; heterogeneous graph network

1. Introduction

With the continuous expansion of highway networks and the rapid growth of traffic flows, traditional traffic management models are increasingly struggling to cope with the escalating complexity of traffic pressures, such as recurrent holiday congestion and section paralysis caused by accidents [1]. In recent years, the in-depth application of ITS (intelligent transportation systems) in the highway sector has brought about a revolutionary transformation in traffic management models and service patterns [2]. At the core of ITS lies highway traffic flow prediction, which relies on massive volumes of historical traffic data collected by devices such as ETC (electronic toll collection) gates, microwave radars, and on-road video surveillance to accurately forecast key parameters such as future road traffic flow, speed, and congestion index [3]. This technology has thus become a cornerstone of intelligent highway operation. Accurate highway traffic flow prediction not only serves as the “eye” for perceiving the operational status of the road network but also acts as the “brain center” for intelligent control. By monitoring real-time traffic flow trends across various sections, traffic management departments can dynamically optimize variable speed limit signs and information board release strategies, and proactively implement measures such as tidal lane switching and emergency lane control, thereby significantly improving the overall operational efficiency of the road network [4]. Meanwhile, differentiated toll policies formulated from prediction data can effectively guide vehicles to travel during off-peak periods, reducing unnecessary idling and stop-and-go driving, which in turn lowers energy consumption and exhaust emissions, a crucial contribution to promoting green and low-carbon transportation.
For travelers, real-time road condition navigation and travel planning suggestions derived from prediction results help them reasonably arrange travel times and avoid congested sections, reducing travel frustration and energy waste caused by detours. For highway operating companies, accurate flow prediction provides a scientific basis for formulating maintenance plans, reserving emergency materials, and scheduling toll station personnel, enabling more efficient resource allocation and reducing operational costs, which aligns with the concept of sustainable development in resource utilization [5]. In essence, the advancement of highway traffic flow prediction technology under ITS not only enhances traffic management efficiency but also plays a vital role in promoting environmental sustainability, economic efficiency, and social well-being in the transportation sector.

Despite its significance, accurately forecasting traffic flow is challenging because of intricate temporal, spatial, and semantic dependencies [6]. Although notable progress has been made in data-driven traffic flow prediction, such as using LSTM (long short-term memory) networks to model traffic time series, these models often fall short in extracting the spatial features of highway traffic flow [7]. Alternative approaches, such as ConvLSTM, replace the fully connected layers of LSTM with convolutional layers to extract spatiotemporal features concurrently. However, highway network topologies, unlike typical grid data, are non-Euclidean, which limits the applicability of such methods [8]. Graph neural networks address this by handling complex spatial relationships and extracting spatial features between traffic nodes. Incorporating a graph attention mechanism allows different weights to be assigned across space and time, effectively integrating spatiotemporal features for prediction [9]. Heterogeneous neural network fusion techniques, such as those combining an encoder–decoder structure with a random walk to model the spatiotemporal features of node flows in a traffic road network, have also gained traction. These methods merge GCNs (graph convolutional networks) and RNNs (recurrent neural networks) for enhanced traffic flow prediction [10]. However, most data-driven models do not differentiate between directions, even though traffic flow on highways can vary by direction. Making direction-specific predictions would provide a more scientifically sound basis for future sustainable intelligent transportation applications. For instance, entrance flow prediction plays a crucial role in dynamic route guidance, disseminating real-time traffic conditions, and providing optimal entrance selection suggestions.
On the other hand, exit flow prediction can alert about congestion and potential accidents, thereby facilitating informed decision-making for road network optimization.

In this study, we introduce a traffic flow prediction model for highways based on a spatiotemporal attention mechanism. To dynamically assess the influence of upstream toll stations on the traffic flow at the target station, we construct an influence probability matrix. Combined with the covariance matrix between toll stations, this matrix is used to optimize the graph structure data. Taking external weather conditions into account, we propose a deep learning model that integrates the spatiotemporal attention mechanism with an encoder–decoder architecture. Through this attention mechanism, the model assigns different combination weights to the target toll station along the time series, spatial, and external dimensions, thereby enhancing prediction accuracy. As a case study, we use traffic flow data from the Chengdu-Chongqing station on the Sichuan Highway to validate the proposed model. Experimental results indicate that our model outperforms other baseline models on key performance indicators. The study is expected to provide scientific references for optimizing highway network design and allocating resources rationally, and to contribute to sustainable development solutions.

The main contributions of this study are outlined as follows:

(1): For the prediction of traffic flow at highway toll stations, an improved heterogeneous spatiotemporal graph network model is designed. This innovatively considers the contribution of upstream toll stations to the traffic flow at the target station, optimizing graph structure data by constructing a dynamic influence matrix and combining it with the covariance matrix between toll stations.
(2): A deep learning model that integrates spatiotemporal attention mechanisms with encoder–decoder is proposed, incorporating external weather factors. With the help of the attention mechanism, dynamic weights are allocated to the target toll stations from multiple dimensions such as time series, spatial correlations, and external factors.
(3): The performance of the proposed framework is verified based on actual traffic flow data. Experimental results show that this framework has significant advantages compared to five benchmark methods.

2. Related Studies

Traffic flow prediction uncovers hidden patterns and rules by examining deterministic patterns within extensive historical datasets and learning indeterminate factors. This information is then converted into useful knowledge for predictive modeling. By establishing a functional mapping between inputs and outputs, the resulting model can generate precise estimates and predictions from known input data. Over the past few decades, numerous scholars have proposed a variety of traffic flow forecasting methods, which can be grouped into three categories: parametric methods, non-parametric methods, and hybrid methods [11]. Among parametric methods, time series models are the most widely used [12]; typical examples include HA (historical average) [13] and ARIMA (autoregressive integrated moving average) [14]. Because traffic flow is strongly non-linear and random, non-parametric models have also been applied, such as KNN (K-nearest neighbor) [15], SVR (support vector regression) [16], Gaussian processes [17], and Bayesian methods [18]. However, non-parametric models are usually more complex, and their algorithms converge relatively slowly. Their learning process largely relies on the principle of empirical risk minimization, which does not guarantee minimization of the expected risk, so non-parametric models are more prone to overfitting. In addition, because of the large number of parameters to optimize and the heavy computational load, they easily become trapped in local optima, making them poorly suited to real-time computation in traffic systems that require rapid responses. When handling high-dimensional datasets, conventional methods frequently face the “curse of dimensionality”.
This problem becomes increasingly evident as the number of features escalates, leading to heightened data sparsity, which in turn amplifies the prediction challenge [19]. However, the integration of machine learning [20] and deep learning [21] in traffic flow prediction has significantly enhanced modeling accuracy for real-world traffic volume forecasting problems [22]. Deep learning utilizes a sophisticated multi-layered approach to representation learning, whereby raw data are transformed through simple, non-linear models into more abstract, higher-level expressions. This process enables the achievement of complex and highly flexible function approximation [23]. The deep learning methodologies that are frequently employed encompass DBN (deep belief net) [24], CNN (convolutional neural network) [25], LSTM [26], GRU (gated recurrent unit) [27], and transformer [28], among others. In light of the spatiotemporal nuances in traffic flow, researchers have increasingly employed an amalgamation of diverse deep learning techniques for traffic flow prediction studies [29]. The constructed prediction model consists of two parts: a feature learner and a predictor. The former is responsible for extracting high-order spatiotemporal features from historical traffic flow sequences, while the latter is tasked with using the extracted high-order spatiotemporal features as inputs to forecast future traffic flow. The data types targeted for prediction are broadly classified into grid-based and topology-based categories. The former often represents the observation area as a grid with each cell corresponding to specific traffic flow characteristics. This method closely resembles the image data structure prevalent in computer vision, thereby simplifying subsequent convolution operations and similar processes. 
On the other hand, the latter relies on topological structures to organize traffic flow observation data, capitalizing on the inherent connection relationships of the road network to capture spatial correlations between different observation units. However, due to the non-Euclidean nature of topological data, graph convolutional neural networks are required instead of the standard convolutional neural networks. When forecasting future conditions, the basic input typically includes historical traffic flow sequences. Additional external factors such as weather and holidays may sometimes be incorporated. The prediction duration can vary from the immediate next segment (single-step prediction) to several segments in the future (multi-step prediction). Using deep neural networks as the core methodology for traffic flow prediction has become a research hotspot and mainstream, such as GCN [30], STGCN (spatial temporal graph convolutional network) [31], GAT (graph attention network) [32], DCGCN (dynamic conditional graph convolutional network) [33], ASTGCN (attention-based spatial-temporal graph convolutional networks) [34], transformer [35], etc. The typical methods of deep learning for traffic flow prediction are shown in Table 1.

The networked highway toll system records a large amount of sequential vehicle passage data, including entry, exit, and identification stations, travel time, vehicle type, and license plate. As existing research shows, a comprehensive analysis of highway travel behavior patterns requires substantial data support, and the inherent characteristics of these data, such as continuity and correlation, warrant further exploration. The potential spatiotemporal relationships in traffic flow are not yet adequately represented. It is therefore necessary to detect regularities and anomalies in data changes over time more quickly, to better understand the correlations arising from the spatial topology of the road network, and to improve fusion learning models for spatiotemporal features. Accordingly, this study uses ETC toll station data to explore the temporal, spatial, and external environmental correlations of highway traffic flow. The goal is to identify highway travel patterns in both temporal and spatial dimensions, thereby enhancing the accuracy and interpretability of traffic flow predictions at highway stations.

3. Preliminaries and Problem Definitions

Traffic flow prediction refers to forecasting future traffic conditions from historical data or external environmental factors, with the aim of predicting the state of traffic flow at a future time t. Prediction can be performed at various time granularities, such as intervals of 5, 10, 15, or 30 min, or 1 h, and the forecast step size (how many intervals ahead are predicted) must also be specified. Applying deep learning to traffic flow prediction typically involves arranging time series data in a specific sequence and feeding them into a neural network. Through iterative training, the model's internal weights and output features are adjusted so that its output accurately reflects the true distribution.

The transportation network is represented by a graph structure $G^{t} = (V, E, A, w)$, where the set $V = \{v_{1}, v_{2}, \dots, v_{n}\}$ represents the vertices of the graph, each vertex corresponding to a toll station $s_{i}$, and $n$ is the number of toll stations; $E = \{e_{1}, e_{2}, \dots, e_{n}\}$ represents the edges of the graph, i.e., the road segments between toll stations and their distances; $A$ is the adjacency matrix of the graph, in which $A_{ij} = 1$ if two stations are adjacent and $A_{ij} = 0$ otherwise; and $w$ is the weight coefficient matrix, in which $w^{t}_{i,j}$ represents the interaction weight between traffic entering at toll station $s_{i}$ and its destination station $s_{j}$ during time period $t$.

The traffic characteristics are represented by the matrix $C^{t} = (C_{T}, C_{s}, C_{E}) \in \mathbb{R}^{n \times m}$, which describes the characteristics of all toll stations during time period $t$, where $m$ is the total number of temporal, spatial, and external features.

The historical traffic flow is represented by the matrix $x_{i,t} = (In_{i,t}, Out_{i,t})$, which denotes the entrance and exit traffic of toll station $S_{i}$ during time period $t$. The matrix $Edge_{i,j,t}$ represents the interaction traffic that enters at toll station $S_{i}$ and ends at toll station $S_{j}$ during time period $t$. Therefore, the exit traffic of toll station $S_{j}$ can be expressed as:

Out_{j,t} = \sum_{i=1}^{C} Edge_{i,j,t}

(1)

where $C$ is the number of entry stations whose traffic has station $S_{j}$ as its destination.
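Equation (1) is a per-destination sum over origin–destination records. The following Python sketch shows that aggregation, assuming the cleaned toll records have already been counted into a dictionary keyed by (entry station, exit station, period); the helper name `exit_flow`, the station labels, and the counts are hypothetical toy values, not data from the study.

```python
from collections import defaultdict


def exit_flow(edges):
    """Aggregate interaction (OD) flows into exit flows following Eq. (1).

    `edges` maps (entry station i, exit station j, period t) -> Edge_{i,j,t}.
    Returns a dict mapping (j, t) -> Out_{j,t}.
    """
    out = defaultdict(int)
    for (_entry, exit_station, period), count in edges.items():
        # Sum over every entry station i whose traffic ends at exit_station
        out[(exit_station, period)] += count
    return dict(out)


# Hypothetical toy counts for one period (t = 0)
edges = {
    ("S1", "S3", 0): 120,
    ("S2", "S3", 0): 45,
    ("S4", "S3", 0): 35,
    ("S1", "S2", 0): 60,
}
print(exit_flow(edges))  # {('S3', 0): 200, ('S2', 0): 60}
```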

Given the historical traffic $X = (X_{t-k}, \dots, X_{t-1}, X_{t}) \in \mathbb{R}^{C \times T}$ of a target toll station $S_{j}$, where $T$ is the length of the time series, $x^{j} = (x^{j}_{t-k}, x^{j}_{t-k+1}, \dots, x^{j}_{t}) \in \mathbb{R}^{T}$ is the traffic flow matrix of toll station $S_{j}$ over the time steps from $t-k$ to $t$, and $x^{j}_{t}$ represents the entrance traffic, exit traffic, and associated interaction traffic of toll station $S_{j}$ at time $t$. Flow prediction at highway mainline toll stations aims to predict the traffic flow (i.e., entrance flow and exit flow) at the target toll station within $h$ time steps. The value of $h$ determines whether the prediction is short term or long term. The expression is as follows:

\begin{array}{l} \hat{x}_{t+h} = (In_{j,t+h}, Out_{j,t+h}) \\ \hat{x}_{t+h} = f(X_{t-k}, \dots, X_{t-1}, X_{t}) \end{array}

(2)
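To train $f$ in Equation (2), a station's series has to be cut into input windows and targets. A minimal Python sketch, assuming each element of `series` is one time step's observation (for example an (In, Out) pair) and that a window covers steps $t-k$ through $t$; the helper name `make_windows` is hypothetical.

```python
def make_windows(series, k, h):
    """Pair each window (x_{t-k}, ..., x_t) with the target x_{t+h}.

    series: time-ordered observations for one toll station
    k: number of past steps before t in each window (window length k + 1)
    h: prediction horizon in steps
    """
    samples = []
    for t in range(k, len(series) - h):
        window = series[t - k : t + 1]  # inputs X_{t-k}, ..., X_t
        target = series[t + h]          # value to predict at t + h
        samples.append((window, target))
    return samples


# Toy hourly (In, Out) pairs; values are illustrative only
hourly = [(10, 8), (12, 9), (15, 11), (14, 13), (9, 12)]
for window, target in make_windows(hourly, k=2, h=1):
    print(window, "->", target)
```

Setting `h` larger than 1 yields multi-step targets without changing the window construction.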

4. Model Construction

To accurately assess the dynamic influence of upstream toll stations on the traffic volume at target toll stations, we construct an impact probability matrix. This matrix is optimized using the covariance relationships between toll stations and takes external weather conditions into account. We introduce a novel heterogeneous recurrent neural network model, termed ST-ED-GAGGRU, which integrates a spatiotemporal attention mechanism. This attention mechanism assigns different combination weights to target toll stations across temporal, spatial, and external dimensions, thereby improving the precision of traffic predictions at highway mainline stations. The model employs an encoder–decoder architecture that accommodates flexible input and output sequence lengths. Both the encoder and the decoder consist of multiple identical stacked layers. Given historical traffic flow data from highway toll records, the encoder extracts temporal, spatial, and external features, and the decoder forecasts the traffic flow at highway toll stations. Attention mechanisms in both the encoder and the decoder strengthen the model's ability to capture spatiotemporal correlations. Consequently, this structure captures fine-grained patterns in traffic sequence data, handles different input and output sequence lengths, and generalizes well. The overall model architecture is depicted in Figure 1. The key symbols used in the model are listed in Table 2.

4.1. Encoder Structure

The encoder structure consists of L layers of identical encoder stacks, as shown in Figure 2. Each layer contains three fused modules: a GAT spatial feature extractor, a GRU temporal feature extractor, and an external feature extractor. The traffic information input $(X_{t-k}, \dots, X_{t-1}, X_{t})$, with $X_{t} = (G^{t}, C^{t}, x_{t})$, provided by each node to the encoder includes timestamp, location, traffic flow, and external feature information.

Highway traffic flow is characterized by intricate spatiotemporal correlations, reflecting diverse and dynamic patterns that evolve in both time and space. These patterns are not only shaped by temporal and spatial dimensions but also by external environmental factors such as holidays and weather conditions.

4.1.1. Construction of Spatial Feature Learner

This research uses GAT to capture spatial features. The spatial correlation of traffic flow data is reflected not only in the basic upstream and downstream positional relationships between the studied nodes but also in the weighted connections between road network nodes. The process of spatial feature learning is shown in Figure 3.

The spatial correlation of any node in the road network graph at each time step contains two layers of information:

Static spatial features: adjacent stations in space.

Dynamic spatial features: the key associated toll stations that exchange interaction traffic with the predicted station.

The traffic of node $v_{i}$ is affected not only by the traffic of adjacent nodes at the current time step $t$, whose influence coefficient is given in Formula (3) and is used to construct the covariance matrix $\rho$, but also by the nodes that exchange interaction traffic $p^{t}_{i,j}$ with it at time step $t-1$, whose degree of influence is represented by the probability matrix $p$ given in Formula (4). As shown in Figure 4, the dynamic spatial influence on a node fluctuates with the interaction traffic.

\rho_{v_{i}^{k}, v_{j}} = \frac{\mathrm{cov}(v_{i}, v_{j})}{\sigma_{v_{i}} \sigma_{v_{j}}} = \frac{\sum_{t=1}^{T} (v_{i,t}^{k} - \bar{v}_{i}^{k})(v_{j,t}^{k} - \bar{v}_{j}^{k})}{\sqrt{\sum_{t=1}^{T} (v_{i,t}^{k} - \bar{v}_{i}^{k})^{2} \sum_{t=1}^{T} (v_{j,t}^{k} - \bar{v}_{j}^{k})^{2}}}

(3)

p^{t}_{i,j} = \frac{Edge_{i,j,t-1}}{Out_{i,t-1}} \times 100\%

(4)

where $\rho_{v_{i}^{k}, v_{j}} \in [-1, 1]$; the larger the value, the stronger the spatial correlation.
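Formulas (3) and (4) can be computed directly from flow series. The Python sketch below assumes plain lists of hourly flows; it returns the coefficient of Formula (3) as a value in $[-1, 1]$ and the influence probability as a percentage, using the ratio exactly as printed in Formula (4). The helper names are hypothetical, and returning 0 when $Out_{i,t-1} = 0$ is our own assumption, since the text does not say how empty periods are handled.

```python
import math


def correlation(a, b):
    """Coefficient of Formula (3) for two equal-length flow series."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("series must have equal length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("constant series: coefficient undefined")
    return cov / math.sqrt(var_a * var_b)


def influence_probability(edge_prev, out_prev):
    """Formula (4): Edge_{i,j,t-1} / Out_{i,t-1} x 100 (%).

    Returning 0.0 for an empty period is an assumption, not from the paper.
    """
    if out_prev == 0:
        return 0.0
    return 100.0 * edge_prev / out_prev


print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(influence_probability(30, 120))           # 25.0
```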

The static spatial attention weight between nodes $v_{i}$ and $v_{j}$ is:

\alpha'_{i,j} = \sigma(w^{s} [\alpha_{i,j}, \rho] + b^{s})

(5)

where $w^{s} \in \mathbb{R}^{N \times 2N}$ and $b^{s} \in \mathbb{R}^{N}$ are the parameters to be learned.

The dynamic spatial correlation weight between nodes $v_{i}$ and $v_{j}$ is:

\alpha''_{i,j} = \sigma(w^{d} [\alpha'_{i,j}, P] + b^{d})

(6)

where $w^{d} \in \mathbb{R}^{N \times 2N}$ and $b^{d} \in \mathbb{R}^{N}$ are the parameters to be learned.

The output of the spatial feature extractor for node $v_{i}$ at time step $t$ is:

h^{*t}_{v_{i}} = \sigma(w^{o} [\alpha''_{i,j}] + b^{o})

(7)

where $w^{o}$ is the weight of the mapping layer that integrates all spatial correlation parameters.
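Formulas (5) to (7) chain three sigmoid layers. The following Python sketch traces them for a single node $v_i$. It assumes that $\alpha_{i,j}$ arrives as a precomputed row of GAT attention coefficients and that $\rho$ and $P$ enter as the corresponding rows for $v_i$, so each concatenation has length $2N$ as the stated shapes of $w^{s}$ and $w^{d}$ require; the $N \times N$ shape of $w^{o}$ and all helper names are our assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense_sigmoid(weight, bias, inputs):
    """Compute sigma(W x + b), with W given as a list of rows."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weight, bias)
    ]


def spatial_output(alpha_i, rho_i, p_i, params):
    """Node-level sketch of Formulas (5)-(7) for node v_i.

    alpha_i, rho_i, p_i: length-N rows for node v_i (assumed inputs).
    params: w_s, w_d (N x 2N), w_o (N x N, assumed) and biases of length N.
    """
    static = dense_sigmoid(params["w_s"], params["b_s"], alpha_i + rho_i)  # (5)
    dynamic = dense_sigmoid(params["w_d"], params["b_d"], static + p_i)    # (6)
    return dense_sigmoid(params["w_o"], params["b_o"], dynamic)            # (7)


# Usage with N = 3 and all-zero parameters: every unit outputs sigmoid(0) = 0.5
n = 3
zeros = {
    "w_s": [[0.0] * (2 * n) for _ in range(n)], "b_s": [0.0] * n,
    "w_d": [[0.0] * (2 * n) for _ in range(n)], "b_d": [0.0] * n,
    "w_o": [[0.0] * n for _ in range(n)], "b_o": [0.0] * n,
}
print(spatial_output([0.2, 0.5, 0.3], [1.0, 0.4, -0.1], [0.6, 0.3, 0.1], zeros))
```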

4.1.2. Construction of Temporal Feature Learner

This study uses the GRU module to capture temporal features, which contain three aspects of information:

Adjacent temporal features (time): Under the stable change pattern of traffic flow at highway stations, the traffic volume in the current prediction period is affected by the immediately preceding period or periods; that is, this component captures the impact of $x_{t - 1}, x_{t - 2}, x_{t - 3}, \dots$ on $x_{t}$ at the predicted time $t$.
Trend temporal characteristics (days): Station traffic flow follows a similar trend every 24 h; for example, morning and evening peaks appear at similar times, and the transitions between peak and off-peak periods are similar. This component captures the degree to which $x_{t}$ is affected by $x_{t - 24}, x_{t - 2 \times 24}, x_{t - 3 \times 24}, \dots$.
Periodic temporal characteristics (weeks): The traffic flow at the station has a clear weekly pattern, and the traffic flow in the current prediction period is influenced by that of the previous week or weeks. This component captures the degree to which $x_{t}$ is affected by $x_{t - 7 \times 24}, x_{t - 2 \times 7 \times 24}, x_{t - 3 \times 7 \times 24}, \dots$.
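At the study's 1 h granularity, the three temporal components above correspond to different sets of lagged indices. The Python sketch below lists them for a given step $t$; the hypothetical helper `lag_indices` and its default of three lags per component are illustrative assumptions, since the text leaves the number of lags open.

```python
def lag_indices(t, n_lags=3, steps_per_day=24):
    """Return past indices used by the adjacent, daily and weekly components.

    Assumes hourly steps (steps_per_day = 24); lags before index 0 are dropped.
    """
    steps_per_week = 7 * steps_per_day
    components = {
        "adjacent": [t - i for i in range(1, n_lags + 1)],
        "daily": [t - i * steps_per_day for i in range(1, n_lags + 1)],
        "weekly": [t - i * steps_per_week for i in range(1, n_lags + 1)],
    }
    return {name: [i for i in idx if i >= 0] for name, idx in components.items()}


print(lag_indices(600))
# {'adjacent': [599, 598, 597], 'daily': [576, 552, 528], 'weekly': [432, 264, 96]}
```

Early in the series (for example `t = 200`), only one weekly lag survives, so a real pipeline must either pad or start training once enough history exists.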

For station $s_{j}$, the output of the spatial feature learner is used as the input of the temporal feature learner, and a GRU is used to extract its temporal features. Let $r$ and $z$ denote the reset gate and update gate, respectively; $c_{s_{j}, t_{j}}$ is the output at time step $t_{j}$; $w$ and $b$ are the weight and bias parameters of each layer; and $\sigma(\cdot)$ and $\tanh(\cdot)$ denote the sigmoid and tanh functions, respectively.

The process of extracting temporal correlation is as follows:

Step1: The GRU combines the input at the current moment with the output of the previous moment and uses the reset gate to update the state unit $c_{s_{j}, t_{j}}$:

r_{s_{j}, t_{j}} = \sigma(w_{r} \cdot h_{t-1} + w_{r} \cdot x^{*}_{s_{j}, t_{j}} + b_{r})

(8)

Step2: The update gate determines the degree to which the previous output is carried into the current state, and the candidate state unit $\tilde{c}_{s_{j}, t_{j}}$ is then computed:

z_{s_{j}, t_{j}} = \sigma(w_{z} \cdot h_{t-1} + w_{z} \cdot x^{*}_{s_{j}, t_{j}} + b_{z})

(9)

\tilde{c}_{s_{j}, t_{j}} = \tanh(w_{c} \cdot r_{s_{j}, t_{j}} \cdot h_{t-1} + w_{c} \cdot x^{*}_{s_{j}, t_{j}} + b_{c})

(10)

Step3: Determine the output of GRU:

h_{s_{j}, t_{j}} = z_{s_{j}, t_{j}} \cdot h_{t - 1} + (1 - z_{s_{j}, t_{j}}) \cdot {\tilde{c}}_{s_{j}, t_{j}}

(11)
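Formulas (8) to (11) describe one GRU step. The Python sketch below implements a standard GRU cell in that form using plain lists. As an assumption, it gives the input and the previous hidden state separate weight matrices (`W_*` and `U_*`), because a single matrix $w$ applied to both terms, as printed in Formulas (8) to (10), only works when input and hidden sizes coincide; the reset gate is applied to $h_{t-1}$ before its projection in the candidate state, and the final mix follows Formula (11).

```python
import math


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One GRU step; p holds W_* (hidden x input), U_* (hidden x hidden), b_*."""
    def gate(name, act, h_term):
        hidden_part = matvec(p["U_" + name], h_term)
        input_part = matvec(p["W_" + name], x)
        return [act(a + b + c) for a, b, c in zip(hidden_part, input_part, p["b_" + name])]

    r = gate("r", sigmoid, h_prev)                # reset gate, Formula (8)
    z = gate("z", sigmoid, h_prev)                # update gate, Formula (9)
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
    c_tilde = gate("c", math.tanh, reset_h)       # candidate state, Formula (10)
    # Formula (11): keep z of the old state, take (1 - z) of the candidate
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, c_tilde)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


hidden, inp = 2, 1
params = {}
for g in ("r", "z", "c"):
    params["W_" + g] = zeros(hidden, inp)
    params["U_" + g] = zeros(hidden, hidden)
    params["b_" + g] = [0.0] * hidden
# With zero weights, r = z = 0.5 and the candidate is tanh(0) = 0, so h halves
print(gru_step([1.0], [0.4, -0.2], params))  # [0.2, -0.1]
```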

4.1.3. Construction of External Feature Learner

This part considers the external characteristics that affect highway traffic flow, mainly holidays and weather, which are categorical variables. The external features are therefore constructed as $C_{T} = (h, \omega, v, W)$, where $h \in \{0, 1, 2, \dots, 23\}$ indexes the hour of the day, $\omega \in \{0, 1, 2, \dots, 6\}$ indexes the day of the week, and $v \in \{0, 1, 2\}$ indicates the day type, with $v = 1$ denoting a weekend and $v = 2$ a festival or holiday. $W$ is the weather attribute, which is divided into three categories according to the weather conditions in the experimental area: $W = 0$ covers sunny, cloudy, and partly cloudy weather; $W = 1$ covers light rain, showers, and heavy rain; and $W = 2$ covers weather types such as heavy snow, light snow, and sleet.

One-hot encoding is applied to the constructed external feature matrix $C_{T}$, and the feature matrix is then obtained through a fully connected layer as follows:

C_{s_{j}, t_{j}} = FC(\mathrm{OneHot}(C_{T}))

(12)
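Formula (12) can be sketched in Python as follows. The category sizes come from the ranges given above (24 hours, 7 weekdays, 3 day types, and 3 weather classes, giving 37 one-hot positions in total). Treating the fully connected layer as a plain affine map without activation is our assumption, because the text does not name one, and the helper names and weights are placeholders.

```python
# Category sizes taken from the ranges of h, omega, v and W in C_T
CARDINALITIES = (("h", 24), ("omega", 7), ("v", 3), ("W", 3))


def one_hot_external(h, omega, v, w):
    """Concatenate one-hot blocks for (h, omega, v, W) into a 37-dim vector."""
    vector = []
    for (name, size), value in zip(CARDINALITIES, (h, omega, v, w)):
        if not 0 <= value < size:
            raise ValueError(f"{name}={value} outside 0..{size - 1}")
        block = [0] * size
        block[value] = 1
        vector.extend(block)
    return vector


def fully_connected(weight, bias, x):
    """Affine map W x + b (activation-free FC layer; an assumption)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


# Hour 8, weekday index 0, day type v = 0, weather class W = 1 (rain)
encoded = one_hot_external(8, 0, 0, 1)
print([i for i, bit in enumerate(encoded) if bit])  # [8, 24, 31, 35]
print(fully_connected([[1.0] * 37, [0.0] * 37], [0.5, -1.0], encoded))  # [4.5, -1.0]
```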

The encoder extracts spatiotemporal features from the raw input data and converts them into hidden-layer representations, which are used by the spatiotemporal attention layer of the decoder:

h'_{t} = \mathrm{concat}(h^{*t}_{s_{j}}, h_{s_{j}, t_{j}}, C_{s_{j}, t_{j}})

(13)

4.2. Decoder Structure

The decoder structure is similar to that of the encoder. The difference is that, based on the encoder output $h'_{t}$, the decoder computes attention coefficients that measure the impact of the temporal, spatial, and external features of historical traffic data in each module on the predicted values, and it then uses a fully connected layer to predict the traffic flow at highway stations. The influence weight of time step $t$ on time step $t+i$ is:

u^{k}_{t+i,t} = \frac{\langle f_{k}(W^{k} h'_{t+i}), f_{k}(W^{k} h'_{t}) \rangle}{\sqrt{d}}

(14)

\beta^{k}_{t+i,t} = \frac{\exp(u^{k}_{t+i,t})}{\sum_{t_{r}}^{N_{t+i}} \exp(u^{k}_{t+i,t_{r}})}

(15)

where $f_{k}$ is a nonlinear transformation function, $W^{k}$ is a learnable parameter, and $\langle \cdot, \cdot \rangle$ denotes the inner product.

After the attention coefficients $\beta^{k}_{t+i,t}$ are obtained, the attention mechanism updates the hidden state to:

\hat{h}_{t} = \sum_{t_{r}}^{N_{t+i}} \beta^{k}_{t+i,t} \cdot (W^{k}_{d} h'_{t+i})

(16)
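Formulas (14) to (16) amount to scaled dot-product attention over the encoder states. The Python sketch below follows them with plain lists under several assumptions of ours: $f_k$ is taken to be tanh, $d$ is the dimension of $W^{k} h'$, a max shift is added inside the softmax for numerical stability (it does not change $\beta$), and the values are read as $W_d^{k} h'_{t_r}$ for each attended step, which is our interpretation of the summation index $t_r$ in Formula (16). The function name `temporal_attention` is hypothetical.

```python
import math


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def temporal_attention(target, history, W, W_d, f=math.tanh):
    """Return (beta, pooled) for one decoding step.

    target: h'_{t+i}; history: list of encoder states h'_{t_r}.
    """
    query = [f(a) for a in matvec(W, target)]
    d = len(query)
    scores = []
    for state in history:
        key = [f(a) for a in matvec(W, state)]
        scores.append(sum(q * k for q, k in zip(query, key)) / math.sqrt(d))  # (14)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    beta = [e / total for e in exps]  # Formula (15)
    values = [matvec(W_d, state) for state in history]
    pooled = [
        sum(b * v[j] for b, v in zip(beta, values))
        for j in range(len(values[0]))
    ]  # Formula (16)
    return beta, pooled


identity = [[1.0, 0.0], [0.0, 1.0]]
beta, pooled = temporal_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], identity, identity)
print(beta)    # the state aligned with the target receives the larger weight
print(pooled)
```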

For the prediction of traffic flow at highway toll stations, the decoder output features are fed forward to the fully connected layer to generate the predicted values. The pseudocode of the training algorithm for the model is shown in Algorithm 1, and the prediction function is as follows:

{\hat{y}}_{t} = W_{s} {\hat{h}}_{t}

(17)

Algorithm 1. ST-ED-GAGGRU pseudocode for the task training algorithm

Input: historical traffic data; highway station graph structure $G = (V, E, A)$; covariance matrix $\rho$; influence probability matrix $p$; weather condition $W$

Output: trained traffic predictor $\hat{f}_{train}$

// Model training
1. Repeat: epoch = epoch + 1
2. For each epoch, extract a batch $T_{batch}$ from the training set
3. $\{h_{t}, \dots, h_{t+h-1}, h_{t+h}\} \leftarrow Encoder(X_{t-k}, \dots, X_{t-1}, X_{t})$
4. $\{h'_{t}, \dots, h'_{t+h-1}, h'_{t+h}\} \leftarrow Decoder(h_{t}, \dots, h_{t+h-1}, h_{t+h})$
5. Calculate the loss $L$ and update the parameters by backpropagation: $\nabla Loss \leftarrow Loss(L)$
6. Until the stopping criterion is met
// Model prediction
7. for each prediction step i (i < n)
8. $\hat{h}_{t} = \sum_{t_{r}}^{N_{t+i}} \beta^{k}_{t+i,t} \cdot (W^{k}_{d} h'_{t+i})$
9. $\hat{y}_{t} = W_{s} \hat{h}_{t}$
10. end for
Output: $\hat{f}_{train}$

5. Experimental Analysis

5.1. Dataset Description

This study used actual transaction data collected from the toll collection system of the Sichuan Province highway network to evaluate the proposed model. Between 1 April and 30 June 2018, the system generated approximately 67.58 million records. During data cleaning in the Oracle database, a significant amount of anomalous data was detected in the original dataset. The main cleaning methods were removal of duplicate data, correction of abnormal average travel speed data, and elimination of virtual toll stations. After processing, 13.8 million valid records remained, averaging roughly 4.6 million records per month and approximately 150,000 transaction records per day. These records cover 548 valid toll stations in Sichuan Province. Statistical analysis shows that the majority of trips, approximately 77.22%, are completed within 1 h. Because trips that are not completed within a time interval can affect the experimental results, and an unsuitable time granularity can prevent the trained model from achieving the expected results, we set the experimental granularity to 1 h. The distribution of travel time is depicted in Figure 5.

The data were aggregated into 1 h intervals, yielding 2184 time slices (91 days × 24 h). The data were divided into a training set and a testing set in an 8:2 ratio. The topology of the Sichuan Province expressway network is shown in Figure 6.
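The slice count and the 8:2 split can be checked with a short sketch that builds the hourly slots for 1 April to 30 June 2018 and bins transaction timestamps into them. The timestamps are hypothetical, and binning by the hour in which a record falls (for example its exit time) is an assumption about how the aggregation was done.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List

START = datetime(2018, 4, 1)   # first day of the study period
END = datetime(2018, 7, 1)     # exclusive bound: the period ends on 30 June 2018


def hourly_slots(start: datetime, end: datetime) -> List[datetime]:
    """All 1 h slot start times in [start, end)."""
    slots, t = [], start
    while t < end:
        slots.append(t)
        t += timedelta(hours=1)
    return slots


def aggregate_hourly(timestamps: Iterable[datetime], slots: List[datetime]) -> List[int]:
    """Count transactions per 1 h slot (timestamps are hypothetical record times)."""
    counts = Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)
    return [counts[s] for s in slots]  # Counter returns 0 for empty slots


slots = hourly_slots(START, END)
n_train = int(len(slots) * 0.8)
print(len(slots), n_train, len(slots) - n_train)  # 2184 1747 437
```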

The effectiveness of the model was tested using entrance, exit, and interaction flow data at the Chengdu toll station on the Chengdu-Chongqing Expressway; these data were used to predict entrance and exit flow at the next time step. As a central city of western China, Chengdu is a key part of the national development plan for the Chengdu-Chongqing economic circle. Traffic volume on the Chengdu-Chongqing Expressway has recently been rising, leading to noticeable congestion at major toll stations. Predicting station traffic flow on the highway can support better control of critical nodes in the road network, improve traffic conditions, make resource allocation more efficient, and increase overall road network efficiency. Figure 7 visualizes the data at 1 h granularity. The traffic flow distribution on consecutive days within a month follows similar patterns, with notable changes across time periods. Adjacent time periods show a clear evolutionary trend, and the distribution is bimodal, with morning and evening peaks. Furthermore, the trend differs across the working days of the week, and Saturdays and Sundays show their own distinct trends.

Figure 8 shows the average hourly traffic flow on weekdays and holidays. Traffic flow varies markedly between the two, indicating that holidays have a significant impact on the fluctuations in the traffic data.

After fusing weather data with the traffic flow data of a highway toll station, we found that weather conditions significantly influence travel decisions and that traffic flow varies differently across weather types, as illustrated in Figure 9.

Toll station No. 060, a typical key station on the Chengdu-Chongqing Expressway, is used as an example to analyze the dynamic spatial correlation between toll stations. The number of source stations changes dynamically across time periods, and its trend closely follows the time-varying trend of traffic, with a Pearson correlation coefficient of 0.97, indicating a strong correlation (Figure 10). Kernel density estimation, a non-parametric method, was used to fit the proportion of traffic flow between the associated stations and their corresponding sources. The fitted density was concentrated between 10:00 and 23:00, a period that accounts for more than 90% of the daily flow. A horizontal analysis of the kernel density further revealed that 14 source stations each contribute at least 1.8% of the total traffic and together account for 68.91% of the day’s traffic. The remaining 332 associated stations each contribute less than 1%, so the rest of the associated traffic is widely scattered.
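A Gaussian kernel density estimate of the kind described above can be sketched in a few lines. The text does not spell out the exact variable being smoothed or the bandwidth; this sketch assumes, hypothetically, that the variable is the hour of day at which associated-station trips arrive, and uses the normal-reference rule of thumb for the bandwidth. The share of probability mass in the 10:00 to 23:00 window is then a numerical integral of the estimated density.

```python
import math
from statistics import stdev
from typing import Callable, List, Optional


def rule_of_thumb_bandwidth(samples: List[float]) -> float:
    """Normal-reference bandwidth h = 1.06 * sigma * n**(-1/5)."""
    return 1.06 * stdev(samples) * len(samples) ** (-1 / 5)


def gaussian_kde(samples: List[float], bandwidth: Optional[float] = None) -> Callable[[float], float]:
    """Return a function that evaluates the Gaussian kernel density estimate."""
    h = bandwidth if bandwidth is not None else rule_of_thumb_bandwidth(samples)
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        # Sum of Gaussian bumps of width h centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)

    return density


def probability_mass(density: Callable[[float], float], lo: float, hi: float, steps: int = 2000) -> float:
    """Trapezoidal-rule integral of the estimated density over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.5 * (density(lo) + density(hi))
    total += sum(density(lo + i * dx) for i in range(1, steps))
    return total * dx


# Hypothetical hours of day at which associated-station trips reach station No. 060.
hours = [8.5, 10.2, 11.0, 12.4, 14.1, 15.3, 17.8, 18.2, 19.6, 21.0, 22.5]
kde = gaussian_kde(hours)
print(round(probability_mass(kde, 10.0, 23.0), 3))
```

Because the Gaussian kernel has unbounded support, part of the mass leaks outside 0 to 24 h; a circular (von Mises) kernel would be a natural alternative for time-of-day data.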

Table 3 summarizes the entrance traffic statistics of these 14 key associated stations. Traffic volumes differ markedly across the associated stations, and their contributions to the exit traffic of the target toll station vary widely. The contribution is concentrated in three associated stations, 062, 066, and 067, each of which contributes more than 10%.
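The aggregate figures quoted above can be recomputed from the contribution rates in Table 3. The sketch below assumes that a station's contribution rate is its flow to the target station divided by the target station's total traffic; `contribution_rates` and `major_sources` are illustrative helpers, not functions from the paper.

```python
from typing import Dict, List

# Traffic contribution rates (%) of the 14 key associated stations, copied from Table 3.
CONTRIBUTION: Dict[int, float] = {
    40: 3.32, 62: 16.36, 64: 3.77, 66: 10.13, 67: 10.69, 70: 1.98, 71: 5.77,
    73: 2.21, 81: 2.77, 141: 1.85, 146: 1.80, 830: 1.80, 934: 2.88, 951: 3.58,
}


def contribution_rates(flows_to_target: Dict[int, float], target_total: float) -> Dict[int, float]:
    """Share (%) of the target station's traffic contributed by each source (assumed definition)."""
    return {station: 100.0 * flow / target_total for station, flow in flows_to_target.items()}


def major_sources(rates: Dict[int, float], threshold: float = 10.0) -> List[int]:
    """Stations whose individual contribution exceeds `threshold` percent."""
    return sorted(s for s, r in rates.items() if r > threshold)


combined = round(sum(CONTRIBUTION.values()), 2)
print(combined, major_sources(CONTRIBUTION))  # 68.91 [62, 66, 67]
```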

Figure 11 presents the heatmap of the Pearson correlation coefficient matrix for the entrance traffic at the 14 key associated toll stations. The color intensity reflects how similar the temporal distribution trends are and how strong the correlation is, suggesting that this feature is a useful indicator of dynamic spatial correlation. The heatmap shows strong correlations among the key associated stations, except for the weak correlations between station 70 and stations 060, 951, and 934.
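The correlation matrix behind such a heatmap is a pairwise Pearson coefficient over the hourly entrance series. The sketch below uses hypothetical series keyed by station IDs from Table 3, and the 0.5 cut-off in `weak_pairs` is an illustrative assumption, not a threshold stated in the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(series: Dict[int, Sequence[float]]) -> Tuple[List[int], List[List[float]]]:
    """series maps station ID -> hourly entrance counts; returns IDs and the r matrix."""
    ids = list(series)
    return ids, [[pearson(series[i], series[j]) for j in ids] for i in ids]


def weak_pairs(ids: List[int], matrix: List[List[float]], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Station pairs whose |r| falls below a hypothetical threshold."""
    return [(ids[i], ids[j]) for i in range(len(ids)) for j in range(i + 1, len(ids))
            if abs(matrix[i][j]) < threshold]


# Hypothetical hourly entrance counts for three stations.
ids, matrix = correlation_matrix({62: [1, 2, 3, 4], 66: [2, 4, 6, 8], 70: [1, 3, 1, 3]})
print(weak_pairs(ids, matrix))  # [(62, 70), (66, 70)]
```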

All data were min-max normalized before being input into the model. The experimental data were divided into a training set and a testing set in an 8:2 ratio. The prediction process of the model is shown in Figure 12.
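A minimal sketch of the min-max normalization and the 8:2 split follows. Two choices here are assumptions rather than details given in the text: the split is chronological (no shuffling), and the minimum and range are fitted on the training portion only, so that no test-set information leaks into training; predictions are mapped back to vehicle counts with the inverse transform before computing the metrics.

```python
from typing import List, Sequence, Tuple


def chronological_split(series: Sequence[float], train_ratio: float = 0.8) -> Tuple[list, list]:
    """Split a time-ordered series without shuffling (8:2 by default)."""
    cut = int(len(series) * train_ratio)
    return list(series[:cut]), list(series[cut:])


def fit_min_max(train: Sequence[float]) -> Tuple[float, float]:
    """Learn the minimum and range from the training portion only."""
    lo, hi = min(train), max(train)
    return lo, (hi - lo) or 1.0  # guard against a constant series


def normalize(values: Sequence[float], lo: float, span: float) -> List[float]:
    return [(v - lo) / span for v in values]


def denormalize(values: Sequence[float], lo: float, span: float) -> List[float]:
    return [v * span + lo for v in values]


flows = [float(v) for v in range(10)]  # hypothetical hourly flows
train, test = chronological_split(flows)
lo, span = fit_min_max(train)
print(normalize(test, lo, span))  # test values may exceed 1.0 with train-only scaling
```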

5.2. Evaluation Metrics & Experimental Environment

The performance of the predictive models was evaluated using the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination ($R^{2}$), defined in Equations (18)–(20). The software and hardware configurations of the experimental environment are shown in Table 4.

R M S E = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(18)

M A E = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\hat{y}}_{i} |

(19)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(20)
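Equations (18) to (20) translate directly into code. The sketch below is a plain-Python illustration on a hypothetical four-point example; libraries such as scikit-learn provide equivalent functions (`mean_absolute_error`, `mean_squared_error`, `r2_score`).

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Equation (18): root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Equation (19): mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Equation (20): coefficient of determination."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]   # hypothetical observed flows
y_pred = [1.0, 2.0, 3.0, 5.0]   # hypothetical predictions
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))  # 0.5 0.25 0.8
```

Note that $R^{2}$ can be negative when a model predicts worse than the constant mean, which is why it complements the two error-based metrics.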

5.3. Baseline Methods

The literature review highlights the efficacy of various classical models in traffic flow prediction research. To evaluate the performance of the proposed model, we selected representative models spanning traditional statistical methods (HA, ARIMA), machine learning (SVR), and deep learning (LSTM, STGCN) as baselines for the comparative experiments in this section.

HA: A statistical method that predicts traffic as the mean of the historical traffic observed at the corresponding times. The first week of input data is not predicted; from the eighth day onward, each prediction is the average traffic at the same weekday and time in the preceding weeks.
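The HA rule can be sketched for hourly data as follows. The sketch assumes that all earlier weeks are averaged; on the eighth day only one earlier week exists, so the forecast equals last week's value at the same hour. The hourly flows are hypothetical.

```python
from typing import List, Optional

WEEK_HOURS = 168  # one week of 1 h slots


def historical_average(series: List[float], t: int, period: int = WEEK_HOURS) -> Optional[float]:
    """Mean of the observations at t - period, t - 2*period, ... (same weekday and hour).

    Returns None during the first week, when no earlier week is available.
    """
    past = [series[k] for k in range(t - period, -1, -period)]
    return sum(past) / len(past) if past else None


# Hypothetical hourly flows: three constant weeks of 10, 20 and 30 vehicles per hour.
flows = [10.0] * WEEK_HOURS + [20.0] * WEEK_HOURS + [30.0] * WEEK_HOURS
print(historical_average(flows, 100), historical_average(flows, 168), historical_average(flows, 336))
# None 10.0 15.0
```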

ARIMA: A classical time series prediction model that makes the data stationary through differencing and combines autoregressive (AR) and moving average (MA) components. The parameters are p = 2 (AR), d = 1, and q = 3 (MA), i.e., ARIMA(2, 1, 3).
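The differencing step that the "I" in ARIMA(2, 1, 3) refers to is illustrated below: the model is fitted to first differences, and forecasts are mapped back by adding them to the last observation. The flows and the forecast difference are hypothetical placeholders; the ARMA(2, 3) fitting itself is omitted.

```python
from typing import List


def difference(series: List[float]) -> List[float]:
    """First-order differencing (d = 1): z_t = y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(first_value: float, diffs: List[float]) -> List[float]:
    """Invert d = 1 differencing by cumulative summation from the first observation."""
    restored = [first_value]
    for d in diffs:
        restored.append(restored[-1] + d)
    return restored


flows = [120, 135, 128, 150, 162]          # hypothetical hourly flows
diffs = difference(flows)                   # [15, -7, 22, 12]
assert undifference(flows[0], diffs) == flows
# A forecast of the next difference is mapped back onto the flow scale:
next_diff = 5                               # placeholder for an ARMA(2, 3) forecast
print(flows[-1] + next_diff)                # 167
```

In practice, `ARIMA(y, order=(2, 1, 3))` from `statsmodels.tsa.arima.model` performs the differencing and its inversion internally.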

SVR: A supervised, non-parametric learning method based on support vector machines that constructs an optimal hyperplane in a high-dimensional feature space for regression. A Gaussian kernel is used, and the kernel coefficient is initialized to 0.001.
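The Gaussian (RBF) kernel that SVR uses to map inputs implicitly into the high-dimensional feature space is sketched below. Reading the "kernel coefficient" as the γ in $k(x, z) = \exp(-\gamma \lVert x - z \rVert^{2})$ is an assumption about the paper's parameterization.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 0.001) -> float:
    """Gaussian (RBF) kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(samples: List[Sequence[float]], gamma: float = 0.001) -> List[List[float]]:
    """Kernel matrix K[i][j] = k(x_i, x_j) used in the SVR dual problem."""
    return [[rbf_kernel(xi, xj, gamma) for xj in samples] for xi in samples]


# Hypothetical lagged-flow feature vectors (flows in the previous two hours).
K = gram_matrix([[120.0, 135.0], [128.0, 150.0], [162.0, 170.0]])
```

A small γ such as 0.001 makes the kernel decay slowly with distance, so raw, unnormalized flows still produce non-degenerate similarities. If the coefficient is indeed γ, the scikit-learn equivalent would be `SVR(kernel="rbf", gamma=0.001)`.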

LSTM: A special type of recurrent neural network (RNN) that is widely used for time series prediction and addresses the long-term dependency problem of standard RNNs. We use a single LSTM layer with 10 hidden units, a learning rate of 0.01, and a batch size of 64.

STGCN: A deep learning network designed specifically for spatiotemporal data that combines a GCN with a CNN. It has three main modules: a graph convolutional layer, a spatiotemporal convolutional layer, and a fully connected layer. The channels of the three layers in the ST-Conv block are set to 64, 32, and 64. The initial learning rate is 0.001, the graph convolution is approximated with Chebyshev polynomials with kernel size K = 1, and the temporal convolution kernel size is Kt = 3. The model is trained with RMSprop to minimize the mean-square error.
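The Chebyshev approximation used by STGCN builds the filter from polynomials of the scaled graph Laplacian, $T_0 = I$, $T_1 = \tilde{L}$, $T_k = 2\tilde{L}T_{k-1} - T_{k-2}$, and computes $y = \sum_k \theta_k T_k(\tilde{L})x$. The single-channel sketch below is an illustration, not the baseline's implementation; it assumes the common approximation $\lambda_{max} \approx 2$. Under this formulation K = 1 keeps only $T_0 = I$, so the graph convolution reduces to a per-node scaling.

```python
import math
from typing import List

Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def scaled_laplacian(adj: Matrix) -> Matrix:
    """L~ = 2L/lambda_max - I with L = I - D^-1/2 A D^-1/2, assuming lambda_max ~= 2."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    lap = identity(n)
    for i in range(n):
        for j in range(n):
            if adj[i][j] and deg[i] > 0 and deg[j] > 0:
                lap[i][j] -= adj[i][j] / math.sqrt(deg[i] * deg[j])
    # With lambda_max = 2, 2L/lambda_max - I reduces to L - I.
    return [[lap[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]


def chebyshev_terms(l_tilde: Matrix, order: int) -> List[Matrix]:
    """First `order` Chebyshev polynomials T_0 .. T_{order-1} of L~."""
    n = len(l_tilde)
    terms = [identity(n), l_tilde]
    while len(terms) < order:
        nxt = matmul(l_tilde, terms[-1])
        terms.append([[2.0 * nxt[i][j] - terms[-2][i][j] for j in range(n)] for i in range(n)])
    return terms[:order]


def cheb_graph_conv(x: List[float], l_tilde: Matrix, thetas: List[float]) -> List[float]:
    """Single-channel graph convolution y = sum_k theta_k T_k(L~) x."""
    terms = chebyshev_terms(l_tilde, len(thetas))
    n = len(x)
    return [sum(theta * sum(t[i][j] * x[j] for j in range(n)) for theta, t in zip(thetas, terms))
            for i in range(n)]


# Two connected toll stations (hypothetical); with K = 1 there is no neighbour mixing.
L_t = scaled_laplacian([[0.0, 1.0], [1.0, 0.0]])
print(cheb_graph_conv([1.0, 2.0], L_t, [2.0]))  # [2.0, 4.0]
```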

5.4. Experimental Setting

To verify the predictive performance of the proposed model, the basic hyperparameters were tuned first over the following ranges: batch size (Batch_Size) [32, 64, 128], number of encoder layers L [2, 3, 4, 5, 6], number of attention heads [1, 2, 4, 8], and learning rate [0.01, 0.005, 0.001, 0.0005, 0.00001], with a maximum of 100 epochs. Training used an early stopping mechanism (patience of 50) and a stochastic gradient descent (SGD) optimizer with an iteration decay rate of 0.95, and the loss function was the mean-square error (MSE) loss given in Formula (21). The model performed best with 80 epochs, a learning rate of 0.001, 4 attention heads, 5 encoder layers, and a batch size of 64. Table 5 lists the hyperparameters used after model training, and Figure 13 shows the impact of selected parameters.
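The search ranges listed above define a full grid of 3 × 5 × 4 × 5 = 300 configurations, which can be enumerated as follows. Interpreting the 0.95 decay rate as a per-epoch exponential schedule is an assumption; the text does not say whether decay is applied per epoch or per iteration.

```python
from itertools import product
from typing import Dict, Iterator, List

# Search ranges listed in the hyperparameter-tuning paragraph.
SEARCH_SPACE: Dict[str, List[float]] = {
    "batch_size": [32, 64, 128],
    "encoder_layers": [2, 3, 4, 5, 6],
    "attention_heads": [1, 2, 4, 8],
    "learning_rate": [0.01, 0.005, 0.001, 0.0005, 0.00001],
}


def grid(space: Dict[str, List[float]]) -> Iterator[Dict[str, float]]:
    """Yield every combination of the search space as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def decayed_learning_rate(initial_lr: float, epoch: int, decay_rate: float = 0.95) -> float:
    """Exponential decay lr_e = lr_0 * decay_rate**epoch (per-epoch schedule assumed)."""
    return initial_lr * decay_rate ** epoch


configs = list(grid(SEARCH_SPACE))
print(len(configs))  # 300
```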

{Loss}_{MSE} = \frac{1}{N} \sum_{i = 1}^{N} \sum_{t = 1}^{M} {({\hat{y}}_{t}^{i} - y_{t}^{i})}^{2}

(21)
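Formula (21) divides by N only, so the squared errors over the M prediction steps are summed rather than averaged. A built-in mean-square-error loss such as Keras `MeanSquaredError` averages over all elements instead and differs by a factor of M. The sketch below reads i as the sample index and t as the prediction step; the numbers are hypothetical.

```python
from typing import Sequence


def mse_loss(y_hat: Sequence[Sequence[float]], y: Sequence[Sequence[float]]) -> float:
    """Formula (21): squared errors summed over the M steps, averaged over the N samples.

    y_hat[i][t] and y[i][t] are the prediction and target for sample i at step t.
    """
    n = len(y)
    total = 0.0
    for pred_row, true_row in zip(y_hat, y):
        total += sum((p - q) ** 2 for p, q in zip(pred_row, true_row))
    return total / n


print(mse_loss([[1.0, 3.0], [3.0, 6.0]], [[1.0, 2.0], [3.0, 4.0]]))  # 2.5
```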

5.5. Experimental Results

Table 6 reports the results of the different models for predicting exit and entrance flow at the Chengdu toll station on the Sichuan Province highway network. Bold denotes the best performance, supporting the effectiveness of the model design and feature selection in this study. The deep learning approaches in the last three rows significantly outperform the traditional statistical models (HA and ARIMA) and the machine learning model (SVR). This advantage suggests that deep learning is well suited to capturing the complex spatiotemporal nonlinearities of highway traffic, which is consistent with the growing use of deep learning for such problems in the recent literature.

Specifically, for exit flow prediction, the ST_ED_GATGRU model achieves RMSE, MAE, and R² values of 35.42, 17.57, and 0.74, respectively. Compared with HA, ARIMA, SVR, LSTM, and ST-GCN, its RMSE is lower by 19.25, 17.96, 17.65, 11.36, and 5.67, its MAE is lower by 11.36, 10.07, 9.56, 6.06, and 3.96, and its R² is higher by 0.32, 0.47, 0.28, 0.16, and 0.11, respectively.

For entrance flow prediction, the ST_ED_GATGRU model achieves RMSE, MAE, and R² values of 37.06, 18.78, and 0.72, respectively. Overall, entrance flow prediction is slightly less accurate than exit flow prediction, possibly because highway entrance flow originates from other national highways or expressways. Compared with HA, ARIMA, SVR, LSTM, and ST-GCN, the RMSE of ST_ED_GATGRU is lower by 16.20, 16.13, 15.17, 9.26, and 8.22, its MAE is lower by 9.26, 8.46, 8.28, 4.63, and 3.58, and its R² is higher by 0.29, 0.44, 0.25, 0.14, and 0.11, respectively. We further split entrance traffic prediction into workday and holiday scenarios and compared the RMSE, as shown in Figure 14. All models predict better on workdays than on holidays, with the following workday advantages: HA 0.82%, ARIMA 1.32%, SVR 1.19%, LSTM 0.91%, ST_GCN 1.15%, and ST_ED_GATGRU 1.37%. One reason is that workdays provide more samples than holidays; another is that unexpected events during holidays may make features harder to capture. For holidays, the RMSE of the proposed model is lower than that of HA, ARIMA, SVR, LSTM, and ST-GCN by 19.45, 18.14, 18.83, 11.48, and 5.73, respectively; for workdays, it is lower by 19.48, 17.92, 18.67, 11.53, and 5.74, respectively.
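The absolute gaps between each baseline and the proposed model quoted in the two preceding paragraphs can be recomputed from Table 6. The helper below is illustrative; it takes the magnitude of the difference, so lower errors and higher R² both appear as positive gaps.

```python
from typing import Dict, Tuple

# Table 6: (exit RMSE, exit MAE, exit R2, entrance RMSE, entrance MAE, entrance R2)
TABLE6: Dict[str, Tuple[float, ...]] = {
    "HA": (54.67, 28.93, 0.42, 53.26, 28.04, 0.43),
    "ARIMA": (53.38, 27.64, 0.27, 53.19, 27.24, 0.28),
    "SVR": (53.07, 27.13, 0.46, 52.23, 27.06, 0.47),
    "LSTM": (46.78, 23.63, 0.58, 46.32, 23.41, 0.58),
    "ST_GCN": (41.09, 21.53, 0.63, 45.28, 22.36, 0.61),
    "ST_ED_GATGRU": (35.42, 17.57, 0.74, 37.06, 18.78, 0.72),
}
COLUMNS = ["exit_rmse", "exit_mae", "exit_r2", "entr_rmse", "entr_mae", "entr_r2"]


def gaps_to_proposed(column: str, proposed: str = "ST_ED_GATGRU") -> Dict[str, float]:
    """Absolute difference between each baseline and the proposed model in one column."""
    j = COLUMNS.index(column)
    ref = TABLE6[proposed][j]
    return {m: round(abs(row[j] - ref), 2) for m, row in TABLE6.items() if m != proposed}


for col in COLUMNS:
    print(col, list(gaps_to_proposed(col).values()))
```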

5.6. Ablation Experiment

The proposed model consists of three modules: a temporal feature extraction module using a gated recurrent unit network (GRU), a spatial feature extraction module using a graph attention network (GAT), and a fusion module combining external and spatiotemporal features. To further verify how each module affects overall model accuracy, we designed ablation experiments using the following variants:

(1): Replacing the graph attention network with a traditional graph convolutional network (ST_ED_GCNGRU): This variant replaces the GAT used for spatial feature extraction with a GCN and keeps all other structures unchanged, predicting traffic at the next time step. It shows how much GAT contributes to extracting dynamic spatial correlations.
(2): Removing the spatial feature extractor (T_ED_GATGRU): This variant removes the GAT module for spatial feature extraction and fuses only temporal and external environmental features, still using the encoder-decoder structure to predict traffic at the next time step. It tests a model that considers only temporal and external correlations.
(3): Removing the temporal feature extractor (S_ED_GATGRU): This variant removes the GRU module for temporal feature extraction and keeps all other structures unchanged, predicting traffic at the next time step. It considers only spatial and external correlations.
(4): Removing the external feature fusion module (ST_ED_GATGRU_nofusion): This variant removes the external feature data and the spatiotemporal attention fusion and keeps all other structures unchanged, predicting traffic at the next time step. It considers only spatiotemporal features, without feature fusion or external correlations.

Table 7 shows the ablation results: the full ST_ED_GATGRU model achieves the best value on every evaluation metric. ST_ED_GATGRU_nofusion performs worst, indicating that the fusion module is critical for weighting the diverse features. ST_ED_GCNGRU comes closest to the full model, which shows that fusing temporal, spatial, and external correlations is effective, while the remaining gap indicates that GAT notably improves the extraction of dynamic spatial correlation on highways. For exit traffic, S_ED_GATGRU outperforms T_ED_GATGRU, indicating that spatial correlation influences exit traffic prediction more strongly than temporal correlation. For entrance traffic, T_ED_GATGRU outperforms S_ED_GATGRU, suggesting that the spatial correlation of entrance traffic is weaker than its time series correlation.

To explore the impact of dynamic spatial correlation on the prediction results, we removed the influence probability matrix $p$ from the trained model, so that only the static physical space was considered, and ran comparative experiments. Relative to this static variant, the full model reduced RMSE and MAE for exit flow prediction by 11.63% and 13.43%, respectively, and increased R² by 12.16%. For entrance flow prediction, it reduced RMSE and MAE by 4.56% and 6.58%, respectively, and increased R² by 11.11%. These changes further confirm that dynamic spatial correlation affects exit flow more than entrance flow. Figure 15 compares the performance evaluation indicators.

6. Conclusions

This study addressed the prediction of traffic flow at highway stations in real scenarios using an improved heterogeneous spatiotemporal graph model and ETC data from Chengdu. The proposed deep learning model integrates a spatiotemporal attention mechanism with an encoder-decoder architecture and transforms traffic flow sequence data matrices and road network structures into graph-structured data. It constructs an influence probability matrix to quantify the dynamic influence of upstream toll stations on the traffic contribution of the target station. The covariance matrix between toll stations optimizes the graph structure data, while the attention mechanism assigns varying combination weights to the target toll station from temporal, spatial, external, and other dimensions. The model’s internal architecture comprises three main components: a temporal feature extraction module employing a gated recurrent unit network (GRU), a spatial feature extraction module utilizing a graph attention network (GAT), and a fusion module that combines external and spatiotemporal features. The findings have practical applications for the active management of key nodes in highway traffic flow: by improving traffic operating conditions and resource allocation efficiency, they can help raise road network traffic efficiency. Furthermore, the data provided by this research can support the sustainable development of highway transportation systems. The main conclusions are as follows:

(1): In the case study, traffic flow data from Chengdu-Chengyu Station on the Sichuan Province highway network were used for experimental and ablation analyses, with HA, ARIMA, SVR, LSTM, and ST-GCN as baseline models. The proposed model achieved RMSE, MAE, and R² values of 35.42, 17.57, and 0.74 for exit traffic flow prediction and 37.06, 18.78, and 0.72 for entrance traffic flow prediction, respectively. This demonstrates the effectiveness of the model construction and feature selection in this study.
(2): In the ablation analysis, compared with the strongest ablation variant (ST_ED_GCNGRU), the ST_ED_GATGRU model improved exit traffic flow prediction by 0.26 in RMSE, 0.25 in MAE, and 1.35% in R², and improved entrance traffic flow prediction by 0.83 in RMSE, 0.08 in MAE, and 4.17% in R². These results quantify how the individual modules of the proposed model contribute to its overall accuracy.
(3): In comparative experiments against a variant that considers only the influence of static physical space, the proposed model reduced RMSE and MAE for exit flow prediction by 11.63% and 13.43% and increased R² by 12.16%; for entrance flow prediction, it reduced RMSE and MAE by 4.56% and 6.58% and increased R² by 11.11%. This illustrates the impact of dynamic spatial correlation on the prediction results.

In subsequent research, a comprehensive dataset from diverse sources will be assembled. This dataset will encompass various geographical environments including plains, mountainous regions, and urban ring roads. Furthermore, it will incorporate data collected through different methods such as ETC gantry and radar monitoring. This diverse approach aims to validate the universality of the prediction model. The integration of advanced algorithms, like reinforcement learning and Bayesian optimization, is proposed for the dynamic adaptive adjustment of model parameters. Such enhancements are anticipated to provide more precise and efficient decision-making support for the management and planning of contemporary highway transportation systems.

Author Contributions

Study conception and design: J.C.; data collection: J.C. and J.G.; analysis and interpretation of results: Y.Z.; draft manuscript preparation: Y.Z.; programming: Y.Z. and F.C.; supplementary experiments: J.G. and F.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the General Project of the National Natural Science Foundation of China (52472339), in part by the Intelligent Policing Key Laboratory of Sichuan Province (No. ZNJW2024KFMS007), in part by the Key Project of the Philosophy and Social Sciences Innovation Program of Chongqing (2024CXZD25), and in part by the Sichuan Science and Technology Program under Grant No. 2024NSFSC2029.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no potential conflicts of interest.

References

Naheliya, B.; Redhu, P.; Kumar, K. A Review on Developments in Evolutionary Computation Approaches for Road Traffic Flow Prediction. Arch. Comput. Methods Eng. 2025, 32, 1–25.
Khalil, R.A.; Safelnasr, Z.; Yemane, N.; Kedir, M.; Shafiqurrahman, A.; Saeed, N. Advanced Learning Technologies for Intelligent Transportation Systems: Prospects and Challenges. IEEE Open J. Veh. Technol. 2024, 5, 397–427.
Sattarzadeh, A.R.; Pathirana, P.N.; Huynh, V.T. Traffic State Estimation with Spatio-Temporal Autoencoding Transformer (STAT Model). IEEE Access 2025, 13, 87048–87067.
Wei, X.; Xia, D.; Li, Y.; Ao, Y.; Chen, Y.; Hu, Y.; Li, Y.; Li, H. Attention-based spatial-temporal synchronous graph convolution networks for traffic flow forecasting. Appl. Intell. 2025, 55, 516.
Song, L.; Ren, Q.; Zhou, Y. Spatio-Temporal Heterogeneous Graph Neural Network with Multi-view Learning for Traffic Prediction. In Proceedings of the International Conference on Pattern Recognition, Viña del Mar, Chile, 1–4 December 2025; Springer: Cham, Switzerland, 2025.
Wang, R.; Xi, L.; Ye, J.; Zhang, F.; Yu, X.; Xu, L. Adaptive Spatio-Temporal Relation Based Transformer for Traffic Flow Prediction. IEEE Trans. Veh. Technol. 2025, 74, 2220–2230.
Wang, D.; Guo, G.; Ouyang, T.; Yu, D.; Zhang, H.; Li, B.; Jiang, R.; Xu, G.; Deng, S. A Lightweight Spatio-Temporal Neural Network With Sampling-Based Time Series Decomposition for Traffic Forecasting. IEEE Trans. Intell. Transp. Syst. 2025, 26, 8682–8693.
Rahimi, R.; Ravirathinam, P.; Ebtehaj, A.; Behrangi, A.; Tan, J.; Kumar, V. Global Precipitation Nowcasting of Integrated Multi-satellitE Retrievals for GPM: A U-Net Convolutional LSTM Architecture. J. Hydrometeorol. 2024, 25, 947–963.
Xia, D.; Shen, B.; Geng, J.; Hu, Y.; Li, Y.; Li, H. Attention-based spatial–temporal adaptive dual-graph convolutional network for traffic flow forecasting. Neural Comput. Appl. 2023, 35, 17217–17231.
Zhu, W.; Zhou, X.; Lan, S.; Wang, W.; Hou, Z.; Ren, Y.; Pan, T. A dual branch graph neural network based spatial interpolation method for traffic data inference in unobserved locations. Inf. Fusion 2025, 114, 102703.
Vlahogianni, E.I.; Karlaftis, M.G.; Golias, J.C. Short-term traffic forecasting: Where we are and where we’re going. Transp. Res. Part C Emerg. Technol. 2014, 43, 3–19.
Vol, N. Nonparametric statistics for stochastic processes. In Nonparametric Statistics for Stochastic Processes; Springer: Cham, Switzerland, 1998.
Williams, B.M. Multivariate Vehicular Traffic Flow Prediction: Evaluation of ARIMAX Modeling. Transp. Res. Rec. J. Transp. Res. Board 2001, 1776, 194–200.
Lee, S.; Fambro, D.B. Application of Subset Autoregressive Integrated Moving Average Model for Short-Term Freeway Traffic Volume Forecasting. Transp. Res. Rec. J. Transp. Res. Board 1999, 1678, 179–188.
Zhao, X.; Bi, R.; Yang, R.; Chu, Y.; Guo, J.; Huang, W.; Cao, J. Short-Term Traffic Flow Prediction Based on the Intelligent Parameter Adjustment K-Nearest Neighbor Algorithm. In International Conference on Artificial Intelligence and Soft Computing; Springer: Cham, Switzerland, 2020.
Zhang, Y.; Xie, Y. Forecasting of Short-Term Freeway Volume with v-Support Vector Machines. Transp. Res. Rec. J. Transp. Res. Board 2007, 2024, 92–99.
Xie, Y.; Zhao, K.; Sun, Y.; Chen, D. Gaussian Processes for Short-Term Traffic Volume Forecasting. Transp. Res. Rec. J. Transp. Res. Board 2010, 2165, 69–78.
Sun, S.; Zhang, C.; Yu, G. A Bayesian Network Approach to Traffic Flow Forecasting. IEEE Trans. Intell. Transp. Syst. 2006, 7, 124–132.
Erfani, S.M.; Rajasegarar, S.; Karunasekera, S.; Leckie, C. High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognit. 2016, 58, 121–134.
Koesdwiady, A.; Soua, R.; Karray, F. Improving Traffic Flow Prediction with Weather Information in Connected Cars: A Deep Learning Approach. IEEE Trans. Veh. Technol. 2016, 65, 9508–9517.
Huang, W.; Song, G.; Hong, H.; Xie, K. Deep Architecture for Traffic Flow Prediction: Deep Belief Networks with Multitask Learning. IEEE Trans. Intell. Transp. Syst. 2014, 15, 2191–2201.
Yang, H.-F.; Dillon, T.S.; Chen, Y.-P.P. Optimized Structure of the Traffic Flow Forecasting Model With a Deep Learning Approach. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 2371–2381.
LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
Hinton, G.E.; Osindero, S.; Teh, Y.-W. A Fast Learning Algorithm for Deep Belief Nets. Neural Comput. 2006, 18, 1527–1554.
Yang, D.; Li, S.; Peng, Z.; Wang, P.; Wang, J.; Yang, H. MF-CNN: Traffic Flow Prediction Using Convolutional Neural Network and Multi-Features Fusion. IEICE Trans. Inf. Syst. 2019, E102.D, 1526–1536.
Liu, Y.; Zheng, H.; Feng, X.; Chen, Z. Short-term traffic flow prediction with Conv-LSTM. In Proceedings of the 2017 9th International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, 11–13 October 2017; IEEE: Piscataway, NJ, USA, 2017.
Fu, R.; Zhang, Z.; Li, L. Using LSTM and GRU neural network methods for traffic flow prediction. In Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), Wuhan, China, 11–13 November 2016; IEEE: Piscataway, NJ, USA, 2016.
Xiao, J.; Long, B. A multi-channel spatial-temporal transformer model for traffic flow forecasting. Inf. Sci. 2024, 671, 14.
Srivastava, N.; Devarakonda, R.; Ruthwik; Krishna, V.; Bharadwaj, B.; Gohil, B.N. Predicting Traffic Flow with Deep Learning. In International Conference on Soft Computing for Problem-Solving; Springer: Cham, Switzerland, 2024.
Ma, Y.; Lou, H.; Yan, M.; Sun, F.; Li, G. Spatio-temporal fusion graph convolutional network for traffic flow forecasting. Inf. Fusion 2024, 104, 11.
Karim, A.; Nower, N. Probabilistic spatio-temporal graph convolutional network for traffic forecasting. Appl. Intell. 2024, 54, 7070–7085.
Jiang, S.; Zhu, M.; Li, J. Traffic Flow Forecasting Using a Spatial-Temporal Attention Graph Convolutional Network Predictor. In International Conference on Spatial Data and Intelligence; Springer: Cham, Switzerland, 2020.
Gu, J.; Jia, Z.; Cai, T.; Song, X.; Mahmood, A. Dynamic Correlation Adjacency-Matrix-Based Graph Neural Networks for Traffic Flow Prediction. Sensors 2023, 23, 2897.
Kong, Y.; Li, L.; Zhang, K.; Ni, Q. Attention module-based spatial–temporal graph convolutional networks for skeleton-based action recognition. J. Electron. Imaging 2019, 28, 1.
Shao, Z.; Yao, X.; Wang, Z.; Gao, J. ST-MambaSync: The Complement of Mamba and Transformers for Spatial-Temporal in Traffic Flow Prediction. arXiv 2024, arXiv:2404.15899.
Wu, Y.; Tan, H.; Qin, L.; Ran, B.; Jiang, Z. A hybrid deep learning based traffic flow prediction method and its understanding. Transp. Res. Part C Emerg. Technol. 2018, 90, 166–180.
Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Zhang, C. Graph WaveNet for Deep Spatial-Temporal Graph Modeling. arXiv 2019, arXiv:1906.00121.
Zheng, H.; Li, X.; Li, Y.; Yan, Z.; Li, T. GCN-GAN: Integrating Graph Convolutional Network and Generative Adversarial Network for Traffic Flow Prediction. IEEE Access 2022, 10, 94051–94062.
Hu, N.; Liang, W.; Zhang, D.; Xie, K.; Li, K.; Zomaya, A.Y. FedGCN: A Federated Graph Convolutional Network for Privacy-Preserving Traffic Prediction. IEEE Trans. Sustain. Comput. 2024, 9, 925–935.

Figure 1. The proposed model structure diagram.

Figure 2. The structure of the encoder.

Figure 3. The structure of the spatial feature learner.

Figure 4. Node dynamic spatial correlation.

Figure 5. Distribution of travel time.

Figure 6. Topological diagram of highway network.

Figure 7. Visualization of time-varying characteristics of traffic flow.

Figure 8. The average hourly traffic flow of the data on weekdays and holidays.

Figure 9. Alterations in the relationship between traffic flow and meteorological conditions.

Figure 10. Number of associated stations and trend of traffic changes.

Figure 11. Heatmap of correlation coefficients among key source stations.

Figure 12. The flowchart of model prediction.

Figure 13. Hyperparameter impact analysis.

Figure 14. Comparison of prediction results for workdays and holidays.

Figure 15. Experimental analysis of dynamic spatial ablation.

Table 1. History of typical methods for traffic volume prediction based on deep learning.

Year	Authors	Model Approach
2016	Wu et al. [20]	A feature-level fusion model that utilizes CNN to capture the spatial features of traffic flow and two LSTMs to mine the short-term periodicity characteristics of traffic flow.
2017	Yu B et al. [31]	A novel deep learning framework, spatial temporal graph convolutional network (STGCN) to tackle the time series prediction problem in traffic domain.
2018	Wu et al. [36]	Constructing a fusion prediction model based on CNN_GRU.
2019	Z. Wu et al. [37]	A novel graph neural network architecture, Graph WaveNet, for spatial-temporal graph modeling, which develops a novel adaptive dependency matrix and learns it through node embeddings.
2019	S. Guo et al. [34]	A novel attention based spatial-temporal graph convolutional network (ASTGCN) model to solve traffic flow forecasting problem.
2020	Jiang S et al. [32]	A novel attention-based graph neural network predictor (GAT) to forecast traffic flow.
2022	H. Zheng et al. [38]	A novel traffic flow prediction model named “graph convolution and generative adversarial network” (GCN-GAN) to predict urban traffic flow.
2023	Gu J et al. [33]	A novel model named “dynamic correlation graph convolutional network” (DCGCN) for traffic forecasting.
2024	Shao Z et al. [35]	A multi-level multi-view augmented spatiotemporal transformer (Transformer) for traffic prediction.
2024	N Hu et al. [39]	A federated graph neural network with spatial information completion (FedGCN) for privacy-preserving traffic prediction.

Table 2. Summary of key symbols.

Symbol	Definition
$X_{t - k}, \dots, X_{t - 1}, X_{t}$	Historical traffic flow of k time slices
$C$	Traffic feature matrix
$G$	Road network graph
$ρ$	Covariance matrix
$p$	Influence probability matrix
$h_{t}$	Output of encoder hidden layer
$h_{t}^{'}$	Output of decoder hidden layer
${\hat{x}}_{t + h}$	Final model output data

Table 3. Statistical analysis of traffic data from key associated station.

No.	Station	Mean	σ	Min	Max	Traffic Contribution Rate
1	40	5135.92	3217.58	596	9500	3.32%
2	62	1626.75	1245.96	85	3752	16.36%
3	64	926.54	761.04	28	2210	3.77%
4	66	1421.58	1012.66	140	3159	10.13%
5	67	1425.71	1118.94	89	3356	10.69%
6	70	569.17	410.92	64	1283	1.98%
7	71	1417.75	1078.86	132	3342	5.77%
8	73	1903.92	1477.89	176	4516	2.21%
9	81	1388.96	1171.90	68	3540	2.77%
10	141	918.58	702.35	104	2137	1.85%
11	146	1977.83	1584.23	88	4864	1.80%
12	830	889.13	681.70	60	1880	1.80%
13	934	3272.17	2501.11	314	6944	2.88%
14	951	2092.54	1788.58	200	6264	3.58%

In the original table, Traffic Contribution Rate values greater than 10% (stations 62, 66 and 67) are displayed in bold.
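The columns of Table 3 can be reproduced with standard descriptive statistics, and the 10% threshold can be applied programmatically. The sketch below is illustrative only: the helper names `summarize` and `key_stations` are hypothetical, the use of the population standard deviation for σ is an assumption (the table does not say whether σ divides by n or n − 1), and only the contribution rates are copied from Table 3.

```python
import statistics
from typing import Dict, List

# Traffic Contribution Rate (%) per station, copied from Table 3.
CONTRIBUTION_RATE = {
    40: 3.32, 62: 16.36, 64: 3.77, 66: 10.13, 67: 10.69, 70: 1.98, 71: 5.77,
    73: 2.21, 81: 2.77, 141: 1.85, 146: 1.80, 830: 1.80, 934: 2.88, 951: 3.58,
}


def summarize(flows: List[float]) -> Dict[str, float]:
    """Mean, standard deviation, min and max of one station's flow series.

    Population standard deviation (pstdev) is an assumption; Table 3 does not
    say whether sigma uses n or n - 1.
    """
    return {
        "mean": statistics.mean(flows),
        "sigma": statistics.pstdev(flows),
        "min": min(flows),
        "max": max(flows),
    }


def key_stations(rates: Dict[int, float], threshold: float = 10.0) -> List[int]:
    """Stations whose contribution rate exceeds the threshold (bold in Table 3)."""
    return sorted(station for station, rate in rates.items() if rate > threshold)


print(key_stations(CONTRIBUTION_RATE))
```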

Table 4. Experimental environment parameters.

Hardware Configuration	Parameters	Software Configuration	Parameters
CPU	Intel(R) Core(TM) i7	Programming language	Python 3.7
Memory/hard disk	64 GB/2 TB	Deep learning framework	TensorFlow 2.0
Graphics card	NVIDIA GeForce GTX 1050	Deep learning library	Keras 2.3
		Operating system/database	Linux 7.9/Oracle 11g

Table 5. The parameters of proposed model.

Module	Hyperparameter	Value
Encoder	Hidden-layer nodes	128
Decoder	Hidden-layer nodes	128
Graph attention network (GAT)	Hidden-layer nodes	64
Graph attention network (GAT)	Attention heads	4
Gated recurrent unit (GRU)	Hidden-layer nodes	64
Fully connected layer	Hidden-layer nodes	128
Overall model	Optimizer	SGD
	Learning rate	0.001
	Epochs	80
	Decay rate	0.95
	Batch size	64
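Table 5 lists an initial learning rate of 0.001 and a decay rate of 0.95 but does not say how often the decay is applied. The sketch below assumes one multiplicative decay step per epoch over the 80 training epochs; this schedule, and the function name `learning_rate`, are assumptions for illustration. In TensorFlow 2.x a comparable schedule could be built with `tf.keras.optimizers.schedules.ExponentialDecay` by setting `decay_steps` to the number of batches per epoch and `staircase=True`, but the paper does not confirm that this is what was used.

```python
def learning_rate(epoch: int, initial_lr: float = 0.001, decay_rate: float = 0.95) -> float:
    """Learning rate at a 0-indexed epoch, assuming one decay step per epoch.

    lr(epoch) = initial_lr * decay_rate ** epoch
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return initial_lr * decay_rate ** epoch


# Learning rate at the first, second, eleventh and last of 80 epochs.
for e in (0, 1, 10, 79):
    print(e, f"{learning_rate(e):.2e}")
```

Under this assumption the learning rate falls by roughly two orders of magnitude by the final epoch, which is worth keeping in mind when comparing against runs that use a constant SGD step size.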

Table 6. Performance of the proposed model and the baseline models on the dataset.

Model	Exit RMSE	Exit MAE	Exit R²	Entrance RMSE	Entrance MAE	Entrance R²
HA	54.67	28.93	0.42	53.26	28.04	0.43
ARIMA	53.38	27.64	0.27	53.19	27.24	0.28
SVR	53.07	27.13	0.46	52.23	27.06	0.47
LSTM	46.78	23.63	0.58	46.32	23.41	0.58
ST_GCN	41.09	21.53	0.63	45.28	22.36	0.61
ST_ED_GATGRU	35.42	17.57	0.74	37.06	18.78	0.72
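The three metrics in Table 6 follow their standard definitions: RMSE is the square root of the mean squared error, MAE is the mean absolute error, and R² is $1 - SS_{res}/SS_{tot}$. The minimal sketch below implements them in plain Python on made-up numbers; it is not the authors' evaluation code, and the function names are illustrative.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (can be negative)."""
    _check(y_true, y_pred)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


observed = [1, 2, 3, 4]   # toy values, not paper data
predicted = [2, 2, 3, 3]
print(rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

Because RMSE squares each error, it penalizes occasional large misses (for example, at peak hours) more heavily than MAE, which is why the two metrics can rank models differently.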

Table 7. Ablation experiment results.

Model	Exit RMSE	Exit MAE	Exit R²	Entrance RMSE	Entrance MAE	Entrance R²
ST_ED_GATGRU	35.42	17.57	0.74	37.06	18.78	0.72
ST_ED_GCNGRU	35.68	17.82	0.73	37.89	18.86	0.69
T_ED_GATGRU	36.43	18.17	0.70	37.98	18.94	0.68
S_ED_GATGRU	36.19	18.03	0.71	38.23	19.14	0.67
ST_ED_GATGRU_nofusion	36.75	18.24	0.68	38.54	19.29	0.65
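One way to read Table 7 is to express each variant's RMSE as a percentage increase over the full ST_ED_GATGRU model. The sketch below does exactly that arithmetic, using only the RMSE values copied from Table 7; the dictionary layout and the function name `rmse_increase_pct` are illustrative choices, not part of the paper.

```python
from typing import Dict

# RMSE values copied from Table 7.
FULL_MODEL_RMSE: Dict[str, float] = {"exit": 35.42, "entrance": 37.06}
VARIANT_RMSE: Dict[str, Dict[str, float]] = {
    "ST_ED_GCNGRU": {"exit": 35.68, "entrance": 37.89},
    "T_ED_GATGRU": {"exit": 36.43, "entrance": 37.98},
    "S_ED_GATGRU": {"exit": 36.19, "entrance": 38.23},
    "ST_ED_GATGRU_nofusion": {"exit": 36.75, "entrance": 38.54},
}


def rmse_increase_pct(variant: str, direction: str) -> float:
    """Percentage increase in RMSE of a variant relative to the full model."""
    base = FULL_MODEL_RMSE[direction]
    return (VARIANT_RMSE[variant][direction] - base) / base * 100


for name in VARIANT_RMSE:
    exit_pct = rmse_increase_pct(name, "exit")
    entrance_pct = rmse_increase_pct(name, "entrance")
    print(f"{name}: exit +{exit_pct:.2f}%, entrance +{entrance_pct:.2f}%")
```

For example, the exit-traffic RMSE of ST_ED_GATGRU_nofusion is (36.75 − 35.42) / 35.42 ≈ 3.75% higher than that of the full model, the largest degradation among the variants in Table 7.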

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Zhang, Y.; Chen, J.; Chen, F.; Gao, J. Improved Heterogeneous Spatiotemporal Graph Network Model for Traffic Flow Prediction at Highway Toll Stations. Sustainability 2025, 17, 7905. https://doi.org/10.3390/su17177905

AMA Style

Zhang Y, Chen J, Chen F, Gao J. Improved Heterogeneous Spatiotemporal Graph Network Model for Traffic Flow Prediction at Highway Toll Stations. Sustainability. 2025; 17(17):7905. https://doi.org/10.3390/su17177905

Chicago/Turabian Style

Zhang, Yaofang, Jian Chen, Fafu Chen, and Jianjie Gao. 2025. "Improved Heterogeneous Spatiotemporal Graph Network Model for Traffic Flow Prediction at Highway Toll Stations" Sustainability 17, no. 17: 7905. https://doi.org/10.3390/su17177905

APA Style

Zhang, Y., Chen, J., Chen, F., & Gao, J. (2025). Improved Heterogeneous Spatiotemporal Graph Network Model for Traffic Flow Prediction at Highway Toll Stations. Sustainability, 17(17), 7905. https://doi.org/10.3390/su17177905

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.