1. Introduction
Structural health monitoring (SHM) systems are vital for preventing bridge operational risks, improving maintenance, and enhancing emergency response [1,2]. China aims for comprehensive SHM coverage on special bridges (e.g., those spanning rivers, seas, and gorges) and full implementation across highway bridges [3], and is progressively establishing a comprehensive SHM system for road and bridge networks [4,5]. Compared to single-bridge monitoring, network-level SHM provides a new perspective for structural assessment and maintenance. Because bridge groups within a region are subject to similar environmental conditions (temperature, humidity, rainfall, icing, etc.) and loads (vehicles, wind, earthquakes, etc.), this geographical correlation can alleviate signal interference and incomplete data, while also enabling cross-bridge comparative analysis. Analyzing bridge networks presents a major structural engineering challenge, as these networks are spatially correlated systems vastly larger and more complex than any single structure [6].
Traffic data are crucial for analyzing bridge and bridge network performance, assessing reliability, managing maintenance, and optimizing operation. Accurate traffic flow prediction plays a significant role in estimating vehicle loads on bridges, drawing on statistical information such as vehicle type, wheelbase, axle load, and wheel spacing. Incorporating traffic conditions into damage detection algorithms can significantly improve diagnostic accuracy [7]. Understanding bridge network correlations is important for road network vulnerability [8], as reduced bridge capacity or closures impact transport network operability and resilience [9]. Traffic flow distribution is crucial for bridge network fragility analysis [10], and traffic demand significantly impacts optimal maintenance scheduling for bridge networks [11]. Traffic load research contributes to bridge design and safety assessment [7]. Specifically, vehicle weight and traffic volume significantly affect the fatigue life of orthotropic steel bridges [12,13]. Bocchini and Frangopol used a simplified traffic flow model to evaluate bridge network life-cycle performance and reliability [14]. Fiorillo and Ghosn identified overweight traffic loads as major risk factors for highway bridge networks [15]. For large bridges, weigh-in-motion (WIM) systems monitor key traffic parameters (axle weight, speed, volume, etc.), providing vital data for traffic load simulation [16]. These data support critical applications, such as reliability-based vehicle weight limits [17] and improved bridge load rating analysis [18].
This study mainly focuses on traffic flow prediction (TFP) within bridge networks so as to understand the traffic load correlations across networks, supporting subsequent SHM research. TFP remains challenging due to numerous influencing factors, such as weather, accidents, and holidays [19]. Prediction techniques fall into parametric models (e.g., time series analysis [20] and Kalman filtering [21]) and nonparametric methods [22].
Parametric models capture temporal patterns and influencing factors in traffic flows but struggle with complex nonlinear relationships between variables. Consequently, nonparametric models, primarily machine learning (ML) and deep learning (DL) approaches, have gained significant attention. Their structures are not predefined but are instead learned from data. ML methods for TFP include the k-nearest neighbor method [23], support vector regression [24], and artificial neural networks [22]. DL approaches primarily use convolutional neural networks (CNNs), recurrent neural networks (RNNs), transformers, and hybrid methods. CNNs specialize in capturing spatial features (e.g., city-wide traffic flow modeled over distinct periods [25] and regional crowd flow [26]), while RNNs specialize in modeling temporal dependencies. Wang et al. decomposed a time series into a trend term and a residual term and reconstructed them using an LSTM-RNN [27]. Transformers are also used for sequence modeling [28]. Hybrid/ensemble methods combine different structures to improve performance. Han et al. used adjacent point inputs with a CNN (spatial) and an LSTM (temporal) for highway TFP [29]. Du et al. fused multi-modality traffic data via a CNN-GRU-Attention module for TFP [30]. Jia and Yan converted taxicab trajectory data to images, processing them with an RCNN (temporal) and a CNN (spatial), supplemented with external data [31]. Ma et al. integrated a CNN, an LSTM, and an attention mechanism with meteorological data for short-term highway TFP [32]. Méndez et al. combined a CNN and a BiLSTM, along with auxiliary stations and meteorological data, for long-term TFP [33].
Building on these DL advancements, graph neural networks (GNNs) offer a powerful framework for TFP by effectively modeling unstructured data and complex relational patterns through graph structures, often integrated with other DL techniques. Yu et al. first applied GNNs to traffic prediction [34]. Li and Zhu integrated spatial and temporal elements, using a temporal graph to complement a spatial graph for TFP [35]. Variants, such as spatial and temporal transformers, leverage self-attention mechanisms for traffic speed prediction [36]. Guo et al. designed a graph convolutional network (GCN) to capture multi-scale dependencies (recent, daily, and weekly) in traffic patterns [37]. Jiang et al. used an extended GCN for traffic speed prediction [38]. Zhao et al. proposed a points-of-interest-based dynamic GCN for traffic forecasting [39]. Emerging hybrids include the integration of GNNs with large language models for TFP (Sun et al. [40]).
Other DL-based TFP research can be found in [41,42,43,44,45]. Existing traffic flow forecasting studies primarily serve intelligent transportation systems (ITSs) for traffic efficiency and resource allocation [19,40], using past traffic data of an area to predict its future traffic flow. In the field of bridge health monitoring, however, not all bridges in the road network are equipped with WIM systems; moreover, even bridges equipped with WIM suffer from incomplete or missing data.
The key innovations and scientific contributions of this study are summarized as follows.
Firstly, this study advances TFP specifically for the SHM of bridge networks. Unlike existing TFP research focusing on intelligent transportation systems for traffic efficiency, we develop a novel TFP pipeline that considers the spatio-temporal correlations of bridge networks, predicting traffic loads critical to structural assessment, reliability analysis, and maintenance planning.
Secondly, this study proposes a transformer-based traffic flow prediction model considering spatio-temporal correlations of bridge networks (ST-TransNet), integrating external factors (processed via fully connected networks) and multi-period traffic flows of input bridges (captured by self-attention encoders) to generate traffic flow predictions through a self-attention decoder.
Thirdly, this study introduces a fixed-value imputation approach to address the missing data issue in real-world WIM systems, preserving multi-period time synchronization to maintain sequence integrity. Imputed data are excluded from the calculation of evaluation metrics to ensure accuracy without distortion from interpolation artifacts. This data processing approach enhances the adaptability of the methodology to practical application scenarios.
Finally, the proposed method is validated through real-world WIM data from an operational eight-bridge network. ST-TransNet reduces the RMSE of TFP to 12.76 vehicles/10 min, achieving a relative error reduction of 22.8–40.5% over 7 different types of baseline models (SVR, CNN, BiLSTM, CNN&BiLSTM, ST-ResNet, transformer, and STGCN). Furthermore, ablation studies validate the necessity and effectiveness of each component (External Factor 1, External Factor 2, distant traffic flow, near traffic flow, recent traffic flow, output traffic flow, and the direct link between the encoder and output), ensuring the rationality of the framework design. This in-field validation confirms the practical applicability of the proposed ST-TransNet for reliable TFP of bridge networks.
In summary, this study proposes a spatio-temporal transformer-based traffic flow prediction model (ST-TransNet) for bridge networks, which can effectively predict the traffic flow of target bridges and support bridge health monitoring and network-level analysis. The rest of this paper is organized as follows.
Section 2 introduces the structure of the proposed model for TFP.
Section 3 describes the implementation details on a bridge network, including bridge locations, the dataset, and the training hyperparameter settings.
Section 4 contains the results and discussion, presenting the prediction results, validating the model effectiveness, and comparing it with the baseline models.
Section 5 contains the conclusions of the paper.
2. Methodology
2.1. Overall Architecture of ST-TransNet
This study proposes ST-TransNet, a model designed to capture the correlation among traffic flows within bridge networks. ST-TransNet considers the spatio-temporal correlations inherent in traffic flows and incorporates external factors into its analysis. The attention mechanism operates across various time steps, enabling it to learn temporal dependencies, while the interaction among different features within the same time step allows it to grasp spatial correlations. This dual functionality of the attention mechanism enables the depiction of spatio-temporal correlations within traffic flows across bridge networks.
The architecture of ST-TransNet, which is utilized for TFP in bridge networks, is depicted in Figure 1. As shown in Figure 1a, ST-TransNet comprises an input section and an output section. The input section consists of five segments: two model external components, and three depict the traffic flow on input bridges during various time periods. The outcomes of these five components are integrated through fusion, with distinct weights assigned to each component. The fused data, in conjunction with the masked output traffic flow, are then fed into the output module. The remainder of Section 2 offers a detailed introduction to the proposed ST-TransNet.
2.2. Timestamp Input and Feature Extraction by External Components
This section details the calculation of external components in ST-TransNet. External factors considered in this study include hour of day (External Factor 1) and day of week and holiday (External Factor 2), which form the first two input components in Figure 1a. They are processed via two distinct layers of fully connected neural networks.
Traffic flow is affected by a variety of external factors, including weather conditions (sunny, rainy, typhoon, etc.), temporal elements (rush hour, weekdays, holidays, etc.), and events (traffic accidents, natural disasters, ship collisions, etc.). In this study, we primarily focus on temporal factors, including hour of day, day of week, and holiday. Figure 2 illustrates the influence of temporal factors on traffic flow. Figure 2a shows the daily variation in traffic flow for three bridges over two consecutive days, exhibiting a distinct diurnal pattern: traffic flow fluctuates throughout the day, with peak hours between 12:00 and 18:00 and troughs between 00:00 and 06:00. Figure 2b reveals that traffic flow on weekends for these bridges is typically lower than on weekdays. Figure 2c indicates that traffic flow on holidays may differ from that observed on regular days.
A twenty-four-bit one-hot encoding scheme is employed to represent the hour of day information (External Factor 1), which encodes the specific hour. For instance, the sequence {1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0} indicates that the samples fall within the time range of 00:00 to 01:00.
An encoding scheme using nine bits of one-hot representation is utilized to encode the day of week and holiday information (External Factor 2). The first seven bits correspond to the day of the week, the eighth bit indicates whether it is a holiday, and the ninth bit signifies if the day is a compensatory day off. For instance, the sequence {0,1,0,0,0,0,0,1,0} indicates that it is Tuesday and a holiday, while {0,0,0,0,0,1,0,0,1} represents working overtime on a Saturday.
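The two encodings can be generated directly from a timestamp. The following Python sketch illustrates one possible implementation of the two encoding schemes described above; the holiday and compensatory-workday calendars (the sets `HOLIDAYS` and `COMPENSATORY_WORKDAYS` below) are hypothetical placeholders that would be supplied from the official holiday schedule of the region under study.

```python
from datetime import date, datetime

# Hypothetical calendars; in practice these come from the official holiday schedule.
HOLIDAYS = {date(2023, 10, 3)}                 # public holidays
COMPENSATORY_WORKDAYS = {date(2023, 10, 7)}    # weekend days worked as compensation

def encode_hour_of_day(ts: datetime) -> list[int]:
    """External Factor 1: 24-bit one-hot vector for the hour of day."""
    vec = [0] * 24
    vec[ts.hour] = 1
    return vec

def encode_day_and_holiday(ts: datetime) -> list[int]:
    """External Factor 2: 9-bit vector.
    Bits 0-6: one-hot day of week (Monday = bit 0).
    Bit 7: holiday flag. Bit 8: compensatory-workday flag.
    """
    vec = [0] * 9
    vec[ts.weekday()] = 1
    if ts.date() in HOLIDAYS:
        vec[7] = 1
    if ts.date() in COMPENSATORY_WORKDAYS:
        vec[8] = 1
    return vec

# Example: 00:30 on a Tuesday holiday, matching the paper's {0,1,0,0,0,0,0,1,0} example.
ts = datetime(2023, 10, 3, 0, 30)
print(encode_hour_of_day(ts)[:4])      # [1, 0, 0, 0] -> 00:00 to 01:00
print(encode_day_and_holiday(ts))      # [0, 1, 0, 0, 0, 0, 0, 1, 0]
```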
Two stacked fully connected layers, designated as FC1 and FC2, are employed to extract the features of the external factors, i.e., the hour of day and the day of week and holiday, respectively. Let $l_p$ denote the length of the external factor sequences, which is equal to the length of the output traffic flow, and let $d_1$ and $d_2$ represent the output dimensions of the two fully connected networks. The output matrices of FC1 and FC2 are $X_{\mathrm{E}1} \in \mathbb{R}^{l_p \times d_1}$ and $X_{\mathrm{E}2} \in \mathbb{R}^{l_p \times d_2}$, respectively.
2.3. Multi-Period Traffic Flow Input by Recent, Near, and Distant Components
2.3.1. Traffic Flow Input Matrices
Given that different time periods can influence traffic patterns in various ways, we segment the traffic flow into three distinct time periods for separate analysis, as depicted in the latter three networks of the input part in Figure 1a. Three temporal dependencies are modeled: recent (preceding-hour flow), near (adjacent-day homologous flow), and distant (multi-week historical flow).
We now introduce the selection of input traffic flows for these three networks. Let $l_p$ denote the length of the output, so the output traffic flow can be represented as $Y = \left[ y_{t}, y_{t+1}, \ldots, y_{t+l_p-1} \right]$, where $t$ is the first predicted time step and $y_\tau \in \mathbb{R}^{n_{\mathrm{out}}}$ collects the flows of the $n_{\mathrm{out}}$ target bridges at time $\tau$; the inputs are built analogously from $x_\tau \in \mathbb{R}^{n_{\mathrm{in}}}$, the flows of the $n_{\mathrm{in}}$ input bridges. Considering the cyclical nature of traffic patterns, we assign $n_w$ to represent the number of weeks and $n_d$ to represent the number of days that exert significant influence. Let $l_r$ represent the length of recent influential samples, while $T_d$ and $T_w$ denote the number of samples in one day and one week, respectively. Consequently, the input traffic flow matrices are structured as follows: $X_{\mathrm{dis}} = \left[ x_{t-n_w T_w}, \ldots, x_{t-n_w T_w+l_p-1}, \ldots, x_{t-T_w}, \ldots, x_{t-T_w+l_p-1} \right]$ for the distant network (Encoder 1), $X_{\mathrm{near}} = \left[ x_{t-n_d T_d}, \ldots, x_{t-n_d T_d+l_p-1}, \ldots, x_{t-T_d}, \ldots, x_{t-T_d+l_p-1} \right]$ for the near network (Encoder 2), and $X_{\mathrm{rec}} = \left[ x_{t-l_r}, \ldots, x_{t-1} \right]$ for the recent network (Encoder 3). Thus, the length of the input matrix is $n_w l_p$ for the distant network, $n_d l_p$ for the near network, and $l_r$ for the recent network.
Since the attention mechanism itself cannot capture temporal order, positional embedding is required to incorporate position information into the input sequences. In this study, we add an incremental one-dimensional vector to the traffic flow matrices. For example, given an input traffic flow matrix $X \in \mathbb{R}^{l \times n_{\mathrm{in}}}$, where $l$ is the length of the input traffic flow (which may vary among the three networks) and $n_{\mathrm{in}}$ represents the number of input bridges, the added vector can be defined as $p = \left( 1, 2, \ldots, l \right)^{\mathsf{T}}$, i.e., $p_i = i$ for $i = 1, \ldots, l$. Consequently, the input matrix is modified accordingly, with $p$ added to each bridge column of $X$.
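Under the notation above, the three input windows and the additive positional index can be assembled by simple slicing. The sketch below is a minimal illustration, assuming a 10 min sampling interval (so $T_d = 144$ and $T_w = 1008$) and hypothetical window settings for $n_w$, $n_d$, and $l_r$; it is not the authors' released code.

```python
import numpy as np

def build_inputs(flow: np.ndarray, t: int, l_p: int,
                 n_w: int = 2, n_d: int = 2, l_r: int = 6,
                 T_d: int = 144, T_w: int = 1008):
    """Slice distant/near/recent windows from `flow` (shape [time, n_in]).

    `t` is the first predicted time step; windows mirror Section 2.3.1:
    distant -> same period in the n_w preceding weeks (length n_w * l_p)
    near    -> same period in the n_d preceding days  (length n_d * l_p)
    recent  -> the l_r samples immediately before t   (length l_r)
    """
    distant = np.concatenate([flow[t - k*T_w : t - k*T_w + l_p]
                              for k in range(n_w, 0, -1)], axis=0)
    near = np.concatenate([flow[t - k*T_d : t - k*T_d + l_p]
                           for k in range(n_d, 0, -1)], axis=0)
    recent = flow[t - l_r : t]
    return distant, near, recent

def add_positional_index(x: np.ndarray) -> np.ndarray:
    """Add the incremental position vector (1, 2, ..., l) to each bridge column."""
    l = x.shape[0]
    return x + np.arange(1, l + 1)[:, None]

# Example: 3 weeks of synthetic 10 min counts for 5 input bridges.
flow = np.random.poisson(60, size=(3 * 1008, 5)).astype(float)
dis, near, rec = build_inputs(flow, t=2500, l_p=6)
print(dis.shape, near.shape, rec.shape)   # (12, 5) (12, 5) (6, 5)
dis = add_positional_index(dis)
```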
2.3.2. Computational Procedures of Encoder Blocks
As depicted in Figure 1a, the distant traffic flow matrix $X_{\mathrm{dis}}$, near traffic flow matrix $X_{\mathrm{near}}$, and recent traffic flow matrix $X_{\mathrm{rec}}$ are processed through Encoder 1, Encoder 2, and Encoder 3, respectively, to capture spatio-temporal correlations. The detailed encoder block is shown in Figure 1b, including the multi-head self-attention mechanism, residual connection, and layer normalization.
The multi-head self-attention mechanism is capable of extracting distinct spatio-temporal correlation patterns. The embedded input traffic flow $X$ is initially projected into high-dimensional latent subspaces. For the $i$-th self-attention head, the query matrix $Q_i$, key matrix $K_i$, and value matrix $V_i$ are derived from $X$, as follows:

$$Q_i = X W_i^{Q}, \qquad K_i = X W_i^{K}, \qquad V_i = X W_i^{V}$$

where $X$ is the input matrix, and $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the weight matrices for $Q_i$, $K_i$, and $V_i$, respectively. The elements within these weight matrices are learned by the network throughout the training process. Typically, we set $d_k = d_v = d_{\mathrm{model}}/h$, where $d_k$ and $d_v$ denote the column dimensions of the query/key and value projections, $d_{\mathrm{model}}$ is the embedding dimension, and $h$ corresponds to the number of heads in the encoder.

For the $i$-th self-attention head, the result $\mathrm{head}_i$ is computed as follows:

$$\mathrm{head}_i = \mathrm{softmax}\!\left( \frac{Q_i K_i^{\mathsf{T}}}{\sqrt{d_k}} \right) V_i$$

Multi-head self-attention $\mathrm{MultiHead}(X)$ is achieved by concatenating and integrating the learned features from each head, as follows:

$$\mathrm{MultiHead}(X) = \mathrm{Concat}\left( \mathrm{head}_1, \ldots, \mathrm{head}_h \right) W^{O}$$

where $W^{O}$ is a learned output projection matrix.
Residual connections are applied following the multi-head self-attention layer and fully connected layer to alleviate the issue of gradient vanishing. Layer normalization is performed on the output of each residual connection to accelerate the back-propagation of errors and enhance the convergence speed during training.
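For concreteness, the following PyTorch-style sketch shows one encoder block built from these operations (multi-head self-attention, residual connections, and layer normalization, followed by the fully connected layer). It is a minimal illustration of the computation described above, not the authors' implementation; the dimensions `d_model`, `n_heads`, and `d_ff` are assumed hyperparameters.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One encoder block: multi-head self-attention + residual + layer norm,
    followed by a position-wise fully connected layer + residual + layer norm."""
    def __init__(self, d_model: int, n_heads: int, d_ff: int = 128):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence length l, d_model)
        attn_out, _ = self.attn(x, x, x)     # self-attention: Q = K = V = x
        x = self.norm1(x + attn_out)         # residual connection + layer norm
        x = self.norm2(x + self.ff(x))       # residual connection + layer norm
        return x

# Example: a recent-flow window of length 6 embedded into d_model = 32.
x = torch.randn(8, 6, 32)                    # (batch, l_r, d_model)
encoder = nn.Sequential(*[EncoderBlock(32, n_heads=4) for _ in range(3)])
print(encoder(x).shape)                      # torch.Size([8, 6, 32])
```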
After passing through three stacked encoder blocks, the feature representations of the traffic flows are obtained as follows: the output of Encoder 1 (responsible for capturing distant traffic flow patterns) is denoted as $X_{\mathrm{dis}}^{\mathrm{enc}}$, the output of Encoder 2 (modeling near traffic flow characteristics) is denoted as $X_{\mathrm{near}}^{\mathrm{enc}}$, and the output of Encoder 3 (focusing on recent traffic flow features) is denoted as $X_{\mathrm{rec}}^{\mathrm{enc}}$.
2.4. Prediction Module of Traffic Flow
2.4.1. Fusion of Input Components
As shown in Figure 1a, the outcomes of the five input components are integrated using a parametric matrix-based fusion technique to serve as input for the output network. To facilitate the feature fusion, the extracted features are further processed by matrix multiplication with learnable parameters so that their temporal lengths are uniformly aligned with the output length $l_p$ before final concatenation, as follows:

$$X_{\mathrm{fus}} = \mathrm{Concat}\left( W_1 X_{\mathrm{E}1},\; W_2 X_{\mathrm{E}2},\; W_3 X_{\mathrm{dis}}^{\mathrm{enc}},\; W_4 X_{\mathrm{near}}^{\mathrm{enc}},\; W_5 X_{\mathrm{rec}}^{\mathrm{enc}} \right)$$

where, as previously described, $X_{\mathrm{E}1}$, $X_{\mathrm{E}2}$, $X_{\mathrm{dis}}^{\mathrm{enc}}$, $X_{\mathrm{near}}^{\mathrm{enc}}$, and $X_{\mathrm{rec}}^{\mathrm{enc}}$ are the extracted features. The weight matrices $W_1$, $W_2$, $W_3$, $W_4$, and $W_5$ are learnable, map the temporal length of each component to the common output length $l_p$, and assign different weights to External Factor 1, External Factor 2, and the distant, near, and recent components, respectively. The fused feature is $X_{\mathrm{fus}} \in \mathbb{R}^{l_p \times d_{\mathrm{fus}}}$, where $d_{\mathrm{fus}}$ is the total feature dimension after concatenation.
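A minimal sketch of this fusion step, assuming the notation above (each learnable matrix $W_k$ maps a component's sequence length $L_k$ to the common output length $l_p$ before concatenation along the feature axis; sizes are illustrative):

```python
import torch
import torch.nn as nn

class ParametricFusion(nn.Module):
    """Align each component to length l_p via a learnable matrix, then concatenate."""
    def __init__(self, lengths: list[int], l_p: int):
        super().__init__()
        # One weight matrix W_k of shape (l_p, L_k) per input component.
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.randn(l_p, L) / L**0.5) for L in lengths])

    def forward(self, features: list[torch.Tensor]) -> torch.Tensor:
        # Each feature has shape (batch, L_k, d_k); W_k @ feature -> (batch, l_p, d_k).
        aligned = [W @ f for W, f in zip(self.weights, features)]
        return torch.cat(aligned, dim=-1)    # (batch, l_p, sum of d_k)

# Example: five components with lengths [6, 6, 12, 12, 6] fused to l_p = 6.
feats = [torch.randn(8, L, 16) for L in (6, 6, 12, 12, 6)]
fusion = ParametricFusion([6, 6, 12, 12, 6], l_p=6)
print(fusion(feats).shape)                   # torch.Size([8, 6, 80])
```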
2.4.2. Computational Procedures of Decoder Blocks
As depicted in Figure 1a, the fused input component $X_{\mathrm{fus}}$ and the masked predicted traffic flow serve as the input to the output part. The output module consists of three stacked decoder blocks arranged in series; the detailed decoder block is shown in Figure 1c. The multi-head self-attention mechanism, residual connection, and layer normalization were introduced with the encoder block. An additional multi-head cross-attention layer is designed to integrate the fused input component $X_{\mathrm{fus}}$ from the encoder with the outcome $Y_{\mathrm{mask}}$ of the last masked multi-head self-attention layer in the decoder.
The calculation process of the masked multi-head self-attention layer in the decoder is similar to that of the multi-head self-attention layer in the encoder, except that the input is the masked predicted traffic flow. Given that the output traffic flow is unknown during the prediction stage, mask embedding is employed. The prediction at time $\tau$ can only rely on the predicted outputs at times preceding $\tau$, denoted as $\hat{y}_{<\tau}$. The lower triangular mask matrix sets future positions to $-\infty$, which ensures that the model can only attend to historical time steps, preventing information leakage from the future. After masking, the softmax normalization operator converts these $-\infty$ values to zero attention weights, effectively blocking their influence on the prediction. The query, key, and value matrices are derived from the masked predicted traffic flow.
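The causal mask described here can be written in a few lines; the sketch below uses a hypothetical sequence length and shows how the $-\infty$ entries become zero attention weights after the softmax.

```python
import torch

def causal_mask(l_p: int) -> torch.Tensor:
    """Lower triangular mask: position tau may attend only to positions <= tau."""
    mask = torch.zeros(l_p, l_p)
    mask[torch.triu(torch.ones(l_p, l_p), diagonal=1).bool()] = float("-inf")
    return mask

scores = torch.randn(6, 6)                          # raw attention scores (l_p x l_p)
weights = torch.softmax(scores + causal_mask(6), dim=-1)
print(weights[0])                                   # first position attends only to itself
```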
The multi-head cross-attention layer bridges the decoder state with the fused input component from the encoder. For the $i$-th cross-attention head, the query matrix $Q_i$ is derived from the outcome $Y_{\mathrm{mask}}$ of the last masked multi-head self-attention layer in the decoder, while the key matrix $K_i$ and value matrix $V_i$ are generated from the fused input component $X_{\mathrm{fus}}$ of the encoder, as follows:

$$Q_i = Y_{\mathrm{mask}} W_i^{Q}, \qquad K_i = X_{\mathrm{fus}} W_i^{K}, \qquad V_i = X_{\mathrm{fus}} W_i^{V}$$

where the outcome of the masked multi-head self-attention layer, $Y_{\mathrm{mask}} \in \mathbb{R}^{l_p \times n_{\mathrm{out}}}$, has the same dimension as the decoder input, with $n_{\mathrm{out}}$ denoting the number of predicted bridges, while $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are learned weight matrices for $Q_i$, $K_i$, and $V_i$, respectively. We set $d_k = d_v = d_{\mathrm{model}}/h_d$, where $h_d$ represents the number of attention heads in the decoder.
2.5. Loss Function
As shown in Figure 1a, the calculated results of the decoder blocks are then passed through the softmax operation to obtain the predicted traffic flow. ST-TransNet is trained to predict the traffic flow from the two external factors and the four traffic flow matrices by minimizing the root mean square error between the predicted traffic flow matrix $\hat{Y}$ and the actual traffic flow matrix $Y$, as follows:

$$L = \sqrt{ \frac{1}{l_p\, n_{\mathrm{out}}} \sum_{j=1}^{n_{\mathrm{out}}} \sum_{\tau=1}^{l_p} \left( \hat{y}_{j,\tau} - y_{j,\tau} \right)^2 }$$

where $\hat{y}_{j,\tau}$ and $y_{j,\tau}$ are the predicted traffic flow and the ground truth of the $j$-th target bridge at the $\tau$-th time period, respectively.
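In PyTorch, this training objective can be sketched as follows for predictions and targets of shape (batch, $l_p$, $n_{\mathrm{out}}$). The validity mask reflects our reading of the fixed-value imputation strategy described in Section 1 (excluding imputed entries from error computation); its use inside the loss is an assumption for illustration, not the authors' released code.

```python
import torch

def rmse_loss(pred: torch.Tensor, target: torch.Tensor,
              valid: torch.Tensor) -> torch.Tensor:
    """RMSE over valid (non-imputed) entries only."""
    sq_err = (pred - target) ** 2 * valid    # zero out imputed positions
    return torch.sqrt(sq_err.sum() / valid.sum())

pred = torch.randn(8, 6, 3)
target = torch.randn(8, 6, 3)
valid = torch.ones_like(target)              # 1 = observed, 0 = imputed
print(rmse_loss(pred, target, valid))
```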
2.6. Evaluation Metrics
The root mean square error (RMSE) and relative root mean square error (RRMSE) are utilized as evaluation metrics. RMSE quantifies the absolute magnitude of traffic flow prediction errors, while RRMSE, defined as the ratio of the prediction RMSE to the root mean square of the actual traffic flow, quantifies the proportion of prediction errors relative to the actual traffic flow.
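Written out explicitly under the notation of Section 2.5, these two metrics (following the verbal definitions above) are

$$\mathrm{RMSE} = \sqrt{ \frac{1}{l_p\, n_{\mathrm{out}}} \sum_{j=1}^{n_{\mathrm{out}}} \sum_{\tau=1}^{l_p} \left( \hat{y}_{j,\tau} - y_{j,\tau} \right)^2 }, \qquad \mathrm{RRMSE} = \frac{\mathrm{RMSE}}{ \sqrt{ \frac{1}{l_p\, n_{\mathrm{out}}} \sum_{j=1}^{n_{\mathrm{out}}} \sum_{\tau=1}^{l_p} y_{j,\tau}^2 } }$$

where, as in the loss function, imputed entries are excluded from both sums.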
5. Conclusions
The increasing deployment of SHM systems across bridges now provides the dense, synchronized data needed to investigate bridge network traffic flow correlations—a critical aspect previously inaccessible due to limited instrumentation. Capitalizing on this opportunity, we present the first comprehensive analysis of traffic flow correlations within an instrumented bridge network.
To model these complex spatio-temporal dependencies, we propose ST-TransNet, a novel self-attention architecture that integrates multi-scale traffic flows (recent, near, and distant) from source bridges with external factors (hour of day, day of week, and holiday). Traffic flows are processed through self-attention encoders and external factors via fully connected layers; a self-attention decoder then predicts target bridge traffic flows while explicitly modeling target-bridge correlations.
Validated using field WIM data from the investigated eight-bridge network, the proposed ST-TransNet achieves an RMSE of 12.78 vehicles/10 min, outperforming a series of baseline models (SVR, CNN, BiLSTM, CNN&BiLSTM, ST-ResNet, transformer, and STGCN) with significant relative reductions of 40.5%, 36.9%, 36.6%, 37.3%, 35.6%, 31.1%, and 22.8%, respectively. The framework successfully captures complex cross-bridge relationships by integrating multi-scale temporal dependencies and external factors, with recent flows exhibiting the strongest influence. Decoder refinement informed by learned target-bridge correlations further enhances forecasting precision.
This work reveals significant network-level spatio-temporal correlations in traffic flows across bridge networks, offering valuable insights for shifting the paradigm from single-bridge to network-level health monitoring. The findings may enable more accurate system-level performance assessment, identification of critical load paths, optimized sensor placement, and enhanced network-wide safety evaluation.
Future research will address the following two aspects: (1) embedding the unique topological structure of a bridge network into the proposed ST-TransNet using graph-based neural networks, allowing customization to a specific city-scale bridge network; and (2) demonstrating the model's generalization ability and adaptation efficiency under various scenarios.