Traffic Flow Prediction in Intelligent Transportation Systems: A Comprehensive Review of Graph Neural Networks and Hybrid Deep Learning Methods

Wang, Zhenhua; Wang, Xinmeng; Wang, Lijun; Wu, Zheng; Hu, Jiangang; Yuan, Fujiang; Tian, Zhen

doi:10.3390/a19040310

Open Access Review


by Zhenhua Wang ¹,*, Xinmeng Wang ¹, Lijun Wang ², Zheng Wu ¹, Jiangang Hu ³, Fujiang Yuan ⁴ and Zhen Tian ⁵

¹ College of Information Technology, Nanjing Police University, Nanjing 210023, China

² College of Information, Guangdong Communication Polytechnic, Guangzhou 510650, China

³ College of Public Security, Nanjing Police University, Nanjing 210023, China

⁴ School of Computer Science and Technology, Taiyuan Normal University, Jinzhong 030619, China

⁵ James Watt School of Engineering, University of Glasgow, Glasgow G12 8QQ, UK

* Author to whom correspondence should be addressed.

Algorithms 2026, 19(4), 310; https://doi.org/10.3390/a19040310

Submission received: 19 January 2026 / Revised: 11 March 2026 / Accepted: 17 March 2026 / Published: 16 April 2026


Abstract

Traffic flow prediction is a key component of Intelligent Transportation Systems (ITS) and is crucial for alleviating urban congestion, optimizing traffic management, and improving the overall efficiency of road networks. With the rapid growth in vehicle numbers and the increasing complexity of urban traffic patterns, accurate short-term traffic flow prediction has become increasingly important. This paper comprehensively reviews the latest advances in traffic flow prediction, focusing on graph neural network (GNN)-based approaches and hybrid deep learning frameworks. First, we introduce the theoretical foundations, including graph neural networks, deep learning algorithms, heuristic optimization methods, and attention mechanisms. We then group GNN-based prediction methods into four paradigms: (1) federated learning and privacy-preserving methods, enabling cross-regional collaboration while protecting sensitive data; (2) dynamically adaptive graph structure methods, capturing time-varying spatial dependencies; (3) multi-graph fusion and attention mechanism methods, enhancing feature representations from multiple perspectives; and (4) cross-domain technology integration methods, fusing novel architectures and interdisciplinary technologies. Furthermore, we investigate hybrid methods that combine signal decomposition, heuristic optimization, and attention mechanisms with LSTM networks to address non-stationarity and model optimization. For each category, we analyze representative works and summarize their core innovations, strengths, and limitations in a systematic comparative table. Finally, we discuss current challenges, including computational complexity, model interpretability, and generalization ability, and outline future research directions such as lightweight model design, uncertainty quantification, multimodal data fusion, and integration with traffic control systems.
This review provides researchers and practitioners with a systematic understanding of the latest advances in traffic flow prediction and offers guidance for methodological selection and future research.

Keywords:

traffic flow prediction; graph neural network; deep learning; intelligent transportation systems; spatiotemporal modeling; federated learning; attention mechanism

1. Introduction

Traffic flow prediction has long been recognized as a fundamental capability supporting intelligent transportation systems (ITS) and efficient urban operation [1]. Its practical importance, however, is not merely conceptual but quantitatively significant. According to global urban mobility reports, congestion in major metropolitan areas costs billions of dollars annually in lost productivity, excess fuel consumption, and environmental externalities. In many large cities, commuters lose tens to over one hundred hours per year to congestion, corresponding to measurable economic losses at both individual and societal levels [2]. Even marginal improvements in short-term prediction accuracy have been shown to translate into measurable reductions in travel delay, improved signal coordination, and higher throughput at bottleneck intersections. Improving prediction reliability is therefore directly linked to operational efficiency and economic performance.

With the rapid growth of private vehicle ownership—particularly in developing economies—urban road networks are operating increasingly close to or beyond capacity. Congestion frequently emerges at arterial corridors and critical junctions, where small disturbances can propagate nonlinearly through the network. A schematic diagram of urban traffic flow is shown in Figure 1. In such high-sensitivity systems, accurate short-term prediction plays a pivotal role in adaptive signal control, ramp metering, congestion pricing, and dynamic route guidance. Prediction errors during peak periods may lead to suboptimal control strategies, compounding queue spillbacks and increasing accident risks.

Methodologically, traffic flow prediction research has evolved from classical time-series models such as ARIMA [3] to deep learning approaches including RNN and LSTM architectures [4]. ARIMA models treat traffic as a seasonal stationary process and provide interpretable parameter estimates, but they fail under abrupt non-stationary conditions. Recurrent networks, by contrast, were shown to learn long-term temporal dependencies in speed data from remote microwave sensors and to substantially outperform statistical baselines [4]. While these methods achieve satisfactory performance under relatively stable conditions, their robustness and generalization remain challenged by the intrinsic characteristics of urban traffic: randomness, strong nonlinearity, abrupt regime shifts, and complex spatiotemporal coupling. The diffusion convolutional recurrent framework, for example, highlighted the inadequacy of node-independent temporal models for spatially coupled networks. Moreover, Polson [5] demonstrated that deep learning models optimized purely for accuracy frequently suffer from prohibitive inference latency when scaled to city-wide sensor deployments, raising critical concerns about practical applicability.

Traffic flow prediction also interacts closely with related tasks such as travel time estimation, speed forecasting, density inference, queue dissipation analysis, and travel behavior modeling [6]. As sensing infrastructure expands and data volumes grow, research trends increasingly emphasize multi-source data integration, cross-regional generalization, and joint optimization of accuracy and efficiency [7,8]. By grounding methodological innovation in measurable operational impact—such as delay reduction, throughput improvement, and congestion cost mitigation—the field can more clearly align theoretical advances with tangible societal benefits.

In this context, a review of traffic flow prediction methods has important theoretical and practical value [9]. On the one hand, as transportation systems grow more complex and data dimensions expand rapidly [10], research has diversified in model structure, data utilization strategies, spatiotemporal feature modeling, and multi-task prediction. A systematic review is therefore needed to clarify the research context, summarize the evolution of methods, and identify the advantages and limitations of various models. On the other hand, ITS is moving from local pilot projects to large-scale deployments, placing higher demands on the real-time performance, scalability, and adaptability of predictive models, which makes a comprehensive evaluation of the performance and applicability of existing methods crucial. By summarizing research hotspots and identifying current challenges, such a review can both guide subsequent research and give traffic management departments a basis for selecting predictive technologies in practical applications.

To better understand the methodological developments reviewed in this paper, it is useful to distinguish three developmental stages of traffic flow prediction research. The first generation (before 2020) relied mainly on statistical and shallow machine learning methods (such as ARIMA and SVM), which struggled to capture nonlinear spatiotemporal dependencies. The second generation (2020–2022) saw the rise of static graph neural network architectures (such as GCN and GAT) and early LSTM-based hybrid models, in which the graph structure was mostly predefined and attention mechanisms were first introduced as auxiliary modules. The third generation (2023–2025), the focus of this review, is characterized by three key technological changes relative to previous work: (1) a shift from static graph topology to dynamically learned graph topology; (2) the adoption of federated learning as a privacy framework usable in production environments rather than a conceptual proposal; and (3) the systematic integration of Transformer-based self-attention with graph neural network encoders to achieve parallel spatiotemporal modeling at scale.

As shown in Table 1, Liu et al. [11] focused on traffic flow prediction technology in intelligent transportation systems, categorizing methods into three main types: statistical, machine learning, and deep learning. They analyzed the core principles, application scenarios, and advantages and disadvantages of each type, highlighting the advantages of deep learning in handling complex nonlinear relationships, particularly the superior prediction accuracy and generalization ability of hybrid neural networks compared with traditional methods. They also noted the challenges of generalizing models across scenarios and of long-term prediction. Attioui et al. [12], following the PRISMA 2020 guidelines, systematically reviewed the application of machine learning to traffic congestion prediction from 2010 to 2024. They selected 115 high-quality studies from 9695 records, emphasizing the dominant role of deep learning and supervised learning. They analyzed the distribution of research by road type, vehicle type, and prediction horizon, and pointed out the limited application of reinforcement learning and the lack of research on rural roads, providing a reference on the current status of the field. Kong et al. [13] focused on time series forecasting across multiple application areas, including traffic flow. They divided deep learning model architectures into five paradigms, summarized feature extraction methods such as dimensionality decomposition and time-frequency transformation, compiled relevant datasets, and analyzed data privacy and model interpretability issues in depth. They also outlined future directions such as representation learning and causal inference, offering a systematic framework for cross-domain time series forecasting research. Annarita et al. [14] comprehensively reviewed the development of traffic flow forecasting technology, categorizing methods into four types: naive techniques, parametric methods, traffic simulation techniques, and nonparametric models. They analyzed the theoretical foundations and practical applications of each, emphasized the role of artificial intelligence in dynamic and accurate forecasting, and identified spatiotemporal modeling and real-time data fusion as the main future trends, providing a panoramic reference for researchers and policymakers. Shahriar et al. [15] focused on the application of deep learning algorithms and classic models in traffic forecasting. They introduced the principles of deep learning models such as LSTM and CNN, as well as classic models such as Kalman filtering and ARIMA, and compared their performance in traffic flow, speed, and congestion forecasting, providing guidance for model selection under different requirements. Bernardo et al. [16] analyzed European research on traffic flow prediction and classification over the preceding five years. They described the use of historical and real-time data, outlined data preprocessing techniques, compared the effectiveness of deep learning, parametric models, genetic programming, and clustering and classification methods, and clarified the scenarios in which various performance evaluation metrics apply, filling a gap in regional research. Aristeidis et al. [17] focused on traffic congestion prediction, covering statistical, machine learning, deep learning, and ensemble methods. They clarified the key considerations for selecting short-term, medium-term, and long-term prediction models, listed key input parameters such as weather, season, and road information, and proposed a standard process for data collection, preprocessing, and model selection. They also analyzed the limitations of various methods and future directions for improvement.

Existing reviews of traffic flow prediction have made significant progress. They typically categorize prediction methods into three main types (statistical methods, machine learning, and deep learning) and analyze the core principles, application scenarios, and performance of each in depth. They demonstrate the advantages of deep learning in modeling complex nonlinear relationships, particularly the superior prediction accuracy and generalization of hybrid neural networks (such as combinations of LSTM and CNN) over traditional methods, as validated for traffic flow, speed, and congestion prediction. Prior reviews have also summarized model architecture paradigms, feature extraction methods (such as dimensionality reduction and various time-frequency transformations), and data reconstruction techniques, providing systematic guidance for model selection across prediction horizons and application scenarios, and identifying spatiotemporal modeling and real-time data fusion as the main future development trends.

However, these studies also have some limitations: (1) The methodological classification is not sufficiently systematic. Most reviews classify methods by technology type (statistics, machine learning, deep learning) and lack a deeper classification framework based on core innovation mechanisms; (2) Cutting-edge GNN methods receive insufficient attention. Although some reviews mention graph neural networks, none systematically reviews their evolution path and technical paradigms in traffic prediction; (3) Hybrid methods lack a systematic summary. There is no comprehensive synthesis or comparative analysis of frameworks that combine signal decomposition, heuristic optimization, and deep learning. To address these shortcomings, this review offers a systematic classification framework, comprehensive technology coverage, detailed comparative analysis, and practice-oriented insights, complementing existing reviews in methodological organization, tracking of cutting-edge techniques, synthesis of hybrid methods, and practical guidance. It provides a more complete and practical reference for traffic flow research, prediction, and practical application. This review makes the following contributions:

(1): A two-dimensional classification framework. Existing reviews typically organize methods by technology type (statistics, machine learning, deep learning), without distinguishing core innovation mechanisms. This review categorizes GNN-based methods into four mechanism-driven paradigms (federated learning and privacy protection, dynamic adaptive graph structures, multi-graph fusion and attention mechanisms, and cross-domain integration) and summarizes hybrid deep learning into three implementation paths, providing a more structured methodological overview.
(2): Systematic coverage of recent GNN advances (2023–2025). While some existing reviews mention graph neural networks, few systematically examine their development within traffic prediction. This review traces the progression from static to dynamic graphs, from single-graph to multi-graph architectures, and from centralized to federated models, covering literature from 2023 to 2025 that is largely absent from prior surveys.
(3): Method selection guidance and future directions. This review provides method selection suggestions for different application scenarios (e.g., real-time prediction, privacy protection, cross-domain generalization) and discusses several emerging directions, including interpretable AI, edge computing, multimodal fusion, and reinforcement learning, offering practical reference for subsequent research.

2. Materials and Methods

This chapter introduces the theoretical foundation and literature selection methods supporting the full-text analysis. Regarding literature sources, the study conducted Boolean searches across five major academic databases: Web of Science, Scopus, IEEE Xplore, Google Scholar, and ACM Digital Library, ultimately including over 100 high-quality original articles. In terms of the theoretical framework, this chapter sequentially elucidates the message passing mechanism of Graph Neural Networks and the layer-by-layer propagation rules of GCNs; the gating structures and long-term dependency modeling capabilities of RNNs and LSTMs in deep learning systems; the neighborhood search paradigm and multi-objective optimization strategies of heuristic optimization algorithms; and the core advantages of attention mechanisms (especially the self-attention architecture of Transformers) in capturing the spatiotemporal dependencies of traffic flow. These theoretical modules are interconnected, collectively forming the technical support system for the methodological analysis in the subsequent three chapters, laying a solid conceptual foundation for readers to understand various hybrid prediction models.

2.1. Theoretical Foundations of Traffic Flow and Congestion Modeling

Beyond the methodological framework, understanding traffic flow prediction also requires a solid theoretical foundation in traffic flow dynamics. The core dynamic characteristic of traffic flow is phase transition behavior: near the critical density, the traffic system can abruptly transition from a free-flowing state to a congested state, exhibiting strong nonlinearity and instability. The congestion boundary method provides a rigorous analytical framework for characterizing these phase transition boundaries by explicitly defining the conditions under which traffic transitions from free flow to congestion [18]. By linking the macroscopic flow-density relationship with the congestion propagation mechanism, this approach provides a physically interpretable framework that data-driven prediction models can leverage to constrain learned representations and ensure physically consistent outputs.
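The critical density mentioned above can be made concrete with a classical macroscopic model. The sketch below assumes the Greenshields linear speed-density relation, chosen here purely for illustration; it is not the congestion boundary method of [18], and the free-flow speed and jam density values are hypothetical. Under this assumption, flow is $q(k) = v_f k (1 - k/k_j)$, which peaks at the critical density $k_c = k_j/2$.

```python
def greenshields_flow(k, v_free, k_jam):
    """Flow q = v_free * k * (1 - k / k_jam) in veh/h for density k in veh/km."""
    if not 0.0 <= k <= k_jam:
        raise ValueError("density must lie in [0, k_jam]")
    return v_free * k * (1.0 - k / k_jam)


def critical_density(k_jam):
    """Density that maximizes Greenshields flow (where dq/dk = 0)."""
    return k_jam / 2.0


def regime(k, k_jam):
    """Label a density as free-flow (k <= k_c) or congested (k > k_c)."""
    return "free-flow" if k <= critical_density(k_jam) else "congested"


V_FREE, K_JAM = 100.0, 120.0          # hypothetical: 100 km/h, 120 veh/km
k_c = critical_density(K_JAM)
capacity = greenshields_flow(k_c, V_FREE, K_JAM)   # maximum flow (veh/h)

for k in (20.0, 60.0, 100.0):
    print(k, round(greenshields_flow(k, V_FREE, K_JAM), 1), regime(k, K_JAM))
```

Densities of 20 and 100 veh/km produce the same flow (about 1667 veh/h) yet lie on opposite sides of $k_c$, which illustrates why flow alone cannot identify the traffic state and why regime information matters for prediction.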

At the microscopic level, car-following models are crucial tools for analyzing congestion formation and propagation. Recent work on heterogeneous ring road car-following models incorporating visual angle defects and speed limit effects has further revealed that the incompleteness of driver perception in real-world traffic systems significantly affects vehicle fleet stability, accelerating congestion formation and spread [19]. This finding provides direct theoretical guidance for introducing heterogeneity in driving behavior into prediction models, particularly for approaches that model individual vehicle interactions within graph-structured road networks.

At the macroscopic policy level, research on the joint optimization of road pricing and capacity expansion shows that congestion mitigation is not only a matter of prediction accuracy but also involves supply–demand balance and the systematic allocation of policy instruments [20]. Specifically, analyses of road pricing and capacity policies under fiscal constraints in urban settings reveal that the marginal social cost of congestion depends critically on network topology and demand elasticity. These considerations must be accounted for when predictive models are embedded into policy evaluation frameworks, if such models are to advance from “prediction” to “decision support.” This connection is directly relevant to the cross-domain technology integration paradigm reviewed in Section 3.4.

Furthermore, empirical studies on congestion wave propagation in mixed-traffic environments [21] and theoretical analyses of traffic breakdown probability at bottleneck locations [22] collectively demonstrate that the spatiotemporal patterns of congestion are governed by deterministic physical laws as well as stochastic fluctuations, underscoring the necessity of hybrid physics-informed and data-driven modeling strategies as reviewed in Section 3.4.

2.2. Literature Search and Selection

To ensure the reproducibility of the research results and the rigor of the methods, this review followed a structured literature search and selection process. We used Boolean search queries to systematically search five academic databases (Web of Science, Scopus, IEEE Xplore, Google Scholar, and ACM Digital Library), using keywords including “traffic flow prediction,” “graph neural network,” “GNN,” “GCN,” “deep learning,” “LSTM,” “federated learning,” “attention mechanism,” “spatiotemporal,” “dynamic graph,” “VMD,” “CEEMDAN,” and “heuristic optimization.” In addition, we manually screened the reference lists of key review papers. The search scope covered literature published between January 2020 and January 2026, with a particular focus on research published between 2023 and 2025. We also selectively included groundbreaking literature published before 2020, provided that these publications established fundamental methodologies directly relevant to the paradigm of this review. Inclusion criteria included studies that: proposed or evaluated methods for predicting traffic flow, speed, or density based on graph neural networks (GNNs) or hybrid deep learning frameworks; provided quantitative performance evaluations based on benchmark datasets or real-world datasets; and were published as peer-reviewed articles or conference papers in English. Studies focusing solely on non-traffic tasks, lacking methodological innovation, or whose full text was unavailable were excluded. The selection process consisted of three phases: title/abstract screening, full text review, and subject classification. Disagreements were resolved through discussion among co-authors. Ultimately, this review was based on 101 original studies.

2.3. Introduction to Graph Neural Networks

Graph Neural Networks (GNNs) are a specialized class of deep learning models designed to process data residing on non-Euclidean graph structures [23]. Unlike traditional neural networks optimized for regular grid data, GNNs leverage the inherent structure of a graph $G = (V, E)$ to learn expressive representations for its nodes, edges, or the entire graph [24]. The fundamental principle of a GNN lies in the iterative application of a message passing paradigm, where the representation of each node is updated by aggregating information from its local neighborhood [25]. This mechanism allows the model to capture the complex spatial dependencies and relational context within the topological structure. Among various GNN architectures, the Graph Convolutional Network (GCN) stands out as the canonical and most influential model, successfully adapting the concept of convolution to the graph domain [26].

Figure 2 illustrates the overall architecture of a Graph Convolutional Network, showing the propagation of node features through multiple GCN layers with ReLU activation functions. The core operation of a GCN layer is the transformation of node features from layer $l$ to layer $l + 1$ [27]. This transformation effectively implements a local spectral filter that computes a feature representation for each node based on the aggregated features of its immediate neighbors. This process is defined by the layer-wise propagation rule. Given the node feature matrix $H^{(l)}$ at layer $l$, the GCN propagation rule for obtaining the features $H^{(l+1)}$ at layer $l + 1$ is typically formulated as follows:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \tag{1}$$

In Equation (1), $H^{(l)} \in \mathbb{R}^{|V| \times d_l}$ represents the matrix of node features, where $|V|$ is the number of nodes and $d_l$ is the feature dimension at layer $l$. The matrix $W^{(l)}$ is the layer-specific trainable weight matrix, and $\sigma(\cdot)$ denotes a non-linear activation function such as ReLU. The critical component of the GCN is the normalized adjacency matrix $\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$, where $A$ is the adjacency matrix of the graph, $\tilde{A} = A + I$ adds a self-loop to every node ($I$ is the identity matrix), and $\tilde{D}$ is the degree matrix of $\tilde{A}$. This symmetric normalization scales each neighbor's contribution by the degrees of both endpoints, so that aggregated features do not grow with node degree; this mitigates feature distortion and stabilizes the learning process. By stacking multiple GCN layers, the model learns multi-hop spatial dependencies: after $k$ layers, each node's representation depends on its $k$-hop neighborhood, enabling deep feature extraction within the graph structure.
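A minimal NumPy sketch of Equation (1) follows, assuming a dense adjacency matrix, a hypothetical four-sensor corridor, and randomly initialized (untrained) weights; the function names are illustrative rather than taken from any library. Two stacked layers let each node's output depend on its two-hop neighborhood.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D~^{-1/2} (A + I) D~^{-1/2} as in Equation (1)."""
    a_tilde = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_tilde.sum(axis=1)                    # degrees of A~ (all >= 1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ a_tilde @ d_inv_sqrt


def gcn_layer(a_hat: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One GCN layer with ReLU: sigma(A_hat @ H @ W)."""
    return np.maximum(0.0, a_hat @ h @ w)


# Hypothetical road corridor with sensors 0-1-2-3 connected in a line.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
h0 = rng.random((4, 3))             # |V| = 4 nodes, d_0 = 3 input features
w0 = rng.standard_normal((3, 8))    # W^(0): d_0 x d_1
w1 = rng.standard_normal((8, 2))    # W^(1): d_1 x d_2

a_hat = normalized_adjacency(adj)
h1 = gcn_layer(a_hat, h0, w0)       # each row mixes 1-hop neighbors
h2 = gcn_layer(a_hat, h1, w1)       # stacking reaches 2-hop neighbors
print(h2.shape)                     # (4, 2)
```

Dense matrices keep the arithmetic visible but need $O(|V|^2)$ memory; graph libraries such as PyTorch Geometric (for example, its `GCNConv` layer) use sparse operations instead.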

2.4. Introduction to Deep Learning Algorithms

Deep learning is a class of machine learning methods that use multi-layer neural networks for feature learning and decision modeling [28]; Wang et al. [28], for instance, demonstrated its effectiveness for domain adaptation via the latent transferability of feature components. The paradigm is inspired by the hierarchical information processing of the human brain, and its hierarchical feature extraction has proven applicable across many domains: Wang et al. [29] showed that teacher–student adversarial augmentation strategies substantially improve generalization under domain shift in medical image segmentation; Zeng et al. [30] demonstrated that external graph neural network potentials integrated within the DeePMD-kit framework enable accurate molecular dynamics simulations; and Yang et al. [31] showed that spatiotemporal graph neural networks with double-explored architectures achieve superior multi-site intra-hour photovoltaic power forecasting. Its defining characteristic is network depth: unlike traditional shallow models with one or two layers, deep neural networks may contain hundreds of layers, enabling substantially stronger feature extraction and pattern recognition in tasks such as image recognition, speech recognition, and natural language processing (NLP) [32].

Within this family, Recurrent Neural Networks (RNNs) are specifically designed for sequential and temporal data [33]. By introducing feedback connections, RNNs allow the current output to depend on historical inputs, enabling temporal context-based reasoning in applications such as traffic flow and stock market prediction. RNNs are trained with Backpropagation Through Time (BPTT) [34], which unrolls the network and computes gradients across all time steps, and they support flexible input–output structures (one-to-many, many-to-one, many-to-many). However, a unidirectional RNN can only use past information and cannot leverage future context, which limits its performance on tasks with bidirectional dependencies [35].
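The recurrence can be sketched in a few lines of NumPy. This is a hypothetical Elman-style RNN with a tanh activation and randomly initialized weights, used in a many-to-one configuration that maps 12 normalized flow readings to a one-step-ahead value; training by BPTT is omitted.

```python
import numpy as np


def rnn_forward(xs, h0, w_xh, w_hh, b_h):
    """Run an Elman RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).

    The W_hh h_{t-1} term is the feedback connection that carries
    historical context forward. Returns the hidden states h_1..h_T.
    """
    h = h0
    states = []
    for x in xs:                      # strictly past -> present order
        h = np.tanh(w_xh @ x + w_hh @ h + b_h)
        states.append(h)
    return states


rng = np.random.default_rng(42)
T, d_in, d_h = 12, 1, 4                        # illustrative sizes
xs = [np.array([v]) for v in rng.random(T)]    # hypothetical normalized flows
w_xh = rng.standard_normal((d_h, d_in)) * 0.5
w_hh = rng.standard_normal((d_h, d_h)) * 0.5
b_h = np.zeros(d_h)
w_out = rng.standard_normal((1, d_h))          # linear read-out layer

states = rnn_forward(xs, np.zeros(d_h), w_xh, w_hh, b_h)
y_next = w_out @ states[-1]                    # many-to-one prediction
print(len(states), states[-1].shape, y_next.shape)   # 12 (4,) (1,)
```

Because each $h_t$ is computed only from $x_1, \dots, x_t$, the hidden states never see later inputs, which is the unidirectional limitation noted above; a bidirectional RNN would add a second pass in reverse time.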

Long Short-Term Memory (LSTM) networks are a variant of RNNs specifically designed to mitigate the vanishing and exploding gradient problems that commonly occur when ordinary RNNs are trained on long sequences [36]. LSTMs achieve controlled storage and selective forgetting of historical information by adding a cell state to the hidden layer and regulating it with gating mechanisms. The structure is shown in Figure 3, where $C_{t-1}$ represents the cell state at the previous time step, and $x_t$ and $h_{t-1}$ are the current input and the hidden state at the previous time step, respectively. The network computes $f_t$, $i_t$, and $o_t$ through the forget, input, and output gates, respectively, to determine dynamically which information to retain, update, and output. Each gate is activated by the sigmoid function $\sigma$, whose output lies between 0 and 1 and sets the proportion of information allowed to pass.

Notably, the LSTM uses a forget mechanism to prevent irrelevant historical information from interfering with current decisions. The forget gate combines $x_t$ and $h_{t-1}$ to generate an element-wise mask vector that determines which information is retained or discarded: values close to 0 indicate content to be discarded, while values close to 1 indicate content to be retained. This mechanism allows the model to handle sequences whose dynamics change over long horizons, further strengthening its ability on long-dependency tasks.

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{2}$$

In Equation (2), $W_f$ and $b_f$ are the weight matrix and bias vector of the forget gate, respectively, $\sigma$ denotes the sigmoid activation function, and $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input. The input gate regulates the extent to which current input information is written into the cell state at the present time step, thereby managing which information requires updating. The input $x_t$ and the previous hidden state $h_{t-1}$ pass through the input gate, whose output $i_t$ scales a candidate cell state $\tilde{C}_t$ produced by a tanh layer. The input gate and the cell-state update are given by the following, where $\odot$ denotes element-wise multiplication:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{3}$$

$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{4}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{5}$$

In Equations (3)–(5), $W_i$ and $b_i$ are the weight matrix and bias vector of the input gate, $W_c$ and $b_c$ are those of the candidate layer, and Equation (5) yields the updated cell state $C_t$. The output gate controls which portion of the current cell state is propagated to the hidden state $h_t$. Both $x_t$ and $h_{t-1}$ first pass through the output gate, which determines the scope of information to be output; the cell state $C_t$ is then squashed by tanh and filtered by the gate, yielding the final hidden state $h_t$:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{6}$$

$$h_t = o_t \odot \tanh(C_t) \tag{7}$$

In Equations (6) and (7), $W_o$ and $b_o$ are the weight matrix and bias vector of the output gate, and $\odot$ denotes element-wise multiplication.
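Equations (2)–(7) translate directly into a single LSTM step. The NumPy sketch below assumes the concatenation $[h_{t-1}, x_t]$ places $h_{t-1}$ first, uses random untrained weights, and feeds a hypothetical two-feature input (for example, flow and speed); it mirrors the equations and is not a substitute for an optimized library implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step following Equations (2)-(7).

    params holds W_f, W_i, W_c, W_o with shape (d_h, d_h + d_in)
    and b_f, b_i, b_c, b_o with shape (d_h,).
    """
    z = np.concatenate([h_prev, x_t])                     # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])      # Eq. (2) forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])      # Eq. (3) input gate
    c_hat = np.tanh(params["W_c"] @ z + params["b_c"])    # Eq. (4) candidate
    c_t = f_t * c_prev + i_t * c_hat                      # Eq. (5) element-wise
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])      # Eq. (6) output gate
    h_t = o_t * np.tanh(c_t)                              # Eq. (7) hidden state
    return h_t, c_t


def init_params(d_in, d_h, rng):
    """Random, untrained parameters; the 0.1 scale is an arbitrary choice."""
    params = {}
    for g in "fico":                                      # forget, input, candidate, output
        params[f"W_{g}"] = rng.standard_normal((d_h, d_h + d_in)) * 0.1
        params[f"b_{g}"] = np.zeros(d_h)
    return params


rng = np.random.default_rng(0)
d_in, d_h = 2, 8
params = init_params(d_in, d_h, rng)
h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.random((24, d_in)):                        # 24 hypothetical time steps
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)                                   # (8,) (8,)
```

Setting a large negative forget-gate bias drives $f_t$ toward 0, so the new cell state no longer depends on $C_{t-1}$; this is the element-wise masking behavior described for Equation (2).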

2.5. Introduction to Heuristic Optimization Algorithms

Metaheuristic algorithms are a class of general optimization strategies [37]. Their core characteristic is independence from specific problem structure information, making them applicable to a wide range of scenarios, including combinatorial optimization and numerical function solving. These algorithms are typically based on empirical heuristics and random search, finding approximate optimal solutions through iterative methods.

Modern heuristic algorithms were developed largely independently of one another, but their ideas often originate from natural or physical phenomena. For example, Simulated Annealing (SA) borrows the principle of thermodynamic annealing and uses a Monte Carlo mechanism for global probabilistic search; the Genetic Algorithm (GA) [38] mimics inheritance and selection in biological evolution to achieve parallel global optimization; Evolutionary Programming (EP) [39] emphasizes behavioral change across the population rather than genetic details; Tabu Search (TS) [40] relies on an iterative mechanism with memory to achieve stepwise optimization; and the Ant Colony Algorithm (ACA) [41] performs random search based on the cooperative behavior of real ant colonies. Furthermore, these algorithms can encode decision variables with fixed-length or variable-length representations, which offers a degree of flexibility.

Although these algorithms differ in their search principles, their processes generally follow a “neighborhood search” model: starting from one initial solution or a set of them, generating candidate solutions in the neighborhood under the control of key parameters, updating the state according to deterministic or probabilistic criteria, and adjusting the search parameters according to the chosen strategy. This cycle continues until a convergence criterion is met, yielding an optimal or near-optimal solution.
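
The neighborhood-search loop described above can be illustrated with simulated annealing, one of the algorithms named earlier. The following is a minimal sketch under stated assumptions: Gaussian perturbations define the neighborhood, the Metropolis rule is the probabilistic acceptance criterion, and a geometric cooling schedule is the parameter adjustment. The step size, initial temperature, and cooling rate are illustrative choices, not recommended settings.

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.995, n_iter=5000, seed=0):
    """Minimize f over R^n with a basic simulated annealing loop.

    Returns the best solution found and its objective value.
    """
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best_x, best_f = list(x), fx
    t = t0
    for _ in range(n_iter):
        # 1) Generate a candidate in the neighborhood of the current solution.
        cand = [xi + rng.gauss(0.0, step) for xi in x]
        fc = f(cand)
        # 2) Always accept improvements; accept worse moves with probability exp(-delta / T).
        delta = fc - fx
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = list(x), fx
        # 3) Adjust the control parameter: geometric cooling of the temperature.
        t = max(t * cooling, 1e-8)
    return best_x, best_f

# Usage: minimize the sphere function f(x) = x1^2 + x2^2 starting from (3, -2).
sphere = lambda v: sum(vi * vi for vi in v)
best_x, best_f = simulated_annealing(sphere, [3.0, -2.0])
print(best_x, best_f)
```

Early on, the high temperature lets the search accept uphill moves and escape poor regions; as the temperature decays, the loop behaves increasingly like greedy local search around the best region found.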

In mathematical optimization, the core of a problem lies in defining an objective function $f(x)$, where $x \in \mathbb{R}^{n}$ denotes a vector of $n$ decision variables. The objective function may take a tractable explicit form, such as the loss functions commonly used in regression or neural networks, or an implicit form that cannot be written as a closed-form mathematical expression. For tasks involving implicit or highly complex objectives, conventional analytical methods often become infeasible, and heuristic optimization algorithms provide an effective alternative.

The optimization goal is typically to obtain a solution that maximizes or minimizes the objective value:

$$\max_{x} f(x) \quad \text{or} \quad \min_{x} f(x). \qquad (8)$$

When multiple objectives $f_{i}(x)$, $1 \leq i \leq m$, are involved and each falls within a bounded interval, such as $[0, 1]$, the optimization process becomes more intricate. A common strategy is to transform multi-objective optimization into a single-objective problem through normalization or aggregation. For example:

$$\min_{x} \left( \max_{i} f_{i}(x) - \min_{i} f_{i}(x) \right), \qquad (9)$$

which seeks balanced outcomes by minimizing the disparity among objectives. This form of transformation enables more tractable computation while preserving the essential structure of the multi-objective problem.
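
To make the scalarization in Equation (9) concrete, the sketch below evaluates the disparity $\max_i f_i(x) - \min_i f_i(x)$ and minimizes it by exhaustive search over a finite candidate grid. The two objectives are hypothetical and chosen so that the balanced solution is easy to verify by hand; in practice, the grid search would be replaced by a heuristic such as those described above.

```python
def disparity(objectives, x):
    """Scalarized objective of Eq. (9): spread between the largest and smallest f_i(x)."""
    values = [f(x) for f in objectives]
    return max(values) - min(values)

def minimize_disparity(objectives, candidates):
    """Return the candidate with the smallest disparity (exhaustive search over a finite set)."""
    return min(candidates, key=lambda x: disparity(objectives, x))

# Two hypothetical objectives, both normalized to [0, 1] for x in [0, 1].
f1 = lambda x: x          # e.g., a normalized cost that grows with x
f2 = lambda x: 1.0 - x    # e.g., a normalized cost that shrinks with x
grid = [i / 100 for i in range(101)]
x_star = minimize_disparity([f1, f2], grid)
print(x_star, disparity([f1, f2], x_star))  # 0.5 0.0
```

At $x = 0.5$ both objectives equal 0.5, so the disparity vanishes; this is the "balanced outcome" that Equation (9) favors, even though neither objective is individually minimized there.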

In many real-world applications, the decision vector $x$ is continuous rather than discrete, which makes certain traditional search strategies less suitable. In contrast, heuristic algorithms are flexible and can be applied directly, or extended, to continuous domains, offering clear advantages for complex optimization tasks.

2.6. Introduction to Attention Mechanisms

In traffic flow prediction, the attention mechanism is widely used to address the information redundancy caused by large volumes of input data with many feature dimensions [42]. As traffic network data continue to grow, traditional recurrent neural networks, and even long short-term memory networks, are prone to vanishing gradients or information decay when processing long sequences or multi-node dependencies, which degrades prediction accuracy [42]. By introducing the attention mechanism, the model can focus on the traffic nodes or time periods most relevant to the prediction target among numerous input features, reducing attention to irrelevant information and thus improving prediction efficiency and accuracy [42].
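
The weighting idea can be sketched with scaled dot-product attention, the form used in the Transformer: each query scores every input (a time step or a traffic node), the scores are normalized with a softmax, and the output is the weighted sum of the values. This NumPy sketch is illustrative only; learned query, key, and value projections are omitted, and the input shapes are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> (output (n_q, d_v), weights (n_q, n_k))."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # relevance of each input (time step or node) to each query
    weights = softmax(scores, axis=-1)   # each row is a distribution over the inputs
    return weights @ V, weights

# Hypothetical input: 12 historical time steps with 8-dimensional features.
rng = np.random.default_rng(42)
H = rng.normal(size=(12, 8))
out, w = scaled_dot_product_attention(H, H, H)   # self-attention over the time axis
print(out.shape, w.shape)  # (12, 8) (12, 12)
```

Inputs with low scores receive weights close to zero, which is the mechanism by which irrelevant nodes or periods are down-weighted.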

Since its introduction, the Transformer model has shown strong performance in traffic flow prediction owing to its self-attention mechanism and efficient parallel computation. Unlike traditional recurrent structures, the Transformer can effectively capture long-range sequence dependencies, process the features of all nodes in the traffic network simultaneously, mitigate the vanishing gradient problem, and model complex spatiotemporal relationships.

In the Transformer structure, the representation of each time step or traffic node is dynamically reweighted based on information from all other positions in the sequence or network, which more accurately reflects the spatiotemporal dependencies of traffic flow. Figure 4 illustrates the core structure of the Transformer model used for traffic flow prediction. Input features first pass through embedding and positional encoding to integrate temporal and spatial information. The encoder stack then processes the input through multiple identical layers, each consisting of a multi-head self-attention mechanism (to capture spatiotemporal dependencies), add-and-normalize operations (residual connections followed by layer normalization), and a feedforward network. For multi-step traffic flow prediction, the decoder stack generates outputs autoregressively from previous predictions; each decoder layer contains masked multi-head self-attention (restricting attention to past positions only), multi-head attention over the encoder output (aligning historical and target features), and a feedforward network, each followed by add-and-normalize operations. Finally, the decoder output is mapped to predicted values through a linear projection followed by either a softmax or a regression output layer.
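
Two of the components named above, positional encoding and the masked attention used in the decoder, can be sketched as follows. The sketch assumes the sinusoidal encoding of the original Transformer and a single attention head without learned projections; traffic Transformers often use learned or time-of-day embeddings instead, so treat this as an illustration rather than the architecture in Figure 4.

```python
import numpy as np

def sinusoidal_positional_encoding(n_pos, d_model):
    """PE[p, 2i] = sin(p / 10000^(2i/d_model)), PE[p, 2i+1] = cos(p / 10000^(2i/d_model))."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even in this sketch")
    pos = np.arange(n_pos)[:, None]              # (n_pos, 1)
    two_i = np.arange(0, d_model, 2)[None, :]    # even feature indices 2i
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((n_pos, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def causal_mask(n):
    """Boolean (n, n) mask: entry [t, s] is True when step t may attend to step s (s <= t)."""
    return np.tril(np.ones((n, n), dtype=bool))

def masked_attention_weights(Q, K, mask):
    """Softmax attention weights with disallowed (future) positions set to zero."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = np.where(mask, scores, -np.inf)     # block look-ahead
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

T, d = 6, 8
X = np.random.default_rng(0).normal(size=(T, d)) + sinusoidal_positional_encoding(T, d)
W = masked_attention_weights(X, X, causal_mask(T))
print(np.allclose(np.triu(W, k=1), 0.0))  # True
```

Every weight above the diagonal is exactly zero, so the prediction for step $t$ never uses information from later steps. This is the property the decoder needs when it generates multi-step forecasts autoregressively.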

The core advantages of this structure are threefold. The attention mechanism lets the model flexibly capture long-range temporal and spatial dependencies in traffic flow; parallel computation and residual connections make training stable and efficient; and the encoder-decoder structure allows the Transformer to handle multi-node, multi-step prediction tasks jointly, supporting accurate traffic flow prediction in complex traffic networks.

2.7. Benchmark Datasets for Traffic Flow Prediction

The selection of benchmark datasets is fundamental to evaluating and comparing traffic flow prediction methods. Existing datasets vary considerably in terms of data source, spatial coverage, temporal resolution, feature richness, and scale. This subsection provides a systematic overview of the most widely adopted datasets, categorizes them by collection modality, and discusses their respective strengths and limitations to guide methodological selection.

2.7.1. Fixed-Sensor Freeway Datasets

The most extensively used datasets in the traffic prediction literature are derived from the Caltrans Performance Measurement System (PeMS), which aggregates real-time data from inductive loop detectors embedded in California freeway pavement. The PeMS family, including PeMS03, PeMS04, PeMS07, and PeMS08, records flow (vehicles per unit time), occupancy (proportion of time a sensor is occupied), and speed at 5-min intervals, covering hundreds of sensors over periods of one to several months; note that the commonly distributed versions of PeMS03 and PeMS07 retain only the flow attribute. Owing to their standardized preprocessing pipelines, clear graph topologies, and multi-attribute nature, PeMS datasets have become the de facto standard for benchmarking spatiotemporal GNN models.
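
Because PeMS records are sampled every 5 min, one hour corresponds to 12 steps and one day to 288 steps. A common way to turn such a (time, sensor, feature) array into training samples is a sliding window. The sketch below assumes a 12-step input and 12-step output horizon (one hour each), a frequently used but not universal setting, and uses random data in place of a real PeMS file.

```python
import numpy as np

STEPS_PER_HOUR = 60 // 5               # 5-min sampling -> 12 steps per hour
STEPS_PER_DAY = 24 * STEPS_PER_HOUR    # 288 steps per day

def make_windows(series, n_in=12, n_out=12):
    """Slice a (T, N, F) array into supervised (input, target) pairs.

    Returns X with shape (S, n_in, N, F) and Y with shape (S, n_out, N, F),
    where S = T - n_in - n_out + 1 (stride of one time step).
    """
    T = series.shape[0]
    S = T - n_in - n_out + 1
    if S <= 0:
        raise ValueError("series is too short for the requested window sizes")
    X = np.stack([series[s:s + n_in] for s in range(S)])
    Y = np.stack([series[s + n_in:s + n_in + n_out] for s in range(S)])
    return X, Y

# Hypothetical data: one day of readings from 10 sensors with 3 features
# (flow, occupancy, speed); random values stand in for real measurements.
data = np.random.default_rng(0).random((STEPS_PER_DAY, 10, 3))
X, Y = make_windows(data)
print(X.shape, Y.shape)  # (265, 12, 10, 3) (265, 12, 10, 3)
```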

Two further speed datasets, METR-LA and PEMS-BAY, have been widely adopted following the DCRNN study and provide speed-only measurements at 207 and 325 sensors, respectively; METR-LA was collected from loop detectors on Los Angeles County highways, while PEMS-BAY is curated from PeMS sensors in the San Francisco Bay Area. Although these datasets contain only a single feature (speed), their well-established train/validation/test splits have made them indispensable for reproducible comparison. The Loop Seattle dataset extends this category to the Pacific Northwest, enabling limited cross-regional evaluation.
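
Reproducible comparison on these datasets depends on splitting chronologically rather than randomly, so that test samples come strictly after training samples. The sketch below assumes 70%/10%/20% train/validation/test ratios, a convention widely used with METR-LA and PEMS-BAY since DCRNN; the ratios are parameters, and the 1000-sample input is hypothetical.

```python
def chronological_split(n_samples, train=0.7, val=0.1):
    """Return (train, val, test) index ranges for a time-ordered split.

    Samples are not shuffled, so the validation and test periods lie strictly
    after the training period and no future information leaks into training.
    """
    if not (0 < train and 0 <= val and train + val < 1):
        raise ValueError("invalid split ratios")
    n_train = round(n_samples * train)
    n_val = round(n_samples * val)
    return (range(0, n_train),
            range(n_train, n_train + n_val),
            range(n_train + n_val, n_samples))

tr, va, te = chronological_split(1000)
print(len(tr), len(va), len(te))  # 700 100 200
```

Random shuffling before splitting would place adjacent, highly correlated windows in both training and test sets and inflate reported accuracy, which is why the split is kept contiguous.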

2.7.2. Urban Mobility and Trajectory Datasets

A second category comprises datasets derived from GPS-equipped floating vehicles, predominantly taxis and ride-hailing fleets. TaxiBJ aggregates taxi trajectory data in Beijing into a grid-based inflow/outflow representation, supporting crowd flow prediction tasks. SZ-Taxi covers 156 road segments in Shenzhen with 15-min resolution, offering a graph-compatible urban speed dataset. NYC-Taxi and NYC-Bike, derived from official New York City open data portals, provide zone- or grid-level demand records spanning multiple years, making them suitable for long-horizon and demand forecasting studies.
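
The grid-based inflow/outflow representation used by TaxiBJ can be sketched as follows: GPS points are mapped to grid cells, and within one time interval a trajectory counts toward a cell's inflow when it moves into that cell and toward its outflow when it moves out. The grid origin, cell size, and trajectories below are hypothetical, and a real pipeline would first bucket points into fixed time intervals.

```python
from collections import Counter

def cell_of(lat, lon, lat0, lon0, cell_deg, n_rows, n_cols):
    """Map a GPS point to a (row, col) grid cell, or None if it falls outside the grid."""
    r = int((lat - lat0) // cell_deg)
    c = int((lon - lon0) // cell_deg)
    if 0 <= r < n_rows and 0 <= c < n_cols:
        return (r, c)
    return None

def inflow_outflow(trajectories, **grid):
    """Count, per grid cell, how many trajectories enter (inflow) or leave (outflow) it.

    Each trajectory is a time-ordered list of (lat, lon) points within one interval;
    a trajectory contributes at most once to a given cell's inflow or outflow.
    """
    inflow, outflow = Counter(), Counter()
    for traj in trajectories:
        cells = [cell_of(lat, lon, **grid) for lat, lon in traj]
        entered, left = set(), set()
        for prev, curr in zip(cells, cells[1:]):
            if prev != curr:
                if curr is not None:
                    entered.add(curr)
                if prev is not None:
                    left.add(prev)
        inflow.update(entered)
        outflow.update(left)
    return inflow, outflow

# Hypothetical 2 x 2 grid with 1-degree cells and two toy trajectories.
grid = dict(lat0=0.0, lon0=0.0, cell_deg=1.0, n_rows=2, n_cols=2)
trajs = [[(0.5, 0.5), (0.5, 1.5), (1.5, 1.5)],   # (0,0) -> (0,1) -> (1,1)
         [(0.2, 0.2), (0.3, 0.3)]]               # stays in (0,0)
inflow, outflow = inflow_outflow(trajs, **grid)
print(sorted(inflow.items()))   # [((0, 1), 1), ((1, 1), 1)]
print(sorted(outflow.items()))  # [((0, 0), 1), ((0, 1), 1)]
```

Stacking these per-cell counts over consecutive intervals yields the two-channel (inflow, outflow) image sequences on which grid-based crowd flow models are trained.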

The TDrive dataset, collected from 10,357 Beijing taxis over one week, is primarily used for route inference and speed estimation rather than direct flow prediction, but it supports research on data-sparse scenarios and trajectory-based spatiotemporal modeling.

2.7.3. Large-Scale and Multi-City Datasets

As models scale toward city-wide deployment, large-scale benchmark datasets have emerged to address the limitations of small, single-city evaluations. LargeST, constructed from over 8600 PeMS sensors spanning five years (2017–2021), is designed specifically to assess the scalability of spatiotemporal models and reveals that many methods performing well on PeMS04/PeMS08 degrade substantially at larger scales. UTD19, aggregating over 23,000 sensor time series across 39 global cities, enables multi-city and cross-domain generalization experiments, directly addressing the transferability concerns raised in federated and meta-learning studies reviewed in Section 3.1.

The Next Generation Simulation (NGSIM) dataset occupies a distinct niche, providing sub-second vehicle trajectory data from video-based tracking on California freeway segments. While its microscopic resolution makes it unsuitable for network-level flow prediction, NGSIM serves as the primary reference for car-following model validation and congestion formation analysis at the vehicle-interaction level.

2.7.4. Comparative Summary and Dataset Selection Guidance

Table 2 provides a structured comparison of the datasets reviewed above across the following dimensions: region, data source type, number of sensors or nodes, temporal coverage, sampling interval, available features, primary prediction task, and data accessibility. Several observations are noteworthy.

Geographic representativeness. The majority of widely used datasets originate from the United States (primarily California), which raises concerns about geographic representativeness. Studies relying exclusively on PeMS-derived data may overfit evaluation to the specific traffic regime, road network topology, and sensor density of California freeways, limiting the generalizability of reported conclusions to other regions or road types.

Feature dimensionality. Most datasets provide only flow, speed, or occupancy as primary features, with few datasets natively incorporating weather conditions, incident records, or land-use attributes. This structural limitation has motivated the use of data augmentation and multimodal fusion strategies reviewed in Section 3.3, but also means that reported model performance often reflects optimistic upper bounds achievable only under complete sensor conditions.

Temporal coverage inconsistency. Temporal coverage ranges from less than an hour of recordings per site (NGSIM) and a single month (SZ-Taxi) to multiple years (NYC-Taxi, LargeST), creating significant inconsistency in the evaluation of seasonal patterns and long-term model stability. Future benchmark construction should standardize temporal coverage to include at least one full annual cycle to enable rigorous assessment of periodic and seasonal effects.

In practice, researchers should select datasets according to their methodological focus: PeMS04 and PeMS08 for standard spatiotemporal GNN benchmarking; METR-LA and PEMS-BAY for reproducible comparison with canonical baselines; LargeST and UTD19 for scalability and generalization experiments; and TaxiBJ or NYC-Taxi for urban demand forecasting and grid-based modeling. Wherever possible, multi-dataset evaluation should be adopted to substantiate the generalizability of reported improvements.

3. Traffic Flow Prediction Based on Graph Neural Network Method

This chapter reviews the systematic evolution of graph neural networks in traffic flow prediction and summarizes existing research in a structure that follows the internal logic and core innovations of methodological development. Overall, the field has evolved continuously from static to dynamic modeling, from single-point prediction to collaborative and fusion-based prediction, and from closed modeling to cross-domain integration.

Early research primarily relied on predefined fixed topologies to characterize spatial relationships in road networks. Attention mechanisms were subsequently introduced to dynamically weight the influence of neighboring nodes, and this line of work developed further into learnable adjacency matrices and adaptive graph structures, which substantially alleviate the problem of time-varying spatial dependence. Building on this, federated learning frameworks were introduced and deeply integrated with graph neural networks to address cross-regional data silos and privacy compliance constraints, forming a collaborative training paradigm in which “data remains stationary, model moves dynamically.” In parallel, multi-graph fusion, robust modeling, and cross-domain technology embedding continue to expand the expressive boundaries of models, gradually moving traffic prediction towards integrated perception and decision-making.

Based on a systematic analysis of relevant literature, existing research can be summarized into four major paradigms.

(1): Federated Learning and Privacy Protection Methods: Solving cross-regional collaboration and data privacy protection issues through federated learning frameworks, and combining graph neural networks to improve spatiotemporal dependency modeling capabilities.
(2): Dynamic Graph Neural Network Methods: Utilizing dynamic or adaptive graph structures to characterize the time-varying spatiotemporal relationships in transportation networks, thereby improving the model’s adaptability to dynamic traffic patterns.
(3): Multi-Graph Fusion and Attention Mechanism Methods: Integrating multiple graph structures (such as semantic graphs and topological graphs) or employing attention mechanisms to enhance the model’s ability to capture complex spatiotemporal features.
(4): Other Innovative Methods: Including the introduction of new technologies such as reinforcement learning, Bayesian networks, and information geometry to address specific challenges such as data uncertainty or hierarchical structure modeling.

3.1. Federated Graph Neural Network Methods

Currently, many researchers are addressing the pressing issue of “data privacy and data silos” to meet the development needs of intelligent transportation. In real-world transportation systems, data are often scattered across different regions and institutions, and privacy policies and regulations make centralized sharing difficult. Achieving cross-regional collaborative modeling while keeping data within its domain has therefore become a key challenge for intelligent transportation prediction. Federated learning (FL) provides a feasible and efficient technical approach to this problem [58]. Its core concept is that “data is not shared, but knowledge can be shared,” which balances privacy protection, cross-regional collaboration, and high-accuracy prediction. As shown in Figure 5, the framework is a multi-client graph neural network system based on federated learning, in which multiple local clients (e.g., Local Client $k$) train their own AFSTGCN models, each with a local prediction loss $L_{p}$ and a model memory loss $L_{m}$. The central server aggregates the model parameters $\theta_{k}$ of each client, with weights $M_{k}$, to obtain the global model $\theta_{g}$, achieving collaborative model optimization and knowledge sharing. The diagram also includes components such as LTP (Local Training Process), Meme Model, and RND (which may represent random initialization or a noise mechanism), which together constitute a hierarchical federated graph learning system with a memory mechanism.
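
The server-side aggregation in Figure 5 can be sketched as a weighted average, $\theta_g = \sum_k M_k \theta_k$. The text does not specify how the weights $M_k$ are chosen, so this sketch assumes they are non-negative and normalized to sum to one; the example sets them proportional to each client's sample count, as in FedAvg. Storing parameters as plain dictionaries of NumPy arrays is an illustrative simplification.

```python
import numpy as np

def aggregate(client_params, weights):
    """Server-side weighted aggregation: theta_g = sum_k M_k * theta_k.

    client_params: list of dicts (parameter name -> np.ndarray), identical shapes across clients.
    weights: non-negative client weights M_k; normalized here to sum to one.
    """
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    w = w / w.sum()
    return {name: sum(wk * params[name] for wk, params in zip(w, client_params))
            for name in client_params[0]}

# Hypothetical example: three clients weighted by local sample counts (FedAvg-style).
clients = [{"W": np.full((2, 2), 1.0), "b": np.array([0.0])},
           {"W": np.full((2, 2), 2.0), "b": np.array([1.0])},
           {"W": np.full((2, 2), 4.0), "b": np.array([2.0])}]
theta_g = aggregate(clients, weights=[100, 100, 200])
print(theta_g["W"][0, 0], theta_g["b"][0])  # 2.75 1.25
```

Only parameters cross the network in this scheme; the raw traffic observations used to train each $\theta_k$ never leave the client.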

The pioneering work of Feng et al. [59] laid the foundation for this direction. Their federated spatiotemporal prediction framework consists of two stages: road network partitioning and federated model training. In the partitioning stage, dynamic time warping (DTW) and K-means clustering are used to decompose the road network into pattern-driven sub-networks. In the training stage, each sub-network learns locally with a spatiotemporal graph neural network, and knowledge distillation is used to mitigate the model and task heterogeneity caused by differences in data distribution. A multi-factor weighting strategy is also designed to improve the fairness and accuracy of global aggregation. This work was the first to systematically discuss federated heterogeneity in traffic prediction scenarios, providing a reference paradigm for subsequent research. Building on these ideas, Xia et al. [60] focused on the efficiency and deployability of short-term traffic prediction. They incorporated community detection into the federated framework, refining each local subnetwork into multiple community units and training each unit locally with a graph convolutional network. This approach reduces global communication overhead and the risk of data leakage while improving the model's flexibility and response speed in practical deployment, providing an efficient, privacy-preserving solution for real-time prediction. Wang et al. [61] advanced the federated framework at the structural level. Their “Federated Graph Neural Network and Equivalent Hypergraph” framework targets the problem of missing cross-client connections: it maps local traffic graphs to high-order supernodes and constructs a dynamically adjustable global hypergraph.
Through performance feedback-driven hyperedge update mechanisms, it automatically adds or removes potential cross-client associations. This design effectively restores the cross-regional connection structure broken due to privacy isolation, thereby reconstructing a more complete global traffic space model while protecting data privacy.
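
Dynamic time warping, used in the partitioning stage of Feng et al. [59], compares traffic profiles while tolerating small temporal shifts that a plain element-wise distance penalizes. Below is a minimal dynamic-programming sketch with an absolute-difference local cost. It does not reproduce their full pipeline (the K-means step and any warping-window constraint are omitted), and the two profiles are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost.

    D[i][j] = |a[i-1] - b[j-1]| + min(D[i-1][j], D[i][j-1], D[i-1][j-1]).
    """
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Two hypothetical flow profiles whose peak is shifted by one time step.
peak_early = [0, 1, 5, 1, 0, 0]
peak_late = [0, 0, 1, 5, 1, 0]
print(dtw_distance(peak_early, peak_late))  # 0.0
```

Here the one-step shift of the peak costs nothing under DTW, whereas the element-wise absolute difference of the same two profiles sums to 10. This shift tolerance is what makes DTW attractive for grouping sensors by traffic pattern rather than by exact timing.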

Furthermore, Liu et al. [62] extended federated graph neural networks to prediction-driven decision-making scenarios, proposing a federated load balancing framework based on spatiotemporal prediction and reinforcement learning to optimize the neighbor cell relationship configuration of cellular network base stations. This research demonstrates that the federated prediction model not only possesses data security advantages but can also serve as a high-precision decision-making basis for resource optimization in complex network systems, showcasing its potential value in cross-domain applications.

Table 3 compares the representative federated methods discussed above. From a spatial-context perspective, federated GNN methods are most suitable for cross-jurisdictional, multi-city deployments where data sovereignty constraints structurally preclude centralized aggregation, a scenario common in metropolitan transportation networks spanning multiple administrative boundaries. Their advantage diminishes in single-city deployments with centralized data access, and they are particularly ill-suited to sparse-sensor rural environments, where even local graph construction is data-limited and the communication overhead of federated protocols adds cost without proportional benefit. Existing federated traffic prediction studies also exhibit several methodological and practical limitations. Most approaches rely on static or predefined subnetwork partitioning strategies, limiting adaptability to dynamic road conditions and unexpected events. Although knowledge distillation and aggregation mechanisms improve accuracy and fairness, they rarely address model interpretability, causal inference, or the transferability assumptions underlying latent representation sharing across heterogeneous sub-networks. Theoretical foundations also remain insufficient: convergence guarantees under non-IID spatiotemporal graph data are poorly established, and the widespread adoption of FedAvg-style aggregation (originally designed for statistically homogeneous settings) lacks rigorous justification in heterogeneous traffic scenarios. Moreover, communication-efficient designs often overlook real-world deployment constraints, such as uneven edge computing capacity and system cost. Evaluation protocols are highly inconsistent across studies, with varying client numbers, data splits, and privacy budgets, making cross-study comparisons scientifically unreliable. Additionally, the robustness and generalization of cross-domain topology reconstruction under extremely sparse or anomalous data conditions require further validation.
Future research should therefore prioritize adaptive partitioning mechanisms, theoretically grounded aggregation strategies, interpretable and causally informed modeling, and standardized federated benchmarks with explicit threat models to support reliable large-scale intelligent transportation applications.

3.2. Dynamic and Adaptive Graph Structure Methods

Currently, many researchers are directly addressing the core dynamic challenges of transportation systems, striving to overcome the limitations of traditional predefined graph structures. These studies recognize that the interactions between nodes in a road network are not static but evolve with conditions such as rush hours and unexpected events. Their core innovation therefore lies in a data-driven mechanism that allows the model to learn spatial relationships reflecting the current traffic conditions. As shown in Figure 6, a representative model achieves this through two core modules: a residual connection module and a self-attention module [63]. The residual connection module comprises linear layers, temporal convolutional networks (TCN), and graph convolutional networks (GCN). Within this module, the gated temporal convolution formed by the TCN-a and TCN-b branches applies the tanh and $\sigma$ activation functions, respectively, to create a gating mechanism; the result is combined with the GCN output and transformed via the weight matrix $W_{t}$. This architecture mitigates vanishing gradients through skip connections and stabilizes training [64]. The self-attention module consists of alternating linear layers and ReLU activation functions, which compute correlation weights between sequence elements to capture dynamic global spatial dependencies. Working together, the two modules enable the model to adaptively learn dynamic spatial dependencies and capture complex spatiotemporal patterns in traffic data.
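
The gating mechanism in the residual module can be sketched as $\tanh(\mathrm{TCN}_a(X)) * \sigma(\mathrm{TCN}_b(X))$, where each branch is a dilated causal 1-D convolution over time. The NumPy sketch below assumes this Graph WaveNet-style formulation for a single node, omits biases, and does not include the subsequent GCN step or $W_t$; kernel sizes, dilation, and channel counts are illustrative.

```python
import numpy as np

def causal_conv1d(x, w, dilation=1):
    """Dilated causal 1-D convolution along the time axis.

    x: (T, C_in) sequence for one node; w: (K, C_in, C_out) kernel.
    y[t] depends only on x[t], x[t - d], ..., x[t - (K - 1) d] (zeros before t = 0).
    """
    K, C_in, C_out = w.shape
    pad = (K - 1) * dilation
    xp = np.vstack([np.zeros((pad, C_in)), x])   # left zero-padding keeps the output causal
    y = np.zeros((x.shape[0], C_out))
    for t in range(x.shape[0]):
        for k in range(K):
            y[t] += xp[t + pad - k * dilation] @ w[k]
    return y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_tcn(x, w_a, w_b, dilation=1):
    """Gated temporal unit: tanh(TCN-a(x)) * sigmoid(TCN-b(x)), element-wise."""
    return np.tanh(causal_conv1d(x, w_a, dilation)) * sigmoid(causal_conv1d(x, w_b, dilation))

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 2))              # 12 time steps, 2 input channels
w_a, w_b = rng.normal(size=(2, 2, 2, 4))  # two kernels of shape (K=2, C_in=2, C_out=4)
h = gated_tcn(x, w_a, w_b, dilation=2)
print(h.shape)  # (12, 4)
```

The tanh branch proposes a feature value and the sigmoid branch decides, per time step and channel, how much of it passes through; causal padding ensures that no output depends on future observations.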

Wu et al. [65] pointed out that strong correlations may exist between non-adjacent road segments and therefore proposed a “spatiotemporal aggregation graph neural network.” Beyond the given spatial adjacency graph, this model generates a “time graph” from the spatiotemporal data itself by computing a correlation coefficient matrix, compensating for the inability of a single spatial graph to express temporal correlations and yielding a more comprehensive description of the feature relationships in the road network. Building on this, the “dynamic correlation graph convolutional network” of Gu et al. [66] goes a step further by abandoning predefined graph structures entirely. It constructs the adjacency matrix directly from the input multivariate time series using correlation coefficients computed in real time. This “no-preset” approach gives the model strong adaptability, enabling it to discover hidden spatial dependencies that depend on the characteristics of each dataset, and even on the state of the same dataset at different times. To pursue even greater dynamism, Ma et al. [67] treat the graph structure itself as a learnable, continuously evolving entity. Their “spatiotemporal evolutionary graph neural network” continuously updates its semantic adjacency matrix throughout training, so the graph is no longer fixed after initialization but keeps being adjusted as training data are processed, allowing its final form to better match complex and changing traffic patterns. Another technical approach is to enhance the model's expressive power. Hu et al. [68] improved the classic “Graph WaveNet” architecture by introducing a self-attention mechanism to construct an adaptive adjacency matrix.
This method allows the model to capture not only spatial proximity but also more complex, non-local spatiotemporal dependencies between nodes, and the reported experiments show clear reductions in MAE, MAPE, and RMSE. The “dynamic graph spatiotemporal neural network” of Jiang et al. [69] employs a dual-graph strategy, constructing both a static topological graph (representing the inherent physical connections of the road network) and a dynamic information graph (representing the similarity of traffic flow over time). This design lets the model distinguish and exploit both stable structural relationships and rapidly changing dynamic connections between nodes, providing a more refined characterization of the spatiotemporal properties of the traffic network. To address the impact of unforeseen external events, Ye et al. [70] designed a “Dynamic Multi-Graph Neural Network.” This model constructs multiple prior graphs to provide rich contextual information and includes a dynamic graph adjustment module that updates the adjacency matrix at each training step based on the currently learned state. More importantly, it explicitly incorporates traffic accident data, allowing the model to learn local traffic fluctuation patterns caused by accidents.
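
Two of the graph-construction ideas above can be contrasted in a short sketch. The first function builds an adjacency matrix from Pearson correlations over a window of observations, in the spirit of Gu et al. [66]; the threshold that prunes weak correlations is an assumption. The second is the learnable adaptive adjacency of the original Graph WaveNet, a row-wise softmax of $\mathrm{ReLU}(E_1 E_2^{\top})$ over two node-embedding matrices. Hu et al. [68] replace this construction with self-attention, which the sketch does not attempt to reproduce, and the embeddings here are random rather than learned.

```python
import numpy as np

def correlation_adjacency(X, threshold=0.5):
    """Adjacency from Pearson correlations between node time series.

    X: (T, N) window of observations (T time steps, N nodes). Entries with
    |corr| < threshold are set to zero; the diagonal (self-loops) is set to one.
    """
    A = np.corrcoef(X.T)           # (N, N); rows of X.T are the node series
    A = np.nan_to_num(A)           # constant series yield NaN correlations -> treat as 0
    A[np.abs(A) < threshold] = 0.0
    np.fill_diagonal(A, 1.0)
    return A

def adaptive_adjacency(E1, E2):
    """Graph WaveNet-style adaptive adjacency: row-wise softmax of ReLU(E1 @ E2.T)."""
    S = np.maximum(E1 @ E2.T, 0.0)
    S = S - S.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(S)
    return P / P.sum(axis=1, keepdims=True)

# Hypothetical window: node 1 is a scaled copy of node 0; node 2 is shifted by a quarter period.
t = np.arange(24)
x0 = np.sin(2 * np.pi * t / 24)
X = np.column_stack([x0, 2 * x0 + 1, np.cos(2 * np.pi * t / 24)])
A = correlation_adjacency(X)                                    # keeps edge 0-1, drops edge 0-2
P = adaptive_adjacency(*np.random.default_rng(0).normal(size=(2, 3, 4)))
```

The correlation graph is recomputed from data and needs no training, but it reflects co-movement rather than physical connectivity; the adaptive graph is trained end to end and is only as meaningful as the predictive loss that shapes it.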

Finally, Chen et al. [71] focused on the dynamics of the temporal dimension. Their “time-based adaptive graph neural network” can generate different graph dependency matrices for different time steps, thereby accurately capturing the unique spatial correlation patterns of traffic flow at different times of the day.
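
One hypothetical way to realize a time-dependent graph of the kind Chen et al. [71] describe is to learn an embedding per time-of-day slot and use it to modulate node embeddings before building the adjacency matrix. The sketch below illustrates that idea and is not their formulation: the element-wise modulation, the ReLU-softmax normalization, and all sizes are assumptions, and the embeddings are random rather than learned.

```python
import numpy as np

def row_softmax(S):
    S = S - S.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(S)
    return P / P.sum(axis=1, keepdims=True)

def time_varying_adjacency(node_emb, slot_emb, slot):
    """Adjacency matrix for one time-of-day slot (hypothetical construction).

    node_emb: (N, d) node embeddings; slot_emb: (S, d), one embedding per slot.
    Modulating node embeddings by the slot embedding yields a different graph per slot.
    """
    E = node_emb * slot_emb[slot]                  # (N, d) slot-specific node representation
    return row_softmax(np.maximum(E @ E.T, 0.0))   # ReLU, then normalize each row

rng = np.random.default_rng(0)
N, d, slots_per_day = 5, 4, 288                  # 288 five-minute slots per day
node_emb = rng.normal(size=(N, d))
slot_emb = rng.normal(size=(slots_per_day, d))
A_morning = time_varying_adjacency(node_emb, slot_emb, slot=8 * 12)   # 08:00
A_night = time_varying_adjacency(node_emb, slot_emb, slot=2 * 12)     # 02:00
```

Sharing the node embeddings across slots keeps the parameter count small: only one extra $d$-dimensional vector is needed per slot, rather than a full $N \times N$ matrix.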

Regarding spatial context suitability, dynamic and adaptive graph methods offer the greatest advantage in arterial-dominated urban networks with pronounced time-varying spatial dependencies—for instance, commuter corridors exhibiting strong tidal flow patterns where peak-hour connectivity structures differ substantially from off-peak configurations. In contrast, for stable low-density rural road networks where spatial dependencies are structurally fixed and sensor coverage is sparse, the high computational cost of dynamic graph construction is difficult to justify, and simpler static-graph or decomposition-based approaches are more practical. The performance gains of dynamic graph approaches are therefore strongly conditioned on the temporal variability of the target network’s spatial interaction patterns.

As shown in Table 4, this line of research has evolved from temporal graph generation to fully dynamic construction, continuous evolution, and spatiotemporal dual-graph modeling, progressively enhancing the adaptability of graph structures to real-world traffic dynamics. However, several limitations persist. Most methods rely heavily on correlation-based relevance matrices or attention scores, conflating statistical co-movement with genuine spatial influence and thus lacking causal semantics and stable physical interpretability. Graphs built from Pearson correlation, DTW similarity, or learnable adjacency matrices are selected or optimized for predictive loss rather than structural fidelity, so the inferred graphs may deviate substantially from the actual road-network topology, raising concerns about reproducibility and epistemological validity. Although frequent structural updates improve flexibility, they introduce high computational costs and training instability, limiting the feasibility of real-time deployment. External event modeling remains dependent on explicitly labeled data, making it difficult to capture implicit perturbations or unknown anomalies. Moreover, dynamic graphs are highly sensitive to data variation, and their robustness, generalization capacity, and adaptability under extremely sparse conditions lack systematic cross-domain verification, especially when evaluation is confined to a single city or dataset. Future research should therefore balance adaptability with interpretability by incorporating causal constraints, improving structural consistency and robustness, and developing computationally efficient dynamic graph frameworks suitable for large-scale, real-world deployment.

3.3. Multi-Graph Fusion and Attention Mechanism Methods

The core of this research category lies in answering the question of “how to more fully and effectively utilize graph structures for traffic prediction.” Its fundamental motivation stems from the inherent complexity of traffic systems: a single perspective or traditional graph convolution is insufficient to fully characterize multi-layered spatiotemporal features. Related research therefore focuses on multi-source graph information fusion and enhanced feature extraction mechanisms (especially attention mechanisms) to improve model performance in breadth, depth, and robustness [72]. As illustrated in Figure 7, a typical implementation employs an attention mechanism atop a cascade of RippleGNN modules, where each RippleGNN simulates the ripple-like propagation of information through the graph, capturing deep and long-range spatiotemporal dependencies layer by layer. The attention mechanism then adaptively fuses these multi-level features according to their importance, and the integrated representation is fed into the prediction layer to generate traffic forecasts [73]. This architecture exemplifies how combining hierarchical graph propagation with attention-based fusion can capture complex dynamic spatial dependencies and enhance prediction performance.
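
The attention-based fusion step can be sketched as scoring each of the $K$ stacked representations per node, normalizing the scores with a softmax over the levels, and taking the weighted sum. The scoring function here (a tanh of a dot product with a vector $w$) is an assumed, illustrative choice and not necessarily the one used in [73]; in a trained model, $w$ would be learned.

```python
import numpy as np

def attention_fuse(H, w, b=0.0):
    """Fuse K stacked representations per node with softmax attention over the K levels.

    H: (K, N, D) representations (e.g., outputs of K cascaded graph modules);
    w: (D,) scoring vector, b: scalar bias (both would be learned in practice).
    Returns fused features (N, D) and attention weights (N, K).
    """
    scores = np.tanh(H @ w + b)                        # (K, N): one score per level and node
    scores = scores - scores.max(axis=0, keepdims=True)
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=0, keepdims=True)   # softmax over the K levels
    fused = np.einsum("kn,knd->nd", alpha, H)          # weighted sum of the levels
    return fused, alpha.T

rng = np.random.default_rng(0)
H = rng.normal(size=(3, 5, 4))   # hypothetical: 3 levels, 5 nodes, 4 features
w = rng.normal(size=4)
fused, alpha = attention_fuse(H, w)
print(fused.shape, alpha.shape)  # (5, 4) (5, 3)
```

Because the weights are computed per node, one sensor can rely mostly on shallow (local) features while another draws on deeper (long-range) ones.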

In terms of breadth, this direction achieves more comprehensive traffic knowledge representation by constructing and fusing multiple types of graph structures; representative methods are compared in Table 5. For example, Wang et al. [74] introduced graph attention mechanisms into traffic prediction, enabling models to aggregate neighbor features weighted by their importance rather than by simple averaging, thereby enhancing the discriminative power of spatial representations. Building on this, Wang et al. [75] introduced channel attention into spatiotemporal graph convolution, allowing the model to adaptively adjust the importance of different feature channels and thus focus selectively on key spatiotemporal patterns. Regarding robustness, research has begun to address data noise, missing data, and uncertainty in real-world traffic environments. Peng et al. [76] proposed a hybrid spatiotemporal graph neural network that simultaneously constructs a static adaptive graph, a dynamic learning graph, and a semantic graph (generated by dynamic time warping and masked attention). It employs multi-scale gated temporal attention to model complex temporal dependencies and reports leading performance on multiple public datasets, demonstrating the potential of multidimensional modeling strategies.
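
The neighborhood-importance weighting of graph attention can be sketched with a single-head layer in the style of the original GAT formulation: $e_{ij} = \mathrm{LeakyReLU}(a^{\top}[W h_i \,\|\, W h_j])$, normalized with a softmax over each node's neighbors (including itself). This NumPy sketch omits the output nonlinearity and multi-head concatenation, and it is not the specific model of Wang et al. [74]; all weights and the toy graph are hypothetical.

```python
import numpy as np

def gat_layer(H, A, W, a, negative_slope=0.2):
    """Single-head graph attention layer; returns (new features, attention matrix).

    H: (N, F) node features; A: (N, N) adjacency (nonzero = edge; self-loops added here);
    W: (F, F2) shared linear projection; a: (2 * F2,) attention vector.
    """
    Z = H @ W                                    # projected features W h_i, shape (N, F2)
    F2 = Z.shape[1]
    # a^T [z_i || z_j] splits into a[:F2] . z_i + a[F2:] . z_j
    e = (Z @ a[:F2])[:, None] + (Z @ a[F2:])[None, :]
    e = np.where(e > 0, e, negative_slope * e)   # LeakyReLU
    mask = (A + np.eye(A.shape[0])) != 0         # neighbors plus self
    e = np.where(mask, e, -np.inf)               # non-neighbors get zero weight
    e = e - e.max(axis=1, keepdims=True)
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ Z, alpha

# Hypothetical toy graph: path 0-1-2 plus an isolated node 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
a = rng.normal(size=4)
out, alpha = gat_layer(H, A, W, a)
```

Unlike mean aggregation, where every neighbor of node 1 would receive weight 1/3, the learned coefficients in `alpha` can favor whichever neighbor carries more predictive information.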

In terms of spatial context suitability, multi-graph fusion and attention-based methods are best suited to high-density urban cores where rich sensor coverage, complex overlapping spatial relationships (topological, functional, and flow-based), and dense OD demand matrices provide sufficient multi-perspective input signals to justify the increased modeling complexity. In moderately dense arterial networks, selective adoption of dual-graph strategies (combining static topology and dynamic information graphs) can provide a practical balance between expressiveness and computational efficiency. However, in sparse-sensor environments, the multi-graph paradigm is fundamentally constrained by insufficient input diversity: constructing meaningful semantic or functional graphs requires adequate sensor density, and the absence of such data can render multi-graph fusion architectures over-parameterized relative to available information, increasing the risk of overfitting.

Furthermore, some studies have expanded the spatial modeling paradigm at both theoretical and applied levels. For example, Cheng et al. [77] emphasized simultaneous modeling of spatiotemporal dependencies and incorporated external factors such as weather and events into a joint prediction framework; Han et al. [78] pioneered the introduction of Ollivier–Ricci curvature into traffic graph modeling, using “neighborhood-neighborhood” relationship constraints based on optimal transport theory to guide feature propagation, thereby integrating differential geometric constraints into graph structure learning and theoretically expanding the boundaries of spatial modeling. However, existing approaches still exhibit notable limitations. Multidimensional graph constructions and complex attention mechanisms substantially increase computational cost while offering only limited and sometimes misleading interpretability. Multi-graph fusion strategies are largely based on empirical concatenation or gating designs and lack a unified theoretical framework for integrating heterogeneous graph signals. As a result, key architectural choices, such as graph types, fusion order, and weighting schemes, are often determined by trial and error, introducing significant experimenter degrees of freedom and raising concerns about overfitting and weak cross-scenario generalization. Moreover, attention weight visualization is frequently used as evidence of interpretability, yet it does not reliably reflect feature importance or causal attribution, which makes such claims scientifically fragile, particularly in safety-critical traffic applications. Although robustness enhancements mitigate noise effects, they still depend on external labels, heuristic rules, or prior knowledge, limiting their capacity for autonomous learning and their universality.
Meanwhile, theoretically appealing methods such as Ollivier–Ricci curvature lack large-scale validation, and their optimal transport–based edge weighting entails high computational complexity that challenges real-world deployment. Therefore, future research should pursue theoretically grounded, computationally efficient spatial graph learning frameworks that balance multidimensional expressiveness with causal interpretability and scalable practicality for large-scale intelligent transportation systems.
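For readers unfamiliar with the construction, the standard definition of Ollivier–Ricci curvature on a graph edge (x, y) is reproduced below. The lazy random-walk measure with idleness parameter α is one common choice of neighborhood measure; we do not claim it is the exact variant used by Han et al. [78]. Here d is the shortest-path distance, N(x) the neighbor set of x, and Π(μ, ν) the set of couplings (transport plans) with marginals μ and ν.

```latex
m_x^{\alpha}(z) =
\begin{cases}
\alpha, & z = x,\\[2pt]
\dfrac{1-\alpha}{\deg(x)}, & z \in \mathcal{N}(x),\\[2pt]
0, & \text{otherwise},
\end{cases}
\qquad
W_1(\mu,\nu) = \min_{\pi \in \Pi(\mu,\nu)} \sum_{u,v} \pi(u,v)\, d(u,v),
\qquad
\kappa(x,y) = 1 - \frac{W_1\!\left(m_x^{\alpha},\, m_y^{\alpha}\right)}{d(x,y)}.
```

Edges inside densely interconnected neighborhoods tend to have κ > 0, whereas bridge-like edges linking otherwise separate regions tend to have κ < 0. Because every κ(x, y) requires solving a transport linear program whose size grows with the neighborhoods of x and y, the cost of curvature-based edge weighting scales poorly with network size, which is the deployment concern raised above.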

3.4. Cross-Domain Technology Integration Methods

This category of research represents the most exploratory direction in traffic prediction. Rather than making incremental improvements to existing models, it integrates graph neural networks with techniques from other fields, or designs entirely new network architectures, to address specific bottlenecks or open up new applications.

One group of studies focuses on closing the loop between prediction and decision-making, as summarized in Table 6. The work of Xing et al. [80] is a prime example: their RL-GCN model integrates graph convolutional networks, LSTM, and reinforcement learning. The GCN and LSTM components perceive and predict traffic flow, while the reinforcement learning component derives an optimal traffic control strategy from the prediction results, moving from “seeing” to “decision-making” and providing a blueprint for truly intelligent traffic control systems. The work of Yang et al. [81] is a milestone in this direction. They developed a deep learning framework that integrates macroscopic traffic flow models, the core of which is the coupling of the cell transmission model (CTM), a classic macroscopic traffic flow model, with deep learning. The CTM mathematically describes how the traffic state propagates given initial and boundary conditions, while a spatiotemporal attention RNN predicts those boundary conditions. An extended Kalman filter then assimilates the predictions, enforcing consistency with vehicle conservation. This “theory-guided, data-driven” approach yields predictions that are not only accurate but also consistent with physical laws. With respect to model efficiency and practicality, Rajagopal et al. [82] proposed the MTH-QGNN traffic flow prediction model, which integrates hypercurvature embedding, meta-learning, quantum graph neural networks, and neural ordinary differential equations to improve spatiotemporal modeling capability and cross-city adaptability on large-scale traffic networks; experiments report high prediction accuracy and stability on the Los-loop and SZ-taxi datasets. An et al. [83] proposed IGAGCN, a spatiotemporal graph convolutional network based on information geometry and attention mechanisms, for traffic flow prediction in urban road networks. Targeting the difficulties caused by dynamic spatiotemporal characteristics and external environmental factors in real traffic systems, it characterizes differences between the data distributions of different sensors using information geometry and constructs a dynamic relationship matrix with an attention mechanism, thereby capturing spatiotemporal dependencies more effectively and improving both the model’s ability to express complex traffic dynamics and its predictive performance. Lv et al. [84] proposed TS-STNN (Tree Structure Spatiotemporal Neural Network), which extracts spatial information from the traffic network by constructing a spatial tree matrix with hierarchical and directional features and combines it with a GRU to model temporal dependencies; experiments show higher prediction accuracy than the baseline models across various scenarios. Another group of studies, of which the CTM-based framework of Yang et al. [81] is also representative, emphasizes combining physical models with data-driven approaches to strengthen the theoretical rationale for prediction. Abbas et al. [85], in turn, proposed the DFHITSSC framework, which leverages the complementary strengths of SVM and artificial neural networks through a decision-level fusion strategy enhanced by fuzzy inference.
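To make the role of the CTM concrete, the following Python sketch implements one update step of the textbook single-link Cell Transmission Model with a triangular fundamental diagram. It is not the formulation of Yang et al. [81]: the network topology, calibration, and extended Kalman filter of that work are not reproduced, and all parameter values are hypothetical. The `inflow` and `outflow_cap` arguments are the boundary conditions that, in the framework described above, a learned predictor would supply.

```python
import numpy as np


def ctm_step(k, dt, dx, v_free, w, q_max, k_jam, inflow, outflow_cap):
    """Advance a single-link Cell Transmission Model by one time step.

    k            : densities of cells 0..n-1 (veh/km)
    dt, dx       : time step (h) and cell length (km); stability needs v_free * dt <= dx
    v_free, w    : free-flow speed and backward wave speed (km/h)
    q_max, k_jam : capacity (veh/h) and jam density (veh/km)
    inflow       : upstream demand offered to cell 0 (veh/h), a boundary condition
    outflow_cap  : downstream supply available to the last cell (veh/h), a boundary condition
    """
    sending = np.minimum(v_free * k, q_max)          # how much each cell can send
    receiving = np.minimum(q_max, w * (k_jam - k))   # how much each cell can accept
    y = np.empty(len(k) + 1)                         # flows across the n + 1 cell boundaries
    y[0] = min(inflow, receiving[0])
    y[1:-1] = np.minimum(sending[:-1], receiving[1:])
    y[-1] = min(sending[-1], outflow_cap)
    # Vehicle conservation: each cell gains its inflow and loses its outflow.
    k_next = k + (dt / dx) * (y[:-1] - y[1:])
    return k_next, y


# Hypothetical setup: 4 cells of 0.5 km, 10 s steps, congestion building downstream.
k0 = np.array([10.0, 30.0, 60.0, 100.0])
k1, flows = ctm_step(k0, dt=10 / 3600, dx=0.5, v_free=100.0, w=20.0,
                     q_max=2000.0, k_jam=120.0, inflow=1500.0, outflow_cap=800.0)
print(np.round(k1, 2))  # -> [12.78 28.89 64.44 97.78]
```

Because each cell’s density changes only by the difference between its boundary flows, the number of vehicles on the link changes exactly by inflow minus outflow, which is the conservation property that the data-assimilation step is meant to preserve.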

Taken together, the cross-domain methods reviewed in this section address distinct spatial deployment contexts that earlier paradigms do not adequately serve. Lightweight architectures such as Light-ASTNN are specifically designed for resource-constrained edge deployments—roadside units, in-vehicle systems, or IoT-scale sensors—where strict memory and latency budgets make full-scale GNN stacks impractical. Tree-structure spatial neural networks are suited to hierarchically organized road networks (e.g., freeway-arterial-local road hierarchies) in which the directional and level-based structure of traffic flow provides natural tree-topological priors. Physics-integrated frameworks such as MTFD are most valuable in contexts where labeled data are scarce but physical relationships (e.g., conservation laws, fundamental diagram constraints) are well-established—including rural highway corridors and newly instrumented networks with limited training data. Finally, the meta-learning component of MTH-QGNN specifically targets the cross-city generalization problem: networks in cities without existing prediction infrastructure can leverage models pre-trained on data-rich cities, directly addressing the practical challenge of deployment in low-data spatial contexts. These context-specific advantages highlight that the choice among cross-domain methods should be guided not only by accuracy benchmarks but also by the structural, resource, and data characteristics of the target deployment environment.

4. Application of Intelligent Optimization and Hybrid Deep Learning in Traffic Flow Prediction

Like the cross-domain methods reviewed above, the hybrid approaches in this chapter move beyond incremental improvements to single models, integrating time-series models (and in some cases graph neural networks) with techniques from other fields, or restructuring network architectures to address long-standing bottlenecks and expand into new application scenarios. Against this backdrop, this chapter systematically reviews three types of LSTM-centered hybrid deep learning prediction paradigms that address the inherent non-stationarity and complexity of traffic flow data and form complementary technical paths.

The first type is the fusion of decomposition algorithms and LSTM, following a “decomposition-prediction-reconstruction” technical route. Signal processing techniques such as VMD and CEEMDAN decompose the original traffic sequence into several relatively stationary subsequences that are modeled separately, thereby reducing prediction difficulty and mitigating the impact of non-stationarity. However, the multi-stage pipeline is prone to error accumulation, incurs high computational cost, and is highly sensitive to decomposition parameters (such as the number of modes), which limits its suitability for real-time deployment.

The second type is the combination of heuristic optimization algorithms and LSTM. These methods use metaheuristic strategies such as dung beetle optimization, the whale optimization algorithm, and particle swarm optimization to search automatically for hyperparameter configurations, reducing reliance on manual tuning experience and improving model performance to some extent. However, because the search operates as a black box, the true source of any performance improvement is difficult to explain, and systematic head-to-head comparisons of different optimization algorithms on a unified public benchmark are scarce, weakening the verifiability of the methodology.

The third category is the fusion of attention mechanisms and LSTM. By introducing structures such as multi-head self-attention, bidirectional LSTM, and Transformer, the model can adaptively focus on more predictive spatiotemporal features, demonstrating outstanding performance in long-range dependency modeling. However, attention weights are not equivalent to causal attribution, and their “interpretability” claims still require careful evaluation in safety-critical traffic management scenarios.

Overall, these three paradigms expand the application boundaries of LSTM in traffic prediction from three dimensions: signal decomposition, parameter optimization, and feature selection. Together with cutting-edge exploratory research such as graph neural networks, they constitute an important trend in the evolution of traffic prediction technology from single-model optimization to multi-technology fusion, structural innovation, and system-level intelligence.

Before proceeding, it is worth clarifying the functional rationale underlying these recurring hybrid combinations, as the same structural pairings—decomposition with LSTM, optimization with LSTM, and attention with LSTM—appear repeatedly across studies. This convergence is not coincidental but reflects three distinct functional objectives that these combinations are designed to serve. Specifically, the fusion of decomposition algorithms (e.g., VMD, CEEMDAN) with LSTM primarily targets non-stationarity reduction: by transforming a non-stationary traffic sequence into quasi-stationary sub-components, the prediction task is simplified and the risk of model misspecification is reduced. The pairing of heuristic optimization algorithms with LSTM primarily addresses search space reduction for hyperparameter configuration: rather than navigating a high-dimensional, non-convex parameter space manually, metaheuristic strategies automate this search and reduce sensitivity to initialization. The integration of attention mechanisms with LSTM primarily serves to stabilize parameter estimation under long-range dependency conditions: attention selectively weights temporal features, mitigating the gradient decay that degrades standard LSTM performance on extended sequences. Recognizing these distinct functional objectives helps explain why these combinations emerge persistently and provides a more principled basis for method selection in practice.

4.1. Decomposition Algorithm Combined with LSTM Prediction

The primary functional objective of decomposition-LSTM hybrids is non-stationarity mitigation: signal processing techniques restructure the input sequence to reduce its distributional complexity before modeling. Many researchers now adopt the “decomposition-prediction-reconstruction” paradigm, which decomposes the non-stationary, nonlinear original traffic flow sequence into a series of relatively stationary subsequences through signal processing techniques, thereby reducing the difficulty of model learning. As illustrated in Figure 8, a typical implementation of this idea is a hybrid time-series prediction model that first applies STL decomposition to split the original data into trend, seasonal, and residual components. Different models then capture the distinct patterns of each component: LSTM for the trend, ARIMA for the seasonal pattern, and XGBoost for the residuals. Finally, the component predictions are integrated into the overall forecast. This “decomposition-prediction-integration” strategy leverages the strengths of each model, capturing linear trends, cyclical fluctuations, and complex nonlinear patterns in traffic data, thereby improving overall prediction accuracy and mitigating the challenges posed by non-stationarity. As shown in Table 7, Wang et al. [86] proposed an IHPO-VMD-LSTM-Informer model in which an improved Hunter–Prey Optimization algorithm adaptively determines key VMD parameters while NPCA reduces feature dimensionality, thereby extracting more informative traffic indicators. Vo et al. [87] further advanced this direction by integrating FVMD for signal decomposition, WOA for parameter optimization, and GA for model selection, so that each decomposed component is assigned the most suitable deep model (e.g., LSTM, BiLSTM, or GRU), which significantly improves accuracy and reduces inference time. Zhou et al. [88] employed CEEMD combined with a novel differencing operation to stabilize traffic data and applied Bayesian optimization to search for optimal LSTM hyperparameters, achieving strong performance on highly stochastic air traffic flow. Similarly, Zhao et al. [89] and Dai et al. [90] rely on improved heuristic optimizers (e.g., IDBO and an enhanced bat algorithm) to refine VMD decomposition and optimize LSTM parameters, thereby generating more physically meaningful subsequences and improving predictive capability. In general, by modeling decomposed components individually, these methods mitigate non-stationarity, enhance feature representation, and improve forecasting precision.
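The decomposition-prediction-reconstruction route can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not any cited model: a classical moving-average decomposition stands in for STL, and deliberately simple component forecasters (linear trend extrapolation, a repeated seasonal profile, and the residual mean) stand in for the LSTM, ARIMA, and XGBoost models of Figure 8. The function names and the synthetic series are hypothetical.

```python
import numpy as np


def decompose(y, period):
    """Classical additive decomposition y = trend + seasonal + residual (a simple stand-in for STL)."""
    y = np.asarray(y, dtype=float)
    pad = period // 2
    padded = np.pad(y, (pad, period - 1 - pad), mode="edge")  # edge padding keeps length n
    trend = np.convolve(padded, np.ones(period) / period, mode="valid")
    phase = np.arange(len(y)) % period
    detrended = y - trend
    profile = np.array([detrended[phase == p].mean() for p in range(period)])
    profile -= profile.mean()                                 # seasonal effect sums to zero per cycle
    seasonal = profile[phase]
    residual = y - trend - seasonal
    return trend, seasonal, residual, profile


def forecast(y, period, horizon):
    """Decompose, forecast each component separately, then reconstruct by summation."""
    trend, _, residual, profile = decompose(y, period)
    n, pad = len(trend), period // 2
    # Trend model (stand-in for an LSTM): linear fit on the last two cycles,
    # skipping the final half-window where edge padding biases the moving average.
    t_fit = np.arange(n - pad - 2 * period, n - pad)
    slope, intercept = np.polyfit(t_fit, trend[t_fit], deg=1)
    t_future = np.arange(n, n + horizon)
    trend_fc = slope * t_future + intercept
    seasonal_fc = profile[t_future % period]                  # stand-in for ARIMA: repeat the cycle
    residual_fc = np.full(horizon, residual.mean())           # stand-in for XGBoost: residual mean
    return trend_fc + seasonal_fc + residual_fc               # reconstruction step


# Synthetic hourly series: linear growth plus a daily cycle (illustrative only).
t = np.arange(240)
y = 0.5 * t + 10.0 * np.sin(2 * np.pi * t / 24)
print(np.round(forecast(y, period=24, horizon=6), 1))
```

Even in this toy version, the edge handling of the moving average leaks a small bias into the seasonal profile, a miniature example of the reconstruction-stage error accumulation discussed next.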

However, these methods still have significant shortcomings. First, the decomposition and model optimization processes are computationally expensive and lack real-time performance. Second, the reconstruction stage may introduce accumulated errors that affect overall prediction accuracy. Third, the models are highly sensitive to parameters (such as the number of VMD modes), which limits their generalization ability. Finally, multi-stage processing increases model complexity, reduces interpretability, and hinders deployment in real-world industrial environments. Beyond these engineering concerns, deeper methodological issues persist. VMD and CEEMD implicitly assume component-level stationarity, an assumption often violated under incident-driven or abruptly changing traffic conditions. The common practice of treating decomposed subsignals as independently predictable lacks rigorous theoretical justification: inter-component dependencies may be non-trivial, so modeling the components separately can introduce systematic bias. In addition, most decomposition-LSTM studies rely on a single fixed train–test split without cross-validation or out-of-distribution evaluation, risking inflated generalization claims. The exclusive use of RMSE and MAE further obscures asymmetric error costs in traffic management, where underestimating peak flow can be considerably more consequential than overestimating it. Future research should therefore develop computationally efficient, theoretically grounded, and robust decomposition–prediction–reconstruction frameworks, accompanied by more rigorous and context-aware evaluation protocols that ensure both predictive accuracy and practical reliability.
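One simple way to make evaluation sensitive to the asymmetric costs mentioned above is a weighted absolute error that charges under-prediction more than over-prediction. The Python sketch below is illustrative: the 2:1 cost ratio and the flow values are hypothetical, and in practice the ratio would come from the operational costs of the target application.

```python
import numpy as np


def asymmetric_mae(y_true, y_pred, under_weight=2.0, over_weight=1.0):
    """Weighted absolute error that charges under-prediction more than over-prediction.

    The 2:1 default ratio is a hypothetical illustration, not a calibrated cost.
    Up to a constant factor this equals the pinball (quantile) loss at
    tau = under_weight / (under_weight + over_weight).
    """
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    weights = np.where(err > 0, under_weight, over_weight)  # err > 0: forecast was too low
    return float(np.mean(weights * np.abs(err)))


observed = [1000.0, 1200.0]  # peak-period flows (veh/h), illustrative
too_low = [900.0, 1100.0]
too_high = [1100.0, 1300.0]
print(asymmetric_mae(observed, too_low), asymmetric_mae(observed, too_high))  # -> 200.0 100.0
```

Both forecasts have the same MAE (100 veh/h), so MAE alone cannot distinguish them, whereas the asymmetric score ranks the under-predicting forecast as worse.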

4.2. Heuristic Optimization Algorithm Combined with LSTM Prediction

The primary functional objective of optimization-LSTM hybrids is search space reduction: heuristic algorithms replace expert-driven manual tuning by efficiently exploring the hyperparameter configuration space. These studies address the reliance on manual tuning of LSTM hyperparameters (e.g., number of layers, number of neurons, learning rate) by employing heuristic algorithms to search automatically for optimal configurations. The implementation shown in Figure 9 uses the IVY optimization algorithm to select and train an LSTM model automatically. The process begins with data preprocessing and the definition of key hyperparameters, such as the number of neurons in a two-layer LSTM, the dropout rate, and the batch size, together with their search ranges. The IVY algorithm generates an initial population of candidate hyperparameter sets, each individual representing a unique combination. Through iterative evolution, each candidate LSTM model is evaluated using metrics such as the Root Mean Square Error (RMSE), the Mean Absolute Error (MAE), and the coefficient of determination ($R^{2}$), which guide the search toward a near-optimal configuration.
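The population-based search loop described above can be sketched as follows. Because the update rules of the IVY algorithm are not given here, the sketch swaps in standard particle swarm optimization (one of the heuristics named in this section) as a plainly labeled substitute. The objective `surrogate_val_rmse` is hypothetical: in real use, each evaluation would train an LSTM with the candidate hyperparameters (rounding integer ones such as the number of hidden units) and return its validation RMSE.

```python
import numpy as np


def pso_minimize(objective, lower, upper, n_particles=20, n_iters=50,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Global-best particle swarm optimization over a box-bounded search space."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))  # initial population
    vel = np.zeros_like(pos)
    p_best = pos.copy()
    p_best_val = np.array([objective(p) for p in pos])
    g_best = p_best[np.argmin(p_best_val)].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = (inertia * vel
               + c_personal * r1 * (p_best - pos)   # pull toward each particle's own best
               + c_global * r2 * (g_best - pos))    # pull toward the swarm's best
        pos = np.clip(pos + vel, lower, upper)      # keep candidates inside the search ranges
        vals = np.array([objective(p) for p in pos])
        improved = vals < p_best_val
        p_best[improved] = pos[improved]
        p_best_val[improved] = vals[improved]
        g_best = p_best[np.argmin(p_best_val)].copy()
    return g_best, float(p_best_val.min())


def surrogate_val_rmse(params):
    """Hypothetical stand-in for 'train an LSTM and return its validation RMSE'."""
    log10_lr, hidden_units = params
    return 10.0 + (log10_lr + 3.0) ** 2 + ((hidden_units - 64.0) / 32.0) ** 2


best, best_val = pso_minimize(surrogate_val_rmse, lower=[-5.0, 16.0], upper=[-1.0, 256.0])
print(best, best_val)
```

The optimizer itself introduces new settings (swarm size, inertia, acceleration coefficients), which is the “new parameter-setting demands” limitation noted below.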

Moreover, the workflow explicitly illustrates the internal structure of an LSTM unit, including the transmission of the input ($x_{t}$), cell state ($c$), and hidden state ($h$) across time steps, showing how temporal dependencies are captured and processed. Overall, such heuristic-driven automated tuning strategies reduce reliance on manual expertise and can enhance the predictive performance of LSTM models in time-series forecasting. As shown in Table 8, Dong et al. [91] introduced a novel dung beetle optimizer to tune LSTM hyperparameters and achieved high precision in maritime traffic forecasting, demonstrating its effectiveness in complex optimization tasks. Jardines et al. [92] applied LSTM to convective weather prediction in aviation, offering valuable decision support for air traffic flow management and highlighting the potential of LSTM in spatiotemporal forecasting. Guo et al. [93] proposed MVHS-LSTM, which dynamically selects the number of epochs and the learning rate through heuristic iteration and integrates ordinary least squares for feature selection, balancing accuracy and efficiency. Fu et al. [94] extended an LSTM-CNN model with Bayesian inference to quantify predictive uncertainty, a critical aspect of risk management in traffic systems. Cini and Aydin [95] developed a deep ensemble model that adaptively weights base learners according to their past performance rather than by simple averaging, yielding more responsive forecasts. In addition, Zhuang and Cao [96] used K-nearest neighbors for spatial filtering and BiLSTM for temporal modeling, which proved effective on UK highway traffic data. Vijayalakshmi et al. [97] used stacked LSTM autoencoders to compress weather features and combined BiLSTM and CNN to achieve accurate traffic prediction and congestion recognition in multivariate settings. Cao et al. [98] adopted whale optimization to tune LSTM parameters and applied multi-channel graph convolution to capture spatial dependencies, improving regional forecasting accuracy. Hussain et al. [99] demonstrated the suitability of deep networks for urban environments through a hybrid GRU–BiLSTM architecture, while Lan et al. [100] and Wang et al. [101] applied grey wolf optimization and particle swarm optimization, respectively, illustrating the broad applicability of different heuristics to LSTM tuning. Overall, these methods automate hyperparameter search, enhance model performance and robustness, and validate a wide range of optimization algorithms. However, they also suffer from high computational overhead, a risk of convergence to suboptimal solutions, new parameter-setting demands for the optimizers themselves, reduced interpretability of the tuning process, and increased engineering complexity in practical deployment. A methodologically distinct contribution to ensemble-based traffic prediction is the Improved Bayesian Combination Model with Deep Learning (IBCM-DL) proposed by Gu et al. [102]. Unlike heuristic search strategies that optimize the hyperparameters of a single model, IBCM-DL employs a principled Bayesian weighting mechanism to combine three heterogeneous sub-predictors, namely a gated recurrent unit neural network (GRUNN), an autoregressive integrated moving average model (ARIMA), and a radial basis function neural network (RBFNN), into a unified probabilistic forecasting framework. The Bayesian combination assigns posterior weights to each sub-model based on its predictive likelihood, dynamically reflecting each model’s relative reliability under varying traffic conditions. Empirical validation on highway traffic data from Beijing shows that this approach overcomes the error magnification phenomenon inherent in traditional fixed-weight combination schemes, yielding better accuracy and stability than individual deep learning models, classical machine learning methods, and naive ensemble averaging.
From a methodological standpoint, IBCM-DL represents an important bridge between classical Bayesian model averaging and modern deep learning ensembles: it provides theoretically grounded uncertainty quantification while retaining the flexibility of data-driven sub-models. However, the framework’s performance is sensitive to the initial selection and diversity of constituent sub-models, and extending the Bayesian weighting mechanism to accommodate more complex architectures such as Transformer-based encoders or graph neural networks remains an open challenge requiring further theoretical development.
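The core idea of likelihood-based combination weights can be illustrated with a short Python sketch. It assumes i.i.d. Gaussian forecast errors with a common scale `sigma` over a recent window; the exact likelihood and updating scheme of IBCM-DL are not specified here, so this is generic Bayesian model averaging rather than the method of Gu et al. [102], and all numbers are hypothetical.

```python
import numpy as np


def bayesian_combination_weights(errors, prior=None, sigma=1.0):
    """Posterior weights of candidate forecasters given their recent errors.

    errors: array (n_models, n_recent) of past forecast errors.
    Assumes i.i.d. Gaussian errors with a common, hypothetical scale sigma, so
    log p(errors | model) = -sum(e**2) / (2 * sigma**2) + const.
    """
    errors = np.asarray(errors, dtype=float)
    n_models = errors.shape[0]
    prior = np.full(n_models, 1.0 / n_models) if prior is None else np.asarray(prior, dtype=float)
    log_post = np.log(prior) - 0.5 * np.sum(errors ** 2, axis=1) / sigma ** 2
    log_post -= log_post.max()  # avoid underflow before exponentiating
    weights = np.exp(log_post)
    return weights / weights.sum()


def combine(forecasts, weights):
    """Weighted combination of per-model forecasts with shape (n_models, horizon)."""
    return np.asarray(weights, dtype=float) @ np.asarray(forecasts, dtype=float)


# Recent errors of three hypothetical sub-models (e.g., a GRU, an ARIMA, an RBF network).
recent_errors = [[2.0, -1.0, 1.5], [4.0, 3.5, -5.0], [1.0, 0.5, -0.5]]
weights = bayesian_combination_weights(recent_errors, sigma=3.0)
print(weights.round(3), combine([[410.0], [395.0], [405.0]], weights))
```

Recomputing the weights over a sliding window lets the combination shift toward whichever sub-model has recently been most reliable; the sensitivity to `sigma` and to the initial choice of sub-models mirrors the limitations noted above.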

Scientifically, the heuristic optimization paradigm as applied to LSTM hyperparameter tuning raises fundamental concerns about validity and reproducibility. Most studies treat hyperparameter optimization as a black-box search problem, reporting the best configuration found on a specific dataset without analyzing the sensitivity of model performance to hyperparameter perturbations. This makes it unclear whether the reported gains reflect genuine improvements in model architecture or merely fortuitous configurations tailored to specific data characteristics. Additionally, the no-free-lunch theorem implies that the superiority of any particular heuristic optimizer is dataset-dependent; yet comparative evaluations across optimizers on common traffic benchmarks are conspicuously absent from the literature. The absence of statistical significance testing—such as reporting confidence intervals or conducting Wilcoxon signed-rank tests over multiple runs—further undermines the scientific credibility of performance comparisons in this subfield.
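As a concrete example of the significance testing recommended above, the Python sketch below runs a two-sided exact Wilcoxon signed-rank test on paired RMSE values from repeated runs (e.g., different random seeds). In practice `scipy.stats.wilcoxon` provides this test; the pure-Python version only makes the computation explicit, and the RMSE values are hypothetical.

```python
from itertools import product


def wilcoxon_exact(x, y):
    """Two-sided exact Wilcoxon signed-rank test for paired samples (small n only)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank absolute differences, assigning average ranks to ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2
    observed = abs(w_plus - center)
    # Under H0 every sign pattern is equally likely: enumerate all 2^n of them.
    extreme = sum(
        1 for signs in product((0, 1), repeat=n)
        if abs(sum(r for r, s in zip(ranks, signs) if s) - center) >= observed - 1e-12
    )
    return w_plus, extreme / 2 ** n


rmse_new = [4.10, 4.32, 3.95, 4.01, 4.23, 4.41, 3.86, 4.07]   # hypothetical: 8 seeds
rmse_base = [4.62, 4.50, 4.38, 4.93, 4.27, 5.04, 4.71, 4.36]
print(wilcoxon_exact(rmse_new, rmse_base))  # -> (0, 0.0078125)
```

With eight runs the smallest attainable two-sided p-value is 2/256, so a study reporting a single run per configuration cannot establish significance at all with this test.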

4.3. LSTM Combined with Attention Mechanism for Prediction

The primary functional objective of attention-LSTM hybrids is feature selection stabilization: attention mechanisms direct the model’s capacity toward temporally or spatially informative features, reducing noise sensitivity and improving long-range dependency modeling. This line of research enhances traffic flow forecasting by integrating attention mechanisms into deep learning models, enabling them to prioritize key time steps or features automatically and thereby address long-term dependency issues. The implementation shown in Figure 10 is a hybrid prediction model combining a Transformer with a bidirectional LSTM (BiLSTM). The model takes a single decomposed IMF component as input, segmented by a sliding window, and first adds positional encoding to inject order information. The core multi-head attention module then captures important temporal dependencies, with a masking mechanism that prevents leakage of future information and residual connections that stabilize training. The attention-weighted features are subsequently fed into a BiLSTM layer for further sequential modeling, capturing both forward and backward dependencies within the input window. Finally, the processed features pass through a fully connected layer with dropout, producing the final prediction, which is compared with the true values. This “attention-enhanced sequential modeling” strategy exploits both global temporal correlations and bidirectional dependencies, improving the accuracy and robustness of traffic flow forecasting. Table 9 summarizes representative studies. A representative example is the hybrid model proposed by Aburasain [103], in which attention networks dynamically weight spatiotemporal features extracted by Bi-LSTM and CNN, allowing the model to focus on the most congestion-relevant information. Jia et al. [104] further extended attention to the temporal, spatial, and frequency domains, incorporating Transformer-based self-attention to learn frequency-domain representations and thus more comprehensive features. In contrast, Shuvro et al. [105] replaced LSTM entirely with a Transformer architecture, leveraging intrinsic self-attention to learn long-range spatiotemporal dependencies in parallel and embedding the predictions within an SDN-VANET framework for networked transportation. Song et al. [106] developed the TransFusion model, which applies Transformer-based attention at the fusion level to integrate the outputs of TCN and LSTM dynamically, allowing the model to decide which base predictor is more reliable under varying input conditions. Overall, attention mechanisms improve model expressiveness, offer some visual insight through weight visualization, and naturally support variable-length inputs. However, they also pose challenges: high computational and memory costs, particularly for self-attention; increased training complexity due to larger architectures; limited interpretability, as attention weights do not precisely reflect causal influence; and sensitivity to noisy data that may mislead attention allocation.
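The positional-encoding and masked-attention steps of the Figure 10 pipeline can be sketched in NumPy. This is a simplified, hypothetical illustration: it uses a single attention head with random projection weights, assumes each time step of the window has already been embedded into `d_model` features, and omits multi-head splitting, dropout, the BiLSTM, and the dense output layer. It shows that the causal mask makes each step’s output independent of later inputs.

```python
import numpy as np


def sinusoidal_positional_encoding(seq_len, d_model):
    """Standard Transformer encoding: sin on even dimensions, cos on odd dimensions."""
    pos = np.arange(seq_len)[:, None]
    dim = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (dim // 2)) / d_model)
    return np.where(dim % 2 == 0, np.sin(angle), np.cos(angle))


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention; step t attends only to steps <= t."""
    queries, keys, values = x @ w_q, x @ w_k, x @ w_v
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)  # True where the key lies in the future
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)      # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights


rng = np.random.default_rng(0)
seq_len, d_model = 6, 8  # hypothetical: a 6-step window embedded into 8 features
x = rng.normal(size=(seq_len, d_model)) + sinusoidal_positional_encoding(seq_len, d_model)
w_q, w_k, w_v = (0.1 * rng.normal(size=(d_model, d_model)) for _ in range(3))
attended, attn = causal_self_attention(x, w_q, w_k, w_v)
features = x + attended  # residual connection; a BiLSTM and dense head would follow
```

The row-normalized `attn` matrix is what attention visualizations display; as noted above, it describes how information is mixed, not which inputs causally drive the prediction.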

5. Discussion and Prospects

5.1. Discussion

This review systematically examines the latest advances in traffic flow prediction methods, with a particular focus on graph neural network-based approaches and hybrid deep learning frameworks. Through analysis of four main research categories, several important conclusions are drawn.

First, graph neural networks demonstrate significant advantages in capturing the inherent spatial dependencies of road networks. The evolution from static graph structures to dynamic adaptive methods represents a key advancement in time-varying traffic pattern modeling. Methods such as federated learning-based graph neural networks successfully address key challenges of data privacy and cross-regional collaboration, while dynamic graph construction techniques enable models to adapt to changing traffic conditions in real time.

Second, the fusion of multiple graph structures and attention mechanisms has proven effective in enhancing the expressive power of models. Multi-graph fusion methods can simultaneously capture different types of spatial relationships, such as physical connectivity, functional similarity, and flow correlation, thus providing a more comprehensive representation of traffic networks. Attention mechanisms further improve prediction accuracy by enabling models to selectively focus on the most relevant spatiotemporal features.

Third, hybrid methods combining decomposition algorithms, heuristic optimization, and attention mechanisms with deep learning models show promising application prospects in addressing the non-stationarity and complexity of traffic data. Decomposition-based methods effectively reduce prediction difficulty by transforming complex signals into more stable components, while optimization algorithms can automatically handle hyperparameter tuning and improve model robustness.

However, despite these advances, several challenges remain. Many state-of-the-art models suffer from high computational complexity, limiting their application in real-time traffic management systems. The “black box” nature of deep learning models raises concerns about their interpretability, which is crucial for practical deployment and decision-making. Furthermore, while most methods perform well on specific datasets, they lack sufficient validation across diverse traffic scenarios and geographical regions, raising questions about their generalization capabilities.

5.2. From Prediction to Operation: Bridging the Research-Practice Gap

A recurring limitation identified across the reviewed literature is the insufficient articulation of how prediction model outputs are translated into actionable traffic operational decisions. Most existing studies evaluate model performance exclusively in terms of predictive accuracy metrics—MAE, RMSE, and MAPE—without specifying how predicted variables are consumed by downstream control systems. This disconnect constrains the practical value of otherwise technically sophisticated models and represents a critical barrier to large-scale ITS deployment.

As illustrated in Figure 11, we propose a conceptual six-layer framework that explicitly maps the prediction-to-operation pipeline. At the core of this linkage, three categories of predicted state variables serve as direct inputs to operational decision systems. First, short-term flow (q), speed (v), and density (k) forecasts feed directly into adaptive signal control algorithms, where predicted flows, expressed relative to saturation flow, determine green phase allocation and cycle length adjustment. Second, travel time and congestion index predictions drive dynamic route guidance systems, informing variable message sign content and real-time navigation re-routing recommendations. Third, origin-destination demand forecasts support higher-level decisions including congestion pricing rate adjustment, transit fleet dispatching, and access restriction enforcement. The translation from predicted variables to control actions is not direct but mediated by a decision support interface comprising three functional components: threshold-based trigger logic (activating control responses when predicted states exceed operational thresholds), multi-objective optimization (balancing throughput, delay, emissions, and equity), and uncertainty quantification (propagating prediction confidence intervals into risk-aware control strategies). This intermediate layer is largely absent from current traffic prediction research, yet it is precisely where academic models must interface with real-world traffic management center infrastructure.
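A classical example of how predicted flows can be consumed by signal control is Webster’s method, which sets the cycle length from the sum of critical flow ratios and splits effective green in proportion to each phase’s ratio. The Python sketch below is illustrative only: it is not a method from the reviewed works, the cycle bounds are hypothetical, and the predicted flows would come from whatever forecaster feeds the control layer.

```python
def webster_timing(pred_flows, sat_flows, lost_time_s, c_min=30.0, c_max=150.0):
    """Webster cycle length and green splits from predicted critical-lane flows.

    pred_flows : predicted critical flow per phase (veh/h)
    sat_flows  : saturation flow per phase (veh/h)
    lost_time_s: total lost time per cycle (s)
    c_min/c_max: practical cycle bounds (s); the defaults are hypothetical
    """
    ratios = [q / s for q, s in zip(pred_flows, sat_flows)]  # critical flow ratios y_i
    y_total = sum(ratios)
    if y_total >= 1.0:
        raise ValueError("predicted demand exceeds capacity (Y >= 1); Webster's formula does not apply")
    cycle = (1.5 * lost_time_s + 5.0) / (1.0 - y_total)      # Webster's optimal cycle length
    cycle = min(max(cycle, c_min), c_max)
    effective_green = cycle - lost_time_s
    greens = [effective_green * r / y_total for r in ratios]  # split in proportion to y_i
    return cycle, greens


cycle, greens = webster_timing(pred_flows=[600.0, 450.0], sat_flows=[1800.0, 1800.0], lost_time_s=10.0)
print(round(cycle, 1), [round(g, 1) for g in greens])  # -> 48.0 [21.7, 16.3]
```

The explicit error when Y ≥ 1 is one concrete instance of the threshold-based trigger logic described above: a forecast of oversaturation should hand control to a different strategy rather than to the timing formula.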

Furthermore, the framework highlights the importance of a real-time feedback loop: observed traffic responses to control actions are continuously fed back into the data input layer, enabling online model updating and closed-loop system adaptation. This feedback mechanism is essential for maintaining prediction accuracy under non-stationary traffic conditions—particularly during incidents, special events, or demand shifts—and connects directly to the federated learning and dynamic graph structure paradigms reviewed in Section 3.1 and Section 3.2. In summary, future traffic flow prediction research should not treat operational integration as an afterthought. Model design choices—including prediction horizon, output granularity, uncertainty representation, and computational latency—should be explicitly aligned with the requirements of target control applications. Establishing standardized prediction-to-operation interfaces would not only improve the practical deployability of advanced models but also enable more ecologically valid evaluation protocols that assess system-level performance rather than isolated predictive accuracy.

5.3. Future Research Prospects

Future research on traffic flow prediction should prioritize several concrete and technically grounded directions rather than broad conceptual aspirations.

First, model interpretability requires systematic methodological advancement rather than general calls for explainable AI. Future work should explicitly integrate data-driven architectures with established traffic flow theories (e.g., fundamental diagram models and shockwave theory) to impose physics-informed constraints on learned representations. Instead of relying solely on post-hoc attention visualization, structural interpretability should be embedded into model design and validated through controlled perturbation experiments.
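A minimal way to embed a fundamental-diagram constraint is to add a penalty on predicted flow-density pairs that deviate from the diagram. The Python sketch below assumes the Greenshields model as the example diagram and a hypothetical penalty weight `lam`; a calibrated triangular or empirical diagram could be substituted.

```python
import numpy as np


def greenshields_flow(density, v_free, k_jam):
    """Greenshields fundamental diagram: q = v_f * k * (1 - k / k_jam)."""
    density = np.asarray(density, dtype=float)
    return v_free * density * (1.0 - density / k_jam)


def physics_informed_loss(q_pred, k_pred, q_obs, v_free, k_jam, lam=0.1):
    """Data-fit MSE plus a penalty for predicted (k, q) pairs that leave the fundamental diagram."""
    q_pred = np.asarray(q_pred, dtype=float)
    data_term = np.mean((q_pred - np.asarray(q_obs, dtype=float)) ** 2)
    physics_term = np.mean((q_pred - greenshields_flow(k_pred, v_free, k_jam)) ** 2)
    return float(data_term + lam * physics_term)


# A forecast that matches the observed flow but pairs it with an inconsistent density
# is still penalized (hypothetical parameters: 100 km/h free flow, 120 veh/km jam density).
print(physics_informed_loss(q_pred=[3000.0], k_pred=[30.0], q_obs=[3000.0], v_free=100.0, k_jam=120.0))
```

Because the penalty is computed from named physical quantities, a perturbation experiment can check directly whether a trained model respects the constraint, which is the kind of structural validation advocated here.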

Second, real-time efficiency must be addressed through measurable architectural simplification. Research should quantify the trade-off between prediction accuracy and latency by incorporating standardized runtime benchmarks. Techniques such as structured pruning, low-rank factorization, and knowledge distillation should be evaluated under realistic deployment constraints (e.g., edge computing nodes with limited memory and heterogeneous processing capacity), rather than solely reporting offline accuracy improvements.
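Low-rank factorization, one of the techniques listed above, replaces a dense m × n weight matrix with two factors of rank r, reducing parameters from mn to r(m + n). The NumPy sketch below uses truncated SVD on a synthetic matrix that is exactly rank 8; trained weights are generally not exactly low-rank, so the accuracy cost and the real latency gain must be measured on the target hardware, as argued above.

```python
import numpy as np


def low_rank_factorize(weight, rank):
    """Best rank-r approximation W ~ A @ B via truncated SVD (Eckart-Young)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # (m, r): left singular vectors scaled by singular values
    b = vt[:rank, :]            # (r, n)
    return a, b


def param_counts(shape, rank):
    """Parameters of the dense layer versus the two low-rank factors."""
    m, n = shape
    return m * n, rank * (m + n)


rng = np.random.default_rng(0)
w = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 128))  # synthetic matrix of rank 8
a, b = low_rank_factorize(w, rank=8)
print(param_counts(w.shape, rank=8))                        # -> (32768, 3072)
print(np.linalg.norm(w - a @ b) / np.linalg.norm(w))        # near zero only because w is exactly rank 8
```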

Third, privacy-preserving collaboration needs more rigorous protocol-level analysis. Within federated learning frameworks, differential privacy budgets, secure aggregation schemes, and communication costs should be explicitly reported and compared. Future studies should define clear threat models and evaluate the performance–privacy trade-off instead of treating privacy mechanisms as add-on components.
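
The protocol-level reasoning can be illustrated with the pairwise-masking idea behind secure aggregation: each pair of clients shares a random mask that one adds and the other subtracts, so the server learns only the sum of the updates. This is a simplified sketch assuming honest clients, no dropouts, and a single seeded generator standing in for pairwise key agreement; the function names are hypothetical. Masking hides individual updates but does not by itself provide a differential privacy guarantee, which would additionally require clipping and calibrated noise.

```python
import numpy as np


def pairwise_masks(num_clients: int, dim: int, seed: int = 0) -> np.ndarray:
    """For each client pair i < j draw a shared mask m_ij; client i adds it and
    client j subtracts it, so every mask cancels in the server-side sum.
    A single seeded generator stands in for pairwise key agreement (assumption)."""
    rng = np.random.default_rng(seed)
    masks = np.zeros((num_clients, dim))
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            m_ij = rng.normal(scale=10.0, size=dim)
            masks[i] += m_ij
            masks[j] -= m_ij
    return masks


def secure_federated_average(client_updates, seed: int = 0):
    """Average of client updates computed only from masked vectors."""
    updates = np.asarray(client_updates, dtype=float)
    num_clients, dim = updates.shape
    masked = updates + pairwise_masks(num_clients, dim, seed)
    # The server sees only `masked`; each row looks random on its own.
    average = masked.sum(axis=0) / num_clients
    return average, masked


updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
avg, masked = secure_federated_average(updates)
print(np.round(avg, 6))  # [3. 4.]
```

Each client still uploads a single vector of the model dimension, but the number of shared masks grows quadratically with the number of clients, which is one of the communication and computation costs worth reporting alongside accuracy.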

Fourth, cross-regional generalization should be validated through cross-city transfer experiments and out-of-distribution testing. Transfer learning and meta-learning approaches must demonstrate consistent performance under heterogeneous traffic regimes rather than relying on single-dataset evaluations. Similarly, multimodal data fusion (e.g., weather, events, travel demand signals) should be assessed through ablation studies to quantify incremental contributions and avoid over-parameterized fusion architectures.
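
The ablation protocol suggested above can be expressed as a leave-one-modality-out loop. Here `evaluate` is a hypothetical callback that trains and scores a model on a given subset of input modalities and returns an error such as MAE; the toy scoring function below is synthetic and only stands in for real experiments.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def leave_one_out_ablation(
    modalities: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Increase in error caused by removing each modality from the full input set.

    `evaluate` (hypothetical) trains and scores a model using only the given
    modalities and returns an error metric where lower is better (e.g. MAE).
    A delta near zero suggests the modality adds parameters without benefit.
    """
    full = frozenset(modalities)
    full_error = evaluate(full)
    return {m: evaluate(full - {m}) - full_error for m in sorted(full)}


# Synthetic stand-in for real experiments: each modality lowers a base error
# of 20.0 by a fixed, made-up amount.
TOY_CONTRIBUTION = {"flow": 10.0, "weather": 0.5, "events": 0.0}


def toy_evaluate(subset: FrozenSet[str]) -> float:
    return 20.0 - sum(TOY_CONTRIBUTION[m] for m in subset)


deltas = leave_one_out_ablation(TOY_CONTRIBUTION, toy_evaluate)
print(deltas)  # {'events': 0.0, 'flow': 10.0, 'weather': 0.5}
```

In a real study each call to `evaluate` is a full training run, so the loop costs one run per modality plus one for the full model; repeating it over several random seeds would show whether small deltas are distinguishable from noise.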

Fifth, uncertainty quantification and decision integration should move beyond point prediction. Probabilistic forecasting methods need calibration evaluation (e.g., reliability diagrams, coverage probability) and should be tested in downstream decision-making scenarios, such as congestion mitigation or signal control. Closed-loop reinforcement learning frameworks that combine prediction and control should report system-level metrics, including stability and safety, rather than isolated prediction gains.
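
Calibration can be checked with a few lines of code. The sketch computes the prediction interval coverage probability (PICP) and mean interval width, then sweeps nominal levels for Gaussian predictive distributions to produce the points of a reliability diagram. The Gaussian assumption and the helper names are illustrative; quantile-based or sample-based forecasts would supply their own interval bounds.

```python
from statistics import NormalDist

import numpy as np


def interval_coverage(y_true, lower, upper):
    """Prediction interval coverage probability (PICP) and mean interval width."""
    y, lo, hi = (np.asarray(a, dtype=float) for a in (y_true, lower, upper))
    covered = (y >= lo) & (y <= hi)
    return float(covered.mean()), float(np.mean(hi - lo))


def gaussian_reliability(y_true, mean, std, levels=(0.5, 0.8, 0.9, 0.95)):
    """(nominal, empirical) coverage pairs for central Gaussian intervals,
    i.e. the points of a reliability diagram for interval forecasts."""
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    points = []
    for level in levels:
        z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
        coverage, _ = interval_coverage(y_true, mean - z * std, mean + z * std)
        points.append((level, coverage))
    return points


# Synthetic check: observations drawn from N(0, 1).
rng = np.random.default_rng(0)
y = rng.normal(size=10_000)
calibrated = gaussian_reliability(y, np.zeros_like(y), np.ones_like(y))
overconfident = gaussian_reliability(y, np.zeros_like(y), 0.5 * np.ones_like(y))
```

A well-calibrated model yields points close to the diagonal (empirical coverage close to the nominal level), whereas intervals that are too narrow, as in the `overconfident` case, fall well below it.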

Finally, the field would benefit from standardized benchmarks and evaluation protocols, including unified dataset splits, client partitioning strategies for federated settings, consistent reporting of computational overhead, and reproducibility checklists. Without methodological standardization, comparative conclusions remain fragile.
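
Two of these ingredients, a unified chronological split and a reproducible client partition, are simple enough to publish as code alongside results. The sketch assumes a 7:1:2 train/validation/test ratio, one common choice rather than a mandated standard, and a seeded random partition of sensors into near-equal client groups; both helper names are hypothetical. The usage line mirrors the shape of PeMS04 in Table 2 (307 sensors, 59 days at 5 min intervals).

```python
import numpy as np


def chronological_split(num_steps: int, train: float = 0.7, val: float = 0.1):
    """Time-ordered train/validation/test indices; no shuffling, so future
    observations never leak into training. The 7:1:2 default is an assumption."""
    n_train = int(round(num_steps * train))
    n_val = int(round(num_steps * val))
    idx = np.arange(num_steps)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def partition_sensors(num_sensors: int, num_clients: int, seed: int = 0):
    """Hypothetical federated setting: a seeded random partition of sensor IDs
    into disjoint, near-equal client groups that others can reproduce exactly."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(num_sensors), num_clients)


train_idx, val_idx, test_idx = chronological_split(288 * 59)  # 59 days of 5-min steps
clients = partition_sensors(307, num_clients=4, seed=42)
print(len(train_idx), len(val_idx), len(test_idx))  # 11894 1699 3399
print([len(c) for c in clients])  # [77, 77, 77, 76]
```

Reporting the ratio, the seed, and the partition rule makes a federated result reproducible; a geographically contiguous partition would be a harder and arguably more realistic alternative to the random one shown here.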

In summary, future progress in traffic flow prediction depends less on speculative architectural expansion and more on theoretically grounded modeling, rigorous experimental design, reproducible benchmarking, and deployment-aware evaluation.

6. Conclusions

This paper provides a comprehensive and systematic review of traffic flow prediction methods, focusing on the latest advancements in graph neural network (GNN)-based approaches and hybrid deep learning frameworks. Through detailed analysis and classification of existing research, several important conclusions can be drawn.

GNNs have become powerful tools for traffic flow prediction because they can directly model the spatial dependencies of road networks. Our review reveals a clear evolutionary trajectory of GNNs from static, predefined graph structures to dynamic, adaptive, and learnable graph representations. Federated learning-based GNN methods address the key challenges of data privacy and cross-regional collaboration, enabling knowledge sharing without centralizing sensitive traffic data. Dynamic graph construction techniques adapt well to time-varying traffic patterns, while multi-graph fusion methods capture complex spatial relationships from multiple perspectives.

Hybrid deep learning frameworks that combine decomposition algorithms, heuristic optimization, and attention mechanisms with recurrent neural networks show great potential for addressing the inherent non-stationarity and complexity of traffic data. Decomposition-based methods make non-stationary traffic series easier to forecast by transforming them into more predictable components. Heuristic optimization algorithms can automate hyperparameter tuning, a highly challenging task, thereby improving model performance and practical applicability. Attention mechanisms enhance the expressive power of models by selectively focusing on the most relevant spatiotemporal features.

However, existing methods still face numerous challenges. High computational complexity limits their real-time deployment in large-scale networks. The lack of interpretability in deep learning models hinders their widespread adoption in safety-critical applications. The generalization ability of models across different traffic scenarios and geographical regions has not been fully validated. While privacy-preserving technologies hold great promise, further development is needed to balance practicality and security. These challenges underscore the need for continued research and innovation.

Our analysis of existing methods shows that no single approach dominates in all scenarios. The choice of an appropriate method depends on specific application requirements, including prediction horizon, data availability, computational resources, and privacy constraints. Practitioners should carefully weigh these advantages and disadvantages when designing traffic prediction systems.

In conclusion, despite the significant progress made by GNNs and hybrid deep learning methods in traffic flow prediction, there is still considerable room for development in this field. Future research should prioritize model interpretability, computational efficiency, robust privacy protection, and practical deployment capabilities. By addressing these challenges and leveraging emerging technologies, researchers can develop more efficient, reliable, and practical traffic prediction systems, thereby making meaningful contributions to the realization of intelligent transportation systems and sustainable urban development.

Author Contributions

Z.W. (Zhenhua Wang): Writing—original draft, Conceptualization, Writing—review & editing; X.W.: Conceptualization, Methodology; L.W.: Methodology, Data curation; Z.W. (Zheng Wu): Investigation, Formal analysis; J.H.: Writing—review & editing; Z.T.: Formal analysis, Visualization; F.Y.: Formal analysis, Writing—review & editing. All authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to express their sincere gratitude to the Fundamental Research Funds for the Central Universities Special Fund Project (Grant No. LGZD202606), the Youth Fund Project of Ministry of Education in China Humanities and Social Sciences Foundation (Grant No. 25YJC760104), the General Research Projects in Philosophy and Social Sciences of Colleges and Universities in Jiangsu Province (Grant No. 2025SJYB0084), the Research Project on Higher Education Teaching Reform of Jiangsu Association of Automation (Grant No. JSAAJG2025Y28), the General Project on Teaching Reform of Nanjing Police University (Grant No. YB26005), and the Harbin Xinguang Optic-electronics Technology Co., Ltd. Horizontal Research Project (Grant Nos. 2024320107003397 and 2025320107003049) for providing funds to support this study.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

This article is a review; no new data were created or analyzed in this study.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Dimitrakopoulos, G.; Demestichas, P. Intelligent transportation systems. IEEE Veh. Technol. Mag. 2010, 5, 77–84.
2. Lv, Y.; Duan, Y.; Kang, W.; Li, Z.; Wang, F.Y. Traffic flow prediction with big data: A deep learning approach. IEEE Trans. Intell. Transp. Syst. 2014, 16, 865–873.
3. Williams, B.M.; Hoel, L.A. Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results. J. Transp. Eng. 2003, 129, 664–672.
4. Ma, X.; Tao, Z.; Wang, Y.; Yu, H.; Wang, Y. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transp. Res. Part C Emerg. Technol. 2015, 54, 187–197.
5. Polson, N.G.; Sokolov, V.O. Deep learning for short-term traffic flow prediction. Transp. Res. Part C Emerg. Technol. 2017, 79, 1–17.
6. Vlahogianni, E.I.; Karlaftis, M.G.; Golias, J.C. Short-term traffic forecasting: Where we are and where we’re going. Transp. Res. Part C Emerg. Technol. 2014, 43, 3–19.
7. Smith, B.L.; Williams, B.M.; Oswald, R.K. Comparison of parametric and nonparametric models for traffic flow forecasting. Transp. Res. Part C Emerg. Technol. 2002, 10, 303–321.
8. Singh, B.; Gupta, A. Recent trends in intelligent transportation systems: A review. J. Transp. Lit. 2015, 9, 30–34.
9. Tedjopurnomo, D.A.; Bao, Z.; Zheng, B.; Choudhury, F.M.; Qin, A.K. A survey on modern deep neural network for traffic prediction: Trends, methods and challenges. IEEE Trans. Knowl. Data Eng. 2020, 34, 1544–1561.
10. Tian, Z.; Lin, Z.; Zhao, D.; Zhao, W.; Flynn, D.; Ansari, S.; Wei, C. Evaluating Scenario-Based Decision-Making for Interactive Autonomous Driving Using Rational Criteria: A Survey. IEEE Trans. Intell. Transp. Syst. 2025, 27, 1709–1730.
11. Liu, R.; Shin, S.Y. A Review of Traffic Flow Prediction Methods in Intelligent Transportation System Construction. Appl. Sci. 2025, 15, 3866.
12. Attioui, M.; Lahby, M. Congestion Forecasting Using Machine Learning Techniques: A Systematic Review. Future Transp. 2025, 5, 76.
13. Kong, X.; Chen, Z.; Liu, W.; Ning, K.; Zhang, L.; Muhammad Marier, S.; Liu, Y.; Chen, Y.; Xia, F. Deep learning for time series forecasting: A survey. Int. J. Mach. Learn. Cybern. 2025, 36, 5079–5112.
14. Carianni, A.; Gemma, A. Overview of Traffic Flow Forecasting Techniques. IEEE Open J. Intell. Transp. Syst. 2025, 6, 848–882.
15. Afandizadeh, S.; Abdolahi, S.; Mirzahossein, H. Deep Learning Algorithms for Traffic Forecasting: A Comprehensive Review and Comparison with Classical Ones. J. Adv. Transp. 2024, 2024, 9981657.
16. Gomes, B.; Coelho, J.; Aidos, H. A survey on traffic flow prediction and classification. Intell. Syst. Appl. 2023, 20, 200268.
17. Mystakidis, A.; Koukaras, P.; Tjortjis, C. Advances in Traffic Congestion Prediction: An Overview of Emerging Techniques and Methods. Smart Cities 2025, 8, 25.
18. Lee, E.H.; Lee, E. Congestion boundary approach for phase transitions in traffic flow. Transp. B Transp. Dyn. 2024, 12, 2379377.
19. Peng, G.; Wu, K.; Jiao, W.; Yang, S.; Wu, Z.; Xu, L.; Lu, C.; Tan, H.; Xia, D. Congestion transition in a heterogeneous ring road car-following model incorporating visual angle defect and speed limit effects. Chaos Solitons Fractals 2026, 204, 117710.
20. Kono, T.; Takamura, N. Road price and capacity policies subject to a fiscal constraint in a city. Transp. Res. Part B Methodol. 2026, 204, 103361.
21. Di Vaio, M.; Fiengo, G.; Petrillo, A.; Salvi, A.; Santini, S.; Tufo, M. Cooperative shock waves mitigation in mixed traffic flow environment. IEEE Trans. Intell. Transp. Syst. 2019, 20, 4339–4353.
22. Şahin, I.; Altun, I. Empirical study of behavioral theory of traffic flow: Analysis of recurrent bottleneck. Transp. Res. Rec. 2008, 2088, 109–116.
23. Jiao, Z.; Zhang, H.; Li, X. CNN2GNN: How to bridge CNN with GNN. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 9367–9374.
24. Fan, Y.; Cui, C.; Wang, Z.; Qi, H.; Tian, Z. Graph Anomaly Detection Algorithm Based on Multi-View Heterogeneity Resistant Network. Information 2025, 16, 985.
25. Amara, A.; Taieb, M.A.H.; Aouicha, M.B. A multi-view GNN-based network representation learning framework for recommendation systems. Neurocomputing 2025, 619, 129001.
26. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Message passing neural networks. In Machine Learning Meets Quantum Physics; Springer: Berlin/Heidelberg, Germany, 2020; pp. 199–214.
27. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
28. Wang, Z.; Chen, L.; He, J.; Yang, L.; Wang, F.Y. Exploring Latent Transferability of feature components. Pattern Recognit. 2025, 160, 111184.
29. Wang, Z.; Chen, L.; Xie, X.; Zhang, Y.; Cai, Y.; Ding, W. Teacher-Student Instance-level Adversarial Augmentation for Single Domain Generalized Medical Image Segmentation. IEEE Trans. Med. Imaging 2025, 45, 764–776.
30. Zeng, J.; Giese, T.J.; Zhang, D.; Wang, H.; York, D.M. DeePMD-GNN: A DeePMD-kit Plugin for External Graph Neural Network Potentials. J. Chem. Inf. Model. 2025, 65, 3154–3160.
31. Yang, Y.; Liu, Y.; Zhang, Y.; Shu, S.; Zheng, J. DEST-GNN: A double-explored spatio-temporal graph neural network for multi-site intra-hour PV power forecasting. Appl. Energy 2025, 378, 124744.
32. Wang, Z.; Wang, X.; Liu, F.; Gao, P.; Ni, Y. Adaptative balanced distribution for domain adaptation with strong alignment. IEEE Access 2021, 9, 100665–100676.
33. Graves, A.; Mohamed, A.r.; Hinton, G. Speech recognition with deep recurrent neural networks. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing; IEEE: New York, NY, USA, 2013; pp. 6645–6649.
34. Werbos, P.J. Backpropagation through time: What it does and how to do it. Proc. IEEE 1990, 78, 1550–1560.
35. Zheng, L.; Wang, X.; Li, F.; Mao, Z.; Tian, Z.; Peng, Y.; Yuan, F.; Yuan, C. A Mean-Field-Game-Integrated MPC-QP Framework for Collision-Free Multi-Vehicle Control. Drones 2025, 9, 375.
36. Peng, Y.; Yang, X.; Li, D.; Ma, Z.; Liu, Z.; Bai, X.; Mao, Z. Predicting flow status of a flexible rectifier using cognitive computing. Expert Syst. Appl. 2025, 264, 125878.
37. Kirkpatrick, S.; Gelatt, C.D., Jr.; Vecchi, M.P. Optimization by simulated annealing. Science 1983, 220, 671–680.
38. Sampson, J.R. Adaptation in natural and artificial systems (John H. Holland). SIAM Rev. 1976, 18, 529–530.
39. Fogel, L.; Owens, A.; Walsh, M. Artificial Intelligence Through; Wiley: New York, NY, USA, 1966.
40. Glover, F. Tabu search: A tutorial. Interfaces 1990, 20, 74–94.
41. Dorigo, M.; Gambardella, L. Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Trans. Evol. Comput. 1997, 1, 53–66.
42. Chen, J.; Zheng, L.; Hu, Y.; Wang, W.; Zhang, H.; Hu, X. Traffic flow matrix-based graph neural network with attention mechanism for traffic flow prediction. Inf. Fusion 2024, 104, 102146.
43. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv 2017, arXiv:1707.01926.
44. Varaiya, P. Freeway Performance Measurement System (PeMS), 4th ed.; UC Berkeley: Berkeley, CA, USA, 2004.
45. Tan, P.; Shi, W.; Shi, Z.; Wang, Y. LSTDNet: A Long-Range Spatio-Temporal Decoupling Network for Traffic Prediction. In Proceedings of the 2024 7th International Conference on Pattern Recognition and Artificial Intelligence (PRAI); IEEE: New York, NY, USA, 2024; pp. 899–904.
46. Pareek, P.K.; Al-Fatlawy, R.R.; Manasa, R.; Varma, P.R.K.; Kotla, N.R.D. Traffic Flow Prediction in Intelligent Transportation using Spatial-Temporal Graph Convolution Attention Module. In Proceedings of the 2024 First International Conference on Software, Systems and Information Technology (SSITCON); IEEE: New York, NY, USA, 2024; pp. 1–4.
47. Guo, S.; Lin, Y.; Feng, N.; Song, C.; Wan, H. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Menlo Park, CA, USA, 2019; Volume 33, pp. 922–929.
48. Hou, Y.; Zhang, D. Graph Neural Network-Enhanced Multivariate Time Series Forecasting with Series-Core Fusion. In Proceedings of the 2025 8th International Conference on Advanced Algorithms and Control Engineering (ICAACE); IEEE: New York, NY, USA, 2025; pp. 1850–1854.
49. Zhang, X.; Nihan, N.L.; Wang, Y. Improved dual-loop detection system for collecting real-time truck data. Transp. Res. Rec. 2005, 1917, 108–115.
50. Zhang, J.; Zheng, Y.; Qi, D. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Menlo Park, CA, USA, 2017; Volume 31.
51. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3848–3858.
52. Stoyanovich, J.; Gilbride, M.; Moffitt, V.Z. Zooming in on NYC taxi data with Portal. arXiv 2017, arXiv:1709.06176.
53. Yao, H.; Wu, F.; Ke, J.; Tang, X.; Jia, Y.; Lu, S.; Gong, P.; Ye, J.; Li, Z. Deep multi-view spatial-temporal network for taxi demand prediction. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Menlo Park, CA, USA, 2018; Volume 32.
54. Liu, X.; Xia, Y.; Liang, Y.; Hu, J.; Wang, Y.; Bai, L.; Huang, C.; Liu, Z.; Hooi, B.; Zimmermann, R. LargeST: A benchmark dataset for large-scale traffic forecasting. Adv. Neural Inf. Process. Syst. 2023, 36, 75354–75371.
55. Loder, A.; Ambühl, L.; Menendez, M.; Axhausen, K.W. Understanding traffic capacity of urban networks. Sci. Rep. 2019, 9, 16283.
56. Punzo, V.; Borzacchiello, M.T.; Ciuffo, B. On the assessment of vehicle trajectory data accuracy and application to the Next Generation SIMulation (NGSIM) program data. Transp. Res. Part C Emerg. Technol. 2011, 19, 1243–1262.
57. Yuan, J.; Zheng, Y.; Zhang, C.; Xie, W.; Xie, X.; Sun, G.; Huang, Y. T-Drive: Driving directions based on taxi trajectories. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems; Association for Computing Machinery: New York, NY, USA, 2010; pp. 99–108.
58. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. Proc. Mach. Learn. Res. 2017, 54, 1273–1282.
59. Feng, J.; Du, C.; Mu, Q. Traffic Flow Prediction Based on Federated Learning and Spatio-Temporal Graph Neural Networks. ISPRS Int. J. Geo-Inf. 2024, 13, 210.
60. Xia, M.; Jin, D.; Chen, J. Short-Term Traffic Flow Prediction Based on Graph Convolutional Networks and Federated Learning. IEEE Trans. Intell. Transp. Syst. 2023, 24, 1191–1203.
61. Wang, F.; Cao, Y.; Liu, L.; Kang, Q.; Chen, J. Federated Graph Neural Networks With Equivalent Hypergraph Construction for Traffic Flow Prediction. IEEE Trans. Knowl. Data Eng. 2025, 37, 6420–6435.
62. Liu, S.; He, M.; Wu, Z.; Lu, P.; Gu, W. Spatial-temporal graph neural network traffic prediction based load balancing with reinforcement learning in cellular networks. Inf. Fusion 2024, 103, 102079.
63. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Zhang, C. Graph WaveNet for deep spatial-temporal graph modeling. arXiv 2019, arXiv:1906.00121.
64. Bai, L.; Yao, L.; Li, C.; Wang, X.; Wang, C. Adaptive graph convolutional recurrent network for traffic forecasting. Adv. Neural Inf. Process. Syst. 2020, 33, 17804–17815.
65. Wu, S.; Hu, Y. Traffic Flow Prediction Based on Spatio-Temporal Aggregated Graph Neural Networks. Transp. Res. Rec. 2025, 2679, 573–588.
66. Gu, J.; Jia, Z.; Cai, T.; Song, X.; Mahmood, A. Dynamic Correlation Adjacency-Matrix-Based Graph Neural Networks for Traffic Flow Prediction. Sensors 2023, 23, 2897.
67. Ma, W.; Chu, Z.; Chen, H.; Li, M. Spatio-temporal envolutional graph neural network for traffic flow prediction in UAV-based urban traffic monitoring system. Sci. Rep. 2024, 14, 26800.
68. Hu, S.; Gu, J.; Li, S. Research on Urban Road Traffic Flow Prediction Based on Sa-Dynamic Graph Convolutional Neural Network. Mathematics 2025, 13, 416.
69. Jiang, M.; Liu, Z. Traffic Flow Prediction Based on Dynamic Graph Spatial-Temporal Neural Network. Mathematics 2023, 11, 2528.
70. Ye, Y.; Xiao, Y.; Zhou, Y.; Li, S.; Zang, Y.; Zhang, Y. Dynamic multi-graph neural network for traffic flow prediction incorporating traffic accidents. Expert Syst. Appl. 2023, 234, 121101.
71. Chen, F.; Sun, X.; Wang, Y.; Xu, Z.; Ma, W. Adaptive graph neural network for traffic flow prediction considering time variation. Expert Syst. Appl. 2024, 255, 124430.
72. Vrahatis, A.G.; Lazaros, K.; Kotsiantis, S. Graph attention networks: A comprehensive review of methods and applications. Future Internet 2024, 16, 318.
73. Ollivier, Y. Ricci curvature of Markov chains on metric spaces. J. Funct. Anal. 2009, 256, 810–864.
74. Wang, J.; Yang, S.; Gao, Y.; Wang, J.; Alfarraj, O. Road Network Traffic Flow Prediction Method Based on Graph Attention Networks. J. Circuits Syst. Comput. 2024, 33, 15.
75. Wang, Q.; Liu, W.; Wang, X.; Chen, X.; Chen, G.; Wu, Q. GMHANN: A Novel Traffic Flow Prediction Method for Transportation Management Based on Spatial-Temporal Graph Modeling. IEEE Trans. Intell. Transp. Syst. 2024, 25, 386–401.
76. Peng, H.; Du, B.; Liu, M.; Liu, M.; Ji, S.; Wang, S.; Zhang, X.; He, L. Dynamic graph convolutional network for long-term traffic flow prediction with reinforcement learning. Inf. Sci. 2021, 578, 401–416.
77. Cheng, X.; He, Y.; Zhang, P.; Kang, Y. Traffic Flow Prediction Based on Information Aggregation and Comprehensive Temporal-Spatial Synchronous Graph Neural Network. IEEE Access 2023, 11, 47469–47479.
78. Han, X.; Zhu, G.; Zhao, L.; Du, R.; Wang, Y.; Chen, Z.; Liu, Y.; He, S. Ollivier-Ricci Curvature Based Spatio-Temporal Graph Neural Networks for Traffic Flow Forecasting. Symmetry 2023, 15, 995.
79. Wang, B.; Gao, F.; Tong, L.; Zhang, Q.; Zhu, S. Channel attention-based spatial-temporal graph neural networks for traffic prediction. Data Technol. Appl. 2024, 58, 81–94.
80. Xing, H.; Chen, A.; Zhang, X. RL-GCN: Traffic flow prediction based on graph convolution and reinforcement for smart cities. Displays 2023, 80, 102513.
81. Yang, H.; Yu, W.; Zhang, G.; Du, L. Network-Wide Traffic Flow Dynamics Prediction Leveraging Macroscopic Traffic Flow Model and Deep Neural Networks. IEEE Trans. Intell. Transp. Syst. 2024, 25, 4443–4457.
82. Rajagopal, M.; Sivasakthivel, R.; Anitha, G.; Arunachalam, K.P.; Loganathan, K.; Abbas, M.; Kalathil, S.; Rao, K.S. An efficient intelligent transportation system for traffic flow prediction using meta-temporal hyperbolic quantum graph neural networks. Sci. Rep. 2025, 15, 27476.
83. An, J.; Guo, L.; Liu, W.; Fu, Z.; Ren, P.; Liu, X.; Li, T. IGAGCN: Information geometry and attention-based spatiotemporal graph convolutional networks for traffic flow prediction. Neural Netw. 2021, 143, 355–367.
84. Lv, Y.; Lv, Z.; Cheng, Z.; Zhu, Z.; Rashidi, T.H. TS-STNN: Spatial-temporal neural network based on tree structure for traffic flow prediction. Transp. Res. Part E Logist. Transp. Rev. 2023, 177, 103251.
85. Abbas, B.; Alyas, T.; Abbas, Q.; Alqahtany, S.S.; Alghamdi, T.; Aljohani, N.; Tabassum, N.; Ibrahim, A.M. A hybrid support vector machine and neural network model with fuzzy logic fusion for smart city traffic prediction. Sci. Rep. 2025, 15, 34758.
86. Wang, R.; Cao, Y.; Ji, X.; Qiao, D. A Prediction Method for Highway Traffic Flow Based on the IHPO-VMD-LSTM-Informer Model. Inf. Technol. Control 2025, 54, 380–395.
87. Vo, H.H.P.; Nguyen, T.M.; Bui, K.A.; Yoo, M. Traffic Flow Prediction in 5G-Enabled Intelligent Transportation Systems Using Parameter Optimization and Adaptive Model Selection. Sensors 2024, 24, 6529.
88. Zhou, R.; Qiu, S.; Li, M.; Meng, S.; Zhang, Q. Short-Term Air Traffic Flow Prediction Based on CEEMD-LSTM of Bayesian Optimization and Differential Processing. Electronics 2024, 13, 1896.
89. Zhao, Z.; Yuan, J.; Chen, L. Air Traffic Flow Management Delay Prediction Based on Feature Extraction and an Optimization Algorithm. Aerospace 2024, 11, 168.
90. Dai, G.; Tang, J.; Luo, W. Short-term traffic flow prediction: An ensemble machine learning approach. Alex. Eng. J. 2023, 74, 467–480.
91. Dong, Z.; Zhou, Y.; Bao, X. A Short-Term Vessel Traffic Flow Prediction Based on a DBO-LSTM Model. Sustainability 2024, 16, 5499.
92. Jardines, A.; Soler, M.; Garcia-Heras, J.; Ponzano, M.; Raynaud, L. Pre-tactical convection prediction for air traffic flow management using LSTM neural network. Meteorol. Appl. 2024, 31, e2215.
93. Guo, C.; Zhu, J.; Wang, X. MVHS-LSTM: The Comprehensive Traffic Flow Prediction Based on Improved LSTM via Multiple Variables Heuristic Selection. Appl. Sci. 2024, 14, 2959.
94. Fu, F.; Wang, D.; Sun, M.; Xie, R.; Cai, Z. Urban Traffic Flow Prediction Based on Bayesian Deep Learning Considering Optimal Aggregation Time Interval. Sustainability 2024, 16, 1818.
95. Cini, N.; Aydin, Z. A Deep Ensemble Approach for Long-Term Traffic Flow Prediction. Arab. J. Sci. Eng. 2024, 49, 12377–12392.
96. Zhuang, W.; Cao, Y. Short-Term Traffic Flow Prediction Based on a K-Nearest Neighbor and Bidirectional Long Short-Term Memory Model. Appl. Sci. 2023, 13, 2681.
97. Vijayalakshmi, B.; Ramya, T.; Ramar, K. Multivariate Congestion Prediction using Stacked LSTM Autoencoder based Bidirectional LSTM Model. KSII Trans. Internet Inf. Syst. 2023, 17, 216–238.
98. Cao, K.; Liu, Y.; Duan, L.; Xu, S.; Jung, H. Research on Regional Traffic Flow Prediction Based on MGCN-WOALSTM. IEEE Access 2023, 11, 126436–126446.
99. Hussain, A.H.A.; Taher, M.A.; Mahmood, O.A.; Hammadi, Y.I.I.; Alkanhel, R.; Muthanna, A.; Koucheryavy, A. Urban Traffic Flow Estimation System Based on Gated Recurrent Unit Deep Learning Methodology for Internet of Vehicles. IEEE Access 2023, 11, 58516–58531.
100. Lan, T.; Zhang, X.; Qu, D.; Yang, Y.; Chen, Y. Short-Term Traffic Flow Prediction Based on the Optimization Study of Initial Weights of the Attention Mechanism. Sustainability 2023, 15, 1374.
101. Wang, S.; Yu, Z.; Xu, G.; Zhao, F. Research on Tool Remaining Life Prediction Method Based on CNN-LSTM-PSO. IEEE Access 2023, 11, 80448–80464.
102. Gu, Y.; Lu, W.; Xu, X.; Qin, L.; Shao, Z.; Zhang, H. An Improved Bayesian Combination Model for Short-Term Traffic Prediction With Deep Learning. IEEE Trans. Intell. Transp. Syst. 2020, 21, 1332–1342.
103. Aburasain, R.Y. Enhanced congestion prediction of traffic flow using a hybrid attention-based deep learning model. PeerJ Comput. Sci. 2025, 11, e3224.
104. Jia, X.; Qu, J.; Lyu, Y.; Guo, M.; Zhang, J.; Guo, F. A Prediction-Based Anomaly Detection Method for Traffic Flow Data with Multi-Domain Feature Extraction. Appl. Sci. 2025, 15, 3234.
105. Shuvro, A.A.; Khan, M.S.; Rahman, M.; Hussain, F.; Moniruzzaman, M.; Hossen, M.S. Transformer Based Traffic Flow Forecasting in SDN-VANET. IEEE Access 2023, 11, 41816–41826.
106. Song, X.; Yang, D.; Wang, Y.; Wang, H. TransFusion Model Fusion Mechanism Based on Transformer for Traffic Flow Prediction. J. Database Manag. 2023, 34, 3.

Figure 1. Urban traffic flow diagram.

Figure 2. Architecture of Graph Convolutional Network.

Figure 3. LSTM Architecture Diagram.

Figure 4. Schematic architecture of the Transformer model.

Figure 5. Personalized spatiotemporal graph network model under federated learning platform.

Figure 6. Self-attention spatiotemporal prediction architecture fusing gated TCN and graph morphology network.

Figure 7. Simulated graph neural network attention prediction model based on RippleGNN.

Figure 8. Time series prediction framework combining STL with multi-model ensemble.

Figure 9. Adaptive parameter tuning process of LSTM neural network optimized based on IVY algorithm.

Figure 10. Hybrid prediction model architecture based on Transformer and BiLSTM.

Figure 11. Conceptual framework illustrating the linkage between traffic flow prediction outputs and traffic operational decision variables in Intelligent Transportation Systems.

Table 1. Summary of Review Literature on Traffic Flow Forecasting.

Authors	Year	Research Content	Advantages	Disadvantages
Liu et al. [11]	2025	Classifies traffic flow forecasting methods into statistical, machine learning, and deep learning categories; compares their principles, performance, and applications; highlights deep learning and hybrid models in short-term forecasting; identifies issues such as generalization and long-term prediction.	Clear classification; strong experimental comparison verifying deep learning advantages; well-organized datasets and metrics.	Limited discussion of optimized traditional models; insufficient analysis of hybrid models across different scenarios.
Attioui et al. [12]	2025	Conducts a PRISMA-based systematic review of ML methods for congestion forecasting (2010–2024); analyzes 115 studies; summarizes major methods, tasks, scenarios, and research gaps with emphasis on deep learning.	Large search scope; rigorous screening; comprehensive data extraction; clear identification of trends and gaps.	Focuses only on congestion forecasting; limited discussion of emerging paradigms such as reinforcement learning.
Kong et al. [13]	2025	Summarizes deep learning architectures for time-series forecasting (e.g., encoder–decoder, Transformer); reviews feature extraction approaches, datasets, challenges, and implications for traffic flow forecasting.	Innovative model taxonomy; strong feature extraction organization; broad dataset coverage; clear future directions.	Lacks detailed suitability analysis for traffic scenarios; limited discussion on model practicality.
Carianni et al. [14]	2025	Categorizes forecasting methods into naive, parametric, simulation, and nonparametric models; reviews historical development, theoretical foundations, and practical applications; highlights AI-driven advances and spatiotemporal modeling trends.	Comprehensive classification; deep analysis of 138 studies; strong identification of modeling trends.	Insufficient scenario-based comparison; limited discussion on emerging hybrid models.
Afandizadeh et al. [15]	2024	Reviews deep learning and classical models (LSTM, CNN, ARIMA, Kalman filter) used in traffic flow, speed, and congestion forecasting; analyzes data sources and model foundations.	Covers diverse forecasting scenarios; extensive literature scope; clear classification of data sources.	Limited scenario-specific analysis; lacks discussion of optimized versions of classical models.
Mystakidis et al. [17]	2025	Reviews statistical, machine learning, deep learning, and ensemble methods for congestion forecasting; summarizes forecasting horizons, input features, and data–model workflows.	Process-oriented review; covers multidimensional inputs; strong analysis of model applicability.	Limited discussion of ensemble fusion strategies; lacks cross-horizon performance comparisons.
Our survey	2025	Focuses on state-of-the-art traffic flow forecasting using graph neural networks and hybrid deep learning; summarizes four paradigms (federated learning, dynamic graph structures, multi-graph attention fusion, and cross-domain integration); further reviews decomposition–optimization–attention hybrid models and discusses challenges such as model interpretability, multimodal fusion, lightweight deployment, and robust generalization.	Innovative paradigm taxonomy; comprehensive coverage of GNN and hybrid models; emphasizes deployability, interpretability, and cross-domain integration; provides future-oriented research guidance.	Limited comparison with traditional and physical models; hybrid frameworks lack unified theoretical guidance; real-time deployment and missing-data robustness require further empirical validation.

Table 2. Comparative Summary of Benchmark Datasets Commonly Used in Traffic Flow Prediction Research.

Dataset	Region	Data Source	Sensors/Nodes	Time Span	Interval	Features	Primary Task
METR-LA [43]	Los Angeles, USA	Loop detectors	207	4 months (2012)	5 min	Speed	Speed prediction
PEMS-BAY [44]	San Francisco Bay, USA	Loop detectors	325	6 months (2017)	5 min	Speed	Speed prediction
PeMS03 [45]	California, USA	Loop detectors	358	91 days (2018)	5 min	Flow, Occup., Speed	Flow prediction
PeMS04 [46]	California, USA	Loop detectors	307	59 days (2018)	5 min	Flow, Occup., Speed	Flow prediction
PeMS07 [47]	California, USA	Loop detectors	883	98 days (2017)	5 min	Flow, Occup., Speed	Flow prediction
PeMS08 [48]	California, USA	Loop detectors	170	62 days (2016)	5 min	Flow, Occup., Speed	Flow prediction
Loop Seattle [49]	Seattle, USA	Loop detectors	323	∼1 year (2015)	5 min	Speed, Volume	Speed/flow prediction
TaxiBJ [50]	Beijing, China	Taxi GPS trajectories	N/A (grid)	4 periods (2013–2016)	30 min	Inflow, Outflow	Crowd flow prediction
SZ-Taxi [51]	Shenzhen, China	Taxi GPS trajectories	156 roads	1 month (2015)	15 min	Speed	Speed prediction
NYC-Taxi [52]	New York, USA	Taxi trip records	N/A (zone)	Multi-year (2009–2020)	60 min	Pickup/Dropoff demand	Demand forecasting
NYC-Bike [53]	New York, USA	Bike-sharing trip records	N/A (grid)	Multi-year	30 min	Inflow, Outflow	Crowd flow prediction
LargeST [54]	California, USA	Loop detectors (PeMS)	8600+	5 years (2017–2021)	5 min	Flow, Speed, Occup.	Large-scale benchmarking
UTD19 [55]	39 cities, global	Loop/Radar sensors	23,000+	∼1–3 years	Varies	Speed, Flow, Density	Multi-city benchmarking
NGSIM [56]	California, USA	Video-based vehicle tracking	∼2000 vehicles	Short clips (2005–2006)	0.1 s	Position, Speed, Accel.	Microscopic modeling
TDrive [57]	Beijing, China	Taxi GPS trajectories	10,357 taxis	1 week (2008)	—	GPS trajectory	Route/speed inference
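Most fixed-sensor datasets in Table 2 are sampled at 5 min intervals, and studies on METR-LA and PEMS-BAY often use the previous 12 steps (1 h) to predict the next 12 steps. The sketch below shows how such history/target pairs might be cut from one sensor's series; the function name `make_windows` and the default window lengths are illustrative assumptions, not settings prescribed by any dataset.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], n_in: int = 12, n_out: int = 12
                 ) -> List[Tuple[List[float], List[float]]]:
    """Cut a 1-D series into (history, future) pairs with stride 1.

    Hypothetical defaults: n_in=12 and n_out=12 correspond to 1 h of history
    and a 1 h horizon at a 5-min sampling interval.
    """
    if n_in <= 0 or n_out <= 0:
        raise ValueError("window lengths must be positive")
    pairs = []
    # The last valid start index leaves room for n_in inputs and n_out targets.
    for start in range(len(series) - n_in - n_out + 1):
        history = list(series[start:start + n_in])
        future = list(series[start + n_in:start + n_in + n_out])
        pairs.append((history, future))
    return pairs


# One day at 5-min resolution has 24 * 60 / 5 = 288 readings.
day = [float(i) for i in range(288)]
samples = make_windows(day)
print(len(samples))  # 265
```

In practice the same windows are taken across all sensors at once, and the data are split chronologically into training, validation, and test sets so that no future readings leak into training.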

Table 3. Category 1: Federated Learning and Privacy-Preserving Methods.

Authors	Research Method	Input Variables	Prediction Target	Advantages	Disadvantages
Feng et al. [59]	Federated Learning and Spatio-Temporal Graph Neural Network model, dividing the road network using DTW and K-means, and addressing heterogeneity via knowledge distillation.	Historical traffic flow	Traffic flow/speed (multi-step)	Effectively protects data privacy while handling model and objective heterogeneity in federated learning.	The two-stage partitioning and training process is complex, with high communication and computational costs.
Xia et al. [60]	Federated Community Graph Convolutional Network (FCGCN), combining community detection with federated learning for short-term traffic prediction.	Short-term traffic flow time series, graph adjacency	Short-term traffic flow (single/multi-step)	Reduces the communication costs and time consumption associated with centralized training while protecting data privacy.	Performance is sensitive to the quality of community detection, which affects subnetwork division.
Wang et al. [61]	Federated Graph Neural Network with Equivalent Hypergraph Construction (FGNNEH), transforming local networks into high-dimensional hypernodes and constructing a global hypergraph.	Local traffic graph node features, historical flow	Traffic flow (multi-step)	Effectively restores lost inter-client connections while preserving privacy, enabling cross-regional information exchange.	Hypergraph construction and iterative update mechanisms are very complex and computationally expensive.
Liu et al. [62]	Reinforcement Learning-based load balancing framework utilizing spatio-temporal graph neural network predictions, applied in cellular networks.	Cellular network traffic load	Network load/traffic volume (decision-oriented)	Applies traffic prediction to network load balancing, improving overall system performance and energy efficiency.	Relies on simulations; deployment in real-world complex network environments faces challenges.
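The methods in Table 3 share the federated setting, in which clients (for example, regional traffic authorities) train on local data and exchange only model parameters. As a generic illustration, and not the specific aggregation used by any work in the table, the sketch below implements the sample-weighted parameter averaging of federated averaging (FedAvg). Parameters are flattened into plain lists of floats for simplicity, and `fedavg` is a hypothetical helper name.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighting each by its sample count.

    Raw traffic data never leaves the clients; only these vectors are shared.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same model shape")
        weight = n / total
        for j, value in enumerate(params):
            global_params[j] += weight * value
    return global_params


# Three hypothetical regional clients holding different amounts of local data.
clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [100, 100, 200]
print(fedavg(clients, sizes))  # [3.5, 4.5]
```

Heterogeneity-aware schemes, such as the knowledge distillation in Feng et al. [59] or the hypergraph construction in Wang et al. [61], go beyond this plain weighted average.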

Table 4. Category 2: Dynamic and Adaptive Graph Structure Methods.

Authors	Research Method	Input Variables	Prediction Target	Advantages	Disadvantages
Wu et al. [65]	Spatio-temporal aggregated graph neural network, enhancing spatio-temporal correlations by generating temporal graphs and computing correlation coefficient matrices.	Historical traffic flow/speed	Traffic flow/speed (multi-step)	Capable of capturing relationships between non-adjacent spatial locations, compensating for the limitations of fixed spatial graphs.	Graph structure quality is highly dependent on the completeness and accuracy of input data.
Gu et al. [66]	Dynamic Correlation Graph Convolutional Network (DCGCN), dynamically constructing adjacency matrices from input data based on correlation coefficients.	Multivariate traffic time series	Traffic flow (multi-step)	Adaptively captures hidden spatial dependencies, independent of predefined graph structures.	Dynamic matrix computation increases model complexity and training cost.
Ma et al. [67]	Spatio-temporal evolutionary graph neural network, continuously updating the semantic adjacency matrix during training to adapt to dynamic traffic information.	Historical traffic flow, learnable graph structure	Traffic flow (multi-step)	Allows graph structure to evolve dynamically, better adapting to the complexity and variability of traffic patterns.	Frequent matrix updates require substantial computational resources.
Hu et al. [68]	Self-attention dynamic graph wave network (SA-DGWN), improved from Graph WaveNet by introducing an adaptive adjacency matrix.	Historical traffic speed/flow	Traffic speed/flow (multi-step)	Automatically fits spatio-temporal dependencies of the road network, reducing multiple error metrics.	The self-attention mechanism demands high computational resources.
Jiang et al. [69]	Dynamic Graph Spatial-Temporal Neural Network (DGSTN), capturing hidden and time-varying node relationships by constructing static topology maps and dynamic information maps.	Historical traffic flow, static road topology, dynamic flow similarity	Traffic flow (multi-step)	Simultaneously captures hidden node relationships and time-varying spatial correlations by combining a static topology map with a dynamic information map.	Dynamic graph construction relies on training data, potentially limiting generalization to new scenarios.
Ye et al. [70]	Dynamic Multi-graph Neural Network (DMGNN), featuring a dynamic graph adjustment module to update the adjacency matrix used in each training step.	Historical traffic flow, prior graphs, traffic accident event data	Traffic flow	Provides richer prior knowledge, with graph structure dynamically adjusted during training.	Model stability may be affected by sparse traffic incident data.
Chen et al. [71]	Time-based Adaptive Graph Neural Network (TAGNN), generating time-based adaptive graph dependency matrices via a graph learning module.	Historical traffic flow/speed, time-step-specific adaptive graph	Traffic flow/speed (multi-step)	Captures time-varying hidden spatial correlations, featuring an efficient temporal convolution module.	Learning process for adaptive matrices can be unstable, requiring fine-tuned hyperparameters.
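Several methods in Table 4 (e.g., Wu et al. [65] and Gu et al. [66]) derive the graph from correlation coefficients computed on the input data rather than from a fixed road topology. The sketch below shows one simple variant: Pearson correlation between every pair of sensors over the current input window, with weak or negative correlations removed by a threshold and each row normalized. The threshold value and the row normalization are illustrative assumptions; the cited models parameterize or learn this step in their own ways.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def correlation_adjacency(window: Sequence[Sequence[float]],
                          threshold: float = 0.5) -> List[List[float]]:
    """Build a row-normalized adjacency matrix from one input window.

    window[i] is the recent history of sensor i. Because the matrix is
    recomputed for every window, the graph changes as traffic changes.
    """
    n = len(window)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            r = 1.0 if i == j else pearson(window[i], window[j])
            adj[i][j] = r if r >= threshold else 0.0  # keep strong positive links only
    for row in adj:
        s = sum(row)  # the self-loop guarantees s >= 1
        for j in range(n):
            row[j] /= s
    return adj


# Sensors 0 and 1 rise together; sensor 2 moves in the opposite direction.
window = [[10, 20, 30, 40], [12, 22, 31, 43], [40, 30, 20, 10]]
adj = correlation_adjacency(window)
for row in adj:
    print([round(v, 2) for v in row])
```

Recomputing the matrix per window is what makes the structure dynamic; it also adds the per-step cost noted for DCGCN in the table, since the pairwise computation grows quadratically with the number of sensors.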

Table 5. Category 3: Multi-source Information Fusion and Advanced Attention Mechanism Methods.

Authors	Research Method	Input Variables	Prediction Target	Advantages	Disadvantages
Wang et al. [74]	Graph Attention Network, introducing an attention mechanism for weighted aggregation of node features.	Historical traffic flow, road network topology	Traffic flow (multi-step)	Extracts more representative node features, enhancing model robustness.	May over-focus on local features and neglect global structural information.
Cheng et al. [77]	Information aggregation and comprehensive spatio-temporal synchronous graph neural network, incorporating fusion feature attention, information aggregation, and multi-information combination modules.	Historical traffic flow/speed, spatial graph, temporal features	Traffic flow/speed (multi-step)	Synchronously extracts spatio-temporal dependencies and considers the influence of auxiliary traffic factors on the prediction target.	Complex module design leads to difficult parameter tuning.
Han et al. [78]	Spatio-temporal graph neural network based on Ollivier-Ricci curvature, utilizing optimal transport theory to compute edge curvature.	Traffic flow features, graph topology with curvature-weighted edges	Traffic flow (multi-step)	Leverages local topological constraints to guide feature propagation, enabling more complete capture of spatial dependencies.	Curvature calculation is computationally complex, limiting application in large-scale networks.
Wang et al. [79]	Knowledge Fusion Enhanced Graph Neural Network (KFGNN), constructing topological graphs incorporating multiple types of knowledge (e.g., network structure, regional functionality).	Road network topology, regional function attributes, historical traffic flow	Traffic flow (multi-step)	Mines implicit semantic relationships in traffic data, obtaining more complex semantic representations.	Knowledge fusion module design is complex and requires multi-source data support.
Wang et al. [75]	Channel Attention-based Spatio-temporal Graph Neural Network, employing channel attention mechanisms and Transformers.	Historical traffic flow/speed, channel features, spatial graph	Traffic flow/speed (multi-step)	Enhances the influence of proximate dependencies on decision-making and effectively captures long-term dependencies.	Large number of parameters results in relatively low computational efficiency.
Peng et al. [76]	Hybrid Spatio-Temporal Graph Neural Network with Attention Fusion (HSTGNN), integrating static, dynamic, and semantic spatial dependencies with a multi-scale gated temporal attention mechanism.	Historical traffic flow, static/dynamic/semantic graph features	Traffic flow (multi-step)	Comprehensive multi-scale feature fusion with strong adaptability, performing well across various traffic scenarios.	Highly complex model requires large amounts of data for training.
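Wang et al. [74] aggregate neighbor features with attention weights instead of fixed adjacency weights. The sketch below shows single-head, GAT-style aggregation with one-dimensional node features, where the attention score a^T[W h_i || W h_j] reduces to `a_self * z_i + a_nbr * z_j`. The parameter values `w`, `a_self`, and `a_nbr` are fixed, hypothetical numbers here; in a trained model they are learned, and features are vectors rather than scalars.

```python
import math
from typing import Dict, List, Sequence


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_aggregate(h: Sequence[float], neighbors: Dict[int, List[int]],
                  w: float = 1.0, a_self: float = 0.5, a_nbr: float = 0.5
                  ) -> List[float]:
    """Single-head graph attention with scalar node features.

    e_ij = LeakyReLU(a_self * z_i + a_nbr * z_j) with z = w * h,
    alpha_ij = softmax over j in N(i) (N(i) includes i itself),
    h'_i = sum_j alpha_ij * z_j.
    """
    z = [w * x for x in h]  # shared linear transform
    out = []
    for i in range(len(h)):
        nbrs = [i] + [j for j in neighbors.get(i, []) if j != i]
        scores = [leaky_relu(a_self * z[i] + a_nbr * z[j]) for j in nbrs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        out.append(sum(e / total * z[j] for e, j in zip(exps, nbrs)))
    return out


# A three-sensor corridor 0 - 1 - 2 with normalized flow readings.
h = [0.2, 0.8, 0.5]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
out = gat_aggregate(h, neighbors)
print([round(v, 3) for v in out])
```

Because the weights come from a softmax over each node's own neighborhood, every output is a convex combination of nearby readings, which also illustrates the local focus listed as a disadvantage of [74].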

Table 6. Category 4: Novel Architectures and Cross-Domain Technology Integration Methods.

Authors	Research Method	Input Variables	Prediction Target	Advantages	Disadvantages
Xing et al. [80]	Integration of Graph Convolution, LSTM, and Reinforcement Learning (RL-GCN) for traffic flow prediction and control in smart cities.	Historical traffic flow, road graph, LSTM temporal features	Traffic flow + optimal control strategy (closed-loop)	Enables not only prediction but also the development of optimal traffic control strategies, forming a perception-decision closed loop.	Reinforcement learning component training is unstable and time-consuming, hindering real-time application.
Yang et al. [81]	Macroscopic traffic flow model-integrated deep learning framework (MTFD), combining the CTM model, spatio-temporal attention RNN, and EKF.	Traffic state variables, boundary/initial conditions, road network data	Traffic flow/density (physics-constrained, multi-step)	Deeply integrates physical models with data-driven methods, ensuring predictions align with traffic flow theory.	Model integration is complex and heavily reliant on accurate initial and boundary conditions.
Rajagopal et al. [82]	Meta-Temporal Hyperbolic Quantum Graph Neural Network (MTH-QGNN), integrating hyperbolic embeddings, meta-learning, quantum graph neural networks, and neural ODEs.	Historical traffic flow, hierarchical road graph	Traffic flow (multi-step, cross-city generalization)	Highly innovative, enables fast adaptation to new cities and theoretically handles very large-scale networks.	Quantum computing aspect remains largely theoretical, lacking practical hardware support.
An et al. [83]	Information Geometry and Attention-based Spatio-Temporal Graph Convolutional Network (IGAGCN), utilizing information geometry to measure data distribution differences between sensors.	Historical traffic flow, sensor distribution features, information-geometric graph weights	Traffic flow (multi-step)	More finely captures dynamic spatial dependencies of traffic flow between different sensors.	Information geometry calculations are complex and require a strong mathematical background, raising the barrier to application.
Lv et al. [84]	Tree Structure-based Spatio-Temporal Neural Network (TS-STNN), constructing plane tree matrices with hierarchical and directional features to extract spatial information.	Historical traffic flow, hierarchical tree-structured spatial features	Traffic flow (multi-step)	Exploits the spatial hierarchy and directional information inherent in traffic flow data.	Tree structure construction depends on specific network topology, limiting generalization capability.
Abbas et al. [85]	FL-SVM-NN Hybrid with Fuzzy Logic (DFHITSSC)	Historical traffic flow/speed, weather indicators, road features	Traffic flow/speed (smart city, multi-step)	Explicit modeling of uncertainty using fuzzy rules enhances interpretability.	Membership function parameters depend on expert experience.
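Yang et al. [81] couple deep learning with the Cell Transmission Model (CTM), a discretized macroscopic model in which the flow between adjacent cells is the minimum of what the upstream cell can send and what the downstream cell can receive. The sketch below advances a single homogeneous road by CTM steps under a triangular fundamental diagram. All parameter values (free-flow speed `v`, congestion wave speed `w`, jam density `rho_jam`, cell length `dx`, time step `dt`) are hypothetical, and the attention RNN and EKF components of the MTFD framework are not shown.

```python
from typing import List


def ctm_step(rho: List[float], inflow: float, v: float, w: float,
             rho_jam: float, dx: float, dt: float) -> List[float]:
    """Advance cell densities (veh/km) by one CTM time step (units: km, h).

    Triangular fundamental diagram: capacity q_max = v * rho_c with
    critical density rho_c = w * rho_jam / (v + w).
    """
    if v * dt / dx > 1.0:
        raise ValueError("CFL condition violated: reduce dt or enlarge dx")
    q_max = v * w * rho_jam / (v + w)
    send = [min(v * r, q_max) for r in rho]              # demand of each cell
    recv = [min(q_max, w * (rho_jam - r)) for r in rho]  # supply of each cell
    # Boundary flows: upstream demand limited by cell 0's supply; free outflow downstream.
    flows = [min(inflow, recv[0])]
    flows += [min(send[i], recv[i + 1]) for i in range(len(rho) - 1)]
    flows.append(send[-1])
    # Conservation: density change = (flow in - flow out) * dt / dx.
    return [r + dt / dx * (flows[i] - flows[i + 1]) for i, r in enumerate(rho)]


# Hypothetical road: four 0.5 km cells, 10 s steps, a queue in cell 2.
params = dict(v=100.0, w=20.0, rho_jam=150.0, dx=0.5, dt=10 / 3600)
rho = [20.0, 20.0, 120.0, 20.0]
for _ in range(3):
    rho = ctm_step(rho, inflow=1500.0, **params)
print([round(r, 1) for r in rho])
```

The queue in cell 2 limits what cell 1 can discharge, so density builds up upstream; physics-constrained hybrids use this kind of conservation structure to keep data-driven predictions consistent with traffic flow theory.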

Table 7. Research on Decomposition Algorithm and LSTM-Based Traffic Flow Prediction.

Authors	Year	Research Method	Advantages	Limitations
Wang et al. [86]	2025	IHPO-VMD-LSTM-Informer	Combines NPCA dimensionality reduction with VMD decomposition to enhance feature representation capability	High model complexity and computational cost
Vo et al. [87]	2024	FVMD-WOA-GA-LSTM	Adaptive model selection with multiple models, high accuracy	Numerous parameters, complex training process
Zhou et al. [88]	2024	CEEMD-LSTM + Differential Processing + Bayesian Optimization	Data stationarity processing improves prediction stability	Sensitive to outliers
Zhao et al. [89]	2023	VMD-IDBO-LSTM	Improved DBO optimizes LSTM parameters, strong adaptability	Sensitive to initial parameters
Dai et al. [90]	2023	OVMD-L-BILSTM	Uses improved bat algorithm to optimize VMD, bidirectional LSTM captures temporal patterns	Complex model structure, long training time
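The works in Table 7 share a decompose, predict, and reconstruct pipeline: the raw series is split into smoother sub-series (via VMD, CEEMD, and similar methods), each sub-series is forecast separately by an LSTM or a variant, and the component forecasts are summed. To keep the sketch short and dependency-free, it substitutes a trailing moving-average trend/residual split for VMD and a naive last-value forecaster for the LSTM; both substitutions are assumptions made purely for illustration. What carries over is the additive structure: in this split the components reconstruct the series exactly, so the component forecasts can simply be added.

```python
from typing import Callable, List, Sequence


def moving_average_split(x: Sequence[float], k: int = 3) -> List[List[float]]:
    """Split x into [trend, residual] with a trailing moving average.

    Stand-in for VMD/CEEMD: trend + residual reproduces x exactly.
    The trailing window uses only past and current values (no look-ahead).
    """
    trend = []
    for t in range(len(x)):
        window = x[max(0, t - k + 1):t + 1]
        trend.append(sum(window) / len(window))
    residual = [a - b for a, b in zip(x, trend)]
    return [trend, residual]


def decompose_and_forecast(x: Sequence[float],
                           forecaster: Callable[[List[float]], float]) -> float:
    """Forecast each component separately, then sum the component forecasts."""
    return sum(forecaster(comp) for comp in moving_average_split(x))


def last_value(series: List[float]) -> float:
    # Placeholder for a per-component LSTM: persistence forecast.
    return series[-1]


flow = [120.0, 135.0, 150.0, 160.0, 158.0, 170.0]
parts = moving_average_split(flow)
print(max(abs(a + b - c) for a, b, c in zip(*parts, flow)))  # reconstruction error
print(decompose_and_forecast(flow, last_value))
```

The trailing window is a deliberate choice: decomposing the full series before the train/test split lets future values leak into the components, which can make decomposition-based hybrids look more accurate offline than they would be in deployment.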

Table 8. Research on Heuristic Optimization Algorithm and LSTM-Based Traffic Flow Prediction.

Authors	Year	Research Method	Advantages	Limitations
Dong et al. [91]	2024	DBO-LSTM	Uses dung beetle optimizer to optimize LSTM hyperparameters	Sensitive to initial population
Jardines et al. [92]	2024	LSTM + Numerical Weather Prediction	Incorporates meteorological data, suitable for air traffic	Strong dependence on data quality
Guo et al. [93]	2024	MVHS-LSTM + OLS	Heuristic feature selection with dynamic parameter adjustment	Complex feature selection process
Fu et al. [94]	2024	Bayesian LSTM-CNN + Optimal Aggregation Time Interval	Considers signal cycles, improves urban traffic prediction	Strong dependence on signal cycle information
Cini et al. [95]	2024	DEM (CNN + LSTM and GRU Ensemble)	Multi-model ensemble, flexible updates	Complex model structure
Zhuang et al. [96]	2023	KNN-BiLSTM	Combines KNN spatial screening with BiLSTM temporal modeling	Sensitive to number of neighbors
Vijayalakshmi et al. [97]	2023	SLSTM_AE-BiLSTM + CNN	Autoencoder dimensionality reduction, CNN for congestion detection	High model complexity
Cao et al. [98]	2023	MGCN-WOALSTM	Multi-channel graph convolution + WOA optimizes LSTM	Strong dependence on graph structure
Hussain et al. [99]	2023	BiLSTM + GRU Deep Network	Multi-layer GRU and BiLSTM combination, suitable for multiple intersections	Long training time
Lan et al. [100]	2023	GWO-Attention-LSTM	Uses grey wolf optimizer for attention mechanism initial weights	Time-consuming optimization process
Wang et al. [101]	2023	CNN-LSTM-PSO	Multi-sensor feature fusion, PSO optimizes hyperparameters	Complex engineering implementation
Gu et al. [102]	2020	IBCM-DL (ARIMA + GRUNN + RBFNN + Bayesian Combination)	The Bayesian framework automatically assigns weights to each sub-model, overcoming the error amplification problem of traditional combination methods.	Sub-model selection relies on prior experience, and the computational cost increases when the framework is extended to more complex deep networks.
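Several entries in Table 8 tune LSTM hyperparameters with population-based metaheuristics (DBO, WOA, GWO, PSO). They share a common loop: candidate hyperparameter vectors are scored by a validation error and moved toward good solutions. The sketch below implements basic global-best particle swarm optimization, the optimizer named by Wang et al. [101], on a stand-in objective. In a real pipeline `objective` would train an LSTM with the candidate hyperparameters and return its validation error, which is far more expensive. The coefficients, bounds, and the quadratic `fake_val_error` are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso(objective: Callable[[List[float]], float],
        bounds: Sequence[Tuple[float, float]], n_particles: int = 20,
        n_iters: int = 60, w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        seed: int = 0) -> Tuple[List[float], float]:
    """Minimize objective over a box with global-best particle swarm optimization."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Stand-in for an LSTM's validation error: minimum at lr = 10**-2, hidden = 64.
# (Integer hyperparameters such as hidden size would be rounded before training.)
def fake_val_error(p: List[float]) -> float:
    log_lr, hidden = p
    return (log_lr + 2.0) ** 2 + ((hidden - 64.0) / 32.0) ** 2


best, err = pso(fake_val_error, bounds=[(-4.0, -1.0), (16.0, 256.0)])
print(best, err)
```

Each objective call stands for one full training run, so the population size times the iteration count directly sets the tuning cost; this is the source of the "time-consuming optimization process" noted for several entries in the table.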

Table 9. Research on Attention Mechanism and LSTM-Based Traffic Flow Prediction.

Authors	Year	Research Method	Advantages	Limitations
Aburasain [103]	2025	CNN-BiLSTM-Attention	Combines CNN, BiLSTM with attention mechanism, multi-feature fusion	Large number of model parameters
Jia et al. [104]	2025	BiLSTM-GAT-Transformer-FFT	Multi-domain feature extraction, frequency domain learning enhancement	High computational complexity
Shuvro et al. [105]	2023	Transformer + SDN-VANET	Captures spatiotemporal features, suitable for VANET environments	High requirements for data quality
Song et al. [106]	2023	TransFusion (TCN, LSTM and Transformer)	Dynamically fuses multiple model outputs, adapts to data changes	Complex model structure
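Several models in Table 9 place an attention layer on top of a recurrent (LSTM or BiLSTM) encoder, so that the forecast draws on a weighted combination of all hidden states rather than only the last one. The sketch below shows that pooling step with scaled dot-product scores against a query vector. The hidden states and the query are made-up numbers standing in for LSTM outputs and a learned parameter, and the exact scoring function differs across the cited works.

```python
import math
from typing import List, Sequence, Tuple


def temporal_attention(hidden: Sequence[Sequence[float]], query: Sequence[float]
                       ) -> Tuple[List[float], List[float]]:
    """Pool T hidden states (each of size d) into one context vector.

    score_t = <h_t, q> / sqrt(d); alpha = softmax(score); context = sum_t alpha_t * h_t.
    """
    d = len(query)
    scores = [sum(a * b for a, b in zip(h, query)) / math.sqrt(d) for h in hidden]
    m = max(scores)  # stabilize the softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    context = [sum(alpha[t] * hidden[t][k] for t in range(len(hidden)))
               for k in range(d)]
    return alpha, context


# Four time steps of 2-dimensional "LSTM outputs"; later steps align best with the query.
hidden = [[0.1, 0.0], [0.2, 0.1], [0.4, 0.3], [0.9, 0.8]]
query = [1.0, 1.0]
alpha, context = temporal_attention(hidden, query)
print([round(a, 3) for a in alpha], [round(c, 3) for c in context])
```

The resulting context vector would then be passed to an output layer to produce the forecast, and the weights `alpha` offer a rough, per-time-step view of which parts of the history the model relied on.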

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
