1. Introduction
Traffic congestion is one of the most significant challenges confronting contemporary urban transportation systems, with extensive implications for economic productivity, environmental sustainability, and quality of life. According to the 2024 INRIX Global Traffic Scorecard [1], traffic congestion continues to impose considerable socioeconomic burdens, with drivers in major metropolitan areas losing more than 100 h annually to traffic delays. In the United States alone, the average driver lost 43 h to congestion in 2024, resulting in USD 771 in lost time and productivity per driver and USD 74 billion in cumulative economic losses.
Addressing traffic congestion also necessitates the promotion of sustainable mobility alternatives, particularly for short-distance urban trips. Research has demonstrated that bicycle usage is a viable alternative to motor vehicles for urban journeys, significantly reducing congestion while providing environmental and space-efficiency benefits. Macioszek and Jurdana [2] emphasized that bicycles not only reduce exhaust emissions and occupy minimal space in transport networks but also offer competitive travel times compared with private cars or public transport in urban areas. The integration of cycling infrastructure into traffic management systems is therefore a complementary consideration for congestion forecasting, as bicycle adoption can substantially influence overall traffic patterns and congestion levels, particularly in mixed-traffic urban environments.
The incorporation of artificial intelligence (AI) and machine learning (ML) technologies into intelligent transportation systems (ITSs) has emerged as a promising strategy for addressing these challenges. Over the past decade, the traffic congestion forecasting domain has experienced a significant transformation, transitioning from traditional statistical methods to advanced AI-driven approaches such as deep learning. This evolution has been notably accelerated by the advent of deep learning techniques and, more recently, by the integration of large language models (LLMs).
Researchers have increasingly acknowledged that traffic congestion is not merely a result of vehicular dynamics but also involves intricate multimodal interactions, including pedestrian movements, cyclist behavior, and their interactions with vehicular traffic at critical junctures such as intersections and pedestrian crossings. Recent studies have demonstrated that pedestrian crossing behaviors and traffic density interactions significantly influence urban traffic patterns, with machine learning approaches revealing causal relationships between traffic conditions and pedestrian waiting times [
3]. Similarly, adaptive signal control systems powered by deep reinforcement learning have shown substantial improvements in managing mixed traffic scenarios at signalized intersections [
4]. Mixed-traffic environments, where vehicles, pedestrians, and cyclists interact, present additional complexity to congestion forecasting models, necessitating sophisticated approaches that can capture these heterogeneous interactions [
5].
Although several reviews have examined specific aspects of traffic prediction [6,7,8], a significant gap persists in comprehensive analyses that trace the complete technological evolution from traditional machine learning to LLMs. Previous surveys have typically concentrated on either traditional methods or deep learning approaches in isolation [9] without exploring the transformative potential of LLMs in traffic prediction contexts.
The rapid advancement of large language model (LLM) technologies and their successful implementation across various domains suggest significant potential for predicting traffic congestion. However, the existing literature exhibits several gaps: it lacks a comprehensive analysis comparing the performance of LLMs with traditional machine learning (ML) and deep learning techniques in traffic prediction tasks, detailed performance benchmarks across different methodological approaches, explicit guidelines that help practitioners select appropriate technologies for specific use cases and constraints, and an investigation into the unique capabilities that LLMs offer for integrating multimodal traffic data.
1.1. Research Gaps
This systematic literature review addresses these gaps by offering a comprehensive analysis of traffic congestion forecasting methodologies from 2014 to 2024. Our review provides five distinct insights. First, we provide complete technological spectrum coverage through an extensive analysis of 100 peer-reviewed publications across three distinct periods: traditional ML (2014–2017, achieving 75–85% accuracy), deep learning (2018–2020, achieving 85–92% accuracy), and LLMs with hybrid approaches (2021–2024, achieving 90–95% accuracy), applying consistent evaluation criteria across all methodological epochs. Second, we present the first systematic LLM integration analysis in traffic congestion forecasting, examining specific mechanisms for textual data processing (traffic reports, social media, and weather descriptions), multimodal fusion architectures, performance in non-recurrent scenarios, transfer learning capabilities, and practical deployment guidelines. Third, we employed a structured PRISMA-based methodology with multiple analytical frameworks, ensuring rigorous PRISMA 2020 compliance (documented with a complete PRISMA 2020 checklist in Supplementary Table S1 and a flow diagram in Figure 1), utilizing the PICO framework for systematic search strategy design, and applying standardized quality assessment criteria to all 100 studies. Fourth, we developed comprehensive technical taxonomies and capability matrices, including detailed taxonomies of AI/ML methods, data types, and performance metrics, and a capability matrix that evaluates approaches across eight critical dimensions: spatial dependency modeling, temporal pattern recognition, multimodal data integration, transfer learning, real-time processing, interpretability, edge case handling, and computational efficiency. Fifth, we provide an evidence-based implementation roadmap with practical model selection frameworks based on seven operational constraints (computational budgets, latency requirements, data availability, interpretability demands, prediction horizons, congestion scenarios, and infrastructure types). This roadmap reveals that LLM-based approaches require 50–100× more computational resources than traditional ML methods while achieving 10–15% accuracy gains, and it offers actionable recommendations for real-world deployment.
1.2. Contributions
This systematic literature review makes five distinct contributions to the traffic congestion forecasting field.
1. Complete Technological Spectrum Coverage: We present the first comprehensive analysis encompassing three distinct technological eras (2014–2024), with a consistent evaluation across 100 peer-reviewed publications. Our temporal analysis examined traditional machine learning approaches (2014–2017, achieving 75–85% accuracy), deep learning methods (2018–2020, achieving 85–92% accuracy), and recent large language model-based and hybrid systems (2021–2024, achieving 90–95% accuracy). This complete spectrum enables researchers to comprehend not only the current state of the art but also the evolutionary trajectory and incremental improvements across methodological transitions.
2. Systematic LLM Integration Analysis: We present the inaugural comprehensive analysis of large language models (LLMs) specifically applied to traffic congestion forecasting. Our study elucidates the specific mechanisms by which LLMs process textual traffic data, including incident reports, social media feeds, weather descriptions, and traffic news; multimodal fusion architectures that integrate LLMs with traditional sensors and spatial–temporal models; performance characteristics in non-recurrent congestion scenarios where traditional models encounter difficulties; transfer learning capabilities that facilitate cross-city and cross-domain applications; and practical deployment considerations, encompassing computational requirements, latency constraints, and interpretability trade-offs.
3. Structured PRISMA-Based Methodology with Multiple Analytical Frameworks: We employ a rigorous systematic review methodology to ensure transparency and reproducibility, which includes full compliance with PRISMA 2020 as documented through a comprehensive checklist (
Supplementary Table S1) and a transparent flow diagram (
Figure 1); the application of the PICO framework for designing a systematic search strategy, specifying the population (traffic systems), intervention (AI/ML forecasting methods), comparison (methodological approaches), and outcomes (prediction accuracy, computational costs); a standardized quality assessment with explicit scoring criteria (publication quality 30%, methodological rigor 50%, reporting transparency 20%), achieving an inter-rater reliability of κ = 0.91; and a comprehensive data extraction protocol capturing over 25 variables per study, enabling multidimensional analysis.
4. Comprehensive Technical Taxonomies and Capability Matrices: We develop comprehensive classification systems to facilitate systematic comparison: (a) technical taxonomies that organize AI/ML methods, data types and sources, and performance metrics, employing hierarchical classification for detailed analysis; (b) a capability matrix that evaluates approaches across eight critical dimensions—spatial dependency modeling, temporal pattern recognition, multimodal data integration, transfer learning, real-time processing capability, interpretability, edge case handling, and computational efficiency, using evidence-based ratings (low/medium/high) derived from quantitative performance data; and (c) detailed justifications for each rating, linked to specific study findings and performance benchmarks.
5. Evidence-Based Implementation Roadmap: We offer actionable guidelines for practitioners, derived from empirical evidence across 100 studies: a model selection framework that accounts for seven operational constraints, including computational budget, latency requirements, data availability, interpretability needs, prediction horizon, congestion type, and infrastructure context; a quantitative performance–cost trade-off analysis indicating that LLM-based approaches necessitate 50–100× more computational resources while achieving a 10–15% improvement in accuracy compared to traditional methods; deployment recommendations tailored to three infrastructure scenarios—urban networks, highways, and mixed environments—with specific algorithm suggestions; data requirement specifications, encompassing minimum temporal resolution (5 min optimal), spatial coverage (network-level versus link-level), and multimodal integration strategies; and practical guidelines for addressing real-world challenges, including missing data handling, model adaptation, and scalability considerations.
This review evaluated 100 carefully selected publications according to the PRISMA guidelines for systematic reviews (
Figure 1). Our analysis focused on peer-reviewed journal articles and conference proceedings that presented original research on traffic congestion forecasting using AI/ML.
The remainder of this paper is organized as follows:
Section 2 presents a comprehensive literature review that establishes the context and background of this study.
Section 3 details our systematic review methodology, including the search strategies and the selection criteria.
Section 4 outlines the procedures for data extraction and analysis.
Section 5 offers a comparative analysis of various approaches.
Section 6 discusses our key findings across multiple dimensions and proposes directions for future research. Finally,
Section 7 concludes the paper with key takeaways and implications for the field.
3. Research Methodology
3.1. Systematic Review Protocol
This systematic literature review was conducted and documented in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement [
10,
37]. The PRISMA 2020 guidelines, which offer an evidence-based minimum set of items for reporting in systematic reviews, were adhered to throughout all phases of this review.
PRISMA Compliance Documentation
Protocol Registration: This review was not prospectively registered because it aimed to offer a comprehensive overview of the research landscape rather than assess the effects of an intervention. This approach is consistent with the PRISMA 2020 guidelines for scoping and methodology-focused systematic reviews in computer science.
3.2. Research Questions
Six research questions were identified, as shown in
Table 1. Each research question was accompanied by a justification statement explaining its relevance.
3.3. Search Strategy, Criteria and Selection Process
In accordance with our comprehensive review strategy across multiple databases, we established specific inclusion and exclusion criteria for the included studies [
38]. These criteria were essential for refining our search to ensure the relevance and rigor of the studies included. Specifically, we concentrated on research published between 2014 and 2024 that addressed traffic congestion forecasting or prediction, utilized machine learning, deep learning, or LLM approaches, and consisted of peer-reviewed journal articles and conference proceedings with clear methodological descriptions and performance evaluations. The exclusion criteria encompassed publications prior to 2014, studies lacking clear performance metrics, papers not focused on traffic congestion or flow prediction, reviews, surveys, or meta-analyses without original research, and publications in languages other than English.
Following the establishment of our inclusion and exclusion criteria, we meticulously applied them during the study selection process using a search query across various web-based databases.
Our search strategy employed the following Boolean query, applied to the title, abstract, and keywords fields, as these fields provide the most pertinent initial screening while ensuring comprehensive coverage of the literature. This approach balances sensitivity (capturing relevant studies) with precision (avoiding excessive irrelevant results), which is regarded as best practice for systematic reviews in computer science [
37]. A full-text search would have resulted in an excessive number of false positives, where congestion forecasting terms appeared only in peripheral sections or in the references.
(traffic AND (congestion OR flow OR volume OR density) AND
(prediction OR forecasting) AND (machine learning OR deep learning OR
neural networks) AND (LLM OR “large language model” OR transformer))
OR (intelligent transportation systems AND prediction) OR
(traffic data AND (analytics OR modeling))
The selection process consisted of four stages:
1. Identification: Initial search yielded 3847 records;
2. Screening: Title and abstract screening reduced to 512 relevant papers;
3. Eligibility: Full-text assessment excluded 412 studies;
4. Inclusion: Quality assessment resulted in 100 high-quality papers.
Search Strategy Rationale: Our search strategy incorporated terms related to large language models (LLMs), specifically “LLM OR large language model OR transformer,” which may theoretically omit certain studies predating 2020 that do not reference these terms. Nonetheless, several factors alleviate this concern: (1) the use of OR logic enables the inclusion of papers that align with traditional machine learning (ML) and deep learning (DL) terminology, even in the absence of LLM-specific terms, during the initial identification phase; (2) the extensive inclusion of traditional ML and DL terms (machine learning, deep learning, and neural networks) ensures the identification of literature predating 2020; (3) the term “transformer” encompasses papers from 2017 to 2020 that discuss attention mechanisms prior to the widespread adoption of LLM terminology; and (4) additional relevant studies were identified through supplementary backward citation tracking from key papers. To validate the comprehensiveness of our search, we conducted an additional search in October 2024, employing only traditional ML/DL terms without LLM-related keywords, specifically targeting the 2014–2020 period. This validation search yielded 2847 records, and cross-referencing revealed that 94% of our 2014–2020 papers (54 out of 57 papers) were captured by this alternative approach, thereby confirming the comprehensive coverage of the pre-LLM literature. The three papers uniquely identified by our primary search were early transformer papers (2020) that discussed attention mechanisms, which are pertinent to our analysis of technological evolution.
Quality assessment also revealed specific areas where future research could contribute to the field. Notably, we identified gaps in the clarity of research objectives and methodology, the appropriateness of the AI techniques employed, the validation approach and experimental design, the performance evaluation metrics, the acknowledgment of limitations, and the discussion of results in the context of the existing literature.
A systematic quality assessment framework was implemented for all studies that advanced to the full-text review. This assessment utilized a quantitative scoring system adapted from Kitchenham’s guidelines for systematic reviews in software engineering [
37] and specifically modified for the evaluation of AI/ML research. Each study was evaluated across three dimensions using explicit scoring rubrics (
Table 2).
Minimum Quality Threshold: Studies scoring below 60/100 points were excluded from the final analysis. This threshold ensured the inclusion of only high-quality, methodologically sound research with adequate reporting transparency. The final quality scores for the 100 included studies showed the following distribution:
High quality (80–100 points): 34 studies (34%);
Good quality (70–79 points): 41 studies (41%);
Acceptable quality (60–69 points): 25 studies (25%);
Mean quality score: 73.8 (SD = 8.4; range: 60–94).
This distribution confirms that all the included studies met the stringent quality standards, with 75% achieving good-to-high quality ratings.
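To make the scoring procedure concrete, the following minimal Python sketch illustrates how the weighted quality score and the 60-point inclusion threshold described above could be computed. The weights follow the stated 30/50/20 scheme; the dimension names and example sub-scores are hypothetical placeholders rather than values taken from Table 2.

```python
# Minimal sketch of the weighted quality scoring described above.
# Weights follow the stated scheme: publication quality 30%,
# methodological rigor 50%, reporting transparency 20%.
# The example sub-scores below are hypothetical, for illustration only.

WEIGHTS = {
    "publication_quality": 0.30,
    "methodological_rigor": 0.50,
    "reporting_transparency": 0.20,
}
INCLUSION_THRESHOLD = 60  # studies scoring below 60/100 were excluded


def quality_score(subscores: dict) -> float:
    """Combine per-dimension scores (each on a 0-100 scale) into a weighted total."""
    return sum(WEIGHTS[dim] * subscores[dim] for dim in WEIGHTS)


def is_included(subscores: dict) -> bool:
    return quality_score(subscores) >= INCLUSION_THRESHOLD


# Hypothetical example study
example = {
    "publication_quality": 80,
    "methodological_rigor": 70,
    "reporting_transparency": 65,
}
print(quality_score(example), is_included(example))  # 72.0 True
```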
3.4. Key Terms Combination
Following standard PICO practice, keywords were derived from the preceding research questions to formulate the key term combination for the search [
39]. Each question was deconstructed into four primary elements: population (P), intervention (I), comparison (C), and outcome (O). In
Table 3, each element is accompanied by a sentence that illustrates the core of the systematic literature review (SLR).
This PICO process has also been used in [40,41,42]; in this study, we applied it to traffic congestion prediction, focusing on the evolution from machine learning-based techniques to large language models.
Table 4 lists the words obtained for each PICO component.
We developed a comprehensive taxonomy of the key terms employed in the PICO process for traffic congestion forecasting research, as shown in
Table 5,
Table 6 and
Table 7. This taxonomy organizes terms based on their conceptual similarities and roles within the research domain and encompasses seven categories:
AI/ML Method Taxonomy: This hierarchically organized taxonomy ranges from traditional machine learning methods to deep learning, large language models (LLMs), and advanced techniques, illustrating the evolution of approaches over time.
Data Type Taxonomy: This categorizes the various types of data utilized in traffic forecasting, from structured sensor data to spatiotemporal information and contextual data such as weather and events.
Congestion Scenario Taxonomy: This classifies different congestion types by cause (recurrent versus non-recurrent) and time horizon (short-, medium-, and long-term prediction).
Performance Metrics Taxonomy: This groups the evaluation metrics used to assess the model performance, including both accuracy and error metrics.
Limitation and Challenge Taxonomy: This organizes the common limitations mentioned in the literature, ranging from data-related issues to performance-related constraints.
Implementation Considerations Taxonomy: This categorizes the practical aspects of model deployment, including resource requirements and real-world applicability of the model.
Methodological Evolution Taxonomy: This maps the progression of research paradigms and momentum in the field over the past decade.
4. Data Extraction
This taxonomy functioned as a structured vocabulary for systematically analyzing papers and organizing findings according to the PICO framework. It encapsulates the transition from traditional methods to LLMs while preserving the conceptual relationships between analogous terms. This classification facilitated our data analysis using a mixed-methods approach, encompassing quantitative analysis of performance metrics, qualitative assessment of methodologies, chronological analysis to discern trends, comparative analysis of various AI techniques, and thematic analysis of limitations and challenges.
4.1. PRISMA-Based Data Extraction
The data extraction process was meticulously designed to systematically address all pertinent PRISMA 2020 reporting guidelines. To ensure reliability and consistency, two reviewers independently extracted data from a 20% random sample of the included studies, achieving a Cohen's κ of 0.91, which indicates near-perfect agreement. Disagreements were resolved through discussion and consensus. The remaining studies were extracted by one reviewer, and random checks were conducted by a second reviewer to ensure quality.
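As an illustration of the inter-rater reliability calculation referenced above, the sketch below computes Cohen's κ from two reviewers' independent decisions using scikit-learn. The decision vectors shown are hypothetical; the review itself reports κ = 0.91 on a 20% random sample.

```python
# Minimal sketch: Cohen's kappa for two reviewers' independent decisions.
# The decision vectors are hypothetical; in the review, kappa was computed
# on a 20% random sample of the included studies (reported kappa = 0.91).
from sklearn.metrics import cohen_kappa_score

reviewer_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # 1 = agree/include, 0 = disagree/exclude
reviewer_b = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]

kappa = cohen_kappa_score(reviewer_a, reviewer_b)
print(f"Cohen's kappa: {kappa:.2f}")
```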
4.2. Publication Trends
The distribution of the 100 selected papers [11–14,16–23,25,26,34–36,43–124] across the study period (2014–2024) is illustrated in
Figure 2. There is a clear upward trend in publications, with a notable surge beginning in 2019–2020, which corresponds to an increased interest in deep learning applications for traffic analysis.
4.3. AI Techniques Distribution
The distribution of AI techniques across the selected papers, as shown in
Figure 3, reveals the predominance of machine learning and deep learning approaches, along with the recent emergence of LLM-based methods.
Figure 4, Figure 5 and Figure 6 show the detailed distribution of machine learning, deep learning, and LLM-related techniques used for traffic congestion forecasting in the selected studies.
The methodological landscape of the papers under consideration reveals a diverse array of approaches to studying this topic in the literature. Machine learning techniques, excluding large language models, constitute a substantial portion of the research methodologies. A smaller yet significant number of studies have focused exclusively on language models. Some studies have demonstrated a hybrid approach that integrates both machine learning and large language model (LLM) techniques. Notably, deep learning methodologies have emerged as the most prevalent approach in these studies. This distribution highlights the varied technological strategies employed by researchers in their studies, reflecting the dynamic nature of the field and the ongoing exploration of different computational techniques.
4.4. Performance Metrics
The assessment of machine learning models generally encompasses various performance metrics, with accuracy being the most commonly reported metric. Both general and specific error rates are frequently used to evaluate the model performance. For regression tasks, the Mean Absolute Error (MAE) and Root Mean Square Error (
RMSE) are widely utilized, offering insights into the average magnitude of prediction errors. The Mean Absolute Percentage Error (
MAPE) provides a percentage-based measure of prediction accuracy. In the context of classification problems, the Receiver Operating Characteristic (ROC) curve and its associated Area Under the Curve (AUC) are often used to assess model discrimination. Furthermore, precision, F1-score, and recall are employed, albeit less frequently, to provide a more nuanced understanding of model performance, particularly in scenarios involving imbalanced classification. The most commonly reported performance metrics across studies are shown in
Figure 7.
A critical challenge in systematic reviews of machine learning research is the heterogeneity of performance metrics and evaluation protocols across studies. This section details our approach to harmonizing the metrics and aggregating the performance ranges reported in this review. Studies in our corpus employed diverse accuracy metrics, including RMSE, MAE, MAPE, R², and classification accuracy. To enable a fair comparison, we applied a systematic conversion.
1. Accuracy Unification: For regression tasks (the most common case), we prioritized the Mean Absolute Percentage Error (MAPE) as the primary metric because it provides a scale-independent comparison. When studies reported only the RMSE or MAE, we converted these values to an approximate MAPE. This approximation assumes a normal error distribution and provides conservative estimates. Where both MAPE and RMSE/MAE were reported (68 of 100 studies), we validated this conversion, finding a correlation of r = 0.89 (p < 0.001) between the calculated and reported MAPE values.
2. Classification Metrics: For studies framing congestion as classification (22 of 100 studies), accuracy, F1-score, and AUC were used directly. To place these on a scale comparable with the regression metrics, we used the following equivalences:
- Accuracy > 90% ≈ MAPE < 10%;
- Accuracy 80–90% ≈ MAPE 10–20%;
- Accuracy < 80% ≈ MAPE > 20%.
These equivalencies were established by analyzing 15 studies that reported both classification and regression metrics.
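The sketch below illustrates the kind of harmonization described above: MAPE is computed directly where predictions are available, and classification accuracy is mapped onto the approximate MAPE bands listed above. The exact RMSE/MAE-to-MAPE conversion formula used in the review is not reproduced here; the example inputs are hypothetical.

```python
# Minimal sketch of the metric harmonization used for cross-study comparison.
# MAPE is computed directly where predictions are available; classification
# accuracy is mapped to the approximate MAPE bands stated above.
import numpy as np


def mape(y_true, y_pred) -> float:
    """Mean Absolute Percentage Error (%); assumes y_true contains no zeros."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


def accuracy_to_mape_band(accuracy_pct: float) -> str:
    """Map classification accuracy to the approximate MAPE band stated above."""
    if accuracy_pct > 90:
        return "MAPE < 10%"
    if accuracy_pct >= 80:
        return "MAPE 10-20%"
    return "MAPE > 20%"


# Hypothetical example (traffic flow in vehicles per 5 min interval)
y_true = [120.0, 90.0, 60.0, 150.0]
y_pred = [110.0, 95.0, 66.0, 140.0]
print(f"MAPE = {mape(y_true, y_pred):.1f}%")  # about 7.6%
print(accuracy_to_mape_band(87.0))            # 'MAPE 10-20%'
```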
Table 8 summarizes the performance results of various methodological approaches.
To ensure transparency and enable reproducibility, we systematically documented the characteristics of all the datasets employed in the reviewed studies.
Table 9 presents comprehensive metadata for the eight primary datasets, accounting for 93% of all studies, including temporal and spatial resolution specifications, sensor infrastructure details, and data collection timeframes.
4.5. Common Limitations
Researchers frequently encounter various challenges in their studies, with data-related issues being the most prevalent. Limitations in prediction accuracy and overall model performance also rank high among the reported obstacles. Less common but still noteworthy are constraints on computing power, concerns about efficiency, and difficulties in processing information in real time. Although less frequent, problems with data handling and accurate detection continue to affect some studies. These limitations collectively represent the diverse obstacles that researchers must navigate in this field. The most frequently cited limitations in the reviewed papers are presented in
Figure 8.
The primary challenges are as follows:
Data-related issues (66% of studies)
Prediction accuracy limitations (50%)
Performance constraints (36%)
Computational requirements (32%)
5. Comparative Analysis Across Model Types
5.1. Traditional Machine Learning vs. Deep Learning
In earlier research conducted between 2014 and 2018, traditional machine learning techniques, such as Support Vector Machines (SVMs), random forests, and regression models, were predominant. However, these methods have been supplanted by deep learning approaches (
Figure 9). The primary distinctions are as follows.
The advantages of traditional machine learning include reduced computational demands, enhanced interpretability, effectiveness with smaller datasets, expedited training times, and straightforward implementation.
The advantages of deep learning include superior accuracy in modeling complex traffic patterns, enhanced capability for managing spatiotemporal relationships, ability to automatically extract relevant features, increased robustness to noisy data, and improved performance in non-stationary traffic environments.
The shift from traditional machine learning to deep learning is evident in the chronological analysis, with a significant increase in the adoption of deep learning methodologies post-2018. Notably, Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) architectures have been predominantly utilized for time-series predictions.
5.2. Deep Learning vs. LLM Approaches
The advent of LLM-based methodologies in the domain of traffic congestion forecasting represents a recent development, with most studies pertaining to LLMs emerging post-2021. The comparative analysis yielded the following results:
The strengths of deep learning (non-LLM) in the context of traffic analysis include its well-established validation within the domain, generally lower computational requirements than LLMs, more accessible implementation for transportation researchers, and direct design for time-series prediction.
The strengths of large language model (LLM) approaches include their ability to incorporate contextual and semantic information, enhanced capacity to handle multimodal data such as text descriptions and sensor data, superior transfer learning capabilities derived from pretrained models, potential for integrating external knowledge, and increased robustness in the presence of missing data.
Large language model (LLM) methodologies exhibit significant potential in scenarios requiring the synthesis of varied data sources, particularly when contextual elements such as events, meteorological conditions, or traffic incidents exert a substantial impact on congestion patterns (
Figure 10).
5.3. Hybrid Approaches
Several studies (n = 4) have investigated hybrid methodologies that integrate traditional machine learning (ML), deep learning, and large language model (LLM) techniques. These hybrid models typically exhibit enhanced performance compared to approaches that utilize a single technique. Common configurations of these hybrid models include LSTM + CNN architectures for spatiotemporal data, Transformer + LSTM for the integration of sequential and attention mechanisms, BERT combined with traditional ML for feature extraction and prediction, and GCN + GRU for capturing both local and global spatial correlations.
These hybrid methodologies capitalize on the advantages of various techniques to effectively address the complex nature of traffic congestion.
5.4. Cross-Methodology Performance Analysis
Table 10 provides a comprehensive comparison of representative studies that used various methods.
5.5. Capability Matrix
Table 11 presents a comprehensive assessment of the capabilities of these methodologies.
5.6. Performance–Complexity Trade-Off Analysis
Figure 11 visualizes the relationship between model complexity and prediction accuracy. This scatter plot shows the relationship between prediction accuracy and computational cost for the different model types. Traditional ML achieves lower accuracy (75–85%) with minimal computational requirements (1–5× baseline). Deep learning achieves improved accuracy (85–92%) at moderate computational cost (10–25× baseline). LLMs achieve high accuracy (90–95%) but impose substantial computational demands (50–80× baseline). Hybrid models achieve the highest accuracy (90–95%) with variable computational requirements.
The computational costs were normalized relative to the baseline to allow for a fair comparison.
Baseline Reference: Support Vector Machine (SVM) with RBF kernel on the PeMS dataset (5 min resolution, 1-hour prediction horizon), implemented on a single CPU core (Intel Xeon E5-2680v4 @ 2.4 GHz).
Normalized Cost Metrics:
Training Time Ratio = (model training time) / (baseline SVM training time);
Inference Latency Ratio = (model inference latency) / (baseline SVM inference latency);
Memory Usage Ratio = (model memory usage) / (baseline SVM memory usage).
When studies used different hardware (GPUs), time-based costs were adjusted using standard benchmark ratios (e.g., NVIDIA V100 ≈ 20× single central processing unit (CPU) core for neural network training).
Cost Range Interpretation: Ranges such as “50–100×” for computational cost indicate the following:
- Lower bound (50×): simplest implementation, smaller model, single GPU;
- Upper bound (100×): complex architecture, larger model, distributed training;
- The range reflects real-world deployment variance based on implementation choices.
All computational costs in this review are reported relative to this normalized baseline, ensuring interpretable and consistent comparisons.
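As a worked illustration of the normalization just described, the following sketch computes training-time, inference-latency, and memory ratios relative to the SVM baseline, with a simple hardware adjustment when a study reports GPU rather than CPU time. All numeric values shown are hypothetical and not drawn from any reviewed study.

```python
# Minimal sketch of the computational-cost normalization relative to the
# SVM baseline (single CPU core). All numbers below are hypothetical.

GPU_TO_CPU_FACTOR = 20.0  # e.g., NVIDIA V100 ~ 20x a single CPU core for NN training

# Hypothetical baseline measurements for the SVM reference implementation
BASELINE = {"train_hours": 0.5, "latency_ms": 2.0, "memory_gb": 0.5}


def normalized_costs(train_hours: float, latency_ms: float, memory_gb: float,
                     measured_on_gpu: bool = False) -> dict:
    """Return cost ratios relative to the SVM baseline.

    Time-based costs measured on a GPU are first converted to
    CPU-equivalent time using a standard benchmark factor.
    """
    if measured_on_gpu:
        train_hours *= GPU_TO_CPU_FACTOR
        latency_ms *= GPU_TO_CPU_FACTOR
    return {
        "training_time_ratio": train_hours / BASELINE["train_hours"],
        "inference_latency_ratio": latency_ms / BASELINE["latency_ms"],
        "memory_usage_ratio": memory_gb / BASELINE["memory_gb"],
    }


# Hypothetical deep learning model trained on a single GPU
print(normalized_costs(train_hours=1.5, latency_ms=1.0, memory_gb=8.0,
                       measured_on_gpu=True))
# -> {'training_time_ratio': 60.0, 'inference_latency_ratio': 10.0, 'memory_usage_ratio': 16.0}
```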
Key insights:
Traditional ML offers best efficiency for simple scenarios;
Deep learning provides balanced performance–complexity ratio;
LLMs excel in accuracy but at significant computational cost;
Hybrid approaches offer flexibility but increase complexity.
5.7. Application Suitability Matrix
Based on our analysis, we developed recommendations for different application scenarios (
Table 12).
5.8. Detailed Comparison Tables
We present detailed comparison tables of the reviewed studies categorized by the primary methodological approach. For each paper, we included the methods used, datasets, limitations, performance metrics, and key findings to facilitate direct comparisons between studies.
Table 13 presents a sample of the 100 reviewed papers, selected to illustrate the diversity of approaches and evolution over the study period.
7. Discussion and Future Directions
7.1. Temporal Evolution of Methods
An examination of publications from 2014 to 2024 demonstrated discernible chronological advancements in the methodologies used to predict traffic congestion.
2014–2017: Traditional ML Dominance. This period was marked by the widespread use of traditional machine learning techniques, such as Support Vector Machines, Random Forests, and Bayesian methods. These approaches primarily focused on analyzing historical traffic data, with limited integration of external factors such as weather conditions.
2018–2020: Deep Learning Emergence. During this period, a notable transition occurred, with deep learning methodologies, particularly Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) architectures, emerging as the dominant techniques. These models effectively addressed the limitations of traditional machine learning in capturing intricate temporal and spatial dependencies in traffic data.
2021–2024: LLM Integration and Hybrid Models. Recent developments have seen the advent of approaches based on large language models (LLMs) and advanced hybrid models. These methodologies facilitate the integration of multimodal data sources and contextual information (
Figure 12), resulting in enhanced predictive accuracy, particularly for non-recurrent congestion events.
7.2. Performance Comparison
A comparative analysis of the performance metrics across various methodological approaches yielded several key insights, as presented in
Table 21. Regarding accuracy improvements, a general trend of increasing prediction accuracy was observed over the study period, with hybrid models demonstrating the highest performance, achieving 90–95% accuracy compared to 75–85% for traditional machine learning approaches. In terms of error reduction, recent approaches have achieved significant reductions in the Mean Absolute Error (MAE) and Root Mean Square Error (
RMSE) compared to earlier methods, with hybrid models reporting MAE values as low as 2–4% and
RMSE values of 3–5, representing substantial improvements over traditional methods. Regarding computational trade-offs, although advanced models, particularly those based on large language models (LLMs), exhibit superior accuracy, they entail significantly increased computational requirements, with training times extending to days and memory requirements ranging from 10 to 100 GB, posing implementation challenges for real-time applications such as autonomous driving. In terms of sensitivity, LLM and hybrid approaches have shown notably improved performance during special events, adverse weather conditions, and other non-recurrent congestion scenarios compared with traditional methods. Regarding the temporal horizon impact, deep learning and LLM approaches consistently outperform traditional machine learning for longer prediction horizons (>30 min), whereas the performance gap narrows for very short-term predictions.
7.3. Domain-Specific Considerations
Various domain-specific factors influence the efficacy of these methods. In the context of urban versus highway environments, Convolutional Neural Network (CNN)- and Graph Neural Network (GNN)-based approaches demonstrate particular effectiveness in complex urban networks, whereas Long Short-Term Memory (LSTM) models frequently excel in highway traffic prediction. Regions with limited sensor infrastructure can derive substantial benefits from the transfer learning capabilities of large language model (LLM)-based approaches. Regarding computational resources, implementation contexts with constrained computational capacities may find optimized traditional machine learning (ML) approaches more advantageous despite their relatively lower accuracy. In terms of integration requirements (
Figure 13), scenarios necessitating the fusion of heterogeneous data sources, such as traffic sensors, weather data, events, and social media, were most effectively addressed by the LLM and hybrid approaches.
7.4. Specific Considerations for LLM vs. ML Comparison
7.4.1. Contextual Understanding
A significant advantage of large language model (LLM)-based approaches over traditional machine learning (ML) and basic deep learning methods is their enhanced ability to comprehend context. This advantage is evident in several domains. In event impact modeling, LLMs can incorporate information about special events, roadwork, or incidents by processing textual descriptions and relating them to the traffic patterns. Regarding semantic interpretation, LLMs can interpret the semantic meaning of traffic-related texts, allowing news reports, social media, and other textual sources to be used as input. In terms of transfer learning capabilities, the application of LLM-based approaches has substantially improved transfer learning within this domain, enabling models to leverage the pretrained knowledge of various factors influencing traffic patterns, thereby reducing the need for extensive domain-specific training data sets. Regarding edge case handling, the review highlights a significant improvement in managing edge cases and non-recurrent congestion events with the implementation of LLM-based methodologies, effectively addressing the critical limitations of previous approaches. In the temporal context, LLMs exhibit a superior ability to comprehend temporal references (such as holiday periods, weekends, and rush hours) with greater nuance than traditional models do.
7.4.2. Technical Implementation Challenges
Despite their advantages, techniques based on large language models (LLMs) face significant practical challenges compared to traditional machine learning approaches. LLMs require substantial computational resources for both training and inference, which limits their applicability in real-time scenarios. In the context of domain adaptation, it is crucial to employ domain-specific data augmentation to effectively fine-tune general-purpose LLMs for traffic-related applications. Regarding interpretability, the predictions generated by LLMs are less interpretable than those produced by simpler models, raising concerns regarding stakeholder confidence and system validation. For data integration, the incorporation of numerical sensor data with textual information for LLM processing necessitates the use of sophisticated data fusion algorithms.
7.4.3. Performance in Edge Cases
The comparative analysis identified significant differences in the handling of traffic prediction edge cases between the machine learning (ML) and large language model (LLM) approaches. In instances of non-recurrent congestion, such as accidents, severe weather, and special events, LLMs and hybrid models exhibit superior performances. Regarding data sparsity, LLM-based approaches demonstrate greater robustness to missing data and sparse sensor coverage than traditional ML methods. Regarding the prediction horizon, whereas traditional ML models experience rapidly degrading performance beyond short-term predictions, LLM and hybrid approaches maintain better accuracy for long-term predictions. In terms of transferability, LLM-based models show enhanced cross-location transferability and require less location-specific training data than traditional models do.
7.5. Hybrid Approaches: Taxonomy and Characteristics
Hybrid approaches that combine multiple methodological paradigms achieve the highest performance levels (
Table 22). We identified four distinct types of hybrid architectures based on the integration mechanisms:
Type 1: Ensemble Hybrids. Ensemble methodologies integrate predictions from multiple independent models using mechanisms such as weighted averaging, stacking, and voting. The architectural characteristics are as follows:
- Multiple models trained independently on identical or varied datasets;
- Integration at the prediction stage via meta-learning or straightforward aggregation;
- Example: the combination of Random Forest, XGBoost, and LSTM through stacked generalization;
- Performance improvement of 3–7% compared to individual models;
- Computational cost: the sum of the individual model costs plus minimal aggregation overhead.
Type 2: Physics-Informed Hybrids. These integrate data-driven ML/DL models with physical traffic flow models or constraints:
- Neural networks that incorporate traffic flow equations as soft constraints;
- Deep learning with physics-based loss functions ensuring adherence to conservation laws;
- Example: a GNN constrained by macroscopic traffic flow dynamics (the LWR model);
- Performance gain: 5–10%, especially in data-scarce scenarios;
- Additional benefit: improved generalization and physical plausibility.
Type 3: Pipeline Hybrids. This category involves sequential processing, in which the output of one model serves as the input to the next. It includes multistage architectures with distinct models dedicated to different subtasks. A common configuration is as follows: feature extraction is performed using Convolutional Neural Networks (CNNs) or Graph Neural Networks (GNNs), followed by temporal modeling with Long Short-Term Memory (LSTM) networks and, finally, refinement through attention mechanisms. For instance, a CNN can extract spatial patterns, an LSTM can model the temporal evolution, and attention mechanisms can refine the predictions (a minimal sketch of this configuration follows below). This approach yields a performance improvement of 6–12% owing to the specialized processing at each stage. Additionally, each stage can be independently optimized.
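As a minimal sketch of the pipeline configuration just described (CNN feature extraction, LSTM temporal modeling, attention refinement), the following PyTorch-style module illustrates one way such a Type 3 hybrid could be wired. All layer sizes and names are hypothetical and are not taken from any reviewed study.

```python
# Minimal PyTorch sketch of a Type 3 "pipeline" hybrid:
# CNN spatial feature extraction -> LSTM temporal modeling -> attention refinement.
# Layer sizes are hypothetical; this is illustrative, not a reviewed architecture.
import torch
import torch.nn as nn


class PipelineHybrid(nn.Module):
    def __init__(self, n_sensors: int = 32, hidden: int = 64, horizon: int = 12):
        super().__init__()
        # Spatial feature extraction over the sensor dimension at each time step
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Temporal modeling over the sequence of per-step spatial features
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        # Attention refinement over the LSTM outputs
        self.attn = nn.MultiheadAttention(embed_dim=hidden, num_heads=4, batch_first=True)
        self.head = nn.Linear(hidden, horizon)  # predict the next `horizon` steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time_steps, n_sensors) speed or flow readings
        b, t, s = x.shape
        feats = self.cnn(x.reshape(b * t, 1, s)).squeeze(-1).reshape(b, t, 32)
        seq, _ = self.lstm(feats)
        refined, _ = self.attn(seq, seq, seq)
        return self.head(refined[:, -1])  # (batch, horizon)


model = PipelineHybrid()
demo = torch.randn(8, 24, 32)  # hypothetical batch: 8 samples, 24 steps, 32 sensors
print(model(demo).shape)       # torch.Size([8, 12])
```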
Type 4: Multimodal Hybrids. Diverse data modalities are integrated using distinct model architectures:
- A large language model (LLM) processes textual data, including incident reports, weather descriptions, and social media content;
- Graph Neural Networks (GNNs) or Long Short-Term Memory (LSTM) networks process sensor time-series data;
- A fusion module combines the multimodal embeddings;
- Example: BERT for text processing, Graph Convolutional Networks (GCNs) for spatial–temporal sensor data, and multimodal attention fusion (see the sketch after this list);
- Performance improvements range from 10 to 15%, particularly in non-recurrent congestion scenarios;
- This approach uniquely leverages complementary information from heterogeneous sources.
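The following compact sketch illustrates the late-fusion idea behind Type 4 hybrids: a precomputed text embedding (for example, from a BERT-style encoder, assumed here to be 768-dimensional) is fused with a sensor-sequence embedding before prediction. All dimensions and module names are assumptions for illustration, not a reproduction of any reviewed architecture.

```python
# Minimal sketch of a Type 4 multimodal fusion head: a precomputed text
# embedding (e.g., from BERT, assumed 768-d) is fused with a sensor-sequence
# embedding before prediction. Dimensions are hypothetical.
import torch
import torch.nn as nn


class MultimodalFusionHead(nn.Module):
    def __init__(self, text_dim: int = 768, sensor_dim: int = 64, horizon: int = 12):
        super().__init__()
        self.sensor_enc = nn.GRU(input_size=32, hidden_size=sensor_dim, batch_first=True)
        self.text_proj = nn.Linear(text_dim, sensor_dim)
        self.fusion = nn.Sequential(
            nn.Linear(2 * sensor_dim, sensor_dim), nn.ReLU(),
            nn.Linear(sensor_dim, horizon),
        )

    def forward(self, sensors: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # sensors: (batch, time_steps, 32 detectors); text_emb: (batch, 768)
        _, h = self.sensor_enc(sensors)               # h: (1, batch, sensor_dim)
        fused = torch.cat([h[-1], self.text_proj(text_emb)], dim=-1)
        return self.fusion(fused)                     # (batch, horizon)


head = MultimodalFusionHead()
out = head(torch.randn(4, 24, 32), torch.randn(4, 768))  # hypothetical inputs
print(out.shape)  # torch.Size([4, 12])
```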
Performance–Cost Trade-offs Across Hybrid Types:
Design Principles for Hybrid Approaches: Our analysis identified three fundamental principles for the effective design of hybrid models.
1. Complementarity: The combined models should address distinct aspects, such as spatial versus temporal, data-driven versus physics-based, and numeric versus textual dimensions.
2. Proportional Complexity: The performance improvements should justify the additional computational costs and the complexity of implementation.
3. Proper Integration: The fusion mechanism is critical to performance, with learned fusion (e.g., attention mechanisms) outperforming simple averaging by 3–5%.
Future hybrid architectures are anticipated to prioritize the efficient integration of multimodal data, with particular emphasis on the combination of large language models (LLMs) and Graph Neural Networks (GNNs) to effectively leverage both textual context and network structure.
7.6. Micro-Scale Analysis: Intersection and Crossing-Level Predictions
While the majority of reviewed studies focused on network- or corridor-level predictions (78 of 100 studies), emerging research addresses micro-scale phenomena at intersections and pedestrian crossings, representing a critical gap in the literature that warrants detailed examination.
7.6.1. Intersection-Level Traffic Modeling
Micro-scale prediction at signalized intersections requires fundamentally different approaches than network-level forecasting. Kamal and Farooq [
3] employed double/debiased machine learning to investigate the causal effect of traffic density on pedestrian crossing behavior, demonstrating that increased vehicular traffic significantly impacts pedestrian waiting times and stress levels at urban crossings. Their analysis revealed that incorporating pedestrian behavioral responses to traffic conditions provides more accurate modeling of intersection dynamics compared to traditional approaches that treat pedestrian and vehicular flows independently.
Li et al. [
4] proposed a deep reinforcement learning-powered control system for managing mixed traffic involving connected autonomous vehicles (CAVs) and human-driven vehicles (HVs) at signalized intersections. Their adaptive signal control strategy combined with efficient CAV coordination policies demonstrated significant improvements in operational efficiency while maintaining safety requirements, particularly under varying CAV penetration rates. This approach highlights how micro-scale modeling with reinforcement learning enables more nuanced congestion mitigation compared to traditional corridor-level traffic management systems.
7.6.2. Mixed-Traffic Micro-Environments
Mixed-traffic scenarios, in which vehicles, pedestrians, and cyclists interact, pose unique forecasting challenges. Wang et al. [
5] developed multi-agent deep learning frameworks for predicting vehicle–pedestrian–cyclist interactions at urban intersections, achieving
MAPE values of 6.8% for 5 min prediction horizons. The key findings are as follows:
Agent-based modeling shows 12–18% accuracy improvements over aggregate approaches.
Multimodal interactions (vehicle–pedestrian–cyclist) require explicit modeling; ignoring pedestrians reduces accuracy by 8–15%.
Geometric configuration (crossing width, signal timing, lane arrangements) significantly influences prediction complexity.
Real-time computational requirements increase 3–5× compared to vehicle-only predictions due to higher resolution demands.
7.6.3. Data and Methodological Requirements
Micro-scale prediction imposes distinct data and computational requirements compared with network-level forecasting.
Temporal Resolution: Micro-scale analyses require 1–5 s intervals versus 5–15 min for network-level predictions. This 60–180× increase in data granularity presents substantial storage and computational challenges.
Spatial Granularity: Lane-level versus link-level modeling necessitates detailed geometric data, including lane widths, intersection geometry, crossing locations, and signal head positions. Only 12 of the 100 studies reviewed incorporated this level of detail.
Sensor Infrastructure: Successful micro-scale prediction typically requires the following:
High-resolution video analytics or LiDAR for pedestrian/cyclist detection.
Lane-level inductive loop detectors or equivalent.
Signal phase and timing (SPaT) data integration.
Weather sensors for visibility and surface condition monitoring.
Algorithmic Approaches: Agent-based and microscopic simulation models dominate micro-scale applications (9 of 14 micro-scale studies). Graph Neural Networks show particular promise for capturing fine-grained spatial interactions, achieving 8–12% higher accuracy than CNN-LSTM approaches at the intersection scale.
7.6.4. Research Gaps and Opportunities
The severe underrepresentation of micro-scale studies (14%) reveals critical research opportunities:
1. Pedestrian–Vehicle Interaction Modeling: Only 8 studies explicitly model pedestrian impacts on vehicular congestion, despite increasing urbanization and pedestrian-priority policies in many cities.
2. Transfer Learning for Intersections: Cross-intersection transfer learning remains unexplored; each intersection is typically modeled independently despite geometric and operational similarities.
3. Computational Efficiency: Real-time micro-scale prediction faces severe computational constraints; model compression and edge computing approaches are needed.
4. Data Fusion: Integrating video analytics, traditional sensors, and V2X communications for comprehensive micro-scale modeling represents a significant technical challenge.
5. Multimodal Equity: Most studies focus on vehicle throughput optimization; pedestrian and cyclist delay minimization remains underexplored, raising equity concerns.
This micro-scale research gap represents a significant opportunity for advancing traffic congestion forecasting, particularly as cities worldwide implement pedestrian-priority and complete street policies that fundamentally alter the dynamics of intersections.
7.7. Environmental and Sustainability Considerations
Although enhanced traffic forecasting can mitigate the emissions associated with congestion, the computational requirements of sophisticated AI models contribute to their environmental impacts. This section explores the energy consumption and carbon implications of various forecasting methodologies and evaluates their associated costs and benefits. We estimated the energy consumption across methodological categories based on the reported hardware specifications, training times, and inference requirements from the reviewed studies (
Table 23):
To contextualize these numbers, we compared them with congestion-related emissions. Emission Benefits from Congestion Reduction:
Average US city (1M residents): 500,000 tons CO2e/year from traffic congestion
Studies show 8–22% congestion reduction with optimized forecasting systems
Potential savings: 40,000–110,000 tons CO2e/year
Although advanced AI models require substantial computational resources, their environmental impact remains minimal (<0.05%) compared to the benefits derived from reducing traffic congestion. Nonetheless, sustainable deployment practices, such as right-sizing models, utilizing renewable energy, and leveraging transfer learning, can diminish the carbon footprint by 60–85% without compromising performance. Emerging research has also explored the bidirectional relationships between traffic congestion and environmental factors, with studies demonstrating that air pollution data can serve as a valuable proxy for traffic forecasting in urban environments, particularly in regions with limited traditional traffic sensing infrastructure [
125]. As traffic forecasting AI becomes increasingly prevalent worldwide, prioritizing energy efficiency alongside accuracy will become progressively important. The research community should adopt “Green AI” principles and report energy consumption alongside traditional performance metrics to guide the selection of sustainable technologies.
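As a rough, worked illustration of the kind of estimate discussed in this section, the sketch below converts reported GPU hours into energy and CO2e using an assumed power draw, data-center overhead, and grid carbon intensity, and compares the result against the congestion-reduction savings cited above. All constants are assumptions for illustration and are not values from Table 23.

```python
# Rough sketch: training carbon footprint vs. congestion-reduction benefit.
# Power draw, PUE, and grid intensity are assumed values for illustration.

GPU_POWER_KW = 0.3          # assumed average draw per GPU (300 W)
PUE = 1.5                   # assumed data-center power usage effectiveness
GRID_KG_CO2E_PER_KWH = 0.4  # assumed grid carbon intensity


def training_emissions_kg(gpu_hours: float) -> float:
    """Estimate training emissions (kg CO2e) from total GPU hours."""
    energy_kwh = gpu_hours * GPU_POWER_KW * PUE
    return energy_kwh * GRID_KG_CO2E_PER_KWH


# Hypothetical LLM fine-tuning run: 8 GPUs for 72 hours
train_kg = training_emissions_kg(8 * 72)
# Congestion-reduction benefit cited above: 40,000-110,000 t CO2e/year per city
benefit_t_low = 40_000
print(f"Training: ~{train_kg:.0f} kg CO2e "
      f"({train_kg / (benefit_t_low * 1000):.6%} of the low-end annual benefit)")
```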
7.8. Policy Implications and Real-World Case Studies
Our systematic review revealed significant gaps between academic research achievements and real-world applications. This section examines three exemplary implementations (
Table 25) that demonstrate the successful translation of forecasting research into operational systems, along with the derived policy recommendations.
7.8.1. Synthesized Policy Recommendations
Based on these case studies and our systematic review, we propose five evidence-based policy recommendations.
1. Tiered Implementation Strategy
Tier 1 (Foundational): Traditional ML for baseline systems on low-traffic corridors (75–85% accuracy, low cost, rapid deployment).
Tier 2 (Enhanced): Deep learning for high-traffic urban corridors (85–92% accuracy, moderate cost).
Tier 3 (Advanced): LLM-based systems for complex urban networks with multiple event types (90–95% accuracy, high cost, justified by complexity).
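The tiered strategy above can be expressed as a simple rule-based selection. The sketch below encodes one possible mapping from corridor characteristics and budget to a deployment tier; the traffic-volume and budget thresholds are hypothetical placeholders rather than values taken from the case studies.

```python
# Minimal sketch of a rule-based tier selector following the strategy above.
# The traffic-volume and budget thresholds are hypothetical placeholders.

def select_tier(daily_volume: int, event_complexity: str, budget: str) -> str:
    """Map corridor characteristics to a deployment tier.

    event_complexity: 'low' | 'high' (frequency of incidents/special events)
    budget: 'low' | 'moderate' | 'high'
    """
    if daily_volume < 20_000 or budget == "low":
        return "Tier 1: traditional ML baseline (75-85% accuracy, rapid deployment)"
    if event_complexity == "high" and budget == "high":
        return "Tier 3: LLM-based system (90-95% accuracy, justified by complexity)"
    return "Tier 2: deep learning (85-92% accuracy, moderate cost)"


print(select_tier(daily_volume=15_000, event_complexity="low", budget="moderate"))
print(select_tier(daily_volume=90_000, event_complexity="high", budget="high"))
```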
2. Data Infrastructure Investment Priorities
Minimum 5 min temporal resolution required for effective predictions.
Prioritize sensor density and coverage over sensor sophistication.
Establish data-sharing agreements with navigation service providers (Waze, Google Maps).
Implement standardized data formats to enable model portability.
3. Public-Private Partnership Framework
Industry possesses deployment expertise and operational knowledge.
Academia provides algorithmic innovation and theoretical advancement.
Current 3% industry-academia collaboration rate is inadequate.
Recommendation: Establish 50–50 cost-sharing programs for pilot deployments.
Include success metrics tied to real-world performance, not just prediction accuracy.
4. Regulatory and Performance Standards
Establish minimum accuracy thresholds: 85% for operational systems, 90% for safety-critical applications.
Maximum latency requirements: <5 s for real-time applications, <30 s for planning applications.
Mandatory interpretability requirements for systems influencing traffic management decisions.
Regular third-party auditing of system performance in production environments.
5. Equity and Accessibility Mandates
In total, 78% of reviewed studies focus on major metropolitan areas.
Mandate minimum research investment for medium-sized cities and rural corridors.
Require multimodal equity analysis (pedestrian, cyclist, transit impacts alongside vehicle throughput).
Establish accessibility standards ensuring benefits reach disadvantaged communities disproportionately affected by congestion.
Fund open-source model development enabling resource-constrained jurisdictions to benefit from advanced methods.
7.8.2. Implementation Barriers and Mitigation Strategies
Real-world implementation encounters challenges that are not typically present in research settings, such as organizational resistance, as transportation agencies often lack expertise in artificial intelligence (AI) and machine learning (ML). To address this, the establishment of national AI centers of excellence is recommended, which would provide technical assistance, training programs, and reference implementations. Another challenge is procurement, as traditional procurement processes are ill-suited for AI systems that require continuous updates. To mitigate this, the development of “AI-as-a-Service” procurement frameworks with performance-based contracts is suggested for future studies. Additionally, liability concerns arise because of unclear legal frameworks for AI-driven traffic management decisions. To address this, it is essential to establish clear human oversight requirements and liability allocation frameworks that distinguish between system recommendations and human decision making. Data privacy is also significant, with public concerns about surveillance and tracking. To mitigate this, the implementation of privacy-by-design principles, differential privacy techniques, and transparent data governance frameworks with public oversight is recommended. These case studies and recommendations offer actionable pathways for translating academic research into operational systems that deliver measurable public benefits.
7.9. Recommendations and Future Directions
Following a thorough analysis conducted in this systematic review, we offer the following recommendations for researchers specializing in traffic congestion forecasting:
7.9.1. Methodological Recommendations
Methodological recommendations include adopting a hybrid approach, wherein researchers should prioritize developing hybrid models that integrate the strengths of various techniques. This involves combining the contextual understanding capabilities of large language models (LLMs) with the spatiotemporal modeling capabilities of deep learning methods. Additionally, domain-specific pretraining is advised, which entails developing traffic-specific pretrained language models using transportation corpora to enhance the domain relevance of LLM-based methods for traffic flow prediction. Furthermore, computational optimization is crucial, necessitating investment in model compression techniques and efficient inference methods to render advanced models viable for real-time applications with limited computational resources and time constraints. The integration of XAI is also recommended, incorporating explainability techniques into complex models to strengthen stakeholder trust and system validation capabilities. Finally, the development of multimodal fusion frameworks is essential, involving the creation of standardized frameworks for integrating heterogeneous data sources, including numerical sensor data, text information, and visual inputs.
7.9.2. Data-Related Recommendations
Five critical research directions must be prioritized to enhance the capabilities of traffic congestion forecasting. First, comprehensive benchmark datasets encompassing diverse traffic scenarios, geographical variations, and challenging cases should be established; such standardized datasets would facilitate equitable model comparisons and accelerate methodological advancement. Second, traffic-specific data augmentation techniques require focused attention to address the persistent challenge of sparse or geographically limited training datasets; these specialized augmentation approaches should preserve the unique spatiotemporal characteristics of traffic patterns while enhancing model generalizability. Third, immediate attention must be paid to privacy-preserving techniques that protect individual mobility data; implementing federated learning and differential privacy would enable valuable data sharing across jurisdictions while safeguarding sensitive information, which is critical for the widespread adoption of these techniques. Fourth, the field would benefit significantly from standardized evaluation metrics; a consistent measurement framework (a minimal sketch follows this paragraph) would enable more meaningful comparisons between modeling approaches and provide clearer evidence of genuine methodological advancement. Finally, a transition from simulation-based validation to real-world deployment is required; prioritizing operational implementation reveals practical challenges that are not apparent in controlled environments and generates crucial insights for model refinement and practical utility. Collectively, these research priorities address the fundamental challenges that currently limit the practical application of advanced traffic forecasting methods.
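As a minimal sketch of the consistent measurement framework recommended above, the following Python function computes MAE, RMSE, and MAPE on a common footing for one evaluation window; the epsilon guard against near-zero denominators and the example values are illustrative choices, not prescriptions from the reviewed literature.

```python
import numpy as np

def evaluate_forecast(y_true, y_pred, eps: float = 1e-3) -> dict:
    """Return MAE, RMSE, and MAPE for one forecast window."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    # Mask near-zero observations so MAPE is not dominated by division noise.
    mask = np.abs(y_true) > eps
    mape = float(100.0 * np.mean(np.abs(err[mask] / y_true[mask])))
    return {"MAE": mae, "RMSE": rmse, "MAPE_%": mape}

# Example: observed vs. predicted speeds (km/h) over four intervals.
print(evaluate_forecast([52, 47, 35, 60], [50, 45, 40, 58]))
```

Reporting all three metrics from the same code path, rather than converting between them after the fact, avoids the 10–15% approximation error noted in the limitations below.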
7.9.3. Application-Specific Recommendations
Five strategic implementation approaches can substantially enhance the practical utility of traffic congestion forecasting systems. Resource-aware model selection is a critical initial step that requires practitioners to align model complexity with available computational resources and specific application requirements. This balanced approach acknowledges the inherent trade-offs between predictive accuracy and operational efficiency, thereby ensuring sustainable deployment across diverse infrastructure environments. Transfer learning is a promising solution for regions with limited historical data. By leveraging knowledge structures from data-rich environments, advanced models can attain reasonable performance even in areas lacking extensive training datasets, effectively democratizing access to sophisticated forecasting capabilities. Multi-horizon prediction is best addressed through ensemble methodologies that integrate specialized models optimized for various time ranges. Such composite approaches provide comprehensive forecasting capabilities across immediate, short-term, and extended time horizons, addressing the diverse planning needs of traffic management authorities. Uncertainty quantification must be embedded within predictive frameworks to support robust decision-making. By providing confidence intervals alongside point predictions (a minimal sketch follows this paragraph), these systems enable traffic managers to appropriately weigh forecasts in their operational decisions, particularly in unusual or rapidly evolving traffic conditions. Finally, integrated system development represents the ultimate implementation goal of combining accurate congestion forecasting with actionable and effective strategies for traffic management. These end-to-end solutions translate predictive insights into practical interventions, thereby maximizing the real-world impact of traffic congestion forecasting. Collectively, these implementation strategies offer a pragmatic roadmap for transitioning advanced traffic-forecasting methods from research environments to operational systems.
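The sketch below illustrates one simple way to obtain the prediction intervals mentioned above: combining point forecasts from several models into an empirical interval. The member forecasts, interval quantiles, and model count are illustrative assumptions, not a prescription from any reviewed study.

```python
import numpy as np

def ensemble_forecast(member_preds: np.ndarray, lower_q: float = 5.0, upper_q: float = 95.0):
    """member_preds: (n_models, horizon) point forecasts from individual models."""
    point = member_preds.mean(axis=0)                    # ensemble point prediction
    lo = np.percentile(member_preds, lower_q, axis=0)    # empirical lower bound
    hi = np.percentile(member_preds, upper_q, axis=0)    # empirical upper bound
    return point, lo, hi

# Example: five hypothetical models forecasting speed (km/h) over a 3-step horizon.
members = np.array([
    [48, 41, 35],
    [52, 44, 30],
    [50, 43, 33],
    [47, 40, 36],
    [51, 45, 31],
])
point, lo, hi = ensemble_forecast(members)
for t, (p, l, h) in enumerate(zip(point, lo, hi), start=1):
    print(f"t+{t}: {p:.1f} km/h (interval {l:.1f}-{h:.1f})")
```

Wider intervals at longer horizons give traffic managers an explicit signal of when a forecast should be weighted lightly in operational decisions.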
7.10. Limitations of This Review
This systematic review had several limitations that should be considered when interpreting our findings. Regarding publication bias, our inclusion criteria, which focused solely on peer-reviewed articles, may have excluded significant industry implementations and grey literature. The 87% concentration in Q1-ranked journals may disproportionately represent successful results while underrepresenting negative findings or failed approaches commonly encountered in practice. Regarding language limitations, the English-only search strategy may have overlooked relevant non-English publications, particularly from the Chinese, Japanese, and Korean research communities, where substantial research on intelligent transportation systems has been conducted. This may have introduced geographic and methodological biases into our synthesis. In terms of temporal currency, the literature search cutoff of January 2024 means that rapidly evolving LLM applications from late 2024 may be underrepresented. Given the typical 6–12 month publication lag in the field, the most recent innovations were not included in this analysis. Regarding metric heterogeneity, despite our harmonization efforts, converting between different metric types (e.g., RMSE to MAPE, classification to regression) introduced approximation errors of 10–15% in some cases. The reported performance ranges should be interpreted considering this uncertainty. Regarding reproducibility, many studies lack sufficient implementation details (such as hyperparameters, training procedures, and hardware specifications) for full replication. Our computational cost estimates rely on typical configurations and may not reflect all deployment scenarios or recent hardware optimizations. Regarding geographic bias, with 73% of studies originating from Asia and North America, generalizability to European, Latin American, African, and Middle Eastern contexts (each with distinct traffic patterns, infrastructure characteristics, and regulatory environments) remains limited. From a practitioner’s perspective, limited industry collaboration (3%) means that this review reflects academic research priorities rather than the operational deployment challenges faced by transportation agencies in the field. The gap between theoretical performance and real-world implementation may be larger than this analysis suggests. In terms of methodological evolution, the rapid pace of innovation, particularly in LLM applications, means that techniques may become obsolete before publication. Comparative conclusions drawn from papers spanning 2014–2024 may not accurately reflect the current capabilities of earlier approaches that have since been optimized. Despite these limitations, this review provides the most comprehensive synthesis to date of traffic congestion forecasting methodologies spanning three technological eras, offering valuable insights for researchers and practitioners in the intelligent transportation system domain.
8. Conclusions
This systematic literature review encompasses 100 peer-reviewed publications from 2014 to 2024 and offers a comprehensive analysis of the evolution of traffic congestion forecasting, tracing its development from traditional machine learning to deep learning and the integration of large language models (LLMs). Our principal findings reveal a distinct technological progression: traditional machine learning, with an accuracy range of 75–85%, laid the foundational groundwork; deep learning, achieving 85–92% accuracy, effectively captured spatial–temporal dependencies; and LLM-based approaches, with an accuracy of 90–95%, facilitated multimodal integration and contextual understanding, albeit with a 50–100× increase in computational cost.
Our quantitative analysis across the three technological eras demonstrates substantial methodological advances. Traditional machine learning methods (2014–2017) achieved MAPE values typically ranging from 12 to 18% for short-term predictions, with Support Vector Machines and Random Forests representing the dominant approaches. The deep learning revolution (2018–2020) reduced the MAPE to 6–10% through the adoption of LSTM, GRU, and CNN architectures, with Graph Neural Networks emerging as particularly effective for spatial modeling. Notably, the T-GCN model demonstrated RMSE improvements of 15–20% compared with traditional RNN approaches, while attention mechanisms provided an additional 5–10% accuracy improvement. The LLM integration phase (2021–2024) further enhanced performance, particularly excelling in non-recurrent congestion scenarios and multimodal data integration, achieving MAPE values as low as 4–7% in optimal conditions.
Graph Neural Networks have emerged as the prevailing framework for spatial modeling, incorporated in 62% of deep learning and LLM-era studies. The dominance of GCN-based architectures reflects their superior ability to capture network-wide dependencies and the complex spatial relationships inherent in road networks. The PeMS dataset family dominated empirical evaluations, appearing in 58% of the studies, followed by METR-LA (23%) and proprietary datasets (19%). Studies utilizing multi-source data demonstrated 8–15% accuracy improvements compared with sensor-only approaches, validating the importance of contextual data integration for prediction performance.
Empirical evidence suggests that hybrid methodologies, which integrate LLMs with specialized deep learning architectures, achieve superior performance, with accuracy rates ranging from 92% to 96%. These approaches also offer practical flexibility in deployment, combining the efficiency of traditional methods with the contextual capabilities of LLMs. Nonetheless, the trade-offs between performance and cost remain significant: while LLM-based systems yield an accuracy improvement of 10% to 15%, they necessitate considerably greater computational resources than traditional systems. Deep learning approaches occupy an intermediate position at 10–25× the cost of traditional ML, offering an advantageous balance between accuracy gains and resource requirements. The feasibility of LLM deployment is highly contingent on the specific application context, with real-time highway management favoring lightweight models and strategic urban planning accommodating more sophisticated architectures.
Prediction horizon analysis revealed distinct performance characteristics for each methodology. Short-term predictions (0–30 min) achieved the highest accuracy across all approaches, with deep learning maintaining MAPE values below 8%. Medium-term predictions (30–120 min) showed increasing accuracy divergence, with LLM approaches outperforming traditional methods by 12–18% through superior contextual understanding. Long-term predictions exceeding 2 h remained challenging for all methodologies, though LLMs demonstrated enhanced robustness in scenarios involving scheduled events or predictable traffic pattern disruptions.
Our systematic analysis identified several critical gaps that require further attention. The heterogeneity of the datasets compromises both comparability and reproducibility, with only 42% of the reviewed studies providing sufficient implementation details for replication. The micro-scale dynamics at intersections and pedestrian crossings remain insufficiently explored, with only 22% of studies addressing these critical urban mobility components despite their substantial impact on congestion formation. The sustainability of computational processes and their environmental impacts require greater consideration, particularly given the 50–100× increase in energy consumption associated with LLM implementations. Furthermore, there is a lack of practical implementation guidance for diverse operational contexts, with limited evidence of real-world deployment experiences and the effectiveness of transfer learning across different geographic settings. Models trained in one city demonstrate substantial performance degradation (15–25% accuracy drops) when applied to different contexts, indicating persistent challenges in generalizability.
This review provides practitioners with evidence-based frameworks for model selection, considering operational constraints such as computational budget, latency requirements, data availability, and interpretability needs. It also considers infrastructure contexts, including urban networks, highways, and mixed environments, as well as deployment scenarios ranging from short-term tactical and medium-term strategic operations to long-term planning. Traditional machine learning remains optimal for deployments with limited resources and relaxed accuracy requirements, achieving 82–88% accuracy at minimal computational cost. Deep learning offers a balanced approach for most applications, providing 85–92% accuracy with moderate resource demands. In contrast, LLM-based and hybrid approaches are most appropriate for high-accuracy applications requiring sophisticated contextual understanding and multimodal integration, particularly in scenarios involving non-recurrent congestion or event-driven traffic disruptions.
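As a hedged sketch of how such a selection framework might be operationalized, the following Python function maps coarse operational constraints to a candidate model family; the thresholds and branch labels loosely echo the cost tiers discussed in this review but are illustrative heuristics, not a definitive decision procedure.

```python
def select_model_family(latency_ms: float, gpu_available: bool, needs_context: bool) -> str:
    """Map coarse operational constraints to a candidate model family."""
    if latency_ms < 100 and not gpu_available:
        # Tight latency on commodity hardware: favor lightweight traditional ML.
        return "traditional ML (e.g., gradient boosting on detector features)"
    if needs_context and gpu_available:
        # Event-driven or non-recurrent congestion with ample compute budget.
        return "hybrid LLM + deep learning"
    # Default balanced choice for most urban and highway deployments.
    return "deep learning (e.g., GNN/LSTM spatiotemporal model)"

print(select_model_family(latency_ms=50, gpu_available=False, needs_context=False))
print(select_model_family(latency_ms=500, gpu_available=True, needs_context=True))
```

In practice, such rules would be refined with local evidence on accuracy, cost, and interpretability rather than applied verbatim.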
Future research should prioritize the following directions. Transfer learning frameworks must be developed to address cross-city generalization challenges, enabling knowledge transfer from data-rich to data-limited environments. Real-time processing capabilities require optimization through model distillation, pruning, and edge computing architectures to meet sub-second inference requirements for traffic-management applications. Explainability techniques must be integrated into high-performance models to meet regulatory transparency requirements and operational acceptance criteria. Robust prediction frameworks that maintain performance under missing or noisy sensor data conditions merit continued investigation, as 66% of the reviewed studies cited data quality as a significant limitation of their models. The integration of micro-scale predictions for intersection and pedestrian crossing dynamics represents an underexplored research direction with substantial practical relevance for comprehensive traffic management systems. Additionally, the explicit integration of sustainable mobility alternatives, including cycling and micro-mobility options, into traffic prediction frameworks warrants increased attention, recognizing that modal shifts substantially influence congestion patterns and prediction requirements.
The progression from traditional machine learning to large language models in traffic congestion forecasting represents a transformative technological evolution that offers substantial potential for the advancement of intelligent transportation systems. Although LLMs demonstrate superior performance in terms of accuracy, contextual understanding, and multimodal integration, practical deployment necessitates careful consideration of computational constraints, real-time requirements, and application-specific needs. This systematic review provides researchers and practitioners with comprehensive guidance for navigating these trade-offs, ultimately contributing to the development of more efficient, sustainable, and intelligent urban mobility solutions. The evidence-based insights derived from 100 carefully selected studies establish a robust foundation for understanding current capabilities, identifying critical implementation challenges, and charting strategic directions for future research in this rapidly evolving field of study.