Congestion Forecasting Using Machine Learning Techniques: A Systematic Review

Mehdi Attioui; Mohamed Lahby

doi:10.3390/futuretransp5030076


Laboratory of Mathematics, Artificial Intelligence, and Digital Learning, Higher Normal School Hassan II University, Casablanca 50069, Morocco

* Author to whom correspondence should be addressed.

Future Transp. 2025, 5(3), 76; https://doi.org/10.3390/futuretransp5030076


Abstract

Traffic congestion constitutes a substantial global issue, adversely impacting economic productivity and quality of life, with associated costs estimated at approximately 2% of GDP in various nations. This systematic review investigates the application of machine learning (ML) to traffic congestion forecasting from 2010 to 2024, adhering to the PRISMA 2020 guidelines. A comprehensive search of three major databases (IEEE Xplore, SpringerLink, and ScienceDirect), supplemented by citation tracking, yielded 9695 initial records, of which 115 studies met the inclusion criteria after rigorous screening. Data extraction encompassed methodological approaches, ML techniques, traffic characteristics, and forecasting periods, and the quality assessment achieved near-perfect inter-rater reliability (Cohen’s κ = 0.89). Deep Neural Networks were the predominant technique (47%), and supervised learning was the most prevalent learning paradigm (57%). Classification was the most common task (42%), and most studies addressed recurring congestion (76%) and passenger vehicles (90%). Publication quality was notably high, with 85% of studies appearing in Q1-ranked journals, and research output grew exponentially, from minimal activity in 2010 to 18 studies in 2022. Significant research gaps persist: reinforcement learning is underutilized (8%), rural road networks are underrepresented (2%), and industry–academia collaboration is limited (3%). Future research should prioritize multimodal transportation systems, real-time adaptation mechanisms, and enhanced practical implementation to advance intelligent transportation systems (ITSs). This review was not registered because it focused on mapping the research landscape rather than intervention effects.

Keywords:

intelligent transportation systems (ITSs); traffic congestion; machine learning; forecasting; systematic review

1. Introduction

Smart cities are increasingly positioned to play a crucial role in addressing future challenges, particularly in the domain of transportation management. Mobility is fundamental to human activities and enables access to the workplace, healthcare facilities, educational institutions, and public services. Urban transportation systems face unprecedented challenges due to rapid population growth and increased vehicle density. In cities characterized by high urbanization and reliance on automobiles, issues such as traffic congestion, air pollution, noise pollution, and other adverse effects related to the movement of individuals and goods are prevalent. Traffic congestion has emerged as a significant issue that adversely affects economic productivity, environmental sustainability, and the quality of life in metropolitan areas. The International Transport Forum [1] endeavors to quantify the “total costs” associated with traffic congestion, which are estimated to account for approximately 2% of GDP in various countries [2]. There is substantial evidence of economic and environmental costs associated with traffic congestion. For instance, in Greater Los Angeles, individuals spend an average of 70 h annually in traffic, resulting in the consumption of over 200 liters of fuel [3]. The rate of urbanization has accelerated in recent decades, with projections indicating that by 2050 more than half of the global population will reside in metropolitan areas [3]. In numerous countries, private vehicles have emerged as the predominant mode of transportation, contributing to increasing congestion in many urban centers [3].

The integration of information and communication technologies (ICTs) into intelligent transportation systems (ITSs) has been extensively explored, offering promising avenues for alleviating congestion through advanced forecasting capabilities. In particular, machine learning techniques have demonstrated substantial potential for analyzing complex traffic patterns and predicting congestion scenarios with greater accuracy than traditional statistical methods. These algorithms offer greater flexibility and adaptability than conventional algorithms, which makes them suitable for analyzing large-scale data [4,5]. Through iterative training, machine learning algorithms process extensive datasets efficiently and generate precise predictions, progressively enhancing their accuracy and adaptability. Machine learning forecasters therefore have the potential to surpass and replace traditional forecasting methods. The learning paradigms most commonly employed for forecasting are supervised and unsupervised learning [6,7].

In [8], the authors reviewed real-time traffic congestion prediction using various machine learning models. Similarly, [9] investigated a range of machine learning algorithms for optimizing multiple aspects of traffic management systems, including signal management, flow prediction, congestion detection and management, and automatic signal detection. Furthermore, [10] provided a comprehensive overview of traffic prediction based on Artificial Intelligence (AI). Medina-Salgado et al. [11] focused on existing computational techniques for urban traffic flow prediction. Anirudh et al. [12] surveyed recent studies on deep learning for traffic flow prediction. Weiwei et al. [13] presented an overview of graph neural networks applied to various traffic forecasting problems, such as road traffic flow and speed forecasting, passenger flow forecasting in urban rail transit systems, and demand forecasting on transportation-sharing platforms. The studies in [13,14] found that Long Short-Term Memory, Multilayer Perceptron, and Convolutional Neural Network techniques are particularly effective for forecasting and classifying road traffic, especially when the traffic flow variable is used.

Despite the expanding body of research on machine learning models for traffic congestion prediction, systematic reviews that synthesize these models from the perspective of traffic congestion and data processing, and that evaluate their effectiveness within intelligent transportation systems, remain scarce. This paper presents a systematic review of recent studies that employed machine learning prediction models for traffic congestion.

The remainder of this paper is organized as follows: Section 2 describes the research methods used in this study. Section 3 reports the results and interpretations of the systematic review. Section 4 presents a discussion, the implications, and the future directions for researchers and practitioners.

2. Methodology

2.1. Protocol and Registration

This systematic review was conducted and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement [15], adapted to the scope of this study. No protocol was registered in PROSPERO or any other prospective registry, because this study aimed to provide a comprehensive overview of the research landscape rather than to assess specific intervention effects, which would typically require protocol registration. The review methodology followed Kitchenham’s guidelines [16,17] because of their ability to offer a comprehensive overview of research activity within a specific domain. Unlike systematic literature reviews that answer narrowly defined research questions with detailed quality assessments, a mapping-oriented review such as this one provides a broad perspective for identifying research gaps, trends, and opportunities for future research. This approach is particularly suitable for traffic congestion forecasting using machine learning, a rapidly evolving field that requires comprehensive coverage rather than an in-depth analysis of specific interventions.

2.2. Research Questions’ Development and Justification

The research questions (Table 1) were developed iteratively in accordance with established systematic review frameworks [18,19,20] and were designed to address the unique aspects of traffic congestion prediction. They encompass both the technical and contextual dimensions of machine learning applications in traffic forecasting, including temporal forecasting intervals (RQ5), traffic parameters (RQ6), and the evolution of machine learning models (RQ7). Each research question addresses specific knowledge gaps identified through a preliminary literature review and expert consultation.

Table 1. Definition and rationale of the research questions.

2.3. Eligibility Criteria

The eligibility criteria were systematically refined using an enhanced PICO (Population, Intervention, Comparison, Outcome) framework specifically adapted for systematic reviews of traffic congestion forecasting using machine learning.

2.3.1. PICO Framework Application

Population (P): Traffic Congestion Phenomena and Characteristics

The primary focus of this component is on studies addressing traffic congestion phenomena, including traffic jams, gridlocks, bottlenecks, traffic saturation, heavy traffic conditions, and recurring and nonrecurring congestion. The secondary focus is on the traffic flow characteristics that contribute to congestion analysis, such as speed, density, volume, and travel time. The scope encompasses all road types (urban roads, highways, and rural roads) and all vehicle categories (passenger, public transport, commercial, and emergency vehicles).

Intervention (I): Machine Learning and Artificial Intelligence Techniques

The primary interventions include machine learning algorithms, artificial intelligence techniques, deep learning models, neural networks, and computational intelligence. The application context pertains to the prediction, forecasting, detection, analysis, classification, and optimization of traffic congestion problems. The technical scope encompasses supervised, unsupervised, and reinforcement learning approaches applied to traffic congestion issues.

Comparison (C): Comparative Analysis Elements

Methodological comparisons involve evaluating different machine learning techniques and their effectiveness. Temporal comparisons pertain to various forecasting periods and prediction horizons. Contextual comparisons refer to different congestion types, road categories, and vehicle classifications.

Outcome (O): Results Indicating Comparative Elements for Traffic Congestion Forecasting

The outcomes of interest are the criteria used to compare the various methodologies for predicting traffic congestion, with particular emphasis on the machine learning task, model, and algorithm:

Learning paradigms (supervised, unsupervised, reinforcement learning).
Task classifications (regression, classification, clustering, optimization).
Model architectures (neural network types, ensemble structures, hybrid model designs).
Specific algorithms used in each study (e.g., LSTM, random forest, SVM, CNN, and XGBoost).

2.3.2. Inclusion Criteria

Studies were included if they met the following criteria:

Population: Research dedicated to the prediction and forecasting of traffic congestion within various transportation contexts.
Intervention: The utilization of machine learning, artificial intelligence, or deep learning techniques as the primary methodological approach.
Context: Urban roads, highways, or rural transportation networks with empirical data.
Outcome: Traffic congestion prediction or algorithmic improvements.
Study Design: Primary research studies presenting empirical results with quantitative evaluation.
Publication Type: Peer-reviewed journal articles and conference papers from established venues.
Language: Studies published in English.
Publication Period: Studies published between January 2010 and December 2024.
Quality Threshold: Studies meeting minimum methodological rigor and reporting standards.

2.3.3. Exclusion Criteria

Studies were excluded if they met any of the following criteria:

Secondary Research: Review articles, systematic reviews, meta-analyses, or survey papers.
Theoretical Studies: Studies without empirical validation or experimental results.
Scope Limitations: Studies focusing solely on general traffic flow without specific congestion analysis.
Methodological Constraints: Studies using only traditional statistical methods without machine learning components.
Publication Type: Conference abstracts, book chapters, editorials, opinion pieces, or workshop papers.
Data Issues: Duplicate publications or studies with overlapping datasets from the same research group.
Accessibility: Studies not accessible in full text despite reasonable contact attempts with authors.
Quality Standards: Studies with insufficient methodological detail for adequate quality assessment.
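The inclusion and exclusion criteria above can be expressed as a single screening predicate. The sketch below is illustrative only: the `Record` fields and the publication-type labels are hypothetical, and criteria that require reviewer judgment (congestion focus, ML as the primary method, empirical validation) are reduced to boolean flags that a reviewer would set by hand.

```python
from dataclasses import dataclass

# Hypothetical encoding of a bibliographic record after manual reading;
# the boolean flags stand in for reviewer judgments.
@dataclass
class Record:
    year: int
    language: str
    publication_type: str       # e.g. "journal", "conference", "book_chapter", "review"
    congestion_focused: bool    # addresses congestion, not only general traffic flow
    uses_ml: bool               # ML/AI is the primary methodological approach
    has_empirical_results: bool # quantitative evaluation on empirical data

# Publication types admitted by the inclusion criteria (assumed labels).
ALLOWED_TYPES = {"journal", "conference"}

def is_eligible(r: Record, first_year: int = 2010, last_year: int = 2024) -> bool:
    """Return True only when a record passes every inclusion/exclusion check."""
    return (
        first_year <= r.year <= last_year
        and r.language == "English"
        and r.publication_type in ALLOWED_TYPES
        and r.congestion_focused
        and r.uses_ml
        and r.has_empirical_results
    )

# Usage: a 2021 English journal article that meets every criterion.
example = Record(2021, "English", "journal", True, True, True)
print(is_eligible(example))  # True
```

Keeping the criteria in one function makes each exclusion reason traceable, which mirrors the systematic documentation of exclusion reasons described in Section 2.3.5.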

2.3.4. Quality Assessment and Risk of Bias

While traditional risk of bias assessment tools (such as RoB 2 or ROBINS-I) are designed for intervention studies and were not applicable to this review, the methodological quality was systematically assessed using multiple criteria.

Quality Assessment Framework:

Publication Quality: Scimago Journal Rank quartiles (Q1–Q4) for journals and A–C rankings for conferences
Citation Impact: Minimum threshold of five citations for studies published more than two years prior to the search date
Methodological Rigor: Presence of clear machine learning methodology description, appropriate validation procedures, and comprehensive performance evaluation
Empirical Evidence: Demonstration of experimental results with quantitative performance metrics
Reproducibility: Sufficient methodological detail to enable replication by independent researchers
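The citation criterion can be made precise as a small check. This is a sketch under stated assumptions: study age is measured in whole calendar years, the search year is passed in as a parameter (the exact search date is not given here), and studies two years old or younger are exempt from the five-citation minimum.

```python
def meets_citation_threshold(
    pub_year: int,
    citations: int,
    search_year: int,
    min_citations: int = 5,
    exempt_years: int = 2,
) -> bool:
    """Apply the minimum-citation rule to a single study.

    Studies published more than `exempt_years` years before the search year
    must have at least `min_citations` citations; more recent studies are
    exempt because they have had little time to accumulate citations.
    """
    age = search_year - pub_year
    if age > exempt_years:
        return citations >= min_citations
    return True

# Usage with a hypothetical search year of 2025:
print(meets_citation_threshold(2018, 4, 2025))  # False: older than two years, too few citations
print(meets_citation_threshold(2024, 0, 2025))  # True: recent study, exempt
```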

Quality assessment was performed independently by two reviewers (M.A. and M.L.) on a randomly selected 20% of the studies.
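One reproducible way to draw such a 20% sample is to fix the random seed, so that the same subset can be regenerated for reporting. The study identifiers, the seed, and the function name below are hypothetical; the source does not state how the sample was drawn.

```python
import random

def draw_assessment_sample(study_ids, fraction=0.20, seed=2024):
    """Draw a reproducible random subset of studies for dual quality assessment."""
    ids = list(study_ids)
    k = round(len(ids) * fraction)   # e.g. 20% of 115 studies -> 23
    rng = random.Random(seed)        # local generator, independent of global random state
    return sorted(rng.sample(ids, k))

sample = draw_assessment_sample(range(1, 116))
print(len(sample))  # 23
```

Using a dedicated `random.Random` instance keeps the draw independent of any other code that uses the global generator.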

2.3.5. Assessment of Reporting Biases

Potential sources of reporting bias were systematically considered.

Publication Bias: Preference for statistically significant results in academic publishing.
Language Bias: English-only inclusion may have excluded relevant studies in other languages.
Database Selection Bias: Focus on three major databases may have missed specialized publications.
Temporal Bias: Rapid ML evolution may favor recent publications over foundational work.

Mitigation strategies included comprehensive multi-database searches, manual citation tracking, expert consultation, and systematic documentation of exclusion reasons.

In this stage, each research question was decomposed into four main elements: (P) population, (I) intervention, (C) comparison, and (O) outcome. Table 2 defines each element and the research questions it addresses, and Table 3 lists the key terms extracted from each PICO component. These key terms were then grouped into categories, as shown in Table 4, where each category contains terms that are equivalent or similar in meaning. The final search string was constructed with Boolean operators: OR connected terms within the same group, and AND linked different groups. The PICO framework ensures that the review captures studies offering meaningful comparative elements for assessing machine learning methodologies in traffic congestion forecasting, while preserving the broad scope required to map the research landscape. The final search string was:

(traffic congestion or recurring traffic congestion or nonrecurring traffic congestion or traffic flow or traffic density or traffic volume) and (collect or data) and (analyze or learn or forecast) and (approach or technique or method or algorithm) and (application or tool or framework or solution) and (performance or accuracy or management or optimization or detection or alleviation).
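The OR-within-group, AND-across-groups construction can be sketched as follows. The term groups are copied from the final search string; the helper name is hypothetical, and database-specific syntax (phrase quoting, uppercase operators, field tags) differs between IEEE Xplore, SpringerLink, and ScienceDirect and is not modeled here.

```python
# Term groups taken from the final search string.
SEARCH_GROUPS = [
    ["traffic congestion", "recurring traffic congestion", "nonrecurring traffic congestion",
     "traffic flow", "traffic density", "traffic volume"],
    ["collect", "data"],
    ["analyze", "learn", "forecast"],
    ["approach", "technique", "method", "algorithm"],
    ["application", "tool", "framework", "solution"],
    ["performance", "accuracy", "management", "optimization", "detection", "alleviation"],
]

def build_query(groups, or_op="or", and_op="and"):
    """Join terms within a group with OR, then join the groups with AND."""
    clauses = ["(" + f" {or_op} ".join(terms) + ")" for terms in groups]
    return f" {and_op} ".join(clauses)

print(build_query(SEARCH_GROUPS))
```

Generating the string from the grouped terms, rather than typing it by hand, keeps it consistent with the taxonomy when terms are added or removed.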

Table 2. PICO definition for this review.

Element	Description	Research Questions Addressed	Search Strategy Implications
Population	Studies focusing on traffic congestion phenomena, including traffic jams, gridlocks, bottlenecks, traffic saturation, heavy traffic conditions, recurring and nonrecurring congestion, and related traffic flow characteristics (speed, density, volume, and travel time) across different road types and vehicle categories.	RQ4, RQ5, RQ6	Comprehensive congestion-related terminology, multimodal considerations, infrastructure variety
Intervention	Machine learning and artificial intelligence techniques used as the primary methodological approach for traffic congestion prediction, forecasting, detection, classification, and optimization, including supervised/unsupervised learning, deep learning, neural networks, and hybrid computational approaches.	RQ7, RQ3	ML/AI algorithm terminology, learning paradigm keywords, computational intelligence methods
Comparison	Comparative analysis elements including different machine learning approaches, forecasting methodologies, temporal granularities, performance metrics, implementation frameworks, and benchmarking against traditional methods or baseline approaches.	RQ1, RQ2, RQ3, RQ7	Comparative terminology, benchmark keywords, evaluation methodology terms
Outcome	The results indicate the elements that will be utilized to compare the different approaches for forecasting traffic congestion, specifically: ML Tasks (learning paradigms: supervised/unsupervised/reinforcement learning; task classifications: regression/classification/clustering/optimization), ML Model architectures (neural network types, ensemble structures, hybrid designs), and specific ML Algorithms used for each study (LSTM, random forest, SVM, CNN, XGBoost, etc.).	RQ7, RQ3	Learning paradigm keywords, task classification terms, model architecture identifiers, specific algorithm names

Table 3. Key terms taken from the PICO.

Element	Primary Terms	Secondary Terms	Technical Terms
Population	traffic congestion, traffic jam, gridlock, bottleneck, traffic saturation, heavy traffic	recurring congestion, nonrecurring congestion, incident congestion, traffic flow, traffic speed, traffic density, traffic volume, travel time, congestion pattern	mileage ratio of congestion, traffic state, flow breakdown, capacity reduction, queue formation
Intervention	machine learning, artificial intelligence, deep learning, neural network, algorithm	predict, forecast, learn, analyze, classify, detect, optimize, model	supervised learning, unsupervised learning, reinforcement learning, ensemble methods, hybrid approaches, computational intelligence
Comparison	approach, technique, method, algorithm, benchmark, comparison, evaluation	application, tool, framework, solution, strategy, methodology	cross-validation, performance analysis, comparative study, baseline comparison, ablation study
Outcome	supervised learning, unsupervised learning, reinforcement learning, regression, classification, clustering	neural network, CNN, RNN, LSTM, ensemble, random forest, SVM	specific algorithm names, LSTM, random forest, SVM, CNN, XGBoost, decision tree, k-means, linear regression

Table 4. Taxonomy of search terms.

Category	Core Congestion	Congestion Types	Traffic Params	Data Actions	Learning Actions	ML Methods	Comparison	Task-Model-Algo
Terms	traffic congestion	recurring congestion	traffic flow	collect	predict	approach	benchmark	supervised learning
	traffic jam	nonrecurring congestion	traffic speed	gather	forecast	technique	comparison	unsupervised learning
	gridlock	incident congestion	traffic density	data	learn	method	evaluation	reinforcement learning
	bottleneck	congestion pattern	traffic volume	analyze	classify	algorithm	validation	regression
	traffic saturation	queue formation	travel time	process	detect	framework	assessment	classification
	heavy traffic	flow breakdown	mileage ratio	acquire	model	solution	baseline	clustering
	congestion	capacity reduction	delay time	information	optimize	tool	cross-validation	neural network
	traffic state	traffic breakdown	throughput	extract	train	application	ablation study	LSTM, random forest, SVM

2.4. Database and Paper Selection

The three selected databases provide comprehensive coverage of the specified domains of transportation studies.

IEEE Xplore: Leading source for transportation systems and intelligent transportation research (estimated 40% of relevant publications).
Springer: Major academic publisher covering machine learning and transportation engineering (estimated 25% coverage).
ScienceDirect (Elsevier): Comprehensive coverage of computer science and engineering applications (estimated 35% coverage).

These databases collectively provide access to approximately 78% of high-quality peer-reviewed publications at the intersection of machine learning and traffic prediction. While broader databases such as Google Scholar and Scopus could increase coverage, they would also introduce quality control challenges and non-peer-reviewed content that could compromise the systematic rigor of this study.

The study selection process followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, which were adapted for a systematic review. Figure 1 presents the complete PRISMA flow diagram illustrating the four-phase selection process [21].

Figure 1. PRISMA flow diagram for systematic review selection process. The diagram illustrates the four-phase selection process from the initial database search (n = 9695) to the final inclusion (n = 115). Each phase shows the number of records processed and reasons for exclusion, ensuring transparency and reproducibility of the study selection methodology.

2.4.1. Phase 1: Identification

The initial database search yielded 9682 potentially relevant records across three major academic databases: IEEE Xplore (n = 3685), Springer (n = 3395), and ScienceDirect (n = 2602). An additional 13 studies were identified through manual citation tracking and expert recommendations, bringing the total to 9695 identified records.

2.4.2. Phase 2: Screening

In the screening phase, 1014 records were removed because of duplication (n = 983) and language restrictions (n = 31), leaving 8681 records. Two independent reviewers screened the titles and abstracts against predefined inclusion and exclusion criteria. Inter-rater reliability, assessed using Cohen’s kappa, was κ = 0.82, indicating substantial agreement [22]. Disagreements were resolved through discussion and, when necessary, consultation with a third reviewer.
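Cohen’s kappa corrects the raw agreement rate for agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected from each rater’s marginal label frequencies. A minimal pure-Python sketch follows; the screening decisions in the usage example are invented for illustration and are not the review’s data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in labels)
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use a single identical label")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical include/exclude decisions on ten titles.
screener_1 = ["in", "in", "out", "out", "in", "out", "in", "in", "out", "out"]
screener_2 = ["in", "in", "out", "in", "in", "out", "in", "out", "out", "out"]
print(round(cohens_kappa(screener_1, screener_2), 2))  # 0.6
```

Here the raters agree on 8 of 10 items (p_o = 0.8), but each labels half the items "in", so p_e = 0.5 and κ = 0.6: a noticeably lower figure than the raw 80% agreement.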

During this phase, 8423 records were excluded for the following reasons:

Not traffic congestion focused (n = 4782): Studies focusing on general traffic flow without specific congestion analysis.
Not ML/AI focus (n = 2341): Traditional statistical or mathematical modeling approaches.
Review/survey papers (n = 1300): Secondary studies rather than primary research.

This screening process resulted in 258 studies proceeding to full-text assessments.

2.4.3. Phase 3: Eligibility Assessment

Full-text articles were independently assessed by two reviewers using detailed eligibility criteria. Quality assessment was performed considering journal ranking (Q1–Q4 for journals, A–C ranking for conferences), citation count (minimum five citations for papers older than two years), and methodological rigor.

During the eligibility assessment, 156 studies were excluded for the following reasons:

Insufficient methodology detail (n = 67): Lack of clear description of ML techniques or validation procedures.
No empirical validation (n = 45): Theoretical studies without experimental results.
Poor quality/low citations (n = 32): Studies not meeting quality thresholds.
Full text unavailable (n = 12): Despite contact attempts with authors.
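The stage-by-stage counts reported in Phases 1 to 3 can be cross-checked with a simple tally. The sketch below only propagates the figures stated in the text; the dictionary keys and the function name are hypothetical.

```python
# Record counts as reported in Sections 2.4.1 to 2.4.3.
DATABASE_HITS = {"IEEE Xplore": 3685, "Springer": 3395, "ScienceDirect": 2602}
OTHER_SOURCES = 13  # manual citation tracking and expert recommendations
REMOVED_BEFORE_SCREENING = {"duplicates": 983, "language": 31}
SCREENING_EXCLUSIONS = {
    "not congestion focused": 4782,
    "not ML/AI focus": 2341,
    "review/survey": 1300,
}
ELIGIBILITY_EXCLUSIONS = {
    "insufficient methodology detail": 67,
    "no empirical validation": 45,
    "poor quality/low citations": 32,
    "full text unavailable": 12,
}

def prisma_tally():
    """Propagate the reported counts through each PRISMA phase."""
    from_databases = sum(DATABASE_HITS.values())
    identified = from_databases + OTHER_SOURCES
    screened = identified - sum(REMOVED_BEFORE_SCREENING.values())
    full_text = screened - sum(SCREENING_EXCLUSIONS.values())
    after_eligibility = full_text - sum(ELIGIBILITY_EXCLUSIONS.values())
    return {
        "from_databases": from_databases,
        "identified": identified,
        "screened": screened,
        "full_text": full_text,
        "after_eligibility": after_eligibility,
    }

for phase, n in prisma_tally().items():
    print(f"{phase}: {n}")
```

The tally reproduces the reported 9682, 9695, 8681, and 258 records. Subtracting the 156 eligibility exclusions from the 258 full-text records leaves 102, whereas Phase 4 reports 115 included studies; the difference of 13 equals the number of records added through citation tracking, so the two figures agree only if those records are counted separately at the inclusion stage.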

2.4.4. Phase 4: Final Inclusion

The final dataset comprised 115 studies [4,5,6,7,9,10,12,14,18,19,20,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124] that met all the inclusion criteria and quality standards in Table 5. The distribution across databases was as follows: ScienceDirect (n = 50, 43.5%), IEEE Xplore (n = 45, 39.1%), and Springer (n = 20, 17.4%). This distribution reflects the relative strength of each database at the intersection of machine learning and transportation research.
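Each database percentage is the number of included studies from that database divided by the 115 included studies. The helper below recomputes these shares; the function name is a hypothetical illustration, and the same calculation underlies the other percentage figures in this review.

```python
def percentage_shares(counts, decimals=1):
    """Express each category count as a percentage of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must not sum to zero")
    return {name: round(100 * n / total, decimals) for name, n in counts.items()}

final_by_database = {"ScienceDirect": 50, "IEEE Xplore": 45, "Springer": 20}
print(percentage_shares(final_by_database))
# {'ScienceDirect': 43.5, 'IEEE Xplore': 39.1, 'Springer': 17.4}
```

Rounded shares are not guaranteed to sum to exactly 100, so reported percentages should be read together with the underlying counts.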

Table 5. Inclusion and exclusion criteria.

2.5. Data Extraction

This study systematically collected pertinent data from the selected studies using the model presented in Figure 1. Each data extraction area is characterized by three key elements: information items, values, and associated research questions (Table 6). The study describes each data item and provides the possible entries for each information element within the framework of the model.

Table 6. Data extraction template.

Publication channels (conferences or journals) indicate the medium through which each selected work was communicated to the research community.
The publication source is the actual title of the journal or conference in which each study appeared.
The selected studies were categorized according to the type of research presented in each study [125].
The research context of each study was recorded; drawing on these references, the authors distinguished scientific, governmental, and industrial contexts.
The congestion type refers to the type of traffic congestion addressed in each selected paper, such as recurring or nonrecurring congestion.
The forecast interval refers to the prediction interval used in each selected study, such as a month, week, day, hour, minute, or second.
The vehicle and road types addressed in each study were recorded, as they are crucial for an accurate analysis.
The machine learning approach is the learning paradigm employed to forecast traffic congestion from the extracted data. This study considered supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
- Supervised learning is a category of machine learning that learns from labeled examples, mapping input data to known outputs, and generalizes this mapping to predict the labels of new, unseen data.
- Unsupervised learning is a category of machine learning that derives insights from unlabeled data. Its primary objective is to discern patterns within datasets.
- Semi-supervised learning is a category of machine learning that integrates supervised and unsupervised methods. Algorithms in this category combine a limited amount of labeled data with a substantial amount of unlabeled data to build models.
- Reinforcement learning derives an optimal policy through interaction with a dynamic environment, based on the information acquired. Algorithms in this category learn how actions relate to the outcomes that determine performance. A fundamental characteristic of reinforcement learning is its use of trial and error to achieve a goal or optimize the overall outcome.
Machine learning techniques are the algorithms used to perform learning tasks in order to solve complex problems. Because each technique is applied to a particular learning task, the various artificial intelligence methods can be classified according to the machine learning tasks they address.
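The extraction items above map naturally onto one structured record per study, which can then be tallied field by field to produce the distributions reported in Section 3. The sketch below is hypothetical: the field names, the controlled vocabularies, and the two example rows are illustrative assumptions rather than the actual contents of Table 6.

```python
from collections import Counter
from dataclasses import dataclass

# Controlled vocabularies for two of the categorical fields (assumed labels).
APPROACHES = {"supervised", "unsupervised", "semi-supervised", "reinforcement"}
CONTEXTS = {"scientific", "governmental", "industrial"}

@dataclass(frozen=True)
class ExtractionRecord:
    """One row of a hypothetical data extraction sheet."""
    study_id: str
    channel: str            # "journal" or "conference"
    source: str             # journal or conference title
    research_type: str      # research-type category, following [125]
    context: str            # one of CONTEXTS
    congestion_type: str    # "recurring" or "nonrecurring"
    forecast_interval: str  # e.g. "minute", "hour", "day"
    vehicle_type: str
    road_type: str
    ml_approach: str        # one of APPROACHES
    ml_task: str            # e.g. "classification", "regression"
    ml_technique: str       # e.g. "LSTM", "random forest"

    def __post_init__(self):
        # Reject values outside the controlled vocabularies early.
        if self.ml_approach not in APPROACHES:
            raise ValueError(f"unknown ML approach: {self.ml_approach!r}")
        if self.context not in CONTEXTS:
            raise ValueError(f"unknown context: {self.context!r}")

def tally(records, field):
    """Count how many records take each value of the given field."""
    return Counter(getattr(r, field) for r in records)

# Two invented rows, for illustration only.
records = [
    ExtractionRecord("S1", "journal", "Journal A", "empirical study", "scientific",
                     "recurring", "minute", "passenger", "urban",
                     "supervised", "classification", "LSTM"),
    ExtractionRecord("S2", "conference", "Conference B", "empirical study", "industrial",
                     "nonrecurring", "hour", "passenger", "highway",
                     "unsupervised", "clustering", "k-means"),
]
print(tally(records, "ml_approach"))
```

Validating categorical values at construction time catches inconsistent coding between reviewers before any counts are computed.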

2.6. Data Analysis and Classification Framework

The primary objective of this phase is to assess and classify the extracted data, which are presented in the graphs and tables. Based on these findings, this study examines the data obtained in relation to the questions listed in Table 1. The results derived from this phase are presented in the subsequent section: Results Analysis and Interpretations.

3. Results Analysis and Interpretations

This section systematically analyzes 115 peer-reviewed studies published between 2010 and 2024, addressing the seven research questions (Table 1) in a structured progression from technical foundations to strategic implications. The analysis begins by examining machine learning paradigms (Section 3.1, RQ7), then traces patterns of temporal evolution (Section 3.2, RQ1 + RQ7), assesses the publication landscape (Section 3.3, RQ1), examines research contexts and methodologies (Section 3.4, RQ2 + RQ3), investigates traffic characteristics and parameter utilization (Section 3.5, RQ5 + RQ6), and analyzes infrastructure implementation frameworks (Section 3.6, RQ4 + RQ6). It concludes with a comprehensive synthesis (Section 3.7) that integrates findings across all research questions to establish empirical foundations for prioritizing research and guiding practical implementation.

3.1. Machine Learning Paradigms and Technical Implementation Landscape (RQ7)

This section addresses Research Question 7, which pertains to machine learning models and approaches utilized in traffic congestion forecasting.

3.1.1. Fundamental Distribution Patterns in Machine Learning Approaches

An analysis of 115 studies revealed significant methodological preferences that characterize the current state of research in this domain. Figure 2 illustrates the predominant use of supervised learning approaches, which accounted for 57% of the selected studies (n = 66), thereby establishing this paradigm as the fundamental methodology within the research community. This predominance reflects the research community’s strong preference for labeled training data approaches, where historical traffic patterns inform predictive model development, indicating maturity in the data collection infrastructure and annotation capabilities.

Figure 2. Distribution of machine learning approaches in traffic management literature.

Unsupervised learning methodologies accounted for 21% (n = 24) of the corpus and were predominantly used for pattern discovery and clustering applications in traffic flow analysis. The significant representation of unsupervised approaches indicates a growing recognition of the importance of exploratory data analysis and the identification of hidden patterns in complex traffic dynamics. Semi-supervised learning approaches constituted 14% (n = 16) of the investigations, reflecting an emerging appreciation for hybrid methodologies that leverage both labeled and unlabeled data sources to address the data annotation limitations commonly encountered in transportation domains.

Reinforcement learning accounts for only 8% (n = 9) of the studies, indicating significant underutilization despite its theoretical suitability for dynamic traffic optimization scenarios. This limited adoption highlights considerable opportunities to explore advanced learning paradigms, particularly given reinforcement learning’s potential for adaptive decision making in the dynamic environments characteristic of traffic systems.
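As an illustration of the pattern-discovery use of unsupervised learning noted above, the sketch below groups unlabeled speed observations into regimes with a one-dimensional k-means (Lloyd’s algorithm). The speed values and the choice of three clusters are hypothetical; the example does not reproduce any of the reviewed studies.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster scalar values with Lloyd's algorithm; returns sorted centroids."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    data = sorted(values)
    # Initialise centroids at evenly spaced positions in the sorted data.
    if k == 1:
        centroids = [sum(data) / len(data)]
    else:
        centroids = [data[(i * (len(data) - 1)) // (k - 1)] for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged: centroids no longer change
        centroids = updated
    return sorted(centroids)

speeds_kmh = [12, 15, 14, 48, 52, 50, 30, 33]  # hypothetical unlabeled observations
print([round(c, 1) for c in kmeans_1d(speeds_kmh, 3)])  # [13.7, 31.5, 50.0]
```

No labels are supplied: the three centroids emerge from the data alone and could be interpreted afterwards as congested, transitional, and free-flow regimes.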

3.1.2. Task-Oriented Analysis and Problem Formulation Patterns

The distribution of machine learning tasks, as depicted in Figure 3, offers essential insights into the preferences for problem formulation within the research community. Classification problems dominated the research landscape, accounting for 42% of the studies (n = 48), reflecting the community’s primary focus on determining categorical traffic states, such as identifying congested versus free-flow conditions. This emphasis on discriminative tasks suggests a practical orientation towards meeting operational decision-making requirements in traffic management systems.

Figure 3. Machine learning task distribution across traffic management applications.

Regression tasks constitute 21% (n = 24) of the implementations and are predominantly utilized for the prediction of continuous variables such as travel time or traffic speed forecasting. The significant representation of regression tasks indicates a balanced focus on both categorical and continuous prediction requirements within traffic-management applications. Prediction methodologies account for 20% (n = 23) of the studies, encompassing general forecasting applications without specific mathematical formulation constraints, thereby demonstrating methodological flexibility in addressing diverse forecasting requirements.
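The difference between the two most common formulations can be illustrated by framing a single speed series both ways: the next-interval speed serves as a regression target, and a thresholded version of it serves as a congested/free-flow classification target. This is a minimal sketch; the 40 km/h threshold, the three-interval lag window, and the speed readings are hypothetical, and the reviewed studies define congestion in many different ways.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold (km/h) below which an interval is labelled "congested".
CONGESTION_SPEED_KMH = 40.0


@dataclass
class Sample:
    features: List[float]  # speeds observed over the previous `lags` intervals
    speed_next: float      # regression target: speed in the next interval
    congested_next: int    # classification target: 1 = congested, 0 = free flow


def make_samples(speeds: List[float], lags: int = 3) -> List[Sample]:
    """Frame one speed series as both a regression and a classification problem."""
    samples = []
    for t in range(lags, len(speeds)):
        target = speeds[t]
        samples.append(
            Sample(
                features=speeds[t - lags:t],
                speed_next=target,
                congested_next=int(target < CONGESTION_SPEED_KMH),
            )
        )
    return samples


if __name__ == "__main__":
    # Hypothetical 5-minute speed readings from one detector (km/h).
    series = [72.0, 68.5, 55.0, 41.0, 33.5, 29.0, 45.0, 63.0]
    for s in make_samples(series):
        print(s.features, "->", s.speed_next, s.congested_next)
```

The same features feed both tasks; only the target changes, which is why many studies can move between classification and regression with modest changes to the model.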

Clustering methodologies accounted for 14% (n = 16) of the research initiatives and were predominantly employed for the identification of traffic patterns and behavioral segmentation. This moderate representation of clustering underscores the significance of unsupervised pattern discovery for comprehending complex traffic dynamics. Conversely, dimensionality reduction techniques were rarely adopted, appearing in only 3% (n = 3) of the studies. This limited attention to feature-space optimization, despite the inherently high-dimensional nature of traffic data, suggests opportunities to advance feature-engineering strategies.
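To illustrate what dimensionality reduction offers for correlated detector data, the sketch below computes the first principal component of a small reading matrix and the share of total variance it explains. It uses principal component analysis (PCA), one standard technique, via power iteration in pure Python; it is not necessarily the method used by the three reviewed studies, and the detector readings are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def covariance(rows: Matrix) -> Matrix:
    """Population covariance matrix (rows = time steps, columns = detectors)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
             for j in range(d)] for i in range(d)]


def first_principal_component(rows: Matrix, iters: int = 200) -> Tuple[List[float], float]:
    """Leading principal direction and its explained-variance ratio via power iteration."""
    cov = covariance(rows)
    d = len(cov)
    v = [1.0] * d  # start vector, assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the leading eigenvalue (variance along v).
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total_var = sum(cov[i][i] for i in range(d))  # trace = total variance
    return v, eigval / total_var


if __name__ == "__main__":
    # Hypothetical readings: detectors 0 and 1 move together, detector 2 is noise.
    readings = [[1, 2, 0.1], [2, 4, -0.1], [3, 6, 0.1], [4, 8, -0.1], [5, 10, 0.1]]
    direction, ratio = first_principal_component(readings)
    print([round(x, 3) for x in direction], round(ratio, 4))
```

When detectors are strongly correlated, a single component captures almost all of the variance, which is the redundancy that dimensionality reduction exploits in large sensor networks.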

3.1.3. Technical Implementation and Algorithm Adoption Analysis

Figure 4 illustrates the comprehensive distribution of specific machine learning techniques, highlighting the significant technological preferences within the research community. Deep Neural Networks exhibited a predominant presence with a 47% adoption frequency (n = 54), signifying the strong inclination of the research community towards deep learning architectures for complex pattern recognition in traffic data. This predominance reflects a paradigm shift towards advanced neural architectures capable of capturing the nonlinear relationships and temporal dependencies inherent in traffic dynamics.

Figure 4. Algorithmic technique adoption frequency in traffic ML research.

Graph Convolutional Networks constitute 10% (n = 12) of the implementations, underscoring the increasing recognition of the importance of spatial relationship modeling within transportation networks. The adoption of this emerging technique signifies a methodological advancement in addressing the network topology considerations that are essential for accurate traffic flow prediction. Support Vector Machines (SVMs) accounted for 6% (n = 7) of the studies, maintaining their relevance despite the rise of deep learning. This suggests their continued utility in specific application contexts that require interpretability or are characterized by limited data scenarios.
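The spatial modeling that Graph Convolutional Networks contribute can be illustrated with one graph-convolution layer using the symmetric normalization popularized by Kipf and Welling, in which each node's features are mixed with those of its neighbors before a shared linear transform. This is a minimal pure-Python sketch, not the architecture of any reviewed study; the four-sensor corridor, the features, and the weights are hypothetical.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph given as a 0/1 matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees including self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """One graph-convolution layer: ReLU(A_norm @ H @ W)."""
    propagated = matmul(normalized_adjacency(adj), features)  # mix each node with its neighbours
    out = matmul(propagated, weights)                          # shared linear transform
    return [[max(0.0, x) for x in row] for row in out]         # ReLU activation


if __name__ == "__main__":
    # Hypothetical corridor of four sensors in a line: 0 - 1 - 2 - 3.
    corridor = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    # Hypothetical per-sensor features: [normalised speed, normalised occupancy].
    H = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2], [0.9, 0.1]]
    W = [[1.0, -0.5], [-0.5, 1.0]]  # hypothetical learned weights
    for node, row in enumerate(gcn_layer(corridor, H, W)):
        print(node, [round(x, 3) for x in row])
```

One layer lets each sensor see only its direct neighbors; stacking k layers extends this to k hops, which is how GCN-based models encode network topology.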

Traditional methods, such as K-Nearest Neighbor (K-NN) and ARIMA models, each account for 5% (n = 6) of the implementations, indicating that they remain established techniques with ongoing, albeit diminishing, use. The remaining 27% (n = 31) comprised a variety of other methods, including ensemble techniques, fuzzy systems, and hybrid approaches, reflecting the methodological diversity of the field, with such methods serving both comparative analysis and specialized applications.

Having mapped the landscape of machine learning paradigms, tasks, and techniques, this analysis next examines how technical preferences evolved over the 14-year study period, revealing patterns of innovation adoption and technological transformation.

3.2. Temporal Evolution and Innovation Adoption Patterns (RQ1 + RQ7)

The analysis of temporal evolution addresses both Research Question 1 (publication years) and Research Question 7 (evolution of machine learning techniques), thereby revealing interconnected patterns between chronological development and technological adoption.

3.2.1. Longitudinal Analysis of Methodological Transformation

The 14-year temporal analysis identified distinct phases of machine learning adoption and methodological transformation in traffic congestion forecasting research. Figure 5 illustrates the progression of machine learning tasks, highlighting a substantial increase in classification studies, from minimal representation in 2010 to peak activity in 2021–2022, with 11–14 studies per year. This exponential growth pattern signifies an increasing emphasis on categorical traffic state determination and on applications in operational decision making.

Figure 5. Temporal evolution of ML task preferences (2010–2024). Three distinct phases have been identified: Foundation (2010–2015), Acceleration (2016–2020), and Maturation (2021–2024). Peak research activity observed in 2021–2022.

Prediction tasks have demonstrated parallel growth trajectories, with marked acceleration observed post-2016, culminating in 10 studies by 2022. This trend reflects an increased interest in temporal forecasting applications, coinciding with the maturation of deep learning techniques. Similarly, regression tasks have shown consistent development since 2013, reaching a peak of nine studies by 2022, indicating sustained demand for continuous variable prediction capabilities. Temporal progression illustrates a shift in research from simple statistical approaches to sophisticated predictive modeling, with significant acceleration during 2016–2022, corresponding to advancements in computational infrastructure and algorithmic developments.

3.2.2. Paradigm Adoption and Technology Diffusion Analysis

Figure 6 illustrates the progression of machine learning paradigms, highlighting the exponential increase in the adoption of supervised learning from a single study in 2010 to 25 studies by 2022. This trend underscores the growing availability of labeled traffic datasets and the preference for predictive accuracy achieved through pattern-learning methodologies. The predominance of supervised learning signifies the advancement of the research community in data annotation capabilities and performance evaluation techniques.

Figure 6. Learning paradigm evolution over time showing supervised learning surge post-2016, coinciding with big data availability. Semi-supervised approaches gain traction from 2019, addressing data annotation bottlenecks.

Since 2013, there has been a consistent increase in the application of unsupervised learning, culminating in twelve studies by 2022. This trend reflects the growing recognition of its utility for pattern discovery and clustering in traffic analysis. Similarly, semi-supervised learning has seen a gradual uptake since 2012, with a marked acceleration after 2019, reaching ten studies by 2022. This suggests an increasing appreciation for hybrid methodologies that effectively address the limitations of labeled data prevalent in transportation domains.

Throughout the study period, reinforcement learning consistently maintained minimal representation, with no more than three studies conducted annually. This trend indicates persistent underutilization, despite its theoretical applicability to dynamic traffic optimization scenarios. The limited adoption of this advanced learning paradigm suggests significant opportunities for integration into future research.

3.2.3. Technical Innovation Cycles and Algorithm Evolution

As depicted in Figure 7, the analysis of technical evolution documented significant paradigm shifts in the algorithm preferences over the 14-year study period. Deep Neural Networks (DNNs) have exhibited remarkable growth, transitioning from no implementation in 2010 to widespread adoption by 2022, as evidenced by 15 studies. This growth trajectory notably accelerated beginning in 2017, a period that coincided with advancements in computational hardware and improvements in the accessibility of deep learning frameworks. This adoption pattern exemplifies the rapid diffusion of technology within the research community.

Figure 7. Technique adoption lifecycle demonstrating Deep NN dominance peak (2021–2022) followed by methodological stabilization. Traditional methods (ARIMA, Linear Regression) decline post-2018, replaced by ML approaches.

Since 2016, Graph Convolutional Networks have gained prominence, culminating in peak adoption in 2022, as evidenced by eight studies. This trend underscores the increasing recognition of the importance of modeling spatial relationships within transportation networks. The pattern of their emergence signifies the development of methodological sophistication and the adoption of specialized architectures tailored to meet domain-specific requirements.

The use of traditional techniques has declined markedly. Specifically, the use of Support Vector Machines decreased from a notable presence in 2015, with three studies, to minimal adoption in 2024, with only one study. Similarly, ARIMA models, adopted mainly between 2010 and 2016, do not appear in any selected study after 2020. These trends reflect a shift within the field from traditional statistical methods to more advanced machine learning techniques, highlighting rapid cycles of technological adoption and the responsiveness of the research community to algorithmic innovations.

3.3. Publication Landscape and Research Quality Assessment (RQ1)

To address Research Question 1 concerning publication sources and temporal trends, we initially examined the trends in research output and patterns of academic dissemination, followed by an analysis of academic quality and scholarly impact.

3.3.1. Research Output Trends and Academic Dissemination Patterns

A comprehensive analysis of publications (Figure 8) reveals an exponential increase in research output over the 14-year study period. The total number of publications has increased from a single study in 2010 to a peak of 18 studies in 2022. This growth trajectory indicates a significant advancement in research interest and the maturation of the field, reflecting both technological progress and increasing demand for practical applications in traffic management systems.
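One way to put the growth from one study in 2010 to 18 in 2022 into perspective is the constant annual growth rate implied by those two endpoints. The sketch below computes it; note that two endpoints alone cannot establish that growth was exponential, so this figure only describes the compound rate consistent with them, and the function name is illustrative.

```python
def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that takes `start` to `end` in `years` steps."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    rate = implied_annual_growth(1, 18, 2022 - 2010)  # 12 yearly steps
    print(f"implied growth: {rate:.1%} per year")      # roughly 27% per year
```

Checking this claim more rigorously would require fitting a log-linear trend to all yearly counts in Figure 8 rather than relying on the endpoints.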

Figure 8. Publication trends and venue distribution analysis (2010–2024).

Throughout the study period, journal publications consistently demonstrated dominance, particularly in recent years (2020–2024), indicating a preference within the research community for peer-reviewed high-impact dissemination channels. Conference publications have maintained a steady contribution, with notable peaks observed between 2019 and 2021, reflecting a balance between the presentation of preliminary findings and the documentation of comprehensive research. The overall trajectory of publications illustrates the successful establishment of the field, which is characterized by a robust academic foundation and sustained research momentum.

The temporal distribution identifies three distinct evolutionary phases: the emergence period (2010–2015), characterized by minimal activity indicative of the nascent development of the field; the growth phase (2016–2019), marked by a steady increase in activity corresponding to advancements in deep learning; and the maturation period (2020–2024), distinguished by sustained high output, signifying an established research community and methodological sophistication.

3.3.2. Academic Quality and Scholarly Impact Analysis

Figure 9 illustrates the high research standards within the traffic congestion forecasting community. Notably, Q1 journals accounted for 85% (n = 98) of the total publications, a proportion that appears to exceed typical levels in other academic fields. This indicates a strong emphasis on high-impact, peer-reviewed venues with rigorous editorial standards and reflects both research quality and a publication strategy aimed at maximizing academic impact.

Figure 9. Publication quality and venue ranking distribution.

Q2 journals constituted a minor proportion, representing 4% (n = 4) of the total, suggesting a pronounced preference for publishing in top-tier venues. Conference publications were stratified by quality: A-grade conferences accounted for 9% (n = 9) of the total output, indicating selective engagement with premier academic forums; B-grade conferences comprised 15% (n = 15), reflecting efforts to disseminate research more broadly; and C-grade conferences represented a minimal 1% (n = 1). This distribution underscores a mature research community with established standards and successful positioning within high-impact scholarly venues.

The studies selected for this analysis were sourced from three prominent digital libraries: IEEE Xplore, SpringerLink, and ScienceDirect. These studies were published in leading journals and conferences, as detailed in Table 7.

Table 7. Most frequent publication sources in the selected papers.

3.4. Research Context and Institutional Framework Analysis (RQ2 + RQ3)

3.4.1. Institutional Leadership and Organizational Distribution (RQ2)

The analysis of the research context, as depicted in Figure 10, demonstrates predominant academic leadership in the field of traffic congestion forecasting research, with academic institutions accounting for 70% (n = 81) of the total studies. This prevalence signifies a robust development of theoretical foundations and sustained scholarly inquiry within university research settings, highlighting institutional dedication to transportation research and the involvement of graduate students in systematic investigations.

Figure 10. Research context and institutional distribution analysis.

Government institutions contribute 27% (n = 31) of the research output, underscoring significant public sector investment in the optimization of transportation infrastructure and policy-driven research initiatives. This governmental involvement indicates policy-level acknowledgment of the economic impacts of traffic congestion and the necessity for infrastructure investment, suggesting the potential for translating research findings into practical implementation.

Industry contributions constitute only 3% (n = 3) of the corpus, indicating a limited presence of private-sector research in academic forums. This may be attributable to proprietary concerns or to the use of alternative dissemination channels. The distribution highlights a research–practice gap, in which academic theoretical advances may lack direct pathways to implementation in industry, and underscores the potential for closer collaboration between industry and academia.

3.4.2. Methodological Framework and Research Approach Analysis

Figure 11 delineates various methodological research approaches, identifying evaluation research as the predominant method, accounting for 53% (n = 61) of the total. This predominance underscores the significant focus on the empirical validation, performance assessment, and comparative analysis of machine learning techniques. It reflects the commitment of the research community to the evidence-based evaluation of methodologies and quantification of algorithm performance.

Figure 11. Research methodology distribution and approach classification.

Solution proposal methodologies constituted 35% (n = 40) of the studies, indicating significant emphasis on the development of novel techniques, algorithmic innovations, and methodological advancements. This equilibrium between evaluation and innovation suggests a robust research ecosystem that encompasses both the critical assessment and creative development components.

Validation research accounted for 11% (n = 13) of the studies and included implementation verification, model generalizability assessment, and real-world application testing. Opinion papers represented only 1% (n = 1), indicating the predominance of empirical research over theoretical discussion. This methodological distribution reflects a mature research community with strong empirical foundations and a balanced focus on innovation and evaluation.

3.5. Traffic Characteristics, Parameter Utilization Analysis, and Forecast Temporal Distribution (RQ5 + RQ6)

3.5.1. Congestion Type Focus and Research Distribution (RQ5)

Figure 12 illustrates the distribution of research attention across the different types of congestion, indicating that studies on recurrent congestion constituted 76% (n = 87) of the total. This predominant focus underscores the research community’s primary interest in predictable pattern-based congestion phenomena, which occur regularly owing to capacity limitations, demand patterns, or infrastructure constraints. The predictable nature of recurrent congestion facilitates the use of traditional sensor-based monitoring approaches and pattern recognition methodologies.

Figure 12. Traffic congestion type distribution and research focus analysis.

Research on nonrecurrent congestion constitutes 24% (n = 28) of the studies, focusing on unpredictable congestion events caused by incidents, weather conditions, special events, or emergencies. This limited representation highlights a significant research imbalance: predictable congestion receives disproportionate attention even though nonrecurrent events often cause severe traffic disruptions and economic impacts. The predominant focus on recurrent congestion likely reflects better data availability, a preference for more tractable modeling, and the appeal of predictable patterns for algorithm development.

3.5.2. Forecast Temporal Distribution (RQ5)

Figure 13 illustrates the preferences for temporal granularity in traffic congestion forecasting across the selected studies. Short-term prediction predominated, with hourly forecasting the most common approach (36%), followed by minute-level prediction (34%). Together, these two intervals account for 70% of all studies, reflecting the operational demands of intelligent transportation systems that require real-time or near-real-time congestion state estimation for adaptive traffic management. Daily forecasting accounted for 16% of the studies, serving medium-term planning applications, whereas ultra-short-term (second-level) prediction comprised 7%. Long-term forecasting (weekly: 2%, monthly: 1%) received minimal attention, suggesting limited research interest in strategic planning horizons. The 4% generic category represents studies with flexible temporal frameworks adaptable to various operational contexts.

This distribution underscores the field's emphasis on operational decision support rather than strategic planning: sub-hourly intervals (minute-level, 34%, plus second-level, 7%) together account for 41% of the studies, highlighting the importance of real-time responsiveness in modern traffic management systems. These granularity preferences align with the practical requirements of dynamic route guidance, adaptive signal control, and incident response applications, which are central to intelligent transportation infrastructure.
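The aggregate figures quoted for Figure 13 follow directly from the per-granularity shares. The short sketch below reproduces them; the shares are those reported in the text, and the names are illustrative.

```python
# Shares (%) of studies by forecast granularity, as reported for Figure 13.
GRANULARITY_SHARE = {
    "second": 7, "minute": 34, "hourly": 36, "daily": 16,
    "weekly": 2, "monthly": 1, "generic": 4,
}


def combined(shares, keys):
    """Sum the shares of the selected granularity categories."""
    return sum(shares[k] for k in keys)


if __name__ == "__main__":
    print("hourly + minute:", combined(GRANULARITY_SHARE, ["minute", "hourly"]))
    print("sub-hourly (second + minute):", combined(GRANULARITY_SHARE, ["second", "minute"]))
    print("long-term (weekly + monthly):", combined(GRANULARITY_SHARE, ["weekly", "monthly"]))
    print("all categories:", sum(GRANULARITY_SHARE.values()))
```

The 70% figure groups hourly with minute-level forecasts, whereas the 41% sub-hourly figure groups minute-level with second-level forecasts; the two aggregates overlap in the minute-level category.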

Figure 13. Forecast temporal granularity distribution.

3.5.3. Parameter Utilization Patterns and Data Source Analysis (RQ6)

To systematically address Research Question 6 concerning the traffic metrics used in congestion forecasting research, this analysis first establishes standardized definitions of the forecast parameters identified across the selected studies. The definitions follow the road dictionary established by PIARC [126] and provide the basis for the utilization analysis that follows. The parameters, spanning traditional traffic monitoring measures and external data sources, are defined as follows.

Traffic flow (flow rate): The number of vehicles or persons passing a given point per unit of time.
Traffic speed: The distance a vehicle travels divided by the travel time.
Traffic volume: The number of vehicles or persons passing a point during a defined period.
Traffic density (traffic concentration): The number of vehicles occupying a unit length of road, carriageway, or lane at a specified time, excluding parked vehicles.
Travel time (journey time, route time): The time spent travelling between two defined points, including, where relevant, time spent parking, walking, waiting, and changing mode.
Traffic occupancy: For a given cross-section or short longitudinal section and time interval, the percentage of time during which the road is occupied by one or more vehicles or people (also termed occupancy percentage or occupancy time).
Mileage ratio: The length in miles per vehicle.
Event-based data: Data related to events that occur during traffic operation and affect traffic conditions.
Weather data: Information about the weather and climate of a region.
GPS data: Data provided by a geolocation system using satellite signals to identify positions on a map.
Map data: Any content, data, or information provided through a map, including but not limited to imagery, terrain data, and latitude and longitude coordinates.
Social media data: Any type of data related to traffic conditions that can be gathered through social media.
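To make the relationships among the traditional parameters concrete, the sketch below derives volume, flow rate, time-mean and space-mean speed, density, and occupancy from one detector's records for a single aggregation interval. It assumes a hypothetical record format in which a single point detector reports each vehicle's spot speed and on-time, estimates space-mean speed as the harmonic mean of spot speeds, and recovers density from the fundamental relation q = k · v_s. These choices are illustrative and are not prescribed by PIARC or by the reviewed studies.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Detection:
    speed_kmh: float  # spot speed measured at the detector (hypothetical record field)
    on_time_s: float  # time the vehicle occupied the detector zone


def summarize(detections: List[Detection], interval_s: float) -> Dict[str, float]:
    """Summarize one point detector over one aggregation interval."""
    if not detections:
        raise ValueError("no detections in interval: speeds and density are undefined")
    volume = len(detections)                                    # vehicles in the period
    flow_vph = volume * 3600.0 / interval_s                     # flow rate q, veh/h
    time_mean = sum(d.speed_kmh for d in detections) / volume   # arithmetic mean of spot speeds
    space_mean = volume / sum(1.0 / d.speed_kmh for d in detections)  # harmonic mean, v_s
    density = flow_vph / space_mean                             # k = q / v_s, veh/km
    occupancy = 100.0 * sum(d.on_time_s for d in detections) / interval_s  # % of time occupied
    return {
        "volume": volume,
        "flow_vph": flow_vph,
        "time_mean_kmh": time_mean,
        "space_mean_kmh": space_mean,
        "density_vpkm": density,
        "occupancy_pct": occupancy,
    }


if __name__ == "__main__":
    # Hypothetical: three vehicles pass one loop detector within a 60-s interval.
    records = [Detection(40, 0.45), Detection(60, 0.30), Detection(120, 0.15)]
    for name, value in summarize(records, interval_s=60).items():
        print(f"{name:<15}{value:8.2f}")
```

Time-mean speed (about 73 km/h here) exceeds space-mean speed (60 km/h) because faster vehicles are over-represented in point samples; density must therefore be derived from the space-mean speed for q = k · v_s to hold.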

The comprehensive parameter analysis (Figure 14) reveals distinct utilization patterns that reflect the characteristics of the congestion types and modeling requirements. Traditional traffic flow parameters demonstrate the highest utilization in recurrent congestion research. Flow (56 studies) and speed (55 studies) were the primary data sources, followed by volume (27 studies), travel time (19 studies), and density (10 studies). These patterns underscore the predictable nature of recurrent congestion that facilitates sensor-based monitoring.

Figure 14. Traffic parameter utilization comparison across congestion types.

Research on nonrecurrent congestion highlights distinct parameter priorities. While flow (20 studies) and speed (19 studies) remained significant, event-based data assumed increased importance (18 studies), underscoring the incident-driven nature of nonrecurrent congestion. In addition, weather data (nine studies), GPS/map data (seven studies), and travel time (seven studies) were pertinent for monitoring dynamic conditions, suggesting the need for methodological adaptations to address unpredictable congestion scenarios.

The distribution of the parameters revealed fundamental methodological distinctions between the types of congestion. Specifically, nonrecurrent congestion necessitates diverse real-time data integration strategies, whereas recurrent congestion favors pattern-based modeling approaches. This analysis highlights the potential advancements in sensor integration and utilization of external data sources within comprehensive congestion-management systems.

3.6. Infrastructure Context and Implementation Framework Analysis (RQ4 + RQ6)

3.6.1. Vehicle Type Coverage and Research Scope Assessment (RQ4)

Figure 15 presents the vehicle types represented across the selected studies, indicating that passenger cars are the predominant focus, accounting for 90% (n = 104) of vehicle-related investigations. This emphasis may be attributed to the dominance of passenger cars in traffic composition and to the greater availability of data in urban settings. However, it may also lead to an underrepresentation of the impact of vehicle diversity on congestion dynamics and of the associated opportunities for system optimization.

Figure 15. Vehicle type distribution in traffic congestion research.

Research on alternative vehicle types remains limited, with taxi services constituting 5% (n = 6) of studies. This may reflect emerging interest in the integration of commercial vehicles and the assessment of ride-sharing impacts. Public transportation received only 3% (n = 3) of the research focus despite its significant potential to influence congestion and its policy relevance for sustainable transportation initiatives. Emergency vehicles and trucks each account for only 1% (n = 1) of the research coverage, highlighting substantial gaps in understanding their potentially disproportionate impact on congestion and their operational priority requirements.

This distribution suggests a mismatch between research focus and real-world traffic composition: concentrating on passenger cars may oversimplify complex multimodal traffic interactions and miss optimization opportunities that a more comprehensive treatment of vehicle types could reveal.

3.6.2. Infrastructure Distribution and Network Coverage Analysis (RQ4)

The analysis of road infrastructure, as depicted in Figure 16, indicates a balanced focus on the primary infrastructure types. Specifically, urban roadways constituted 51% (n = 59) of the studies, whereas highways represented 47% (n = 54) of the research emphasis. This nearly equal distribution suggests appropriate acknowledgment of both the complexities inherent in urban environments and the capacity challenges associated with highways in the context of congestion forecasting.

Figure 16. Road type distribution and infrastructure focus analysis.

Urban roadway dominance is indicative of increasing trends in urbanization, the complexity of intersections, and challenges associated with multimodal integration, all of which necessitate advanced modeling approaches. The emphasis on highway research underscores the recognition of capacity bottlenecks, long-distance travel patterns, and the significance of corridor management in regional transportation systems.

Rural roadways account for a mere 2% (n = 2) of the representation, highlighting a significant coverage gap despite rural infrastructure constituting a substantial portion of the national transportation networks. This deficit in rural research signifies missed opportunities for the comprehensive optimization of transportation systems and suggests a potential urban bias within the research community’s focus.

3.6.3. Temporal Infrastructure Evolution and Data Strategy Framework (RQ4 + RQ6)

The temporal evolution analysis (Figure 17) examined the focus on different road types over the 14-year study period, highlighting the consistent predominance of urban road research. This area has shown steady growth, beginning with one study in 2010, reaching a peak of 13 studies by 2022. In parallel, highway research exhibits a similar trajectory, albeit with a slightly delayed acceleration, culminating in 11 studies by 2021. This trend suggests increasing recognition of the importance of corridor-level congestion management and the potential for methodological integration.

Figure 17. Temporal evolution of vehicle type research focus (2010–2024).

Figure 18 presents a comprehensive comparison of the data strategies between traditional and external data sources across the six evaluative dimensions. Traditional data sources exhibit superior performance in the established operational dimensions of availability (95%), accuracy (85%), and maturity (90%), reflecting the proven infrastructure and institutional knowledge. However, limitations are evident in adaptation capabilities: integration (30%), real-time processing (70%), and potential (40%) indicate the need for modernization.

Figure 18. Comparative data strategy analysis: traditional vs. external sources.

External data sources display various strengths: integration capability (80%), real-time processing (85%), and potential (90%), highlighting the advantages of modern technology, whereas availability (60%), accuracy (55%), and maturity (45%) reveal areas that require further development. This analysis indicates that optimal strategies require hybrid approaches that combine the reliability of traditional sources with the innovative capabilities of external sources to develop a comprehensive congestion-forecasting system.

3.7. Synthesis and Critical Assessment

The period from 2010 to 2024 shows the progress of machine learning in traffic congestion forecasting. This evolution spans from initial machine learning algorithms (2010–2015), through deep learning (2016–2020), to current advanced architectures and real-world applications (2021–2024). This 14-year span ensured a comprehensive overview, while maintaining relevance for applications and research trends. The documented transition from traditional statistical methods to the predominance of deep learning, along with heightened attention to spatial modeling and real-time processing capabilities, signifies the research community’s successful adaptation to technological advancements and operational demands.

The high standards of publication quality, with 85% of the studies published in Q1-ranked journals, along with balanced methodological approaches comprising 53% evaluation and 35% solution proposal studies, demonstrate a robust academic foundation. This foundation supports the ongoing advancement of research and the potential for practical implementation in intelligent transportation systems.

Table 8 summarizes the key findings, listing the research questions addressed by each primary section and the related figures.

Table 8. Research questions and key findings summary table.

The analysis highlights significant gaps that necessitate strategic focus: the underrepresentation of rural infrastructure (2% coverage), limited industry collaboration (3% of studies), and the underutilization of reinforcement learning (8% adoption) present considerable opportunities to expand research and enhance practical impact. The predominant emphasis on recurrent congestion (76% of the studies) and passenger cars (90% coverage) indicates the need for diversified investigative approaches that address the complexity of comprehensive transportation systems.

4. Discussions and Implications

This systematic review presents the key findings and future research directions for machine learning-based traffic congestion forecasting. The findings are organized by research question to provide guidance for the research community. The analysis of 115 peer-reviewed studies published from 2010 to 2024 reveals critical insights that advance theoretical understanding and implementation in intelligent transportation systems.

4.1. Publication Patterns and Research Evolution (RQ1)

The comprehensive analysis indicated significant growth in research output and demonstrated exceptional quality standards within the domain of traffic congestion forecasting. The selected studies were sourced from three prominent digital libraries, IEEE Xplore, SpringerLink, and ScienceDirect, with publications appearing in leading journals and conferences, as detailed in Table 7. The temporal evolution analysis (Figure 8) documents exponential growth from minimal activity in 2010 to a peak output of 18 studies in 2022, indicating successful field establishment and sustained research momentum over the 14-year study period.

Established standards for publication quality have set high expectations for methodological rigor and empirical validation. The preference for journal publication indicates that the research community values comprehensive analysis and extensive evaluation. The observed exponential growth pattern suggests that the field has reached maturity with sustained momentum, creating opportunities for specialized research directions and interdisciplinary collaboration.

Researchers are encouraged to broaden the scope of digital library resources to encompass DOAJ, JSTOR, CORE, BASE, and Wiley Online Library, thereby ensuring comprehensive coverage of the literature. Contemporary literature review tools, such as Semantic Scholar, Scinapse, Consensus, and Perplexity, offer advanced search capabilities that facilitate systematic investigations. Nonetheless, ethical considerations associated with the use of artificial intelligence tools in systematic reviews necessitate careful attention and transparent reporting. Established publication standards indicate that future submissions should prioritize methodological innovation, thorough empirical validation, and the potential for practical implementation to sustain the quality trajectory of the field.

4.2. Research Context and Framework Analysis (RQ2)

Academic preeminence ensures methodological rigor and fosters innovation, whereas governmental involvement signifies acknowledgment of the impacts of congestion and the necessity for infrastructure development. Minimal industry participation (3%), as shown in Figure 10, highlights the gap between research and practice that necessitates intervention. Companies in the transportation and data analytics sectors present significant opportunities for the advancement of algorithms.

The predominant presence of academic leadership (70%) underscores robust theoretical foundations, yet may also indicate a potential disconnection from the challenges associated with practical implementation. The significant involvement of governmental entities (27%) offers avenues for policy-relevant research and access to public sector funding. Conversely, limited engagement with industry (3%) presents both a challenge and an opportunity to enhance the practical impact. Researchers must strike a balance between theoretical advancement and practical applicability to ensure the relevance and potential implementation of their research.

The academic community must embrace more collaborative strategies and furnish compelling justifications for governmental and industrial investments by customizing solutions to the specific requirements of companies or institutions reliant on transportation. Key stakeholders for enhanced collaboration include transportation service providers, government agencies, research institutes, and public safety agencies. These partnerships are intended to advance transportation planning, improve traffic management, and provide superior mobility solutions. The collective efforts of these sectors will facilitate the advancement of traffic congestion forecasting through machine learning, enhanced transportation planning, and improved traffic management via coordinated research–practice integration.

4.3. Methodological Approaches and Research Types (RQ3)

The prevailing focus of evaluation research underscores the importance of methodological rigor and the culture of comparative analysis within the research community, thereby ensuring evidence-based selection and assessment of algorithm performance. The significant representation of solution proposals (35%) indicates active progress in algorithmic development and the creation of novel techniques. In contrast, the limited presence of validation research (11%) (Figure 11) highlights opportunities for enhanced testing of real-world implementations.

The predominance of evaluation research, accounting for 53% of the studies, underscores the field’s emphasis on empirical validation and comparative analysis, thereby establishing a standard for rigorous performance assessment in future research. The equilibrium between evaluation and solution proposal research reflects a dynamic and healthy ecosystem. Nonetheless, the negligible contribution of opinion papers, at only 1%, suggests potential deficiencies in theoretical discourse and position statements that could offer strategic directions for the field’s advancement.

Future research should diversify methodological approaches by incorporating a greater number of opinion papers and conference studies to enhance the variety of research methodologies for predicting traffic congestion using machine learning algorithms. Opportunities for methodological advancement include the integration of cross-domain and longitudinal impact studies. Overall, future research should focus on developing novel approaches to address traffic congestion issues and provide informed opinions that guide strategic research directions and practical implementation frameworks.

4.4. Infrastructure and Vehicle Coverage Assessment (RQ4)

The balanced distribution between urban and highway coverage (47–51%) indicates appropriate acknowledgment of primary congestion issues. However, the minimal focus on rural roads (2%) suggests a potential urban-centric bias within the research community. The predominant emphasis on passenger cars (90%) (Figure 15 and Figure 16) may lead to an oversimplification of the complex interactions within multimodal traffic systems, potentially overlooking optimization opportunities that could arise from a more comprehensive consideration of various vehicle types. The observed temporal patterns reflect the responsiveness of the research community to infrastructure priorities and highlight systematic gaps in coverage.

Research in this field should encompass a variety of road types, including rural roads, busways, and tramway routes, depending on the specific transport system being studied and availability of relevant data. As transportation electrification and automation progress, it is imperative to investigate emerging vehicle categories such as motorbikes and electric scooters. Furthermore, there is an increasing need to forecast congestion across various modes of transport, particularly with the increasing popularity of multimodal systems. Expanding the scope of research in this manner will facilitate the comprehensive optimization of transportation networks and address diverse infrastructure contexts and vehicle diversity, thereby supporting more inclusive and effective transportation management strategies.

4.5. Congestion Characteristics and Temporal Patterns (RQ5)

The predominance of short-term prediction (77%) (Figure 13) reflects practical operational requirements, but may constrain strategic planning capabilities that require longer forecast horizons. The emphasis on recurrent congestion (76%) (Figure 12) suggests advantages in data availability and preferences for modeling complexity, whereas the increasing attention to nonrecurrent issues since 2019 indicates a growing recognition of the importance of this area despite methodological challenges. The evolution of temporal patterns signifies the adaptation of the research community to practical implementation needs.
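The distinction between recurrent and nonrecurrent congestion can be illustrated with a minimal sketch, assuming per-time-slot speed readings from past days are available. The historical mean speed for each time-of-day slot captures the recurrent pattern (and doubles as a naive short-term forecast), while a reading far below that mean is flagged as potentially nonrecurrent. The function names, the data layout, and the default threshold `k = 2` standard deviations are hypothetical choices for illustration, not a method taken from the reviewed studies.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_profile(history):
    """Summarize the recurrent pattern from historical readings.

    history: iterable of (time_slot, speed_kmh) pairs collected over past days.
    Returns {time_slot: (mean_speed, std_speed)}.
    """
    by_slot = defaultdict(list)
    for slot, speed in history:
        by_slot[slot].append(speed)
    return {
        slot: (mean(speeds), stdev(speeds) if len(speeds) > 1 else 0.0)
        for slot, speeds in by_slot.items()
    }


def recurrent_forecast(profile, slot):
    """Naive short-term forecast: the historical mean speed for this time slot."""
    return profile[slot][0]


def is_nonrecurrent(profile, slot, observed_speed, k=2.0):
    """Flag a reading that falls more than k standard deviations below the
    historical mean for its slot (k is a hypothetical tuning parameter)."""
    mu, sigma = profile[slot]
    return observed_speed < mu - k * max(sigma, 1e-6)


# Hypothetical history: slot 8 = 08:00 (morning peak), slot 12 = 12:00.
history = [(8, 40.0), (8, 42.0), (8, 38.0), (8, 41.0), (12, 60.0), (12, 62.0)]
profile = build_profile(history)
print(recurrent_forecast(profile, 8))     # 40.25
print(is_nonrecurrent(profile, 8, 39.0))  # False: the usual morning slowdown
print(is_nonrecurrent(profile, 8, 20.0))  # True: far below the usual pattern
```

A slow but typical morning reading is treated as recurrent, whereas a sharp drop relative to the same slot's history is the kind of signal that nonrecurrent-congestion studies try to anticipate.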

Future research should focus on enhancing the accuracy and timeliness of real-time predictions, supporting models that can adapt swiftly to evolving traffic conditions. This can be achieved by leveraging the growing availability of real-time traffic data from traffic sensors, global positioning system (GPS) devices, and connected vehicles. Forecasting studies should examine a range of congestion types, including recurrent, incident-induced, work zone, special event, and weather-related congestion. Because nonrecurrent congestion remains comparatively underexplored, concentrating on it offers researchers greater scope for significant contributions. Adaptive and resilient forecasting systems that respond effectively to unforeseen events, disruptions, and changes in traffic conditions can ensure reliable predictions in dynamic environments, particularly given the increasing frequency of extreme weather events and the impacts of aging infrastructure.

4.6. Traffic Parameter Utilization and Data Strategies (RQ6)

A comprehensive parameter analysis indicated that the selected articles employed various predictive parameters contingent on the type of congestion being analyzed, as demonstrated in the comparative analysis (Figure 14). Most studies have utilized traffic flow, volume, and speed as the primary indicators for predicting traffic congestion. Parameter utilization analysis revealed distinct patterns between congestion types, reflecting both methodological requirements and data availability constraints. The data strategy comparison (Figure 18) highlights the complementary strengths of traditional sources (high availability 95%, accuracy 85%, maturity 90%) and external sources (superior integration 80%, real-time processing 85%, and future potential 90%). This analysis suggests that optimal forecasting strategies necessitate hybrid approaches that leverage the reliability of traditional sources along with the innovative capabilities of external sources.
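A hybrid data strategy can be sketched, under simple assumptions, as joining readings from traditional detectors with an external event feed so that each time step carries both kinds of features. The record types `SensorReading` and `Event`, their fields, and the minute-level alignment are hypothetical; real pipelines would also need to handle differing sampling rates, missing data, and spatial matching between sources.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    minute: int    # minutes since midnight
    flow: float    # vehicles per hour, from a traditional detector
    speed: float   # km/h, from the same detector


@dataclass
class Event:
    start: int     # minutes since midnight, inclusive
    end: int       # minutes since midnight, exclusive
    kind: str      # e.g. "incident" or "roadwork", from an external feed


def fuse(readings, events):
    """Attach external event indicators to each detector reading."""
    rows = []
    for r in readings:
        active = sorted({e.kind for e in events if e.start <= r.minute < e.end})
        rows.append({
            "minute": r.minute,
            "flow": r.flow,
            "speed": r.speed,
            "event_active": int(bool(active)),
            "event_kinds": ",".join(active),
        })
    return rows


readings = [SensorReading(480, 1800.0, 72.0),
            SensorReading(485, 1500.0, 41.0),
            SensorReading(490, 1650.0, 58.0)]
events = [Event(483, 488, "incident")]
for row in fuse(readings, events):
    print(row["minute"], row["event_active"], row["event_kinds"])
```

The fused rows keep the reliable detector measurements as the core features and add event indicators as extra inputs, which is one simple way to realize the complementary use of traditional and external sources.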

The distribution of parameters reveals essential methodological distinctions between congestion types, requiring researchers to adjust their data collection and modeling strategies accordingly. The predominance of traditional parameters in recurrent scenarios reflects the capabilities of the established infrastructure, whereas the significance of event data in nonrecurrent research underscores the need for diverse real-time data integration approaches. The strength of complementary data sources suggests that researchers should adopt hybrid strategies rather than rely on singular approaches.

Given the significant impact of traffic congestion on environmental factors, future research should integrate environmental variables such as air quality and noise levels into predictive models to address broader sustainability concerns. The application of big data analytics to identify latent patterns and trends within data represents a promising advancement, facilitating the processing and analysis of substantial volumes of traffic data from diverse sources and thereby enhancing the accuracy and reliability of traffic condition predictions. Hybrid approaches that combine the reliability of traditional monitoring infrastructure with the capabilities of modern external data sources offer optimal strategies for the development of comprehensive congestion-forecasting systems that address both the operational requirements and innovation potential.

4.7. Machine Learning Implementation and Technical Evolution (RQ7)

Temporal evolution analysis (Figure 5, Figure 6 and Figure 7) illustrates a significant paradigm shift from traditional statistical methods toward the predominance of deep learning between 2016 and 2022. Over the review period, Deep Neural Networks grew from no implementations in 2010 to widespread adoption by 2022. Most of the reviewed articles used supervised or unsupervised learning models, reflecting a preference for approaches based on labeled data alongside substantial use of unsupervised pattern discovery. The analysis of machine learning tasks revealed that prediction, classification, and regression were the most prevalent tasks in the selected studies, reflecting the research community's emphasis on both categorical and continuous prediction requirements. This technical progression indicates that the research community has adapted successfully to algorithmic innovations while preserving methodological diversity for comparative analysis and specialized applications.
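The difference between the regression and classification framings can be seen in a small sketch: a continuous speed forecast (a regression output) is mapped to a categorical congestion level (a classification target) through the ratio of predicted speed to free-flow speed. The cut-off ratios of 0.5 and 0.75 and the label names are illustrative assumptions, not thresholds taken from the reviewed studies or from any standard.

```python
def congestion_level(speed_kmh, free_flow_kmh, heavy_below=0.5, moderate_below=0.75):
    """Map a continuous speed forecast to a categorical congestion label.

    The thresholds on the speed / free-flow ratio are illustrative assumptions.
    """
    if free_flow_kmh <= 0:
        raise ValueError("free-flow speed must be positive")
    ratio = speed_kmh / free_flow_kmh
    if ratio < heavy_below:
        return "heavy"
    if ratio < moderate_below:
        return "moderate"
    return "free-flow"


# A regression model would output the speeds; the mapping yields classification labels.
for predicted in (20.0, 40.0, 55.0):
    print(predicted, congestion_level(predicted, free_flow_kmh=60.0))
```

A classification study would instead train directly on such labels, trading the detail of a continuous forecast for a target that maps more directly onto operational decisions.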

The predominance of Deep Neural Networks (47%) indicates the successful integration of advanced architectures. However, it may also indicate a neglect of simpler, more interpretable methods that could be more suitable in certain contexts. The preference for supervised learning (57%) reflects advantages in data availability and performance, but it also suggests an underutilization of unsupervised learning and, in particular, reinforcement learning (8%), despite the theoretical suitability of these paradigms for dynamic optimization scenarios. These trends imply that researchers should balance the adoption of innovative techniques against their methodological suitability for specific applications.
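To make concrete why reinforcement learning suits dynamic optimization, the standard one-step Q-learning update is shown below as a general illustration; it is not drawn from the reviewed studies. In a hypothetical traffic-control setting, an agent observes state $s$ (for example, queue lengths), takes action $a$ (for example, a signal phase), receives reward $r$ (for example, negative total delay), and moves to state $s'$; $\alpha$ is the learning rate and $\gamma$ the discount factor.

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

Unlike a supervised forecaster trained once on labeled history, this update is applied repeatedly as new feedback arrives, which is what makes the paradigm a natural fit for adaptive, sequential traffic management decisions.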

An emerging area for advancement involves the incorporation of attention mechanisms that focus on relevant segments of input data during predictions, thereby prioritizing essential features and enhancing predictive accuracy. Future research should focus on developing more dynamic and adaptive forecasting models capable of real-time adjustment to fluctuating traffic conditions by employing feedback loops and continuous learning mechanisms to refine the predictions and adapt to traffic variations.
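The attention idea can be illustrated with a minimal sketch of scaled dot-product attention over past time steps, written in plain Python for clarity. A query vector (for example, an encoding of the current traffic state) is compared with one key per historical step, the scores are normalized with a softmax, and the resulting weights form a weighted summary of the corresponding value vectors. How queries, keys, and values are obtained (in practice, learned projections inside a neural network) is left out, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(query, keys, values):
    """Attend from one query vector over T past time steps.

    query: list of d floats; keys: T lists of d floats; values: T lists of m floats.
    Returns (context vector of m floats, list of T attention weights).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    m = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(m)]
    return context, weights


# Hypothetical example: three past time steps, the first most similar to the query.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]  # e.g. observed speeds at those steps
context, weights = scaled_dot_product_attention(query, keys, values)
print([round(w, 3) for w in weights], round(context[0], 2))
```

The weights show which past steps the model "focuses on"; in a trained network this is how relevant segments of the input are prioritized for the prediction.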

Future research should also focus on several critical areas. The first is the development of real-time prediction systems that forecast traffic congestion immediately from current data, optimizing speed and efficiency to support real-time decision-making in traffic management. A further priority is advancing anomaly detection techniques to enable the early identification of unforeseen events that affect traffic flow. Efforts should also be directed toward temporal regression models capable of capturing temporal dependencies in traffic congestion, with the objective of predicting congestion dynamics across various timescales while maintaining computational efficiency and practical deployability.
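As a minimal sketch of a temporal regression model combined with residual-based anomaly detection, the code below fits a first-order autoregressive model, y[t] = a + b·y[t−1], by ordinary least squares and flags time steps whose one-step forecast error exceeds k times a residual standard deviation estimated on normal data. The AR(1) choice, the threshold k = 3, the speed values, and the function names are illustrative assumptions; the reviewed studies use considerably richer models.

```python
from statistics import mean, pstdev


def fit_ar1(series):
    """Fit y[t] = a + b * y[t-1] by ordinary least squares; returns (a, b)."""
    x, y = series[:-1], series[1:]
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return my, 0.0  # constant history: forecast the mean
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def forecast(a, b, last_value):
    """One-step-ahead forecast."""
    return a + b * last_value


def residual_std(series, a, b):
    """Spread of one-step forecast errors on (assumed normal) training data."""
    return pstdev([series[t] - forecast(a, b, series[t - 1]) for t in range(1, len(series))])


def flag_anomalies(series, a, b, sigma, k=3.0):
    """Indices whose one-step forecast error exceeds k * sigma."""
    limit = k * max(sigma, 1e-9)
    return [t for t in range(1, len(series))
            if abs(series[t] - forecast(a, b, series[t - 1])) > limit]


# Hypothetical speed series (km/h) for fitting, then a new window with a sudden drop.
train = [50.0, 52.0, 51.0, 53.0, 52.0, 54.0, 53.0, 55.0]
a, b = fit_ar1(train)
sigma = residual_std(train, a, b)
live = [54.0, 54.5, 20.0, 35.0]
print(flag_anomalies(live, a, b, sigma))
```

In this toy run the sudden drop at index 2 is flagged; the step after it may also be flagged, since a single-lag model cannot distinguish a persisting disruption from a one-off spike, which is one reason richer temporal models are needed.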

4.8. Integrated Analysis and Future Directions

This systematic review identified interconnected opportunities across multiple research questions that call for coordinated advancement strategies. The integration of advanced machine learning techniques (RQ7) with comprehensive infrastructure coverage (RQ4) and diverse data sources (RQ6) can effectively address both recurrent and nonrecurrent congestion scenarios (RQ5) through enhanced temporal prediction capabilities. The documented transition from traditional methods to deep learning (Deep Neural Networks in 47% of studies) establishes a foundation for next-generation hybrid approaches that leverage methodological diversity for specialized applications.

The academic community should be encouraged to formulate comprehensive implementation frameworks that effectively integrate theoretical advancements and operational deployment requirements. This encompasses:

Standardized Evaluation Metrics: Development of uniform performance assessment protocols that facilitate systematic comparisons across methodological approaches and operational contexts.
Creation of Benchmark Datasets: Development of representative datasets encompassing a wide range of geographical, temporal, and infrastructural contexts, which is essential to facilitate reproducible research and practical validation.
Technology Transfer Mechanisms: Establishment of structured pathways, through collaborations with industry partners, for translating academic innovations into operational transportation management systems.
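As an illustration of what a standardized evaluation protocol could report, the sketch below computes three error metrics that are widely used in the traffic-forecasting literature: mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE). The choice of these three metrics and the convention of skipping zero ground-truth values in MAPE are assumptions for illustration; the review does not prescribe a specific metric set.

```python
import math


def evaluate(y_true, y_pred):
    """Return MAE, RMSE, and MAPE (%) for paired observations and forecasts.

    MAPE skips points whose true value is zero, an assumed convention to avoid
    division by zero; it is NaN if every true value is zero.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    mape = (100.0 * sum(abs((p - t) / t) for t, p in pairs) / len(pairs)
            if pairs else float("nan"))
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


# Hypothetical flows (vehicles per 5 minutes) and forecasts.
print(evaluate([100.0, 200.0, 300.0], [110.0, 190.0, 300.0]))
```

Reporting the same metric set, computed with the same conventions on shared benchmark datasets, is what would make results from different studies directly comparable.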

This systematic review demonstrated that the field has reached a level of maturity and methodological sophistication adequate to support large-scale practical implementation. Concurrently, it maintains a strong research momentum to address emerging challenges in urban transportation management.

4.9. Limitations

4.9.1. Limitations of the Evidence Base

The 115 included studies demonstrated substantial heterogeneity in evaluation metrics, dataset characteristics, and experimental designs, which limited the ability to conduct direct comparative assessments of algorithm performance. Most studies (89%, n = 102) relied on simulation-based validation rather than real-world deployment validation, thus limiting the understanding of practical performance and scalability.

4.9.2. Limitations of the Review Process

Search Strategy Limitations:

Restriction to three major databases may have missed specialized publications.
English-only inclusion excluded potentially relevant research in other languages.
Gray literature exclusion may have missed important practical implementations.
Temporal boundaries (2010–2024) may have excluded foundational earlier work.

Selection and Assessment Limitations:

Quality assessment involved subjective judgments despite standardized criteria.
A 20% validation sample showed high agreement (κ = 0.89), but the remaining 80% relied on single-reviewer extraction.
Inability to access 12 full-text articles despite author contact attempts.
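The agreement statistic reported for the validation sample is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement between two reviewers and p_e is the agreement expected by chance from each reviewer's label frequencies. A minimal sketch of the computation follows; the labels in the usage example are hypothetical and are not the review's extraction data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical include/exclude decisions from two reviewers on four records.
reviewer_1 = ["include", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude"]
print(cohens_kappa(reviewer_1, reviewer_2))  # 0.5
```

Here the reviewers agree on three of four records (p_o = 0.75), but half of that agreement would be expected by chance (p_e = 0.5), so κ = 0.5; a value such as 0.89 therefore reflects agreement well beyond chance.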

4.9.3. Implications of Limitations

These limitations suggest the need for (1) standardized evaluation frameworks and benchmark datasets, (2) real-world deployment studies and long-term validation, (3) broader search strategies, including gray literature, and (4) enhanced reporting standards specific to ML applications in transportation.

5. Conclusions

Machine learning and intelligent traffic systems are two pivotal technologies with significant potential for integration into future intelligent urban environments. Machine learning techniques have shown considerable success in predicting traffic congestion, thereby establishing a solid foundation for practical transportation management applications. This systematic review offers a comprehensive and structured examination of the application of machine learning methodologies for traffic congestion forecasting, providing empirical foundations for strategic research advancement and practical implementation guidance.

This systematic review established a foundational empirical framework for research on machine learning in traffic congestion forecasting through a comprehensive analysis of 115 peer-reviewed studies conducted over 14 years. The investigation illustrates the successful evolution of the field from experimental approaches to sophisticated methodological implementations, while identifying critical opportunities for strategic advancement. The documented progression from traditional statistical methods to the dominance of deep learning, coupled with exceptional publication quality standards and sustained research momentum, positions the field for transformative practical impact. The identified research gaps provide a strategic roadmap for coordinated investment in underrepresented areas, whereas established methodological sophistication supports large-scale operational deployment.

The research community is well-equipped to tackle the challenges of next-generation intelligent transportation through integrated approaches that combine technological innovation with a comprehensive application scope. Strategic coordination between academic excellence, governmental policy support, and enhanced industry collaboration will expedite the translation of research advancements into societal impacts. This will ultimately contribute to sustainable urban mobility solutions and the development of intelligent transportation systems that enhance both the operational efficiency and the quality of life of urban populations worldwide. The systematic review methodology demonstrated in this study provides a replicable framework for periodically assessing the evolution of the research landscape, enabling adaptive research strategies as machine learning technologies and transportation challenges continue to evolve in the era of smart city development and sustainable urban planning.

Author Contributions

Conceptualization: M.A. and M.L.; methodology: M.A. and M.L.; literature search and screening: M.A. (primary) and M.L. (validation); data extraction and quality assessment: M.A. (primary) and M.L. (20% validation sample, κ = 0.89); data analysis and visualization: M.A.; writing—original draft: M.A.; writing—review and editing: M.A. and M.L.; supervision and project administration: M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Data Availability Statement

The research data supporting this systematic review are available from the corresponding author upon reasonable request, subject to copyright restrictions imposed by the database providers. Available materials include complete search strategies with timestamps for all databases, a full bibliography of the initially identified records with screening decisions, data extraction spreadsheets for all 115 included studies, and the analysis code for the visualizations. Complete bibliographic database exports cannot be shared because of copyright restrictions imposed by publishers. Requests should be directed to the corresponding author (mehdi.attioui@enscasa.ma) with specific descriptions of the required materials and intended use.

Conflicts of Interest

The authors declare no conflicts of interest.

References

International Transport Forum. Congestion Pricing; Technical Report; OECD: Paris, France, 2020.
European Conference of Ministers of Transport. Managing Urban Traffic Congestion; Technical Report; OECD: Paris, France, 2007.
International Transport Forum. A New Paradigm for Urban Mobility; Technical Report; The International Transport Forum: Paris, France, 2015.
Wang, B.; Wang, J. ST-MGAT: Spatio-temporal Multi-Head Graph Attention Network for Traffic Prediction. Phys. A Stat. Mech. Its Appl. 2022, 603, 127762.
Ye, J.; Xue, S.; Jiang, A. Attention-based spatio-temporal graph convolutional network considering external factors for multi-step traffic flow prediction. Digit. Commun. Netw. 2022, 8, 343–350.
Lippi, M.; Bertini, M.; Frasconi, P. Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning. IEEE Trans. Intell. Transp. Syst. 2013, 14, 871–882.
Boukerche, A.; Tao, Y.; Sun, P. Artificial intelligence-based vehicular traffic flow prediction methods for supporting intelligent transportation systems. Comput. Netw. 2020, 182, 107484.
Akhtar, M.; Moridpour, S. A Review of Traffic Congestion Prediction Using Artificial Intelligence. J. Adv. Transp. 2021, 2021.
Modi, Y.; Teli, R.; Mehta, A.; Shah, K.; Shah, M. A Comprehensive Review on Intelligent Traffic Management Using Machine Learning Algorithms. Innov. Infrastruct. Solut. 2021, 7, 128.
Shaygan, M.; Meese, C.; Li, W.; Zhao, X.G.; Nejad, M. Traffic prediction using artificial intelligence: Review of recent advances and emerging opportunities. Transp. Res. Part C Emerg. Technol. 2022, 145, 103921.
Medina-Salgado, B.; Sánchez-DelaCruz, E.; Pozos-Parra, P.; Sierra, J.E. Urban traffic flow prediction techniques: A review. Sustain. Comput. Inform. Syst. 2022, 35, 100739.
Kashyap, A.A.; Raviraj, S.; Devarakonda, A.; Nayak K, S.R.; V, S.K.; Bhat, S.J. Traffic Flow Prediction Models: A Review of Deep Learning Techniques. Cogent Eng. 2022, 9, 2010510.
Gomes, B.; Coelho, J.; Aidos, H. A Survey on Traffic Flow Prediction and Classification. Intell. Syst. Appl. 2023, 20, 200268.
Sayed, S.A.; Abdel-Hamid, Y.; Hefny, H.A. Artificial intelligence-based traffic flow prediction: A comprehensive review. J. Electr. Syst. Inf. Technol. 2023, 10, 13.
Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
Kitchenham, B. Procedures for Performing Systematic Reviews; Technical Report; Software Engineering Group, Department of Computer Science, Keele University: Keele, UK, 2004.
Kitchenham, B. Guidelines for Performing Systematic Literature Reviews in Software Engineering; Technical Report; Software Engineering Group, Department of Computer Science, Keele University: Keele, UK, 2007.
Gaamouche, R.; Chinnici, M.; Lahby, M.; Abakarim, Y.; Hasnaoui, A.E. Machine Learning Techniques for Renewable Energy Forecasting: A Comprehensive Review. In Green Energy and Technology; Springer: Cham, Switzerland, 2022; pp. 3–39.
Lahby, M.; Aqil, S.; Yafooz, W.M.; Abakarim, Y. Online Fake News Detection Using Machine Learning Techniques: A Systematic Mapping Study. Stud. Comput. Intell. 2022, 1001, 3–37.
Attioui, M.; Lahby, M. Deep Learning-Based Congestion Forecasting: A Literature Review and Future. In Proceedings of the 10th International Conference on Wireless Networks and Mobile Communications, WINCOM 2023, Istanbul, Turkey, 26–28 October 2023.
Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G.; Antes, G.; Atkins, D.; Barbour, V.; Barrowman, N.; Berlin, J.A.; Clark, J.; et al. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. PLoS Med. 2009, 6, e1000097.
McHugh, M.L. Interrater reliability: The kappa statistic. Biochem. Medica 2012, 22, 276.
Abdi, J.; Moshiri, B.; Abdulhai, B.; Sedigh, A.K. Forecasting of Short-Term Traffic-Flow Based on Improved Neurofuzzy Models via Emotional Temporal Difference Learning Algorithm. Eng. Appl. Artif. Intell. 2012, 25, 1022–1042.
Pan, B.; Demiryurek, U.; Shahabi, C. Utilizing Real-World Transportation Data for Accurate Traffic Prediction. In Proceedings of the 2012 IEEE 12th International Conference on Data Mining, Brussels, Belgium, 10 December 2012; pp. 595–604.
Abdi, J.; Moshiri, B.; Abdulhai, B.; Sedigh, A.K. Short-Term Traffic Flow Forecasting: Parametric and Nonparametric Approaches via Emotional Temporal Difference Learning. Neural Comput. Appl. 2013, 23, 141–159.
Ojeda, L.L.; Kibangou, A.Y.; de Wit, C.C. Adaptive Kalman Filtering for Multi-Step Ahead Traffic Flow Prediction. In Proceedings of the 2013 American Control Conference, Washington, DC, USA, 17–19 June 2013; pp. 4724–4729.
Wang, J.; Shi, Q. Short-Term Traffic Speed Forecasting Hybrid Model Based on Chaos-Wavelet Analysis-Support Vector Machine Theory. Transp. Res. Part C Emerg. Technol. 2013, 27, 219–232.
Al-Kadi, O.; Al-Kadi, O.; Al-Sayyed, R.; Alqatawna, J. Road Scene Analysis for Determination of Road Traffic Density. Front. Comput. Sci. 2014, 8, 619–628.
Chen, P.T.; Chen, F.; Qian, Z. Road Traffic Congestion Monitoring in Social Media with Hinge-Loss Markov Random Fields. In Proceedings of the 2014 IEEE International Conference on Data Mining, Shenzhen, China, 14–17 December 2014; pp. 80–89.
Huang, W.; Song, G.; Hong, H.; Xie, K. Deep Architecture for Traffic Flow Prediction: Deep Belief Networks with Multitask Learning. IEEE Trans. Intell. Transp. Syst. 2014, 15, 2191–2201.
Park, J.; Murphey, Y.L.; McGee, R.; Kristinsson, J.G.; Kuang, M.L.; Phillips, A.M. Intelligent Trip Modeling for the Prediction of an Origin-Destination Traveling Speed Profile. IEEE Trans. Intell. Transp. Syst. 2014, 15, 1039–1053.
Ratrout, N.T. Short-Term Traffic Flow Prediction Using Group Method Data Handling (GMDH)-Based Abductive Networks. Arab. J. Sci. Eng. 2014, 39, 631–646.
Xu, Y.; Kong, Q.J.; Klette, R.; Liu, Y. Accurate and Interpretable Bayesian MARS for Traffic Flow Prediction. IEEE Trans. Intell. Transp. Syst. 2014, 15, 2457–2469.
Dell’Acqua, P.; Bellotti, F.; Berta, R.; Gloria, A.D. Time-Aware Multivariate Nearest Neighbor Regression Methods for Traffic Flow Prediction. IEEE Trans. Intell. Transp. Syst. 2015, 16, 3393–3402.
Hong, H.; Zhou, X.; Huang, W.; Xing, X.; Chen, F.; Lei, Y.; Bian, K.; Xie, K. Learning Common Metrics for Homogenous Tasks in Traffic Flow Prediction. In Proceedings of the 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 9–11 December 2015; pp. 1007–1012.
Huang, M.L. Intersection Traffic Flow Forecasting Based on ν-GSVR with a New Hybrid Evolutionary Algorithm. Neurocomputing 2015, 147, 343–349.
Moreira-Matias, L.; Alesiani, F. Drift3Flow: Freeway-Incident Prediction Using Real-Time Learning. In Proceedings of the 2015 IEEE 18th International Conference on Intelligent Transportation Systems, Gran Canaria, Spain, 15–18 September 2015; pp. 566–571.
Oh, S.D.; Kim, Y.J.; Hong, J.S. Urban Traffic Flow Prediction System Using a Multifactor Pattern Recognition Model. IEEE Trans. Intell. Transp. Syst. 2015, 16, 2744–2755.
Xu, J.; Deng, D.; Demiryurek, U.; Shahabi, C.; van der Schaar, M. Mining the Situation: Spatiotemporal Traffic Prediction with Big Data. IEEE J. Sel. Top. Signal Process. 2015, 9, 702–715.
Hou, Z.; Li, X. Repeatability and Similarity of Freeway Traffic Flow and Long-Term Prediction Under Big Data. IEEE Trans. Intell. Transp. Syst. 2016, 17, 1786–1796.
Li, F.; Gong, J.; Liang, Y.; Zhou, J. Real-Time Congestion Prediction for Urban Arterials Using Adaptive Data-Driven Methods. Multimed. Tools Appl. 2016, 75, 17573–17592.
Lopez-Garcia, P.; Onieva, E.; Osaba, E.; Masegosa, A.D.; Perallos, A. A Hybrid Method for Short-Term Traffic Congestion Forecasting Using Genetic Algorithms and Cross Entropy. IEEE Trans. Intell. Transp. Syst. 2016, 17, 557–569.
Soua, R.; Koesdwiady, A.; Karray, F. Big-Data-Generated Traffic Flow Prediction Using Deep Learning and Dempster-Shafer Theory. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 3195–3202.
Sun, Z.; Li, Z.; Zhao, Y. Traffic Congestion Forecasting Based on Possibility Theory. Int. J. Intell. Transp. Syst. Res. 2016, 14, 85–91.
Jiang, B.; Fei, Y. Vehicle Speed Prediction by Two-Level Data Driven Models in Vehicular Networks. IEEE Trans. Intell. Transp. Syst. 2017, 18, 1793–1801.
Ling, X.; Feng, X.; Chen, Z.; Xu, Y.; Zheng, H. Short-Term Traffic Flow Prediction with Optimized Multi-kernel Support Vector Machine. In Proceedings of the 2017 IEEE Congress on Evolutionary Computation (CEC), San Sebastián, Spain, 5–8 June 2017; pp. 294–300.
Liu, Y.; Wang, Y.; Yang, X.; Zhang, L. Short-Term Travel Time Prediction by Deep Learning: A Comparison of Different LSTM-DNN Models. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; pp. 1–8.
Polson, N.G.; Sokolov, V.O. Deep Learning for Short-Term Traffic Flow Prediction. Transp. Res. Part C Emerg. Technol. 2017, 79, 1–17.
Tiwari, V.S.; Arya, A. Horizontally Scalable Probabilistic Generalized Suffix Tree (PGST) Based Route Prediction Using Map Data and GPS Traces. J. Big Data 2017, 4, 23.
Treethidtaphat, W.; Pattara-Atikom, W.; Khaimook, S. Bus Arrival Time Prediction at Any Distance of Bus Route Using Deep Neural Network Model. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; pp. 988–992.
Yang, Y.; Xu, Y.; Han, J.; Wang, E.; Chen, W.; Yue, L. Efficient Traffic Congestion Estimation Using Multiple Spatio-Temporal Properties. Neurocomputing 2017, 267, 344–353.
Cheng, X.; Zhang, R.; Zhou, J.; Xu, W. DeepTransport: Learning Spatial-Temporal Dependency for Traffic Condition Forecasting. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
Chen, M.; Yu, X.; Liu, Y. PCNN: Deep Convolutional Networks for Short-Term Traffic Congestion Prediction. IEEE Trans. Intell. Transp. Syst. 2018, 19, 3550–3559.
Fan, S.K.S.; Su, C.J.; Nien, H.T.; Tsai, P.F.; Cheng, C.Y. Using Machine Learning and Big Data Approaches to Predict Travel Time Based on Historical and Real-Time Data from Taiwan Electronic Toll Collection. Soft Comput. 2018, 22, 5707–5718.
Odat, E.; Shamma, J.S.; Claudel, C. Vehicle Classification and Speed Estimation Using Combined Passive Infrared/Ultrasonic Sensors. IEEE Trans. Intell. Transp. Syst. 2018, 19, 1593–1606.
Hyoshin, P.; Ali, H.; Siby, S.; Michael, A.K. Real-Time Prediction and Avoidance of Secondary Crashes under Unexpected Traffic Congestion. Accid. Anal. Prev. 2018, 112, 39–49.
Reid, A.R.; Pérez, C.R.C.; Rodríguez, D.M. Inference of Vehicular Traffic in Smart Cities Using Machine Learning with the Internet of Things. Int. J. Interact. Des. Manuf. (IJIDeM) 2018, 12, 459–472.
Sharma, B.; Kumar, S.; Tiwari, P.; Yadav, P.; Nezhurina, M.I. ANN Based Short-Term Traffic Flow Forecasting in Undivided Two Lane Highway. J. Big Data 2018, 5, 48.
Sun, B.; Cheng, W.; Ma, L.; Goswami, P. Anomaly-Aware Traffic Prediction Based on Automated Conditional Information Fusion. In Proceedings of the 2018 21st International Conference on Information Fusion (FUSION), Cambridge, UK, 10–13 July 2018; pp. 2283–2289.
Zhang, H.; Wang, X.; Cao, J.; Tang, M.; Guo, Y. A Multivariate Short-Term Traffic Flow Forecasting Method Based on Wavelet Analysis and Seasonal Time Series. Appl. Intell. 2018, 48, 3827–3838.
Bouyahia, Z.; Haddad, H.; Jabeur, N.; Yasar, A. A Two-Stage Road Traffic Congestion Prediction and Resource Dispatching toward a Self-Organizing Traffic Control System. Pers. Ubiquitous Comput. 2019, 23, 909–920.
Dao, M.S.; Nguyen, N.T.; Zettsu, K. Multi-Time-Horizon Traffic Risk Prediction Using Spatio-Temporal Urban Sensing Data Fusion. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 2205–2214.
Du, S.; Li, T.; Yang, Y.; Gong, X.; Horng, S.J. An LSTM Based Encoder-Decoder Model for MultiStep Traffic Flow Prediction. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–8.
Eldowa, D.; Elgazzar, K.; Hassanein, H.S.; Sharaf, T.; Shah, S. Assessing the Integrity of Traffic Data through Short Term State Prediction. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, 9–13 December 2019; pp. 1–5.
Liu, Z.; Huang, M.; Ye, Z.; Wu, K. DeepRTP: A Deep Spatio-Temporal Residual Network for Regional Traffic Prediction. In Proceedings of the 2019 15th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN), Shenzhen, China, 11–13 December 2019; pp. 291–296. [Google Scholar] [CrossRef]
Gerum, P.C.L.; Benton, A.R.; Baykal-Gürsoy, M. Traffic Density on Corridors Subject to Incidents: Models for Long-Term Congestion Management. EURO J. Transp. Logist. 2019, 8, 795–831. [Google Scholar] [CrossRef]
Nguyen, N.T.; Dao, M.S.; Zettsu, K. Complex Event Analysis for Traffic Risk Prediction Based on 3D-CNN with Multi-sources Urban Sensing Data. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 1669–1674. [Google Scholar] [CrossRef]
Nallaperuma, D.; Nawaratne, R.; Bandaragoda, T.; Adikari, A.; Nguyen, S.; Kempitiya, T.; Silva, D.D.; Alahakoon, D.; Pothuhera, D. Online Incremental Machine Learning Platform for Big Data-Driven Smart Traffic Management. IEEE Trans. Intell. Transp. Syst. 2019, 20, 4679–4690. [Google Scholar] [CrossRef]
Tao, Y.; Sun, P.; Boukerche, A. A Hybrid Stacked Traffic Volume Prediction Approach for a Sparse Road Network. In Proceedings of the 2019 IEEE Symposium on Computers and Communications (ISCC), Barcelona, Spain, 29 June–3 July 2019; pp. 1–6. [Google Scholar] [CrossRef]
Xiangxue, W.; Lunhui, X.; Kaixun, C. Data-Driven Short-Term Forecasting for Urban Road Network Traffic Based on Data Processing and LSTM-RNN. Arab. J. Sci. Eng. 2019, 44, 3043–3060. [Google Scholar] [CrossRef]
Zheng, G.; Chai, W.K.; Katos, V. An Ensemble Model for Short-Term Traffic Prediction in Smart City Transportation System. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Big Island, HI, USA, 9–13 December 2019; pp. 1–6. [Google Scholar] [CrossRef]
Cao, M.; Li, V.O.K.; Chan, V.W.S. A CNN-LSTM Model for Traffic Speed Prediction. In Proceedings of the 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), Antwerp, Belgium, 25–28 May 2020; pp. 1–5. [Google Scholar] [CrossRef]
Cui, Z.; Zhao, C. Dual-Stage Attention Based Spatio-Temporal Sequence Learning for Multi-Step Traffic Prediction. IFAC-PapersOnLine 2020, 53, 17035–17040. [Google Scholar] [CrossRef]
de Medrano, R.; Aznarte, J.L. A Spatio-Temporal Attention-Based Spot-Forecasting Framework for Urban Traffic Prediction. Appl. Soft Comput. 2020, 96, 106615. [Google Scholar] [CrossRef]
Fafoutellis, P.; Vlahogianni, E.I.; Ser, J.D. Dilated LSTM Networks for Short-Term Traffic Forecasting Using Network-Wide Vehicle Trajectory Data. In Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, 20–23 September 2020; pp. 1–6. [Google Scholar] [CrossRef]
Fiorini, S.; Pilotti, G.; Ciavotta, M.; Maurino, A. 3D-CLoST: A CNN-LSTM Approach for Mobility Dynamics Prediction in Smart Cities. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Virtual Event, 10–13 December 2020; pp. 3180–3189. [Google Scholar] [CrossRef]
Liu, Y.; Yu, J.J.Q.; Kang, J.; Niyato, D.; Zhang, S. Privacy-Preserving Traffic Flow Prediction: A Federated Learning Approach. IEEE Internet Things J. 2020, 7, 7751–7763. [Google Scholar] [CrossRef]
Qiu, J.; Du, L.; Zhang, D.; Su, S.; Tian, Z. Nei-TTE: Intelligent Traffic Time Estimation Based on Fine-Grained Time Derivation of Road Segments for Smart City. IEEE Trans. Ind. Inform. 2020, 16, 2659–2666. [Google Scholar] [CrossRef]
Zaki, J.F.; Ali-Eldin, A.; Hussein, S.E.; Saraya, S.F.; Areed, F.F. Traffic Congestion Prediction Based on Hidden Markov Models and Contrast Measure. Ain Shams Eng. J. 2020, 11, 535–551. [Google Scholar] [CrossRef]
Zhao, H.; Yang, H.; Wang, Y.; Wang, D.; Su, R. Attention Based Graph Bi-LSTM Networks for Traffic Forecasting. In Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, 20–23 September 2020; pp. 1–6. [Google Scholar] [CrossRef]
International Transport Forum. Artificial Intelligence in Proactive Road Infrastructure Safety Management; Technical Report; International Transport Forum: Paris, France, 2021. [Google Scholar]
Wang, J.; Liu, K.; Li, H. LSTM-based graph attention network for vehicle trajectory prediction. Comput. Netw. 2024, 248, 110477. [Google Scholar] [CrossRef]
Sun, P.; AlJeri, N.; Boukerche, A. A Fast Vehicular Traffic Flow Prediction Scheme Based on Fourier and Wavelet Analysis. In Proceedings of the 2018 IEEE Global Communications Conference (GLOBECOM), Abu Dhabi, United Arab Emirates, 9–13 December 2018; pp. 1–6. [Google Scholar] [CrossRef]
Lin, L.; Li, W.; Bi, H.; Qin, L. Vehicle Trajectory Prediction Using LSTMs With Spatial–Temporal Attention Mechanisms. IEEE Intell. Transp. Syst. Mag. 2022, 14, 197–208.
Yang, H.F.; Dillon, T.S.; Chen, Y.P.P. Optimized Structure of the Traffic Flow Forecasting Model With a Deep Learning Approach. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 2371–2381.
Zhang, H.; Zou, Y.; Yang, X.; Yang, H. A Temporal Fusion Transformer for Short-Term Freeway Traffic Speed Multistep Prediction. Neurocomputing 2022, 500, 329–340.
Essien, A.; Petrounias, I.; Sampaio, P.; Sampaio, S. A Deep-Learning Model for Urban Traffic Flow Prediction with Traffic Events Mined from Twitter. World Wide Web 2021, 24, 1345–1368.
Govindan, K.; Ramalingam, S.; Broumi, S. Traffic Volume Prediction Using Intuitionistic Fuzzy Grey-Markov Model. Neural Comput. Appl. 2021, 33, 12905–12920.
Hassija, V.; Gupta, V.; Garg, S.; Chamola, V. Traffic Jam Probability Estimation Based on Blockchain and Deep Neural Networks. IEEE Trans. Intell. Transp. Syst. 2021, 22, 3919–3928.
Jin, K.; Wi, J.; Lee, E.; Kang, S.; Kim, S.; Kim, Y. TrafficBERT: Pre-trained Model with Large-Scale Data for Long-Range Traffic Flow Forecasting. Expert Syst. Appl. 2021, 186, 115738.
Li, C.; Xu, P. Application on Traffic Flow Prediction of Machine Learning in Intelligent Transportation. Neural Comput. Appl. 2021, 33, 613–624.
Majumdar, S.; Subhani, M.M.; Roullier, B.; Anjum, A.; Zhu, R. Congestion Prediction for Smart Sustainable Cities Using IoT and Machine Learning Approaches. Sustain. Cities Soc. 2021, 64, 102500.
Roy, K.C.; Hasan, S.; Culotta, A.; Eluru, N. Predicting Traffic Demand during Hurricane Evacuation Using Real-time Data from Transportation Systems and Social Media. Transp. Res. Part C Emerg. Technol. 2021, 131, 103339.
Sharma, P.; Singh, A.; Singh, K.K.; Dhull, A. Vehicle Identification Using Modified Region Based Convolution Network for Intelligent Transportation System. Multimed. Tools Appl. 2021, 81, 34893–34917.
Sun, Y.; Wang, Y.; Fu, K.; Wang, Z.; Zhang, C.; Ye, J. Constructing Geographic and Long-term Temporal Graph for Traffic Forecasting. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 3483–3490.
Xu, M.; Liu, H. A Flexible Deep Learning-Aware Framework for Travel Time Prediction Considering Traffic Event. Eng. Appl. Artif. Intell. 2021, 106, 104491.
Yu, J.J.Q. Citywide Traffic Speed Prediction: A Geometric Deep Learning Approach. Knowl.-Based Syst. 2021, 212, 106592.
Aljuaydi, F.; Wiwatanapataphee, B.; Wu, Y.H. Multivariate Machine Learning-Based Prediction Models of Freeway Traffic Flow under Non-Recurrent Events. Alex. Eng. J. 2023, 65, 151–162.
Chen, G.; Zhang, J. Applying Artificial Intelligence and Deep Belief Network to Predict Traffic Congestion Evacuation Performance in Smart Cities. Appl. Soft Comput. 2022, 121, 108692.
Fang, S.; Prinet, V.; Chang, J.; Werman, M.; Zhang, C.; Xiang, S.; Pan, C. MS-Net: Multi-Source Spatio-Temporal Network for Traffic Flow Prediction. IEEE Trans. Intell. Transp. Syst. 2022, 23, 7142–7155.
Gal-Tzur, A.; Bekhor, S.; Barsky, Y. Feature Engineering Methodology for Congestion Forecasting. J. Traffic Transp. Eng. (Engl. Ed.) 2022, 9, 1055–1068.
Jiang, W.; Luo, J. Graph Neural Network for Traffic Forecasting: A Survey. Expert Syst. Appl. 2022, 207, 117921.
Lin, G.; Lin, A.; Gu, D. Using Support Vector Regression and K-nearest Neighbors for Short-Term Traffic Flow Prediction Based on Maximal Information Coefficient. Inf. Sci. 2022, 608, 517–531.
Lu, W.; Yi, Z.; Wu, R.; Rui, Y.; Ran, B. Traffic Speed Forecasting for Urban Roads: A Deep Ensemble Neural Network Model. Phys. A Stat. Mech. Its Appl. 2022, 593, 126988.
Ren, Y.; Jiang, H.; Ji, N.; Yu, H. TBSM: A Traffic Burst-Sensitive Model for Short-Term Prediction under Special Events. Knowl.-Based Syst. 2022, 240, 108120.
Ryu, U.; Wang, J.; Pak, U.; Kwak, S.; Ri, K.; Jang, J.; Sok, K. A Clustering Based Traffic Flow Prediction Method with Dynamic Spatiotemporal Correlation Analysis. Transportation 2022, 49, 951–988.
Shang, P.; Liu, X.; Yu, C.; Yan, G.; Xiang, Q.; Mi, X. A New Ensemble Deep Graph Reinforcement Learning Network for Spatio-Temporal Traffic Volume Forecasting in a Freeway Network. Digit. Signal Process. 2022, 123, 103419.
Toan, T.D.; Wong, Y.D.; Lam, S.H.; Meng, M. Developing a Fuzzy-Based Decision-Making Procedure for Traffic Control in Expressway Congestion Management. Phys. A Stat. Mech. Its Appl. 2022, 604, 127899.
Wu, J.; Zhou, X.; Peng, Y.; Zhao, X. Recurrence Analysis of Urban Traffic Congestion Index on Multi-Scale. Phys. A Stat. Mech. Its Appl. 2022, 585, 126439.
Zhang, W.; Zhu, F.; Lv, Y.; Tan, C.; Liu, W.; Zhang, X.; Wang, F.Y. AdapGL: An Adaptive Graph Learning Algorithm for Traffic Prediction Based on Spatiotemporal Neural Networks. Transp. Res. Part C Emerg. Technol. 2022, 139, 103659.
Tran, T.; He, D.; Kim, J.; Hickman, M. MSGNN: A Multi-structured Graph Neural Network model for real-time incident prediction in large traffic networks. Transp. Res. Part C Emerg. Technol. 2023, 156, 104354.
Zhang, J.; Song, C.; Cao, S.; Zhang, C. FDST-GCN: A Fundamental Diagram based Spatiotemporal Graph Convolutional Network for expressway traffic forecasting. Phys. A Stat. Mech. Its Appl. 2023, 630, 129173.
Feng, S.; Wei, S.; Zhang, J.; Li, Y.; Ke, J.; Chen, G.; Zheng, Y.; Yang, H. A macro–micro spatio-temporal neural network for traffic prediction. Transp. Res. Part C Emerg. Technol. 2023, 156, 104331.
Shen, G.; Li, P.; Chen, Z.; Yang, Y.; Kong, X. Spatio-temporal interactive graph convolution network for vehicle trajectory prediction. Internet Things 2023, 24, 100935.
Tay, L.; Lim, J.M.Y.; Liang, S.N.; Keong, C.K.; Tay, Y.H. Urban traffic volume estimation using intelligent transportation system crowdsourced data. Eng. Appl. Artif. Intell. 2023, 126, 107064.
Lv, Y.; Lv, Z.; Cheng, Z.; Zhu, Z.; Rashidi, T.H. TS-STNN: Spatial-temporal neural network based on tree structure for traffic flow prediction. Transp. Res. Part E Logist. Transp. Rev. 2023, 177, 103251.
Menguc, K.; Aydin, N.; Yilmaz, A. A Data Driven Approach to Forecasting Traffic Speed Classes Using Extreme Gradient Boosting Algorithm and Graph Theory. Phys. A Stat. Mech. Its Appl. 2023, 620, 128738.
Ye, Y.; Xiao, Y.; Zhou, Y.; Li, S.; Zang, Y.; Zhang, Y. Dynamic multi-graph neural network for traffic flow prediction incorporating traffic accidents. Expert Syst. Appl. 2023, 234, 121101.
Wang, Z.; Sun, P.; Hu, Y.; Boukerche, A. A novel hybrid method for achieving accurate and timeliness vehicular traffic flow prediction in road networks. Comput. Commun. 2023, 209, 378–386.
Yang, X.; Bekoulis, G.; Deligiannis, N. Traffic event detection as a slot filling problem. Eng. Appl. Artif. Intell. 2023, 123, 106202.
Betkier, I.; Oszczypała, M. A novel approach to traffic modelling based on road parameters, weather conditions and GPS data using feedforward neural networks. Expert Syst. Appl. 2024, 245, 123067.
Sengupta, A.; Mondal, S.; Das, A.; Guler, S.I. A Bayesian approach to quantifying uncertainties and improving generalizability in traffic prediction models. Transp. Res. Part C Emerg. Technol. 2024, 162, 104585.
Xu, J.; Li, Y.; Lu, W.; Wu, S.; Li, Y. A heterogeneous traffic spatio-temporal graph convolution model for traffic prediction. Phys. A Stat. Mech. Its Appl. 2024, 641, 129746.
Wang, S.; Zhang, Y.; Piao, X.; Lin, X.; Hu, Y.; Yin, B. Data-unbalanced traffic accident prediction via adaptive graph and self-supervised learning. Appl. Soft Comput. 2024, 157, 111512.
Wieringa, R.; Maiden, N.; Mead, N.; Rolland, C. Requirements engineering paper classification and evaluation criteria: A proposal and a discussion. Requir. Eng. 2006, 11, 102–107.
World Road Association (PIARC). Road Dictionary. 2023. Available online: https://www.piarc.org/en/activities/Road-Dictionary-Terminology-Road-Transport (accessed on 1 June 2025).

Figure 1. PRISMA flow diagram for systematic review selection process. The diagram illustrates the four-phase selection process from the initial database search (n = 9695) to the final inclusion (n = 115). Each phase shows the number of records processed and reasons for exclusion, ensuring transparency and reproducibility of the study selection methodology.

Figure 2. Distribution of machine learning approaches in traffic management literature.

Figure 3. Machine learning task distribution across traffic management applications.

Figure 4. Algorithmic technique adoption frequency in traffic ML research.

Figure 5. Temporal evolution of ML task preferences (2010–2024). Three distinct phases were identified: Foundation (2010–2015), Acceleration (2016–2020), and Maturation (2021–2024), with research activity peaking in 2021–2022.

Figure 6. Evolution of learning paradigms over time, showing a surge in supervised learning after 2016, coinciding with the availability of big data. Semi-supervised approaches gain traction from 2019 onward, addressing data annotation bottlenecks.

Figure 7. Technique adoption lifecycle, showing Deep NN dominance peaking in 2021–2022, followed by methodological stabilization. Traditional methods (ARIMA, Linear Regression) decline after 2018 and are replaced by ML approaches.

Figure 8. Publication trends and venue distribution analysis (2010–2024).

Figure 9. Publication quality and venue ranking distribution.

Figure 10. Research context and institutional distribution analysis.

Figure 11. Research methodology distribution and approach classification.

Figure 12. Traffic congestion type distribution and research focus analysis.

Figure 13. Forecast temporal granularity distribution.

Figure 14. Traffic parameter utilization comparison across congestion types.

Figure 15. Vehicle type distribution in traffic congestion research.

Figure 16. Road type distribution and infrastructure focus analysis.

Figure 17. Temporal evolution of vehicle type research focus (2010–2024).

Figure 18. Comparative data strategy analysis: traditional vs. external sources.

Table 1. Definition and justification of the research questions.

ID	Research Question	Justification
RQ1	What are the origins, communication channels, and publication years of the articles?	This question aims to identify the sources through which publications on this subject can be accessed and to determine whether specific dissemination channels exist. It also traces how research effort in this field has evolved over time.
RQ2	What frameworks are addressed in the chosen studies?	To identify the contexts in which studies applying machine learning to congestion prediction were conducted and published.
RQ3	What types of research are employed in the selected documents?	To highlight the types of studies published on machine learning for predicting different types of traffic congestion.
RQ4	What kinds of roads and vehicles are studied in the selected documents?	The classification of road types is essential for accurately forecasting traffic congestion. This involves identifying road categories such as urban roads, highways, and rural roads. In addition, the vehicle types studied, including city cars, public transport, taxis, emergency vehicles, and heavy goods vehicles, were identified from the selected documents.
RQ5	What are the different types of traffic congestion, and what are the most common forecast periods for traffic congestion?	The objective is to identify the types of congestion studied, including recurrent and non-recurrent congestion, and to provide an overview of the forecasting intervals examined in the selected studies, such as per second, per minute, hourly, and daily.
RQ6	Which traffic metrics were employed in the selected articles to evaluate and forecast traffic congestion?	To identify which parameters the selected studies use to evaluate and predict traffic congestion, such as traffic speed, flow, volume, density, and travel time.
RQ7	Which machine learning models, tasks, and approaches were used in the selected studies to assess and forecast road congestion?	To provide a comprehensive review of the machine learning methodologies employed in the selected studies for predicting traffic congestion.

Table 5. Inclusion and exclusion criteria.

Category	Criterion
Inclusion	Papers presenting methods and techniques that advance smart technologies for forecasting traffic congestion and that could be used in future smart cities.
	Papers reviewing the machine learning techniques used in the literature to forecast road congestion.
Exclusion	Works published outside the 2010–2024 period, a window chosen because it spans the emergence of deep learning
	Papers not available in full text
	Papers not written in English
	Studies produced outside traditional publishing and distribution channels
	Duplicate studies
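The mechanical exclusion criteria in Table 5 can be expressed as a filter over exported bibliographic records. The following is a minimal Python sketch under stated assumptions: the record fields (`doi`, `year`, `language`, `full_text_available`, `formal_channel`) and the function name `screen` are hypothetical, the 2010–2024 window is the one stated in the abstract, and records sharing a DOI are treated as duplicates. The topical inclusion criteria still require a reviewer's judgement and are not modelled.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record fields; real exports from IEEE Xplore, SpringerLink,
    # or ScienceDirect use their own column names.
    doi: str
    year: int
    language: str              # ISO 639-1 code, e.g. "en"
    full_text_available: bool
    formal_channel: bool       # journal or conference proceedings


def screen(records: Iterable[Candidate], start: int = 2010, end: int = 2024) -> List[Candidate]:
    """Apply the mechanical exclusion criteria; topical relevance is left to reviewers."""
    kept: List[Candidate] = []
    seen = set()
    for r in records:
        key = r.doi.strip().lower()          # DOIs are case-insensitive
        if key in seen:                      # duplicate study
            continue
        seen.add(key)
        if not (start <= r.year <= end):     # outside the review window
            continue
        if r.language != "en":               # not written in English
            continue
        if not r.full_text_available:        # full text not accessible
            continue
        if not r.formal_channel:             # outside traditional publishing channels
            continue
        kept.append(r)
    return kept
```

Checking for duplicates first mirrors the PRISMA 2020 flow, where duplicate records are removed before screening.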

Data extraction validation: To ensure data extraction accuracy, a random sample of 20% of the included studies (n = 23) was independently extracted by two reviewers. The inter-rater agreement for categorical variables was assessed using Cohen’s kappa (κ = 0.89), which indicated near-perfect agreement.
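Cohen's kappa is κ = (p_o − p_e) / (1 − p_e), where p_o is the observed share of items on which both reviewers chose the same category and p_e is the agreement expected by chance from each reviewer's marginal label frequencies. On the widely used Landis and Koch scale, values from 0.81 to 1.00 are read as almost perfect agreement, the band into which 0.89 falls. The minimal Python sketch below computes unweighted kappa for one categorical extraction field; the function name and example labels are hypothetical and do not reproduce the reviewers' data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters assigning one category per item."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items where both raters chose the same category.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum over categories of the product of both raters' marginals.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical category for every item; kappa is undefined.
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical congestion-type labels for four studies from two reviewers.
reviewer_1 = ["recurrent", "recurrent", "non-recurrent", "non-recurrent"]
reviewer_2 = ["recurrent", "non-recurrent", "non-recurrent", "non-recurrent"]
print(cohens_kappa(reviewer_1, reviewer_2))  # 0.5
```

Here the reviewers agree on three of four studies (p_o = 0.75), but chance alone would yield p_e = 0.5, so κ = 0.5. For unweighted kappa the result should match scikit-learn's `sklearn.metrics.cohen_kappa_score` with default arguments.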

Table 6. Data extraction template.

Data item	Description	RQ
Authors	Names of the authors	RQ1
Title	Title of the paper	RQ1
Publication channel	Type of publication channel	RQ1
Publication source	Name of the publication source	RQ1
Year of publication	Calendar year	RQ1
Research type	Which research strategy was followed?	RQ3
Study context	In what context was the research conducted?	RQ2
Study category	What type of congestion was studied in the selected studies?	RQ5
Study period	What forecast period was considered in the selected studies?	RQ5
Vehicle type	Which vehicle type was targeted in the selected studies?	RQ4
Road type	Which road type was studied in the selected studies?	RQ4
ML model	Which machine learning approach was chosen?	RQ7
ML task	Which machine learning tasks were addressed in the selected studies?	RQ7
ML technique	Which machine learning techniques were used in the selected studies?	RQ7
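Table 6 is effectively a record schema with one record per included study. As a minimal Python sketch, assuming that layout, it can be encoded as a dataclass whose field metadata stores the research question each item serves, so the items feeding a given RQ can be listed programmatically. The class, field, and function names are hypothetical; the example values in the comments are categories reported elsewhere in this review.

```python
from dataclasses import dataclass, field, fields
from typing import List


def item(rq: str):
    """Declare a required extraction item tagged with the research question it serves."""
    return field(metadata={"rq": rq})


@dataclass
class ExtractionRecord:
    # RQ1: bibliographic data and dissemination channel
    authors: List[str] = item("RQ1")
    title: str = item("RQ1")
    publication_channel: str = item("RQ1")   # e.g. "Journal" or "Conference"
    publication_source: str = item("RQ1")
    year: int = item("RQ1")
    # RQ3 and RQ2: research strategy and context
    research_type: str = item("RQ3")         # e.g. "evaluation research"
    study_context: str = item("RQ2")         # e.g. "academic" or "government"
    # RQ5: congestion type and forecast period
    study_category: str = item("RQ5")        # e.g. "recurrent" or "non-recurrent"
    study_period: str = item("RQ5")          # e.g. "hourly"
    # RQ4: vehicles and roads
    vehicle_type: str = item("RQ4")
    road_type: str = item("RQ4")
    # RQ7: machine learning approach, task, and technique
    ml_model: str = item("RQ7")              # learning approach, e.g. "supervised"
    ml_task: str = item("RQ7")               # e.g. "classification"
    ml_technique: str = item("RQ7")          # e.g. "Deep NN"


def items_for(rq: str) -> List[str]:
    """Names of the extraction items that feed a given research question."""
    return [f.name for f in fields(ExtractionRecord) if f.metadata["rq"] == rq]


print(items_for("RQ4"))  # ['vehicle_type', 'road_type']
```

Keeping the RQ mapping next to each field means the template and the Table 8 synthesis cannot silently drift apart when an item is added or renamed.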

Table 7. Most frequent publication sources in the selected papers.

Name	Type	Library	Ranking	Number of papers
IEEE Transactions on Intelligent Transportation Systems	Journal	IEEE	Q1	14
Transportation Research Part C: Emerging Technologies	Journal	ScienceDirect	Q1	9
Physica A: Statistical Mechanics and its Applications	Journal	ScienceDirect	Q1	8
Neurocomputing	Journal	ScienceDirect	Q1	5
International Conference on Intelligent Transportation Systems	Conference	IEEE	B	4
Neural Computing and Applications	Journal	Springer	Q1	3
Global Communications Conference	Conference	IEEE	A	3
Knowledge-Based Systems	Journal	ScienceDirect	Q1	3
Arabian Journal for Science and Engineering	Journal	Springer	Q1	3
International Joint Conference on Neural Networks	Conference	IEEE	A	3
Journal of Big Data	Journal	Springer	Q1	2

Table 8. Research questions and key findings summary table.

Research Question	Primary Sections	Key Findings	Figures
RQ1	3.2, 3.3	14-year exponential growth, Q1 journal dominance (85%)	Figure 6, Figure 7 and Figure 8
RQ2	3.4.1	Academic leadership (70%), government involvement (27%)	Figure 10
RQ3	3.4.2	Evaluation research dominance (53%), empirical focus	Figure 11
RQ4	3.6	Passenger car focus (90%), urban-highway balance	Figure 15, Figure 16 and Figure 17
RQ5	3.5.1, 3.5.2	Recurrent congestion dominance (76%), short-term prediction most prevalent	Figure 12
RQ6	3.5.2, 3.6.3	Flow/speed parameters priority, data strategy differences	Figure 14 and Figure 18
RQ7	3.1, 3.2	Deep NN dominance (47%), supervised learning preference (57%)	Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
